Scalable and Practical Probability Density Estimators for Scientific Anomaly Detection


1 Scalable PDEs p.1/107
Scalable and Practical Probability Density Estimators for Scientific Anomaly Detection
Dan Pelleg
Andrew Moore (chair), Manuela Veloso, Geoff Gordon, Nir Friedman (the Hebrew University)

2 Clustering Large Data Sets Quickly
[Figure; axis labels: weight, axles, length]

3 Sloan Digital Sky Survey as an example of data collection, storage, and sharing
- Goal: map, in detail, one-quarter of the entire sky
- 5 years to complete
- 200 million objects in catalog
- 25 TB raw data, 5 TB catalog data
- Access over the web at

4 SkyServer
Supported activities on the SDSS SkyServer:
- Browse
- Learn
- Search by coordinates
- Send SQL query
- APIs for direct integration

5 Advancing SkyServers
- Make it easier to ask the right question
- Make it easier to understand the answer

6 Requirements from next-generation data analysis tools
- Fast
- Comprehensible output
- Turn-key

7 Focus on clustering
- Very general
- Lots of applications
- In particular, mixture-model based clustering

8 Talk outline
- K-means and X-means: fast spatial clustering
- Mixture of Rectangles: highly legible model
- Anomaly Hunting
- Sub-linear component learner
- Active learner
- User interface

9 K-means [figure]

10 K-means
During the K-means algorithm, we maintain a set of centroids.

11 K-means
In every iteration, each data point is associated with its closest centroid.

12 K-means
At the end of an iteration, we move each centroid to the center of mass of all points associated with it.

13-15 K-means [figures]

16 Cost of K-means
Cost per iteration: #records × #centroids
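
The iteration described on the K-means slides can be sketched in a few lines. This is a naive illustration, not the accelerated algorithm from the talk; the function name `kmeans_iteration` and the choice to leave an empty centroid in place are assumptions made for the example. The inner scan over centroids for every point is where the #records × #centroids cost comes from.

```python
import math

def kmeans_iteration(points, centroids):
    """One naive K-means iteration: assign every point, then move centroids."""
    dim = len(points[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for p in points:
        # Assignment step: compare the point against every centroid.
        nearest = min(range(len(centroids)),
                      key=lambda j: math.dist(p, centroids[j]))
        counts[nearest] += 1
        for d in range(dim):
            sums[nearest][d] += p[d]
    new_centroids = []
    for j, c in enumerate(centroids):
        if counts[j] == 0:
            # Assumption for this sketch: an empty centroid stays where it is.
            new_centroids.append(list(c))
        else:
            # Update step: move to the center of mass of the assigned points.
            new_centroids.append([s / counts[j] for s in sums[j]])
    return new_centroids
```

Calling it repeatedly until the centroids stop moving gives the full algorithm.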

17 kd-tree [figure]

18-23 A kd-tree [figures]

24 A kd-tree
- A binary tree to store data points.
- Each node stores statistics about all points contained in it.
- Not the only structure meeting these conditions.

25 K-means [figure]

26 Center-of-mass calculation
Suppose Q is the set of all points that belong to some centroid C. The new position of C is:
$$C = \frac{\sum_{x \in Q} x}{|Q|}$$
Let {Q_p} be a partition of Q. Then we can write the new position as:
$$C = \frac{\sum_p \sum_{x \in Q_p} x}{\sum_p |Q_p|}$$
This helps if the sums of each Q_p are known. They are known for kd-nodes.
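
To make the partition identity concrete, here is a minimal sketch of a kd-node that caches a point count and a vector sum, plus a center-of-mass routine that only reads those cached statistics. The class `KdNode`, the helper names, the widest-dimension median split, and the `leaf_size` parameter are all assumptions for illustration; the talk does not specify them.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class KdNode:
    lo: List[float]           # lower corner of the bounding rectangle
    hi: List[float]           # upper corner of the bounding rectangle
    count: int                # number of points in the node
    vector_sum: List[float]   # per-dimension sum of the points in the node
    left: Optional["KdNode"] = None
    right: Optional["KdNode"] = None

def build(points, leaf_size=2):
    """Build a kd-tree whose nodes cache count and vector sum."""
    dim = len(points[0])
    lo = [min(p[d] for p in points) for d in range(dim)]
    hi = [max(p[d] for p in points) for d in range(dim)]
    sums = [sum(p[d] for p in points) for d in range(dim)]
    node = KdNode(lo, hi, len(points), sums)
    if len(points) > leaf_size:
        # Split at the median along the widest dimension.
        axis = max(range(dim), key=lambda d: hi[d] - lo[d])
        ordered = sorted(points, key=lambda p: p[axis])
        mid = len(ordered) // 2
        node.left = build(ordered[:mid], leaf_size)
        node.right = build(ordered[mid:], leaf_size)
    return node

def center_of_mass(nodes):
    """Center of mass of a set of disjoint nodes, using cached sums only."""
    total = sum(n.count for n in nodes)
    dim = len(nodes[0].vector_sum)
    return [sum(n.vector_sum[d] for n in nodes) / total for d in range(dim)]
```

Here the two children of the root play the role of the partition {Q_p}: `center_of_mass([root.left, root.right])` gives the same answer as `center_of_mass([root])` without touching any individual point.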

27 K-means [figure]

28-30 A kd-node owned by a centroid
- The boundary line between centroids G and R does not intersect the rectangle H.
- The point in H which is closest to R is on the same side of the boundary as G.
- Scanning every point in the node is not needed.

31-35 A kd-node not owned by a centroid
- The boundary line between centroids G and R does intersect the rectangle H.
- The point in H which is closest to G is not on the same side of the boundary as R.
- We try our luck with the child rectangles.
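
One way to decide ownership without scanning the points is sketched below. It is an assumption-laden illustration rather than the talk's exact procedure: for centroids G and R it tests the corner of the rectangle that lies furthest in the direction from G to R. Because the difference of squared distances to G and R is linear in the point, that corner is the worst case, so if it is closer to G then the whole rectangle is. All function names are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def extreme_corner(lo, hi, direction):
    """Corner of the box [lo, hi] that is furthest along `direction`."""
    return [hi[d] if direction[d] > 0 else lo[d] for d in range(len(lo))]

def dominates(g, r, lo, hi):
    """True if every point of the box [lo, hi] is closer to g than to r."""
    direction = [rd - gd for gd, rd in zip(g, r)]
    corner = extreme_corner(lo, hi, direction)
    return sq_dist(corner, g) < sq_dist(corner, r)

def owner(centroids, lo, hi):
    """Index of the centroid owning the whole box, or None if there is none."""
    center = [(l + h) / 2 for l, h in zip(lo, hi)]
    # Only the centroid nearest to the box center can possibly own the box.
    best = min(range(len(centroids)),
               key=lambda j: sq_dist(center, centroids[j]))
    for j, c in enumerate(centroids):
        if j != best and not dominates(centroids[best], c, lo, hi):
            return None   # boundary crosses the box: recurse into children
    return best
```

When `owner` returns an index, the node's cached count and sum can be credited to that centroid in one step; when it returns `None`, the same test is retried on the child rectangles.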

36 Run time
[Plot: running time against number of points, 2-D gpetro data, one curve per cluster count (50 and 500 clusters legible)]

37 K-means: summary
- Popular and trusted statistical method
- Very fast algorithm; not an approximation
- Not restricted to kd-trees
- Still requires K from user

38 X-means [figure]

39 X-means
- The number of clusters K is not always known in advance.
- Estimate from data; measure the goodness of fit, and penalize complex models.
- Do this on a local scale.

40 Local Splits
Start with a small value for K. Run K-means to convergence.

41 This defines regions of points which belong to a specific class.

42 In each region run 2-means independently.

43 [figure]

44 For each region compute the contribution of splitting the class in two.
- BIC(k=1)=2471, BIC(k=2)=3088
- BIC(k=1)=2018, BIC(k=2)=1859
- BIC(k=1)=1935, BIC(k=2)=1784

45 Commit the split only if the score goes up.
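
The local split decision can be sketched as a comparison of two BIC scores for the same region. The version below is a simplified stand-in, assuming hard assignments, identical spherical Gaussians with a pooled maximum-likelihood variance, and the convention that a higher score is better (log-likelihood minus a complexity penalty). The exact scoring function used in X-means may differ in its details; `bic_score` and `should_split` are hypothetical names.

```python
import math

def bic_score(points, centroids):
    """BIC (higher is better) of a hard-assignment spherical Gaussian model."""
    n, dim, k = len(points), len(points[0]), len(centroids)
    counts = [0] * k
    sq_err = 0.0
    for p in points:
        dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
        j = dists.index(min(dists))
        counts[j] += 1
        sq_err += dists[j]
    # Pooled variance shared by all clusters; floored to avoid log(0).
    variance = max(sq_err / (n * dim), 1e-12)
    log_lik = sum(c * math.log(c / n) for c in counts if c > 0)
    log_lik -= 0.5 * n * dim * math.log(2 * math.pi * variance)
    log_lik -= 0.5 * sq_err / variance
    # Free parameters: mixing weights, centroid coordinates, one variance.
    n_params = (k - 1) + k * dim + 1
    return log_lik - 0.5 * n_params * math.log(n)

def should_split(points, parent, children):
    """Commit the split only if the two-centroid model scores higher."""
    return bic_score(points, children) > bic_score(points, [parent])
```

In this sketch `points` holds only the records of one region, `parent` is that region's centroid, and `children` are the two centroids found by the local 2-means run.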

46 X-means: summary
- Can accurately estimate the K in K-means
- Naturally fits in the fast K-means framework
- In a single step chooses between 2^K options
- Better, and faster, than looping over K

47 K-means and X-means package
- Code released in late 2000
- Over 200 licenses granted
- Users in:
  - Bioinformatics
  - Music information retrieval
  - Computer hardware and software analysis
  - Many more
- X-means scoring function independently analyzed and improved (Hamerly et al. 2003)

48-49 K-means and X-means users [figures]

50 Mixtures of Rectangles [figure]

51 Gaussian clusters
Domain: credit card approval. Take the following vector:
$$[(\mathrm{AGE} - 18)^2,\ (\mathrm{taxrate} - 6)^2,\ (\mathrm{income} - 10000)^2,\ (\mathrm{edunum} - 8)^2]$$
and compute its dot-product with $1/[4.9,\ 0.3,\ 730,\ 209]$. If the result is small enough, approve.

52 My approach
If $18 \le \mathrm{AGE} \le 46$ and $5 \le \mathrm{taxrate} \le 7$ then approve.
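
The contrast between the two rules is easy to see in code. This sketch reads the operators lost from the slide text as subtraction and as inclusive bounds, which is an assumption; the approval threshold is never given in the talk, so it is left as a caller-supplied parameter, and both function names are hypothetical.

```python
def gaussian_rule(age, taxrate, income, edunum, threshold):
    """Gaussian-style rule: scaled squared distance from a center point."""
    squared = [(age - 18) ** 2, (taxrate - 6) ** 2,
               (income - 10000) ** 2, (edunum - 8) ** 2]
    scales = [4.9, 0.3, 730.0, 209.0]
    # Dot product of the squared deviations with the reciprocal scales.
    score = sum(s / w for s, w in zip(squared, scales))
    return score <= threshold   # threshold is an assumed parameter

def rectangle_rule(age, taxrate):
    """Rectangle rule: a conjunction of per-attribute interval tests."""
    return 18 <= age <= 46 and 5 <= taxrate <= 7
```

The rectangle rule needs no arithmetic to explain: each attribute is either inside its interval or not, which is what makes the model legible to a domain expert.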

53 2-D PDF [figure]

54 Mixture of Dependency Trees [figure]

55 Motivation
- Given a data-set, want to understand it
- A Bayes net fits the bill, but is expensive to find
- Compromise: look for a simpler structure: a dependency tree

56 [Figure: graph over the variables Burglar, Thunder, Barking, Phone Call, Alarm]
$$P(A \mid B) \sim N(c_A + m_A b,\ \sigma_A^2)$$

57-60 The Chow-Liu algorithm
[Figure: data table with attributes A_1, A_2, ..., A_M and records X_1, X_2, X_3, X_4, ..., X_R; a graph on attributes A1 to A5 with edges weighted by mutual information, such as I(1; 4), I(1; 5), I(1; 3), I(4; 3); an arrow labeled MST leads to the resulting tree on the same attributes]
Total cost: $O(RM^2)$ + cost of MST algorithm.
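
The two stages of Chow-Liu, pairwise mutual information followed by a spanning tree, can be sketched as follows. This is the exhaustive baseline, not the sampling-based method of the talk. It assumes discrete attribute values (the talk's Gaussian example is continuous) and builds a maximum-weight spanning tree with Kruskal's algorithm; the function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

def mutual_information(records, i, j):
    """Empirical mutual information (in nats) between attributes i and j."""
    n = len(records)
    pi = Counter(r[i] for r in records)
    pj = Counter(r[j] for r in records)
    pij = Counter((r[i], r[j]) for r in records)
    return sum((c / n) * math.log(c * n / (pi[a] * pj[b]))
               for (a, b), c in pij.items())

def chow_liu_edges(records):
    """Edges of a maximum mutual-information spanning tree over attributes."""
    m = len(records[0])
    # Scoring all pairs costs O(R * M^2): every pair scans every record.
    weighted = sorted(((mutual_information(records, i, j), i, j)
                       for i, j in combinations(range(m), 2)), reverse=True)
    parent = list(range(m))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for _, i, j in weighted:
        ri, rj = find(i), find(j)
        if ri != rj:          # the edge joins two components: keep it
            parent[ri] = rj
            tree.append((i, j))
    return tree
```

The all-pairs scan is the part the rest of the talk tries to avoid by looking at only a sample of the records for each edge.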

61 MST: using the blue-edge rule
Given a cut, the lightest edge across it must be part of the MST.

62 MST: using the red-edge rule
Given a cycle, the heaviest edge in it must not be part of the MST.

63 Idea:
- Repeatedly use the red-edge rule
- Stop when all we have left is a tree
- This tree must be the MST
- Tarjan: bad idea.
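
A naive rendering of the red-edge idea is sketched below: keep a forest, and whenever a new edge would close a cycle, discard the heaviest edge on that cycle. It is meant only to illustrate the rule, with exact edge weights and no attention to efficiency; the function names and the `(weight, u, v)` edge format are assumptions.

```python
def tree_path(adj, src, dst):
    """Edges (w, u, v) on the unique forest path from src to dst, or None."""
    stack = [(src, None, [])]
    while stack:
        node, prev, path = stack.pop()
        if node == dst:
            return path
        for nxt, w in adj[node].items():
            if nxt != prev:   # a forest has no cycles, so this suffices
                stack.append((nxt, node, path + [(w, node, nxt)]))
    return None

def red_rule_mst(n, edges):
    """MST over vertices 0..n-1 from (weight, u, v) edges via the red rule."""
    adj = {v: {} for v in range(n)}
    for w, u, v in edges:
        path = tree_path(adj, u, v)
        if path is None:
            adj[u][v] = adj[v][u] = w      # no cycle: the edge joins the forest
            continue
        heavy = max(path)                  # heaviest tree edge on the cycle
        if heavy[0] > w:
            _, a, b = heavy
            del adj[a][b], adj[b][a]       # red rule: drop the heaviest edge
            adj[u][v] = adj[v][u] = w
        # Otherwise the new edge is itself the heaviest and is discarded.
    return sorted((w, u, v) for u in adj for v, w in adj[u].items() if u < v)
```

Each non-tree edge is compared against the tree path between its endpoints, which is the comparison the walkthrough slides step through.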

64-80 Walkthrough
[Figure sequence. Legend: tree edge, non-tree edge, eliminated edge. One frame asks: "Can I eliminate this edge?"]

81 Saving Work
We want to avoid scanning the full data-set for a given edge.
- Scan just a sample
- Derive a confidence interval using the CLT
- Or Hoeffding bounds
- Now need to deal with intervals instead of point estimates
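
As an illustration of the Hoeffding option, the sketch below turns a sample of bounded values into a two-sided confidence interval for their mean. It assumes the values are independent draws known to lie in `[lower, upper]`; how the talk bounds the per-record contributions to an edge weight is not stated here, so those bounds are left to the caller. The function name is hypothetical.

```python
import math

def hoeffding_interval(sample, lower, upper, delta=0.05):
    """Interval containing the true mean with probability at least 1 - delta.

    Assumes independent samples that all lie in [lower, upper].
    """
    n = len(sample)
    mean = sum(sample) / n
    # Hoeffding: P(|mean - mu| >= t) <= 2 * exp(-2 n t^2 / (upper - lower)^2)
    half_width = (upper - lower) * math.sqrt(math.log(2.0 / delta) / (2.0 * n))
    return mean - half_width, mean + half_width
```

The half-width shrinks as 1/sqrt(n), so quadrupling the sample halves the interval. That is the trade-off behind choosing how much more data to read for an undecided edge.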

82-84 Comparing intervals
[Figure: intervals [a, b] and [c, d] on a number line]
Case 1: If this happens, we save work.

85-87 Comparing intervals
Case 2: Another lucky occurrence.

88-92 Comparing intervals
Case 3 (endpoints ordered a, c, b, d): We have two options:
- Work harder
- Procrastinate
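
The three cases can be captured by a small comparison routine. The sketch assumes that Cases 1 and 2 are the two orderings in which the intervals do not overlap, which the surviving slide text suggests but does not state; the function name and the returned labels are hypothetical.

```python
def compare_intervals(first, second):
    """Compare two confidence intervals given as (low, high) pairs.

    Returns "first_is_lighter" or "second_is_lighter" when the intervals
    are disjoint, and "undecided" when they overlap.
    """
    a, b = first
    c, d = second
    if b < c:
        return "first_is_lighter"    # all of [a, b] lies below [c, d]
    if d < a:
        return "second_is_lighter"   # all of [c, d] lies below [a, b]
    return "undecided"               # overlap: sample more, or defer
```

On "undecided" the caller either reads more records to tighten both intervals (work harder) or leaves the edge for a later pass (procrastinate).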

93 So far we assumed that we can always eliminate an edge in the cycle. In fact, this is not necessary.

94 Walkthrough - alternative scenario [figure]

95-96 Walkthrough - alternative scenario
Not enough information to eliminate. Leave for later.

97 Walkthrough - alternative scenario
Later...

98 Walkthrough - alternative scenario
Let's examine this edge again.

99 Walkthrough - alternative scenario
The tree path has changed! We can eliminate!

100 Walkthrough - alternative scenario [figure]

101 Experimental Results [figure]

102-104 Experimental Results
How much work does it save?
[Plot: cells per edge against number of records]
Most of it.

105-107 Experimental Results
Does it scale with the number of attributes?
[Plot: running time against number of attributes]
Yes!

108 Experimental Results
How good are the generated trees?

109-112 Evaluation
- Exhaustive algorithm
- My algorithm
- 35% subsample
- Informed subsample

113-115 Experimental Results
How good are the generated trees?
[Plot: relative log-likelihood against number of records]
Better than those obtained by uniformly sampling the same fraction of the data.

116-118 Experimental Results
Does it work for real data?

NAME            TYPE  DATA USAGE
CENSUS-HOUSE    N     1.0%
COLORHISTOGRAM  N     0.5%
COOCTEXTURE     N     4.6%
ABALONE         N     21.0%
COLORMOMENTS    N     0.6%
CENSUS-INCOME   C     0.05%
COIL            C     0.9%
IPUMS           C     0.06%
KDDCUP          C     0.02%
LETTER          N     1.5%
COVTYPE         C     0.009%
PHOTOZ          N     0.008%
[Values for the ATTR., RECORDS, MIST, and SAMPLE columns are missing.]

Better 7/12 times, worse 4/12, one tie.

119 Anomaly Hunting [figure]

120 Anomaly Hunting
- Want to sift a large data set for the strangest objects.
- First attempt: build a statistical model for/from the data, flag whatever does not fit it well.

121 Boring Anomalies [figure]

122-130 The Oracle Framework
[Diagram built up step by step; steps in order of appearance:]
- Random set of records
- Ask expert to classify
- Build model from data and labels
- Run all data through model
- Spot "important" records
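
Read as a loop, the Oracle Framework steps might be sketched like this. Everything here is a hypothetical illustration: the three callables (`ask_expert`, `build_model`, `score`), the batch size, and the rule that the highest-scoring unlabeled records are the "important" ones to show the expert next are assumptions, not details given in the talk.

```python
import random

def oracle_loop(records, ask_expert, build_model, score,
                rounds=3, batch=5, seed=0):
    """Alternate between expert labeling and model building.

    ask_expert(record) -> label
    build_model(records, labels) -> model   (labels maps index -> label)
    score(model, record) -> float           (higher = more worth showing)
    """
    rng = random.Random(seed)
    labels = {}
    # Start from a random set of records.
    queries = rng.sample(range(len(records)), batch)
    model = None
    for _ in range(rounds):
        for i in queries:
            labels[i] = ask_expert(records[i])       # ask expert to classify
        model = build_model(records, labels)         # build model from data and labels
        unlabeled = [i for i in range(len(records)) if i not in labels]
        # Run all data through the model and spot the "important" records.
        unlabeled.sort(key=lambda i: score(model, records[i]), reverse=True)
        queries = unlabeled[:batch]
    return model, labels
```

The loop closes where the diagram does: the records spotted in one round are the ones the expert is asked about in the next.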

131 Anomaly Hunting
Run GUI

132 Interesting Anomalies [figure]

133 Contributions
- Fast K-means implementation [KDD99]
- Extension to X-means [ICML00]
- Widely used, cited [HPL124,NIPS03,ICME01,ASPL02,IEEE01]
- Novel mixture model for comprehensibility [ICML01]
- Probably approximately correct approach for dependency trees [NIPS02]
- Active learning framework for general mixtures
- User-centered anomaly hunting process [GLC03]

135 Why scientific?
Assumptions on data:
- Mostly real-valued
- Not sparse
- No or very few labels

136 Thesis Statement
We can efficiently perform clustering on very large data sets.

ALTERNATIVE METHODS FOR CLUSTERING

ALTERNATIVE METHODS FOR CLUSTERING ALTERNATIVE METHODS FOR CLUSTERING K-Means Algorithm Termination conditions Several possibilities, e.g., A fixed number of iterations Objects partition unchanged Centroid positions don t change Convergence

More information

Lecture 7: Decision Trees

Lecture 7: Decision Trees Lecture 7: Decision Trees Instructor: Outline 1 Geometric Perspective of Classification 2 Decision Trees Geometric Perspective of Classification Perspective of Classification Algorithmic Geometric Probabilistic...

More information

Expectation Maximization (EM) and Gaussian Mixture Models

Expectation Maximization (EM) and Gaussian Mixture Models Expectation Maximization (EM) and Gaussian Mixture Models Reference: The Elements of Statistical Learning, by T. Hastie, R. Tibshirani, J. Friedman, Springer 1 2 3 4 5 6 7 8 Unsupervised Learning Motivation

More information

Pattern Recognition. Kjell Elenius. Speech, Music and Hearing KTH. March 29, 2007 Speech recognition

Pattern Recognition. Kjell Elenius. Speech, Music and Hearing KTH. March 29, 2007 Speech recognition Pattern Recognition Kjell Elenius Speech, Music and Hearing KTH March 29, 2007 Speech recognition 2007 1 Ch 4. Pattern Recognition 1(3) Bayes Decision Theory Minimum-Error-Rate Decision Rules Discriminant

More information

MultiDimensional Signal Processing Master Degree in Ingegneria delle Telecomunicazioni A.A

MultiDimensional Signal Processing Master Degree in Ingegneria delle Telecomunicazioni A.A MultiDimensional Signal Processing Master Degree in Ingegneria delle Telecomunicazioni A.A. 205-206 Pietro Guccione, PhD DEI - DIPARTIMENTO DI INGEGNERIA ELETTRICA E DELL INFORMAZIONE POLITECNICO DI BARI

More information

Lecture 8: The EM algorithm

Lecture 8: The EM algorithm 10-708: Probabilistic Graphical Models 10-708, Spring 2017 Lecture 8: The EM algorithm Lecturer: Manuela M. Veloso, Eric P. Xing Scribes: Huiting Liu, Yifan Yang 1 Introduction Previous lecture discusses

More information

IBL and clustering. Relationship of IBL with CBR

IBL and clustering. Relationship of IBL with CBR IBL and clustering Distance based methods IBL and knn Clustering Distance based and hierarchical Probability-based Expectation Maximization (EM) Relationship of IBL with CBR + uses previously processed

More information

CS Introduction to Data Mining Instructor: Abdullah Mueen

CS Introduction to Data Mining Instructor: Abdullah Mueen CS 591.03 Introduction to Data Mining Instructor: Abdullah Mueen LECTURE 8: ADVANCED CLUSTERING (FUZZY AND CO -CLUSTERING) Review: Basic Cluster Analysis Methods (Chap. 10) Cluster Analysis: Basic Concepts

More information

Generative and discriminative classification techniques

Generative and discriminative classification techniques Generative and discriminative classification techniques Machine Learning and Category Representation 2014-2015 Jakob Verbeek, November 28, 2014 Course website: http://lear.inrialpes.fr/~verbeek/mlcr.14.15

More information

Expectation-Maximization. Nuno Vasconcelos ECE Department, UCSD

Expectation-Maximization. Nuno Vasconcelos ECE Department, UCSD Expectation-Maximization Nuno Vasconcelos ECE Department, UCSD Plan for today last time we started talking about mixture models we introduced the main ideas behind EM to motivate EM, we looked at classification-maximization

More information

CS 8520: Artificial Intelligence

CS 8520: Artificial Intelligence CS 8520: Artificial Intelligence Machine Learning 2 Paula Matuszek Spring, 2013 1 Regression Classifiers We said earlier that the task of a supervised learning system can be viewed as learning a function

More information

Mixture Models and the EM Algorithm

Mixture Models and the EM Algorithm Mixture Models and the EM Algorithm Padhraic Smyth, Department of Computer Science University of California, Irvine c 2017 1 Finite Mixture Models Say we have a data set D = {x 1,..., x N } where x i is

More information

COMP 465: Data Mining Still More on Clustering

COMP 465: Data Mining Still More on Clustering 3/4/015 Exercise COMP 465: Data Mining Still More on Clustering Slides Adapted From : Jiawei Han, Micheline Kamber & Jian Pei Data Mining: Concepts and Techniques, 3 rd ed. Describe each of the following

More information

Introduction to Data Mining

Introduction to Data Mining Introduction to Data Mining Lecture #14: Clustering Seoul National University 1 In This Lecture Learn the motivation, applications, and goal of clustering Understand the basic methods of clustering (bottom-up

More information

ECLT 5810 Clustering

ECLT 5810 Clustering ECLT 5810 Clustering What is Cluster Analysis? Cluster: a collection of data objects Similar to one another within the same cluster Dissimilar to the objects in other clusters Cluster analysis Grouping

More information

Unsupervised Learning. Presenter: Anil Sharma, PhD Scholar, IIIT-Delhi

Unsupervised Learning. Presenter: Anil Sharma, PhD Scholar, IIIT-Delhi Unsupervised Learning Presenter: Anil Sharma, PhD Scholar, IIIT-Delhi Content Motivation Introduction Applications Types of clustering Clustering criterion functions Distance functions Normalization Which

More information

Towards the world s fastest k-means algorithm

Towards the world s fastest k-means algorithm Greg Hamerly Associate Professor Computer Science Department Baylor University Joint work with Jonathan Drake May 15, 2014 Objective function and optimization Lloyd s algorithm 1 The k-means clustering

More information

Robust PDF Table Locator

Robust PDF Table Locator Robust PDF Table Locator December 17, 2016 1 Introduction Data scientists rely on an abundance of tabular data stored in easy-to-machine-read formats like.csv files. Unfortunately, most government records

More information

K-Means Clustering 3/3/17

K-Means Clustering 3/3/17 K-Means Clustering 3/3/17 Unsupervised Learning We have a collection of unlabeled data points. We want to find underlying structure in the data. Examples: Identify groups of similar data points. Clustering

More information

Empirical risk minimization (ERM) A first model of learning. The excess risk. Getting a uniform guarantee

Empirical risk minimization (ERM) A first model of learning. The excess risk. Getting a uniform guarantee A first model of learning Let s restrict our attention to binary classification our labels belong to (or ) Empirical risk minimization (ERM) Recall the definitions of risk/empirical risk We observe the

More information

CS 8520: Artificial Intelligence. Machine Learning 2. Paula Matuszek Fall, CSC 8520 Fall Paula Matuszek

CS 8520: Artificial Intelligence. Machine Learning 2. Paula Matuszek Fall, CSC 8520 Fall Paula Matuszek CS 8520: Artificial Intelligence Machine Learning 2 Paula Matuszek Fall, 2015!1 Regression Classifiers We said earlier that the task of a supervised learning system can be viewed as learning a function

More information

Machine Learning A W 1sst KU. b) [1 P] Give an example for a probability distributions P (A, B, C) that disproves

Machine Learning A W 1sst KU. b) [1 P] Give an example for a probability distributions P (A, B, C) that disproves Machine Learning A 708.064 11W 1sst KU Exercises Problems marked with * are optional. 1 Conditional Independence I [2 P] a) [1 P] Give an example for a probability distribution P (A, B, C) that disproves

More information

DATA MINING LECTURE 7. Hierarchical Clustering, DBSCAN The EM Algorithm

DATA MINING LECTURE 7. Hierarchical Clustering, DBSCAN The EM Algorithm DATA MINING LECTURE 7 Hierarchical Clustering, DBSCAN The EM Algorithm CLUSTERING What is a Clustering? In general a grouping of objects such that the objects in a group (cluster) are similar (or related)

More information

Markov Random Fields and Segmentation with Graph Cuts

Markov Random Fields and Segmentation with Graph Cuts Markov Random Fields and Segmentation with Graph Cuts Computer Vision Jia-Bin Huang, Virginia Tech Many slides from D. Hoiem Administrative stuffs Final project Proposal due Oct 27 (Thursday) HW 4 is out

More information

CS 231A CA Session: Problem Set 4 Review. Kevin Chen May 13, 2016

CS 231A CA Session: Problem Set 4 Review. Kevin Chen May 13, 2016 CS 231A CA Session: Problem Set 4 Review Kevin Chen May 13, 2016 PS4 Outline Problem 1: Viewpoint estimation Problem 2: Segmentation Meanshift segmentation Normalized cut Problem 1: Viewpoint Estimation

More information

ECLT 5810 Clustering

ECLT 5810 Clustering ECLT 5810 Clustering What is Cluster Analysis? Cluster: a collection of data objects Similar to one another within the same cluster Dissimilar to the objects in other clusters Cluster analysis Grouping

More information

COMP 551 Applied Machine Learning Lecture 13: Unsupervised learning

COMP 551 Applied Machine Learning Lecture 13: Unsupervised learning COMP 551 Applied Machine Learning Lecture 13: Unsupervised learning Associate Instructor: Herke van Hoof (herke.vanhoof@mail.mcgill.ca) Slides mostly by: (jpineau@cs.mcgill.ca) Class web page: www.cs.mcgill.ca/~jpineau/comp551

More information

Clustering. CE-717: Machine Learning Sharif University of Technology Spring Soleymani

Clustering. CE-717: Machine Learning Sharif University of Technology Spring Soleymani Clustering CE-717: Machine Learning Sharif University of Technology Spring 2016 Soleymani Outline Clustering Definition Clustering main approaches Partitional (flat) Hierarchical Clustering validation

More information

Spatial biosurveillance

Spatial biosurveillance Spatial biosurveillance Authors of Slides Andrew Moore Carnegie Mellon awm@cs.cmu.edu Daniel Neill Carnegie Mellon d.neill@cs.cmu.edu Slides and Software and Papers at: http://www.autonlab.org awm@cs.cmu.edu

More information

Building Classifiers using Bayesian Networks

Building Classifiers using Bayesian Networks Building Classifiers using Bayesian Networks Nir Friedman and Moises Goldszmidt 1997 Presented by Brian Collins and Lukas Seitlinger Paper Summary The Naive Bayes classifier has reasonable performance

More information

Unsupervised Learning

Unsupervised Learning Unsupervised Learning Unsupervised learning Until now, we have assumed our training samples are labeled by their category membership. Methods that use labeled samples are said to be supervised. However,

More information

Data Mining. Part 2. Data Understanding and Preparation. 2.4 Data Transformation. Spring Instructor: Dr. Masoud Yaghini. Data Transformation

Data Mining. Part 2. Data Understanding and Preparation. 2.4 Data Transformation. Spring Instructor: Dr. Masoud Yaghini. Data Transformation Data Mining Part 2. Data Understanding and Preparation 2.4 Spring 2010 Instructor: Dr. Masoud Yaghini Outline Introduction Normalization Attribute Construction Aggregation Attribute Subset Selection Discretization

More information

Clustering: Overview and K-means algorithm

Clustering: Overview and K-means algorithm Clustering: Overview and K-means algorithm Informal goal Given set of objects and measure of similarity between them, group similar objects together K-Means illustrations thanks to 2006 student Martin

More information

INTRODUCTION TO DATA MINING. Daniel Rodríguez, University of Alcalá

INTRODUCTION TO DATA MINING. Daniel Rodríguez, University of Alcalá INTRODUCTION TO DATA MINING Daniel Rodríguez, University of Alcalá Outline Knowledge Discovery in Datasets Model Representation Types of models Supervised Unsupervised Evaluation (Acknowledgement: Jesús

More information

Lecture 12 Recognition

Lecture 12 Recognition Institute of Informatics Institute of Neuroinformatics Lecture 12 Recognition Davide Scaramuzza 1 Lab exercise today replaced by Deep Learning Tutorial Room ETH HG E 1.1 from 13:15 to 15:00 Optional lab

More information

Lecture 24: Image Retrieval: Part II. Visual Computing Systems CMU , Fall 2013

Lecture 24: Image Retrieval: Part II. Visual Computing Systems CMU , Fall 2013 Lecture 24: Image Retrieval: Part II Visual Computing Systems Review: K-D tree Spatial partitioning hierarchy K = dimensionality of space (below: K = 2) 3 2 1 3 3 4 2 Counts of points in leaf nodes Nearest

More information

Clustering. Image segmentation, document clustering, protein class discovery, compression

Clustering. Image segmentation, document clustering, protein class discovery, compression Clustering CS 444 Some material on these is slides borrowed from Andrew Moore's machine learning tutorials located at: Clustering The problem of grouping unlabeled data on the basis of similarity. A key

More information

Based on Raymond J. Mooney s slides

Based on Raymond J. Mooney s slides Instance Based Learning Based on Raymond J. Mooney s slides University of Texas at Austin 1 Example 2 Instance-Based Learning Unlike other learning algorithms, does not involve construction of an explicit

More information

Using Decision Boundary to Analyze Classifiers

Using Decision Boundary to Analyze Classifiers Using Decision Boundary to Analyze Classifiers Zhiyong Yan Congfu Xu College of Computer Science, Zhejiang University, Hangzhou, China yanzhiyong@zju.edu.cn Abstract In this paper we propose to use decision

More information

Unsupervised Learning. Clustering and the EM Algorithm. Unsupervised Learning is Model Learning

Unsupervised Learning. Clustering and the EM Algorithm. Unsupervised Learning is Model Learning Unsupervised Learning Clustering and the EM Algorithm Susanna Ricco Supervised Learning Given data in the form < x, y >, y is the target to learn. Good news: Easy to tell if our algorithm is giving the

More information

GRID BASED CLUSTERING

GRID BASED CLUSTERING Cluster Analysis Grid Based Clustering STING CLIQUE 1 GRID BASED CLUSTERING Uses a grid data structure Quantizes space into a finite number of cells that form a grid structure Several interesting methods

More information

Machine Learning and Data Mining. Clustering (1): Basics. Kalev Kask

Machine Learning and Data Mining. Clustering (1): Basics. Kalev Kask Machine Learning and Data Mining Clustering (1): Basics Kalev Kask Unsupervised learning Supervised learning Predict target value ( y ) given features ( x ) Unsupervised learning Understand patterns of

More information

Analysis of Extended Performance for clustering of Satellite Images Using Bigdata Platform Spark

Analysis of Extended Performance for clustering of Satellite Images Using Bigdata Platform Spark Analysis of Extended Performance for clustering of Satellite Images Using Bigdata Platform Spark PL.Marichamy 1, M.Phil Research Scholar, Department of Computer Application, Alagappa University, Karaikudi,

More information

http://www.xkcd.com/233/ Text Clustering David Kauchak cs160 Fall 2009 adapted from: http://www.stanford.edu/class/cs276/handouts/lecture17-clustering.ppt Administrative 2 nd status reports Paper review

More information

K-means and Hierarchical Clustering
