Lecture 4: The EM algorithm, clustering, and properties of estimators
Today

Lecture 4: We examine clustering in a little more detail; we went over it somewhat quickly last time. The CAD data will return and give us an opportunity to work with curves (!) We then examine the performance of estimators again...

Last time

We examined the EM algorithm in some depth and showed how it could be used to fit discrete Gaussian mixtures. We then looked under the hood and examined why the procedure converges. We then related the entire EM enterprise in this context to a somewhat simpler algorithm known as K-means that is popular in the clustering literature.

The EM algorithm

While we expressed the general algorithm in terms of

    $\sum_y f(y \mid x, \theta_{i-1}) \log f(x, y \mid \theta)$

you see how this conditional expectation relates to the indicator formulation we followed for normal mixtures. We closed by examining some of the properties of estimators.
The EM algorithm

Recall that for our complete data likelihood, our data were of the form (X_1, Y_1), ..., (X_n, Y_n), so that the likelihood became

    $\prod_{i=1}^n \prod_{j=1}^J \left[\alpha_j N(X_i; \mu_j, \sigma_j)\right]^{I_j(Y_i)}$

and the log-likelihood could be written as

    $\sum_{i=1}^n \log f(X_i, Y_i \mid \theta) = \sum_{i=1}^n \sum_{j=1}^J I_j(Y_i)\left[\log \alpha_j + \log N(X_i; \mu_j, \sigma_j)\right]$

Clustering

Last time we went a little fast past a rather big area in statistics and data mining, clustering. Broadly, clustering describes the process of identifying groups in a data set, groups that are in some way closely related. Usually, the groups can be characterized by a few parameters; perhaps a small number of representative data points or maybe the group means (often called the "cluster centers"). These parameters can, in turn, be examined and compared to help expose significant structures in a data set.

The EM algorithm

Since the only term in this expression that involves Y_i is the indicator function, taking the conditional expectation of the log-likelihood with respect to Y_i given X_i and a guess θ_0 for θ is equivalent to our approach of replacing I_j(Y_i) with its conditional expectation. Again, there is a nice expression of this algorithm in terms of natural parameters and estimates for an exponential family; this is just a hint at the connection.

Clustering

K-means clustering seeks to identify K groups and their associated centers µ_1, ..., µ_K so as to minimize an overall objective function

    $V = \sum_{k=1}^K \sum_{X_i \in S_k} \|X_i - \mu_k\|^2$

Last time we described an iterative algorithm that alternately forms group means and then assigns data points to the group with the closest mean.
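To make the indicator-replacement idea concrete, here is a minimal numpy/scipy sketch of one EM iteration for a J-component univariate Gaussian mixture; the function and variable names are illustrative, not the course code.

    import numpy as np
    from scipy.stats import norm

    def em_step(x, alpha, mu, sigma):
        """One EM iteration for a univariate Gaussian mixture."""
        # E-step: r[i, j] = E[I_j(Y_i) | X_i, theta], the conditional
        # expectation of the indicator given the data and the current guess
        dens = alpha * norm.pdf(x[:, None], mu, sigma)   # n x J component densities
        r = dens / dens.sum(axis=1, keepdims=True)
        # M-step: maximize the expected complete-data log-likelihood
        n_j = r.sum(axis=0)
        alpha = n_j / len(x)
        mu = (r * x[:, None]).sum(axis=0) / n_j
        sigma = np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / n_j)
        return alpha, mu, sigma

    # Fit a two-component mixture to synthetic data
    rng = np.random.default_rng(0)
    x = np.concatenate([rng.normal(0, 1, 300), rng.normal(4, 1, 200)])
    alpha, mu, sigma = np.array([.5, .5]), np.array([-1., 1.]), np.array([1., 1.])
    for _ in range(100):
        alpha, mu, sigma = em_step(x, alpha, mu, sigma)

Each pass increases the observed-data log-likelihood, which is the essence of why the procedure converges.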
Relationship to clustering

With K-means, we want to divide our data X_1, ..., X_n into, well, K groups; the algorithm is pretty simple (a code sketch of the loop appears at the end of this page). Make an initial guess for the means µ_01, ..., µ_0K. Until there's no change in these means do:
1. Use the estimated means to classify your data into clusters; each point X_i is associated with the closest mean using simple Euclidean distance
2. For each cluster k, form the mean of the data associated with the group

K-means and vector quantization

Vector quantization (VQ) is a lossy data compression method that builds a block code for a source; each point in our (in this case 2-d) space is represented by the nearest codeword. Historically this was a hard problem because it involved a lot of multi-dimensional integrals; in the 1980s, a VQ algorithm was proposed* based on a training set of source vectors X_1, ..., X_n. In short, we would like to design a codebook µ_1, ..., µ_K and a partition S_1, ..., S_K to represent the training set so that the overall distortion measure

    $V = \sum_{k=1}^K \sum_{X_i \in S_k} \|X_i - \mu_k\|^2$

is as small as possible.

* Such algorithms are usually referred to as LBG-VQ for the group proposing the idea: Linde, Buzo and Gray.

Clustering

At the right, we ask for three clusters; and below we present the result, with cluster centers highlighted in black. Note that the algorithm assigns points according to the nearest group mean, and so in the end we have divisions based on the Voronoi tessellation of these center points.

Let's consider some real data; at the right we have temperatures at 6am and 6pm for 232 consecutive days (from January to November of 2005) as recorded by CAD node 157. These two measurements are fairly highly correlated (0.85) and so K-means divides the data along the ellipse lengthwise. Arguably, clustering is not really achieving much in this (or the previous) case in terms of insight about the data. Let's consider a harder case...

[Figures: scatterplots of 6am versus 6pm temperature for CAD node 157, with K-means cluster assignments and centers.]
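Here is the two-step iteration from the top of this page as a small numpy sketch; a toy version, under the usual assumption that no cluster empties out, with illustrative names.

    import numpy as np

    def kmeans(X, K, max_iter=100, seed=0):
        """Alternate (1) assigning points to the closest mean and
        (2) recomputing group means, until the means stop changing."""
        rng = np.random.default_rng(seed)
        mu = X[rng.choice(len(X), size=K, replace=False)]  # initial guess
        for _ in range(max_iter):
            # Step 1: classify each X_i by Euclidean distance to the means
            d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
            labels = d2.argmin(axis=1)
            # Step 2: form the mean of the data assigned to each group
            new_mu = np.array([X[labels == k].mean(axis=0) for k in range(K)])
            if np.allclose(new_mu, mu):
                break
            mu = new_mu
        # Objective: V = sum_k sum_{X_i in S_k} ||X_i - mu_k||^2
        V = sum(((X[labels == k] - mu[k]) ** 2).sum() for k in range(K))
        return mu, labels, V

Each of the two steps can only decrease V, which is why the iteration terminates.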
Below we plot a time series of our temperature measurements, averaged across hours for all 232 days. The jigsaw pattern is the basic diurnal effect: warmer during the day, colder at night. Can we get some insight into the kinds of patterns we see during each day? Do the patterns change with time of year?

At the right we have the same plot but colored according to a K-means clustering on the 24-dimensional data for two through five clusters. For this we need to collapse our data by day... What do we observe? What is the clustering highlighting here?

At the right we have, well, all the data; that is, all 232 curves, each representing the temperature over the course of a day. What do we think of this plot?

Our data space is 24-dimensional; each observation is the vector of average temperatures computed over the course of a day. That means our distances are computed in 24-dimensional space and our group means live in 24-dimensional space. So rather than treat them as abstract cluster centers, we can plot them as curves (color coding on the right matches that on the previous slide for K = 5 groups).

[Figures: temperature (C) versus hours past midnight for all 232 daily curves, and the K = 5 group means plotted as curves against hour since midnight.]
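As a sketch of the collapse-by-day step, suppose the hourly CAD 157 temperatures live in a file with one reading per hour; the file name and shapes here are assumptions, not the course data. The final two lines preview the centering step discussed on the next page.

    import numpy as np
    from sklearn.cluster import KMeans
    import matplotlib.pyplot as plt

    temps = np.loadtxt("cad157_hourly.txt")     # hypothetical: 232 * 24 hourly values
    X = temps.reshape(-1, 24)                   # one row per day: a 24-dim point

    km = KMeans(n_clusters=5, n_init=10, random_state=0).fit(X)

    # The 24-dimensional cluster centers can be drawn as daily curves
    for k, center in enumerate(km.cluster_centers_):
        plt.plot(range(24), center, label=f"group {k}")
    plt.xlabel("hours past midnight")
    plt.ylabel("average temperature (C)")
    plt.legend()
    plt.show()

    # Variant: subtract each day's average first so clusters reflect
    # within-day shape rather than overall level (summer vs. winter)
    Xc = X - X.mean(axis=1, keepdims=True)
    km_shape = KMeans(n_clusters=3, n_init=10, random_state=0).fit(Xc)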
Clustering

Ok, so that wasn't very stirring; it gets warmer in the summer. Instead, let's start by subtracting out the daily average and then apply K-means; this should have the effect of highlighting within-day shapes. What do we observe?

At the right we have the group means for a 3-group fit; again, we can display the group means as curves. What do we notice now? What are the dominant patterns?

[Figure: the three group means, average temperature versus hour since midnight.]

K-means judges similarity (or dissimilarity) based on the nearness of points; the standard Euclidean distance is applied in data space. There are many dimension-reduction procedures that operate on pairwise distances between rows in a data table with the goal of providing you a display or some kind of summary that's easier to work with than the original data. In the upcoming lectures, we will talk about hierarchical clustering as well as dimension-reduction techniques like multi-dimensional scaling. As a final comment, the mixture modeling we started with also provides a clustering of the data, but with soft rather than hard group assignments.

Properties of estimators

Last time we started examining properties of estimators, specifically focusing on their mean and variance. We are, for the moment, in a frequentist paradigm, meaning that the quantities we will evaluate are based on the idea of repeated sampling.
Properties of estimators

Suppose we are given a sample X_1, ..., X_n of size n that are independent draws from a distribution f. An estimate θ̂_n of a parameter θ is just some function of these points; that is,

    $\hat\theta_n = \hat\theta_n(X_1, \ldots, X_n)$

We view θ̂_n as a random variable in the sense that each time we repeat our experiments, we would collect another sample of data, producing a different estimate. We refer to the distribution of θ̂_n over these repeated experiments as its sampling distribution.

Variance

We can also consider the variance of an estimate; in short, how spread out is the sampling distribution? The standard deviation of θ̂_n is called its standard error and is denoted

    $se(\hat\theta_n) = \sqrt{var(\hat\theta_n)}$

Unbiasedness

The bias of an estimate is defined to be

    $bias(\hat\theta_n) = E\hat\theta_n - \theta$

where the expectation is over the sampling distribution of θ̂_n. We say that an estimate is unbiased if Eθ̂_n = θ, so that bias(θ̂_n) = 0.

Mean squared error

We often judge the reasonableness of an estimator based on its mean squared error

    $MSE = E(\hat\theta_n - \theta)^2$

This quantity captures both bias and variance:

    $MSE = E(\hat\theta_n - E\hat\theta_n + E\hat\theta_n - \theta)^2 = E(\hat\theta_n - E\hat\theta_n)^2 + (E\hat\theta_n - \theta)^2 + 2(E\hat\theta_n - \theta)E(\hat\theta_n - E\hat\theta_n) = var(\hat\theta_n) + bias(\hat\theta_n)^2$

(the cross term vanishes since E(θ̂_n − Eθ̂_n) = 0).
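These frequentist quantities are easy to approximate by simulating the repeated-sampling story; a small sketch with illustrative names, using the sample mean of normal data.

    import numpy as np

    rng = np.random.default_rng(1)
    theta, n, reps = 0.0, 25, 10_000   # true mean, sample size, replications

    # Each replication is one "repeat of the experiment"
    est = np.array([rng.normal(theta, 1, n).mean() for _ in range(reps)])

    bias = est.mean() - theta            # E[theta_hat] - theta
    se = est.std()                       # standard error of theta_hat
    mse = ((est - theta) ** 2).mean()
    print(bias, se, mse, se ** 2 + bias ** 2)   # MSE = var + bias^2, up to Monte Carlo noise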
Properties of estimators

We say that an estimator θ̂_n is consistent if, as n gets large, its distribution concentrates around the parameter θ. To go one level deeper, we need to recall a definition from probability (that you may or may not have had). A sequence of random variables Z_1, Z_2, Z_3, ... is said to converge in probability to another random variable Z, written $Z_n \xrightarrow{P} Z$, if, for every ε > 0,

    $P(|Z_n - Z| > \epsilon) \to 0$

Consistency

Therefore, we say that an estimator is consistent if it converges in probability to θ.* It is possible to show that if both the bias and standard error of an estimate tend to zero as we collect more and more data (that is, the MSE tends to zero), then the estimate is consistent.

* or, to be precise, to a random variable that takes on the value θ with probability 1

Example: Means and the WLLN

We can establish consistency of the sample mean using the so-called weak law of large numbers: if Z_1, ..., Z_n are independent draws from the same distribution having mean µ, then the sample mean $\bar Z_n \xrightarrow{P} \mu$ as n → ∞. An easy proof of the WLLN can be found from Chebyshev's inequality*, namely that for a random variable Z,

    $Pr(|Z - EZ| \ge t) \le \frac{var(Z)}{t^2}$

assuming the mean and variance of Z are finite.

* Actually, you don't need a second moment for the WLLN to be true, but this is a fast way to prove it.
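A quick sketch of what convergence in probability looks like for the sample mean of standard normal data; here var(Z̄_n) = 1/n, so Chebyshev bounds the tail probability by 1/(nε²).

    import numpy as np

    rng = np.random.default_rng(2)
    eps, mu = 0.1, 0.0
    for n in [10, 100, 1_000, 10_000]:
        means = rng.normal(mu, 1, (2_000, n)).mean(axis=1)
        # P(|Z_bar - mu| > eps) <= var(Z_bar) / eps^2 = 1 / (n eps^2)
        print(n, (np.abs(means - mu) > eps).mean(), 1 / (n * eps ** 2))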
Note that the weak law of large numbers implies that the sample mean is a consistent estimate of the population mean; we don't have to put a lot of modeling assumptions in place for this to happen. Now, another good estimate of the center of a distribution is the median (recall that for the normal case, the mean and median are the same).

Let's consider consistency of the median; assume we have data X_1, ..., X_n from some continuous distribution f with median µ̃, and let X̃ denote the sample median. To make things easy, let's also assume that we have an odd number of points (n odd) so that the sample median is the (n + 1)/2 element in the list of sorted data.

To prove consistency, let's take ε > 0 and consider

    $Pr(\tilde X - \tilde\mu > \epsilon) = Pr(\tilde X > \tilde\mu + \epsilon) = Pr(\text{at least } (n+1)/2 \text{ of the } X_i\text{'s are bigger than } \tilde\mu + \epsilon)$

Let S_n denote the number of sample points X_1, ..., X_n that are larger than µ̃ + ε; that means S_n has a binomial(n, p) distribution where p = Pr(X_i > µ̃ + ε) < 0.5.

Substituting this into our starting equation (and assuming we have an odd number of samples) we find that

    $Pr(\tilde X - \tilde\mu > \epsilon) = Pr(S_n \ge (n+1)/2) = Pr(S_n - np \ge (n+1)/2 - np) = Pr(S_n - np \ge n(1/2 - p) + 1/2) \le Pr(S_n - np \ge n(1/2 - p)) \le \frac{p(1-p)}{n(1/2 - p)^2} \to 0 \text{ as } n \to \infty$

so that Pr(X̃ − µ̃ > ε) → 0; a similar argument can be used to show that Pr(X̃ − µ̃ < −ε) → 0, giving us consistency.

Comparing consistent estimators

In many cases, the differences between estimators really show up in "large samples"; that is, as we let the number of data points tend to infinity, we start to see differences. To formalize this, we will consider the asymptotic distribution of a sequence of estimators.
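The same kind of simulation illustrates the result for the median (odd n, as in the proof): the probability Pr(|X̃ − µ̃| > ε) shrinks as n grows. A sketch:

    import numpy as np

    rng = np.random.default_rng(3)
    eps, med = 0.1, 0.0                   # true median of the standard normal
    for n in [11, 101, 1_001, 10_001]:    # odd, so the median is one order statistic
        medians = np.median(rng.normal(0, 1, (2_000, n)), axis=1)
        print(n, (np.abs(medians - med) > eps).mean())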
Example: Means and the CLT

Given a sample X_1, ..., X_n of independent draws from a distribution with mean µ and standard deviation σ, we know that the sample mean X̄_n has mean µ and standard deviation σ/√n. The Central Limit Theorem states that

    $Z_n = \frac{\bar X_n - \mu}{\sigma/\sqrt{n}} = \frac{\sqrt{n}\,(\bar X_n - \mu)}{\sigma} \xrightarrow{D} Z$

where Z has a standard normal (mean zero, standard deviation one) distribution. To make this precise (as we had to do with convergence in probability), we say that a sequence of random variables Z_1, Z_2, ... converges in distribution to Z if

    $\lim_{n \to \infty} F_n(x) = F(x)$

where F_n is the CDF of Z_n and F is the CDF of Z, at all points x where F is continuous.

The CLT implies that √n(X̄_n − µ) has a normal limiting distribution with mean zero and variance σ². What about the median? Given a sample X_1, ..., X_n that come from a distribution f, it can be shown that √n(X̃ − µ̃) also has a limiting normal distribution having zero mean but with variance 1/[2f(µ̃)]².

Suppose our data come from a normal distribution; that is, suppose f is a Gaussian

    $f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(x - \tilde\mu)^2/2\sigma^2}$

where we have inserted µ̃ since the mean and median are the same for this distribution. Therefore, f(µ̃) = 1/√(2πσ²), so √n(X̃ − µ̃) has a limiting normal distribution with mean zero and variance πσ²/2.
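We can check both limiting variances by simulation; a sketch for standard normal data (σ = 1), where √n(X̄ − µ) should have variance near 1 and √n(X̃ − µ̃) near π/2 ≈ 1.57.

    import numpy as np

    rng = np.random.default_rng(4)
    n, reps, sigma = 400, 20_000, 1.0
    samples = rng.normal(0, sigma, (reps, n))

    z_mean = np.sqrt(n) * samples.mean(axis=1)       # limit: N(0, sigma^2)
    z_med = np.sqrt(n) * np.median(samples, axis=1)  # limit: N(0, pi sigma^2 / 2)

    print(z_mean.var(), sigma ** 2)
    print(z_med.var(), np.pi * sigma ** 2 / 2)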
So, if we use the mean to estimate the center of a distribution, we have an asymptotic variance of σ²; if we use the median, the asymptotic variance is 1/[2f(µ̃)]². In the normal case, the latter expression becomes πσ²/2; we can then compute the so-called asymptotic relative efficiency between using the median and the mean for data that come from a normal family:

    $\frac{\sigma^2}{1/[2f(\tilde\mu)]^2} = \frac{\sigma^2}{\pi\sigma^2/2} = \frac{2}{\pi} \approx 0.64$

This means that if our data really come from a normal distribution, we're better off using the sample mean instead of the sample median.

Now consider a contaminated normal family that's often used in so-called robustness studies; Tukey (1960) considered data generated by the normal mixture

    $f(x) = (1 - \epsilon)N(x; 0, 1) + \epsilon N(x; 0, \tau)$

This family allows one to contaminate a standard normal distribution (first component) with some outliers (second component). Given data from the contaminated distribution, we know that the variance of this mixture is given by σ² = (1 − ε) + ετ²; also, the median of this family is 0, so that

    $f(0) = \frac{1}{\sqrt{2\pi}} \left(1 - \epsilon + \frac{\epsilon}{\tau}\right)$

Therefore, the relative efficiency between the mean and the median is given by

    $\frac{(1 - \epsilon) + \epsilon\tau^2}{1/[2f(0)]^2} = \frac{2}{\pi}\,\left[(1 - \epsilon) + \epsilon\tau^2\right] \left(1 - \epsilon + \frac{\epsilon}{\tau}\right)^2$

At the left we have plots of the asymptotic relative efficiency for four values of ε (0.01, 0.03, 0.05, 0.1) and τ ranging from 2 to 10. We also have a Q-Q plot for one member of the family, ε = 0.1 and τ = 4, that has a relative efficiency of 1.36. If we had observations solely from a normal distribution, then we know the sample mean (the MLE) is an efficient estimate; but if we start to introduce outliers, what happens? In this case, the median outperforms the mean; notice the effect of the observations from the normal with greater spread.

[Figures: ARE versus τ for ε = 0.01, 0.03, 0.05, 0.1, and a normal Q-Q plot (sample versus theoretical quantiles) for τ = 4, ε = 0.1.]
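The relative-efficiency formula is easy to evaluate; a small sketch that recovers both the 2/π ≈ 0.64 normal case and the 1.36 figure quoted above for ε = 0.1, τ = 4 (function name illustrative; τ is the standard deviation of the contaminating component, as in the variance formula above).

    import numpy as np

    def are(eps, tau):
        """Asymptotic relative efficiency of the mean vs. the median
        under the Tukey mixture (1 - eps) N(0, 1) + eps N(0, tau)."""
        var = (1 - eps) + eps * tau ** 2                   # mixture variance
        f0 = (1 - eps + eps / tau) / np.sqrt(2 * np.pi)    # density at the median
        return var * (2 * f0) ** 2                         # var / (1 / [2 f(0)]^2)

    print(are(0.0, 1.0))   # 2/pi ~ 0.64: under normality the mean wins
    print(are(0.1, 4.0))   # ~1.36: with 10% contamination the median wins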
With this mixture device, we can see clearly the tradeoff between the mean and the median. Next time we will return to estimation in the context of parametric models and examine the performance of the MLE.