Machine Learning. K-means Algorithm

Transcription:

Machine Learning, CS 6375 (Spring 2015). Gaussian Mixture Models (GMM) and Expectation Maximization (EM). Acknowledgement: some slides adopted from Christopher Bishop and Vincent Ng.

K-means Algorithm. A special case of EM. Goal: represent a data set in terms of K clusters, each of which is summarized by a prototype \mu_k. Initialize the prototypes, then iterate between two phases. E step: assign each data point to its nearest prototype. M step: update each prototype to be the mean of its cluster. The simplest version is based on Euclidean distance.
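The two-phase loop above translates directly into a few lines of NumPy. This is a minimal sketch, assuming a data matrix X of shape (N, D); the function name kmeans, the random-subset initialization, and the convergence test are illustrative choices, not from the slides:

```python
import numpy as np

def kmeans(X, K, n_iters=100, seed=0):
    """Minimal K-means: alternate the E step (assignment) and M step (means)."""
    rng = np.random.default_rng(seed)
    # Initialize prototypes as K distinct data points chosen at random.
    mu = X[rng.choice(len(X), size=K, replace=False)]
    for _ in range(n_iters):
        # E step: assign each point to its nearest prototype (Euclidean distance).
        dists = np.linalg.norm(X[:, None, :] - mu[None, :, :], axis=2)
        z = dists.argmin(axis=1)
        # M step: move each prototype to the mean of its assigned points
        # (keep the old prototype if a cluster ends up empty).
        new_mu = np.array([X[z == k].mean(axis=0) if np.any(z == k) else mu[k]
                           for k in range(K)])
        if np.allclose(new_mu, mu):  # converged: prototypes stopped moving
            break
        mu = new_mu
    return mu, z
```

Production code would typically use a smarter initialization such as k-means++, since the result depends on the starting prototypes.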

Probabilistic Clustering. Represent the probability distribution of the data as a mixture model: this captures the uncertainty in cluster assignments and gives a model for the data distribution. We consider mixtures of Gaussians.

Maximum Likelihood Solution (single Gaussian). Maximizing the likelihood w.r.t. the mean gives the sample mean; maximizing w.r.t. the covariance gives the sample covariance.
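The slide states these estimates without formulas; for a single Gaussian fit to data x_1, ..., x_N they are the standard closed-form results:

```latex
\mu_{\mathrm{ML}} = \frac{1}{N}\sum_{n=1}^{N} x_n,
\qquad
\Sigma_{\mathrm{ML}} = \frac{1}{N}\sum_{n=1}^{N} (x_n - \mu_{\mathrm{ML}})(x_n - \mu_{\mathrm{ML}})^{\top}.
```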

Gaussian Mixtures. A linear superposition of Gaussians:

p(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k).

Normalization and positivity require \sum_{k=1}^{K} \pi_k = 1 and 0 \le \pi_k \le 1. We can interpret the mixing coefficients \pi_k as prior probabilities for the components.

Example: Mixture of 3 Gaussians. [Figure: three colour-coded Gaussian components plotted on the unit square.]

Contours of Probability Distribution. [Figure: contours of the mixture density for the three-component example.]

Sampling from the Gaussian Mixture. To generate a data point: first pick one of the components k with probability \pi_k, then draw a sample x from that component's density \mathcal{N}(x \mid \mu_k, \Sigma_k). Repeat these two steps for each new data point.
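A minimal sketch of this two-step (ancestral) sampling procedure, assuming NumPy; the function name sample_gmm and its argument names (pi, mus, Sigmas) are illustrative:

```python
import numpy as np

def sample_gmm(pi, mus, Sigmas, n, seed=0):
    """Ancestral sampling from a GMM: pick a component with probability pi_k,
    then draw from that component's Gaussian."""
    rng = np.random.default_rng(seed)
    # Step 1: choose a component label for each point.
    ks = rng.choice(len(pi), size=n, p=pi)
    # Step 2: draw each point from its chosen Gaussian component.
    X = np.array([rng.multivariate_normal(mus[k], Sigmas[k]) for k in ks])
    return X, ks
```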

Synthetic Data Set. [Figure: points drawn from the three-component mixture, coloured by generating component.]

Fitting the Gaussian Mixture. We wish to invert this process: given the data set, find the corresponding parameters (mixing coefficients, means, covariances). If we knew which component generated each data point, the maximum likelihood solution would simply involve fitting each component to its corresponding cluster. Problem: the data set is unlabelled. We shall refer to the labels as latent (i.e., hidden) variables.

Synthetic Data Set Without Labels. [Figure: the same sampled points with the component labels removed.]

Posterior Probabilities. We can think of the mixing coefficients as prior probabilities for the components. For a given value of x we can evaluate the corresponding posterior probabilities. These are given by Bayes' theorem as

\gamma_k(x) \equiv p(k \mid x) = \frac{\pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j \, \mathcal{N}(x \mid \mu_j, \Sigma_j)}.
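This posterior, the "responsibility" of component k for a point, follows directly from the formula above. A sketch assuming SciPy's multivariate normal density; the function name responsibilities is illustrative:

```python
import numpy as np
from scipy.stats import multivariate_normal

def responsibilities(X, pi, mus, Sigmas):
    """Posterior p(k | x_n) for every point and component, via Bayes' theorem:
    prior pi_k times component density, normalized over components."""
    R = np.column_stack([pi[k] * multivariate_normal.pdf(X, mus[k], Sigmas[k])
                         for k in range(len(pi))])
    return R / R.sum(axis=1, keepdims=True)
```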

Posterior Probabilities (colour coded). [Figure: each data point coloured according to its posterior responsibilities.]

Maximum Likelihood for the GMM. The log likelihood function takes the form

\ln p(X \mid \pi, \mu, \Sigma) = \sum_{n=1}^{N} \ln \left\{ \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x_n \mid \mu_k, \Sigma_k) \right\}.

Note: the sum over components appears inside the logarithm, so there is no closed-form solution for maximum likelihood. It is instead solved by the expectation-maximization (EM) algorithm.
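This objective translates directly into code. A sketch, with the caveat that it sums raw densities rather than working in log space, so it can underflow in high dimensions; log_likelihood is an illustrative name:

```python
import numpy as np
from scipy.stats import multivariate_normal

def log_likelihood(X, pi, mus, Sigmas):
    """Incomplete-data log likelihood: sum_n ln sum_k pi_k N(x_n | mu_k, Sigma_k)."""
    P = np.column_stack([pi[k] * multivariate_normal.pdf(X, mus[k], Sigmas[k])
                         for k in range(len(pi))])
    return np.log(P.sum(axis=1)).sum()
```

The same quantity can drive the termination test described later: stop when the gain in log likelihood between iterations falls below a small tolerance.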

EM Algorithm: Informal Derivation. Let us proceed by simply differentiating the log likelihood

\ln p(D \mid \mu, \pi, \Sigma) = \sum_{n=1}^{N} \ln \left\{ \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x_n \mid \mu_k, \Sigma_k) \right\}.

Setting the derivative with respect to \mu_j to zero gives

\mu_j = \frac{1}{N_j} \sum_{n=1}^{N} \gamma(z_{nj}) \, x_n, \qquad N_j = \sum_{n=1}^{N} \gamma(z_{nj}),

where \gamma(z_{nj}) is the responsibility of component j for point x_n.

Similarly for the covariances:

\Sigma_j = \frac{1}{N_j} \sum_{n=1}^{N} \gamma(z_{nj}) (x_n - \mu_j)(x_n - \mu_j)^{\top}.

For the mixing coefficients, use a Lagrange multiplier to enforce the constraint that they sum up to 1, which gives \pi_j = N_j / N.

EM Algorithm: Informal Derivation (continued). The solutions are not closed form, since they are coupled through the responsibilities. This suggests an iterative scheme for solving them: make initial guesses for the parameters, then alternate between the following two stages. E-step: evaluate the responsibilities. M-step: update the parameters using the ML results above. Each EM cycle is guaranteed not to decrease the likelihood.

[Slides 18–23: figures only.]
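Putting the E- and M-steps together gives the full iteration. A minimal sketch, assuming NumPy/SciPy; em_gmm, the initialization choices, and the small diagonal jitter (which guards against the covariance singularities discussed later) are illustrative, not from the slides:

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, K, n_iters=100, seed=0):
    """EM for a Gaussian mixture: alternate the E-step (responsibilities)
    and M-step (closed-form weighted parameter updates)."""
    rng = np.random.default_rng(seed)
    N, D = X.shape
    # Initial guesses: uniform weights, random means, shared data covariance.
    pi = np.full(K, 1.0 / K)
    mus = X[rng.choice(N, size=K, replace=False)]
    Sigmas = np.array([np.cov(X.T) + 1e-6 * np.eye(D) for _ in range(K)])
    for _ in range(n_iters):
        # E-step: responsibilities R[n, k] = p(k | x_n).
        R = np.column_stack([pi[k] * multivariate_normal.pdf(X, mus[k], Sigmas[k])
                             for k in range(K)])
        R /= R.sum(axis=1, keepdims=True)
        # M-step: re-estimate pi, mu, Sigma from the responsibility-weighted data.
        Nk = R.sum(axis=0)
        pi = Nk / N
        mus = (R.T @ X) / Nk[:, None]
        for k in range(K):
            d = X - mus[k]
            Sigmas[k] = (R[:, k, None] * d).T @ d / Nk[k] + 1e-6 * np.eye(D)
    return pi, mus, Sigmas
```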


Relation to K-means. Consider a GMM with common covariances \Sigma_k = \epsilon I, and take the limit \epsilon \to 0. The responsibilities become binary, and the EM algorithm becomes precisely equivalent to K-means.
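To see why, write the responsibilities for the common covariance \Sigma_k = \epsilon I: as \epsilon \to 0 the smallest squared distance dominates the exponentials, each point is assigned wholly to its nearest mean, and the E-step reduces to the K-means assignment step:

```latex
\gamma(z_{nk})
  = \frac{\pi_k \exp\{-\lVert x_n-\mu_k\rVert^2 / 2\epsilon\}}
         {\sum_j \pi_j \exp\{-\lVert x_n-\mu_j\rVert^2 / 2\epsilon\}}
\;\xrightarrow[\epsilon\to 0]{}\;
\begin{cases}
  1 & \text{if } k = \arg\min_j \lVert x_n-\mu_j\rVert^2,\\
  0 & \text{otherwise.}
\end{cases}
```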

EM for GMM. Iterate: on the t-th iteration, the E-step computes the expected class memberships of all data points for each class, and the M-step computes the maximum likelihood parameters (e.g., the means \mu) given our data's class-membership distributions.

The EM Algorithm in General. Given an observed variable X and an unobserved variable Z: the E-step computes the expected complete-data log likelihood under the current parameters (for a GMM with known covariance, the parameters are the means and mixing coefficients); the M-step maximizes Q to find the new \theta.
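The slide's own equations did not survive transcription; in the standard Q-function formulation, which matches this description, the two steps are:

```latex
Q(\theta \mid \theta^{(t)})
  = \mathbb{E}_{Z \mid X,\,\theta^{(t)}}\!\left[\ln p(X, Z \mid \theta)\right],
\qquad
\theta^{(t+1)} = \arg\max_{\theta}\, Q(\theta \mid \theta^{(t)}).
```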

The EM Algorithm (sufficient-statistics view). Identify the sufficient statistics for estimating the \theta's. Initialize the \theta's to some arbitrary non-zero values \theta^{(0)}. Iterate the E-step and the M-step: during iteration k, the E-step computes the expected values of the sufficient statistics based on the current parameter estimates \theta^{(k)}, and the M-step derives \theta^{(k+1)} as an ML estimate using those expected sufficient statistics. Terminate when the likelihood stops improving: |L(\text{data} \mid \theta^{(k+1)}) - L(\text{data} \mid \theta^{(k)})| < \epsilon.

Is the Incomplete Log Likelihood Maximized? Theorem: let X be our incomplete (observed) data, Z our hidden data, and \theta a parametric model that generates X and Z. If we choose \theta' such that the expected complete-data log likelihood increases,

\mathbb{E}_{Z \mid X, \theta}[\ln p(X, Z \mid \theta')] > \mathbb{E}_{Z \mid X, \theta}[\ln p(X, Z \mid \theta)],

then the incomplete-data likelihood also increases: \ln p(X \mid \theta') > \ln p(X \mid \theta). Lemma (Gibbs' inequality): for any distributions p and q,

\sum_z p(z) \ln p(z) \ge \sum_z p(z) \ln q(z).

Proof of EM. For any value of Z, \ln p(X \mid \theta) = \ln p(X, Z \mid \theta) - \ln p(Z \mid X, \theta). Taking the expectation on both sides w.r.t. p(Z \mid X, \theta) we have

\ln p(X \mid \theta) = \mathbb{E}_{Z \mid X, \theta}[\ln p(X, Z \mid \theta)] - \mathbb{E}_{Z \mid X, \theta}[\ln p(Z \mid X, \theta)]. \quad (*)

Proof of EM (continued). Substituting \theta' for \theta inside the probabilities in (*), while keeping the expectation w.r.t. p(Z \mid X, \theta), we have

\ln p(X \mid \theta') = \mathbb{E}_{Z \mid X, \theta}[\ln p(X, Z \mid \theta')] - \mathbb{E}_{Z \mid X, \theta}[\ln p(Z \mid X, \theta')].

Now by assumption we have \mathbb{E}_{Z \mid X, \theta}[\ln p(X, Z \mid \theta')] > \mathbb{E}_{Z \mid X, \theta}[\ln p(X, Z \mid \theta)], and by the lemma we have -\mathbb{E}_{Z \mid X, \theta}[\ln p(Z \mid X, \theta')] \ge -\mathbb{E}_{Z \mid X, \theta}[\ln p(Z \mid X, \theta)]. Adding the two gives \ln p(X \mid \theta') > \ln p(X \mid \theta).

EM Summary. EM is for learning from partly unobserved data. The ML estimate is

\hat{\theta} = \arg\max_{\theta} \ln p(\text{data} \mid \theta),

while the EM estimate is

\hat{\theta} = \arg\max_{\theta} \mathbb{E}_{Z \mid X, \theta}[\ln p(X, Z \mid \theta)],

where X is the observed part of the data and Z is unobserved.

Using EM in Practice. EM may not work well in practice. Potential problems: it gets stuck at a local maximum (solutions: select different starting points, or search by simulated annealing); it overfits the training data (solutions: use held-out data, or add regularization); the underlying generative model is incorrect (solution: fix the model).

Over-fitting in Gaussian Mixture Models. The likelihood function has singularities when a component collapses onto a single data point: set \mu_k = x_n and then consider \sigma_k \to 0. The likelihood function also gets larger as we add more components (and hence parameters) to the model, so it is not clear how to choose the number K of components.

Can EM really improve the underlying classifier? It depends on whether the data is actually generated by a mixture, and whether there is a 1-to-1 mapping between the mixture components and the classes.
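The collapse can be made explicit: place one component's mean on a single data point and shrink its variance, and that component's density at the point grows without bound, so the likelihood does too:

```latex
\mathcal{N}(x_n \mid \mu_k = x_n,\; \sigma_k^2 I)
  = \frac{1}{(2\pi\sigma_k^2)^{D/2}}
  \;\longrightarrow\; \infty
  \quad\text{as } \sigma_k \to 0.
```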