Unsupervised Learning and Clustering


Why consider unlabeled samples?
1. Collecting and labeling a large set of samples is costly: getting recorded speech is free, but labeling it is time consuming.
2. A classifier could be designed on a small set of labeled samples and then tuned on a large unlabeled set.
3. Train on a large unlabeled set and then use supervision on the groupings found.
4. Characteristics of patterns may change with time.
5. Unsupervised methods can be used to find useful features.
6. Exploratory data analysis may discover the presence of significant subclasses that affect the design.

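Mixture Densities and Identifiability
Samples come from c classes. The priors $P(\omega_j)$ are known. The forms of the class-conditional densities $p(x \mid \omega_j, \theta_j)$ are known, but the values of their parameters $\theta_j$ are unknown. The probability density function of the samples is:
$$p(x \mid \theta) = \sum_{j=1}^{c} p(x \mid \omega_j, \theta_j)\, P(\omega_j)$$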
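To make the mixture formula concrete, here is a minimal sketch. It assumes one-dimensional Gaussian class-conditional densities with $\theta_j = (\text{mean}, \text{std})$. The function names and the two-class numbers are illustrative only and do not come from the slides.

```python
import math

def gaussian_pdf(x: float, mean: float, std: float) -> float:
    """Univariate normal density p(x | omega_j, theta_j) with theta_j = (mean, std)."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def mixture_density(x, priors, params):
    """p(x | theta) = sum_j p(x | omega_j, theta_j) * P(omega_j).

    priors: the known P(omega_j), one per class (assumed to sum to 1).
    params: one (mean, std) pair per class; in the unsupervised setting
            these are the unknowns to be estimated.
    """
    return sum(P * gaussian_pdf(x, m, s) for P, (m, s) in zip(priors, params))

# Hypothetical two-class example with known priors 1/3 and 2/3.
priors = [1 / 3, 2 / 3]
params = [(-2.0, 1.0), (2.0, 1.0)]
print(mixture_density(0.0, priors, params))
```

Only the component parameters would be unknown here. The priors and the Gaussian form are treated as given, matching the assumptions listed on the slide.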

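Gradient Ascent for Mixtures
Mixture density: $p(x \mid \theta) = \sum_{j=1}^{c} p(x \mid \omega_j, \theta_j)\, P(\omega_j)$
Likelihood of the observed samples $D = \{x_1, \dots, x_n\}$: $p(D \mid \theta) = \prod_{k=1}^{n} p(x_k \mid \theta)$
Log-likelihood: $l = \sum_{k=1}^{n} \ln p(x_k \mid \theta)$
Gradient w.r.t. $\theta_i$ (assuming the $\theta_i$ are functionally independent): $\nabla_{\theta_i} l = \sum_{k=1}^{n} P(\omega_i \mid x_k, \theta)\, \nabla_{\theta_i} \ln p(x_k \mid \omega_i, \theta_i)$, where $P(\omega_i \mid x_k, \theta) = p(x_k \mid \omega_i, \theta_i)\, P(\omega_i) / p(x_k \mid \theta)$
The MLE $\hat\theta_i$ must satisfy: $\sum_{k=1}^{n} P(\omega_i \mid x_k, \hat\theta)\, \nabla_{\theta_i} \ln p(x_k \mid \omega_i, \hat\theta_i) = 0$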
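One way to convince yourself of the gradient formula is to check it numerically. The sketch below assumes a 1-D Gaussian mixture with known priors and a shared known standard deviation, so that $\theta_i = \mu_i$ and $\nabla_{\mu_i} \ln p(x \mid \omega_i, \mu_i) = (x - \mu_i)/\sigma^2$. It then compares the posterior-weighted sum against central finite differences of $l$. The sample values and helper names are hypothetical.

```python
import math

def normal_pdf(x, mean, std):
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def log_likelihood(samples, priors, means, std):
    """l = sum_k ln p(x_k | theta) for the mixture."""
    return sum(
        math.log(sum(P * normal_pdf(x, m, std) for P, m in zip(priors, means)))
        for x in samples
    )

def posteriors(x, priors, means, std):
    """P(omega_i | x, theta) via Bayes' rule."""
    joint = [P * normal_pdf(x, m, std) for P, m in zip(priors, means)]
    total = sum(joint)
    return [j / total for j in joint]

def gradient(samples, priors, means, std):
    """sum_k P(omega_i | x_k, theta) * (x_k - mu_i) / std^2 for each mean mu_i."""
    grad = [0.0] * len(means)
    for x in samples:
        post = posteriors(x, priors, means, std)
        for i, m in enumerate(means):
            grad[i] += post[i] * (x - m) / std**2
    return grad

def numeric_gradient(samples, priors, means, std, h=1e-6):
    """Central finite differences of l, used only to check the analytic formula."""
    grad = []
    for i in range(len(means)):
        up, down = list(means), list(means)
        up[i] += h
        down[i] -= h
        diff = log_likelihood(samples, priors, up, std) - log_likelihood(samples, priors, down, std)
        grad.append(diff / (2 * h))
    return grad

samples = [-2.1, -1.7, -0.4, 0.9, 1.8, 2.5]
priors, means, std = [0.4, 0.6], [-1.0, 1.0], 1.0
print(gradient(samples, priors, means, std))
print(numeric_gradient(samples, priors, means, std))
```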

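Gaussian Mixture
Unknown mean vectors ($\theta_i = \mu_i$): setting the gradient to zero yields
$$\hat\mu_i = \frac{\sum_{k=1}^{n} P(\omega_i \mid x_k, \hat\mu)\, x_k}{\sum_{k=1}^{n} P(\omega_i \mid x_k, \hat\mu)}, \quad \text{where } \hat\mu = (\hat\mu_1, \dots, \hat\mu_c)^t$$
This leads to an iterative scheme for improving the estimates:
$$\hat\mu_i(j+1) = \frac{\sum_{k=1}^{n} P(\omega_i \mid x_k, \hat\mu(j))\, x_k}{\sum_{k=1}^{n} P(\omega_i \mid x_k, \hat\mu(j))}$$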
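The iterative scheme can be sketched as follows. The sketch assumes 1-D data, known priors and a known shared standard deviation, so that only the means are updated. The stopping tolerance, iteration cap and sample data are hypothetical choices. Each step recomputes the posteriors $P(\omega_i \mid x_k, \hat\mu(j))$ and forms the posterior-weighted average of the samples.

```python
import math

def normal_pdf(x, mean, std):
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def update_means(samples, priors, means, std):
    """One step: mu_i(j+1) = sum_k P(w_i|x_k, mu(j)) x_k / sum_k P(w_i|x_k, mu(j))."""
    num = [0.0] * len(means)
    den = [0.0] * len(means)
    for x in samples:
        joint = [P * normal_pdf(x, m, std) for P, m in zip(priors, means)]
        total = sum(joint)
        for i, j in enumerate(joint):
            post = j / total  # P(omega_i | x_k, mu_hat(j))
            num[i] += post * x
            den[i] += post
    return [n / d for n, d in zip(num, den)]

def estimate_means(samples, priors, init_means, std, max_iter=200, tol=1e-9):
    """Repeat the update until the means stop moving (or max_iter is reached)."""
    means = list(init_means)
    for _ in range(max_iter):
        new = update_means(samples, priors, means, std)
        if max(abs(a - b) for a, b in zip(new, means)) < tol:
            return new
        means = new
    return means

samples = [-3.2, -2.9, -3.1, -2.8, 3.0, 2.7, 3.3, 3.1]
means = estimate_means(samples, [0.5, 0.5], [-1.0, 1.0], 1.0)
print(means)
```

With well-separated groups the posteriors become nearly 0 or 1, so each estimate approaches the average of its own group. That is the behaviour k-means makes exact.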

k-means Clustering
The Gaussian case with all parameters unknown leads to the following formulation:
begin initialize $n, c, \mu_1, \mu_2, \dots, \mu_c$
    do classify the n samples according to the nearest $\mu_i$
        recompute $\mu_i$
    until no change in $\mu_i$
end
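The k-means procedure above can be sketched in Python as shown here. The sketch makes a few assumptions. The initial means are c randomly chosen samples, which is one common choice; the slides do not specify the initialization. The loop stops when the assignments stop changing, which implies no change in the $\mu_i$. A cluster that becomes empty keeps its previous mean. The `seed` parameter and the toy data are hypothetical.

```python
import math
import random

def kmeans(samples, c, max_iter=100, seed=0):
    """Classify each sample to its nearest mean, recompute means, repeat until stable."""
    rng = random.Random(seed)
    means = [list(p) for p in rng.sample(samples, c)]  # initialize mu_1..mu_c
    labels = None
    for _ in range(max_iter):
        # classify the n samples according to the nearest mu_i
        new_labels = [min(range(c), key=lambda i: math.dist(x, means[i])) for x in samples]
        if new_labels == labels:  # no change in assignments, hence no change in mu_i
            break
        labels = new_labels
        # recompute mu_i as the mean of the samples assigned to cluster i
        for i in range(c):
            members = [x for x, lab in zip(samples, labels) if lab == i]
            if members:  # keep the old mean if a cluster is empty
                means[i] = [sum(col) / len(members) for col in zip(*members)]
    return means, labels

data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
means, labels = kmeans(data, 2)
print(means, labels)
```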

k-means Clustering with One Feature
One-dimensional example: six starting points lead to local maxima, whereas two, for both of which $\mu_1(0) = \mu_2(0)$, lead to a saddle point.

k-means Clustering with Two Features
Two-dimensional example: there are three means and three steps in the iteration. The Voronoi tessellations based on the means are shown.

Data Description and Clustering

Data Description
Learning the structure of multidimensional patterns from a set of unlabeled samples. The samples form clouds of points in d-dimensional space. If the data came from a single normal distribution, the mean and covariance matrix would suffice as a description.

Data sets having identical statistics up to second order, i.e., the same mean $\mu$ and covariance $\Sigma$
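The following is a small, hypothetical illustration of why the mean and covariance are not enough once the data are not from a single normal distribution. Data set A is two separated pairs of points at x = ±1. Data set B is four points on the coordinate axes. Both have mean (0, 0) and covariance diag(1, 0.25), computed here with the 1/n convention, yet their shapes differ.

```python
import math

def mean_and_cov(points):
    """Sample mean and (1/n) covariance matrix of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    return (mx, my), ((sxx, sxy), (sxy, syy))

# Hypothetical toy data sets with the same first- and second-order statistics.
A = [(-1.0, 0.5), (-1.0, -0.5), (1.0, 0.5), (1.0, -0.5)]           # two pairs
r, s = math.sqrt(2.0), math.sqrt(0.5)
B = [(r, 0.0), (-r, 0.0), (0.0, s), (0.0, -s)]                     # a "diamond"

print(mean_and_cov(A))
print(mean_and_cov(B))
```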

Mixture of c normal distributions approach: estimating the parameters is non-trivial, and assuming particular parametric forms can lead to poor or meaningless results. Alternatively, use a nonparametric approach: peaks or modes of the density can indicate clusters. If the goal is to find subclasses, use clustering procedures.
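As one possible nonparametric route, the sketch below estimates a 1-D density with a Gaussian kernel and reports the local maxima (modes) on a grid. Kernel density estimation is a technique swapped in for illustration; the slide does not name a specific estimator. The bandwidth `h`, the grid step and the data are hypothetical, and the number of modes found depends strongly on `h`.

```python
import math

def kde(x, samples, h):
    """Gaussian kernel density estimate at x with bandwidth h."""
    n = len(samples)
    total = sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples)
    return total / (n * h * math.sqrt(2 * math.pi))

def find_modes(samples, h, grid_step=0.01):
    """Grid points where the density estimate has a strict local maximum."""
    lo, hi = min(samples) - 3 * h, max(samples) + 3 * h
    grid = [lo + k * grid_step for k in range(int((hi - lo) / grid_step) + 1)]
    dens = [kde(g, samples, h) for g in grid]
    return [grid[k] for k in range(1, len(grid) - 1)
            if dens[k] > dens[k - 1] and dens[k] > dens[k + 1]]

samples = [0.8, 1.0, 1.1, 1.3, 4.7, 5.0, 5.2]
modes = find_modes(samples, h=0.5)
print(modes)  # each mode suggests a cluster
```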

Similarity Measures
Two issues:
1. How to measure the similarity between samples?
2. How to evaluate a partitioning?
If distance is a good measure of dissimilarity, then the distance between samples in the same cluster must be smaller than the distance between samples in different clusters. Two samples belong to the same cluster if the distance between them is less than a threshold $d_0$. The distance threshold affects the number and size of the clusters.
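The threshold rule can be turned into clusters as sketched below, under one assumption: linking is taken to be transitive. Every pair closer than $d_0$ is linked, and the connected groups (found here with a small union-find) are the clusters. The sketch shows how changing $d_0$ changes the number and size of the clusters. The helper name and points are hypothetical.

```python
import math

def threshold_clusters(samples, d0):
    """Link every pair of samples closer than d0; return the connected groups of indices."""
    parent = list(range(len(samples)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(samples)):
        for j in range(i + 1, len(samples)):
            if math.dist(samples[i], samples[j]) < d0:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(samples)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

pts = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0), (3.0, 0.0), (3.4, 0.0)]
for d0 in (0.3, 0.6, 2.5):
    print(d0, threshold_clusters(pts, d0))
```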

Similarity Measures for Clustering
Minkowski metric: $d(x, x') = \left( \sum_{k=1}^{d} |x_k - x'_k|^q \right)^{1/q}$; $q = 2$ gives the Euclidean metric, and $q = 1$ gives the Manhattan or city block metric.
Metric based on the data itself: Mahalanobis distance.
Angle between vectors as similarity: $s(x, x') = \frac{x^t x'}{\|x\| \, \|x'\|}$. The cosine of the angle between vectors is invariant to rotation and dilation, but not to translation and general linear transformations.
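Here is a minimal sketch of the measures named above. The Mahalanobis helper uses the standard form $\sqrt{(x - x')^t \Sigma^{-1} (x - x')}$ and is restricted to 2-D with an explicit 2×2 inverse to stay dependency-free; that restriction is an assumption made only for brevity. The cosine lines illustrate the dilation and translation remarks.

```python
import math

def minkowski(x, y, q):
    """(sum_k |x_k - y_k|^q)^(1/q); q=2 is Euclidean, q=1 is city block."""
    return sum(abs(a - b) ** q for a, b in zip(x, y)) ** (1.0 / q)

def mahalanobis_2d(x, y, cov):
    """sqrt((x - y)^t Sigma^{-1} (x - y)) for a 2x2 covariance Sigma."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    inv = ((d / det, -b / det), (-c / det, a / det))
    dx, dy = x[0] - y[0], x[1] - y[1]
    quad = dx * (inv[0][0] * dx + inv[0][1] * dy) + dy * (inv[1][0] * dx + inv[1][1] * dy)
    return math.sqrt(quad)

def cosine_similarity(x, y):
    """x^t y / (||x|| ||y||)."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y)))

print(minkowski((0, 0), (3, 4), 2), minkowski((0, 0), (3, 4), 1))
print(mahalanobis_2d((0, 0), (2, 0), ((4, 0), (0, 1))))
# Dilation leaves the cosine unchanged; translation does not.
print(cosine_similarity((1, 2), (3, 1)), cosine_similarity((2, 4), (6, 2)))
print(cosine_similarity((1, 0), (0, 1)), cosine_similarity((2, 1), (1, 2)))
```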

Binary Feature Similarity Measures
$s(x, x') = \frac{x^t x'}{\|x\| \, \|x'\|}$: the numerator $x^t x'$ is the number of attributes possessed by both x and x', and the denominator $(x^t x \; x'^t x')^{1/2}$ is the geometric mean of the number of attributes possessed by x and by x'.
$s(x, x') = \frac{x^t x'}{d}$: the fraction of attributes shared.
$s(x, x') = \frac{x^t x'}{x^t x + x'^t x' - x^t x'}$: the Tanimoto coefficient, i.e., the ratio of the number of shared attributes to the number possessed by x or x'.
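For 0/1 feature vectors, all three measures reduce to simple counts, as this sketch shows. The dictionary keys are hypothetical labels chosen for readability. In the example, x and x' each possess 3 of d = 4 attributes and share 2.

```python
def binary_similarities(x, y):
    """Similarity measures for 0/1 feature vectors of equal length d."""
    shared = sum(a * b for a, b in zip(x, y))  # x^t x': attributes both possess
    nx, ny = sum(x), sum(y)                    # x^t x and x'^t x'
    d = len(x)
    return {
        "normalized_inner": shared / (nx * ny) ** 0.5,  # shared / geometric mean
        "fraction_shared": shared / d,
        "tanimoto": shared / (nx + ny - shared),       # shared / (possessed by x or x')
    }

s = binary_similarities([1, 1, 0, 1], [1, 0, 1, 1])
print(s)
```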

Issues in Choice of Similarity Function
The Tanimoto coefficient is used in information retrieval and taxonomy. There are fundamental issues from measurement theory. Combining features is tricky (inches versus meters), and features may lie on nominal, ordinal, interval, or ratio scales.
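To see why mixing units is tricky, this hypothetical sketch expresses one feature in meters and then in inches, leaving the other feature unchanged. The Euclidean nearest neighbour of the query flips. The conversion factor 39.37 is approximate, and the points are made up.

```python
import math

def nearest(query, points):
    """Index of the point closest to query in Euclidean distance."""
    return min(range(len(points)), key=lambda i: math.dist(query, points[i]))

# Hypothetical samples: feature 0 is a length, feature 1 is unitless.
query = (0.0, 0.0)
points_m = [(0.5, 0.0), (0.0, 1.0)]        # lengths in meters
METERS_TO_INCHES = 39.37                   # approximate conversion factor
points_in = [(x * METERS_TO_INCHES, y) for x, y in points_m]

print(nearest(query, points_m))   # nearest neighbour with lengths in meters
print(nearest(query, points_in))  # nearest neighbour with lengths in inches
```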

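Criterion Functions for Clustering
Sum-of-squared-errors criterion. Mean of the samples in $D_i$: $m_i = \frac{1}{n_i} \sum_{x \in D_i} x$
$$J_e = \sum_{i=1}^{c} \sum_{x \in D_i} \|x - m_i\|^2$$
The criterion is not the best choice when two clusters are of unequal size; it is suitable when they form compact clouds.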
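$J_e$ can be computed directly from a candidate partition, as sketched here. The two partitions of the same four points are hypothetical. The one that groups nearby points gets the much lower score.

```python
def sse_criterion(clusters):
    """J_e = sum_i sum_{x in D_i} ||x - m_i||^2, with m_i the mean of D_i."""
    total = 0.0
    for D in clusters:
        n = len(D)
        m = [sum(col) / n for col in zip(*D)]  # m_i
        total += sum(sum((a - b) ** 2 for a, b in zip(x, m)) for x in D)
    return total

good = [[(0, 0), (0, 1)], [(5, 5), (5, 6)]]
bad = [[(0, 0), (5, 5)], [(0, 1), (5, 6)]]
print(sse_criterion(good), sse_criterion(bad))
```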

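Related Minimum Variance Criteria
$$J_e = \frac{1}{2} \sum_{i=1}^{c} n_i \bar{s}_i, \quad \text{where } \bar{s}_i = \frac{1}{n_i^2} \sum_{x \in D_i} \sum_{x' \in D_i} \|x - x'\|^2$$
The squared distance $\|x - x'\|^2$ can be replaced by another similarity function $s(x, x')$. The optimal partition extremizes the criterion function.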
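The pairwise form of $J_e$ can be checked numerically against the mean-based form. In this sketch, `dissim` is a hypothetical parameter standing in for the pairwise function. It defaults to the squared Euclidean distance, and passing another function gives one of the related criteria. The clusters are made-up data.

```python
def squared_euclidean(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))

def sse_criterion(clusters):
    """Mean-based J_e = sum_i sum_{x in D_i} ||x - m_i||^2."""
    total = 0.0
    for D in clusters:
        m = [sum(col) / len(D) for col in zip(*D)]
        total += sum(squared_euclidean(x, m) for x in D)
    return total

def pairwise_criterion(clusters, dissim=squared_euclidean):
    """1/2 * sum_i n_i * s_bar_i, with s_bar_i = (1/n_i^2) sum_{x, x' in D_i} dissim(x, x')."""
    total = 0.0
    for D in clusters:
        n = len(D)
        s_bar = sum(dissim(x, y) for x in D for y in D) / n**2
        total += 0.5 * n * s_bar
    return total

clusters = [[(0, 0), (0, 1), (2, 3)], [(5, 5), (6, 7)]]
print(sse_criterion(clusters), pairwise_criterion(clusters))
```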

Scatter Criteria
Derived from scatter matrices: trace criterion, determinant criterion, invariant criteria.
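The sketch below computes the scatter matrices and the three named criteria. It uses the standard definitions $S_W = \sum_i \sum_{x \in D_i} (x - m_i)(x - m_i)^t$, $S_B = \sum_i n_i (m_i - m)(m_i - m)^t$ and $S_T = S_W + S_B$, where m is the overall mean. The trace criterion $\operatorname{tr} S_W$ equals $J_e$, the determinant criterion is $|S_W|$, and $\operatorname{tr}(S_W^{-1} S_B)$ is one invariant criterion. The 2×2 inverse and determinant helpers restrict this sketch to 2-D data; that restriction and the data are assumptions for brevity.

```python
def mean(points):
    return [sum(col) / len(points) for col in zip(*points)]

def outer(u, v):
    return [[a * b for b in v] for a in u]

def mat_add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

def det2(A):
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]

def inv2(A):
    det = det2(A)
    return [[A[1][1] / det, -A[0][1] / det], [-A[1][0] / det, A[0][0] / det]]

def scatter_matrices(clusters):
    """Within-cluster S_W, between-cluster S_B and total S_T scatter matrices."""
    all_pts = [x for D in clusters for x in D]
    m = mean(all_pts)
    d = len(m)
    S_W = [[0.0] * d for _ in range(d)]
    S_B = [[0.0] * d for _ in range(d)]
    S_T = [[0.0] * d for _ in range(d)]
    for D in clusters:
        m_i = mean(D)
        for x in D:
            diff = [a - b for a, b in zip(x, m_i)]
            S_W = mat_add(S_W, outer(diff, diff))
        dm = [a - b for a, b in zip(m_i, m)]
        S_B = mat_add(S_B, [[len(D) * v for v in row] for row in outer(dm, dm)])
    for x in all_pts:
        diff = [a - b for a, b in zip(x, m)]
        S_T = mat_add(S_T, outer(diff, diff))
    return S_W, S_B, S_T

clusters = [[(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)], [(5.0, 5.0), (6.0, 7.0), (7.0, 5.0)]]
S_W, S_B, S_T = scatter_matrices(clusters)
print("trace criterion (J_e):", trace(S_W))
print("determinant criterion:", det2(S_W))
print("invariant tr(S_W^-1 S_B):", trace(matmul(inv2(S_W), S_B)))
```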

Hierarchical Clustering

Dendrogram

Agglomerative Algorithm

Nearest Neighbor Algorithm

Farthest Neighbor Algorithm

How to determine nearest clusters
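The agglomerative algorithm can be sketched as follows. Start with every sample in its own cluster and repeatedly merge the two nearest clusters until a target number remains. The recorded merge distances give the levels of a dendrogram. The sketch supports four common ways to measure the distance between clusters: d_min (nearest neighbor, single linkage), d_max (farthest neighbor, complete linkage), d_avg (average pairwise distance) and d_mean (distance between means). These are standard choices assumed here, since the slides only name the nearest- and farthest-neighbor variants. The helper names, `c_target` and the data are hypothetical.

```python
import math
from itertools import combinations

def d_min(A, B):   # nearest-neighbor (single-linkage) distance
    return min(math.dist(a, b) for a in A for b in B)

def d_max(A, B):   # farthest-neighbor (complete-linkage) distance
    return max(math.dist(a, b) for a in A for b in B)

def d_avg(A, B):   # average pairwise distance
    return sum(math.dist(a, b) for a in A for b in B) / (len(A) * len(B))

def d_mean(A, B):  # distance between the cluster means
    ma = [sum(c) / len(A) for c in zip(*A)]
    mb = [sum(c) / len(B) for c in zip(*B)]
    return math.dist(ma, mb)

def agglomerate(samples, c_target, cluster_dist=d_min):
    """Merge the two nearest clusters until c_target remain; also return merge levels."""
    clusters = [[x] for x in samples]
    history = []
    while len(clusters) > c_target:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: cluster_dist(clusters[ij[0]], clusters[ij[1]]))
        history.append(cluster_dist(clusters[i], clusters[j]))
        merged = clusters[i] + clusters[j]
        clusters = [C for k, C in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters, history

pts = [(0.0,), (1.0,), (5.0,), (6.0,), (20.0,)]
clusters, levels = agglomerate(pts, 2, d_min)
print(clusters, levels)
```

Swapping `d_min` for `d_max` turns the nearest-neighbor algorithm into the farthest-neighbor one without changing the merge loop.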