Machine Learning CS 6375 --- Spring 2015
Gaussian Mixture Model (GMM), Expectation Maximization (EM)
Acknowledgement: some slides adapted from Christopher Bishop, Vincent Ng.

K-means Algorithm (a special case of EM)
Goal: represent a data set in terms of K clusters, each of which is summarized by a prototype µ_k.
Initialize the prototypes, then iterate between two phases:
- E-step: assign each data point to its nearest prototype
- M-step: update the prototypes to be the cluster means
The simplest version is based on Euclidean distance.
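The two-phase iteration above can be sketched in a few lines of NumPy. This is a minimal illustration, not code from the slides; the function name and initialization scheme (prototypes drawn from the data) are my own choices.

```python
import numpy as np

def kmeans(X, K, n_iters=100, seed=0):
    """Minimal K-means: alternate assignment and mean-update steps."""
    rng = np.random.default_rng(seed)
    # Initialize prototypes as K randomly chosen data points.
    mu = X[rng.choice(len(X), size=K, replace=False)]
    for _ in range(n_iters):
        # "E-step": assign each point to its nearest prototype (Euclidean distance).
        d = np.linalg.norm(X[:, None, :] - mu[None, :, :], axis=2)  # (N, K)
        z = d.argmin(axis=1)
        # "M-step": move each prototype to the mean of its assigned points.
        new_mu = np.array([X[z == k].mean(axis=0) if np.any(z == k) else mu[k]
                           for k in range(K)])
        if np.allclose(new_mu, mu):  # converged: assignments can no longer change
            break
        mu = new_mu
    return mu, z
```

On two well-separated clusters, the prototypes settle on the cluster means within a few iterations.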
Probabilistic Clustering
Represent the probability distribution of the data as a mixture model:
- captures uncertainty in cluster assignments
- gives a model for the data distribution
Consider mixtures of Gaussians.

Maximum Likelihood Solution (single Gaussian)
Maximizing the likelihood w.r.t. the mean gives the sample mean; maximizing w.r.t. the covariance gives the sample covariance.
Gaussian Mixtures
Linear superposition of Gaussians:
p(x) = Σ_{k=1}^{K} π_k N(x | µ_k, Σ_k)
Normalization and positivity require 0 ≤ π_k ≤ 1 and Σ_k π_k = 1.
The mixing coefficients π_k can be interpreted as prior probabilities.

Example: Mixture of 3 Gaussians
[Figure: density of a three-component Gaussian mixture on the unit square]
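The superposition formula translates directly into code. The sketch below is my own illustration of evaluating p(x) = Σ_k π_k N(x | µ_k, Σ_k); the helper names are not from the slides.

```python
import numpy as np

def gaussian_pdf(x, mu, Sigma):
    """Density of a multivariate normal N(x | mu, Sigma)."""
    d = len(mu)
    diff = x - mu
    inv = np.linalg.inv(Sigma)
    norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(Sigma))
    return np.exp(-0.5 * diff @ inv @ diff) / norm

def mixture_density(x, pis, mus, Sigmas):
    """p(x) = sum_k pi_k * N(x | mu_k, Sigma_k).

    pis must be nonnegative and sum to 1, which makes p(x) a valid density.
    """
    return sum(pi * gaussian_pdf(x, mu, S) for pi, mu, S in zip(pis, mus, Sigmas))
```

For a sanity check: with two identical standard-normal components and π = (0.5, 0.5), the mixture collapses to a single standard normal, whose density at the origin is 1/√(2π).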
Contours of Probability Distribution
[Figure: contour plot of the mixture density]

Sampling from the Gaussian Mixture
To generate a data point:
- first pick one of the components with probability π_k
- then draw a sample from that component
Repeat these two steps for each new data point.
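The two-step sampling procedure (pick a component, then draw from it) is sometimes called ancestral sampling. A minimal sketch, with function name and parameters of my own choosing:

```python
import numpy as np

def sample_gmm(n, pis, mus, Sigmas, seed=0):
    """Draw n points from a GMM by ancestral sampling."""
    rng = np.random.default_rng(seed)
    # Step 1: choose a component index k for each point, with probability pi_k.
    ks = rng.choice(len(pis), size=n, p=pis)
    # Step 2: draw each point from its chosen component N(mu_k, Sigma_k).
    X = np.array([rng.multivariate_normal(mus[k], Sigmas[k]) for k in ks])
    return X, ks
```

Over many draws, the empirical fraction of points from component k converges to π_k.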
Synthetic Data Set
[Figure: points sampled from the mixture, coloured by generating component]

Fitting the Gaussian Mixture
We wish to invert this process: given the data set, find the corresponding parameters:
- mixing coefficients
- means
- covariances
If we knew which component generated each data point, the maximum likelihood solution would involve fitting each component to the corresponding cluster.
Problem: the data set is unlabelled.
We shall refer to the labels as latent (= hidden) variables.
Synthetic Data Set Without Labels
[Figure: the same sample with the component labels removed]

Posterior Probabilities
We can think of the mixing coefficients as prior probabilities for the components.
For a given value of x we can evaluate the corresponding posterior probabilities, called responsibilities. These are given by Bayes' theorem:
γ_k(x) ≡ p(k | x) = π_k N(x | µ_k, Σ_k) / Σ_j π_j N(x | µ_j, Σ_j)
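The Bayes'-theorem computation of responsibilities can be sketched as follows (my own illustration; the vectorized layout is an implementation choice, not from the slides):

```python
import numpy as np

def responsibilities(X, pis, mus, Sigmas):
    """gamma[n, k] = pi_k N(x_n|mu_k,Sigma_k) / sum_j pi_j N(x_n|mu_j,Sigma_j)."""
    N, K = len(X), len(pis)
    dens = np.empty((N, K))
    for k in range(K):
        d = len(mus[k])
        diff = X - mus[k]                       # (N, d)
        inv = np.linalg.inv(Sigmas[k])
        norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(Sigmas[k]))
        # Numerator of Bayes' theorem: prior pi_k times likelihood N(x_n|mu_k,Sigma_k).
        dens[:, k] = pis[k] * np.exp(
            -0.5 * np.einsum('ni,ij,nj->n', diff, inv, diff)) / norm
    # Denominator: sum over components, so each row of gamma sums to 1.
    return dens / dens.sum(axis=1, keepdims=True)
```

Each row of the result is a posterior distribution over components: a point sitting on top of one component's mean gets a responsibility near 1 for that component.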
Posterior Probabilities (colour coded)
[Figure: points coloured according to their responsibilities]

Maximum Likelihood for the GMM
The log likelihood function takes the form
ln p(X | π, µ, Σ) = Σ_{n=1}^{N} ln { Σ_{k=1}^{K} π_k N(x_n | µ_k, Σ_k) }
Note: the sum over components appears inside the logarithm.
There is no closed-form solution for maximum likelihood; it is solved by the expectation-maximization (EM) algorithm.
EM Algorithm --- Informal Derivation
Let us proceed by simply differentiating the log likelihood
ln p(X | π, µ, Σ) = Σ_{n=1}^{N} ln { Σ_{k=1}^{K} π_k N(x_n | µ_k, Σ_k) }
Setting the derivative w.r.t. µ_k to zero gives
µ_k = (1 / N_k) Σ_{n=1}^{N} γ(z_nk) x_n,   where   N_k = Σ_{n=1}^{N} γ(z_nk)
and γ(z_nk) is the responsibility of component k for point x_n.
Similarly for the covariances:
Σ_k = (1 / N_k) Σ_{n=1}^{N} γ(z_nk) (x_n − µ_k)(x_n − µ_k)^T
For the mixing coefficients, use a Lagrange multiplier to enforce the constraint that they sum to 1, giving
π_k = N_k / N
EM Algorithm --- Informal Derivation (cont.)
The solutions are not closed form since they are coupled: each parameter update depends on the responsibilities, which in turn depend on the parameters.
This suggests an iterative scheme for solving them:
- make initial guesses for the parameters
- alternate between the following two stages:
  E-step: evaluate the responsibilities
  M-step: update the parameters using the ML results
Each EM cycle is guaranteed not to decrease the likelihood.
Relation to K-means
Consider a GMM with common covariances Σ_k = εI.
Take the limit ε → 0: the responsibilities become binary (each point is assigned entirely to its nearest component), and the EM algorithm becomes precisely equivalent to K-means.
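This limit is easy to see numerically. The sketch below (my own; the function name is invented) computes responsibilities for equal-weight components with shared covariance εI, which reduce to a softmax of the negative squared distances. Shrinking ε makes the softmax approach a hard, K-means-style assignment.

```python
import numpy as np

def soft_assignments(X, mus, eps):
    """Responsibilities for a GMM with equal weights and shared covariance eps * I."""
    # With Sigma_k = eps*I and equal pis, the normalizers cancel and
    # gamma[n, k] is a softmax over -||x_n - mu_k||^2 / (2 * eps).
    d2 = ((X[:, None, :] - mus[None, :, :]) ** 2).sum(axis=2)   # (N, K) squared distances
    logits = -d2 / (2 * eps)
    logits -= logits.max(axis=1, keepdims=True)                  # numerical stability
    g = np.exp(logits)
    return g / g.sum(axis=1, keepdims=True)
```

With a large ε the assignments stay soft; as ε → 0 each point's responsibility concentrates entirely on its nearest component.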
EM for GMM
Iterate. On the t-th iteration:
- E-step: compute the expected class memberships of all data points for each class, using the current parameters θ_t
- M-step: compute the maximum likelihood parameters (e.g. µ) given our data's class membership distributions

The EM Algorithm in General
Given observed variables X and unobserved variables Z:
- E-step: compute the expected complete-data log likelihood
  Q(θ | θ_t) = E_Z [ ln p(X, Z | θ) | X, θ_t ]
- M-step: maximize Q to find the new θ_{t+1}.
The EM Algorithm
- Identify the sufficient statistics for estimating the parameters θ.
- Initialize the θs to some arbitrary non-zero values θ^(0).
- Iterate the E-step and the M-step. During iteration k:
  E-step: compute the expected values of the sufficient statistics based on the current parameter estimates θ^(k)
  M-step: derive θ^(k+1) as an ML estimate using the values of the sufficient statistics computed in the E-step
- Terminate when |L(data | θ^(k+1)) − L(data | θ^(k))| < ε.

Is the Incomplete Log Likelihood Maximized?
Theorem: Let X be our incomplete (observed) data, Z our hidden data, and θ a parametric model that generates X and Z. If we choose θ' such that
E_Z [ log p(X, Z | θ') ] > E_Z [ log p(X, Z | θ) ]   (increasing expected log likelihood)
then
log p(X | θ') > log p(X | θ)   (increasing likelihood).
Lemma (Gibbs' inequality): for distributions p and q,
Σ_z p(z) log p(z) ≥ Σ_z p(z) log q(z),   that is,   −Σ_z p(z) log p(z) ≤ −Σ_z p(z) log q(z).
Proof of EM
Since p(X | θ) = p(X, Z | θ) / p(Z | X, θ), we have
log p(X | θ) = log p(X, Z | θ) − log p(Z | X, θ).
Taking the expectation on both sides w.r.t. p(Z | X, θ) (the left side does not depend on Z) gives
log p(X | θ) = E_Z [ log p(X, Z | θ) ] − E_Z [ log p(Z | X, θ) ]   (*)

Proof of EM (cont.)
Substituting θ' for θ in (*), we have:
log p(X | θ') = E_Z [ log p(X, Z | θ') ] − E_Z [ log p(Z | X, θ') ]
Now by assumption we have:
E_Z [ log p(X, Z | θ') ] > E_Z [ log p(X, Z | θ) ]
By the lemma (with p = p(Z | X, θ) and q = p(Z | X, θ')) we have:
− E_Z [ log p(Z | X, θ') ] ≥ − E_Z [ log p(Z | X, θ) ]
Adding the two gives log p(X | θ') > log p(X | θ).
EM Summary
For learning from partly unobserved data:
- ML estimate: θ_ML = argmax_θ log p(X | θ)
- EM estimate: θ_EM = argmax_θ E_Z [ log p(X, Z | θ) | X ]
where X is the observed part of the data and Z is unobserved.

Using EM in Practice
EM may not work well in practice. Potential problems:
- gets stuck at a local maximum (solutions: select different starting points; search by simulated annealing)
- overfitting the training data (solutions: use held-out data; add regularization)
- the underlying generative model is incorrect (solution: fix the model)
Over-fitting in Gaussian Mixture Models
Singularities arise in the likelihood function when a component collapses onto a data point: setting µ_k = x_n and then letting σ_k → 0 drives the likelihood to infinity.
The likelihood function also gets larger as we add more components (and hence parameters) to the model, so it is not clear how to choose the number K of components.

Can EM really improve the underlying classifier?
It depends on whether the data is generated by a mixture and whether there is a 1-to-1 mapping between the mixture components and classes.
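The collapse singularity is easy to demonstrate numerically. In the sketch below (my own illustration; the 1-D mixture and the particular parameter values are invented for the demo), one component is pinned exactly on a data point and its standard deviation is shrunk, and the log likelihood grows without bound.

```python
import numpy as np

def log_likelihood_1d(X, mus, sigmas, pis):
    """Log likelihood of a 1-D GMM, used to show the collapse singularity."""
    dens = sum(pi * np.exp(-0.5 * ((X - mu) / s) ** 2) / (s * np.sqrt(2 * np.pi))
               for pi, mu, s in zip(pis, mus, sigmas))
    return np.log(dens).sum()

# Component 0 sits exactly on the data point x = 0; shrinking its sigma
# inflates the density at that point like 1/sigma, so the likelihood diverges.
X = np.array([0.0, 1.0, 2.0, 3.0])
lls = [log_likelihood_1d(X, mus=[0.0, 1.5], sigmas=[s, 1.0], pis=[0.5, 0.5])
       for s in (1.0, 0.1, 0.001)]
```

The sequence of log likelihoods is strictly increasing as σ shrinks, even though the shrinking component explains only a single point: a textbook case of the ML objective rewarding a degenerate solution.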