Biostatistics 615/815


The E-M Algorithm
Biostatistics 615/815, Lecture 17

Last Lecture: The Simplex Method
General method for optimization
Makes few assumptions about function
Crawls towards minimum
Some recommendations:
Multiple starting points
Restart maximization at proposed solution

Summary: The Simplex Method
[Diagram: an original simplex with its high and low points, and the candidate moves: reflection, reflection and expansion, contraction, and multiple contraction.]

Improvements to amoeba()
Different scaling along each dimension, if parameters have different impact on the likelihood
Track total function evaluations, to avoid getting stuck if the function does not cooperate
Rotate simplex, if the current simplex is leading to slow improvement

optim() Function in R
optim(point, function, method)
Point: starting point for minimization
Function: accepts point as argument
Method: can be "Nelder-Mead" for simplex method (default); "BFGS", "CG" and other options use gradient

Other Methods for Minimization in Multiple Dimensions
Typically, sophisticated methods will:
Use derivatives (may be calculated numerically. How?)
Select a direction for minimization, using: weighted average of previous directions; current gradient
Avoid right-angle turns
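
As to how derivatives may be calculated numerically: a minimal sketch using central differences. The function name, signature, and step size h below are illustrative assumptions, not from the slides.

/* Central-difference approximation to the gradient of f at theta.
   h is the step size; grad must have room for k entries.
   A sketch only; real libraries choose h and scaling more carefully. */
void numerical_gradient(double (*f)(double *theta, int k),
                        double *theta, int k, double h, double *grad)
{
    for (int i = 0; i < k; i++)
    {
        double saved = theta[i];

        theta[i] = saved + h;
        double f_plus = f(theta, k);

        theta[i] = saved - h;
        double f_minus = f(theta, k);

        grad[i] = (f_plus - f_minus) / (2.0 * h);
        theta[i] = saved;       /* restore the original coordinate */
    }
}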

One Parameter at a Time
Simple but inefficient approach
Consider parameters θ = (θ_1, θ_2, ..., θ_k) and function f(θ)
Maximize f with respect to each θ_i in turn
Cycle through parameters
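
A sketch of this cyclic scheme, with a crude step-halving search standing in for a proper one-dimensional maximization; none of this code is from the slides.

/* One-parameter-at-a-time maximization: cycle through the coordinates,
   nudging each one uphill with a step-halving search.
   A sketch only; a real implementation would use a proper 1-D optimizer. */
void coordinate_ascent(double (*f)(double *theta, int k),
                       double *theta, int k, int cycles)
{
    for (int c = 0; c < cycles; c++)
        for (int i = 0; i < k; i++)
        {
            double step = 1.0;
            while (step > 1e-8)
            {
                double base = f(theta, k);

                theta[i] += step;            /* try moving up */
                if (f(theta, k) > base) continue;

                theta[i] -= 2.0 * step;      /* try moving down */
                if (f(theta, k) > base) continue;

                theta[i] += step;            /* neither helped: restore and shrink */
                step /= 2.0;
            }
        }
}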

The Inefficiency
[Figure: successive one-parameter-at-a-time updates trace a zig-zag path in the (θ_1, θ_2) plane.]

Steepest Descent
Consider parameters θ = (θ_1, θ_2, ..., θ_k) and function f(θ; x)
Score vector:

S = \frac{d \ln f}{d\theta} = \left( \frac{d \ln f}{d\theta_1}, \ldots, \frac{d \ln f}{d\theta_k} \right)

Find maximum along θ + δS
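
A sketch of a single steepest-ascent move along θ + δS, with δ found by naive doubling rather than an exact line search. The fixed array bound and the supplied score function are assumptions for illustration.

/* Take one steepest-ascent step: move along theta + delta * S,
   doubling delta while the objective keeps improving.
   A sketch; a real method would do a proper line maximization. */
void steepest_ascent_step(double (*lnf)(double *theta, int k),
                          void (*score)(double *theta, int k, double *S),
                          double *theta, int k)
{
    double S[64];                     /* assumes k <= 64 for this sketch */
    double trial[64];
    score(theta, k, S);

    double best = lnf(theta, k);
    double delta = 1e-4;

    for (;;)
    {
        for (int i = 0; i < k; i++)
            trial[i] = theta[i] + delta * S[i];

        double value = lnf(trial, k);
        if (value <= best) break;     /* no further improvement */

        for (int i = 0; i < k; i++)   /* accept the step, try a larger one */
            theta[i] = trial[i];
        best = value;
        delta *= 2.0;
    }
}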

Still Inefficient
Consecutive steps are still perpendicular!

Other Strategies for Multidimensional Optimization
Most strategies will define a series of vectors or lines through parameter space
Estimate of minimum improved by adding an optimal multiple of each vector
Some intuitive choices might be: the function gradient; unit vectors along one dimension

The key is to avoid right-angle turns!
Most methods that use derivatives don't simply optimize the function along the current gradient or the unit vectors

Today
The E-M algorithm
General algorithm for missing data problems
Requires "specialization" to the problem at hand
Frequently applied to mixture distributions

The E-M Algorithm: Original Citation
Dempster, Laird and Rubin (1977) J Royal Statistical Society (B) 39:1-38
Cited in over 9,184 research articles
For comparison: Nelder and Mead (1965) Computer Journal 7:308-313
Cited in over 8,094 research articles

The Basic E-M Strategy
X = (Y, Z)
Complete data X (e.g. what we'd like to have!)
Observed data Y (e.g. individual observations)
Missing data Z (e.g. class assignments)
The algorithm:
Use estimated parameters to infer Z
Update estimated parameters using Y and Z
Repeat until convergence

The E-M Algorithm
Consider a set of starting parameters
Use these to estimate the missing data
Use complete data to update parameters
Repeat as necessary

Setting for the E-M Algorithm...
Problem is simpler to solve for complete data
Maximum likelihood estimates can be calculated using standard methods
Estimates of mixture parameters could be obtained in a straightforward manner if the origin of each observation is known

Filling In Missing Data
The missing data is the group assignment for each observation
Complete data generated by assigning observations to groups probabilistically
We will use fractional assignments

The E-Step: Mixture of Normals
Estimate missing data: estimate assignment of observations to groups
How? Conditional on current parameter values
Basically, classify each observation

Classification Probabilities

\Pr(Z_i = j \mid x_i, \pi, \phi, \eta) = \frac{\pi_j f(x_i; \phi_j, \eta_j)}{\sum_l \pi_l f(x_i; \phi_l, \eta_l)}

Results from the application of Bayes' theorem
Implemented in classprob() function:
classprob(int j, double x, int k, double *prob, double *mean, double *sd)
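
classprob() itself is not shown on the slides. A minimal sketch consistent with the signature above, assuming f is the normal density; the dnorm helper below is an assumption of the sketch, not slide code.

#include <math.h>

/* Normal density with mean mu and standard deviation sigma.
   Helper assumed for this sketch; not shown in the original slides. */
static double dnorm(double x, double mu, double sigma)
{
    double z = (x - mu) / sigma;
    return exp(-0.5 * z * z) / (sigma * sqrt(2.0 * M_PI));
}

/* Posterior probability that observation x arose from component j:
   prob[j] * f(x; mean[j], sd[j]) divided by the sum over all components. */
double classprob(int j, double x, int k,
                 double *prob, double *mean, double *sd)
{
    double total = 0.0;

    for (int l = 0; l < k; l++)
        total += prob[l] * dnorm(x, mean[l], sd[l]);

    return prob[j] * dnorm(x, mean[j], sd[j]) / total;
}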

C Code: Updating Group Memberships

void update_class_prob(int n, double * data, int k,
                       double * prob, double * mean, double * sd,
                       double ** class_prob)
{
    int i, j;

    for (i = 0; i < n; i++)
        for (j = 0; j < k; j++)
            class_prob[i][j] = classprob(j, data[i], k, prob, mean, sd);
}

The M-Step
Update mixture parameters to maximize the likelihood of the data
Appears tricky, but becomes simple when we assume cluster assignments are correct
We simply use the sample proportions, and weighted means and variances, to update parameters
This step is guaranteed never to decrease the likelihood

Updating Mixture Proportions

\hat{\pi}_j = \frac{\sum_i \Pr(Z_i = j \mid x_i, \pi, \phi, \eta)}{n}

"Count" the observations assigned to each group

C Code: Updating Mixture Proportions

void update_prob(int n, double * data, int k,
                 double * prob, double ** class_prob)
{
    for (int j = 0; j < k; j++)
    {
        prob[j] = 0.0;

        for (int i = 0; i < n; i++)
            prob[j] += class_prob[i][j];

        prob[j] /= n;
    }
}

Updating Component Means

\hat{\mu}_j = \frac{\sum_i x_i \Pr(Z_i = j \mid x_i, \pi, \phi, \eta)}{\sum_i \Pr(Z_i = j \mid x_i, \pi, \phi, \eta)} = \frac{\sum_i x_i \Pr(Z_i = j \mid x_i, \pi, \phi, \eta)}{n \hat{\pi}_j}

Calculate weighted mean for group
Weights are probabilities of group membership

C Code: Update Component Means

void update_mean(int n, double * data, int k,
                 double * prob, double * mean,
                 double ** class_prob)
{
    for (int j = 0; j < k; j++)
    {
        mean[j] = 0.0;

        for (int i = 0; i < n; i++)
            mean[j] += data[i] * class_prob[i][j];

        mean[j] /= n * prob[j] + TINY;
    }
}

Updating Component Variances

\hat{\sigma}_j^2 = \frac{\sum_i (x_i - \hat{\mu}_j)^2 \Pr(Z_i = j \mid x_i, \pi, \phi, \eta)}{n \hat{\pi}_j}

Calculate weighted sum of squared differences
Weights are probabilities of group membership

C Code: Update Component Std Deviations

void update_sd(int n, double * data, int k,
               double * prob, double * mean, double * sd,
               double ** class_prob)
{
    for (int j = 0; j < k; j++)
    {
        sd[j] = 0.0;

        for (int i = 0; i < n; i++)
            sd[j] += square(data[i] - mean[j]) * class_prob[i][j];

        sd[j] /= n * prob[j] + TINY;
        sd[j] = sqrt(sd[j]);
    }
}

C Code: Update Mixture

void update_parameters(int n, double * data, int k,
                       double * prob, double * mean, double * sd,
                       double ** class_prob)
{
    // First, we update the mixture proportions
    update_prob(n, data, k, prob, class_prob);

    // Next, update the mean for each component
    update_mean(n, data, k, prob, mean, class_prob);

    // Finally, update the standard deviations
    update_sd(n, data, k, prob, mean, sd, class_prob);
}

E-M Algorithm For Mixtures
1. Guesstimate starting parameters
2. Use Bayes' theorem to calculate group assignment probabilities
3. Update parameters using estimated assignments
4. Repeat steps 2 and 3 until likelihood is stable

C Code: The E-M Algorithm

double em(int n, double * data, int k,
          double * prob, double * mean, double * sd, double eps)
{
    double llk = 0, prev_llk = 0;
    double ** class_prob = alloc_matrix(n, k);

    start_em(n, data, k, prob, mean, sd);

    do {
        prev_llk = llk;
        update_class_prob(n, data, k, prob, mean, sd, class_prob);
        update_parameters(n, data, k, prob, mean, sd, class_prob);
        llk = mixllk(n, data, k, prob, mean, sd);
    } while (!check_tol(llk, prev_llk, eps));

    return llk;
}
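
mixllk() and check_tol() are called by em() but never shown on the slides. Plausible sketches consistent with that usage; the bodies below are assumptions, not the course's code.

#include <math.h>

/* Mixture log-likelihood of the data under the current parameters.
   A plausible reconstruction of mixllk(); not the original slide code. */
double mixllk(int n, double * data, int k,
              double * prob, double * mean, double * sd)
{
    double llk = 0.0;

    for (int i = 0; i < n; i++)
    {
        double density = 0.0;

        for (int j = 0; j < k; j++)
        {
            double z = (data[i] - mean[j]) / sd[j];
            density += prob[j] * exp(-0.5 * z * z)
                       / (sd[j] * sqrt(2.0 * M_PI));
        }

        llk += log(density);
    }

    return llk;
}

/* Convergence check: stop when successive log-likelihoods differ by < eps. */
int check_tol(double llk, double prev_llk, double eps)
{
    return fabs(llk - prev_llk) < eps;
}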

Picking Starting Parameters
Mixing proportions: assumed equal
Means for each group: pick one observation as the group mean
Variances for each group: use overall variance

C Code: Picking Starting Parameters

void start_em(int n, double * data, int k,
              double * prob, double * mean, double * sd)
{
    int i, j;
    double mean1 = 0.0, sd1 = 0.0;

    for (i = 0; i < n; i++)
        mean1 += data[i];
    mean1 /= n;

    for (i = 0; i < n; i++)
        sd1 += square(data[i] - mean1);
    sd1 = sqrt(sd1 / n);

    for (j = 0; j < k; j++)
    {
        prob[j] = 1.0 / k;
        mean[j] = data[rand() % n];
        sd[j] = sd1;
    }
}

Example Application: Old Faithful Eruptions (n = 272)
[Histogram: frequency of eruption durations (mins), roughly 1.5 to 5.0.]

Using Simplex Method: A Mixture of Two Normals
Fit 5 parameters: proportion in 1st component, 2 means, 2 variances
44/50 runs found minimum
Required about 700 evaluations
First component contributes 0.348 of mixture
Means are 2.018 and 4.273
Variances are 0.055 and 0.191
Maximum log-likelihood = -276.36

Using E-M Algorithm: A Mixture of Two Normals
Fit 5 parameters
50/50 runs found maximum
Required about 25 evaluations
First component contributes 0.348 of mixture
Means are 2.018 and 4.273
Variances are 0.055 and 0.191
Maximum log-likelihood = -276.36

Two Components
[Figure: Old Faithful eruptions histogram (frequency vs. duration, mins) and the fitted two-component density.]

Simplex Method: A Mixture of Three Normals
Fit 8 parameters: 2 proportions, 3 means, 3 variances
Required about 1400 evaluations
Found best solution in 7/50 runs
Other solutions effectively included only 2 components
The best solution:
Components contributing 0.339, 0.512 and 0.149
Component means are 2.002, 4.401 and 3.727
Variances are 0.0455, 0.106 and 0.2959
Maximum log-likelihood = -267.89

Three Components
[Figure: Old Faithful eruptions histogram and the fitted three-component density (duration, mins).]

E-M Algorithm: A Mixture of Three Normals
Fit 8 parameters: 2 proportions, 3 means, 3 variances
Required about 150 evaluations
Found log-likelihood of ~ -267.89 in 42/50 runs
Found log-likelihood of ~ -263.91 in 7/50 runs
The best solution:
Components contributing 0.160, 0.195 and 0.644
Component means are 1.856, 2.182 and 4.289
Variances are 0.00766, 0.0709 and 0.172
Maximum log-likelihood = -263.91

Three Components
[Figure: Old Faithful eruptions histogram and the E-M fitted three-component density (duration, mins).]

Convergence for E-M Algorithm
[Figure: log-likelihood by iteration (0 to 200); a zoomed-in panel shows the log-likelihood leveling off between about -266 and -270.]

Convergence for E-M Algorithm: Mixture Means
[Figure: component means by iteration (0 to 200).]

E-M Algorithm: A Mixture of Four Normals
Fit 11 parameters: 3 proportions, 4 means, 4 variances
Required about 300 evaluations
Found log-likelihood of ~ -267.89 in 1/50 runs
Found log-likelihood of ~ -263.91 in 2/50 runs
Found log-likelihood of ~ -257.46 in 47/50 runs
"Appears" more reliable than with 3 components

Four Components
[Figure: Old Faithful eruptions histogram and the fitted four-component density (duration, mins).]

Today
The E-M algorithm
Missing data formulation
Application to mixture distributions
Consider multiple starting points

Further Reading
There is a nice discussion of the E-M algorithm, with application to mixtures, at:
http://en.wikipedia.org/wiki/EM_algorithm