Flexible Mixture Modeling and Model-Based Clustering in R


Bettina Grün, September 2017

Outline

Program

Day 1: Afternoon
13:30–14:00  Registration
14:00–15:40  Lecture: Introduction; Estimation and inference
15:40–16:00  Coffee break
16:00–17:00  Practicals

Day 2: Morning
09:00–10:40  Lecture: Mixtures of normal distributions; Package mclust
10:40–11:00  Coffee break
11:00–12:00  Practicals

Program / 2

Day 2: Afternoon
14:00–15:40  Lecture
15:40–16:00  Coffee break
16:00–17:00  Practicals

Day 3: Morning
09:00–10:40  Lecture
10:40–11:00  Coffee break
11:00–12:00  Practicals

Lecture topics: latent class analysis; package flexmix; mixtures of regression models; extensions and variants; extending flexmix.

Material

Website:

Introduction

Finite mixture models

Flexible model class with different special cases depending on data types and application.

Types of application:
- Semi-parametric tool to approximate general distribution functions.
- Modeling of unobserved heterogeneity / detecting groups in data: model-based clustering.

Areas of application: astronomy, biology, economics, ...

Finite mixture models / 2

[Figure: histogram of faithful$waiting with density.]

Finite mixture models / 3

[Figure: histogram of faithful$waiting with density.]

Finite mixture models / 4

[Figure.]

Finite mixture models / 5

[Figure.]

Finite mixture models / 6

[Figure: scatter plot of y_n against x.]

Finite mixture models / 7

[Figure: scatter plot of y_n against x.]

Model definition

A finite mixture model is given by

H(y \mid x, \Theta) = \sum_{k=1}^{K} \pi_k F_k(y \mid x, \vartheta_k),

where

\sum_{k=1}^{K} \pi_k = 1 \quad \text{and} \quad \pi_k > 0 \;\forall k.

Model definition / 2

In the following we assume that the component-specific density functions f_k() exist, so that the density of the mixture distribution h() is given by

h(y \mid x, \Theta) = \sum_{k=1}^{K} \pi_k f_k(y \mid x, \vartheta_k).

Arbitrary, and also different, densities f_k() can be used for each component. In most applications one assumes that f_k() is from the same parametric distribution family or model for all k = 1, ..., K; the component-specific densities then only differ in the values of the parameter vector \vartheta_k.
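A minimal sketch (not from the slides) of such a mixture density in R: a two-component univariate normal mixture whose weight and parameters are made-up illustrative values.

## Two-component univariate normal mixture density;
## pi, mu and sigma are illustrative values only.
dmix <- function(y, pi = 0.4, mu = c(0, 4), sigma = c(1, 1.5)) {
  pi * dnorm(y, mu[1], sigma[1]) + (1 - pi) * dnorm(y, mu[2], sigma[2])
}
curve(dmix(x), from = -4, to = 9)  # visualize the bimodal mixture density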

Estimation and inference

Estimation

The log-likelihood for N independent observations is given by

\log L(\Theta) = \ell(\Theta) = \sum_{n=1}^{N} \log \left[ \sum_{k=1}^{K} \pi_k f_k(y_n \mid x_n, \vartheta_k) \right].

Maximum-likelihood (ML) estimation:
- Direct optimization of the likelihood (mostly in simpler cases).
- Expectation-maximization (EM) algorithm for more complicated models (Dempster, Laird and Rubin, 1977).
- EM followed by direct optimization to obtain an estimate of the Hessian.
- ...

Bayesian estimation:
- Gibbs sampling (Diebolt and Robert, 1994), a Markov chain Monte Carlo algorithm. Applicable when the joint posterior distribution is not known explicitly, but the conditional posterior distributions of each variable / subset of variables are known.

EM algorithm

- General method for ML estimation in models with unobserved latent variables.
- The complete-data log-likelihood contains the observed and the unobserved / missing data and is easier to maximize.
- Iterates between the E-step, which computes the expectation of the complete-data log-likelihood, and the M-step, where the expected complete-data log-likelihood is maximized.

Missing data

The component label vectors z_n = (z_{nk})_{k=1,...,K} are treated as missing data. It holds that z_{nk} ∈ {0, 1} and \sum_{k=1}^{K} z_{nk} = 1 for all n = 1, ..., N.

The complete-data log-likelihood is given by

\log L_c(\Theta) = \sum_{k=1}^{K} \sum_{n=1}^{N} z_{nk} \left[ \log \pi_k + \log f_k(y_n \mid x_n, \vartheta_k) \right].

EM algorithm: E-step

Given the current parameter estimates Θ^{(i)}, replace the missing data z_{nk} by the estimated a-posteriori probabilities

\hat{z}_{nk}^{(i)} = P(k \mid y_n, x_n, \Theta^{(i)}) = \frac{\pi_k^{(i)} f_k(y_n \mid x_n, \vartheta_k^{(i)})}{\sum_{u=1}^{K} \pi_u^{(i)} f_u(y_n \mid x_n, \vartheta_u^{(i)})}.

The conditional expectation of \log L_c(\Theta) at the ith step is given by

Q(\Theta; \Theta^{(i)}) = E_{\Theta^{(i)}}[\log L_c(\Theta) \mid y, x] = \sum_{k=1}^{K} \sum_{n=1}^{N} \hat{z}_{nk}^{(i)} \left[ \log \pi_k + \log f_k(y_n \mid x_n, \vartheta_k) \right].

EM algorithm: M-step

The next parameter estimate is given by

\Theta^{(i+1)} = \arg\max_{\Theta} Q(\Theta; \Theta^{(i)}).

The estimates for the component sizes are given by

\pi_k^{(i+1)} = \frac{1}{N} \sum_{n=1}^{N} \hat{z}_{nk}^{(i)}.

The component-specific parameter estimates are determined by

\vartheta_k^{(i+1)} = \arg\max_{\vartheta_k} \sum_{n=1}^{N} \hat{z}_{nk}^{(i)} \log f_k(y_n \mid x_n, \vartheta_k),

i.e., weighted ML estimation of the component-specific model.

Mixtures of two univ. normal distributions: Example

A finite mixture of two univariate normal distributions is given by

h(y \mid \Theta) = \pi f_N(y \mid \mu_1, \sigma_1^2) + (1 - \pi) f_N(y \mid \mu_2, \sigma_2^2),

where f_N(\cdot \mid \mu, \sigma^2) is the density of the univariate normal distribution with mean μ and variance σ².

Mixtures of two univ. normal distributions: Example / 2

1. Initialize the parameters μ_1^{(0)}, σ_1^{(0)}, μ_2^{(0)}, σ_2^{(0)} and π^{(0)}. Set i = 0.
2. E-step: Determine the a-posteriori probabilities. The contributions to the likelihood for each observation and component are

\tilde{z}_{n1}^{(i)} = \pi^{(i)} f_N(y_n \mid \mu_1^{(i)}, (\sigma_1^2)^{(i)}), \qquad \tilde{z}_{n2}^{(i)} = (1 - \pi^{(i)}) f_N(y_n \mid \mu_2^{(i)}, (\sigma_2^2)^{(i)}).

The a-posteriori probability is given by

\hat{z}_{n1}^{(i)} = \frac{\tilde{z}_{n1}^{(i)}}{\tilde{z}_{n1}^{(i)} + \tilde{z}_{n2}^{(i)}}.

Mixtures of two univ. normal distributions: Example / 3

3. M-step:

\pi^{(i+1)} = \frac{1}{N} \sum_{n=1}^{N} \hat{z}_{n1}^{(i)},

\mu_1^{(i+1)} = \frac{1}{N \pi^{(i+1)}} \sum_{n=1}^{N} \hat{z}_{n1}^{(i)} y_n, \qquad \mu_2^{(i+1)} = \frac{1}{N (1 - \pi^{(i+1)})} \sum_{n=1}^{N} (1 - \hat{z}_{n1}^{(i)}) y_n,

(\sigma_1^2)^{(i+1)} = \frac{1}{N \pi^{(i+1)}} \sum_{n=1}^{N} \hat{z}_{n1}^{(i)} (y_n - \mu_1^{(i+1)})^2, \qquad (\sigma_2^2)^{(i+1)} = \frac{1}{N (1 - \pi^{(i+1)})} \sum_{n=1}^{N} (1 - \hat{z}_{n1}^{(i)}) (y_n - \mu_2^{(i+1)})^2.

Increase i by 1.
4. Iterate between E- and M-step until convergence.
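A minimal sketch (not from the slides) of this EM algorithm in R, applied to the waiting times of the faithful data; the starting values are made up.

## EM for a two-component univariate normal mixture.
em_mix2 <- function(y, pi = 0.5, mu = c(60, 90), sigma2 = c(10, 10),
                    tol = 1e-8, maxit = 500) {
  loglik_old <- -Inf
  for (i in seq_len(maxit)) {
    ## E-step: a-posteriori probabilities for component 1
    d1 <- pi * dnorm(y, mu[1], sqrt(sigma2[1]))
    d2 <- (1 - pi) * dnorm(y, mu[2], sqrt(sigma2[2]))
    z1 <- d1 / (d1 + d2)
    ## M-step: weighted ML estimates
    pi <- mean(z1)
    mu <- c(sum(z1 * y) / sum(z1), sum((1 - z1) * y) / sum(1 - z1))
    sigma2 <- c(sum(z1 * (y - mu[1])^2) / sum(z1),
                sum((1 - z1) * (y - mu[2])^2) / sum(1 - z1))
    ## log-likelihood at the parameters of the previous iteration
    loglik <- sum(log(d1 + d2))
    if (loglik - loglik_old < tol) break
    loglik_old <- loglik
  }
  list(pi = pi, mu = mu, sigma2 = sigma2, loglik = loglik)
}
em_mix2(faithful$waiting)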

EM algorithm

Advantages:
- The likelihood is increased in each step; the EM algorithm converges for bounded likelihoods.
- Relatively easy to implement:
  - Different mixture models require only different M-steps.
  - Weighted ML estimation of the component-specific model is sometimes already available in standard software.

Disadvantages:
- Standard errors have to be determined separately, as the information matrix is not required during the algorithm.
- Convergence only to a local optimum.
- Slow convergence.

Why does EM work?

Given the definition of conditional probabilities it holds that

P(z \mid y, \Theta') = \frac{P(z, y \mid \Theta')}{P(y \mid \Theta')}.

This can be transformed to

P(y \mid \Theta') = \frac{P(z, y \mid \Theta')}{P(z \mid y, \Theta')}.

For the log-likelihoods it thus holds that

\ell(\Theta'; y) = \ell_c(\Theta'; y, z) - \ell_1(\Theta'; z \mid y).

Why does EM work? / 2

Taking the expectation with respect to the conditional density of z given y and Θ gives

\ell(\Theta'; y) = E[\ell_c(\Theta'; y, z) \mid y, \Theta] - E[\ell_1(\Theta'; z \mid y) \mid y, \Theta] = Q(\Theta', \Theta) - R(\Theta', \Theta).

R(Θ', Θ) is the expected value of the log-likelihood of a density with parameter Θ', taken with respect to the same density with parameter value Θ. By Jensen's inequality, R(Θ', Θ) is maximal for Θ' = Θ. If Θ' maximizes Q(Θ', Θ), it holds that

\ell(\Theta'; y) - \ell(\Theta; y) = (Q(\Theta', \Theta) - Q(\Theta, \Theta)) - (R(\Theta', \Theta) - R(\Theta, \Theta)) \geq 0.

EM as a Max-Max method

The EM algorithm can also be interpreted as a joint maximization method. Consider the following function:

F(\Theta', \tilde{P}) = E_{\tilde{P}}[\ell_c(\Theta'; y, z)] - E_{\tilde{P}}[\log \tilde{P}(z)].

For fixed Θ', the function F with respect to \tilde{P}(z) is maximized by \tilde{P}(z) = P(z \mid y, \Theta'). In the M-step, F() is maximized with respect to Θ' keeping \tilde{P} fixed. The observed log-likelihood and F(Θ', \tilde{P}) have the same value for \tilde{P}(z) = P(z \mid y, \Theta').

EM as a Max-Max method / 2

[Figure: alternating maximization over the model parameter and the latent data parameter.]

EM algorithm: Variants

Classification EM (CEM) assigns each observation to the component with the maximum a-posteriori probability.
- In general faster convergence than EM.
- Converges to a local optimum of the classification likelihood.

Stochastic EM (SEM) assigns each observation to one component by drawing from the multinomial distribution induced by the a-posteriori probabilities.
- Does not converge to the closest local optimum given the initial values.
- If started with too many components, some components will eventually become empty and will be eliminated.

Bayesian estimation

Determine the posterior density using Bayes' theorem:

p(\Theta \mid Y, X) \propto h(Y \mid X, \Theta)\, p(\Theta),

where p(Θ) is the prior, Y = (y_n)_n and X = (x_n)_n.

Standard prior distributions:
- Proper priors: improper priors give improper posteriors.
- Independent priors for the component weights and the component-specific parameters.
- Conjugate priors for the complete likelihood:
  - Dirichlet distribution D(e_{0,1}, ..., e_{0,K}) for the component weights, which is the conjugate prior for the multinomial distribution.
  - Priors on the component-specific parameters depend on the underlying distribution family.
- Invariant priors, e.g., the parameter of the Dirichlet prior is constant over all components: e_{0,k} ≡ e_0.

Estimation: Gibbs sampling

Starting with Z^{(0)} = (z_n^{(0)})_{n=1,...,N}, repeat the following steps for i = 1, ..., I_0, ..., I + I_0.

1. Parameter simulation conditional on the classification Z^{(i-1)}:
   a. Sample π_1, ..., π_K from D((\sum_{n=1}^{N} z_{nk}^{(i-1)} + e_{0,k})_{k=1,...,K}).
   b. Sample the component-specific parameters from the complete-data posterior p(\vartheta_1, ..., \vartheta_K \mid Z^{(i-1)}, Y).
   Store the actual values of all parameters Θ^{(i)} = (π_k^{(i)}, ϑ_k^{(i)})_{k=1,...,K}.
2. Classification of each observation (y_n, x_n) conditional on knowing Θ^{(i)}: sample z_n^{(i)} from the multinomial distribution with parameter equal to the posterior probabilities.

After discarding the burn-in draws, the draws I_0 + 1, ..., I + I_0 can be used to approximate all quantities of interest (after resolving label switching).

Example: Gaussian distribution

Assume an independence prior

p(\mu_k, \Sigma_k^{-1}) \propto f_N(\mu_k; b_0, B_0)\, f_W(\Sigma_k^{-1}; c_0, C_0).

1. Parameter simulation conditional on the classification Z^{(i-1)}:
   a. Sample π_1^{(i)}, ..., π_K^{(i)} from D((\sum_{n=1}^{N} z_{nk}^{(i-1)} + e_{0,k})_{k=1,...,K}).
   b. Sample (Σ_k^{-1})^{(i)} in each group k from a Wishart W(c_k(Z^{(i-1)}), C_k(Z^{(i-1)})) distribution.
   c. Sample μ_k^{(i)} in each group k from a N(b_k(Z^{(i-1)}), B_k(Z^{(i-1)})) distribution.
2. Classification of each observation y_n conditional on knowing Θ^{(i)}:

P(z_{nk}^{(i)} = 1 \mid y_n, \Theta^{(i)}) \propto \pi_k f_N(y_n; \mu_k, \Sigma_k).
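A minimal sketch (not from the slides) of this sampler in R for the univariate two-component case, where the Wishart reduces to a gamma distribution for the precision; all prior hyperparameters (e0, b0, B0, c0, C0) are made-up illustrative values.

## Gibbs sampler for a two-component univariate normal mixture.
gibbs_mix2 <- function(y, I = 1000, e0 = 4, b0 = mean(y), B0 = var(y),
                       c0 = 2, C0 = var(y)) {
  N <- length(y)
  z <- sample(1:2, N, replace = TRUE)   # initial classification
  mu <- c(mean(y), mean(y)); s2 <- c(var(y), var(y))
  draws <- matrix(NA, I, 5,
                  dimnames = list(NULL, c("pi", "mu1", "mu2", "s21", "s22")))
  for (i in seq_len(I)) {
    ## 1a. component weights from the Dirichlet (via gamma draws)
    Nk <- tabulate(z, 2)
    g <- rgamma(2, shape = Nk + e0)
    pi <- (g / sum(g))[1]
    ## 1b. component-specific parameters (conjugate updates)
    for (k in 1:2) {
      yk <- y[z == k]
      ## precision from its gamma posterior, given the current mean
      s2[k] <- 1 / rgamma(1, shape = c0 + Nk[k] / 2,
                          rate = C0 + sum((yk - mu[k])^2) / 2)
      ## mean from its normal posterior
      Bk <- 1 / (1 / B0 + Nk[k] / s2[k])
      bk <- Bk * (b0 / B0 + sum(yk) / s2[k])
      mu[k] <- rnorm(1, bk, sqrt(Bk))
    }
    ## 2. classification step
    p1 <- pi * dnorm(y, mu[1], sqrt(s2[1]))
    p2 <- (1 - pi) * dnorm(y, mu[2], sqrt(s2[2]))
    z <- 1 + (runif(N) > p1 / (p1 + p2))
    draws[i, ] <- c(pi, mu, s2)
  }
  draws
}
draws <- gibbs_mix2(faithful$waiting)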

Estimation: Gibbs sampling / 2

Advantages:
- Relatively easy to implement:
  - Different mixture models differ only in the parameter simulation step.
  - Parameter simulation conditional on the classification is sometimes already available.

Disadvantages:
- Might fail to escape the attraction area of one mode, so that not all posterior modes are visited.

Selecting the number of components

- A-priori known.
- Information criteria, e.g., AIC, BIC, ICL:

\mathrm{AIC} = -2 \log(L) + 2p, \qquad \mathrm{BIC} = -2 \log(L) + \log(N)\, p, \qquad \mathrm{ICL} = -2 \log(L) + \log(N)\, p + 2\,\mathrm{ent},

where p is the number of estimated parameters and ent denotes the mean entropy of the a-posteriori probabilities.

- Likelihood ratio test statistic in an ML framework:
  - Comparison of nested models.
  - Regularity conditions are not fulfilled.
- Bayes factors in a Bayesian framework.
- Sampling schemes with a varying number of components in a Bayesian framework:
  - Reversible-jump MCMC.
  - Inclusion of birth-and-death processes.
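A minimal sketch (not from the slides) of these criteria in R, computed from a log-likelihood ll, a number of parameters p, a sample size N and a matrix post of a-posteriori probabilities, all assumed to be given.

ic <- function(ll, p, N, post) {
  ## mean entropy of the a-posteriori probabilities
  ent <- -mean(rowSums(post * log(pmax(post, 1e-300))))
  c(AIC = -2 * ll + 2 * p,
    BIC = -2 * ll + log(N) * p,
    ICL = -2 * ll + log(N) * p + 2 * ent)
}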

Initialization

Construct a suitable parameter vector Θ^{(0)}:
- Random.
- Other estimation methods, e.g., moment estimators.

Classify observations / assign a-posteriori probabilities to each observation (see the sketch below):
- Random.
- Cluster analysis results, e.g., hierarchical clustering, k-means.

Use short runs of EM, CEM or SEM with different initializations (Biernacki et al., 2003).
Use different subsets of the complete data (Wehrens et al., 2004).
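A minimal sketch (not from the slides): starting EM from a k-means classification, here passed to flexmix() via its cluster argument.

library("flexmix")
y <- faithful$waiting
cl <- kmeans(y, centers = 2)$cluster  # initial classification
fit <- flexmix(y ~ 1, cluster = cl)   # EM started from the k-means partition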

Resampling for model diagnostics

- Testing for the number of components: for example, the likelihood ratio test using the parametric bootstrap (McLachlan, 1987).
- Standard errors for the parameter estimates: for example, using the parametric bootstrap with initialization in the solution (Basford et al., 1997).
- Identifiability: for example, by testing for unimodality of the component-specific parameter estimates using either the empirical or the parametric bootstrap with random initialization.
- Stability of induced partitions: for example, by comparing the results based on the Rand index corrected for chance (Hubert & Arabie, 1985) using either the empirical or the parametric bootstrap with random initialization.

Identifiability

Three kinds of identifiability issues:
- Label switching.
- Overfitting: leads to empty components or components with equal parameters.
- Generic unidentifiability.

Label switching

The posterior distribution is invariant under permutations of the components with the same component-specific model. Determine a unique labeling for component-specific inference:
- Impose a suitable ordering constraint, e.g., π_s < π_t for all s, t ∈ {1, ..., K} with s < t.
- Minimize the distance to the maximum-a-posteriori (MAP) estimate.
- Fix the component membership for some observations.
- Relabeling algorithms.
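A minimal sketch (not from the slides): relabeling the draws from the Gibbs sketch above by imposing the ordering constraint mu1 < mu2 in every draw; the column names follow that earlier sketch.

## Swap the component labels wherever mu1 > mu2.
relabel <- function(draws) {
  swap <- draws[, "mu1"] > draws[, "mu2"]
  out <- draws
  out[swap, ] <- draws[swap, c("pi", "mu2", "mu1", "s22", "s21")]
  out[swap, "pi"] <- 1 - draws[swap, "pi"]
  out
}
draws_relabeled <- relabel(draws)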

Mixtures of distributions

Model definition

A finite mixture of distributions is given by

H(y \mid \Theta) = \sum_{k=1}^{K} \pi_k F_k(y \mid \vartheta_k),

where

\sum_{k=1}^{K} \pi_k = 1 \quad \text{and} \quad \pi_k > 0 \;\forall k.

In the case of mixtures of distributions, sometimes one of the components is assumed to be from a different parametric family, e.g., to model noise.

Distributions in the components

We assume that we have p-dimensional data with
- Metric variables: mixtures of multivariate normal distributions.
- Binary / categorical variables: latent class analysis, which assumes that the variables within each component are independent.
- Both variable types: mixed-mode data. Assuming again that the variables are independent within each component allows to separate the two cases (Hunt and Jorgensen, 1988).

Generic identifiability

- Identifiable: (multivariate) normal, gamma, exponential, Cauchy and Poisson distributions in the components.
- Not identifiable: continuous or discrete uniform distributions in the components.
- Identifiable under certain conditions: mixtures of binomial and multinomial distributions are identifiable if N ≥ 2K − 1, where N is the parameter representing the number of trials.

See for example Everitt & Hand (1981), Titterington et al. (1985), McLachlan & Peel (2000).

Mixtures of normal distributions

Univariate normal distributions

The density of a finite mixture of univariate normal distributions is given by

h(y \mid \Theta) = \sum_{k=1}^{K} \pi_k f_N(y \mid \mu_k, \sigma_k^2).

The likelihood is unbounded for degenerate solutions with σ_k² = 0. Add a constraint that
1. σ_k² > ε for all k, or
2. π_k > ε for all k (to achieve in general a similar effect).

Univariate normal distributions / 2

[Figure: histogram of faithful$waiting with the fitted mixture density.]

Univariate normal distributions / 3

[Figure: histogram of faithful$waiting with the fitted mixture density.]

Univariate normal distributions / 4

The estimated model has the following parameters:

       Comp.1 Comp.2
pi        ...    ...
mu        ...    ...
sigma     ...    ...
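A minimal sketch (not from the slides) of one way such a two-component model can be fitted to the waiting times with package mclust; "V" requests component-specific variances.

library("mclust")
fit <- Mclust(faithful$waiting, G = 2, modelNames = "V")
summary(fit, parameters = TRUE)  # reports the weights, means and variances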

Multivariate normal distributions

For p-dimensional data, the number of parameters to estimate is

(K - 1) + K \left( p + \frac{p(p+1)}{2} \right)

for unrestricted variance-covariance matrices.

Multivariate normal distributions / 2

More parsimonious model specifications are:
- Restrict the variance-covariance matrices to be the same over components.
- Specify the structure of the variance-covariance matrix using its decomposition into volume, shape and orientation (Fraley & Raftery, 2002):

\Sigma_k = \lambda_k D_k \,\mathrm{diag}(a_k)\, D_k^\top.

For these three characteristics, restrictions can be imposed separately to be the same over components. In addition, the orientation can be assumed to be equal to the identity matrix and the shape to be spherical.

Multivariate normal distributions / 3

- Specify a factor model for the variance-covariance matrix: mixtures of factor analyzers (McLachlan & Peel, 2000).
- Analogous restrictions are possible to obtain parsimonious Gaussian mixture models (McNicholas & Murphy, 2008).

Multivariate normal distributions / 4

[Figure.]

Multivariate normal distributions / 5

[Figure.]

Multivariate normal distributions / 6

[Figure.]

Mixtures of normal distributions: Example

Example from Fraley & Raftery (2002), data from package spatstat: estimation of a density for the occurrence of maple trees, with 2-dimensional coordinates observed for 514 trees.

Mixtures of normal distributions: Example / 2

[Figure: scatter plot of the tree locations (x, y).]

Mixtures of normal distributions: Example / 3

Mixtures with 1 to 9 components are fitted under all different restrictions on the variance-covariance matrices. The best model has 7 components and spherical variance-covariance matrices with different volumes across components.

Mixtures of normal distributions: Example / 4

[Figure: classification and fitted density over the tree locations (x, y).]

Package mclust

Package mclust

- Written by Chris Fraley, Adrian Raftery, Luca Scrucca and Thomas Brendan Murphy.
- Particularly suitable for mixtures of uni- and multivariate normal distributions.
- Uses model-based hierarchical clustering for initialization.
- Variance-covariance matrices can be specified via volume, shape and orientation.
- Function Mclust() returns the best model according to the BIC.

Package mclust: Model specification

Univariate:
- "E": equal variance
- "V": different variances

Multivariate: three-letter abbreviation for
- volume: equal (E) / variable (V),
- shape: identity (I) / equal (E) / variable (V),
- orientation: identity (I) / equal (E) / variable (V).

- "EII": spherical, same volume
- "VII": spherical, different volume
- "EEI": diagonal matrix, same volume & shape
- "VEI": diagonal matrix, different volume, same shape
- "EVI": diagonal matrix, same volume, different shape
- "VVI": diagonal matrix, different volume & shape

Package mclust: Model specification / 2

- "EEE": ellipsoidal, same volume, shape & orientation
- "EVE": ellipsoidal, same volume & orientation
- "VEE": ellipsoidal, same shape & orientation
- "VVE": ellipsoidal, same orientation
- "EEV": ellipsoidal, same volume & shape
- "VEV": ellipsoidal, same shape
- "EVV": ellipsoidal, same volume
- "VVV": ellipsoidal, different volume, shape and orientation

Package mclust: Model specification / 3

[Figure: contour shapes for the models EII, VII, EEI, VEI, EVI and VVI.]

Package mclust: Model specification / 4

[Figure: contour shapes for the models EEE, EVE, VEE, VVE, EEV, VEV, EVV and VVV.]

Package mclust: Function Mclust()

Mclust() has the following arguments:
- data: A numeric vector, a numeric matrix or a data frame of observations. Categorical variables are not allowed. If a matrix or a data frame is supplied, rows correspond to observations and columns to variables.
- G: An integer vector indicating the numbers of components for which the model should be fitted. The default is G = 1:9.
- modelNames: A character vector specifying the models (with different restrictions on the variance-covariance matrices) to fit.
- prior: The default is to use no prior. This argument allows to specify conditionally conjugate priors on the means and the variance-covariance matrices of the components using priorControl().

Package mclust: Function Mclust() / 2

- control: A list of control parameters for the EM algorithm. The defaults are obtained by calling emControl().
- initialization: A list containing one or several of the elements "hcPairs", providing the results of the hierarchical clustering, and "subset", a logical or numeric vector to specify a subset.
- warn: A logical value indicating whether warnings (e.g., with respect to singularities) are issued. The default is to suppress warnings.
- ...: Catches unused arguments.

Returns an object of class Mclust.

Package mclust: Further methods

The fitted object of class Mclust can be plotted using the associated plot method. The following plots can be generated:
- Comparison of the BIC values for all estimated models.
- Scatter plot indicating the classification for 2-dimensional data.
- 2-dimensional projection of the data with classification.
- 2-dimensional projection of the data with uncertainty of classification.
- Estimated density for 1- and 2-dimensional data.

Package mclust: Maple trees

> data("lansing", package = "spatstat")
> maples <- subset(spatstat::as.data.frame.ppp(lansing),
+                  marks == "maple", c("x", "y"))
> library("mclust")
> maplesMix <- Mclust(maples)
> maplesMix
Mclust model object: best model: spherical, varying volume (VII) with 7 components
> par(mfrow = c(2, 2))
> plot(maplesMix, what = "BIC")
> plot(maplesMix, what = "classification")
> plot(maplesMix, what = "uncertainty")
> plot(maplesMix, what = "density")

Package mclust: Maple trees / 2

[Figure: BIC values by number of components for all covariance models, classification, classification uncertainty and log-density contour plots over (x, y).]

Latent class analysis

Latent class analysis

- Used for multivariate binary and categorical data.
- Often only local identifiability can be ensured.
- Observed variables are correlated because of the unobserved group structure.
- Within each component the variables are assumed to be independent, so that the multivariate density is given by the product of the univariate densities.

Latent class analysis / 2

In general the model can be written as

h(y \mid \Theta) = \sum_{k=1}^{K} \pi_k \prod_{j=1}^{p} f_{kj}(y_j \mid \vartheta_{kj}).

In latent class analysis the densities f_{kj}() for all k and j are from the same univariate parametric distribution family. For binary variables the Bernoulli distribution is used:

f(y \mid \theta) = \theta^y (1 - \theta)^{1-y}.

Latent class analysis: Example

Example from Dayton (1999), data from package poLCA. Data from a survey among 319 students about cheating behavior during their bachelor studies. The following four questions are used, which ask about
1. lying to avoid taking an exam,
2. lying to avoid handing a term paper in on time,
3. purchasing a term paper to hand in as their own, or obtaining a copy of an exam prior to taking the exam,
4. copying answers during an exam from someone sitting near to them.

Latent class analysis: Example / 2

The relative frequencies of agreement with the four questions are 0.11, 0.12, 0.07, ...

First it is assessed whether the answers to the different questions are independent given the marginal distributions. The density is then given by

h(y \mid \Theta) = \prod_{j=1}^{4} \theta_j^{y_j} (1 - \theta_j)^{1-y_j},

where θ_j is the probability that question j is agreed to. A χ² goodness-of-fit test is conducted: χ² = 136.3, p-value < 2e-16. This simple model does not fit the data well. ⇒ Extension to mixture models.
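A minimal sketch (not from the slides) of this goodness-of-fit check in R: expected counts for all 2^4 answer patterns under the independence model versus the observed pattern counts.

data("cheating", package = "poLCA")
Y <- cheating[, 1:4] - 1                  # recode the answers to 0/1
theta <- colMeans(Y)                      # marginal agreement probabilities
patterns <- as.matrix(expand.grid(rep(list(0:1), 4)))
expected <- nrow(Y) *
  apply(patterns, 1, function(y) prod(theta^y * (1 - theta)^(1 - y)))
observed <- as.vector(table(factor(Y[, 1], 0:1), factor(Y[, 2], 0:1),
                            factor(Y[, 3], 0:1), factor(Y[, 4], 0:1)))
chisq <- sum((observed - expected)^2 / expected)
pchisq(chisq, df = 2^4 - 1 - 4, lower.tail = FALSE)  # p-value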

Latent class analysis: Example / 3

The finite mixture model is given by

h(y \mid \Theta) = \sum_{k=1}^{K} \pi_k \prod_{j=1}^{4} \theta_{kj}^{y_j} (1 - \theta_{kj})^{1-y_j},

where θ_{kj} is the probability of agreeing to question j if the respondent is from component k. This model is fitted with 2 components. The χ² goodness-of-fit test gives χ² = 8.3; the null hypothesis that the data is from this model cannot be rejected.

Latent class analysis: Example / 4

The size of the first component is ... The estimated parameters for both components are

                Comp.1 Comp.2
center.LIEEXAM     ...    ...
center.LIEPAPER    ...    ...
center.FRAUD       ...    ...
center.COPYEXAM    ...    ...

Package flexmix

Software: flexmix (Leisch, 2004; Grün & Leisch, 2008a)

- The function flexmix() provides the E-step and all data handling; the M-step is supplied by the user, similar to glm() families.
- Multiple independent responses from different families.
- Currently, bindings to several GLM families exist (Gaussian, Poisson, gamma, binomial).
- Weighted, hard (CEM) and random (SEM) classification.
- Components with a prior probability below a user-specified threshold are automatically removed during the iterations.

flexmix design

- Primary goal is extensibility: ideal for trying out new mixture models.
- No replacement for specialized mixtures like Mclust(), but a complement.
- Usage of S4 classes and methods.
- Formula-based interface.
- Multivariate responses:
  - Combination of univariate families: assumption of independence (given x); each response may have its own model formula, i.e., a different set of regressors.
  - Multivariate families: if the family handles a multivariate response directly, then arbitrary multivariate response distributions are possible.

Function flexmix()

flexmix() takes the following arguments:
- formula: A symbolic description of the model which is fitted. The general form is y ~ x | g, where y is the dependent variable, x the independent variables and g an optional categorical grouping variable for repeated observations.
- data: An optional data frame used for evaluating the formula.
- k: Number of components (not necessary if cluster is specified).
- cluster: Either a matrix with k columns which contains the initial a-posteriori probabilities, or a categorical variable or integer vector indicating the component memberships of the observations.
- model: Object of class FLXM or a list of these objects.
- concomitant: Object of class FLXP.
- control: Object of class FLXcontrol or a named list.

Function flexmix() / 2

Returns an object of class flexmix.

Repeated calls of flexmix() with
- stepFlexmix(): using random initialization,
- initFlexmix(): combining short with long runs of EM.

Controlling the EM algorithm

FLXcontrol for setting the specifications for the EM algorithm:
- iter.max: maximum number of iterations.
- minprior: minimum component sizes.
- verbose: if larger than 0, then flexmix() provides status information every verbose iterations of the EM algorithm.
- classify: one of "auto", "weighted", "CEM" (or "hard"), "SEM" (or "random").

For convenience, flexmix() also allows specification of the control argument through a named list, where the names are partially matched, e.g.,

flexmix(..., control = list(class = "r"))

Variants of mixture models

Component-specific models: FLXMxxx()
- Model-based clustering: FLXMCxxx()
  - FLXMCmvnorm(), FLXMCmvbinary(), FLXMCmvpois(), ...
- Clusterwise regression: FLXMRxxx()
  - FLXMRglm(), FLXMRglmfix(), FLXMRziglm(), ...

Concomitant variable models: FLXPxxx()
- FLXPconstant(), FLXPmultinom()

Methods for flexmix objects

- show(), summary(): some information on the fitted model.
- plot(): rootogram of the posterior probabilities.
- refit(): refits an estimated mixture model to obtain the variance-covariance matrix.
- logLik(), BIC(), ...: obtain the log-likelihood and model fit criteria.
- parameters(), prior(): obtain the component-specific model parameters and the prior class probabilities / component weights.
- posterior(), clusters(): obtain the a-posteriori probabilities and the assignments to the maximum a-posteriori probability.
- fitted(), predict(): fitted and predicted (component-specific) values.
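A minimal sketch (not from the slides) of typical accessor calls, assuming fit is a fitted flexmix object:

summary(fit)          # component sizes, log-likelihood, AIC/BIC
parameters(fit)       # component-specific parameter estimates
prior(fit)            # component weights
head(posterior(fit))  # a-posteriori probabilities per observation
table(clusters(fit))  # maximum a-posteriori assignments
BIC(fit)              # model selection criterion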

Package flexmix: Cheating

Data preparation:

> data("cheating", package = "poLCA")
> X <- as.data.frame(table(cheating[, 1:4]))
> X <- subset(X, Freq > 0)
> X[, 1:4] <- sapply(X[, 1:4], as.integer) - 1
> head(X)
  LIEEXAM LIEPAPER FRAUD COPYEXAM Freq
  ...

Package flexmix: Cheating / 2

Fit the model with flexmix:

> set.seed(201107)
> FORMULA <-
+   cbind(LIEEXAM, LIEPAPER, FRAUD, COPYEXAM) ~ 1
> CONTROL <-
+   list(tolerance = 10^-12, iter.max = 500)
> cheatMix <-
+   stepFlexmix(FORMULA, k = 2, weights = ~ Freq,
+               model = FLXMCmvbinary(), data = X,
+               control = CONTROL, nrep = 3)
2 : * * *

Package flexmix: Cheating / 3

> cheatMix

Call:
stepFlexmix(FORMULA, weights = ~Freq, model = FLXMCmvbinary(),
    data = X, control = CONTROL, k = 2, nrep = 3)

Cluster sizes:
...

convergence after 314 iterations

Package flexmix: Cheating / 4

> summary(cheatMix)

Call:
stepFlexmix(FORMULA, weights = ~Freq, model = FLXMCmvbinary(),
    data = X, control = CONTROL, k = 2, nrep = 3)

       prior size post>0 ratio
Comp.1   ...  ...    ...   ...
Comp.2   ...  ...    ...   ...

'log Lik.' ... (df=9)
AIC: ...   BIC: ...

> plot(cheatMix)

Package flexmix: Cheating / 5

[Figure: rootogram of the posterior probabilities for Comp. 1 and Comp. 2.]

Mixtures of regression models

Mixtures of regression models

Also known as clusterwise regression:
- Regression models are fitted for each component.
- Weighted ML estimation of linear and generalized linear models is required.
- Heterogeneity between observations with respect to their regression parameters.
- Random effects can be estimated in a semi-parametric way.

Possible models:
- Mixtures of linear regression models
- Mixtures of generalized linear regression models
- Mixtures of linear mixed regression models
- ...

Mixtures of regression models / 2

The density of a mixture of regression models is given by

h(y \mid x, w, \Theta) = \sum_{k=1}^{K} \pi_k f_k(y \mid x, \vartheta_k),

where

\sum_{k=1}^{K} \pi_k = 1 \quad \text{and} \quad \pi_k > 0 \;\forall k.

In the EM algorithm the weighted log-likelihoods of the regression models need to be maximized in the M-step, as in the sketch below.
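A minimal sketch (not from the slides) of one such M-step for a two-component mixture of linear regressions, using lm() with the a-posteriori probabilities from the E-step as weights; df is assumed to hold the response y and the regressor x.

## Weighted ML estimation of the component-specific linear models.
mstep <- function(df, z1) {
  fit1 <- lm(y ~ x, data = df, weights = z1)      # component 1
  fit2 <- lm(y ~ x, data = df, weights = 1 - z1)  # component 2
  list(pi = mean(z1), fit1 = fit1, fit2 = fit2)
}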

Generic identifiability

Influencing factors:
- Component distribution (see mixtures of distributions).
- Model matrix.
- Repeated observations for each individual / partly labeled observations.

Identifiability: Model matrix

If only one observation per individual is available, full rank of the model matrix is not sufficient for identifiability.

- Linear models: Mixtures of linear models with normally distributed errors are identifiable if the number of components K is smaller than the minimum number of (feasible) hyperplanes which are necessary to cover all covariate points (Hennig, 2000).
- Generalized linear models: An analogous condition holds for the linear predictor, with additional constraints depending on the distribution of the dependent variable (in particular for binomial and multinomial logit models, see Grün & Leisch, 2008b).

⇒ Coverage condition.

Identifiability: Repetitions / labels

At least one of the hyperplanes in the coverage condition needs to cover all repeated / labeled observations where the component membership is fixed.

Violation of the coverage condition leads to intra-component label switching: if the labels are fixed in one covariate point according to some ordering constraint, then the labels may switch in other covariate points for different parameterizations of the model.

Illustration: Binomial logit models

We consider a mixture density

h(y \mid x, \Theta) = \pi_1 f_B(y \mid N, (1, x)\beta_1) + (1 - \pi_1) f_B(y \mid N, (1, x)\beta_2)

with binomial logit models in the components and parameters

\pi_1 = 0.5, \quad \beta_1 = (-2, 4)', \quad \beta_2 = (2, -4)'.

Even if N ≥ 3, the mixture is not identifiable if there are only 2 covariate points available. The second solution is then given by

\pi_1^{(2)} = 0.5, \quad \beta_1^{(2)} = (-2, 0)', \quad \beta_2^{(2)} = (2, 0)'.

Illustration: Binomial logit models / 2

Simulation design:
- Number of repetitions N ∈ {1, 10} in the same covariate point.
- Number of covariate points: #x ∈ {2, 5}, equidistantly spread across [0, 1].
- 100 samples with 100 observations are drawn from the model for each combination of N and #x.
- Finite mixtures with 2 components are fitted to each sample.
- Solutions after imposing an ordering constraint on the intercept are reported.

Illustration: Binomial logit models / 3

[Figure: relative frequencies of the fitted solutions over x for the four combinations of #x ∈ {2, 5} and N ∈ {1, 10}.]

Illustration: Binomial logit models / 4

[Figure: 5% and 95% quantiles of the estimates of π, the intercept and the coefficient of x for the four combinations of #x ∈ {2, 5} and N ∈ {1, 10}.]

Mixtures of linear regressions: Example

[Figure: scatter plot of CO2 against GNP.]

Mixtures of linear regressions: Example / 2

Example from Hurn, Justel and Robert (2003), data from package mixtools: 28 observations of countries on their gross national product per capita (GNP) and their estimated carbon dioxide emissions per capita (CO2). The relationship between the two variables is analyzed using a linear model.

Mixtures of linear regressions: Example / 3

[Figure: scatter plot of CO2 against GNP with the fitted regression line.]

Mixtures of linear regressions: Example / 4

Call:
lm(formula = CO2 ~ GNP, data = CO2data)

Residuals:
   Min     1Q Median     3Q    Max
   ...

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)      ...        ...     ...  ...e-05
GNP              ...        ...     ...      ...

Residual standard error: 4.06 on 26 degrees of freedom
Multiple R-squared: 0.047, Adjusted R-squared: 0.0...
F-statistic: 1.28 on 1 and 26 DF, p-value: ...

Mixtures of linear regressions: Example / 5

It is assumed that the relationship between the two variables is not the same for all countries. A mixture of linear regression models is thus estimated. The model is given by

h(\mathrm{CO2} \mid \mathrm{GNP}, \Theta) = \sum_{k=1}^{K} \pi_k f_N(\mathrm{CO2} \mid \alpha_{0k} + \mathrm{GNP}\,\alpha_{1k}, \sigma_k^2),

where α_{0k} is the component-specific intercept, α_{1k} the component-specific regression coefficient for GNP and σ_k² the component-specific residual variance.

Mixtures of linear regressions: Example / 6

Mixtures of linear regression models are fitted for 1 to 5 components.

> data("CO2data", package = "mixtools")
> co2Mix_step <-
+   stepFlexmix(CO2 ~ GNP, data = CO2data, k = 1:5)
1 : * * *
2 : * * *
3 : * * *
4 : * * *
5 : * * *

Mixtures of linear regressions: Example / 7

> co2Mix_step

Call:
stepFlexmix(CO2 ~ GNP, data = CO2data, k = 1:5)

  iter converged k k0 logLik AIC BIC ICL
1  ...      TRUE 1  1    ... ... ... ...
2  ...      TRUE 2  2    ... ... ... ...
3  ...      TRUE 3  3    ... ... ... ...
4  ...      TRUE 4  4    ... ... ... ...
5  ...      TRUE 5  5    ... ... ... ...

Mixtures of linear regressions: Example / 8

The best model according to the BIC is selected; it has 2 components.

> co2Mix <- getModel(co2Mix_step)
> co2Mix

Call:
stepFlexmix(CO2 ~ GNP, data = CO2data, k = 2)

Cluster sizes:
...

convergence after 22 iterations

Mixtures of linear regressions: Example / 9

[Figure: scatter plot of CO2 against GNP with the two component-specific regression lines.]

Mixtures of linear regressions: Example / 10

The estimated regression coefficients for each component, together with the estimated standard errors, are given by

$Comp.1
            Estimate Std. Error z value Pr(>|z|)
(Intercept)      ...        ...     ...   <2e-16
GNP              ...        ...     ...      ...

$Comp.2
            Estimate Std. Error z value Pr(>|z|)
(Intercept)      ...        ...     ...      ...
GNP              ...        ...     ...   <2e-16

Mixtures of linear regressions: Example / 11

The countries assigned to the smaller component are:

  GNP CO2 country
  ... ...     TUR
  ... ...     MEX
  ... ...     CAN
  ... ...     AUS
  ... ...     NOR
  ... ...     USA

Mixtures of binomial logit models: Example

[Figure: proportion of infected plants against the number of aphids released.]

Mixtures of binomial logit models: Example / 2

Data from package mixreg, taken from Boiteau et al. (1998) and Turner (2000):
- Observations from 51 independent experiments.
- The number of aphids released in a room with 12 infected and 69 healthy tobacco plants was varied.
- After 24 hours, the previously healthy plants were checked for infection.
- For each experiment, the number of aphids released (n.aphids) and the number of newly infected plants (n.inf) were recorded.

It is assumed that the data comes from a mixture of 2 binomial logit models.

Mixtures of binomial logit models: Example / 3

The model is given by

h(\mathrm{n.inf} \mid \mathrm{n.aphids}, 69, \Theta) = \sum_{k=1}^{K} \pi_k f_B(\mathrm{n.inf} \mid 69, \theta_k(\mathrm{n.aphids})),

where f_B(\cdot \mid N, \theta) is the binomial distribution with number-of-trials parameter N and success probability θ. Furthermore,

\mathrm{logit}(\theta_k) = \alpha_{0k} + \alpha_{1k}\,\mathrm{n.aphids},

where α_{0k} is the component-specific intercept and α_{1k} the component-specific regression coefficient for the number of aphids.

Mixtures of binomial logit models: Example / 4

> aphidMix <-
+   stepFlexmix(cbind(n.inf, 69 - n.inf) ~ n.aphids,
+               data = aphids, k = 2, nrep = 5,
+               model = FLXMRglm(family = "binomial"))
2 : * * * * *

Mixtures of binomial logit models: Example / 5

> aphidMix

Call:
stepFlexmix(cbind(n.inf, 69 - n.inf) ~ n.aphids, data = aphids,
    model = FLXMRglm(family = "binomial"), k = 2, nrep = 5)

Cluster sizes:
...

convergence after 6 iterations

Mixtures of binomial logit models: Example / 6

> summary(aphidMix)

Call:
stepFlexmix(cbind(n.inf, 69 - n.inf) ~ n.aphids, data = aphids,
    model = FLXMRglm(family = "binomial"), k = 2, nrep = 5)

       prior size post>0 ratio
Comp.1   ...  ...    ...   ...
Comp.2   ...  ...    ...   ...

'log Lik.' ... (df=5)
AIC: ...   BIC: ...

Mixtures of binomial logit models: Example / 7

[Figure: proportion of infected plants against the number of released aphids with the fitted component-specific curves.]

Mixtures of binomial logit models: Example / 8

> aphidMix_ref <- refit(aphidMix)
> summary(aphidMix_ref)
$Comp.1
            Estimate Std. Error z value Pr(>|z|)
(Intercept)      ...        ...     ...   <2e-16
n.aphids         ...        ...     ...      ...

$Comp.2
            Estimate Std. Error z value Pr(>|z|)
(Intercept)      ...        ...     ...  < 2e-16
n.aphids         ...        ...     ...  ...e-14


Ludwig Fahrmeir Gerhard Tute. Statistical odelling Based on Generalized Linear Model. íecond Edition. . Springer Ludwig Fahrmeir Gerhard Tute Statistical odelling Based on Generalized Linear Model íecond Edition. Springer Preface to the Second Edition Preface to the First Edition List of Examples List of Figures

More information

CS281 Section 9: Graph Models and Practical MCMC

CS281 Section 9: Graph Models and Practical MCMC CS281 Section 9: Graph Models and Practical MCMC Scott Linderman November 11, 213 Now that we have a few MCMC inference algorithms in our toolbox, let s try them out on some random graph models. Graphs

More information

Convexization in Markov Chain Monte Carlo

Convexization in Markov Chain Monte Carlo in Markov Chain Monte Carlo 1 IBM T. J. Watson Yorktown Heights, NY 2 Department of Aerospace Engineering Technion, Israel August 23, 2011 Problem Statement MCMC processes in general are governed by non

More information

Using Machine Learning to Optimize Storage Systems

Using Machine Learning to Optimize Storage Systems Using Machine Learning to Optimize Storage Systems Dr. Kiran Gunnam 1 Outline 1. Overview 2. Building Flash Models using Logistic Regression. 3. Storage Object classification 4. Storage Allocation recommendation

More information

Introduction to Machine Learning CMU-10701

Introduction to Machine Learning CMU-10701 Introduction to Machine Learning CMU-10701 Clustering and EM Barnabás Póczos & Aarti Singh Contents Clustering K-means Mixture of Gaussians Expectation Maximization Variational Methods 2 Clustering 3 K-

More information

Physics 736. Experimental Methods in Nuclear-, Particle-, and Astrophysics. - Statistical Methods -

Physics 736. Experimental Methods in Nuclear-, Particle-, and Astrophysics. - Statistical Methods - Physics 736 Experimental Methods in Nuclear-, Particle-, and Astrophysics - Statistical Methods - Karsten Heeger heeger@wisc.edu Course Schedule and Reading course website http://neutrino.physics.wisc.edu/teaching/phys736/

More information

Deep Generative Models Variational Autoencoders

Deep Generative Models Variational Autoencoders Deep Generative Models Variational Autoencoders Sudeshna Sarkar 5 April 2017 Generative Nets Generative models that represent probability distributions over multiple variables in some way. Directed Generative

More information

Generalized least squares (GLS) estimates of the level-2 coefficients,

Generalized least squares (GLS) estimates of the level-2 coefficients, Contents 1 Conceptual and Statistical Background for Two-Level Models...7 1.1 The general two-level model... 7 1.1.1 Level-1 model... 8 1.1.2 Level-2 model... 8 1.2 Parameter estimation... 9 1.3 Empirical

More information

CS839: Probabilistic Graphical Models. Lecture 10: Learning with Partially Observed Data. Theo Rekatsinas

CS839: Probabilistic Graphical Models. Lecture 10: Learning with Partially Observed Data. Theo Rekatsinas CS839: Probabilistic Graphical Models Lecture 10: Learning with Partially Observed Data Theo Rekatsinas 1 Partially Observed GMs Speech recognition 2 Partially Observed GMs Evolution 3 Partially Observed

More information

Clustering web search results

Clustering web search results Clustering K-means Machine Learning CSE546 Emily Fox University of Washington November 4, 2013 1 Clustering images Set of Images [Goldberger et al.] 2 1 Clustering web search results 3 Some Data 4 2 K-means

More information

Machine Learning (BSMC-GA 4439) Wenke Liu

Machine Learning (BSMC-GA 4439) Wenke Liu Machine Learning (BSMC-GA 4439) Wenke Liu 01-25-2018 Outline Background Defining proximity Clustering methods Determining number of clusters Other approaches Cluster analysis as unsupervised Learning Unsupervised

More information

CS Introduction to Data Mining Instructor: Abdullah Mueen

CS Introduction to Data Mining Instructor: Abdullah Mueen CS 591.03 Introduction to Data Mining Instructor: Abdullah Mueen LECTURE 8: ADVANCED CLUSTERING (FUZZY AND CO -CLUSTERING) Review: Basic Cluster Analysis Methods (Chap. 10) Cluster Analysis: Basic Concepts

More information

A noninformative Bayesian approach to small area estimation

A noninformative Bayesian approach to small area estimation A noninformative Bayesian approach to small area estimation Glen Meeden School of Statistics University of Minnesota Minneapolis, MN 55455 glen@stat.umn.edu September 2001 Revised May 2002 Research supported

More information

CHAPTER 5. BASIC STEPS FOR MODEL DEVELOPMENT

CHAPTER 5. BASIC STEPS FOR MODEL DEVELOPMENT CHAPTER 5. BASIC STEPS FOR MODEL DEVELOPMENT This chapter provides step by step instructions on how to define and estimate each of the three types of LC models (Cluster, DFactor or Regression) and also

More information

GAMs semi-parametric GLMs. Simon Wood Mathematical Sciences, University of Bath, U.K.

GAMs semi-parametric GLMs. Simon Wood Mathematical Sciences, University of Bath, U.K. GAMs semi-parametric GLMs Simon Wood Mathematical Sciences, University of Bath, U.K. Generalized linear models, GLM 1. A GLM models a univariate response, y i as g{e(y i )} = X i β where y i Exponential

More information

Performance of Sequential Imputation Method in Multilevel Applications

Performance of Sequential Imputation Method in Multilevel Applications Section on Survey Research Methods JSM 9 Performance of Sequential Imputation Method in Multilevel Applications Enxu Zhao, Recai M. Yucel New York State Department of Health, 8 N. Pearl St., Albany, NY

More information

Missing Data Analysis for the Employee Dataset

Missing Data Analysis for the Employee Dataset Missing Data Analysis for the Employee Dataset 67% of the observations have missing values! Modeling Setup Random Variables: Y i =(Y i1,...,y ip ) 0 =(Y i,obs, Y i,miss ) 0 R i =(R i1,...,r ip ) 0 ( 1

More information

10. MLSP intro. (Clustering: K-means, EM, GMM, etc.)

10. MLSP intro. (Clustering: K-means, EM, GMM, etc.) 10. MLSP intro. (Clustering: K-means, EM, GMM, etc.) Rahil Mahdian 01.04.2016 LSV Lab, Saarland University, Germany What is clustering? Clustering is the classification of objects into different groups,

More information

Machine Learning and Data Mining. Clustering (1): Basics. Kalev Kask

Machine Learning and Data Mining. Clustering (1): Basics. Kalev Kask Machine Learning and Data Mining Clustering (1): Basics Kalev Kask Unsupervised learning Supervised learning Predict target value ( y ) given features ( x ) Unsupervised learning Understand patterns of

More information

MIXMOD : a software for model-based classification with continuous and categorical data

MIXMOD : a software for model-based classification with continuous and categorical data MIXMOD : a software for model-based classification with continuous and categorical data Christophe Biernacki, Gilles Celeux, Gérard Govaert, Florent Langrognet To cite this version: Christophe Biernacki,

More information

University of Cambridge Engineering Part IIB Paper 4F10: Statistical Pattern Processing Handout 11: Non-Parametric Techniques

University of Cambridge Engineering Part IIB Paper 4F10: Statistical Pattern Processing Handout 11: Non-Parametric Techniques University of Cambridge Engineering Part IIB Paper 4F10: Statistical Pattern Processing Handout 11: Non-Parametric Techniques Mark Gales mjfg@eng.cam.ac.uk Michaelmas 2015 11. Non-Parameteric Techniques

More information

Univariate Extreme Value Analysis. 1 Block Maxima. Practice problems using the extremes ( 2.0 5) package. 1. Pearson Type III distribution

Univariate Extreme Value Analysis. 1 Block Maxima. Practice problems using the extremes ( 2.0 5) package. 1. Pearson Type III distribution Univariate Extreme Value Analysis Practice problems using the extremes ( 2.0 5) package. 1 Block Maxima 1. Pearson Type III distribution (a) Simulate 100 maxima from samples of size 1000 from the gamma

More information

Expectation-Maximization Methods in Population Analysis. Robert J. Bauer, Ph.D. ICON plc.

Expectation-Maximization Methods in Population Analysis. Robert J. Bauer, Ph.D. ICON plc. Expectation-Maximization Methods in Population Analysis Robert J. Bauer, Ph.D. ICON plc. 1 Objective The objective of this tutorial is to briefly describe the statistical basis of Expectation-Maximization

More information

Case Study IV: Bayesian clustering of Alzheimer patients

Case Study IV: Bayesian clustering of Alzheimer patients Case Study IV: Bayesian clustering of Alzheimer patients Mike Wiper and Conchi Ausín Department of Statistics Universidad Carlos III de Madrid Advanced Statistics and Data Mining Summer School 2nd - 6th

More information

Time Series Analysis by State Space Methods

Time Series Analysis by State Space Methods Time Series Analysis by State Space Methods Second Edition J. Durbin London School of Economics and Political Science and University College London S. J. Koopman Vrije Universiteit Amsterdam OXFORD UNIVERSITY

More information

Linear Modeling with Bayesian Statistics

Linear Modeling with Bayesian Statistics Linear Modeling with Bayesian Statistics Bayesian Approach I I I I I Estimate probability of a parameter State degree of believe in specific parameter values Evaluate probability of hypothesis given the

More information

22 October, 2012 MVA ENS Cachan. Lecture 5: Introduction to generative models Iasonas Kokkinos

22 October, 2012 MVA ENS Cachan. Lecture 5: Introduction to generative models Iasonas Kokkinos Machine Learning for Computer Vision 1 22 October, 2012 MVA ENS Cachan Lecture 5: Introduction to generative models Iasonas Kokkinos Iasonas.kokkinos@ecp.fr Center for Visual Computing Ecole Centrale Paris

More information

10-701/15-781, Fall 2006, Final

10-701/15-781, Fall 2006, Final -7/-78, Fall 6, Final Dec, :pm-8:pm There are 9 questions in this exam ( pages including this cover sheet). If you need more room to work out your answer to a question, use the back of the page and clearly

More information

ECE521: Week 11, Lecture March 2017: HMM learning/inference. With thanks to Russ Salakhutdinov

ECE521: Week 11, Lecture March 2017: HMM learning/inference. With thanks to Russ Salakhutdinov ECE521: Week 11, Lecture 20 27 March 2017: HMM learning/inference With thanks to Russ Salakhutdinov Examples of other perspectives Murphy 17.4 End of Russell & Norvig 15.2 (Artificial Intelligence: A Modern

More information

Pattern Recognition. Kjell Elenius. Speech, Music and Hearing KTH. March 29, 2007 Speech recognition

Pattern Recognition. Kjell Elenius. Speech, Music and Hearing KTH. March 29, 2007 Speech recognition Pattern Recognition Kjell Elenius Speech, Music and Hearing KTH March 29, 2007 Speech recognition 2007 1 Ch 4. Pattern Recognition 1(3) Bayes Decision Theory Minimum-Error-Rate Decision Rules Discriminant

More information

Problem 1 (20 pt) Answer the following questions, and provide an explanation for each question.

Problem 1 (20 pt) Answer the following questions, and provide an explanation for each question. Problem 1 Answer the following questions, and provide an explanation for each question. (5 pt) Can linear regression work when all X values are the same? When all Y values are the same? (5 pt) Can linear

More information

Analysis of Incomplete Multivariate Data

Analysis of Incomplete Multivariate Data Analysis of Incomplete Multivariate Data J. L. Schafer Department of Statistics The Pennsylvania State University USA CHAPMAN & HALL/CRC A CR.C Press Company Boca Raton London New York Washington, D.C.

More information

FMA901F: Machine Learning Lecture 3: Linear Models for Regression. Cristian Sminchisescu

FMA901F: Machine Learning Lecture 3: Linear Models for Regression. Cristian Sminchisescu FMA901F: Machine Learning Lecture 3: Linear Models for Regression Cristian Sminchisescu Machine Learning: Frequentist vs. Bayesian In the frequentist setting, we seek a fixed parameter (vector), with value(s)

More information

Unsupervised Learning

Unsupervised Learning Unsupervised Learning Learning without Class Labels (or correct outputs) Density Estimation Learn P(X) given training data for X Clustering Partition data into clusters Dimensionality Reduction Discover

More information

Neuroimage Processing

Neuroimage Processing Neuroimage Processing Instructor: Moo K. Chung mkchung@wisc.edu Lecture 2-3. General Linear Models (GLM) Voxel-based Morphometry (VBM) September 11, 2009 What is GLM The general linear model (GLM) is a

More information

Applied Bayesian Nonparametrics 5. Spatial Models via Gaussian Processes, not MRFs Tutorial at CVPR 2012 Erik Sudderth Brown University

Applied Bayesian Nonparametrics 5. Spatial Models via Gaussian Processes, not MRFs Tutorial at CVPR 2012 Erik Sudderth Brown University Applied Bayesian Nonparametrics 5. Spatial Models via Gaussian Processes, not MRFs Tutorial at CVPR 2012 Erik Sudderth Brown University NIPS 2008: E. Sudderth & M. Jordan, Shared Segmentation of Natural

More information

Dynamic Thresholding for Image Analysis

Dynamic Thresholding for Image Analysis Dynamic Thresholding for Image Analysis Statistical Consulting Report for Edward Chan Clean Energy Research Center University of British Columbia by Libo Lu Department of Statistics University of British

More information

Discussion on Bayesian Model Selection and Parameter Estimation in Extragalactic Astronomy by Martin Weinberg

Discussion on Bayesian Model Selection and Parameter Estimation in Extragalactic Astronomy by Martin Weinberg Discussion on Bayesian Model Selection and Parameter Estimation in Extragalactic Astronomy by Martin Weinberg Phil Gregory Physics and Astronomy Univ. of British Columbia Introduction Martin Weinberg reported

More information

Probabilistic Graphical Models

Probabilistic Graphical Models Overview of Part Two Probabilistic Graphical Models Part Two: Inference and Learning Christopher M. Bishop Exact inference and the junction tree MCMC Variational methods and EM Example General variational

More information

Machine Learning. Supervised Learning. Manfred Huber

Machine Learning. Supervised Learning. Manfred Huber Machine Learning Supervised Learning Manfred Huber 2015 1 Supervised Learning Supervised learning is learning where the training data contains the target output of the learning system. Training data D

More information

MCMC Diagnostics. Yingbo Li MATH Clemson University. Yingbo Li (Clemson) MCMC Diagnostics MATH / 24

MCMC Diagnostics. Yingbo Li MATH Clemson University. Yingbo Li (Clemson) MCMC Diagnostics MATH / 24 MCMC Diagnostics Yingbo Li Clemson University MATH 9810 Yingbo Li (Clemson) MCMC Diagnostics MATH 9810 1 / 24 Convergence to Posterior Distribution Theory proves that if a Gibbs sampler iterates enough,

More information

Handbook of Statistical Modeling for the Social and Behavioral Sciences

Handbook of Statistical Modeling for the Social and Behavioral Sciences Handbook of Statistical Modeling for the Social and Behavioral Sciences Edited by Gerhard Arminger Bergische Universität Wuppertal Wuppertal, Germany Clifford С. Clogg Late of Pennsylvania State University

More information

The exam is closed book, closed notes except your one-page (two-sided) cheat sheet.

The exam is closed book, closed notes except your one-page (two-sided) cheat sheet. CS 189 Spring 2015 Introduction to Machine Learning Final You have 2 hours 50 minutes for the exam. The exam is closed book, closed notes except your one-page (two-sided) cheat sheet. No calculators or

More information

Statistics (STAT) Statistics (STAT) 1. Prerequisites: grade in C- or higher in STAT 1200 or STAT 1300 or STAT 1400

Statistics (STAT) Statistics (STAT) 1. Prerequisites: grade in C- or higher in STAT 1200 or STAT 1300 or STAT 1400 Statistics (STAT) 1 Statistics (STAT) STAT 1200: Introductory Statistical Reasoning Statistical concepts for critically evaluation quantitative information. Descriptive statistics, probability, estimation,

More information

ADAPTIVE METROPOLIS-HASTINGS SAMPLING, OR MONTE CARLO KERNEL ESTIMATION

ADAPTIVE METROPOLIS-HASTINGS SAMPLING, OR MONTE CARLO KERNEL ESTIMATION ADAPTIVE METROPOLIS-HASTINGS SAMPLING, OR MONTE CARLO KERNEL ESTIMATION CHRISTOPHER A. SIMS Abstract. A new algorithm for sampling from an arbitrary pdf. 1. Introduction Consider the standard problem of

More information

DDP ANOVA Model. May 9, ddpanova-package... 1 ddpanova... 2 ddpsurvival... 5 post.pred... 7 post.sim DDP ANOVA with univariate response

DDP ANOVA Model. May 9, ddpanova-package... 1 ddpanova... 2 ddpsurvival... 5 post.pred... 7 post.sim DDP ANOVA with univariate response R topics documented: DDP ANOVA Model May 9, 2007 ddpanova-package..................................... 1 ddpanova.......................................... 2 ddpsurvival.........................................

More information

Bayesian Modelling with JAGS and R

Bayesian Modelling with JAGS and R Bayesian Modelling with JAGS and R Martyn Plummer International Agency for Research on Cancer Rencontres R, 3 July 2012 CRAN Task View Bayesian Inference The CRAN Task View Bayesian Inference is maintained

More information

Chapter 6 Continued: Partitioning Methods

Chapter 6 Continued: Partitioning Methods Chapter 6 Continued: Partitioning Methods Partitioning methods fix the number of clusters k and seek the best possible partition for that k. The goal is to choose the partition which gives the optimal

More information

Comparison of computational methods for high dimensional item factor analysis

Comparison of computational methods for high dimensional item factor analysis Comparison of computational methods for high dimensional item factor analysis Tihomir Asparouhov and Bengt Muthén November 14, 2012 Abstract In this article we conduct a simulation study to compare several

More information

Semiparametric Mixed Effecs with Hierarchical DP Mixture

Semiparametric Mixed Effecs with Hierarchical DP Mixture Semiparametric Mixed Effecs with Hierarchical DP Mixture R topics documented: April 21, 2007 hdpm-package........................................ 1 hdpm............................................ 2 hdpmfitsetup........................................

More information

Cluster Analysis. Summer School on Geocomputation. 27 June July 2011 Vysoké Pole

Cluster Analysis. Summer School on Geocomputation. 27 June July 2011 Vysoké Pole Cluster Analysis Summer School on Geocomputation 27 June 2011 2 July 2011 Vysoké Pole Lecture delivered by: doc. Mgr. Radoslav Harman, PhD. Faculty of Mathematics, Physics and Informatics Comenius University,

More information

Linear Methods for Regression and Shrinkage Methods

Linear Methods for Regression and Shrinkage Methods Linear Methods for Regression and Shrinkage Methods Reference: The Elements of Statistical Learning, by T. Hastie, R. Tibshirani, J. Friedman, Springer 1 Linear Regression Models Least Squares Input vectors

More information

Statistical Matching using Fractional Imputation

Statistical Matching using Fractional Imputation Statistical Matching using Fractional Imputation Jae-Kwang Kim 1 Iowa State University 1 Joint work with Emily Berg and Taesung Park 1 Introduction 2 Classical Approaches 3 Proposed method 4 Application:

More information

Missing Data Missing Data Methods in ML Multiple Imputation

Missing Data Missing Data Methods in ML Multiple Imputation Missing Data Missing Data Methods in ML Multiple Imputation PRE 905: Multivariate Analysis Lecture 11: April 22, 2014 PRE 905: Lecture 11 Missing Data Methods Today s Lecture The basics of missing data:

More information

IMAGE ANALYSIS, CLASSIFICATION, and CHANGE DETECTION in REMOTE SENSING

IMAGE ANALYSIS, CLASSIFICATION, and CHANGE DETECTION in REMOTE SENSING SECOND EDITION IMAGE ANALYSIS, CLASSIFICATION, and CHANGE DETECTION in REMOTE SENSING ith Algorithms for ENVI/IDL Morton J. Canty с*' Q\ CRC Press Taylor &. Francis Group Boca Raton London New York CRC

More information

Growth Mixture Modeling with Measurement Selection

Growth Mixture Modeling with Measurement Selection Growth Mixture Modeling with Measurement Selection Abby Flynt Nema Dean arxiv:1710.06930v1 [stat.me] 18 Oct 2017 Assistant Professor Lecturer Department of Mathematics School of Mathematics and Statistics

More information

Bootstrapping Methods

Bootstrapping Methods Bootstrapping Methods example of a Monte Carlo method these are one Monte Carlo statistical method some Bayesian statistical methods are Monte Carlo we can also simulate models using Monte Carlo methods

More information

University of Cambridge Engineering Part IIB Paper 4F10: Statistical Pattern Processing Handout 11: Non-Parametric Techniques.

University of Cambridge Engineering Part IIB Paper 4F10: Statistical Pattern Processing Handout 11: Non-Parametric Techniques. . Non-Parameteric Techniques University of Cambridge Engineering Part IIB Paper 4F: Statistical Pattern Processing Handout : Non-Parametric Techniques Mark Gales mjfg@eng.cam.ac.uk Michaelmas 23 Introduction

More information

Graphical Models & HMMs

Graphical Models & HMMs Graphical Models & HMMs Henrik I. Christensen Robotics & Intelligent Machines @ GT Georgia Institute of Technology, Atlanta, GA 30332-0280 hic@cc.gatech.edu Henrik I. Christensen (RIM@GT) Graphical Models

More information

Clustering and Visualisation of Data

Clustering and Visualisation of Data Clustering and Visualisation of Data Hiroshi Shimodaira January-March 28 Cluster analysis aims to partition a data set into meaningful or useful groups, based on distances between data points. In some

More information

SYDE Winter 2011 Introduction to Pattern Recognition. Clustering

SYDE Winter 2011 Introduction to Pattern Recognition. Clustering SYDE 372 - Winter 2011 Introduction to Pattern Recognition Clustering Alexander Wong Department of Systems Design Engineering University of Waterloo Outline 1 2 3 4 5 All the approaches we have learned

More information

Introduction to Machine Learning

Introduction to Machine Learning Department of Computer Science, University of Helsinki Autumn 2009, second term Session 8, November 27 th, 2009 1 2 3 Multiplicative Updates for L1-Regularized Linear and Logistic Last time I gave you

More information

Homework. Gaussian, Bishop 2.3 Non-parametric, Bishop 2.5 Linear regression Pod-cast lecture on-line. Next lectures:

Homework. Gaussian, Bishop 2.3 Non-parametric, Bishop 2.5 Linear regression Pod-cast lecture on-line. Next lectures: Homework Gaussian, Bishop 2.3 Non-parametric, Bishop 2.5 Linear regression 3.0-3.2 Pod-cast lecture on-line Next lectures: I posted a rough plan. It is flexible though so please come with suggestions Bayes

More information

Statistical Analysis of List Experiments

Statistical Analysis of List Experiments Statistical Analysis of List Experiments Kosuke Imai Princeton University Joint work with Graeme Blair October 29, 2010 Blair and Imai (Princeton) List Experiments NJIT (Mathematics) 1 / 26 Motivation

More information

Nested Sampling: Introduction and Implementation

Nested Sampling: Introduction and Implementation UNIVERSITY OF TEXAS AT SAN ANTONIO Nested Sampling: Introduction and Implementation Liang Jing May 2009 1 1 ABSTRACT Nested Sampling is a new technique to calculate the evidence, Z = P(D M) = p(d θ, M)p(θ

More information

Statistical Analysis of Metabolomics Data. Xiuxia Du Department of Bioinformatics & Genomics University of North Carolina at Charlotte

Statistical Analysis of Metabolomics Data. Xiuxia Du Department of Bioinformatics & Genomics University of North Carolina at Charlotte Statistical Analysis of Metabolomics Data Xiuxia Du Department of Bioinformatics & Genomics University of North Carolina at Charlotte Outline Introduction Data pre-treatment 1. Normalization 2. Centering,

More information

Statistics 202: Data Mining. c Jonathan Taylor. Outliers Based in part on slides from textbook, slides of Susan Holmes.

Statistics 202: Data Mining. c Jonathan Taylor. Outliers Based in part on slides from textbook, slides of Susan Holmes. Outliers Based in part on slides from textbook, slides of Susan Holmes December 2, 2012 1 / 1 Concepts What is an outlier? The set of data points that are considerably different than the remainder of the

More information