Approximation methods and quadrature points in PROC NLMIXED: A simulation study using structured latent curve models


Nathan Smith and Shelley A. Blozis, Ph.D.
University of California, Davis, Davis, CA, USA

ABSTRACT

Structured latent curve models, an alternative to nonlinear mixed models, allow researchers to account for individual variability in longitudinal responses through individual latent curves that may vary from the mean response curve. These models can be fit in SAS using PROC NLMIXED, a procedure that offers a variety of approximation methods and options, the choice of which may affect both parameter estimates and run times. Previous research conducted by the authors using empirical data showed that combinations of approximation procedures and estimating options produced different results depending on the combination of conditions. These conditions included the four integral approximations (GAUSS, FIRO, HARDY, and ISAMP) available in NLMIXED. Within each estimating procedure, the effect of the number of quadrature points (q-points), used adaptively or non-adaptively, will be tested on parameter estimation and processing speed. As an additional condition, the feasibility of providing good, poor, or no starting values will be evaluated across estimating and q-point options. This paper presents the results of the follow-up data simulation and comments on the differences between the findings of the empirical study and the simulation study. The data simulation was conducted in SAS, and the syntax used to set up the simulations is presented and discussed in the context of the results. Outcome variables of interest for the different combinations of conditions were computational demand, accuracy in recovering parameters, and convergence behavior.

INTRODUCTION

The purpose of a longitudinal data analysis is often to characterize and evaluate change in a response across multiple occasions of measurement. A nonlinear mixed model allows use of a nonlinear function to characterize responses for a given data set. The model admits both fixed and random effects, thus allowing the coefficients of the function to differ across individuals. These models are applicable to a variety of observed processes that typically follow a nonlinear trajectory, such as measures from learning and memory studies, in which it is important to ascertain individual variability. Maximum likelihood estimation (MLE) is a widely used method for estimating the parameters of a statistical model. MLE selects as parameter estimates the values that maximize a likelihood function; that is, the procedure selects parameter values that are most likely to have produced the sample data under the proposed model. PROC NLMIXED allows users to fit nonlinear mixed models by maximizing an approximate likelihood integrated over the random effects. The procedure offers a selection from four different integral approximations: Gaussian quadrature, Hardy quadrature, importance sampling, and a first-order Taylor series expansion, in addition to several optimization algorithms. Parameter estimates and standard errors for a given model and sample data may vary based on the applied method. The theory and computational techniques of the NLMIXED procedure are based mainly on Pinheiro and Bates (1995).
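As a point of reference for what these approximations target, the likelihood being maximized integrates the conditional density of the responses over the random effects. In generic notation (a standard formulation, not a formula reproduced from the paper):

$L(\theta) = \prod_{i=1}^{N} \int p(y_i \mid \eta_i, \theta)\, p(\eta_i \mid \theta)\, d\eta_i$

Because this integral rarely has a closed form for nonlinear models, each METHOD= option in PROC NLMIXED corresponds to a different way of approximating it.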
In the current paper, the accuracy and computational efficiency of different estimating procedures for structured latent curve models are compared. Pinheiro and Bates used both empirical and simulated data to evaluate variations in approximation techniques for a logistic mixed model with a single random effect and a first-order compartment model with two random effects. Based on the comparisons, their results suggest that many methods can be used to obtain accurate parameter estimates but that adaptive Gaussian quadrature should be used for its balance of computational efficiency and accuracy. Lesaffre and Spiessens (2001) subsequently reviewed the effect of the number of quadrature points on parameter estimates using both adaptive and non-adaptive Gaussian quadrature and suggested that parameter estimates can depend on the number and positioning of quadrature points (q-points) used to evaluate the likelihood integral. This paper follows up a study by the authors that used empirical data to examine the different integral approximation methods and quadrature point options in PROC NLMIXED (Smith & Blozis, 2014). Data from a learning study were used to evaluate the approximation methods under a variety of conditions. The data were performance scores on a quantitative procedural learning task. Study participants learned a set of declarative rules for identifying characteristics of visual stimuli presented in a series.

The task was memory focused, and the response variable was the time it took the individual to complete the quantitative learning task, measured across multiple occasions. The total sample size for these data was 288 individuals with complete data at all 12 measurement occasions. Figure 1A presents a randomly selected sample of 20 individuals, and Figure 1B shows the mean response of all participants (N=288) across the 12 measurement occasions.

Figure 1A: Twenty Randomly Sampled Individuals
Figure 1B: Sample Average QTR Score

The sample average follows a nonlinear trajectory and approaches an asymptote greater than zero. Although many of the individual scores seem to follow this pattern as well, there is a great deal of variability in individual responses. A structured latent curve model (SLCM) allows researchers to account for individual variability in longitudinal responses through individual latent curves that may vary from the mean response curve. The data were evaluated under the three integral approximation procedures (FIRO, GAUSS, and ISAMP) amenable to the SLCM and at several different q-point conditions (1, 5, 10, 20, 30). Additionally, the GAUSS and ISAMP options were tested adaptively (the default) and non-adaptively (the NOAD option). The effect of providing starting values on parameter estimation was combined with measures of computational efficiency and model fit for each testable variation. The purpose was to identify sensitive areas in likelihood approximation for structured latent curve models and to propose estimating procedures that balance accuracy and computational efficiency. The general finding was that with good starting values the adaptive conditions were the least variable and, at low q-point conditions, the most efficient. Increasing the number of quadrature points did not affect parameter estimates when used adaptively, but unless good starting values were provided, the adaptive procedures were unable to converge on any estimates. Non-adaptive methods were extremely variable at low q-point conditions and did not stabilize until at least 20 q-points were used. This is important to note because in some situations a non-adaptive method will converge on a solution where an adaptive method will not; this may happen because the adaptive methods use information from the data and from the starting values provided to locate the best points on the x-axis at which to evaluate the integral. The general recommendation that emerged from these findings was to use adaptive quadrature whenever possible, but if non-adaptive methods are required, to use a sufficiently high number of q-points to ensure parameter stability. For a more complete report of the results, see Smith and Blozis (2014).

The goal of the current paper was to conduct a simulation study that allowed for more generalized conclusions about approximation methods for SLCMs. Data were simulated to have nonlinear characteristics and to closely resemble the empirical data used in the first study. Three integral approximations were tested (GAUSS, ISAMP, and FIRO) adaptively and non-adaptively under a range of q-point conditions. For this simulation, the effect of providing poor or no starting values in comparison to good starting values was not evaluated. A total of 250 datasets, each with a sample size of N=250, were generated using PROC IML and code adapted from Wicklin (2013).
Review of the Estimating Procedures Available in PROC NLMIXED and the Use of the Number of Quadrature Points

Within PROC NLMIXED the four available integral estimating procedures are Gaussian quadrature (GAUSS), a first-order Taylor series expansion (FIRO), importance sampling (ISAMP), and Hardy quadrature (HARDY). Hardy quadrature is not used in this study because the model to be fitted to the data includes more than one random effect, and the Hardy procedure is limited to a single random effect. These estimating procedures maximize an approximation to the likelihood function integrated over the random effects. A variety of optimization algorithms is also available to the researcher, although in this paper only the default, a dual quasi-Newton algorithm, is considered. The default estimating procedure in PROC NLMIXED is adaptive Gaussian quadrature. This approximation method makes use of predetermined abscissas, or set locations on the x-axis, to evaluate the integral.

In this sense, Gaussian quadrature can be thought of as a type of Monte Carlo integration in which the points at which the integral is evaluated are pre-set and random effect vectors are generated from a normal distribution. This deterministic quality means that, given the same input, the same output will always be generated if the number of quadrature points remains constant. The weights are fixed beforehand in the Gaussian quadrature rule, and the scale is centered at zero, unless adaptive Gaussian quadrature is used, in which case the scale is centered at the conditional modes of the estimated random effects (Pinheiro & Bates, 1995). It is important to note that the actual number of points on the grid used in the estimation is the number of quadrature points raised to the power of the number of random effects. The number of quadrature points can be specified in PROC NLMIXED using the QPOINTS= option. If none is specified, the procedure automatically selects an appropriate number for the model, not exceeding 31 quadrature points. Pinheiro and Bates (1995) have suggested that the greatest gain in precision in estimating parameters comes from the centering on conditional modes that takes place in adaptive Gaussian quadrature and that relatively little is gained by increasing the number of q-points past 1. An advantage of non-adaptive Gaussian quadrature, which can be specified using the NOAD option in PROC NLMIXED, is that it does not require the posterior modes of the random effects to be calculated at each iteration (Pinheiro & Bates, 1995); it is thus less computationally demanding and does not depend on the procedure being able to correctly locate the posterior modes of the random effects.

Importance sampling (ISAMP) is performed using Monte Carlo integration and is more flexible than Gaussian quadrature because it allows the user to specify the distribution that created the data, which in turn makes the samples that are drawn more likely to be relevant to the integral. The area that is most likely to be important to the integral is then oversampled. Because the user may specify the distribution the data come from, this technique can be more appropriate for applications to non-normal data, provided an appropriate distribution is chosen. The approximation obtained from importance sampling is similar to that obtained from Gaussian quadrature, the main difference being that importance sampling uses a pseudo-random mechanism to select quadrature points, whereas Gaussian quadrature uses predetermined weights and abscissas. The stochastic variability associated with different importance samples makes it difficult to compare small changes in parameter values because each importance sample will yield slightly different results. In PROC NLMIXED the user may specify a SEED= value that allows for comparisons across different conditions.

Alternatively, the first-order method of Beal and Sheiner (1982, 1988) is available. This method takes a first-order Taylor series expansion (FIRO) around the empirical best linear unbiased predictions of the random effects. This method requires that the user specify a NORMAL distribution in the RANDOM statement. As will be shown later, this method generally converges on an estimate quickly and can often be used to generate starting values for parameter estimates when none are available. The HARDY option makes use of Hardy quadrature, which uses an adaptive trapezoidal rule. This option is only available for one-dimensional integrals, that is, models that specify only one random effect.
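To make these options concrete, the following self-contained sketch simulates a simple random-intercept model and fits it under several of the settings described above. This is a hypothetical toy example, not the paper's model; the dataset name, variable names, parameter values, and seeds are all illustrative assumptions.

/* Hypothetical toy data: 100 subjects, 5 occasions, random intercept */
data toy;
   call streaminit(20140903);
   do id = 1 to 100;
      b = rand('normal', 0, 2);             /* random intercept, sd = 2 */
      do t = 1 to 5;
         y = 3 + b + rand('normal', 0, 1);  /* response with residual error */
         output;
      end;
   end;
run;

/* Adaptive Gaussian quadrature (the default) with 5 quadrature points */
proc nlmixed data=toy method=gauss qpoints=5;
   parms beta0=3 s2b=4 s2e=1;
   mean = beta0 + b;
   model y ~ normal(mean, s2e);
   random b ~ normal(0, s2b) subject=id;
run;

/* Non-adaptive importance sampling; SEED= fixes the pseudo-random
   selection of points so that runs are comparable */
proc nlmixed data=toy method=isamp noad qpoints=20 seed=4322;
   parms beta0=3 s2b=4 s2e=1;
   mean = beta0 + b;
   model y ~ normal(mean, s2e);
   random b ~ normal(0, s2b) subject=id;
run;

/* First-order method; requires a NORMAL distribution in the RANDOM statement */
proc nlmixed data=toy method=firo;
   parms beta0=3 s2b=4 s2e=1;
   mean = beta0 + b;
   model y ~ normal(mean, s2e);
   random b ~ normal(0, s2b) subject=id;
run;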
METHODS

Model

A structured latent curve model (SLCM) uses random coefficients combined with a factor matrix of basis functions to yield individual latent curves. The mean response is assumed to follow a target function on which the basis functions of the factor matrix are based (Blozis, 2004). In a SLCM the columns of the factor matrix define the shape of the common curve from which individuals are allowed to vary. A SLCM is an extension of a latent curve model (LCM), which, in contrast, does not allow for individual variation in model parameters that enter the target function in a nonlinear way. The two models share a similar structure, however. Given a response variable $y_i$, a latent curve model (similar to a structured latent curve model) may be expressed as

$y_i = \Lambda_i \eta_i + \varepsilon_i$

where $\Lambda_i$ is a factor loading matrix whose elements are the basis functions evaluated according to time:

$\Lambda_i = \begin{bmatrix} \lambda_{11} & \lambda_{12} & \cdots & \lambda_{1J} \\ \lambda_{21} & \lambda_{22} & \cdots & \lambda_{2J} \\ \vdots & \vdots & & \vdots \\ \lambda_{T_i 1} & \lambda_{T_i 2} & \cdots & \lambda_{T_i J} \end{bmatrix}$

Each row corresponds to a measurement occasion and each column corresponds to the elements that define the shape of the curve. The columns of the $\Lambda_i$ matrix are often referred to as basis functions. The factor $\eta_i$ is a vector of random coefficients that denotes how much an individual latent response curve depends on each basis curve. Although the parameters that define the factor matrix may enter the functions nonlinearly, the random coefficients enter the individual-level model linearly (Blozis, 2004). Measurement error ($\varepsilon_i$) is assumed to be independent of the random coefficients ($\eta_i$) and to be normally distributed. This allows for the estimation of a nonlinear function using standard maximum likelihood estimation techniques. The data are assumed to be multivariate normal.
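Although the paper does not write them out, the model-implied moments under these assumptions follow the standard latent curve algebra. With $\mu_\eta$ and $\Psi$ denoting the mean vector and covariance matrix of $\eta_i$, and $\Theta$ the covariance matrix of $\varepsilon_i$ (symbols introduced here for orientation, not taken from the paper):

$E(y_i) = \Lambda_i \mu_\eta, \qquad \mathrm{Cov}(y_i) = \Lambda_i \Psi \Lambda_i' + \Theta$

These are the moments that the maximum likelihood fit matches to the observed means and covariances under multivariate normality.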

The function believed to represent the mean response of the data is often referred to as the target function (Browne, 1993). The target function is the researcher's assumption that the mean response of the data will follow some predetermined trajectory. Several possible target functions could be used to generate the $\Lambda$ matrix for the structured latent curve model. The target function generates a nonlinear curve that the mean response across time points is expected to follow under a given theory. The curve follows the smooth function $f(\theta, t)$, where $\theta = (\theta_1, \ldots, \theta_J)$ is a vector of unknown fixed parameters and $t = (1, 2, \ldots, T)$ is a vector of discrete measurement occasions. When evaluated at a discrete time point $T$, the target function gives the model-implied mean, $f(\theta, T) = \mu_T$. The same function used in Blozis (2004), a negatively accelerated exponential function decreasing monotonically to an asymptote greater than zero, was used to create the factor matrix for the current study. This function is

$f(\theta, t) = \theta_1 - (\theta_1 - \theta_2)\exp[-\theta_3(t-1)]$

where $t$ refers to the measurement occasion and is centered at the first learning trial. The first-order partial derivatives of this function make up the columns of the factor matrix $\Lambda_i$, resulting in a 12x3 matrix. $\theta_1$ is the potential response time or asymptote, $\theta_2$ is the initial response time or intercept, and $\theta_3$ is the population initial rate of change. Let $f_j(\theta, t)$ represent the first partial derivative of the target function $f(\theta, t)$ with respect to $\theta_j$:

$f_j(\theta, t) = \frac{\partial f(\theta, t)}{\partial \theta_j}$

Given the target function specified above, based on three $\theta$ parameters and 12 measurement occasions, the $\Lambda$ matrix for the structured latent curve is then

$\Lambda = \begin{bmatrix} f_1(\theta, 1) & f_2(\theta, 1) & f_3(\theta, 1) \\ f_1(\theta, 2) & f_2(\theta, 2) & f_3(\theta, 2) \\ \vdots & \vdots & \vdots \\ f_1(\theta, 12) & f_2(\theta, 12) & f_3(\theta, 12) \end{bmatrix}$

This expression can then accommodate a population curve that follows the form of the target function $f(\theta, t)$ and individual-level variation in the vector $\eta_i$, which represents individual $i$'s dependence on the $j$th basis function (Blozis, 2004).

Data Simulation

Although these procedures may behave differently at small sample sizes, the current study considers only one sample size of N=250. This sample size was chosen because it was close to the size of the sample tested using the empirical data (N=228) and because we wanted to avoid any complications that might arise from a smaller sample size. Future studies should consider the application of these methods to smaller samples. The first step of the data simulation was to generate the $\eta_i$ matrix, the vector of random coefficients that differs for every individual, across all 250 sets. The following code, adapted from Wicklin (2013), simulates multivariate normal data given the mean and covariance of the variables to be tested.

proc iml;
N = 250;                /* individuals per replication */
NumSamples = 250;       /* number of replications */
/* specify population mean and covariance */
Mean = {8.6 16.3 0.64};
/* The covariance values were lost in transcription. The entries below are
   implied by the PARMS statement given later in the paper; cov(t1,t3) was
   also lost, and the 0 shown here is a placeholder, not the paper's value. */
Cov = {4.32 7.26 0   ,
       7.26 34.9 0.48,
       0    0.48 0.15};
call randseed(4322);
X = RandNormal(N*NumSamples, Mean, Cov);
/* Create data set from simulated data */
ID = colvec(repeat(t(1:NumSamples), 1, N));  /* 1,1,1,...,2,2,2,...,3,3,3,... */
Z = ID || X;
create raneff from Z[c={"ID" "t1" "t2" "t3"}];
append from Z;
close raneff;
quit;
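The paper does not show the next step, described in the following paragraph: multiplying each row of the simulated coefficients by the $\Lambda$ matrix and adding measurement error. The sketch below is one way to do it under stated assumptions: $\Lambda$ is built from the basis functions evaluated at the simulation values $\theta$ = (8.6, 16.3, .64), the error variance uses the s2e1 = .93 value from the PARMS statement shown later, and the seed is illustrative.

proc iml;
use raneff;
read all var {t1 t2 t3} into Eta;    /* 62,500 x 3 random coefficients */
close raneff;

theta = {8.6 16.3 0.64};
time = T(1:12);
basis1 = 1 - exp(-theta[3]*(time-1));                              /* df/d(theta1) */
basis2 = exp(-theta[3]*(time-1));                                  /* df/d(theta2) */
basis3 = (theta[1]-theta[2]) # (time-1) # exp(-theta[3]*(time-1)); /* df/d(theta3) */
Lambda = basis1 || basis2 || basis3;                               /* 12 x 3 factor matrix */

Y = Eta * t(Lambda);                        /* 62,500 x 12 latent response curves */
E = j(nrow(Y), ncol(Y));
call randseed(999);                         /* illustrative seed */
call randgen(E, 'Normal', 0, sqrt(0.93));   /* measurement error, variance = s2e1 */
Y = Y + E;                                  /* simulated observed scores */
quit;

Reshaping Y to long format with subject, set, and occasion identifiers for PROC NLMIXED is omitted here.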

The RANDNORMAL step generates a 62,500 x 4 matrix with all the individual-level information across all replications; ID is an indicator variable that specifies the replication number. Once the ID variable was selected out, the resulting $\eta_i$ matrix was combined with the $\Lambda$ matrix of basis functions (12x3) at each time point to generate one complete data set with 250 sets of 250 individuals at 12 time points. In other words, each row of the $\eta_i$ matrix was multiplied by the $\Lambda$ matrix to generate a 1x12 vector for each individual across all individuals and replications. The resulting data set had 62,500 individuals in 250 sets. At this point, error variance was also added to each individual. Figure 2 shows the simulated scores of a random sample of 24 individuals from the data.

Figure 2: Random sample of simulated data across occasions

Procedure

For each of the estimating procedures considered (GAUSS, ISAMP, FIRO), and holding other conditions constant, the effect of the number of quadrature points (1, 2, 3, 4, 5, 10, 20, 30) was tested when used adaptively or non-adaptively on parameter estimation and processing speed. In summary, 2 (adaptive vs. non-adaptive quadrature) x 8 (1, 2, 3, 4, 5, 10, 20, and 30 q-points) x 2 (estimating procedures GAUSS and ISAMP) + 1 (FIRO) = 33 variations were evaluated and compared based on parameter estimates, processing speed, and model fit. Each of these 33 conditions was run across all 250 replications. The starting values provided to the procedure were the same values used to simulate the data and so can be considered good starting values; this also allowed a bias to be computed. The SAS code used to test these conditions is included below.

/* METHOD= and QPOINTS= are set to the condition under test;
   NOAD is included only for the non-adaptive runs */
PROC NLMIXED MAXITER=10000 GCONV=1e-10 METHOD=(INPUT) QPOINTS=(INPUT) NOAD(OPTIONAL);
   /* basis functions: first partial derivatives of the target function */
   basis1 = 1-exp(-t3*(time-1));                  /* df/d(theta1) */
   basis2 = exp(-t3*(time-1));                    /* df/d(theta2) */
   basis3 = (t1-t2)*(time-1)*exp(-t3*(time-1));   /* df/d(theta3) */
   /* starting values equal the simulation values; the cn3n1 starting
      value was lost in transcription, and 0 is shown as a placeholder */
   PARMS t1=8.6 t2=16.3 t3=.64 s2e1=.93
         s2n1=4.32 cn2n1=7.26 s2n2=34.9
         cn3n1=0 cn3n2=.48 s2n3=.15;
   mean = basis1*(n1+t1) + basis2*(n2+t2) + basis3*n3;
   var = s2e1;
   MODEL qrts ~ NORMAL(mean, var);
   RANDOM n1 n2 n3 ~ NORMAL([0,0,0],
          [s2n1, cn2n1, s2n2, cn3n1, cn3n2, s2n3]) SUBJECT=subid;
   BY set;
RUN;

Note: Because convergence can sometimes be an issue, especially when considering several variations of quadrature points and estimation methods, the user can manually set the maximum number of iterations higher than the default using the MAXITER= option. Similarly, the GCONV= option specifies a convergence criterion for the relative gradient: when the relative gradient falls below the specified value, the convergence criterion has been met and the procedure stops the iterative process.

Results

The analyses conducted previously on the empirical learning data suggested that parameter estimates may be unstable at lower quadrature point conditions, but only when non-adaptive approximation methods are used. The same pattern was observed in the analyses of the simulated data. Figure 3 plots the mean values of all three of the main fixed effects across quadrature point conditions and approximation methods. $\theta_1$ is the asymptote parameter, $\theta_2$ is the intercept, and $\theta_3$ is the rate of change parameter. The greatest change in the mean parameter estimates occurs between 1 and 5 quadrature points. It should be noted that the adaptive importance sampling, adaptive Gaussian quadrature, and FIRO methods yielded exactly the same parameter estimates and standard errors, regardless of the number of quadrature points. This finding is in line with the conclusions of Pinheiro and Bates (1995), who also found that increasing the number of quadrature points past 1 for an adaptive approximation did not increase accuracy. What is interesting about the behavior of the parameter estimates in this study is that although the adaptive methods showed greater stability across q-point conditions, they did not recover the true parameters better than the non-adaptive methods when a larger number of q-points was used.

Figure 3: Average values for fixed effects across quadrature point conditions
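The paper does not show the harness that cycled PROC NLMIXED through the 33 conditions. One plausible way to drive the method and q-point grid is a small macro wrapper; the macro name, the data set name simdata, and the 0 placeholder for the lost cn3n1 starting value are assumptions, not the authors' code.

%macro fit_slcm(method=, qpoints=, opts=);
   proc nlmixed data=simdata maxiter=10000 gconv=1e-10
                method=&method qpoints=&qpoints &opts;
      basis1 = 1-exp(-t3*(time-1));
      basis2 = exp(-t3*(time-1));
      basis3 = (t1-t2)*(time-1)*exp(-t3*(time-1));
      parms t1=8.6 t2=16.3 t3=.64 s2e1=.93
            s2n1=4.32 cn2n1=7.26 s2n2=34.9
            cn3n1=0 cn3n2=.48 s2n3=.15;   /* cn3n1 start is a placeholder */
      mean = basis1*(n1+t1) + basis2*(n2+t2) + basis3*n3;
      model qrts ~ normal(mean, s2e1);
      random n1 n2 n3 ~ normal([0,0,0],
             [s2n1, cn2n1, s2n2, cn3n1, cn3n2, s2n3]) subject=subid;
      by set;
   run;
%mend fit_slcm;

/* Examples: one adaptive and one non-adaptive condition */
%fit_slcm(method=GAUSS, qpoints=5);
%fit_slcm(method=ISAMP, qpoints=20, opts=noad);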

Averages of the fixed effects estimates were not consistent across all parameters. Although only three parameters are included here for illustrative purposes, an additional seven parameters were estimated by the NLMIXED procedure. These parameters included the variances and covariances of the three random effects, as well as the error term at the first level of the model. The non-adaptive Gaussian quadrature approximation with QPOINTS=1 was unable to converge on any solutions for the covariance and variance estimates across all replications and instead took the starting values provided to the procedure as parameter estimates. Non-adaptive importance sampling was the least accurate and most biased of all the methods at low quadrature point conditions.

Tables 1A-D: Results from simulated data for fixed effects at QPOINTS=1, 5, 10, 30

Tables 1A-D include the means, relative bias, and standard deviations of the fixed effects estimates across all replications. The adaptive methods always recovered the nonlinear rate parameter estimate of $\theta_3$ with the least bias. An interesting trend in these tables is that the adaptive estimates did not improve across QPOINT conditions, whereas the non-adaptive methods clearly did, showing less variability and bias with an increase in quadrature points. This again demonstrates the relation between quadrature points and accurate parameter estimation and suggests that when using an approximation method non-adaptively, one should be wary of using a small number of quadrature points. The question then arises: if an increase in the number of quadrature points can improve estimates and decrease bias, how many should be used? The answer often comes down to the varying computational efficiency of the different procedures.
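The text does not define the relative bias reported in Tables 1A-D; assuming the conventional definition over the R = 250 replications, with $\hat{\theta}_r$ the estimate from replication $r$ and $\theta$ the true value used to simulate the data:

$\text{relative bias} = \frac{\frac{1}{R}\sum_{r=1}^{R}\hat{\theta}_r - \theta}{\theta}$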

Some of the differences in computational efficiency become more apparent when considering the time necessary to complete the different procedures adaptively and non-adaptively. Tables 2A and 2B report the model fit and time used by each procedure at each level of tested quadrature point options for adaptive and non-adaptive methods, respectively. The run times are in minutes and approximate how long it took to complete an analysis on one simulated dataset. Because the analyses were looped through all 250 replications, the log file only displays the time it took to process all replications; the run times displayed in Tables 2A and 2B are the total time divided by 250. This is informative because a researcher who uses PROC NLMIXED to run an analysis is likely to run the procedure many fewer times than it was run for this simulation.

Table 2A: Run times and fit for adaptive methods
Table 2B: Run times and fit for non-adaptive methods

The FIRO approximation method was among the quickest to produce estimates and provided parameter estimates comparable to the adaptive ISAMP and GAUSS approximations. As the number of quadrature points for the ISAMP and GAUSS approximations increased, so did the run times. In the case of the GAUSS approximation, only 0.11 minutes (6.6 seconds) were required to generate parameter estimates with one adaptive quadrature point, but nearly 5.5 hours were needed to evaluate the same function with 30 adaptive quadrature points. In this simulation the increase in quadrature points was unnecessary given that parameter estimates and fit did not improve past one adaptive quadrature point. It is interesting to note, however, that although the parameter estimates do not change past 1 q-point for adaptive importance sampling, the fit statistics both improve and display less variability as more quadrature points are used.

For the non-adaptive conditions, fit improves and run times increase as the number of quadrature points goes up. Run times were also shorter for non-adaptive methods. This was unsurprising because adaptive methods begin the iterative process of approximating the likelihood function at a location on the x-axis that is informed by characteristics of the data. The most dramatic improvement in fit is observed in the importance sampling approximation, and the most dramatic increase in run time is observed with the Gaussian quadrature approximation. A 30 q-point non-adaptive Gaussian approximation took around 1.5 hours to complete, whereas a corresponding importance sampling approximation with 30 q-points took only 12 seconds.

Conclusions

The structured latent curve model is not technically a fully nonlinear mixed model. While the shape of the mean curve is defined to be nonlinear, the random effects enter this model linearly. This facilitates the use of standard maximum likelihood techniques but may be the reason that the parameter estimates observed in this study are so similar under the adaptive conditions. Because the model is easier to estimate and is conditionally linear, the procedure may be converging on a solution instead of just an approximation. It would be instructive for future studies to apply these same experimental conditions and data to a different, fully nonlinear mixed model.

The FIRO estimates were exactly the same as the ISAMP and GAUSS estimates across all conditions and were converged upon more quickly than under any other condition, including when no starting values were provided. Much of the variability in the parameter estimates was observed when non-adaptive quadrature points were used. The use of non-adaptive quadrature points took less CPU time but required more quadrature points to reach the same level of accuracy. When using the NOAD option, the user should be aware that parameter estimates may vary widely based on the number of quadrature points and the approximation method used. In this vein, the authors suggest that if the NOAD option is to be used, a greater number of quadrature points should also be considered to ensure accuracy.

A potential solution to long run times for the Gaussian quadrature approximations is to use importance sampling with a very large number of quadrature points. The main difference between these two approximation methods is the mechanism by which the locations of the quadrature points are chosen, and it seems to follow that when using a semi-random method of selecting quadrature points (ISAMP), a larger number would provide better coverage. At 30 q-points the run time for non-adaptive GAUSS was approximately 1.5 hours, whereas 30 q-points of non-adaptive ISAMP took only 22 seconds. A further study could investigate very high numbers of q-points for ISAMP approximations and determine whether the approach is more computationally efficient than the GAUSS procedures while remaining highly accurate. This particular data set had no missing values; missing data are common in longitudinal studies involving measurements taken on humans and further complicate the approximation procedures. A future study would include a number of data simulations that would allow for a closer examination of some of the findings of this study.
It would be interesting and informative to re-evaluate these procedures with data that are less ideal and that may complicate the approximation process. A next step would be to include additional starting value conditions in which poor to average starting values are compared against the good and no starting value conditions. It is possible that we observed such consistency in parameter estimates because the starting values used were already very close to the values of the parameter estimates.

References

1. Blozis, S. (2004). Structured latent curve models for the study of change in multivariate repeated measures. Psychological Methods, 9(3).
2. Beal, S.L., & Sheiner, L.B. (1982). Estimating population kinetics. CRC Critical Reviews in Biomedical Engineering, 8.
3. Beal, S.L., & Sheiner, L.B. (1988). Heteroskedastic nonlinear regression. Technometrics, 30.
4. Davidian, M., & Giltinan, D.M. (1995). Nonlinear Models for Repeated Measurement Data. London: Chapman & Hall.
5. Lesaffre, E., & Spiessens, B. (2001). On the number of quadrature points in a logistic random effects model: An example. Applied Statistics, 50.
6. Pinheiro, J.C., & Bates, D.M. (1995). Approximations to the log-likelihood function in the nonlinear mixed-effects model. Journal of Computational and Graphical Statistics, 4.
7. Smith, N., & Blozis, S. (2014). Options in estimating nonlinear mixed models: Quadrature points and approximation methods. Paper presented at Western Users of SAS Software 2014, San Jose, California, 3-5 September. Archived in the WUSS 2014 conference proceedings.
8. Wicklin, R. (2013). Simulating Data with SAS. Cary, NC: SAS Institute Inc.

CONTACT INFORMATION

Your comments and questions are valued and encouraged. Contact the authors at:

Nathan Smith, University of California, Davis, Department of Psychology, nabsmith@ucdavis.edu
Shelley A. Blozis, Ph.D., University of California, Davis, Department of Psychology, sablozis@ucdavis.edu

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective companies.


More information

RELIABILITY DATA ANALYSIS FOR DESIGNED EXPERIMENTS

RELIABILITY DATA ANALYSIS FOR DESIGNED EXPERIMENTS RELIABILITY DATA ANALYSIS FOR DESIGNED EXPERIMENTS Laura J. Freeman and G. Geoffrey Vining Department of Statistics, Virginia Tech, Blacksburg, VA 24061-0439 ABSTRACT Product reliability is an important

More information

CSC 2515 Introduction to Machine Learning Assignment 2

CSC 2515 Introduction to Machine Learning Assignment 2 CSC 2515 Introduction to Machine Learning Assignment 2 Zhongtian Qiu(1002274530) Problem 1 See attached scan files for question 1. 2. Neural Network 2.1 Examine the statistics and plots of training error

More information

Mixture Models and the EM Algorithm

Mixture Models and the EM Algorithm Mixture Models and the EM Algorithm Padhraic Smyth, Department of Computer Science University of California, Irvine c 2017 1 Finite Mixture Models Say we have a data set D = {x 1,..., x N } where x i is

More information

Predict Outcomes and Reveal Relationships in Categorical Data

Predict Outcomes and Reveal Relationships in Categorical Data PASW Categories 18 Specifications Predict Outcomes and Reveal Relationships in Categorical Data Unleash the full potential of your data through predictive analysis, statistical learning, perceptual mapping,

More information

CPSC 340: Machine Learning and Data Mining. Probabilistic Classification Fall 2017

CPSC 340: Machine Learning and Data Mining. Probabilistic Classification Fall 2017 CPSC 340: Machine Learning and Data Mining Probabilistic Classification Fall 2017 Admin Assignment 0 is due tonight: you should be almost done. 1 late day to hand it in Monday, 2 late days for Wednesday.

More information

ESTIMATING DENSITY DEPENDENCE, PROCESS NOISE, AND OBSERVATION ERROR

ESTIMATING DENSITY DEPENDENCE, PROCESS NOISE, AND OBSERVATION ERROR ESTIMATING DENSITY DEPENDENCE, PROCESS NOISE, AND OBSERVATION ERROR Coinvestigators: José Ponciano, University of Idaho Subhash Lele, University of Alberta Mark Taper, Montana State University David Staples,

More information

High-Performance Procedures in SAS 9.4: Comparing Performance of HP and Legacy Procedures

High-Performance Procedures in SAS 9.4: Comparing Performance of HP and Legacy Procedures Paper SD18 High-Performance Procedures in SAS 9.4: Comparing Performance of HP and Legacy Procedures Jessica Montgomery, Sean Joo, Anh Kellermann, Jeffrey D. Kromrey, Diep T. Nguyen, Patricia Rodriguez

More information

Simulation Calibration with Correlated Knowledge-Gradients

Simulation Calibration with Correlated Knowledge-Gradients Simulation Calibration with Correlated Knowledge-Gradients Peter Frazier Warren Powell Hugo Simão Operations Research & Information Engineering, Cornell University Operations Research & Financial Engineering,

More information

1 2 (3 + x 3) x 2 = 1 3 (3 + x 1 2x 3 ) 1. 3 ( 1 x 2) (3 + x(0) 3 ) = 1 2 (3 + 0) = 3. 2 (3 + x(0) 1 2x (0) ( ) = 1 ( 1 x(0) 2 ) = 1 3 ) = 1 3

1 2 (3 + x 3) x 2 = 1 3 (3 + x 1 2x 3 ) 1. 3 ( 1 x 2) (3 + x(0) 3 ) = 1 2 (3 + 0) = 3. 2 (3 + x(0) 1 2x (0) ( ) = 1 ( 1 x(0) 2 ) = 1 3 ) = 1 3 6 Iterative Solvers Lab Objective: Many real-world problems of the form Ax = b have tens of thousands of parameters Solving such systems with Gaussian elimination or matrix factorizations could require

More information

Assessing the Quality of the Natural Cubic Spline Approximation

Assessing the Quality of the Natural Cubic Spline Approximation Assessing the Quality of the Natural Cubic Spline Approximation AHMET SEZER ANADOLU UNIVERSITY Department of Statisticss Yunus Emre Kampusu Eskisehir TURKEY ahsst12@yahoo.com Abstract: In large samples,

More information

3 Nonlinear Regression

3 Nonlinear Regression CSC 4 / CSC D / CSC C 3 Sometimes linear models are not sufficient to capture the real-world phenomena, and thus nonlinear models are necessary. In regression, all such models will have the same basic

More information

Ludwig Fahrmeir Gerhard Tute. Statistical odelling Based on Generalized Linear Model. íecond Edition. . Springer

Ludwig Fahrmeir Gerhard Tute. Statistical odelling Based on Generalized Linear Model. íecond Edition. . Springer Ludwig Fahrmeir Gerhard Tute Statistical odelling Based on Generalized Linear Model íecond Edition. Springer Preface to the Second Edition Preface to the First Edition List of Examples List of Figures

More information

Use of Extreme Value Statistics in Modeling Biometric Systems

Use of Extreme Value Statistics in Modeling Biometric Systems Use of Extreme Value Statistics in Modeling Biometric Systems Similarity Scores Two types of matching: Genuine sample Imposter sample Matching scores Enrolled sample 0.95 0.32 Probability Density Decision

More information

Expectation Maximization (EM) and Gaussian Mixture Models

Expectation Maximization (EM) and Gaussian Mixture Models Expectation Maximization (EM) and Gaussian Mixture Models Reference: The Elements of Statistical Learning, by T. Hastie, R. Tibshirani, J. Friedman, Springer 1 2 3 4 5 6 7 8 Unsupervised Learning Motivation

More information

Generalized least squares (GLS) estimates of the level-2 coefficients,

Generalized least squares (GLS) estimates of the level-2 coefficients, Contents 1 Conceptual and Statistical Background for Two-Level Models...7 1.1 The general two-level model... 7 1.1.1 Level-1 model... 8 1.1.2 Level-2 model... 8 1.2 Parameter estimation... 9 1.3 Empirical

More information

3 Nonlinear Regression

3 Nonlinear Regression 3 Linear models are often insufficient to capture the real-world phenomena. That is, the relation between the inputs and the outputs we want to be able to predict are not linear. As a consequence, nonlinear

More information

Large-Scale Lasso and Elastic-Net Regularized Generalized Linear Models

Large-Scale Lasso and Elastic-Net Regularized Generalized Linear Models Large-Scale Lasso and Elastic-Net Regularized Generalized Linear Models DB Tsai Steven Hillion Outline Introduction Linear / Nonlinear Classification Feature Engineering - Polynomial Expansion Big-data

More information

A GENERAL GIBBS SAMPLING ALGORITHM FOR ANALYZING LINEAR MODELS USING THE SAS SYSTEM

A GENERAL GIBBS SAMPLING ALGORITHM FOR ANALYZING LINEAR MODELS USING THE SAS SYSTEM A GENERAL GIBBS SAMPLING ALGORITHM FOR ANALYZING LINEAR MODELS USING THE SAS SYSTEM Jayawant Mandrekar, Daniel J. Sargent, Paul J. Novotny, Jeff A. Sloan Mayo Clinic, Rochester, MN 55905 ABSTRACT A general

More information