Structural Equation Models with Small Samples: A Comparative Study of Four Approaches


University of Nebraska - Lincoln
DigitalCommons@University of Nebraska - Lincoln
Public Access Theses and Dissertations from the College of Education and Human Sciences, Education and Human Sciences, College of (CEHS)

Structural Equation Models with Small Samples: A Comparative Study of Four Approaches

Frances L. Chumney, University of Nebraska-Lincoln, franchumney@hotmail.com

Part of the Educational Psychology Commons

Chumney, Frances L., "Structural Equation Models with Small Samples: A Comparative Study of Four Approaches" (2013). Public Access Theses and Dissertations from the College of Education and Human Sciences.

This Article is brought to you for free and open access by the Education and Human Sciences, College of (CEHS) at DigitalCommons@University of Nebraska - Lincoln. It has been accepted for inclusion in Public Access Theses and Dissertations from the College of Education and Human Sciences by an authorized administrator of DigitalCommons@University of Nebraska - Lincoln.

STRUCTURAL EQUATION MODELS WITH SMALL SAMPLES: A COMPARATIVE STUDY OF FOUR APPROACHES

by

Frances L. Chumney

A DISSERTATION

Presented to the Faculty of The Graduate College at the University of Nebraska In Partial Fulfillment of Requirements For the Degree of Doctor of Philosophy

Major: Psychological Studies in Education

Under the Supervision of Professor James A. Bovaird

Lincoln, Nebraska

July 5, 2013

STRUCTURAL EQUATION MODELS WITH SMALL SAMPLES: A COMPARATIVE STUDY OF FOUR APPROACHES

Frances L. Chumney, Ph.D.
University of Nebraska, 2013

Adviser: James A. Bovaird

The purpose of this study was to evaluate the performance of four estimation methods (Maximum Likelihood, Partial Least Squares, Generalized Structured Component Analysis, Markov Chain Monte Carlo) when applied to structural equation models with small samples. Trends in educational and social science research require scientists to investigate increasingly complex phenomena with regard for the contextual factors which influence their occurrence and change. These additional layers of exploration lead to complex hypotheses and require advanced analytic approaches such as structural equation modeling. A mismatch exists between analytic technique and the realities of applied research. Structural equation modeling requires large samples in general and even larger samples for complex models; for applied researchers, large samples are often difficult and even impossible to obtain. The unique contribution of this study is the simultaneous evaluation of these four estimation methods to determine the analytic conditions under which each method might be of value to researchers. A simulation study with a factorial design was conducted. The design and data features of interest were sample size (50, 300, 1000), number of items per latent variable (3, 5, 7), degree of model misspecification (correctly specified model, misspecified model), nature of the relationships between items and latent variables in the measurement models (reflective,

formative), and the four estimation methods named. Rate of convergence, bias of goodness-of-fit estimates, bias of model parameter and standard error estimates, and accuracy of standard error estimates were evaluated to determine the ability of each estimation method to recover model estimates under each experimental condition. The results indicate that when applied to normally distributed data, Maximum Likelihood generally outperforms the other three estimation methods across experimental conditions. The present study used simulated data to evaluate the performance of four estimation methods when applied to relatively simple structural equation models with small samples and normally distributed data; future research will need to evaluate the performance of these methods with more complex models and data that are not normally distributed.

Acknowledgements

I am lucky to have had an amazing support system throughout my education. From teachers and faculty advisers, to friends and family, my journey has been filled with caring, creative individuals who inspired and supported me along the way. I would like to thank my graduate adviser, Dr. James Bovaird, for always acting in my best interests, but not hovering too closely. I would also like to thank the other members of my supervisory committee, Drs. Charles Ansorge, Jolene Smyth, and Greg Welch, for their support and flexibility in the time before, leading up to, and during my whirlwind dissertation. I cannot name everyone who has been there for me along the way, but I am grateful to fellow graduate students who kept me laughing through the final countdown. I would not have made it to the end of this journey without the people who shared my excitement, endured my anxiety, tolerated my neurotic text messages, and validated all the emotions I had along the way whether or not they were justified. For helping me battle those 99 luftballons, I offer my heartfelt appreciation to my sister, Cynthia Estep, and fellow graduate student Natalie Koziol. This most recent journey has been a long and winding adventure. My children, Amelia and Alex, made the crazy trip nothing short of magical. Thank you both for the endless support, giggles, cuddles, and maniacal laughter. You are amazing, unique people who are never afraid to show your true colors. Finally, I wish to thank my mother, Sandra Chumney, for everything she has done to help me reach this goal. Because of you, Mom, I always knew my children were with someone who loved them, and that made it possible to focus on other responsibilities. I appreciate you for always reminding me to slow down, but never trying to break my stride.

Table of Contents

CHAPTER I. INTRODUCTION
    Present Study
CHAPTER II. LITERATURE REVIEW
    Model Estimation
        Maximum Likelihood
        Partial Least Squares
        Generalized Structured Component Analysis
        Markov Chain Monte Carlo
    Simulation Research
    Present Study
    Purpose Statement
CHAPTER III. METHODS AND PROCEDURES
    Simulation Conditions
        Sample Size
        Number of Items
        Misspecification
        Latent Variable-Indicator Relationships
    Summary of Experimental Design
    Population Models
        Correct Specification, Reflective Indicators
        Correct Specification, Formative Indicators
        Misspecification, Reflective Indicators
        Misspecification, Formative Indicators
    Procedures
    Outcomes of Interest
        Convergence Rate
        Overall Model Fit
        Parameter Estimates and Standard Errors
    Analytic Approach
CHAPTER IV. RESULTS
    Analytic Procedure
    Results by Outcome
        Model Convergence
        Goodness of Fit

        Bias of Measurement Model Parameter Estimates
        Bias of Structural Model Parameter Estimates
        Mean Differences of Standard Error Estimates for Measurement Models
        Mean Differences of Standard Error Estimates for Structural Models
        Accuracy of Standard Error Estimates for Measurement Models
        Accuracy of Standard Error Estimates for Structural Models
    Summary
        Goodness of Fit
        Bias of Measurement Model Parameter Estimates
        Bias of Structural Model Parameter Estimates
        Mean Differences of Standard Error Estimates for Measurement Models
        Mean Differences of Standard Error Estimates for Structural Models
        Accuracy of Standard Error Estimates for Measurement Models
        Accuracy of Standard Error Estimates for Structural Models
CHAPTER V. DISCUSSION
    Research Questions
        Research Question 1
        Research Question 2
        Research Question 3
        Research Question 4
    General Discussion
        Covariance- vs. Component-Based Approaches
        Frequentist vs. Bayesian Approaches
    Limitations & Future Research
    Implications and Conclusions
References
APPENDIX A: RESULTS OF FIVE-FACTOR MULTIVARIATE ANALYSIS

Table of Tables

Table 1. Number of successfully converged replications by estimation method and experimental condition
Table 2. Mean Goodness of Fit bias by estimation method and experimental condition
Table 3. Mean bias of measurement model parameter estimates by estimation method and experimental condition
Table 4. Mean bias of structural model parameter estimates by estimation method and experimental condition
Table 5. Mean average differences for measurement model standard errors by estimation method and experimental condition
Table 6. Mean average differences for structural model standard error estimates by estimation method and experimental condition
Table 7. Mean accuracy of measurement model estimates by estimation method and experimental condition
Table 8. Mean accuracy of structural model estimates by estimation method and experimental condition
Table 9. Summary of top performing estimation methods per experimental condition
Table A.1. Results of multivariate tests for five-factor MANOVA
Table A.2. Tests of between-subjects effects for Goodness of Fit estimates
Table A.3. Tests of between-subjects effects for bias of measurement model parameter estimates
Table A.4. Tests of between-subjects effects for bias of structural model parameter estimates
Table A.5. Tests of between-subjects effects of MAD for measurement model standard error estimates
Table A.6. Tests of between-subjects effects of MAD for structural model standard error estimates
Table A.7. Tests of between-subjects effects for accuracy of measurement model estimates
Table A.8. Tests of between-subjects effects for accuracy of structural model estimates

Table of Figures

Figure 1. Population model for reflective indicators and correct model specification
Figure 2. Population model for formative indicators (reflective relationships with low reliability) and correct model specification
Figure 3. Population model for reflective indicators and model misspecification
Figure 4. Population model for formative indicators (reflective relationships with low reliability) and model misspecification
Figure 5. Analytic model for all conditions
Figure 6. Number of successfully converged replications by condition
Figure 7. Bias of Goodness of Fit Estimates
Figure 8. Bias of Measurement Model Parameter Estimates
Figure 9. Bias of Structural Model Parameter Estimates
Figure 10. MAD of Measurement Model Standard Error Estimates
Figure 11. MAD of Structural Model Standard Error Estimates
Figure 12. Accuracy of Measurement Model Estimates
Figure 13. Accuracy of Structural Model Estimates

CHAPTER I. INTRODUCTION

In response to increasing expectations from funding agencies, trends in educational research require scientists to investigate increasingly complex phenomena with regard for the contexts in which they occur. These additional layers of exploration and understanding lead to increasingly complex hypotheses and require advanced statistical techniques. Structural equation modeling (SEM) is a common analytic approach for dealing with complex systems of information. Despite their flexibility (Zhu, Walter, Rosenbaum, Russell, & Raina, 2006), traditional SEM methods require large samples in general, and even larger samples for estimating complex models. For applied researchers, large samples are often difficult and sometimes impossible to obtain. Consider, for example, a recent mail survey of elementary-level teachers whose purpose was to evaluate professional development experiences related to four specific areas of academic content and instructional decision-making (i.e., science, reading, math, data-based decision making; Glover, Nugent, Sheridan, Bovaird, & Chumney, 2013). In addition to the typical response rate challenges posed by mail surveys, this particular study was further limited in that fewer than half of all respondents had participated in professional development directly tied to one of the four areas of interest. One goal of the research was to evaluate differences in those professional development experiences between teachers serving at schools located in rural vs. nonrural geographic settings. It was necessary for the researchers to break down the sample of participants who had participated in an appropriately-focused professional development experience into smaller subgroups based on the content area focus of their

training and geographic locale. As a result, a typically satisfactory sample size quickly diminished. A second scenario addresses a context in which large samples are not possible regardless of the resources available to potentially increase sample size or target a specific population precisely. Educational policy makers are often interested in evaluating student academic performance within a single state for the purposes of allocating resources to public schools, comparing the quality of education across school districts/regions, and/or evaluating the performance of teachers and academic administrators. Despite having access to every child in every school district, such research often struggles with the issue of small samples because the population of students within districts, particularly rural districts, is often quite small. In the case of individual teacher evaluation, this sometimes means that data for only a handful of students can be collected. Situations such as these are not uncommon in fields such as education and the social sciences. Unfortunately, traditional SEM techniques are not equipped to handle these types of challenges. The most common estimation method used with SEM is maximum likelihood (ML; Hoyle, 2000). ML has been studied across myriad contexts and data conditions, and its limitations are well documented. One context in which ML does not perform well is in the presence of small samples (Kline, 2011). Due to this limitation, it is imperative that researchers investigate the utility of alternative approaches to recovering parameter estimates (e.g., partial least squares (PLS), generalized structured component analysis (GSCA), Markov Chain Monte Carlo (MCMC)). If the strengths and weaknesses of each

alternative method in the context of small sample research were more fully understood, researchers would be better equipped to make informed decisions with regard to selecting appropriate estimation methods and interpreting results. As the field of methodology has advanced, alternative estimation methods have been developed, including generalized least squares, weighted least squares, PLS, GSCA, and MCMC approaches. Unfortunately, the performance of these alternatives is not well understood, and their performance with real data is often difficult to predict (Henseler, 2012; Hwang, Ho, & Lee, 2010; Hwang, Malhotra, Kim, Tomiuk, & Hong, 2010). Although estimation methods other than those described here have been developed for use with SEMs when the assumptions of ML are violated (e.g., robust ML, weighted least squares), it is not feasible to compare and evaluate the performance of all such alternatives in a single study. Thus, the present study will focus solely on the differential performance of ML, PLS, GSCA, and MCMC methods because they represent diverse and promising approaches for addressing the problem of estimating SEMs with small samples. Approaches to SEM estimation may be described as covariance-based (e.g., ML) and component-based (e.g., PLS, GSCA), or as frequentist (e.g., ML, PLS, GSCA) and Bayesian (e.g., MCMC). Covariance-based approaches to SEM are designed for model evaluation and validation, while component-based approaches are intended for score computation and prediction (Tenenhaus, 2008). Simply put, the primary distinction between covariance- and component-based estimation is that the former is suited to model testing and the latter is better suited to explaining variance and making predictions

(Hulland, Ryan, & Rayner, 2010; Tenenhaus, 2008). Frequentist approaches identify parameter values represented by observed data (which may or may not consist of true values), while Bayesian approaches describe parameter estimates as abstract representations of relationships based on observed data. In addition to these differences of purpose and perspective, ML, PLS, GSCA, and MCMC also differ in their robustness to varying data conditions, including sample size, number of items, model misspecification, and type of indicator-latent variable relationship (i.e., reflective vs. formative measurement models). Inherent to traditional estimation methods (i.e., ML) is the expectation of large samples. Specifically, the parameter estimates produced by ML are based on asymptotic theory, which implies large samples (Tanaka, 1987). Therefore, as sample size decreases, methods such as ML do not perform as well (e.g., Lee & Song, 2004). Proponents of PLS and GSCA often promote them as performing well in instances of small samples (e.g., Chin & Newsted, 1999; Hulland et al., 2010; Hwang, Ho, et al., 2010; Hwang & Takane, 2004), but both methods have been found to perform inconsistently at times (e.g., Henseler, 2012; Hwang, Ho, et al., 2010; Hwang, Malhotra, et al., 2010), which indicates that more work is needed to understand the interactions between sample size and other design features. Similarly, MCMC implemented as an estimation method within the framework of Bayesian analysis is often viewed as a viable alternative to ML because its sampling procedures make estimation with small samples more feasible, but this approach also does not perform consistently across all combinations of models and sample sizes (e.g., Lee & Song, 2004).

Just as the performance of estimation methods is expected to improve with increased sample size, estimation methods are expected to produce more reliable parameter estimates as the number of items per latent factor increases (e.g., Boomsma, 1982; Velicer & Fava, 1998). As illustrated by Marsh, Hau, Balla, and Grayson (1998), however, increasing the number of items does not necessarily improve the ability of an estimation method to recover parameter estimates. The relationship between the quality of parameter estimates and the number of items per latent variable has not been studied at length in the context of PLS or GSCA. In both substantive and methodological research endeavors that utilize SEM, inferences and conclusions are the result of the model used. Although it is difficult to know whether or not theoretical models are specified correctly in applied research, simulation-based research has illustrated the impact of misspecification on parameter recovery across estimation methods (e.g., Asparouhov & Muthén, 2010; Hwang, Malhotra, et al., 2010). The extent to which estimates are impacted by the misspecification of the model depends on design features such as sample size (e.g., Henseler, 2010; Tanaka, 1987) and the overall complexity of the model (e.g., Tanaka, 1987). Whether the relationships between observed variables and latent constructs are formative or reflective in nature is as important to methodological study as it is to theory-driven, applied research. In the context of SEM, latent variables can be modeled as the cause of the observed values (reflective; Bollen & Lennox, 1991), or as a representation of the combined values of those observed values (formative; Curtis & Jackson, 1962). SEMs should be specified to reflect the correct theoretical relationships,

but estimation methods sometimes vary in their performance depending on the type of relationship specified. Until recent years, it was held that SEMs including formative measurement models were inappropriate for traditional ML approaches altogether (Chin, 1998; Ringle, Götz, Wetzels, & Wilson, 2009). More recently, it has been found that ML is likely to overestimate parameters in formative measurement models and underestimate parameters in reflective models (Ringle et al.) when the sample is not large. In contrast to ML, Ringle et al. found that PLS is likely to underestimate parameters in formative models and overestimate parameters in reflective models. The flexibility of GSCA to handle either reflective or formative items has been documented, but the claim is generally based on theoretically-driven expectations of the method without the benefit of empirical evidence (e.g., Hwang & Takane, 2004). Although some work exists comparing ML to MCMC in a Bayesian framework (e.g., Browne & Draper, 2006) and PLS to GSCA (e.g., Tenenhaus, 2008), the four methods have only been compared once. Chumney (2012) investigated the application of PLS, GSCA, and MCMC to a substantive data set for the purpose of validating the parameter estimates recovered using ML with a small sample and multiple groups. Few consistent patterns of relative bias (i.e., a single estimation method consistently overestimating or underestimating path coefficients, relative to those recovered by ML) of parameter estimates emerged when PLS, GSCA, and MCMC were compared to the ML results. This work identified a gap in the existing literature, as no explanation for the varying performance of the methods was identified. Further, because these data were part of an applied research project, the true population parameters for the specified model

were unknown, and any attempt at explaining the inconsistencies in the performance of the four estimation methods based on those findings would constitute nothing more than conjecture. This is but one example of the extent to which PLS, GSCA, and MCMC approaches are not understood, as researchers are sometimes unable to correctly predict the performance of these methods even in the context of simulation research. For the purpose of contributing to the current understanding of these methods, this study will constitute a systematic evaluation of ML, PLS, GSCA, and MCMC under varying data conditions common to applied research.

Present Study

The present study is a first attempt to compare the relative performance of ML, PLS, GSCA, and MCMC simultaneously under sub-ideal data conditions. Researchers have previously compared different combinations of these approaches under some data conditions, but this is the first known attempt to examine the four methods in a single study. The overarching goal of this study is to understand the effects of sample size, number of items per latent variable, model misspecification, and the nature of the latent variable-indicator relationships on the performance of ML, PLS, GSCA, and MCMC. To guide the process by which this goal will be reached, four specific research questions are posed:

1. To what extent does sample size affect the relative ability of the estimation methods to estimate global model fit and accurately recover model parameters (i.e., item loadings for the measurement model and regression coefficients in the structural model) and their standard errors? It is hypothesized that ML, PLS, and

MCMC will perform better with the larger sample size, regardless of whether the model is correctly specified. It is further hypothesized that ML will produce more biased parameter estimates and less efficient standard error estimates as sample size decreases, compared to PLS, GSCA, and MCMC.

2. To what extent does the number of items per latent variable affect the relative ability of the estimation methods to estimate global model fit and accurately recover model parameters and their standard errors? It is hypothesized that all four estimation methods will perform better with fewer items per latent variable.

3. To what extent does model misspecification (i.e., exclusion of cross-loadings that exist in the population model) affect the relative ability of the estimation methods to estimate global model fit and accurately recover model parameters and their standard errors? It is hypothesized that GSCA will produce more efficient estimates of standard errors than ML or PLS when the model is misspecified. It is also hypothesized that PLS will perform better under conditions of correct specification compared to misspecification.

4. To what extent does the nature of the latent variable-indicator relationship affect the relative ability of the estimation methods to estimate global model fit and accurately recover model parameters and their standard errors?

CHAPTER II. LITERATURE REVIEW

Structural equation modeling is a method for examining a set of relationships and assigning a quantitative value to each based on the covariances among the variables. These quantitative values, referred to as parameter estimates, are numeric approximations of the strength and direction of inter-variable relationships that might be observed in the population (Bollen, 1989; Kline, 2011). A common approach across myriad disciplines (e.g., education, psychology, sociology, economics, marketing research; Monecke & Leisch, 2012), SEM is essentially the concurrent calculation of multiple regression coefficients for a system in which predictor and criterion variables are expected to be interrelated in potentially complex ways (e.g., some variables are both criterion and predictor variables, some criterion variables have multiple predictors, etc.; Bollen, 1989; Haenlein & Kaplan, 2004; Kline, 2011). SEM has the goal of identifying a single set of parameter estimates (i.e., path coefficients, error terms, etc.) that minimizes the total difference between the covariances implied by the model and those observed in the population. SEM generally consists of one or more measurement models and a structural model (Bollen, 1989; Kline, 2011). The measurement model (sometimes referred to as the outer model; Ringle et al., 2009) connects each latent variable to the observed variables with which it is associated, thereby specifying the synthesis of multiple variables into composite (and sometimes latent) variables. The structural model (also known as the inner model; Ringle et al., 2009) connects the composite (latent) variables within a model to each other. A computational procedure, often referred to as an estimation method, is necessary to

estimate the values of the parameters that describe those relationships. In the SEM context, both the predictor and outcome variables may be latent or observed (Lee & Xia, 2008).

Model Estimation

The process of specifying a model for a given data set and obtaining estimates of the parameter values is called model estimation. Simply put, an estimation method is the method used to reach a set of estimates for a model, an estimator is a particular statistic of interest used to approximate a population parameter (e.g., mean, standard error, path coefficient), and an estimate is the actual value produced for an estimator by the given method of estimation (Kline, 2011). Several estimation methods and variations of those methods have been developed and applied to SEMs, including maximum likelihood (ML), ML with robust standard errors (MLR; Muthén & Muthén), generalized least squares (GLS), and weighted least squares (WLS). However, all of these methods are known to perform poorly under some conditions. Specifically, ML and WLS typically fail to produce accurate parameter estimates when applied to small samples (e.g., Hoogland & Boomsma, 1998; Hu, Bentler, & Kano, 1992; Olsson, Foss, Troye, & Howell, 2000); the more precise estimates produced by MLR are generally restricted to estimates of standard errors rather than path coefficients; and GLS is generally insensitive to model misspecification, which leads to overly confident fit statistics (i.e., inflated Type I error; Olsson, Troye, & Howell, 1999). In response to the limitations of these and other similar estimation methods, additional estimation approaches have been applied to the estimation

of SEMs, including partial least squares (PLS; Wold, 1975), generalized structured component analysis (GSCA; Hwang & Takane, 2004; Kline, 2011), and Markov Chain Monte Carlo (MCMC; Hastings, 1970). These three estimation methods and ML are the focus of the present study.

Maximum Likelihood

ML is an estimation method that attempts to minimize the differences between observed data and an imposed model, thereby maximizing the likelihood that the observed data come from a population consistent with the implied model (Kline, 2011). ML is a full-information method that uses an iterative process to obtain the best possible estimates before reaching the convergence criterion. In this context, the best possible estimates are those that lead to minimal (or no) differences between estimates produced by subsequent iterations, thereby optimizing the fit function. The fit function of an estimation method is the statistical criterion the method aims to minimize; in ML, the fit function is the difference in covariance structures between the observed data and the population data specified by the model being estimated. The ML fit function is represented as

F_ML = log|Σ(θ)| + tr(S Σ^-1(θ)) - log|S| - (p + q)     (1)

where Σ(θ) is the model-implied covariance structure, θ are the estimated parameters, tr is the trace of a matrix, S is the covariance matrix observed in the data, Σ^-1(θ) is the inverse of the model-implied covariance matrix, p is the number of observed indicators for the endogenous latent factors, and q is the number of observed indicators for the exogenous latent factors (Bollen, 1989).
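To make the fit function concrete, the following minimal sketch computes equation (1) with NumPy. It is an illustration added here, not code from the dissertation; the function name and the toy matrices are assumptions.

    import numpy as np

    def ml_fit_function(S, sigma_theta, p, q):
        # F_ML = log|Sigma(theta)| + tr(S Sigma(theta)^-1) - log|S| - (p + q)
        _, logdet_sigma = np.linalg.slogdet(sigma_theta)  # stable log-determinant
        _, logdet_s = np.linalg.slogdet(S)
        trace_term = np.trace(S @ np.linalg.inv(sigma_theta))
        return logdet_sigma + trace_term - logdet_s - (p + q)

    # Toy check: if the model-implied matrix reproduces S exactly, the trace
    # term equals p + q and the log-determinants cancel, so F_ML is 0.
    S = np.array([[1.0, 0.3], [0.3, 1.0]])
    print(ml_fit_function(S, S.copy(), p=1, q=1))  # ~0.0

The fit value grows as the model-implied covariance matrix departs from the observed one, which is why minimizing F_ML amounts to maximizing the likelihood under multivariate normality.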

ML is one of the most common and widely used methods for estimating SEMs, is available within SEM software, and yields accurate parameter estimates when used correctly (Kline, 2011). Other advantages of ML are that it is scale free (standardized parameter estimates will not change when a variable is transformed linearly) and scale invariant (the fit function is independent of the scale of response data). Inherent to the use of ML are its assumptions, which include multivariate normality, complete data, and large samples (Bollen, 1989; Kline, 2011). ML is typically the preferred method of estimation within the SEM context because it yields unbiased, consistent, and efficient parameter estimators when its assumptions are satisfied (Bollen, 1989). Despite the availability of literature addressing the importance of meeting these assumptions, the consequences of violating them are not fully understood by all researchers who utilize the method. Thus, ML is often applied in situations where these assumptions are violated, and the result can be biased (i.e., consistently overestimated or underestimated) parameter estimates and standard errors, even when the model is correctly specified (Gerbing & Anderson, 1985; Hwang et al., 2010). On the one hand, ML is a powerful tool when used correctly, and some research has shown that it is robust to some violations of its assumptions (e.g., Babakus, Ferguson, & Jöreskog, 1987; Maas & Hox, 2004). On the other hand, the fairly stringent assumptions imposed by ML often make it an inappropriate estimation method when used in the context of real-world data characterized by small samples, unknown population models, and other sub-ideal conditions. Specifically, ML relies on asymptotic theory, which implies large samples and assumes correct model specification,

independent observations, independent exogenous variables (i.e., values obtained for exogenous variables are independent), and that the conditional distribution of scores for endogenous variables in the population is multivariate normal (Kline, 2011). Speaking generally, a small sample is problematic in the context of ML because the estimates and fit tests it produces are not asymptotically true (Lee & Song, 2004). This means that without large samples, the validity of statistical inferences may be rightly questioned. ML is known to be robust to minor violations of its assumptions, but the extent of that robustness varies with the data and model.

Partial Least Squares

PLS is a component- (variance-) based approach to modeling developed by Wold (1975) as an alternative to covariance-based estimation methods. Compared to traditional approaches to SEM (i.e., ML), PLS is a more flexible approach that aims to maximize the amount of variance in the dependent variables that is explained by the independent variables (Haenlein & Kaplan, 2004; Wold, 1975). PLS is particularly well suited for small samples (Chin & Newsted, 1999; Haenlein & Kaplan, 2004; Hulland et al., 2010), instances in which large numbers of indicators are used to measure latent constructs (Chin & Newsted, 1999; Haenlein & Kaplan, 2004), cases in which formative indicators serve as the primary source of direct measurement (Fornell & Bookstein, 1982; MacCallum & Browne, 1993), situations in which data are characterized by skewed distributions (Bagozzi & Yi, 1994), and structural model misspecification (Cassel, Hackl, & Westlund, 1999).

Whereas covariance-based approaches to SEM estimate model parameters first, PLS first estimates the latent variable values as the product of linear combinations of indicators (Haenlein & Kaplan, 2004). Another important distinction between ML and PLS in the context of applied research is that ML is likely to produce estimates that are more statistically accurate, but PLS estimates are often more accurate in the prediction of future values (Vinzi, Trinchera, & Amato, 2010). Both listwise deletion and mean imputation are viable options for handling missing data in most PLS software packages (Temme, Kreis, & Hildebrandt, 2006; Tenenhaus, Vinzi, Chatelin, & Lauro, 2005). PLS estimates are obtained as the result of an iterative five-step process (Henseler, 2010; Tenenhaus, 2008) during which subparts of the overall model are estimated sequentially. It is the simplicity of this approach of sequential regression analyses that allows PLS to be used with small samples; because parameters are estimated individually or in blocks, the complexities of the model are not taken into account simultaneously, so larger samples are not necessary (e.g., Reinartz, Haenlein, & Henseler, 2009). The five steps of the PLS process, during which both the measurement (outer) and structural (inner) model parameter values are estimated, are completed as follows:

Step 1: Each latent variable is grouped with its indicators to create blocks of variables and relationships.

Step 2: Outer approximations of the latent variable scores are calculated as linear combinations of the indicators associated with each latent variable,

η = w_1 x_1 + w_2 x_2 + ... + w_p x_p     (2)

where η is a latent variable, x_1 through x_p are the manifest variables associated with that latent variable (regardless of whether the model specifies this portion of measurement to be reflective or formative), and w_1 through w_p are the weights assigned to those indicators.

Step 3: Inner weights (w) are calculated to reflect how strongly a latent variable relates to the other latent variables in the model; three methods are available for the estimation of inner weights: centroid, factor weighting, and path weighting (Henseler, 2010; Monecke & Leisch, 2012; Tenenhaus, 2008). The centroid method estimates the inner weights based on the signs of the correlations between a latent variable and its adjacent latent variables. The factor weighting method estimates the inner weights based on combinations of correlations between a latent variable and its adjacent latent variables. The path weighting method estimates inner weights based on the directions of the arrows linking latent variables in the model.

Step 4: Inner approximations of latent variable scores are calculated as linear combinations of the outer approximations of the latent variable scores (values obtained in step 2).

Step 5: Estimates of the outer weights are calculated based on the relationships between each latent variable and its indicators. In the case of reflective indicators, outer weights are calculated as the covariances between the indicators and the inner approximations of latent variable scores obtained in step 4 (this method is known as Mode A). In the case of formative indicators,

outer weights are calculated as a function of the regression weights obtained from OLS regressions of the inner approximations of latent variable scores (step 4) on the indicators associated with the latent variable (Mode B).

Steps 2-5 are iterated until the change in the outer weight estimates meets a change criterion, at which time step 2 is repeated, latent variable scores for all latent variables are obtained, and individual case values are calculated as

v = Σ_j w_j η_j + Σ_k w_k ξ_k     (3)

where the w_1 through w_p are weights obtained during step 3, η are latent endogenous variable estimates (step 4), and ξ are latent exogenous variable estimates (step 4). A minimal code sketch of this iterative procedure is given below.

PLS is often viewed as more appropriate for exploratory work than for confirmatory modeling, as its resulting coefficients are generally consistent but biased compared to other estimation methods (Cassel et al., 1999; Lohmöller, 1989). Specifically, in applications to data characterized by both a small sample and a small number of indicators per latent variable, Dijkstra (1983) reported that PLS underestimated the correlations between latent variables (the structural model) and overestimated factor loadings (the measurement model). The primary advantage of PLS over covariance-based estimation methods such as ML is that it relies on ordinary least squares (OLS) regression to obtain parameter estimates (Jöreskog & Wold, 1982; Wold, 1982) and bootstrap resampling to create standard errors (Monecke & Leisch, 2012), thus relieving the challenge of strong distributional assumptions (Bagozzi & Yi, 1994; Fornell & Bookstein, 1982; Hwang & Takane, 2004; Wold, 1982).
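As a concrete illustration of steps 1 through 5, the sketch below implements a bare-bones version of the iterative PLS algorithm with Mode A outer estimation and the centroid inner-weighting scheme. It is a simplified sketch under stated assumptions, not a reference implementation: the function name, the blocks index structure, and the adjacency matrix are introduced here for illustration, and real packages add Mode B estimation, convergence safeguards, and bootstrap standard errors.

    import numpy as np

    def pls_scores(X, blocks, adjacency, tol=1e-6, max_iter=300):
        # blocks: list of column-index sequences, one per latent variable
        # adjacency: J x J 0/1 matrix marking connected latent variables
        #            (every latent variable must have at least one neighbor)
        X = (X - X.mean(0)) / X.std(0)                # PLS assumes standardized data
        n = X.shape[0]
        W = [np.ones(len(cols)) for cols in blocks]   # Step 1: unit start weights
        for _ in range(max_iter):
            # Step 2: outer approximation of latent variable scores
            lv = np.column_stack([X[:, cols] @ w for cols, w in zip(blocks, W)])
            lv = lv / lv.std(0)
            # Step 3: centroid inner weights = signs of correlations between
            # adjacent latent variables
            inner = np.sign(np.corrcoef(lv, rowvar=False)) * adjacency
            # Step 4: inner approximation = weighted sums of adjacent scores
            inner_lv = lv @ inner
            inner_lv = inner_lv / inner_lv.std(0)
            # Step 5 (Mode A): new outer weights = covariances between each
            # indicator and the inner approximation of its latent variable
            W_new = []
            for j, cols in enumerate(blocks):
                w = X[:, cols].T @ inner_lv[:, j] / n
                W_new.append(w / np.linalg.norm(w))   # rescale for stability
            change = max(np.abs(w1 - w0).max() for w0, w1 in zip(W, W_new))
            W = W_new
            if change < tol:
                break
        return lv   # final latent variable scores, one column per block

With three latent variables, for example, blocks might be [range(0, 3), range(3, 6), range(6, 9)] and adjacency a 3 x 3 symmetric 0/1 matrix with a zero diagonal.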

PLS is especially flexible, as it can be applied to all data regardless of measurement scale (Haenlein & Kaplan, 2004). Cassel et al. (1999) demonstrated the robustness of PLS to models that include skewed or multicollinear indicators and some minor structural model misspecification. An additional advantage of PLS is that it is not known to converge to improper solutions (Fornell & Bookstein, 1982; Hanafi, 2007). The primary disadvantage of PLS is that it does not work toward the minimization of a global optimization criterion (i.e., a fit function; McDonald, 1996), and because of this, there is no meaningful way to define how PLS models are optimized. Thus, an overall goodness of fit statistic is not available for PLS models, which makes it difficult to evaluate the performance of this estimation method (Hwang & Takane, 2004; McDonald, 1996). Tenenhaus et al. (2005) proposed a method for evaluating PLS model fit based on the communality of the measurement model estimates and the redundancy of the estimates of the structural model (discussed later). A modified approach to communality and redundancy has also been developed (presented in Tenenhaus et al., 2005), but it is beyond the scope of this paper. An alternative (and much more common) method for evaluating the performance of PLS has been to focus on the recovery of regression coefficients within the structural model (e.g., Vinzi et al., 2010). Despite its minimal distributional assumptions and its continued development to handle more complex modeling issues in recent years, PLS is not understood well enough for researchers to correctly and consistently predict its performance. For instance, Hwang, Malhotra, et al. (2010) reported that PLS produces more accurate standard error estimates than ML under conditions of model misspecification, but that ML outperforms PLS in

this regard when the model is correctly specified. Hwang et al. reported that PLS performed as well as GSCA, but only when the model was specified incorrectly to exclude cross-loadings; when the model was specified correctly and included cross-loadings, PLS did not perform as well as either ML or GSCA. However, under conditions of correct model specification, PLS produced unbiased estimates of the standard errors associated with the parameters of the measurement model, but the standard error estimates for the structural model were found to be biased. These are important findings, as they violated the researchers' expectations and demonstrated the need for additional work using PLS so that the contexts in which it performs reliably might be better understood.

Generalized Structured Component Analysis

GSCA was developed as an alternative to covariance-based methods for SEM and in response to the primary disadvantage of PLS. Specifically, GSCA is a component-based estimation method that was developed in such a way that an overall measure of model fit is available (Hwang & Takane, 2004). The general estimation process for GSCA is the same as PLS, except that GSCA utilizes a fit function which aims to maximize the average amount of explained variance for linear composites of latent variables (Henseler, 2012) and estimates the measurement and structural models simultaneously. Despite its relative newness to the field (introduced in 2004), GSCA has been extended to accommodate higher-order components (Hwang & Takane, 2004), fuzzy clustering (Hwang, DeSarbo, & Takane, 2007), and multicollinearity (regularized model; Hwang, 2009).

The GSCA approach is made up of a method for specifying models, an optimization criterion, and an algorithm used to calculate parameter estimates (Henseler, 2012). GSCA combines the values of the observed variables to form linear composites under the assumption that the observed data have been standardized (Hwang & Takane, 2004). Latent variables are calculated as

η_i = W z_i     (4)

where η_i is a vector of latent variables for respondent i, W is a matrix of component weights associated with the observed variables, and z_i is a vector of responses for respondent i. These composites are further defined in terms of the relationships between the observed variables and the latent variables. When the model includes formative constructs, GSCA assumes no measurement error in the observed data and the observed values are simply combined in a linear fashion. In the case of reflective constructs, each observed variable is transformed into its own composite, which includes the unit weight. The GSCA measurement model is calculated as

z_i = C η_i + ε_i     (5)

where C is a loading matrix for the relationships between the latent and observed variables, and ε_i is a vector of residuals associated with respondent i's observed variable responses. The GSCA structural model is calculated as

η_i = B η_i + ξ_i     (6)

where B is a matrix of path coefficients describing the relationships between the latent variables, and ξ_i is a vector of residuals associated with respondent i's latent variable scores.

The algorithm at work in GSCA is an alternating least squares (ALS; de Leeuw, Young, & Takane, 1976) approach that involves an iterative process by which A (a matrix of the relationships between component loadings and their observed variables) is updated for fixed points V and W (matrices of component weights for the endogenous and exogenous variables, respectively), and then V and W are updated for fixed point A. The optimization criterion of GSCA attempts to minimize the sum of squares of all residuals (Hwang et al., 2010); the fit function can be specified as

φ = SS(ZV - ZWA) = tr[(ZV - ZWA)'(ZV - ZWA)]     (7)

where S is the observed correlation matrix, Z is the standardized data matrix (number of observations × number of observed variables, so that S is the correlation matrix of Z), W is a matrix of measurement weights, and A is a matrix of component loadings and path coefficients (Henseler, 2012). The advantages of GSCA are similar to those of PLS, in that it is not known to converge to improper solutions, produces unique component score estimates, is not burdened by strict distributional assumptions (Henseler, 2012; Hwang & Takane, 2004), and outperforms ML when models are misspecified (Hwang, Malhotra, et al., 2010). Like PLS, GSCA utilizes bootstrap resampling to estimate the standard errors of parameter estimates. An additional advantage of GSCA is that it appears to perform well when applied to both large and small samples (Hwang, Malhotra, et al., 2010; Hwang & Takane, 2004). Compared to PLS, GSCA has the further advantage of being able to estimate multiple group models with equality constraints across groups (Hwang & Takane, 2004).
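As a small numerical companion to equation (7) and the ALS procedure just described, the sketch below computes the GSCA least-squares criterion and the closed-form update of A for fixed V and W (an ordinary least squares regression of ZV on ZW). The function names and matrix arguments are illustrative assumptions, not the dissertation's code.

    import numpy as np

    def gsca_criterion(Z, V, W, A):
        # phi = SS(ZV - ZWA) = tr[(ZV - ZWA)'(ZV - ZWA)]
        residual = Z @ V - Z @ W @ A
        return float(np.trace(residual.T @ residual))  # = (residual ** 2).sum()

    def update_A(Z, V, W):
        # For fixed V and W, the loading/path matrix A that minimizes the
        # criterion is the OLS solution of regressing ZV on ZW.
        A, *_ = np.linalg.lstsq(Z @ W, Z @ V, rcond=None)
        return A

The other half of each ALS cycle, updating the weights in V and W for fixed A, is handled one weight vector at a time in practice; alternating the two updates drives the criterion downward until it converges.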

A noteworthy disadvantage of GSCA is that, despite its positive performance under conditions of model misspecification, GSCA is sometimes outperformed by ML when the model is correctly specified, even when the sample is small (e.g., Henseler, 2010). The primary disadvantage of GSCA is that, as a relatively new estimation method, extensive research on its flexibility has not been conducted. For instance, a method for applying GSCA to models that include interactions among latent variables was only introduced in 2010 (Hwang, Malhotra, et al., 2010), and an application of GSCA to (fuzzy) clustered response sets was introduced in 2007 (Hwang et al., 2007), but neither application has been examined comprehensively. GSCA is a compromise between principal components analysis and ordinary least squares regression. Like PLS, GSCA utilizes a component-based approach to SEM estimation (Tenenhaus, 2008), but in GSCA, the components used for analysis are linear combinations of the model's observed variables. Compared to both ML and PLS, GSCA has been found to be more robust to model misspecification, and to produce more precise estimates of standard errors regardless of whether the model is correctly specified (Hwang, Malhotra, et al., 2010). Because it does not impose distributional assumptions, GSCA is often touted as a viable alternative to ML estimation with small samples. This is supported by Hwang, Malhotra, et al., who reported that GSCA provided more accurate standard error estimates than either ML or PLS regardless of sample size. However, much about GSCA and its performance under varying data conditions, including small samples, remains unknown (e.g., Henseler, 2012; Hwang, Malhotra, et al., 2010).

Markov Chain Monte Carlo

A Markov chain is a series (or chain) of samplings from a distribution for which the probability of each successive sample depends on previously sampled values only through the most recent value (Carlin & Chib, 1995; Geyer, 1992; Lynch, 2007). This iterative process can be expressed as

p(θ^(i+1) | θ^(i), ..., θ^(0)) = p(θ^(i+1) | θ^(i))

where θ^(i) (i = 0, ..., N) denotes the value sampled at iteration i. MCMC is the use of Markov chain sampling as the method for estimating parameter values (Geyer, 1992; Lynch, 2007). In the context of SEM, the MCMC algorithm can be applied to either frequentist or Bayesian approaches (Cowles & Carlin, 1996). For the purposes of the present study, MCMC is discussed here only as it functions within the context of Bayesian model estimation. Bayesian estimation differs from frequentist approaches (e.g., ML, PLS, GSCA) with regard to what it is that is being estimated. Whereas frequentist estimation methods view parameters as constants and work to identify the estimates for those parameters that produce the best model-data fit, Bayesian methods view parameters as random variables and work to combine the likelihood of the data with prior distributions to form posterior distributions from which to draw plausible values for the parameter estimates (Muthén, 2010; Muthén & Asparouhov, 2011). In other words, the frequentist perspective holds that true population parameters exist but can only be determined through data, and the Bayesian perspective posits that population parameters are abstract explanations of the relationships between data.

The basic process of the Bayesian approach to model estimation is to create a prior distribution of possible values from which to sample a value, combine that sample with the likelihood of the value given the data to create a posterior distribution, and then use the posterior distribution to update the prior distribution (Lynch, 2007; Muthén, 2010). This iterative process is completed for each parameter being estimated. The Bayesian approach is based on Bayes' Theorem, which can be represented as

P(A | B) = P(B | A) P(A) / P(B)     (8)

where A and B are events whose joint probability can be expressed as a function of the conditional and marginal probabilities,

P(A ∩ B) = P(A | B) P(B) = P(B | A) P(A)     (9)

The initial prior distribution can be created using either informative or noninformative values. In the instance of informative priors, the researchers' expected values for the parameter estimates (based on theory or past research) are used as the basis for the prior distributions (Lynch, 2007). The posterior distribution, then, is dependent on these starting values. Using informative priors can be advantageous, as they can reduce the amount of time required for the model to converge and result in more accurate estimates (Lee & Song, 2004), as such estimates are expected to be closer to the final answer than a random start value. In the instance of noninformative priors, the researcher may have little or no basis for determining expected values for the parameter estimates. In such cases, values in the prior distribution may be left equal to zero or chosen such that the prior distribution reflects a uniform distribution. In this case, the prior distribution has little impact on the posterior distribution (Lynch, 2007). A prior

distribution with noninformative priors is sometimes referred to as a beta distribution, and has the probability density function

p(K | α, β) = [Γ(α + β) / (Γ(α) Γ(β))] K^(α-1) (1 - K)^(β-1)     (10)

where K is the proportion of events which occur to maximize the probability of attaining a given outcome, α and β represent prior values, and K, α, and β are random variables (Lynch, 2007). Regardless of the amount of information used to create a prior distribution, the posterior distribution takes the form

p(θ | x) = p(x | θ) p(θ) / p(x)     (11)

where p(θ | x) is the posterior distribution, p(x | θ) is the likelihood of the data given the parameters, p(θ) is the prior distribution, and p(x) is the marginal probability of the observed data. Each parameter estimate obtained via this approach is then a summary of the posterior distribution, typically in the form of its mean, median, or mode (Muthén & Asparouhov, 2011). In the case of MCMC, the Bayesian estimate represents the mean of the posterior distribution (Lee & Song, 2004). Regardless of whether informative or noninformative priors are specified by the researchers, MCMC attempts to work from starting values more appropriate to the data than random values. To do this, an initial portion of the draws in each MCMC chain is discarded and the values at the end of that portion of the chain are used as starting values for obtaining estimates. This process is known as the burn-in phase, and it can be lengthened to improve starting values (e.g., Meyn & Tweedie, 1993). An advantage of MCMC estimation over ML is that the Markov chain sampling approach does not rely on the assumptions of asymptotic theory, which means that a large sample size is not necessary for drawing valid statistical inferences (Lee & Song, 2004).
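To illustrate the machinery described above (a prior, a likelihood, posterior sampling, and a burn-in phase), the sketch below runs a random-walk Metropolis sampler, one simple MCMC algorithm, for the mean of a small normally distributed sample. It is an illustrative sketch under stated assumptions, not the estimator used in this study: the data, the prior settings, and the proposal scale are all hypothetical choices.

    import numpy as np

    rng = np.random.default_rng(seed=1)
    data = rng.normal(loc=2.0, scale=1.0, size=30)      # small sample, n = 30

    def log_posterior(theta, x, prior_mean=0.0, prior_sd=10.0):
        # log p(theta | x) up to a constant: log-likelihood + log-prior,
        # with the residual standard deviation fixed at 1 for simplicity
        log_lik = -0.5 * np.sum((x - theta) ** 2)
        log_prior = -0.5 * ((theta - prior_mean) / prior_sd) ** 2
        return log_lik + log_prior

    n_draws, burn_in = 10_000, 2_000
    chain = np.empty(n_draws)
    theta = 0.0                                         # arbitrary starting value
    for i in range(n_draws):
        proposal = theta + rng.normal(scale=0.5)        # random-walk proposal
        log_ratio = log_posterior(proposal, data) - log_posterior(theta, data)
        if np.log(rng.uniform()) < log_ratio:           # Metropolis acceptance step
            theta = proposal
        chain[i] = theta

    # Discard the burn-in draws; the Bayesian point estimate is the mean of
    # the remaining posterior draws, as described in the text.
    print(round(chain[burn_in:].mean(), 3))             # close to the sample mean

With a diffuse prior such as this one, the posterior mean lands near the sample mean; tightening the prior around an informative value pulls the estimate toward it, which is the mechanism by which informative priors help in small samples.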


Theoretical Concepts of Machine Learning Theoretical Concepts of Machine Learning Part 2 Institute of Bioinformatics Johannes Kepler University, Linz, Austria Outline 1 Introduction 2 Generalization Error 3 Maximum Likelihood 4 Noise Models 5

More information

Supplementary Notes on Multiple Imputation. Stephen du Toit and Gerhard Mels Scientific Software International

Supplementary Notes on Multiple Imputation. Stephen du Toit and Gerhard Mels Scientific Software International Supplementary Notes on Multiple Imputation. Stephen du Toit and Gerhard Mels Scientific Software International Part A: Comparison with FIML in the case of normal data. Stephen du Toit Multivariate data

More information

Issues in MCMC use for Bayesian model fitting. Practical Considerations for WinBUGS Users

Issues in MCMC use for Bayesian model fitting. Practical Considerations for WinBUGS Users Practical Considerations for WinBUGS Users Kate Cowles, Ph.D. Department of Statistics and Actuarial Science University of Iowa 22S:138 Lecture 12 Oct. 3, 2003 Issues in MCMC use for Bayesian model fitting

More information

Regression. Dr. G. Bharadwaja Kumar VIT Chennai

Regression. Dr. G. Bharadwaja Kumar VIT Chennai Regression Dr. G. Bharadwaja Kumar VIT Chennai Introduction Statistical models normally specify how one set of variables, called dependent variables, functionally depend on another set of variables, called

More information

Statistical Models for Management. Instituto Superior de Ciências do Trabalho e da Empresa (ISCTE) Lisbon. February 24 26, 2010

Statistical Models for Management. Instituto Superior de Ciências do Trabalho e da Empresa (ISCTE) Lisbon. February 24 26, 2010 Statistical Models for Management Instituto Superior de Ciências do Trabalho e da Empresa (ISCTE) Lisbon February 24 26, 2010 Graeme Hutcheson, University of Manchester Principal Component and Factor Analysis

More information

Chapter 7: Dual Modeling in the Presence of Constant Variance

Chapter 7: Dual Modeling in the Presence of Constant Variance Chapter 7: Dual Modeling in the Presence of Constant Variance 7.A Introduction An underlying premise of regression analysis is that a given response variable changes systematically and smoothly due to

More information

Resampling Methods. Levi Waldron, CUNY School of Public Health. July 13, 2016

Resampling Methods. Levi Waldron, CUNY School of Public Health. July 13, 2016 Resampling Methods Levi Waldron, CUNY School of Public Health July 13, 2016 Outline and introduction Objectives: prediction or inference? Cross-validation Bootstrap Permutation Test Monte Carlo Simulation

More information

7. Collinearity and Model Selection

7. Collinearity and Model Selection Sociology 740 John Fox Lecture Notes 7. Collinearity and Model Selection Copyright 2014 by John Fox Collinearity and Model Selection 1 1. Introduction I When there is a perfect linear relationship among

More information

ANNOUNCING THE RELEASE OF LISREL VERSION BACKGROUND 2 COMBINING LISREL AND PRELIS FUNCTIONALITY 2 FIML FOR ORDINAL AND CONTINUOUS VARIABLES 3

ANNOUNCING THE RELEASE OF LISREL VERSION BACKGROUND 2 COMBINING LISREL AND PRELIS FUNCTIONALITY 2 FIML FOR ORDINAL AND CONTINUOUS VARIABLES 3 ANNOUNCING THE RELEASE OF LISREL VERSION 9.1 2 BACKGROUND 2 COMBINING LISREL AND PRELIS FUNCTIONALITY 2 FIML FOR ORDINAL AND CONTINUOUS VARIABLES 3 THREE-LEVEL MULTILEVEL GENERALIZED LINEAR MODELS 3 FOUR

More information

Using Mplus Monte Carlo Simulations In Practice: A Note On Non-Normal Missing Data In Latent Variable Models

Using Mplus Monte Carlo Simulations In Practice: A Note On Non-Normal Missing Data In Latent Variable Models Using Mplus Monte Carlo Simulations In Practice: A Note On Non-Normal Missing Data In Latent Variable Models Bengt Muth en University of California, Los Angeles Tihomir Asparouhov Muth en & Muth en Mplus

More information

FMA901F: Machine Learning Lecture 3: Linear Models for Regression. Cristian Sminchisescu

FMA901F: Machine Learning Lecture 3: Linear Models for Regression. Cristian Sminchisescu FMA901F: Machine Learning Lecture 3: Linear Models for Regression Cristian Sminchisescu Machine Learning: Frequentist vs. Bayesian In the frequentist setting, we seek a fixed parameter (vector), with value(s)

More information

MODEL SELECTION AND MODEL AVERAGING IN THE PRESENCE OF MISSING VALUES

MODEL SELECTION AND MODEL AVERAGING IN THE PRESENCE OF MISSING VALUES UNIVERSITY OF GLASGOW MODEL SELECTION AND MODEL AVERAGING IN THE PRESENCE OF MISSING VALUES by KHUNESWARI GOPAL PILLAY A thesis submitted in partial fulfillment for the degree of Doctor of Philosophy in

More information

SENSITIVITY ANALYSIS IN HANDLING DISCRETE DATA MISSING AT RANDOM IN HIERARCHICAL LINEAR MODELS VIA MULTIVARIATE NORMALITY

SENSITIVITY ANALYSIS IN HANDLING DISCRETE DATA MISSING AT RANDOM IN HIERARCHICAL LINEAR MODELS VIA MULTIVARIATE NORMALITY Virginia Commonwealth University VCU Scholars Compass Theses and Dissertations Graduate School 6 SENSITIVITY ANALYSIS IN HANDLING DISCRETE DATA MISSING AT RANDOM IN HIERARCHICAL LINEAR MODELS VIA MULTIVARIATE

More information

Chapter 6: Examples 6.A Introduction

Chapter 6: Examples 6.A Introduction Chapter 6: Examples 6.A Introduction In Chapter 4, several approaches to the dual model regression problem were described and Chapter 5 provided expressions enabling one to compute the MSE of the mean

More information

Missing Data Missing Data Methods in ML Multiple Imputation

Missing Data Missing Data Methods in ML Multiple Imputation Missing Data Missing Data Methods in ML Multiple Imputation PRE 905: Multivariate Analysis Lecture 11: April 22, 2014 PRE 905: Lecture 11 Missing Data Methods Today s Lecture The basics of missing data:

More information

Ronald H. Heck 1 EDEP 606 (F2015): Multivariate Methods rev. November 16, 2015 The University of Hawai i at Mānoa

Ronald H. Heck 1 EDEP 606 (F2015): Multivariate Methods rev. November 16, 2015 The University of Hawai i at Mānoa Ronald H. Heck 1 In this handout, we will address a number of issues regarding missing data. It is often the case that the weakest point of a study is the quality of the data that can be brought to bear

More information

Computer vision: models, learning and inference. Chapter 10 Graphical Models

Computer vision: models, learning and inference. Chapter 10 Graphical Models Computer vision: models, learning and inference Chapter 10 Graphical Models Independence Two variables x 1 and x 2 are independent if their joint probability distribution factorizes as Pr(x 1, x 2 )=Pr(x

More information

Missing Data Analysis for the Employee Dataset

Missing Data Analysis for the Employee Dataset Missing Data Analysis for the Employee Dataset 67% of the observations have missing values! Modeling Setup Random Variables: Y i =(Y i1,...,y ip ) 0 =(Y i,obs, Y i,miss ) 0 R i =(R i1,...,r ip ) 0 ( 1

More information

STATISTICS (STAT) 200 Level Courses Registration Restrictions: STAT 250: Required Prerequisites: not Schedule Type: Mason Core: STAT 346:

STATISTICS (STAT) 200 Level Courses Registration Restrictions: STAT 250: Required Prerequisites: not Schedule Type: Mason Core: STAT 346: Statistics (STAT) 1 STATISTICS (STAT) 200 Level Courses STAT 250: Introductory Statistics I. 3 credits. Elementary introduction to statistics. Topics include descriptive statistics, probability, and estimation

More information

Reddit Recommendation System Daniel Poon, Yu Wu, David (Qifan) Zhang CS229, Stanford University December 11 th, 2011

Reddit Recommendation System Daniel Poon, Yu Wu, David (Qifan) Zhang CS229, Stanford University December 11 th, 2011 Reddit Recommendation System Daniel Poon, Yu Wu, David (Qifan) Zhang CS229, Stanford University December 11 th, 2011 1. Introduction Reddit is one of the most popular online social news websites with millions

More information

Generalized least squares (GLS) estimates of the level-2 coefficients,

Generalized least squares (GLS) estimates of the level-2 coefficients, Contents 1 Conceptual and Statistical Background for Two-Level Models...7 1.1 The general two-level model... 7 1.1.1 Level-1 model... 8 1.1.2 Level-2 model... 8 1.2 Parameter estimation... 9 1.3 Empirical

More information

Lecture: Simulation. of Manufacturing Systems. Sivakumar AI. Simulation. SMA6304 M2 ---Factory Planning and scheduling. Simulation - A Predictive Tool

Lecture: Simulation. of Manufacturing Systems. Sivakumar AI. Simulation. SMA6304 M2 ---Factory Planning and scheduling. Simulation - A Predictive Tool SMA6304 M2 ---Factory Planning and scheduling Lecture Discrete Event of Manufacturing Systems Simulation Sivakumar AI Lecture: 12 copyright 2002 Sivakumar 1 Simulation Simulation - A Predictive Tool Next

More information

A noninformative Bayesian approach to small area estimation

A noninformative Bayesian approach to small area estimation A noninformative Bayesian approach to small area estimation Glen Meeden School of Statistics University of Minnesota Minneapolis, MN 55455 glen@stat.umn.edu September 2001 Revised May 2002 Research supported

More information

An imputation approach for analyzing mixed-mode surveys

An imputation approach for analyzing mixed-mode surveys An imputation approach for analyzing mixed-mode surveys Jae-kwang Kim 1 Iowa State University June 4, 2013 1 Joint work with S. Park and S. Kim Ouline Introduction Proposed Methodology Application to Private

More information

Resources for statistical assistance. Quantitative covariates and regression analysis. Methods for predicting continuous outcomes.

Resources for statistical assistance. Quantitative covariates and regression analysis. Methods for predicting continuous outcomes. Resources for statistical assistance Quantitative covariates and regression analysis Carolyn Taylor Applied Statistics and Data Science Group (ASDa) Department of Statistics, UBC January 24, 2017 Department

More information

Minimum sample size estimation in PLS-SEM: The inverse square root and gamma-exponential methods

Minimum sample size estimation in PLS-SEM: The inverse square root and gamma-exponential methods Minimum sample size estimation in PLS-SEM: The inverse square root and gamma-exponential methods Ned Kock Pierre Hadaya Full reference: Kock, N., & Hadaya, P. (2018). Minimum sample size estimation in

More information

Link Prediction for Social Network

Link Prediction for Social Network Link Prediction for Social Network Ning Lin Computer Science and Engineering University of California, San Diego Email: nil016@eng.ucsd.edu Abstract Friendship recommendation has become an important issue

More information

Bootstrapping Method for 14 June 2016 R. Russell Rhinehart. Bootstrapping

Bootstrapping Method for  14 June 2016 R. Russell Rhinehart. Bootstrapping Bootstrapping Method for www.r3eda.com 14 June 2016 R. Russell Rhinehart Bootstrapping This is extracted from the book, Nonlinear Regression Modeling for Engineering Applications: Modeling, Model Validation,

More information

Monte Carlo for Spatial Models

Monte Carlo for Spatial Models Monte Carlo for Spatial Models Murali Haran Department of Statistics Penn State University Penn State Computational Science Lectures April 2007 Spatial Models Lots of scientific questions involve analyzing

More information

CS281 Section 9: Graph Models and Practical MCMC

CS281 Section 9: Graph Models and Practical MCMC CS281 Section 9: Graph Models and Practical MCMC Scott Linderman November 11, 213 Now that we have a few MCMC inference algorithms in our toolbox, let s try them out on some random graph models. Graphs

More information

Predict Outcomes and Reveal Relationships in Categorical Data

Predict Outcomes and Reveal Relationships in Categorical Data PASW Categories 18 Specifications Predict Outcomes and Reveal Relationships in Categorical Data Unleash the full potential of your data through predictive analysis, statistical learning, perceptual mapping,

More information

An Alternative Estimation Procedure For Partial Least Squares Path Modeling

An Alternative Estimation Procedure For Partial Least Squares Path Modeling An Alternative Estimation Procedure For Partial Least Squares Path Modeling Heungsun Hwang, Yoshio Takane, Arthur Tenenhaus To cite this version: Heungsun Hwang, Yoshio Takane, Arthur Tenenhaus. An Alternative

More information

Serial Correlation and Heteroscedasticity in Time series Regressions. Econometric (EC3090) - Week 11 Agustín Bénétrix

Serial Correlation and Heteroscedasticity in Time series Regressions. Econometric (EC3090) - Week 11 Agustín Bénétrix Serial Correlation and Heteroscedasticity in Time series Regressions Econometric (EC3090) - Week 11 Agustín Bénétrix 1 Properties of OLS with serially correlated errors OLS still unbiased and consistent

More information

COPULA MODELS FOR BIG DATA USING DATA SHUFFLING

COPULA MODELS FOR BIG DATA USING DATA SHUFFLING COPULA MODELS FOR BIG DATA USING DATA SHUFFLING Krish Muralidhar, Rathindra Sarathy Department of Marketing & Supply Chain Management, Price College of Business, University of Oklahoma, Norman OK 73019

More information

Estimation of Unknown Parameters in Dynamic Models Using the Method of Simulated Moments (MSM)

Estimation of Unknown Parameters in Dynamic Models Using the Method of Simulated Moments (MSM) Estimation of Unknown Parameters in ynamic Models Using the Method of Simulated Moments (MSM) Abstract: We introduce the Method of Simulated Moments (MSM) for estimating unknown parameters in dynamic models.

More information

Samuel Coolidge, Dan Simon, Dennis Shasha, Technical Report NYU/CIMS/TR

Samuel Coolidge, Dan Simon, Dennis Shasha, Technical Report NYU/CIMS/TR Detecting Missing and Spurious Edges in Large, Dense Networks Using Parallel Computing Samuel Coolidge, sam.r.coolidge@gmail.com Dan Simon, des480@nyu.edu Dennis Shasha, shasha@cims.nyu.edu Technical Report

More information

DETAILED CONTENTS. About the Editor About the Contributors PART I. GUIDE 1

DETAILED CONTENTS. About the Editor About the Contributors PART I. GUIDE 1 DETAILED CONTENTS Preface About the Editor About the Contributors xiii xv xvii PART I. GUIDE 1 1. Fundamentals of Hierarchical Linear and Multilevel Modeling 3 Introduction 3 Why Use Linear Mixed/Hierarchical

More information

An Introduction to Growth Curve Analysis using Structural Equation Modeling

An Introduction to Growth Curve Analysis using Structural Equation Modeling An Introduction to Growth Curve Analysis using Structural Equation Modeling James Jaccard New York University 1 Overview Will introduce the basics of growth curve analysis (GCA) and the fundamental questions

More information

Using Machine Learning to Optimize Storage Systems

Using Machine Learning to Optimize Storage Systems Using Machine Learning to Optimize Storage Systems Dr. Kiran Gunnam 1 Outline 1. Overview 2. Building Flash Models using Logistic Regression. 3. Storage Object classification 4. Storage Allocation recommendation

More information

Expectation-Maximization Methods in Population Analysis. Robert J. Bauer, Ph.D. ICON plc.

Expectation-Maximization Methods in Population Analysis. Robert J. Bauer, Ph.D. ICON plc. Expectation-Maximization Methods in Population Analysis Robert J. Bauer, Ph.D. ICON plc. 1 Objective The objective of this tutorial is to briefly describe the statistical basis of Expectation-Maximization

More information

Linear Modeling with Bayesian Statistics

Linear Modeling with Bayesian Statistics Linear Modeling with Bayesian Statistics Bayesian Approach I I I I I Estimate probability of a parameter State degree of believe in specific parameter values Evaluate probability of hypothesis given the

More information

Bayesian Estimation for Skew Normal Distributions Using Data Augmentation

Bayesian Estimation for Skew Normal Distributions Using Data Augmentation The Korean Communications in Statistics Vol. 12 No. 2, 2005 pp. 323-333 Bayesian Estimation for Skew Normal Distributions Using Data Augmentation Hea-Jung Kim 1) Abstract In this paper, we develop a MCMC

More information

Markov Chain Monte Carlo (part 1)

Markov Chain Monte Carlo (part 1) Markov Chain Monte Carlo (part 1) Edps 590BAY Carolyn J. Anderson Department of Educational Psychology c Board of Trustees, University of Illinois Spring 2018 Depending on the book that you select for

More information

DEPARTMENT OF STATISTICS

DEPARTMENT OF STATISTICS Department of Statistics 1 DEPARTMENT OF STATISTICS Office in Statistics Building, Room 102 (970) 491-5269 or (970) 491-6546 stat.colostate.edu (http://www.stat.colostate.edu) Don Estep, Department Chair

More information

Latent Curve Models. A Structural Equation Perspective WILEY- INTERSCIENΠKENNETH A. BOLLEN

Latent Curve Models. A Structural Equation Perspective WILEY- INTERSCIENΠKENNETH A. BOLLEN Latent Curve Models A Structural Equation Perspective KENNETH A. BOLLEN University of North Carolina Department of Sociology Chapel Hill, North Carolina PATRICK J. CURRAN University of North Carolina Department

More information

MCMC Diagnostics. Yingbo Li MATH Clemson University. Yingbo Li (Clemson) MCMC Diagnostics MATH / 24

MCMC Diagnostics. Yingbo Li MATH Clemson University. Yingbo Li (Clemson) MCMC Diagnostics MATH / 24 MCMC Diagnostics Yingbo Li Clemson University MATH 9810 Yingbo Li (Clemson) MCMC Diagnostics MATH 9810 1 / 24 Convergence to Posterior Distribution Theory proves that if a Gibbs sampler iterates enough,

More information

Missing Data Analysis for the Employee Dataset

Missing Data Analysis for the Employee Dataset Missing Data Analysis for the Employee Dataset 67% of the observations have missing values! Modeling Setup For our analysis goals we would like to do: Y X N (X, 2 I) and then interpret the coefficients

More information

Introduction to Mplus

Introduction to Mplus Introduction to Mplus May 12, 2010 SPONSORED BY: Research Data Centre Population and Life Course Studies PLCS Interdisciplinary Development Initiative Piotr Wilk piotr.wilk@schulich.uwo.ca OVERVIEW Mplus

More information

CHAPTER 5. BASIC STEPS FOR MODEL DEVELOPMENT

CHAPTER 5. BASIC STEPS FOR MODEL DEVELOPMENT CHAPTER 5. BASIC STEPS FOR MODEL DEVELOPMENT This chapter provides step by step instructions on how to define and estimate each of the three types of LC models (Cluster, DFactor or Regression) and also

More information

Handbook of Statistical Modeling for the Social and Behavioral Sciences

Handbook of Statistical Modeling for the Social and Behavioral Sciences Handbook of Statistical Modeling for the Social and Behavioral Sciences Edited by Gerhard Arminger Bergische Universität Wuppertal Wuppertal, Germany Clifford С. Clogg Late of Pennsylvania State University

More information

Modelling and Quantitative Methods in Fisheries

Modelling and Quantitative Methods in Fisheries SUB Hamburg A/553843 Modelling and Quantitative Methods in Fisheries Second Edition Malcolm Haddon ( r oc) CRC Press \ y* J Taylor & Francis Croup Boca Raton London New York CRC Press is an imprint of

More information

Single missing data imputation in PLS-SEM. Ned Kock

Single missing data imputation in PLS-SEM. Ned Kock Single imputation in PLS-SEM Ned Kock December 2014 ScriptWarp Systems Laredo, Texas USA 1 Single imputation in PLS-SEM Ned Kock Full reference: Kock, N. (2014). Single imputation in PLS-SEM. Laredo, TX:

More information

Design of Experiments

Design of Experiments Seite 1 von 1 Design of Experiments Module Overview In this module, you learn how to create design matrices, screen factors, and perform regression analysis and Monte Carlo simulation using Mathcad. Objectives

More information

Should the Word Survey Be Avoided in Invitation Messaging?

Should the Word Survey Be Avoided in  Invitation Messaging? ACT Research & Policy Issue Brief 2016 Should the Word Survey Be Avoided in Email Invitation Messaging? Raeal Moore, PhD Introduction The wording of email invitations requesting respondents participation

More information

GETTING STARTED WITH THE STUDENT EDITION OF LISREL 8.51 FOR WINDOWS

GETTING STARTED WITH THE STUDENT EDITION OF LISREL 8.51 FOR WINDOWS GETTING STARTED WITH THE STUDENT EDITION OF LISREL 8.51 FOR WINDOWS Gerhard Mels, Ph.D. mels@ssicentral.com Senior Programmer Scientific Software International, Inc. 1. Introduction The Student Edition

More information

SYS 6021 Linear Statistical Models

SYS 6021 Linear Statistical Models SYS 6021 Linear Statistical Models Project 2 Spam Filters Jinghe Zhang Summary The spambase data and time indexed counts of spams and hams are studied to develop accurate spam filters. Static models are

More information

Psychology 282 Lecture #21 Outline Categorical IVs in MLR: Effects Coding and Contrast Coding

Psychology 282 Lecture #21 Outline Categorical IVs in MLR: Effects Coding and Contrast Coding Psychology 282 Lecture #21 Outline Categorical IVs in MLR: Effects Coding and Contrast Coding In the previous lecture we learned how to incorporate a categorical research factor into a MLR model by using

More information

LISREL 10.1 RELEASE NOTES 2 1 BACKGROUND 2 2 MULTIPLE GROUP ANALYSES USING A SINGLE DATA FILE 2

LISREL 10.1 RELEASE NOTES 2 1 BACKGROUND 2 2 MULTIPLE GROUP ANALYSES USING A SINGLE DATA FILE 2 LISREL 10.1 RELEASE NOTES 2 1 BACKGROUND 2 2 MULTIPLE GROUP ANALYSES USING A SINGLE DATA FILE 2 3 MODELS FOR GROUPED- AND DISCRETE-TIME SURVIVAL DATA 5 4 MODELS FOR ORDINAL OUTCOMES AND THE PROPORTIONAL

More information

R (2) Data analysis case study using R for readily available data set using any one machine learning algorithm.

R (2) Data analysis case study using R for readily available data set using any one machine learning algorithm. Assignment No. 4 Title: SD Module- Data Science with R Program R (2) C (4) V (2) T (2) Total (10) Dated Sign Data analysis case study using R for readily available data set using any one machine learning

More information

IBM SPSS Categories. Predict outcomes and reveal relationships in categorical data. Highlights. With IBM SPSS Categories you can:

IBM SPSS Categories. Predict outcomes and reveal relationships in categorical data. Highlights. With IBM SPSS Categories you can: IBM Software IBM SPSS Statistics 19 IBM SPSS Categories Predict outcomes and reveal relationships in categorical data Highlights With IBM SPSS Categories you can: Visualize and explore complex categorical

More information

Comparison of computational methods for high dimensional item factor analysis

Comparison of computational methods for high dimensional item factor analysis Comparison of computational methods for high dimensional item factor analysis Tihomir Asparouhov and Bengt Muthén November 14, 2012 Abstract In this article we conduct a simulation study to compare several

More information

The Performance of Multiple Imputation for Likert-type Items with Missing Data

The Performance of Multiple Imputation for Likert-type Items with Missing Data Journal of Modern Applied Statistical Methods Volume 9 Issue 1 Article 8 5-1-2010 The Performance of Multiple Imputation for Likert-type Items with Missing Data Walter Leite University of Florida, Walter.Leite@coe.ufl.edu

More information

PRI Workshop Introduction to AMOS

PRI Workshop Introduction to AMOS PRI Workshop Introduction to AMOS Krissy Zeiser Pennsylvania State University klz24@pop.psu.edu 2-pm /3/2008 Setting up the Dataset Missing values should be recoded in another program (preferably with

More information

Object-Oriented Programming and Laboratory of Simulation Development

Object-Oriented Programming and Laboratory of Simulation Development Object-Oriented Programming and Laboratory of Simulation Development Marco Valente LEM, Pisa and University of L Aquila January, 2008 Outline Goal: show major features of LSD and their methodological motivations

More information

MS&E 226: Small Data

MS&E 226: Small Data MS&E 226: Small Data Lecture 13: The bootstrap (v3) Ramesh Johari ramesh.johari@stanford.edu 1 / 30 Resampling 2 / 30 Sampling distribution of a statistic For this lecture: There is a population model

More information

CCSSM Curriculum Analysis Project Tool 1 Interpreting Functions in Grades 9-12

CCSSM Curriculum Analysis Project Tool 1 Interpreting Functions in Grades 9-12 Tool 1: Standards for Mathematical ent: Interpreting Functions CCSSM Curriculum Analysis Project Tool 1 Interpreting Functions in Grades 9-12 Name of Reviewer School/District Date Name of Curriculum Materials:

More information

Single missing data imputation in PLS-based structural equation modeling

Single missing data imputation in PLS-based structural equation modeling Single imputation in PLS-based structural equation modeling Ned Kock Full reference: Kock (2018). Single imputation in PLS-based structural equation modeling. Journal of Modern Applied Statistical Methods,

More information

Multidimensional Latent Regression

Multidimensional Latent Regression Multidimensional Latent Regression Ray Adams and Margaret Wu, 29 August 2010 In tutorial seven, we illustrated how ConQuest can be used to fit multidimensional item response models; and in tutorial five,

More information

Missing Data Analysis with SPSS

Missing Data Analysis with SPSS Missing Data Analysis with SPSS Meng-Ting Lo (lo.194@osu.edu) Department of Educational Studies Quantitative Research, Evaluation and Measurement Program (QREM) Research Methodology Center (RMC) Outline

More information

Part I. Hierarchical clustering. Hierarchical Clustering. Hierarchical clustering. Produces a set of nested clusters organized as a

Part I. Hierarchical clustering. Hierarchical Clustering. Hierarchical clustering. Produces a set of nested clusters organized as a Week 9 Based in part on slides from textbook, slides of Susan Holmes Part I December 2, 2012 Hierarchical Clustering 1 / 1 Produces a set of nested clusters organized as a Hierarchical hierarchical clustering

More information

Bayesian Classification Using Probabilistic Graphical Models

Bayesian Classification Using Probabilistic Graphical Models San Jose State University SJSU ScholarWorks Master's Projects Master's Theses and Graduate Research Spring 2014 Bayesian Classification Using Probabilistic Graphical Models Mehal Patel San Jose State University

More information