Psychology 282 Lecture #21 Outline Categorical IVs in MLR: Effects Coding and Contrast Coding


In the previous lecture we learned how to incorporate a categorical research factor into a MLR model by using dummy variables. Given a categorical factor with g levels we construct (g-1) dummy variables as defined by the following coding table:

  Category   C1   C2   C3   ...   C(g-1)
  1           1    0    0   ...     0
  2           0    1    0   ...     0
  3           0    0    1   ...     0
  ...
  (g-1)       0    0    0   ...     1
  g           0    0    0   ...     0

The coding table is used to assign values of the dummy variables to each individual. The dummy variables are then used as IVs in a regression model, which produces a value of R² as well as a regression equation. The value of R² indicates the proportion of variance in Y accounted for by the categorical research factor, as represented by the dummy variables.
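As an illustration of the mechanics, here is a minimal sketch in Python (not part of the lecture notes): NumPy least squares stands in for an MLR program, the first seven Y values come from the example data matrix later in these notes, and an eighth observation is invented to balance the groups. It builds the (g-1) dummy variables from the coding table and confirms that the intercept is the reference-group mean and each Bj compares category j to category g.

```python
import numpy as np

# Four treatment groups (Drug A, Drug B, Placebo, Control), 2 observations
# each. Y values follow the notes' example matrix; the last one is invented.
groups = np.repeat(np.arange(4), 2)          # 0..3 encode the g = 4 categories
y = np.array([9., 10., 8., 7., 5., 8., 7., 6.])
g = 4

# Build the (g-1) dummy variables: C_j = 1 if in category j, else 0.
# Category g (the Control group) scores 0 on every dummy variable.
C = np.zeros((len(y), g - 1))
for j in range(g - 1):
    C[:, j] = (groups == j).astype(float)

X = np.column_stack([np.ones(len(y)), C])    # add intercept column
b, *_ = np.linalg.lstsq(X, y, rcond=None)

group_means = np.array([y[groups == k].mean() for k in range(g)])
# Under dummy coding: intercept = mean of reference group g,
# and B_j = (mean of group j) - (mean of group g).
print(np.allclose(b[0], group_means[3]))                     # True
print(np.allclose(b[1:], group_means[:3] - group_means[3]))  # True
```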

The partial statistics associated with dummy variable Cj are interpreted with reference to a comparison of category j to category g. Thus, category g plays a special role by serving as the category to which all others are compared via the partial statistics. When there is no basis for assigning a particular category to play this role, we may wish to use a different coding method.

Unweighted Effects Coding

Effects coded variables look very much like dummy variables, with one change: individuals in group g are assigned values of -1 on all effects coded variables. Thus, the coding table would have the following general form:

  Category   C1   C2   C3   ...   C(g-1)
  1           1    0    0   ...     0
  2           0    1    0   ...     0
  3           0    0    1   ...     0
  ...
  (g-1)       0    0    0   ...     1
  g          -1   -1   -1   ...    -1

In our example the coding table would look like this:

  Category      C1   C2   C3
  1 (Drug A)     1    0    0
  2 (Drug B)     0    1    0
  3 (Placebo)    0    0    1
  4 (Control)   -1   -1   -1

Using this coding table we could then assign values of the effects coded IVs to each individual and produce a data matrix of the form:

  Participant   Treatment   C1   C2   C3   Y
  1             A            1    0    0    9
  2             A            1    0    0   10
  3             B            0    1    0    8
  4             B            0    1    0    7
  5             Placebo      0    0    1    5
  6             Placebo      0    0    1    8
  7             Control     -1   -1   -1    7
  ...
  n

We could then use the effects coded variables as IVs in a MLR analysis with Y as the DV. The analysis would produce a value of R² along with a regression equation of the form

  Ŷ = B0 + B1·C1 + B2·C2 + ... + B(g-1)·C(g-1)

The value of R² and all inferential information about R² (significance test, confidence interval, correction for shrinkage, etc.) would be identical to results obtained from the MLR analysis with dummy variables. The regression coefficients and other partial statistics associated with the effects coded variables would be different from the corresponding information associated with dummy variables. It can be shown that the intercept and coefficients in the regression equation would have the following interpretation. The intercept B0 would be equal to the mean of the g group means on the Y variable. That is:

  B0 = Ȳ* = (Ȳ1 + Ȳ2 + Ȳ3 + ... + Ȳg) / g

This value, denoted Ȳ* here, is called the unweighted mean of the group means, meaning that the group means are not weighted by sample size. If group sample sizes are equal, then this value is equivalent to the grand mean of Y across all n observations. More on this later.

The regression coefficients for effects coded IVs also have a simple interpretation. For the first coded variable, it can be shown that

  B1 = Ȳ1 − Ȳ*

where Ȳ* denotes the unweighted mean of the g group means. That is, the regression coefficient for C1 will equal the difference between the mean for category 1 and the mean of all group means. In our example B1 would equal the difference between the mean value of Y for individuals in the Drug A condition and the mean of all four group means. Such coefficients can be thought of as representing the effect of membership in a given category. For example, a large positive value of B1 indicates a strong positive effect of being in the Drug A condition. Each regression coefficient for effects coded IVs has a similar interpretation. The coefficient for C2 would have the value

  B2 = Ȳ2 − Ȳ*

and would reflect the effect of being in the Drug B condition. In general, for effects coded IV Cj,

  Bj = Ȳj − Ȳ*

reflects the effect of membership in category j. For all such effects, category j is compared to the unweighted mean of all categories.
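The two identities above can be checked numerically. A minimal sketch, again using the notes' example Y values (plus one invented observation to balance the groups) and NumPy least squares in place of an MLR package:

```python
import numpy as np

# Same four-group example data as before.
groups = np.repeat(np.arange(4), 2)
y = np.array([9., 10., 8., 7., 5., 8., 7., 6.])
g = 4

# Unweighted effects codes: like dummies, except members of group g
# (Control) receive -1 on every coded variable.
C = np.zeros((len(y), g - 1))
for j in range(g - 1):
    C[:, j] = (groups == j).astype(float)
C[groups == g - 1, :] = -1.0

X = np.column_stack([np.ones(len(y)), C])
b, *_ = np.linalg.lstsq(X, y, rcond=None)

group_means = np.array([y[groups == k].mean() for k in range(g)])
unweighted_grand = group_means.mean()        # mean of the g group means
# Intercept = unweighted mean of the group means; B_j = Ybar_j minus that.
print(np.allclose(b[0], unweighted_grand))                     # True
print(np.allclose(b[1:], group_means[:3] - unweighted_grand))  # True
```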

For each Bj we can also conduct significance tests and obtain confidence intervals. Such information is interpreted with reference to a comparison of the mean for category j to the unweighted mean of all group means. Similarly, we can obtain a value of sr²j associated with each effects coded variable. Such a value would be interpreted as the proportion of variance in Y accounted for by the effect of membership in group j; or more specifically, the proportion of variance in Y accounted for by the difference between the mean for group j and the unweighted mean of all group means. In general, under this type of coding, all partial statistics are interpreted with reference to a comparison of a given group to the unweighted mean of all group means. Note the difference between this interpretation and that for partial statistics associated with dummy variables, which are interpreted with reference to a comparison of a given group to group g. Note also that the use of unweighted effects coding implies that each category counts equally: differences in sample sizes for different groups are not considered relevant. This would normally be the case in experimental designs.

Weighted Effects Coding

In some situations, differences in sample sizes among groups may indicate that those groups represent different proportions of the full population. For example, if the research factor is ethnicity and we take a large sample from the full population, we will find different sample sizes for different ethnic groups. Those sample sizes reflect the fact that different ethnic groups make up different proportions of the full population. If we wish for these differences to be represented in our coded variables and in our regression analyses, then the effects codes must be adjusted using the differential sample sizes. See details for these adjustments in Cohen, Cohen, West, & Aiken (2004). The resulting coded variables can then be used as IVs in a regression analysis, producing a value of R² and a regression equation. The value of R² and associated inferential information will be identical to that obtained using dummy coding or unweighted effects coding.

The coefficients in the regression equation will be different and will be interpreted in terms of weighted means instead of unweighted means. The intercept B0 will correspond to the weighted mean of all g group means, and a regression coefficient Bj will be interpreted as the deviation of a group mean from that weighted mean. The choice of weighted vs. unweighted effects coding depends primarily on whether differences in sample sizes for different categories of the research factor reflect those categories representing different proportions of the full population. The choice of effects coding vs. dummy coding depends at least in part on whether there exists an appropriate choice for a comparison group under dummy coding.
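A hedged numerical sketch of weighted effects coding. The specific code values used below (group j scores 1 on Cj, the reference group scores -nj/ng, all others 0) are one common way to implement the adjustment the notes defer to Cohen, Cohen, West, & Aiken; the unequal-n data are invented for illustration.

```python
import numpy as np

# Invented data with unequal group sizes: the sample proportions are
# treated as mirroring the groups' proportions in the population.
groups = np.array([0, 0, 0, 1, 1, 2, 2, 2, 2, 3, 3])
y = np.array([9., 10., 8., 8., 7., 5., 8., 6., 5., 7., 6.])
g = 4
n_k = np.array([(groups == k).sum() for k in range(g)])

# Weighted effects codes: group j scores 1 on C_j, the reference group g
# scores -n_j/n_g on C_j, and all other groups score 0.
C = np.zeros((len(y), g - 1))
for j in range(g - 1):
    C[groups == j, j] = 1.0
    C[groups == g - 1, j] = -n_k[j] / n_k[g - 1]

X = np.column_stack([np.ones(len(y)), C])
b, *_ = np.linalg.lstsq(X, y, rcond=None)

group_means = np.array([y[groups == k].mean() for k in range(g)])
weighted_grand = (n_k * group_means).sum() / n_k.sum()   # equals y.mean()
# Intercept = weighted mean of the g group means; each B_j is the
# deviation of a group mean from that weighted mean.
print(np.allclose(b[0], weighted_grand))                     # True
print(np.allclose(b[1:], group_means[:3] - weighted_grand))  # True
```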

Contrast Coding

A third type of coding can be used when there exist prior hypotheses about particular differences between categories. In our example, for instance, one specific issue of interest might be evaluation of the difference in effectiveness between Drug A and Drug B, ignoring the other two categories. Another might be evaluation of the difference in effectiveness between use of a real drug (Drug A and Drug B) vs. no real drug (Placebo and Control). Such prior hypotheses are called contrasts, and we can design coded variables to represent, and provide for the testing of, contrasts of interest. Given g categories we would define (g-1) contrast coded variables. The general procedure for defining a contrast coded variable is as follows. Given g categories, a contrast can be seen as defining three subsets of the g categories:

  Subset U, containing u categories.
  Subset V, containing v categories.
  Subset W, containing w categories.

The contrast is designed to compare the groups in subset U to those in subset V, ignoring those in subset W.

For example, if we wish to compare Drug A to Drug B, ignoring the Placebo and Control conditions, then: subset U is the Drug A condition, containing u=1 category; subset V is the Drug B condition, containing v=1 category; and subset W contains the Placebo and Control conditions, thus w=2. Contrast codes (defining a column in the coding table) are then defined as follows:

  For categories in subset U, codes are set at -v/(u+v).
  For categories in subset V, codes are set at +u/(u+v).
  For categories in subset W, codes are set at 0.

To illustrate, let us define codes for contrast variable C1 to represent a comparison of Drug A vs. Drug B. The value of C1 for the Drug A condition would be -1/2. The value of C1 for the Drug B condition would be +1/2. The value of C1 for the Placebo and Control conditions would be 0.

Thus the first column of the coding table would have the following form:

  Category      C1
  1 (Drug A)    -1/2
  2 (Drug B)    +1/2
  3 (Placebo)    0
  4 (Control)    0

The contrast codes actually define a linear combination of group means:

  C1 = -(1/2)Ȳ1 + (1/2)Ȳ2 + (0)Ȳ3 + (0)Ȳ4

Since we need (g-1) coded IVs to carry the information in the categorical research factor, we can (must) define two more contrasts in our example. Let C2 be defined to represent a comparison of Placebo vs. Control, ignoring Drugs A and B. Let C3 be defined to represent a comparison of Drugs A and B to Placebo and Control. The full coding table would then take this form:

  Category      C1     C2     C3
  1 (Drug A)    -1/2    0    +1/2
  2 (Drug B)    +1/2    0    +1/2
  3 (Placebo)    0    -1/2   -1/2
  4 (Control)    0    +1/2   -1/2

Note that the contrasts should be defined as independent, or orthogonal. Independence is achieved by defining the contrasts so that the sum of products of codes for a given pair of contrasts is zero. In our example, if we sum the products of the codes in any pair of columns, we get a value of zero. Once contrast codes are defined we can then use the coding table to assign values of the coded variables to each individual. In our example the resulting data matrix would look like this:

  Participant   Treatment    C1     C2     C3    Y
  1             A           -1/2     0    +1/2    9
  2             A           -1/2     0    +1/2   10
  3             B           +1/2     0    +1/2    8
  4             B           +1/2     0    +1/2    7
  5             Placebo       0    -1/2   -1/2    5
  6             Placebo       0    -1/2   -1/2    8
  7             Control       0    +1/2   -1/2    7
  ...
  n

Just as we did using other coding methods, we could then proceed with an MLR analysis regressing Y on the three coded IVs.
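The full contrast-coding pipeline can be sketched as follows (the notes' example Y values plus one invented observation to balance the groups; NumPy least squares stands in for an MLR program). It also checks the orthogonality condition just described.

```python
import numpy as np

# Four balanced groups: Drug A, Drug B, Placebo, Control.
groups = np.repeat(np.arange(4), 2)
y = np.array([9., 10., 8., 7., 5., 8., 7., 6.])

# Contrast codes from the coding table above:
#   C1: Drug A vs. Drug B;  C2: Placebo vs. Control;
#   C3: drugs (A, B) vs. no drug (Placebo, Control).
codes = np.array([
    [-0.5,  0.0,  0.5],   # Drug A
    [ 0.5,  0.0,  0.5],   # Drug B
    [ 0.0, -0.5, -0.5],   # Placebo
    [ 0.0,  0.5, -0.5],   # Control
])
# Orthogonality: the sum of products of codes is zero for every pair
# of columns, so codes' * codes is a diagonal matrix.
gram = codes.T @ codes
print(np.allclose(gram, np.diag(np.diag(gram))))              # True

C = codes[groups]                            # assign codes to individuals
X = np.column_stack([np.ones(len(y)), C])
b, *_ = np.linalg.lstsq(X, y, rcond=None)

m = np.array([y[groups == k].mean() for k in range(4)])
# With equal group sizes and the +/- 1/2 scaling, B1 is the Drug B minus
# Drug A mean difference and B3 is the drugs-vs-no-drugs mean difference.
print(np.allclose(b[1], m[1] - m[0]))                             # True
print(np.allclose(b[3], (m[0] + m[1]) / 2 - (m[2] + m[3]) / 2))   # True
```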

Results would include a value of R² and associated inferential information, which would exactly match corresponding results obtained under other coding methods. Results would also include a regression equation of the form

  Ŷ = B0 + B1·C1 + B2·C2 + ... + B(g-1)·C(g-1)

Our focus in these results is on the regression coefficients and associated inferential information and partial statistics. (It can be shown that the intercept will be equal to the unweighted mean of the g group means.) In our example, B1 would equal the value of the contrast defined by C1; specifically, the difference between the unweighted means for the Drug A and Drug B conditions. The significance test for B1 would be interpreted as a test of the significance of this contrast. The value of sr²1 would be interpreted as the proportion of variance in Y accounted for by this contrast.

In a similar fashion, the partial statistics associated with each contrast coded IV could be interpreted.

General Comments on Coding Methods

In the case of a single categorical research factor, regardless of which coding method is used, results of an MLR analysis will be equivalent to results of a one-way ANOVA. When different coding methods are used, the value of R² and associated inferential information will not change. The values of regression coefficients and other partial statistics will change, as will their interpretation. In general, when using coded variables we should always make use of unstandardized regression coefficients rather than standardized coefficients; standardization of coded variables makes interpretation more difficult.
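A quick numerical check of the invariance claim (a sketch only, using the example data from earlier): any full-rank set of (g-1) coded variables spans the same column space, so R² is identical across codings and equals SS_between / SS_total from a one-way ANOVA.

```python
import numpy as np

# Four balanced groups, example Y values as in the earlier sketches.
groups = np.repeat(np.arange(4), 2)
y = np.array([9., 10., 8., 7., 5., 8., 7., 6.])
n, g = len(y), 4

def r_squared(C):
    """R^2 from regressing y on an intercept plus the coded IVs in C."""
    X = np.column_stack([np.ones(n), C])
    yhat = X @ np.linalg.lstsq(X, y, rcond=None)[0]
    return 1 - ((y - yhat) ** 2).sum() / ((y - y.mean()) ** 2).sum()

dummy = np.eye(g)[:, :g - 1]                  # dummy codes, one row per category
effects = dummy.copy(); effects[g - 1] = -1   # unweighted effects codes
contrast = np.array([[-0.5, 0, 0.5], [0.5, 0, 0.5],
                     [0, -0.5, -0.5], [0, 0.5, -0.5]])

r2 = [r_squared(codes[groups]) for codes in (dummy, effects, contrast)]
print(np.allclose(r2, r2[0]))                 # True: identical across codings

# The common R^2 equals SS_between / SS_total from a one-way ANOVA.
means = np.array([y[groups == k].mean() for k in range(g)])
ss_between = (2 * (means - y.mean()) ** 2).sum()   # n_k = 2 per group
ss_total = ((y - y.mean()) ** 2).sum()
print(np.allclose(r2[0], ss_between / ss_total))   # True
```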

The choice of coding method can be based on the following principles:

  Dummy coding: Use when there is one group that logically can serve as a reference group to which all others will be compared through the various partial statistics.

  Effects coding: Use when there is no obvious choice for a reference group and no specific contrasts of interest. Use unweighted effects coding when differences in group sample sizes are irrelevant. Use weighted effects coding when differences in group sample sizes reflect differences in proportional representation in the population.

  Contrast coding: Use when prior hypotheses lend themselves to the specification of (g-1) independent contrasts.