Analysis of Variance in R

Size: px

Start display at page:

Download "Analysis of Variance in R"

Melissa Stevens
5 years ago
Views:

1 nalysis of Variance in R Dale arr R Training: University of Glasgow Dale arr (R Training: University of Glasgow) nalysis of Variance in R 1 / 19

2 When is NOV applicable? When you wish to assess the independent/joint effects of one or more categorical factors on a single continuous dependent variable Strictly speaking, NOV is not applicable to count or categorical DVs (but that stops few researchers from using it anyway!) NOV is a special case of linear regression, ultimately a more flexible approach Dale arr (R Training: University of Glasgow) nalysis of Variance in R 2 / 19

3 The General Linear Model (GLM) Definition (General Linear Model or GLM) general mathematical framework for expressing relationships among variables Differs from the cookbook approach to statistics t-test, NOV, NCOV, χ 2 test, regression, correlation, etc. Can express/test linear relationships between a numerical dependent variable and any combination of independent variables (categorical or continuous) Can even be generalized to categorical dependent variables (through Generalized Linear Models ) Dale arr (R Training: University of Glasgow) nalysis of Variance in R 3 / 19

4 NOV, Regression, NCOV Fig 1a. Cholesterol levels by ethnic group and gender (male=sqr, Fig 1a. Cholesterol levels by age. female=tri). Fig 1a. Cholesterol levels by age and gender. Dale arr (R Training: University of Glasgow) nalysis of Variance in R 4 / 19

5 How the GLM represents relationships Component of GLM Notation DV Y Grand verage µ "mu" Main Effects,, C,... Interactions, C, C, C,... Random Error S(Group) Score = Grand vg. + Main Effects + Interactions + Error Y = µ C C + C + C S(Group) Components of the model are estimated from the observed data Tests are performed ( F ) to see whether its variability is too large to be introduced by chance Dale arr (R Training: University of Glasgow) nalysis of Variance in R 5 / 19

6 NOV as variance partitioning SS total = SS treat + SS error SS treat = SS + SS + SS Source SS df MS F k 1 k 1 df df Error N subj Ngroups Total Dale arr (R Training: University of Glasgow) nalysis of Variance in R 6 / 19

7 Making comparisons across groups Example (Spelling) You wish to compare the benefits of three different spelling programs. Do these programs yield differences in spelling performance? H 0 : µ 1 = µ 2 = µ 3 Factors and Levels Factor: a categorical variable that is used to divide subjects into groups, usually to draw some comparison. Factors are composed of different levels. Do not confuse factors with levels! Dale arr (R Training: University of Glasgow) nalysis of Variance in R 7 / 19

8 Means, Variability, and Deviation Scores Dale arr (R Training: University of Glasgow) nalysis of Variance in R 8 / 19

9 Means, Variability, and Deviation Scores grand mean Y.. = ij Y ij N Dale arr (R Training: University of Glasgow) nalysis of Variance in R 8 / 19

10 Means, Variability, and Deviation Scores grand mean Y.. = SD Y = ij(y ij Y..) 2 N ij Y ij N deviation score: Y ij Y.. Dale arr (R Training: University of Glasgow) nalysis of Variance in R 8 / 19

11 GLM for One-Factor NOV Y ij = µ Estimation Equations Dale arr (R Training: University of Glasgow) nalysis of Variance in R 9 / 19

12 GLM for One-Factor NOV Y ij = µ + i Estimation Equations Dale arr (R Training: University of Glasgow) nalysis of Variance in R 9 / 19

13 GLM for One-Factor NOV Y ij = µ + i + S() ij Estimation Equations Dale arr (R Training: University of Glasgow) nalysis of Variance in R 9 / 19

14 GLM for One-Factor NOV Y ij = µ + i + S() ij Estimation Equations Dale arr (R Training: University of Glasgow) nalysis of Variance in R 9 / 19

15 Sources of Variance Y ij = µ + i + S() ij Y ij µ = i + S() ij individual = group + random Sum of Squares (SS) Dale arr (R Training: University of Glasgow) nalysis of Variance in R 10 / 19

16 Decomposition Matrix ˆµ = Â 1 = = 20 Â 2 = 97 = 3 Â 3 = 83 = 17 Y ij = ˆµ + Â i + Ŝ() ij 124 = = = = = = = = = = = = SS = = Dale arr (R Training: University of Glasgow) nalysis of Variance in R 11 / 19

17 Logic of NOV Compare two estimates of the variability, the between-group estimate (SS_{between}) and the within-group estimate (SS_{within}) If H 0 : µ 1 = µ 2 = µ 3 is true, then these two measures estimate the same quantity. The extent to which the between-group variability exceeds the within-group variability gives evidence against H 0 : µ 1 = µ 2 = µ 3. Dale arr (R Training: University of Glasgow) nalysis of Variance in R 12 / 19

18 Calculating SS between and SS within Y ij = ˆµ + Â i + Ŝ() ij 124 = = = = = = = = = = = = SS = = check your math SS Y = SS µ + SS + SS S() Dale arr (R Training: University of Glasgow) nalysis of Variance in R 13 / 19

19 H 0 and Sums of Squares Y ij µ = i + S() ij SS = 2792 SS S() = 526 SS + SS S() = 3318 SS = 266 SS S() = 3052 SS + SS S() = 3318 Dale arr (R Training: University of Glasgow) nalysis of Variance in R 14 / 19

20 Mean Square and Degrees of Freedom Degrees of Freedom (df) The number of observations that are free to vary. df = K 1 df S() = N K where N is the number of subjects and K is the number of groups. Mean Square (MS) sum of squares divided by its degrees of freedom. MS = SS df = = 1396 MS S() = SS S() df S() = = 58.4 Dale arr (R Training: University of Glasgow) nalysis of Variance in R 15 / 19

21 The F -ratio Relative Freq df(num,denom)=2,9 F ratio ratio of mean squares, with df_{numerator} and df_{denominator} degrees of freedom. F = MS MS S() = = F_crit= F value If F obs > F crit, then reject H 0 Dale arr (R Training: University of Glasgow) nalysis of Variance in R 16 / 19

22 Summary Table Source df SS MS F p Error µ <.001 S() <.001 S() S() Total Source df SS MS F p Error µ <.001 S() S() S() Total Dale arr (R Training: University of Glasgow) nalysis of Variance in R 17 / 19

23 Overview of One-Way NOV 1 Write the GLM: Y ij = µ + i + S() ij 2 Write down the estimating equations: ˆµ = Y.. Âi = Y i. ˆµ Ŝ()ij = Y ij ˆµ Âi 3 Compute estimates for all terms in model. 4 Create decomposition matrix. 5 Compute SS, MS, df. dfµ = 1 df = K 1 dfs() = N K MS = SS/df 6 Construct a summary NOV table. 7 Compare F_{obs} with F_{crit}. Dale arr (R Training: University of Glasgow) nalysis of Variance in R 18 / 19

24 NOV assumptions Normality Conditional independence Homoskedasticity Sphericity (RM-designs only, where k > 2) Dale arr (R Training: University of Glasgow) nalysis of Variance in R 19 / 19

Spatial Patterns Point Pattern Analysis Geographic Patterns in Areal Data

Spatial Patterns Point Pattern Analysis Geographic Patterns in Areal Data Spatial Patterns We will examine methods that are used to analyze patterns in two sorts of spatial data: Point Pattern Analysis - These methods concern themselves with the location information associated