Correlation and Regression

How are X and Y Related?

Causation (regression): p(Y | X = x)
Correlation: p(X = x, Y = y)

http://xkcd.com/552/

Correlation Can be Induced by Many Mechanisms

[Scatterplot: # of Pups (1-8) versus Inbreeding Coefficient]

Covariance

Describes the relationship between two variables. Not scaled. σ = population-level covariance, s = covariance in our sample. We don't know which is correct, or if another model is better. We can only examine correlation.

σ_XY = Σ(X_i - X̄)(Y_i - Ȳ) / (n - 1)

[Scatterplot: cov(x, y) = 2.273]

Pearson Correlation

Describes the relationship between two variables. Scaled between -1 and 1. ρ = population-level correlation, r = correlation in our sample.

ρ_XY = σ_XY / (σ_X σ_Y)

[Scatterplot of (y - mean(y))/sd(y) versus (x - mean(x))/sd(x): cor(x, y) = 0.635]

Assumptions of Pearson Correlation

Observations are from a random sample
Each observation is independent
X and Y are from a Normal Distribution

[Histogram: Frequency of the standardized variable]
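To see how the two formulas above relate, here is a minimal sketch that builds the sample covariance and Pearson correlation by hand and checks them against R's built-in cov() and cor(). The x and y vectors are simulated for illustration, not the wolf data:

# simulated data; any two numeric vectors would do
set.seed(42)
x <- rnorm(20, mean = 10, sd = 2)
y <- 2 + 0.5 * x + rnorm(20)
n <- length(x)

# sample covariance: sum of cross-deviations over n - 1
cov_xy <- sum((x - mean(x)) * (y - mean(y))) / (n - 1)

# Pearson correlation: covariance scaled by both standard deviations
cor_xy <- cov_xy / (sd(x) * sd(y))

# both match the built-in functions (differences ~ 0)
cov_xy - cov(x, y)
cor_xy - cor(x, y)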

The meaning of r

Y is perfectly predicted by X if r = -1 or 1. r² = the proportion of variation in y explained by x.

[Four scatterplots: r = 0.22, r = 0.51, r = 0.67, r = 0.8]

Testing if r ≠ 0

H0 is r = 0. Ha is r ≠ 0.

Testing: t = r / SE_r with df = n - 2, where

SE_r = sqrt((1 - r²) / (n - 2))

WHY n-2?

cov(wolves)

##                        inbreeding.coefficient    pups
## inbreeding.coefficient               0.009922 -0.1136
## pups                                -0.113569  3.5199

cor(wolves)

##                        inbreeding.coefficient    pups
## inbreeding.coefficient                 1.0000 -0.6077
## pups                                  -0.6077  1.0000

[Scatterplot: # of Pups (1-8) versus Inbreeding Coefficient]
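Before reading the cor.test() output below, it may help to assemble the t statistic by hand from the SE_r formula above. A minimal sketch, again with simulated data rather than the wolf data:

set.seed(1)
x <- rnorm(24)
y <- -0.5 * x + rnorm(24)
n <- length(x)

r <- cor(x, y)
se_r <- sqrt((1 - r^2) / (n - 2))   # SE_r from the slide
t_stat <- r / se_r                  # compare to t with df = n - 2
t_stat

# cor.test() computes the same statistic and p-value
cor.test(x, y)$statistic
2 * pt(-abs(t_stat), df = n - 2)    # two-sided p-value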

with(wolves, cor.test(pups, inbreeding.coefficient))

##
##  Pearson's product-moment correlation
##
## data:  pups and inbreeding.coefficient
## t = -3.589, df = 22, p-value = 0.001633
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.8120 -0.2707
## sample estimates:
##     cor
## -0.6077

Exercise: Pufferfish Mimics & Predator Approaches

Load up the pufferfish mimic data from W&S
Plot the data
Assess the correlation and covariance
Assess H0
Challenge - Evaluate Ha1: the correlation is 1

Exercise: Pufferfish Mimics & Predator Approaches

# get the correlation and its SE
puff_cor <- cor(puffer)[1, 2]
se_puff_cor <- sqrt((1 - puff_cor^2) / (nrow(puffer) - 2))

# t-test of the difference from 1
t_puff <- (puff_cor - 1) / se_puff_cor
t_puff

## [1] -2.005

# 1-tailed, as r > 1 is not possible
pt(t_puff, nrow(puffer) - 2)

## [1] 0.03013

Violating Assumptions?

Spearman's Correlation (rank based)
Distance Based Correlation & Covariance (dcor)
Maximum Information Coefficient (nonparametric)

All are lower in power for linear correlations.
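A brief sketch contrasting these options on a nonlinear but monotonic relationship. The Spearman value is literally Pearson on ranks (the recipe on the next slide), and the distance-correlation line assumes the energy package is installed:

set.seed(2)
x <- runif(50)
y <- exp(3 * x) + rnorm(50, sd = 0.5)   # nonlinear, monotonic

cor(x, y)                        # Pearson understates the association
cor(x, y, method = "spearman")   # rank-based
cor(rank(x), rank(y))            # identical: Spearman = Pearson on ranks

# distance correlation, assuming the energy package
# install.packages("energy")
energy::dcor(x, y)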

Spearman Correlation

1. Transform variables to ranks, i.e., 1, 2, 3... (rank())
2. Compute correlation using ranks as data
3. If n ≤ 100, use a Spearman Rank Correlation table
4. If n > 100, use a t-test as in Pearson correlation

Distance Based Correlation, MIC, etc.

How are X and Y Related?

Causation (regression): p(Y | X = x)
Correlation: p(X = x, Y = y)

Least Squares Regression

ŷ = a + bx

"Then it's code in the data, give the keyboard a punch
Then cross-correlate and break for some lunch
Correlate, tabulate, process and screen
Program, printout, regress to the mean"
- "White Collar Holler" by Nigel Russell

Correlation v. Regression Coefficients

[Three scatterplots: Slope = 3, r = 0.98; Slope = 3, r = 0.72; Slope = 1, r = 0.72]

Basic Principles of Linear Regression

Y is determined by X: p(Y | X = x)
The relationship between X and Y is linear
The residuals of Y = a + bX are normally distributed (i.e., Y = a + bX + e where e ~ N(0, σ))

Basic Principles of Least Squares Regression

Ŷ = a + bX, where a = intercept and b = slope

Minimize residuals, defined as SS_residuals = Σ(Y_i - Ŷ_i)²

[Scatterplot with fitted line showing residual deviations]

Solving for Slope

b = s_XY / s²_X = cov(X, Y) / var(X) = r_XY (s_Y / s_X)
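The slope identities can be verified numerically. A minimal sketch with simulated data, checked against lm():

set.seed(3)
x <- rnorm(30)
y <- 1 + 3 * x + rnorm(30)

b <- cov(x, y) / var(x)             # b = s_XY / s²_X
b_alt <- cor(x, y) * sd(y) / sd(x)  # equivalent: r * s_Y / s_X

c(b, b_alt)        # the two identities agree
coef(lm(y ~ x))[2] # and both match the least squares estimate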

Solving for Intercept

The least squares regression line always goes through the mean of X and Y.

Ȳ = a + bX̄

a = Ȳ - bX̄

Fitting a Linear Model in R

wolf_lm <- lm(pups ~ inbreeding.coefficient, data = wolves)
wolf_lm

##
## Call:
## lm(formula = pups ~ inbreeding.coefficient, data = wolves)
##
## Coefficients:
##            (Intercept)  inbreeding.coefficient
##                   6.57                  -11.45

Extracting Coefficients from a LM

coef(wolf_lm)

##            (Intercept)  inbreeding.coefficient
##                  6.567                 -11.447

coef(wolf_lm)[1]

## (Intercept)
##       6.567

coef(wolf_lm)[1] + coef(wolf_lm)[2] * 0.25

## (Intercept)
##       3.706

Extracting Fitted Values from a LM

fitted(wolf_lm)

##     1     2     3     4     5     6     7     8     9    10
## 6.567 6.567 5.079 5.079 5.079 4.392 4.392 4.392 3.706 3.820
##    11    12    13    14    15    16    17    18    19    20
## 3.820 3.820 3.820 3.820 3.820 3.477 3.133 3.133 3.133 3.133
##    21    22    23    24
## 2.446 1.989 2.332 4.049
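As an aside, the manual prediction above (coef(wolf_lm)[1] + coef(wolf_lm)[2] * 0.25) is usually written with predict(), which matches the predictor by name rather than by coefficient position:

predict(wolf_lm, newdata = data.frame(inbreeding.coefficient = 0.25))

predict() can also return confidence or prediction intervals via its interval argument, which is handy for the inference slides that follow.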

Plotting Fitted LMs

plot(pups ~ inbreeding.coefficient, data = wolves, pch = 19)
matplot(wolves$inbreeding.coefficient, fitted(wolf_lm), add = TRUE,
        lwd = 2, col = "red", type = "l")

Plotting Fitted LMs

plot(pups ~ inbreeding.coefficient, data = wolves, pch = 19)
abline(wolf_lm, col = "red", lwd = 2)

[Two scatterplots: pups (1-8) versus inbreeding.coefficient, each with the fitted line]

Ggplot2 and LMs

ggplot(data = wolves, aes(y = pups, x = inbreeding.coefficient)) +
  geom_point() + theme_bw() +
  stat_smooth(method = "lm", color = "red")

Checking Residuals

par(mfrow = c(1, 2))
plot(fitted(wolf_lm), residuals(wolf_lm))
hist(residuals(wolf_lm), main = "residuals")

[Left: residuals versus fitted values. Right: histogram of residuals (Frequency)]

Exercise: Pufferfish Mimics & Predator Approaches

Fit the pufferfish data
Visualize the linear fit
Examine whether there is any relationship between fitted values, residual values, and treatment
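One possible sketch of the exercise workflow; the column names predators and resemblance are assumptions about the W&S pufferfish file and may differ in your copy (check names(puffer) first):

# fit, assuming columns 'predators' and 'resemblance'
puffer_lm <- lm(predators ~ resemblance, data = puffer)

# visualize the linear fit
plot(predators ~ resemblance, data = puffer, pch = 19)
abline(puffer_lm, col = "red", lwd = 2)

# residuals versus fitted values, as on the Checking Residuals slide
plot(fitted(puffer_lm), residuals(puffer_lm))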