Regression Analysis and Linear Regression Models


University of Trento - FBK, 2 March 2015

Relationship between numerical variables

Investigate a possible linear relationship between two numerical variables.

The Pearson correlation coefficient quantifies the strength and direction of a linear relationship. Given two numerical variables X and Y:

$$\rho = \frac{\sum_{i=1}^{N} (x_i - \mu_X)(y_i - \mu_Y)}{N \sigma_X \sigma_Y}$$

where $\mu_X$ and $\mu_Y$ are the population means of X and Y, $\sigma_X$ and $\sigma_Y$ the population standard deviations, and N is the population size.

- It is a number in [-1, 1].
- The stronger the linear relationship, the closer $|\rho|$ is to 1.
- The sign of $\rho$ indicates the direction of the relationship.

Relationship between numerical variables

We cannot measure $\rho$ directly, because we do not have access to the whole population, so we estimate $\rho$ from the data. Given n pairs of values $(x_1, y_1), \ldots, (x_n, y_n)$ of the observed data, the estimate r of $\rho$ is:

$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{(n-1)\, s_x s_y}$$

where $\bar{x}$ and $\bar{y}$ are the sample means and $s_x$, $s_y$ the sample standard deviations.
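
This formula can be checked directly against R's built-in cor(). A minimal sketch, assuming the data frame bw from the next slide has already been read in:

x <- bw$abdomen2
y <- bw$bodyfat
## sd() uses the (n - 1) denominator, matching the formula for r above
r_manual <- sum((x - mean(x)) * (y - mean(y))) / ((length(x) - 1) * sd(x) * sd(y))
r_manual                        ## should match cor(x, y) = 0.81343 on the next slide
all.equal(r_manual, cor(x, y))  ## TRUE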

Relationship between numerical variables: examples with real data

Example: with the bodyweight dataset, examine the relationship between percent body fat (response) and abdomen circumference (explanatory variable). The dataset can be found at http://lib.stat.cmu.edu/datasets/bodyfat

cor(bw[,c("abdomen2", "bodyfat")])
##          abdomen2 bodyfat
## abdomen2  1.00000 0.81343
## bodyfat   0.81343 1.00000

nrow(bw)
## [1] 252

Examine the relationship between height and percent body fat:

cor(bw[,c("bodyfat","height")])
##           bodyfat    height
## bodyfat  1.000000 -0.089495
## height  -0.089495  1.000000

Relationship between numerical variables: correlation tests

Recall: $\rho$ close to 0 means that the two variables are not related, or that they are related BUT the relationship is not linear. Be cautious about interpreting $\rho$ close to 0 as "no relationship"!

Evaluate the statistical significance of $\rho$:

$$H_0: \rho = 0 \qquad H_1: \rho \neq 0$$

$$T = \frac{R}{\sqrt{(1 - R^2)/(n-2)}}$$

where R is the sample correlation coefficient and n the sample size. If the null hypothesis is true, T follows the t-distribution with n - 2 degrees of freedom. The observed statistic is

$$t = \frac{r}{\sqrt{(1 - r^2)/(n-2)}}$$

Example on correlation test

Example: with the bodyweight dataset, examine the relationship between height and percent body fat. Compute the t-score from the sample:

aa <- cor(bw[,c("bodyfat","height")])
t <- aa[1,2] / sqrt((1 - aa[1,2]^2) / (nrow(bw) - 2))

Test the alternative hypothesis $H_1: \rho \neq 0$ based on a t-distribution with 252 - 2 = 250 degrees of freedom. Since t = -1.42, compute the p-value as $p_{obs} = 2\,P(T \le -1.42)$:

2 * pt(t, df = nrow(bw) - 2)
## [1] 0.15664

At the commonly used significance levels (0.01, 0.05, 0.1) we fail to reject the null hypothesis. Therefore we cannot conclude that the two variables are linearly correlated.

Example on correlation test

Example: with the bodyweight dataset, test the alternative hypothesis $H_1: \rho \neq 0$.

Examine the relationship between height and percent body fat:

cor.test(bw$bodyfat, bw$height, alternative="two.sided")
##
##  Pearson's product-moment correlation
##
## data:  bw$bodyfat and bw$height
## t = -1.4207, df = 250, p-value = 0.1566
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.210738  0.034459
## sample estimates:
##       cor
## -0.089495

Examine the relationship between percent body fat and abdomen circumference:

cor.test(bw$bodyfat, bw$abdomen2, alternative="two.sided")
##
##  Pearson's product-moment correlation
##
## data:  bw$bodyfat and bw$abdomen2
## t = 22.112, df = 250, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.76695 0.85142
## sample estimates:
##     cor
## 0.81343

Linear regression models

Aim: investigate the relationships between numerical variables.

- Examine linear relationships between a response variable and one or more explanatory variables.
- Test hypotheses regarding relationships between one or more explanatory variables and a response variable.
- Predict unknown values of the response variable using one or more predictors.

Denote with X the set of explanatory variables and with Y the response variable. We try to fit the equation

$$Y = f(X) + \epsilon$$

Assuming f(X) is linear, $Y = X\beta + \epsilon$, and we can estimate $\beta$ by minimizing the prediction error:

$$\hat{\beta} = (X^T X)^{-1} X^T y$$
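
The matrix formula can be tried out directly. A minimal sketch on simulated data (the data and names here are illustrative, not from the slides):

set.seed(1)
x <- rnorm(25)
y <- 2 + 3 * x + rnorm(25)                 ## simulated response with known coefficients
X <- cbind(1, x)                           ## design matrix: intercept column plus predictor
beta_hat <- solve(t(X) %*% X, t(X) %*% y)  ## solves (X^T X) beta = X^T y
beta_hat
coef(lm(y ~ x))                            ## lm recovers the same estimates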

Linear regression models: one binary explanatory variable

X is a binary variable taking values 0 or 1, so its column in the design matrix is a vector $(0, \ldots, 0, 1, \ldots, 1)^T$ of group indicators; Y is a numerical variable.

Example: investigate the relationship between sodium chloride intake and blood pressure among elderly people.

- 25 people (n = 25)
- 15 of them (0.6 of our sample) keep a low sodium chloride diet (X = 0)
- 10 of them (0.4 of our sample) keep a high sodium chloride diet (X = 1)
- Measure the systolic blood pressure (Y)

For each individual i we have a pair of observations $(x_i, y_i)$.

Example

[Dotplot of systolic blood pressure (BP) for each diet group (0 and 1), with the group mean shown as a red point.]

For each group, compute the mean estimate of blood pressure (the red point in the graph). The sample mean provides a reasonable point estimate if a new sample arrives:

- For group X = 0: $\hat{y}_{x=0} = \text{mean}(y_{x=0})$
- For group X = 1: $\hat{y}_{x=1} = \text{mean}(y_{x=1})$

Example

We can compute $\hat{y}_{x=0}$ and $\hat{y}_{x=1}$:

mm <- tapply(saltbp$BP, saltbp$saltlevel, mean)  ## group means by salt level (this call is assumed; only its output appeared on the slide)
mm
##      0      1
## 133.17 139.43

Compute the parameters of the line connecting the two points:

a <- mm["0"]
b <- (mm["1"] - mm["0"]) / 1
a; b
## [1] 133.17
## [1] 6.2563

We can draw the black line connecting the two means.

In general: the regression line is defined as $\hat{y} = a + bx$ and captures the linear relationship between the response variable and the explanatory variable. The slope b is interpreted as our estimate of the expected (average) change in the response variable associated with a unit increase in the value of the explanatory variable.

Linear regression models: prediction and errors

Given the regression line:

- the prediction for each sample is $\hat{y}_i = a + b x_i$
- the residual for each sample is $e_i = y_i - \hat{y}_i$

Thus the true value $y_i$ can be written as $y_i = \hat{y}_i + e_i = a + b x_i + e_i$.

Linear regression models: prediction and errors

Example: with the same example on blood pressure, compute the prediction for each group.

Predictions:
- $x_i = 0$: $\hat{y}_i = a = 133.17$
- $x_i = 1$: $\hat{y}_i = a + b = 139.429$

Errors:
- $x_4 = 0$: the true value is $y_4 = 135.08$, so the error is $e_4 = y_4 - \hat{y}_4 = 135.08 - 133.17 = 1.91$
- $x_{25} = 1$: the true value is $y_{25} = 134.84$, so the error is $e_{25} = y_{25} - \hat{y}_{25} = 134.84 - 139.43 = -4.59$

Linear regression models: measuring discrepancy

Measure the discrepancy with the Residual Sum of Squares (RSS):

$$RSS = \sum_{i=1}^{n} e_i^2$$

- It measures the distance between predicted values and true values.
- It depends on the residuals and on the sample size n.
- For the mean as predictor, $\sum_i e_i = 0$.

We decided to draw the line connecting the means of the two groups, but we could draw almost any line between them. The line connecting the means is the one that gives the minimum RSS, and is called the least-squares regression line.
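
The minimum-RSS claim is easy to probe numerically. A minimal sketch on simulated data (the values are illustrative, not the actual blood pressure measurements):

rss <- function(a, b, x, y) sum((y - (a + b * x))^2)  ## RSS of the line y = a + b x
x <- c(rep(0, 15), rep(1, 10))        ## 0/1 diet indicator, as in the example
set.seed(42)
y <- 133 + 6 * x + rnorm(25, sd = 4)  ## simulated blood pressure values
a <- mean(y[x == 0])
b <- mean(y[x == 1]) - mean(y[x == 0])
rss(a, b, x, y)        ## RSS of the line through the group means
rss(a + 1, b, x, y)    ## shifting the intercept increases the RSS
rss(a, b + 2, x, y)    ## changing the slope increases the RSS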

Generalization

Generalizing to the whole population, the linear relationship between Y and X in the entire population is:

$$Y = \alpha + \beta X + \epsilon$$

This is the linear regression model: $\alpha$ and $\beta$ are the regression parameters, $\beta$ is the regression coefficient, and fitting is the process of finding the regression parameters.

Confidence interval for the regression coefficient. Standard error:

$$SE_b = \sqrt{\frac{RSS/(n-2)}{\sum_i (x_i - \bar{x})^2}}$$

Confidence interval:

$$[\,b - t_{crit}\, SE_b,\; b + t_{crit}\, SE_b\,]$$

where $t_{crit}$ depends on the confidence level c and on the degrees of freedom (approximately 1.96 for c = 0.95 and large n).
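
A minimal sketch of the interval computation, plugging in the slope and standard error from the blood pressure example (b = 6.2563, SE_b = 1.593, n = 25):

b <- 6.2563; SE_b <- 1.593; n <- 25
t_crit <- qt(0.975, df = n - 2)          ## about 2.07 for 23 df; 1.96 is the large-n value
c(b - t_crit * SE_b, b + t_crit * SE_b)  ## 95% CI for the slope

With a fitted lm object, confint(model, level = 0.95) computes the same interval.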

Hypothesis testing

Linear regression models can be used to test hypotheses regarding possible relationships between the response variable and an explanatory variable:

- null hypothesis $H_0: \beta = 0$ (no linear relationship)
- alternative hypothesis $H_1: \beta \neq 0$

$$t = \frac{b}{SE_b}, \qquad p_{obs} = 2\,P(T \ge |t|)$$

Example: $SE_b = 1.593$ for $b = 6.25$:

t <- b / 1.593
p.value <- 2 * pt(t, df = nrow(saltbp) - 2, lower.tail = FALSE)  ## two-sided p-value

Exercise I

With the previous dataset saltbp, estimate the coefficients $\beta_0$ and $\beta_1$ from the matrix X and the vector y using the least-squares formula. Recall the definition of the X matrix when $\beta_0$ is to be estimated. For each sample compute the prediction $\hat{y}_i$ and the error $e_i$. Compute also the RSS, the SE for this model, and the confidence interval at the 90% confidence level.

Linear regression models: example

Use the lm function to fit the least-squares regression line:

aa <- lm(BP ~ saltlevel, data = saltbp)
summary(aa)
##
## Call:
## lm(formula = BP ~ saltlevel, data = saltbp)
##
## Residuals:
##    Min     1Q Median     3Q    Max
## -8.299 -3.563  0.687  3.211  5.591
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   133.17       1.01  132.22  < 2e-16 ***
## saltlevel1      6.26       1.59    3.93  0.00067 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.9 on 23 degrees of freedom
## Multiple R-squared:  0.402, Adjusted R-squared:  0.376
## F-statistic: 15.4 on 1 and 23 DF,  p-value: 0.000672

Linear regression models: one numerical explanatory variable

X is a numerical variable; Y is a numerical variable.

Example: investigate the relationship between sodium chloride intake and blood pressure among elderly people.

- X: daily salt intake (numerical values)
- Y: blood pressure (numerical values)

Explore the data first I

Look at the scatter plot of the data.

[Scatter plot of BP against daily salt intake.]

Explore the data first II

[Scatter plot of BP against daily salt intake.]

Model on one numerical variable

Model definition:

- model: $\hat{y}_i = a + b x_i$
- error: $e_i = y_i - \hat{y}_i$
- RSS: $\sum_{i=1}^{n} e_i^2$

We can estimate the slope b from the correlation coefficient r:

$$b = r\,\frac{s_y}{s_x}$$

where $s_x$ and $s_y$ are the sample standard deviations, and the intercept a as:

$$a = \bar{y} - b\,\bar{x}$$

where $\bar{x}$ and $\bar{y}$ are the sample means.

Example on the blood data set

Compute the regression model manually:

sy <- sd(saltbp$BP)                ## sd of y
sx <- sd(saltbp$salt)              ## sd of x
r <- cor(saltbp$BP, saltbp$salt)   ## correlation coefficient
b <- r * (sy/sx)                   ## the slope
a <- mean(saltbp$BP) - b * mean(saltbp$salt)  ## the intercept
sy; sx; r; b; a
## [1] 4.9364
## [1] 3.4595
## [1] 0.8388
## [1] 1.1969
## [1] 128.62

Example on the blood data set

Compute the predicted value for a sample in the dataset:

xi <- saltbp$salt[10]  ## extract a sample
yi <- saltbp$BP[10]
yhi <- a + b * xi      ## compute the prediction for the sample
ei <- yi - yhi         ## compute the error
yhi; ei
## [1] 133.02
## [1] -4.721

yhi <- a + b * saltbp$salt
ei <- saltbp$BP - yhi
RSS <- sum(ei^2)
SE <- sqrt(RSS/(25-2)) / sqrt(sum((saltbp$salt - mean(saltbp$salt))^2))
sqrt(RSS/(25-2)); SE
## [1] 2.7454
## [1] 0.16199

Let R work for us!

Fit the least-squares regression model in R:

mymod <- lm(BP ~ salt, data = saltbp)
summary(mymod)
##
## Call:
## lm(formula = BP ~ salt, data = saltbp)
##
## Residuals:
##    Min     1Q Median     3Q    Max
## -5.039 -1.675  0.366  1.882  5.344
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  128.616      1.102  116.72  < 2e-16 ***
## salt           1.197      0.162    7.39  1.6e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.75 on 23 degrees of freedom
## Multiple R-squared:  0.704, Adjusted R-squared:  0.691
## F-statistic: 54.6 on 1 and 23 DF,  p-value: 1.63e-07

Manually computed, for comparison:
## Residual Std Error 2.745
## SE 0.162
## p-value 1.63120748106763e-07

Analyze the output

mymod$coefficients   ## parameters of the linear model
## (Intercept)        salt
##    128.6164      1.1969

mymod$residuals      ## error for each sample
##        1        2        3        4        5        6        7        8
##  1.71842  1.87111 -0.59724  1.54437 -0.62158  2.61017  3.89831 -1.67547
##        9       10       11       12       13       14       15       16
## -2.27817 -4.72097 -2.12713  0.36618  2.19296  0.79778  2.28898 -3.77798
##       17       18       19       20       21       22       23       24
##  0.61068  1.88238 -5.03880 -0.23622  2.13233 -0.99761  5.34430 -1.02136
##       25
## -4.16544

mymod$fitted.values  ## predicted values for the response variable
##      1      2      3      4      5      6      7      8      9     10
## 130.47 129.97 134.46 133.54 130.47 134.23 131.20 131.29 131.79 133.02
##     11     12     13     14     15     16     17     18     19     20
## 131.42 135.77 135.31 134.85 134.54 139.57 137.51 142.79 136.17 141.02
##     21     22     23     24     25
## 142.00 138.17 139.68 143.66 139.01

Plots I

plot(mymod, which=1:2)

[Residuals vs Fitted plot for lm(BP ~ salt); observations 10, 19, and 23 are flagged.]

Plots II

[Normal Q-Q plot of the standardized residuals for lm(BP ~ salt); observations 10, 19, and 23 are flagged.]

Histogram of the residuals

hist(mymod$residuals, col="grey")

[Histogram of mymod$residuals.]

Fitted values

True values vs predicted values:

plot(BP ~ salt, data = saltbp)
points(saltbp$salt, mymod$fitted.values, pch = 20)

[Scatter plot of BP against salt with the fitted values overlaid as solid points.]

Goodness of Fit

Definition: $R^2$ measures how well the regression model fits the observed data. It depends on the RSS, which quantifies the discrepancy between the observed data and the regression line: the higher the RSS, the higher the discrepancy.

- RSS (lack of fit): $RSS = \sum_{i=1}^{n} e_i^2$
- TSS (total variation in the response variable): $TSS = \sum_{i=1}^{n} (y_i - \bar{y})^2$
- $R^2$, the proportion of total variation explained by the model:

$$R^2 = 1 - \frac{RSS}{TSS}$$

For a simple regression line with one variable, $R^2 = r^2$, the square of Pearson's correlation coefficient.
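
Both identities can be verified on the fitted model. A minimal sketch, assuming mymod and saltbp from the previous slides:

RSS <- sum(residuals(mymod)^2)
TSS <- sum((saltbp$BP - mean(saltbp$BP))^2)
1 - RSS / TSS                  ## 0.704, the Multiple R-squared from summary(mymod)
cor(saltbp$BP, saltbp$salt)^2  ## r^2 = 0.8388^2, the same value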

Assumptions

Linear regression model assumptions:

1. Linearity: we assume the relationship between X and Y is linear!
2. Independence: observations should be independent (random sampling).
3. Constant variance and normality: Y should be normally distributed. In general we check the normality of $\epsilon$, given the relationship between Y and $\epsilon$; in particular $\epsilon \sim N(0, \sigma^2)$.
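
Point 3 can be checked on the fitted model. A minimal sketch using mymod from the previous slides (the Shapiro-Wilk test is one common choice among several):

qqnorm(residuals(mymod))        ## points close to the line support normality of the errors
qqline(residuals(mymod))
shapiro.test(residuals(mymod))  ## a large p-value gives no evidence against normality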

Exercise I

1. We want to examine the relationship between body temperature Y and heart rate X. Further, we would like to use heart rate to predict body temperature (a starter sketch follows this list).
   1. Use the BodyTemperature.txt data set to build a simple linear regression model for body temperature using heart rate as the predictor.
   2. Interpret the estimate of the regression coefficient and examine its statistical significance.
   3. Find the 95% confidence interval for the regression coefficient.
   4. Find the value of $R^2$ and show that it is equal to the square of the sample correlation coefficient.
   5. Create simple diagnostic plots for your model and identify possible outliers.
   6. If someone's heart rate is 75, what would be your estimate of this person's body temperature?
2. We would like to predict a baby's birthweight (bwt) before she is born using her mother's weight at last menstrual period (lwt).
   1. Use the birthwt data set to build a simple linear regression model, where bwt is the response variable and lwt is the predictor.
   2. Interpret your estimate of the regression coefficient and examine its statistical significance.
   3. Find the 90% confidence interval for the regression coefficient.
   4. If a mother's weight at last menstrual period is 170 pounds, what would be your estimate for the birthweight of her baby?
3. We want to predict percent body fat using the measurement of neck circumference.
   1. Use the bodyfat data set to build a simple linear regression model for percent body fat (bodyfat), where neck circumference (neck) is the predictor. In this data set, neck is measured in centimeters.
   2. What is the expected (mean) increase in percent body fat corresponding to a one-unit increase in neck circumference?
   3. Create a new variable, neck.in, whose values are neck circumference in inches. Rebuild the regression model for percent body fat using neck.in as the predictor.
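
A minimal starter sketch for Exercise 1.1. The file name comes from the exercise text, but the column names Temperature and HeartRate are assumptions; check names(bt) after reading the file:

bt <- read.table("BodyTemperature.txt", header = TRUE)
fit <- lm(Temperature ~ HeartRate, data = bt)       ## simple linear regression (column names assumed)
summary(fit)                                        ## coefficient estimate and its significance
confint(fit, level = 0.95)                          ## 95% CI for the regression coefficient
predict(fit, newdata = data.frame(HeartRate = 75))  ## estimated body temperature at heart rate 75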