Regression Analysis and Linear Regression Models


1 Regression Analysis and Linear Regression Models
University of Trento - FBK
2 March, 2015

2 Relationship between numerical variables
Investigate a possible linear relationship between two numerical variables.
Pearson's correlation coefficient
Quantifies the strength and direction of a linear relationship.
Given two numerical variables X and Y:
ρ = Σ_{i=1}^{N} (x_i − µ_X)(y_i − µ_Y) / (N σ_X σ_Y)
where µ_X and µ_Y are the population means of X and Y, σ_X and σ_Y the population standard deviations, and N is the population size.
It is a number in [−1, 1].
The stronger the relationship, the closer |ρ| is to 1.
The sign of ρ indicates the direction of the relationship.

3 Relationship between numerical variables
We cannot measure ρ directly; we do not have access to the whole population.
Estimation of ρ from the data
Given n pairs of observed values (x_1, y_1), ..., (x_n, y_n), the estimate r of ρ is:
r = Σ_{i=1}^{n} (x_i − x̄)(y_i − ȳ) / ((n − 1) s_x s_y)
where x̄ and ȳ are the sample means and s_x and s_y the sample standard deviations.
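A minimal sketch of this estimate in R, using made-up toy values for x and y, checked against the built-in cor():

x <- c(1.2, 2.4, 3.1, 4.8, 5.0)   ## hypothetical data, for illustration only
y <- c(2.0, 2.9, 3.8, 5.1, 5.3)
n <- length(x)
r.manual <- sum((x - mean(x)) * (y - mean(y))) / ((n - 1) * sd(x) * sd(y))
r.manual
cor(x, y)                          ## same value as r.manual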

4 Relationship between numerical variables
Examples with real data
Example: with the bodyweight dataset (252 observations), examine the relationship between percent body fat (response) and abdomen circumference (explanatory variable):

cor(bw[, c("abdomen2", "bodyfat")])

Examine the relationship between height and percent body fat:

cor(bw[, c("bodyfat", "height")])

5 Relationship between numerical variables
Correlation tests
Recall: ρ close to 0 means that the two variables are not related, or that they are related BUT the relationship is not linear.
Be cautious about interpreting ρ close to 0 as no relationship!
Evaluate the statistical significance of r:
H_0: ρ = 0    H_1: ρ ≠ 0
Test statistic: T = R / sqrt((1 − R²)/(n − 2))
where R is the sample correlation coefficient and n the sample size.
If the null hypothesis is true, T follows a t-distribution with n − 2 degrees of freedom.
Observed statistic: t = r / sqrt((1 − r²)/(n − 2))

6 Example on correlation test
Example: with the bodyweight dataset, examine the relationship between height and percent body fat.
Compute the t-score from the sample:

aa <- cor(bw[, c("bodyfat", "height")])
t <- aa[1, 2] / (sqrt((1 - aa[1, 2]**2) / (nrow(bw) - 2)))

Test the alternative hypothesis H_1: ρ ≠ 0 based on a t-distribution with n − 2 = 250 degrees of freedom.
Compute the p-value as p_obs = 2·P(T ≤ −1.42) (the observed t is negative):

2 * pt(t, df = nrow(bw) - 2)

With the commonly used significance levels (0.01, 0.05, 0.1) we cannot reject the null hypothesis.
Therefore we cannot conclude that the two variables are linearly correlated.

7 Example on correlation test
Example: with the bodyweight dataset, testing the alternative hypothesis H_1: ρ ≠ 0.
Examine the relationship between height and percent body fat:

cor.test(bw$bodyfat, bw$height, alternative = "two.sided")

## Pearson's product-moment correlation
## data: bw$bodyfat and bw$height
## df = 250
## alternative hypothesis: true correlation is not equal to 0

Examine the relationship between percent body fat and abdomen circumference:

cor.test(bw$bodyfat, bw$abdomen2, alternative = "two.sided")

## Pearson's product-moment correlation
## data: bw$bodyfat and bw$abdomen2
## df = 250, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0

8 Linear regression models
Aim: investigate the relationships between numerical variables.
Examining linear relationships between a response variable and one or more explanatory variables.
Testing hypotheses regarding relationships between one or more explanatory variables and a response variable.
Predicting unknown values of the response variable using one or more predictors.
Denote with X the set of explanatory variables and with Y the response variable.
We try to fit the equation: Y = f(X) + ɛ
Assuming f(X) is linear: Y = Xβ + ɛ
We can then estimate β by minimizing the prediction error:
β̂ = (XᵀX)⁻¹ Xᵀ y
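A minimal sketch of this estimate in R with a made-up design: X gets a leading column of ones so that the first entry of β̂ is the intercept.

x <- c(0, 0, 0, 1, 1)              ## hypothetical explanatory values
y <- c(130, 128, 133, 139, 141)    ## hypothetical responses
X <- cbind(1, x)                   ## design matrix with an intercept column
beta.hat <- solve(t(X) %*% X) %*% t(X) %*% y   ## (X'X)^{-1} X'y
beta.hat
coef(lm(y ~ x))                    ## lm() returns the same estimates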

9 Linear regression models
One binary explanatory variable
X is a binary variable (0, 1); Y is a numerical variable.
Example: investigate the relationship between sodium chloride intake and blood pressure among elderly people.
25 people (n = 25).
15 of them (0.6 of our sample) keep a low sodium chloride diet (X = 0).
10 of them (0.4 of our sample) keep a high sodium chloride diet (X = 1).
Measure the systolic blood pressure (Y).
For each individual i we have a pair of observations (x_i, y_i).
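A toy data frame with the same structure can be simulated for experimentation; the numbers below are invented and are not the course data.

set.seed(1)
saltlevel <- c(rep(0, 15), rep(1, 10))            ## 15 low-salt, 10 high-salt subjects
BP <- 133 + 6 * saltlevel + rnorm(25, sd = 4)     ## hypothetical blood pressures
toy <- data.frame(saltlevel = saltlevel, BP = BP)
head(toy)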

10 Example
Dotplot of systolic blood pressure for each diet group. (Figure: BP by diet group.)
For each group compute the mean estimate of blood pressure (red point in the graph).
The sample mean provides a reasonable point estimate if a new sample arrives:
For group X = 0: ŷ_{x=0} = mean(y_{x=0})
For group X = 1: ŷ_{x=1} = mean(y_{x=1})

11 Example
We can compute ŷ_{x=0} and ŷ_{x=1}: the two group means, stored in a named vector mm with names "0" and "1".
Compute the parameters of the line connecting the two points:

a <- mm["0"]
b <- (mm["1"] - mm["0"]) / 1

We can then draw the black line connecting the two means.
In general, the regression line is defined as ŷ = a + b·x, and it captures the linear relationship between the response variable and the explanatory variable.
The slope b is interpreted as our estimate of the expected (average) change in the response variable associated with a unit increase in the value of the explanatory variable.

12 Linear regression models
Prediction and errors
Given the regression line:
Prediction for each sample: ŷ_i = a + b x_i
Residual for each sample: e_i = y_i − ŷ_i
Thus the observed value y_i can be written as:
y_i = ŷ_i + e_i = a + b x_i + e_i

13 Linear regression models
Prediction and errors
Example: with the same blood pressure example, compute the prediction for each group.
Predictions:
x_i = 0: ŷ_i = a
x_i = 1: ŷ_i = a + b
Errors:
x_4 = 0: the error is e_4 = y_4 − ŷ_4 = 1.91
x_25 = 1: the error is e_25 = y_25 − ŷ_25 = 4.6

14 Linear regression models
Measuring the discrepancy: Residual Sum of Squares (RSS)
Measures the distance between predicted values and true values; it depends on the residuals and on the sample size n.
For the mean as predictor: Σ_i e_i = 0
RSS = Σ_{i=1}^{n} e_i²
We decided to draw the line connecting the means of the two groups, but we could draw almost any line between the two groups.
The line connecting the means is the one that gives the minimum RSS; it is called the least-squares regression line.
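A small numerical check, again with made-up values for a binary X: the line through the two group means has a smaller RSS than any other candidate line.

x <- c(0, 0, 0, 1, 1)              ## hypothetical data
y <- c(130, 128, 133, 139, 141)
rss <- function(a, b) sum((y - (a + b * x))^2)
m0 <- mean(y[x == 0]); m1 <- mean(y[x == 1])
rss(m0, m1 - m0)                   ## RSS of the line through the group means
rss(m0 + 1, m1 - m0 - 2)           ## any other intercept/slope gives a larger RSS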

15 Generalization
Generalizing to the whole population, the linear relationship between Y and X in the entire population is:
Y = α + βX + ɛ
This is the linear regression model; α and β are the regression parameters, and β is the regression coefficient. Fitting is the process of finding the regression parameters.
Confidence interval for the regression coefficient
Standard error:
SE_b = sqrt(RSS/(n − 2)) / sqrt(Σ_i (x_i − x̄)²)
Confidence interval:
[b − t_crit · SE_b, b + t_crit · SE_b]
where t_crit depends on the confidence level c (i.e. about 1.96 for c = 0.95).
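These formulas translate directly into R; the sketch below reuses the made-up vectors from the earlier examples and computes a 95% interval.

x <- c(0, 0, 0, 1, 1); y <- c(130, 128, 133, 139, 141)   ## hypothetical data
b <- cov(x, y) / var(x)            ## least-squares slope
a <- mean(y) - b * mean(x)         ## least-squares intercept
RSS   <- sum((y - (a + b * x))^2)
SEb   <- sqrt(RSS / (length(y) - 2)) / sqrt(sum((x - mean(x))^2))
tcrit <- qt(0.975, df = length(y) - 2)   ## two-sided 95% critical value
c(lower = b - tcrit * SEb, upper = b + tcrit * SEb)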

16 Hypothesis testing
Linear regression models can be used to test hypotheses regarding possible relationships between the response variable and the explanatory variable.
Null hypothesis H_0: β = 0 (no linear relationship).
Alternative hypothesis H_1: β ≠ 0.
Test statistic: t = b / SE_b, with p_obs = 2·P(T ≥ |t|).
Example: SE_b = 1.593 for b = 6.25

t <- b / 1.593
p.value <- 2 * pt(abs(t), df = nrow(saltbp) - 2, lower.tail = FALSE)

17 Exercise I
With the previous dataset saltbp, try to estimate the coefficients β_0 and β_1 from the matrix X and the vector y using the least-squares regression line. Recall the definition of the X matrix when β_0 has to be estimated.
For each sample compute the prediction ŷ_i and the error e_i. Compute also the RSS, the SE for this model, and the C.I. at 90% confidence. A possible approach is sketched below.
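One possible way to set up the computation (assuming the saltbp data frame has columns BP and saltlevel, as in the lm() call on the next slide):

X <- cbind(1, saltbp$saltlevel)            ## design matrix with an intercept column
y <- saltbp$BP
beta <- solve(t(X) %*% X) %*% t(X) %*% y   ## (X'X)^{-1} X'y: beta[1] = b0, beta[2] = b1
yhat <- X %*% beta                         ## predictions
e    <- y - yhat                           ## errors
RSS  <- sum(e^2)
SE   <- sqrt(RSS / (nrow(X) - 2)) / sqrt(sum((X[, 2] - mean(X[, 2]))^2))
tcrit <- qt(0.95, df = nrow(X) - 2)        ## 90% two-sided confidence
c(beta[2] - tcrit * SE, beta[2] + tcrit * SE)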

18 Linear regression models
Example: use the lm function to fit the least-squares regression line.

aa <- lm(BP ~ saltlevel, data = saltbp)
summary(aa)

## Call:
## lm(formula = BP ~ saltlevel, data = saltbp)
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)                              < 2e-16 ***
## saltlevel                                         ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.9 on 23 degrees of freedom
## Multiple R-squared: 0.402
## F-statistic: 15.4 on 1 and 23 DF

19 Linear regression models
One numerical explanatory variable
X is a numerical variable; Y is a numerical variable.
Example: investigate the relationship between sodium chloride intake and blood pressure among elderly people.
X: daily salt intake (numerical values).
Y: blood pressure (numerical values).

20 Explore the data first I
Look at the scatter plot of the data. (Figure: scatter plot of BP vs. salt.)

21 Explore the data first II
(Figure: scatter plot of BP vs. salt.)

22 Model on one numerical variable
Model definition
Model: ŷ_i = a + b x_i
Error: e_i = y_i − ŷ_i
RSS: Σ_{i=1}^{n} e_i²
We can estimate:
the slope b from the correlation coefficient r: b = r · s_y / s_x, where s_x and s_y are the sample standard deviations;
the intercept a: a = ȳ − b x̄, where x̄ and ȳ are the sample means.

23 Example on the blood pressure data set
Compute the regression model manually:

sy <- sd(saltbp$BP)                  ## sd of y
sx <- sd(saltbp$salt)                ## sd of x
r  <- cor(saltbp$BP, saltbp$salt)    ## correlation coefficient
b  <- r * (sy / sx)                  ## the slope
a  <- mean(saltbp$BP) - b * mean(saltbp$salt)   ## the intercept
sy; sx; r; b; a

24 Example on the blood pressure data set
Compute the predicted value for a sample in the dataset:

xi  <- saltbp$salt[10]    ## extract a sample
yi  <- saltbp$BP[10]
yhi <- a + b * xi         ## compute the prediction for the sample
ei  <- yi - yhi           ## compute the error
yhi; ei

Compute the predictions and errors for all samples, then the RSS and SE:

yhi <- a + b * saltbp$salt
ei  <- saltbp$BP - yhi
RSS <- sum(ei^2)
SE  <- sqrt(RSS / (25 - 2)) / sqrt(sum((saltbp$salt - mean(saltbp$salt))^2))
sqrt(RSS / (25 - 2)); SE

25 Let R work for us!
Compute the model using least-squares regression in R:

mymod <- lm(BP ~ salt, data = saltbp)
summary(mymod)

## Call:
## lm(formula = BP ~ salt, data = saltbp)
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)                              < 2e-16 ***
## salt                                     1.63e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.75 on 23 degrees of freedom
## Multiple R-squared: 0.704
## F-statistic: 54.6 on 1 and 23 DF, p-value: 1.63e-07

The manually computed residual standard error, SE and p-value agree with the summary() output.

26 Analyze the output

mymod$coefficients     ## parameters of the linear model: (Intercept), salt
mymod$residuals        ## error for each of the 25 samples
mymod$fitted.values    ## predicted values for the response variable
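The fitted model can also be used for new observations via predict(); the salt values below are made up.

newdat <- data.frame(salt = c(5, 10))                      ## hypothetical new salt intakes
predict(mymod, newdata = newdat)                           ## point predictions
predict(mymod, newdata = newdat, interval = "confidence")  ## with confidence intervals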

27 Plots I

plot(mymod, which = 1:2)

(Figure: Residuals vs Fitted for lm(BP ~ salt).)

28 Plots II
(Figure: Normal Q-Q plot of the standardized residuals against theoretical quantiles for lm(BP ~ salt).)

29 Histogram of the residuals

hist(mymod$residuals, col = "grey")

(Figure: histogram of mymod$residuals.)

30 Fitted values
True values vs. predicted values:

plot(BP ~ salt, data = saltbp)
points(saltbp$salt, mymod$fitted.values, pch = 20)

(Figure: BP vs. salt with the fitted values overlaid as filled points.)

31 Goodness of fit
Definition: R²
Measures how well the regression model fits the observed data. It depends on the RSS, which quantifies the discrepancy between the observed data and the regression line: the higher the RSS, the higher the discrepancy.
RSS (lack of fit): Σ_{i=1}^{n} e_i²
TSS (total variation in the response variable): Σ_{i=1}^{n} (y_i − ȳ)²
R² (proportion of the total variation explained by the model): R² = 1 − RSS/TSS
For a simple regression line with one explanatory variable, R² = r², the square of Pearson's correlation coefficient.
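As a check on the fitted model from the previous slides, R² can be recomputed from RSS and TSS and compared with the value reported by summary():

RSS <- sum(mymod$residuals^2)
TSS <- sum((saltbp$BP - mean(saltbp$BP))^2)
1 - RSS / TSS                   ## R-squared computed by hand
summary(mymod)$r.squared        ## same value from lm()
cor(saltbp$BP, saltbp$salt)^2   ## equals r^2 for a simple regression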

32 Assumptions
Linear regression model assumptions:
1 Linearity: we assume the relationship between X and Y is linear!
2 Independence: observations should be independent (random sampling).
3 Constant variance and normality: Y should be normally distributed. In practice we check the normality of ɛ, given the relationship between Y and ɛ; in particular ɛ ~ N(0, σ²).
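A quick, informal way to probe these assumptions on the fitted model (a sketch, not a full diagnostic workflow):

plot(mymod, which = 1)           ## residuals vs fitted: look for non-linearity, non-constant variance
qqnorm(mymod$residuals)          ## approximate normality of the errors
qqline(mymod$residuals)
shapiro.test(mymod$residuals)    ## formal normality test; interpret with care on small samples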

33 Exercise I
1 We want to examine the relationship between body temperature Y and heart rate X. Further, we would like to use heart rate to predict body temperature.
1 Use the BodyTemperature.txt data set to build a simple linear regression model for body temperature using heart rate as the predictor.
2 Interpret the estimate of the regression coefficient and examine its statistical significance.
3 Find the 95% confidence interval for the regression coefficient.
4 Find the value of R² and show that it is equal to the square of the sample correlation coefficient.
5 Create simple diagnostic plots for your model and identify possible outliers.
6 If someone's heart rate is 75, what would be your estimate of this person's body temperature?
2 We would like to predict a baby's birthweight (bwt) before she is born using her mother's weight at last menstrual period (lwt).
1 Use the birthwt data set to build a simple linear regression model, where bwt is the response variable and lwt is the predictor.
2 Interpret your estimate of the regression coefficient and examine its statistical significance.
3 Find the 90% confidence interval for the regression coefficient.
4 If the mother's weight at last menstrual period is 170 pounds, what would be your estimate for the birthweight of her baby?
3 We want to predict percent body fat using the measurement of neck circumference.
1 Use the bodyfat data set to build a simple linear regression model for percent body fat (bodyfat), where neck circumference (neck) is the predictor. In this data set, neck is measured in centimeters.
2 What is the expected (mean) increase in percent body fat corresponding to a one-unit increase in neck circumference?
3 Create a new variable, neck.in, whose values are neck circumference in inches. Rebuild the regression model for percent body fat using neck.in as the predictor.
A sketch of a possible start for Exercise 1 follows.
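One way to begin Exercise 1, assuming BodyTemperature.txt has a header with columns named Temperature and HeartRate (check the actual file; the names here are guesses):

bt  <- read.table("BodyTemperature.txt", header = TRUE)
mod <- lm(Temperature ~ HeartRate, data = bt)        ## column names are assumptions
summary(mod)                                         ## coefficient estimate and its significance
confint(mod, level = 0.95)                           ## 95% CI for the regression coefficient
predict(mod, newdata = data.frame(HeartRate = 75))   ## estimated temperature at heart rate 75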
