Regression Analysis and Linear Regression Models
1 Regression Analysis and Linear Regression Models
University of Trento - FBK
2 March, 2015
(UNITN-FBK) Regression Analysis and Linear Regression Models, 2 March 2015, 33 slides
2 Relationship between numerical variables
Investigate a possible linear relationship between two numerical variables.
Pearson's correlation coefficient quantifies the strength and direction of a linear relationship.
Given two numerical variables X and Y:
ρ = Σ_{i=1}^{N} (x_i − μ_x)(y_i − μ_y) / (N σ_x σ_y)
where μ_x and μ_y are the population means of X and Y, σ_x and σ_y the population standard deviations, and N is the population size.
ρ is a number in [−1, 1].
The stronger the relationship, the closer |ρ| is to 1.
The sign of ρ indicates the direction of the relationship.
3 Relationship between numerical variables
We cannot measure ρ directly, since we do not have access to the whole population.
Estimation of ρ from the data: given n pairs of values (x_1, y_1), ..., (x_n, y_n) of the observed data, the estimate r of ρ is:
r = Σ_{i=1}^{n} (x_i − x̄)(y_i − ȳ) / ((n − 1) s_x s_y)
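The estimate r can be computed directly from its definition and checked against R's built-in cor(). A minimal sketch on made-up toy data (the values below are illustrative, not from any dataset in these slides):

```r
## Toy data, for illustration only
x <- c(1.2, 2.4, 3.1, 4.8, 5.0, 6.3)
y <- c(2.0, 3.9, 5.1, 8.2, 8.5, 10.9)
n <- length(x)

## r from the definition: sum of cross-products over (n-1)*s_x*s_y
r_manual <- sum((x - mean(x)) * (y - mean(y))) / ((n - 1) * sd(x) * sd(y))

## Built-in estimate; the two agree up to floating-point error
r_builtin <- cor(x, y)
```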
4 Relationship between numerical variables
Examples with real data
Example: with the bodyweight dataset, examine the relationship between percent body fat (response) and abdomen circumference (explanatory variable).
Dataset can be found at

cor(bw[,c("abdomen2", "bodyfat")])
## abdomen2 bodyfat
## abdomen2
## bodyfat
## [1] 252

Examine the relationship between height and percent body fat:

cor(bw[,c("bodyfat","height")])
## bodyfat height
## bodyfat
## height
5 Relationship between numerical variables
Correlation tests
Recall: ρ close to 0 means that the two variables are not related, or that they are related BUT the relationship is not linear.
Be cautious in interpreting ρ close to 0 as no relationship!
Evaluate the statistical significance of ρ:
H_0: ρ = 0    H_1: ρ ≠ 0
T = R / sqrt((1 − R²)/(n − 2))
where R is the sample correlation coefficient and n the sample size.
If the null hypothesis is true, T follows the t-distribution with n − 2 degrees of freedom.
Observed statistic: t = r / sqrt((1 − r²)/(n − 2))
6 Example on correlation test
Example: with the bodyweight dataset, examine the relationship between height and percent body fat.
Compute the t-score from the sample:

aa <- cor(bw[,c("bodyfat","height")])
t <- aa[1,2] / (sqrt((1-aa[1,2]**2)/(nrow(bw) - 2)))

Test the alternative hypothesis H_1: ρ ≠ 0 based on a t-distribution with n − 2 = 250 degrees of freedom.
Compute the p-value as p_obs = 2 P(T ≤ −1.42):

2 * pt(t, df=nrow(bw)-2)

With the commonly used significance levels (0.01, 0.05, 0.1) we cannot reject the null hypothesis.
Therefore we cannot conclude that the two variables are linearly correlated.
7 Example on correlation test
Example: with the bodyweight dataset, test the alternative hypothesis H_1: ρ ≠ 0.
Examine the relationship between height and percent body fat:

cor.test(bw$bodyfat, bw$height, alternative="two.sided")
##
## Pearson's product-moment correlation
##
## data: bw$bodyfat and bw$height
## t = , df = 250, p-value =
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##
## sample estimates:
## cor
##

Examine the relationship between percent body fat and abdomen circumference:

cor.test(bw$bodyfat, bw$abdomen2, alternative="two.sided")
##
## Pearson's product-moment correlation
##
## data: bw$bodyfat and bw$abdomen2
## t = , df = 250, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##
## sample estimates:
## cor
##
8 Linear regression models
Aims: investigate the relationships between numerical variables.
Examining linear relationships between a response variable and one or more explanatory variables.
Testing hypotheses regarding relationships between one or more explanatory variables and a response variable.
Predicting unknown values of the response variable using one or more predictors.
Denote with X the set of explanatory variables and with Y the response variable.
Try to fit the equation: Y = f(X) + ε
Assuming that f(X) is linear: Y = Xβ + ε
thus we can estimate β by minimizing the prediction error:
β̂ = (X^T X)^{−1} X^T y
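The closed-form estimate above can be checked numerically. A minimal sketch on simulated data (the toy data and variable names are hypothetical, not from the slides): build the design matrix X with a leading column of ones for the intercept and solve the normal equations.

```r
## Simulated data (hypothetical): true intercept 2 and slope 0.5
set.seed(1)
x <- runif(30, 0, 10)
y <- 2 + 0.5 * x + rnorm(30)

## Design matrix with a column of ones for the intercept
X <- cbind(1, x)

## beta_hat = (X'X)^{-1} X'y, via solve() on the normal equations
beta_hat <- solve(t(X) %*% X, t(X) %*% y)

## lm() should give the same coefficients
coef(lm(y ~ x))
```

Solving the normal equations with solve() is numerically preferable to explicitly inverting X'X.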
9 Linear regression models
One binary explanatory variable
X is a binary variable (0/1); Y is a numerical variable.
Example: investigate the relationship between sodium chloride intake and blood pressure among elderly people.
25 people (n = 25):
15 of them (0.6 of our sample) keep a low sodium chloride diet (X = 0)
10 of them (0.4 of our sample) keep a high sodium chloride diet (X = 1)
Measure the systolic blood pressure (Y).
For each individual i we have a pair of observations (x_i, y_i).
10 Example
[Dotplot of systolic blood pressure (BP) for each diet group]
For each group compute the mean estimate of blood pressure (red point in the graph).
The sample mean provides a reasonable point estimate if a new sample arrives.
For group X = 0: ŷ_{x=0} = mean(y_{x=0})
For group X = 1: ŷ_{x=1} = mean(y_{x=1})
11 Example
Example: we can compute ŷ_{x=0} and ŷ_{x=1}, stored in the vector mm of group means:

mm
## 0 1
##

Compute the parameters of the line connecting the two points:

a <- mm["0"]
b <- (mm["1"] - mm["0"]) / 1

We can draw the black line connecting the two means.
In general, the regression line is defined as: ŷ = a + b x
which captures the linear relationship between the response variable and the explanatory variable.
The slope b is interpreted as our estimate of the expected (average) change in the response variable associated with a unit increase in the value of the explanatory variable.
12 Linear regression models
Prediction and errors
Given the regression line:
Define the prediction for each sample: ŷ_i = a + b x_i
Define the residual for each sample: e_i = y_i − ŷ_i
Thus the real value y_i will be: y_i = ŷ_i + e_i = a + b x_i + e_i
13 Linear regression models
Prediction and errors
Example: with the same example on blood pressure, compute the prediction for each group.
Predictions:
x_i = 0: ŷ_i = a
x_i = 1: ŷ_i = a + b
Errors:
x_4 = 0: the true value is y_4, so the error is e_4 = y_4 − ŷ_4 = 1.91
x_25 = 1: the true value is y_25, so the error is e_25 = y_25 − ŷ_25 = 4.6
14 Linear regression models
Measuring the discrepancy: Residual Sum of Squares (RSS)
Measures the distance between predicted values and true values; it depends on the residuals and on the sample size n.
For the mean as predictor: Σ_i e_i = 0
RSS = Σ_{i=1}^{n} e_i²
We decided to draw the line connecting the means of the two groups, but we could draw almost any line between the two groups.
The line connecting the means is the one which gives the minimum RSS, and is called the least-squares regression line.
15 Generalization
Generalizing to the whole population, the linear relationship between Y and X in the entire population is:
Y = α + βX + ε
This is defined as the linear regression model; α and β are the regression parameters, and β is the regression coefficient.
Fitting is the process of finding the regression parameters.
Confidence interval for the regression coefficient:
Standard error: SE_b = sqrt(RSS/(n − 2)) / sqrt(Σ_i (x_i − x̄)²)
Confidence interval: [b − t_crit SE_b, b + t_crit SE_b]
where t_crit depends on the confidence level c (e.g. 1.96 for c = 0.95 when n is large).
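The standard error and confidence interval above can be sketched in R. This uses simulated data (not the saltbp dataset of the slides) and qt() for the exact t critical value; all names below are illustrative assumptions.

```r
## Simulated data (hypothetical; this is not the saltbp dataset)
set.seed(2)
x <- runif(25, 0, 10)
y <- 100 + 6 * x + rnorm(25, sd = 3)
fit <- lm(y ~ x)

n   <- length(x)
b   <- coef(fit)["x"]
RSS <- sum(residuals(fit)^2)

## SE_b = sqrt(RSS/(n-2)) / sqrt(sum((x_i - xbar)^2))
SE_b <- sqrt(RSS / (n - 2)) / sqrt(sum((x - mean(x))^2))

## Exact critical value from the t-distribution with n-2 df
## (instead of the large-sample 1.96 approximation)
t_crit <- qt(0.975, df = n - 2)
ci <- c(b - t_crit * SE_b, b + t_crit * SE_b)

## Should agree with R's built-in interval
confint(fit, "x", level = 0.95)
```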
16 Hypothesis testing
Linear regression models can be used to test hypotheses regarding possible relationships between the response variable and an explanatory variable.
Null hypothesis H_0: β = 0 (no linear relationship)
Alternative hypothesis H_1: β ≠ 0
t = b / SE_b,    p_obs = 2 P(T ≥ |t|)
Example: SE_b = 1.593 for b = 6.25

t <- b / 1.593
p.value <- 2 * pt(t, df=(nrow(saltbp)-2), lower.tail=FALSE)
17 Exercise I
With the previous dataset saltbp, try to estimate the coefficients β_0 and β_1 from the matrix X and the vector y using the least-squares regression line. Recall the definition of the X matrix when β_0 must be estimated.
For each sample compute the prediction ŷ_i and the error e_i. Compute also the RSS, the SE for this model and the C.I. at 90% confidence.
18 Linear regression models
Example: use the lm function to fit the least-squares regression line.

aa <- lm(BP~saltlevel, data=saltbp)
summary(aa)
##
## Call:
## lm(formula = BP ~ saltlevel, data = saltbp)
##
## Residuals:
## Min 1Q Median 3Q Max
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) < 2e-16 ***
## saltlevel ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.9 on 23 degrees of freedom
## Multiple R-squared: 0.402, Adjusted R-squared:
## F-statistic: 15.4 on 1 and 23 DF, p-value:
19 Linear Regression Models
One numerical explanatory variable
X is a numerical variable; Y is a numerical variable.
Example: investigate the relationship between sodium chloride intake and blood pressure among elderly people.
X: daily salt intake (numerical values)
Y: blood pressure (numerical values)
20 Explore the data first I
Look at the scatter plot of the data. [Scatter plot of BP versus salt]
21 Explore the data first II
[Scatter plot of BP versus salt]
22 Model on one numerical variable
Model definition:
model: ŷ_i = a + b x_i
error: e_i = y_i − ŷ_i
RSS = Σ_{i=1}^{n} e_i²
We can estimate:
the slope b from the correlation coefficient r: b = r (s_y / s_x), where s_x and s_y are the sample standard deviations
the intercept a: a = ȳ − b x̄, where x̄ and ȳ are the sample means
23 Example on the blood data set
Compute the regression model manually:

sy <- sd(saltbp$BP)                          ## sd of y
sx <- sd(saltbp$salt)                        ## sd of x
r <- cor(saltbp$BP, saltbp$salt)             ## correlation coefficient
b <- r * (sy/sx)                             ## the slope
a <- mean(saltbp$BP) - b*mean(saltbp$salt)   ## the intercept
sy; sx; r; b; a
24 Example on blood data set
Compute the prediction for one sample in the dataset:

xi <- saltbp$salt[10]   ## extract a sample
yi <- saltbp$BP[10]
yhi <- a + b * xi       ## compute the prediction for the sample
ei <- yi - yhi          ## compute the error
yhi; ei

Compute the predictions and errors for all samples, then the RSS and the SE:

yhi <- a + b * saltbp$salt
ei <- saltbp$BP - yhi
RSS <- sum(ei^2)
SE <- sqrt(RSS/(25-2))/sqrt(sum((saltbp$salt - mean(saltbp$salt))^2))
sqrt(RSS/(25-2)); SE
25 Let R work for us!
Compute the model using least-squares regression in R:

mymod <- lm(BP~salt, data=saltbp)
summary(mymod)
##
## Call:
## lm(formula = BP ~ salt, data = saltbp)
##
## Residuals:
## Min 1Q Median 3Q Max
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) < 2e-16 ***
## salt 1.63e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.75 on 23 degrees of freedom
## Multiple R-squared: 0.704, Adjusted R-squared:
## F-statistic: 54.6 on 1 and 23 DF, p-value: 1.63e-07
##
## Manually computed:
## Residual Std Error
## SE
## pvalue 1.63e-07
26 Analyze the output

mymod$coefficients    ## parameters of the linear model
## (Intercept) salt
##

mymod$residuals       ## error for each sample

mymod$fitted.values   ## predicted values for the response variable
27 Plots I

plot(mymod, which=1:2)

[Residuals vs Fitted plot: residuals against fitted values, lm(BP ~ salt)]
28 Plots II
[Normal Q-Q plot: standardized residuals against theoretical quantiles, lm(BP ~ salt)]
29 Histogram of the residuals

hist(mymod$residuals, col="grey")

[Histogram of mymod$residuals]
30 Fitted values
True values vs predicted values:

plot(BP~salt, data=saltbp)
points(saltbp$salt, mymod$fitted.values, pch=20)

[Scatter plot of BP versus salt with the fitted values overlaid]
31 Goodness of Fit
Definition: R²
Measures how well the regression model fits the observed data.
It depends on the RSS, which quantifies the discrepancies between the observed data and the regression line: the higher the RSS, the higher the discrepancy.
RSS (lack of fit): RSS = Σ_{i=1}^{n} e_i²
TSS (total variation in the response variable): TSS = Σ_{i=1}^{n} (y_i − ȳ)²
R², the fraction of the total variation explained by the model: R² = 1 − RSS/TSS
For a simple regression line with one variable, R² = r², the squared Pearson correlation coefficient.
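The identities above can be verified directly. A minimal sketch on simulated data (hypothetical names and values, not from the slides): compute RSS and TSS by hand, form R², and compare it with both the squared correlation and lm's own R².

```r
## Simulated data (hypothetical, for illustration)
set.seed(3)
x <- runif(40)
y <- 1 + 2 * x + rnorm(40, sd = 0.3)
fit <- lm(y ~ x)

RSS <- sum(residuals(fit)^2)    # lack of fit
TSS <- sum((y - mean(y))^2)     # total variation in the response
R2  <- 1 - RSS / TSS

## For simple regression, R^2 equals the squared Pearson correlation,
## and it matches the Multiple R-squared reported by summary(fit)
cor(x, y)^2
summary(fit)$r.squared
```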
32 Assumptions
Linear regression model assumptions:
1 Linearity: we assume the relationship between X and Y is linear!
2 Independence: observations should be independent (random sampling)
3 Constant variance and normality: Y should be normally distributed. In general we check the normality of ε, given the relationship between Y and ε. In particular, ε ~ N(0, σ²)
33 Exercise I
1 We want to examine the relationship between body temperature Y and heart rate X. Further, we would like to use heart rate to predict the body temperature.
1 Use the BodyTemperature.txt data set to build a simple linear regression model for body temperature using heart rate as the predictor.
2 Interpret the estimate of the regression coefficient and examine its statistical significance.
3 Find the 95% confidence interval for the regression coefficient.
4 Find the value of R² and show that it is equal to the square of the sample correlation coefficient.
5 Create simple diagnostic plots for your model and identify possible outliers.
6 If someone's heart rate is 75, what would be your estimate of this person's body temperature?
2 We would like to predict a baby's birthweight (bwt) before she is born using her mother's weight at last menstrual period (lwt).
1 Use the birthwt data set to build a simple linear regression model, where bwt is the response variable and lwt is the predictor.
2 Interpret your estimate of the regression coefficient and examine its statistical significance.
3 Find the 90% confidence interval for the regression coefficient.
4 If the mother's weight at last menstrual period is 170 pounds, what would be your estimate for the birthweight of her baby?
3 We want to predict percent body fat using the measurement of neck circumference.
1 Use the bodyfat data set to build a simple linear regression model for percent body fat (bodyfat), where neck circumference (neck) is the predictor. In this data set, neck is measured in centimeters.
2 What is the expected (mean) increase in percent body fat corresponding to a one unit increase in neck circumference?
3 Create a new variable, neck.in, whose values are neck circumference in inches. Rebuild the regression model for percent body fat using neck.in as the predictor.
More informationVariable selection is intended to select the best subset of predictors. But why bother?
Chapter 10 Variable Selection Variable selection is intended to select the best subset of predictors. But why bother? 1. We want to explain the data in the simplest way redundant predictors should be removed.
More informationIntroduction to Data Science
Introduction to Data Science CS 491, DES 430, IE 444, ME 444, MKTG 477 UIC Innovation Center Fall 2017 and Spring 2018 Instructors: Charles Frisbie, Marco Susani, Michael Scott and Ugo Buy Author: Ugo
More informationTable Of Contents. Table Of Contents
Statistics Table Of Contents Table Of Contents Basic Statistics... 7 Basic Statistics Overview... 7 Descriptive Statistics Available for Display or Storage... 8 Display Descriptive Statistics... 9 Store
More informationError Analysis, Statistics and Graphing
Error Analysis, Statistics and Graphing This semester, most of labs we require us to calculate a numerical answer based on the data we obtain. A hard question to answer in most cases is how good is your
More informationCDAA No. 4 - Part Two - Multiple Regression - Initial Data Screening
CDAA No. 4 - Part Two - Multiple Regression - Initial Data Screening Variables Entered/Removed b Variables Entered GPA in other high school, test, Math test, GPA, High school math GPA a Variables Removed
More informationOne Factor Experiments
One Factor Experiments 20-1 Overview Computation of Effects Estimating Experimental Errors Allocation of Variation ANOVA Table and F-Test Visual Diagnostic Tests Confidence Intervals For Effects Unequal
More informationThe linear mixed model: modeling hierarchical and longitudinal data
The linear mixed model: modeling hierarchical and longitudinal data Analysis of Experimental Data AED The linear mixed model: modeling hierarchical and longitudinal data 1 of 44 Contents 1 Modeling Hierarchical
More informationSection 4.1: Time Series I. Jared S. Murray The University of Texas at Austin McCombs School of Business
Section 4.1: Time Series I Jared S. Murray The University of Texas at Austin McCombs School of Business 1 Time Series Data and Dependence Time-series data are simply a collection of observations gathered
More informationLecture 20: Outliers and Influential Points
Lecture 20: Outliers and Influential Points An outlier is a point with a large residual. An influential point is a point that has a large impact on the regression. Surprisingly, these are not the same
More informationA straight line is the graph of a linear equation. These equations come in several forms, for example: change in x = y 1 y 0
Lines and linear functions: a refresher A straight line is the graph of a linear equation. These equations come in several forms, for example: (i) ax + by = c, (ii) y = y 0 + m(x x 0 ), (iii) y = mx +
More informationSelected Introductory Statistical and Data Manipulation Procedures. Gordon & Johnson 2002 Minitab version 13.
Minitab@Oneonta.Manual: Selected Introductory Statistical and Data Manipulation Procedures Gordon & Johnson 2002 Minitab version 13.0 Minitab@Oneonta.Manual: Selected Introductory Statistical and Data
More informationIn this computer exercise we will work with the analysis of variance in R. We ll take a look at the following topics:
UPPSALA UNIVERSITY Department of Mathematics Måns Thulin, thulin@math.uu.se Analysis of regression and variance Fall 2011 COMPUTER EXERCISE 2: One-way ANOVA In this computer exercise we will work with
More informationStat 500 lab notes c Philip M. Dixon, Week 10: Autocorrelated errors
Week 10: Autocorrelated errors This week, I have done one possible analysis and provided lots of output for you to consider. Case study: predicting body fat Body fat is an important health measure, but
More informationCH5: CORR & SIMPLE LINEAR REFRESSION =======================================
STAT 430 SAS Examples SAS5 ===================== ssh xyz@glue.umd.edu, tap sas913 (old sas82), sas https://www.statlab.umd.edu/sasdoc/sashtml/onldoc.htm CH5: CORR & SIMPLE LINEAR REFRESSION =======================================
More informationHomework set 4 - Solutions
Homework set 4 - Solutions Math 3200 Renato Feres 1. (Eercise 4.12, page 153) This requires importing the data set for Eercise 4.12. You may, if you wish, type the data points into a vector. (a) Calculate
More informationData Mining. ❷Chapter 2 Basic Statistics. Asso.Prof.Dr. Xiao-dong Zhu. Business School, University of Shanghai for Science & Technology
❷Chapter 2 Basic Statistics Business School, University of Shanghai for Science & Technology 2016-2017 2nd Semester, Spring2017 Contents of chapter 1 1 recording data using computers 2 3 4 5 6 some famous
More informationCHAPTER 3 AN OVERVIEW OF DESIGN OF EXPERIMENTS AND RESPONSE SURFACE METHODOLOGY
23 CHAPTER 3 AN OVERVIEW OF DESIGN OF EXPERIMENTS AND RESPONSE SURFACE METHODOLOGY 3.1 DESIGN OF EXPERIMENTS Design of experiments is a systematic approach for investigation of a system or process. A series
More informationRandom coefficients models
enote 9 1 enote 9 Random coefficients models enote 9 INDHOLD 2 Indhold 9 Random coefficients models 1 9.1 Introduction.................................... 2 9.2 Example: Constructed data...........................
More informationStat 5100 Handout #11.a SAS: Variations on Ordinary Least Squares
Stat 5100 Handout #11.a SAS: Variations on Ordinary Least Squares Example 1: (Weighted Least Squares) A health researcher is interested in studying the relationship between diastolic blood pressure (bp)
More informationYour Name: Section: INTRODUCTION TO STATISTICAL REASONING Computer Lab #4 Scatterplots and Regression
Your Name: Section: 36-201 INTRODUCTION TO STATISTICAL REASONING Computer Lab #4 Scatterplots and Regression Objectives: 1. To learn how to interpret scatterplots. Specifically you will investigate, using
More informationStatistical Pattern Recognition
Statistical Pattern Recognition Features and Feature Selection Hamid R. Rabiee Jafar Muhammadi Spring 2012 http://ce.sharif.edu/courses/90-91/2/ce725-1/ Agenda Features and Patterns The Curse of Size and
More information9.1 Random coefficients models Constructed data Consumer preference mapping of carrots... 10
St@tmaster 02429/MIXED LINEAR MODELS PREPARED BY THE STATISTICS GROUPS AT IMM, DTU AND KU-LIFE Module 9: R 9.1 Random coefficients models...................... 1 9.1.1 Constructed data........................
More informationVCEasy VISUAL FURTHER MATHS. Overview
VCEasy VISUAL FURTHER MATHS Overview This booklet is a visual overview of the knowledge required for the VCE Year 12 Further Maths examination.! This booklet does not replace any existing resources that
More information14.2 The Regression Equation
14.2 The Regression Equation Tom Lewis Fall Term 2009 Tom Lewis () 14.2 The Regression Equation Fall Term 2009 1 / 12 Outline 1 Exact and inexact linear relationships 2 Fitting lines to data 3 Formulas
More informationStatistical Analysis of Metabolomics Data. Xiuxia Du Department of Bioinformatics & Genomics University of North Carolina at Charlotte
Statistical Analysis of Metabolomics Data Xiuxia Du Department of Bioinformatics & Genomics University of North Carolina at Charlotte Outline Introduction Data pre-treatment 1. Normalization 2. Centering,
More informationSection E. Measuring the Strength of A Linear Association
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License. Your use of this material constitutes acceptance of that license and the conditions of use of materials on this
More informationExam 4. In the above, label each of the following with the problem number. 1. The population Least Squares line. 2. The population distribution of x.
Exam 4 1-5. Normal Population. The scatter plot show below is a random sample from a 2D normal population. The bell curves and dark lines refer to the population. The sample Least Squares Line (shorter)
More informationLinear Modeling with Bayesian Statistics
Linear Modeling with Bayesian Statistics Bayesian Approach I I I I I Estimate probability of a parameter State degree of believe in specific parameter values Evaluate probability of hypothesis given the
More informationWeek 5: Multiple Linear Regression II
Week 5: Multiple Linear Regression II Marcelo Coca Perraillon University of Colorado Anschutz Medical Campus Health Services Research Methods I HSMP 7607 2017 c 2017 PERRAILLON ARR 1 Outline Adjusted R
More informationChapter 6: DESCRIPTIVE STATISTICS
Chapter 6: DESCRIPTIVE STATISTICS Random Sampling Numerical Summaries Stem-n-Leaf plots Histograms, and Box plots Time Sequence Plots Normal Probability Plots Sections 6-1 to 6-5, and 6-7 Random Sampling
More informationModel selection. Peter Hoff. 560 Hierarchical modeling. Statistics, University of Washington 1/41
1/41 Model selection 560 Hierarchical modeling Peter Hoff Statistics, University of Washington /41 Modeling choices Model: A statistical model is a set of probability distributions for your data. In HLM,
More information22s:152 Applied Linear Regression
22s:152 Applied Linear Regression Chapter 22: Model Selection In model selection, the idea is to find the smallest set of variables which provides an adequate description of the data. We will consider
More informationLecture 1: Statistical Reasoning 2. Lecture 1. Simple Regression, An Overview, and Simple Linear Regression
Lecture Simple Regression, An Overview, and Simple Linear Regression Learning Objectives In this set of lectures we will develop a framework for simple linear, logistic, and Cox Proportional Hazards Regression
More informationSPSS QM II. SPSS Manual Quantitative methods II (7.5hp) SHORT INSTRUCTIONS BE CAREFUL
SPSS QM II SHORT INSTRUCTIONS This presentation contains only relatively short instructions on how to perform some statistical analyses in SPSS. Details around a certain function/analysis method not covered
More informationLecture 7: Linear Regression (continued)
Lecture 7: Linear Regression (continued) Reading: Chapter 3 STATS 2: Data mining and analysis Jonathan Taylor, 10/8 Slide credits: Sergio Bacallado 1 / 14 Potential issues in linear regression 1. Interactions
More informationAcquisition Description Exploration Examination Understanding what data is collected. Characterizing properties of data.
Summary Statistics Acquisition Description Exploration Examination what data is collected Characterizing properties of data. Exploring the data distribution(s). Identifying data quality problems. Selecting
More information22s:152 Applied Linear Regression
22s:152 Applied Linear Regression Chapter 22: Model Selection In model selection, the idea is to find the smallest set of variables which provides an adequate description of the data. We will consider
More information