Exercise 2.23 Villanova MAT 8406 September 7, 2015
Step 1: Understand the Question

Consider the simple linear regression model y = 50 + 10x + ε where ε is NID(0, 16). Suppose that n = 20 pairs of observations are used to fit this model. Generate 500 samples of 20 observations, drawing one observation for each level of x = 1, 1.5, 2, ..., 10 for each sample.

R makes this easy because its normal random number generator, rnorm, does not require fixed values of the parameters (the mean and standard deviation): you may vary them from one observation to the next! Therefore you can generate one dataset according to the preceding instructions by means of remarkably terse, efficient commands:

    sigma.2 <- 16
    beta <- c(50, 10)
    x <- seq(1, 10, by=1/2)
    y <- rnorm(length(x), beta[1] + beta[2]*x, sigma.2)

Before proceeding, let's check that this is correct and matches what is intended in the problem. Always draw a picture:

    plot(x, y, main="First Try at Sampling")

[Figure: scatterplot of y against x, titled "First Try at Sampling"]

Does it look correct? Is this a plot of 20 points that could be described by the model y ~ NID(50 + 10x, 16)? A quick check is afforded by fitting the OLS line and reading the summary output:
    fit <- lm(y ~ x)
    summary(fit)

[Output of summary(fit): the call lm(formula = y ~ x), a coefficient table with rows (Intercept) and x (both flagged '***'), a residual standard error on 17 degrees of freedom, and an F-statistic on 1 and 17 DF with p-value 3.25e-10.]

Scan it carefully, looking for evidence of every quantitative value that was used: the dataset size of 20, the model y = 50 + 10x, and the variance of 16 in the errors. There are two salient problems that need to be addressed. (It's good we did this check before proceeding with extensive simulation!)

1. The value of 17 for DF (degrees of freedom) is one less than we would expect: with n = 20 observations and two estimated coefficients there should be 18. Indeed, x has only 19 elements!

    (length(x))
    [1] 19

Let's just assume statisticians can't count :-) and presume the question really is calling for generating samples of size 19. (A quick scan through the rest of the question suggests none of it relies fundamentally on the sample size being 20.)

2. The residual standard error of 11 suggests the error variance (its square) is around 121, which is far larger than the intended value of 16. This kind of mistake is common but insidious: the textbook uses a different parameterization of Normal distributions than the software does. R uses the mean and standard deviation while the text uses the mean and variance. (Still other sources might use the precision, which is the reciprocal of the variance, or even the logarithm of the variance for the second parameter.) This problem is particularly acute with other distributions, like the Gamma distributions, for which there is no clear convention for the parameters. It is crucial to understand what the parameters mean so that you can perform calculations correctly!

There may be additional problems: the estimated intercept of 46.4 and the estimated slope differ somewhat from the intended intercept of 50 and slope of 10.
However, they're of the right order of magnitude, so let's hope the discrepancies are due to randomness, but we'll keep an eye on this issue and perform a fuller check later.

Fixing these problems is easy: (1) needs no change, while (2) requires us to convert the variance of 16 into its square root:
    y <- rnorm(length(x), beta[1] + beta[2]*x, sqrt(sigma.2))
    plot(x, y, main="Fixed-up Sample") # Always check!

[Figure: scatterplot of y against x, titled "Fixed-up Sample"]

(You should re-run the lm and summary code to verify that you're getting what you expected.)

Step 2: Do the Calculations

We are asked to generate 500 samples according to this model. Now that we have written and tested the commands to generate one sample, there are many (easy) ways to generate 500 samples. Because 500 is a relatively small number and each sample is small and requires relatively little calculation, we can afford to be inefficient. Rather than extracting all the information requested in parts (a) through (d) of the question, let's just save all the samples and all the fits. We can then post-process them at our leisure. Here's the command:

    sim <- replicate(3, {
      y <- rnorm(length(x), beta[1] + beta[2]*x, sqrt(sigma.2))
      lm(y ~ x)
    })

To get started, the intended count of 500 has been replaced by 3. That's enough to practice with yet small enough to avoid being overwhelmed by managing 500 different (complex) fits. One step at a time!

The result is an array of three (or later, 500) objects: each of them is the output of lm in the last line. It is an R idiosyncrasy that each object will be considered to be indexed by a second coordinate. For instance, the result of applying lm to the first sample is contained in sim[, 1], not sim[1, ]. You can confirm this by inspecting sim (either in the Global Environment pane in RStudio or by computing dim(sim)).
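The column indexing described above arises because replicate simplifies its results into an array. As a minimal sketch of an alternative, assuming you would rather keep every fit as an intact lm object, you can pass simplify=FALSE so that replicate returns a plain list. The names fits, n.sim, and beta.hat.list are illustrative and not part of the original exercise, and the seed is arbitrary (it is there only to make the run reproducible).

```r
# Sketch: keep each fit as an intact "lm" object by disabling simplification.
# `fits`, `n.sim`, and `beta.hat.list` are hypothetical names for this illustration.
set.seed(17)                          # arbitrary seed, for reproducibility only
sigma.2 <- 16
beta <- c(50, 10)
x <- seq(1, 10, by=1/2)
n.sim <- 3

fits <- replicate(n.sim, {
  y <- rnorm(length(x), beta[1] + beta[2]*x, sqrt(sigma.2))
  lm(y ~ x)
}, simplify=FALSE)                    # a list of n.sim "lm" objects

print(class(fits[[1]]))               # each element keeps its "lm" class
beta.hat.list <- sapply(fits, coef)   # matrix with rows "(Intercept)" and "x"
print(beta.hat.list)
```

With a list, functions such as coef, predict, and confint can be applied through sapply or lapply without resetting the class of each element. The trade-off is that the results are indexed as fits[[i]] rather than sim[, i].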
Question (a)

a. For each sample compute the least-squares estimates of the slope and intercept. Construct histograms of the sample values of β̂0 and β̂1. Discuss the shape of these histograms.

To apply some procedure such as extracting the least-squares estimates of the coefficients to an array like sim, you will usually use one of the *apply functions in R: often apply, lapply, or sapply, with the first being appropriate for looping over rows or columns of arrays. In this case we wish to treat sim as an array of columns by looping over its second index (number 2). The coefficients of the fit in each column are extracted using the coef function:

    beta.hat <- apply(sim, 2, coef)

The output will have one column for each iteration in the loop. Because coef returns first the intercept and then the slope, the intercepts will be found in the first row of beta.hat and the slopes in its second row. Let's look:

    print(beta.hat)

[Output: a matrix with columns [,1], [,2], [,3] and rows named (Intercept) and x]

That's looking good! The first row is actually named (Intercept) and the second row, x (because x was the name of the regressor in the call to lm). We may refer to the rows by name. This is usually a good idea because it avoids mistakes made when we miscount the number of a row in which we are interested. Thus, for instance, the histograms can be obtained with two calls to hist, one for each row.

Since a histogram of just three values won't reveal much, first we go back and re-do the simulation with the full 500 values.

    sim <- replicate(500, {
      y <- rnorm(length(x), beta[1] + beta[2]*x, sqrt(sigma.2))
      lm(y ~ x)
    })
    beta.hat <- apply(sim, 2, coef)
    par(mfrow=c(1,2)) # Draws side-by-side histograms
    hist(beta.hat["(Intercept)", ], freq=FALSE, main="", xlab=expression(hat(beta)[0]))
    hist(beta.hat["x", ], freq=FALSE, main="", xlab=expression(hat(beta)[1]))
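A numerical companion to these histograms may help when judging their centers and spreads. Under the stated model the least-squares estimators are unbiased, with Var(β̂1) = σ²/Sxx and Var(β̂0) = σ²(1/n + x̄²/Sxx), where Sxx = Σ(xi − x̄)². The sketch below re-creates the simulation on its own and sets the simulated means and standard deviations beside those theoretical values. The names b, Sxx, and result are introduced here for illustration only, and the seed is arbitrary.

```r
# Sketch: compare simulated centers and spreads of the estimates with theory.
# `b`, `Sxx`, and `result` are hypothetical names introduced for this illustration.
set.seed(17)                          # arbitrary seed, for reproducibility only
sigma.2 <- 16
beta <- c(50, 10)
x <- seq(1, 10, by=1/2)
n <- length(x)
Sxx <- sum((x - mean(x))^2)           # sum of squared deviations of x

# Each replication returns the named vector of coefficients, so `b` is a
# matrix with rows "(Intercept)" and "x" and one column per sample.
b <- replicate(500, {
  y <- rnorm(n, beta[1] + beta[2]*x, sqrt(sigma.2))
  coef(lm(y ~ x))
})

result <- cbind(
  theory.mean = beta,
  theory.sd   = c(sqrt(sigma.2 * (1/n + mean(x)^2/Sxx)), sqrt(sigma.2/Sxx)),
  sim.mean    = rowMeans(b),
  sim.sd      = apply(b, 1, sd)
)
rownames(result) <- rownames(b)
print(round(result, 3))
```

Because the errors are Normal, each estimator is itself Normally distributed, so a Normal curve with the theoretical mean and standard deviation is a natural reference shape for each histogram.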
[Figure: side-by-side density histograms of β̂0 (left) and β̂1 (right)]

A response to "Discuss the shape of these histograms" should include quantitative evaluation of their centers and spreads, along with either quantitative or qualitative assessment of other aspects of a distribution, such as its skewness, heaviness of tails, presence of outliers, peakedness, number of modes, etc. If you have reason to suppose the data shown by these histograms would look approximately like some well-known distributional shape (such as Normal, Student t, etc.) then compare them to that shape as a reference.

Question (b)

For each sample, compute an estimate of E(y | x = 5). Construct a histogram of the estimates you obtained. Discuss the shape of the histogram.

The preferred way in R to estimate this expectation is with the predict function. It works in a strangely restricted way: you must supply it a data frame of the values of x in which you are interested. To test, note that you still have an object fit lying around from your initial testing. Let's try out predict on it:

    predict(object=fit, newdata=data.frame(x=5))

fit is the name of the object containing the lm output (we chose it) and x is the name of the regressor variable used by lm. The output value of 106 is reasonably close to the model value 50 + 10 × 5 = 100.

Having successfully done the calculation with one fit, we are ready to apply it to the entire simulation. As before, all 500 values will be stored in a variable which is then fed to hist for visualization as a histogram.

    y.hat.0 <- apply(sim, 2, function(f) {
      class(f) <- "lm"
      predict(f, newdata=data.frame(x=5))
    })
As you can see, this is fussy: we are obliged to define a function on the fly that (re-)informs R that each column of sim really is the output of lm just so we can apply predict. (R tends to be inconsistent: even core procedures like lm, coef, and predict do not work together in a consistent manner.) A simpler approach is to use your knowledge of least squares. The predicted value at x = 5 is given by the estimated coefficients, which we have already computed (and stored as rows in beta.hat):

    y.hat <- beta.hat["(Intercept)", ] + beta.hat["x", ] * 5
    par(mfrow=c(1,2))
    hist(y.hat.0, freq=FALSE, main="Output of `predict`", cex.main=0.95,
         xlab=expression(hat(y)[0]))
    hist(y.hat, freq=FALSE, main="Manually computed predictions", cex.main=0.95,
         xlab=expression(hat(y)))

[Figure: side-by-side density histograms titled "Output of `predict`" and "Manually computed predictions"]

The results are the same, of course.

Question (c)

c. For each sample, compute a 95% CI on the slope. How many of these intervals contain the true value β1 = 10? Is this what you would expect?

It's a good exercise to compute this CI using formulas from the book. In practice, though, you would look for a built-in R function. It is confint:

    confint(fit, "x", level=95/100)

[Output: one row, x, with columns 2.5 % and 97.5 %]

The art of statistical computing lies in continually checking that your understanding of the software is correct. How do we know that this output really is providing a symmetric, two-sided, 95%
confidence interval for β1? One way is to compute the same interval in an alternative way. For instance, we could inspect the summary table. For fit it included an estimate of β̂1 and its standard error. Using 19 − 2 = 17 degrees of freedom (also shown in the summary output) we may compute the corresponding multiplier from the Student t distribution as κ = t_df^(-1)(1 − α/2), the 1 − α/2 quantile of the Student t distribution with df degrees of freedom, where α = 1 − 0.95. Here are the commands to perform these calculations and display κ:

    confidence <- 95/100
    alpha <- 1 - confidence             # total tail probability
    df <- fit$df.residual
    (multiplier <- qt(1 - alpha/2, df)) # alpha/2 in each tail

The confidence interval is β̂1 ± κ se(β̂1). It agrees with the output of confint. Now we can feel comfortable using confint in our work. Let's apply this to the simulation:

    CI.beta.1 <- apply(sim, 2, function(f) {
      class(f) <- "lm"
      confint(f, "x", level=95/100)
    })

To count the number of intervals containing the true value, compare them with the true value:

    covers <- CI.beta.1[1, ] <= beta[2] & beta[2] <= CI.beta.1[2, ]
    print(paste0(sum(covers), " (", mean(covers)*100,
                 "%) of the intervals cover the true value."))
    [1] "475 (95%) of the intervals cover the true value."

Question (d)

d. For each estimate of E(y | x = 5) in part (b), compute the 95% CI, etc.

The R solution once again is predict. This function is overloaded: it does lots of different things, depending on what you ask of it. As before, we should not rely on it until we have tested it:

    predict(fit, newdata=data.frame(x=5), interval="confidence", level=95/100)

[Output: one row with columns fit, lwr, upr]

Evidently it produces three values: the fit ŷ and the lower and upper limits of the (symmetric, two-sided) confidence interval. We can deal with these exactly as we did with β̂: the result of apply will be three rows of output which can be referenced by their names fit, lwr, and upr.

    y.hat.0 <- apply(sim, 2, function(f) {
      class(f) <- "lm"
      # interval="confidence" requests lwr and upr; [1, ] keeps the names fit, lwr, upr
      predict(f, newdata=data.frame(x=5), interval="confidence", level=95/100)[1, ]
    })

From this point on, emulate the calculations and the answer to part (c).
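One way the emulation of part (c) might look is sketched below. It is self-contained: it regenerates the 500 samples instead of reusing sim, and it computes the interval inside each replication so that no class needs to be reset. The names x.0, true.mean, CI.mean, and covers.mean are introduced here for illustration, and the seed is arbitrary.

```r
# Sketch for part (d): how often does the 95% CI for E(y | x = 5) cover the truth?
# `x.0`, `true.mean`, `CI.mean`, and `covers.mean` are hypothetical names.
set.seed(17)                              # arbitrary seed, for reproducibility only
sigma.2 <- 16
beta <- c(50, 10)
x <- seq(1, 10, by=1/2)
x.0 <- 5
true.mean <- beta[1] + beta[2]*x.0        # model value of E(y | x = 5)

CI.mean <- replicate(500, {
  y <- rnorm(length(x), beta[1] + beta[2]*x, sqrt(sigma.2))
  f <- lm(y ~ x)
  # [1, ] turns the one-row matrix into a named vector (fit, lwr, upr)
  predict(f, newdata=data.frame(x=x.0), interval="confidence", level=95/100)[1, ]
})                                        # rows "fit", "lwr", "upr"; 500 columns

covers.mean <- CI.mean["lwr", ] <= true.mean & true.mean <= CI.mean["upr", ]
print(paste0(sum(covers.mean), " (", mean(covers.mean)*100,
             "%) of the intervals cover the true mean response."))
```

As in part (c), the count should land near 95% of 500, and it will vary from run to run when the seed changes.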
More information1. Determine the population mean of x denoted m x. Ans. 10 from bottom bell curve.
6. Using the regression line, determine a predicted value of y for x = 25. Does it look as though this prediction is a good one? Ans. The regression line at x = 25 is at height y = 45. This is right at
More informationA Knitr Demo. Charles J. Geyer. February 8, 2017
A Knitr Demo Charles J. Geyer February 8, 2017 1 Licence This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License http://creativecommons.org/licenses/by-sa/4.0/.
More informationPage 1. Graphical and Numerical Statistics
TOPIC: Description Statistics In this tutorial, we show how to use MINITAB to produce descriptive statistics, both graphical and numerical, for an existing MINITAB dataset. The example data come from Exercise
More informationRecall the expression for the minimum significant difference (w) used in the Tukey fixed-range method for means separation:
Topic 11. Unbalanced Designs [ST&D section 9.6, page 219; chapter 18] 11.1 Definition of missing data Accidents often result in loss of data. Crops are destroyed in some plots, plants and animals die,
More informationPredictive Checking. Readings GH Chapter 6-8. February 8, 2017
Predictive Checking Readings GH Chapter 6-8 February 8, 2017 Model Choice and Model Checking 2 Questions: 1. Is my Model good enough? (no alternative models in mind) 2. Which Model is best? (comparison
More informationIntroductory Applied Statistics: A Variable Approach TI Manual
Introductory Applied Statistics: A Variable Approach TI Manual John Gabrosek and Paul Stephenson Department of Statistics Grand Valley State University Allendale, MI USA Version 1.1 August 2014 2 Copyright
More informationplots Chris Parrish August 20, 2015
plots Chris Parrish August 20, 2015 plots We construct some of the most commonly used types of plots for numerical data. dotplot A stripchart is most suitable for displaying small data sets. data
More informationUNIT 1: NUMBER LINES, INTERVALS, AND SETS
ALGEBRA II CURRICULUM OUTLINE 2011-2012 OVERVIEW: 1. Numbers, Lines, Intervals and Sets 2. Algebraic Manipulation: Rational Expressions and Exponents 3. Radicals and Radical Equations 4. Function Basics
More informationThe Statistical Sleuth in R: Chapter 10
The Statistical Sleuth in R: Chapter 10 Kate Aloisio Ruobing Zhang Nicholas J. Horton September 28, 2013 Contents 1 Introduction 1 2 Galileo s data on the motion of falling bodies 2 2.1 Data coding, summary
More informationProblem Set #8. Econ 103
Problem Set #8 Econ 103 Part I Problems from the Textbook No problems from the textbook on this assignment. Part II Additional Problems 1. For this question assume that we have a random sample from a normal
More informationSection 2.1: Intro to Simple Linear Regression & Least Squares
Section 2.1: Intro to Simple Linear Regression & Least Squares Jared S. Murray The University of Texas at Austin McCombs School of Business Suggested reading: OpenIntro Statistics, Chapter 7.1, 7.2 1 Regression:
More informationCHAPTER 3 AN OVERVIEW OF DESIGN OF EXPERIMENTS AND RESPONSE SURFACE METHODOLOGY
23 CHAPTER 3 AN OVERVIEW OF DESIGN OF EXPERIMENTS AND RESPONSE SURFACE METHODOLOGY 3.1 DESIGN OF EXPERIMENTS Design of experiments is a systematic approach for investigation of a system or process. A series
More informationVocabulary. 5-number summary Rule. Area principle. Bar chart. Boxplot. Categorical data condition. Categorical variable.
5-number summary 68-95-99.7 Rule Area principle Bar chart Bimodal Boxplot Case Categorical data Categorical variable Center Changing center and spread Conditional distribution Context Contingency table
More informationRegression III: Lab 4
Regression III: Lab 4 This lab will work through some model/variable selection problems, finite mixture models and missing data issues. You shouldn t feel obligated to work through this linearly, I would
More informationLinear Modeling with Bayesian Statistics
Linear Modeling with Bayesian Statistics Bayesian Approach I I I I I Estimate probability of a parameter State degree of believe in specific parameter values Evaluate probability of hypothesis given the
More informationOutline. Topic 16 - Other Remedies. Ridge Regression. Ridge Regression. Ridge Regression. Robust Regression. Regression Trees. Piecewise Linear Model
Topic 16 - Other Remedies Ridge Regression Robust Regression Regression Trees Outline - Fall 2013 Piecewise Linear Model Bootstrapping Topic 16 2 Ridge Regression Modification of least squares that addresses
More informationOne Factor Experiments
One Factor Experiments 20-1 Overview Computation of Effects Estimating Experimental Errors Allocation of Variation ANOVA Table and F-Test Visual Diagnostic Tests Confidence Intervals For Effects Unequal
More informationTHE UNIVERSITY OF BRITISH COLUMBIA FORESTRY 430 and 533. Time: 50 minutes 40 Marks FRST Marks FRST 533 (extra questions)
THE UNIVERSITY OF BRITISH COLUMBIA FORESTRY 430 and 533 MIDTERM EXAMINATION: October 14, 2005 Instructor: Val LeMay Time: 50 minutes 40 Marks FRST 430 50 Marks FRST 533 (extra questions) This examination
More informationA (very) brief introduction to R
A (very) brief introduction to R You typically start R at the command line prompt in a command line interface (CLI) mode. It is not a graphical user interface (GUI) although there are some efforts to produce
More informationMonte Carlo Analysis
Monte Carlo Analysis Andrew Q. Philips* February 7, 27 *Ph.D Candidate, Department of Political Science, Texas A&M University, 2 Allen Building, 4348 TAMU, College Station, TX 77843-4348. aphilips@pols.tamu.edu.
More informationSection 4.1: Time Series I. Jared S. Murray The University of Texas at Austin McCombs School of Business
Section 4.1: Time Series I Jared S. Murray The University of Texas at Austin McCombs School of Business 1 Time Series Data and Dependence Time-series data are simply a collection of observations gathered
More information