Lecture 3 - Object-oriented programming and statistical programming examples
1 Lecture 3 - Object-oriented programming and statistical programming examples
Björn Andersson (w/ Ronnie Pingel)
Department of Statistics, Uppsala University
February 1, 2013
2 Table of Contents 1 Some notes on object-oriented programming
3 R objects revisited
- R includes many different classes of objects, each with its own structure and properties
- R's built-in way of handling objects of different types makes it possible to write generic functions that
  - operate differently depending on the class of the arguments
  - simplify usage and make R flexible
- You can create your own object classes which suit your needs, and you can write functions tailored for these objects that
  - only work for arguments of the correct class
  - ensure that functions are not used improperly
4 A few important definitions
- A class determines what an object is supposed to be made of (vectors, matrices, formulas etc.)
- A generic function is a function which directs an object to a method depending on the object's class; the operations are then carried out on the object using this method (plot(), summary() and print() are examples of generic functions)
- A method is the set of operations that a generic function directs the object to
- You normally do not call a method directly the way you call a function; you call the generic function and let it select the method
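The definitions above can be illustrated with a minimal sketch of R's S3 system. The generic describe() and its three methods are hypothetical names invented for this illustration; they are not part of the lecture or of any package.

```r
# A hypothetical generic function: it only dispatches on the class of `x`.
describe <- function(x, ...) UseMethod("describe")

# Method selected for objects of class "data.frame".
describe.data.frame <- function(x, ...) {
  paste("data frame with", nrow(x), "rows and", ncol(x), "columns")
}

# Method selected for numeric vectors.
describe.numeric <- function(x, ...) {
  paste("numeric vector of length", length(x))
}

# Fallback method for any class without its own method.
describe.default <- function(x, ...) {
  paste("object of class", class(x)[1])
}

# The same call does different things depending on the class of the argument.
describe(data.frame(a = 1:3, b = 4:6))
describe(c(1.5, 2.5))
describe("some text")
```

The user only ever calls describe(); which method runs is decided by the class of the argument.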
5 Why use classes?
- By using classes you can accommodate more types of data than is possible using e.g. data frames or lists
- You can define what each slot in the class is to consist of, and it will then be impossible to assign other types of data to these slots
- You maintain a higher level of certainty that the object is what it is supposed to be, and as a result any analysis you make is more trustworthy (in general)
- In e.g. a data frame you can manipulate the data as much as you wish; classes allow for restrictions, which are often useful
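As a small sketch of the slot restriction described above, the following defines a formal (S4) class and shows that assigning the wrong type to a slot is rejected. The class name ScoreData and its slots are hypothetical and chosen only for illustration.

```r
library(methods)

# Hypothetical class: each slot has a declared type.
setClass("ScoreData",
         representation(scores = "numeric", group = "character"))

obj <- new("ScoreData", scores = c(12, 15, 9), group = "A")

# Trying to store character data in the numeric slot raises an error,
# which is caught here so that the script keeps running.
res <- tryCatch({
  obj@scores <- c("twelve", "fifteen")
  "assignment succeeded"
}, error = function(e) "assignment rejected")

res
obj@scores   # the slot still holds the original numeric data
```

A list or data frame would have accepted the character vector silently, which is the kind of manipulation a class can rule out.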
6 Why use methods?
- Methods provide a way to use R more easily
- A generic function allows different things to be executed depending on the object class
  - plot() does something different for a glm object than for an object of class matrix
- When writing a package you can create your own generic functions, which make usage and development simpler
  - Extract information from the different objects you have created in a consistent way
  - Ease your own programming by providing methods for functions only used inside the code
7 How I use classes and methods
- In the kequate package I added a few classes to ease computations within the package
- I also added a new class for the output from the main function
  - methods for this class were added for the functions plot() and summary()
  - methods are also provided for other functions in the package which allow for comparisons
9 The bootstrap - The idea of the bootstrap
By bootstrapping we mean the act of resampling from a random sample that we have observed and drawing conclusions about an estimator based on these resamples. In a sense you pull yourself up by your bootstraps: you do something which seems impossible. But bootstrapping actually works!
10 The bootstrap - Words from the originator of the bootstrap
"I also wish to thank the many friends who suggested names more colorful than Bootstrap, including Swiss Army Knife, Meat Axe, Swan-Dive, Jack-Rabbit, and my personal favorite, the Shotgun, which to paraphrase Tukey, can blow the head off any problem if the statistician can stand the resulting mess."
- Efron (1979)
11 The bootstrap - The idea of the bootstrap
The reasoning behind the bootstrap is largely as follows:
- Your observed sample is a random instantiation from the population of interest
- Therefore, a random sample from your sample can be viewed as a random sample from the population of interest
- As such, the distribution of an estimator for a parameter of interest can be estimated by calculating the estimate for each bootstrap sample
12 The bootstrap - Bootstrap vs MC
Bootstrapping and Monte Carlo simulation are both based on repeated sampling. What is the difference?
- Monte Carlo simulation: data are generated with known values of the parameters. Used to test-drive estimators.
- Bootstrapping: uses the original, initial sample as the population from which to resample. You can estimate the variability of the statistic and the shape of its sampling distribution.
The bootstrap has had a considerable impact on statistics and offers a new way to find standard errors and confidence intervals.
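The difference can be sketched in a few lines of R. The statistic (the median), the sample size and the number of replications are assumptions made for this illustration only.

```r
set.seed(1)
n    <- 50     # sample size (assumed for the illustration)
reps <- 2000   # number of replications (assumed)

# Monte Carlo simulation: every replication draws NEW data from a model
# whose parameters (mean 0, sd 1) are known.
mc_medians <- replicate(reps, median(rnorm(n, mean = 0, sd = 1)))

# Bootstrap: we only have ONE observed sample and resample from it
# with replacement.
x_obs        <- rnorm(n, mean = 0, sd = 1)
boot_medians <- replicate(reps, median(sample(x_obs, replace = TRUE)))

# Both give an estimate of the standard error of the sample median.
sd(mc_medians)
sd(boot_medians)
```

The Monte Carlo value is only available because the data-generating model is known; the bootstrap value uses nothing but the observed sample.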
13 The bootstrap - Estimating the standard error of a statistic using the bootstrap
We find the bootstrap estimate from the following steps:
1. We have a random sample X of size n and a statistic s(x).
2. We draw a random sample X* of size n with replacement from X.
3. We repeat step 2 to obtain B independent bootstrap samples and calculate the statistic s(x*_i) for each bootstrap sample, i = 1, ..., B.
4. The bootstrap estimate of the standard error of the statistic s(x) is then the standard deviation of the bootstrap sample statistics:
$$\widehat{SE}_B[s(x)] = \left\{ \frac{1}{B-1} \sum_{i=1}^{B} \left[ s(x^*_i) - \frac{1}{B} \sum_{j=1}^{B} s(x^*_j) \right]^2 \right\}^{1/2}$$
14 The bootstrap - Asymptotic results
We define the ideal bootstrap estimate of the standard error of a statistic as the standard deviation of the m bootstrap values s(z_j), with weights w_j:
$$se_{\hat F}(s(x)) = \left\{ \sum_{j=1}^{m} w_j \left[ s(z_j) - \sum_{k=1}^{m} w_k \, s(z_k) \right]^2 \right\}^{1/2}$$
This is not tractable to compute. However, it can be shown that
$$\lim_{B \to \infty} \widehat{SE}_B[s(x)] = se_{\hat F}(s(x)).$$
So the bootstrap works!
15 The bootstrap - The bootstrap method using R
sample() draws a random sample from a vector, with or without replacement (without replacement by default, which gives a random permutation).
> sample(1:10)
> sample(1:10, replace=TRUE)
Read help(sample) for details. Remember that you can select rows in a data frame like this:
> testdata <- data.frame(A=runif(10), B=rpois(10, 10),
+                        C=rbinom(10, 1, 0.5))
> testdata[c(3,1),]
16 The bootstrap - The bootstrap method using R
The basic bootstrap is very easy to implement in R. We write a simple function to calculate the bootstrap estimate of the standard error of the mean:
> bootstrapsemean <- function(x, B){
+   res <- numeric(B)                            # storage for the B bootstrap means
+   for(i in 1:B)
+     res[i] <- mean(sample(x, replace=TRUE))    # mean of one bootstrap sample
+   res <- sd(res)                               # SE = sd of the bootstrap means
+   return(res)
+ }
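17 The bootstrap - The bootstrap method using R
> ttx <- rnorm(100)
> bootstrapsemean(ttx, 10)
> bootstrapsemean(ttx, 1000)
We have, for all i, X_i ~ N(0, 1), and the X_i are independent. Hence:
$$\mathrm{Var}(\bar X) = \mathrm{Var}\left( \frac{\sum_{i=1}^{n} X_i}{n} \right) = \frac{\sum_{i=1}^{n} \mathrm{Var}(X_i)}{n^2} = \frac{\sum_{i=1}^{n} 1}{n^2} = \frac{1}{n}.$$
We note that $\sqrt{1/100} = 0.1$, so with n = 100 the true standard error of the mean is 0.1.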
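As a rough check of the derivation above, the sketch below compares the bootstrap standard error of the mean with the plug-in formula sd(x)/sqrt(n) and with the true value 1/sqrt(n). The seed and the number of replications B are assumptions for this illustration.

```r
set.seed(2013)
x <- rnorm(100)   # X_i ~ N(0, 1), n = 100
B <- 2000         # number of bootstrap replications (assumed)

# Bootstrap estimate: standard deviation of the B resampled means.
boot_means <- replicate(B, mean(sample(x, replace = TRUE)))
se_boot    <- sd(boot_means)

# Plug-in estimate from the usual formula, and the true value sqrt(1/n).
se_formula <- sd(x) / sqrt(length(x))
se_true    <- 1 / sqrt(length(x))

c(bootstrap = se_boot, formula = se_formula, true = se_true)
```

The bootstrap and formula values should be close to each other, and both should be near 0.1, up to sampling variation in x.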
18 The bootstrap - The bootstrap method using R
Of course, in the case of the sample mean we do not need the bootstrap estimate of the variance, since the variance is readily available. However, in many situations we do not have a way of finding the variance of a statistic. In many such cases the bootstrap works.
> bootstrapsemedian <- function(x, B){
+   res <- numeric(B)                              # storage for the B bootstrap medians
+   for(i in 1:B)
+     res[i] <- median(sample(x, replace=TRUE))    # median of one bootstrap sample
+   res <- sd(res)                                 # SE = sd of the bootstrap medians
+   return(res)
+ }
> bootstrapsemedian(ttx, 1000)
19 The bootstrap - The bootstrap method using R
In Assignment 1, I will ask you to write a function for the bootstrap of a particular statistic which depends on two variables. The function should:
- Have input arguments such that you can specify a data frame containing the data and the number of replications to be used
- Calculate the estimate of the statistic and its bootstrap standard error
- Provide a suitable output of the estimate of the statistic and the standard error of the statistic
I will also ask you to provide plots of the distribution of the bootstrapped statistic.
20 The bootstrap - The bootstrap sometimes fails
- If the support of the random variable X depends on the parameter θ you want to estimate, and s(x) is the estimator, then the bootstrap may fail
  - for example, a random variable X such that X ~ U(0, θ)
- If certain regularity conditions are violated then the bootstrap fails
  - These conditions are, however, not as strict as those required by e.g. the delta method (asymptotic approximation using a Taylor expansion)
- The matching estimator used in causal inference is an example of when the bootstrap fails
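The U(0, θ) case can be made concrete with a short sketch. It assumes the sample maximum is used as the estimator of θ (the slide does not name an estimator), with θ = 1, n = 100 and B = 5000 chosen for illustration.

```r
set.seed(42)
theta <- 1
n     <- 100
x     <- runif(n, 0, theta)

# Bootstrap the sample maximum, taken here as the estimator of theta.
B        <- 5000
boot_max <- replicate(B, max(sample(x, replace = TRUE)))

# A large share of the bootstrap maxima are exactly equal to max(x):
# a resample contains the largest observation with probability
# 1 - (1 - 1/n)^n, which is about 0.63 for large n.
mean(boot_max == max(x))
1 - (1 - 1/n)^n
```

The bootstrap distribution therefore has a large point mass at max(x), whereas the true sampling distribution of the maximum is continuous, so the bootstrap does not mimic it well in this case.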
21 The bootstrap - How many bootstrap replications?
- As many as you have time for!
- Rule of thumb: use system.time() to check how fast the bootstrap runs and choose a reasonable number
- For many problems you will, however, need more than 1000 replications
- For the statistics used in this presentation we can choose a very large number of replications without any problems
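A minimal sketch of the rule of thumb follows: time a moderate number of replications, then extrapolate. The helper boot_se_median() is a hypothetical name, and the linear extrapolation is only a rough guide.

```r
set.seed(1)
x <- rnorm(100)

# Hypothetical helper: bootstrap standard error of the median.
boot_se_median <- function(x, B) {
  sd(replicate(B, median(sample(x, replace = TRUE))))
}

# Time 1000 replications.
timing <- system.time(boot_se_median(x, 1000))
timing["elapsed"]

# Rough estimate of the seconds needed for 100000 replications,
# assuming the run time grows linearly in B.
timing[["elapsed"]] * 100
```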
22 The bootstrap - Kernel equating: a bootstrap example
- Equating is a statistical method used in educational measurement to ensure that the results of standardized testing are comparable
- Kernel equating is a special type of equating which uses a Gaussian kernel to calculate the equating function
- Kernel equating requires the selection of a bandwidth
- Problem: we do not have a way to derive the analytical standard errors of equating when considering the most commonly used bandwidth selection
- The bootstrap can be used in this case!
- The bootstrap shows that the influence of the bandwidth selection is very small; the currently used analytical standard errors are in fact a decent approximation
23 Some useful statistical functions in R - Included functions for common distributions
- Generate random numbers: rnorm(n, mean=0, sd=1); rpois(), rbinom(), rchisq() etc.
- Density function/probability mass function: dnorm(x, mean=0, sd=1); dpois(), dbinom(), dchisq() etc.
- Distribution function: pnorm(q, mean=0, sd=1); ppois(), pbinom(), pchisq() etc.
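A few calls illustrating the naming pattern (r for random numbers, d for density or probability mass, p for the distribution function). The argument values are arbitrary choices for this sketch.

```r
set.seed(1)

# r*: random numbers
rnorm(5, mean = 0, sd = 1)
rpois(5, lambda = 3)

# d*: density / probability mass function
dnorm(0, mean = 0, sd = 1)        # standard normal density at 0, i.e. 1/sqrt(2*pi)
dpois(2, lambda = 3)              # P(X = 2) for X ~ Poisson(3)

# p*: distribution function
pnorm(1.96)                       # P(Z <= 1.96) for Z ~ N(0, 1)
pbinom(3, size = 10, prob = 0.5)  # P(X <= 3) for X ~ Bin(10, 0.5)
pchisq(3.84, df = 1)              # P(X <= 3.84) for X ~ chi-squared(1)
```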
24 Some useful statistical functions in R - Some more plotting functions
- hist() plots a histogram of your data
- qqnorm() plots the sample quantiles of a data vector against the theoretical quantiles of a normal distribution
- density() calculates a kernel density estimate of your data, which can then be plotted
  - You can write plot(density(x)), where x is the vector of data points
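A short sketch using the three functions on simulated data. The data vector and the added reference line from qqline() are choices made for this illustration.

```r
set.seed(1)
x <- rnorm(200)   # simulated data, assumed for the illustration

hist(x)           # histogram of the data

qqnorm(x)         # sample quantiles against normal quantiles
qqline(x)         # reference line through the quartiles

d <- density(x)   # kernel density estimate (an object of class "density")
plot(d)           # same as plot(density(x))
```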
25 Generalized linear models - Fitting generalized linear models in R
Using the function glm() in R, an array of generalized linear models can be fitted. glm() has many arguments, the most important of which are:
- formula: the form of the model, e.g. y~x+z+x:z
- family: the error distribution and link function, e.g. gaussian, poisson, binomial etc. (defaults to gaussian)
- data: a data frame (not required)
Gaussian linear model:
> x <- rnorm(100)
> y <- 1.2 * x + rnorm(100)
> glmgauss <- glm(y~x)
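26 Generalized linear models - Fitting generalized linear models in R
> glmgauss
Printing the fitted object shows the call, glm(formula = y ~ x), the estimated coefficients for (Intercept) and x, the degrees of freedom (99 total, i.e. null, and the residual degrees of freedom), the null deviance (233 in this run), the residual deviance and the AIC.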
27 Generalized linear models - Fitting generalized linear models in R
The fitted values are stored in the glm object as fitted.values. The observed values are stored as y.
> gaussfitted <- glmgauss$fitted.values
> gaussobs <- glmgauss$y
You can also choose to save the design matrix (i.e. the explanatory variables) by specifying x=TRUE in the glm() function call.
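A sketch of what x=TRUE adds, using simulated data like that of the earlier slides. The object name fit and the seed are assumptions for this illustration.

```r
set.seed(1)
x <- rnorm(100)
y <- 1.2 * x + rnorm(100)

# x = TRUE stores the design matrix in the fitted object.
fit <- glm(y ~ x, x = TRUE)

head(fit$fitted.values)   # fitted values
head(fit$y)               # observed response
head(fit$x)               # design matrix: columns "(Intercept)" and "x"

# For the gaussian family with identity link, the fitted values are
# the design matrix times the coefficient vector.
max(abs(as.vector(fit$x %*% coef(fit)) - fit$fitted.values))
```

The last line should print a number that is zero up to rounding error.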
28 Generalized linear models - Fitting generalized linear models in R: data frames
With data frames you can easily specify models with glm().
> z <- rnorm(100)
> xyz <- data.frame(x=x, y=y, z=z)
> glmxyz <- glm(x~., data=xyz)
When you write x~. you use x as the response and the rest of the variables in the data frame as explanatory variables.
29 Generalized linear models - Automatic model selection in R
In Assignment 1 I will ask you to write a function which automatically selects the best generalized linear model for an arbitrary response variable according to some criterion. The criteria are
$$AIC = 2p - 2\log(L) \quad \text{and} \quad BIC = \log(n)\,p - 2\log(L),$$
where p is the number of parameters in the model, n is the sample size and L is the maximized likelihood of the model.
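The two formulas can be checked against R's built-in AIC() and BIC(). The sketch below uses a simulated Poisson model, chosen as an assumption because its parameter count p equals the number of regression coefficients (for a gaussian glm, R also counts the variance as a parameter).

```r
set.seed(1)
n <- 100
x <- rnorm(n)
y <- rpois(n, lambda = exp(0.5 + 0.3 * x))

fit <- glm(y ~ x, family = poisson)

ll <- logLik(fit)        # maximized log-likelihood, log(L)
p  <- attr(ll, "df")     # number of parameters in the model

aic_hand <- 2 * p - 2 * as.numeric(ll)
bic_hand <- log(n) * p - 2 * as.numeric(ll)

# The hand-computed values agree with the built-in functions.
c(by_hand = aic_hand, built_in = AIC(fit))
c(by_hand = bic_hand, built_in = BIC(fit))
```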
30 Generalized linear models - Automatic model selection in R
The function step() in R can be used to search stepwise for the best model with respect to some criterion (AIC by default). If you provide only a glm object to step(), the function defaults to a backward search starting from the full model specified. Read the help file!
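A minimal sketch of step() on simulated data, where only x1 actually affects the response. The data, the variable names and the use of k = log(n) to obtain a BIC-type penalty are choices made for this illustration.

```r
set.seed(1)
n   <- 200
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
dat$y <- 1 + 2 * dat$x1 + rnorm(n)   # only x1 matters

# Full model with all candidate explanatory variables.
full <- glm(y ~ x1 + x2 + x3, data = dat)

# Default: backward search using AIC (penalty k = 2); trace = 0 hides the log.
best_aic <- step(full, trace = 0)

# Setting k = log(n) gives the BIC penalty instead.
best_bic <- step(full, k = log(n), trace = 0)

formula(best_aic)
formula(best_bic)
```

Both searches return an ordinary glm object, so summary(), AIC() and similar functions can be applied to the result.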
32 Some tips
- If you get stuck or get an error message you don't understand, read the help files for the function or google your error message
- Use online manuals such as Quick-R
33 Group presentation of assignments
- I decided to generate a random sequence of integers from 1 to 8, where the first number in the sequence corresponds to presenting Exercise 1, the second to being the discussant for Exercise 1, and so on.
- I generated the random numbers from a site that uses atmospheric data as its source of randomness, and retrieved a sequence of integers from 1 to 8.
- The R package random has features to detect if a sequence is not random, if you want to check it (it is likely that this sequence is too short, though).
- See the schedule of the seminar for the full list!
34 Next time
- Today I will not go through any more new material but rather be available for questions
- An opportunity for you to work on the exercises in Assignment 1 and the report for said assignment
More information5.5 Regression Estimation
5.5 Regression Estimation Assume a SRS of n pairs (x, y ),..., (x n, y n ) is selected from a population of N pairs of (x, y) data. The goal of regression estimation is to take advantage of a linear relationship
More information1 Methods for Posterior Simulation
1 Methods for Posterior Simulation Let p(θ y) be the posterior. simulation. Koop presents four methods for (posterior) 1. Monte Carlo integration: draw from p(θ y). 2. Gibbs sampler: sequentially drawing
More informationNCSS Statistical Software
Chapter 327 Geometric Regression Introduction Geometric regression is a special case of negative binomial regression in which the dispersion parameter is set to one. It is similar to regular multiple regression
More informationR Primer for Introduction to Mathematical Statistics 8th Edition Joseph W. McKean
R Primer for Introduction to Mathematical Statistics 8th Edition Joseph W. McKean Copyright 2017 by Joseph W. McKean at Western Michigan University. All rights reserved. Reproduction or translation of
More information[1] CURVE FITTING WITH EXCEL
1 Lecture 04 February 9, 2010 Tuesday Today is our third Excel lecture. Our two central themes are: (1) curve-fitting, and (2) linear algebra (matrices). We will have a 4 th lecture on Excel to further
More informationSection 3.2: Multiple Linear Regression II. Jared S. Murray The University of Texas at Austin McCombs School of Business
Section 3.2: Multiple Linear Regression II Jared S. Murray The University of Texas at Austin McCombs School of Business 1 Multiple Linear Regression: Inference and Understanding We can answer new questions
More informationTutorial 3: Probability & Distributions Johannes Karreth RPOS 517, Day 3
Tutorial 3: Probability & Distributions Johannes Karreth RPOS 517, Day 3 This tutorial shows you: how to simulate a random process how to plot the distribution of a variable how to assess the distribution
More informationLinear Methods for Regression and Shrinkage Methods
Linear Methods for Regression and Shrinkage Methods Reference: The Elements of Statistical Learning, by T. Hastie, R. Tibshirani, J. Friedman, Springer 1 Linear Regression Models Least Squares Input vectors
More information1. Start WinBUGS by double clicking on the WinBUGS icon (or double click on the file WinBUGS14.exe in the WinBUGS14 directory in C:\Program Files).
Hints on using WinBUGS 1 Running a model in WinBUGS 1. Start WinBUGS by double clicking on the WinBUGS icon (or double click on the file WinBUGS14.exe in the WinBUGS14 directory in C:\Program Files). 2.
More informationFrequently Asked Questions Updated 2006 (TRIM version 3.51) PREPARING DATA & RUNNING TRIM
Frequently Asked Questions Updated 2006 (TRIM version 3.51) PREPARING DATA & RUNNING TRIM * Which directories are used for input files and output files? See menu-item "Options" and page 22 in the manual.
More informationLecture 3: Basics of R Programming
Lecture 3: Basics of R Programming This lecture introduces you to how to do more things with R beyond simple commands. Outline: 1. R as a programming language 2. Grouping, loops and conditional execution
More informationFrom logistic to binomial & Poisson models
From logistic to binomial & Poisson models Ben Bolker October 17, 2018 Licensed under the Creative Commons attribution-noncommercial license (http: //creativecommons.org/licenses/by-nc/3.0/). Please share
More informationImproving the Post-Smoothing of Test Norms with Kernel Smoothing
Improving the Post-Smoothing of Test Norms with Kernel Smoothing Anli Lin Qing Yi Michael J. Young Pearson Paper presented at the Annual Meeting of National Council on Measurement in Education, May 1-3,
More informationSimulation and resampling analysis in R
Simulation and resampling analysis in R Author: Nicholas G Reich, Jeff Goldsmith, Andrea S Foulkes, Gregory Matthews This material is part of the statsteachr project Made available under the Creative Commons
More informationThe Bootstrap and Jackknife
The Bootstrap and Jackknife Summer 2017 Summer Institutes 249 Bootstrap & Jackknife Motivation In scientific research Interest often focuses upon the estimation of some unknown parameter, θ. The parameter
More informationChapter 2: Statistical Models for Distributions
Chapter 2: Statistical Models for Distributions 2.2 Normal Distributions In Chapter 2 of YMS, we learn that distributions of data can be approximated by a mathematical model known as a density curve. In
More informationPackage glinternet. June 15, 2018
Type Package Package glinternet June 15, 2018 Title Learning Interactions via Hierarchical Group-Lasso Regularization Version 1.0.8 Date 2018-06-20 Author Michael Lim, Trevor Hastie Maintainer Michael
More informationResources for statistical assistance. Quantitative covariates and regression analysis. Methods for predicting continuous outcomes.
Resources for statistical assistance Quantitative covariates and regression analysis Carolyn Taylor Applied Statistics and Data Science Group (ASDa) Department of Statistics, UBC January 24, 2017 Department
More informationRegression Analysis and Linear Regression Models
Regression Analysis and Linear Regression Models University of Trento - FBK 2 March, 2015 (UNITN-FBK) Regression Analysis and Linear Regression Models 2 March, 2015 1 / 33 Relationship between numerical
More informationGeneralized Additive Models
:p Texts in Statistical Science Generalized Additive Models An Introduction with R Simon N. Wood Contents Preface XV 1 Linear Models 1 1.1 A simple linear model 2 Simple least squares estimation 3 1.1.1
More informationExercises R For Simulations Columbia University EPIC 2015 (no answers)
Exercises R For Simulations Columbia University EPIC 2015 (no answers) C DiMaggio June 10, 2015 Contents 1 Sampling and Simulations 2 2 Drawing Statistical Inferences on a Continuous Variable 2 2.1 Simulations
More informationMore advanced use of mgcv. Simon Wood Mathematical Sciences, University of Bath, U.K.
More advanced use of mgcv Simon Wood Mathematical Sciences, University of Bath, U.K. Fine control of smoothness: gamma Suppose that we fit a model but a component is too wiggly. For GCV/AIC we can increase
More informationIntroduction to hypothesis testing
Introduction to hypothesis testing Mark Johnson Macquarie University Sydney, Australia February 27, 2017 1 / 38 Outline Introduction Hypothesis tests and confidence intervals Classical hypothesis tests
More informationFor our example, we will look at the following factors and factor levels.
In order to review the calculations that are used to generate the Analysis of Variance, we will use the statapult example. By adjusting various settings on the statapult, you are able to throw the ball
More information