22s:152 Applied Linear Regression, DeCook, Fall 2008
Lab 3, Friday October 3

1. The data set

In 2004, a study was done to examine whether gender, after controlling for other variables, was a significant predictor of salary for science, technology, engineering, and math (STEM) faculty at Iowa State University. All the information is publicly available, but the names have been removed, and this is a subset of the full set of variables. The subsetted data can be found in salary_ISU_data.csv.

VARIABLES
Department      : one of 28 different departments
Rank_Code       : rank of faculty
                    1  Full professor
                    2  Associate professor
                    3  Assistant professor
Gender          : male or female
Salary_9_mo     : 9-month salary for faculty member for 2004
Avg_Cont_Grants : average contracts and grants for fiscal years 2001, 2002, ...

2. Subsetting the data

The data set salary_ISU_data.csv is available from our class website. Download this .csv file to your C://Temp/ directory so we can read it into R.

> salary.original=read.csv("c://temp/salary_isu_data.csv")
> head(salary.original)
  Department Rank_Code Gender Salary_9_mo Avg_Cont_Grants
  CCE E              2      M
  EEOBS              1      M
  EEOBS              3      M
  IMSE               2      M
  COM S              3      M
  AN S               2      M
## We wish to exclude faculty members without any contracts or
## grants for this analysis.

## 1) Use a boolean statement to pull out certain rows:
> salary=salary.original[salary.original[,5]!=0,]

or

## 2) Use a boolean statement within the subset function:
> salary=subset(salary.original, salary.original[,5]!=0)

> head(salary)
  Department Rank_Code Gender Salary_9_mo Avg_Cont_Grants
  CCE E              2      M
  EEOBS              3      M
  AN S               2      M
  FSHNF              2      F
  AGRON              3      M
  COM S              2      M
> attach(salary)

3. Exploring the data

Let's look at the salary variable, the grants variable, and some of their transformations. Income or money variables are often right-skewed. First, let's rename them for ease of scripting:

> ACG=Avg_Cont_Grants
> Sal=Salary_9_mo
> par(mfrow=c(2,2))
> hist(Sal)
> hist(log(Sal))
> hist(ACG)
> hist(log(ACG))

It turns out that using the log-transformed variables will help meet our model assumptions.
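As a rough illustration of the two subsetting approaches and of why the log transformation helps with right-skewed money variables, here is a minimal sketch. The data frame `toy` is simulated and only stands in for the real file; the helper `skew` and all numeric settings are assumptions made for this example.

```r
# Simulated stand-in for the salary data (hypothetical values, not the ISU data)
set.seed(1)
n <- 200
toy <- data.frame(
  Salary_9_mo     = round(exp(rnorm(n, mean = 11.3, sd = 0.3))),
  Avg_Cont_Grants = ifelse(runif(n) < 0.2, 0, exp(rnorm(n, mean = 11, sd = 1.5)))
)

# Two equivalent ways to drop faculty with zero grants
by.index  <- toy[toy$Avg_Cont_Grants != 0, ]
by.subset <- subset(toy, Avg_Cont_Grants != 0)
same.rows <- isTRUE(all.equal(by.index, by.subset))

# Crude skewness measure (hypothetical helper): third standardized moment
skew <- function(x) mean((x - mean(x))^3) / sd(x)^3
skew.raw <- skew(by.index$Avg_Cont_Grants)       # strongly right-skewed
skew.log <- skew(log(by.index$Avg_Cont_Grants))  # much closer to symmetric
```

Removing the zero-grant rows also matters for the transformation itself, since log(0) is -Inf in R and would break the later regression.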
Let's look at the rank variable, which is a categorical variable. How many faculty members are in each category?

> table(Rank_Code)   ## Recall: Full Professor==1
Rank_Code

What percentage is in each category?

> table(Rank_Code)/length(Rank_Code)
Rank_Code

Why do you think there are so many more full professors?

4. Relationship between log(salary) and log(grants)

> plot(log(ACG),log(Sal))

There doesn't seem to be a strong relationship, and in addition there are three individuals with very large grant amounts. Let's look at these observations:

> subset(salary,log(ACG)>5)

It turns out these individuals differ from the others in that they are administrators in university centers. In this case, after discussion with those involved, we felt it was justifiable to remove these observations before further analysis. We'll remove these three observations and proceed:

> salary.2=subset(salary,log(ACG)<5)
> detach(salary)           ## The old data set
> attach(salary.2)         ## After removal of the 3
> ACG=Avg_Cont_Grants      ## ACG after removal
> Sal=Salary_9_mo          ## Sal after removal
> plot(log(ACG),log(Sal))
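The steps above redefine ACG and Sal after the three observations are removed. A small sketch of why that is needed, together with the proportion calculation, is given here. The data are simulated, and the names `toy`, `toy.2`, and `cutoff` (including its value) are hypothetical choices for this example only.

```r
# Simulated stand-in data with three planted large grant values
set.seed(2)
toy <- data.frame(
  Rank_Code       = sample(1:3, 100, replace = TRUE, prob = c(0.5, 0.3, 0.2)),
  Avg_Cont_Grants = exp(runif(100, 1, 4))
)
toy$Avg_Cont_Grants[1:3] <- exp(c(5.5, 6, 6.5))

# Proportion in each rank: by hand, and with the built-in prop.table()
counts        <- table(toy$Rank_Code)
props.manual  <- counts / length(toy$Rank_Code)
props.builtin <- prop.table(counts)

# ACG is a copy of the column, so it does not shrink when rows are removed
ACG    <- toy$Avg_Cont_Grants
cutoff <- 5                                   # hypothetical cutoff on the log scale
toy.2  <- subset(toy, log(Avg_Cont_Grants) < cutoff)
stale.length <- length(ACG)                   # still the old number of rows
ACG <- toy.2$Avg_Cont_Grants                  # refreshed copy after removal
fresh.length <- length(ACG)
```

Passing `data = toy.2` to functions such as `lm()` is one way to avoid stale copies altogether, at the cost of departing from the attach-based style used in this lab.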
Fit the simple linear regression:

> SLR.out=lm(log(Sal) ~ log(ACG))
> abline(SLR.out)
> summary(SLR.out)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)                              < 2e-16 ***
log(ACG)                                         ***
Signif. codes: 0 '***' 0.001 '**' 0.01 '*'

Residual standard error:    on 422 degrees of freedom
Multiple R-Squared:    , Adjusted R-squared:
F-statistic: 4.28 on 1 and 422 DF, p-value:

The relationship between log(salary) and log(grants) is significant.

Write the model:

5. Inclusion of Rank

It was known that rank would have an impact on salary. If we're interested in how gender affects salary, we should also include any other variables known to affect salary.

Rank is a categorical variable. Let's create dummy variables. Because there are three categories, we'll need 2 dummy variables:

rank.dummy.1=rep(0,nrow(salary.2))    ## All zeroes at first
rank.dummy.1[Rank_Code==3]=1          ## Place 1s appropriately
rank.dummy.2=rep(0,nrow(salary.2))
rank.dummy.2[Rank_Code==2]=1
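For the simple linear regression fitted earlier in this section, the pieces of the summary() output can also be pulled out programmatically. The sketch below uses simulated data (the variables `log.acg` and `log.sal` and their coefficients are assumptions for illustration) with the same sample size as the lab, so the residual degrees of freedom match.

```r
# Simulated log-scale data with n = 424, so residual df = n - 2 = 422
set.seed(3)
n <- 424
log.acg <- rnorm(n, mean = 3, sd = 1)
log.sal <- 11 + 0.05 * log.acg + rnorm(n, sd = 0.25)

fit   <- lm(log.sal ~ log.acg)
coefs <- coef(summary(fit))        # columns: Estimate, Std. Error, t value, Pr(>|t|)
fstat <- summary(fit)$fstatistic   # named vector: value, numdf, dendf

slope.t  <- coefs["log.acg", "t value"]
resid.df <- df.residual(fit)

# With a single predictor, the overall F statistic equals the squared slope t
f.equals.t2 <- isTRUE(all.equal(unname(fstat["value"]), slope.t^2))
```

This is why the F-statistic line and the slope's t-test lead to the same conclusion in a one-predictor model.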
What is the coding we used? (Recall: Full Professor==1 in dataset)

          Assistant   Associate   Full
dummy 1
dummy 2

What is the baseline group?

Let's check our coding:

> data.frame(Rank_Code,rank.dummy.1,rank.dummy.2)
  Rank_Code rank.dummy.1 rank.dummy.2

Rank_Code was already numeric, so why can't we just use that variable in our model? What model are you fitting if you regress log(Sal) on Rank_Code here?

> is.numeric(Rank_Code)
[1] TRUE
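One way to check hand-built dummy variables is to compare them with the design matrix R constructs when the codes are treated as a factor. The sketch below uses seven made-up rank codes and a simulated response `y`; both are assumptions for illustration and are not taken from the salary data.

```r
# Hypothetical rank codes (1 = Full, 2 = Associate, 3 = Assistant)
Rank_Code <- c(1, 2, 3, 3, 1, 2, 1)

# Hand-built dummies, following the approach used in this lab
rank.dummy.1 <- rep(0, length(Rank_Code))
rank.dummy.1[Rank_Code == 3] <- 1
rank.dummy.2 <- rep(0, length(Rank_Code))
rank.dummy.2[Rank_Code == 2] <- 1

# Design matrix built by R when the codes are treated as a factor
mm <- model.matrix(~ factor(Rank_Code))
auto.3 <- unname(mm[, "factor(Rank_Code)3"])
auto.2 <- unname(mm[, "factor(Rank_Code)2"])

# Simulated response, used only to count coefficients under each coding
set.seed(4)
y <- rnorm(length(Rank_Code))
n.coef.numeric <- length(coef(lm(y ~ Rank_Code)))          # intercept + one slope
n.coef.factor  <- length(coef(lm(y ~ factor(Rank_Code))))  # intercept + two dummies
```

The level that gets no column of its own in the design matrix is absorbed into the intercept. The coefficient counts show that the numeric coding fits a single slope across the codes, while the factor coding fits a separate shift for each non-baseline rank.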
Fit an additive model (i.e. no interaction) with grants and rank:

> lm.both.out=lm(log(Sal) ~ log(ACG) + rank.dummy.1 + rank.dummy.2)
> summary(lm.both.out)

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)                               < 2e-16 ***
log(ACG)                                          **
rank.dummy.1                              < 2e-16 ***
rank.dummy.2                              < 2e-16 ***
Signif. codes: 0 '***' 0.001 '**' 0.01 '*'

Residual standard error:    on 420 degrees of freedom
Multiple R-Squared:    , Adjusted R-squared:
F-statistic:    on 3 and 420 DF, p-value: < 2.2e-16

Write the model:

$Y_i = \beta_0 + \beta_{ACG} x_i + \beta_{D1} D_{1i} + \beta_{D2} D_{2i} + \epsilon_i$

What does the hypothesis $H_0: \beta_{D1} = 0$ test? (In the context of the data)

How do I test if Rank is useful in the model at all?

$H_0: \beta_{D1} = \beta_{D2} = 0$ ... a partial F-test:

> anova(SLR.out,lm.both.out)
Analysis of Variance Table

Model 1: log(Sal) ~ log(ACG)
Model 2: log(Sal) ~ log(ACG) + rank.dummy.1 + rank.dummy.2
  Res.Df RSS Df Sum of Sq F    Pr(>F)
                            < 2.2e-16 ***
Signif. codes: 0 '***' 0.001 '**' 0.01 '*'
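The partial F-test reported by anova() can be reproduced from the residual sums of squares of the reduced and full models. The sketch below does this on simulated data; the variable names (`x`, `d1`, `d2`, `y`) and the true coefficients are assumptions chosen for illustration.

```r
# Simulated data with n = 424, one quantitative predictor and two rank dummies
set.seed(5)
n    <- 424
x    <- rnorm(n)
rank <- sample(1:3, n, replace = TRUE)
d1   <- as.numeric(rank == 3)
d2   <- as.numeric(rank == 2)
y    <- 11 + 0.05 * x - 0.4 * d1 - 0.2 * d2 + rnorm(n, sd = 0.2)

reduced <- lm(y ~ x)
full    <- lm(y ~ x + d1 + d2)

# Partial F statistic by hand: drop in RSS per added term over full-model MSE
rss.r  <- sum(resid(reduced)^2)
rss.f  <- sum(resid(full)^2)
q      <- 2                                   # number of coefficients tested
F.hand <- ((rss.r - rss.f) / q) / (rss.f / df.residual(full))
p.hand <- pf(F.hand, q, df.residual(full), lower.tail = FALSE)

# Same test through anova()
tab <- anova(reduced, full)
```

The second row of `tab` holds the F statistic and p-value, which agree with `F.hand` and `p.hand`.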
If we wanted to include an interaction between rank and grants, what other variables would be needed in the model? What test would be used to test for interaction?

6. Inclusion of gender

> sex.dummy=rep(0,nrow(salary.2))
> sex.dummy[Gender=="M"]=1
> lm.out.3=lm(log(Sal)~log(ACG)+rank.dummy.1+rank.dummy.2+sex.dummy)
> summary(lm.out.3)

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)                               < 2e-16 ***
log(ACG)                                          **
rank.dummy.1                              < 2e-16 ***
rank.dummy.2                              < 2e-16 ***
sex.dummy                                         **
Signif. codes: 0 '***' 0.001 '**' 0.01 '*'

Residual standard error:    on 419 degrees of freedom
Multiple R-Squared:    , Adjusted R-squared:
F-statistic:    on 4 and 419 DF, p-value: < 2.2e-16

In this model, which sex is estimated to have a slight advantage?
What if we didn't include rank (only grants and gender)?

> lm.out.norank=lm(log(Sal)~log(ACG)+sex.dummy)
> summary(lm.out.norank)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)                              < 2e-16 ***
log(ACG)                                    e-05 ***
sex.dummy                                   e-08 ***
Signif. codes: 0 '***' 0.001 '**' 0.01 '*'

Residual standard error:    on 421 degrees of freedom
Multiple R-Squared:    , Adjusted R-squared:
F-statistic: 23.1 on 2 and 421 DF, p-value: 3.038e-10

Why is gender so much stronger without rank included? (Recall the fundamentals of multiple regression.)

> table(Rank_Code)
Rank_Code

> table(Gender,Rank_Code)
      Rank_Code
Gender 1 2 3
     F
     M

Rank and gender are not independent of each other. Knowing the rank of a randomly chosen individual gives you some information on the likelihood of their gender. A large proportion of the women are in lower ranks. If you don't account for rank, it will look like women are paid less (but that's not a good analysis).

It turns out that if we also include department (which is also associated with salary), the significant sex effect disappears.
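The effect of leaving out rank can be illustrated with a small simulation. In the sketch below, sex has no direct effect on log salary, but men are more likely to hold the higher rank. Every number here, and the two-level rank variable `full`, is an assumption made for the simulation and says nothing about the ISU data.

```r
# Simulated data: salary depends on rank only, and rank is associated with sex
set.seed(6)
n    <- 2000
male <- rbinom(n, 1, 0.7)
full <- rbinom(n, 1, ifelse(male == 1, 0.6, 0.2))   # higher rank more likely for men
log.sal <- 11 + 0.4 * full + rnorm(n, sd = 0.15)    # no direct sex effect

# Sex coefficient without and with rank in the model
b.norank <- unname(coef(lm(log.sal ~ male))["male"])
b.rank   <- unname(coef(lm(log.sal ~ male + full))["male"])
```

Without rank, the sex coefficient absorbs part of the rank effect (about 0.4 times the difference in the proportion holding the higher rank, given these settings). Once rank is included, the coefficient is close to zero.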
7. Lattice Plot

Lattice plots can be useful when considering a quantitative response and categorical predictors, or factors. R can actually make dummy variables on its own using the as.factor() command (more on this later). Here, we can see how log(Sal) is related to log(ACG) for each of the six combinations of Sex/Rank:

## lattice is an attachable package.
> library(lattice)
> xyplot(log(Sal)~log(ACG) | as.factor(sex.dummy)+as.factor(Rank_Code))

[Lattice plot: log(Sal) versus log(ACG), one panel for each Sex/Rank combination]
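A self-contained version of the conditioned scatterplot can be sketched on simulated data, using the `data` argument instead of attached variables. The data frame `toy` and its columns are hypothetical stand-ins for the salary data.

```r
library(lattice)  # recommended package shipped with standard R installations

# Simulated stand-in data with two sexes and three ranks
set.seed(7)
n <- 120
toy <- data.frame(
  log.acg   = rnorm(n, mean = 3, sd = 1),
  sex       = factor(sample(c("F", "M"), n, replace = TRUE)),
  Rank_Code = factor(sample(1:3, n, replace = TRUE))
)
toy$log.sal <- 11 + 0.05 * toy$log.acg + rnorm(n, sd = 0.2)

# One panel per sex/rank combination; the bar separates the conditioning factors
p <- xyplot(log.sal ~ log.acg | sex + Rank_Code, data = toy)
n.panels <- prod(sapply(p$condlevels, length))

# print(p) draws the plot on the active graphics device
```

Storing the plot in `p` and printing it explicitly is useful inside functions and scripts, where lattice plots are not drawn automatically.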
22s:152 Applied Linear Regression DeCook Fall 2011 Lab 3 Monday October 3
s:5 Applied Linear Regression DeCook all 0 Lab onday October The data Set In 004, a study was done to examine if gender, after controlling for other variables, was a significant predictor of salary for
More informationCDAA No. 4 - Part Two - Multiple Regression - Initial Data Screening
CDAA No. 4 - Part Two - Multiple Regression - Initial Data Screening Variables Entered/Removed b Variables Entered GPA in other high school, test, Math test, GPA, High school math GPA a Variables Removed
More informationMultiple Linear Regression
Multiple Linear Regression Rebecca C. Steorts, Duke University STA 325, Chapter 3 ISL 1 / 49 Agenda How to extend beyond a SLR Multiple Linear Regression (MLR) Relationship Between the Response and Predictors
More informationStatistics Lab #7 ANOVA Part 2 & ANCOVA
Statistics Lab #7 ANOVA Part 2 & ANCOVA PSYCH 710 7 Initialize R Initialize R by entering the following commands at the prompt. You must type the commands exactly as shown. options(contrasts=c("contr.sum","contr.poly")
More informationModel Selection and Inference
Model Selection and Inference Merlise Clyde January 29, 2017 Last Class Model for brain weight as a function of body weight In the model with both response and predictor log transformed, are dinosaurs
More information610 R12 Prof Colleen F. Moore Analysis of variance for Unbalanced Between Groups designs in R For Psychology 610 University of Wisconsin--Madison
610 R12 Prof Colleen F. Moore Analysis of variance for Unbalanced Between Groups designs in R For Psychology 610 University of Wisconsin--Madison R is very touchy about unbalanced designs, partly because
More informationTHE UNIVERSITY OF BRITISH COLUMBIA FORESTRY 430 and 533. Time: 50 minutes 40 Marks FRST Marks FRST 533 (extra questions)
THE UNIVERSITY OF BRITISH COLUMBIA FORESTRY 430 and 533 MIDTERM EXAMINATION: October 14, 2005 Instructor: Val LeMay Time: 50 minutes 40 Marks FRST 430 50 Marks FRST 533 (extra questions) This examination
More informationOrange Juice data. Emanuele Taufer. 4/12/2018 Orange Juice data (1)
Orange Juice data Emanuele Taufer file:///c:/users/emanuele.taufer/google%20drive/2%20corsi/5%20qmma%20-%20mim/0%20labs/l10-oj-data.html#(1) 1/31 Orange Juice Data The data contain weekly sales of refrigerated
More informationSection 3.4: Diagnostics and Transformations. Jared S. Murray The University of Texas at Austin McCombs School of Business
Section 3.4: Diagnostics and Transformations Jared S. Murray The University of Texas at Austin McCombs School of Business 1 Regression Model Assumptions Y i = β 0 + β 1 X i + ɛ Recall the key assumptions
More informationAn introduction to SPSS
An introduction to SPSS To open the SPSS software using U of Iowa Virtual Desktop... Go to https://virtualdesktop.uiowa.edu and choose SPSS 24. Contents NOTE: Save data files in a drive that is accessible
More information36-402/608 HW #1 Solutions 1/21/2010
36-402/608 HW #1 Solutions 1/21/2010 1. t-test (20 points) Use fullbumpus.r to set up the data from fullbumpus.txt (both at Blackboard/Assignments). For this problem, analyze the full dataset together
More informationStat 5303 (Oehlert): Response Surfaces 1
Stat 5303 (Oehlert): Response Surfaces 1 > data
More informationRegression Lab 1. The data set cholesterol.txt available on your thumb drive contains the following variables:
Regression Lab The data set cholesterol.txt available on your thumb drive contains the following variables: Field Descriptions ID: Subject ID sex: Sex: 0 = male, = female age: Age in years chol: Serum
More informationSection 2.1: Intro to Simple Linear Regression & Least Squares
Section 2.1: Intro to Simple Linear Regression & Least Squares Jared S. Murray The University of Texas at Austin McCombs School of Business Suggested reading: OpenIntro Statistics, Chapter 7.1, 7.2 1 Regression:
More information22s:152 Applied Linear Regression
22s:152 Applied Linear Regression Chapter 22: Model Selection In model selection, the idea is to find the smallest set of variables which provides an adequate description of the data. We will consider
More informationNon-Linear Regression. Business Analytics Practice Winter Term 2015/16 Stefan Feuerriegel
Non-Linear Regression Business Analytics Practice Winter Term 2015/16 Stefan Feuerriegel Today s Lecture Objectives 1 Understanding the need for non-parametric regressions 2 Familiarizing with two common
More informationChapter 6: DESCRIPTIVE STATISTICS
Chapter 6: DESCRIPTIVE STATISTICS Random Sampling Numerical Summaries Stem-n-Leaf plots Histograms, and Box plots Time Sequence Plots Normal Probability Plots Sections 6-1 to 6-5, and 6-7 Random Sampling
More informationMultiple Linear Regression: Global tests and Multiple Testing
Multiple Linear Regression: Global tests and Multiple Testing Author: Nicholas G Reich, Jeff Goldsmith This material is part of the statsteachr project Made available under the Creative Commons Attribution-ShareAlike
More informationSection 2.1: Intro to Simple Linear Regression & Least Squares
Section 2.1: Intro to Simple Linear Regression & Least Squares Jared S. Murray The University of Texas at Austin McCombs School of Business Suggested reading: OpenIntro Statistics, Chapter 7.1, 7.2 1 Regression:
More informationMultiple Regression White paper
+44 (0) 333 666 7366 Multiple Regression White paper A tool to determine the impact in analysing the effectiveness of advertising spend. Multiple Regression In order to establish if the advertising mechanisms
More informationShow how the LG-Syntax can be generated from a GUI model. Modify the LG-Equations to specify a different LC regression model
Tutorial #S1: Getting Started with LG-Syntax DemoData = 'conjoint.sav' This tutorial introduces the use of the LG-Syntax module, an add-on to the Advanced version of Latent GOLD. In this tutorial we utilize
More informationApplied Statistics and Econometrics Lecture 6
Applied Statistics and Econometrics Lecture 6 Giuseppe Ragusa Luiss University gragusa@luiss.it http://gragusa.org/ March 6, 2017 Luiss University Empirical application. Data Italian Labour Force Survey,
More informationEXST 7014, Lab 1: Review of R Programming Basics and Simple Linear Regression
EXST 7014, Lab 1: Review of R Programming Basics and Simple Linear Regression OBJECTIVES 1. Prepare a scatter plot of the dependent variable on the independent variable 2. Do a simple linear regression
More information- 1 - Fig. A5.1 Missing value analysis dialog box
WEB APPENDIX Sarstedt, M. & Mooi, E. (2019). A concise guide to market research. The process, data, and methods using SPSS (3 rd ed.). Heidelberg: Springer. Missing Value Analysis and Multiple Imputation
More informationBrief Guide on Using SPSS 10.0
Brief Guide on Using SPSS 10.0 (Use student data, 22 cases, studentp.dat in Dr. Chang s Data Directory Page) (Page address: http://www.cis.ysu.edu/~chang/stat/) I. Processing File and Data To open a new
More informationLecture 13: Model selection and regularization
Lecture 13: Model selection and regularization Reading: Sections 6.1-6.2.1 STATS 202: Data mining and analysis October 23, 2017 1 / 17 What do we know so far In linear regression, adding predictors always
More informationSection 3.2: Multiple Linear Regression II. Jared S. Murray The University of Texas at Austin McCombs School of Business
Section 3.2: Multiple Linear Regression II Jared S. Murray The University of Texas at Austin McCombs School of Business 1 Multiple Linear Regression: Inference and Understanding We can answer new questions
More informationDr. Barbara Morgan Quantitative Methods
Dr. Barbara Morgan Quantitative Methods 195.650 Basic Stata This is a brief guide to using the most basic operations in Stata. Stata also has an on-line tutorial. At the initial prompt type tutorial. In
More informationRobust Linear Regression (Passing- Bablok Median-Slope)
Chapter 314 Robust Linear Regression (Passing- Bablok Median-Slope) Introduction This procedure performs robust linear regression estimation using the Passing-Bablok (1988) median-slope algorithm. Their
More informationSPSS. (Statistical Packages for the Social Sciences)
Inger Persson SPSS (Statistical Packages for the Social Sciences) SHORT INSTRUCTIONS This presentation contains only relatively short instructions on how to perform basic statistical calculations in SPSS.
More informationFrequency Tables. Chapter 500. Introduction. Frequency Tables. Types of Categorical Variables. Data Structure. Missing Values
Chapter 500 Introduction This procedure produces tables of frequency counts and percentages for categorical and continuous variables. This procedure serves as a summary reporting tool and is often used
More informationLaboratory for Two-Way ANOVA: Interactions
Laboratory for Two-Way ANOVA: Interactions For the last lab, we focused on the basics of the Two-Way ANOVA. That is, you learned how to compute a Brown-Forsythe analysis for a Two-Way ANOVA, as well as
More informationLAB 1 INSTRUCTIONS DESCRIBING AND DISPLAYING DATA
LAB 1 INSTRUCTIONS DESCRIBING AND DISPLAYING DATA This lab will assist you in learning how to summarize and display categorical and quantitative data in StatCrunch. In particular, you will learn how to
More informationVariable selection is intended to select the best subset of predictors. But why bother?
Chapter 10 Variable Selection Variable selection is intended to select the best subset of predictors. But why bother? 1. We want to explain the data in the simplest way redundant predictors should be removed.
More informationRegression Analysis and Linear Regression Models
Regression Analysis and Linear Regression Models University of Trento - FBK 2 March, 2015 (UNITN-FBK) Regression Analysis and Linear Regression Models 2 March, 2015 1 / 33 Relationship between numerical
More informationA Knitr Demo. Charles J. Geyer. February 8, 2017
A Knitr Demo Charles J. Geyer February 8, 2017 1 Licence This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License http://creativecommons.org/licenses/by-sa/4.0/.
More informationSection 2.3: Simple Linear Regression: Predictions and Inference
Section 2.3: Simple Linear Regression: Predictions and Inference Jared S. Murray The University of Texas at Austin McCombs School of Business Suggested reading: OpenIntro Statistics, Chapter 7.4 1 Simple
More information1. Basic Steps for Data Analysis Data Editor. 2.4.To create a new SPSS file
1 SPSS Guide 2009 Content 1. Basic Steps for Data Analysis. 3 2. Data Editor. 2.4.To create a new SPSS file 3 4 3. Data Analysis/ Frequencies. 5 4. Recoding the variable into classes.. 5 5. Data Analysis/
More informationGeneralized Additive Models
Generalized Additive Models Statistics 135 Autumn 2005 Copyright c 2005 by Mark E. Irwin Generalized Additive Models GAMs are one approach to non-parametric regression in the multiple predictor setting.
More informationWELCOME! Lecture 3 Thommy Perlinger
Quantitative Methods II WELCOME! Lecture 3 Thommy Perlinger Program Lecture 3 Cleaning and transforming data Graphical examination of the data Missing Values Graphical examination of the data It is important
More informationIndependent Variables
1 Stepwise Multiple Regression Olivia Cohen Com 631, Spring 2017 Data: Film & TV Usage 2015 I. MODEL Independent Variables Demographics Item: Age Item: Income Dummied Item: Gender (Female) Digital Media
More information22s:152 Applied Linear Regression
22s:152 Applied Linear Regression Chapter 22: Model Selection In model selection, the idea is to find the smallest set of variables which provides an adequate description of the data. We will consider
More informationCSE 546 Machine Learning, Autumn 2013 Homework 2
CSE 546 Machine Learning, Autumn 2013 Homework 2 Due: Monday, October 28, beginning of class 1 Boosting [30 Points] We learned about boosting in lecture and the topic is covered in Murphy 16.4. On page
More informationRegression on the trees data with R
> trees Girth Height Volume 1 8.3 70 10.3 2 8.6 65 10.3 3 8.8 63 10.2 4 10.5 72 16.4 5 10.7 81 18.8 6 10.8 83 19.7 7 11.0 66 15.6 8 11.0 75 18.2 9 11.1 80 22.6 10 11.2 75 19.9 11 11.3 79 24.2 12 11.4 76
More informationMath 263 Excel Assignment 3
ath 263 Excel Assignment 3 Sections 001 and 003 Purpose In this assignment you will use the same data as in Excel Assignment 2. You will perform an exploratory data analysis using R. You shall reproduce
More informationBIOL 458 BIOMETRY Lab 10 - Multiple Regression
BIOL 458 BIOMETRY Lab 0 - Multiple Regression Many problems in biology science involve the analysis of multivariate data sets. For data sets in which there is a single continuous dependent variable, but
More informationIntroduction to Mixed Models: Multivariate Regression
Introduction to Mixed Models: Multivariate Regression EPSY 905: Multivariate Analysis Spring 2016 Lecture #9 March 30, 2016 EPSY 905: Multivariate Regression via Path Analysis Today s Lecture Multivariate
More informationTutorial #1: Using Latent GOLD choice to Estimate Discrete Choice Models
Tutorial #1: Using Latent GOLD choice to Estimate Discrete Choice Models In this tutorial, we analyze data from a simple choice-based conjoint (CBC) experiment designed to estimate market shares (choice
More informationLab 07: Multiple Linear Regression: Variable Selection
Lab 07: Multiple Linear Regression: Variable Selection OBJECTIVES 1.Use PROC REG to fit multiple regression models. 2.Learn how to find the best reduced model. 3.Variable diagnostics and influential statistics
More informationCoding Categorical Variables in Regression: Indicator or Dummy Variables. Professor George S. Easton
Coding Categorical Variables in Regression: Indicator or Dummy Variables Professor George S. Easton DataScienceSource.com This video is embedded on the following web page at DataScienceSource.com: DataScienceSource.com/DummyVariables
More informationMixed Effects Models. Biljana Jonoska Stojkova Applied Statistics and Data Science Group (ASDa) Department of Statistics, UBC.
Mixed Effects Models Biljana Jonoska Stojkova Applied Statistics and Data Science Group (ASDa) Department of Statistics, UBC March 6, 2018 Resources for statistical assistance Department of Statistics
More informationSection 2.2: Covariance, Correlation, and Least Squares
Section 2.2: Covariance, Correlation, and Least Squares Jared S. Murray The University of Texas at Austin McCombs School of Business Suggested reading: OpenIntro Statistics, Chapter 7.1, 7.2 1 A Deeper
More information. predict mod1. graph mod1 ed, connect(l) xlabel ylabel l1(model1 predicted income) b1(years of education)
DUMMY VARIABLES AND INTERACTIONS Let's start with an example in which we are interested in discrimination in income. We have a dataset that includes information for about 16 people on their income, their
More information9.1 Random coefficients models Constructed data Consumer preference mapping of carrots... 10
St@tmaster 02429/MIXED LINEAR MODELS PREPARED BY THE STATISTICS GROUPS AT IMM, DTU AND KU-LIFE Module 9: R 9.1 Random coefficients models...................... 1 9.1.1 Constructed data........................
More informationTHIS IS NOT REPRESNTATIVE OF CURRENT CLASS MATERIAL. STOR 455 Midterm 1 September 28, 2010
THIS IS NOT REPRESNTATIVE OF CURRENT CLASS MATERIAL STOR 455 Midterm September 8, INSTRUCTIONS: BOTH THE EXAM AND THE BUBBLE SHEET WILL BE COLLECTED. YOU MUST PRINT YOUR NAME AND SIGN THE HONOR PLEDGE
More informationLab #13 - Resampling Methods Econ 224 October 23rd, 2018
Lab #13 - Resampling Methods Econ 224 October 23rd, 2018 Introduction In this lab you will work through Section 5.3 of ISL and record your code and results in an RMarkdown document. I have added section
More informationStatistical Bioinformatics (Biomedical Big Data) Notes 2: Installing and Using R
Statistical Bioinformatics (Biomedical Big Data) Notes 2: Installing and Using R In this course we will be using R (for Windows) for most of our work. These notes are to help students install R and then
More informationITSx: Policy Analysis Using Interrupted Time Series
ITSx: Policy Analysis Using Interrupted Time Series Week 5 Slides Michael Law, Ph.D. The University of British Columbia COURSE OVERVIEW Layout of the weeks 1. Introduction, setup, data sources 2. Single
More informationMinitab 17 commands Prepared by Jeffrey S. Simonoff
Minitab 17 commands Prepared by Jeffrey S. Simonoff Data entry and manipulation To enter data by hand, click on the Worksheet window, and enter the values in as you would in any spreadsheet. To then save
More informationStatistical Analysis in R Guest Lecturer: Maja Milosavljevic January 28, 2015
Statistical Analysis in R Guest Lecturer: Maja Milosavljevic January 28, 2015 Data Exploration Import Relevant Packages: library(grdevices) library(graphics) library(plyr) library(hexbin) library(base)
More informationSection 4.1: Time Series I. Jared S. Murray The University of Texas at Austin McCombs School of Business
Section 4.1: Time Series I Jared S. Murray The University of Texas at Austin McCombs School of Business 1 Time Series Data and Dependence Time-series data are simply a collection of observations gathered
More informationST512. Fall Quarter, Exam 1. Directions: Answer questions as directed. Please show work. For true/false questions, circle either true or false.
ST512 Fall Quarter, 2005 Exam 1 Name: Directions: Answer questions as directed. Please show work. For true/false questions, circle either true or false. 1. (42 points) A random sample of n = 30 NBA basketball
More informationIntroduction. About this Document. What is SPSS. ohow to get SPSS. oopening Data
Introduction About this Document This manual was written by members of the Statistical Consulting Program as an introduction to SPSS 12.0. It is designed to assist new users in familiarizing themselves
More informationExample 1 of panel data : Data for 6 airlines (groups) over 15 years (time periods) Example 1
Panel data set Consists of n entities or subjects (e.g., firms and states), each of which includes T observations measured at 1 through t time period. total number of observations : nt Panel data have
More informationBIOL 458 BIOMETRY Lab 10 - Multiple Regression
BIOL 458 BIOMETRY Lab 10 - Multiple Regression Many problems in science involve the analysis of multi-variable data sets. For data sets in which there is a single continuous dependent variable, but several
More informationNEURAL NETWORKS. Cement. Blast Furnace Slag. Fly Ash. Water. Superplasticizer. Coarse Aggregate. Fine Aggregate. Age
NEURAL NETWORKS As an introduction, we ll tackle a prediction task with a continuous variable. We ll reproduce research from the field of cement and concrete manufacturing that seeks to model the compressive
More informationYour Name: Section: INTRODUCTION TO STATISTICAL REASONING Computer Lab #4 Scatterplots and Regression
Your Name: Section: 36-201 INTRODUCTION TO STATISTICAL REASONING Computer Lab #4 Scatterplots and Regression Objectives: 1. To learn how to interpret scatterplots. Specifically you will investigate, using
More informationIntroduction to R. Introduction to Econometrics W
Introduction to R Introduction to Econometrics W3412 Begin Download R from the Comprehensive R Archive Network (CRAN) by choosing a location close to you. Students are also recommended to download RStudio,
More informationGelman-Hill Chapter 3
Gelman-Hill Chapter 3 Linear Regression Basics In linear regression with a single independent variable, as we have seen, the fundamental equation is where ŷ bx 1 b0 b b b y 1 yx, 0 y 1 x x Bivariate Normal
More informationThe theory of the linear model 41. Theorem 2.5. Under the strong assumptions A3 and A5 and the hypothesis that
The theory of the linear model 41 Theorem 2.5. Under the strong assumptions A3 and A5 and the hypothesis that E(Y X) =X 0 b 0 0 the F-test statistic follows an F-distribution with (p p 0, n p) degrees
More informationLab 1: Introduction, Plotting, Data manipulation
Linear Statistical Models, R-tutorial Fall 2009 Lab 1: Introduction, Plotting, Data manipulation If you have never used Splus or R before, check out these texts and help pages; http://cran.r-project.org/doc/manuals/r-intro.html,
More informationCH5: CORR & SIMPLE LINEAR REFRESSION =======================================
STAT 430 SAS Examples SAS5 ===================== ssh xyz@glue.umd.edu, tap sas913 (old sas82), sas https://www.statlab.umd.edu/sasdoc/sashtml/onldoc.htm CH5: CORR & SIMPLE LINEAR REFRESSION =======================================
More informationRegression. Notes. Page 1 25-JAN :21:57. Output Created Comments
/STATISTICS COEFF OUTS CI(95) R ANOVA /CRITERIA=PIN(.05) POUT(.10) /DEPENDENT Favorability /METHOD=ENTER zcontemp ZAnxious6 zallcontact. Regression Notes Output Created Comments Input Missing Value Handling
More informationExercise: Graphing and Least Squares Fitting in Quattro Pro
Chapter 5 Exercise: Graphing and Least Squares Fitting in Quattro Pro 5.1 Purpose The purpose of this experiment is to become familiar with using Quattro Pro to produce graphs and analyze graphical data.
More informationEvaluating an Alternative CS1 for Students with Prior Programming Experience
Evaluating an Alternative CS1 for Students with Prior Programming Experience Michael S. Kirkpatrick Chris Mayfield SIGCSE Technical Symposium March 2017 JMU Introductory Sequence CS1 Java CS2 Not Java
More informationNCSS Statistical Software. Design Generator
Chapter 268 Introduction This program generates factorial, repeated measures, and split-plots designs with up to ten factors. The design is placed in the current database. Crossed Factors Two factors are
More informationThe linear mixed model: modeling hierarchical and longitudinal data
The linear mixed model: modeling hierarchical and longitudinal data Analysis of Experimental Data AED The linear mixed model: modeling hierarchical and longitudinal data 1 of 44 Contents 1 Modeling Hierarchical
More informationSTA 570 Spring Lecture 5 Tuesday, Feb 1
STA 570 Spring 2011 Lecture 5 Tuesday, Feb 1 Descriptive Statistics Summarizing Univariate Data o Standard Deviation, Empirical Rule, IQR o Boxplots Summarizing Bivariate Data o Contingency Tables o Row
More informationSimulating power in practice
Simulating power in practice Author: Nicholas G Reich This material is part of the statsteachr project Made available under the Creative Commons Attribution-ShareAlike 3.0 Unported License: http://creativecommons.org/licenses/by-sa/3.0/deed.en