PubHlth 640 Intermediate Biostatistics Unit 2 - Regression and Correlation. Simple Linear Regression Software: Stata v 10.1

Size: px
Start display at page:

Download "PubHlth 640 Intermediate Biostatistics Unit 2 - Regression and Correlation. Simple Linear Regression Software: Stata v 10.1"

Transcription

1 PubHlth 640 Intermediate Biostatistics Unit 2 - Regression and Correlation Simple Linear Regression Software: Stata v 10.1 Emergency Calls to the New York Auto Club Source: Chatterjee, S; Handcock MS and Simonoff JS A Casebook for a First Course in Statistics and Data Analysis. New York, John Wiley, 1995, pp Setting: Calls to the New York Auto Club are possibly related to the weather, with more calls occurring during bad weather. This example illustrates descriptive analyses and simple linear regression to explore this hypothesis in a data set containing information on calendar day, weather, and numbers of calls. Data File: ERS.dta - This is a stata data set. Variable Name Label Coding/ DAY Date using informat MMDDYY6. Example: is January 16, 1993 CALLS Calls answered FHIGH Forecasted high temperature FLOW Forecasted low temperature HIGH High temperature LOW Low temperature RAIN Rain Forecast 0 = NO 1 = RAIN SNOW Snow Forecast 0 = NO 1 = SNOW WEEKDAY Type of Day 0 = NO 1 = Weekday YEAR 0 = = 1994 SUNDAY 0 = NO 1 = SUNDAY SUBZERO 0 = NO 1 = SUBZERO \stata_howto\simple linear regression ny auto club.doc Page 1 of 10

2 Key - Green: comments (note that a comment begins with an asterisk) Black: Stata command syntax. Note You do not type the leading period. Blue: Output I have also inserted some remarks. *. * Simple Linear Regression Using Stata v * toggle off the screen by screen pausing of output. set more off. * Use FILE > OPEN to read in the stata data set ers.dta. use "/Users/carolbigelow/Desktop/ers.dta". * Use the command CODEBOOK followed by a comma and the option COMPACT. * to see a compact description of the data. codebook,compact Variable Obs Unique Mean Min Max Label day calls fhigh flow high low rain snow weekday year sunday subzero * 1. Create dictionary of variable values for readability. label define rainf 0 "0=no" 1 "1=rain". label define snowf 0 "0=no" 1 "1=snow". label define weekdayf 0 "0=no" 1 "1=weekday". label define yearf 0 "0=1993" 1 "1=1994". label define sundayf 0 "0=no" 1 "1=Sunday". label define subzerof 0 "0=no" 1 "1=subzero". * 2. Associate the discrete variables with their dictionary of value codes. label values rain rainf. label values snow snowf. label values weekday weekdayf. label values year yearf. label values sunday sundayf. label values subzero subzerof \stata_howto\simple linear regression ny auto club.doc Page 2 of 10

3 . *2. Use command LIST to produce a listing of the data.. list day calls fhigh flow high low rain snow weekday year sunday subzero day calls fhigh flow high low rain snow weekday year sunday subzero =no 0=no 0=no 0=1993 0=no 0=no =no 0=no 0=no 0=1993 1=Sunday 0=no =no 0=no 0=no 0=1993 0=no 0=no =no 0=no 1=weekday 0=1993 0=no 0=no =no 0=no 1=weekday 0=1993 0=no 0=no =no 0=no 1=weekday 0=1993 0=no 0=no =rain 0=no 1=weekday 0=1993 0=no 0=no =no 0=no 0=no 0=1993 0=no 0=no =rain 0=no 0=no 0=1993 1=Sunday 0=no =no 0=no 1=weekday 0=1993 0=no 0=no =no 0=no 1=weekday 0=1993 0=no 0=no =rain 1=snow 1=weekday 0=1993 0=no 0=no =no 0=no 1=weekday 0=1993 0=no 0=no =no 0=no 1=weekday 0=1993 0=no 0=no =no 0=no 0=no 1=1994 1=Sunday 1=subzero =rain 1=snow 0=no 1=1994 0=no 0=no =rain 0=no 1=weekday 1=1994 0=no 0=no =no 0=no 1=weekday 1=1994 0=no 1=subzero =rain 1=snow 1=weekday 1=1994 0=no 1=subzero =no 0=no 1=weekday 1=1994 0=no 1=subzero =no 0=no 0=no 1=1994 0=no 0=no =no 0=no 0=no 1=1994 1=Sunday 0=no =no 0=no 1=weekday 1=1994 0=no 0=no =no 0=no 1=weekday 1=1994 0=no 0=no =rain 1=snow 1=weekday 1=1994 0=no 0=no =no 0=no 1=weekday 1=1994 0=no 1=subzero =rain 1=snow 1=weekday 1=1994 0=no 0=no =rain 1=snow 0=no 1=1994 0=no 0=no \stata_howto\simple linear regression ny auto club.doc Page 3 of 10

4 . *3. Look at your data first! Plot of Y=calls versus X=low. * The following command of SET SCHEME is optional. I downloaded this particular scheme previously.. set scheme lean1. graph twoway (scatter calls day, symbol(d)), title("calls to NY Auto Club "). * At the top bar of the GRAPH window click on the SAVE ICON to save your graph!. * Suggestion: From the drop down menu, choose.png extension. It s nice for cut and paste. * I saved my graph as nyauto_graph01.png" Source: nyauto_graph01.png The scatterplot suggests, as we might expect, that lower temperatures are associated with more calls to the NY Auto club. \stata_howto\simple linear regression ny auto club.doc Page 4 of 10

5 . *4. Descriptives on the outcome variable Y=calls. * Use command SUMMARIZE followed by comma and then followed by option DETAIL. summarize calls, detail calls Percentiles Smallest 1% % % Obs 28 25% Sum of Wgt % 3062 Mean Largest Std. Dev % % Variance % Skewness % Kurtosis *4. continued - Assess assumption of normality both graphically and with hypothesis test. * There are multiple graphs you might consider.. * Here I do a histogram with the y-axis defined as frequency and with an overlay normal. histogram calls, frequency normal title("histogram of Y=CALLS with overlay NORMAL") (bin=5, start=1674, width=1454.6). * save graph as nyauto_graph02.png Source: nyauto_graph02.png The graph shows what we suspected nonnormality of Y=CALLS. \stata_howto\simple linear regression ny auto club.doc Page 5 of 10

6 .* There are also a variety of tests of normality. One is the Shapiro Wilk Test..* See Unit 2 lecture notes page 54.* Null : Distribution of calls is normal. Under Null, test statistic W is close to 1.* Evidence of NON normality is reflected in W < 1 and small p-value. swilk calls Shapiro-Wilk W test for normal data Variable Obs W V z Prob>z calls The null hypothesis of normality of Y=CALLS is rejected. Take care, sometimes the cure is worse than the problem. For now, we ll continue along anyway; this will give us a chance to see some interesting diagnostics!. * 6. Least Squares estimation and analysis of variance table.. regress calls low Source SS df MS Number of obs = F( 1, 26) = Model Prob > F = Residual R-squared = Adj R-squared = Total Root MSE = calls Coef. Std. Err. t P> t [95% Conf. Interval] low _cons alls ˆ = 7, *[low] The fitted line is c R 2 =.51 indicates that 51% of the variability in calls is explained. The overall F test significance level <.0001 suggests that the straight line fit performs better in explaining variability in calls than does Y = average # calls From this output, the analysis of variance is the following (next page) \stata_howto\simple linear regression ny auto club.doc Page 6 of 10

7 Source Df Sum of Squares Mean Square Model 1 n Regression ( Yˆ ) 2 i Y = 100,233,719 SS(model)/1 i= 1 = 100,233,719 Residual (n-2) = 26 n 2 Error ( Y ˆ i Yi i= 1 3,673, Total, corrected (n-1) = 27 n 2 Y Y ) = 95,513,596.2 SS(residual)/(n-2) = ( i ) = 195,747,315 i= 1. *7. Overlay of straight line fit on the scatter plot. graph twoway (scatter calls low, symbol(d)) (lfit calls low), title("calls to NY Auto Club ") subtitle("overlay Straight Line Fit"). * save graph as nyauto_graph03.png Source: nyauto_graph03.png The overlay of the straight line fit is reasonable but substantial variability is seen, too. There is a lot we still don t know, including but not limited to the following --- Case influence, omitted variables, variance heterogeneity, incorrect functional form, etc. \stata_howto\simple linear regression ny auto club.doc Page 7 of 10

8 . *8. Residuals Analysis - Assessment of Normality of Residuals. * Stata requires that you use post-estimation commands to obtain residuals. * We consider a few here... * Use command PREDICT varname, RESIDUALS to save residuals to a variable you name. predict e, residuals. * Use command PNORM to plot residuals e versus percentiles of normal. * Reasonableness is suggested by points falling along the line. pnorm e, title("normality of Residuals of Y=calls v X=low"). * save graph as nyauto_graph04.png Source: nyauto_graph04.png Not bad, actually. \stata_howto\simple linear regression ny auto club.doc Page 8 of 10

9 . *9. Residuals analysis - Detection of Outliers Using Cooks Distance. * See Unit 2 lecture notes page 60. * Use command PREDICT varname, COOKSD to save residuals to a variable you name. predict cook, cooksd. * Preliminary to plot of cook s distance, we need to create an ID variable. * This is because the data set ers.dta does not have an ID variable. Most do.. * Use command GENERATE varname=_n to save the system variable _n to a variable you name.. generate id=_n. * Plot Cook s distance values on Y-axis versus id on the X-axis. Look for extreme values. graph twoway (scatter cook id, symbol(d)), title("cook's Distance Values") subtitle("simple Linear Regression of Y=calls on X=low"). * save graph as nyauto_graph05.png Source: nyauto_graph05.png For straight line regression, the suggestion is to regard Cook s Distance values > 1 as significant.. Here, there are no unusually large Cook Distance values. Not shown but useful, too, are examinations of leverage and jackknife residuals. \stata_howto\simple linear regression ny auto club.doc Page 9 of 10

10 . *10. Assessing Assumptions of Linearity, Heteroscedasticity, Independence using Jacknife Residuals. * See Unit 2 notes page 59. * In Stata, jackknife residuals are referred to as studentized residuals. * Use command PREDICT varname, XB to save predicted outcomes to a variable you name. predict predicted, xb. * Use command PREDICT varname, RSTUDENT to save jackknife residuals to a variable you name. predict jack, rstudent. graph twoway (scatter jack predicted, symbol(d)), title("jacknife Residuals versus Predicted"). * save graph as nyauto_graph06.png Source: nyauto_graph06.png Recall A jackknife residual for an individual is a modification of the solution for a studentized residual in which the mean square error is replaced by the mean square error obtained after deleting that individual from the analysis. This plot in SAS is nice for its inclusion of some useful summaries the fitted line, the R 2 Departures of this plot from a parallel band about the horizontal line at zero are significant. The plot here is a bit noisy but not too bad considering the small sample size. \stata_howto\simple linear regression ny auto club.doc Page 10 of 10

range: [1,20] units: 1 unique values: 20 missing.: 0/20 percentiles: 10% 25% 50% 75% 90%

range: [1,20] units: 1 unique values: 20 missing.: 0/20 percentiles: 10% 25% 50% 75% 90% ------------------ log: \Term 2\Lecture_2s\regression1a.log log type: text opened on: 22 Feb 2008, 03:29:09. cmdlog using " \Term 2\Lecture_2s\regression1a.do" (cmdlog \Term 2\Lecture_2s\regression1a.do

More information

Stata v 12 Illustration. First Session

Stata v 12 Illustration. First Session Launch Stata PC Users Stata v 12 Illustration Mac Users START > ALL PROGRAMS > Stata; or Double click on the Stata icon on your desktop APPLICATIONS > STATA folder > Stata; or Double click on the Stata

More information

Stata version 13. First Session. January I- Launching and Exiting Stata Launching Stata Exiting Stata..

Stata version 13. First Session. January I- Launching and Exiting Stata Launching Stata Exiting Stata.. Stata version 13 January 2015 I- Launching and Exiting Stata... 1. Launching Stata... 2. Exiting Stata.. II - Toolbar, Menu bar and Windows.. 1. Toolbar Key.. 2. Menu bar Key..... 3. Windows..... III -...

More information

Bivariate (Simple) Regression Analysis

Bivariate (Simple) Regression Analysis Revised July 2018 Bivariate (Simple) Regression Analysis This set of notes shows how to use Stata to estimate a simple (two-variable) regression equation. It assumes that you have set Stata up on your

More information

Introduction to Stata: An In-class Tutorial

Introduction to Stata: An In-class Tutorial Introduction to Stata: An I. The Basics - Stata is a command-driven statistical software program. In other words, you type in a command, and Stata executes it. You can use the drop-down menus to avoid

More information

Applied Regression Modeling: A Business Approach

Applied Regression Modeling: A Business Approach i Applied Regression Modeling: A Business Approach Computer software help: SAS SAS (originally Statistical Analysis Software ) is a commercial statistical software package based on a powerful programming

More information

Week 4: Simple Linear Regression III

Week 4: Simple Linear Regression III Week 4: Simple Linear Regression III Marcelo Coca Perraillon University of Colorado Anschutz Medical Campus Health Services Research Methods I HSMP 7607 2017 c 2017 PERRAILLON ARR 1 Outline Goodness of

More information

THE UNIVERSITY OF BRITISH COLUMBIA FORESTRY 430 and 533. Time: 50 minutes 40 Marks FRST Marks FRST 533 (extra questions)

THE UNIVERSITY OF BRITISH COLUMBIA FORESTRY 430 and 533. Time: 50 minutes 40 Marks FRST Marks FRST 533 (extra questions) THE UNIVERSITY OF BRITISH COLUMBIA FORESTRY 430 and 533 MIDTERM EXAMINATION: October 14, 2005 Instructor: Val LeMay Time: 50 minutes 40 Marks FRST 430 50 Marks FRST 533 (extra questions) This examination

More information

Lab 2: OLS regression

Lab 2: OLS regression Lab 2: OLS regression Andreas Beger February 2, 2009 1 Overview This lab covers basic OLS regression in Stata, including: multivariate OLS regression reporting coefficients with different confidence intervals

More information

I Launching and Exiting Stata. Stata will ask you if you would like to check for updates. Update now or later, your choice.

I Launching and Exiting Stata. Stata will ask you if you would like to check for updates. Update now or later, your choice. I Launching and Exiting Stata 1. Launching Stata Stata can be launched in either of two ways: 1) in the stata program, click on the stata application; or 2) double click on the short cut that you have

More information

THE LINEAR PROBABILITY MODEL: USING LEAST SQUARES TO ESTIMATE A REGRESSION EQUATION WITH A DICHOTOMOUS DEPENDENT VARIABLE

THE LINEAR PROBABILITY MODEL: USING LEAST SQUARES TO ESTIMATE A REGRESSION EQUATION WITH A DICHOTOMOUS DEPENDENT VARIABLE PLS 802 Spring 2018 Professor Jacoby THE LINEAR PROBABILITY MODEL: USING LEAST SQUARES TO ESTIMATE A REGRESSION EQUATION WITH A DICHOTOMOUS DEPENDENT VARIABLE This handout shows the log of a Stata session

More information

BIOSTATISTICS LABORATORY PART 1: INTRODUCTION TO DATA ANALYIS WITH STATA: EXPLORING AND SUMMARIZING DATA

BIOSTATISTICS LABORATORY PART 1: INTRODUCTION TO DATA ANALYIS WITH STATA: EXPLORING AND SUMMARIZING DATA BIOSTATISTICS LABORATORY PART 1: INTRODUCTION TO DATA ANALYIS WITH STATA: EXPLORING AND SUMMARIZING DATA Learning objectives: Getting data ready for analysis: 1) Learn several methods of exploring the

More information

. predict mod1. graph mod1 ed, connect(l) xlabel ylabel l1(model1 predicted income) b1(years of education)

. predict mod1. graph mod1 ed, connect(l) xlabel ylabel l1(model1 predicted income) b1(years of education) DUMMY VARIABLES AND INTERACTIONS Let's start with an example in which we are interested in discrimination in income. We have a dataset that includes information for about 16 people on their income, their

More information

Introduction to Stata Toy Program #1 Basic Descriptives

Introduction to Stata Toy Program #1 Basic Descriptives Introduction to Stata 2018-19 Toy Program #1 Basic Descriptives Summary The goal of this toy program is to get you in and out of a Stata session and, along the way, produce some descriptive statistics.

More information

Week 4: Simple Linear Regression II

Week 4: Simple Linear Regression II Week 4: Simple Linear Regression II Marcelo Coca Perraillon University of Colorado Anschutz Medical Campus Health Services Research Methods I HSMP 7607 2017 c 2017 PERRAILLON ARR 1 Outline Algebraic properties

More information

Minitab 17 commands Prepared by Jeffrey S. Simonoff

Minitab 17 commands Prepared by Jeffrey S. Simonoff Minitab 17 commands Prepared by Jeffrey S. Simonoff Data entry and manipulation To enter data by hand, click on the Worksheet window, and enter the values in as you would in any spreadsheet. To then save

More information

CREATING THE ANALYSIS

CREATING THE ANALYSIS Chapter 14 Multiple Regression Chapter Table of Contents CREATING THE ANALYSIS...214 ModelInformation...217 SummaryofFit...217 AnalysisofVariance...217 TypeIIITests...218 ParameterEstimates...218 Residuals-by-PredictedPlot...219

More information

Subset Selection in Multiple Regression

Subset Selection in Multiple Regression Chapter 307 Subset Selection in Multiple Regression Introduction Multiple regression analysis is documented in Chapter 305 Multiple Regression, so that information will not be repeated here. Refer to that

More information

SOCY7706: Longitudinal Data Analysis Instructor: Natasha Sarkisian. Panel Data Analysis: Fixed Effects Models

SOCY7706: Longitudinal Data Analysis Instructor: Natasha Sarkisian. Panel Data Analysis: Fixed Effects Models SOCY776: Longitudinal Data Analysis Instructor: Natasha Sarkisian Panel Data Analysis: Fixed Effects Models Fixed effects models are similar to the first difference model we considered for two wave data

More information

THE L.L. THURSTONE PSYCHOMETRIC LABORATORY UNIVERSITY OF NORTH CAROLINA. Forrest W. Young & Carla M. Bann

THE L.L. THURSTONE PSYCHOMETRIC LABORATORY UNIVERSITY OF NORTH CAROLINA. Forrest W. Young & Carla M. Bann Forrest W. Young & Carla M. Bann THE L.L. THURSTONE PSYCHOMETRIC LABORATORY UNIVERSITY OF NORTH CAROLINA CB 3270 DAVIE HALL, CHAPEL HILL N.C., USA 27599-3270 VISUAL STATISTICS PROJECT WWW.VISUALSTATS.ORG

More information

Empirical Asset Pricing

Empirical Asset Pricing Department of Mathematics and Statistics, University of Vaasa, Finland Texas A&M University, May June, 2013 As of May 17, 2013 Part I Stata Introduction 1 Stata Introduction Interface Commands Command

More information

Introduction to Stata First Session. I- Launching and Exiting Stata Launching Stata Exiting Stata..

Introduction to Stata First Session. I- Launching and Exiting Stata Launching Stata Exiting Stata.. Introduction to Stata 2016-17 01. First Session I- Launching and Exiting Stata... 1. Launching Stata... 2. Exiting Stata.. II - Toolbar, Menu bar and Windows.. 1. Toolbar Key.. 2. Menu bar Key..... 3.

More information

Heteroskedasticity and Homoskedasticity, and Homoskedasticity-Only Standard Errors

Heteroskedasticity and Homoskedasticity, and Homoskedasticity-Only Standard Errors Heteroskedasticity and Homoskedasticity, and Homoskedasticity-Only Standard Errors (Section 5.4) What? Consequences of homoskedasticity Implication for computing standard errors What do these two terms

More information

Week 5: Multiple Linear Regression II

Week 5: Multiple Linear Regression II Week 5: Multiple Linear Regression II Marcelo Coca Perraillon University of Colorado Anschutz Medical Campus Health Services Research Methods I HSMP 7607 2017 c 2017 PERRAILLON ARR 1 Outline Adjusted R

More information

Stata versions 12 & 13 Week 4 Practice Problems

Stata versions 12 & 13 Week 4 Practice Problems Stata versions 12 & 13 Week 4 Practice Problems SOLUTIONS 1 Practice Screen Capture a Create a word document Name it using the convention lastname_lab1docx (eg bigelow_lab1docx) b Using your browser, go

More information

Here is Kellogg s custom menu for their core statistics class, which can be loaded by typing the do statement shown in the command window at the very

Here is Kellogg s custom menu for their core statistics class, which can be loaded by typing the do statement shown in the command window at the very Here is Kellogg s custom menu for their core statistics class, which can be loaded by typing the do statement shown in the command window at the very bottom of the screen: 4 The univariate statistics command

More information

Introduction to STATA 6.0 ECONOMICS 626

Introduction to STATA 6.0 ECONOMICS 626 Introduction to STATA 6.0 ECONOMICS 626 Bill Evans Fall 2001 This handout gives a very brief introduction to STATA 6.0 on the Economics Department Network. In a few short years, STATA has become one of

More information

THIS IS NOT REPRESNTATIVE OF CURRENT CLASS MATERIAL. STOR 455 Midterm 1 September 28, 2010

THIS IS NOT REPRESNTATIVE OF CURRENT CLASS MATERIAL. STOR 455 Midterm 1 September 28, 2010 THIS IS NOT REPRESNTATIVE OF CURRENT CLASS MATERIAL STOR 455 Midterm September 8, INSTRUCTIONS: BOTH THE EXAM AND THE BUBBLE SHEET WILL BE COLLECTED. YOU MUST PRINT YOUR NAME AND SIGN THE HONOR PLEDGE

More information

Cluster Randomization Create Cluster Means Dataset

Cluster Randomization Create Cluster Means Dataset Chapter 270 Cluster Randomization Create Cluster Means Dataset Introduction A cluster randomization trial occurs when whole groups or clusters of individuals are treated together. Examples of such clusters

More information

An Introductory Guide to Stata

An Introductory Guide to Stata An Introductory Guide to Stata Scott L. Minkoff Assistant Professor Department of Political Science Barnard College sminkoff@barnard.edu Updated: July 9, 2012 1 TABLE OF CONTENTS ABOUT THIS GUIDE... 4

More information

Chapter 1. Looking at Data-Distribution

Chapter 1. Looking at Data-Distribution Chapter 1. Looking at Data-Distribution Statistics is the scientific discipline that provides methods to draw right conclusions: 1)Collecting the data 2)Describing the data 3)Drawing the conclusions Raw

More information

Two-Stage Least Squares

Two-Stage Least Squares Chapter 316 Two-Stage Least Squares Introduction This procedure calculates the two-stage least squares (2SLS) estimate. This method is used fit models that include instrumental variables. 2SLS includes

More information

A Short Introduction to STATA

A Short Introduction to STATA A Short Introduction to STATA 1) Introduction: This session serves to link everyone from theoretical equations to tangible results under the amazing promise of Stata! Stata is a statistical package that

More information

Box-Cox Transformation for Simple Linear Regression

Box-Cox Transformation for Simple Linear Regression Chapter 192 Box-Cox Transformation for Simple Linear Regression Introduction This procedure finds the appropriate Box-Cox power transformation (1964) for a dataset containing a pair of variables that are

More information

IQR = number. summary: largest. = 2. Upper half: Q3 =

IQR = number. summary: largest. = 2. Upper half: Q3 = Step by step box plot Height in centimeters of players on the 003 Women s Worldd Cup soccer team. 157 1611 163 163 164 165 165 165 168 168 168 170 170 170 171 173 173 175 180 180 Determine the 5 number

More information

1 Introducing Stata sample session

1 Introducing Stata sample session 1 Introducing Stata sample session Introducing Stata This chapter will run through a sample work session, introducing you to a few of the basic tasks that can be done in Stata, such as opening a dataset,

More information

Stat 5100 Handout #19 SAS: Influential Observations and Outliers

Stat 5100 Handout #19 SAS: Influential Observations and Outliers Stat 5100 Handout #19 SAS: Influential Observations and Outliers Example: Data collected on 50 countries relevant to a cross-sectional study of a lifecycle savings hypothesis, which states that the response

More information

Week 10: Heteroskedasticity II

Week 10: Heteroskedasticity II Week 10: Heteroskedasticity II Marcelo Coca Perraillon University of Colorado Anschutz Medical Campus Health Services Research Methods I HSMP 7607 2017 c 2017 PERRAILLON ARR 1 Outline Dealing with heteroskedasticy

More information

After opening Stata for the first time: set scheme s1mono, permanently

After opening Stata for the first time: set scheme s1mono, permanently Stata 13 HELP Getting help Type help command (e.g., help regress). If you don't know the command name, type lookup topic (e.g., lookup regression). Email: tech-support@stata.com. Put your Stata serial

More information

Data Analysis and Solver Plugins for KSpread USER S MANUAL. Tomasz Maliszewski

Data Analysis and Solver Plugins for KSpread USER S MANUAL. Tomasz Maliszewski Data Analysis and Solver Plugins for KSpread USER S MANUAL Tomasz Maliszewski tmaliszewski@wp.pl Table of Content CHAPTER 1: INTRODUCTION... 3 1.1. ABOUT DATA ANALYSIS PLUGIN... 3 1.3. ABOUT SOLVER PLUGIN...

More information

Stata version 12. Lab Session 1 February Preliminary: How to Screen Capture.. 2. Preliminary: How to Keep a Log of Your Stata Session..

Stata version 12. Lab Session 1 February Preliminary: How to Screen Capture.. 2. Preliminary: How to Keep a Log of Your Stata Session.. Stata version 12 Lab Session 1 February 2013 1. Preliminary: How to Screen Capture.. 2. Preliminary: How to Keep a Log of Your Stata Session.. 3. Preliminary: How to Save a Stata Graph... 4. Enter Data:

More information

( ) = Y ˆ. Calibration Definition A model is calibrated if its predictions are right on average: ave(response Predicted value) = Predicted value.

( ) = Y ˆ. Calibration Definition A model is calibrated if its predictions are right on average: ave(response Predicted value) = Predicted value. Calibration OVERVIEW... 2 INTRODUCTION... 2 CALIBRATION... 3 ANOTHER REASON FOR CALIBRATION... 4 CHECKING THE CALIBRATION OF A REGRESSION... 5 CALIBRATION IN SIMPLE REGRESSION (DISPLAY.JMP)... 5 TESTING

More information

RSM Split-Plot Designs & Diagnostics Solve Real-World Problems

RSM Split-Plot Designs & Diagnostics Solve Real-World Problems RSM Split-Plot Designs & Diagnostics Solve Real-World Problems Shari Kraber Pat Whitcomb Martin Bezener Stat-Ease, Inc. Stat-Ease, Inc. Stat-Ease, Inc. 221 E. Hennepin Ave. 221 E. Hennepin Ave. 221 E.

More information

Panel Data 4: Fixed Effects vs Random Effects Models

Panel Data 4: Fixed Effects vs Random Effects Models Panel Data 4: Fixed Effects vs Random Effects Models Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ Last revised April 4, 2017 These notes borrow very heavily, sometimes verbatim,

More information

Part I, Chapters 4 & 5. Data Tables and Data Analysis Statistics and Figures

Part I, Chapters 4 & 5. Data Tables and Data Analysis Statistics and Figures Part I, Chapters 4 & 5 Data Tables and Data Analysis Statistics and Figures Descriptive Statistics 1 Are data points clumped? (order variable / exp. variable) Concentrated around one value? Concentrated

More information

STAT 2607 REVIEW PROBLEMS Word problems must be answered in words of the problem.

STAT 2607 REVIEW PROBLEMS Word problems must be answered in words of the problem. STAT 2607 REVIEW PROBLEMS 1 REMINDER: On the final exam 1. Word problems must be answered in words of the problem. 2. "Test" means that you must carry out a formal hypothesis testing procedure with H0,

More information

Exercise 1: Introduction to Stata

Exercise 1: Introduction to Stata Exercise 1: Introduction to Stata New Stata Commands use describe summarize stem graph box histogram log on, off exit New Stata Commands Downloading Data from the Web I recommend that you use Internet

More information

piecewise ginireg 1 Piecewise Gini Regressions in Stata Jan Ditzen 1 Shlomo Yitzhaki 2 September 8, 2017

piecewise ginireg 1 Piecewise Gini Regressions in Stata Jan Ditzen 1 Shlomo Yitzhaki 2 September 8, 2017 piecewise ginireg 1 Piecewise Gini Regressions in Stata Jan Ditzen 1 Shlomo Yitzhaki 2 1 Heriot-Watt University, Edinburgh, UK Center for Energy Economics Research and Policy (CEERP) 2 The Hebrew University

More information

Introduction to Stata

Introduction to Stata Introduction to Stata Introduction In introductory biostatistics courses, you will use the Stata software to apply statistical concepts and practice analyses. Most of the commands you will need are available

More information

Multivariate Capability Analysis

Multivariate Capability Analysis Multivariate Capability Analysis Summary... 1 Data Input... 3 Analysis Summary... 4 Capability Plot... 5 Capability Indices... 6 Capability Ellipse... 7 Correlation Matrix... 8 Tests for Normality... 8

More information

Chapter 6: DESCRIPTIVE STATISTICS

Chapter 6: DESCRIPTIVE STATISTICS Chapter 6: DESCRIPTIVE STATISTICS Random Sampling Numerical Summaries Stem-n-Leaf plots Histograms, and Box plots Time Sequence Plots Normal Probability Plots Sections 6-1 to 6-5, and 6-7 Random Sampling

More information

Applied Regression Modeling: A Business Approach

Applied Regression Modeling: A Business Approach i Applied Regression Modeling: A Business Approach Computer software help: SPSS SPSS (originally Statistical Package for the Social Sciences ) is a commercial statistical software package with an easy-to-use

More information

Health Disparities (HD): It s just about comparing two groups

Health Disparities (HD): It s just about comparing two groups A review of modern methods of estimating the size of health disparities May 24, 2017 Emil Coman 1 Helen Wu 2 1 UConn Health Disparities Institute, 2 UConn Health Modern Modeling conference, May 22-24,

More information

Data Management - 50%

Data Management - 50% Exam 1: SAS Big Data Preparation, Statistics, and Visual Exploration Data Management - 50% Navigate within the Data Management Studio Interface Register a new QKB Create and connect to a repository Define

More information

Principles of Biostatistics and Data Analysis PHP 2510 Lab2

Principles of Biostatistics and Data Analysis PHP 2510 Lab2 Goals for Lab2: Familiarization with Do-file Editor (two important features: reproducible and analysis) Reviewing commands for summary statistics Visual depiction of data- bar chart and histograms Stata

More information

SPSS. (Statistical Packages for the Social Sciences)

SPSS. (Statistical Packages for the Social Sciences) Inger Persson SPSS (Statistical Packages for the Social Sciences) SHORT INSTRUCTIONS This presentation contains only relatively short instructions on how to perform basic statistical calculations in SPSS.

More information

Week 1: Introduction to Stata

Week 1: Introduction to Stata Week 1: Introduction to Stata Marcelo Coca Perraillon University of Colorado Anschutz Medical Campus Health Services Research Methods I HSMP 7607 2017 c 2017 PERRAILLON ALL RIGHTS RESERVED 1 Outline Log

More information

STATA 13 INTRODUCTION

STATA 13 INTRODUCTION STATA 13 INTRODUCTION Catherine McGowan & Elaine Williamson LONDON SCHOOL OF HYGIENE & TROPICAL MEDICINE DECEMBER 2013 0 CONTENTS INTRODUCTION... 1 Versions of STATA... 1 OPENING STATA... 1 THE STATA

More information

LAB 1 INSTRUCTIONS DESCRIBING AND DISPLAYING DATA

LAB 1 INSTRUCTIONS DESCRIBING AND DISPLAYING DATA LAB 1 INSTRUCTIONS DESCRIBING AND DISPLAYING DATA This lab will assist you in learning how to summarize and display categorical and quantitative data in StatCrunch. In particular, you will learn how to

More information

Stata Training. AGRODEP Technical Note 08. April Manuel Barron and Pia Basurto

Stata Training. AGRODEP Technical Note 08. April Manuel Barron and Pia Basurto AGRODEP Technical Note 08 April 2013 Stata Training Manuel Barron and Pia Basurto AGRODEP Technical Notes are designed to document state-of-the-art tools and methods. They are circulated in order to help

More information

Introduction to Hierarchical Linear Model. Hsueh-Sheng Wu CFDR Workshop Series January 30, 2017

Introduction to Hierarchical Linear Model. Hsueh-Sheng Wu CFDR Workshop Series January 30, 2017 Introduction to Hierarchical Linear Model Hsueh-Sheng Wu CFDR Workshop Series January 30, 2017 1 Outline What is Hierarchical Linear Model? Why do nested data create analytic problems? Graphic presentation

More information

Review of Stata II AERC Training Workshop Nairobi, May 2002

Review of Stata II AERC Training Workshop Nairobi, May 2002 Review of Stata II AERC Training Workshop Nairobi, 20-24 May 2002 This note provides more information on the basics of Stata that should help you with the exercises in the remaining sessions of the workshop.

More information

/23/2004 TA : Jiyoon Kim. Recitation Note 1

/23/2004 TA : Jiyoon Kim. Recitation Note 1 Recitation Note 1 This is intended to walk you through using STATA in an Athena environment. The computer room of political science dept. has STATA on PC machines. But, knowing how to use it on Athena

More information

Getting Correct Results from PROC REG

Getting Correct Results from PROC REG Getting Correct Results from PROC REG Nate Derby Stakana Analytics Seattle, WA, USA SUCCESS 3/12/15 Nate Derby Getting Correct Results from PROC REG 1 / 29 Outline PROC REG 1 PROC REG 2 Nate Derby Getting

More information

Section 4 General Factorial Tutorials

Section 4 General Factorial Tutorials Section 4 General Factorial Tutorials General Factorial Part One: Categorical Introduction Design-Ease software version 6 offers a General Factorial option on the Factorial tab. If you completed the One

More information

A Multiple-Line Fitting Algorithm Without Initialization Yan Guo

A Multiple-Line Fitting Algorithm Without Initialization Yan Guo A Multiple-Line Fitting Algorithm Without Initialization Yan Guo Abstract: The commonest way to fit multiple lines is to use methods incorporate the EM algorithm. However, the EM algorithm dose not guarantee

More information

Reproducible Research: Weaving with Stata

Reproducible Research: Weaving with Stata StataCorp LP Italian Stata Users Group Meeting October, 2008 Outline I Introduction 1 Introduction Goals Reproducible Research and Weaving 2 3 What We ve Seen Goals Reproducible Research and Weaving Goals

More information

Selected Introductory Statistical and Data Manipulation Procedures. Gordon & Johnson 2002 Minitab version 13.

Selected Introductory Statistical and Data Manipulation Procedures. Gordon & Johnson 2002 Minitab version 13. Minitab@Oneonta.Manual: Selected Introductory Statistical and Data Manipulation Procedures Gordon & Johnson 2002 Minitab version 13.0 Minitab@Oneonta.Manual: Selected Introductory Statistical and Data

More information

Getting started with Stata 2017: Cheat-sheet

Getting started with Stata 2017: Cheat-sheet Getting started with Stata 2017: Cheat-sheet 4. september 2017 1 Get started Graphical user interface (GUI). Clickable. Simple. Commands. Allows for use of do-le. Easy to keep track. Command window: Write

More information

SYS 6021 Linear Statistical Models

SYS 6021 Linear Statistical Models SYS 6021 Linear Statistical Models Project 2 Spam Filters Jinghe Zhang Summary The spambase data and time indexed counts of spams and hams are studied to develop accurate spam filters. Static models are

More information

Sales Price of Laptops Based on Their Specifications. Hyunwoo Cho Jay Jung Gun Hee Lee Chan Hong Park Seoyul Um Mario Wijaya Team #10

Sales Price of Laptops Based on Their Specifications. Hyunwoo Cho Jay Jung Gun Hee Lee Chan Hong Park Seoyul Um Mario Wijaya Team #10 Sales Price of Laptops Based on Their Specifications Hyunwoo Cho Jay Jung Gun Hee Lee Chan Hong Park Seoyul Um Mario Wijaya Team #10 Table of Contents 1) Introduction 2 2) Problem Statement 2 3) Data Descriptions

More information

Quantitative - One Population

Quantitative - One Population Quantitative - One Population The Quantitative One Population VISA procedures allow the user to perform descriptive and inferential procedures for problems involving one population with quantitative (interval)

More information

SLStats.notebook. January 12, Statistics:

SLStats.notebook. January 12, Statistics: Statistics: 1 2 3 Ways to display data: 4 generic arithmetic mean sample 14A: Opener, #3,4 (Vocabulary, histograms, frequency tables, stem and leaf) 14B.1: #3,5,8,9,11,12,14,15,16 (Mean, median, mode,

More information

ST512. Fall Quarter, Exam 1. Directions: Answer questions as directed. Please show work. For true/false questions, circle either true or false.

ST512. Fall Quarter, Exam 1. Directions: Answer questions as directed. Please show work. For true/false questions, circle either true or false. ST512 Fall Quarter, 2005 Exam 1 Name: Directions: Answer questions as directed. Please show work. For true/false questions, circle either true or false. 1. (42 points) A random sample of n = 30 NBA basketball

More information

Recoding and Labeling Variables

Recoding and Labeling Variables Updated July 2018 Recoding and Labeling Variables This set of notes describes how to use the computer program Stata to recode variables and save them as new variables as well as how to label variables.

More information

Resources for statistical assistance. Quantitative covariates and regression analysis. Methods for predicting continuous outcomes.

Resources for statistical assistance. Quantitative covariates and regression analysis. Methods for predicting continuous outcomes. Resources for statistical assistance Quantitative covariates and regression analysis Carolyn Taylor Applied Statistics and Data Science Group (ASDa) Department of Statistics, UBC January 24, 2017 Department

More information

Summarising Data. Mark Lunt 09/10/2018. Arthritis Research UK Epidemiology Unit University of Manchester

Summarising Data. Mark Lunt 09/10/2018. Arthritis Research UK Epidemiology Unit University of Manchester Summarising Data Mark Lunt Arthritis Research UK Epidemiology Unit University of Manchester 09/10/2018 Summarising Data Today we will consider Different types of data Appropriate ways to summarise these

More information

Linear regression Number of obs = 6,866 F(16, 326) = Prob > F = R-squared = Root MSE =

Linear regression Number of obs = 6,866 F(16, 326) = Prob > F = R-squared = Root MSE = - /*** To demonstrate use of 2SLS ***/ * Case: In the early 1990's Tanzania implemented a FP program to reduce fertility, which was among the highest in the world * The FP program had two main components:

More information

SASEG 9B Regression Assumptions

SASEG 9B Regression Assumptions SASEG 9B Regression Assumptions (Fall 2015) Sources (adapted with permission)- T. P. Cronan, Jeff Mullins, Ron Freeze, and David E. Douglas Course and Classroom Notes Enterprise Systems, Sam M. Walton

More information

Stata Session 2. Tarjei Havnes. University of Oslo. Statistics Norway. ECON 4136, UiO, 2012

Stata Session 2. Tarjei Havnes. University of Oslo. Statistics Norway. ECON 4136, UiO, 2012 Stata Session 2 Tarjei Havnes 1 ESOP and Department of Economics University of Oslo 2 Research department Statistics Norway ECON 4136, UiO, 2012 Tarjei Havnes (University of Oslo) Stata Session 2 ECON

More information

Outline. Topic 16 - Other Remedies. Ridge Regression. Ridge Regression. Ridge Regression. Robust Regression. Regression Trees. Piecewise Linear Model

Outline. Topic 16 - Other Remedies. Ridge Regression. Ridge Regression. Ridge Regression. Robust Regression. Regression Trees. Piecewise Linear Model Topic 16 - Other Remedies Ridge Regression Robust Regression Regression Trees Outline - Fall 2013 Piecewise Linear Model Bootstrapping Topic 16 2 Ridge Regression Modification of least squares that addresses

More information

Applied Regression Modeling: A Business Approach

Applied Regression Modeling: A Business Approach i Applied Regression Modeling: A Business Approach Computer software help: SAS code SAS (originally Statistical Analysis Software) is a commercial statistical software package based on a powerful programming

More information

Multiple Linear Regression

Multiple Linear Regression Multiple Linear Regression Rebecca C. Steorts, Duke University STA 325, Chapter 3 ISL 1 / 49 Agenda How to extend beyond a SLR Multiple Linear Regression (MLR) Relationship Between the Response and Predictors

More information

Robust Linear Regression (Passing- Bablok Median-Slope)

Robust Linear Regression (Passing- Bablok Median-Slope) Chapter 314 Robust Linear Regression (Passing- Bablok Median-Slope) Introduction This procedure performs robust linear regression estimation using the Passing-Bablok (1988) median-slope algorithm. Their

More information

Stat 401 B Lecture 26

Stat 401 B Lecture 26 Stat B Lecture 6 Forward Selection The Forward selection rocedure looks to add variables to the model. Once added, those variables stay in the model even if they become insignificant at a later ste. Backward

More information

Unit 1 Review of BIOSTATS 540 Practice Problems SOLUTIONS - Stata Users

Unit 1 Review of BIOSTATS 540 Practice Problems SOLUTIONS - Stata Users BIOSTATS 640 Spring 2018 Review of Introductory Biostatistics STATA solutions Page 1 of 13 Key Comments begin with an * Commands are in bold black I edited the output so that it appears here in blue Unit

More information

CREATING THE DISTRIBUTION ANALYSIS

CREATING THE DISTRIBUTION ANALYSIS Chapter 12 Examining Distributions Chapter Table of Contents CREATING THE DISTRIBUTION ANALYSIS...176 BoxPlot...178 Histogram...180 Moments and Quantiles Tables...... 183 ADDING DENSITY ESTIMATES...184

More information

Data Mining. ❷Chapter 2 Basic Statistics. Asso.Prof.Dr. Xiao-dong Zhu. Business School, University of Shanghai for Science & Technology

Data Mining. ❷Chapter 2 Basic Statistics. Asso.Prof.Dr. Xiao-dong Zhu. Business School, University of Shanghai for Science & Technology ❷Chapter 2 Basic Statistics Business School, University of Shanghai for Science & Technology 2016-2017 2nd Semester, Spring2017 Contents of chapter 1 1 recording data using computers 2 3 4 5 6 some famous

More information

Meet MINITAB. Student Release 14. for Windows

Meet MINITAB. Student Release 14. for Windows Meet MINITAB Student Release 14 for Windows 2003, 2004 by Minitab Inc. All rights reserved. MINITAB and the MINITAB logo are registered trademarks of Minitab Inc. All other marks referenced remain the

More information

Chapter 2. Descriptive Statistics: Organizing, Displaying and Summarizing Data

Chapter 2. Descriptive Statistics: Organizing, Displaying and Summarizing Data Chapter 2 Descriptive Statistics: Organizing, Displaying and Summarizing Data Objectives Student should be able to Organize data Tabulate data into frequency/relative frequency tables Display data graphically

More information

Stat 500 lab notes c Philip M. Dixon, Week 10: Autocorrelated errors

Stat 500 lab notes c Philip M. Dixon, Week 10: Autocorrelated errors Week 10: Autocorrelated errors This week, I have done one possible analysis and provided lots of output for you to consider. Case study: predicting body fat Body fat is an important health measure, but

More information

Bland-Altman Plot and Analysis

Bland-Altman Plot and Analysis Chapter 04 Bland-Altman Plot and Analysis Introduction The Bland-Altman (mean-difference or limits of agreement) plot and analysis is used to compare two measurements of the same variable. That is, it

More information

Further Maths Notes. Common Mistakes. Read the bold words in the exam! Always check data entry. Write equations in terms of variables

Further Maths Notes. Common Mistakes. Read the bold words in the exam! Always check data entry. Write equations in terms of variables Further Maths Notes Common Mistakes Read the bold words in the exam! Always check data entry Remember to interpret data with the multipliers specified (e.g. in thousands) Write equations in terms of variables

More information

MHPE 494: Data Analysis. Welcome! The Analytic Process

MHPE 494: Data Analysis. Welcome! The Analytic Process MHPE 494: Data Analysis Alan Schwartz, PhD Department of Medical Education Memoona Hasnain,, MD, PhD, MHPE Department of Family Medicine College of Medicine University of Illinois at Chicago Welcome! Your

More information

Data Mining Chapter 3: Visualizing and Exploring Data Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University

Data Mining Chapter 3: Visualizing and Exploring Data Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University Data Mining Chapter 3: Visualizing and Exploring Data Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University Exploratory data analysis tasks Examine the data, in search of structures

More information

Big Data Methods. Chapter 5: Machine learning. Big Data Methods, Chapter 5, Slide 1

Big Data Methods. Chapter 5: Machine learning. Big Data Methods, Chapter 5, Slide 1 Big Data Methods Chapter 5: Machine learning Big Data Methods, Chapter 5, Slide 1 5.1 Introduction to machine learning What is machine learning? Concerned with the study and development of algorithms that

More information

Week 11: Interpretation plus

Week 11: Interpretation plus Week 11: Interpretation plus Marcelo Coca Perraillon University of Colorado Anschutz Medical Campus Health Services Research Methods I HSMP 7607 2017 c 2017 PERRAILLON ARR 1 Outline A bit of a patchwork

More information

Results Based Financing for Health Impact Evaluation Workshop Tunis, Tunisia October Stata 2. Willa Friedman

Results Based Financing for Health Impact Evaluation Workshop Tunis, Tunisia October Stata 2. Willa Friedman Results Based Financing for Health Impact Evaluation Workshop Tunis, Tunisia October 2010 Stata 2 Willa Friedman Outline of Presentation Importing data from other sources IDs Merging and Appending multiple

More information

Source:

Source: Time Series Source: http://www.princeton.edu/~otorres/stata/ Time series data is data collected over time for a single or a group of variables. Date variable For this kind of data the first thing to do

More information

An Econometric Study: The Cost of Mobile Broadband

An Econometric Study: The Cost of Mobile Broadband An Econometric Study: The Cost of Mobile Broadband Zhiwei Peng, Yongdon Shin, Adrian Raducanu IATOM13 ENAC January 16, 2014 Zhiwei Peng, Yongdon Shin, Adrian Raducanu (UCLA) The Cost of Mobile Broadband

More information