Replication of paper: Women As Policy Makers: Evidence From A Randomized Policy Experiment In India


Matthieu Stigler
October 3, 2013

This document tries to replicate the paper: Raghabendra Chattopadhyay & Esther Duflo (2004), "Women as Policy Makers: Evidence from a Randomized Policy Experiment in India," Econometrica, Econometric Society, vol. 72(5), pages , 09. Available here. The data is taken from this site (downloaded in tab format; only the four womenpolicymakers_part*.tab files are needed, though the documentation and supplementary documentation will also prove useful).

Contents

1 Tables
  1.1 Table 1
  1.2 Table 2
  1.3 Table 3
2 Data issues
3 Code
  3.1 Open packages and data
  3.2 Data Cleaning
  3.3 Tables
    Try to replicate Table 1
    Try to replicate Table 2
    Try to replicate Table 3
  3.4 Information on session

1 Tables

1.1 Table 1

- Variables used: womres and prsex.

            Reserved   Unreserved
Total
Female
Percent

- Issue: the percentage of women is not the same as in the paper (maybe 7 in their data).

1.2 Table 2

It is difficult to know whether the variables come from the 161 pradhans or the 483 villages; few variables could be traced back. When looking at the 161, the variables are either not found or give different numbers.

1.3 Table 3

                Reserved   Unreserved
Participation
Complaint

- Variables used: vgswp and vwiss.
- Issue: different value for the fraction of women in the samsad. The standard errors obtained with Moulton (not shown) are not the same either.

2 Data issues

- Inconsistency in village coding between womenpolicymakers_partc.tab and womenpolicymakers_partd.tab: the villages with gpnum==47 do not have the same jlnum in each dataset; the values are probably inverted.
- Village gpnum==7 & villnum==1 has NA for jlnum in womenpolicymakers_partd.tab, but not in womenpolicymakers_partc.tab.

3 Code

3.1 Open packages and data

Open (and install beforehand) some packages:

    library(plyr)
    library(car)
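The "install beforehand" step can be checked programmatically. The following is only a minimal sketch: the helper name missing_packages is hypothetical (it is not part of the original code), and the actual installation call is left commented out so that nothing is installed by accident.

```r
## Hypothetical helper (not from the original document): return the names
## of the packages in `pkgs` that are not installed yet.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

## The two packages used in this replication
to_install <- missing_packages(c("plyr", "car"))
if (length(to_install) > 0) {
  message("Packages to install first: ", paste(to_install, collapse = ", "))
  ## install.packages(to_install)  # uncomment to actually install them
}
```

Once to_install is empty, the two library() calls above should load without error.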

Read the data. This assumes you downloaded the *.tab files, not the *.dta ones; in the latter case, use the foreign library and its read.dta function.

    user <- Sys.info()["user"]
    ## I need this trick to switch from laptop to desktop
    if (user == "mat") {
        pathdir <- "/home/mat/dropbox/"
    } else if (user == "stigler") {
        pathdir <- "C:/Users/stigler/Dropbox/"
    }
    pathmat <- paste(pathdir, "HEI/Coursera/Replicate/Chattopday, Duflo 2004/study_USBFNOMLAT/2. sep = "")
    wpm_1 <- read.csv(paste(pathmat, "womenpolicymakers_parta.tab", sep = ""), sep = "\t")
    wpm_2 <- read.csv(paste(pathmat, "womenpolicymakers_partb.tab", sep = ""), sep = "\t")
    wpm_3 <- read.csv(paste(pathmat, "womenpolicymakers_partc.tab", sep = ""), sep = "\t")
    wpm_4 <- read.csv(paste(pathmat, "womenpolicymakers_partd.tab", sep = ""), sep = "\t",
        fileEncoding = "native.enc")
    # not used now:
    # wpm_surv_a <- read.csv(paste(pathmat, 'womenpolicymakers_resurveya.tab', sep=''), sep='\t')
    # wpm_surv_b <- read.csv(paste(pathmat, 'womenpolicymakers_resurveyd.tab', sep=''), sep='\t')

    dim(wpm_3)
    ## [1]
    dim(wpm_4)
    ## [1]

3.2 Data Cleaning

Recode the women-reserved variable:

    ### recode variables:
    wpm_1$womres2 <- Recode(wpm_1$womres, "'1'='Reserved';'2'='Unreserved'")
    wpm_1$prsex2 <- Recode(wpm_1$prsex, "'1'='Male';'2'='Female'")

Now merge datasets 1 and 2:

    ### Identify pre-test villages in 1-2:
    pre_test <- which(apply(wpm_1[, 3:7], 1, function(x) all(is.na(x))))
    ## Merge 1-2
    wpm_12 <- arrange(merge(wpm_1[-pre_test, ], wpm_2[-pre_test, ],
        by = c("gpnum", "gpnumst")), gpnum)

Now merge datasets 3 and 4:

    ## identify pre-test villages in 3-4
    pre_test_vill <- which(apply(wpm_3[, 3:7], 1, function(x) all(is.na(x))))
    pre_test_vill2 <- which(apply(wpm_4[, 3:7], 1, function(x) all(is.na(x))))
    ## check we got the right identifiers
    all(unique(wpm_3[pre_test_vill, "gpnum"]) == pre_test)
    ## [1] TRUE
    all(pre_test_vill == pre_test_vill2)
    ## [1] TRUE
    ## remove pre-test villages
    wpm_3_notest <- wpm_3[-pre_test_vill, ]
    wpm_4_notest <- wpm_4[-pre_test_vill, ]
    ## Check we indeed removed:
    which(is.na(wpm_4[, c("gpnum", "villnum", "jlnum")]), arr.ind = TRUE)
    ##       row col
    ##  [1,]  10   3
    ##  [2,]  11   3
    ##  [3,]  12   3
    ##  [4,]  13   3
    ##  [5,]  14   3
    ##  [6,]  15   3
    ##  [7,]  19   3
    ##  [8,]  40   3
    ##  [9,]  41   3
    ## [10,]  42   3
    ## [11,]  43   3
    ## [12,]  44   3
    ## [13,]  45   3
    ## [14,]
    ## [15,]
    ## [16,]

    which(is.na(wpm_4_notest[, c("gpnum", "villnum", "jlnum")]), arr.ind = TRUE)  # still a prob
    ## row col
    ## But to merge 3 and 4, some data problems to solve first...
    ## Visualise mistake 1:
    subset(wpm_3, gpnum == 47, c("gpnum", "villnum", "jlnum"))
    ## gpnum villnum jlnum
    ##
    ##
    ##
    subset(wpm_4, gpnum == 47, c("gpnum", "villnum", "jlnum"))
    ## gpnum villnum jlnum
    ##
    ##
    ##
    ## Visualise mistake 2:
    subset(wpm_3, gpnum == 7 & villnum == 1, c("gpnum", "villnum", "jlnum"))
    ## gpnum villnum jlnum
    ##
    subset(wpm_4, gpnum == 7 & villnum == 1, c("gpnum", "villnum", "jlnum"))
    ## gpnum villnum jlnum
    ## NA
    ## Correct mistake 1
    index_mis1 <- which(wpm_4_notest$gpnum == 47 & wpm_4_notest$villnum != 1)
    wpm_4_notest[index_mis1, "jlnum"] <- wpm_3_notest[index_mis1, "jlnum"]
    ## Correct mistake 2
    index_mis2 <- which(wpm_4_notest$gpnum == 7 & wpm_4_notest$villnum == 1)
    wpm_4_notest[index_mis2, "jlnum"] <- wpm_3_notest[index_mis2, "jlnum"]

Now we can finally merge 3 and 4:

    ## Now finally can merge 3 and 4!
    wpm_34_t <- arrange(merge(wpm_3_notest, wpm_4_notest,
        by = c("gpnum", "villnum", "jlnum"), all = FALSE), gpnum)
    ### add womres (in 1-2) to wpm_34
    wpm_34 <- merge(wpm_34_t, wpm_12[, c("gpnum", "womres", "womres2")],
        by = "gpnum", all.x = TRUE)

3.3 Tables

Try to replicate Table 1:

    ## Table 1
    tab_tot <- table(wpm_1[["womres2"]], dnn = list("villages"))
    table(wpm_1[, "womres2"], wpm_1[["prsex2"]], dnn = list("reserved", "Sex of Pradhan"))
    ##             Sex of Pradhan
    ## Reserved     Female Male
    ##   Reserved       54    0
    ##   Unreserved      8   99
    ## rows are the sex of the pradhan, columns the reservation status
    tab_dis <- table(wpm_1[["prsex2"]], wpm_1[, "womres2"], dnn = list("Sex of Pradhan", "reserved"))
    tab_tex <- rbind(total = tab_tot, Female = tab_dis[1, ],
        Percent = round(100 * tab_dis[1, ]/tab_tot, 1))
    tab_tex
    ##         Reserved Unreserved
    ## Total
    ## Female
    ## Percent

Try to replicate Table 2:

    #### Table 2: handpumps
    wpm_12$gtubbn
    ## [1] NA
    ## [18] NA NA 12 NA

    ## [35]
    ## [52] NA 19
    ## [69] 4 5 NA
    ## [86]
    ## [103]
    ## [120] NA
    ## [137]
    ## [154]
    # Is the variable just the mean?
    ddply(wpm_12, .(womres), summarise, handpumps = mean(gtubbn, na.rm = TRUE))
    ## womres handpumps
    ##
    ##
    # Is the variable just the mean, removing NA and 999?
    ddply(wpm_12, .(womres), summarise,
        handpumps = mean(!is.na(gtubbn) & gtubbn != 999, na.rm = TRUE))
    ## womres handpumps
    ##
    ##
    #### Table 2: tap water
    # Is the variable just the mean?
    ddply(wpm_12, .(womres), summarise, handpumps = mean(gtapn, na.rm = TRUE))
    ## womres handpumps
    ##
    ##
    # Is the variable just the number of obs higher than 0?
    ddply(wpm_12, .(womres), summarise, handpumps = mean(gtapn > 0, na.rm = TRUE))
    ## womres handpumps
    ##
    ##
    ddply(wpm_12, .(womres), summarise,
        handpumps = mean(!is.na(gtapn) & gtapn != 999, na.rm = TRUE))
    ## womres handpumps
    ##
    ##
    ## table 2 primary school
    summary(wpm_12$gpsc)

    ## Min. 1st Qu. Median Mean 3rd Qu. Max.
    ##
    summary(wpm_12$gnopsc)
    ## Min. 1st Qu. Median Mean 3rd Qu. Max.
    ##
    ddply(wpm_12, .(womres), summarise, primaryschool = mean(gnopsc > 0, na.rm = TRUE))
    ## womres primaryschool
    ##
    ##
    ## does not work very well

Try to replicate Table 3:

    ##### Table 3: gram samsad participation
    tab3 <- ddply(subset(wpm_34, villnum != 1), .(womres2), summarise, Participation = round(mea na.rm = TRUE), 2), Complaint = round(mean(vwiss == 1, na.rm = TRUE), 2))
    tab3ok <- t(tab3[, -1])
    colnames(tab3ok) <- c("reserved", "Unreserved")
    reg_part_1 <- lm(vgswp ~ 1, data = wpm_34, subset = villnum != 1 & womres == 1)
    reg_part_2 <- lm(vgswp ~ 1, data = wpm_34, subset = villnum != 1 & womres == 2)
    reg_part_diff <- lm(vgswp ~ 1 + I(womres == 1), data = wpm_34, subset = villnum != 1)
    coef(summary(reg_part_diff))
    ## Estimate Std. Error t value Pr(>|t|)
    ## (Intercept) e-15
    ## I(womres == 1)TRUE e-02
    ### woman complaint: compute the means with the mean() function, by womres
    ddply(subset(wpm_34, villnum != 1), .(womres2), summarise,
        Complaint = round(mean(vwiss == 1, na.rm = TRUE), 2))  ## different!!

    ##      womres2 Complaint
    ## 1   Reserved      0.20
    ## 2 Unreserved      0.11
    ## compute the means with the lm() function, (so subset data)
    reg_complaintw_1 <- lm(I(vwiss == 1) ~ 1, data = wpm_34, subset = villnum != 1 & womres == 1)
    reg_complaintw_2 <- lm(I(vwiss == 1) ~ 1, data = wpm_34, subset = villnum != 1 & womres == 2)
    ## Extract the standard error of the mean:
    round(coef(summary(reg_complaintw_1))[1, c("Estimate", "Std. Error")], 2)
    ## Estimate Std. Error
    ##
    round(coef(summary(reg_complaintw_2))[1, c("Estimate", "Std. Error")], 2)
    ## Estimate Std. Error
    ##
    ## compute the p-value of the diff:
    reg_complaintw_12 <- lm(I(vwiss == 2) ~ 1 + womres, data = wpm_34, subset = villnum != 1)
    round(coef(summary(reg_complaintw_12))["womres", c("Estimate", "Std. Error")], 2)
    ## Estimate Std. Error
    ##
    ## use Moulton, as in paper:
    ## source(paste(pathdir, 'Documents/stats/R/RcompAngrist/pkg/R/Moulton.R', sep=''))
    ## moulton(lm=reg_part_1, cluster=subset(wpm_34, villnum!=1 & womres==1 &
    ##     !is.na(vgswp), 'gpnum', drop=TRUE))
    ## moulton(lm=reg_part_2, cluster=subset(wpm_34, villnum!=1 & womres==2 &
    ##     !is.na(vgswp), 'gpnum', drop=TRUE))
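The moulton() function sourced in the commented lines is the author's own and is not reproduced in this document. As a rough illustration of why clustering by gpnum matters, the sketch below uses a different technique, a plain cluster-robust (sandwich) standard error for a mean, without any small-sample correction. The function name cluster_se_mean and the data are made up for the example; it is not the Moulton factor used in the paper.

```r
## Cluster-robust standard error of a sample mean (intercept-only model).
## NOTE: hypothetical helper, not the moulton() function referenced above.
cluster_se_mean <- function(y, cluster) {
  keep <- !is.na(y) & !is.na(cluster)
  y <- y[keep]
  cluster <- cluster[keep]
  n <- length(y)
  resid <- y - mean(y)
  ## sum the residuals within each cluster, then square and add up
  cluster_sums <- tapply(resid, cluster, sum)
  sqrt(sum(cluster_sums^2)) / n
}

## Made-up data: 4 clusters (think of gpnum) with 3 villages each
y <- c(1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1)
g <- rep(1:4, each = 3)

naive_se <- sd(y) / sqrt(length(y))  # ignores the clustering
clust_se <- cluster_se_mean(y, g)    # allows correlation within clusters
c(naive = naive_se, clustered = clust_se)
```

When outcomes are similar within a cluster, as in this toy example, the clustered standard error is larger than the naive one, which is the kind of difference a Moulton-type correction is meant to capture.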

3.4 Information on session

At the end, we like to include some information on the R session (R version, versions of packages, platform, etc.):

    sessionInfo()
    ## R version ( )
    ## Platform: x86_64-w64-mingw32/x64 (64-bit)
    ##
    ## locale:
    ## [1] LC_COLLATE=French_Switzerland.1252  LC_CTYPE=French_Switzerland.1252
    ## [3] LC_MONETARY=French_Switzerland.1252 LC_NUMERIC=C
    ## [5] LC_TIME=French_Switzerland.1252
    ##
    ## attached base packages:
    ## [1] stats graphics grDevices utils datasets base
    ##
    ## other attached packages:
    ## [1] xtable_1.7-1 car_ plyr_1.8 knitr_1.4.1
    ##
    ## loaded via a namespace (and not attached):
    ## [1] digest_0.6.3 evaluate_0.4.7 formatR_0.9 highr_0.2.1
    ## [5] MASS_ nnet_7.3-7 stringr_0.6.2 tools_
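If the session information should be kept next to the results rather than only printed, it can be written to a text file. This is a minimal sketch; the file name session_info.txt and the use of tempdir() are assumptions made for the example.

```r
## Write the session information to a text file (hypothetical file name).
session_file <- file.path(tempdir(), "session_info.txt")
writeLines(capture.output(sessionInfo()), session_file)

## Read it back to check what was stored
session_lines <- readLines(session_file)
head(session_lines, 3)
```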


More information

PROPER: PROspective Power Evaluation for RNAseq

PROPER: PROspective Power Evaluation for RNAseq PROPER: PROspective Power Evaluation for RNAseq Hao Wu [1em]Department of Biostatistics and Bioinformatics Emory University Atlanta, GA 303022 [1em] hao.wu@emory.edu October 30, 2017 Contents 1 Introduction..............................

More information

STAT:5400 Computing in Statistics

STAT:5400 Computing in Statistics STAT:5400 Computing in Statistics Introduction to SAS Lecture 18 Oct 12, 2015 Kate Cowles 374 SH, 335-0727 kate-cowles@uiowaedu SAS SAS is the statistical software package most commonly used in business,

More information

Intro to Stata for Political Scientists

Intro to Stata for Political Scientists Intro to Stata for Political Scientists Andrew S. Rosenberg Junior PRISM Fellow Department of Political Science Workshop Description This is an Introduction to Stata I will assume little/no prior knowledge

More information

Debugging R Code. Biostatistics

Debugging R Code. Biostatistics Debugging R Code Biostatistics 140.776 Something s Wrong! Indications that something s not right message: A generic notification/diagnostic message produced by the message function; execution of the function

More information

MIGSA: Getting pbcmc datasets

MIGSA: Getting pbcmc datasets MIGSA: Getting pbcmc datasets Juan C Rodriguez Universidad Católica de Córdoba Universidad Nacional de Córdoba Cristóbal Fresno Instituto Nacional de Medicina Genómica Andrea S Llera Fundación Instituto

More information

Rearranging and manipula.ng data

Rearranging and manipula.ng data An introduc+on to Rearranging and manipula.ng data Noémie Becker & Benedikt Holtmann Winter Semester 16/17 Course outline Day 7 Course outline Review Checking and cleaning data Rearranging and manipula+ng

More information

BIOSTAT640 R Lab1 for Spring 2016

BIOSTAT640 R Lab1 for Spring 2016 BIOSTAT640 R Lab1 for Spring 2016 Minming Li & Steele H. Valenzuela Feb.1, 2016 This is the first R lab session of course BIOSTAT640 at UMass during the Spring 2016 semester. I, Minming (Matt) Li, am going

More information

Appendix II: STATA Preliminary

Appendix II: STATA Preliminary Appendix II: STATA Preliminary STATA is a statistical software package that offers a large number of statistical and econometric estimation procedures. With STATA we can easily manage data and apply standard

More information

crlmm to downstream data analysis

crlmm to downstream data analysis crlmm to downstream data analysis VJ Carey, B Carvalho March, 2012 1 Running CRLMM on a nontrivial set of CEL files To use the crlmm algorithm, the user must load the crlmm package, as described below:

More information

Survey Questions and Methodology

Survey Questions and Methodology Survey Questions and Methodology Spring Tracking Survey 2012 Data for March 15 April 3, 2012 Princeton Survey Research Associates International for the Pew Research Center s Internet & American Life Project

More information

Appendix II: STATA Preliminary

Appendix II: STATA Preliminary Appendix II: STATA Preliminary STATA is a statistical software package that offers a large number of statistical and econometric estimation procedures. With STATA we can easily manage data and apply standard

More information

International Graduate School of Genetic and Molecular Epidemiology (GAME) Computing Notes and Introduction to Stata

International Graduate School of Genetic and Molecular Epidemiology (GAME) Computing Notes and Introduction to Stata International Graduate School of Genetic and Molecular Epidemiology (GAME) Computing Notes and Introduction to Stata Paul Dickman September 2003 1 A brief introduction to Stata Starting the Stata program

More information

BIOSTATS 640 Spring 2018 Introduction to R Data Description. 1. Start of Session. a. Preliminaries... b. Install Packages c. Attach Packages...

BIOSTATS 640 Spring 2018 Introduction to R Data Description. 1. Start of Session. a. Preliminaries... b. Install Packages c. Attach Packages... BIOSTATS 640 Spring 2018 Introduction to R and R-Studio Data Description Page 1. Start of Session. a. Preliminaries... b. Install Packages c. Attach Packages... 2. Load R Data.. a. Load R data frames...

More information

Introduction to BatchtoolsParam

Introduction to BatchtoolsParam Nitesh Turaga 1, Martin Morgan 2 Edited: March 22, 2018; Compiled: January 4, 2019 1 Nitesh.Turaga@ RoswellPark.org 2 Martin.Morgan@ RoswellPark.org Contents 1 Introduction..............................

More information

Data Cleaning. Andrew Jaffe. January 6, 2016

Data Cleaning. Andrew Jaffe. January 6, 2016 Data Cleaning Andrew Jaffe January 6, 2016 Data We will be using multiple data sets in this lecture: Salary, Monument, Circulator, and Restaurant from OpenBaltimore: https: //data.baltimorecity.gov/browse?limitto=datasets

More information

Raman Spectra of Chondrocytes in Cartilage: hyperspec s chondro data set

Raman Spectra of Chondrocytes in Cartilage: hyperspec s chondro data set Raman Spectra of Chondrocytes in Cartilage: hyperspec s chondro data set Claudia Beleites CENMAT and DI3, University of Trieste Spectroscopy Imaging, IPHT Jena e.v. February 13,

More information

Applied Statistics and Econometrics Lecture 6

Applied Statistics and Econometrics Lecture 6 Applied Statistics and Econometrics Lecture 6 Giuseppe Ragusa Luiss University gragusa@luiss.it http://gragusa.org/ March 6, 2017 Luiss University Empirical application. Data Italian Labour Force Survey,

More information

It s Proc Tabulate Jim, but not as we know it!

It s Proc Tabulate Jim, but not as we know it! Paper SS02 It s Proc Tabulate Jim, but not as we know it! Robert Walls, PPD, Bellshill, UK ABSTRACT PROC TABULATE has received a very bad press in the last few years. Most SAS Users have come to look on

More information

Creating IGV HTML reports with tracktables

Creating IGV HTML reports with tracktables Thomas Carroll 1 [1em] 1 Bioinformatics Core, MRC Clinical Sciences Centre; thomas.carroll (at)imperial.ac.uk June 13, 2018 Contents 1 The tracktables package.................... 1 2 Creating IGV sessions

More information

Stata versions 12 & 13 Week 4 Practice Problems

Stata versions 12 & 13 Week 4 Practice Problems Stata versions 12 & 13 Week 4 Practice Problems SOLUTIONS 1 Practice Screen Capture a Create a word document Name it using the convention lastname_lab1docx (eg bigelow_lab1docx) b Using your browser, go

More information

Longitudinal Linkage of Cross-Sectional NCDS Data Files Using SPSS

Longitudinal Linkage of Cross-Sectional NCDS Data Files Using SPSS Longitudinal Linkage of Cross-Sectional NCDS Data Files Using SPSS What are we doing when we merge data from two sweeps of the NCDS (i.e. data from different points in time)? We are adding new information

More information

Running the perf function Kim-Anh Le Cao 01 September 2014

Running the perf function Kim-Anh Le Cao 01 September 2014 Running the perf function Kim-Anh Le Cao 01 September 2014 The function valid has been superseded by the perf function to avoid some selection bias in the sparse functions. This has been fixed. Load the

More information

A (very) brief introduction to R

A (very) brief introduction to R A (very) brief introduction to R You typically start R at the command line prompt in a command line interface (CLI) mode. It is not a graphical user interface (GUI) although there are some efforts to produce

More information

Colorado Results. For 10/3/ /4/2012. Contact: Doug Kaplan,

Colorado Results. For 10/3/ /4/2012. Contact: Doug Kaplan, Colorado Results For 10/3/2012 10/4/2012 Contact: Doug Kaplan, 407-242-1870 Executive Summary Following the debates, Gravis Marketing, a non-partisan research firm conducted a survey of 1,438 likely voters

More information

Exercise 1: Introduction to Stata

Exercise 1: Introduction to Stata Exercise 1: Introduction to Stata New Stata Commands use describe summarize stem graph box histogram log on, off exit New Stata Commands Downloading Data from the Web I recommend that you use Internet

More information

ANSWERS -- Prep for Psyc350 Laboratory Final Statistics Part Prep a

ANSWERS -- Prep for Psyc350 Laboratory Final Statistics Part Prep a ANSWERS -- Prep for Psyc350 Laboratory Final Statistics Part Prep a Put the following data into an spss data set: Be sure to include variable and value labels and missing value specifications for all variables

More information

Package zebu. R topics documented: October 24, 2017

Package zebu. R topics documented: October 24, 2017 Type Package Title Local Association Measures Version 0.1.2 Date 2017-10-21 Author Olivier M. F. Martin [aut, cre], Michel Ducher [aut] Package zebu October 24, 2017 Maintainer Olivier M. F. Martin

More information

Unit 5 Logistic Regression Practice Problems

Unit 5 Logistic Regression Practice Problems Unit 5 Logistic Regression Practice Problems SOLUTIONS R Users Source: Afifi A., Clark VA and May S. Computer Aided Multivariate Analysis, Fourth Edition. Boca Raton: Chapman and Hall, 2004. Exercises

More information

IPUMS Training and Development: Requesting Data

IPUMS Training and Development: Requesting Data IPUMS Training and Development: Requesting Data IPUMS PMA Exercise 2 OBJECTIVE: Gain an understanding of how IPUMS PMA service delivery point datasets are structured and how it can be leveraged to explore

More information

The linear mixed model: modeling hierarchical and longitudinal data

The linear mixed model: modeling hierarchical and longitudinal data The linear mixed model: modeling hierarchical and longitudinal data Analysis of Experimental Data AED The linear mixed model: modeling hierarchical and longitudinal data 1 of 44 Contents 1 Modeling Hierarchical

More information