Replication of paper: Women As Policy Makers: Evidence From A Randomized Policy Experiment In India
Matthieu Stigler, October 3, 2013

Try to replicate the paper: Raghabendra Chattopadhyay & Esther Duflo (2004). "Women as Policy Makers: Evidence from a Randomized Policy Experiment in India," Econometrica, Econometric Society, vol. 72(5), September. The paper is available online.

The data are taken from the study's data site (downloaded in tab-delimited format). Only the four womenpolicymakers_part*.tab files are needed, although the documentation and the supplementary documentation will also prove useful.

Contents
1 Tables
  1.1 Table 1
  1.2 Table 2
  1.3 Table 3
2 Data issues
3 Code
  3.1 Open packages and data
  3.2 Data Cleaning
  3.3 Tables
      Try to replicate Table 1
      Try to replicate Table 2
      Try to replicate Table 3
  3.4 Information on session

1 Tables

1.1 Table 1

- Variables used: womres and prsex.
                Reserved   Unreserved
    Total
    Female
    Percent

- Issue: we do not obtain the same number behind the percentage of women (it is perhaps 7 in the authors' data).

1.2 Table 2

It is difficult to know whether the variables refer to the 161 pradhans or to the 483 villages; only a few variables could be traced back. When looking at the 161 pradhans, variables were either not found or gave different numbers.

1.3 Table 3

                    Reserved   Unreserved
    Participation
    Complaint

- Variables used: vgswp and vwiss.
- Issue: different value for the fraction of women in the gram samsad. The standard errors obtained with the Moulton correction do not match either (not shown).

2 Data issues

- Inconsistent village coding between womenpolicymakers_partc.tab and womenpolicymakers_partd.tab: village gpnum==47 does not have the same jlnum in the two datasets (probably inverted).
- Village gpnum==7 & villnum==1 has NA for jlnum in womenpolicymakers_partd.tab, but not in womenpolicymakers_partc.tab.

3 Code

3.1 Open packages and data

Load the required packages (install them first if needed):

    library(plyr)
    library(car)
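The two data issues above were found by inspecting subsets by hand (the "Visualise mistake" steps in the code). A minimal sketch of how such disagreements could instead be flagged in one pass, assuming both files identify a village by the pair gpnum and villnum. The helper find_key_mismatches is hypothetical, and the toy values are made up to mimic the two issues; they are not the study data.

```r
# Hypothetical helper (not part of the original replication code):
# compare a column that should agree between two village-level tables.
find_key_mismatches <- function(a, b, by = c("gpnum", "villnum"), col = "jlnum") {
  m <- merge(a[, c(by, col)], b[, c(by, col)], by = by,
             suffixes = c(".a", ".b"))
  x <- m[[paste0(col, ".a")]]
  y <- m[[paste0(col, ".b")]]
  # flag rows where exactly one side is NA, or both are present but differ
  bad <- (is.na(x) != is.na(y)) | (!is.na(x) & !is.na(y) & x != y)
  m[bad, , drop = FALSE]
}

# Made-up toy values (NOT the study data): gpnum 47 has jlnum swapped for
# villages 2 and 3, and gpnum 7, village 1 is NA in the second table.
part_c <- data.frame(gpnum = c(7, 47, 47, 47), villnum = c(1, 1, 2, 3),
                     jlnum = c(10, 20, 21, 22))
part_d <- data.frame(gpnum = c(7, 47, 47, 47), villnum = c(1, 1, 2, 3),
                     jlnum = c(NA, 20, 22, 21))
find_key_mismatches(part_c, part_d)
```

Applied to wpm_3 and wpm_4, it should return the gpnum 47 and gpnum 7 rows, plus any pre-test rows whose jlnum is missing in only one file; this has not been checked against the real data.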
Read the data. This assumes you downloaded the *.tab files, not the *.dta files; in the latter case, use the foreign package and its function read.dta.

    user <- Sys.info()["user"]  ## I need this trick to switch from laptop to desktop
    if (user == "mat") {
      pathdir <- "/home/mat/dropbox/"
    } else if (user == "stigler") {
      pathdir <- "C:/Users/stigler/Dropbox/"
    }
    ## pathmat must point to the folder containing the downloaded *.tab files
    pathmat <- paste(pathdir, "HEI/Coursera/Replicate/Chattopday, Duflo 2004/study_USBFNOMLAT/2. sep = "")
    wpm_1 <- read.csv(paste(pathmat, "womenpolicymakers_parta.tab", sep = ""), sep = "\t")
    wpm_2 <- read.csv(paste(pathmat, "womenpolicymakers_partb.tab", sep = ""), sep = "\t")
    wpm_3 <- read.csv(paste(pathmat, "womenpolicymakers_partc.tab", sep = ""), sep = "\t")
    wpm_4 <- read.csv(paste(pathmat, "womenpolicymakers_partd.tab", sep = ""), sep = "\t",
                      fileEncoding = "native.enc")
    # not used now:
    # wpm_surv_a <- read.csv(paste(pathmat, 'womenpolicymakers_resurveya.tab', sep=''), sep='\t')
    # wpm_surv_b <- read.csv(paste(pathmat, 'womenpolicymakers_resurveyd.tab', sep=''), sep='\t')
    dim(wpm_3)
    ## [1]
    dim(wpm_4)
    ## [1]

3.2 Data Cleaning

Recode the women-reserved variable (womres) and the sex of the pradhan (prsex):

    ### recode variables:
    wpm_1$womres2 <- Recode(wpm_1$womres, "'1'='Reserved';'2'='Unreserved'")
    wpm_1$prsex2 <- Recode(wpm_1$prsex, "'1'='Male';'2'='Female'")

Now merge datasets 1 and 2:
    ### Identify pre-test villages in 1-2 (rows whose columns 3 to 7 are all NA):
    pre_test <- which(apply(wpm_1[, 3:7], 1, function(x) all(is.na(x))))
    ## Merge 1-2 (x[-integer(0), ] would drop every row, so this relies on pre_test being non-empty)
    wpm_12 <- arrange(merge(wpm_1[-pre_test, ], wpm_2[-pre_test, ],
                            by = c("gpnum", "gpnumst")), gpnum)

Now merge datasets 3 and 4:

    ## identify pre-test villages in 3-4
    pre_test_vill <- which(apply(wpm_3[, 3:7], 1, function(x) all(is.na(x))))
    pre_test_vill2 <- which(apply(wpm_4[, 3:7], 1, function(x) all(is.na(x))))
    ## check we got the right identifiers
    all(unique(wpm_3[pre_test_vill, "gpnum"]) == pre_test)
    ## [1] TRUE
    all(pre_test_vill == pre_test_vill2)
    ## [1] TRUE
    ## remove pre-test villages
    wpm_3_notest <- wpm_3[-pre_test_vill, ]
    wpm_4_notest <- wpm_4[-pre_test_vill, ]
    ## Check we indeed removed them (look for NAs in the key variables):
    which(is.na(wpm_4[, c("gpnum", "villnum", "jlnum")]), arr.ind = TRUE)
    ##       row col
    ##  [1,]  10   3
    ##  [2,]  11   3
    ##  [3,]  12   3
    ##  [4,]  13   3
    ##  [5,]  14   3
    ##  [6,]  15   3
    ##  [7,]  19   3
    ##  [8,]  40   3
    ##  [9,]  41   3
    ## [10,]  42   3
    ## [11,]  43   3
    ## [12,]  44   3
    ## [13,]  45   3
    ## [14,]
    ## [15,]
    ## [16,]
    which(is.na(wpm_4_notest[, c("gpnum", "villnum", "jlnum")]), arr.ind = TRUE)  # still a problem
    ##      row col
    ## But to merge 3 and 4, some data problems must be solved first...
    ## Visualise mistake 1:
    subset(wpm_3, gpnum == 47, c("gpnum", "villnum", "jlnum"))
    ##   gpnum villnum jlnum
    ##
    ##
    ##
    subset(wpm_4, gpnum == 47, c("gpnum", "villnum", "jlnum"))
    ##   gpnum villnum jlnum
    ##
    ##
    ##
    ## Visualise mistake 2:
    subset(wpm_3, gpnum == 7 & villnum == 1, c("gpnum", "villnum", "jlnum"))
    ##   gpnum villnum jlnum
    ##
    subset(wpm_4, gpnum == 7 & villnum == 1, c("gpnum", "villnum", "jlnum"))
    ##   gpnum villnum jlnum
    ##                    NA
    ## Correct mistake 1: copy jlnum from dataset 3 (assumes both have the same row order)
    index_mis1 <- which(wpm_4_notest$gpnum == 47 & wpm_4_notest$villnum != 1)
    wpm_4_notest[index_mis1, "jlnum"] <- wpm_3_notest[index_mis1, "jlnum"]
    ## Correct mistake 2
    index_mis2 <- which(wpm_4_notest$gpnum == 7 & wpm_4_notest$villnum == 1)
    wpm_4_notest[index_mis2, "jlnum"] <- wpm_3_notest[index_mis2, "jlnum"]

Now we can finally merge datasets 3 and 4:
    ## Now we can finally merge 3 and 4!
    wpm_34_t <- arrange(merge(wpm_3_notest, wpm_4_notest,
                              by = c("gpnum", "villnum", "jlnum"), all = FALSE), gpnum)
    ### add womres (from 1-2) to wpm_34
    wpm_34 <- merge(wpm_34_t, wpm_12[, c("gpnum", "womres", "womres2")],
                    by = "gpnum", all.x = TRUE)

3.3 Tables

Try to replicate Table 1:

    ## Table 1
    tab_tot <- table(wpm_1[["womres2"]], dnn = list("villages"))
    table(wpm_1[, "womres2"], wpm_1[["prsex2"]], dnn = list("reserved", "Sex of Pradhan"))
    ##             Sex of Pradhan
    ## reserved     Female Male
    ##   Reserved       54    0
    ##   Unreserved      8   99
    tab_dis <- table(wpm_1[["prsex2"]], wpm_1[, "womres2"], dnn = list("Sex of Pradhan", "reserved"))
    tab_tex <- rbind(Total = tab_tot, Female = tab_dis[1, ],
                     Percent = round(100 * tab_dis[1, ] / tab_tot, 1))
    tab_tex
    ##         Reserved Unreserved
    ## Total
    ## Female
    ## Percent

Try to replicate Table 2:

    #### Table 2: handpumps
    wpm_12$gtubbn
    ##   [1]  NA
    ##  [18]  NA  NA  12  NA
    ##  [35]
    ##  [52]  NA  19
    ##  [69]   4   5  NA
    ##  [86]
    ## [103]
    ## [120]  NA
    ## [137]
    ## [154]
    # Is the variable just the mean?
    ddply(wpm_12, .(womres), summarise, handpumps = mean(gtubbn, na.rm = TRUE))
    ##   womres handpumps
    ##
    ##
    # Is it the share of entries that are neither NA nor 999?
    ddply(wpm_12, .(womres), summarise,
          handpumps = mean(!is.na(gtubbn) & gtubbn != 999, na.rm = TRUE))
    ##   womres handpumps
    ##
    ##
    #### Table 2: tap water
    # Is the variable just the mean?
    ddply(wpm_12, .(womres), summarise, tapwater = mean(gtapn, na.rm = TRUE))
    ##   womres tapwater
    ##
    ##
    # Is the variable just the share of observations above 0?
    ddply(wpm_12, .(womres), summarise, tapwater = mean(gtapn > 0, na.rm = TRUE))
    ##   womres tapwater
    ##
    ##
    ddply(wpm_12, .(womres), summarise,
          tapwater = mean(!is.na(gtapn) & gtapn != 999, na.rm = TRUE))
    ##   womres tapwater
    ##
    ##
    ## Table 2: primary school
    summary(wpm_12$gpsc)
    ##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
    ##
    summary(wpm_12$gnopsc)
    ##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
    ##
    ddply(wpm_12, .(womres), summarise, primaryschool = mean(gnopsc > 0, na.rm = TRUE))
    ##   womres primaryschool
    ##
    ##
    ## does not work very well

Try to replicate Table 3:

    ##### Table 3: gram samsad participation
    tab3 <- ddply(subset(wpm_34, villnum != 1), .(womres2), summarise,
                  Participation = round(mea na.rm = TRUE), 2),
                  Complaint = round(mean(vwiss == 1, na.rm = TRUE), 2))
    tab3ok <- t(tab3[, -1])
    colnames(tab3ok) <- c("Reserved", "Unreserved")
    reg_part_1 <- lm(vgswp ~ 1, data = wpm_34, subset = villnum != 1 & womres == 1)
    reg_part_2 <- lm(vgswp ~ 1, data = wpm_34, subset = villnum != 1 & womres == 2)
    reg_part_diff <- lm(vgswp ~ 1 + I(womres == 1), data = wpm_34, subset = villnum != 1)
    coef(summary(reg_part_diff))
    ##                    Estimate Std. Error t value Pr(>|t|)
    ## (Intercept)                                       e-15
    ## I(womres == 1)TRUE                                e-02
    ### women's complaints: compute the means with mean(), by womres2
    ddply(subset(wpm_34, villnum != 1), .(womres2), summarise,
          Complaint = round(mean(vwiss == 1, na.rm = TRUE), 2))  ## different!!
    ##      womres2 Complaint
    ## 1   Reserved      0.20
    ## 2 Unreserved      0.11
    ## compute the means with lm() instead (hence the subset argument)
    reg_complaintw_1 <- lm(I(vwiss == 1) ~ 1, data = wpm_34, subset = villnum != 1 & womres == 1)
    reg_complaintw_2 <- lm(I(vwiss == 1) ~ 1, data = wpm_34, subset = villnum != 1 & womres == 2)
    ## Extract the mean and its standard error:
    round(coef(summary(reg_complaintw_1))[1, c("Estimate", "Std. Error")], 2)
    ##   Estimate Std. Error
    ##
    round(coef(summary(reg_complaintw_2))[1, c("Estimate", "Std. Error")], 2)
    ##   Estimate Std. Error
    ##
    ## compute the p-value of the difference (same outcome as above: vwiss == 1):
    reg_complaintw_12 <- lm(I(vwiss == 1) ~ 1 + womres, data = wpm_34, subset = villnum != 1)
    round(coef(summary(reg_complaintw_12))["womres", c("Estimate", "Std. Error", "Pr(>|t|)")], 2)
    ##   Estimate Std. Error Pr(>|t|)
    ##
    ## use Moulton, as in the paper:
    ## source(paste(pathdir, 'Documents/stats/R/RcompAngrist/pkg/R/Moulton.R', sep=''))
    ## moulton(lm=reg_part_1, cluster=subset(wpm_34, villnum!=1 & womres==1 &
    ##         !is.na(vgswp), 'gpnum', drop=TRUE))
    ## moulton(lm=reg_part_2, cluster=subset(wpm_34, villnum!=1 & womres==2 &
    ##         !is.na(vgswp), 'gpnum', drop=TRUE))
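The moulton() calls above are commented out, and the sourced Moulton.R file is not shown. As a hedged stand-in (not the author's or RcompAngrist's function), the sketch below computes a Moulton-corrected standard error for regressors that are constant within clusters, here GPs identified by gpnum. It uses the textbook variance inflation factor 1 + rho * (sum(n_g^2)/N - 1), where n_g are the cluster sizes, N the number of observations and rho the intraclass correlation of the residuals. The helper name moulton_se, the simple moment estimator of rho, and the toy data are all assumptions.

```r
# Hypothetical Moulton correction (NOT the RcompAngrist moulton() function).
# Valid for regressors constant within clusters, e.g. the intercept in lm(vgswp ~ 1).
moulton_se <- function(fit, cluster) {
  e <- residuals(fit)
  stopifnot(length(e) == length(cluster))  # cluster must match the rows used by lm
  N <- length(e)
  n_g  <- as.vector(table(cluster))        # cluster sizes
  s_g  <- tapply(e, cluster, sum)          # sum of residuals per cluster
  ss_g <- tapply(e^2, cluster, sum)        # sum of squared residuals per cluster
  pairs <- sum(n_g * (n_g - 1))            # ordered within-cluster pairs
  # moment estimate of the residual intraclass correlation
  rho <- if (pairs > 0) (sum(s_g^2 - ss_g) / pairs) / (sum(e^2) / N) else 0
  # 1 + rho * (V(n_g)/mean(n_g) + mean(n_g) - 1), V = population variance,
  # written equivalently as 1 + rho * (sum(n_g^2)/N - 1)
  inflation <- 1 + rho * (sum(n_g^2) / N - 1)
  se_ols <- coef(summary(fit))[, "Std. Error"]
  list(rho = rho, moulton_factor = sqrt(inflation),
       se = se_ols * sqrt(inflation))
}

# Toy example: two clusters of three observations (made-up values)
y  <- c(1, 2, 3, 10, 11, 12)
cl <- c(1, 1, 1, 2, 2, 2)
moulton_se(lm(y ~ 1), cl)

# Possible use on the document's objects (untested sketch):
# d1 <- subset(wpm_34, villnum != 1 & womres == 1 & !is.na(vgswp))
# moulton_se(lm(vgswp ~ 1, data = d1), d1$gpnum)
```

For an intercept-only model such as reg_part_1 the factor is exact under equicorrelated, homoskedastic errors; for a model with a GP-level dummy such as reg_part_diff it is the usual approximation.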
3.4 Information on session

At the end, we like to record some information on the R session (R version, package versions, platform, etc.):

    sessionInfo()
    ## R version  ( )
    ## Platform: x86_64-w64-mingw32/x64 (64-bit)
    ##
    ## locale:
    ## [1] LC_COLLATE=French_Switzerland.1252  LC_CTYPE=French_Switzerland.1252
    ## [3] LC_MONETARY=French_Switzerland.1252 LC_NUMERIC=C
    ## [5] LC_TIME=French_Switzerland.1252
    ##
    ## attached base packages:
    ## [1] stats     graphics  grDevices utils     datasets  base
    ##
    ## other attached packages:
    ## [1] xtable_1.7-1 car_         plyr_1.8     knitr_1.4.1
    ##
    ## loaded via a namespace (and not attached):
    ## [1] digest_0.6.3   evaluate_0.4.7 formatR_0.9    highr_0.2.1
    ## [5] MASS_          nnet_7.3-7     stringr_0.6.2  tools_
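The sessionInfo() output above is only printed into the report. A minimal sketch, assuming one also wants a plain-text copy kept next to the results; the helper save_session_info and the file name are illustrative choices, not part of the original code.

```r
# Hypothetical helper: write sessionInfo() to a text file so the R and
# package versions used for a run are stored alongside the results.
save_session_info <- function(path = "sessionInfo.txt") {
  writeLines(capture.output(sessionInfo()), path)
  invisible(path)
}

out <- save_session_info(file.path(tempdir(), "sessionInfo.txt"))
readLines(out, n = 2)
```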
More informationIt s Proc Tabulate Jim, but not as we know it!
Paper SS02 It s Proc Tabulate Jim, but not as we know it! Robert Walls, PPD, Bellshill, UK ABSTRACT PROC TABULATE has received a very bad press in the last few years. Most SAS Users have come to look on
More informationCreating IGV HTML reports with tracktables
Thomas Carroll 1 [1em] 1 Bioinformatics Core, MRC Clinical Sciences Centre; thomas.carroll (at)imperial.ac.uk June 13, 2018 Contents 1 The tracktables package.................... 1 2 Creating IGV sessions
More informationStata versions 12 & 13 Week 4 Practice Problems
Stata versions 12 & 13 Week 4 Practice Problems SOLUTIONS 1 Practice Screen Capture a Create a word document Name it using the convention lastname_lab1docx (eg bigelow_lab1docx) b Using your browser, go
More informationLongitudinal Linkage of Cross-Sectional NCDS Data Files Using SPSS
Longitudinal Linkage of Cross-Sectional NCDS Data Files Using SPSS What are we doing when we merge data from two sweeps of the NCDS (i.e. data from different points in time)? We are adding new information
More informationRunning the perf function Kim-Anh Le Cao 01 September 2014
Running the perf function Kim-Anh Le Cao 01 September 2014 The function valid has been superseded by the perf function to avoid some selection bias in the sparse functions. This has been fixed. Load the
More informationA (very) brief introduction to R
A (very) brief introduction to R You typically start R at the command line prompt in a command line interface (CLI) mode. It is not a graphical user interface (GUI) although there are some efforts to produce
More informationColorado Results. For 10/3/ /4/2012. Contact: Doug Kaplan,
Colorado Results For 10/3/2012 10/4/2012 Contact: Doug Kaplan, 407-242-1870 Executive Summary Following the debates, Gravis Marketing, a non-partisan research firm conducted a survey of 1,438 likely voters
More informationExercise 1: Introduction to Stata
Exercise 1: Introduction to Stata New Stata Commands use describe summarize stem graph box histogram log on, off exit New Stata Commands Downloading Data from the Web I recommend that you use Internet
More informationANSWERS -- Prep for Psyc350 Laboratory Final Statistics Part Prep a
ANSWERS -- Prep for Psyc350 Laboratory Final Statistics Part Prep a Put the following data into an spss data set: Be sure to include variable and value labels and missing value specifications for all variables
More informationPackage zebu. R topics documented: October 24, 2017
Type Package Title Local Association Measures Version 0.1.2 Date 2017-10-21 Author Olivier M. F. Martin [aut, cre], Michel Ducher [aut] Package zebu October 24, 2017 Maintainer Olivier M. F. Martin
More informationUnit 5 Logistic Regression Practice Problems
Unit 5 Logistic Regression Practice Problems SOLUTIONS R Users Source: Afifi A., Clark VA and May S. Computer Aided Multivariate Analysis, Fourth Edition. Boca Raton: Chapman and Hall, 2004. Exercises
More informationIPUMS Training and Development: Requesting Data
IPUMS Training and Development: Requesting Data IPUMS PMA Exercise 2 OBJECTIVE: Gain an understanding of how IPUMS PMA service delivery point datasets are structured and how it can be leveraged to explore
More informationThe linear mixed model: modeling hierarchical and longitudinal data
The linear mixed model: modeling hierarchical and longitudinal data Analysis of Experimental Data AED The linear mixed model: modeling hierarchical and longitudinal data 1 of 44 Contents 1 Modeling Hierarchical
More information