Package MatchingFrontier


March 1, 2015

Version 0.15.28
Date 2015-01-20
Type Package
Title Computation of the Balance - Sample Size Frontier in Matching Methods for Causal Inference
Author Gary King, Christopher Lucas, and Richard Nielsen
Maintainer Christopher Lucas <clucas@fas.harvard.edu>
Description MatchingFrontier returns the subset of the data with the minimum imbalance for every possible subset size (N - 1, N - 2, ...), down to the data set with the minimum possible imbalance. The package also includes tools for the estimation of causal effects for each subset size, as well as functions for visualization and data export. MatchingFrontier also contains functions for calculating model dependence, as proposed by Athey and Imbens.
URL http://projects.iq.harvard.edu/frontier
Imports MASS, igraph, segmented
Suggests stargazer
LazyData true
License GPL-3

R topics documented:

    estimateEffects
    generateDataset
    lalonde
    makeFrontier
    modelDependence
    parallelPlot
    plotEstimates
    plotFrontier
    plotMeans
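The idea of a frontier, the lowest imbalance attainable at each subset size, can be illustrated with a toy example in base R. This is only a sketch under a strong simplifying assumption (each unit's nearest-match distance is held fixed while pruning); it is not the package's algorithm, and the data and variable names are made up for illustration.

```r
# Toy data (made up): one covariate, three treated and three control units.
x     <- c(1.0, 1.2, 3.0, 3.1, 7.0, 0.9)
treat <- c(1,   0,   1,   0,   1,   0)

# Distance from each unit to its nearest unit in the opposite group.
d <- abs(outer(x, x, "-"))
d[outer(treat, treat, "==")] <- Inf   # disallow same-group matches (and self)
nearest <- apply(d, 1, min)

# Simplifying assumption: nearest-match distances stay fixed while pruning,
# so the best subset of size k keeps the k units with the smallest distances.
n <- length(nearest)
frontier <- data.frame(
  pruned    = 0:(n - 1),
  imbalance = rev(cumsum(sort(nearest)) / seq_len(n))
)
print(frontier)
```

Each row pairs a number of pruned observations with the average nearest-match distance of the units that remain, so the imbalance column never increases as more units are pruned.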

estimateEffects         Estimate Effects on the Frontier

Description

    estimateEffects() estimates the effect of the treatment along the entire frontier.

Usage

    estimateEffects(frontier.object, formula, prop.estimated = 1,
                    mod.dependence.formula, continuous.vars = NA,
                    seed = 1, means.as.cutpoints = FALSE)

Arguments

    frontier.object
        An object generated by makeFrontier().
    formula
        An object of class formula (or one that can be coerced to that class). It is passed to lm() to compute the point estimates of the causal effect across the frontier.
    prop.estimated
        The proportion of points on the frontier to estimate. By default, 100% of the points on the frontier are estimated. To estimate fewer, pass the proportion to be estimated (for example, 0.6 to estimate 60% of the points).
    mod.dependence.formula
        The formula used as the base formula for the Athey-Imbens model dependence estimates.
    continuous.vars
        All continuous control variables in mod.dependence.formula must be passed as a character vector to continuous.vars. A cutpoint for each of these variables is estimated with segmented regression.
    seed
        The seed set before the effects are estimated. If prop.estimated is less than 1, the same seed is needed to replicate the exact plot.
    means.as.cutpoints
        FALSE by default. If TRUE, cutpoints are calculated as the mean instead of the breakpoint in a segmented regression. This is sometimes much faster.

References

    King, Gary, Christopher Lucas, and Richard Nielsen. "The Balance-Sample Size Frontier in Matching Methods for Causal Inference." (2015).

Examples

    match.on <- colnames(lalonde)[!(colnames(lalonde) %in% c("re78", "treat"))]
    my.frontier <- makeFrontier(dataset = lalonde, treatment = "treat",
                                outcome = "re78", match.on = match.on)

    my.form <- as.formula(re78 ~ treat + age + black + education + hispanic +
                          married + nodegree + re74 + re75)

    ## Not run:
    my.estimates <- estimateEffects(my.frontier, re78 ~ treat,
                                    mod.dependence.formula = my.form,
                                    continuous.vars = c("age", "education", "re74", "re75"),
                                    prop.estimated = 0.1,
                                    means.as.cutpoints = TRUE)
    ## End(Not run)


generateDataset         Generate a data set that is on the balance - sample size frontier.

Description

    generateDataset() allows the user to export a data set that sits on the frontier.

Usage

    generateDataset(frontier.object, N)

Arguments

    frontier.object
        An object generated by makeFrontier().
    N
        The number of observations left in the exported data set. If the user selects an undefined point, generateDataset() returns a dataset from the nearest defined point on the frontier.

References

    King, Gary, Christopher Lucas, and Richard Nielsen. "The Balance-Sample Size Frontier in Matching Methods for Causal Inference." (2015).

Examples

    match.on <- colnames(lalonde)[!(colnames(lalonde) %in% c("re78", "treat"))]
    my.frontier <- makeFrontier(dataset = lalonde, treatment = "treat",
                                outcome = "re78", match.on = match.on)

    n <- 300  # Identify the point from which to select the data
    matched.data <- generateDataset(my.frontier, N = n)


lalonde                 Modified Lalonde dataset

Description

    This is a modified version of the Lalonde experimental dataset used for explanatory purposes only.

Format

    A data frame with 16437 observations on the following 11 variables.

    treat       treatment variable indicator
    age         age
    education   years of education
    black       race indicator variable
    married     marital status indicator variable
    nodegree    indicator variable for not possessing a degree
    re74        real earnings in 1974
    re75        real earnings in 1975
    re78        real earnings in 1978 (post-treatment outcome)
    hispanic    ethnic indicator variable

makeFrontier            Compute the balance - sample size frontier.

Description

    makeFrontier() computes the balance - sample size frontier and can be used with estimateEffects() to estimate effects along that frontier.

Usage

    makeFrontier(dataset, treatment, outcome, match.on, keep.vars = NULL,
                 QOI = "FSATT", metric = "Mahal", ratio = "fixed",
                 breaks = NULL)

Arguments

    dataset
        The data set containing the treatment, outcome, and variables to match on.
    treatment
        The name of the treatment.
    outcome
        The name of the outcome.
    match.on
        A vector of column names indicating which variables are to be matched on.
    keep.vars
        A character vector of variable names that are not in treatment, outcome, or match.on but that the user would like to store in the data, either for calculation of model dependence intervals or for use in exported data sets.
    QOI
        The quantity of interest to be estimated. By default, the feasible sample average treatment effect on the treated (FSATT). The other option is SATT (sample average treatment effect on the treated).
    metric
        The metric used to measure imbalance. Defaults to average Mahalanobis distance to nearest match. The other option is L1.
    ratio
        Variable or fixed ratio. See King, Lucas, and Nielsen for details.
    breaks
        Can be used with L1 to provide user-specified breaks.

References

    King, Gary, Christopher Lucas, and Richard Nielsen. "The Balance-Sample Size Frontier in Matching Methods for Causal Inference." (2015).

Examples

    match.on <- colnames(lalonde)[!(colnames(lalonde) %in% c("re78", "treat"))]
    my.frontier <- makeFrontier(dataset = lalonde, treatment = "treat",
                                outcome = "re78", match.on = match.on)
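The default imbalance metric, average Mahalanobis distance to the nearest match, can be sketched in base R as follows. This is an illustration rather than the package's implementation: the helper name avg.nearest.mahal, the toy matrices, the use of a pooled covariance matrix, and the choice to average over treated units only are all assumptions made for the example.

```r
# Toy data (made up): rows are units, columns are two covariates.
treated <- matrix(c(1, 2,
                    2, 1,
                    3, 3), ncol = 2, byrow = TRUE)
control <- matrix(c(1, 2,
                    2, 2,
                    5, 5,
                    0, 0), ncol = 2, byrow = TRUE)

# Hypothetical helper: for every treated unit, find the Mahalanobis distance
# to its closest control unit, then average those distances.
avg.nearest.mahal <- function(treated, control) {
  S <- cov(rbind(treated, control))   # assumption: pooled covariance matrix
  nearest <- apply(treated, 1, function(x) {
    # mahalanobis() returns squared distances, hence the square root.
    sqrt(min(mahalanobis(control, center = x, cov = S)))
  })
  mean(nearest)
}

imbalance <- avg.nearest.mahal(treated, control)
print(imbalance)
```

Lower values mean that treated units have closer counterparts among the controls; a value of zero would mean every treated unit has an exact match.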

modelDependence         Compute the Athey-Imbens measure of sensitivity to model specification.

Description

    modelDependence() computes the Athey-Imbens measure of sensitivity to model specification.

Usage

    modelDependence(dataset, treatment, base.form, verbose = TRUE,
                    seed = 1, cutpoints = NA)

Arguments

    dataset
        A data frame containing the variables in the model.
    treatment
        The treatment (quantity of interest). The measure of model dependence is with respect to estimates of this quantity. Must be in base.form.
    base.form
        The base formula that is to be evaluated.
    verbose
        If TRUE, additional information is printed.
    seed
        Seed for the random number generator.
    cutpoints
        A list where the keys are variable names and the values are cutpoints. If specified, cutpoints for these variables will not be estimated. Otherwise, cutpoints are estimated with segmented regression.

References

    Athey, Susan, and Guido W. Imbens. "A Measure of Robustness to Misspecification." (2014).

Examples

    treatment <- "treat"
    base.form <- as.formula("re78 ~ treat + age + education + black + hispanic +
                            married + nodegree + re74 + re75")
    md <- modelDependence(lalonde, treatment, base.form,
                          cutpoints = list("age" = mean(lalonde$age)))
    print(md$sigma.hat.theta)

parallelPlot            Create a parallel plot for a specified point on the frontier.

Description

    parallelPlot() creates a parallel plot for a specified point on the frontier. Wraps parcoord() from MASS.

Usage

    parallelPlot(frontier.object, N, variables,
                 treated.col = "grey", control.col = "black")

Arguments

    frontier.object
        An object generated by makeFrontier().
    N
        The number of observations left in the exported data set. If the user selects an undefined point, generateDataset() returns a dataset from the nearest defined point on the frontier.
    variables
        The variables to be included in the parallel plot.
    treated.col
        The color of the lines corresponding to observations assigned to the treatment. Grey by default.
    control.col
        The color of the lines corresponding to observations assigned to the control. Black by default.

References

    Venables, William N., and Brian D. Ripley. Modern Applied Statistics with S. Springer, 2002.

Examples

    match.on <- colnames(lalonde)[!(colnames(lalonde) %in% c("re78", "treat"))]
    mahal.frontier <- makeFrontier(dataset = lalonde, treatment = "treat",
                                   outcome = "re78", match.on = match.on)
    parallelPlot(mahal.frontier, N = 300,
                 variables = c("age", "re74", "re75", "black"),
                 treated.col = "grey", control.col = "blue")
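Because the plot wraps parcoord() from MASS, a comparable figure can be drawn directly from any data frame. The following is a minimal sketch on made-up data; the toy data frame and the line.cols vector are assumptions for illustration and do not reproduce the package's internal code.

```r
library(MASS)  # provides parcoord()

# Toy data (made up): five treated and five control units.
set.seed(1)
toy <- data.frame(
  treat = rep(c(1, 0), each = 5),
  age   = round(runif(10, 18, 55)),
  re74  = round(runif(10, 0, 20000)),
  re75  = round(runif(10, 0, 20000))
)

# One color per observation: grey for treated, black for control.
line.cols <- ifelse(toy$treat == 1, "grey", "black")

# Each observation becomes one line across the rescaled covariate axes.
parcoord(toy[, c("age", "re74", "re75")], col = line.cols)
```

Lines of the two colors that follow similar paths across the axes suggest that treated and control units are comparable on those covariates.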

plotEstimates           Plot estimates along the frontier.

Description

    plotEstimates() plots estimates along the frontier.

Usage

    plotEstimates(estimates.object,
                  xlab = "Number of Observations Pruned",
                  ylab = "Estimate", main = "Effects Plot",
                  xlim = NULL, ylim = NULL,
                  mod.dependence.col = rgb(255, 0, 0, 127, maxColorValue = 255),
                  mod.dependence.border.col = rgb(255, 0, 0, 200, maxColorValue = 255),
                  line.col = rgb(102, 0, 0, 255, maxColorValue = 255), ...)

Arguments

    estimates.object
        An object generated by estimateEffects().
    xlab
        The label for the x-axis. Defaults to "Number of Observations Pruned".
    ylab
        The label for the y-axis. Defaults to "Estimate".
    main
        The main label. Defaults to "Effects Plot".
    xlim
        The x-axis limits.
    ylim
        The y-axis limits.
    ...
        Additional arguments to be passed to plot.
    mod.dependence.col
        The color to shade the model dependence region.
    mod.dependence.border.col
        The model dependence region border color.
    line.col
        The color of the line displaying point estimates.

Details

    plotEstimates() wraps plot and uses ... to pass additional arguments to the base plot() function, like color, axis range, etc.

References

    King, Gary, Christopher Lucas, and Richard Nielsen. "The Balance-Sample Size Frontier in Matching Methods for Causal Inference." (2015).

Examples

    match.on <- colnames(lalonde)[!(colnames(lalonde) %in% c("re78", "treat"))]
    my.frontier <- makeFrontier(dataset = lalonde, treatment = "treat",
                                outcome = "re78", match.on = match.on)

    base.form <- as.formula("re78 ~ treat + age + education + black + hispanic +
                            married + nodegree + re74 + re75")

    ## Not run:
    my.estimates <- estimateEffects(my.frontier, re78 ~ treat,
                                    mod.dependence.formula = base.form,
                                    continuous.vars = c("age", "education", "re74", "re75"),
                                    prop.estimated = 0.1,
                                    means.as.cutpoints = TRUE)
    plotEstimates(my.estimates)
    ## End(Not run)


plotFrontier            Plot the balance - sample size frontier.

Description

    plotFrontier() plots the balance - sample size frontier.

Usage

    plotFrontier(frontier.object,
                 xlab = "Number of Observations Pruned",
                 ylab = frontier.object$metric,
                 main = "Frontier Plot", ...)

Arguments

    frontier.object
        An object generated by makeFrontier().
    xlab
        The label for the x-axis. Defaults to "Number of Observations Pruned".
    ylab
        The label for the y-axis. Defaults to the selected metric.
    main
        The main label. Defaults to "Frontier Plot".
    ...
        Additional arguments to be passed to plot.

Details

    plotFrontier() wraps plot and uses ... to pass additional arguments to the base plot() function, like color, axis range, etc.

References

    King, Gary, Christopher Lucas, and Richard Nielsen. "The Balance-Sample Size Frontier in Matching Methods for Causal Inference." (2015).

Examples

    match.on <- colnames(lalonde)[!(colnames(lalonde) %in% c("re78", "treat"))]
    my.frontier <- makeFrontier(dataset = lalonde, treatment = "treat",
                                outcome = "re78", match.on = match.on)
    plotFrontier(my.frontier)


plotMeans               Plot covariate means along the frontier.

Description

    plotMeans() plots covariate means along the frontier.

Usage

    plotMeans(frontier.object,
              xlab = "Number of Observations Pruned",
              main = "Means Plot",
              xlim = c(1, max(frontier.object$frontier$Xs)),
              ylim = c(0, 1),
              cols = rainbow(length(frontier.object$match.on)),
              diff.in.means = FALSE, ...)

Arguments

    frontier.object
        An object generated by makeFrontier().
    xlab
        The label for the x-axis. Defaults to "Number of Observations Pruned".
    main
        The main label. Defaults to "Means Plot".
    xlim
        The x-axis limits. Defaults to the range of the frontier.
    ylim
        The y-axis limits. Defaults to (0, 1).
    cols
        The line colors. Defaults to the rainbow palette.
    diff.in.means
        If TRUE, the plotted values are the differences in means between the treated and control groups. If FALSE (the default), they are the covariate means pooled across treated and control.
    ...
        Additional arguments to be passed to plot.

Details

    plotMeans() wraps plot and uses ... to pass additional arguments to the base plot() function.

References

    King, Gary, Christopher Lucas, and Richard Nielsen. "The Balance-Sample Size Frontier in Matching Methods for Causal Inference." (2015).

Examples

    match.on <- colnames(lalonde)[!(colnames(lalonde) %in% c("re78", "treat"))]
    my.frontier <- makeFrontier(dataset = lalonde, treatment = "treat",
                                outcome = "re78", match.on = match.on)
    plotMeans(my.frontier)
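The two kinds of means that can be plotted, pooled covariate means and treated-minus-control differences, can be sketched in base R for a single data set. The rescaling of each covariate to the unit interval is an assumption suggested by the default y-axis limits of (0, 1), not something the manual states; the helper rescale01 and the toy data are likewise made up for illustration.

```r
# Toy data (made up): two treated and three control units.
toy <- data.frame(
  treat = c(1, 1, 0, 0, 0),
  age   = c(20, 30, 25, 40, 50),
  re74  = c(0, 1000, 500, 2000, 4000)
)

# Hypothetical helper: rescale a numeric vector to the interval [0, 1].
rescale01 <- function(v) (v - min(v)) / (max(v) - min(v))
scaled <- as.data.frame(lapply(toy[c("age", "re74")], rescale01))

# Means pooled across treated and control units.
pooled.means <- colMeans(scaled)

# Difference in means: treated minus control.
diff.means <- colMeans(scaled[toy$treat == 1, ]) -
  colMeans(scaled[toy$treat == 0, ])

print(pooled.means)
print(diff.means)
```

Repeating this calculation for the data set at each point on the frontier would give one line per covariate, which is the kind of curve the means plot displays.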

Index

    estimateEffects
    generateDataset
    lalonde
    makeFrontier
    modelDependence
    parallelPlot
    plotEstimates
    plotFrontier
    plotMeans