Package 'auctestr'
November 13, 2017


Title Statistical Testing for AUC Data
Version 1.0.0
Maintainer Josh Gardner <jpgard@umich.edu>
Description Performs statistical testing to compare predictive models based on multiple observations of the A' statistic (also known as Area Under the Receiver Operating Characteristic Curve, or AUC). Specifically, it implements a testing method based on the equivalence between the A' statistic and the Wilcoxon statistic. For more information, see Hanley and McNeil (1982) <doi:10.1148/radiology.143.1.7063747>.
Imports dplyr, tidyr
Depends R (>= 3.3.1)
License MIT + file LICENSE
Encoding UTF-8
LazyData true
RoxygenNote 6.0.1
Suggests knitr, rmarkdown, testthat
VignetteBuilder knitr
NeedsCompilation no
Author Josh Gardner [aut, cre]
Repository CRAN
Date/Publication 2017-11-13 09:46:18 UTC

R topics documented:
auctestr
auc_compare
fbh_test
sample_experiment_data
se_auc
stouffer_z

auctestr                auctestr: Statistical Testing for AUC data.

Description
auctestr currently provides four main functions for statistical testing of the AUC, or A', statistic: auc_compare, stouffer_z, fbh_test, and se_auc.

auc_compare             Compare AUC values using the FBH method.

Description
Apply the FBH method to compare outcome_col by compare_col, averaging over time_col (due to non-independence) and then combining over over_col using Stouffer's method.

Usage
auc_compare(df, compare_values, filter_value, time_col = "time",
  outcome_col = "auc", compare_col = "model_id", over_col = "dataset",
  n_col = "n", n_p_col = "n_p", n_n_col = "n_n",
  filter_col = "model_variant")

Arguments
df              DataFrame containing time_col, outcome_col, compare_col, and over_col.
compare_values  names of the two models to compare (character vector of length 2). These should match exactly the names as they appear in compare_col.
filter_value    (optional) keep only observations which contain filter_value for filter_col.
time_col        name of the column in df representing the time of observations (z-scores are averaged over time_col within each model/dataset due to non-independence). These can also be other dependent groupings, such as cross-validation folds.
outcome_col     name of the column in df representing the outcome to compare; this should be the Area Under the Receiver Operating Characteristic curve, or A', statistic (the method applies specifically to AUC and not to other metrics such as sensitivity, precision, or F1).
compare_col     name of the column in df representing the conditions to compare (should contain at least 2 unique values; the two values to compare are specified as compare_values).
over_col        identifier for independent experiments, iterations, etc. over which the per-model z-scores are combined (using Stouffer's Z).
n_col           name of the column in df with the total number of observations in the sample tested by each row.
n_p_col         name of the column in df with n_p, the number of positive observations.
n_n_col         name of the column in df with n_n, the number of negative observations.
filter_col      (optional) name of the column in df to filter observations on; keep only observations which contain filter_value for filter_col.

Value
numeric, overall z-score of the comparison using the FBH method.

References
Fogarty, Baker and Hudson, Case Studies in the use of ROC Curve Analysis for Sensor-Based Estimates in Human Computer Interaction, Proceedings of Graphics Interface (2005), pp. 129-136.
Stouffer, S.A.; Suchman, E.A.; DeVinney, L.C.; Star, S.A.; Williams, R.M. Jr. The American Soldier, Vol. 1: Adjustment during Army Life (1949).

See Also
Other fbh method: fbh_test, se_auc

Examples
## load sample experiment data
data(sample_experiment_data)

## compare VariantA of ModelA and ModelB
auc_compare(sample_experiment_data,
            compare_values = c('ModelA', 'ModelB'),
            filter_value = c('VariantA'),
            time_col = 'time',
            outcome_col = 'auc',
            compare_col = 'model_id',
            over_col = 'dataset',
            filter_col = 'model_variant')

## compare VariantC of ModelA and ModelB
auc_compare(sample_experiment_data,
            compare_values = c('ModelA', 'ModelB'),
            filter_value = c('VariantC'),
            time_col = 'time',
            outcome_col = 'auc',
            compare_col = 'model_id',
            over_col = 'dataset',
            filter_col = 'model_variant')

## compare ModelC, VariantA and VariantB
auc_compare(sample_experiment_data,
            compare_values = c('VariantA', 'VariantB'),
            filter_value = c('ModelC'),
            time_col = 'time',
            outcome_col = 'auc',
            compare_col = 'model_variant',
            over_col = 'dataset',
            filter_col = 'model_id')

fbh_test                Apply z-test for difference between auc_1 and auc_2 using the FBH method.

Description
Apply a z-test for the difference between auc_1 and auc_2 using the FBH method.

Usage
fbh_test(auc_1, auc_2, n_p, n_n)

Arguments
auc_1   value of the A' statistic (or AUC, Area Under the Receiver Operating Characteristic curve) for the first group (numeric).
auc_2   value of the A' statistic (or AUC, Area Under the Receiver Operating Characteristic curve) for the second group (numeric).
n_p     number of positive observations (needed to calculate the standard error of the Wilcoxon statistic) (numeric).
n_n     number of negative observations (needed to calculate the standard error of the Wilcoxon statistic) (numeric).

Value
numeric, single aggregated z-score of the comparison A'_1 - A'_2.

References
Fogarty, Baker and Hudson, Case Studies in the use of ROC Curve Analysis for Sensor-Based Estimates in Human Computer Interaction, Proceedings of Graphics Interface (2005), pp. 129-136.

See Also
Other fbh method: auc_compare, se_auc

Examples
## Two models with identical AUC return a z-score of zero
fbh_test(0.56, 0.56, 1000, 2500)

## Compare two models; note that changing the order changes the sign of the z-statistic
fbh_test(0.56, 0.59, 1000, 2500)
fbh_test(0.59, 0.56, 1000, 2500)
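For intuition, the comparison amounts to a two-sample z-statistic built from the Wilcoxon-equivalence standard errors. The sketch below assumes the common form z = (auc_1 - auc_2) / sqrt(se_1^2 + se_2^2), with the package's se_auc supplying the standard errors; it is an illustration under that assumption, not the packaged implementation, which may differ in detail.

## illustrative sketch only; assumes the standard pooled-variance z form
fbh_z_sketch <- function(auc_1, auc_2, n_p, n_n) {
  se_1 <- se_auc(auc_1, n_p, n_n)   # SE of the first AUC via Wilcoxon equivalence
  se_2 <- se_auc(auc_2, n_p, n_n)   # SE of the second AUC over the same sample sizes
  (auc_1 - auc_2) / sqrt(se_1^2 + se_2^2)
}
fbh_z_sketch(0.56, 0.59, 1000, 2500)   # negative z, mirroring the sign behaviour described above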

sample_experiment_data  Performance of several predictive models over three different datasets, using multiple cutoff points for time within each dataset.

Description
A dataset containing the performance of several predictive models over three different datasets, where the models are built using data from multiple time points (time 1 has less data than time 2, and each subsequent time point T also uses data from all prior time points t <= T). This represents the typical output of a machine learning experiment in which several models are considered across multiple datasets, often with different variants of each model type (e.g., different hyperparameter settings of each model).

Usage
sample_experiment_data

Format
A data frame with 180 rows and 10 variables:
auc             Area Under the Receiver Operating Characteristic Curve, or AUC, for this model configuration.
precision       Precision for this model configuration.
accuracy        Accuracy for this model configuration.
n               Number of observations in this dataset.
n_n             Number of negative observations (i.e., outcome == 0) in this dataset (required for standard error estimation of the AUC statistic).
n_p             Number of positive observations (i.e., outcome == 1) in this dataset (required for standard error estimation of the AUC statistic).
dataset         Indicator for different datasets.
time            Indicator for different time points used to build each dataset; these represent dependent observations of model performance.
model_id        Indicator for the statistical algorithm used (e.g., logistic regression, SVM).
model_variant   Indicator for different variants of each model which are not equivalent and should be evaluated individually (models should not be averaged over these; instead, the variant should be held fixed when comparing to other models). An example would be different hyperparameter settings for a given model (e.g., cost for an SVM).
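A quick way to get oriented with the bundled data, using only base R:

## load and inspect the sample experiment data
data(sample_experiment_data)
str(sample_experiment_data)    # 180 rows, 10 variables as described above
head(sample_experiment_data)   # first few model/dataset/time observations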

se_auc                  Compute standard error of AUC score, using its equivalence to the Wilcoxon statistic.

Description
Compute the standard error of an AUC score, using its equivalence to the Wilcoxon statistic.

Usage
se_auc(auc, n_p, n_n)

Arguments
auc    value of the A' statistic (or AUC, Area Under the Receiver Operating Characteristic curve) (numeric).
n_p    number of positive cases (integer).
n_n    number of negative cases (integer).

References
Hanley and McNeil, The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology (1982) 143 (1), pp. 29-36.
Fogarty, Baker and Hudson, Case Studies in the use of ROC Curve Analysis for Sensor-Based Estimates in Human Computer Interaction, Proceedings of Graphics Interface (2005), pp. 129-136.

See Also
Other fbh method: auc_compare, fbh_test

Examples
se_auc(0.75, 20, 200)

## standard error decreases when the data become more balanced over the
## positive/negative outcome classes, holding sample size fixed
se_auc(0.75, 110, 110)

## standard error increases when the sample size shrinks
se_auc(0.75, 20, 20)
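For reference, the Hanley and McNeil (1982) standard error of an AUC value has a well-known closed form based on the Wilcoxon statistic. The sketch below implements that textbook formula; it is assumed, not verified, to match the expression used inside the packaged se_auc.

## textbook Hanley-McNeil SE; shown for illustration, assumed to mirror se_auc
se_auc_sketch <- function(auc, n_p, n_n) {
  q1 <- auc / (2 - auc)          # P(two random positives both outrank one negative)
  q2 <- 2 * auc^2 / (1 + auc)    # P(one positive outranks two random negatives)
  sqrt((auc * (1 - auc) +
          (n_p - 1) * (q1 - auc^2) +
          (n_n - 1) * (q2 - auc^2)) / (n_p * n_n))
}
se_auc_sketch(0.75, 20, 200)   # roughly 0.065 under this assumed formula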

stouffer_z              Compute aggregate z-score using Stouffer's method.

Description
Compute an aggregate z-score using Stouffer's method.

Usage
stouffer_z(z_vec, ignore.na = TRUE)

Arguments
z_vec       vector of z-scores (numeric).
ignore.na   should NA values be ignored? Defaults to TRUE.

Value
numeric, Z-score using Stouffer's method aggregated over z_vec.

References
Stouffer, S.A.; Suchman, E.A.; DeVinney, L.C.; Star, S.A.; Williams, R.M. Jr. The American Soldier, Vol. 1: Adjustment during Army Life (1949).
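Stouffer's method combines k independent z-scores as Z = (z_1 + ... + z_k) / sqrt(k). The unweighted sketch below illustrates that formula and one plausible treatment of NA values; it is not the packaged implementation, whose details (e.g., weighting or NA handling) may differ.

## illustrative, unweighted Stouffer combination
stouffer_z_sketch <- function(z_vec, ignore.na = TRUE) {
  if (ignore.na) z_vec <- z_vec[!is.na(z_vec)]   # drop missing z-scores before combining
  sum(z_vec) / sqrt(length(z_vec))
}
stouffer_z_sketch(c(1.2, -0.4, 2.1))   # combine three z-scores into one aggregate Z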

Index

Topic datasets
    sample_experiment_data

auc_compare
auctestr
auctestr-package (auctestr)
fbh_test
sample_experiment_data
se_auc
stouffer_z