Package sbrl. R topics documented: July 29, 2016

Similar documents
Package datasets.load

Package milr. June 8, 2017

Package fastdummies. January 8, 2018

Package ECctmc. May 1, 2018

Package geojsonsf. R topics documented: January 11, Type Package Title GeoJSON to Simple Feature Converter Version 1.3.

Package TVsMiss. April 5, 2018

Package cattonum. R topics documented: May 2, Type Package Version Title Encode Categorical Features

Package vinereg. August 10, 2018

Package strat. November 23, 2016

Package validara. October 19, 2017

Package hyphenatr. August 29, 2016

Package fasttextr. May 12, 2017

Package manet. September 19, 2017

Package sgmcmc. September 26, Type Package

Package bisect. April 16, 2018

Package arulescba. April 24, 2018

Package bigreadr. R topics documented: August 13, Version Date Title Read Large Text Files

Package gbts. February 27, 2017

Package mfa. R topics documented: July 11, 2018

Package ChoR. May 27, 2018

Package JOUSBoost. July 12, 2017

Package OsteoBioR. November 15, 2018

Package Numero. November 24, 2018

Package AnDE. R topics documented: February 19, 2015

Package TANDEM. R topics documented: June 15, Type Package

Package rpst. June 6, 2017

Package fbroc. August 29, 2016

Package fst. December 18, 2017

Package kirby21.base

Package IATScore. January 10, 2018

Package clipr. June 23, 2018

Package coga. May 8, 2018

Package raker. October 10, 2017

Package sgd. August 29, 2016

Package SimilaR. June 21, 2018

Package OLScurve. August 29, 2016

Package mdftracks. February 6, 2017

Package semver. January 6, 2017

Package readxl. April 18, 2017

Package rucrdtw. October 13, 2017

Package wskm. July 8, 2015

Package rcba. May 27, 2018

Package elmnnrcpp. July 21, 2018

Package fastrtext. December 10, 2017

Package kernelfactory

Package bife. May 7, 2017

Package automl. September 13, 2018

Package BayesCR. September 11, 2017

Package nngeo. September 29, 2018

Package farver. November 20, 2018

Package embed. November 19, 2018

Package Rspc. July 30, 2018

Package rhor. July 9, 2018

Package LVGP. November 14, 2018

Package ompr. November 18, 2017

Package CatEncoders. March 8, 2017

Package CuCubes. December 9, 2016

Package svmpath. R topics documented: August 30, Title The SVM Path Algorithm Date Version Author Trevor Hastie

Package hashids. August 29, 2016

Package QCAtools. January 3, 2017

Package mfbvar. December 28, Type Package Title Mixed-Frequency Bayesian VAR Models Version Date

Package geogrid. August 19, 2018

Package AhoCorasickTrie

Package DALEX. August 6, 2018

Package sfc. August 29, 2016

Package ipft. January 4, 2018

Package tuts. June 12, 2018

Package labelvector. July 28, 2018

Package robustreg. R topics documented: April 27, Version Date Title Robust Regression Functions

Package Rtsne. April 14, 2017

Package RODBCext. July 31, 2017

Package RNAseqNet. May 16, 2017

Package SAENET. June 4, 2015

Package estprod. May 2, 2018

Package clustmixtype

Package wrswor. R topics documented: February 2, Type Package

Package bayeslongitudinal

Package readobj. November 2, 2017

Package bikedata. April 27, 2018

Package FSelectorRcpp

Package fst. June 7, 2018

Package auctestr. November 13, 2017

Package nodeharvest. June 12, 2015

Package Bergm. R topics documented: September 25, Type Package

Package RAMP. May 25, 2017

Package ANOVAreplication

Package PCADSC. April 19, 2017

Package catenary. May 4, 2018

Package clustering.sc.dp

Package maxent. July 2, 2014

Package pomdp. January 3, 2019

Package repec. August 31, 2018

Package PrivateLR. March 20, 2018

Package nonlinearicp

Package diffusr. May 17, 2018

Package coda.base. August 6, 2018

Package tidyimpute. March 5, 2018

Package queuecomputer

Package canvasxpress

Package BigVAR. R topics documented: April 3, Type Package

Transcription:

Type Package Title Scalable Bayesian Rule Lists Model Version 1.2 Date 2016-07-25 Author Hongyu Yang [aut, cre], Morris Chen [ctb], Cynthia Rudin [aut, ctb], Margo Seltzer [aut, ctb] Maintainer Hongyu Yang <hongyuy@mit.edu> Package sbrl July 29, 2016 An implementation of Scalable Bayesian Rule Lists Algorithm. License GPL (>= 2) Imports Rcpp (>= 0.12.4), arules, methods RcppModules sbrl LinkingTo Rcpp NeedsCompilation yes LazyData yes SystemRequirements gmp (>= 4.2.0), gsl Repository CRAN Date/Publication 2016-07-29 06:41:15 R topics documented: sbrl-package......................................... 2 get_data_feature_mat.................................... 3 predict.sbrl......................................... 4 print.sbrl........................................... 5 sbrl............................................. 6 tictactoe........................................... 7 Index 9 1

2 sbrl-package sbrl-package SCALABLE BAYESIAN RULE LISTS Fit a sbrl model. Learn from the data and create a decision rule list in the format of: if (condition1) then positive probability =... else if (condition2) then positive probability =... else if (condition3)...... Details else (default rule) then positive probability =... ( See the examples below ) This package contains three functions: sbrl, print.sbrl, show.sbrl, and predict.sbrl Author(s) Hongyu Yang, Cynthia Rudin, Margo Seltzer References Hongyu Yang, Morris Chen, Cynthia Rudin, Margo Seltzer (2016) Scalable Bayesian Rule Lists. Working paper on arxiv 2016. Benjamin Letham, Cynthia Rudin, Tyler McCormick and David Madigan (2015) Building Interpretable Classifiers with Rules using Bayesian Analysis. Annals of Applied Statistics, 2015. See Also sbrl, print.sbrl, show.sbrl and predict.sbrl # Let us use the titactoe dataset for (name in names(tictactoe)) {tictactoe[name] <- as.factor(tictactoe[,name])} # Train on two-thirds of the data b = round(2*nrow(tictactoe)/3, digit=0) data_train <- tictactoe[1:b, ] # Test on the remaining one third of the data data_test <- tictactoe[(b+1):nrow(tictactoe), ] # data_train, data_test are dataframes with factor columns # The class column is "label"

get_data_feature_mat 3 # Run the sbrl algorithm on the training set sbrl_model <- sbrl(data_train, iters=20000, pos_sign="1", neg_sign="0", rule_minlen=1, rule_maxlen=3, lambda=10.0, eta=1.0, nchain=25) print(sbrl_model) # Make predictions on the test set yhat <- predict(sbrl_model, data_test) # yhat will be a list of predicted negative and positive probabilities for the test data. get_data_feature_mat GET BINARY MATRIX REPRESENTATION OF THE DATA- FEATURE RELAITONSHIP Given some features in the form "feature1=x1", "feature2=x2"..., this function will generate a matrix representation of which data are captured by which features. get_data_feature_mat(data, featurenames) Arguments data featurenames a data.frame representing the observations. a character vector representing the features in the form: "feature1=x1", "feature2=x2"... Value a binary matrix of size #observations-by-#featurenames Author(s) Hongyu Yang, Morris Chen, Cynthia Rudin, Margo Seltzer featurenames <- c("c1=b", "c1=o", "c1=x") get_data_feature_mat(tictactoe, featurenames) #it will generate a binary matrix representing which observations are captured by which features.

4 predict.sbrl predict.sbrl PREDICT THE POSITIVE PROBABILITY FOR THE OBSERVA- TIONS Returns a list of probabilities. ## S3 method for class 'sbrl' predict(object, tdata,...) Arguments Value object tdata sbrl model returned from the sbrl function. test data... further arguments passed to or from other methods. return a list containing 2 lists of probablities for the rule list, corresponding to probability being 0 and 1 for each observation. The two probabilities for each rule add up to 1, P(y=0 rule r) + p(y=1 rule r) = 1 # Let us use the titactoe dataset for (name in names(tictactoe)) {tictactoe[name] <- as.factor(tictactoe[,name])} # Train on two-thirds of the data b = round(2*nrow(tictactoe)/3, digit=0) data_train <- tictactoe[1:b, ] # Test on the remaining one third of the data data_test <- tictactoe[(b+1):nrow(tictactoe), ] # data_train, data_test are dataframes with factor columns # The class column is "label" # Run the sbrl algorithm on the training set sbrl_model <- sbrl(data_train, iters=20000, pos_sign="1", neg_sign="0", rule_minlen=1, rule_maxlen=3, lambda=10.0, eta=1.0, nchain=25) print(sbrl_model) # Make predictions on the test set yhat <- predict(sbrl_model, data_test) # yhat will be a list of predicted negative and positive probabilities for the test data.

print.sbrl 5 print.sbrl INTERPRETABLE VERSION OF A SBRL MODEL This function prints an sbrl object. It is a method for the generic function print of class "sbrl". # S3 method for class 'sbrl' # This complies with the form of the standard generic method print ## S3 method for class 'sbrl' print(x, uses4=false,...) ## S3 method for class 'sbrl' show(x, uses4=false,...) Arguments x Details uses4 A sbrl model returned from sbrl function An argument used to match showdefault function. Fixed as FALSE.... further arguments passed to or from other methods. This function is a method for the generic function print for class "sbrl". It can be invoked by calling print for an object of the appropriate class, or directly by calling print.sbrl regardless of the class of the object. # Let us use the titactoe dataset for (name in names(tictactoe)) {tictactoe[name] <- as.factor(tictactoe[,name])} # Train on two-thirds of the data b = round(2*nrow(tictactoe)/3, digit=0) data_train <- tictactoe[1:b, ] # Test on the remaining one third of the data data_test <- tictactoe[(b+1):nrow(tictactoe), ] # data_train, data_test are dataframes with factor columns # The class column is "label" # Run the sbrl algorithm on the training set sbrl_model <- sbrl(data_train, iters=20000, pos_sign="1", neg_sign="0", rule_minlen=1, rule_maxlen=3, lambda=10.0, eta=1.0, nchain=25) print(sbrl_model)

6 sbrl sbrl TRAIN THE SBRL MODEL WITH THE GIVEN DATA TRAIN THE SBRL MODEL WITH THE GIVEN DATA sbrl(tdata, iters=30000, pos_sign="1", neg_sign="0", rule_minlen=1, rule_maxlen=1, lambda=10.0, eta=1.0, alpha=c(1,1), nchain=10) Arguments Value tdata iters pos_sign neg_sign rule_minlen rule_maxlen a dataframe, with a "label" column specifying the correct labels for each observation. the number of iterations for each MCMC chain. the sign for the positive labels in the "label" column. the sign for the negative labels in the "label" column. the minimum number of cardinality for rules to be mined from the dataframe. the maximum number of cardinality for rules to be mined from the dataframe. minsupport_pos a number between 0 and 1, for the minimum percentage support for the positive observations. minsupport_neg a number between 0 and 1, for the minimum percentage support for the negative observations. lambda eta a hyperparameter for the expected length of the rule list. a hyperparameter for the expected cardinality of the rules in the optimal rule list. alpha a prior pseudo-count for the positive and negative classes. fixed at 1 s nchain Return a list of : rs rulenames an integer for the number of the chains that MCMC will be running. a ruleset which contains the rule indices and their positive probabilities for the best rule list by training sbrl with the given data and parameters. a list of all the rule names mined with arules. featurenames a list of all the feature names. mat_feature_rule a binary matrix representing which features are included in which rules.

tictactoe 7 Author(s) Hongyu Yang, Morris Chen, Cynthia Rudin, Margo Seltzer References Hongyu Yang, Morris Chen, Cynthia Rudin, Margo Seltzer (2016) Scalable Bayesian Rule Lists. Working paper on arxiv 2016. Benjamin Letham, Cynthia Rudin, Tyler McCormick and David Madigan (2015) Building Interpretable Classifiers with Rules using Bayesian Analysis. Annals of Applied Statistics, 2015. # Let us use the titactoe dataset for (name in names(tictactoe)) {tictactoe[name] <- as.factor(tictactoe[,name])} # Train on two-thirds of the data b = round(2*nrow(tictactoe)/3, digit=0) data_train <- tictactoe[1:b, ] # Test on the remaining one third of the data data_test <- tictactoe[(b+1):nrow(tictactoe), ] # data_train, data_test are dataframes with factor columns # The class column is "label" # Run the sbrl algorithm on the training set sbrl_model <- sbrl(data_train, iters=20000, pos_sign="1", neg_sign="0", rule_minlen=1, rule_maxlen=3, lambda=10.0, eta=1.0, nchain=25) print(sbrl_model) # Make predictions on the test set yhat <- predict(sbrl_model, data_test) # yhat will be a list of predicted negative and positive probabilities for the test data. tictactoe SHUFFLED TIC-TAC-TOE-ENDGAME DATASET This is a shuffled version of the Tic-Tac-Toe Endgame Data Set on UCI Machine Learning Repository. data("tictactoe")

8 tictactoe Format Details Source A data frame with 958 observations on the following 10 variables. c1 a factor with levels b, o, x c2 a factor with levels b, o, x c3 a factor with levels b, o, x c4 a factor with levels b, o, x c5 a factor with levels b, o, x c6 a factor with levels b, o, x c7 a factor with levels b, o, x c8 a factor with levels b, o, x c9 a factor with levels b, o, x label an integer with values 0, 1 This database encodes the complete set of possible board configurations at the end of tic-tac-toe games, where "x" is assumed to have played first. The target concept is "win for x" (i.e., true when "x" has one of 8 possible ways to create a "three-in-a-row"). https://archive.ics.uci.edu/ml/datasets/tic-tac-toe+endgame ## maybe str(tictactoe) ; plot(tictactoe)...

Index Topic \textasciitildekwd1 get_data_feature_mat, 3 predict.sbrl, 4 print.sbrl, 5 sbrl, 6 Topic \textasciitildekwd2 get_data_feature_mat, 3 predict.sbrl, 4 print.sbrl, 5 sbrl, 6 Topic package sbrl-package, 2 get_data_feature_mat, 3 predict (predict.sbrl), 4 predict.sbrl, 2, 4 print.sbrl, 2, 5 sbrl, 2, 4, 5, 6 sbrl-package, 2 show.sbrl, 2 show.sbrl (print.sbrl), 5 tictactoe, 7 9