Package sqlscore. April 29, 2018

Similar documents
Package dbx. July 5, 2018

Package reghelper. April 8, 2017

Package rwars. January 14, 2017

Package colf. October 9, 2017

Package crochet. January 8, 2018

Package rdryad. June 18, 2018

Package optimus. March 24, 2017

Package loggit. April 9, 2018

Package glue. March 12, 2019

Package glmnetutils. August 1, 2017

Package elasticsearchr

Package palmtree. January 16, 2018

Package messaging. May 27, 2018

Package vip. June 15, 2018

Package postgistools

Package pwrab. R topics documented: June 6, Type Package Title Power Analysis for AB Testing Version 0.1.0

Package ECctmc. May 1, 2018

Package sparklyr.nested

Package mason. July 5, 2018

Package rbraries. April 18, 2018

Package modmarg. R topics documented:

Package caretensemble

Package cattonum. R topics documented: May 2, Type Package Version Title Encode Categorical Features

Package spark. July 21, 2017

Package fastdummies. January 8, 2018

Package aws.transcribe

Package dkanr. July 12, 2018

Package available. November 17, 2017

Package docxtools. July 6, 2018

Package eply. April 6, 2018

Package RPostgres. April 6, 2018

Package datasets.load

Package robotstxt. November 12, 2017

Package memoise. April 21, 2017

Package CompGLM. April 29, 2018

Package taxizedb. June 21, 2017

Package repec. August 31, 2018

Package vinereg. August 10, 2018

Package farver. November 20, 2018

Package clipr. June 23, 2018

Package SNPediaR. April 9, 2019

Package implyr. May 17, 2018

Package embed. November 19, 2018

Package humanize. R topics documented: April 4, Version Title Create Values for Human Consumption

Package tidyselect. October 11, 2018

Package goodpractice

Package RPresto. July 13, 2017

Package pdfsearch. July 10, 2018

Package citools. October 20, 2018

Package wikitaxa. December 21, 2017

Package SSLASSO. August 28, 2018

Package virustotal. May 1, 2017

Package rzeit2. January 7, 2019

Package deductive. June 2, 2017

Package glinternet. June 15, 2018

Package nima. May 23, 2018

Package oasis. February 21, 2018

Package arulescba. April 24, 2018

Package ClusterBootstrap

Package validara. October 19, 2017

Package apastyle. March 29, 2017

Package strat. November 23, 2016

Package fastqcr. April 11, 2017

Package shinyhelper. June 21, 2018

Package condformat. October 19, 2017

Package RAMP. May 25, 2017

Package spelling. December 18, 2017

Package RODBCext. July 31, 2017

Package FWDselect. December 19, 2015

Package breakdown. June 14, 2018

Package interplot. R topics documented: June 30, 2018

Package readobj. November 2, 2017

Package QCAtools. January 3, 2017

Package bigreadr. R topics documented: August 13, Version Date Title Read Large Text Files

Package raker. October 10, 2017

Package datapasta. January 24, 2018

Package influxdbr. January 10, 2018

Package gggenes. R topics documented: November 7, Title Draw Gene Arrow Maps in 'ggplot2' Version 0.3.2

Package rcmdcheck. R topics documented: November 10, 2018

Package semver. January 6, 2017

Package sqliter. August 29, 2016

Package oec. R topics documented: May 11, Type Package

Package reval. May 26, 2015

Package mdftracks. February 6, 2017

Package flam. April 6, 2018

Package dotcall64. January 11, 2018

Package RPostgres. December 6, 2017

Package routr. October 26, 2017

Package modules. July 22, 2017

Package internetarchive

Package TANDEM. R topics documented: June 15, Type Package

Package jdx. R topics documented: January 9, Type Package Title 'Java' Data Exchange for 'R' and 'rjava'

Package FSelectorRcpp

Package canvasxpress

Package zeallot. R topics documented: September 28, Type Package

Package sfc. August 29, 2016

Package liftr. R topics documented: May 14, Type Package

Package JGR. September 24, 2017

Package subplex. April 5, 2018

Package autocogs. September 22, Title Automatic Cognostic Summaries Version 0.1.1

Transcription:

Version 0.1.3 Package sqlscore April 29, 2018 Title Utilities for Generating SQL Queries from Model Objects Provides utilities for generating SQL queries (particularly CREATE TABLE statements) from R model objects. The most important use case is generating SQL to score a generalized linear model or related model represented as an R object, in which case the package handles parsing formula operators and including the model's response function. License MIT + file LICENSE URL https://github.com/wwbrannon/sqlscore/ BugReports https://github.com/wwbrannon/sqlscore/issues Depends R (>= 3.3.0) Imports dbplyr (>= 1.0.0) Suggests testthat, arm, glmnet, mboost RoxygenNote 6.0.1 NeedsCompilation no Author William Brannon [aut, cre] Maintainer William Brannon <wwbrannon@email.wm.edu> Repository CRAN Date/Publication 2018-04-29 07:10:22 UTC R topics documented: create_statement...................................... 2 linpred............................................ 3 score_expression...................................... 4 select_statement....................................... 5 sqlscore........................................... 7 Index 8 1

2 create_statement create_statement Generate a CREATE TABLE statement from a model Generate a CREATE TABLE statement to score the passed model on a preexisting database table. The statement will generate predictions entirely in the database, with no need to fetch data into R. Models need not be GLMs, but their prediction steps must consist of applying a response function to a linear predictor. Usage create_statement(mod, dest_table, src_table, dest_schema = NULL, dest_catalog = NULL, src_schema = NULL, src_catalog = NULL, drop = FALSE, temporary = FALSE, pk = c("id"), response = NULL, con = NULL) Arguments mod dest_table src_table dest_schema dest_catalog src_schema src_catalog drop temporary pk response con A supported model object. The unqualified DB name of the destination table. The unqualified DB name of the source table. The DB schema of the destination table. The DB catalog of the destination table. The DB schema of the source table. The DB catalog of the source table. Whether to generate a DROP TABLE IF EXISTS before the CREATE TABLE. Whether the destination table should be a temporary table. A vector of primary key column names. The name of a custom response function to apply to the linear predictor. An optional DBI connection to control the details of SQL generation. Value A dbplyr SQL object representing the SELECT statement. Supported packages Specific packages and models that are known to work include: glm and lm from package:stats, cv.glmnet from package:glmnet, glmboost from package:mboost, and bayesglm from package:arm. Default S3 methods are for objects structured like those of class "glm", so models not listed here may work if they resemble those objects, but are not guaranteed to.

linpred 3 Warning Note that if the model object transformed its training data before fitting (e.g., centering and scaling predictors), the generated SQL statement will not include those transformations. A future release may include that functionality, but centering and scaling in particular are difficult to do efficiently and portably in SQL. Examples # Basic create statements mod <- glm(sepal.length ~ Sepal.Width + Petal.Length + Petal.Width + Species, data=datasets::iris) create_statement(mod, src_table="tbl_name", dest_table="target_tbl") create_statement(mod, src_table="tbl_name", src_schema="schema_name", src_catalog="catalog_name", dest_table="target_tbl") create_statement(mod, src_table="tbl_name", src_schema="schema_name", src_catalog="catalog_name", dest_table="target_tbl", dest_schema="target_schema", dest_catalog="target_catalog", pk=c("lab", "specimen_id")) #With a custom response function create_statement(mod, src_table="tbl_name", src_schema="schema_name", dest_table="target_tbl", response="probit") # With a model-derived non-identity response function mod <- glm(sepal.length > 5.0 ~ Sepal.Width + Petal.Length + Petal.Width + Species, data=datasets::iris, family=binomial("logit")) create_statement(mod, src_table="tbl_name", dest_table="target_tbl") #With formula operators x <- matrix(rnorm(100*20),100,20) colnames(x) <- sapply(1:20, function(x) paste0("x", as.character(x))) x <- as.data.frame(x) mod <- glm(x2 ~ X3 + X5 + X15*X8, data=x) create_statement(mod, src_table="tbl_name", dest_table="target_tbl") create_statement(mod, src_table="tbl_name", dest_table="target_tbl", response="cauchit") linpred Unevaluated prediction expressions for models Generate an unevaluated call corresponding to the predict step of the passed model. The call represents the linear predictor in terms of elementary functions on the underlying column names. Before translation into SQL, it should have a response function applied by score_expression (which may be a no-op in the case of the identity response).

4 score_expression Usage linpred(mod) Arguments mod A supported model object. Value An unevaluated R call object representing the linear predictor. Warning The Binomial models in glmboost return coefficients which are 1/2 the coefficients fit by a call to glm(..., family=binomial(...)), because the response variable is internally recoded to -1 and +1. sqlscore multiplies the returned coefficients by 2 to put them back on the same scale as glm, and adds the glmboost offset to the intercept before multiplying. Examples # A Gaussian GLM including factors mod <- glm(sepal.length ~ Sepal.Width + Petal.Length + Petal.Width + Species, data=datasets::iris) linpred(mod) # A binomial GLM - linear predictor is unaffected mod <- glm(sepal.length > 5.0 ~ Sepal.Width + Petal.Length + Petal.Width + Species, data=datasets::iris, family=binomial("logit")) linpred(mod) #With formula operators x <- matrix(rnorm(100*20),100,20) colnames(x) <- sapply(1:20, function(x) paste0("x", as.character(x))) x <- as.data.frame(x) mod <- glm(x2 ~ X3 + X5 + X15*X8, data=x) linpred(mod) score_expression Unevaluated prediction expressions for models Generate an unevaluated call corresponding to the predict step of the passed model. The call represents the response function of the linear predictor in terms of elementary functions on the underlying column names, and is suitable for direct translation into SQL.

select_statement 5 Usage score_expression(mod, response = NULL) Arguments mod response A supported model object. The name of a custom response function to apply to the linear predictor. Value An unevaluated R call object representing the response function of the linear predictor. Warning The Binomial models in glmboost return coefficients which are 1/2 the coefficients fit by a call to glm(..., family=binomial(...)), because the response variable is internally recoded to -1 and +1. sqlscore multiplies the returned coefficients by 2 to put them back on the same scale as glm, and adds the glmboost offset to the intercept before multiplying. Examples # A Gaussian GLM including factors mod <- glm(sepal.length ~ Sepal.Width + Petal.Length + Petal.Width + Species, data=datasets::iris) score_expression(mod) # A binomial GLM - linear predictor is unaffected mod <- glm(sepal.length > 5.0 ~ Sepal.Width + Petal.Length + Petal.Width + Species, data=datasets::iris, family=binomial("logit")) score_expression(mod) #With a hand-specified response function score_expression(mod, response="probit") #With formula operators x <- matrix(rnorm(100*20),100,20) colnames(x) <- sapply(1:20, function(x) paste0("x", as.character(x))) x <- as.data.frame(x) mod <- glm(x2 ~ X3 + X5 + X15*X8, data=x) score_expression(mod) select_statement Generate a SELECT statement from a model

6 select_statement Usage Generate a SELECT statement to score the passed model on a preexisting database table. The statement will generate predictions entirely in the database, with no need to fetch data into R. Models need not be GLMs, but their prediction steps must consist of applying a response function to a linear predictor. select_statement(mod, src_table, src_schema = NULL, src_catalog = NULL, pk = c("id"), response = NULL, con = NULL) Arguments mod src_table src_schema src_catalog pk response con A supported model object. The unqualified DB name of the source table. The DB schema of the source table. The DB catalog of the source table. A vector of primary key column names. The name of a custom response function to apply to the linear predictor. An optional DBI connection to control the details of SQL generation. Value A dbplyr SQL object representing the SELECT statement. Supported packages Specific packages and models that are known to work include: glm and lm from package:stats, cv.glmnet from package:glmnet, glmboost from package:mboost, and bayesglm from package:arm. Default S3 methods are for objects structured like those of class "glm", so models not listed here may work if they resemble those objects, but are not guaranteed to. Warning Note that if the model object transformed its training data before fitting (e.g., centering and scaling predictors), the generated SQL statement will not include those transformations. A future release may include that functionality, but centering and scaling in particular are difficult to do efficiently and portably in SQL. Examples # Basic select statements mod <- glm(sepal.length ~ Sepal.Width + Petal.Length + Petal.Width + Species, data=datasets::iris) select_statement(mod, src_table="tbl_name") select_statement(mod, src_table="tbl_name", src_schema="schema_name", src_catalog="catalog_name") select_statement(mod, src_table="tbl_name", src_schema="schema_name",

sqlscore 7 src_catalog="catalog_name", pk=c("lab", "specimen_id")) #With a custom response function select_statement(mod, src_table="tbl_name", src_schema="schema_name", response="probit") # With a model-derived non-identity response function mod <- glm(sepal.length > 5.0 ~ Sepal.Width + Petal.Length + Petal.Width + Species, data=datasets::iris, family=binomial("logit")) select_statement(mod, src_table="tbl_name") #With formula operators x <- matrix(rnorm(100*20),100,20) colnames(x) <- sapply(1:20, function(x) paste0("x", as.character(x))) x <- as.data.frame(x) mod <- glm(x2 ~ X3 + X5 + X15*X8, data=x) select_statement(mod, src_table="tbl_name") select_statement(mod, src_table="tbl_name", response="cauchit") sqlscore sqlscore: Utilities to score GLMs and related models in SQL. The sqlscore package provides utilities for generating sql queries (particularly CREATE TABLE statements) from R model objects. The most important use case is generating SQL to score a GLM or related model represented as an R object, in which case the package handles parsing formula operators and including the model s response function. The models scored need not be generalized linear models, strictly speaking, but their prediction steps must consist of applying a response function to a linear predictor. The package handles escaping and dealing with formula operators, and provides a way to use a custom response function if desired. Function overview The SQL-generating functions create_statement and select_statement do what their names suggest and generate CREATE TABLE and SELECT statements for model scoring. Helper functions include linpred(), which generates an R call object representing the linear predictor, and score_expression, an S3 generic that handles wrapping the linear predictor in the response function. Supported models Specific packages and models that are known to work include: glm and lm from package:stats, cv.glmnet from package:glmnet, glmboost from package:mboost, and bayesglm from package:arm. Default S3 methods are for objects structured like those of class "glm", so models not listed here may work if they resemble those objects, but are not guaranteed to.

Index create_statement, 2 linpred, 3 score_expression, 4 select_statement, 5 sqlscore, 7 sqlscore-package (sqlscore), 7 8