Package 'BayesLogit'

May 6, 2012


Version 0.1-0
Date 2012-05-06
Depends R (>= 2.14.0)
Title Logistic Regression
Author Nicholas G. Polson, James G. Scott, and Jesse Windle
Maintainer Jesse Windle <jwindle@ices.utexas.edu>
Description The BayesLogit package does posterior simulation for binary and
        multinomial logistic regression using the Polya-Gamma latent variable
        technique. This method is fully automatic, exact, and fast. A routine
        to efficiently sample from the Polya-Gamma class of distributions is
        included.
License GPL (>= 3)
Repository CRAN
Date/Publication 2012-05-06 14:11:24

R topics documented:

        logit
        logit.em
        mlogit
        rks
        rpg
        spambase
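For orientation, a minimal sketch of a typical workflow, using only the
functions and data documented in the topics below (logit and spambase); the
variable names are illustrative:

     library(BayesLogit)

     ## Thin the bundled spambase data for speed.
     data(spambase)
     sbase <- spambase[seq(1, nrow(spambase), 10), ]

     ## Design matrix and binary response, as in the logit examples below.
     X <- model.matrix(is.spam ~ word.freq.free + word.freq.1999, data = sbase)
     y <- sbase$is.spam

     ## Posterior simulation, then posterior means of the coefficients.
     out <- logit(y, X, samp = 1000, burn = 500)
     colMeans(out$beta)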

logit                   Default Bayesian Logistic Regression

Description

     Run a Bayesian logistic regression.

Usage

     logit(y, X, n=rep(1,length(y)),
           y.prior=0.5, x.prior=colMeans(as.matrix(X)), n.prior=1.0,
           samp=1000, burn=500)

Arguments

     y        An N dimensional vector; y_i is the average response at x_i.

     X        An N x P dimensional design matrix; x_i is the ith row.

     n        An N dimensional vector; n_i is the number of observations at
              each x_i.

     y.prior  Average response at x.prior.

     x.prior  Prior predictor variable.

     n.prior  Number of observations at x.prior.

     samp     The number of MCMC iterations saved.

     burn     The number of MCMC iterations discarded.

Details

     Logistic regression is a classification mechanism. Given the binary data
     {y_i} and the P-dimensional predictor variables {x_i}, one wants to
     forecast whether a future data point y* observed at the predictor x* will
     be zero or one. Logistic regression stipulates that the statistical model
     for observing a success = 1 or failure = 0 is governed by

         P(y = 1 | x, β) = (1 + exp(-x'β))^{-1}.

     Instead of representing the data as a collection of binary outcomes, one
     may record the average response y_i at each unique x_i together with the
     total number n_i of observations at x_i. We follow this method of
     encoding the data.

     Polson and Scott suggest placing a Jeffreys Beta prior Be(1/2, 1/2) on

         m(β) := P(y_0 = 1 | x_0, β) = (1 + exp(-x_0'β))^{-1},

     which generates a Z-distribution prior for β,

         p(β) ∝ exp(x_0'β / 2) / (1 + exp(x_0'β)).

     One may interpret this as "prior" data for which the average response at
     x_0 is 1/2, based upon a "single" observation. The default value is
     x_0 = colMeans(X), the average of the design points {x_i}.
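The aggregated encoding described above is easy to produce from raw 0/1
outcomes; the following base-R sketch (not part of the package; the data are
made up) collapses binary responses to average responses and counts at each
unique design row:

     ## Hypothetical raw binary data with repeated design rows.
     y01 <- c(1, 0, 1, 1, 0, 1)
     X   <- matrix(c(1, 1,
                     1, 1,
                     1, 2,
                     1, 2,
                     1, 2,
                     1, 3), ncol = 2, byrow = TRUE)

     key  <- apply(X, 1, paste, collapse = ",")   # label each unique row
     n    <- as.vector(table(key)[unique(key)])   # observations per unique row
     ybar <- as.vector(tapply(y01, key, mean)[unique(key)])  # average response
     Xu   <- X[!duplicated(key), , drop = FALSE]  # unique design rows

     ## (ybar, Xu, n) is the aggregated encoding accepted by logit(y, X, n).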

Value

     logit returns a list.

     beta     A samp x P array; the posterior sample of the regression
              coefficients.

     w        A samp x N array; the posterior sample of the latent variables.
              WARNING: the number of columns may be less than the input N if
              the data are combined.

     y        The response vector, which may differ from the input if the
              data are combined.

     X        The design matrix, which may differ from the input if the data
              are combined.

     n        The number of observations at each row, which may differ from
              the input if the data are combined.

References

     Nicholas G. Polson, James G. Scott, and Jesse Windle. Bayesian inference
     for logistic models using Polya-Gamma latent variables.
     http://arxiv.org/abs/1205.0310

     Nicholas G. Polson and James G. Scott. Default Bayesian analysis for
     multi-way tables: a data-augmentation approach.
     http://arxiv.org/pdf/1109.4180

See Also

     rpg, logit.em, mlogit

Examples

     ## From the UCI Machine Learning Repository.
     data(spambase)

     ## A subset of the data.
     sbase = spambase[seq(1, nrow(spambase), 10),]

     X = model.matrix(is.spam ~ word.freq.free + word.freq.1999, data=sbase)
     y = sbase$is.spam

     ## Run the logistic regression.
     output = logit(y, X, samp=1000, burn=100)
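Given the returned list, posterior summaries are ordinary matrix operations;
a short sketch continuing the example above (the summaries shown are
illustrative, not package functions):

     ## Posterior means and 95% credible intervals from the samp x P array.
     post.mean <- colMeans(output$beta)
     post.ci   <- apply(output$beta, 2, quantile, probs = c(0.025, 0.975))

     ## Posterior predictive probabilities at the (possibly combined) design
     ## points returned in output$X.
     p.hat <- rowMeans(1 / (1 + exp(-output$X %*% t(output$beta))))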

logit.em                Logistic Regression Expectation Maximization

Description

     Expectation maximization for logistic regression.

Usage

     logit.em(y, X, n=rep(1,length(y)),
              y.prior=0.5, x.prior=colMeans(as.matrix(X)), n.prior=1.0,
              tol=1e-9, max.iter=100)

Arguments

     y         An N dimensional vector; y_i is the average response at x_i.

     X         An N x P dimensional design matrix; x_i is the ith row.

     n         An N dimensional vector; n_i is the number of observations at
               each x_i.

     y.prior   Average response at x.prior.

     x.prior   Prior predictor variable.

     n.prior   Number of observations at x.prior.

     tol       Threshold at which the algorithm stops.

     max.iter  Maximum number of iterations.

Details

     The model, the aggregated encoding of the data, and the default
     Z-distribution prior are the same as those described in the Details
     section of logit. Rather than sampling from the posterior, logit.em
     computes the posterior mode by expectation maximization.

Value

     beta     The posterior mode of the regression coefficients.

     iter     The number of iterations.

References

     Nicholas G. Polson, James G. Scott, and Jesse Windle. Bayesian inference
     for logistic models using Polya-Gamma latent variables.
     http://arxiv.org/abs/1205.0310

     Nicholas G. Polson and James G. Scott. Default Bayesian analysis for
     multi-way tables: a data-augmentation approach.
     http://arxiv.org/pdf/1109.4180

See Also

     rpg, logit, mlogit

Examples

     ## From the UCI Machine Learning Repository.
     data(spambase)

     ## A subset of the data.
     sbase = spambase[seq(1, nrow(spambase), 10),]

     X = model.matrix(is.spam ~ word.freq.free + word.freq.1999, data=sbase)
     y = sbase$is.spam

     ## Run logistic regression by expectation maximization.
     output = logit.em(y, X)
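Because logit.em returns the posterior mode while logit draws from the same
posterior, the two can be cross-checked; a brief sketch continuing the example
above (for a well-behaved posterior the mode and mean should be broadly
similar):

     em.fit <- logit.em(y, X)
     mc.fit <- logit(y, X, samp = 2000, burn = 500)
     cbind(em.mode = em.fit$beta, mcmc.mean = colMeans(mc.fit$beta))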

mlogit                  Bayesian Multinomial Logistic Regression

Description

     Run a Bayesian multinomial logistic regression.

Usage

     mlogit(y, X, n=rep(1,nrow(as.matrix(y))),
            m.0=array(0, dim=c(ncol(X), ncol(y))),
            P.0=array(0, dim=c(ncol(X), ncol(X), ncol(y))),
            samp=1000, burn=500)

Arguments

     y        An N x (J-1) dimensional matrix; y_ij is the average response
              for category j at x_i.

     X        An N x P dimensional design matrix; x_i is the ith row.

     n        An N dimensional vector; n_i is the total number of
              observations at each x_i.

     m.0      A P x (J-1) matrix of prior means for the β_j.

     P.0      A P x P x (J-1) array of prior precision matrices for the β_j.

     samp     The number of MCMC iterations saved.

     burn     The number of MCMC iterations discarded.

Details

     Multinomial logistic regression is a classification mechanism. Given the
     multinomial data {y_i} with J categories and the P-dimensional predictor
     variables {x_i}, one wants to forecast the category of a future data
     point y* observed at the predictor x*. Multinomial logistic regression
     stipulates that the statistical model for observing category j after
     rolling the multinomial die n = 1 time is governed by

         P(y = j | x, β, n = 1) = exp(x'β_j) / Σ_{k=1}^{J} exp(x'β_k).

     Instead of representing the data as the total number of responses in
     each category, one may record the average number of responses in each
     category together with the total number of responses n_i at x_i. We
     follow this method of encoding the data.

     We assume that β_J = 0 for purposes of identification!

     You may use mlogit for binary logistic regression with a normal prior.

Value

     mlogit returns a list.

     beta     A samp x P x (J-1) array; the posterior sample of the
              regression coefficients.

     w        A samp x N x (J-1) array; the posterior sample of the latent
              variables. WARNING: the number of rows may be less than the
              input N if the data are combined.

     y        The response matrix, which may differ from the input if the
              data are combined.

     X        The design matrix, which may differ from the input if the data
              are combined.

     n        The number of samples at each observation, which may differ
              from the input if the data are combined.

References

     Nicholas G. Polson, James G. Scott, and Jesse Windle. Bayesian inference
     for logistic models using Polya-Gamma latent variables.
     http://arxiv.org/abs/1205.0310

See Also

     rpg, logit.em, logit

Examples

     ## Use the iris data set.
     N = nrow(iris)
     P = ncol(iris)
     J = nlevels(iris$Species)

     X = model.matrix(Species ~ ., data=iris)
     y.all = model.matrix(~ Species - 1, data=iris)
     y = y.all[,-J]

     out = mlogit(y, X, samp=1000, burn=100)
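Since β_J = 0 by the identification above, fitted category probabilities are
a softmax over the posterior draws with a zero column appended for category J;
an illustrative sketch continuing the iris example (the helper names are ours,
not the package's):

     ## Posterior-mean coefficients: P x (J-1), then append the beta_J = 0
     ## column fixed by the identification constraint.
     beta.hat <- apply(out$beta, c(2, 3), mean)
     beta.all <- cbind(beta.hat, 0)

     ## Softmax of X beta gives fitted category probabilities per row.
     eta  <- out$X %*% beta.all
     prob <- exp(eta) / rowSums(exp(eta))
     head(prob)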

rks                     The Kolmogorov-Smirnov Distribution

Description

     Generate a random variate from the Kolmogorov-Smirnov distribution. This
     is not directly related to the Polya-Gamma technique, but it is a nice
     example of using an alternating sum to generate a random variate.

Usage

     rks(N=1)

Arguments

     N        The number of random variates to generate.

Details

     The density function of the KS distribution is

         f(x) = 8 Σ_{n=1}^{∞} (-1)^{n+1} n^2 x exp(-2 n^2 x^2).

     We follow Devroye (1986), p. 161, to generate random draws from KS.

References

     L. Devroye. Non-Uniform Random Variate Generation, 1986.

Examples

     X = rks(1000)
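The distribution function matching the density above has its own alternating
series, F(x) = 1 - 2 Σ_{n=1}^{∞} (-1)^{n+1} exp(-2 n^2 x^2), which gives a
simple sanity check on the draws; an illustrative sketch (the truncation at
100 terms is our choice):

     ## Compare the empirical CDF of rks() draws with the truncated series.
     ks.cdf <- function(x, terms = 100) {
       n <- 1:terms
       sapply(x, function(xi) 1 - 2 * sum((-1)^(n + 1) * exp(-2 * n^2 * xi^2)))
     }

     X  <- rks(10000)
     xs <- seq(0.3, 2.0, by = 0.1)
     round(cbind(x = xs, empirical = ecdf(X)(xs), series = ks.cdf(xs)), 3)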

rpg                     The Polya-Gamma Distribution

Description

     Generate a random variate from the Polya-Gamma distribution.

Usage

     rpg.gamma(num=1, n=1, z=0.0, trunc=200)
     rpg.devroye(num=1, n=1, z=0.0)

Arguments

     num      The number of random variates to simulate.

     n        Degrees of freedom.

     z        Parameter associated with tilting.

     trunc    The number of elements used in the sum-of-gammas approximation.

Details

     A random variable X with distribution PG(n, z) is generated by

         X = 2 Σ_{k=1}^{∞} G_k(n, 1) / ((k - 1/2)^2 4π^2 + z^2),

     where the G_k(n, 1) are independent Gamma(n, 1) variates. The density of
     X may be derived from the PG(n, 0) density as

         p(x | n, z) ∝ exp(-z^2 x / 2) p(x | n, 0).

     Thus PG(n, z) is an exponentially tilted PG(n, 0).

     Two different methods for generating this random variable are
     implemented. In general, you may use rpg.gamma to generate an
     approximation of PG(n, z) using the truncated sum-of-gammas
     representation above. When n is a natural number you may use rpg.devroye
     to sample PG(n, z). The latter method is fast. You may call either
     routine when n and z are vectors.

Value

     These functions return num Polya-Gamma samples.

References

     Nicholas G. Polson, James G. Scott, and Jesse Windle. Bayesian inference
     for logistic models using Polya-Gamma latent variables.
     http://arxiv.org/abs/1205.0310

See Also

     logit.em, logit, mlogit

Examples

     a = c(1, 2, 3)
     b = c(4, 5, 6)

     ## If a is all integers, use the Devroye-like method.
     X = rpg.devroye(100, a, b)

     a = c(1.2, 2.3, 3.2)
     b = c(4, 5, 6)

     ## If a has non-integer values, use the sum-of-gammas method.
     X = rpg.gamma(100, a, b)
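Either sampler can be checked against the known mean of the Polya-Gamma
family, E[PG(n, z)] = (n / (2z)) tanh(z / 2), from the Polson, Scott, and
Windle paper cited above; a small illustrative sketch (the sample sizes are
arbitrary):

     n <- 2; z <- 1.5
     pg.mean <- (n / (2 * z)) * tanh(z / 2)

     ## Devroye method requires natural-number n; the gamma method does not.
     c(theory  = pg.mean,
       devroye = mean(rpg.devroye(10000, n, z)),
       gamma   = mean(rpg.gamma(10000, n, z)))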

spambase                Spambase Data

Description

     The spambase data set has 57 real-valued explanatory variables which
     characterize the contents of an email and one binary response variable
     indicating whether the email is spam. There are 4601 observations.

Format

     A data frame: the first column is a binary response variable indicating
     whether the email is spam. The remaining 57 columns are real-valued
     explanatory variables. Of the 57 explanatory variables, 48 describe word
     frequency, 6 describe character frequency, and 3 describe sequences of
     capital letters.

     word.freq.<word>
          A continuous explanatory variable describing the frequency with
          which the word <word> appears; measured in percent.

     char.freq.<char>
          A continuous explanatory variable describing the frequency with
          which the character <char> appears; measured in percent.

     capital.run.length.<oper>
          A statistic involving the lengths of runs of consecutive capital
          letters.

     Use names to see the specific words, characters, or statistics for each
     respective class of variable.

Source

     Mark Hopkins, Erik Reeber, George Forman, and Jaap Suermondt of
     Hewlett-Packard Labs (1999). Spambase Data Set.
     http://archive.ics.uci.edu/ml/datasets/spambase

     Frank, A. & Asuncion, A. (2010). UCI Machine Learning Repository
     [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California,
     School of Information and Computer Science.
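As the note above suggests, names gives the concrete variable list; a short
illustrative sketch grouping the columns by their documented prefixes:

     data(spambase)
     grep("^word.freq",          names(spambase), value = TRUE)
     grep("^char.freq",          names(spambase), value = TRUE)
     grep("^capital.run.length", names(spambase), value = TRUE)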

Index

     Topic Polya-Gamma
          rpg

     Topic datasets
          spambase

     Topic logit
          logit, logit.em, mlogit

     Topic polyagamma
          rpg

     Topic regression
          logit, logit.em, mlogit

     Topic rks
          rks

     Topic rpg
          rpg