The BGLR (Bayesian Generalized Linear Regression) R-Package. Gustavo de los Campos, Amit Pataki & Paulino Pérez (August 2013)


Biostatistics Department
Bayesian Generalized Linear Regression (BGLR)
(contact: gdeloscampos@gmail.com)

Contents
1. Introduction
2. Structure of the software
3. Running BGLR
3.1. Loading the BGLR package
3.2. Fitting a fixed effects model to a continuous outcome
3.3. Fitting a fixed effects model to a binary outcome
3.4. Fitting a fixed effects model to a right-censored outcome
3.5. Fitting marker effects as random
3.6. Extracting estimates of marker effects and predictions
3.7. Predicting un-observed outcomes using BGLR

1. Introduction
The BLR (Bayesian Linear Regression, http://cran.r-project.org/web/packages/BLR/index.html) package of R (http://cran.r-project.org) implements several types of Bayesian regression models. These include fixed effects, the Bayesian Lasso (BL, Park and Casella 2008) and Bayesian Ridge Regression. BLR can only handle continuous outcomes. We have produced a modified (beta) version of BLR, called BGLR (Bayesian Generalized Linear Regression), that extends BLR to regressions for binary and censored outcomes. Most of the inputs, processes and outputs are as in BLR. Here we focus on the changes in inputs, internal processes and outputs that were introduced to handle binary and censored outcomes. Users who are not familiar with BLR are strongly encouraged to first read the BLR user's manual and Pérez et al. (2010). Future developments will be released first on the R-Forge webpage (https://r-forge.r-project.org/projects/bglr/) and subsequently as R packages.

Censored outcomes. In BGLR, censored outcomes are treated as a missing data problem. BGLR handles three types of censoring: left, right and interval censoring.
- Interval censoring: the information available for a data point is $a_i < y_i < b_i$. Here $a_i$ and $b_i$ are known lower and upper bounds, and $y_i$ is the actual phenotype, which for censored data points is unobserved.
- Right censoring: $b_i$ is unknown, so the only information available is $a_i < y_i$. In a time-to-event setting this means we know that the time to event exceeded the time at censoring, given by $a_i$.
- Left censoring: $a_i$ is unknown, so the only information available is $y_i < b_i$.

In BGLR, censored outcomes are specified with three vectors: $y = \{y_i\}$, $a = \{a_i\}$ and $b = \{b_i\}$. The table below describes the configuration of the triplet $\{a_i, y_i, b_i\}$ for un-censored, right-censored, left-censored and interval-censored data points.

Table 1. Configuration of the triplet {a_i, y_i, b_i}

                      a_i      y_i     b_i
Un-censored           NA       y_i     NA
Right-censored        a_i      NA      ∞
Left-censored         -∞       NA      b_i
Interval-censored     a_i      NA      b_i

Relative to BLR, handling censored data points requires only one change to the Gibbs sampler. At each iteration, the censored phenotypes are sampled from their fully-conditional densities, which in BGLR are truncated normal densities.

Binary outcomes are modeled using the threshold model, i.e., a probit link. The probability of success is $p(y_i = 1) = \Phi(\eta_i)$. Here $\Phi(\cdot)$ is the standard normal cumulative distribution function (so the link function is the probit, $\Phi^{-1}$), and $\eta_i$ is a linear predictor that can include any of the fixed or random effects handled by BGLR. To run a regression for a binary outcome, code the response with 0s (failure) and 1s (success) and set the argument response_type to 'ordinal'. Further details are given in the examples below.

2. Structure of the software
The program is provided as an R package that can be downloaded from http://r-forge.r-project.org/r/?group_id=55. The package includes several datasets. Here we describe the wheat dataset, which has been used in several publications.

The wheat dataset contains information for 599 lines of wheat:
- phenotypic data (Y, 4 traits);
- marker data (X, 1,279 markers);
- pedigree data (A, a matrix of kinship coefficients derived from the pedigree).

The data can be loaded within R by typing library(BGLR) and then data(wheat). Further details about these data can be found in Crossa et al. (2010).

3. Running BGLR
In this section we introduce examples that illustrate the use of the BGLR package for regressions on molecular markers and other covariates.

3.1. Loading the BGLR package
Box 1 provides the code required to load BGLR.

Box 1. Loading BGLR
setwd(tempdir()) # set working directory
library(BGLR)

3.2. Fitting a fixed effects model to a continuous outcome
In the following example (Box 2) we illustrate how to fit a fixed effects linear model to a continuous outcome using BGLR. The code proceeds in five steps:
- Loading. The first lines load the program and the wheat dataset, which contains phenotypic and genotypic information on 599 pure lines of wheat. This dataset is also available with the BLR package (de los Campos and Pérez 2010).
- Simulation. Phenotypes are simulated from the first four marker columns (block "#simulation of data").
- Prior. The prior assigned to the residual variance is defined by DF and S. Details about the priors used in BGLR, and on how to choose hyper-parameters, are given in Pérez et al. (2010).
- Fitting. The linear model is fitted with BGLR(). The argument y provides the phenotypes, which for continuous outcomes must be a numeric vector. The argument ETA is a list of predictors whose effects are treated as fixed. We also set the number of iterations of the Gibbs sampler (nIter=6000) and the number of iterations to discard as burn-in (burnIn).
- Comparison. For comparison, the box fits the same linear model by ordinary least squares using the lm() function. Results from BGLR and lm are displayed in Figure 1, which is produced by the last lines of Box 2.

Box 2. Fitting a fixed effects model to a continuous outcome
rm(list=ls())
setwd(tempdir())
#loads BGLR & Data
library(BGLR)
data(wheat)
X<-wheat.X
#simulation of data
X<-X[,1:4]
N<-nrow(X)
b<-c(-2,2,-1,1)
error<-rnorm(N)
y<-as.vector(X%*%b+error)
#fits model using BGLR
DF<-5
S<-var(y)/2*(DF-2)
ETA<-list(list(X=X,model='FIXED'))
fm1<-BGLR(y=y,ETA=ETA,nIter=6000,burnIn=1000,df0=DF,S0=S)
#fits the same model using lm()
fm2<-lm(y~X)
#compares results from BGLR() & lm()
plot(fm1$ETA[[1]]$b~coef(fm2)[-1],pch=19,col=2,cex=1.5,xlab="lm()",ylab="BGLR()")
abline(a=0,b=1,lty=2)
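The hyper-parameter S<-var(y)/2*(DF-2) in Box 2 can be read as a statement about the prior mean of the residual variance. The base-R sketch below assumes a scaled-inverse chi-square parameterization, $\sigma^2 = S/\chi^2_{DF}$, with prior mean $S/(DF-2)$ and mode $S/(DF+2)$. Under that assumption the prior is centered at half of the phenotypic variance. The phenotypes are simulated placeholders, not the wheat data.

```r
# Sketch: implied prior mean of the residual variance for S0 = var(y)/2*(df0-2).
# Assumption: sigma2 = S / chisq(DF), a scaled-inverse chi-square prior.
set.seed(1)
y <- rnorm(599, mean = 0, sd = 2)   # placeholder phenotypes (not the wheat data)

DF <- 5
S <- var(y) / 2 * (DF - 2)          # hyper-parameter as in Box 2

prior_mean <- S / (DF - 2)          # analytic mean, finite for DF > 2
prior_mode <- S / (DF + 2)          # analytic mode

# Monte Carlo check of the analytic mean
draws <- S / rchisq(2e5, df = DF)
mc_mean <- mean(draws)

cat("var(y)/2:", var(y) / 2, "\n")
cat("prior mean:", prior_mean, " Monte Carlo:", mc_mean, "\n")
cat("prior mode:", prior_mode, "\n")
```

Tying S to DF in this way keeps the prior mean at var(y)/2 for any DF > 2. Increasing DF only makes the prior more concentrated around that value.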

[Figure 1. Estimated effects in a linear model for a continuous outcome (BGLR vs lm); vertical axis: BGLR(), horizontal axis: lm().]

3.3. Fitting a fixed effects model to a binary outcome
We now turn to an example involving a binary outcome. Using the same simulation as in Box 2, we generate a binary outcome by dichotomizing the simulated phenotype (ybin<-ifelse(y>0,1,0) in Box 3). The model is fitted with BGLR(). For comparison, it is also fitted with the glm() function of R using a probit link. In BGLR we set response_type='ordinal' to indicate that the response is binary. For binary outcomes there is no residual variance parameter, because in the threshold model the residual variance of the latent variable is fixed at one. Therefore, no prior for it needs to be provided in this example. Estimates of effects derived using BGLR and glm are given in Figure 2.

Box 3. Fitting a fixed effects model to a binary outcome
rm(list=ls())
setwd(tempdir())
#loads BGLR & Data
library(BGLR)
data(wheat)
X<-wheat.X
#simulation of data
X<-X[,1:4]
N<-nrow(X)
b<-c(-2,2,-1,1)
error<-rnorm(N)
y<-as.vector(X%*%b+error)
ybin<-ifelse(y>0,1,0)
#fits models
ETA<-list(list(X=X,model='FIXED'))
fm1<-BGLR(y=ybin,response_type='ordinal',ETA=ETA,nIter=6000,burnIn=1000)
fm2<-glm(ybin~X,family=binomial(link='probit'))
plot(fm1$ETA[[1]]$b~coef(fm2)[-1],pch=19,col=2,cex=1.5,xlab="glm()",ylab="BGLR()")
abline(a=0,b=1,lty=2)

[Figure 2. Estimated effects in a fixed effects model for a binary outcome (BGLR vs glm); vertical axis: BGLR(), horizontal axis: glm().]
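The threshold model behind response_type='ordinal' can be illustrated in base R without BGLR. A latent liability $l_i = \eta_i + \varepsilon_i$, with $\varepsilon_i \sim N(0,1)$, is dichotomized at zero, so that $p(y_i = 1) = \Phi(\eta_i)$. The sketch below uses hypothetical 0/1 markers and effect sizes (not the wheat data). It shows two things:
- a probit glm() recovers the simulated effects;
- rescaling the liability leaves the binary outcome unchanged, which is why no residual variance is estimated.

```r
# Sketch of the threshold (probit) model for a binary outcome.
set.seed(2)
n <- 5000
X <- matrix(rbinom(n * 4, size = 1, prob = 0.5), nrow = n)  # hypothetical 0/1 markers
b <- c(-0.5, 0.5, -0.25, 0.25)                               # hypothetical effects
eta <- as.vector(X %*% b)                                    # linear predictor

liability <- eta + rnorm(n)               # latent variable, residual variance 1
ybin <- ifelse(liability > 0, 1, 0)       # dichotomize at the threshold 0

# Under the model, P(y = 1 | eta) = pnorm(eta)
p <- pnorm(eta)

# A probit GLM should recover b (plus an intercept close to 0)
fit <- glm(ybin ~ X, family = binomial(link = "probit"))
b_hat <- unname(coef(fit)[-1])

# Rescaling the liability does not change the observed binary data,
# so the residual variance is not identified and is fixed at 1.
ybin_scaled <- ifelse(3 * liability > 0, 1, 0)

print(round(cbind(true = b, glm = b_hat), 3))
```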

3.4. Fitting a fixed effects model to a right-censored outcome
We now illustrate how to use BGLR to fit a model to a right-censored outcome; the code is given in Box 4. The beginning of the code is as in the examples of Boxes 2 and 3. We then right-censor a random subset of the records and fit the model with BGLR(). Relative to un-censored outcomes (see the example in Box 2), the only difference is that the response is specified via three vectors (y, a, b). These are defined using the conventions explained in Table 1. For comparison, we fit the same model using the survreg() function of the survival package. Figure 3 gives estimates of effects derived from survreg() and BGLR().

[Figure 3. Estimated effects in a fixed effects model for a right-censored outcome (BGLR vs survreg); vertical axis: BGLR(), horizontal axis: survreg().]
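The triplet conventions described in Section 1 can be encoded in a small helper, which makes the role of each of a, y and b explicit before writing Box 4. make_triplet() is a hypothetical function introduced for illustration; it is not part of BGLR. The last lines show how right-censored and un-censored triplets map to the (time, event) form that Box 4 passes to Surv().

```r
# Sketch: build the (a, y, b) triplet for one data point, following the
# censoring conventions of Section 1. make_triplet() is hypothetical.
make_triplet <- function(type, value = NA, lower = NA, upper = NA) {
  switch(type,
    uncensored = c(a = NA,    y = value, b = NA),
    right      = c(a = lower, y = NA,    b = Inf),
    left       = c(a = -Inf,  y = NA,    b = upper),
    interval   = c(a = lower, y = NA,    b = upper),
    stop("unknown censoring type: ", type)
  )
}

trip <- rbind(
  make_triplet("uncensored", value = 1.3),
  make_triplet("right", lower = 0.4),
  make_triplet("left", upper = -0.2),
  make_triplet("interval", lower = -1, upper = 0.5)
)
rownames(trip) <- c("uncensored", "right", "left", "interval")
print(trip)

# Right-censored and un-censored records in survival's (time, event) form,
# built the same way as in Box 4
ycen <- trip[c("uncensored", "right"), "y"]
a <- trip[c("uncensored", "right"), "a"]
event <- ifelse(is.na(ycen), 0, 1)   # 1 = observed, 0 = censored
time <- ifelse(is.na(ycen), a, ycen) # censoring time or observed value
```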

Box 4. Fitting a fixed effects model to a censored outcome
rm(list=ls())
setwd(tempdir())
#loading libraries
library(BGLR)
library(survival)
#loading data included in BGLR
data(wheat)
#simulation of data
X<-wheat.X[,1:4]
N<-nrow(X)
b<-c(-2,2,-1,1)
error<-rnorm(N)
y<-as.vector(X%*%b+error)
#right-censored records
cen<-sample(1:N,size=200)
ycen<-y
ycen[cen]<-NA
a<-rep(NA,N)
b<-rep(NA,N)
a[cen]<-y[cen]-runif(min=0,max=1,n=200)
b[cen]<-Inf
DF<-5
S<-var(y)/2*(DF-2)
ETA<-list(list(X=X,model='FIXED'))
#fits the model using BGLR (censored response given by ycen, a, b)
fm1<-BGLR(y=ycen,a=a,b=b,ETA=ETA,nIter=6000,burnIn=1000,df0=DF,S0=S)
#fits the model using survreg
event<-ifelse(is.na(ycen),0,1)
time<-ifelse(is.na(ycen),a,ycen)
surv.object<-Surv(time=time,event=event,type='right')
fm2<-survreg(surv.object~X,dist="gaussian")
plot(fm1$ETA[[1]]$b~coef(fm2)[-1],pch=19,col=2,cex=1.5,xlab="survreg()",ylab="BGLR()")
abline(a=0,b=1,lty=2)
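Section 1 notes that, at each Gibbs iteration, censored phenotypes are drawn from truncated normal fully-conditional densities. In this setting the mean would be the current linear predictor and the SD the square root of the current residual variance. The base-R sketch below draws from a normal truncated to (a, b) by inverse-CDF sampling and checks the result against the analytic mean of a truncated standard normal. rtnorm_sketch() is a hypothetical helper, and BGLR's internal sampler may be implemented differently.

```r
# Sketch: inverse-CDF draws from N(mean, sd^2) truncated to (a, b).
# rtnorm_sketch() is hypothetical, not a BGLR function.
rtnorm_sketch <- function(n, mean, sd, a = -Inf, b = Inf) {
  lo <- pnorm(a, mean, sd)
  hi <- pnorm(b, mean, sd)
  u <- runif(n, lo, hi)          # uniform on the truncated CDF range
  qnorm(u, mean, sd)             # map back to the phenotype scale
}

set.seed(3)
# Right-censored point: a = 0.5, b = Inf, linear predictor 0, residual SD 1
z <- rtnorm_sketch(1e5, mean = 0, sd = 1, a = 0.5)

# Analytic mean of N(0, 1) truncated below at 0.5
alpha <- 0.5
analytic_mean <- dnorm(alpha) / (1 - pnorm(alpha))
cat("simulated:", mean(z), " analytic:", analytic_mean, "\n")

# Interval-censored point: a = -1, b = 0.5
w <- rtnorm_sketch(1e4, mean = 0.2, sd = 0.8, a = -1, b = 0.5)
```

The inverse-CDF approach is simple, but it loses accuracy far in the tails, where pnorm(a) is very close to 1. Dedicated truncated-normal samplers handle that case more robustly.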

3.5. Fitting marker effects as random
We now turn to using BGLR to fit a Whole-Genome Regression (WGR) model to continuous, binary or censored outcomes. In these models the number of predictors typically exceeds the number of phenotypes, so shrinkage estimation procedures are commonly used. BGLR offers several shrinkage (Bayesian) estimation methods, for example Bayesian Ridge Regression (BRR) and the Bayesian Lasso (BL, Park and Casella 2008). Here we illustrate how to fit models for continuous, binary and censored outcomes using the BL.

For the BL we need to provide a prior for the regularization parameter (λ), which controls how strongly the estimates of effects are shrunk. Pérez et al. (2010) discuss how to choose these hyper-parameters based on prior information about trait heritability and on the number of markers involved.

In the example given in Box 5 we fit the BL using the 1,279 markers available for the 599 lines of the wheat dataset. The code proceeds as follows:
- After loading BGLR and the wheat dataset, we extract one of the four phenotypes (used as a continuous response) and the genotypes.
- We generate a right-censored outcome by censoring a random subset of the 599 records. These lines prepare the triplets (y, a, b) needed to specify the censored outcome in BGLR.
- Finally, we generate a binary outcome and fit the three models.

As the number of markers in the model increases, so does the number of iterations required for convergence. In Box 5, and only for illustration purposes, we use a relatively short chain; convergence with large p may require much longer chains. Box 6 gives code that illustrates how to extract estimates of marker effects and predictions from the fitted models.

Box 5. Fitting a Whole Genome Regression using the Bayesian LASSO for continuous, censored and binary outcomes
rm(list=ls())
setwd(tempdir())
#loading libraries
library(BGLR)
library(survival)
data(wheat)
#extracts phenotypes
#continuous
y<-wheat.Y[,1]
#Extract genotypes
X<-wheat.X
n<-length(y)
#censored
cen<-sample(1:n,size=200)
ycen<-y
ycen[cen]<-NA ; a<-rep(NA,n) ; b<-rep(NA,n)
a[cen]<-y[cen]-runif(min=0,max=1,n=200)
b[cen]<-Inf
#binary
ybin<-ifelse(y>0,1,0)
#prior
DF<-5
S<-var(y)/2*(DF-2)
#models
ETA<-list(list(X=X,model='BL',lambda=5,type='gamma',rate=1e-4,shape=0.55))
fm1<-BGLR(y=y,ETA=ETA,nIter=12000,burnIn=2000,df0=DF,S0=S)
fm2<-BGLR(y=ycen,a=a,b=b,ETA=ETA,nIter=12000,burnIn=2000,df0=DF,S0=S)
fm3<-BGLR(y=ybin,response_type='ordinal',ETA=ETA,nIter=12000,burnIn=2000)
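The shrinkage induced by the BL comes from a double-exponential (Laplace) prior on marker effects. Park and Casella (2008) write this prior as a scale mixture of normals: $\beta_j \mid \tau_j^2 \sim N(0, \sigma^2\tau_j^2)$ with $\tau_j^2 \sim \text{Exp}(\lambda^2/2)$, which integrates to a Laplace density with rate $\lambda/\sigma$. The base-R sketch below checks this hierarchy with $\sigma^2 = 1$ and lambda = 5, the value used in Box 5. It illustrates the prior structure only. BGLR's internal parameterization, for example how the gamma prior set by rate and shape is placed on λ, is not assumed here.

```r
# Sketch: the Bayesian Lasso prior as a normal scale mixture
# (Park & Casella 2008), with sigma^2 = 1 for simplicity.
set.seed(4)
lambda <- 5
n_draws <- 2e5

tau2 <- rexp(n_draws, rate = lambda^2 / 2)          # mixing variances
beta <- rnorm(n_draws, mean = 0, sd = sqrt(tau2))   # effects given tau2

# A Laplace density with rate lambda has E|beta| = 1/lambda
# and Var(beta) = 2/lambda^2.
cat("E|beta|:", mean(abs(beta)), " vs 1/lambda =", 1 / lambda, "\n")
cat("Var(beta):", var(beta), " vs 2/lambda^2 =", 2 / lambda^2, "\n")

# A larger lambda concentrates the prior more tightly around zero
beta_strong <- rnorm(n_draws, 0, sqrt(rexp(n_draws, rate = 20^2 / 2)))
```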

3.6. Extracting estimates of marker effects and predictions
Box 6 illustrates how to extract three quantities:
- the estimated posterior means of marker effects (fm1$ETA[[1]]$b);
- their posterior standard deviations (fm1$ETA[[1]]$SD.b);
- the posterior means of the linear predictor (fm1$yHat).

For continuous and censored outcomes, the posterior mean of the linear predictor is an estimate of the conditional expectation of the response. For binary outcomes BGLR uses the probit link. An estimate of the expected value of the response, that is, the probability of success, is therefore obtained by evaluating the standard normal cumulative distribution function at the posterior mean of the linear predictor (pnorm(fm3$yHat) in Box 6).

Box 6. Extracting and displaying estimates of marker effects and predictions
##Volcano plot (posterior SD vs estimated effects)
plot(fm1$ETA[[1]]$SD.b~fm1$ETA[[1]]$b,col=2,main='Volcano Plot (continuous outcome)',xlab='Estimated Effect',ylab='Est. Posterior SD')
##Estimated effects, continuous versus censored
plot(fm1$ETA[[1]]$b~fm2$ETA[[1]]$b,col=2,main='Estimated Effects',xlab='Censored',ylab='Continuous')
##Predictions: continuous versus censored outcome
plot(fm1$yHat~fm2$yHat,col=2,main='Predictions',xlab='Censored',ylab='Continuous')
##Estimated effects, continuous versus binary
plot(fm1$ETA[[1]]$b~fm3$ETA[[1]]$b,col=2,main='Estimated Effects',xlab='Binary',ylab='Continuous')
##Predictions: continuous versus binary outcome
plot(fm1$yHat~pnorm(fm3$yHat),col=2,main='Predictions',xlab='Binary (probability)',ylab='Continuous')
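For a binary outcome, the last plot in Box 6 maps the posterior mean of the linear predictor to the probability scale with pnorm(). The base-R sketch below does the same for a hypothetical vector eta_hat standing in for fm3$yHat. It also shows that classifying at a probability of 0.5 is equivalent to classifying by the sign of the linear predictor, since $\Phi(0) = 0.5$.

```r
# Sketch: from the posterior mean of the linear predictor to probabilities.
# eta_hat is a hypothetical stand-in for fm3$yHat in Box 6.
eta_hat <- c(-1.2, -0.3, 0, 0.4, 1.5)

prob <- pnorm(eta_hat)                    # estimated P(y = 1)
pred_class <- ifelse(prob > 0.5, 1, 0)    # classify at probability 0.5

print(data.frame(eta_hat, prob = round(prob, 3), pred_class))
```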

3.7. Predicting un-observed outcomes using BGLR
We close this note by illustrating how to use BGLR to predict yet-to-be-observed phenotypes. In principle there are at least two ways of doing this:
- Partition the data (both predictors and response) into a training and a validation dataset. The training dataset is provided to BGLR to derive parameter estimates, which can then be used to predict the observations in the validation dataset.
- Provide the whole dataset to BGLR, with the response values of the validation observations replaced by missing values. BGLR also returns predictions for these data points, and these predictions can be used to assess how well the model predicts un-observed phenotypes.

For continuous and binary outcomes, the second approach only requires setting the entries of y that belong to the validation dataset to NA (see the example below). For censored outcomes, the triplets of the validation set must be set to $(a_i = -\infty,\ y_i = \text{NA},\ b_i = \infty)$ so that they are completely un-informative.

Prediction of binary outcomes. The example in Box 7 illustrates how to derive predictions for a validation dataset when the outcome is binary. The first part of the code loads the libraries and the wheat dataset and sets the predictors and the prior for the BL; these lines are essentially as in our previous examples. We then create a validation set by setting randomly chosen entries of the response to missing values, and fit the model. The last two lines calculate the mean-squared prediction error and the area under the ROC curve (AUC).
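The masking step described above can be written as a small base-R helper. mask_validation() is a hypothetical name introduced for illustration; it is not part of BGLR. For continuous and binary outcomes it sets the validation entries of y to NA. For censored outcomes it also sets a to -Inf and b to Inf, so those records carry no information.

```r
# Sketch: mask a validation set before calling BGLR().
# mask_validation() is a hypothetical helper, not part of BGLR.
mask_validation <- function(y, tst, a = NULL, b = NULL) {
  y[tst] <- NA
  if (is.null(a) && is.null(b)) {
    return(list(y = y))                  # continuous or binary outcome
  }
  stopifnot(!is.null(a), !is.null(b))    # censored outcome needs both bounds
  a[tst] <- -Inf                         # make the validation triplets
  b[tst] <- Inf                          # completely un-informative
  list(y = y, a = a, b = b)
}

set.seed(5)
n <- 10
y <- rnorm(n)
tst <- c(4, 7, 9)                        # indices of the validation records

# Binary outcome
ybin <- ifelse(y > 0, 1, 0)
masked_bin <- mask_validation(ybin, tst)

# Right-censored outcome: records 1 and 2 are censored
ycen <- y
ycen[1:2] <- NA
a <- rep(NA_real_, n)
b <- rep(NA_real_, n)
a[1:2] <- y[1:2] - 0.5
b[1:2] <- Inf
masked_cen <- mask_validation(ycen, tst, a, b)
```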

Box 7. Predicting a binary outcome using a Whole Genome Regression with the Bayesian LASSO
rm(list=ls())
setwd(tempdir())
#loading libraries
library(BGLR)
library(pROC)
data(wheat)
#extracts phenotypes
#continuous
y<-wheat.Y[,1]
X<-wheat.X
#binary
ybin<-ifelse(y>0,1,0)
ETA<-list(list(X=X,model='BL',lambda=5,type='gamma',rate=1e-4,shape=0.55))
#generates testing dataset
tst<-sample(1:599,size=100,replace=FALSE)
yNA<-ybin
yNA[tst]<-NA
fm<-BGLR(y=yNA,response_type='ordinal',ETA=ETA,nIter=12000,burnIn=2000)
mean((ybin[tst]-pnorm(fm$yHat[tst]))^2) # mean-squared error
auc(response=ybin[tst],predictor=fm$yHat[tst])

Prediction of censored outcomes. The example in Box 8 illustrates how to derive predictions for a validation dataset when the outcome is censored. The first part of the code loads the libraries and the dataset and defines the prior; these lines are essentially as in our previous examples. We then create a validation set from lines randomly chosen among the un-censored observations. For these phenotypes to be un-informative, the triplets of the validation lines must be set to $(a_i = -\infty,\ y_i = \text{NA},\ b_i = \infty)$. Finally, the model is fitted. Prediction accuracy is quantified as the correlation between predicted and observed phenotypes in the validation set.

Box 8. Predicting a censored outcome using a Whole Genome Regression with the Bayesian LASSO
rm(list=ls())
setwd(tempdir())
#loading libraries
library(BGLR)
library(survival)
data(wheat)
#extracts phenotypes
#continuous
y<-wheat.Y[,1]
#Extract genotypes
X<-wheat.X
n<-length(y)
#censored
cen<-sample(1:n,size=200)
ycen<-y
ycen[cen]<-NA ; a<-rep(NA,n) ; b<-rep(NA,n)
a[cen]<-y[cen]-runif(min=0,max=1,n=200)
b[cen]<-Inf
#Set prior and predictors
DF<-5
S<-var(y)/2*(DF-2)
ETA<-list(list(X=X,model='BL',lambda=5,type='gamma',rate=1e-4,shape=0.55))
#generates testing dataset (un-censored records only)
tst<-sample(which(!is.na(ycen)),size=100,replace=FALSE)
yNA<-ycen ; yNA[tst]<-NA
aNA<-a ; aNA[tst]<- -Inf
bNA<-b ; bNA[tst]<- Inf
#model (fitted to the masked triplets)
fm<-BGLR(y=yNA,a=aNA,b=bNA,ETA=ETA,nIter=12000,burnIn=2000,df0=DF,S0=S)
cor(fm$yHat[tst],ycen[tst])
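Box 7 computes the AUC with pROC::auc(). As a cross-check that needs no extra package, the AUC equals the probability that a randomly chosen success receives a higher score than a randomly chosen failure, with ties counted as one half. This is the normalized Mann-Whitney statistic. The function auc_mw() and the data below are hypothetical. The mean-squared error mirrors the one computed in Box 7 on the probability scale.

```r
# Sketch: AUC as the normalized Mann-Whitney statistic, plus the
# mean-squared error on the probability scale used in Box 7.
auc_mw <- function(response, score) {
  pos <- score[response == 1]
  neg <- score[response == 0]
  cmp <- outer(pos, neg, "-")            # every (success, failure) pair
  mean((cmp > 0) + 0.5 * (cmp == 0))     # ties count 1/2
}

y_obs <- c(0, 0, 1, 1, 0, 1, 1, 0)
eta_hat <- c(-0.8, -0.1, 0.6, 1.1, 0.3, -0.2, 0.9, -1.4)  # hypothetical yHat values

auc_value <- auc_mw(y_obs, eta_hat)
mse <- mean((y_obs - pnorm(eta_hat))^2)
cat("AUC:", auc_value, " MSE:", round(mse, 4), "\n")
```

Because the AUC depends only on the ranking of the scores, using fm$yHat or pnorm(fm$yHat) as the predictor gives the same value.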

Acknowledgments. Financial support from an NIH P30 Administrative supplement (UAB Nutrition Obesity Research Center) and NIH grants R01GM101219-01 and R01GM099992-01A1 is gratefully acknowledged.

References
de los Campos, G., and P. Pérez. 2010. BLR: Bayesian Linear Regression. R package. http://cran.r-project.org/web/packages/BLR/index.html.
Crossa, J., G. de los Campos, P. Pérez, D. Gianola, J. Burgueño, J. L. Araus, D. Makumbi, et al. 2010. Prediction of Genetic Values of Quantitative Traits in Plant Breeding Using Pedigree and Molecular Markers. Genetics 186 (2): 713-724.
Park, T., and G. Casella. 2008. The Bayesian Lasso. Journal of the American Statistical Association 103 (482): 681-686.
Pérez, P., G. de los Campos, J. Crossa, and D. Gianola. 2010. Genomic-Enabled Prediction Based on Molecular Markers and Pedigree Using the Bayesian Linear Regression Package in R. The Plant Genome 3 (2): 106-116. doi:10.3835/plantgenome2010.04.0005.