Predictor Selection Algorithm for Bayesian Lasso

Quan Zhang (Division of Biostatistics, School of Public Health, University of Minnesota)

May 16

1 Introduction

The Lasso [1] is a regression method for coefficient shrinkage and model selection. It is often used in the linear regression model

    y = \mu 1_n + X\beta + \varepsilon,

where y is the response vector of length n, \mu is the overall mean, X is the n x p standardized design matrix, \beta = (\beta_1, ..., \beta_p)', and \varepsilon is the n x 1 vector of iid normal errors with mean zero and unknown variance \sigma^2. The Lasso estimate \hat\beta minimizes the sum of squared residuals subject to \sum_i |\beta_i| \le t. Equivalently,

    \hat\beta = \operatorname{argmin}_\beta \; (\tilde y - X\beta)'(\tilde y - X\beta) + \lambda \sum_{i=1}^p |\beta_i|,

where \tilde y = y - \bar y 1_n, and \lambda \ge 0 is the tuning parameter corresponding to the bound t. As noted in [1], \hat\beta can be interpreted as a Bayesian posterior mode estimate when the parameters \beta_i have iid double exponential priors, i.e.,

    \pi(\beta) = \prod_{i=1}^p \frac{\lambda}{2} e^{-\lambda |\beta_i|}.    (1)

The double exponential distribution can be written as a scale mixture of normals:

    \frac{a}{2} e^{-a|z|} = \int_0^\infty \frac{1}{\sqrt{2\pi s}} e^{-z^2/(2s)} \, \frac{a^2}{2} e^{-a^2 s/2} \, ds,  a > 0.    (2)

The detailed hierarchical model formulation and the Gibbs sampler implementation can be found in Park and Casella's Bayesian Lasso paper [2]. Note that \lambda can not only be pre-specified but also estimated if a hyperprior is placed on it; a slightly more complicated Bayesian hierarchical model and its Gibbs sampler implementation are given in Section 4. However, as Park and Casella point out [2], this method cannot automatically perform predictor selection; in other words, it cannot force some of the regression coefficients to zero. We propose an algorithm that automatically selects predictors in the course of the Gibbs sampler. We apply this method to an ordinary linear regression problem with variable selection. Another important application is the penalized spline with the l1 penalty: using the Bayesian Lasso with predictor selection, we can not only fit the spline model but also automatically select knots.
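As a quick numerical check of the mixture identity (2), the following R sketch (illustrative only, not from the paper) draws s ~ Exponential(rate a^2/2) and then z | s ~ N(0, s), and compares the resulting quantiles with the closed-form double exponential quantiles with rate a:

    # Mixture representation (2): s ~ Exp(rate = a^2/2), z | s ~ N(0, s)
    # implies z is marginally double exponential with rate a.
    set.seed(1)
    a <- 2
    s <- rexp(1e5, rate = a^2 / 2)
    z <- rnorm(1e5, mean = 0, sd = sqrt(s))
    # closed-form double exponential (Laplace) quantiles for comparison
    laplace_q <- function(p, a) ifelse(p < 0.5, log(2 * p) / a, -log(2 * (1 - p)) / a)
    p <- c(0.1, 0.25, 0.5, 0.75, 0.9)
    round(rbind(empirical = quantile(z, p), exact = laplace_q(p, a)), 3)

The two rows of the printed matrix should agree up to Monte Carlo error.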

2 Algorithm for predictor selection

As described above, we have p predictors X_1, ..., X_p, and we denote the corresponding regression coefficients by \beta_1, ..., \beta_p. The main idea of the algorithm is to throw out, after some iterations of the Gibbs sampler, the one predictor whose coefficient is the most insignificant, and to repeat this until the coefficients of all remaining predictors are significant based on a specified Bayesian credible interval (BCI). The algorithm proceeds through the following steps (an R sketch of the stopping check and the drop step follows the list):

1. Run N_1 iterations of the Gibbs sampler, where N_1 is a large number, say 10,000. These N_1 iterations are just burn-in; we found this necessary especially when p is large.

2. Run another M iterations, where M is relatively small, say 1,000. For a specified p*, for example 0.05, if the (1 - p*) Bayesian credible intervals of all p of the \beta's, based on the last M iterations, exclude zero, then continue running the Gibbs sampler until convergence. Otherwise, go to Step 3.

3. Define

    k_drop = \operatorname{argmax}_k \, \min\Big( \sum_{g=N_1+1}^{N_1+M} I(\beta_k^{(g)} > 0), \; \sum_{g=N_1+1}^{N_1+M} I(\beta_k^{(g)} < 0) \Big),

where \beta_k^{(g)} is the sampled value of \beta_k in the g-th iteration of the Gibbs sampler, k = 1, ..., p. Throw out X_{k_drop}. That is, we throw out the \beta that is most centered at 0 based on the last M iterations of the Gibbs sampler.

4. Replace N_1 by N_1 + M and p by p - 1. Relabel the remaining indices k = 1, ..., k_drop - 1, k_drop + 1, ..., p and go to Step 2.

This algorithm will select very few predictors if the X_i's are highly correlated, even when there are a large number of predictors. We can also fix the number of predictors to be selected in the algorithm rather than specifying p*; this may result in some of the selected predictors having insignificant coefficients.
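Steps 2 and 3 can be written compactly. The sketch below is a minimal illustration, not the authors' implementation: it assumes `draws` is an M x p matrix holding the last M Gibbs samples of \beta, and it returns NULL when every (1 - p*) equal-tailed BCI excludes zero, and otherwise the index k_drop of the predictor to throw out.

    # Steps 2-3 of the algorithm, applied to an M x p matrix `draws` of beta samples
    select_step <- function(draws, p_star = 0.05) {
      lower <- apply(draws, 2, quantile, probs = p_star / 2)
      upper <- apply(draws, 2, quantile, probs = 1 - p_star / 2)
      if (all(lower > 0 | upper < 0)) {
        return(NULL)  # all BCIs exclude zero: keep every remaining predictor
      }
      # otherwise drop the coefficient whose samples are most balanced around zero
      balance <- apply(draws, 2, function(b) min(sum(b > 0), sum(b < 0)))
      which.max(balance)  # k_drop
    }

After each call that returns a non-NULL index, one deletes that column of X, runs M more iterations, and calls the function again, exactly as in Step 4.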

3 Linear regression

We applied our algorithm to a linear regression problem on the diabetes data [3], which has n = 442 and p = 10. The predictors are 10 baseline variables (age, sex, body mass index, average blood pressure, and six blood serum measurements) obtained for each of 442 diabetes patients, and the response of interest is a quantitative measure of disease progression one year after baseline. We performed a Bayesian Lasso with the predictor selection algorithm described in Section 2, setting p* = 0.05 and placing a hyperprior on \lambda to estimate it simultaneously. We then performed the Bayesian Lasso without selecting predictors and the ordinary Lasso, both with the tuning parameter matched to the value of \lambda estimated by the algorithm. The results are summarized in Table 1, along with the ordinary Lasso whose \lambda is chosen by n-fold cross-validation.

predictor | Bayesian Lasso (predictor selection) | 95% BCI | Bayesian Lasso | 95% BCI | Lasso (\lambda matched) | Lasso (n-fold c.v.)
(1) age   | 0 | NA         |  | ( , )        | 0 | 0
(2) sex   |   | ( , )      |  | ( , )        |   |
(3) bmi   |   | (396.68, ) |  | (391.97, )   |   |
(4) map   |   | (181.85, ) |  | (176.65, )   |   |
(5) tc    | 0 | NA         |  | ( , )        |   |
(6) ldl   | 0 | NA         |  | ( , )        | 0 | 0
(7) hdl   |   | ( , )      |  | ( , 64.75)   |   |
(8) tch   | 0 | NA         |  | ( , )        | 0 | 0
(9) ltg   |   | (335.30, ) |  | (325.79, )   |   |
(10) glu  | 0 | NA         |  | (-52.77, )   |   |

Table 1: Comparison of linear regression estimates for the diabetes data, using the \lambda estimated by the Bayesian Lasso with the predictor selection algorithm.

Comparing the Bayesian Lasso using the algorithm with the one not using it, the algorithm threw out all predictors that have an insignificant coefficient based on the 95% credible interval, except one predictor, (7) hdl. hdl has a marginally significant coefficient in the Bayesian Lasso without the algorithm, but a highly significant coefficient when the algorithm is used. If we compare the Bayesian Lasso with the algorithm and the ordinary Lasso with \lambda matched, we find the algorithm tends to select fewer predictors: (5) tc and (10) glu are nonzero in the ordinary Lasso but are thrown out by the algorithm. Looking at the last two columns, the Lasso with \lambda matched shrinks \beta more than the Lasso tuned by cross-validation.

Furthermore, we performed the Bayesian Lasso without the predictor selection algorithm, estimating \lambda simultaneously by placing a hyperprior. We also fit the model by the Bayesian Lasso using the algorithm and by the ordinary Lasso, both of which match that tuning parameter. The results are given in Table 2. The Bayesian Lasso using the algorithm selects 5 predictors while the ordinary Lasso selects 7. From the 5th and 10th rows of both Table 1 and Table 2 we can see that, given a tuning parameter, for a predictor that the Bayesian Lasso using the algorithm throws out but the ordinary Lasso selects, the estimate of its coefficient by the ordinary Lasso always has a larger absolute value than that by the Bayesian Lasso without the algorithm.

predictor | Bayesian Lasso (predictor selection) | 95% BCI | Bayesian Lasso | 95% BCI | Lasso (\lambda matched)
(1) age   | 0 | NA         |  | ( , )        | 0
(2) sex   |   | ( , )      |  | ( , )        |
(3) bmi   |   | (396.80, ) |  | (395.94, )   |
(4) map   |   | (186.78, ) |  | (183.42, )   |
(5) tc    | 0 | NA         |  | ( , )        |
(6) ldl   | 0 | NA         |  | ( , )        | 0
(7) hdl   |   | ( , )      |  | ( , 92.73)   |
(8) tch   | 0 | NA         |  | ( , )        | 0
(9) ltg   |   | (349.14, ) |  | (335.42, )   |
(10) glu  | 0 | NA         |  | (-49.00, )   |

Table 2: Comparison of linear regression estimates for the diabetes data, using the \lambda estimated by the Bayesian Lasso without predictor selection.
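For reference, the ordinary Lasso baseline in the last column of Table 1 can be reproduced along the following lines. This is a hedged sketch, not the paper's code: the paper does not say which solver was used, and glmnet is our choice; the diabetes data ship with the lars package [3].

    library(lars)    # provides the diabetes data of Efron et al. [3]
    library(glmnet)
    data(diabetes)
    x <- as.matrix(diabetes$x)   # 442 x 10; columns are already standardized
    y <- diabetes$y
    # n-fold (leave-one-out) cross-validation for the l1 tuning parameter
    cv <- cv.glmnet(x, y, alpha = 1, nfolds = nrow(x), grouped = FALSE)
    coef(cv, s = "lambda.min")   # zero entries are the predictors the Lasso drops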

4 Penalized spline using the l1 penalty

The Bayesian Lasso with the predictor selection algorithm can be applied to penalized spline models with an l1 penalty, so that we can fit the model and select knots from a set of candidates simultaneously. The model is

    y = \mu 1_n + X\beta + Zu + \varepsilon,    (3)

where Z is the n x K standardized design matrix, u = (u_1, ..., u_K)', and the other notation is as defined in Section 1. Then (\beta, u) is estimated as

    (\hat\beta, \hat u) = \operatorname{argmin}_{(\beta, u)} \; (\tilde y - X\beta - Zu)'(\tilde y - X\beta - Zu) + \lambda \sum_{i=1}^K |u_i|,

where \tilde y and \lambda have the same meaning as in Section 1.

4.1 Bayesian hierarchical model and Gibbs sampler implementation

Analogous to the Bayesian Lasso for the linear regression model [2], the hierarchical representation of the full model is

    \tilde y | X, \beta, Z, u, \sigma^2 ~ N_n(X\beta + Zu, \sigma^2 I_n)
    \beta ~ \pi(\beta) d\beta,  \pi(\beta) \propto 1
    u | \tau_1^2, ..., \tau_K^2, \lambda, \sigma^2 ~ N_K(0_K, \sigma^2 D_\tau),  D_\tau = diag(\tau_1^2, ..., \tau_K^2)
    \tau_1^2, ..., \tau_K^2 | \lambda ~ iid Gamma(1, \lambda^2/2)
    \lambda^2 ~ Gamma(r, \delta)
    \sigma^2 ~ \pi(\sigma^2) d\sigma^2,  \pi(\sigma^2) \propto 1/\sigma^2,

with \tau and \sigma^2 independent. According to (2), integrating out \tau, the conditional prior on u is

    \pi(u | \lambda, \sigma^2) = \prod_{i=1}^K \frac{\lambda}{2\sigma} e^{-\lambda |u_i| / \sigma}.    (4)

Park and Casella [2] explain in detail why the prior in (4) is preferred to the one in (1). As for the Gamma prior on \lambda^2, we need the prior variance of \lambda to be large enough that the prior is relatively flat, so we set r = 1 and \delta = 0.1, making the prior mean of \lambda^2 much larger than \hat\lambda (the \hat\lambda's are less than 1 throughout all models in this paper). The full conditional distributions of the parameters are

    \beta | ... ~ N_p( (X'X)^{-1} X'(\tilde y - Zu), \sigma^2 (X'X)^{-1} )
    u | ... ~ N_K( (Z'Z + D_\tau^{-1})^{-1} Z'(\tilde y - X\beta), \sigma^2 (Z'Z + D_\tau^{-1})^{-1} )
    \sigma^2 | ... ~ InvGamma( (p + n - 1)/2, ( ||\tilde y - X\beta - Zu||^2 + u' D_\tau^{-1} u ) / 2 )
    1/\tau_j^2 | ... ~ Inverse Gaussian with mean \sqrt{\lambda^2 \sigma^2 / u_j^2} and scale parameter \lambda^2,  j = 1, ..., K
    \lambda^2 | ... ~ Gamma( K + r, \sum_{j=1}^K \tau_j^2 / 2 + \delta )

The Gibbs sampler draws the parameters cyclically from their full conditionals above. We construct the hierarchical model using \tilde y instead of y since \mu is often of secondary interest. But if we want inference on \mu, we can still draw samples from its full conditional distribution, N( (1/n) \sum_{i=1}^n y_i, \sigma^2/n ), provided each column of X and Z is centered.
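A single cyclic scan of this Gibbs sampler can be sketched as follows. This is a minimal illustration under the conditionals listed above (the \sigma^2 shape is taken as stated there); rinvgauss from the statmod package and mvrnorm from MASS are our choices of sampling routines, not the paper's.

    library(MASS)     # mvrnorm
    library(statmod)  # rinvgauss
    # `state` is a list with elements u, tau2, lambda2, sigma2; X, Z, ytil are
    # the centered/standardized design matrices and response described above.
    gibbs_scan <- function(state, X, Z, ytil, r = 1, delta = 0.1) {
      n <- nrow(X); p <- ncol(X); K <- ncol(Z)
      with(state, {
        # beta | rest
        XtX_inv <- solve(crossprod(X))
        beta <- mvrnorm(1, XtX_inv %*% crossprod(X, ytil - Z %*% u),
                        sigma2 * XtX_inv)
        # u | rest
        A_inv <- solve(crossprod(Z) + diag(1 / tau2))
        u <- mvrnorm(1, A_inv %*% crossprod(Z, ytil - X %*% beta),
                     sigma2 * A_inv)
        # sigma2 | rest
        resid <- ytil - X %*% beta - Z %*% u
        sigma2 <- 1 / rgamma(1, shape = (p + n - 1) / 2,
                             rate = (sum(resid^2) + sum(u^2 / tau2)) / 2)
        # 1/tau_j^2 | rest is inverse Gaussian
        tau2 <- 1 / rinvgauss(K, mean = sqrt(lambda2 * sigma2 / u^2),
                              shape = lambda2)
        # lambda^2 | rest
        lambda2 <- rgamma(1, shape = K + r, rate = sum(tau2) / 2 + delta)
        list(u = u, tau2 = tau2, lambda2 = lambda2, sigma2 = sigma2, beta = beta)
      })
    }

Iterating gibbs_scan and recording beta and u gives the draws that the predictor (knot) selection algorithm of Section 2 operates on.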

4.2 Penalized spline model for the global temperature data using a B-spline basis

The global temperature data [4] record 125 years (x) of global temperature (Y), from 1881 to 2005. The design matrix X is obtained by standardizing (x, x^2, x^3) so that each column of X has mean zero and l2 norm one. If Z were a polynomial basis matrix, its columns would be highly correlated, and the Lasso is known to handle highly correlated predictors poorly [5]. We therefore use a B-spline basis matrix generated by the R function splineDesign, setting the option order = 4 and the knots as 23 equally spaced points on the interval [min(x) + 2, max(x) - 2], so that the function returns a design matrix with 21 columns. We obtain Z by standardizing this matrix.

4.2.1 B-spline and truncated cubic spline using the l2 penalty

This subsection shows how the B-spline basis fits (3) using the l2 penalty, compared with a truncated cubic spline basis. Define

    W = ( (x - \kappa_1)^3_+, ..., (x - \kappa_21)^3_+ ),

where the knots \kappa_1, ..., \kappa_21 are equally spaced points on the interval [min(x) + 2, max(x) - 2]. We obtain Z* by standardizing W, use Z* to fit (3) with an l2 penalty on u via the R function lme, and compare the fitted curve with the one using Z. The curves are plotted in Figure 1; the curve fitted with the B-spline basis is bumpier.

Figure 1: B-spline and truncated cubic spline using the l2 penalty.
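The basis construction of Section 4.2 can be sketched as below. This is illustrative rather than the authors' code; note that splineDesign returns length(knots) - order basis columns, so 23 knots with order 4 yield 19 columns, and the 21 columns reported above would correspond to a slightly longer knot vector.

    library(splines)
    # standardize: center each column, then rescale to unit l2 norm, as in the text
    std <- function(M) {
      M <- scale(M, center = TRUE, scale = FALSE)
      sweep(M, 2, sqrt(colSums(M^2)), "/")
    }
    x <- 1881:2005                          # the 125 recorded years
    X <- std(cbind(x, x^2, x^3))
    knots <- seq(min(x) + 2, max(x) - 2, length.out = 23)
    # outer.ok = TRUE lets x run past the first and last knots
    Z <- std(splineDesign(knots, x, ord = 4, outer.ok = TRUE))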

4.2.2 B-spline knot selection

We can apply the predictor selection algorithm to the penalized spline model to select knots automatically. We use the design matrix X and the B-spline basis matrix Z to fit (3) with the l1 penalty. Running the Gibbs sampler with the algorithm, 3, 5, 6 and 6 knots are selected for p* equal to 0.01, 0.05, 0.1 and 0.2, respectively. The fitted curves are plotted in Figure 2. The smoothed curves do not fit very well; the curve tends to fit better as p* increases.

Figure 2: B-spline using the l1 penalty and knot selection, for p* = 0.01, 0.05, 0.1, 0.2.

We can also specify the number of knots to select in the algorithm, rather than setting p*. Figure 3 plots the fitted curve with 10 knots; it fits much better than all the curves in Figure 2.

5 Discussion

Like the ordinary Lasso, the Bayesian Lasso with the predictor selection algorithm does not select predictors stably: even if we run the Gibbs sampler several times with exactly the same settings, the results of predictor selection and parameter estimation may differ substantially, especially when the predictors are highly correlated. Another problem with the algorithm is computing speed. First, we need a burn-in of N_1 iterations, and N_1 may be a very big number when we initially have a lot of predictors. Furthermore, M should also be large, say 10,000, because the coefficient of a predictor may change a lot after a correlated predictor is thrown out. When we stop throwing out predictors, we need to run the Gibbs sampler until convergence. In this paper we simply ran a large number of iterations and used trace plots to check convergence. The method of Monte Carlo standard errors can be used to determine when to stop the sampler [6].
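As one concrete option for that stopping rule, the mcmcse R package implements consistent batch-means estimators of Monte Carlo standard errors in the spirit of [6]. A small illustrative sketch follows; the chain below is a synthetic stand-in, not output from our sampler.

    library(mcmcse)
    # synthetic autocorrelated chain standing in for the Gibbs draws of one beta
    chain <- as.numeric(arima.sim(list(ar = 0.7), n = 1e4))
    out <- mcse(chain)  # batch-means estimate of the posterior mean and its MC s.e.
    out$est
    out$se              # stop sampling once this is small relative to the posterior scale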

Figure 3: B-spline using the l1 penalty with 10 knots.

References

[1] Robert Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B (Methodological), 58(1):267-288, 1996.

[2] Trevor Park and George Casella. The Bayesian lasso. Journal of the American Statistical Association, 103(482):681-686, 2008.

[3] Bradley Efron, Trevor Hastie, Iain Johnstone, and Robert Tibshirani. Least angle regression. The Annals of Statistics, 32(2):407-499, 2004.

[4] James S. Hodges. Richly Parameterized Linear Models: Additive, Time Series, and Spatial Models Using Random Effects. CRC Press, 2013.

[5] Hui Zou and Trevor Hastie. Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society, Series B (Statistical Methodology), 67(2):301-320, 2005.

[6] James M. Flegal, Murali Haran, and Galin L. Jones. Markov chain Monte Carlo: Can we trust the third significant figure? Statistical Science, 23(2):250-260, 2008.
