Predictor Selection Algorithm for Bayesian Lasso
Quan Zhang†

May 16

1 Introduction

The Lasso [1] is a method for coefficient shrinkage and model selection in regression. It is often used in the linear regression model

y = µ1_n + Xβ + ε

where y is the response vector of length n, µ is the overall mean, X is the n × p standardized design matrix, β = (β_1, …, β_p)′, and ε is the n × 1 vector of iid normal errors with zero mean and unknown variance σ². The Lasso estimate β̂ minimizes the sum of squared residuals subject to Σ_{i=1}^p |β_i| ≤ t. It can be described as

β̂ = argmin_β (ỹ − Xβ)′(ỹ − Xβ) + λ Σ_{i=1}^p |β_i|

where ỹ = y − ȳ1_n, and λ ≥ 0 is the tuning parameter corresponding to the bound t. It has been noted [1] that β̂ can be interpreted as a Bayesian posterior mode estimate when the parameters β_i have iid double exponential priors, i.e.,

π(β) = Π_{i=1}^p (λ/2) e^{−λ|β_i|}.   (1)

The double exponential distribution can be written as a scale mixture of normals:

(a/2) e^{−a|z|} = ∫_0^∞ (1/√(2πs)) e^{−z²/(2s)} · (a²/2) e^{−a²s/2} ds,  a > 0.   (2)

The detailed hierarchical model formulation and the Gibbs sampler implementation can be seen in Park and Casella's Bayesian Lasso paper [2]. Note that λ can be not only pre-specified but also estimated if a hyperprior is placed on it. A slightly more complicated Bayesian hierarchical model and its Gibbs sampler implementation will be given in Section 4. However, as Park and Casella noted [2], this method cannot automatically perform predictor selection; in other words, it cannot force some of the regression coefficients to zero. We propose an algorithm that automatically selects predictors in the course of the Gibbs sampler. We applied this method to an ordinary linear regression problem with variable selection. Another important application is the penalized spline using the l1 penalty. Using the Bayesian Lasso with predictor selection, we can not only fit the spline model but also automatically select knots.

†Division of Biostatistics, School of Public Health, University of Minnesota.
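The scale-mixture identity (2) can be checked by simulation: drawing the scale s from an exponential density with rate a²/2 and then z | s ~ N(0, s) should produce double-exponential draws with density (a/2)e^{−a|z|}. A minimal numpy sketch (our own illustration, not the paper's code):

```python
import numpy as np

def double_exponential_via_mixture(a, size, rng):
    """Draw from the double exponential (Laplace) density (a/2) exp(-a|z|)
    using the scale mixture of normals in equation (2)."""
    # Mixing density on the scale s is (a^2/2) exp(-a^2 s / 2),
    # i.e. exponential with rate a^2/2 (mean 2/a^2).
    s = rng.exponential(scale=2.0 / a**2, size=size)
    # Conditional on s, z is normal with mean 0 and variance s.
    return rng.normal(loc=0.0, scale=np.sqrt(s))

rng = np.random.default_rng(0)
z = double_exponential_via_mixture(a=2.0, size=200_000, rng=rng)
# A double exponential with rate a has variance 2/a^2 and mean |z| = 1/a.
print(round(z.var(), 2), round(np.abs(z).mean(), 2))
```

For a = 2 the empirical variance should be close to 2/a² = 0.5, matching the double exponential rather than any single normal component.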
2 Algorithm for predictor selection

As described above, we have p predictors X_1, …, X_p, and we denote the corresponding regression coefficients by β_1, …, β_p. The main idea of the algorithm is to throw out, after some iterations of the Gibbs sampler, the one predictor whose coefficient is the most insignificant, until all the remaining predictors' coefficients are significant based on a specified Bayesian credible interval (BCI). The algorithm proceeds through the following steps:

1. Run N_1 iterations of the Gibbs sampler, where N_1 is a large number, say 10,000. The N_1 iterations are burn-in, which we found necessary especially when p is large.

2. Run another M iterations, where M is relatively small, say 1,000. For a specified p*, for example 0.05, if the (1 − p*) Bayesian credible intervals of all p of the β's, based on the last M iterations, exclude zero, then continue running the Gibbs sampler until convergence. Otherwise, go to Step 3.

3. Define

k_drop = argmax_k min( Σ_{g=N_1+1}^{N_1+M} I(β_k^(g) > 0), Σ_{g=N_1+1}^{N_1+M} I(β_k^(g) < 0) )

where β_k^(g) is the sampled value of β_k in the g-th iteration of the Gibbs sampler, for k = 1, …, p. Throw out X_{k_drop}. That is, we throw out the β that is most centered at zero based on the last M iterations of the Gibbs sampler.

4. Replace N_1 by N_1 + M and p by p − 1. Let k = 1, …, k_drop − 1, k_drop + 1, …, p and go to Step 2.

This algorithm will select very few predictors if the X_i are highly correlated, even when there is a large number of predictors. We can instead specify the number of predictors to be selected in the algorithm rather than specifying p*; this may result in some of the selected predictors having insignificant coefficients.

3 Linear regression

We applied our algorithm to a linear regression problem on the diabetes data [3], which has n = 442 and p = 10.
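The significance check in Step 2 and the drop rule in Step 3 can be sketched directly on an M × p array of posterior draws. A numpy sketch with illustrative function names of our own:

```python
import numpy as np

def most_insignificant(beta_samples):
    """beta_samples: (M, p) array of the last M Gibbs draws of beta.
    Returns the index k maximizing min(#positive draws, #negative draws),
    i.e. the coefficient whose draws are most evenly split around zero."""
    pos = (beta_samples > 0).sum(axis=0)
    neg = (beta_samples < 0).sum(axis=0)
    return int(np.argmax(np.minimum(pos, neg)))

def all_significant(beta_samples, p_star=0.05):
    """True if every equal-tailed (1 - p_star) credible interval excludes 0."""
    lo = np.quantile(beta_samples, p_star / 2, axis=0)
    hi = np.quantile(beta_samples, 1 - p_star / 2, axis=0)
    return bool(np.all((lo > 0) | (hi < 0)))

rng = np.random.default_rng(1)
draws = np.column_stack([rng.normal(3.0, 1.0, 1000),    # clearly positive
                         rng.normal(0.0, 1.0, 1000),    # centered at zero
                         rng.normal(-2.5, 1.0, 1000)])  # clearly negative
print(most_insignificant(draws))   # index of the column centered at zero
print(all_significant(draws))
```

In the full algorithm these two checks would be run after each block of M iterations, dropping the column returned by `most_insignificant` until `all_significant` is true.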
The predictors are 10 baseline variables — age, sex, body mass index, average blood pressure, and six blood serum measurements — obtained for each of 442 diabetes patients; the response of interest is a quantitative measure of disease progression one year after baseline. We performed a Bayesian Lasso with the predictor selection algorithm described in Section 2, setting p* = 0.05 and placing a hyperprior on λ to estimate it simultaneously. We then performed the Bayesian Lasso without selecting predictors and the ordinary Lasso, both with the tuning parameter matching the value of λ estimated by the algorithm. The results are summarized in Table 1, along with the ordinary Lasso whose λ is chosen by n-fold cross-validation. Comparing the Bayesian Lasso using the algorithm with the one not using it, the algorithm threw out every predictor with an insignificant coefficient based on the 95% credible interval, except one predictor, (7) hdl. hdl has a marginally significant coefficient in the Bayesian Lasso without the algorithm, but a highly significant coefficient when the algorithm is used. If we compare the Bayesian Lasso with the algorithm and the ordinary Lasso with λ matched, we find the algorithm tends to select fewer predictors: (5) tc and (10) glu are non-zero in the ordinary Lasso but are thrown out by the algorithm. Looking at the last two columns, the Lasso with λ matched shrinks β more than the one tuned by cross-validation. Furthermore, we performed the Bayesian Lasso without the predictor selection algorithm, estimating λ simultaneously by placing a hyperprior. We also fit the model by the Bayesian Lasso using the algorithm and the ordinary Lasso,
both of which match the tuning parameter. The results are given in Table 2. We found the Bayesian Lasso using the algorithm selects 5 predictors while the ordinary Lasso selects 7. From the 5th and 10th rows of both Table 1 and Table 2 we can see that, given a tuning parameter, for a predictor that the Bayesian Lasso using the algorithm throws out but the ordinary Lasso selects, the ordinary Lasso's estimate of its coefficient always has a larger absolute value than the estimate from the Bayesian Lasso without the algorithm.

predictor | Bayesian Lasso (predictor selection) | 95% BCI | Bayesian Lasso | 95% BCI | Lasso (λ matched) | Lasso (n-fold c.v.)
(1) age   | 0 | NA         |   | ( , )       | 0 | 0
(2) sex   |   | ( , )      |   | ( , )       |   |
(3) bmi   |   | (396.68, ) |   | (391.97, )  |   |
(4) map   |   | (181.85, ) |   | (176.65, )  |   |
(5) tc    | 0 | NA         |   | ( , )       |   |
(6) ldl   | 0 | NA         |   | ( , )       | 0 | 0
(7) hdl   |   | ( , )      |   | ( , 64.75)  |   |
(8) tch   | 0 | NA         |   | ( , )       | 0 | 0
(9) ltg   |   | (335.30, ) |   | (325.79, )  |   |
(10) glu  | 0 | NA         |   | (-52.77, )  |   |

Table 1: Comparison of linear regression parameters for the diabetes data, using the λ estimated by the Bayesian Lasso with the predictor selection algorithm.

predictor | Bayesian Lasso (predictor selection) | 95% BCI | Bayesian Lasso | 95% BCI | Lasso (λ matched)
(1) age   | 0 | NA         |   | ( , )       | 0
(2) sex   |   | ( , )      |   | ( , )       |
(3) bmi   |   | (396.80, ) |   | (395.94, )  |
(4) map   |   | (186.78, ) |   | (183.42, )  |
(5) tc    | 0 | NA         |   | ( , )       |
(6) ldl   | 0 | NA         |   | ( , )       | 0
(7) hdl   |   | ( , )      |   | ( , 92.73)  |
(8) tch   | 0 | NA         |   | ( , )       | 0
(9) ltg   |   | (349.14, ) |   | (335.42, )  |
(10) glu  | 0 | NA         |   | (-49.00, )  |

Table 2: Comparison of linear regression parameters for the diabetes data, using the λ estimated by the Bayesian Lasso without predictor selection.

4 Penalized spline using the l1 penalty

The Bayesian Lasso with the predictor selection algorithm can be applied to penalized spline models using the l1 penalty, so that we can fit the model and select knots from a set of candidates simultaneously. The model is constructed as

y = µ1_n + Xβ + Zu + ε   (3)
where Z is the n × K standardized design matrix, u = (u_1, …, u_K)′, and the other notation is defined as in Section 1. Then (β, u) are estimated as

(β̂, û) = argmin_{(β,u)} (ỹ − Xβ − Zu)′(ỹ − Xβ − Zu) + λ Σ_{i=1}^K |u_i|

where ỹ and λ have the same meaning as in Section 1.

4.1 Bayesian hierarchical model and Gibbs sampler implementation

Analogous to the Bayesian Lasso for the linear regression model [2], the hierarchical representation of the full model is

ỹ | X, β, Z, u, σ² ~ N_n(Xβ + Zu, σ²I_n)
β ~ π(β)dβ,  π(β) ∝ 1
u | τ_1², …, τ_K², λ, σ² ~ N_K(0_K, σ²D_τ),  D_τ = diag(τ_1², …, τ_K²)
τ_1², …, τ_K² | λ ~ iid Gamma(1, λ²/2)
λ² ~ Gamma(r, δ)
σ² ~ π(σ²)dσ²,  π(σ²) ∝ 1/σ²

with τ and σ² independent. According to (2), integrating out τ, the conditional prior on u is

π(u | λ, σ²) = Π_{i=1}^K (λ/(2σ)) e^{−λ|u_i|/σ}.   (4)

Park and Casella [2] explained in detail why the prior in (4) is preferred to the one in (1). As for the Gamma prior on λ², we need the prior variance of λ to be large enough to make the prior relatively flat, so we set r = 1 and δ = 0.1 so that the prior mean of λ² is much larger than λ̂ (the λ̂'s are less than 1 throughout all the models in this paper). The full conditional distributions of the parameters are

β | · ~ N_p((X′X)⁻¹X′(ỹ − Zu), σ²(X′X)⁻¹)
u | · ~ N_K((Z′Z + D_τ⁻¹)⁻¹Z′(ỹ − Xβ), σ²(Z′Z + D_τ⁻¹)⁻¹)
σ² | · ~ InvGamma((n − 1 + K)/2, (‖ỹ − Xβ − Zu‖² + u′D_τ⁻¹u)/2)
1/τ_j² | · ~ Inverse Gaussian with mean parameter √(λ²σ²/u_j²) and scale parameter λ²,  j = 1, …, K
λ² | · ~ Gamma(K + r, Σ_{j=1}^K τ_j²/2 + δ)

The Gibbs sampler draws the parameters cyclically from their full conditionals above. We construct the hierarchical model using ỹ instead of y since µ is often of secondary interest. But if we wish to make inference about µ, we can still draw samples from its full conditional distribution, N((1/n)Σ_{i=1}^n y_i, σ²/n), if each column of X and Z is centered.
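The full conditionals translate directly into one Gibbs sweep. Below is a numpy sketch of a single sweep (our own illustrative code, not the authors'): it uses numpy's Wald sampler for the inverse-Gaussian step, draws the inverse gamma as a reciprocal gamma, and takes the inverse-gamma shape to be (n − 1 + K)/2 as in the standard derivation for this model.

```python
import numpy as np

def gibbs_step(y_t, X, Z, state, r=1.0, delta=0.1, rng=None):
    """One sweep of the Gibbs sampler for the hierarchical spline model.
    y_t is the centered response; state holds beta, u, sigma2, tau2, lam2."""
    if rng is None:
        rng = np.random.default_rng()
    n, p = X.shape
    K = Z.shape[1]
    beta, u, sigma2, tau2, lam2 = (state[k] for k in
                                   ("beta", "u", "sigma2", "tau2", "lam2"))
    # beta | rest ~ N_p((X'X)^{-1} X'(y_t - Zu), sigma2 (X'X)^{-1})
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = rng.multivariate_normal(XtX_inv @ X.T @ (y_t - Z @ u),
                                   sigma2 * XtX_inv)
    # u | rest ~ N_K(A^{-1} Z'(y_t - X beta), sigma2 A^{-1}), A = Z'Z + D_tau^{-1}
    A_inv = np.linalg.inv(Z.T @ Z + np.diag(1.0 / tau2))
    u = rng.multivariate_normal(A_inv @ Z.T @ (y_t - X @ beta),
                                sigma2 * A_inv)
    # sigma2 | rest ~ InvGamma((n - 1 + K)/2, (||resid||^2 + u' D_tau^{-1} u)/2)
    resid = y_t - X @ beta - Z @ u
    scale = (resid @ resid + u @ (u / tau2)) / 2
    sigma2 = scale / rng.gamma((n - 1 + K) / 2)   # InvGamma via 1/Gamma
    # 1/tau_j^2 | rest ~ InverseGaussian(sqrt(lam2 sigma2 / u_j^2), lam2)
    inv_tau2 = rng.wald(np.sqrt(lam2 * sigma2 / u**2), lam2)
    tau2 = 1.0 / inv_tau2
    # lam2 | rest ~ Gamma(K + r, rate = sum(tau2)/2 + delta)
    lam2 = rng.gamma(K + r, 1.0 / (tau2.sum() / 2 + delta))
    return {"beta": beta, "u": u, "sigma2": sigma2, "tau2": tau2, "lam2": lam2}

rng = np.random.default_rng(3)
n, p, K = 30, 2, 4
X = rng.normal(size=(n, p))
Z = rng.normal(size=(n, K))
y_t = rng.normal(size=n)
state = {"beta": np.zeros(p), "u": np.full(K, 0.1),
         "sigma2": 1.0, "tau2": np.ones(K), "lam2": 1.0}
for _ in range(50):
    state = gibbs_step(y_t, X, Z, state, rng=rng)
print(state["sigma2"] > 0)
```

Cycling this step and monitoring the draws of u gives the sampler on which the predictor (knot) selection algorithm of Section 2 operates.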
4.2 Penalized spline model for the global temperature data using a B-spline basis

The global temperature data [4] record 125 years (x) of global temperature (Y) beginning in 1881. The design matrix X is obtained by standardizing (x, x², x³) so that each column of X has mean zero and l2 norm one. If Z were a polynomial basis matrix, its columns would be highly correlated, and it is known that the Lasso does not deal well with highly correlated predictors [5]. We therefore use a B-spline basis matrix generated by the R function splineDesign: we set the option order = 4 and the knots as 23 equally spaced points on the interval [min(x)+2, max(x)−2], so that the function returns a design matrix with 21 columns. We obtain Z by standardizing this matrix.

4.2.1 B-spline and truncated cubic spline using the l2 penalty

This section shows how the B-spline fits (3) using the l2 penalty and compares it to the truncated cubic spline. Define

W = ((x − κ_1)³₊, …, (x − κ_21)³₊)

where the knots κ_1, …, κ_21 are equally spaced points on the interval [min(x)+2, max(x)−2]. We obtain the truncated-cubic design matrix by standardizing W, use it to fit (3) with the l2 penalty on u by the R function lme, and compare the fitted curve with the one using the B-spline matrix Z. The curves are plotted in Figure 1; the curve fitted using the B-spline is more bumpy.

Figure 1: B-spline and truncated cubic spline using the l2 penalty.
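The truncated cubic basis W can be built and standardized in a few lines. A numpy sketch (our own; the x grid of 125 yearly observations starting in 1881 follows the description above, and the function name is illustrative):

```python
import numpy as np

def truncated_cubic_basis(x, n_knots=21):
    """Columns (x - kappa_k)^3_+ for knots equally spaced on
    [min(x)+2, max(x)-2], each column standardized to mean zero
    and l2 norm one, as in the text."""
    knots = np.linspace(x.min() + 2, x.max() - 2, n_knots)
    W = np.maximum(x[:, None] - knots[None, :], 0.0) ** 3
    W -= W.mean(axis=0)                 # center each column
    W /= np.linalg.norm(W, axis=0)      # scale each column to unit l2 norm
    return W

x = 1881 + np.arange(125.0)   # 125 yearly observations starting in 1881
W_std = truncated_cubic_basis(x)
print(W_std.shape)
```

The standardized matrix then plays the role of the penalized design matrix in model (3).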
4.2.2 B-spline knot selection

We can apply the predictor selection algorithm to the penalized spline model to select knots automatically. We use the design matrix X and the B-spline basis matrix Z with the l1 penalty to fit (3). Running the Gibbs sampler with the algorithm, 3, 5, 6, and 6 knots are selected for p* equal to 0.01, 0.05, 0.1, and 0.2, respectively. The fitted curves are plotted in Figure 2. The smoothed curves do not fit very well, though the fit tends to improve as p* increases. We can also specify the number of knots to select in the algorithm rather than setting p*. Figure 3 plots the fitted curve with 10 knots; it fits much better than all of the curves in Figure 2.

Figure 2: B-spline using the l1 penalty and knot selection for p* = 0.01, 0.05, 0.1, 0.2.

5 Discussion

Like the ordinary Lasso, the Bayesian Lasso with the predictor selection algorithm does not select predictors stably: even if we run the Gibbs sampler several times with exactly the same settings, the results of predictor selection and parameter estimation may differ considerably, especially when the predictors are highly correlated. Another problem with the algorithm is computing speed. First, we need a burn-in of N_1 iterations, and N_1 may need to be very large when we initially have many predictors. Furthermore, M should also be large, say 10,000, because the coefficient of a predictor may change a lot after a correlated predictor is thrown out. When we stop throwing out predictors, we need to run the Gibbs sampler until convergence. In this paper, we simply run a large number of iterations and use the trace plot to check convergence. The method of Monte Carlo standard errors
can be used to determine when to stop the sampler [6].

Figure 3: B-spline using the l1 penalty with 10 knots.
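The Monte Carlo standard error approach cited above [6] is commonly implemented with batch means. A small numpy sketch of that estimator (our own illustration, with the common choice of ⌊√n⌋ batches):

```python
import numpy as np

def batch_means_mcse(chain):
    """Monte Carlo standard error of the chain's mean via batch means,
    using floor(sqrt(n)) non-overlapping batches of equal size."""
    n = len(chain)
    n_batches = int(np.sqrt(n))
    batch_size = n // n_batches
    means = np.array([chain[i * batch_size:(i + 1) * batch_size].mean()
                      for i in range(n_batches)])
    # batch_size * var(batch means) estimates the asymptotic variance;
    # dividing by n and taking the square root gives the MCSE of the mean.
    return np.sqrt(means.var(ddof=1) * batch_size / n)

rng = np.random.default_rng(4)
chain = rng.normal(size=10_000)   # iid chain: MCSE should be near 1/sqrt(n)
print(round(batch_means_mcse(chain), 3))
```

One would stop the sampler once the MCSE of each quantity of interest is small relative to its estimated posterior spread.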
References

[1] Robert Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B (Methodological), 58(1):267–288, 1996.

[2] Trevor Park and George Casella. The Bayesian lasso. Journal of the American Statistical Association, 103(482):681–686, 2008.

[3] Bradley Efron, Trevor Hastie, Iain Johnstone, and Robert Tibshirani. Least angle regression. The Annals of Statistics, 32(2):407–499, 2004.

[4] James S. Hodges. Richly Parameterized Linear Models: Additive, Time Series, and Spatial Models Using Random Effects. CRC Press, 2013.

[5] Hui Zou and Trevor Hastie. Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society, Series B (Statistical Methodology), 67(2):301–320, 2005.

[6] James M. Flegal, Murali Haran, and Galin L. Jones. Markov chain Monte Carlo: Can we trust the third significant figure? Statistical Science, 23(2):250–260, 2008.
More informationTheoretical Concepts of Machine Learning
Theoretical Concepts of Machine Learning Part 2 Institute of Bioinformatics Johannes Kepler University, Linz, Austria Outline 1 Introduction 2 Generalization Error 3 Maximum Likelihood 4 Noise Models 5
More informationEE 511 Linear Regression
EE 511 Linear Regression Instructor: Hanna Hajishirzi hannaneh@washington.edu Slides adapted from Ali Farhadi, Mari Ostendorf, Pedro Domingos, Carlos Guestrin, and Luke Zettelmoyer, Announcements Hw1 due
More informationVariable Selection 6.783, Biomedical Decision Support
6.783, Biomedical Decision Support (lrosasco@mit.edu) Department of Brain and Cognitive Science- MIT November 2, 2009 About this class Why selecting variables Approaches to variable selection Sparsity-based
More informationModeling Criminal Careers as Departures From a Unimodal Population Age-Crime Curve: The Case of Marijuana Use
Modeling Criminal Careers as Departures From a Unimodal Population Curve: The Case of Marijuana Use Donatello Telesca, Elena A. Erosheva, Derek A. Kreader, & Ross Matsueda April 15, 2014 extends Telesca
More informationCalibration and emulation of TIE-GCM
Calibration and emulation of TIE-GCM Serge Guillas School of Mathematics Georgia Institute of Technology Jonathan Rougier University of Bristol Big Thanks to Crystal Linkletter (SFU-SAMSI summer school)
More informationStatistics & Analysis. Fitting Generalized Additive Models with the GAM Procedure in SAS 9.2
Fitting Generalized Additive Models with the GAM Procedure in SAS 9.2 Weijie Cai, SAS Institute Inc., Cary NC July 1, 2008 ABSTRACT Generalized additive models are useful in finding predictor-response
More information4.2 Properties of Rational Functions. 188 CHAPTER 4 Polynomial and Rational Functions. Are You Prepared? Answers
88 CHAPTER 4 Polnomial and Rational Functions 5. Obtain a graph of the function for the values of a, b, and c in the following table. Conjecture a relation between the degree of a polnomial and the number
More informationMaking Graphs from Tables and Graphing Horizontal and Vertical Lines - Black Level Problems
Making Graphs from Tables and Graphing Horizontal and Vertical Lines - Black Level Problems Black Level Hperbola. Give the graph and find the range and domain for. EXPONENTIAL Functions - The following
More informationDATA ANALYSIS USING HIERARCHICAL GENERALIZED LINEAR MODELS WITH R
DATA ANALYSIS USING HIERARCHICAL GENERALIZED LINEAR MODELS WITH R Lee, Rönnegård & Noh LRN@du.se Lee, Rönnegård & Noh HGLM book 1 / 25 Overview 1 Background to the book 2 A motivating example from my own
More informationChapter 1. Introduction
Chapter 1 Introduction A Monte Carlo method is a compuational method that uses random numbers to compute (estimate) some quantity of interest. Very often the quantity we want to compute is the mean of
More informationAn MM Algorithm for Multicategory Vertex Discriminant Analysis
An MM Algorithm for Multicategory Vertex Discriminant Analysis Tong Tong Wu Department of Epidemiology and Biostatistics University of Maryland, College Park May 22, 2008 Joint work with Professor Kenneth
More informationPerformance Evaluation
Performance Evaluation Dan Lizotte 7-9-5 Evaluating Performance..5..5..5..5 Which do ou prefer and wh? Evaluating Performance..5..5 Which do ou prefer and wh?..5..5 Evaluating Performance..5..5..5..5 Performance
More informationMapping of Hierarchical Activation in the Visual Cortex Suman Chakravartula, Denise Jones, Guillaume Leseur CS229 Final Project Report. Autumn 2008.
Mapping of Hierarchical Activation in the Visual Cortex Suman Chakravartula, Denise Jones, Guillaume Leseur CS229 Final Project Report. Autumn 2008. Introduction There is much that is unknown regarding
More informationscikit-learn (Machine Learning in Python)
scikit-learn (Machine Learning in Python) (PB13007115) 2016-07-12 (PB13007115) scikit-learn (Machine Learning in Python) 2016-07-12 1 / 29 Outline 1 Introduction 2 scikit-learn examples 3 Captcha recognize
More informationEstimation of Tikhonov Regularization Parameter for Image Reconstruction in Electromagnetic Geotomography
Rafał Zdunek Andrze Prałat Estimation of ikhonov Regularization Parameter for Image Reconstruction in Electromagnetic Geotomograph Abstract: he aim of this research was to devise an efficient wa of finding
More informationYelp Recommendation System
Yelp Recommendation System Jason Ting, Swaroop Indra Ramaswamy Institute for Computational and Mathematical Engineering Abstract We apply principles and techniques of recommendation systems to develop
More informationIntelligent Compaction and Quality Assurance of Roller Measurement Values utilizing Backfitting and Multiresolution Scale Space Analysis
Intelligent Compaction and Quality Assurance of Roller Measurement Values utilizing Backfitting and Multiresolution Scale Space Analysis Daniel K. Heersink 1, Reinhard Furrer 1, and Mike A. Mooney 2 arxiv:1302.4631v3
More informationNonparametric sparse hierarchical models describe V1 fmri responses to natural images
Nonparametric sparse hierarchical models describe V1 fmri responses to natural images Pradeep Ravikumar, Vincent Q. Vu and Bin Yu Department of Statistics University of California, Berkeley Berkeley, CA
More informationMCMC Methods for data modeling
MCMC Methods for data modeling Kenneth Scerri Department of Automatic Control and Systems Engineering Introduction 1. Symposium on Data Modelling 2. Outline: a. Definition and uses of MCMC b. MCMC algorithms
More informationA noninformative Bayesian approach to small area estimation
A noninformative Bayesian approach to small area estimation Glen Meeden School of Statistics University of Minnesota Minneapolis, MN 55455 glen@stat.umn.edu September 2001 Revised May 2002 Research supported
More informationChapter 7: Numerical Prediction
Ludwig-Maximilians-Universität München Institut für Informatik Lehr- und Forschungseinheit für Datenbanksysteme Knowledge Discovery in Databases SS 2016 Chapter 7: Numerical Prediction Lecture: Prof. Dr.
More informationLinear Modeling with Bayesian Statistics
Linear Modeling with Bayesian Statistics Bayesian Approach I I I I I Estimate probability of a parameter State degree of believe in specific parameter values Evaluate probability of hypothesis given the
More informationVariability in Annual Temperature Profiles
Variability in Annual Temperature Profiles A Multivariate Spatial Analysis of Regional Climate Model Output Tamara Greasby, Stephan Sain Institute for Mathematics Applied to Geosciences, National Center
More informationNONPARAMETRIC REGRESSION WIT MEASUREMENT ERROR: SOME RECENT PR David Ruppert Cornell University
NONPARAMETRIC REGRESSION WIT MEASUREMENT ERROR: SOME RECENT PR David Ruppert Cornell University www.orie.cornell.edu/ davidr (These transparencies, preprints, and references a link to Recent Talks and
More informationModel selection. Peter Hoff STAT 423. Applied Regression and Analysis of Variance. University of Washington /53
/53 Model selection Peter Hoff STAT 423 Applied Regression and Analysis of Variance University of Washington Diabetes example: y = diabetes progression x 1 = age x 2 = sex. dim(x) ## [1] 442 64 colnames(x)
More information