Soft Threshold Estimation for Varying-coefficient Models

Artur Klinger, Universität München

ABSTRACT: An alternative penalized likelihood estimator for varying-coefficient regression in generalized linear models is proposed. The estimator leads to a sparse representation of the results as sums of a few basis functions. In the special case of function estimation, it reduces to the soft threshold estimator widely used by the wavelet community. By using appropriate sets of basis functions, varying coefficients characterizing bumps or periodic functions can also be modelled within the same framework.

KEYWORDS: Generalized additive models; Shrinkage estimation; Splines; Varying coefficients; Wavelets

1 Introduction

In many applications the parametric form of common generalized linear models is too restrictive. One way to obtain more flexibility is to let the coefficients be functions $\beta_j(x_j)$ varying over other (metrical) covariates $x_j$. Varying-coefficient models of this general form were introduced by Hastie and Tibshirani (1993). Extending the predictor of generalized linear models to

$$\eta = \beta_0(x_0) + \beta_1(x_1) z_1 + \dots + \beta_p(x_p) z_p, \qquad (1.1)$$

they are a valuable tool for exploring interactions between coded categorical covariates $z_j$ and their effect-modifiers $x_j$. Semiparametric models, where $x_1 \equiv \dots \equiv x_p \equiv 1$, generalized linear models for time series data ($x_0 = \dots = x_p = t$), and generalized additive models ($z_1 \equiv \dots \equiv z_p \equiv 1$) are important special cases of (1.1).

Usually, estimation of the varying coefficients is carried out by penalized likelihood estimation, leading to smoothing splines, or by maximizing local likelihoods. These approaches are oriented towards linear or polynomial regression, and hence they are not always appropriate for modelling baseline functions or intercepts, collecting unobserved variables, and seasonal components.
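To fix ideas, the following minimal Python sketch (illustrative only; the coefficient functions, sample size, and data are invented, not taken from the paper) builds the varying-coefficient predictor (1.1) for a logit model with one varying baseline and one modified effect:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Effect-modifiers x_j and a coded binary covariate z_1 (all invented data).
x0 = rng.uniform(0, 1, n)   # modifies the baseline beta_0
x1 = rng.uniform(0, 1, n)   # modifies the effect of z_1
z1 = rng.integers(0, 2, n)  # coded categorical covariate

# Hypothetical varying coefficients beta_j(x_j).
def beta0(x):
    return 2 - (5 * x - 2.5) ** 2

def beta1(x):
    return np.sin(2 * np.pi * x)

# Predictor (1.1): eta = beta_0(x_0) + beta_1(x_1) * z_1.
eta = beta0(x0) + beta1(x1) * z1

# For a logit model, responses are binomial with success probability pi(eta).
pi = 1 / (1 + np.exp(-eta))
y = rng.binomial(5, pi)
```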
We propose an alternative penalized likelihood estimator motivated by soft thresholding of wavelet coefficients. To review the basic idea, consider the problem of function estimation in a model $y_i = f(x_i) + \varepsilon_i$, $\varepsilon_i$ i.i.d. $N(0, \sigma^2)$, and let $P$ be an orthogonal matrix whose columns are vectors of point evaluations of certain basis functions (e.g. wavelets). These functions are assumed to be smooth and to describe characteristics of the underlying systematic part of $f$. With $\tilde c = P'y$, $\tilde c = (\tilde c_1, \dots, \tilde c_n)'$, $y = (y_1, \dots, y_n)'$, we have independent $\tilde c_i \sim N(c_i, \sigma^2)$. The strategy of soft thresholding is to set all coefficients $\tilde c_i$ with absolute value smaller than a specified noise level $\lambda$ to zero. Since small $\tilde c_i$ correspond to smoother, more desirable functions $f$, the remaining large coefficients are also shrunken towards zero by taking the noise level $\lambda$ off. Formally, the soft threshold strategy is described by

$$\hat c_i = \operatorname{sgn}(\tilde c_i)\,\max(0, |\tilde c_i| - \lambda) = \operatorname{sgn}(\tilde c_i)(|\tilde c_i| - \lambda)_+, \qquad (1.2)$$

and the estimator is $\hat f = P\hat c$, $\hat f = (\hat f(x_1), \dots, \hat f(x_n))'$. Asymptotic optimality results for soft thresholding (1.2) with wavelet basis functions in general function spaces were derived by Donoho and Johnstone (1994) and Donoho, Johnstone, Kerkyacharian and Picard (1995). For orthogonal design, the estimator (1.2) corresponds to a minimum of the absolute penalized least squares criterion

$$\hat c = \arg\min_c\; (y - Pc)'(y - Pc) + \lambda \sum_i |c_i|. \qquad (1.3)$$

For non-orthogonal design matrices $X$, Tibshirani (1996) proposes the restricted least squares estimator

$$\text{minimize } (y - \beta_0 - X\beta)'(y - \beta_0 - X\beta) \quad \text{subject to } \sum_j |\beta_j| \le \gamma, \quad \gamma > 0,$$

in the context of variable selection and shrinkage by the LASSO (least absolute shrinkage and selection operator). Introducing Lagrange multipliers, this leads to the proposed generalized soft threshold estimator defined by

$$\hat\beta = \arg\max_\beta\; l(\beta) - \lambda \sum_j |\beta_j|, \qquad \lambda > 0.$$

These two investigations make one of the main features of soft thresholding or absolute penalties transparent: only a few basis functions, describing characteristics of the unknown functions, are included in the estimator $\hat\beta_j(x_j)$. Results become easier to interpret and analyze. The sparse representation of the estimator is of particular value in applications for the following reasons:

- Results are characterized as features of the varying coefficients, such as maxima, minima or frequency. Soft thresholding directly describes the results as sums of functions with specific characteristics.
- Varying coefficients may be highly correlated or "concurvous" (Buja, Hastie and Tibshirani, 1989). Detection and analysis of this correlation by a parametric approach based on soft thresholding prevents possible misinterpretation.
- Model checking and diagnosis can be performed using only the few coefficients detected by soft thresholding.
- By using locally supported basis functions, soft thresholding adapts well to functions non-homogeneous in smoothness.
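Before turning to the varying-coefficient case, here is a small numerical sketch of the building block (1.2)-(1.3). It is an assumption-laden illustration: an orthonormal cosine basis stands in for the wavelet basis, the noise level is treated as known, and the factor 2 relating the threshold in (1.2) to the penalty weight in (1.3) is made explicit in the code:

```python
import numpy as np

def soft_threshold(c, lam):
    """Soft threshold rule (1.2): sgn(c) * (|c| - lam)_+ ."""
    return np.sign(c) * np.maximum(np.abs(c) - lam, 0.0)

rng = np.random.default_rng(1)
n = 128
x = (np.arange(n) + 0.5) / n

# Orthonormal design P: discrete cosine basis (a stand-in for wavelets).
k = np.arange(n)
P = np.sqrt(2.0 / n) * np.cos(np.pi * np.outer(x, k))
P[:, 0] /= np.sqrt(2.0)

f = 2 - (5 * x - 2.5) ** 2              # true function
sigma = 0.5
y = f + rng.normal(0, sigma, n)         # noisy observations

lam = sigma * np.sqrt(2 * np.log(n))    # a universal-type noise level
c_tilde = P.T @ y                       # empirical coefficients
c_hat = soft_threshold(c_tilde, lam)    # (1.2): most coefficients become zero
f_hat = P @ c_hat                       # sparse reconstruction of f

# For orthogonal P, c_hat minimizes the criterion (1.3); note the factor 2
# relating the threshold in (1.2) to the penalty weight.
def objective(c):
    return np.sum((y - P @ c) ** 2) + 2 * lam * np.sum(np.abs(c))

assert objective(c_hat) <= objective(c_hat + rng.normal(0, 1e-3, n))
```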
2 Generalized Soft Thresholding for Varying-coefficient Models

Introducing vectors of point evaluations $\beta_j = (\beta_j(x_{j1}), \dots, \beta_j(x_{j s_j}))'$, $x_{j1} < \dots < x_{j s_j}$, for the functions $\beta_j(x_j)$, $j = 1, \dots, p$, the predictor at the observed values of the effect-modifiers can be written as $\eta = Z\beta$, $\beta = (\beta_1', \dots, \beta_p')'$. Here $Z$ is a large, usually very sparse matrix built from the $z_1, \dots, z_p$. In the case of few effect-modifiers, sparsity usually holds also for $Z'Z$; otherwise the number of non-zeros depends on the actual design.

Generalized soft thresholding of the functions $\beta_1(x_1), \dots, \beta_p(x_p)$ is carried out by representing them as sums of (orthogonal) basis functions $\phi_{jk}(x_j)$, $k = 1, \dots, S_j$. This framework reduces the initial function estimation problem, via

$$\beta_j(x_{ju}) = \sum_k c_{jk}\,\phi_{jk}(x_{ju}),$$

to estimation of the basis coefficients $c_{jk}$. (Smoothness) restrictions on $\beta_j(x_j)$ lead to restrictions on the $c_{jk}$. These coefficients are linked to the dependent variable by a predictor $\eta = Z\Phi c$, $c = (c_{11}, \dots, c_{p S_p})'$, where $\Phi$ is an (orthogonal) matrix consisting of the $\phi_{jk}(x_{ju})$. The generalized soft threshold estimator is then defined as the absolute penalized likelihood estimator

$$\hat c = \arg\max_c\; l(\eta) - \sum_j \sum_k \lambda_{jk} |c_{jk}|, \qquad \lambda_{jk} \ge 0. \qquad (1.4)$$

Let $s_{jk}(\eta)$ denote the partial derivatives of the log-likelihood. One can show that the following first order conditions are necessary for a maximum of (1.4):

$$|s_{jk}(\hat\eta)| \le \lambda_{jk} \quad \text{if } \hat c_{jk} = 0,$$
$$s_{jk}(\hat\eta) = \lambda_{jk} \quad \text{if } \hat c_{jk} > 0, \qquad (1.5)$$
$$s_{jk}(\hat\eta) = -\lambda_{jk} \quad \text{if } \hat c_{jk} < 0.$$

These equations may be interpreted as follows: if a coefficient $\hat c_{jk}$ is set to zero, its score function $s_{jk}(\hat\eta)$ is smaller in absolute value than $\lambda_{jk}$. Hence the maximum likelihood estimate is also close to zero, or the likelihood is flat in this direction. Maximum likelihood estimation of this coefficient would not increase the likelihood more than inclusion of a covariate vector consisting of pure noise, and thus this coefficient is omitted. By adding the noise level $\lambda_{jk}$ to the score function, the nonzero coefficients $c_{jk}$ are shrunken towards more favorable values, leading to "smooth" $\hat\beta_j(x_j)$.
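The conditions (1.5) are straightforward to verify numerically for a fitted model. The sketch below assumes a binomial logit likelihood, so the score is $B'(y - m\pi)$ for the design matrix $B = Z\Phi$; the function names and tolerance are invented for illustration:

```python
import numpy as np

def score_logit(B, y, m, c):
    """Score s(c) = dl/dc of a binomial logit log-likelihood, eta = B @ c:
    s(c) = B' (y - m * pi(eta)), with y successes out of m trials."""
    pi = 1 / (1 + np.exp(-(B @ c)))
    return B.T @ (y - m * pi)

def check_first_order(B, y, m, c_hat, lam, tol=1e-6):
    """Verify the necessary conditions (1.5) at a candidate maximizer c_hat."""
    s = score_logit(B, y, m, c_hat)
    zero = c_hat == 0
    return (np.all(np.abs(s[zero]) <= lam[zero] + tol)
            and np.all(np.abs(s[c_hat > 0] - lam[c_hat > 0]) <= tol)
            and np.all(np.abs(s[c_hat < 0] + lam[c_hat < 0]) <= tol))
```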
Algorithms

To obtain a fast algorithm for estimation, we follow the proposal of Tishler and Zang (1982) and approximate the absolute penalty $|c_{jk}|$ by the differentiable function

$$h(c_{jk}, \varepsilon) = \begin{cases} -c_{jk}, & \text{if } c_{jk} \le -\varepsilon, \\ c_{jk}^2/(2\varepsilon) + \varepsilon/2, & \text{if } -\varepsilon < c_{jk} < \varepsilon, \\ c_{jk}, & \text{if } c_{jk} \ge \varepsilon. \end{cases} \qquad (1.6)$$

Computation is then done by a modified Gauss-Newton or Fisher scoring procedure:

Algorithm 1 ($Z\Phi$ of full rank): Do while any $|c^{(m+1)}_{jk} - c^{(m)}_{jk}| > \delta$:

1. Compute the vector $d^{(m)}$ with elements $d^{(m)}_{jk} = \operatorname{sgn}(c^{(m)}_{jk})\,I\{|c^{(m)}_{jk}| \ge \varepsilon\}$ and the diagonal matrix $D^{(m)} = \operatorname{diag}(I\{|c^{(m)}_{jk}| < \varepsilon\}/\varepsilon)$.
2. Compute the score vector $s(\eta^{(m)}) = \partial l(\eta^{(m)})/\partial c^{(m)}$ and the (expected) negative second derivative matrix $F(\eta^{(m)}) = -\partial^2 l(\eta^{(m)})/\partial c^{(m)}\,\partial c^{(m)\prime}$.
3. Solve the system $[F(\eta^{(m)}) + \Lambda D^{(m)}]\, c^{(m+1)} = F(\eta^{(m)})\, c^{(m)} + s(\eta^{(m)}) - \Lambda d^{(m)}$, with $\Lambda = \operatorname{diag}(\lambda_{jk})$, to obtain updated values $c^{(m+1)}$.
4. Trim steps crossing zero: if $c^{(m)}_{jk} \ne 0$ and $\operatorname{sgn}(c^{(m+1)}_{jk}) \ne \operatorname{sgn}(c^{(m)}_{jk})$, set $c^{(m+1)}_{jk} = 0$.

Trimming of coefficients in step 4 ensures that for a small termination criterion $\delta$, the coefficients $c_{jk}$ do not alternate around $(-\varepsilon, +\varepsilon)$. At convergence of Algorithm 1 we have

$$s_{jk}(\hat\eta) = \lambda_{jk} d_{jk} = \lambda_{jk}\operatorname{sgn}(\hat c_{jk}) \quad \text{if } |\hat c_{jk}| \ge \varepsilon,$$
$$s_{jk}(\hat\eta) = \lambda_{jk}\hat c_{jk}/\varepsilon, \quad |s_{jk}(\hat\eta)| < \lambda_{jk}, \quad \text{if } |\hat c_{jk}| < \varepsilon,$$

and the necessary conditions (1.5) for a maximum of the absolute penalized log-likelihood are fulfilled up to the termination criterion. The result is checked and improved in a further step by starting Algorithm 1 again with a basis matrix $\Phi_S$ consisting only of those basis functions whose coefficients exceeded $\varepsilon$ in absolute value in the first step.
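A compact sketch of Algorithm 1 for the logit case follows. It is a reconstruction under stated assumptions, not the author's code: the expected information is taken as $F = B'WB$ with $W = \operatorname{diag}\{m\pi(1-\pi)\}$, and the indicator in $d^{(m)}$ is inferred from the convergence conditions above:

```python
import numpy as np

def algorithm1(B, y, m, lam, eps=1e-4, delta=1e-8, max_iter=200):
    """Modified Fisher scoring for the penalized likelihood (1.4), with the
    absolute penalty smoothed as in (1.6). Logit case; a sketch only."""
    c = np.zeros(B.shape[1])
    for _ in range(max_iter):
        pi = 1 / (1 + np.exp(-(B @ c)))
        s = B.T @ (y - m * pi)                         # step 2: score vector
        F = B.T @ (B * (m * pi * (1 - pi))[:, None])   # step 2: expected information
        d = np.sign(c) * (np.abs(c) >= eps)            # step 1: gradient, outer branches
        D = np.where(np.abs(c) < eps, 1.0 / eps, 0.0)  # step 1: curvature, inner branch
        A = F + np.diag(lam * D)                       # step 3: penalized system
        c_new = np.linalg.solve(A, F @ c + s - lam * d)
        crossed = (c != 0) & (np.sign(c_new) != np.sign(c))
        c_new[crossed] = 0.0                           # step 4: trim zero crossings
        if np.max(np.abs(c_new - c)) <= delta:
            return c_new
        c = c_new
    return c
```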
In varying-coefficient models the number of possible basis functions is often very large, and $Z\Phi$ may not be of full rank. The following algorithm exploits the fact that only a small fraction of the coefficients $c_{jk}$ are estimated unequal to zero. To select the global threshold, it is convenient to compute the estimator for a sequence of threshold parameters $\lambda^{(0)} > \dots > \lambda^{(l)} > \dots > \lambda^{(L)}$. We start with the embedded model $\lambda^{(0)} = \infty$, characterized by the coefficients having $\lambda_{jk} = 0$. The embedded model contains at least an intercept term and the coefficients $c_{jk}$ which are not shrunken. For varying-coefficient models this usually corresponds to a common generalized linear model in the covariates $z_1, \dots, z_p$.

Algorithm 2:

1. Let $S$ be the set of all indexes $jk$ with $\lambda_{jk} = 0$ and let $l = 1$.
2. Estimate a generalized linear model using only the columns $(Z\Phi)_S$.
3. Select the threshold values $\lambda_{jk}$ based on this estimate.
4. Do while $l \le L$:
   (a) If there is $jk \notin S$ with $|s_{jk}(\eta)| > \lambda_{jk}\lambda^{(l)}$, then add the index $jk^{*} = \arg\max_{jk \notin S} |s_{jk}(\eta)|/\lambda_{jk}$ to $S$.
   (b) Estimate the coefficients $c_S$ by applying Algorithm 1 only to $(Z\Phi)_S$.
   (c) If $|s_{jk}(\eta)| \le \lambda_{jk}\lambda^{(l)}$ for all $jk \notin S$: keep the result $c^{(l)} = c$ as the estimate for $\lambda^{(l)}$ and set $l = l + 1$.

Algorithm 2 successively adds basis coefficients to the set of non-zero coefficients. When the score function $s_{jk}(\eta)$ for all zero coefficients is smaller than the threshold value, we have an estimate for $\lambda^{(l)}$, and the algorithm proceeds with the next smaller $\lambda^{(l+1)} < \lambda^{(l)}$.
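The active-set loop of Algorithm 2 might be sketched as follows, under the reading that the working threshold for coefficient $jk$ at stage $l$ is $\lambda_{jk}\lambda^{(l)}$; `algorithm1` refers to the previous sketch, and the interface is hypothetical:

```python
import numpy as np

def algorithm2(B, y, m, lam_rel, lam_seq):
    """Active-set estimation over a decreasing global threshold sequence lam_seq.
    lam_rel[j] is the coefficient-specific scaling from step 3; the embedded,
    unshrunken coefficients carry lam_rel[j] == 0 and start in the active set."""
    p = B.shape[1]
    S = list(np.flatnonzero(lam_rel == 0))     # step 1: embedded model
    c = np.zeros(p)
    results = []
    for lam_g in lam_seq:                      # step 4, over lambda^(l)
        while True:
            idx = np.array(S)
            c[:] = 0.0                         # step 4(b): refit the active set
            c[S] = algorithm1(B[:, idx], y, m, lam_g * lam_rel[idx])
            s = B.T @ (y - m / (1 + np.exp(-(B @ c))))
            out = np.setdiff1d(np.arange(p), idx)
            if out.size == 0:
                break
            ratio = np.abs(s[out]) / lam_rel[out]
            if ratio.max() <= lam_g:           # step 4(c): all scores below threshold
                break
            S.append(out[np.argmax(ratio)])    # step 4(a): add the worst violator
        results.append(c.copy())               # estimate for the current lambda^(l)
    return results
```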
3 Selecting the Thresholds

In contrast to common thresholding of wavelet coefficients, the variation of the score functions $s_{jk}(\eta)$ depends on the entries of the matrix $Z\Phi$ and on the actual predictor $\eta$. By choosing a different threshold value $\lambda_{jk}$ for each coefficient in step 3 of Algorithm 2, we take this fact into account. The thresholds $\lambda_{jk}$ are selected according to the variation of the score function under the embedded generalized linear model defined by $\lambda^{(0)} = \infty$. Let $\hat c_S$, $\hat\eta_S$ be estimates of the coefficients and predictor in the embedded model, and let $\hat\sigma(s_{jk}(\hat\eta_S))$ be an estimator of the variation of $s_{jk}(\hat\eta_S)$ based on this model. Thresholds are then chosen in step 3 according to $\lambda_{jk} = \hat\sigma(s_{jk}(\hat\eta_S))$.

3.1 Function estimation

Further considerations on the threshold values are needed for smooth varying coefficients. We outline threshold selection for smoothing in the following settings:

Smoothing splines

As in common penalized likelihood estimation, smooth spline functions can also be estimated with absolute penalties as described above. Let $\phi_{jk}(x_j)$ be the orthogonal spline basis functions as described by Demmler and Reinsch (1975), and let $\gamma_{jk} = \int \{\phi^{(m)}_{jk}(u)\}^2\,du$ with $\int \phi^{(m)}_{jk}(u)\,\phi^{(m)}_{jl}(u)\,du = 0$, $l \ne k$. The penalty for ordinary spline smoothing, $\int \{f^{(m)}_j(u)\}^2\,du$, corresponds to $\sum_k \gamma_{jk} c_{jk}^2$. For soft thresholding we use $\lambda_{jk} = \sqrt{\gamma_{jk}}\,\hat\sigma(s_{jk}(\hat\eta_S))$, which might be regarded as an estimate of the standard deviation of a score function targeting $\beta^{(m)}_j(x_j)$.

[Figure 1 appears here: four panels showing the coefficients for the true function, the true function, the absolute bias, and the variance; smoothing spline: solid, soft thresholding: dashed.]

FIGURE 1. Absolute bias and variance of spline smoothing and soft thresholding for the true function $\beta(x) = 2 - (5x - 2.5)^2$, computed from 100 simulations. Data are based on a logit model for a binomial $B(5, \pi(x_t))$ distribution. The $x_1, \dots, x_{100}$ were simulated according to a uniform $U(0,1)$ distribution.

Figure 1 compares spline smoothing with results obtained by soft thresholding of spline basis functions. The upper left panel shows that the true function can be well approximated using only the first few Demmler-Reinsch basis functions. Soft thresholding has about the same bias and variance as spline smoothing but uses only about 3 basis functions more than linear regression. Many other popular linear smoothers may be incorporated in the same manner by adopting the concept of pseudosplines due to Hastie (1996).
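A Demmler-Reinsch-type basis and the roughness weights $\gamma_{jk}$ can be computed numerically; the sketch below is a standard construction (not taken from the paper) that uses a second-difference penalty matrix as a stand-in for the integrated squared $m$-th derivative:

```python
import numpy as np

n = 50
x = np.sort(np.random.default_rng(2).uniform(0, 1, n))

# Second-difference penalty matrix K as a discrete proxy for the integrated
# squared second derivative (m = 2).
D2 = np.diff(np.eye(n), n=2, axis=0)
K = D2.T @ D2

# Demmler-Reinsch-type basis: eigenvectors of K give orthonormal vectors of
# point evaluations phi_k(x_i); the eigenvalues gamma_k grow with roughness.
gamma, Phi = np.linalg.eigh(K)

# Ordinary spline smoothing penalizes sum_k gamma_k * c_k**2; soft thresholding
# instead uses coefficient-specific thresholds proportional to sqrt(gamma_k).
lam_rel = np.sqrt(np.clip(gamma, 0, None))
```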
Wavelets

If, for example, $\beta_j(x_j)$ is a baseline effect or an intercept term collecting unobserved variables, often no prior assumptions about the structure of the coefficient can be made. Here wavelet basis functions provide a powerful tool. These orthogonal functions have compact support and decompose the $\beta_j(x_j)$ in a hierarchical scheme. They are well suited to describe effects heterogeneous in smoothness. The thresholds may be chosen globally, i.e. $\lambda_{jk} = \hat\sigma(s_{jk}(\hat\eta_S))$, or according to the resolution level $l$, e.g. $\lambda_{jk} = 2^{l}\,\hat\sigma(s_{jk}(\hat\eta_S))$.

Trigonometric series

When time-varying effects are included in the model, seasonality often has to be considered. In the case of periodicity, trigonometric basis functions lead to a sparse representation of the varying coefficients. In principle, combinations of different types of basis functions, such as polynomial-trigonometric series (Eubank and Speckman, 1990) or polynomial regression together with wavelets, can be used to estimate the $\beta_j(x_j)$.

3.2 Selecting the global threshold

[Figure 2 appears here: panels showing the estimate (true function dashed), the estimation error, the coefficients for the true function, and the log-likelihood, the latter plotted versus the number of non-zero coefficients.]

FIGURE 2. One simulation drawn from the true function $\beta(x) = \sin(10x^2)$. The data follow a logit model for a binomial $B(5, \pi(x_t))$ distribution, and 100 $x_t$ were simulated according to a uniform $U(0,1)$ distribution. The upper right panel shows the estimation error $\frac{1}{100}\sum_t (\hat\beta(x_t) - \beta(x_t))^2$, computed for a sequence of $\lambda$'s, plotted versus the number of $\hat c_{jk} \ne 0$.

A good value for the global threshold or smoothing parameter may be chosen by comparing the log-likelihood with the number of non-zero coefficients. If the true systematic part can be well approximated by only a few basis functions, a sharp bend is visible in the log-likelihood plotted versus the number of non-zero coefficients. This bend can be used to select the global threshold, since coefficients to its right do not contribute significantly to the likelihood. Figure 2 is typical of this situation. The log-likelihood increases rapidly with the inclusion of the first five basis functions; here the estimation error decreases. Including more basis functions increases the log-likelihood only slightly while the error increases. If no distinct bend is visible, another set of basis functions may yield a sparser representation of the underlying systematic part and, hence, more precise estimates.
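Locating the bend mechanically is sometimes convenient; the following heuristic sketch (not from the paper; it assumes fits over a decreasing $\lambda$-sequence, e.g. from the Algorithm 2 sketch) picks the fit where the likelihood gain per added coefficient drops most sharply:

```python
import numpy as np

def select_by_bend(loglik, n_nonzero):
    """Pick the fit at the sharpest bend of log-likelihood versus model size.
    loglik and n_nonzero are arrays over the fitted lambda-sequence."""
    order = np.argsort(n_nonzero)
    ll = np.asarray(loglik, dtype=float)[order]
    k = np.asarray(n_nonzero, dtype=float)[order]
    gain = np.diff(ll) / np.maximum(np.diff(k), 1.0)  # gain per added coefficient
    # The bend: the largest drop in per-coefficient gain between successive
    # fits; fits to its right add little likelihood.
    bend = int(np.argmax(gain[:-1] - gain[1:])) + 1
    return order[bend]
```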
Acknowledgments: This work was supported by the Deutsche Forschungsgemeinschaft, Sonderforschungsbereich 386 "Statistische Analyse diskreter Strukturen, Modellierung und Anwendung in Biometrie und Ökonometrie."

References

Buja, A., Hastie, T. and Tibshirani, R. (1989). Linear smoothers and additive models, Annals of Statistics 17, 453-555.

Demmler, A. and Reinsch, C. (1975). Oscillation matrices with spline smoothing, Numerische Mathematik 24, 375-382.

Donoho, D. L. and Johnstone, I. M. (1994). Ideal spatial adaptation by wavelet shrinkage, Biometrika 81, 425-455.

Donoho, D. L., Johnstone, I. M., Kerkyacharian, G. and Picard, D. (1995). Wavelet shrinkage: asymptopia? (with discussion), Journal of the Royal Statistical Society B 57, 301-369.

Eubank, R. L. and Speckman, P. (1990). Curve fitting by polynomial-trigonometric regression, Biometrika 77, 1-9.

Hastie, T. (1996). Pseudosplines, Journal of the Royal Statistical Society B 58, 379-396.

Hastie, T. and Tibshirani, R. (1993). Varying-coefficient models, Journal of the Royal Statistical Society B 55, 757-796.

Tibshirani, R. (1996). Regression shrinkage and selection via the lasso, Journal of the Royal Statistical Society B 58, 267-288.

Tishler, A. and Zang, I. (1982). An absolute deviations curve-fitting algorithm for nonlinear models. In S. H. Zanakis and J. S. Rustagi (eds.), Optimization in Statistics, TIMS Studies in Management Science, Vol. 19, North-Holland.