Incorporating Geospatial Data in House Price Indexes: A Hedonic Imputation Approach with Splines. Robert J. Hill and Michael Scholz
1 Incorporating Geospatial Data in House Price Indexes: A Hedonic Imputation Approach with Splines
Robert J. Hill and Michael Scholz, Department of Economics, University of Graz, Austria
OeNB Workshop, Vienna, 9th and 10th October 2014
Supported by funds of the Oesterreichische Nationalbank (Anniversary Fund, project number: 14947)
Robert J. Hill and Michael Scholz, OeNB Workshop, Oct 2014
2 Overview of the talk:
1. Hedonic house price indexes: incorporating geospatial data; taxonomy: time dummy, average characteristics, hedonic imputation
2. Our models: GAM with spline vs. postcode/region dummies; estimation of the semiparametric model
3. Empirical application: data set, missing characteristics, results and main findings
4. Are postcode/region based indexes downward biased?
3 Introduction
Motivation:
- Houses differ both in their physical characteristics and in their location
- The exact longitude and latitude of each house are now increasingly available in housing data sets
- How can we incorporate geospatial data (i.e., longitudes and latitudes) in a hedonic model of the housing market?
- How much difference does it make?
4 Methods for Incorporating Geospatial Data in a Hedonic Index
1. Distance to amenities (including the city center, nearest train station and shopping center, etc.) as additional characteristics
2. Spatial autoregressive models
3. A spline function (or some other nonparametric function) defined on the longitudes and latitudes
5 A Taxonomy of Methods for Computing Hedonic House Price Indexes I
Time dummy method:
    y = Z\beta + D\delta + \varepsilon,
where Z is a matrix of characteristics and D is a matrix of time dummy variables.
Index: P_t = \exp(\hat{\delta}_t), where \hat{\delta}_t is the estimated coefficient on the period-t dummy obtained from the hedonic model.
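The time dummy index can be sketched numerically. The following is an illustrative Python example with synthetic data (not the authors' code): fit the log-price regression with a period dummy by OLS, then exponentiate the estimated dummy coefficient.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: log price depends on one characteristic plus a period effect.
n = 200
period = rng.integers(0, 2, n)           # 0 = base period t, 1 = period t+1
z = rng.uniform(50, 300, n)              # e.g. land area
y = 11.0 + 0.004 * z + 0.10 * period + rng.normal(0, 0.05, n)

# Design matrix: constant, characteristic, time dummy for period t+1.
X = np.column_stack([np.ones(n), z, period])
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]

# Time-dummy index: P_{t+1} = exp(delta_hat).
P = np.exp(beta_hat[2])
print(round(P, 3))  # close to exp(0.10), i.e. roughly 1.105
```

The same regression is run once over the pooled sample, so the characteristic shadow prices are constrained to be equal across periods.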
6 A Taxonomy of Methods for Computing Hedonic House Price Indexes II
Average characteristics method:
    Laspeyres: P^{L}_{t,t+1} = \frac{\hat{p}_{t+1}(\bar{z}_t)}{\hat{p}_t(\bar{z}_t)} = \exp\left[\sum_{c=1}^{C} (\hat{\beta}_{c,t+1} - \hat{\beta}_{c,t})\,\bar{z}_{c,t}\right]
    Paasche: P^{P}_{t,t+1} = \frac{\hat{p}_{t+1}(\bar{z}_{t+1})}{\hat{p}_t(\bar{z}_{t+1})} = \exp\left[\sum_{c=1}^{C} (\hat{\beta}_{c,t+1} - \hat{\beta}_{c,t})\,\bar{z}_{c,t+1}\right]
where \bar{z}_{c,t} = \frac{1}{H_t}\sum_{h=1}^{H_t} z_{c,t,h} and \bar{z}_{c,t+1} = \frac{1}{H_{t+1}}\sum_{h=1}^{H_{t+1}} z_{c,t+1,h}.
Note: Average characteristics methods cannot use geospatial data, since averaging longitudes and latitudes makes no sense.
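Given estimated shadow prices from separate period regressions, the average characteristics indexes are a one-line computation. A minimal Python sketch with hypothetical coefficient and mean-characteristic values:

```python
import numpy as np

# Hypothetical estimated shadow prices for C = 3 characteristics in t and t+1
# (bedrooms, bathrooms, land area), from separate per-period regressions.
beta_t  = np.array([0.10, 0.05, 0.002])
beta_t1 = np.array([0.12, 0.06, 0.002])

# Mean characteristics of the houses sold in each period.
zbar_t  = np.array([3.0, 1.5, 450.0])
zbar_t1 = np.array([3.2, 1.6, 430.0])

# Laspeyres evaluates both periods' prices at period-t mean characteristics;
# Paasche evaluates them at period-(t+1) mean characteristics.
laspeyres = np.exp((beta_t1 - beta_t) @ zbar_t)    # = exp(0.075)
paasche   = np.exp((beta_t1 - beta_t) @ zbar_t1)   # = exp(0.080)
```

The two indexes differ only in which period's average house they price, which is exactly why averaged longitudes and latitudes cannot be plugged in here.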
7 A Taxonomy of Methods for Computing Hedonic House Price Indexes III
Hedonic imputation method (SI denotes single imputation):
    Paasche: P^{PSI}_{t,t+1} = \prod_{h=1}^{H_{t+1}} \left(\frac{p_{t+1,h}}{\hat{p}_{t,h}(z_{t+1,h})}\right)^{1/H_{t+1}}
    Laspeyres: P^{LSI}_{t,t+1} = \prod_{h=1}^{H_t} \left(\frac{\hat{p}_{t+1,h}(z_{t,h})}{p_{t,h}}\right)^{1/H_t}
    Fisher: P^{FSI}_{t,t+1} = \sqrt{P^{PSI}_{t,t+1}\, P^{LSI}_{t,t+1}}
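The hedonic imputation indexes above can be sketched end to end: fit a separate hedonic regression per period, impute each house's price under the other period's model, and take geometric means. An illustrative Python example with synthetic data (a simple linear semilog model stands in for the full specification):

```python
import numpy as np

rng = np.random.default_rng(1)

def fit(Z, y):
    """OLS fit of log price on characteristics (constant included in Z)."""
    return np.linalg.lstsq(Z, y, rcond=None)[0]

def make_period(n, level):
    """Synthetic period sample: log price = level + 0.004*z + noise."""
    z = rng.uniform(50, 300, n)
    Z = np.column_stack([np.ones(n), z])
    y = level + 0.004 * z + rng.normal(0, 0.05, n)
    return Z, y

Z_t,  y_t  = make_period(300, 11.0)
Z_t1, y_t1 = make_period(300, 11.1)   # true pure price change: exp(0.1)

b_t, b_t1 = fit(Z_t, y_t), fit(Z_t1, y_t1)

# In logs, a geometric mean of price ratios is exp of a mean log difference.
# Paasche SI: actual t+1 prices over prices imputed from the period-t model.
paasche   = np.exp(np.mean(y_t1 - Z_t1 @ b_t))
# Laspeyres SI: prices imputed from the period-(t+1) model over actual t prices.
laspeyres = np.exp(np.mean(Z_t @ b_t1 - y_t))
fisher    = np.sqrt(paasche * laspeyres)
print(round(fisher, 3))  # roughly exp(0.10)
```

Because every house is imputed under both models, no matched characteristics bundle is needed, and location can enter the model however one likes, including as a spline.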
8 Our Models I
(i) Semilog with geospatial spline:
    y = Z\beta + g(z_{lat}, z_{long}) + \varepsilon   (1)
(ii)/(iii) Semilog with postcode/region dummies:
    y = Z\beta + D\delta + \varepsilon   (2)
- y is an H × 1 vector of log prices
- Z is an H × C matrix of physical characteristics (including a constant and quarterly dummies)
- g(z_{lat}, z_{long}) is the geospatial spline function defined on the longitudes and latitudes
- D is a matrix of postcode or region dummies
9 Our Models II
(i) Semilog with geospatial spline: y = Z\beta + g(z_{lat}, z_{long}) + \varepsilon (1)
(ii)/(iii) Semilog with postcode/region dummies: y = Z\beta + D\delta + \varepsilon (2)
The parameters to be estimated in (i) are the C × 1 vector of characteristic shadow prices \beta and the geospatial spline surface g(z_{lat}, z_{long}).
The parameters to be estimated in (ii) or (iii) are the C × 1 vector of characteristic shadow prices \beta and the B × 1 vector of postcode or region shadow prices \delta.
We consider 16 regions and 242 postcodes, so on average there are about 15 postcodes per region.
10 Estimation of the Semiparametric Model I
- The semiparametric model is estimated using the bam function (Big Additive Models) in the mgcv package in R.
- For the nonparametric part we use an approximation to a thin plate spline (TPS), developed by S. Wood (2003), defined on the longitudes and latitudes.
- In our context, thin plate regression splines (TPRSs) have two advantages over other splines: it is not necessary to specify knot locations, and the spline surface can be estimated as a function of two explanatory variables.
- With a full TPS the computational burden rises rapidly as the number of data points increases.
11 Estimation of the Semiparametric Model II
- The approximation reduces the computational burden for any given level of fit.
- It is necessary to select a value for the parameter k, which determines how closely our spline approximates a full TPS: a lower value of k has a lower computational burden but gives a worse fit. We choose a value of k beyond which further increases yield little improvement in fit.
- The smoothing parameter \lambda determines the smoothness of the spline surface itself. The algorithm selects \lambda by restricted maximum likelihood (REML), where the likelihood function is maximized across all the parameters of the semiparametric model.
12 Our Data Set
- The data cover Sydney, Australia, from 2001 to 2011.
- Our characteristics are: transaction price + exact date of sale; physical: number of bedrooms, number of bathrooms, land area; location: postcode, longitude, latitude.
- Some (physical) characteristics are missing for some houses, with more gaps in the earlier years of our sample.
- We have a total of 454,567 transactions. All characteristics are available for only 240,142 of these transactions.
13 Dealing with Missing Characteristics
We impute the price of each house from the model below that has exactly the same mix of characteristics:
(HM1): y = f(d, z_1, z_2, z_3, loc)
(HM2): y = f(d, z_2, z_3, loc)
(HM3): y = f(d, z_1, z_3, loc)
(HM4): y = f(d, z_1, z_2, loc)
(HM5): y = f(d, z_3, loc)
(HM6): y = f(d, z_2, loc)
(HM7): y = f(d, z_1, loc)
(HM8): y = f(d, loc)
For all observations: y = log(price), d = quarter dummy, loc = location.
Not for all observations: z_1 = land area, z_2 = number of bedrooms, z_3 = number of bathrooms.
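The HM1–HM8 hierarchy amounts to dispatching each house to the model whose regressors match its observed characteristics exactly. A minimal Python sketch of that dispatch (field names are hypothetical; the model entries are placeholders for fitted models):

```python
# Map each house to the hedonic model matching its observed characteristics.
def model_key(house):
    """Tuple of characteristic names that are present (not missing)."""
    fields = ("land_area", "bedrooms", "bathrooms")   # z_1, z_2, z_3
    return tuple(k for k in fields if house.get(k) is not None)

# One fitted model per characteristic pattern, HM1 ... HM8 (placeholders here).
models = {
    ("land_area", "bedrooms", "bathrooms"): "HM1",
    ("bedrooms", "bathrooms"): "HM2",
    ("land_area", "bathrooms"): "HM3",
    ("land_area", "bedrooms"): "HM4",
    ("bathrooms",): "HM5",
    ("bedrooms",): "HM6",
    ("land_area",): "HM7",
    (): "HM8",
}

house = {"land_area": None, "bedrooms": 3, "bathrooms": None}
print(models[model_key(house)])  # HM6
```

Every house is always usable because date of sale and location are never missing, so HM8 is the fallback for houses with no physical characteristics at all.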
14 Comparing the Performance of Our Models I
Different performance measures:
- Akaike information criterion (AIC): trade-off between the goodness of fit and the complexity of the model
- Sum of squared log errors:
    SSLE_t = \frac{1}{H_t} \sum_{h=1}^{H_t} \left[\ln(\hat{p}_{t,h}/p_{t,h})\right]^2
- Repeat sales as a benchmark:
    Z^{SI}_h = \frac{\text{actual price relative}}{\text{imputed price relative}} = \frac{p_{t+k,h}/p_{t,h}}{\hat{p}_{t+k,h}/\hat{p}_{t,h}},
    D^{SI} = \frac{1}{H} \sum_{h=1}^{H} \left[\ln(Z^{SI}_h)\right]^2
The spline model significantly outperforms its postcode counterpart.
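The two error measures are straightforward to compute once actual and imputed prices are lined up. A minimal Python sketch with toy numbers (not the paper's data):

```python
import numpy as np

def ssle(p_actual, p_imputed):
    """Sum of squared log errors (mean over houses in one period)."""
    return np.mean(np.log(p_imputed / p_actual) ** 2)

def d_si(p_t, p_tk, phat_t, phat_tk):
    """Repeat-sales benchmark: mean squared log of the ratio of the
    actual price relative to the imputed price relative."""
    z = (p_tk / p_t) / (phat_tk / phat_t)
    return np.mean(np.log(z) ** 2)

# Toy numbers: two houses each sold in period t and again k periods later.
p_t,    p_tk    = np.array([500e3, 620e3]), np.array([550e3, 700e3])
phat_t, phat_tk = np.array([510e3, 600e3]), np.array([555e3, 690e3])
print(d_si(p_t, p_tk, phat_t, phat_tk) >= 0.0)  # True; 0 only for a perfect match
```

Note that D^{SI} compares price *relatives*, so a model that mis-levels every house by the same factor is not penalized, only errors in the implied appreciation are.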
15 Comparing the Performance of Our Models II
Table: Akaike information criterion (restricted data set), models (i)-(iii)
Table: Sum of squared log errors (restricted data set), models (i)-(iii)
Table: Sum of squared log price relative errors D^{SI} (restricted and full data sets), models (i)-(iii)
16 Spline Surface Based on Restricted Data Set for 2007
17 Price Indexes Calculated on the Restricted Data Set
Figure: annual price indexes; series shown: region LM, longitude-latitude GAM, postcode LM, median, and repeat sales.
18 Price Indexes Calculated on the Full Data Set
Figure: annual price indexes; series shown: region LM, longitude-latitude GAM, postcode LM, median, and repeat sales.
19 Main Findings
- The price index rises more when locational effects are captured using a geospatial spline rather than postcode or region dummies.
- The gap between the spline based and postcode based indexes is small (about 0.5 percent over 11 years); the gap between the spline based and region based indexes is larger (about 4.2 percent over 11 years).
- The full-sample spline based price index rises 6.5 percent more over 11 years than its restricted data set counterpart.
- The median index is dramatically different when the full data set is used.
- The gap between spline and postcode/region hedonic price indexes is smaller when the full data set is used.
20 Are Postcode/Region Based Indexes Downward Biased? I
A downward bias can arise when the locations of sold houses within a postcode or region get worse over time.
Our test procedure:
1. Choose a postcode.
2. Calculate the mean number of bedrooms, bathrooms, land area and quarter of sale over the 11 years for that postcode.
3. Impute the price of this average house at every location in which a house actually sold in 2001, ..., 2011 in that postcode (using the spline model of year 2001).
4. Take the geometric mean of these imputed prices for each year.
5. Repeat for another postcode.
6. Take the geometric mean across postcodes in each year.
7. Repeat steps 3-6 using the spline of year 2002, the spline of 2003, etc.
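Steps 3-6 of the procedure above can be sketched compactly. In the Python sketch below, the imputed prices are toy inputs standing in for the output of a fitted reference-year spline model, and the postcode labels are hypothetical:

```python
import numpy as np

def geo_mean(x):
    """Geometric mean via logs."""
    return np.exp(np.mean(np.log(x)))

def location_index(imputed):
    """imputed: {postcode: {year: [prices of the postcode's average house,
    imputed at each location that actually sold in that year]}}.
    Returns, per year, the geometric mean across postcodes of the
    within-postcode geometric means (steps 4 and 6)."""
    years = sorted({y for d in imputed.values() for y in d})
    out = {}
    for year in years:
        per_pc = [geo_mean(d[year]) for d in imputed.values() if year in d]
        out[year] = geo_mean(per_pc)
    return out

# Toy inputs: locations sold in 2002 carry lower imputed prices in both postcodes.
toy = {
    "2037": {2001: [500e3, 520e3], 2002: [480e3, 470e3]},
    "2095": {2001: [700e3],        2002: [690e3, 650e3]},
}
idx = location_index(toy)
print(idx[2001] > idx[2002])  # True: the 2002 sale locations are "worse"
```

Since the house and the model are held fixed, any movement in this series can only come from where the sold houses sit within each postcode, which is exactly the compositional effect the dummies cannot see.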
21 Are Postcode Based Indexes Downward Biased? II
Findings:
- The geometric means from step 6 fall over time, irrespective of which year's spline is used as the reference.
- Most of the fall occurs in the first half of the sample.
- The fall is bigger for regions than for postcodes.
22 Evidence of Bias in the Postcode-Based Price Indexes
23 Evidence of Bias in the Region-Based Price Indexes
24 Conclusions
- Splines (or some other nonparametric method), when combined with the hedonic imputation method, provide a flexible way of incorporating geospatial data into a house price index.
- In our data set, postcode/region based indexes seem to have a downward bias, since they fail to account for a general shift over time of houses sold toward worse locations within each postcode/region. The bias is negligible with postcode dummies but not with region dummies.
- For a city with postcodes as finely defined as Sydney's (about 14,300 residents and 7.39 square kilometers per postcode), postcode dummies do a good job of controlling for locational effects.
- It is important to use the full data set, and not just observations with no missing characteristics.
25 Thank you for your attention!
26 GAM (details)
We estimate the generalized additive model
    y = Z\beta + g(z_{lat}, z_{long}) + \varepsilon
with a thin plate regression spline (TPRS), Wood (2003): an optimal low-rank approximation of a thin plate spline.
- A TPRS uses far fewer coefficients than a full spline: computationally efficient, while losing little statistical performance.
- Big Additive Models: the bam function from the R package mgcv.
- We can thereby avoid certain problems in spline smoothing: the choice of knot locations and basis, and smooths of more than one predictor.
27 Thin plate spline I
Thin plate spline smoothing problem, Duchon (1977): estimate the smooth function g of the d-vector x from n observations satisfying y_i = g(x_i) + \varepsilon_i by finding the function \hat{f} that minimizes the penalized problem
    \|y - f\|^2 + \lambda J_{md}(f),   (3)
where y = (y_1, \ldots, y_n)', f = (f(x_1), \ldots, f(x_n))', \lambda is a smoothing parameter, and J_{md}(f) is a penalty function measuring the wiggliness of f:
    J_{md}(f) = \int \cdots \int \sum_{\nu_1 + \cdots + \nu_d = m} \frac{m!}{\nu_1! \cdots \nu_d!} \left( \frac{\partial^m f}{\partial x_1^{\nu_1} \cdots \partial x_d^{\nu_d}} \right)^2 dx_1 \cdots dx_d
28 Thin plate spline II
It can be shown that the solution of (3) has the form
    \hat{f}(x) = \sum_{i=1}^{n} \delta_i \, \eta_{md}(\|x - x_i\|) + \sum_{j=1}^{M} \alpha_j \phi_j(x),   (4)
where \delta_i and \alpha_j are coefficients to be estimated, with the \delta_i subject to T'\delta = 0, where T_{ij} = \phi_j(x_i).
The M = \binom{m+d-1}{d} functions \phi_j are linearly independent polynomials spanning the space of polynomials in R^d of degree less than m (i.e., the null space of J_{md}), and
    \eta_{md}(r) = \frac{(-1)^{m+1+d/2}}{2^{2m-1} \pi^{d/2} (m-1)! \, (m-d/2)!} \, r^{2m-d} \log(r)   (d even)
    \eta_{md}(r) = \frac{\Gamma(d/2 - m)}{2^{2m} \pi^{d/2} (m-1)!} \, r^{2m-d}   (d odd)
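For the geospatial case relevant here (d = 2, m = 2), the even-d formula reduces to \eta(r) = r^2 \log(r) / (8\pi), with \eta(0) = 0 as the limit. A minimal Python helper:

```python
import math

def eta_22(r):
    """Thin plate spline radial basis for d = 2, m = 2: r^2 * log(r) / (8*pi).
    The r -> 0 limit of r^2 * log(r) is 0, so eta_22(0) = 0."""
    if r == 0.0:
        return 0.0
    return r * r * math.log(r) / (8.0 * math.pi)
```

Note that \eta_{22}(1) = 0 and that \eta_{22} is negative for 0 < r < 1, which is harmless: the basis need not be positive, only conditionally positive definite subject to T'\delta = 0.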
29 Figure: Rank 15 Eigen Approx to 2D thin plate spline (Wood, 2006) Robert J. Hill and Michael Scholz OeNB Workshop, Oct / 10
30 Thin plate spline III
With E defined by E_{ij} = \eta_{md}(\|x_i - x_j\|), the thin plate spline fitting problem becomes the minimization of
    \|y - E\delta - T\alpha\|^2 + \lambda \, \delta' E \delta   subject to   T'\delta = 0,   (5)
with respect to \delta and \alpha.
- Ideal smoother: gives exact weight to the conflicting goals of matching the data and making f smooth.
- Disadvantage: as many parameters as there are data points, i.e., O(n^3) calculations.
- More on thin plate splines: Wahba (1990) or Green and Silverman (1994).
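Problem (5) can be solved directly for small n, which also makes the O(n^3) cost concrete. A Python sketch (assuming d = 2, m = 2, so \eta(r) = r^2 log(r)/(8\pi) and the null-space basis is 1, x_1, x_2): the first-order conditions reduce to the bordered linear system (E + \lambda I)\delta + T\alpha = y with T'\delta = 0.

```python
import numpy as np

def tps_fit(x, y_obs, lam):
    """Dense solve of the full TPS smoothing problem (small n only, O(n^3))."""
    n = x.shape[0]
    r = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=2)
    with np.errstate(divide="ignore", invalid="ignore"):
        E = np.where(r > 0, r ** 2 * np.log(r) / (8 * np.pi), 0.0)
    T = np.column_stack([np.ones(n), x])          # null-space basis: 1, x1, x2
    # Bordered system: [[E + lam*I, T], [T', 0]] [delta; alpha] = [y; 0].
    A = np.block([[E + lam * np.eye(n), T],
                  [T.T, np.zeros((3, 3))]])
    sol = np.linalg.solve(A, np.concatenate([y_obs, np.zeros(3)]))
    return sol[:n], sol[n:], E, T                 # delta, alpha, E, T

rng = np.random.default_rng(2)
x = rng.uniform(size=(40, 2))                     # 40 points in the unit square
y_obs = x[:, 0] + 2.0 * x[:, 1] + rng.normal(0, 0.01, 40)
delta, alpha, E, T = tps_fit(x, y_obs, lam=1e-4)
fitted = E @ delta + T @ alpha
```

With a tiny \lambda and a nearly planar surface, the fit almost interpolates the data; doubling n roughly octuples the solve time, which is the motivation for the low-rank approximation on the next slides.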
31 Thin plate regression spline I
- The computational burden can be reduced with the use of a low-rank approximation, Wood (2003): a parameter-space basis that perturbs (5) as little as possible.
- The basic idea: truncate the space of the wiggly components of the spline (with parameter \delta), while leaving the \alpha components unchanged.
- Let E = U D U' be the eigen-decomposition of E. Take an appropriate submatrix D_k of D and the corresponding U_k, and restrict \delta to the column space of U_k, i.e., \delta = U_k \delta_k. Then minimize
    \|y - U_k D_k \delta_k - T\alpha\|^2 + \lambda \, \delta_k' D_k \delta_k   subject to   T' U_k \delta_k = 0,   (6)
with respect to \delta_k and \alpha.
32 Thin plate regression spline II
- The computational cost is reduced from O(n^3) to O(k^3).
- Remaining problem: find U_k and D_k sufficiently cheaply. Wood (2003) proposes the Lanczos method (Demmel, 1997), which allows the calculation at the substantially lower cost of O(n^2 k) operations.
- Smoothing parameter selection: a Laplace approximation yields an approximate REML criterion that is suitable for efficient direct optimization and is computationally stable, Wood (2011).
33 Thin plate regression spline III
In practice:
1. Choose the degree of approximation k = 600 (tradeoff: oversmoothing vs. computational burden).
2. Construct the TPRS basis from a randomly chosen sample of 2,500 data points. The locations of these observations depend on the locational distribution of house sales in that period.
34 Figure: Sum of squared errors D^{SI} of the price relatives, and computational time, for different basis dimensions
35 Figure: AIC distribution and computational time for 100 repetitions of the 2011 fit, based on different numbers of randomly chosen observations used for basis construction
University of Texas at Dallas, March 2007 Nonlinearity and Generalized Additive Models Lecture 2 Robert Andersen McMaster University http://socserv.mcmaster.ca/andersen Definition of a Smoother A smoother
More informationWhat is machine learning?
Machine learning, pattern recognition and statistical data modelling Lecture 12. The last lecture Coryn Bailer-Jones 1 What is machine learning? Data description and interpretation finding simpler relationship
More informationPRE-PROCESSING HOUSING DATA VIA MATCHING: SINGLE MARKET
PRE-PROCESSING HOUSING DATA VIA MATCHING: SINGLE MARKET KLAUS MOELTNER 1. Introduction In this script we show, for a single housing market, how mis-specification of the hedonic price function can lead
More informationDetection of Smoke in Satellite Images
Detection of Smoke in Satellite Images Mark Wolters Charmaine Dean Shanghai Center for Mathematical Sciences Western University December 15, 2014 TIES 2014, Guangzhou Summary Application Smoke identification
More information1D Regression. i.i.d. with mean 0. Univariate Linear Regression: fit by least squares. Minimize: to get. The set of all possible functions is...
1D Regression i.i.d. with mean 0. Univariate Linear Regression: fit by least squares. Minimize: to get. The set of all possible functions is... 1 Non-linear problems What if the underlying function is
More informationModel selection and validation 1: Cross-validation
Model selection and validation 1: Cross-validation Ryan Tibshirani Data Mining: 36-462/36-662 March 26 2013 Optional reading: ISL 2.2, 5.1, ESL 7.4, 7.10 1 Reminder: modern regression techniques Over the
More informationMedian and Extreme Ranked Set Sampling for penalized spline estimation
Appl Math Inf Sci 10, No 1, 43-50 (016) 43 Applied Mathematics & Information Sciences An International Journal http://dxdoiorg/1018576/amis/10014 Median and Extreme Ranked Set Sampling for penalized spline
More informationLinear Model Selection and Regularization. especially usefull in high dimensions p>>100.
Linear Model Selection and Regularization especially usefull in high dimensions p>>100. 1 Why Linear Model Regularization? Linear models are simple, BUT consider p>>n, we have more features than data records
More informationDiffusion Wavelets for Natural Image Analysis
Diffusion Wavelets for Natural Image Analysis Tyrus Berry December 16, 2011 Contents 1 Project Description 2 2 Introduction to Diffusion Wavelets 2 2.1 Diffusion Multiresolution............................
More informationBernstein-Bezier Splines on the Unit Sphere. Victoria Baramidze. Department of Mathematics. Western Illinois University
Bernstein-Bezier Splines on the Unit Sphere Victoria Baramidze Department of Mathematics Western Illinois University ABSTRACT I will introduce scattered data fitting problems on the sphere and discuss
More informationLecture 7: Linear Regression (continued)
Lecture 7: Linear Regression (continued) Reading: Chapter 3 STATS 2: Data mining and analysis Jonathan Taylor, 10/8 Slide credits: Sergio Bacallado 1 / 14 Potential issues in linear regression 1. Interactions
More informationNONPARAMETRIC REGRESSION TECHNIQUES
NONPARAMETRIC REGRESSION TECHNIQUES C&PE 940, 28 November 2005 Geoff Bohling Assistant Scientist Kansas Geological Survey geoff@kgs.ku.edu 864-2093 Overheads and other resources available at: http://people.ku.edu/~gbohling/cpe940
More informationMulticollinearity and Validation CIVL 7012/8012
Multicollinearity and Validation CIVL 7012/8012 2 In Today s Class Recap Multicollinearity Model Validation MULTICOLLINEARITY 1. Perfect Multicollinearity 2. Consequences of Perfect Multicollinearity 3.
More informationIntroduction to floating point arithmetic
Introduction to floating point arithmetic Matthias Petschow and Paolo Bientinesi AICES, RWTH Aachen petschow@aices.rwth-aachen.de October 24th, 2013 Aachen, Germany Matthias Petschow (AICES, RWTH Aachen)
More informationA comparison of spline methods in R for building explanatory models
A comparison of spline methods in R for building explanatory models Aris Perperoglou on behalf of TG2 STRATOS Initiative, University of Essex ISCB2017 Aris Perperoglou Spline Methods in R ISCB2017 1 /
More informationCS 450 Numerical Analysis. Chapter 7: Interpolation
Lecture slides based on the textbook Scientific Computing: An Introductory Survey by Michael T. Heath, copyright c 2018 by the Society for Industrial and Applied Mathematics. http://www.siam.org/books/cl80
More informationNonparametric Methods Recap
Nonparametric Methods Recap Aarti Singh Machine Learning 10-701/15-781 Oct 4, 2010 Nonparametric Methods Kernel Density estimate (also Histogram) Weighted frequency Classification - K-NN Classifier Majority
More informationRadial Basis Function Networks
Radial Basis Function Networks As we have seen, one of the most common types of neural network is the multi-layer perceptron It does, however, have various disadvantages, including the slow speed in learning
More informationarxiv: v2 [stat.ap] 14 Nov 2016
The Cave of Shadows Addressing the human factor with generalized additive mixed models Harald Baayen a Shravan Vasishth b Reinhold Kliegl b Douglas Bates c arxiv:1511.03120v2 [stat.ap] 14 Nov 2016 a University
More informationRecent Developments in Model-based Derivative-free Optimization
Recent Developments in Model-based Derivative-free Optimization Seppo Pulkkinen April 23, 2010 Introduction Problem definition The problem we are considering is a nonlinear optimization problem with constraints:
More informationSupport Vector Regression with ANOVA Decomposition Kernels
Support Vector Regression with ANOVA Decomposition Kernels Mark O. Stitson, Alex Gammerman, Vladimir Vapnik, Volodya Vovk, Chris Watkins, Jason Weston Technical Report CSD-TR-97- November 7, 1997!()+,
More information56:272 Integer Programming & Network Flows Final Exam -- December 16, 1997
56:272 Integer Programming & Network Flows Final Exam -- December 16, 1997 Answer #1 and any five of the remaining six problems! possible score 1. Multiple Choice 25 2. Traveling Salesman Problem 15 3.
More informationarxiv: v1 [stat.ap] 14 Nov 2017
How to estimate time-varying Vector Autoregressive Models? A comparison of two methods. arxiv:1711.05204v1 [stat.ap] 14 Nov 2017 Jonas Haslbeck Psychological Methods Group University of Amsterdam Lourens
More informationCPSC 340: Machine Learning and Data Mining. Principal Component Analysis Fall 2016
CPSC 340: Machine Learning and Data Mining Principal Component Analysis Fall 2016 A2/Midterm: Admin Grades/solutions will be posted after class. Assignment 4: Posted, due November 14. Extra office hours:
More informationOverfitting. Machine Learning CSE546 Carlos Guestrin University of Washington. October 2, Bias-Variance Tradeoff
Overfitting Machine Learning CSE546 Carlos Guestrin University of Washington October 2, 2013 1 Bias-Variance Tradeoff Choice of hypothesis class introduces learning bias More complex class less bias More
More informationPackage ibr. R topics documented: May 1, Version Date Title Iterative Bias Reduction
Version 2.0-3 Date 2017-04-28 Title Iterative Bias Reduction Package ibr May 1, 2017 Author Pierre-Andre Cornillon, Nicolas Hengartner, Eric Matzner-Lober Maintainer ``Pierre-Andre Cornillon''
More informationPredicting housing price
Predicting housing price Shu Niu Introduction The goal of this project is to produce a model for predicting housing prices given detailed information. The model can be useful for many purpose. From estimating
More informationI How does the formulation (5) serve the purpose of the composite parameterization
Supplemental Material to Identifying Alzheimer s Disease-Related Brain Regions from Multi-Modality Neuroimaging Data using Sparse Composite Linear Discrimination Analysis I How does the formulation (5)
More informationSection 3.4: Diagnostics and Transformations. Jared S. Murray The University of Texas at Austin McCombs School of Business
Section 3.4: Diagnostics and Transformations Jared S. Murray The University of Texas at Austin McCombs School of Business 1 Regression Model Assumptions Y i = β 0 + β 1 X i + ɛ Recall the key assumptions
More informationA Trimmed Translation-Invariant Denoising Estimator
A Trimmed Translation-Invariant Denoising Estimator Eric Chicken and Jordan Cuevas Florida State University, Tallahassee, FL 32306 Abstract A popular wavelet method for estimating jumps in functions is
More informationMLCC 2018 Local Methods and Bias Variance Trade-Off. Lorenzo Rosasco UNIGE-MIT-IIT
MLCC 2018 Local Methods and Bias Variance Trade-Off Lorenzo Rosasco UNIGE-MIT-IIT About this class 1. Introduce a basic class of learning methods, namely local methods. 2. Discuss the fundamental concept
More informationCSE446: Linear Regression. Spring 2017
CSE446: Linear Regression Spring 2017 Ali Farhadi Slides adapted from Carlos Guestrin and Luke Zettlemoyer Prediction of continuous variables Billionaire says: Wait, that s not what I meant! You say: Chill
More informationLocal spatial-predictor selection
University of Wollongong Research Online Centre for Statistical & Survey Methodology Working Paper Series Faculty of Engineering and Information Sciences 2013 Local spatial-predictor selection Jonathan
More information