Additive hedonic regression models for the Austrian housing market ERES Conference, Edinburgh, June


1 Additive hedonic regression models for the Austrian housing market, June 14 2012
Ao. Univ. Prof. Dr., Fachbereich Stadt- und Regionalforschung, Technische Universität Wien
Dr., Strategic Risk Management, Bank Austria UniCredit, Wien

2 Contents
1. (Dis-)advantages of different data sources
2. Data
3. A hybrid hedonic model
4. Method
5. Results
Slide 2

3 (Dis-)advantages of different data sources
1. (Mortgage) data provided by banks, insurance companies etc.:
   + Often very detailed
   - Usually few observations; not representative of the total market
2. Offer prices from real estate platforms:
   + Many observations
   - Offer markup; incomplete or biased data input; also often not representative
3. Purchase price data from the land register:
   + Large number of observations; representative
   - Few explanatory variables; possibly distorted by tax avoidance
4. Published indices:
   + Easily accessible
   - Usually highly aggregated; providers often have a self-interest; methodology usually not transparent
Slide 3

4 Data for single family homes Slide 4

5 Aim of this study
Combine the advantages of different data sources while avoiding their disadvantages:
- Large datasets with few structural covariates let us model small-scale spatial variation with high precision.
- Small datasets with many structural covariates let us model price differences due to attributes of the house. Furthermore, if we take these prices as the dependent variable, the price prediction is unbiased up to the spatial scale we choose.
Approach:
1. Estimate a model for the large dataset and derive the spatial prediction.
2. Estimate a model for the small dataset, using the spatial prediction from (1) as an explanatory covariate. Additional covariates and unexplained spatial heterogeneity correct for relative differences between the spatial predictions.
Compare the results to a model without this newly constructed index.
Slide 5
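The two-step approach can be illustrated with a deliberately simplified sketch. It is not the STAR model used in the study: region means of log prices stand in for the spatial prediction from the large dataset, and ordinary least squares stands in for the additive model on the small dataset. All names (`spatial_index`, `fit_hybrid`, the regions "A" to "C") and all numbers are hypothetical, synthetic, and noise-free.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def ols(X, y):
    """Least-squares coefficients via the normal equations X'X b = X'y."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)


def spatial_index(large):
    """Step 1 (simplified): mean log price per region from the large dataset."""
    sums = {}
    for region, log_price in large:
        total, count = sums.get(region, (0.0, 0))
        sums[region] = (total + log_price, count + 1)
    return {region: total / count for region, (total, count) in sums.items()}


def fit_hybrid(small, index):
    """Step 2 (simplified): regress log price on the index and one structural covariate."""
    X = [[1.0, index[region], log_area] for region, log_area, _ in small]
    y = [log_price for _, _, log_price in small]
    return ols(X, y)


# Synthetic large dataset: (region, log price), no structural covariates.
large = [("A", 12.0), ("A", 12.2), ("B", 12.6), ("B", 12.8), ("C", 11.9), ("C", 12.1)]
index = spatial_index(large)

# Synthetic small dataset: (region, log plot area, log price), generated
# without noise from the coefficients 2.0 (intercept), 0.8 (index), 0.5 (area).
houses = [("A", 4.5), ("A", 5.0), ("B", 4.6), ("B", 5.2), ("C", 4.8), ("C", 5.1)]
small = [(r, a, 2.0 + 0.8 * index[r] + 0.5 * a) for r, a in houses]

coef = fit_hybrid(small, index)  # [intercept, index effect, area effect]
```

Because the synthetic prices are noise-free, the second step recovers the generating coefficients; with real data, the index coefficient would show how strongly the small dataset follows the spatial pattern learned from the large one.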

6 A Hybrid Hedonic Model
1. Index model (purchase prices, neighborhood variables, external price information):
$$\ln(p_{\text{purch}}) = f(\text{structural}) + \dots + f(\text{neighborhood}) + f(\text{external\_price\_index}) + f(\text{spatial})$$
2. Hybrid model:
$$\ln(p) = f(\text{structural}) + \dots + f(\text{purchase\_price\_index}) + f(\text{neighborhood}) + f(\text{external\_price\_index}) + f(\text{spatial})$$
Neighborhood covariates: share of academics, overnight stays, etc.
Slide 6

7 Method (1)
We use Structured Additive Regression (STAR) models as developed in Fahrmeir et al. (2004). Distributional and structural assumptions, given covariates and parameters, are based on Generalized Linear Models:
$$E(y_i \mid z_i, x_i) = h(\eta_i)$$
with structured additive predictor
$$\eta_i = f_1(z_{i1}) + \dots + f_q(z_{iq}) + x_i'\gamma$$
- $x_i'\gamma$ is the usual parametric part of the predictor
- $z_j$ is a continuous covariate, time scale, location, or unit or cluster index
- $f_j$ are one-, two- (or even three-) dimensional, not necessarily continuous functions
Slide 7

8 Method (2)
Let $y_i = f(z_i) + \varepsilon_i$, where the nonlinear effect of the continuous covariate $z$ is assumed to be a polynomial spline of degree $l$.
Polynomial splines are piecewise polynomials with additional regularity conditions.
Define by $z_{\min} = \kappa_0 < \kappa_1 < \dots < \kappa_{r-1} < \kappa_r = z_{\max}$ a partition of the range of $z$ into $r$ non-overlapping intervals. A spline $f$ can then be written as a linear combination of $d = r + l$ basis functions $B_k(z)$.
Problem: choice of the number and position of knots.
Idea: define a large number of knots and add a penalty on the regression coefficients.
Slide 8
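As an illustration of the basis count $d = r + l$, the following sketch evaluates a B-spline basis with the Cox-de Boor recursion. It assumes equidistant knots, extended by $l$ extra knots on each side of the range; the slide does not specify the knot placement, and the function name `bspline_basis` is hypothetical.

```python
def bspline_basis(z, z_min, z_max, r, degree):
    """Evaluate all d = r + degree B-spline basis functions at z.

    Assumes equidistant knots on [z_min, z_max] with r intervals,
    extended by `degree` extra knots on each side.
    """
    h = (z_max - z_min) / r
    knots = [z_min + (j - degree) * h for j in range(r + 2 * degree + 1)]

    # Degree 0: indicator functions of the half-open knot intervals.
    basis = [1.0 if knots[j] <= z < knots[j + 1] else 0.0
             for j in range(len(knots) - 1)]
    if z == z_max:
        # Assign the right boundary to the last interval inside the range.
        basis = [0.0] * len(basis)
        basis[degree + r - 1] = 1.0

    # Cox-de Boor recursion: raise the degree one step at a time.
    for p in range(1, degree + 1):
        raised = []
        for j in range(len(knots) - p - 1):
            left = (z - knots[j]) / (knots[j + p] - knots[j]) * basis[j]
            right = ((knots[j + p + 1] - z)
                     / (knots[j + p + 1] - knots[j + 1]) * basis[j + 1])
            raised.append(left + right)
        basis = raised
    return basis


# r = 4 intervals and cubic splines (l = 3) give d = 7 basis functions.
values = bspline_basis(0.3, 0.0, 1.0, r=4, degree=3)
```

At any point inside the range only `degree + 1` basis functions are non-zero and they sum to one, which is why a large number of knots remains computationally cheap.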

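9 Method (3)
Bayesian: Gaussian prior of the form
$$p(\beta \mid \tau^2) \propto \left(\frac{1}{\tau^2}\right)^{\operatorname{rk}(K_d)/2} \exp\left(-\frac{1}{2\tau^2}\,\beta' K_d \beta\right)$$
Frequentist: penalization term $\lambda\,\beta' K_d \beta$, where $K_d$ is the penalty matrix of order $d$.
For P-splines, the penalty is given by
$$\lambda \sum_{k \ge d+1} \left(\Delta^d \beta_k\right)^2 = \lambda\,\beta' D_d' D_d \beta = \lambda\,\beta' K_d \beta$$
Slide 9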
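The penalty matrix $K_d = D_d' D_d$ can be built by differencing the rows of an identity matrix $d$ times. The sketch below uses plain nested lists; the function names are hypothetical, and the coefficient vectors are made-up values chosen only to show which shapes are penalized.

```python
def difference_matrix(n_coef, order):
    """Return the difference matrix D of the given order as nested lists."""
    rows = [[1.0 if i == j else 0.0 for j in range(n_coef)]
            for i in range(n_coef)]
    for _ in range(order):
        # Each pass replaces the rows by differences of adjacent rows.
        rows = [[b - a for a, b in zip(rows[i], rows[i + 1])]
                for i in range(len(rows) - 1)]
    return rows


def penalty_matrix(n_coef, order):
    """Return K = D'D for the difference matrix of the given order."""
    D = difference_matrix(n_coef, order)
    return [[sum(row[i] * row[j] for row in D) for j in range(n_coef)]
            for i in range(n_coef)]


def quadratic_form(beta, K):
    """Return the penalty beta' K beta (without the smoothing parameter)."""
    n = len(beta)
    return sum(beta[i] * K[i][j] * beta[j] for i in range(n) for j in range(n))


D2 = difference_matrix(5, order=2)
K2 = penalty_matrix(5, order=2)

# A linear coefficient sequence has zero second differences: no penalty.
linear_penalty = quadratic_form([1.0, 2.0, 3.0, 4.0, 5.0], K2)
# A single spike has second differences (1, -2, 1): penalty 1 + 4 + 1.
spike_penalty = quadratic_form([0.0, 0.0, 1.0, 0.0, 0.0], K2)
```

A second-order penalty therefore leaves linear trends in the coefficients unpenalized and shrinks only deviations from them, which is what lets a generous number of knots be used without overfitting.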

10 Method (4) Modeling spatial heterogeneity
Spatial heterogeneity is modeled explicitly using neighborhood covariates. However, a certain amount of unexplained spatial heterogeneity remains, which is modeled by random effects (possibly hierarchical, see below).
This way, a parameter is estimated for every spatial unit, with all parameters on a given level following a common distribution.
The fewer observations a unit contains, the more its estimated effect shrinks toward 0 (or, in a hierarchical model, toward the effect of the unit it is nested in). Lack of information is thus penalized.
Slide 10
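The shrinkage described above can be made concrete in the simplest case, which is an assumption of this sketch rather than the model of the study: a single random intercept $b_s \sim N(0, \tau^2)$ per unit, Gaussian errors with variance $\sigma^2$, and both variances treated as known. The posterior mean of $b_s$ is then the mean residual of the unit multiplied by $n_s / (n_s + \sigma^2/\tau^2)$. The function name and the numbers are hypothetical.

```python
def shrunken_effect(residuals, sigma2, tau2):
    """Posterior mean of a unit's random intercept under known variances.

    residuals: residuals of the unit's observations after removing all
               other model terms (hypothetical input).
    sigma2:    error variance, assumed known.
    tau2:      random effect variance, assumed known.
    """
    n = len(residuals)
    if n == 0:
        # No information at all: the estimate equals the prior mean 0.
        return 0.0
    mean_residual = sum(residuals) / n
    weight = n / (n + sigma2 / tau2)
    return weight * mean_residual


# Same mean residual of 0.2, but very different amounts of information.
few = shrunken_effect([0.2], sigma2=1.0, tau2=1.0)          # weight 1/2
many = shrunken_effect([0.2] * 100, sigma2=1.0, tau2=1.0)   # weight 100/101
empty = shrunken_effect([], sigma2=1.0, tau2=1.0)
```

With one observation the effect is pulled halfway to zero, while with a hundred observations it stays close to the raw mean residual.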

11 Method (5) Hierarchical modeling of spatial effects
In a multilevel STAR model, the regression coefficients of the spatial effects themselves obey a regression model with structured additive predictor, which turns the model into a hierarchy of structured additive regression models. Thus, spatial variation is disaggregated on different hierarchical levels.
In our case, there are effects on four levels: individual/census tract (1), municipal (2), district (3), state (4).
$$\text{level 1:}\quad \ln(p) = f_{1,1}(\text{area}) + \dots + f_{1,q_1}(\text{census\_tract\_index}) + x'\gamma + f_{\text{spat}1}(s_1) + \varepsilon_1$$
$$\text{level 2:}\quad f_{\text{spat}1}(s_1) = f_{2,1}(\text{pp}) + \dots + f_{2,q_2}(\text{access}) + f_{\text{spat}2}(s_2) + \varepsilon_2$$
$$\text{level 3:}\quad f_{\text{spat}2}(s_2) = f_{3,1}(\text{district\_index}) + f_{\text{spat}3}(s_3) + \varepsilon_3$$
$$\text{level 4:}\quad f_{\text{spat}3}(s_3) = \gamma_0 + \varepsilon_4$$
Slide 11

12 Results (1): Plot area Slide 12

13 Results (2): Overnight stays Slide 13

14 Results (3): External price index Slide 14

15 Results (4): Purchase price index in the hybrid model Slide 15

16 Results (5): Total spatial heterogeneity, evaluated at mean attributes Slide 16

17 Thanks for your attention Slide 17

18 Results (6): Decomposition of unexplained spatial heterogeneity Slide 18

19 Results (7): Fixed Effects in the hybrid model (1) Slide 19

20 Results (8): Fixed Effects in the hybrid model (2) Slide 20
