Generalized additive models II

1 Generalized additive models II Patrick Breheny October 13 Patrick Breheny BST 764: Applied Statistical Modeling 1/23

2 Coronary heart disease study
Today's lecture will feature several case studies involving the application of smoothing splines and GAMs to real data sets.
Our first study looks at many potential risk factors for coronary heart disease (CHD):
  sbp: Systolic blood pressure
  tobacco: Cumulative lifetime tobacco consumption (kg)
  ldl: Low-density lipoprotein
  adiposity: % of weight from fat
  obesity: Body mass index
  famhist: Family history of CHD
  typea: A measure of stress
  alcohol: Current alcohol consumption
  age

3 CHD study: Plan of analysis
There are many ways one could analyze this data, but since we are interested in all the potential risk factors, one reasonable way to proceed is to model each of the continuous factors using smoothing splines, with the degrees of freedom for each smooth estimated by GCV.
Here there is one categorical factor, family history, which is highly significant (p < ) with an odds ratio of 2.6 (1.6, 4.1).

4 CHD study: Smoothing term tests
The remaining 8 predictors are continuous; cumulatively they add degrees of freedom to the model.
We are interested in carrying out likelihood ratio tests for these terms.
Because there are so many smooth terms, it is desirable to fix the smoothing parameters $\{\lambda_j\}$ at their full-model values; otherwise the hypothesis tests will become muddled by the fact that the reduced model differs not only in terms of its components, but also in how flexible each term is allowed to be.
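As a rough sketch of the kind of test described here (the notation is an assumption and does not come from the slides): write $\ell_{\text{full}}$ and $\ell_{\text{red}}$ for the maximized log-likelihoods of the full model and of the model with one smooth term removed, both fit with the smoothing parameters held at their full-model values, and $\mathrm{df}$ for the effective degrees of freedom of each fit. The comparison is then approximately

```latex
\Lambda = 2\left(\ell_{\text{full}} - \ell_{\text{red}}\right)
\;\overset{\cdot}{\sim}\;
\chi^2_{\,\mathrm{df}_{\text{full}} - \mathrm{df}_{\text{red}}}
```

For a linearity test, the reduced model would replace the smooth term with a linear term instead of dropping it, so the degrees of freedom difference is the term's effective degrees of freedom minus 1. The chi-square reference distribution is only approximate for penalized fits.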

5 CHD GAM
[Figure: estimated smooth terms for sbp, tobacco, ldl, and obesity]
  s(sbp, 1.24): overall effect p = 0.21; linearity p = 0.12
  s(tobacco, 5.87): overall effect p = <; linearity p = 0.02
  s(ldl, 1): overall effect p < 0.01
  s(obesity, 2.2): overall effect p = 0.02; linearity p = 0.03

6 CHD GAM (cont'd)
[Figure: estimated smooth terms for adiposity, typea, alcohol, and age]
  s(adiposity, 1): overall effect p = 0.20
  s(typea, 3.33): overall effect p < 0.01; linearity p = 0.05
  s(alcohol, 1): overall effect p = 0.90
  s(age, 3.39): overall effect p < 0.01; linearity p = 0.15

7 Asthma CAFO study
As another example, I collaborated with Dr. Sanderson on a project involving exposure to air pollution from concentrated animal feeding operations (CAFOs) and its effect on developing asthma in children.
The exposure metric was calculated based on the proximity of the home to a CAFO (or CAFOs) and the size of those CAFOs, and was adjusted for wind patterns to develop an overall CAFO exposure score.
Based on previous studies of asthma, we also adjusted for a number of risk factors: age, gender, socioeconomic status, allergies, premature birth, early respiratory infection, and whether the individual worked with pigs.

8 CAFO study: Plan of analysis
We could once again model each continuous term using smoothing splines, although since the focus of the study was on a single risk factor, I modeled CAFO exposure nonparametrically and included the others as simple linear terms.
The estimation of how CAFO exposure affects asthma risk was the purpose of the study; an interesting additional question is whether or not any interactions are present involving this exposure.
In the course of the study, we found some evidence for an interaction with early respiratory illness (the test for interaction had a p-value of 0.06).

9 CAFO study: Results
[Figure: log odds of asthma versus log(cafo)]

10 CAFO study: Results (cont'd)
  Model    Deviance explained
  Base     28.8%
  Linear   31.1%
  Spline   33.0%
p-value for overall effect of CAFO: .001
p-value for nonlinear component of CAFO: .03
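For readers unfamiliar with the "deviance explained" column, the sketch below uses the usual definition, one minus the ratio of model deviance to null deviance. The function name and the deviance values are hypothetical; the numbers are not from the CAFO study and were picked only so that the printed percentages resemble the table.

```python
def deviance_explained(model_deviance: float, null_deviance: float) -> float:
    """Proportion of the null deviance accounted for by a fitted model."""
    if null_deviance <= 0:
        raise ValueError("null deviance must be positive")
    return 1.0 - model_deviance / null_deviance


# Hypothetical deviances (NOT from the CAFO study), for illustration only.
NULL_DEVIANCE = 1000.0
fits = {"Base": 712.0, "Linear": 689.0, "Spline": 670.0}

for name, dev in fits.items():
    pct = 100 * deviance_explained(dev, NULL_DEVIANCE)
    print(f"{name:<7}{pct:5.1f}%")
```

A model that adds flexibility (linear term, then spline) can only lower the deviance, so the interesting question is whether the drop is larger than chance would produce, which is what the two p-values address.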

11 CAFO study: Possible interaction
[Figure: effect on log(odds) versus log(cafo), in two panels: no early respiratory infection, and early respiratory infection]

12 GAMMs
I am glossing over a complication present in the CAFO study: the observations were not all independent.
Specifically, sampling was done at the household/family level and one would expect some correlation among subjects within a household.
Possibilities for dealing with this are to include a random intercept (a Generalized Additive Mixed Model, or GAMM), or to use generalized estimating equations.
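One way such a random-intercept model could be written is sketched below. The notation is an assumption and does not come from the slides: $i$ indexes households, $j$ indexes children within a household, $z_{ij}$ collects the linear adjustment covariates, and $b_i$ is the household-level random intercept.

```latex
\operatorname{logit} \Pr(y_{ij} = 1 \mid b_i)
  = \beta_0 + f(\mathrm{cafo}_{ij}) + z_{ij}^{\top}\gamma + b_i,
\qquad b_i \sim N(0, \sigma_b^2)
```

The shared $b_i$ is what induces correlation among children from the same household.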

13 GAMs are wonderful tools, but they do rely on that assumption of additivity; what if effects are not additive?
To put it another way, suppose we have $x_1$ and $x_2$, both continuous variables; can we estimate $E(y \mid x) = f(x_1, x_2)$ in a completely nonparametric way, without making any assumptions on $f$?

14 Tensor product splines
One approach to constructing multidimensional splines is using the tensor product basis.
Suppose we specify a set of basis functions $\{h_{1k}\}$ for $x_1$ and $\{h_{2k}\}$ for $x_2$, with $M_1$ and $M_2$ elements, respectively.
The tensor product basis for the two-dimensional smooth function of $x_1$ and $x_2$ is given by
$$g_{jk}(x_1, x_2) = h_{1j}(x_1)\, h_{2k}(x_2)$$
and has $M_1 M_2$ elements.
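The construction can be sketched in a few lines of Python. The marginal bases below are hypothetical polynomial bases chosen only to keep the arithmetic easy to follow; in practice the marginal bases would be spline bases, and the function name is likewise an assumption.

```python
from typing import Callable, List, Sequence

Basis = Sequence[Callable[[float], float]]


def tensor_product_row(h1: Basis, h2: Basis, x1: float, x2: float) -> List[float]:
    """Evaluate g_jk(x1, x2) = h1[j](x1) * h2[k](x2) for every pair (j, k).

    The result is ordered with j varying slowest, so it has len(h1) * len(h2)
    entries and forms one row of the design matrix for the 2D smooth.
    """
    return [hj(x1) * hk(x2) for hj in h1 for hk in h2]


# Hypothetical marginal bases (simple polynomials, for illustration only).
h1 = [lambda x: 1.0, lambda x: x, lambda x: x * x]  # M1 = 3
h2 = [lambda x: 1.0, lambda x: x]                   # M2 = 2

row = tensor_product_row(h1, h2, 2.0, 3.0)
print(len(row))  # 6, i.e. M1 * M2
print(row)       # [1.0, 3.0, 2.0, 6.0, 4.0, 12.0]
```

Stacking one such row per observation gives a design matrix with $M_1 M_2$ columns, which is why the basis grows quickly as marginal bases get richer.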

15 Thin plate splines
The multidimensional analogs of smoothing splines are called thin plate regression splines.
For two dimensions, we find $f(x_1, x_2)$ that minimizes
$$\sum_{i=1}^{n} l\{y_i, f(x_{1i}, x_{2i})\} + \lambda \iint \left[\frac{\partial^2 f}{\partial u^2}\right]^2 + 2\left[\frac{\partial^2 f}{\partial u\,\partial v}\right]^2 + \left[\frac{\partial^2 f}{\partial v^2}\right]^2 \, du\, dv$$
Thin plate regression splines have fairly complicated basis functions.

16 2D smoothing
[Figure: thin plate spline fit over ldl and age]

17 Restrictions imposed by various models
[Figure: fitted surfaces over ldl and age in four panels: GLM, GLM w/ interaction, GAM, and Thin plate]
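The four panel names correspond to increasingly flexible forms for the predictor. A sketch of those forms follows, with $x_1$ = ldl, $x_2$ = age, and $\eta$ denoting the linear predictor (log odds); the symbols are assumptions, not notation from the slides.

```latex
\begin{aligned}
\text{GLM:}\quad & \eta = \beta_0 + \beta_1 x_1 + \beta_2 x_2 \\
\text{GLM w/ interaction:}\quad & \eta = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 \\
\text{GAM:}\quad & \eta = \beta_0 + f_1(x_1) + f_2(x_2) \\
\text{Thin plate:}\quad & \eta = f(x_1, x_2)
\end{aligned}
```

Each form contains the first as a special case; the GAM and the GLM with interaction relax it in different directions, and the thin plate surface contains all three.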

18 The curse of dimensionality
Thin plate regression splines can be extended further into higher dimensions, but they become rather computationally intensive as the dimension exceeds 2.
Also, the curse of dimensionality implies that we need an exponentially increasing amount of data to maintain accuracy as p increases.
Furthermore, multidimensional smooth functions are harder to visualize and interpret.
In other words, it is nice to have the option of including interactions, but one generally cannot stray too far from the structure provided by the additive model.
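Two standard back-of-the-envelope calculations give a feel for this growth. The sketch below assumes data spread uniformly over the unit cube and a tensor product basis with the same number of marginal functions in every dimension; both assumptions, the function names, and the values 0.10 and 10 are illustrative and do not come from the slides.

```python
def edge_length(fraction: float, p: int) -> float:
    """Edge of a sub-cube of [0, 1]^p expected to hold `fraction` of uniform data."""
    if not 0.0 < fraction <= 1.0 or p < 1:
        raise ValueError("need 0 < fraction <= 1 and p >= 1")
    return fraction ** (1.0 / p)


def tensor_basis_size(m: int, p: int) -> int:
    """Number of tensor product basis functions with m marginal functions per dimension."""
    return m ** p


# How wide must a "local" neighborhood be to capture 10% of the data,
# and how many coefficients does a 10-per-dimension tensor basis need?
for p in (1, 2, 3, 10):
    print(p, round(edge_length(0.10, p), 3), tensor_basis_size(10, p))
```

In ten dimensions a neighborhood holding 10% of the data spans roughly 79% of the range of each variable, so it is no longer local in any useful sense, while the coefficient count grows as a power of p.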

19 WISEWOMAN
Our final case study involves WISEWOMAN, a project initiated by the CDC offering early screenings of risk factors for heart disease, diabetes, and other chronic diseases in financially disadvantaged women.
In Iowa, the project contained an intervention component designed to effect lifestyle changes in at-risk individuals.

20 Attendance
However, no intervention can be successful if the individuals targeted do not attend.
In order to design better interventions, it is of interest to study factors associated with attendance vs. nonattendance.
In particular, investigators are interested in the role that travel distance plays in attendance.

21 WISEWOMAN study: Analysis
As with the CAFO study, the emphasis was on a particular term, so distance was modeled using a smoothing spline while other factors affecting attendance were controlled for using linear terms.
In this study, we found significant evidence for an interaction between travel distance and age.
Furthermore, we found evidence of a three-way interaction, in that the above two-dimensional smooth function differed in urban and rural settings.

22 WISEWOMAN: Rural results
[Figure: two views of Probability(Attend) as a function of Distance and Age]

23 WISEWOMAN: Urban vs. rural (cross-sections)


More information

INTRODUCTION to SAS STATISTICAL PACKAGE LAB 3

INTRODUCTION to SAS STATISTICAL PACKAGE LAB 3 Topics: Data step Subsetting Concatenation and Merging Reference: Little SAS Book - Chapter 5, Section 3.6 and 2.2 Online documentation Exercise I LAB EXERCISE The following is a lab exercise to give you

More information

Package glmpath. R topics documented: January 28, Version 0.98 Date

Package glmpath. R topics documented: January 28, Version 0.98 Date Version 0.98 Date 2018-01-27 Package glmpath January 28, 2018 Title L1 Regularization Path for Generalized Linear Models and Cox Proportional Hazards Model Author Mee Young Park, Trevor Hastie Maintainer

More information

Machine Learning / Jan 27, 2010

Machine Learning / Jan 27, 2010 Revisiting Logistic Regression & Naïve Bayes Aarti Singh Machine Learning 10-701/15-781 Jan 27, 2010 Generative and Discriminative Classifiers Training classifiers involves learning a mapping f: X -> Y,

More information

Lecture 27: Review. Reading: All chapters in ISLR. STATS 202: Data mining and analysis. December 6, 2017

Lecture 27: Review. Reading: All chapters in ISLR. STATS 202: Data mining and analysis. December 6, 2017 Lecture 27: Review Reading: All chapters in ISLR. STATS 202: Data mining and analysis December 6, 2017 1 / 16 Final exam: Announcements Tuesday, December 12, 8:30-11:30 am, in the following rooms: Last

More information

CPSC 340: Machine Learning and Data Mining

CPSC 340: Machine Learning and Data Mining CPSC 340: Machine Learning and Data Mining Feature Selection Original version of these slides by Mark Schmidt, with modifications by Mike Gelbart. Admin Assignment 3: Due Friday Midterm: Feb 14 in class

More information

Mixed Effects Models. Biljana Jonoska Stojkova Applied Statistics and Data Science Group (ASDa) Department of Statistics, UBC.

Mixed Effects Models. Biljana Jonoska Stojkova Applied Statistics and Data Science Group (ASDa) Department of Statistics, UBC. Mixed Effects Models Biljana Jonoska Stojkova Applied Statistics and Data Science Group (ASDa) Department of Statistics, UBC March 6, 2018 Resources for statistical assistance Department of Statistics

More information

Chapter 19 Estimating the dose-response function

Chapter 19 Estimating the dose-response function Chapter 19 Estimating the dose-response function Dose-response Students and former students often tell me that they found evidence for dose-response. I know what they mean because that's the phrase I have

More information

* * * * * * * * * * * * * * * ** * **

* * * * * * * * * * * * * * * ** * ** Generalized additive models Trevor Hastie and Robert Tibshirani y 1 Introduction In the statistical analysis of clinical trials and observational studies, the identication and adjustment for prognostic

More information

Missing Data and Imputation

Missing Data and Imputation Missing Data and Imputation Hoff Chapter 7, GH Chapter 25 April 21, 2017 Bednets and Malaria Y:presence or absence of parasites in a blood smear AGE: age of child BEDNET: bed net use (exposure) GREEN:greenness

More information

Nonparametric Risk Attribution for Factor Models of Portfolios. October 3, 2017 Kellie Ottoboni

Nonparametric Risk Attribution for Factor Models of Portfolios. October 3, 2017 Kellie Ottoboni Nonparametric Risk Attribution for Factor Models of Portfolios October 3, 2017 Kellie Ottoboni Outline The problem Page 3 Additive model of returns Page 7 Euler s formula for risk decomposition Page 11

More information

Exploring Persuasiveness of Just-in-time Motivational Messages for Obesity Management

Exploring Persuasiveness of Just-in-time Motivational Messages for Obesity Management Exploring Persuasiveness of Just-in-time Motivational Messages for Obesity Management Megha Maheshwari 1, Samir Chatterjee 1, David Drew 2 1 Network Convergence Lab, Claremont Graduate University http://ncl.cgu.edu

More information

A toolbox of smooths. Simon Wood Mathematical Sciences, University of Bath, U.K.

A toolbox of smooths. Simon Wood Mathematical Sciences, University of Bath, U.K. A toolbo of smooths Simon Wood Mathematical Sciences, University of Bath, U.K. Smooths for semi-parametric GLMs To build adequate semi-parametric GLMs requires that we use functions with appropriate properties.

More information

Week 2: Frequency distributions

Week 2: Frequency distributions Types of data Health Sciences M.Sc. Programme Applied Biostatistics Week 2: distributions Data can be summarised to help to reveal information they contain. We do this by calculating numbers from the data

More information

Machine Learning in the Wild. Dealing with Messy Data. Rajmonda S. Caceres. SDS 293 Smith College October 30, 2017

Machine Learning in the Wild. Dealing with Messy Data. Rajmonda S. Caceres. SDS 293 Smith College October 30, 2017 Machine Learning in the Wild Dealing with Messy Data Rajmonda S. Caceres SDS 293 Smith College October 30, 2017 Analytical Chain: From Data to Actions Data Collection Data Cleaning/ Preparation Analysis

More information

Machine Learning with MATLAB --classification

Machine Learning with MATLAB --classification Machine Learning with MATLAB --classification Stanley Liang, PhD York University Classification the definition In machine learning and statistics, classification is the problem of identifying to which

More information

Nina Zumel and John Mount Win-Vector LLC

Nina Zumel and John Mount Win-Vector LLC SUPERVISED LEARNING IN R: REGRESSION Logistic regression to predict probabilities Nina Zumel and John Mount Win-Vector LLC Predicting Probabilities Predicting whether an event occurs (yes/no): classification

More information

STENO Introductory R-Workshop: Loading a Data Set Tommi Suvitaival, Steno Diabetes Center June 11, 2015

STENO Introductory R-Workshop: Loading a Data Set Tommi Suvitaival, Steno Diabetes Center June 11, 2015 STENO Introductory R-Workshop: Loading a Data Set Tommi Suvitaival, tsvv@steno.dk, Steno Diabetes Center June 11, 2015 Contents 1 Introduction 1 2 Recap: Variables 2 3 Data Containers 2 3.1 Vectors................................................

More information

CPSC 695. Methods for interpolation and analysis of continuing surfaces in GIS Dr. M. Gavrilova

CPSC 695. Methods for interpolation and analysis of continuing surfaces in GIS Dr. M. Gavrilova CPSC 695 Methods for interpolation and analysis of continuing surfaces in GIS Dr. M. Gavrilova Overview Data sampling for continuous surfaces Interpolation methods Global interpolation Local interpolation

More information

Lecture 7: Linear Regression (continued)

Lecture 7: Linear Regression (continued) Lecture 7: Linear Regression (continued) Reading: Chapter 3 STATS 2: Data mining and analysis Jonathan Taylor, 10/8 Slide credits: Sergio Bacallado 1 / 14 Potential issues in linear regression 1. Interactions

More information

Introduction to RStudio

Introduction to RStudio First, take class through processes of: Signing in Changing password: Tools -> Shell, then use passwd command Installing packages Check that at least these are installed: MASS, ISLR, car, class, boot,

More information

EXAMINATIONS OF THE ROYAL STATISTICAL SOCIETY

EXAMINATIONS OF THE ROYAL STATISTICAL SOCIETY EXAMINATIONS OF THE ROYAL STATISTICAL SOCIETY GRADUATE DIPLOMA, 2015 MODULE 4 : Modelling experimental data Time allowed: Three hours Candidates should answer FIVE questions. All questions carry equal

More information

Unified Methods for Censored Longitudinal Data and Causality

Unified Methods for Censored Longitudinal Data and Causality Mark J. van der Laan James M. Robins Unified Methods for Censored Longitudinal Data and Causality Springer Preface v Notation 1 1 Introduction 8 1.1 Motivation, Bibliographic History, and an Overview of

More information

Expectation Maximization (EM) and Gaussian Mixture Models

Expectation Maximization (EM) and Gaussian Mixture Models Expectation Maximization (EM) and Gaussian Mixture Models Reference: The Elements of Statistical Learning, by T. Hastie, R. Tibshirani, J. Friedman, Springer 1 2 3 4 5 6 7 8 Unsupervised Learning Motivation

More information

mhealth Applications in CVD Prevention and Treatment Intersection of mhealth and CVD Physical Activity 2/18/2015

mhealth Applications in CVD Prevention and Treatment Intersection of mhealth and CVD Physical Activity 2/18/2015 mhealth Applications in CVD Prevention and Treatment Theodore Feldman, MD, FACC, FACP Medical Director, Center for Prevention and Wellness at Baptist Health South Florida Medical Director, Miami Cardiac

More information

1. Two double lectures about deformable contours. 4. The transparencies define the exam requirements. 1. Matlab demonstration

1. Two double lectures about deformable contours. 4. The transparencies define the exam requirements. 1. Matlab demonstration Practical information INF 5300 Deformable contours, I An introduction 1. Two double lectures about deformable contours. 2. The lectures are based on articles, references will be given during the course.

More information

Simulation studies. Patrick Breheny. September 8. Monte Carlo simulation Example: Ridge vs. Lasso vs. Subset

Simulation studies. Patrick Breheny. September 8. Monte Carlo simulation Example: Ridge vs. Lasso vs. Subset Simulation studies Patrick Breheny September 8 Patrick Breheny BST 764: Applied Statistical Modeling 1/17 Introduction In statistics, we are often interested in properties of various estimation and model

More information

mhealth Services to Link Health and Environment

mhealth Services to Link Health and Environment mhealth Services to Link Health and Environment A/Prof. Vijay Sivaraman School of Electrical Eng. & Telecoms. University of New South Wales http://www2.ee.unsw.edu.au/~vijay/ 1 Overview I am a technologist:

More information

Statistical Learning Part 2 Nonparametric Learning: The Main Ideas. R. Moeller Hamburg University of Technology

Statistical Learning Part 2 Nonparametric Learning: The Main Ideas. R. Moeller Hamburg University of Technology Statistical Learning Part 2 Nonparametric Learning: The Main Ideas R. Moeller Hamburg University of Technology Instance-Based Learning So far we saw statistical learning as parameter learning, i.e., given

More information