Section 4 Matching Estimator


1 Section 4 Matching Estimator

2 Matching Estimators Key Idea: The matching method compares the outcomes of program participants with those of matched nonparticipants, where matches are chosen on the basis of similarity in observed characteristics. Main advantage of matching estimators: they typically do not require specifying a functional form of the outcome equation and are therefore not susceptible to misspecification bias along that dimension.

3 Assumptions of Matching Approach Assume you have access to data on treated and untreated individuals (D=1 and D=0). Assume you also have access to a set of Z variables whose distribution is not affected by D: F(Z|D,Y1,Y0) = F(Z|Y1,Y0) (why this is necessary will be explained in a few slides).

4 Assumptions of Matching Approach 1. Selection on Observables (Unconfoundedness Assumption): There exists a set of observed characteristics Z such that outcomes are independent of program participation conditional on Z, i.e. (Y0,Y1) is independent of D given Z, so that treatment assignment is strictly ignorable given Z (Rosenbaum/Rubin (1983)). 2. Common Support Assumption: 0 < P(D=1|Z) < 1. Assumption 2 is required so that matches for D=0 and D=1 observations can be found.

5 Implication of Assumptions If Assumptions 1 and 2 are satisfied, then the problem of determining mean program impact can be solved by substituting the Y0 distribution observed for matched-on-Z non-participants for the missing Y0 distribution of participants. To justify Assumption 1, individuals cannot select into the program based on anticipated treatment impact. Assumption 1 implies: E(Y0|Z,D=1) = E(Y0|Z,D=0) and E(Y1|Z,D=1) = E(Y1|Z,D=0). Under these assumptions, one can estimate the ATE (average treatment effect), the TTE (treatment effect on the treated) and the UTE (treatment effect on the untreated).

6 Weaker assumptions for TTE If interest centers on the TTE, Assumptions 1 and 2 can be slightly relaxed: 1. The following weaker conditional mean independence assumption on Y0 suffices: E(Y0|Z,D=1) = E(Y0|Z,D=0). 2. Only the support condition P(D=1|Z) < 1 is necessary (P(D=1|Z) > 0 is not required, because it is only needed to guarantee a participant analogue for each non-participant). The weaker assumptions for the TTE allow selection into the program to depend on Y1, but not on Y0.

7 Estimation of the TTE using the Matching Approach Under these assumptions, the mean impact of the program on program participants can be written as: TTE = E(Y1 - Y0 | D=1) = E{ E(Y1|Z,D=1) - E(Y0|Z,D=0) | D=1 } (using the Law of Iterated Expectations and the assumptions stated before). Here we can illustrate why the assumption is needed that the distribution of the matching variables Z is not affected by whether the treatment is received.

8 Assumption about the distribution of matching variables Z Assumption: The distribution of the matching variables, Z, is not affected by whether the treatment is received (see slide 3). In the derivation of treatment effects, e.g. of the TTE (see previous slide), we make use of this assumption as follows: the counterfactual mean E(Y0|D=1) is obtained by averaging E(Y0|Z,D=0) over the density f(Z|D=1). This expression uses the conditional density f(Z|D=1) to represent the density of Z that would also have been observed in the no-treatment (D=0) state, which rules out the possibility that receipt of treatment changes the density of Z. Examples: age, gender and race would generally be valid matching variables, but marital status may not be if it were directly affected by receipt of the program.

9 Matching Estimator A prototypical matching estimator for the TTE takes the form (n1 is the number of observations in the treatment group): TTE_hat = (1/n1) Σ_{i in treatment group} [ Y1i - Y0i_hat ], where Y0i_hat is an estimator for the matched no-treatment outcome of person i. Recall that Assumption 1 implies: E(Y0|Z,D=1) = E(Y0|Z,D=0).

10 How does matching compare to a randomized experiment? The distribution of observables of the matched controls will be the same as in the treatment group. However, the distribution of unobservables is not necessarily balanced across groups. An experiment has full support, but with matching there can be a failure of the common support condition (Assumption 2): if there are regions where the support of Z does not overlap for the D=0 and D=1 groups, then matching is only justified when performed over the region of common support, i.e. the estimated treatment effect must be defined conditionally on the region of overlap.

11 Implementing Matching Estimators Problems: How to construct a match when Z is of high dimension. What to do if P(D=1|Z)=1 for some Z (violation of the common support assumption, A2). How to choose the set of Z variables.

12 Propensity Score Matching Matching estimators are difficult to implement when the set of conditioning variables Z is large (small cell problems) or Z is continuous ("curse of dimensionality"). Rosenbaum and Rubin theorem (1983): show that independence of (Y0,Y1) and D given Z implies independence of (Y0,Y1) and D given P(Z), where P(Z) = P(D=1|Z). This reduces the matching problem to a univariate problem, provided P(D=1|Z) (the "propensity score") can be parametrically estimated.

13 Proof of Rosenbaum/Rubin Theorem Show that E(D|Y,Z) = E(D|Z) implies E{D|Y,P(Z)} = E{D|P(Z)}. Let P(Z) = P(D=1|Z) and note that P(D=1|Z) = E(D|Z). Then E{D|Y,P(Z)} = E{ E(D|Y,Z) | Y, P(Z) } [Law of Iterated Expectations] = E{ E(D|Z) | Y, P(Z) } [Assumption 1 of the matching estimator] = E{ P(Z) | Y, P(Z) } = P(Z) = E{ D | P(Z) }.

14 Implementation of the Propensity Score Matching Estimator Step 1: Estimate a model of program participation (e.g. a logit or probit), i.e. estimate the propensity score P(Z) for each person. Step 2: Select matches based on the estimated propensity score and average the differences between participant outcomes and matched comparison outcomes (n1 is the number of observations in the treatment group).
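A minimal sketch of these two steps in Python, assuming numpy arrays y (outcomes), d (0/1 treatment) and a covariate matrix Z; the variable names, the logit model and the nearest-neighbor rule are illustrative choices, not the lecture's own code.

# Propensity score matching for the TTE (illustrative sketch).
import numpy as np
from sklearn.linear_model import LogisticRegression

def psm_tte_nearest_neighbor(y, d, Z):
    # Step 1: estimate the propensity score P(D=1|Z) with a logit model
    p = LogisticRegression(max_iter=1000).fit(Z, d).predict_proba(Z)[:, 1]
    treated = np.where(d == 1)[0]
    controls = np.where(d == 0)[0]
    # Step 2: for each treated person, take the non-participant with the
    # closest estimated propensity score (matching with replacement)
    effects = []
    for i in treated:
        j = controls[np.argmin(np.abs(p[controls] - p[i]))]
        effects.append(y[i] - y[j])
    return np.mean(effects)  # (1/n1) * sum of (Y1i - matched Y0)

# Example with simulated data where the true effect on the treated is 2.0
rng = np.random.default_rng(0)
Z = rng.normal(size=(2000, 3))
d = (rng.uniform(size=2000) < 1 / (1 + np.exp(-Z[:, 0]))).astype(int)
y = Z @ np.array([1.0, 0.5, -0.5]) + 2.0 * d + rng.normal(size=2000)
print(psm_tte_nearest_neighbor(y, d, Z))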

15 Propensity Score Matching Methods For notational simplicity, let P = P(Z). A prototypical propensity score matching estimator for the TTE takes the form TTE_hat = (1/n1) Σ_{i in I1 ∩ SP} [ Y1i - Σ_{j in I0} W(i,j) Y0j ], where I1 denotes the set of program participants, I0 the set of nonparticipants, SP the region of common support (defined on the next slide), and n1 is the number of persons in the set I1 ∩ SP. The match for each participant is constructed as a weighted average over the outcomes of non-participants, where the weights W(i,j) depend on the distance between Pi and Pj.

16 Implementing Matching Estimators Problems: How to construct a match when Z is of high dimension. What to do if P(D=1|Z)=1 for some Z (violation of the common support assumption, A2). How to choose the set of Z variables.

17 Common Support Condition The common support region can be estimated by SP = { P : f(P|D=1) > 0 and f(P|D=0) > 0 }, where f(P|D=1) and f(P|D=0) are standard nonparametric density estimators. To ensure that the estimated densities are strictly greater than zero (i.e. exceed zero by a certain amount), observations with low estimated density are excluded, with the cut-off determined using a trimming level q. The common support condition ensures that matches for D=1 and D=0 can be found.
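As a rough illustration of the trimming idea (not the exact rule on the slide), the sketch below estimates f(P|D=1) and f(P|D=0) with Gaussian kernel density estimators and keeps only observations at which both estimated densities exceed the q-quantile of the estimated density values; the function and parameter names are illustrative.

# Illustrative common-support trimming on the estimated propensity score.
# Assumes numpy arrays p (propensity scores) and d (0/1 treatment).
import numpy as np
from scipy.stats import gaussian_kde

def common_support_mask(p, d, q=0.02):
    f1 = gaussian_kde(p[d == 1])   # density of P among participants
    f0 = gaussian_kde(p[d == 0])   # density of P among non-participants
    d1, d0 = f1(p), f0(p)          # evaluate both densities at every point
    # Trimming level q: drop points where either estimated density falls
    # below the q-quantile of the pooled estimated density values.
    cut = np.quantile(np.concatenate([d1, d0]), q)
    return (d1 > cut) & (d0 > cut)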

18 Cross-sectional matching methods: Alternative ways of constructing matched outcomes Define a neighborhood C(Pi) for each i in the participant sample. Neighbors for i are non-participants j for whom Pj lies in C(Pi). The persons matched to i are those people in the set Ai = { j in I0 : Pj in C(Pi) }. Alternative matching estimators differ in how the neighborhood is defined and in how the weights are constructed: 1. Nearest Neighbor Matching 2. Stratification or Interval Matching 3. Kernel and Local Linear Matching

19 Cross-sectional Method 1: Nearest Neighbor Matching Traditional, pairwise matching, also called nearest-neighbor matching, sets Ai = { j : |Pi - Pj| = min_k |Pi - Pk| }. That is, the non-participant with the value of Pj closest to Pi is selected as the match, and Ai is a singleton set. The estimator can be implemented matching either with or without replacement. With replacement: the same comparison group observation can be used repeatedly as a match. Drawback of matching without replacement: the final estimate will usually depend on the initial ordering of the treated observations for which the matches were selected.

20 Cross-sectional Method 1: Nearest Neighbor Matching Variation of nearest-neighbor matching: caliper matching (Cochran and Rubin (1973)). It attempts to avoid bad matches (those for which Pj is far from Pi) by imposing a tolerance on the maximum distance allowed, i.e. a match for person i is selected only if |Pi - Pj| < ε, where ε is a prespecified tolerance. Treated persons for whom no matches can be found within the caliper are excluded from the analysis (one way of imposing the common support condition). Drawback of caliper matching: it is difficult to know a priori what choice for the tolerance level is reasonable.
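A minimal sketch of nearest-neighbor matching with replacement and an optional caliper, assuming numpy arrays of outcomes, treatment indicators and estimated propensity scores; the names are illustrative, not from the slides.

# Nearest-neighbor / caliper matching on the estimated propensity score.
# Assumes numpy arrays y (outcomes), d (0/1 treatment), p (propensity scores).
import numpy as np

def nn_caliper_tte(y, d, p, caliper=None):
    treated = np.where(d == 1)[0]
    controls = np.where(d == 0)[0]
    effects = []
    for i in treated:
        j = controls[np.argmin(np.abs(p[controls] - p[i]))]  # closest non-participant (with replacement)
        if caliper is not None and np.abs(p[j] - p[i]) >= caliper:
            continue                                          # no match within the caliper: drop person i
        effects.append(y[i] - y[j])
    return np.mean(effects), len(effects)                     # TTE estimate and number of matched treated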

21 Cross-sectional Method 2: Stratification or Interval Matching Method: 1. In this variant of matching, the common support of P is partitioned into a set of intervals. 2. Average treatment impacts are calculated through simple averaging within each interval. 3. Overall average impact estimate: a weighted average of the interval impact estimates, using the fraction of the D=1 population in each interval as the weights. This requires a decision on how wide the intervals should be: Dehejia and Wahba (1999) select intervals such that the mean values of the estimated Pi's and Pj's are not statistically different from each other within intervals.
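A rough sketch of interval matching with equal-width strata (the Dehejia-Wahba interval selection rule mentioned above is not implemented here); array names are illustrative.

# Stratification (interval) matching: average within propensity-score strata,
# then weight strata by the number of treated observations falling in them.
# Assumes numpy arrays y, d (0/1), p (propensity scores).
import numpy as np

def stratification_tte(y, d, p, n_intervals=10):
    edges = np.linspace(p.min(), p.max(), n_intervals + 1)
    # assign each observation to an interval (last edge goes into the last bin)
    cell = np.clip(np.digitize(p, edges[1:-1]), 0, n_intervals - 1)
    tte, weight_sum = 0.0, 0.0
    for k in range(n_intervals):
        t = (cell == k) & (d == 1)
        c = (cell == k) & (d == 0)
        if t.sum() == 0 or c.sum() == 0:
            continue                      # skip intervals without both groups
        tte += t.sum() * (y[t].mean() - y[c].mean())
        weight_sum += t.sum()             # weight by the number of treated in the interval
    return tte / weight_sum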

22 Cross-sectional Method 3: Kernel and Local Linear Matching Kernel method: uses a weighted average of all comparison observations within the common support region; the farther away the comparison unit is from the treated unit, the lower its weight. Local linear matching: similar to the kernel estimator but includes a linear term in the weighting function, which helps to avoid bias.

23 Kernel and Local Linear Matching A kernel estimator for the matched no-treatment outcome Y0i_hat is given by Y0i_hat = Σ_{j in I0} W(i,j) Y0j, with weights W(i,j) = K((Pj - Pi)/h) / Σ_{k in I0} K((Pk - Pi)/h). K is a kernel function and h is a bandwidth (or smoothing parameter); for the discussion about the choice of kernel function and bandwidth, see later.
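A compact sketch of kernel matching with a Gaussian kernel, assuming numpy arrays y, d and p as in the earlier sketches; the bandwidth h and the kernel are illustrative choices, not prescribed by the slides.

# Kernel matching: each treated unit is compared with a kernel-weighted
# average of all non-participant outcomes.
# Assumes numpy arrays y (outcomes), d (0/1 treatment), p (propensity scores).
import numpy as np

def kernel_matching_tte(y, d, p, h=0.05):
    treated = np.where(d == 1)[0]
    controls = np.where(d == 0)[0]
    effects = []
    for i in treated:
        u = (p[controls] - p[i]) / h
        k = np.exp(-0.5 * u**2)              # Gaussian kernel weights
        w = k / k.sum()                      # normalize so the weights sum to one
        effects.append(y[i] - np.sum(w * y[controls]))
    return np.mean(effects)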

24 Intro to Nonparametric Estimation Reference (to read): Angus Deaton, The Analysis of Household Surveys, ch. 3.2: Nonparametric methods for estimating density functions, and Nonparametric regression analysis. Topics: kernel density estimation; kernel regression; choice of kernel and choice of bandwidth (trade-off between bias and variance); local linear estimation: when and why is it better?

25 Estimating Univariate Densities: Histograms versus Kernel Estimators Application: when a visual impression of the position and spread of the data is needed (important, for example, for evaluating the distribution of welfare and the effects of policies on the whole distribution). Histograms have the following disadvantages: a degree of arbitrariness that comes from the choice of the number of bins and of their width; problems when trying to represent continuously differentiable densities of variables that are inherently continuous (a histogram can obscure the genuine shape of the empirical distribution and is unsuited to providing information about the derivatives of density functions). Alternatives: fit a parametric density to the data, or use nonparametric techniques (which allow a more direct inspection of the data).

26 Nonparametric density estimation Idea: get away from the bins of the histogram by estimating the density at every point along the x-axis. Problem: with a finite sample, there will only be empirical mass at a finite number of points. Solution: use mass at nearby points as well as at the point itself. Illustration: think of sliding a band (or window) along the x-axis, calculate the fraction of the sample per unit interval within it, and plot the result as an estimate of the density at the mid-point of the band. Naïve estimator: f_hat(x) = (1/(2hn)) * #{ xi : |xi - x| < h }, but there will be steps in f_hat(x) each time a data point enters or exits the band.

27 Nonparametric density estimation Naïve estimator: there will be steps in f_hat(x) each time a data point enters or exits the band. Modification: instead of giving all the points inside the band equal weight, give more weight to those near to x and less to those far away, so that points have a weight of zero both just outside and just inside the band, i.e. replace the indicator function by a kernel function K(.), giving the kernel density estimator f_hat(x) = (1/nh) Σ_i K((x - xi)/h).
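A small from-scratch sketch of the kernel density estimator just described, using a Gaussian kernel; the data and the bandwidth are illustrative.

# Kernel density estimate f_hat(x) = (1/(n*h)) * sum_i K((x - x_i)/h),
# evaluated on a grid, with a Gaussian kernel K.
import numpy as np

def kde(data, grid, h):
    u = (grid[:, None] - data[None, :]) / h          # pairwise standardized distances
    k = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)     # Gaussian kernel weights
    return k.mean(axis=1) / h                        # average over data points, scale by h

rng = np.random.default_rng(1)
x = rng.normal(size=500)
grid = np.linspace(-4, 4, 200)
f_hat = kde(x, grid, h=0.3)                          # density estimate along the grid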

28 Choice of Kernel and Bandwidth Choice of kernel K(.): 1. Because it is a weighting function, it should be positive and integrate to unity over the band. 2. It should be symmetric around zero, so that points below x get the same weight as those an equal distance above. 3. It should be decreasing in the absolute value of its argument. Alternative kernel functions: the Epanechnikov kernel, the Gaussian kernel (the normal density, giving some weight to all observations), and the biweight kernel. The choice of kernel will influence the shape of the estimated density (especially when there are few points), but the choice is not a critical one.
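For concreteness, the three kernels mentioned above written as Python functions of the standardized argument u = (x - xi)/h; each is positive on its support, symmetric, decreasing in |u| and integrates to one (a sketch for illustration only).

import numpy as np

def epanechnikov(u):
    return np.where(np.abs(u) <= 1, 0.75 * (1 - u**2), 0.0)

def gaussian(u):
    return np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)   # positive weight for all u

def biweight(u):
    return np.where(np.abs(u) <= 1, (15 / 16) * (1 - u**2)**2, 0.0)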

29 Choice of Kernel and Bandwidth Choice of bandwidth: results are often very sensitive to the choice of bandwidth. Estimating densities by kernel methods is an exercise in smoothing the raw observations into an estimated density, and the bandwidth controls how much smoothing is done. The bandwidth controls the trade-off between bias and variance: a large bandwidth will provide a smooth and not very variable estimate, but risks bias by bringing in observations from other parts of the density; a small bandwidth helps to pick up genuine features of the underlying density, but risks producing an unnecessarily variable plot. Oversmoothed estimates are biased and undersmoothed estimates are too variable.

30 Choice of Kernel and Bandwidth Choice of bandwidth (ctd.): Consistency of the nonparametric estimator requires that the bandwidth shrinks to zero as the sample size gets large, but not at too fast a rate (this can be made formal). In practice: consider a number of different bandwidths, plot the associated density estimates, and examine the sensitivity of the estimates with respect to the bandwidth choice. Formal theory of the trade-off: in standard parametric inference, optimal estimation is based on minimizing the mean-squared error between the estimated and true parameters. In the nonparametric case we estimate a function, not a parameter, so there is a mean-squared error at each point of the estimated density; one therefore attempts to minimize the mean integrated squared error (MISE). In this way an optimal bandwidth can be estimated (after the kernel is chosen).
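A small sketch of the practical advice above: compute a Gaussian-reference (Silverman) rule-of-thumb bandwidth and compare density estimates under smaller and larger bandwidths. The rule-of-thumb formula is a standard benchmark, not something derived on the slides; data and multipliers are illustrative.

# Bandwidth sensitivity check: rule-of-thumb bandwidth and multiples of it.
import numpy as np

def gaussian_kde_est(data, grid, h):
    u = (grid[:, None] - data[None, :]) / h
    return (np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)).mean(axis=1) / h

rng = np.random.default_rng(2)
x = np.concatenate([rng.normal(-2, 1, 300), rng.normal(2, 0.5, 200)])  # bimodal sample
grid = np.linspace(-5, 4, 300)

# Silverman's rule of thumb: h = 0.9 * min(sd, IQR/1.34) * n^(-1/5)
iqr = np.subtract(*np.percentile(x, [75, 25]))
h_rot = 0.9 * min(x.std(ddof=1), iqr / 1.34) * len(x) ** (-0.2)

estimates = {m: gaussian_kde_est(x, grid, m * h_rot) for m in (0.25, 1.0, 4.0)}
# Undersmoothed (0.25*h_rot) is noisy; oversmoothed (4*h_rot) hides the two modes.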

31 Nonparametric Regression Analysis Conditional expectation of y given x: m(x) = E(y|x). Link between a conditional expectation and the underlying distributions: m(x) = ∫ y f(y|x) dy = ∫ y f(x,y) dy / f(x). Intuitively: calculate the average of all y-values corresponding to each x (or vector of x). This is not feasible with finite samples and continuous x (the same problem as in density estimation), so adopt the same solution: average over points near x. Kernel regression estimator: m_hat(x) = Σ_i yi K((xi - x)/h) / Σ_i K((xi - x)/h).
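A short sketch of the kernel (Nadaraya-Watson) regression estimator written above, with a Gaussian kernel; the data and bandwidth are illustrative.

# Kernel regression: m_hat(x) = sum_i y_i K((x_i - x)/h) / sum_i K((x_i - x)/h).
import numpy as np

def kernel_regression(x, y, grid, h):
    u = (grid[:, None] - x[None, :]) / h
    k = np.exp(-0.5 * u**2)                        # Gaussian kernel weights
    return (k * y[None, :]).sum(axis=1) / k.sum(axis=1)

rng = np.random.default_rng(3)
x = rng.uniform(0, 10, 400)
y = np.sin(x) + rng.normal(scale=0.3, size=400)
grid = np.linspace(0, 10, 200)
m_hat = kernel_regression(x, y, grid, h=0.5)       # estimated regression function on the grid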

32 Kernel and Local Linear Matching A kernel estimator for the matched no-treatment outcome Y0i_hat is given by Y0i_hat = Σ_{j in I0} W(i,j) Y0j, with weights W(i,j) = K((Pj - Pi)/h) / Σ_{k in I0} K((Pk - Pi)/h). K is a kernel function and h is a bandwidth (or smoothing parameter); see the preceding slides for the discussion of the choice of kernel function and bandwidth.

33 Nonparametric Regression Analysis Important: it is not possible to calculate a conditional expectation for values of x where the density is zero; in practice, there are problems whenever the estimated density is small or zero (this will make the regression function imprecise). Main strength of nonparametric over parametric regression: it assumes no functional form for the relationship, allowing the data to choose not only the parameter estimates but the shape of the curve itself. Weaknesses: the price of the flexibility is the much greater data requirement of implementing nonparametric methods and the difficulty of handling high-dimensional problems (alternatives: polynomial regressions and semiparametric estimation). Nonparametric methods also lack the menu of options that is available for parametric methods when dealing with simultaneity, measurement error, selectivity and so forth.

34 Locally Linear Regression Read Angus Deaton, The Analysis of Household Surveys. Important: this will be used again later on.

35 Difference-in-Difference Matching Estimators Assumption of cross-sectional matching estimators: after conditioning on a set of observable characteristics, outcomes are conditionally mean independent of program participation. BUT: there may be systematic differences between participant and nonparticipant outcomes that lead to a violation of the identification conditions required for matching, e.g. due to program selectivity on unmeasured characteristics. Solution in the case of temporally invariant differences in outcomes between participants and nonparticipants: a difference-in-difference matching strategy (see Heckman, Ichimura and Todd (1997)).

36 Cross-sectional versus Diff-in-Diff Matching Estimators A) Cross-sectional Matching Estimator This estimator assumes: (CS1) E(Y0|Z,D=1) = E(Y0|Z,D=0) and (CS2) P(D=1|Z) < 1. Under these conditions, the TTE can be estimated by TTE_hat = (1/n1) Σ_{i in I1 ∩ SP} [ Y1i - Σ_{j in I0} W(i,j) Y0j ], where n1 is the number of treated individuals for whom CS2 is satisfied.

37 Cross-sectional versus Diff-in-Diff Matching Estimators B) Difference-in-Difference Matching Estimator This estimator requires repeated cross-section or panel data. Let t and t' be the two time periods, one before the program start date and one after. The conditions needed to justify the application of the estimator are: (DID1) E(Y0t - Y0t'|Z,D=1) = E(Y0t - Y0t'|Z,D=0) and (DID2) P(D=1|Z) < 1. Under these conditions, the TTE can be estimated by TTE_hat = (1/n1) Σ_{i in I1 ∩ SP} [ (Yti - Yt'i) - Σ_{j in I0} W(i,j) (Ytj - Yt'j) ].
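A rough sketch of difference-in-difference matching with nearest-neighbor weights, assuming numpy arrays of pre- and post-program outcomes (y_pre, y_post), treatment indicators and propensity scores; the names are illustrative.

# Diff-in-diff matching: compare the change in outcomes of each participant
# with the change in outcomes of the matched non-participant.
# Assumes numpy arrays y_post, y_pre, d (0/1 treatment), p (propensity scores).
import numpy as np

def did_matching_tte(y_post, y_pre, d, p):
    dy = y_post - y_pre                      # outcome change for every person
    treated = np.where(d == 1)[0]
    controls = np.where(d == 0)[0]
    effects = []
    for i in treated:
        j = controls[np.argmin(np.abs(p[controls] - p[i]))]   # nearest-neighbor match
        effects.append(dy[i] - dy[j])
    return np.mean(effects)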

38 Assessing the Variability of Matching Estimators Distribution theory for cross-sectional and DID kernel and local linear matching estimators: see Heckman, Ichimura and Todd (1998). But implementing the asymptotic standard error formulae can be cumbersome, so standard errors for matching estimators are often generated using bootstrap resampling methods. This is valid for kernel or local linear matching estimators, but not for nearest-neighbor matching estimators (see Abadie and Imbens (2004), also for alternatives in that case).
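A minimal sketch of bootstrap standard errors for a kernel-type matching estimator: resample individuals with replacement and recompute the estimate on each bootstrap sample. It assumes an estimator function such as kernel_matching_tte from the sketch after slide 23; as noted above, this bootstrap is not valid for nearest-neighbor matching.

# Bootstrap standard error for a matching estimator: resample persons with
# replacement, re-estimate, and take the standard deviation across replications.
# Assumes numpy arrays y, d, p and an estimator such as kernel_matching_tte.
import numpy as np

def bootstrap_se(estimator, y, d, p, n_boot=500, seed=0):
    rng = np.random.default_rng(seed)
    n = len(y)
    draws = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)        # resample individuals with replacement
        draws.append(estimator(y[idx], d[idx], p[idx]))
    return np.std(draws, ddof=1)

# Example: se = bootstrap_se(kernel_matching_tte, y, d, p)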
