Section 4 Matching Estimator
1 Section 4 Matching Estimator
2 Matching Estimators Key Idea: The matching method compares the outcomes of program participants with those of matched nonparticipants, where matches are chosen on the basis of similarity in observed characteristics. Main advantage of matching estimators: they typically do not require specifying a functional form of the outcome equation and are therefore not susceptible to misspecification bias along that dimension.
3 Assumptions of Matching Approach Assume you have access to data on treated and untreated individuals (D=1 and D=0). Assume you also have access to a set of Z variables whose distribution is not affected by D: F(Z|D,Y1,Y0) = F(Z|Y1,Y0) (why this is necessary is explained a few slides below).
4 Assumptions of Matching Approach 1. Selection on Observables (Unconfoundedness Assumption): there exists a set of observed characteristics Z such that outcomes are independent of program participation conditional on Z, i.e. (Y0,Y1) ⊥ D | Z, so that treatment assignment is strictly ignorable given Z (Rosenbaum/Rubin (1983)). 2. Common Support Assumption: 0 < P(D=1|Z) < 1. Assumption 2 is required so that matches for D=0 and D=1 observations can be found.
5 Implication of Assumptions If Assumptions 1 and 2 are satisfied, then the problem of determining mean program impact can be solved by substituting the Y0 distribution observed for matched-on-Z nonparticipants for the missing Y0 distribution of participants. To justify Assumption 1, individuals must not select into the program based on anticipated treatment impact. Assumption 1 implies: E(Y0|Z,D=1) = E(Y0|Z,D=0) and E(Y1|Z,D=1) = E(Y1|Z,D=0). Under these assumptions, one can estimate the ATE (average treatment effect), the TTE (treatment effect on the treated) and the UTE (treatment effect on the untreated).
6 Weaker Assumptions for the TTE If interest centers on the TTE, Assumptions 1 and 2 can be slightly relaxed: 1. The following weaker conditional mean independence assumption on Y0 suffices: E(Y0|Z,D=1) = E(Y0|Z,D=0). 2. Only the support condition P(D=1|Z) < 1 is necessary (P(D=1|Z) > 0 is not required, because that part is only needed to guarantee a participant analogue for each nonparticipant). The weaker assumptions for the TTE allow selection into the program to depend on Y1, but not on Y0.
7 Estimation of the TTE using the Matching Approach Under these assumptions, the mean impact of the program on program participants can be written as (using the Law of Iterated Expectations and the assumptions stated before): TTE = E(Y1 - Y0 | D=1) = E_{Z|D=1}[ E(Y1|Z,D=1) - E(Y0|Z,D=0) ]. Here we can illustrate why the assumption is needed that the distribution of the matching variables, Z, is not affected by whether the treatment is received.
8 Assumption about the Distribution of the Matching Variables Z Assumption: the distribution of the matching variables, Z, is not affected by whether the treatment is received (see slide 3). In the derivation of treatment effects, e.g. of the TTE (see the previous slide), we make use of this assumption as follows: the outer expectation integrates over f(Z|D=1), i.e. TTE = ∫ [ E(Y1|Z,D=1) - E(Y0|Z,D=0) ] f(Z|D=1) dZ. This expression uses the conditional density f(Z|D=1) to represent the density that would also have been observed in the no-treatment (D=0) state, which rules out the possibility that receipt of treatment changes the density of Z. Examples: age, gender and race would generally be valid matching variables, but marital status may not be if it were directly affected by receipt of the program.
9 Matching Estimator A prototypical matching estimator for the TTE takes the form (n1 is the number of observations in the treatment group): TTE_hat = (1/n1) Σ_{i: Di=1} (Y1i - Ŷ0i), where Ŷ0i is an estimator for the matched no-treatment outcome of participant i. Recall that Assumption 1 implies E(Y0|Z,D=1) = E(Y0|Z,D=0), which justifies constructing Ŷ0i from matched nonparticipant outcomes.
10 How does Matching Compare to a Randomized Experiment? The distribution of observables among the matched controls will be the same as in the treatment group. However, the distribution of unobservables is not necessarily balanced across groups. An experiment has full support, but with matching there can be a failure of the common support condition (Assumption 2): if there are regions where the support of Z does not overlap for the D=0 and D=1 groups, then matching is only justified when performed over the region of common support, i.e. the estimated treatment effect must be defined conditionally on the region of overlap.
11 Implementing Matching Estimators Problems: How to construct a match when Z is of high dimension. What to do if P(D=1|Z)=1 for some Z (violation of the common support assumption (A2)). How to choose the set of Z variables.
12 Propensity Score Matching Matching estimators are difficult to implement when the set of conditioning variables Z is large (small cell problems) or Z is continuous (the "curse of dimensionality"). Rosenbaum and Rubin theorem (1983): show that (Y0,Y1) ⊥ D | Z implies (Y0,Y1) ⊥ D | P(Z). This reduces the matching problem to a univariate one, provided P(D=1|Z) (the "propensity score") can be parametrically estimated.
13 Proof of the Rosenbaum/Rubin Theorem Show that E(D|Y,Z) = E(D|Z) implies E{D|Y,P(Z)} = E{D|P(Z)}. Let P(Z) = P(D=1|Z) and note that P(D=1|Z) = E(D|Z). Then: E{D|Y,P(Z)} = E{ E(D|Y,Z) | Y,P(Z) } [Law of Iterated Expectations] = E{ E(D|Z) | Y,P(Z) } [Assumption 1 of the matching estimator] = E{ P(Z) | Y,P(Z) } = P(Z) = E{ D | P(Z) }.
14 Implementation of the Propensity Score Matching Estimator Step 1: Estimate a model of program participation, i.e. estimate the propensity score P(Z) for each person. Step 2: Select matches based on the estimated propensity score and average the outcome differences over the n1 observations in the treatment group, as in the sketch below.
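To make the two steps concrete, here is a minimal Python sketch. The simulated data, the variable names, and the use of a logit for Step 1 are illustrative assumptions, not prescribed by the slides; the arrays p1, y1, p0, y0 defined here are reused in the sketches of the individual matching methods below.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)

# Illustrative data: Z observed covariates, D participation, Y outcome.
n = 1000
Z = rng.normal(size=(n, 2))
D = (Z @ np.array([0.8, -0.5]) + rng.logistic(size=n) > 0).astype(int)
Y = Z @ np.array([1.0, 1.0]) + 2.0 * D + rng.normal(size=n)  # true TTE = 2

# Step 1: estimate the propensity score P(D=1|Z) with a logit model.
logit = sm.Logit(D, sm.add_constant(Z)).fit(disp=0)
p = logit.predict(sm.add_constant(Z))

# Step 2: select matches on the estimated propensity score; here the
# simplest variant, single nearest neighbor with replacement.
p1, y1 = p[D == 1], Y[D == 1]
p0, y0 = p[D == 0], Y[D == 0]
j = np.abs(p1[:, None] - p0[None, :]).argmin(axis=1)
tte_hat = np.mean(y1 - y0[j])   # average over the n1 treated observations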
15 Propensity Score Matching Methods For notational simplicity, let P = P(Z). A prototypical propensity score matching estimator for the TTE takes the form: TTE_hat = (1/n1) Σ_{i in I1 ∩ SP} [ Y1i - Σ_{j in I0} w(i,j) Y0j ], where I1 denotes the set of program participants, I0 the set of nonparticipants, SP the region of common support (defined on the next slide), and n1 is the number of persons in the set I1 ∩ SP. The match for each participant is constructed as a weighted average over the outcomes of nonparticipants, where the weights w(i,j) depend on the distance between Pi and Pj.
16 Implementing Matching Estimators Problems: How to construct a match when Z is of high dimension. What to do if P(D=1|Z)=1 for some Z (violation of the common support assumption (A2)). How to choose the set of Z variables.
17 Common Support Condition The common support region can be estimated by SP = { P : f_hat(P|D=1) > 0 and f_hat(P|D=0) > 0 }, where f_hat(P|D=1) and f_hat(P|D=0) are standard nonparametric density estimators. To ensure that the densities are strictly greater than zero (i.e. exceed zero by a certain amount), density values below a cutoff determined by a trimming level q are excluded. The common support condition ensures that matches for D=1 and D=0 can be found.
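A rough Python sketch of this trimming idea follows. The Gaussian kernel, the grid, and the exact form of the cutoff are assumptions; the formal rule in Heckman, Ichimura and Todd differs in detail.

import numpy as np

def common_support(p1, p0, q=0.05, h=0.05):
    """Keep grid points where both estimated densities of the propensity
    score exceed a small cutoff set by the trimming level q."""
    grid = np.linspace(0, 1, 200)

    def kde(sample, x):  # Gaussian kernel density estimator
        u = (x[:, None] - sample[None, :]) / h
        return np.exp(-0.5 * u**2).mean(axis=1) / (h * np.sqrt(2 * np.pi))

    f1, f0 = kde(p1, grid), kde(p0, grid)
    cutoff = np.quantile(np.concatenate([f1, f0]), q)  # trimming level q
    keep = (f1 > cutoff) & (f0 > cutoff)
    return grid[keep].min(), grid[keep].max()

# e.g.: lo, hi = common_support(p1, p0); treated units with p1 in [lo, hi]
# are retained for matching.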
18 Cross-sectional Matching Methods: Alternative Ways of Constructing Matched Outcomes Define a neighborhood C(Pi) for each i in the participant sample. Neighbors for i are nonparticipants j for whom Pj ∈ C(Pi). The persons matched to i are those in the set Ai = { j in I0 : Pj ∈ C(Pi) }. Alternative matching estimators differ in how the neighborhood is defined and in how the weights w(i,j) are constructed: 1. Nearest Neighbor Matching 2. Stratification or Interval Matching 3. Kernel and Local Linear Matching
19 Cross-sectional Method 1: Nearest Neighbor Matching Traditional pairwise matching, also called nearest-neighbor matching, sets C(Pi) = { Pj : |Pi - Pj| = min_{k in I0} |Pi - Pk| }. That is, the nonparticipant with the value of Pj closest to Pi is selected as the match, and Ai is a singleton set. The estimator can be implemented either with or without replacement. With replacement: the same comparison group observation can be used repeatedly as a match. Drawback of matching without replacement: the final estimate will usually depend on the initial ordering of the treated observations for which the matches were selected. (A sketch covering both nearest-neighbor and caliper matching follows the next slide.)
20 Cross-sectional Method 1: Nearest Neighbor Matching Variation of nearest-neighbor matching: caliper matching (Cochrane and Rubin (1973)). It attempts to avoid bad matches (those for which Pj is far from Pi) by imposing a tolerance on the maximum distance allowed, i.e. a match for person i is selected only if |Pi - Pj| < ε, where ε is a prespecified tolerance. Treated persons for whom no matches can be found within the caliper are excluded from the analysis (one way of imposing the common support condition). Drawback of caliper matching: it is difficult to know a priori what choice for the tolerance level is reasonable.
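A sketch of pairwise matching with replacement, with an optional caliper; variable names follow the earlier sketch and the caliper value in the usage line is a hypothetical choice.

import numpy as np

def nn_match_tte(p1, y1, p0, y0, caliper=None):
    """Nearest-neighbor matching on the propensity score, with
    replacement; with a caliper, treated units whose best match lies
    farther away than the tolerance are excluded."""
    dist = np.abs(p1[:, None] - p0[None, :])   # |P_i - P_j| for all pairs
    j = dist.argmin(axis=1)                    # closest nonparticipant
    effects = y1 - y0[j]
    if caliper is not None:
        within = dist[np.arange(len(p1)), j] < caliper
        effects = effects[within]              # drop unmatched treated units
    return effects.mean()

# e.g. nn_match_tte(p1, y1, p0, y0, caliper=0.01)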
21 Cross-sectional Method 2: Stratification or Interval Matching Method: 1. In this variant of matching, the common support of P is partitioned into a set of intervals. 2. Average treatment impacts are calculated through simple averaging within each interval. 3. Overall average impact estimate: a weighted average of the interval impact estimates, using the fraction of the D=1 population in each interval as the weights. This requires a decision on how wide the intervals should be: Dehejia and Wahba (1999) select intervals such that the mean values of the estimated Pi's and Pj's are not statistically different from each other within intervals.
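A sketch of interval matching with equal-width intervals; the number of intervals is an arbitrary illustrative choice, and Dehejia and Wahba's data-driven interval selection is not implemented here.

import numpy as np

def interval_match_tte(p1, y1, p0, y0, n_intervals=5):
    """Partition the support of P into intervals, average within each,
    and weight each interval's impact by its share of treated units."""
    edges = np.linspace(min(p1.min(), p0.min()),
                        max(p1.max(), p0.max()), n_intervals + 1)
    b1 = np.digitize(p1, edges[1:-1])   # interval index of each treated unit
    b0 = np.digitize(p0, edges[1:-1])
    tte, weight = 0.0, 0.0
    for k in range(n_intervals):
        if (b1 == k).any() and (b0 == k).any():   # skip empty cells
            share = (b1 == k).mean()              # fraction of D=1 in interval k
            tte += share * (y1[b1 == k].mean() - y0[b0 == k].mean())
            weight += share
    return tte / weight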
22 Cross-sectional Method 3: Kernel and Local Linear Matching Kernel method: uses a weighted average of all observations within the common support region; the farther away the comparison unit is from the treated unit, the lower the weight. Local linear matching: similar to the kernel estimator but includes a linear term in the weighting function, which helps to avoid bias.
23 Kernel and Local Linear Matching A kernel estimator for the matched outcome Ŷ0i is given by Ŷ0i = Σ_{j in I0} w(i,j) Y0j with weights w(i,j) = K((Pj - Pi)/h) / Σ_{k in I0} K((Pk - Pi)/h), where K is a kernel function and h is a bandwidth (or smoothing parameter). For a discussion of the choice of kernel function and bandwidth, see later.
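A sketch of the kernel-matching version of this formula, using a Gaussian kernel; the kernel and bandwidth are illustrative choices.

import numpy as np

def kernel_match_tte(p1, y1, p0, y0, h=0.05):
    """Each participant's matched no-treatment outcome is a kernel-
    weighted average of all nonparticipant outcomes, with weights
    w(i,j) = K((P_j - P_i)/h) / sum_k K((P_k - P_i)/h)."""
    u = (p0[None, :] - p1[:, None]) / h
    K = np.exp(-0.5 * u**2)                  # Gaussian kernel
    w = K / K.sum(axis=1, keepdims=True)     # weights sum to one over j
    y0_hat = w @ y0                          # matched outcomes Y0_hat_i
    return np.mean(y1 - y0_hat)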
24 Intro to Nonparametric Estimation Reference: Angus Deaton, The Analysis of Household Surveys, ch. 3.2 (to read): "Nonparametric methods for estimating density functions" and "Nonparametric regression analysis". Topics: kernel density estimation; kernel regression; choice of kernel and choice of bandwidth (trade-off between bias and variance); local linear estimation: when and why is it better?
25 Estimating Univariate Densities: Histograms versus Kernel Estimators Application: when a visual impression of the position and spread of the data is needed (important, for example, for evaluating the distribution of welfare and the effects of policies on the whole distribution). Histograms have the following disadvantages: a degree of arbitrariness that comes from the choice of the number of bins and of their width; problems when trying to represent continuously differentiable densities of variables that are inherently continuous (a histogram can obscure the genuine shape of the empirical distribution and is unsuited to providing information about the derivatives of density functions). Alternatives: fit a parametric density to the data, or use nonparametric techniques (which allow a more direct inspection of the data).
26 Nonparametric Density Estimation Idea: get away from the bins of the histogram by estimating the density at every point along the x-axis. Problem: with a finite sample, there will only be empirical mass at a finite number of points. Solution: use mass at nearby points as well as the point itself. Illustration: think of sliding a band (or window) along the x-axis, calculating the fraction of the sample per unit interval within it, and plotting the result as an estimate of the density at the mid-point of the band. Naive estimator: f_hat(x) = (1/(2hn)) Σ_i 1(|x - xi| < h), but there will be steps in f_hat(x) each time a data point enters or exits the band.
27 Nonparametric Density Estimation The naive estimator has steps in f_hat(x) each time a data point enters or exits the band. Modification: instead of giving all the points inside the band equal weight, give more weight to those near x and less to those far away, so that points have a weight of zero both just outside and just inside the edge of the band. Replace the indicator function by a kernel function K(.), giving the kernel density estimator f_hat(x) = (1/(nh)) Σ_i K((x - xi)/h).
28 Choice of Kernel and Bandwidth Choice of kernel K(.): 1. Because it is a weighting function, it should be positive and integrate to unity over the band. 2. It should be symmetric around zero, so that points below x get the same weight as those an equal distance above. 3. It should be decreasing in the absolute value of its argument. Alternative kernel functions: the Epanechnikov kernel, the Gaussian kernel (the normal density, giving some weight to all observations), the biweight kernel. The choice of kernel will influence the shape of the estimated density (especially when there are few points), but the choice is not a critical one.
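The following sketch implements the kernel density estimator with the three kernels named above; the bandwidth and the simulated sample are illustrative.

import numpy as np

def epanechnikov(u):
    return 0.75 * (1 - u**2) * (np.abs(u) <= 1)

def gaussian(u):
    return np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)

def biweight(u):
    return (15 / 16) * (1 - u**2)**2 * (np.abs(u) <= 1)

def kernel_density(x_grid, sample, h, kernel=epanechnikov):
    """f_hat(x) = (1/(nh)) sum_i K((x - x_i)/h); all three kernels are
    positive, integrate to one, are symmetric and decrease in |u|."""
    u = (x_grid[:, None] - sample[None, :]) / h
    return kernel(u).mean(axis=1) / h

rng = np.random.default_rng(1)
sample = rng.normal(size=500)
grid = np.linspace(-4, 4, 201)
f_hat = kernel_density(grid, sample, h=0.4)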
29 Choice of Kernel and Bandwidth Choice of bandwidth: Results often very sensitive to choice of bandwidth. Estimating densities by kernel methods is an exercise in smoothing the raw observations into an estimated density and the bandwidth controls how much smoothing is done. Bandwidth controls trade-off between bias and variance: A large bandwidth will provide a smooth and not very variable estimate, but risks bias by bringing in observations from other parts of the density. A small bandwidth helps to pick up genuine features of the underlying density, but risks producing an unnecessarily variable plot. Oversmoothed estimates are biased and undersmoothed estimates are too variable.
30 Choice of Kernel and Bandwidth Choice of bandwidth (cont'd): Consistency of the nonparametric estimator requires that the bandwidth shrinks to zero as the sample size gets large, but not at too fast a rate (this can be made formal). In practice: consider a number of different bandwidths, plot the associated density estimates and examine the sensitivity of the estimates with respect to the bandwidth choice. Formal theory of the trade-off: in standard parametric inference, optimal estimation is based on minimizing the mean-squared error between the estimated and true parameters. In the nonparametric case we estimate a function, not a parameter, so there will be a mean-squared error at each point of the estimated density; the attempt is therefore to minimize the mean integrated squared error (MISE). In this way an optimal bandwidth can be estimated (after the kernel is chosen).
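A common starting point in practice is a rule-of-thumb bandwidth; the sketch below uses Silverman's rule for a Gaussian kernel and then varies the bandwidth around it, as the slide suggests (the specific multipliers are arbitrary).

import numpy as np

def silverman_bandwidth(sample):
    """Rule-of-thumb bandwidth h = 0.9 * min(std, IQR/1.34) * n^(-1/5)
    (Silverman 1986). Note the n^(-1/5) rate: h shrinks with the sample
    size, but slowly, as consistency requires."""
    iqr = np.subtract(*np.percentile(sample, [75, 25]))
    return 0.9 * min(sample.std(ddof=1), iqr / 1.34) * len(sample) ** -0.2

# Sensitivity check: re-estimate with several bandwidths and compare.
# h0 = silverman_bandwidth(sample)
# for h in (0.5 * h0, h0, 2 * h0):
#     f_hat = kernel_density(grid, sample, h)  # from the previous sketch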
31 Nonparametric Regression Analysis The object of interest is the conditional expectation of y conditional on x, m(x) = E(y|x). Link between the conditional expectation and the underlying distributions: m(x) = ∫ y f(y|x) dy = (1/f(x)) ∫ y f(x,y) dy. Intuitively: calculate the average of all y-values corresponding to each x (or vector of x). This is not feasible with finite samples and continuous x, the same problem as in density estimation, so adopt the same solution: average over points near x. Kernel regression estimator: m_hat(x) = Σ_i yi K((x - xi)/h) / Σ_i K((x - xi)/h).
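A sketch of the Nadaraya-Watson kernel regression estimator implied by this formula, with a Gaussian kernel and data simulated for illustration.

import numpy as np

def kernel_regression(x_grid, x, y, h):
    """m_hat(x) = sum_i y_i K((x - x_i)/h) / sum_i K((x - x_i)/h),
    a locally weighted average of the y values near each grid point."""
    u = (x_grid[:, None] - x[None, :]) / h
    K = np.exp(-0.5 * u**2)            # Gaussian kernel
    return (K @ y) / K.sum(axis=1)     # imprecise where few x_i are nearby

rng = np.random.default_rng(2)
x = rng.uniform(0, 10, size=300)
y = np.sin(x) + rng.normal(scale=0.3, size=300)
m_hat = kernel_regression(np.linspace(0, 10, 101), x, y, h=0.5)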
32 Kernel and Local Linear Matching (recap) A kernel estimator for the matched outcome Ŷ0i is given by Ŷ0i = Σ_{j in I0} w(i,j) Y0j with weights w(i,j) = K((Pj - Pi)/h) / Σ_{k in I0} K((Pk - Pi)/h), where K is a kernel function and h is a bandwidth (or smoothing parameter); the choice of kernel function and bandwidth was discussed in the preceding slides.
33 Nonparametric Regression Analysis Important: it is not possible to calculate a conditional expectation for values of x where the density is zero; in practice there are problems whenever the estimated density is small or zero (this makes the regression function imprecise). Main strength of nonparametric over parametric regression: it assumes no functional form for the relationship, allowing the data to choose not only the parameter estimates but the shape of the curve itself. Weaknesses: the price of this flexibility is the much greater data requirement of nonparametric methods and the difficulty of handling high-dimensional problems (alternatives: polynomial regressions and semiparametric estimation). Nonparametric methods also lack the menu of options that is available for parametric methods when dealing with simultaneity, measurement error, selectivity and so forth.
34 Locally Linear Regression Read Angus Deaton, The Analysis of Household Surveys (important: will be used again later on).
35 Difference-in-Difference Matching Estimators Assumption of cross-sectional matching estimators: after conditioning on a set of observable characteristics, outcomes are conditionally mean independent of program participation. BUT: there may be systematic differences between participant and nonparticipant outcomes that could lead to a violation of the identification conditions required for matching, e.g. due to program selectivity on unmeasured characteristics. Solution in the case of temporally invariant differences in outcomes between participants and nonparticipants: a difference-in-difference matching strategy (see Heckman, Ichimura and Todd (1997)).
36 Cross-sectional versus Diff-in-Diff Matching Estimators A) Cross-sectional Matching Estimator This estimator assumes: (CS1) E(Y0|P,D=1) = E(Y0|P,D=0) and (CS2) P(D=1|Z) < 1. Under these conditions, the TTE can be estimated by TTE_hat = (1/n1) Σ_{i in I1 ∩ SP} [ Y1i - Σ_{j in I0} w(i,j) Y0j ], where n1 is the number of treated individuals for whom CS2 is satisfied.
37 Cross-sectional versus Diff-in-Diff Matching Estimators B) Difference-in-Difference Matching Estimator This estimator requires repeated cross-section or panel data. Let t and t' be the two time periods, one before the program start date and one after. The conditions needed to justify the application of the estimator are: (DID1) E(Y0t - Y0t'|P,D=1) = E(Y0t - Y0t'|P,D=0) and (DID2) P(D=1|Z) < 1. Under these conditions, the TTE can be estimated (panel-data version) by TTE_hat = (1/n1) Σ_{i in I1 ∩ SP} [ (Y1ti - Y0t'i) - Σ_{j in I0} w(i,j)(Y0tj - Y0t'j) ].
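A sketch of the panel-data version; variable names are illustrative, and pre- and post-period outcomes are assumed observed for everyone.

import numpy as np

def did_match_tte(p1, y1_post, y1_pre, p0, y0_post, y0_pre, h=0.05):
    """Kernel matching applied to before/after outcome changes, so that
    temporally invariant differences between participants and
    nonparticipants difference out."""
    u = (p0[None, :] - p1[:, None]) / h
    K = np.exp(-0.5 * u**2)
    w = K / K.sum(axis=1, keepdims=True)
    d1 = y1_post - y1_pre                # outcome change for participants
    d0_hat = w @ (y0_post - y0_pre)      # matched change for nonparticipants
    return np.mean(d1 - d0_hat)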
38 Assessing the Variability of Matching Estimators Distribution theory for cross-sectional and DID kernel and local linear matching estimators: see Heckman, Ichimura and Todd (1998). But implementing the asymptotic standard error formulae can be cumbersome, so standard errors for matching estimators are often generated using bootstrap resampling methods. This is valid for kernel or local linear matching estimators, but not for nearest neighbor matching estimators (see Abadie and Imbens (2004), also for alternatives in that case).
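A sketch of bootstrap standard errors for a kernel-type matching estimator, reusing kernel_match_tte from the earlier sketch; the resampling scheme (independent resampling within treated and comparison groups) is one reasonable choice among several.

import numpy as np

def bootstrap_se(p1, y1, p0, y0, estimator, n_boot=500, seed=0):
    """Standard error from resampling with replacement and re-estimating;
    valid for kernel/local linear matching, NOT for nearest-neighbor
    matching (Abadie and Imbens)."""
    rng = np.random.default_rng(seed)
    draws = []
    for _ in range(n_boot):
        i = rng.integers(0, len(p1), size=len(p1))  # resample treated
        j = rng.integers(0, len(p0), size=len(p0))  # resample comparisons
        draws.append(estimator(p1[i], y1[i], p0[j], y0[j]))
    return np.std(draws, ddof=1)

# e.g. bootstrap_se(p1, y1, p0, y0, kernel_match_tte)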
More information