Chapter 2

Overview of various smoothers

A scatter plot smoother is a tool for finding structure in a scatter plot:

Figure 2.1: CD4 cell count since seroconversion for HIV infected men. [Scatter plot of CD4 counts vs. time since seroconversion.]

Suppose that we consider $y = (y_1, \dots, y_n)'$ as the response measurements and $x = (x_1, \dots, x_n)'$ as the design points. We can think of $y$ and $x$ as outcomes of random variables $Y$ and $X$. However, for scatter plot smoothers we don't really need stochastic assumptions; a smoother can be considered a purely descriptive tool.

A scatter plot smoother can be defined as a function (remember the general definition of a function) of $x$ and $y$ with domain at least containing the values in $x$: $s(x_0; x, y)$. There is usually a recipe that gives $s(x_i)$, the function $s(\cdot; x, y)$ evaluated at $x_i$, for all $i$. We will call $x_0$ the target value when we are giving the recipe.

Note: Some recipes don't give an $s(x_0)$ for all $x_0$, but only for the $x_i$'s included in $x$.

Note: We will call the vector $s = (s(x_1), \dots, s(x_n))'$ the smooth.

Here is a stupid example. If we assume a random design model and take expectations over the empirical distribution $F_n$ defined by the observations, we have, for any $x_i \in \{x_1, \dots, x_n\}$,

$$E_n[Y \mid X = x_i] = \mathrm{ave}\{y_j : x_j = x_i\}.$$

Define $s(x_i) = E_n[Y \mid X = x_i]$. What happens if the $x_i$ are unique? Since $X$ and $Y$ are, in general, non-categorical, we don't expect to find many replicates at any given value of $X$. This means that we could end up with the data again, $s(x_i) = y_i$ for all $x_i$. Not very smooth!

Note: For convenience, throughout this chapter, we assume that the data are sorted by $x$.

Many smoothers force $s(x)$ to be a smooth function of $x$. This is a fancy way of saying we think data points that are close (in $x$) should have roughly the same expectation.
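As a minimal R sketch of the replicate-averaging example above (the toy vectors `x` and `y` below are hypothetical, not from the text), the recipe reduces to a grouped mean, and with all-unique $x_i$ it returns the data unchanged:

```r
# Empirical conditional expectation: average y over replicates at each x
s_empirical <- function(x, y) {
  ave(y, x, FUN = mean)   # for each i, mean of y[j] over all j with x[j] == x[i]
}

x <- c(1, 1, 2, 3, 3, 3)
y <- c(2, 4, 5, 1, 2, 3)
s_empirical(x, y)  # 3 3 5 2 2 2; with all-unique x it would just return y
```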
2.1 Parametric smoother

These are what you have seen already. We force a function defined by a few parameters on the data and use something like least squares to find the best estimates of the parameters. For example, a regression line computed with least squares can be thought of as a smoother. In this case

$$s(x_0; x, y) = \hat\beta_0 + \hat\beta_1 x_0, \qquad \hat\beta = (X'X)^{-1} X' y,$$

with $X$ a design matrix containing a column of 1's and $x$ (`cbind(1, x)` in R). The lack of flexibility of these types of smoothers can make them produce misleading results.

Figure 2.2: CD4 cell count since seroconversion for HIV infected men. [Regression lines: line, parabola, cubic polynomial.]
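A short sketch of the least-squares recipe in R, using simulated stand-ins for the CD4 data (the decay shape and noise level here are made up for illustration only):

```r
set.seed(1)
x <- sort(runif(100, 0, 6))                    # time since seroconversion (simulated)
y <- 800 * exp(-x / 3) + rnorm(100, sd = 60)   # hypothetical CD4-like response

X <- cbind(1, x)                           # design matrix: column of 1's and x
beta_hat <- solve(t(X) %*% X, t(X) %*% y)  # (X'X)^{-1} X'y
s_line <- X %*% beta_hat                   # the smooth at the design points

# The same line, plus the parabola and cubic of Figure 2.2, via lm():
fit_line  <- lm(y ~ x)
fit_parab <- lm(y ~ poly(x, 2))
fit_cubic <- lm(y ~ poly(x, 3))
```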
2.2 Bin smoothers

A bin smoother, also known as a regressogram, mimics a categorical smoother by partitioning the predictor values into disjoint and exhaustive regions, then averaging the response in each region. Formally, we choose cut-points $c_0 < c_1 < \dots < c_K$, where $c_0 = -\infty$ and $c_K = \infty$, and define

$$R_k = \{i : c_k \le x_i < c_{k+1}\}, \qquad k = 0, \dots, K - 1,$$

the indexes of the data points in each region. Then $s(x_0; x, y)$ is given by

$$s(x_0) = \mathrm{ave}\{y_i : i \in R_k\} \quad \text{if } x_0 \in [c_k, c_{k+1}).$$

Notice that the bin smoother will have discontinuities.

Figure 2.3: CD4 cell count since seroconversion for HIV infected men. [Bin smoother fit.]
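A sketch of the bin smoother in R, again on simulated stand-in data; the `breaks` argument plays the role of the interior cut-points $c_1, \dots, c_{K-1}$:

```r
set.seed(1)
x <- sort(runif(200, 0, 6)); y <- 800 * exp(-x / 3) + rnorm(200, sd = 60)

# Bin smoother (regressogram): disjoint, exhaustive regions, region-wise means
bin_smoother <- function(x, y, breaks) {
  region <- cut(x, breaks = c(-Inf, breaks, Inf))  # c_0 = -Inf, c_K = Inf
  ave(y, region, FUN = mean)                       # average y within each region
}
s_bin <- bin_smoother(x, y, breaks = 1:5)
plot(x, y); lines(x, s_bin, type = "s", col = "red")  # note the discontinuities
```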
2.3 Running mean / moving average

Since we have no replicates and we want to force $s(x)$ to be smooth, we can use the motivation that, under some statistical model, the values of $E[Y \mid X = x']$ for $x'$ close to $x$ are similar. How do we define close? A formal definition is the symmetric nearest neighborhood

$$N^S(x_i) = \{\max(i - k, 1), \dots, i - 1, i, i + 1, \dots, \min(i + k, n)\}.$$

We may now define the running mean as

$$s(x_i) = \mathrm{ave}_{j \in N^S(x_i)}\{y_j\}.$$

We can also forget about the symmetric part and simply use the $k$ nearest neighbors.

Figure 2.4: CD4 cell count since seroconversion for HIV infected men. [Running-mean fit.]

This is usually too wiggly to be considered useful. Why do you think that is?
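A sketch of the running mean just defined (data assumed sorted by $x$, as throughout this chapter; the simulated data are again a stand-in). The fit it draws is typically quite wiggly:

```r
set.seed(1)
x <- sort(runif(200, 0, 6)); y <- 800 * exp(-x / 3) + rnorm(200, sd = 60)

# Running mean over the symmetric nearest neighborhood of each point
running_mean <- function(x, y, k) {
  n <- length(y)                            # data sorted by x
  sapply(1:n, function(i) {
    nbhd <- max(i - k, 1):min(i + k, n)     # indexes max(i-k,1), ..., min(i+k,n)
    mean(y[nbhd])
  })
}
s_rm <- running_mean(x, y, k = 5)
plot(x, y); lines(x, s_rm, col = "red")     # wiggly
```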
Notice we can also fit a line instead of a constant; this procedure is called the running-line smoother. Can you write out the recipe for $s(x_i)$ for the running-line smoother?

2.4 Kernel smoothers

One of the reasons the previous smoothers are wiggly is that when we move from $x_i$ to $x_{i+1}$, two points are usually changed in the group we average. If the two new points are very different, then $s(x_i)$ and $s(x_{i+1})$ may be quite different. One way to try to fix this is by making the transition smoother. That's the idea behind kernel smoothers.

Generally speaking, a kernel smoother defines a set of weights $\{w_i(x)\}_{i=1}^n$ for each $x$ and defines

$$s(x) = \sum_{i=1}^n w_i(x)\, y_i.$$

We will see that most scatter plot smoothers can be considered kernel smoothers under this very general definition. What is called a kernel smoother in practice takes a simple approach to representing the weight sequence $\{w_i(x)\}_{i=1}^n$: it describes the shape of the weight function by a density function with a scale parameter that adjusts the size and form of the weights near $x$. It is common to refer to this shape function as a kernel $K$. The kernel is a continuous, bounded, and symmetric real function which integrates to one:

$$\int K(u)\, du = 1.$$
For a given scale parameter $h$, the weight sequence is then defined by

$$w_{hi}(x) = \frac{K\left(\frac{x - x_i}{h}\right)}{\sum_{j=1}^n K\left(\frac{x - x_j}{h}\right)}.$$

Notice: $\sum_{i=1}^n w_{hi}(x) = 1$.

The kernel smoother is then defined, for any $x$, as before by

$$\hat s(x) = \sum_{i=1}^n w_{hi}(x)\, y_i.$$

Notice: if we consider $x$ and $y$ to be observations of random variables $X$ and $Y$, then one can get an intuition for why this works, because

$$E[Y \mid X = x] = \frac{\int y\, f_{X,Y}(x, y)\, dy}{f_X(x)},$$

with $f_X$ the marginal distribution of $X$ and $f_{X,Y}$ the joint distribution of $(X, Y)$, and the numerator and denominator of

$$\hat s(x) = \frac{\sum_{i=1}^n K\left(\frac{x - x_i}{h}\right) y_i}{\sum_{j=1}^n K\left(\frac{x - x_j}{h}\right)}$$

estimate the numerator and denominator of this ratio.

Because we think points that are close together are similar, a kernel smoother usually defines weights that decrease in a smooth fashion as one moves away from the target point. Running-mean smoothers are kernel smoothers that use a box kernel. A natural candidate for $K$ is the standard Gaussian density. (This is somewhat inconvenient computationally because it is never 0.) This smooth is shown in Figure 2.5 for $h = 1$ year. In Figure 2.6 we can see the weight sequences for the box and Gaussian kernels for three values of $h$.
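A sketch of this kernel smoother with the Gaussian kernel (simulated stand-in data; `h` is the scale parameter, and the weights are exactly the normalized $K((x_0 - x_i)/h)$ above):

```r
set.seed(1)
x <- sort(runif(200, 0, 6)); y <- 800 * exp(-x / 3) + rnorm(200, sd = 60)

# Gaussian kernel smoother evaluated at each point of x0
kernel_smoother <- function(x0, x, y, h) {
  sapply(x0, function(t) {
    w <- dnorm((t - x) / h)   # unnormalized weights K((t - x_i)/h)
    sum(w * y) / sum(w)       # weighted average; weights sum to 1 after dividing
  })
}
s_ks <- kernel_smoother(x, x, y, h = 0.5)
plot(x, y); lines(x, s_ks, col = "red")
# Base R's ksmooth(x, y, kernel = "normal", bandwidth = ...) is similar in spirit,
# though its bandwidth is scaled differently.
```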
Figure 2.5: CD4 cell count since seroconversion for HIV infected men. [Kernel smoother fit.]

Figure 2.6: CD4 cell count since seroconversion for HIV infected men. [Box and Gaussian kernel weights vs. time.]
An asymptotic result

For the asymptotic theory presented here we will assume the stochastic design model with a one-dimensional covariate. For the first time in this chapter we set down a specific stochastic model. Assume we have IID observations of the random variables $(X, Y)$ and that

$$Y_i = f(X_i) + \varepsilon_i, \qquad i = 1, \dots, n, \tag{2.1}$$

where $X$ has marginal distribution $f_X$ and the $\varepsilon_i$ are IID errors independent of the $X_i$. A common extra assumption is that the errors are normally distributed.

We are now going to let $n$ go to infinity... What does that mean? For each $n$ we define an estimate of $f(x)$ using the kernel smoother with scale parameter $h_n$.

Theorem 1. Under the following assumptions:

1. $\int |K(u)|\, du < \infty$;
2. $\lim_{|u| \to \infty} u\, K(u) = 0$;
3. $E(Y^2) < \infty$;
4. $n \to \infty$, $h_n \to 0$, and $n h_n \to \infty$;

then, at every point of continuity of $f(x)$ and $f_X(x)$, we have

$$\frac{\sum_{i=1}^n K\left(\frac{x - X_i}{h_n}\right) Y_i}{\sum_{i=1}^n K\left(\frac{x - X_i}{h_n}\right)} \to f(x)$$

in probability.

Proof: Homework. Hint: Start by proving the result for the fixed design model.
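A small simulation sketch of the theorem (everything here is made up for illustration: $f(x) = \sin x$, a uniform design, and $h_n = n^{-1/5}$, a choice that satisfies assumption 4). The estimate at a fixed point settles down toward $f(x)$ as $n$ grows:

```r
set.seed(2)
f <- function(x) sin(x)                  # hypothetical true regression function
nw_at <- function(t, x, y, h) {          # Nadaraya-Watson estimate at one point
  w <- dnorm((t - x) / h)
  sum(w * y) / sum(w)
}
for (n in c(100, 1000, 10000)) {
  x <- runif(n, 0, 2 * pi)               # stochastic design
  y <- f(x) + rnorm(n, sd = 0.5)
  h <- n^(-1/5)                          # h_n -> 0 while n * h_n -> infinity
  cat(n, ":", nw_at(pi / 2, x, y, h), "(true value", f(pi / 2), ")\n")
}
```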
2.5 Linear smoothers

Most of the smoothers presented here are linear smoothers, which means that the fit at any point $x$ can be written as

$$\hat s(x) = \sum_{j=1}^n S_j(x)\, y_j.$$

In practice we usually have the model $Y_i = f(X_i) + \varepsilon_i$ and observations $\{(x_i, y_i)\}_{i=1}^n$. Many times it is the vector $\mathbf{f} = (f(x_1), \dots, f(x_n))'$ we are after. In this case the vector of estimates $\hat{\mathbf{f}} = (\hat f(x_1), \dots, \hat f(x_n))'$ can be written as $\hat{\mathbf{f}} = S\mathbf{y}$, with $S$ the matrix whose $(i, j)$-th entry is $S_j(x_i)$. We will call $\hat{\mathbf{f}}$ the smooth. This makes it easy to figure out things like the variance of $\hat{\mathbf{f}}$, since

$$\mathrm{var}(S\mathbf{y}) = S\, \mathrm{var}(\mathbf{y})\, S',$$

which in the case of IID data is $\sigma^2 S S'$.
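As a sketch, the smoother matrix $S$ for the Gaussian kernel smoother can be built explicitly, which makes the variance formula one line of code (the simulated data and the value $\sigma = 60$ are assumptions for illustration):

```r
set.seed(1)
x <- sort(runif(100, 0, 6)); y <- 800 * exp(-x / 3) + rnorm(100, sd = 60)

# Smoother matrix S: row i holds the normalized kernel weights at target x_i,
# so that f-hat = S %*% y
h <- 0.5
W <- dnorm(outer(x, x, "-") / h)   # K((x_i - x_j)/h)
S <- W / rowSums(W)                # normalize each row to sum to 1
f_hat <- S %*% y                   # the smooth
var_f_hat <- 60^2 * S %*% t(S)     # var(Sy) = sigma^2 S S' under IID errors
diag(var_f_hat)                    # pointwise variances of the fit
```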