Optimization and Simulation

Size: px

Start display at page:

Download "Optimization and Simulation"

Ralf Rafe Bennett
6 years ago
Views:

1 Optimization and Simulation Statistical analysis and bootstrapping Michel Bierlaire Transport and Mobility Laboratory School of Architecture, Civil and Environmental Engineering Ecole Polytechnique Fédérale de Lausanne M. Bierlaire (TRANSP-OR ENAC EPFL) Optimization and Simulation 1 / 17

2 Introduction The outputs of the simulator are random variables. Running the simulator provides one realization of these r.v. We have no access to the pdf or CDF of these r.v. Well... this is actually why we rely on simulation. How to derive statistics about a r.v. when only instances are known? How to measure the quality of this statistic? M. Bierlaire (TRANSP-OR ENAC EPFL) Optimization and Simulation 2 / 17

3 Sample mean and variance Consider X 1,..., X n i.i.d. r.v. E[X i ] = µ, Var(X i ) = σ 2. The sample mean X = 1 n is an unbiased estimate of the population mean µ, as E[ X] = µ. The sample variance S 2 = 1 n 1 n i=1 X i n (X i X) 2 is an unbiased estimator of the population variance σ 2, as E[S 2 ] = σ 2. (see proof: Ross, chapter 7) i=1 M. Bierlaire (TRANSP-OR ENAC EPFL) Optimization and Simulation 3 / 17

4 Sample mean and variance Recursive computation 1 Initialize X 0 = 0, S 2 1 = 0. 2 Update the mean X k+1 = X k + X k+1 X k k +1 3 Update the variance ( Sk+1 2 = 1 1 ) Sk 2 k +(k +1)( X k+1 X k ) 2. M. Bierlaire (TRANSP-OR ENAC EPFL) Optimization and Simulation 4 / 17

5 Mean Square Error Consider X 1,..., X n i.i.d. r.v. with CDF F. Consider a parameter θ(f) of the distribution (mean, quantile, mode, etc.) Consider θ(x 1,...,X n ) an estimator of θ(f). The Mean Square Error of the estimator is defined as [ ) ] 2 MSE(F) = E F ( θ(x1,...,x n ) θ(f), where E F emphasizes that the expectation is taken under the assumption that the r.v. all have distribution F. If F is unknown, it is not immediate to find an estimator of MSE. M. Bierlaire (TRANSP-OR ENAC EPFL) Optimization and Simulation 5 / 17

6 How many draws must be used? Let X a r.v. with mean θ and variance σ 2. We want to estimate the mean θ of the simulated distribution. The estimator used is the sample mean: X. The mean square error is E[( X θ) 2 ] = σ2 n The sample mean X is normally distributed with mean θ and variance σ 2 /n. So we can stop generating data when σ/ n is small. σ is approximated by the sample variance S. Law of large numbers: at least 100 draws (say) should be used. See Ross p. 121 for details. M. Bierlaire (TRANSP-OR ENAC EPFL) Optimization and Simulation 6 / 17

7 Mean Square Error Other indicators than the mean are desired. Theoretical results about the MSE cannot always be derived. Solution: rely on simulation. Method: bootstrapping. M. Bierlaire (TRANSP-OR ENAC EPFL) Optimization and Simulation 7 / 17

8 Empirical distribution function Consider X 1,..., X n i.i.d. r.v. with CDF F. Consider a realization x 1,...,x n of these r.v. The empirical distribution function is defined as F e (x) = 1 n n I{x i x}, i=1 where I{x i x} = { 1 if xi x, 0 otherwise. CDF of a r.v. that can take any x i with equal probability. M. Bierlaire (TRANSP-OR ENAC EPFL) Optimization and Simulation 8 / 17

9 Empirical CDF F e (x), n = 10 F(x) M. Bierlaire (TRANSP-OR ENAC EPFL) Optimization and Simulation 9 / 17

10 Empirical CDF F e (x), n = 100 F(x) M. Bierlaire (TRANSP-OR ENAC EPFL) Optimization and Simulation 10 / 17

11 Empirical CDF F e (x), n = 1000 F(x) M. Bierlaire (TRANSP-OR ENAC EPFL) Optimization and Simulation 11 / 17

12 Mean Square Error We use the empirical distribution function F e We can approximate [ ) ] 2 MSE(F) = E F ( θ(x1,...,x n ) θ(f), by [ ) ] 2 MSE(F e ) = E Fe ( θ(x1,...,x n ) θ(f e ), θ(f e ) can be computed directly from the data (mean, variance, etc.) M. Bierlaire (TRANSP-OR ENAC EPFL) Optimization and Simulation 12 / 17

13 Mean Square Error We want to compute [ ) ] 2 MSE(F e ) = E Fe ( θ(x1,...,x n ) θ(f e ), F e is the CDF of a r.v. that can take any x i with equal probability. Therefore, MSE(F e ) = 1 n n n i 1 =1 n i n=1 Clearly impossible to compute when n is large. Solution: simulation. [ ( θ(xi1,...,x in) θ(f e )) 2 ], M. Bierlaire (TRANSP-OR ENAC EPFL) Optimization and Simulation 13 / 17

14 Bootstrapping For r = 1,...,R Draw x r 1,...,xr n from F e, that is draw from the data: 1 Let s be a draw from U[0,1] 2 Set j = floor(ns). 3 Return x j. Compute M r = ( θ(x r 1,...,x r n) θ(f e )) 2, Estimate of MSE(F e ) and, therefore, of MSE(F): Typical value for R: R R r=1 M r M. Bierlaire (TRANSP-OR ENAC EPFL) Optimization and Simulation 14 / 17

15 Bootstrap: simple example Data: 0.636, , 0.183, -1.67, Mean= MSE= E[( X θ) 2 ] = S 2 /n= r ˆθ θ(f e) MSE e M. Bierlaire (TRANSP-OR ENAC EPFL) Optimization and Simulation 15 / 17

16 Summary The number of draws is determined by the required precision. In some cases, the precision is derived from theoretical results. If not, rely on bootstrapping. Idea: use simulation to estimate the Mean Square Error. M. Bierlaire (TRANSP-OR ENAC EPFL) Optimization and Simulation 16 / 17

17 Appendix: MSE for the mean Consider X 1,..., X n i.i.d. r.v. Denote θ = E[X i ] and σ 2 = Var(X i ). Consider X = n i=1 X i/n. E[ X] = n i=1 E[X i]/n = θ. MSE: E[( X θ) 2 ] = Var X ( n ) = Var X i /n i=1 = n Var(X i )/n 2 i=1 = σ 2 /n. M. Bierlaire (TRANSP-OR ENAC EPFL) Optimization and Simulation 17 / 17

Chapter 3. Bootstrap. 3.1 Introduction. 3.2 The general idea

Chapter 3. Bootstrap. 3.1 Introduction. 3.2 The general idea Chapter 3 Bootstrap 3.1 Introduction The estimation of parameters in probability distributions is a basic problem in statistics that one tends to encounter already during the very first course on the subject.