Monte Carlo Simulation. Ben Kite KU CRMDA 2015 Summer Methodology Institute

Similar documents
MS&E 226: Small Data

Categorical Data in a Designed Experiment Part 2: Sizing with a Binary Response

Dealing with Categorical Data Types in a Designed Experiment

VARIANCE REDUCTION TECHNIQUES IN MONTE CARLO SIMULATIONS K. Ming Leung

CHAPTER 1 INTRODUCTION

A. Incorrect! This would be the negative of the range. B. Correct! The range is the maximum data value minus the minimum data value.

The Bootstrap and Jackknife

2) familiarize you with a variety of comparative statistics biologists use to evaluate results of experiments;

Missing Data and Imputation

STAT 311 (3 CREDITS) VARIANCE AND REGRESSION ANALYSIS ELECTIVE: ALL STUDENTS. CONTENT Introduction to Computer application of variance and regression

Gauss for Econometrics: Simulation

Data Analysis and Solver Plugins for KSpread USER S MANUAL. Tomasz Maliszewski

Week 4: Simple Linear Regression II

RSM Split-Plot Designs & Diagnostics Solve Real-World Problems

Introduction to Mplus

Sequential Monte Carlo Adaptation in Low-Anisotropy Participating Media. Vincent Pegoraro Ingo Wald Steven G. Parker

Lab #9: ANOVA and TUKEY tests

Descriptive Statistics, Standard Deviation and Standard Error

You ve already read basics of simulation now I will be taking up method of simulation, that is Random Number Generation

Performance of Latent Growth Curve Models with Binary Variables

UNIT 9A Randomness in Computation: Random Number Generators

Scientific Computing: An Introductory Survey

Section 2.3: Simple Linear Regression: Predictions and Inference

Problem Set #8. Econ 103

1. Estimation equations for strip transect sampling, using notation consistent with that used to

Towards Interactive Global Illumination Effects via Sequential Monte Carlo Adaptation. Carson Brownlee Peter S. Shirley Steven G.

Recurrent Neural Network Models for improved (Pseudo) Random Number Generation in computer security applications

CHAPTER 7 EXAMPLES: MIXTURE MODELING WITH CROSS- SECTIONAL DATA

Lecture 25: Review I

Multivariate Capability Analysis

GMS 9.0 Tutorial MODFLOW Stochastic Modeling, PEST Null Space Monte Carlo I Use PEST to create multiple calibrated MODFLOW simulations

Reproducibility in Stochastic Simulation

Multivariate Standard Normal Transformation

INFO Wednesday, September 14, 2016

Lab 5 - Risk Analysis, Robustness, and Power

Resampling Methods. Levi Waldron, CUNY School of Public Health. July 13, 2016

Monte Carlo 1. Appendix A

CPSC 320 Sample Solution, Playing with Graphs!

The Normal Curve. June 20, Bryan T. Karazsia, M.A.

Heteroskedasticity and Homoskedasticity, and Homoskedasticity-Only Standard Errors

Guide to Running TEMAP2.R. The R-program, TEMAP2.R, fits the TE models to the data; it can also

Monte Carlo Integration COS 323

User Guide for FOPDT and SOPDT Model Regression 1 April 2018 R. Russell Rhinehart

A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection (Kohavi, 1995)

Chapters 5-6: Statistical Inference Methods

Activity Overview A basic introduction to the many features of the calculator function of the TI-Nspire

A. Using the data provided above, calculate the sampling variance and standard error for S for each week s data.

Using Mplus Monte Carlo Simulations In Practice: A Note On Non-Normal Missing Data In Latent Variable Models

Bootstrapping Method for 14 June 2016 R. Russell Rhinehart. Bootstrapping

Integration. Volume Estimation

Introduction to Monte Carlo Simulations with Applications in R Using the SimDesign Package

An introduction to plotting data

Study Guide. Module 1. Key Terms

Monte Carlo Ray Tracing. Computer Graphics CMU /15-662

Statistical Analysis of List Experiments

Stat 602X Exam 2 Spring 2011

Simulation. Chapter Copyright 2010 Pearson Education, Inc. Publishing as Prentice Hall

Ph3 Mathematica Homework: Week 8

ISyE 6416: Computational Statistics Spring Lecture 13: Monte Carlo Methods

Chapter 2 Random Number Generators

STATS PAD USER MANUAL

Probability and Statistics for Final Year Engineering Students

MODFLOW Stochastic Modeling, PEST Null Space Monte Carlo I. Use PEST to create multiple calibrated MODFLOW simulations

Week 10: Heteroskedasticity II

Blending of Probability and Convenience Samples:

Chapter 3. Bootstrap. 3.1 Introduction. 3.2 The general idea

Confidence Intervals: Estimators

Notes on Simulations in SAS Studio

Problem Set 3 CMSC 426 Due: Thursday, April 3

Lecture 12. August 23, Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University.

Z-TEST / Z-STATISTIC: used to test hypotheses about. µ when the population standard deviation is unknown

Missing Data Missing Data Methods in ML Multiple Imputation

Acknowledgments. Acronyms

Analytical model A structure and process for analyzing a dataset. For example, a decision tree is a model for the classification of a dataset.

Standard Errors in OLS Luke Sonnet

L20 MPI Example, Numerical Integration. Let s now consider a real problem to write a parallel code for. Let s take the case of numerical integration.

3.9 LINEAR APPROXIMATION AND THE DERIVATIVE

Monte Carlo Analysis

Supplementary Notes on Multiple Imputation. Stephen du Toit and Gerhard Mels Scientific Software International

Estimation of Unknown Parameters in Dynamic Models Using the Method of Simulated Moments (MSM)

Handling Data with Three Types of Missing Values:

Industrialising Small Area Estimation at the Australian Bureau of Statistics

Individual Covariates

Physics 736. Experimental Methods in Nuclear-, Particle-, and Astrophysics. - Statistical Methods -

Motivating Example. Missing Data Theory. An Introduction to Multiple Imputation and its Application. Background

Descriptive Statistics and Graphing

Excel Community SAMPLE CONTENT

Computational Methods. Randomness and Monte Carlo Methods

Enterprise Miner Tutorial Notes 2 1

Variance Estimation in Presence of Imputation: an Application to an Istat Survey Data

A noninformative Bayesian approach to small area estimation

Today. Lecture 4: Last time. The EM algorithm. We examine clustering in a little more detail; we went over it a somewhat quickly last time

Simulation: Solving Dynamic Models ABE 5646 Week 12, Spring 2009

Pair-Wise Multiple Comparisons (Simulation)

Simulation with Arena

More Summer Program t-shirts

Chapter 6: Simulation Using Spread-Sheets (Excel)

Applied Regression Modeling: A Business Approach

A Beginner's Guide to. Randall E. Schumacker. The University of Alabama. Richard G. Lomax. The Ohio State University. Routledge

Package merror. November 3, 2015

Transcription:

Monte Carlo Simulation Ben Kite KU CRMDA 2015 Summer Methodology Institute Created by Terrence D. Jorgensen, 2014

What Is a Monte Carlo Simulation? Anything that involves generating random data in a parameter space Today, we simulate data from a known parameter space that we specify Read Johnson s (2013) Monte Carlo Analysis in Academic Research for history and applications doi:10.1093/oxfordhb/9780199934874.013.0022

What Is a Monte Carlo Simulation? Here s the basic idea Consider a statistical procedure (e.g., a t test) that receives data and returns a result i.e., parameter estimates, sample statistics Presumably there is a true population value that the estimate is supposed to represent Does the procedure yield good estimates of the true parameters? Is the sampling distribution of the estimates normal, symmetric, consistent, etc.?

What Is a Monte Carlo Simulation? Specify a population (i.e., a set of parameters), then draw random samples from it A population is a data generating process Random samples are statistically equivalent Apply the procedure to each sample Save estimates, tests, p values, etc. Evaluate the procedure Compare stats to parameters, check distributions

Goals of a Monte Carlo Simulation Check that a procedure behaves as expected Nominal Type I error rates, unbiased estimates See how a procedure behaves when assumptions are violated Inflated Type I error rates? Robust if minor? Effects of missing data? Effect of sample size? Compare 2 procedures OLS v. WLS; LGCM v. MLM

Replicating a Random Process We must be able to regenerate results exactly without saving each data set Pseudorandom number generator (PRNG) Algorithm that generates seemingly random streams of integers The random numbers you get depend on a random state characterized by a seed Seed can typically be specified using a single integer Setting the seed allow you to replicate results

Let s Generate Random Numbers in R! Let's open our R syntax and get started. Here is an outline of today's topics/tasks: Generate some simple (pseudo)random numbers Generate random samples of data (i.e., several variables) using population parameters Use Monte Carlo method to see sampling variance of a 2 group mean difference Compare the observed sampling variance to the estimated standard error (SE) of the mean difference Design a small scale Monte Carlo study How is the estimated SE affected by between group differences in N and SD?

Advice for Monte Carlo Designs DO NOT: Think of the Monte Carlo experiment as One Giant Sequential Script of commands Generate a massive block of data that needs to be saved and re loaded every time you run a procedure on it Rather, create a function for every action you need to take Data generation and manipulation functions Generates data for one run of the simulation Perturbs the data (impose missingness, etc.) A function that accepts 1 data set and analyzes it A function that combines the above steps Runs one complete replication Functions to harvest estimates, summarize/plot results

Monte Carlo Designs Think of a random sample as a person/case Multiple samples in each condition Between subjects factors change the datagenerating process Parameters, distributions, missing data, scales Within subjects factors analyze the same data using different methods Estimation method, with/out covariates, N?

Monte Carlo Outcomes Sampling distributions of anything! Consistency, efficiency, normality Bias in point and SE estimates (Root) mean squared error Confidence Interval coverage rates Rejection rates (alpha, power) Convergence rates Model fit

Design your study to test hypotheses Exploratory simulations can get out of hand Are all conditions necessary to test your hypotheses? Consider how factors are expected to affect outcomes of interest, including interactions If exploring potential effects, try 2 levels of each variable (2 k ) for pilot study Detect interactions (use η 2 /R 2, not p value) Confound higher order ones to reduce conditions Add levels to detect nonlinear effects

Write down a recipe for your code Writing syntax can be daunting, so start with plain language Ingredients Characteristics of your population(s) Manipulated factors, outcomes of interest Write down steps from beginning to end Can start broad, move to specific Ultimately, easier to translate to R, C, Fortran

Variance Reduction Techniques Save time and computing power, as well as reduce the amount of noise in your results When is it necessary to draw new samples? NOT for factors like sample size, different estimators, prior variance, competing models Typically, ONLY when the population differs (e.g., normal/nonnormal data), or the factor reflects an aspect of design that changes characteristics of the data (e.g., number of response categories)

Variance Reduction Techniques Consider sample size, etc., to be withinsample (or within replication) factors Recycle same seeds, or better yet, perform all analyses/conditions on the data the one time is generated Generate largest N, then take first N j from sample Repeat this for # of replications, within each cell of between replication design

Analysis Plan Carefully consider outcomes of interest Have testable hypotheses/predictions In each replication, save the output you intend to investigate, in a way that makes it easy to analyze Picture your analysis of results ahead of time Perhaps make up data in a spreadsheet that mimics the format of your results Could help your design

Useful Tools In R, the package portableparallelseeds allows you to exercise great control of replicability using random seed states Developed by Paul Johnson Run this syntax to install and find help files: install.packages("portableparallelseeds", repos = "http://rweb.quant.ku.edu/kran", type = "source")