STATISTICAL THINKING IN PYTHON II. Generating bootstrap replicates


Michelson's speed of light measurements Data: Michelson, 1880

Resampling an array
Data: [23.3, 27.1, 24.3, 25.7, 26.0] Mean = 25.28
Draw one value at random from the data, record it, and return it to the data before the next draw; repeat until the resampled array has as many entries as the data.
Resampled data: [27.1, 26.0, 23.3, 25.7, 23.3] Mean = 25.08

Mean of resampled Michelson measurements

Bootstrapping: the use of resampled data to perform statistical inference

Bootstrap sample: a resampled array of the data

Bootstrap replicate: a statistic computed from a resampled array

Resampling engine: np.random.choice()
In [1]: import numpy as np
In [2]: np.random.choice([1,2,3,4,5], size=5)
Out[2]: array([5, 3, 5, 5, 2])

Computing a bootstrap replicate
In [1]: bs_sample = np.random.choice(michelson_speed_of_light,
   ...:                              size=100)
In [2]: np.mean(bs_sample)
Out[2]: 299847.79999999999
In [3]: np.median(bs_sample)
Out[3]: 299845.0
In [4]: np.std(bs_sample)
Out[4]: 83.564286025729331
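The replicate computation above can be tried without the Michelson data set. The sketch below is a minimal, self-contained version. The array `speeds` is synthetic stand-in data (an assumption, not the 1880 measurements), and the seed is fixed only so that the run is repeatable.

```python
import numpy as np

np.random.seed(42)  # fixed seed so the sketch is repeatable

# Synthetic stand-in for michelson_speed_of_light (NOT the real data)
speeds = np.random.normal(loc=299850.0, scale=80.0, size=100)

# Bootstrap sample: draw len(speeds) values with replacement
bs_sample = np.random.choice(speeds, size=len(speeds))

# Bootstrap replicates: statistics computed from the resampled array
bs_mean = np.mean(bs_sample)
bs_median = np.median(bs_sample)
bs_std = np.std(bs_sample)

print(bs_mean, bs_median, bs_std)
```

Because `np.random.choice` samples with replacement by default, some entries of `speeds` typically appear more than once in `bs_sample` and others not at all.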

STATISTICAL THINKING IN PYTHON II Let's practice!

STATISTICAL THINKING IN PYTHON II Bootstrap confidence intervals

Bootstrap replicate function
In [1]: def bootstrap_replicate_1d(data, func):
   ...:     """Generate bootstrap replicate of 1D data."""
   ...:     bs_sample = np.random.choice(data, len(data))
   ...:     return func(bs_sample)
   ...:
In [2]: bootstrap_replicate_1d(michelson_speed_of_light, np.mean)
Out[2]: 299859.20000000001
In [3]: bootstrap_replicate_1d(michelson_speed_of_light, np.mean)
Out[3]: 299855.70000000001
In [4]: bootstrap_replicate_1d(michelson_speed_of_light, np.mean)
Out[4]: 299850.29999999999

Many bootstrap replicates
In [1]: bs_replicates = np.empty(10000)
In [2]: for i in range(10000):
   ...:     bs_replicates[i] = bootstrap_replicate_1d(
   ...:         michelson_speed_of_light, np.mean)
   ...:
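The loop above can be wrapped in a reusable helper. The following is a sketch under stated assumptions: the name `draw_bs_reps` is hypothetical (it does not appear on the slides), and `data` is synthetic stand-in data rather than the Michelson measurements.

```python
import numpy as np


def bootstrap_replicate_1d(data, func):
    """Generate one bootstrap replicate of 1D data."""
    bs_sample = np.random.choice(data, len(data))
    return func(bs_sample)


def draw_bs_reps(data, func, size=1):
    """Draw `size` bootstrap replicates of `func` (hypothetical helper)."""
    bs_replicates = np.empty(size)
    for i in range(size):
        bs_replicates[i] = bootstrap_replicate_1d(data, func)
    return bs_replicates


np.random.seed(0)  # fixed seed so the sketch is repeatable
data = np.random.normal(299850.0, 80.0, size=100)  # synthetic stand-in data

bs_replicates = draw_bs_reps(data, np.mean, size=10000)

# The spread of the replicates approximates the standard error of the mean
print(np.std(bs_replicates), np.std(data) / np.sqrt(len(data)))
```

The two printed numbers should be close, which is one quick sanity check that the replicates behave as expected for the mean.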

Plotting a histogram of bootstrap replicates
In [1]: import matplotlib.pyplot as plt
In [2]: _ = plt.hist(bs_replicates, bins=30, density=True)
In [3]: _ = plt.xlabel('mean speed of light (km/s)')
In [4]: _ = plt.ylabel('PDF')
In [5]: plt.show()

Bootstrap estimate of the mean

Confidence interval of a statistic
If we repeated the measurements over and over again, p% of the observed values of the statistic would lie within the p% confidence interval.

Bootstrap confidence interval
In [1]: conf_int = np.percentile(bs_replicates, [2.5, 97.5])
In [2]: conf_int
Out[2]: array([ 299837., 299868.])
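The percentile step generalizes to any confidence level: a p% interval leaves (100 - p)/2 percent of the replicates in each tail. The sketch below illustrates this. The helper name `percentile_ci` and the synthetic `data` array are assumptions made for illustration, not part of the slides.

```python
import numpy as np

np.random.seed(1)  # fixed seed so the sketch is repeatable
data = np.random.normal(299850.0, 80.0, size=100)  # synthetic stand-in data

# 10,000 bootstrap replicates of the mean
bs_replicates = np.array([
    np.mean(np.random.choice(data, size=len(data)))
    for _ in range(10000)
])


def percentile_ci(replicates, level=95.0):
    """Percentile interval covering `level` percent of the replicates."""
    tail = (100.0 - level) / 2.0
    return np.percentile(replicates, [tail, 100.0 - tail])


conf_int = percentile_ci(bs_replicates, level=95.0)
print(conf_int)
```

With `level=95.0` this reproduces the `[2.5, 97.5]` percentiles used above.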

STATISTICAL THINKING IN PYTHON II Let's practice!

STATISTICAL THINKING IN PYTHON II Pairs bootstrap

Nonparametric inference
Make no assumptions about the model or probability distribution underlying the data

2008 US swing state election results Data retrieved from Data.gov (https://www.data.gov/)

Pairs bootstrap for linear regression
Resample data in pairs
Compute slope and intercept from resampled data
Each slope and intercept is a bootstrap replicate
Compute confidence intervals from percentiles of bootstrap replicates

Generating a pairs bootstrap sample
In [1]: np.arange(7)
Out[1]: array([0, 1, 2, 3, 4, 5, 6])
In [2]: inds = np.arange(len(total_votes))
In [3]: bs_inds = np.random.choice(inds, len(inds))
In [4]: bs_total_votes = total_votes[bs_inds]
In [5]: bs_dem_share = dem_share[bs_inds]

Computing a pairs bootstrap replicate
In [1]: bs_slope, bs_intercept = np.polyfit(bs_total_votes,
   ...:                                     bs_dem_share, 1)
In [2]: bs_slope, bs_intercept
Out[2]: (3.9053605692223672e-05, 40.387910131803025)
In [3]: np.polyfit(total_votes, dem_share, 1) # fit of original
Out[3]: array([ 4.03707170e-05, 4.01139120e+01])
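The pairs bootstrap steps can be combined into a single function that returns many slope and intercept replicates. This is a sketch only: the function name `draw_bs_pairs_linreg` is hypothetical, and `x` and `y` are synthetic data generated around a known line, standing in for `total_votes` and `dem_share`.

```python
import numpy as np


def draw_bs_pairs_linreg(x, y, size=1):
    """Pairs bootstrap for a straight-line fit (hypothetical helper)."""
    inds = np.arange(len(x))
    bs_slopes = np.empty(size)
    bs_intercepts = np.empty(size)
    for i in range(size):
        # Resample indices so each (x, y) pair stays together
        bs_inds = np.random.choice(inds, size=len(inds))
        bs_x, bs_y = x[bs_inds], y[bs_inds]
        # Each fitted slope and intercept is one bootstrap replicate
        bs_slopes[i], bs_intercepts[i] = np.polyfit(bs_x, bs_y, 1)
    return bs_slopes, bs_intercepts


np.random.seed(2)  # fixed seed so the sketch is repeatable

# Synthetic data scattered around y = 2x + 1 (NOT the election data)
x = np.random.uniform(0.0, 10.0, size=200)
y = 2.0 * x + 1.0 + np.random.normal(0.0, 1.0, size=200)

bs_slopes, bs_intercepts = draw_bs_pairs_linreg(x, y, size=1000)

# 95% confidence intervals from percentiles of the replicates
slope_ci = np.percentile(bs_slopes, [2.5, 97.5])
intercept_ci = np.percentile(bs_intercepts, [2.5, 97.5])
print(slope_ci, intercept_ci)
```

Resampling indices rather than values is what keeps each x value matched with its own y value in every bootstrap sample.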

2008 US swing state election results Data retrieved from Data.gov (https://www.data.gov/)

STATISTICAL THINKING IN PYTHON II Let's practice!