MS&E 226: Small Data

Lecture 14: Introduction to hypothesis testing (v2)
Ramesh Johari (ramesh.johari@stanford.edu)

Hypotheses

Quantifying uncertainty

Recall the two key goals of inference:
- Estimation: What is our best guess for the process that generated the data?
- Quantifying uncertainty: What is our uncertainty about our guess?

Hypothesis testing provides another way to quantify our uncertainty.

Null and alternative hypotheses

In hypothesis testing, we quantify our uncertainty by asking whether it is likely that the data came from a particular distribution. We will focus on the following common type of hypothesis testing scenario:
- The data Y come from some distribution f(y | θ), with parameter θ.
- There are two possibilities for θ: either θ = θ₀, or θ ≠ θ₀.

We call the case θ = θ₀ the null hypothesis, and the case θ ≠ θ₀ the alternative hypothesis. (A hypothesis that is a single point is called simple; a hypothesis that is not a single point is called composite.)

Examples

Some examples of null hypotheses:
- You observe n flips of a biased coin, and test whether it is likely that the bias of the coin is different from 1/2.
- You compute an ordinary least squares regression, and test whether it is likely that the true coefficient β_j is nonzero, given the data you have observed.
- You compare two versions of a webpage, and test whether it is likely that the true conversion rates of the two pages are different, given the data you have observed. (Here you are testing whether the difference of the conversion rates is nonzero.)

In each case, you already know how to form an estimate of the desired quantity; hypothesis tests gauge whether the estimate is meaningful.

The hypothesis testing recipe

The basic idea is: if the true parameter were θ₀, then T(Y) should look like it came from f(y | θ₀). We compare the observed T(Y) to the sampling distribution of T under θ₀. If the observed T(Y) is unlikely under the sampling distribution given θ₀, we reject the null hypothesis that θ = θ₀.

Note: rejecting the null does not mean we accept the alternative!
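When the sampling distribution of T(Y) is hard to write down, the recipe above can be carried out by simulation: draw many datasets from f(y | θ₀), compute T on each, and see how extreme the observed T(Y) is. A minimal sketch, assuming a toy setting where Y is 10 coin flips, θ₀ = 1/2 is the null bias, and T(Y) is the number of heads (these names and the Monte Carlo approach are illustrative, not from the slides):

```python
import random

random.seed(0)

# Approximate the sampling distribution of T(Y) = number of heads
# in n_flips fair-coin flips by Monte Carlo, then ask how often a
# simulated T is at least as extreme as the observed one.
n_flips, n_sims, t_observed = 10, 100_000, 9
sims = [sum(random.random() < 0.5 for _ in range(n_flips)) for _ in range(n_sims)]
frac_as_extreme = sum(t >= t_observed for t in sims) / n_sims
# frac_as_extreme should be close to the exact tail probability 11/1024 ≈ 0.0107
```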

Example: biased coin flipping

Suppose that we flip a coin 10 times and observe 9 heads. We estimate the bias as q̂ = 0.9. How likely are we to observe an estimate this extreme, if the coin really had bias 1/2? In that case, the number of heads in ten flips is Binomial(10, 1/2), and the chance of seeing at least 9 heads is 0.0107. In other words, it is very unlikely that we would have seen so many heads if the true bias were 1/2; it seems reasonable to reject the null hypothesis.
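The tail probability above can be checked directly from the binomial pmf; a minimal sketch using only the Python standard library (variable names are illustrative):

```python
from math import comb

# P(at least 9 heads in 10 flips of a fair coin):
# sum the Binomial(10, 1/2) pmf over k = 9, 10.
n, q0 = 10, 0.5
p_tail = sum(comb(n, k) * q0**k * (1 - q0)**(n - k) for k in range(9, n + 1))
print(round(p_tail, 4))  # 0.0107
```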

Decision rules

In general, a hypothesis test is implemented using a decision rule based on the test statistic. We focus on decision rules of the following form: if |T(Y)| ≥ s, then reject the null; otherwise accept the null. In other words, the test statistics we consider have the property that they are unlikely to have large magnitude under the null (e.g., q̂ in the preceding example).

Errors

How do we evaluate a decision rule? Note that hypothesis testing is just a version of a binary classification problem: is the null true? So there are two possible errors:
- False positive: In fact the null is true, but we mistakenly reject the null.
- False negative: In fact the null is false, but we mistakenly fail to reject the null.

The false positive probability P(reject | θ₀) of a test is called its size. For any specific alternative θ ≠ θ₀, P(reject | θ) is called the power of the test at θ.
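Both quantities can be computed exactly for the coin example. A minimal sketch in Python, assuming the one-sided rule "reject the null when we see at least 9 heads" and an illustrative alternative q = 0.8 (both choices are assumptions made for this sketch, not from the slides):

```python
from math import comb

def p_at_least(k, n, q):
    """P(X >= k) for X ~ Binomial(n, q)."""
    return sum(comb(n, j) * q**j * (1 - q)**(n - j) for j in range(k, n + 1))

# Rule: reject the null q = 1/2 when we observe at least 9 heads in 10 flips.
size = p_at_least(9, 10, 0.5)   # false positive probability under the null
power = p_at_least(9, 10, 0.8)  # P(reject) at the alternative q = 0.8
print(round(size, 4), round(power, 4))  # 0.0107 0.3758
```

Note the trade-off this illustrates: lowering the rejection threshold (say, rejecting at 8 or more heads) raises the power but also inflates the size.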

Good hypothesis tests

So good hypothesis tests are those that:
- have small false positive probability (small size),
- while providing small false negative probability (high power).