Today s Lecture. Factors & Sampling. Quick Review of Last Week s Computational Concepts. Numbers we Understand. 1. A little bit about Factors

Similar documents
B.2 Measures of Central Tendency and Dispersion

Panelists. Patrick Michael. Darryl M. Bloodworth. Michael J. Zylstra. James C. Green

Fall 2007, Final Exam, Data Structures and Algorithms

DEPARTMENT OF HOUSING AND URBAN DEVELOPMENT. [Docket No. FR-6090-N-01]

Figure 1 Map of US Coast Guard Districts... 2 Figure 2 CGD Zip File Size... 3 Figure 3 NOAA Zip File Size By State...

Getting started with simulating data in R: some helpful functions and how to use them Ariel Muldoon August 28, 2018

Telecommunications and Internet Access By Schools & School Districts

Ocean Express Procedure: Quote and Bind Renewal Cargo

A New Method of Using Polytomous Independent Variables with Many Levels for the Binary Outcome of Big Data Analysis

CostQuest Associates, Inc.

The Lincoln National Life Insurance Company Universal Life Portfolio

State IT in Tough Times: Strategies and Trends for Cost Control and Efficiency

NSA s Centers of Academic Excellence in Cyber Security

MERGING DATAFRAMES WITH PANDAS. Appending & concatenating Series

MAKING MONEY FROM YOUR UN-USED CALLS. Connecting People Already on the Phone with Political Polls and Research Surveys. Scott Richards CEO

Silicosis Prevalence Among Medicare Beneficiaries,

THE LINEAR PROBABILITY MODEL: USING LEAST SQUARES TO ESTIMATE A REGRESSION EQUATION WITH A DICHOTOMOUS DEPENDENT VARIABLE

2018 NSP Student Leader Contact Form

Distracted Driving- A Review of Relevant Research and Latest Findings

Post Graduation Survey Results 2015 College of Engineering Information Networking Institute INFORMATION NETWORKING Master of Science

CSE 781 Data Base Management Systems, Summer 09 ORACLE PROJECT

Department of Business and Information Technology College of Applied Science and Technology The University of Akron

Tina Ladabouche. GenCyber Program Manager

Presented on July 24, 2018

IT Modernization in State Government Drivers, Challenges and Successes. Bo Reese State Chief Information Officer, Oklahoma NASCIO President

Geographic Accuracy of Cell Phone RDD Sample Selected by Area Code versus Wire Center

ACCESS PROCESS FOR CENTRAL OFFICE ACCESS

Global Forum 2007 Venice

Accommodating Broadband Infrastructure on Highway Rights-of-Way. Broadband Technology Opportunities Program (BTOP)

DSC 201: Data Analysis & Visualization

2018 Supply Cheat Sheet MA/PDP/MAPD

Amy Schick NHTSA, Occupant Protection Division April 7, 2011

PulseNet Updates: Transitioning to WGS for Reference Testing and Surveillance

Prizm. manufactured by. White s Electronics, Inc Pleasant Valley Road Sweet Home, OR USA. Visit our site on the World Wide Web

Moonv6 Update NANOG 34

Charter EZPort User Guide

Homework Assignment #5

Best Practices in Rapid Deployment of PI Infrastructure and Integration with OEM Supplied SCADA Systems

Resampling Methods. Levi Waldron, CUNY School of Public Health. July 13, 2016

Name: Business Name: Business Address: Street Address. Business Address: City ST Zip Code. Home Address: Street Address

Presentation Outline. Effective Survey Sampling of Rare Subgroups Probability-Based Sampling Using Split-Frames with Listed Households

2015 DISTRACTED DRIVING ENFORCEMENT APRIL 10-15, 2015

Team Members. When viewing this job aid electronically, click within the Contents to advance to desired page. Introduction... 2

Computing With R Handout 1

Learner Expectations UNIT 1: GRAPICAL AND NUMERIC REPRESENTATIONS OF DATA. Sept. Fathom Lab: Distributions and Best Methods of Display

CSC 411: Lecture 02: Linear Regression

A Capabilities Presentation

GATS Subscriber Group Meeting Conference Call. May 23, 2013

Contact Center Compliance Webinar Bringing you the ANSWERS you need about compliance in your call center.

What Did You Learn? Key Terms. Key Concepts. 68 Chapter P Prerequisites

September 11, Unit 2 Day 1 Notes Measures of Central Tendency.notebook

AASHTO s National Transportation Product Evaluation Program

How to Make an Impressive Map of the United States with SAS/Graph for Beginners Sharon Avrunin-Becker, Westat, Rockville, MD

Math 214 Introductory Statistics Summer Class Notes Sections 3.2, : 1-21 odd 3.3: 7-13, Measures of Central Tendency

IQR = number. summary: largest. = 2. Upper half: Q3 =

CIS 467/602-01: Data Visualization

Chapter 1. Looking at Data-Distribution

Presentation to NANC. January 22, 2003

DESCRIPTION OF GLOBAL PLANS

IN-CLASS EXERCISE: INTRODUCTION TO THE R "SURVEY" PACKAGE

Acknowledgments. Acronyms

1. Estimation equations for strip transect sampling, using notation consistent with that used to

State HIE Strategic and Operational Plan Emerging Models. February 16, 2011

Descriptive Statistics and Graphing

In this computer exercise we will work with the analysis of variance in R. We ll take a look at the following topics:

Week 4: Describing data and estimation

Sideseadmed (IRT0040) loeng 4/2012. Avo

Learning Objectives. Continuous Random Variables & The Normal Probability Distribution. Continuous Random Variable

Practice in R. 1 Sivan s practice. 2 Hetroskadasticity. January 28, (pdf version)

Exploring and Understanding Data Using R.

Your Name: Section: 2. To develop an understanding of the standard deviation as a measure of spread.

A. Incorrect! This would be the negative of the range. B. Correct! The range is the maximum data value minus the minimum data value.

Selling Compellent Hardware: Controllers, Drives, Switches and HBAs Chad Thibodeau

CSE 20. Lecture 4: Number System and Boolean Function. CSE 20: Lecture2

DTFH61-13-C Addressing Challenges for Automation in Highway Construction

Using a Probabilistic Model to Assist Merging of Large-scale Administrative Records

Strengthening connections today, while building for tomorrow. Wireless broadband, small cells and 5G

INFO Wednesday, September 14, 2016

Computing With R Handout 1

Panasonic Certification Training January 21-25, 2019

R Programming Basics - Useful Builtin Functions for Statistics

The Outlook for U.S. Manufacturing

15 Wyner Statistics Fall 2013

Measures of Central Tendency. A measure of central tendency is a value used to represent the typical or average value in a data set.

Statistical Pattern Recognition

Touch Input. CSE 510 Christian Holz Microsoft Research February 11, 2016

a. divided by the. 1) Always round!! a) Even if class width comes out to a, go up one.

Advanced Econometric Methods EMET3011/8014

National Continuity Programs

Configuring Oracle GoldenGate OGG 11gR2 local integrated capture and using OGG for mapping and transformations

P.O. Box 4910, Syracuse, NY Fax: Agrisurance, Inc. Workers Compensation

ASR Contact and Escalation Lists

Be sure to fax AGA a copy of every application you submitted direct along with the fax confirmation page to ensure you receive your commission!

Alaska Mathematics Standards Vocabulary Word List Grade 7

OPERATOR CERTIFICATION

Pro look sports football guide

Steve Stark Sales Executive Newcastle

Cumulative Distribution Function (CDF) Deconvolution

MAT 142 College Mathematics. Module ST. Statistics. Terri Miller revised July 14, 2015

Measures of Central Tendency

Transcription:

Today s Lecture Factors & Sampling Jarrett Byrnes September 8, 2014 1. A little bit about Factors 2. Sampling 3. Describing your sample Quick Review of Last Week s Computational Concepts Numbers we Understand 1. Objects 2. Functions manipulating objects 3. Data frames summary(westnile$wnv.incidence) # Min. 1st Qu. Median Mean 3rd Qu. Max. # -7.31 0.00 0.05 1.45 1.91 24.40 4. Vectors 5. Numeric, Character, Boolean, and Factor objects

What is this? What is a factor? A factor is made up of text strings A factor has levels summary(westnile$state) # AL CT DE FL GA IL IN KY MA MD MI MS NH NJ NY OH PA TN WI WV # 19 2 1 2 2 14 6 2 4 8 6 1 2 4 8 4 14 4 22 5 westnile$state[1:10] # [1] AL AL AL AL AL AL AL AL AL CT # Levels: AL CT DE FL GA IL IN KY MA MD MI MS NH NJ NY OH PA TN WI WV levels(westnile$state) # [1] "AL" "CT" "DE" "FL" "GA" "IL" "IN" "KY" "MA" "MD" "MI" "MS" "NH" "NJ" "NY" # [16] "OH" "PA" "TN" "WI" "WV" Numerics, Characters, and Factors Characters v. Factors #a numeric vector c(1,2,3) # [1] 1 2 3 #a character vector c("1", "2", "3") # [1] "1" "2" "3" #a character vector -> factor factor(c("1", "2", "3")) # [1] 1 2 3 # Levels: 1 2 3 as.character(westnile$state[1:10]) # [1] "AL" "AL" "AL" "AL" "AL" "AL" "AL" "AL" "AL" "CT" westnile$state[1:10] # [1] AL AL AL AL AL AL AL AL AL CT # Levels: AL CT DE FL GA IL IN KY MA MD MI MS NH NJ NY OH PA TN WI WV This is important, as which does not work with factors!

Exercise Create a factor vector of the letters A thorugh D that repeats 10 times (use rep) Do the same thing, but with the strings A1, B1, D1 Merge these two into a single vector. Solution v1<-factor( rep(c("a", "B", "C", "D"), 10)) v2<-factor( rep(c("a1", "B1", "C1", "D1"), 10)) v3<-factor(c(as.character(v1), as.character(v2))) v3[1:20] # [1] A B C D A B C D A B C D A B C D A B C D # Levels: A A1 B B1 C C1 D D1 Today s Lecture What is a population? 1. A little bit about Factors 2. Sampling 3. Describing your population

What is a sample? How can sampling a population go awry? Sample is not representative Replicates do not have equal chance of being sampled Replicates are not is not independent A sample of individuals in a randomly distributed population. Bias from Unequal Representation Bias from Unequal Change of Sampling If you only chose one color, you would only get one range of sizes. Spatial gradient in size

Bias from Unequal Change of Sampling Solution: Stratified Sampling Oh, I ll just grab those individuals closest to me Sample over a known gradient, aka cluster sampling Solution: Random Sampling Non-Independence & Haphazard Sampling Two sampling schemes: What if there are interactions between individuals?

Solution: Chose Samples Randomly Deciding Sampling Design What influences the measurement you are interested in? Causal Graph Path chosen with random number generator Stratified or Random? Stratified or Random? Do you know all of the influences? Do you know all of the influences? You can represent this as an equation: Size = Color + Location +???

Stratified or Random? How is your population defined? What is the scale of your inference? What might influence the inclusion of a replicate? Exercise Draw a causal graph of the influences on one thing you measure How would you sample your population? How important are external factors you know about? How important are external factors you cannot assess? Today s Lecture Sample Properties: Mean 1. A little bit about Factors 2. Sampling 3. Describing your sample

Sample Properties: Mean R: Sample Size and Estimate Precision 1. Taking a mean mean(c(1,4,5,10,15)) # [1] 7 - The average value of a sample - The value of a measurement for a single individual n - The number of individuals in a sample - The average value of a population (Greek = population, Latin = Sample) R: Sample Size and Estimate Precision R: Sample Size and Estimate Precision 2. Mean from a random population 3. Sampling from a single simulated population mean(runif(n = 500, min = 0, max = 100)) # [1] 47.53 runif draws from a Uniform distribution set.seed(5000) population<-runif(400,0,100) mean( sample(population, size=50) ) # [1] 46.83 set.seed ensures that you get the same random number every time sample draws a sample of a defined size from a vector

Exercise: Sample Size and Estimate Precision Exercise: Sample Size and Estimate Precision 1. Use runif (or rnorm, if you re feeling saucy) to simulate a population 2. How does the repeatability of the mean change as you change the sample size? set.seed(5000) population<-runif(n = 400, min = 0, max = 100) mean( sample(population, size=3) ) # [1] 64.52 mean( sample(population, size=3) ) # [1] 54.91 Exercise: Sample Size and Estimate Precision Sample Properties: Variance How variable was that population? Let s try a larger sample size and check in on repeatability mean( sample(population, size=100) ) # [1] 45.06 mean( sample(population, size=100) ) # [1] 45.96 Sums of Squares over n-1 n-1 corrects for both sample size and sample bias if describing the population Units in square of measurement

Sample Properties: Standard Deviation Units the same as the measurement If distribution is normal, 67% of data within 1 SD, 95% within 2 if describing the population Exercise: Sample Size and Estimated Sample Variation 1. Repeat the last exercise, but with the functions sd or var 2. Do you need as many samples for a precise estimate as for the mean? Next time