Univariate Data - 2. Numeric Summaries

Similar documents
Univariate Data - 2. Numeric Summaries

HyperGeometric Distribution

Day06 A. Young W. Lim Mon. Young W. Lim Day06 A Mon 1 / 16

Statistical Programming Camp: An Introduction to R

STAT 3743 PROBABILITY & STATISTICS FALL 2010 KERNS. Exam I

Statistics 251: Statistical Methods

Chapter 1. Looking at Data-Distribution

Chapter 3 - Displaying and Summarizing Quantitative Data

GAS Tutorial - 6. Expression

Section 1.2. Displaying Quantitative Data with Graphs. Mrs. Daniel AP Stats 8/22/2013. Dotplots. How to Make a Dotplot. Mrs. Daniel AP Statistics

TMTH 3360 NOTES ON COMMON GRAPHS AND CHARTS

The nor1mix Package. August 3, 2006

The nor1mix Package. June 12, 2007

Pre-Calculus Multiple Choice Questions - Chapter S2

Topic (3) SUMMARIZING DATA - TABLES AND GRAPHICS

Day08 A. Young W. Lim Mon. Young W. Lim Day08 A Mon 1 / 27

Day02 A. Young W. Lim Sat. Young W. Lim Day02 A Sat 1 / 12

Acquisition Description Exploration Examination Understanding what data is collected. Characterizing properties of data.

Day06 A. Young W. Lim Wed. Young W. Lim Day06 A Wed 1 / 26

> glucose = c(81, 85, 93, 93, 99, 76, 75, 84, 78, 84, 81, 82, 89, + 81, 96, 82, 74, 70, 84, 86, 80, 70, 131, 75, 88, 102, 115, + 89, 82, 79, 106)

BINF702 SPRING CHAPTER 2 Descriptive Statistics. BINF702 - SPRING2014- SOLKA - CHAPTER 2 Descriptive Statistics

Introduction To R Workshop Selected R Commands (See Dalgaard, 2008, Appendix C)

A Handbook of Statistical Analyses Using R. Brian S. Everitt and Torsten Hothorn

Vocabulary. 5-number summary Rule. Area principle. Bar chart. Boxplot. Categorical data condition. Categorical variable.

Day05 A. Young W. Lim Sat. Young W. Lim Day05 A Sat 1 / 14

Teaching univariate measures of location-using loss functions

Common Sta 101 Commands for R. 1 One quantitative variable. 2 One categorical variable. 3 Two categorical variables. Summary statistics

Package sciplot. February 15, 2013

Numerical Methods 5633

Statistical Methods. Instructor: Lingsong Zhang. Any questions, ask me during the office hour, or me, I will answer promptly.

Density Curve (p52) Density curve is a curve that - is always on or above the horizontal axis.

AA BB CC DD EE. Introduction to Graphics in R

Table of Contents (As covered from textbook)

Chapter2 Description of samples and populations. 2.1 Introduction.

STATISTICAL LABORATORY, April 30th, 2010 BIVARIATE PROBABILITY DISTRIBUTIONS

2. HW/SW Co-design. Young W. Lim Thr. Young W. Lim 2. HW/SW Co-design Thr 1 / 21

Chapter 5: The standard deviation as a ruler and the normal model p131

GAS Tutorial - 4. Sections & Relocation

Common R commands used in Data Analysis and Statistical Inference

STA 570 Spring Lecture 5 Tuesday, Feb 1

Procedure Calls. Young W. Lim Mon. Young W. Lim Procedure Calls Mon 1 / 29

1.3 Graphical Summaries of Data

Access. Young W. Lim Fri. Young W. Lim Access Fri 1 / 18

Week 2 Basic Statistical Concepts, Part II

Further Maths Notes. Common Mistakes. Read the bold words in the exam! Always check data entry. Write equations in terms of variables

STA Module 2B Organizing Data and Comparing Distributions (Part II)

STA Learning Objectives. Learning Objectives (cont.) Module 2B Organizing Data and Comparing Distributions (Part II)

Package beanplot. R topics documented: February 19, Type Package

Access. Young W. Lim Sat. Young W. Lim Access Sat 1 / 19

Lecture 6: Chapter 6 Summary

Procedure Calls. Young W. Lim Sat. Young W. Lim Procedure Calls Sat 1 / 27

CHAPTER 1. Introduction. Statistics: Statistics is the science of collecting, organizing, analyzing, presenting and interpreting data.

R Exercise Practical 2. Follow the instructions described in this sheet. Type or paste any answers you are asked for into a Microsoft Word document.

The first one will centre the data and ensure unit variance (i.e. sphere the data):

Stack Tutorial. Young W. Lim Sat. Young W. Lim Stack Tutorial Sat 1 / 15

Day14 A. Young W. Lim Thr. Young W. Lim Day14 A Thr 1 / 14

R Programming: Worksheet 6

LECTURE NOTES FOR ECO231 COMPUTER APPLICATIONS I. Part Two. Introduction to R Programming. RStudio. November Written by. N.

Frequency Distributions

Univariate data. 2.1 For example: p <- c(2, 3, 5, 7, 11, 13, 17, 19)

Arrays. Young W. Lim Mon. Young W. Lim Arrays Mon 1 / 17

Section 2-2 Frequency Distributions. Copyright 2010, 2007, 2004 Pearson Education, Inc

Chapter 6: DESCRIPTIVE STATISTICS

Exploratory Projection Pursuit

No. of blue jelly beans No. of bags

Introduction to R. Biostatistics 615/815 Lecture 23

Visualizing Data: Freq. Tables, Histograms

MATH& 146 Lesson 10. Section 1.6 Graphing Numerical Data

Chapter 5. Understanding and Comparing Distributions. Copyright 2012, 2008, 2005 Pearson Education, Inc.

IST 3108 Data Analysis and Graphics Using R Week 9

Day14 A. Young W. Lim Tue. Young W. Lim Day14 A Tue 1 / 15

Stat 428 Autumn 2006 Homework 2 Solutions

Regression III: Advanced Methods

Univariate Statistics Summary

Prepare a stem-and-leaf graph for the following data. In your final display, you should arrange the leaves for each stem in increasing order.

Statistical Programming with R

Unit 7 Statistics. AFM Mrs. Valentine. 7.1 Samples and Surveys

Week 4: Describing data and estimation

Lecture 3 Questions that we should be able to answer by the end of this lecture:

MATH11400 Statistics Homepage

Logic Haskell Exercises

Lecture 3 Questions that we should be able to answer by the end of this lecture:

Lab 4: Distributions of random variables

Accessibility (1A) Young Won Lim 8/22/13

Name Date Types of Graphs and Creating Graphs Notes

Chapter 5. Understanding and Comparing Distributions. Copyright 2010, 2007, 2004 Pearson Education, Inc.

Understanding and Comparing Distributions. Chapter 4

050 0 N 03 BECABCDDDBDBCDBDBCDADDBACACBCCBAACEDEDBACBECCDDCEA

Learner Expectations UNIT 1: GRAPICAL AND NUMERIC REPRESENTATIONS OF DATA. Sept. Fathom Lab: Distributions and Best Methods of Display

Section 3.1 Shapes of Distributions MDM4U Jensen

UP School of Statistics Student Council Education and Research

STP 226 ELEMENTARY STATISTICS NOTES

Graph tool instructions and R code

Math 120 Introduction to Statistics Mr. Toner s Lecture Notes 3.1 Measures of Central Tendency

Chapter 2 - Descriptive Statistics

Ch3 E lement Elemen ar tary Descriptive Descriptiv Statistics

Ch. 1.4 Histograms & Stem-&-Leaf Plots

Lesson 18-1 Lesson Lesson 18-1 Lesson Lesson 18-2 Lesson 18-2

STAT 503 Fall Introduction to SAS

Plotting: An Iterative Process

Transcription:

Univariate Data - 2. Numeric Summaries Young W. Lim 2018-02-05 Mon Young W. Lim Univariate Data - 2. Numeric Summaries 2018-02-05 Mon 1 / 31

Outline 1 Univariate Data Based on Numerical Summaries Young W. Lim Univariate Data - 2. Numeric Summaries 2018-02-05 Mon 2 / 31

Based on "Using R for Introductory Statistics" John Verzani I, the copyright holder of this work, hereby publish it under the following licenses: GNU head Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the section entitled GNU Free Documentation License. CC BY SA This file is licensed under the Creative Commons Attribution ShareAlike 3.0 Unported License. In short: you are free to share and make derivative works of the file under the conditions that you appropriately attribute it, and that you distribute it only under a license compatible with this one. Young W. Lim Univariate Data - 2. Numeric Summaries 2018-02-05 Mon 3 / 31

Functions function(x) { sum(x) / length(x) } functions(x) { total <- sum(x) n <- length(x) total / n } my_mean <- function(x) { sum(x) / length(x) } my_mean( c(1,2,3,4) ) Young W. Lim Univariate Data - 2. Numeric Summaries 2018-02-05 Mon 4 / 31

The sample mean x <- c(51, 52, 48, 83, 92) sort(x) mean(x) devs <- x - mean(x) mean(devs) Young W. Lim Univariate Data - 2. Numeric Summaries 2018-02-05 Mon 5 / 31

The trimmed mean mean(x, trim=0.10) # trim 10% of both ends mean(exec.pay) mean(exec.pay, trim=0.10) the Macdonell (HistData) data set w <- Macdonell$frequency / sum(macdonell$frequency) y <- Macdonell$height sum(w*y) Young W. Lim Univariate Data - 2. Numeric Summaries 2018-02-05 Mon 6 / 31

The sample median median(x) n <- length(exec.pay); trim=0.10 lo <- 1 + floor(n * trim) hi <- n + 1 - lo median(sort(exec.pay)[lo:hi]) Young W. Lim Univariate Data - 2. Numeric Summaries 2018-02-05 Mon 7 / 31

Measures of position x <- 0:5 length(x) mean(sort(x)[3:4]) median(x) quantile(x) quantile(x, 0.25) # 0, 25, 50, 75 100 pecentiles quantile(x, seq(0, 1, by=0.2)) fivenum(x) y = c("a"=95, "B"=82, "C"=74, "D"=61)y Young W. Lim Univariate Data - 2. Numeric Summaries 2018-02-05 Mon 8 / 31

Other measures of center table(x) table(x) == max(table(x)) which(table(x) == max(table(x))) as.numeric(names( which(table(x) == max(table(x))))) Young W. Lim Univariate Data - 2. Numeric Summaries 2018-02-05 Mon 9 / 31

Spread range(x) diff(range(x)) Young W. Lim Univariate Data - 2. Numeric Summaries 2018-02-05 Mon 10 / 31

Variance and standard deviation var(x) sum( (x - mean(x))^2 ) / (length(x) -1) Young W. Lim Univariate Data - 2. Numeric Summaries 2018-02-05 Mon 11 / 31

Sample standard deviation x <- c(100, 300, 900, NA, 200, 500, 700) range(x, na.rm=true) sd(x, na.rm=true) z_score <- function(x) (x - mean(x)) / sd(x) z_score(x) scale(x)[,1] # to extract the 1st column z <- (x -mean(x)) / sd(x) x[ z >= 1.28] mean(x) + 1.28 * sd(x) z <- (exec.pay- mean(exec.pay)) / sd(exec.pay) out <- abs(z) > 3 sum(out) / length(z) Young W. Lim Univariate Data - 2. Numeric Summaries 2018-02-05 Mon 12 / 31

IQR (InterWuartile Range) median(rivers) IQR(rivers) IQR(rivers) / sd(rivers) Young W. Lim Univariate Data - 2. Numeric Summaries 2018-02-05 Mon 13 / 31

mad (Median Absolute Deviation) mad(rivers) / sd(rivers) ht <- kid.weights$height mad(ht) / sd(ht) Young W. Lim Univariate Data - 2. Numeric Summaries 2018-02-05 Mon 14 / 31

Shape Symmetry and skew Tails Modes Young W. Lim Univariate Data - 2. Numeric Summaries 2018-02-05 Mon 15 / 31

Symmetry and skew skew <- function(x) { n <- length(x) z <- (x - mean(x)) / sd(x) sum(z^3) / n } skew(exec.pay) four_year_hts <- \ kid.weights[kid.weights$age %in% 48:59, "height"] skew(four_year_hts) Young W. Lim Univariate Data - 2. Numeric Summaries 2018-02-05 Mon 16 / 31

Tail kurtosis <- function(x) { n <- length(x) z <- (x -mean(x)) / sd(x) sum(z^4)/n - 3 } kurtosis(galton$parent) Young W. Lim Univariate Data - 2. Numeric Summaries 2018-02-05 Mon 17 / 31

Modes unimodal bimodal multimodal Young W. Lim Univariate Data - 2. Numeric Summaries 2018-02-05 Mon 18 / 31

Viewing the shape of a data set Dot plots Stem and leaf plots Histograms Density plots Box plots Quantile graphs Young W. Lim Univariate Data - 2. Numeric Summaries 2018-02-05 Mon 19 / 31

Dot plots jitter stripchart Young W. Lim Univariate Data - 2. Numeric Summaries 2018-02-05 Mon 20 / 31

Stem-and-leaf plots stem stem(bumpers) Young W. Lim Univariate Data - 2. Numeric Summaries 2018-02-05 Mon 21 / 31

Histogram hist(faithful$waiting) bins <- seq(40, 100, by=5) x <- faithful$waiting out <- cut(x, breaks=bins) head(out) table(out) Young W. Lim Univariate Data - 2. Numeric Summaries 2018-02-05 Mon 22 / 31

Density plots plot( density(bumpers) ) b_hist <- hist(bumpers, plot=false) b_dens <- density(bumpers) hist(bumpers, probability=true, xlim=range(c(b_hist$breaks, b_dens$x)), ylim=range(c(b_hist$density, b_dens$y)) ) lines(b_dens, lwd=2) Young W. Lim Univariate Data - 2. Numeric Summaries 2018-02-05 Mon 23 / 31

Standard plotting arguments xlim ylim xlab ylab main pch cex col lwd lty set x coordinate range set y coordinate range set label for x axis set label for y axis set the main title adjust plot symbols (?pch) adjust size of text and symbols adjust color of objects drawn (?colors) adjust width of lines drawn adjust how line is drawn "blank", "solid", "dashed", dotdash" bty adjust bos type ("o", "l", "7", "c", "u", "]" Young W. Lim Univariate Data - 2. Numeric Summaries 2018-02-05 Mon 24 / 31

Functions to add layers points lines abkube text mtext add points to a graphic add points connected by lines to a graphic add a line of the form a + bx, y=h, or x=v add text to a graphic add text to margins of a graphic Young W. Lim Univariate Data - 2. Numeric Summaries 2018-02-05 Mon 25 / 31

Various plots hist(bumpers, probability=true, xlim = range( c(b_hist$breaks, b_dens$x)), ylim = range( c(b_hist$density, b_dens$y))) lines(b_dens, lwd=2) boxplot(bumpers, horizontal=true, main="bumpers") x <- rep(macdonell$finger, Macdonell$frequency) qqnorm(x) x <- jitter(histdata::galton$child, factor=5) qqnorm(x) Young W. Lim Univariate Data - 2. Numeric Summaries 2018-02-05 Mon 26 / 31

OOO Young W. Lim Univariate Data - 2. Numeric Summaries 2018-02-05 Mon 27 / 31

OOO Young W. Lim Univariate Data - 2. Numeric Summaries 2018-02-05 Mon 28 / 31

OOO Young W. Lim Univariate Data - 2. Numeric Summaries 2018-02-05 Mon 29 / 31

OOO Young W. Lim Univariate Data - 2. Numeric Summaries 2018-02-05 Mon 30 / 31

OOO Young W. Lim Univariate Data - 2. Numeric Summaries 2018-02-05 Mon 31 / 31