Old Faithful Chris Parrish
Contents

Old Faithful eruptions
  data
  duration
  waiting time
  short and long eruptions
  predicting waiting times

Old Faithful eruptions

references:
- Old Faithful, Wikipedia
- Geyser Observation and Study Association

data

library(ggplot2)  # ggplot(), geom_histogram(), geom_boxplot()
library(dplyr)    # %>%, mutate(), group_by(), summarize()

?faithful # in package datasets
data <- faithful
colnames(data) <- c("duration", "waiting")
head(data)
## duration waiting
str(data)
## 'data.frame': 272 obs. of 2 variables:
## $ duration: num
## $ waiting : num

duration

Histogram. Summarize the shape, center, spread, and range of the distribution of durations of Old Faithful eruptions.
ggplot(data, aes(duration)) +
  geom_histogram(color = "saddlebrown", fill = "wheat") +
  labs(x = "Duration (min)", title = "Old Faithful Duration")

[Figure: histogram "Old Faithful Duration"; x-axis: Duration (min); y-axis: count]

data %>%
  summarize(mean = mean(duration), median = median(duration), sd = sd(duration), n = n())
## mean median sd n

Boxplot. What do you see in this boxplot that is not evident in the histogram?

ggplot(data, aes(x = 1, y = duration)) +
  geom_boxplot(color = "saddlebrown", fill = "wheat") +
  labs(x = "", y = "Duration (min)", title = "Old Faithful Duration") +
  coord_flip()
[Figure: boxplot "Old Faithful Duration"; axis: Duration (min)]

Do the mean and median coincide? Why?

summary(data$duration)

waiting time

Summarize the shape, center, spread, and range of the distribution of waiting times of Old Faithful eruptions.

ggplot(data, aes(waiting)) +
  geom_histogram(color = "saddlebrown", fill = "wheat") +
  labs(x = "Waiting time (min)", title = "Old Faithful Waiting Time")
[Figure: histogram "Old Faithful Waiting Time"; x-axis: Waiting time (min); y-axis: count]

data %>%
  summarize(mean = mean(waiting), sd = sd(waiting), n = n())
## mean sd n

What do you see in this boxplot that is not evident in the histogram?

ggplot(data, aes(x = 1, y = waiting)) +
  geom_boxplot(color = "saddlebrown", fill = "wheat") +
  labs(x = "", y = "Waiting time (min)", title = "Old Faithful Waiting Time") +
  coord_flip()
[Figure: boxplot "Old Faithful Waiting Time"; axis: Waiting time (min)]

summary(data$waiting)

short and long eruptions

Both distributions are bimodal. Are short eruptions followed by short waiting times?

Tag the observations with short durations.

data <- mutate(data, short.duration = duration <= 3)
str(data)
## 'data.frame': 272 obs. of 3 variables:
## $ duration : num
## $ waiting : num
## $ short.duration: logi FALSE TRUE FALSE TRUE FALSE TRUE ...

data %>%
  group_by(short.duration) %>%
  summarize(mean = mean(duration), sd = sd(duration), n = n())
## # A tibble: 2 x 4
## short.duration mean sd n
## <lgl> <dbl> <dbl> <int>
## 1 FALSE
## 2 TRUE

Did we successfully separate the eruptions with short and long durations?

ggplot(data, aes(duration)) +
  geom_histogram(color = "saddlebrown", fill = "wheat") +
  facet_grid(short.duration ~ .) +
  labs(x = "Duration (min)", title = "Old Faithful")
[Figure: histograms of duration titled "Old Faithful", faceted by short.duration (FALSE, TRUE); x-axis: Duration (min); y-axis: count]

Are short-duration eruptions followed by short waiting times? Are there many exceptions?

ggplot(data, aes(waiting)) +
  geom_histogram(color = "saddlebrown", fill = "wheat") +
  facet_grid(short.duration ~ .) +
  labs(x = "Waiting time (min)", title = "Old Faithful")
[Figure: histograms of waiting time titled "Old Faithful", faceted by short.duration (FALSE, TRUE); x-axis: Waiting time (min); y-axis: count]

Summarize the overlap in waiting times after eruptions with short and long durations. Create a way to quantify the overlap. How many observations fall into the overlap zone? How many observations were there altogether in the original dataset?

# use the full column name rather than relying on partial matching of `$short`
waiting.after.short <- data[data$short.duration, "waiting"]
waiting.after.long <- data[!data$short.duration, "waiting"]
# longest waiting times after short
tail(sort(waiting.after.short), )
## [1]
# shortest waiting times after long
head(sort(waiting.after.long), )
## [1]

predicting waiting times

Prepare some advice for park rangers who answer visitors' questions at Yellowstone National Park.

How long will a visitor have to wait for the next eruption after a short eruption? How much error should we expect in this estimate?

summary(waiting.after.short)
sd(waiting.after.short)
## [1]

How long will a visitor have to wait for the next eruption after a long eruption? How much error should we expect in this estimate?

summary(waiting.after.long)
sd(waiting.after.long)
## [1]

How long will a visitor have to wait for the next eruption if we don't know the length of the last eruption? How much error should we expect in this estimate?

summary(data$waiting)
sd(data$waiting)
## [1]
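The overlap and prediction questions above can be sketched in a few lines of base R. This is only one possible approach, not the handout's own solution. It assumes the overlap zone is defined as the interval from the shortest wait after a long eruption to the longest wait after a short eruption, and it uses the group mean as the prediction with the group standard deviation as a rough measure of error. The names overlap.zone, in.overlap, n.overlap, and advice are illustrative and do not appear in the original.

```r
# Sketch under stated assumptions; rebuilds the data so it runs on its own.
data <- faithful                               # built-in, package datasets
colnames(data) <- c("duration", "waiting")
data$short.duration <- data$duration <= 3      # same 3-minute cutoff as above

waiting.after.short <- data$waiting[data$short.duration]
waiting.after.long  <- data$waiting[!data$short.duration]

# Assumed definition of the overlap zone: from the shortest wait after a
# long eruption up to the longest wait after a short eruption.
overlap.zone <- c(lower = min(waiting.after.long),
                  upper = max(waiting.after.short))

# Count observations (from either group) whose waiting time is in the zone.
# If lower > upper the groups do not overlap and the count is 0.
in.overlap <- data$waiting >= overlap.zone[["lower"]] &
              data$waiting <= overlap.zone[["upper"]]
n.overlap <- sum(in.overlap)

# Advice table: one row per group, with the mean as the predicted wait
# and the sd as a rough size of the expected error.
advice <- t(sapply(split(data$waiting, data$short.duration),
                   function(w) c(mean = mean(w), sd = sd(w), n = length(w))))

print(overlap.zone)
print(n.overlap)
print(advice)   # row "TRUE" = after a short eruption, "FALSE" = after a long one
```

Comparing the two sd values in advice with sd(data$waiting) shows how much knowing the length of the last eruption narrows the estimate.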