Spring 2017 CS130 - Intro to R 1 R VISUALIZING DATA. Spring 2017 CS130 - Intro to R 2

Similar documents
R Visualizing Data. Fall Fall 2016 CS130 - Intro to R 1

Introduction to R Software

Regression Models Course Project Vincent MARIN 28 juillet 2016

Intro to R Graphics Center for Social Science Computation and Research, 2010 Stephanie Lee, Dept of Sociology, University of Washington

Day 4 Percentiles and Box and Whisker.notebook. April 20, 2018

DAY 52 BOX-AND-WHISKER

Statistics 251: Statistical Methods

Vocabulary. 5-number summary Rule. Area principle. Bar chart. Boxplot. Categorical data condition. Categorical variable.

Unit I Supplement OpenIntro Statistics 3rd ed., Ch. 1

Lab #7 - More on Regression in R Econ 224 September 18th, 2018

Special Review Section. Copyright 2014 Pearson Education, Inc.

Chapter 3 - Displaying and Summarizing Quantitative Data

Bar Charts and Frequency Distributions

LAB 1 INSTRUCTIONS DESCRIBING AND DISPLAYING DATA

Will Landau. January 24, 2013

2.1 Objectives. Math Chapter 2. Chapter 2. Variable. Categorical Variable EXPLORING DATA WITH GRAPHS AND NUMERICAL SUMMARIES

Chapter 1. Looking at Data-Distribution

STA 570 Spring Lecture 5 Tuesday, Feb 1

Chapter 7. The Data Frame

Chapter 2. Descriptive Statistics: Organizing, Displaying and Summarizing Data

Chapter 6: DESCRIPTIVE STATISTICS

AND NUMERICAL SUMMARIES. Chapter 2

Intro to R for Epidemiologists

8 Organizing and Displaying

What are we working with? Data Abstractions. Week 4 Lecture A IAT 814 Lyn Bartram

Table of Contents (As covered from textbook)

Things you ll know (or know better to watch out for!) when you leave in December: 1. What you can and cannot infer from graphs.

Box and Whisker Plot Review A Five Number Summary. October 16, Box and Whisker Lesson.notebook. Oct 14 5:21 PM. Oct 14 5:21 PM.

Name Geometry Intro to Stats. Find the mean, median, and mode of the data set. 1. 1,6,3,9,6,8,4,4,4. Mean = Median = Mode = 2.

Univariate descriptives

Introduction to R: Day 2 September 20, 2017

Univariate Statistics Summary

3.3 The Five-Number Summary Boxplots

1 Overview of Statistics; Essential Vocabulary

Chuck Cartledge, PhD. 20 January 2018

Math 120 Introduction to Statistics Mr. Toner s Lecture Notes 3.1 Measures of Central Tendency

Section 9: One Variable Statistics

Exercise 1: Introduction to Stata

Install RStudio from - use the standard installation.

Acquisition Description Exploration Examination Understanding what data is collected. Characterizing properties of data.

1 RefresheR. Figure 1.1: Soy ice cream flavor preferences

AP Statistics Summer Assignment:

2.1: Frequency Distributions and Their Graphs

Visual Analytics. Visualizing multivariate data:

Math 227 EXCEL / MEGASTAT Guide

CHAPTER 1. Introduction. Statistics: Statistics is the science of collecting, organizing, analyzing, presenting and interpreting data.

10.4 Measures of Central Tendency and Variation

10.4 Measures of Central Tendency and Variation

Middle Years Data Analysis Display Methods

Lecture 6: Chapter 6 Summary

Chapter 2 Describing, Exploring, and Comparing Data

Chapter 2: Understanding Data Distributions with Tables and Graphs

UNIT 1A EXPLORING UNIVARIATE DATA

AP Statistics Prerequisite Packet

M7D1.a: Formulate questions and collect data from a census of at least 30 objects and from samples of varying sizes.

Name Date Types of Graphs and Creating Graphs Notes

Statistics Lecture 6. Looking at data one variable

Stat 241 Review Problems

Chapter 3: Data Description - Part 3. Homework: Exercises 1-21 odd, odd, odd, 107, 109, 118, 119, 120, odd

Measures of Central Tendency:

Downloaded from

Summarising Data. Mark Lunt 09/10/2018. Arthritis Research UK Epidemiology Unit University of Manchester

Lecture Series on Statistics -HSTC. Frequency Graphs " Dr. Bijaya Bhusan Nanda, Ph. D. (Stat.)

WHOLE NUMBER AND DECIMAL OPERATIONS

1.2. Pictorial and Tabular Methods in Descriptive Statistics

Create a bar graph that displays the data from the frequency table in Example 1. See the examples on p Does our graph look different?

Statistical Methods. Instructor: Lingsong Zhang. Any questions, ask me during the office hour, or me, I will answer promptly.

Az R adatelemzési nyelv

Mean,Median, Mode Teacher Twins 2015

Making Science Graphs and Interpreting Data

Data Preprocessing. S1 Teknik Informatika Fakultas Teknologi Informasi Universitas Kristen Maranatha

MATH 1070 Introductory Statistics Lecture notes Descriptive Statistics and Graphical Representation

The Tidyverse BIOF 339 9/25/2018

Notes based on: Data Mining for Business Intelligence

More Numerical and Graphical Summaries using Percentiles. David Gerard

Sections 2.3 and 2.4

The main issue is that the mean and standard deviations are not accurate and should not be used in the analysis. Then what statistics should we use?

Math 167 Pre-Statistics. Chapter 4 Summarizing Data Numerically Section 3 Boxplots

Dr. V. Alhanaqtah. Econometrics. Graded assignment

LSP 121. LSP 121 Math and Tech Literacy II. Topics. Quartiles. Intro to Statistics. More Descriptive Statistics

CS4445 Data Mining and Knowledge Discovery in Databases. A Term 2008 Exam 2 October 14, 2008

Chapter 2 Exploring Data with Graphs and Numerical Summaries

SPSS. (Statistical Packages for the Social Sciences)

Descriptive Statistics By

STAT:5400 Computing in Statistics

Resources for statistical assistance. Quantitative covariates and regression analysis. Methods for predicting continuous outcomes.

MATH& 146 Lesson 10. Section 1.6 Graphing Numerical Data

Week 2: Frequency distributions

Lecture 1: Exploratory data analysis

DATA PREPROCESSING. Pronalaženje skrivenog znanja Bojan Furlan

Regression III: Advanced Methods

Organizing and Summarizing Data

Measures of Dispersion

getting started in R

MATH NATION SECTION 9 H.M.H. RESOURCES

Data can be in the form of numbers, words, measurements, observations or even just descriptions of things.

Numerical Summaries of Data Section 14.3

CHAPTER-13. Mining Class Comparisons: Discrimination between DifferentClasses: 13.4 Class Description: Presentation of Both Characterization and

LESSON 3: CENTRAL TENDENCY

Introduction. About this Document. What is SPSS. ohow to get SPSS. oopening Data

Transcription:

Spring 2017 CS130 - Intro to R 1 R VISUALIZING DATA Spring 2017 Spring 2017 CS130 - Intro to R 2 Goals for this lecture: Review constructing Data Frame, Categorizing variables Construct basic graph, learn how to recode variables Load built-in mtcars example Data Frame Recoding Variables Learn how to present data graphically in a bar graph, pie chart, histogram, or box plot. Learn how to load data from a file into a Data Frame 1

Spring 2017 CS130 - Intro to R 3 Review: Problem For the given CS130 class information, create a data frame, cs130dataframe.r that contains the following data ID Year Age 0001 FR 18 0002 FR 18 0003 SR 22 0004 JR 22 0005 SO 19 0006 FR 19 0007 SR 23 0008 SO 19 0009 SR 22 Spring 2017 CS130 - Intro to R 4 Review: Continued Using the command str(cs130dataframe), classify each variable ID, Year, Age as: quantitative or qualitative discrete, continuous, neither nominal, ordinal, neither Variable Quantitative or Qualitative? Discrete, continuous, neither? Nominal, ordinal, neither? ID Year Age 2

Spring 2017 CS130 - Intro to R 5 Table function, barplots The table function will return a vector of table counts For instance classname = table(cs130dataframe$year) will return a count of the number of FR, SO, JR, and SR s: To graph this table, enter barplot(classname) Problems? Spring 2017 CS130 - Intro to R 6 Barplot: Categorical Data Need to create categorical (factor) data so desired order is maintained: > classnamecategorical= factor(cs130dataframe$year, levels=c("fr","so","jr","sr"), labels=c("fr","so","jr","sr")) Note: levels parameter is our desired order labels parameter is optional (defaults to levels) > Barplot(table(classNameCategorical)) //Order now correct 3

Spring 2017 CS130 - Intro to R 7 mtcars Data Frame R has a built-in data frame called mtcars: Data was extracted from the 1974 Motor Trend US magazine, and 11 aspects of automobile design and performance for 32 automobiles(1973 74 models). [1] mpg Miles/(US) gallon [2] cyl Number of cylinders [3] disp Displacement (cu.in.) [4] hp Gross horsepower [5] drat Rear axle ratio [6] wt Weight (1000 lbs) [7] qsec 1/4 mile time [8] vs V/S (vshape or straight line engine) [9] am Transmission (0 = automatic, 1 = manual) [10] gear Number of forward gears [11] carb Number of carburetors Spring 2017 CS130 - Intro to R 8 R: Useful functions Copy mtcars to tempmtcars to protect mtcars data > tempmtcars = mtcars Useful R functions length(object) # number of variables str(object) # structure of an object class(object) # class or type of an object names(object) # names dim(object) # number of observations and variables In the console, call each function using tempmtcars as the object 4

Spring 2017 CS130 - Intro to R 9 Recoding Variables Recode am variable as categorical variable amcategorical: 1. Default method with no parameters: >tempmtcars$amcategorical = as.factor (tempmtcars$am) 2. Method that allows levels and labels > tempmtcars$amlabels = factor(tempmtcars$am, levels=c('0','1'),labels=c("auto","manual")) > tempmtcars$amordered = factor(tempmtcars$am, levels=c('1, 0 ), labels=c("manual, auto ),ordered=true) Spring 2017 CS130 - Intro to R 10 Graphing Recoded Variables > barplot(table(tempmtcars$amcategorical) > barplot(table(tempmtcars$amordered)) > barplot(table(tempmtcars$amlabels)) 5

Spring 2017 CS130 - Intro to R 11 Bar Chart http://statmethods.net/graphs/bar.html A bar chart or bar graph is a chart that presents grouped data with rectangular bars with lengths proportional to the values that they represent. function table returns a vector of frequency data barplot(table(tempmtcars$amcategorical), main = "Car Data", xlab = "Transmission") Spring 2017 CS130 - Intro to R 12 Recoding Variables Create a new variable mpgclass where mpg<=25 is low, mpg>25 is high > tempmtcars$mpgclass[tempmtcars$mpg <= 25] = "low" > tempmtcars$mpgclass[tempmtcars$mpg > 25] = "high" > tempmtcars$mpgclass [1] "low" "low" "low" "low" "low" "low" "low" "low" [9] "low" "low" "low" "low" "low" "low" "low" "low" [17] "low" "high" "high" "high" "low" "low" "low "low" [25] "low" "high" "high" "high" "low" "low" "low" "low > typeof(tempmtcars$mpgclass) [1] "character" > barplot(table(tempmtcars$mpgclass), main = "Car Data", xlab="mpg") 6

Spring 2017 CS130 - Intro to R 13 Bar Chart > barplot (table(tempmtcars$cyl), main = "Car Distribution", xlab = "Number of Cylinders", col = c("darkblue", "green", "red"), names.arg = c("4 Cylinder", "6 Cylinder", "8 Cylinder")) Spring 2017 CS130 - Intro to R 14 Pie Chart http://statmethods.net/graphs/pie.html A pie chart is a circular graphical representation of data that illustrates a numerical proportion A pie chart gives a better visualization of the frequency of occurrence as a percent > pie(table (tempmtcars$cyl), labels = c("4 Cylinder", "6 Cylinder", "8 Cylinder"), main=" Car Distribution") 7

Spring 2017 CS130 - Intro to R 15 Histogram http://statmethods.net/graphs/density.html A histogram is a graphical representation of the distribution of numerical data Bin are adjacent intervals usually of equal size Notice: breaks <> number of bins and breaks is just a suggestion and not guaranteed >hist (tempmtcars$mpg) > hist (tempmtcars$mpg, breaks=10, col= blue ) Can also use main, xlab, ylab attributes as with barchar Spring 2017 CS130 - Intro to R 16 Boxplots http://statmethods.net/graphs/boxplot.html A boxplot is a way of graphically showing numerical data through quartiles A box-and-whisker plot is a boxplot that shows variability outside the upper and lower quartiles Quartile the three points that divide the ranked data values into 4 equal sized groups 8

Spring 2017 CS130 - Intro to R 17 Quartile Definitions first quartile/lower quartile/25th percentile/ Q 1 splits off the lowest 25% of data from the highest 75% second quartile /median/50th percentile / Q 2 cuts data set in half third quartile/upper quartile/75th percentile / Q 3 splits off the highest 25% of data from the lowest 75% interquartile range / IQR IQR = Q 3 - Q 1 Spring 2017 CS130 - Intro to R 19 Problem Continued Using R, show the box-and-whisker plot and quantiles for 6, 7, 19, 20, 42, 100, 200 6, 7, 20, 100, 200 9

Spring 2017 CS130 - Intro to R 20 Paint Problem Let s put everything together A paint manufacturer tested two experimental brands of paint over a period of months to determine how long they would last without fading. Here are the results: BrandA BrandB Report on the following 10 25 -Mean 20 35 -Median 60 40 40 45 -Std Deviation 50 35 -Minimum 30 30 -Maximum Spring 2017 CS130 - Intro to R 21 Paint Problem 1. Using Rstudio, create an R script on your desktop called paintdataframe.r that creates a data frame paintdata for the paint data. 1. Name the variables brandapaint and brandbpaint 2. Enter the data 3. Output the data frame 4. Save and run the script. Show me. 10

Spring 2017 CS130 - Intro to R 22 Paint Problem Continued 5. Compute and output the mean, median, std deviation, minimum, and maximum for each brand of paint [1] "Brand A Mean = 35" [1] "Brand A Median = 35" [1] "Brand A Std Dev = 18.7082869338697" [1] "Brand A Minimum = 10" [1] "Brand A Maximum = 60" [1] "" [1] "Brand B Mean = 35" [1] "Brand B Median = 35" [1] "Brand B Std Dev = 7.07106781186548" [1] "Brand B Minimum = 25" [1] "Brand B Maximum = 45" Spring 2017 CS130 - Intro to R 23 Paint Problem Continued 6. Output a Box-and-Whisker Plot for each brand of paint as follows. Get as close as possible. This isn t easy but give it a try. 7. What do the descriptive statistics tell us? 8. Which paint would you buy? Justify your answer. 11