Basics of Plotting Data

Similar documents
Intro to R Graphics Center for Social Science Computation and Research, 2010 Stephanie Lee, Dept of Sociology, University of Washington

An Introduction to R Graphics

Homework 1 Excel Basics

Dr. V. Alhanaqtah. Econometrics. Graded assignment

Lab 1: Introduction to data

Select Cases. Select Cases GRAPHS. The Select Cases command excludes from further. selection criteria. Select Use filter variables

Chapter 6: DESCRIPTIVE STATISTICS

Statistics Lecture 6. Looking at data one variable

Statistics 251: Statistical Methods

36-402/608 HW #1 Solutions 1/21/2010

Making plots in R [things I wish someone told me when I started grad school]

An introduction to WS 2015/2016

Your Name: Section: INTRODUCTION TO STATISTICAL REASONING Computer Lab #4 Scatterplots and Regression

Lab 2: Introduction to data

Quick introduction to descriptive statistics and graphs in. R Commander. Written by: Robin Beaumont

Visualizing the World

Tips and Guidance for Analyzing Data. Executive Summary

Practical 2: Plotting

Scatterplot: The Bridge from Correlation to Regression

Lab 2: Introduction to data

Data Visualization. Andrew Jaffe Instructor

STA 570 Spring Lecture 5 Tuesday, Feb 1

Brief Guide on Using SPSS 10.0

An introduction to ggplot: An implementation of the grammar of graphics in R

How to Make Graphs in EXCEL

CIND123 Module 6.2 Screen Capture

R Workshop Module 3: Plotting Data Katherine Thompson Department of Statistics, University of Kentucky

LAB 1 INSTRUCTIONS DESCRIBING AND DISPLAYING DATA

Introduction to Minitab 1

Revision Topic 11: Straight Line Graphs

Exploratory Data Analysis

Preparing for Data Analysis

The basic arrangement of numeric data is called an ARRAY. Array is the derived data from fundamental data Example :- To store marks of 50 student

/4 Directions: Graph the functions, then answer the following question.

Further Maths Notes. Common Mistakes. Read the bold words in the exam! Always check data entry. Write equations in terms of variables

IBMSPSSSTATL1P: IBM SPSS Statistics Level 1

Data Management Project Using Software to Carry Out Data Analysis Tasks

1. Basic Steps for Data Analysis Data Editor. 2.4.To create a new SPSS file

No. of blue jelly beans No. of bags

Correlation. January 12, 2019

Install RStudio from - use the standard installation.

Page 1. Graphical and Numerical Statistics

Statistical Programming with R

BIO 360: Vertebrate Physiology Lab 9: Graphing in Excel. Lab 9: Graphing: how, why, when, and what does it mean? Due 3/26

Data Visualization in R

Survey of Math: Excel Spreadsheet Guide (for Excel 2016) Page 1 of 9

Module 10. Data Visualization. Andrew Jaffe Instructor

Learner Expectations UNIT 1: GRAPICAL AND NUMERIC REPRESENTATIONS OF DATA. Sept. Fathom Lab: Distributions and Best Methods of Display

Maximizing Statistical Interactions Part II: Database Issues Provided by: The Biostatistics Collaboration Center (BCC) at Northwestern University

Statistical Programming Camp: An Introduction to R

UNIT 1: NUMBER LINES, INTERVALS, AND SETS

1 Introduction to Using Excel Spreadsheets

Lab 1: Introduction to Data

Basic Medical Statistics Course

Section 1.2. Displaying Quantitative Data with Graphs. Mrs. Daniel AP Stats 8/22/2013. Dotplots. How to Make a Dotplot. Mrs. Daniel AP Statistics

Statistical Methods. Instructor: Lingsong Zhang. Any questions, ask me during the office hour, or me, I will answer promptly.

WELCOME! Lecture 3 Thommy Perlinger

Solution to Tumor growth in mice

Plotting Complex Figures Using R. Simon Andrews v

Excel Tips and FAQs - MS 2010

You will learn: The structure of the Stata interface How to open files in Stata How to modify variable and value labels How to manipulate variables

Bar Charts and Frequency Distributions

Chapter 5. Understanding and Comparing Distributions. Copyright 2010, 2007, 2004 Pearson Education, Inc.

Preparing for Data Analysis

You are to turn in the following three graphs at the beginning of class on Wednesday, January 21.

Getting Started with JMP at ISU

Table of Contents (As covered from textbook)

University of California, Los Angeles Department of Statistics

Data Visualization in R

Lecture Notes 3: Data summarization

This document is designed to get you started with using R

Lecture 1: Statistical Reasoning 2. Lecture 1. Simple Regression, An Overview, and Simple Linear Regression

Graphics #1. R Graphics Fundamentals & Scatter Plots

a. divided by the. 1) Always round!! a) Even if class width comes out to a, go up one.

Python for Data Analysis. Prof.Sushila Aghav-Palwe Assistant Professor MIT

Exploratory Data Analysis on NCES Data Developed by Yuqi Liao, Paul Bailey, and Ting Zhang May 10, 2018

Chapter2 Description of samples and populations. 2.1 Introduction.

Course on Microarray Gene Expression Analysis

IAT 355 Visual Analytics. Data and Statistical Models. Lyn Bartram

Regression Lab 1. The data set cholesterol.txt available on your thumb drive contains the following variables:

CSC 1315! Data Science

Selected Introductory Statistical and Data Manipulation Procedures. Gordon & Johnson 2002 Minitab version 13.

University of California, Los Angeles Department of Statistics

Statistical transformations

Introduction to R: Day 2 September 20, 2017

Visual Analytics. Visualizing multivariate data:

Maths Revision Worksheet: Algebra I Week 1 Revision 5 Problems per night

Chapter 5: The beast of bias

Key Strokes To make a histogram or box-and-whisker plot: (Using canned program in TI)

Using Excel for Graphical Analysis of Data

Chuck Cartledge, PhD. 20 January 2018

Chapter 5. Understanding and Comparing Distributions. Copyright 2012, 2008, 2005 Pearson Education, Inc.

An Introductory Guide to R

AA BB CC DD EE. Introduction to Graphics in R

2.1 Objectives. Math Chapter 2. Chapter 2. Variable. Categorical Variable EXPLORING DATA WITH GRAPHS AND NUMERICAL SUMMARIES

Acquisition Description Exploration Examination Understanding what data is collected. Characterizing properties of data.

Scientific Python: matplotlib

Creating Graphs Using SAS ODS Graphics Designer

Types of Plotting Functions. Managing graphics devices. Further High-level Plotting Functions. The plot() Function

Running Minitab for the first time on your PC

Transcription:

Basics of Plotting Data Luke Chang Last Revised July 16, 2010 One of the strengths of R over other statistical analysis packages is its ability to easily render high quality graphs. R uses vector based output (similar to illustrator), which produces very clean publication quality graphs. In addition, the graphs are fully customizeable, which gives R very powerful graphing capabilities (see http://addictedtor.free.fr/graphiques/ for some examples). This section will cover how to create and save simple graphs (e.g., histogram, scatterplot, barplot, and boxplots) as well as outline some of the basic user options to customize the graphs. 1 The 4 basic types of graphs For this section will be using a subset of the 2009 sample adult health questionnaire dataset from the Centers for Disease Control. The first step is to load the data. > data<-read.table(paste(website,"samadult_mh.csv",sep=""),sep=",", header=true) 1.1 Histograms Histograms provide a useful way to examine the distribution of the data. We use the hist function to plot the distribution of weight in America. > hist(data$aweightp) 1

Histogram of data$aweightp Frequency 0 500 1000 1500 2000 2500 100 150 200 250 300 data$aweightp Figure 1: Histogram of the weight distribution of Americans 2

As you can see creating a histogram of a variable is very simple. Later in this section we will discuss how to customize the graph by labeling the axes, changing the colors, and adding a title. 1.2 Scatterplots Often it is useful to examine the relationship between two variables using a scatterplot. We can use the generic plot function in R to examine the relationship between weight and body mass. There are two different ways to use the function. The first way is to specify the x and y variables in the graph. > plot(data$aweightp,data$bmi) The second way is to specify the linear equation Y = X. > plot(bmi~aweightp,data=data) Both methods produce the same plot. 1.3 Barplots Barplots are a useful way to display statistical summaries of several different variables at once. This can be done using the barplot function in R. Using this method requires first summarizing your data (e.g., mean, median, or mode) and then using the summary values as input into barplot. 1.3.1 Plotting Several Variables To plot the average of several different variables, one simply has to take the mean of the variables. For example, to plot the average response to the mental health questions, we first need to calculate the mean of the data. To calculate the mean it is important to tell R to remove the missing values with na.rm=true. We can then assign these values to a new variable. > mh.data<-mean(data[10:14],na.rm=true) These questions are on a 5 point ordinal scale with 5 being I have never experienced it. To make the graphs more readable, it might help to reverse code the data by subtracting it from 5. 3

100 150 200 250 300 10 20 30 40 50 60 70 80 data$aweightp data$bmi Figure 2: Scatterplot of the relationship between weight and BMI 4

0.0 0.1 0.2 0.3 0.4 0.5 0.6 NERVOUS RESTLESS HOPELESS EFFORT WORTHLS Figure 3: Barplot depicting the average response to the mental health questions > mh.data<-5-mh.data We then plot the summarized data using the barplot function. > barplot(mh.data) 1.3.2 Plotting Levels of a Factor Sometimes you may be interested in creating barplots for each level of a factor when your data is in the long format. For example, to plot the average weight split by sex, we have to calculate the mean weight for each level of sex. To do this we can use the subset function to select the data for each sex. For example, to select the males we tell R to grab the height of all the rows containing males using the logical operation data$sex==1 (i.e., subset(data$aheight,data$sex==1). We can then take the average of these rows making sure that we exclude the missing values (i.e., mean(data,na.rm=true)). Finally, we can combine the mean of the average height of males and females into a single array using the c() function. When plotting this 5

0 10 20 30 40 50 60 Male Female Figure 4: Barplot depicting the average weight of males and females data with the barplot function, we will also need to label the categories with the names.arg=c("male","female") specifier. > barplot(c(mean(subset(data$aheight,data$sex==1),na.rm=true), mean(subset(data$aheight,data$sex==0),na.rm=true)), names.arg=c("male","female")) Unfortunately, for whatever reason it is not straightforward to easily add error bars to these barplots. However, there are many alternative methods to create barplots with errorbars using the barplot2 function from gplots package or ggplot2, which will be discussed in more detail in the advanced graphics section. 6

1.4 Boxplots Another useful way to graphically summarize several different variables is the boxplot function. Boxplots depict the median, the upper 75 and lower 25 quartiles and any outliers. They are useful for displaying the distribution of a variable. Similar to barplots, boxplots can plot several different variables, or different levels of a factor. 1.4.1 Plotting Several Variables Here we will illustrate how to plot the same data as the barplot example using the responses to the mental health questions. Don t forget that we are reverse coding the responses. > boxplot(5-data[10:14]) As you can see, most respondents reported never experiencing any symptoms of mental health problems in the past 30 days. The graph reveals that the median is 0 and the data is highly positively skewed. 1.4.2 Plotting Levels of a Factor Graphically examining the distribution of a variable at different levels of a factor can easily be accomplished using the boxplot function. We simply need to indicate the Y variable and the X factor, here being height and sex respectively. > boxplot(aheight~sex,data=data,names=c("female","male")) 1.5 Saving Graphs By default R will print the graphics to your screen. To print them to a file, you must first tell R to direct the graphic output to a device. There are several different devices including: pdf and png. It is easy to create publication quality pdfs of your figure using the pdf() command. Alternatively, you could also write to other devices such as png(). > pdf("filename.pdf") > dev.off() 7

NERVOUS HOPELESS WORTHLS 0 1 2 3 4 Figure 5: Boxplot depicting the distribution of responses on the mental health questions 8

60 65 70 75 Female Male Figure 6: Boxplot depicting the different distributions of the weight of males and females 9

pdf 2 After you are done creating your plot, it is necessary to close the device using the dev.off() command. For example, to save the scatterplot we created earlier: > pdf("fig4_weightxbmi.pdf") > plot(bmi ~AWEIGHTP,data=data) > dev.off() pdf 2 1.6 Customizing Graphs Graphs can be customized with relative ease in R using the user options. There are a number of different points, colors, and sizes that can be used in your graphs. 1.6.1 Points This example shows all 25 symbols that you can use to produce points in your graphs (pch=?). It is easy to vary the size using the cex=? specifier. It is also easy to vary the color using the col=? specifier. 1.6.2 Customizing the Scatterplot In this section will demonstrate how to customize the appearance of the scatterplot that we created earlier. We will change the points to filled in circles (pch=16) and make them transparent to better reflect the density of the data (col=rgb(0, 0, 0,.007)). We will also make the points slightly bigger (cex=4). We will plot the sex by weight interaction by adding separate regression lines for males and females. We will split the data file into two new files by sex and then calculate a linear regression to find the intercept and slope parameters for each sex. These parameters can then be added on to the graph using the abline function. Line types can be manipulated with the lty=? specifer. We will also add a legend to indicate the color of each sex. 10

0 1 2 3 4 5 6 7 0 1 2 3 4 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 1 2 3 4 5 Figure 7: Variations of point styles 11

Finally, It is easy to relabel the axes (xlab="weight (lbs)") and the size of the text (cex.lab=1.5) as well as add a title (main="interaction Plot"). > plot(bmi~aweightp, data=data,col=rgb(0, 0, 0,.007), pch=16,cex=4, ylab="body Mass Index (BMI)",xlab="Weight (lbs)",cex.lab=1.25, main="interaction between sex and weight in predicting BMI") > m<-subset(data,data$sex==1) > f<-subset(data,data$sex==0) > mline<-lm(bmi~aweightp,data=m) > fline<-lm(bmi~aweightp,data=f) > abline(mline,col="blue",lwd=6,lty=2) > abline(fline,col="red",lwd=6,lty=3) > legend("topright", c("female","male"), pch=15, col=c("red","blue"), title="sex", inset =.02) 12

80 Interaction between sex and weight in predicting BMI Sex 60 50 40 30 10 20 Body Mass Index (BMI) 70 Female Male 100 150 200 250 300 Weight (lbs) Figure 8: Scatterplot depicting degree of association between weight and BMI 13