Lab5A - Intro to GGPLOT2 Z.Sang Sept 24, 2018


In this lab you will learn to visualize raw data by plotting exploratory graphics with the ggplot2 package. Unlike final graphs for a publication or thesis, exploratory graphics are usually made very quickly while checking for errors, outliers, distributions, and correlations among variables. The goal is usually to develop a personal understanding of the data and to prioritize tasks for follow-up analysis.

Grammar of ggplot2

ggplot2, another important package of the tidyverse, is designed for visualizing data frames. The "gg" in the name stands for "grammar of graphics", and ggplot2 is recognized as one of the three main graphics systems of R.

The most important thing to get used to with ggplot2 is the logical structure of plots. The code you write specifies the connections between the variables in your data and the x and y locations, colors, sizes, shapes, etc. that you see on the screen. In ggplot2, these logical connections between your data and the plot elements are called aesthetic mappings, or just aesthetics. You begin every plot by telling the ggplot() function what your data is and how the variables in this data map onto the plot's aesthetics. Then you take the result and say what general sort of plot you want, such as a scatterplot, a boxplot, or a bar chart. In ggplot2, the overall type of plot is called a geom. Each geom has a function that creates it, and the function's name follows the pattern geom_...(). For example, geom_point() makes scatterplots, geom_bar() makes bar plots, geom_boxplot() makes boxplots, and so on. You combine these two pieces, the ggplot(data, mapping) object and the geom_...(), by literally adding them together in an expression, using the + symbol.

Data, mapping (aesthetics), and geometry (geom) are the three mandatory components of a ggplot2 plot. As with other functions, the output of ggplot() can be assigned to an object for further editing. Other optional ggplot2 grammar components will be introduced in a later lab on figure customization.
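The three mandatory components can be sketched in a few lines. This is a minimal illustration, not part of the lab data: it assumes the built-in mtcars data set as a stand-in, and the object names p and p_points are chosen here for illustration only.

```r
library(ggplot2)

# mtcars ships with R; it stands in for the lab's weather-station data.
# Step 1: data + aesthetic mapping (no geom yet, so nothing is drawn).
p <- ggplot(data = mtcars, mapping = aes(x = wt, y = mpg))

# Step 2: add a geom with the + symbol to say what kind of plot to draw.
p_points <- p + geom_point()

# The result is an ordinary R object: it can be stored, extended, and printed.
p_points
```

Because p holds only the data and the mapping, the same object can be reused with a different geom, for example p + geom_line().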

A little too complex? Don't worry; you will get familiar with the grammar very soon. In this lab, we will use this ggplot2 syntax to plot the following exploratory graphics: histograms (and density plots), boxplots, scatterplots, and scatterplot matrices.

Data preparation

For this exercise, use the weather station dataset AB_Stations.csv, which you can download from the course website. The first three columns specify the weather station ID, the ecosystem, and the biome of Alberta in which the weather station is located. These are followed by a number of climate variables that you can use for exploration, including mean annual temperature, mean warmest month temperature (MWMT), mean coldest month temperature (MCMT), mean annual precipitation (MAP), mean summer precipitation (MSP), and an index.

Load the required packages.

    # install.packages('tidyverse')  # run once if tidyverse is not installed
    library(tidyverse)

Import the dataset with the code below, and use the head(), tail(), str(), or View() functions to check the imported data table.

    dat1 <- read.csv("e:\\lab3\\ab_stations.csv")
    head(dat1, 10)
    ## STATION ECOSYS MWMT MCMT MAP MSP
    ## G-NF
    ## G-DMG
    ## G-MG
    ## G-DMG
    ## G-NF
    ## B-AP
    ## B-UBH
    ## M-M
    ## B-CP
    ## B-KU

Histograms

One useful plot type for exploring raw data is the histogram. Histograms are commonly used to visually check the distribution of continuous variables. The geom for a histogram is geom_histogram(). By default, the y axis of a ggplot2 histogram counts the number of observations in each bin, but y can also be set to density. Following the ggplot2 syntax, we can execute the following command to get a histogram of a single variable:

    hist_a <- ggplot(dat1, aes(x = )) + geom_histogram(color = 'gray9')
    hist_a
    ## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
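The message above appears whenever neither bins nor binwidth is given. As a minimal sketch of the two ways to set them explicitly, the block below uses a synthetic data frame; the names toy, value, h_bins, and h_width are hypothetical and not part of the lab data.

```r
library(ggplot2)

set.seed(1)
# Hypothetical stand-in for one continuous climate variable.
toy <- data.frame(value = runif(200, min = 0, max = 10))

# Option 1: fix the number of bins.
h_bins <- ggplot(toy, aes(x = value)) +
  geom_histogram(bins = 10, color = "gray90")

# Option 2: fix the width of each bin, in the units of the x variable.
h_width <- ggplot(toy, aes(x = value)) +
  geom_histogram(binwidth = 2, color = "gray90")

# layer_data() returns the computed bins, one row per bin.
nrow(layer_data(h_bins))
h_width
```

Setting either argument also silences the `stat_bin()` message, since ggplot2 no longer has to fall back on its default.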

ggplot2 chooses the bins by default when generating histograms, but chances are that this default is not the most appropriate one for the histogram you want to make. It is therefore critical to vary the bins to verify that the resulting histogram reflects the data accurately. Too many bins make a histogram overly peaky and obscure the overall shape of the distribution, while too few bins hide its details. There are two ways to change the bins:

1. First method: set the number of bins you want for the histogram:

    ggplot(dat1, aes(x = )) + geom_histogram(bins =, color = 'gray9') # bins
    ggplot(dat1, aes(x = )) + geom_histogram(bins = 2, color = 'gray9') # 2 bins
    ggplot(dat1, aes(x = )) + geom_histogram(bins =, color = 'gray9') # bins

2. Second method: set the width of the bins:

    ggplot(dat1, aes(x = )) + geom_histogram(binwidth = 1, color = 'gray9')
    ggplot(dat1, aes(x = )) + geom_histogram(binwidth =, color = 'gray9')
    ggplot(dat1, aes(x = )) + geom_histogram(binwidth = 1, color = 'gray9')

Histograms are also great for visually checking the effectiveness of data transformations. In this case, the square-root transformation achieves an approximately normal distribution.

    hist_b <- ggplot(dat1, aes(x = sqrt())) + geom_histogram(color = 'gray9')
    hist_b
    ## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

You can also fill the bins with colors by group or class. In many scenarios we have multiple distributions that we would like to visualize simultaneously. For example, do the biomes have similar dryness? One commonly employed strategy is to stack the bars on top of each other and fill the histogram with a different color for each group:

    hist_c <- ggplot(dat1, aes(x =, fill = )) + geom_histogram()

Although counts are used for the y axis by default, you can change the y axis to density. When sample sizes differ among groups, density histograms may show a different pattern from the frequency ones.

    hist_d <- ggplot(dat1, aes(x =, fill = )) + geom_histogram(aes(y =..density..)) # specify y as density
    hist_c
    hist_d
    ## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
    ## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

The biggest disadvantage of the stacked histogram is that each group is hard to quantify: it is difficult to read off how many samples of a given group fall around a particular value, so comparing distributions among groups is not easy. One way to solve this is to change the positions of the bars. A common choice is dodging, which preserves the vertical position of a geom while adjusting its horizontal position.

    hist_e <- ggplot(dat1, aes(x =, fill = )) + geom_histogram(position = 'dodge') # change bar positions
    hist_e
    ## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
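The stacked, density-scaled, and dodged variants described above can be compared on a small synthetic example. This is a sketch under stated assumptions: the data frame toy and its columns group and value are hypothetical, and after_stat(density) is used as the newer spelling of ..density.. (available in ggplot2 3.3.0 and later).

```r
library(ggplot2)

set.seed(42)
# Hypothetical data: two groups with unequal sample sizes.
toy <- data.frame(
  group = rep(c("A", "B"), times = c(150, 50)),
  value = c(rnorm(150, mean = 4), rnorm(50, mean = 5))
)

base <- ggplot(toy, aes(x = value, fill = group))

# Default position: bars of the two groups are stacked, y shows counts.
h_stack <- base + geom_histogram(bins = 20)

# Dodged: bars of the two groups sit side by side and all start at zero.
h_dodge <- base + geom_histogram(bins = 20, position = "dodge")

# Density: each group's bars are rescaled so that their area sums to 1,
# which removes the effect of unequal sample sizes.
h_dens <- base + geom_histogram(aes(y = after_stat(density)), bins = 20)

h_dodge
```

In the stacked version the bars of one group float on top of the other, which is what makes individual group counts hard to read; in the dodged version every bar starts at the baseline.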

Histograms have been a popular visualization option since at least the 18th century, in part because they are easily generated by hand. More recently, as extensive computing power has become available in everyday devices such as laptops and cell phones, they are increasingly being replaced by density plots. In a density plot, we attempt to visualize the underlying probability distribution of the data by drawing an appropriate continuous curve:

    hist_f <- ggplot(dat1, aes(x =, fill = )) + geom_density(alpha =.4) # introduce transparency
    hist_f

As before, we fill the density curves with different colors. The alpha argument controls the transparency of the fill color; its value ranges from 0 (fully transparent) to 1 (fully opaque). You can also try adding multiple geoms:

    hist_f + geom_histogram(aes(y =..density..), alpha =.6, position = 'dodge')
    ## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
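A density curve and a histogram can share one plot as long as both use the density scale on the y axis. The sketch below illustrates this on synthetic data; the names toy, group, value, dens, and dens_hist are hypothetical, and after_stat(density) again assumes ggplot2 3.3.0 or later.

```r
library(ggplot2)

set.seed(42)
# Hypothetical data: two groups of equal size with shifted means.
toy <- data.frame(
  group = rep(c("A", "B"), each = 100),
  value = c(rnorm(100, mean = 4), rnorm(100, mean = 6))
)

# alpha = 0.4 keeps overlapping curves visible through each other.
dens <- ggplot(toy, aes(x = value, fill = group)) +
  geom_density(alpha = 0.4)

# A second geom is added with +; the histogram is put on the density
# scale so that both layers are comparable on the same y axis.
dens_hist <- dens +
  geom_histogram(aes(y = after_stat(density)), bins = 20,
                 alpha = 0.6, position = "dodge")
dens_hist
```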

Scatter plots

With scatter plots you can visually check the relationships among variables. Are they linear or curvilinear? Outliers are also easily visible. Now, use a scatter plot to check the relationships among variables and to identify outliers. To check the relationship between mean summer precipitation (MSP) and mean annual precipitation (MAP), we map MAP to x and MSP to y (normally the y axis is for the dependent variable and the x axis for the independent variable, but in this case it is fine to swap the axes). The geom function for a scatter plot is geom_point():

    ggplot(dat1, aes(x = MAP, y = MSP)) + geom_point()

Cool! There seems to be a positive relationship between these two variables. However, overlapping points can make the plot harder to interpret. One easy remedy is to make the points semi-transparent.

    plt <- ggplot(dat1, aes(x = MAP, y = MSP)) + geom_point(alpha =.3)
    plt
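The effect of transparency is easiest to see when many points share exactly the same coordinates. The following sketch builds such a case from synthetic numbers; the values are made up for illustration and only borrow the column names MAP and MSP from the lab, and toy, p_opaque, and p_alpha are hypothetical names.

```r
library(ggplot2)

set.seed(3)
# Hypothetical precipitation-like data, rounded to the nearest 10 so
# that many observations fall on exactly the same location.
toy <- data.frame(MAP = round(rnorm(500, mean = 500, sd = 60), -1))
toy$MSP <- round(0.6 * toy$MAP + rnorm(500, sd = 20), -1)

# Opaque points hide how many observations share a location.
p_opaque <- ggplot(toy, aes(x = MAP, y = MSP)) + geom_point()

# With alpha < 1, locations holding several points appear darker.
p_alpha <- ggplot(toy, aes(x = MAP, y = MSP)) + geom_point(alpha = 0.3)
p_alpha
```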

Besides changing the transparency, changing the point positions by (counterintuitively) adding random noise can help you see each point.

    plt_jittered <- ggplot(dat1, aes(x = MAP, y = MSP)) + geom_point(position = "jitter")
    plt_jittered

You can also add labels to your plot with the geom function geom_text(). In this case, we want to label each point with its STATION name. The hjust and vjust arguments control the placement of the labels.

    plt_label <- ggplot(dat1, aes(x = MAP, y = MSP, label = STATION)) + geom_point() + geom_text(hjust =, vjust =, size = 2.2, color = 'gray4')
    plt_label

Can you tell the STATION ID of the two outliers near the lower right corner of the plot? However, do all types follow the same relationship between MAP and MSP? To find out, we need to add a visual aid (e.g., color or shape) that separates these types.

    plt_biome <- ggplot(dat1, aes(x = MAP, y = MSP, color =, shape = )) + geom_point()
    plt_biome

How about a density plot in two dimensions? Try:

    plt + geom_density2d()

Box plots

Just like scatter plots, boxplots are a good way to visually check the relationship between two variables. If one variable is continuous (mapped to y) and the other is categorical (mapped to x), a boxplot is a good option. For instance, we can use one to understand the general distribution of mean annual temperature within each biome type. The geom for a boxplot is geom_boxplot():

    ggplot(dat1, aes(x =, y = )) + geom_boxplot()
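As a minimal sketch of the categorical-x, continuous-y pattern, the block below draws a boxplot from synthetic data and compares the line inside each box with the group median computed directly. The names toy, group, value, and bp are hypothetical.

```r
library(ggplot2)

set.seed(11)
# Hypothetical data: a continuous response measured in three groups.
toy <- data.frame(
  group = rep(c("A", "B", "C"), each = 40),
  value = c(rnorm(40, mean = 2), rnorm(40, mean = 4), rnorm(40, mean = 3))
)

# Categorical variable on x, continuous variable on y.
bp <- ggplot(toy, aes(x = group, y = value)) + geom_boxplot()
bp

# The line inside each box is the group median: compare the value that
# ggplot2 computed (column "middle") with median() applied per group.
layer_data(bp)$middle
tapply(toy$value, toy$group, median)
```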

If you still have time, add the following arguments within the parentheses of geom_boxplot() and see what they do: varwidth = TRUE, notch = TRUE.

    ggplot(dat1, aes(x =, y = )) + geom_boxplot(varwidth = TRUE) # box width reflects the number of observations in each group
    ggplot(dat1, aes(x =, y = )) + geom_boxplot(notch = TRUE) # add notches to the boxes
    ## notch went outside hinges. Try setting notch=FALSE.

Similarly, we can make a boxplot of mean annual temperature for each ecosystem (ECOSYS).

    ggplot(dat1, aes(x = ECOSYS, y = )) + geom_boxplot()

Since the ecosystem names take up space and can easily overlap, we prefer to put ECOSYS on the y axis and draw horizontal boxplots:

    ggplot(dat1, aes(x = ECOSYS, y = )) + geom_boxplot() + coord_flip() # horizontal: flip the x and y axes

Great! Now you can color the boxplots by type:

    ggplot(dat1, aes(x = ECOSYS, y =, fill = )) + geom_boxplot(varwidth = T) + coord_flip() # colored by group

Boxplots are generally useful, but they show only five numbers for each sample (the minimum, the maximum, and the 25th, 50th, and 75th percentiles). To add more detail about the distribution, we can add jittered points or use a violin plot as an alternative.

    ggplot(dat1, aes(x =, y = )) + geom_violin() + geom_boxplot(width =.1)
    ggplot(dat1, aes(x =, y = )) + geom_boxplot() + geom_point(position = 'jitter', alpha =.2, size = 2)

The first command narrows the boxplots and draws them inside the violin plots, and the second one adds scattered points on top of the boxplots. Well done!

So far we have analyzed only one continuous variable at a time. Can we visualize multiple variables in one plot? Recall that in lab2b we applied the gather() function to transform a data frame from wide to long. In ggplot2, x and y must each be determined by a single variable. Therefore, we first need to gather the variables of interest into one column, and then use the new data table for plotting. For example, let's make a boxplot to check the distributions of three temperature variables: mean annual temperature, mean warmest month temperature (MWMT), and mean coldest month temperature (MCMT).

    dat2 <- gather(dat1, key = 'temp', value = 'value',, MCMT, MWMT)
    head(dat2) # quick check of the new data table
    ## STATION ECOSYS MAP MSP temp value
    ## G-NF
    ## G-DMG
    ## G-MG
    ## G-DMG
    ## G-NF
    ## B-AP
    ggplot(dat2, aes(x = temp, y = value, fill = )) + geom_boxplot() # use different colors for types
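The wide-to-long step can be tried on a small synthetic table before applying it to the real data. This is a sketch under stated assumptions: the data frame wide and its GROUP column are hypothetical, the temperature values are random numbers, and pivot_longer() is shown only as the newer tidyr spelling of the same reshaping (tidyr 1.0.0 and later).

```r
library(ggplot2)
library(tidyr)

set.seed(7)
# Hypothetical wide table: one row per station, one column per variable.
wide <- data.frame(
  STATION = 1:30,
  GROUP   = rep(c("A", "B", "C"), each = 10),
  MWMT    = rnorm(30, mean = 16, sd = 2),
  MCMT    = rnorm(30, mean = -12, sd = 3)
)

# gather() stacks the selected columns: their names go into the key
# column ("temp") and their contents into the value column ("value").
long <- gather(wide, key = "temp", value = "value", MWMT, MCMT)

# pivot_longer() performs the same reshaping with the newer interface.
long2 <- pivot_longer(wide, cols = c(MWMT, MCMT),
                      names_to = "temp", values_to = "value")

# One variable now determines x (temp) and one determines y (value).
ggplot(long, aes(x = temp, y = value, fill = GROUP)) +
  geom_boxplot()
```

Each station appears once per gathered variable in the long table, so the row count doubles here while the STATION and GROUP columns are repeated.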

Looks nice! If you think of the temperature variables as treatments, then differing responses among and within groups are a strong clue of interaction.

Multi-panel scatter plots in R

So far in this lab we have learned 1-dimensional (histogram, density plot) and 2-dimensional (scatter plot, boxplot) exploratory graphics, which normally analyze only one variable or one pair of variables at a time. If you have a data table with many potential independent variables, plotting them one by one is not efficient. To get a general idea of the relationships among variables in a very short time:

    # requires the ggpairs() function of the GGally package
    # install.packages('GGally')
    library(GGally)
    ggpairs(dat1[, c('', 'MAP', 'MSP', '', '')], aes(color = ))

Voilà. Now you can see the plot matrix for MAP, MSP, and the other selected variables, with different colors distinguishing the types.
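A self-contained version of the same idea can be run on a built-in data set. The sketch below assumes the GGally package is installed and uses iris as a stand-in for the weather-station table; the object name pm is hypothetical.

```r
library(ggplot2)
library(GGally)

# iris ships with R and stands in for the weather-station table.
# columns selects the variables to compare pairwise; the mapping
# colors every panel by the grouping variable.
pm <- ggpairs(iris, columns = 1:4, mapping = aes(color = Species))
pm
```

Selecting the columns through the columns argument, rather than subsetting the data frame first, keeps the grouping variable available for the color mapping.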

Statistical transformations

Statistical transformations Statistical transformations Next, let s take a look at a bar chart. Bar charts seem simple, but they are interesting because they reveal something subtle about plots. Consider a basic bar chart, as drawn

More information

Visualizing Data: Customization with ggplot2

Visualizing Data: Customization with ggplot2 Visualizing Data: Customization with ggplot2 Data Science 1 Stanford University, Department of Statistics ggplot2: Customizing graphics in R ggplot2 by RStudio s Hadley Wickham and Winston Chang offers

More information

Rstudio GGPLOT2. Preparations. The first plot: Hello world! W2018 RENR690 Zihaohan Sang

Rstudio GGPLOT2. Preparations. The first plot: Hello world! W2018 RENR690 Zihaohan Sang Rstudio GGPLOT2 Preparations There are several different systems for creating data visualizations in R. We will introduce ggplot2, which is based on Leland Wilkinson s Grammar of Graphics. The learning

More information

The diamonds dataset Visualizing data in R with ggplot2

The diamonds dataset Visualizing data in R with ggplot2 Lecture 2 STATS/CME 195 Matteo Sesia Stanford University Spring 2018 Contents The diamonds dataset Visualizing data in R with ggplot2 The diamonds dataset The tibble package The tibble package is part

More information

Facets and Continuous graphs

Facets and Continuous graphs Facets and Continuous graphs One way to add additional variables is with aesthetics. Another way, particularly useful for categorical variables, is to split your plot into facets, subplots that each display

More information

Plotting with Rcell (Version 1.2-5)

Plotting with Rcell (Version 1.2-5) Plotting with Rcell (Version 1.2-) Alan Bush October 7, 13 1 Introduction Rcell uses the functions of the ggplots2 package to create the plots. This package created by Wickham implements the ideas of Wilkinson

More information

Data visualization with ggplot2

Data visualization with ggplot2 Data visualization with ggplot2 Visualizing data in R with the ggplot2 package Authors: Mateusz Kuzak, Diana Marek, Hedi Peterson, Dmytro Fishman Disclaimer We will be using the functions in the ggplot2

More information

Introduction to R and the tidyverse. Paolo Crosetto

Introduction to R and the tidyverse. Paolo Crosetto Introduction to R and the tidyverse Paolo Crosetto Lecture 1: plotting Before we start: Rstudio Interactive console Object explorer Script window Plot window Before we start: R concatenate: c() assign:

More information

Introduction to Graphics with ggplot2

Introduction to Graphics with ggplot2 Introduction to Graphics with ggplot2 Reaction 2017 Flavio Santi Sept. 6, 2017 Flavio Santi Introduction to Graphics with ggplot2 Sept. 6, 2017 1 / 28 Graphics with ggplot2 ggplot2 [... ] allows you to

More information

ggplot2 for beginners Maria Novosolov 1 December, 2014

ggplot2 for beginners Maria Novosolov 1 December, 2014 ggplot2 for beginners Maria Novosolov 1 December, 214 For this tutorial we will use the data of reproductive traits in lizards on different islands (found in the website) First thing is to set the working

More information

2. Rotation & Principal Component Analysis

2. Rotation & Principal Component Analysis 2. Rotation & Principal Component Analysis Many classical multivariate techniques rely on rotating a dataset in multiple dimensions and then looking at the results through a 2-dimensional window (e.g.

More information

Package ggextra. April 4, 2018

Package ggextra. April 4, 2018 Package ggextra April 4, 2018 Title Add Marginal Histograms to 'ggplot2', and More 'ggplot2' Enhancements Version 0.8 Collection of functions and layers to enhance 'ggplot2'. The flagship function is 'ggmarginal()',

More information

User manual forggsubplot

User manual forggsubplot User manual forggsubplot Garrett Grolemund September 3, 2012 1 Introduction ggsubplot expands the ggplot2 package to help users create multi-level plots, or embedded plots." Embedded plots embed subplots

More information

Lecture 4: Data Visualization I

Lecture 4: Data Visualization I Lecture 4: Data Visualization I Data Science for Business Analytics Thibault Vatter Department of Statistics, Columbia University and HEC Lausanne, UNIL 11.03.2018 Outline 1 Overview

More information

ggplot2 for Epi Studies Leah McGrath, PhD November 13, 2017

ggplot2 for Epi Studies Leah McGrath, PhD November 13, 2017 ggplot2 for Epi Studies Leah McGrath, PhD November 13, 2017 Introduction Know your data: data exploration is an important part of research Data visualization is an excellent way to explore data ggplot2

More information

Data Visualization. Module 7

Data Visualization.  Module 7 Data Visualization http://datascience.tntlab.org Module 7 Today s Agenda A Brief Reminder to Update your Software A walkthrough of ggplot2 Big picture New cheatsheet, with some familiar caveats Geometric

More information

An Introduction to R Graphics

An Introduction to R Graphics An Introduction to R Graphics PnP Group Seminar 25 th April 2012 Why use R for graphics? Fast data exploration Easy automation and reproducibility Create publication quality figures Customisation of almost

More information

Data Science Essentials

Data Science Essentials Data Science Essentials Lab 2 Working with Summary Statistics Overview In this lab, you will learn how to use either R or Python to compute and understand the basics of descriptive statistics. Descriptive

More information

Install RStudio from - use the standard installation.

Install RStudio from   - use the standard installation. Session 1: Reading in Data Before you begin: Install RStudio from http://www.rstudio.com/ide/download/ - use the standard installation. Go to the course website; http://faculty.washington.edu/kenrice/rintro/

More information

Large data. Hadley Wickham. Assistant Professor / Dobelman Family Junior Chair Department of Statistics / Rice University.

Large data. Hadley Wickham. Assistant Professor / Dobelman Family Junior Chair Department of Statistics / Rice University. Large data Hadley Wickham Assistant Professor / Dobelman Family Junior Chair Department of Statistics / Rice University November 2010 1. The diamonds data 2. Histograms and bar charts 3. Frequency polygons

More information

Data Science and Machine Learning Essentials

Data Science and Machine Learning Essentials Data Science and Machine Learning Essentials Lab 3A Visualizing Data By Stephen Elston and Graeme Malcolm Overview In this lab, you will learn how to use R or Python to visualize data. If you intend to

More information

Intro to R for Epidemiologists

Intro to R for Epidemiologists Lab 9 (3/19/15) Intro to R for Epidemiologists Part 1. MPG vs. Weight in mtcars dataset The mtcars dataset in the datasets package contains fuel consumption and 10 aspects of automobile design and performance

More information

Data Visualization in R

Data Visualization in R Data Visualization in R L. Torgo ltorgo@fc.up.pt Faculdade de Ciências / LIAAD-INESC TEC, LA Universidade do Porto Aug, 2017 Introduction Motivation for Data Visualization Humans are outstanding at detecting

More information

Data Visualization Using R & ggplot2. Karthik Ram October 6, 2013

Data Visualization Using R & ggplot2. Karthik Ram October 6, 2013 Data Visualization Using R & ggplot2 Karthik Ram October 6, 2013 Some housekeeping Install some packages install.packages("ggplot2", dependencies = TRUE) install.packages("plyr") install.packages("ggthemes")

More information

A set of rules describing how to compose a 'vocabulary' into permissible 'sentences'

A set of rules describing how to compose a 'vocabulary' into permissible 'sentences' Lecture 8: The grammar of graphics STAT598z: Intro. to computing for statistics Vinayak Rao Department of Statistics, Purdue University Grammar? A set of rules describing how to compose a 'vocabulary'

More information

Data Visualization in R

Data Visualization in R Data Visualization in R L. Torgo ltorgo@fc.up.pt Faculdade de Ciências / LIAAD-INESC TEC, LA Universidade do Porto Oct, 216 Introduction Motivation for Data Visualization Humans are outstanding at detecting

More information

Econ 2148, spring 2019 Data visualization

Econ 2148, spring 2019 Data visualization Econ 2148, spring 2019 Maximilian Kasy Department of Economics, Harvard University 1 / 43 Agenda One way to think about statistics: Mapping data-sets into numerical summaries that are interpretable by

More information

The following presentation is based on the ggplot2 tutotial written by Prof. Jennifer Bryan.

The following presentation is based on the ggplot2 tutotial written by Prof. Jennifer Bryan. Graphics Agenda Grammer of Graphics Using ggplot2 The following presentation is based on the ggplot2 tutotial written by Prof. Jennifer Bryan. ggplot2 (wiki) ggplot2 is a data visualization package Created

More information

Getting started with ggplot2

Getting started with ggplot2 Getting started with ggplot2 STAT 133 Gaston Sanchez Department of Statistics, UC Berkeley gastonsanchez.com github.com/gastonstat/stat133 Course web: gastonsanchez.com/stat133 ggplot2 2 Resources for

More information

Data Science and Machine Learning Essentials

Data Science and Machine Learning Essentials Data Science and Machine Learning Essentials Lab 3B Building Models in Azure ML By Stephen Elston and Graeme Malcolm Overview In this lab, you will learn how to use R or Python to engineer or construct

More information

An introduction to ggplot: An implementation of the grammar of graphics in R

An introduction to ggplot: An implementation of the grammar of graphics in R An introduction to ggplot: An implementation of the grammar of graphics in R Hadley Wickham 00-0-7 1 Introduction Currently, R has two major systems for plotting data, base graphics and lattice graphics

More information

Importing and visualizing data in R. Day 3

Importing and visualizing data in R. Day 3 Importing and visualizing data in R Day 3 R data.frames Like pandas in python, R uses data frame (data.frame) object to support tabular data. These provide: Data input Row- and column-wise manipulation

More information

1 The ggplot2 workflow

1 The ggplot2 workflow ggplot2 @ statistics.com Week 2 Dope Sheet Page 1 dope, n. information especially from a reliable source [the inside dope]; v. figure out usually used with out; adj. excellent 1 This week s dope This week

More information

LAB 1 INSTRUCTIONS DESCRIBING AND DISPLAYING DATA

LAB 1 INSTRUCTIONS DESCRIBING AND DISPLAYING DATA LAB 1 INSTRUCTIONS DESCRIBING AND DISPLAYING DATA This lab will assist you in learning how to summarize and display categorical and quantitative data in StatCrunch. In particular, you will learn how to

More information

Introduction to Data Visualization

Introduction to Data Visualization Introduction to Data Visualization Author: Nicholas G Reich This material is part of the statsteachr project Made available under the Creative Commons Attribution-ShareAlike 3.0 Unported License: http://creativecommons.org/licenses/by-sa/3.0/deed.en

More information

BIO 360: Vertebrate Physiology Lab 9: Graphing in Excel. Lab 9: Graphing: how, why, when, and what does it mean? Due 3/26

BIO 360: Vertebrate Physiology Lab 9: Graphing in Excel. Lab 9: Graphing: how, why, when, and what does it mean? Due 3/26 Lab 9: Graphing: how, why, when, and what does it mean? Due 3/26 INTRODUCTION Graphs are one of the most important aspects of data analysis and presentation of your of data. They are visual representations

More information

EXPLORATORY DATA ANALYSIS. Introducing the data

EXPLORATORY DATA ANALYSIS. Introducing the data EXPLORATORY DATA ANALYSIS Introducing the data Email data set > email # A tibble: 3,921 21 spam to_multiple from cc sent_email time image 1 not-spam 0 1 0 0

More information

social data science Data Visualization Sebastian Barfort August 08, 2016 University of Copenhagen Department of Economics 1/86

social data science Data Visualization Sebastian Barfort August 08, 2016 University of Copenhagen Department of Economics 1/86 social data science Data Visualization Sebastian Barfort August 08, 2016 University of Copenhagen Department of Economics 1/86 Who s ahead in the polls? 2/86 What values are displayed in this chart? 3/86

More information

8. MINITAB COMMANDS WEEK-BY-WEEK

8. MINITAB COMMANDS WEEK-BY-WEEK 8. MINITAB COMMANDS WEEK-BY-WEEK In this section of the Study Guide, we give brief information about the Minitab commands that are needed to apply the statistical methods in each week s study. They are

More information

EXST 7014, Lab 1: Review of R Programming Basics and Simple Linear Regression

EXST 7014, Lab 1: Review of R Programming Basics and Simple Linear Regression EXST 7014, Lab 1: Review of R Programming Basics and Simple Linear Regression OBJECTIVES 1. Prepare a scatter plot of the dependent variable on the independent variable 2. Do a simple linear regression

More information

STA 570 Spring Lecture 5 Tuesday, Feb 1

STA 570 Spring Lecture 5 Tuesday, Feb 1 STA 570 Spring 2011 Lecture 5 Tuesday, Feb 1 Descriptive Statistics Summarizing Univariate Data o Standard Deviation, Empirical Rule, IQR o Boxplots Summarizing Bivariate Data o Contingency Tables o Row

More information

SAS Visual Analytics 8.2: Working with Report Content

SAS Visual Analytics 8.2: Working with Report Content SAS Visual Analytics 8.2: Working with Report Content About Objects After selecting your data source and data items, add one or more objects to display the results. SAS Visual Analytics provides objects

More information

Regression III: Advanced Methods

Regression III: Advanced Methods Lecture 3: Distributions Regression III: Advanced Methods William G. Jacoby Michigan State University Goals of the lecture Examine data in graphical form Graphs for looking at univariate distributions

More information

Data Science and Machine Learning Essentials

Data Science and Machine Learning Essentials Data Science and Machine Learning Essentials Lab 3C Evaluating Models in Azure ML By Stephen Elston and Graeme Malcolm Overview In this lab, you will learn how to evaluate and improve the performance of

More information

Data Analyst Nanodegree Syllabus

Data Analyst Nanodegree Syllabus Data Analyst Nanodegree Syllabus Discover Insights from Data with Python, R, SQL, and Tableau Before You Start Prerequisites : In order to succeed in this program, we recommend having experience working

More information

Acquisition Description Exploration Examination Understanding what data is collected. Characterizing properties of data.

Acquisition Description Exploration Examination Understanding what data is collected. Characterizing properties of data. Summary Statistics Acquisition Description Exploration Examination what data is collected Characterizing properties of data. Exploring the data distribution(s). Identifying data quality problems. Selecting

More information

Package ggsubplot. February 15, 2013

Package ggsubplot. February 15, 2013 Package ggsubplot February 15, 2013 Maintainer Garrett Grolemund License GPL Title Explore complex data by embedding subplots within plots. LazyData true Type Package Author Garrett

More information

Introductory Tutorial: Part 1 Describing Data

Introductory Tutorial: Part 1 Describing Data Introductory Tutorial: Part 1 Describing Data Introduction Welcome to this R-Instat introductory tutorial. R-Instat is a free, menu driven statistics software powered by R. It is designed to exploit the

More information

Creating elegant graphics in R with ggplot2

Creating elegant graphics in R with ggplot2 Creating elegant graphics in R with ggplot2 Lauren Steely Bren School of Environmental Science and Management University of California, Santa Barbara What is ggplot2, and why is it so great? ggplot2 is

More information

Understanding and Comparing Distributions. Chapter 4

Understanding and Comparing Distributions. Chapter 4 Understanding and Comparing Distributions Chapter 4 Objectives: Boxplot Calculate Outliers Comparing Distributions Timeplot The Big Picture We can answer much more interesting questions about variables

More information
