Lab5A - Intro to GGPLOT2 Z.Sang Sept 24, 2018
In this lab you will learn to visualize raw data by plotting exploratory graphics with the ggplot2 package. Unlike final graphs for a publication or thesis, exploratory graphics are usually made very quickly while checking the data for errors, outliers, distributions, and correlations between variables. The goal is usually to develop a personal understanding of the data and to prioritize tasks for follow-up analysis.

Grammar of ggplot2

ggplot2, another important package of the tidyverse, is designed for visualizing data frames. The "gg" in its name stands for "grammar of graphics", and ggplot2 is recognized as one of the three main graphics systems in R (alongside base graphics and lattice).

The most important thing to get used to with ggplot2 is the logical structure of plots. The code you write specifies the connections between the variables in your data and the x and y locations, colors, sizes, shapes, etc. that you see on the screen. In ggplot2, these logical connections between your data and the plot elements are called aesthetic mappings, or just aesthetics.

You begin every plot by telling the ggplot() function what your data is and how the variables in the data map onto the plot's aesthetics. Then you say what general sort of plot you want, such as a scatterplot, a boxplot, or a bar chart. In ggplot2, the overall type of plot is called a geom. Each geom has a function that creates it, and the function's name follows the pattern geom_...(). For example, geom_point() makes scatterplots, geom_bar() makes bar charts, geom_boxplot() makes boxplots, and so on. You combine the two pieces, the ggplot(data, mapping) object and the geom_...() function, by literally adding them together with the + symbol.

Data, mapping (aesthetics), and geometry (geom) are the three mandatory components of a ggplot2 plot. As with other functions, the output of ggplot() can be assigned to an object for further editing.
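A minimal sketch of the three mandatory components and of editing a saved plot object. Because AB_Stations.csv is not available here, the built-in mtcars data frame stands in for dat1; the title and axis labels are illustrative only.

```r
library(ggplot2)

# Data + mapping: mtcars (built into R) stands in for the lab's dat1
base_plot <- ggplot(mtcars, aes(x = wt, y = mpg))

# Data + mapping + geom: adding geom_point() with + makes a scatterplot
scatter <- base_plot + geom_point()

# The plot is an ordinary R object, so it can be edited later,
# e.g. by adding a title and axis labels
scatter <- scatter + labs(title = "Fuel economy vs. weight",
                          x = "Weight (1000 lbs)", y = "Miles per gallon")

# layer_data() returns what ggplot2 computed for the first layer:
# one row per point, with the mapped x and y columns
points_drawn <- layer_data(scatter)
nrow(points_drawn)  # 32, one row per car in mtcars
```

Printing the object (typing scatter, or print(scatter)) draws it; base_plot alone has data and mapping but no geom, so it would draw an empty panel.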
Other optional ggplot2 grammar components will be introduced in a later lab on figure customization.
A little too complex? Don't worry; you will get familiar with the grammar very soon. In this lab, we will use the ggplot2 syntax to make the following exploratory graphics: histograms (and density plots), boxplots, scatterplots, and scatterplot matrices.

Data preparation

For this exercise, use the weather station dataset AB_Stations.csv, which you can download from the course website. The first three columns give the weather station ID and the ecosystem and biome of Alberta in which the station is located. They are followed by a number of climate variables that you can use for exploration (= mean annual temperature, MWMT = mean warmest month temperature, MCMT = mean coldest month temperature, MAP = mean annual precipitation, MSP = mean summer precipitation, = an index).

Load the required packages.

#install.packages('tidyverse') # run this once if the tidyverse package is not installed
library(tidyverse)

Import the dataset with the code below, and use the head(), tail(), str() or View() functions to check the imported data table.

dat1 <- read.csv("e:\\lab3\\ab_stations.csv")
head(dat1, 10)
## STATION ECOSYS MWMT MCMT MAP MSP
## G-NF
## G-DMG
## G-MG
## G-DMG
## G-NF
## B-AP
## B-UBH
## M-M
## B-CP
## B-KU

Histograms

One useful plot type for exploring raw data is the histogram, which is commonly used to visually check the distribution of a continuous variable. The geom for a histogram is geom_histogram(). By default, ggplot2 puts the number of observations in each bin (the count) on the y axis, but y can also be set to density. Following the ggplot2 syntax, we can run the command below to get a histogram of a variable, in this case the variable :

hist_a <- ggplot(dat1, aes(x = )) + geom_histogram(color = 'gray9')
hist_a
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
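To see what the default binning actually computes, a plot object can be inspected with layer_data(). This sketch assumes the built-in faithful dataset (eruptions column) as a stand-in for the lab's climate variable; it does not reproduce the lab's figure.

```r
library(ggplot2)

# faithful (built into R) stands in for dat1; eruptions is a continuous variable
hist_default <- ggplot(faithful, aes(x = eruptions)) +
  geom_histogram(color = "gray9")

# Building the plot triggers the same "bins = 30" message shown above
bins_default <- layer_data(hist_default)

# One row per bin: xmin/xmax are the bin edges, count is the bar height
head(bins_default[, c("xmin", "xmax", "count")])

# Every non-missing observation is counted in exactly one bin
sum(bins_default$count) == nrow(faithful)  # TRUE
```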
ggplot2 picks the bins automatically when generating a histogram (30 bins by default), but chances are that this is not the most appropriate choice for any particular histogram. It is therefore important to try different bin settings to verify whether the resulting histogram reflects the data accurately. Too many bins make a histogram overly spiky and hide the overall shape of the distribution, while too few bins hide the details of the distribution. There are two ways to change the bins:

1. Set the number of bins:

ggplot(dat1, aes(x = )) + geom_histogram(bins =, color = 'gray9') # bins
ggplot(dat1, aes(x = )) + geom_histogram(bins = 2, color = 'gray9') # 2 bins
ggplot(dat1, aes(x = )) + geom_histogram(bins =, color = 'gray9') # bins

2. Set the width of the bins:

ggplot(dat1, aes(x = )) + geom_histogram(binwidth = 1, color = 'gray9')
ggplot(dat1, aes(x = )) + geom_histogram(binwidth =, color = 'gray9')
ggplot(dat1, aes(x = )) + geom_histogram(binwidth = 1, color = 'gray9')

Histograms are also a great way to visually check the effectiveness of a data transformation. In this case, the square-root transformation makes the distribution approximately normal.
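The two methods above can be compared directly. This sketch again uses the built-in faithful data as a stand-in; the 10 bins and the 0.25 bin width are arbitrary illustration values, not recommendations for the lab data.

```r
library(ggplot2)

# Method 1: fix the number of bins
by_count <- ggplot(faithful, aes(x = eruptions)) +
  geom_histogram(bins = 10, color = "gray9")

# Method 2: fix the width of each bin (in the units of the x variable)
by_width <- ggplot(faithful, aes(x = eruptions)) +
  geom_histogram(binwidth = 0.25, color = "gray9")

bins_count <- layer_data(by_count)
bins_width <- layer_data(by_width)

# Method 1 controls how many bars are drawn ...
nrow(bins_count)

# ... method 2 controls how wide each bar is
unique(round(bins_width$xmax - bins_width$xmin, 10))  # 0.25
```

With binwidth, the number of bars follows from the data range, so the same width can give a different number of bars for another variable.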
hist_b <- ggplot(dat1, aes(x = sqrt())) + geom_histogram(color = 'gray9')
hist_b
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

You can also fill the bins with different colors by group/class. In many scenarios we have multiple distributions that we would like to visualize simultaneously. For example, did the biomes have similar dryness? One common visualization strategy is to stack the bars on top of each other and fill the histogram with a different color for each group:

hist_c <- ggplot(dat1, aes(x =, fill = )) + geom_histogram()

Although counts are used for the y axis by default, you can change the y axis to density. Because the sample size differs among groups/classes, density histograms may show a different pattern from the count histograms.

hist_d <- ggplot(dat1, aes(x =, fill = )) + geom_histogram(aes(y = after_stat(density))) # y as density; older ggplot2 versions write this as ..density..
hist_c
hist_d
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

The biggest disadvantage of a stacked histogram is that it is hard to read off the values for each group; for example, how many samples of have values around 4? About 1 or 8? It is not easy to compare distributions among groups. One way to solve this is to change the positions of the bars. A common choice is dodging, which preserves the vertical position of a geom while adjusting its horizontal position, so the bars of different groups sit side by side.

hist_e <- ggplot(dat1, aes(x =, fill = )) + geom_histogram(position = 'dodge') # change bar positions
hist_e
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
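Why density histograms can look different from count histograms: density is computed separately for each group, so each group's bars have a total area of 1 no matter how many observations it contains. This sketch assumes iris as a stand-in, with Sepal.Length in place of the climate variable and Species in place of the grouping column; 20 bins is an arbitrary choice.

```r
library(ggplot2)

# Stacked (default) and dodged count histograms, filled by group
stacked <- ggplot(iris, aes(x = Sepal.Length, fill = Species)) +
  geom_histogram(bins = 20)
dodged <- ggplot(iris, aes(x = Sepal.Length, fill = Species)) +
  geom_histogram(bins = 20, position = "dodge")

# Density on the y axis; after_stat(density) is the current spelling of ..density..
density_hist <- ggplot(iris, aes(x = Sepal.Length, fill = Species)) +
  geom_histogram(aes(y = after_stat(density)), bins = 20)

d <- layer_data(density_hist)

# Area of each group's bars = sum of density * bin width; each is 1
areas <- tapply(d$density * (d$xmax - d$xmin), d$group, sum)
areas
```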
Histograms have been a popular visualization option since at least the 18th century, in part because they are easily generated by hand. More recently, as extensive computing power has become available in everyday devices such as laptops and cell phones, we see them increasingly being replaced by density plots. In a density plot, we attempt to visualize the underlying probability distribution of the data by drawing an appropriate continuous curve:

hist_f <- ggplot(dat1, aes(x =, fill = )) + geom_density(alpha = .4) # introduce transparency
hist_f

As with the histograms, we fill the density curves with a different color for each group. The alpha argument sets the transparency of the color; alpha ranges from 0 (fully transparent) to 1 (fully opaque). Also, try adding multiple geoms:

hist_f + geom_histogram(aes(y = after_stat(density)), alpha = .6, position = 'dodge')
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
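A sketch of the density-plus-histogram overlay, assuming iris as a stand-in dataset (Sepal.Length for the variable, Species for the groups). The histogram is put on the density scale so both layers share one y axis.

```r
library(ggplot2)

dens <- ggplot(iris, aes(x = Sepal.Length, fill = Species)) +
  geom_density(alpha = 0.4)  # 0 = fully transparent, 1 = fully opaque

# Overlay a dodged histogram rescaled to density (20 bins is arbitrary)
overlay <- dens +
  geom_histogram(aes(y = after_stat(density)), bins = 20,
                 alpha = 0.6, position = "dodge")

# Layer 1 is the density curve: by default each group's curve is
# evaluated at 512 points along the x axis
curves <- layer_data(overlay, 1)
table(curves$group)
```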
Scatter plots

With scatter plots you can visually check the relationships among variables. Are they linear or curvilinear? Outliers are also easy to spot. Now, use a scatter plot to check the relationship between two variables and to identify outliers. To check the relationship between mean summer precipitation (MSP) and mean annual precipitation (MAP), we map MAP to x and MSP to y (normally the y axis is for the dependent variable and the x axis for the independent variable, but in this case it is fine to exchange the axes). The geom function for a scatter plot is geom_point():

ggplot(dat1, aes(x = MAP, y = MSP)) + geom_point()

Cool! There seems to be a positive relationship between these two variables. However, overlapping points can make the plot harder to interpret. One easy fix is to make the points partly transparent:

plt <- ggplot(dat1, aes(x = MAP, y = MSP)) + geom_point(alpha = .3)
plt
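A scatterplot shows the direction of a relationship; a correlation coefficient puts a number on it. This sketch uses the built-in mtcars data (disp and hp) as stand-ins for MAP and MSP, and Pearson's r via base R's cor().

```r
library(ggplot2)

# Semi-transparent points, as in plt above
plt_demo <- ggplot(mtcars, aes(x = disp, y = hp)) +
  geom_point(alpha = 0.3)

# Pearson's r measures only linear association, so the plot is still
# needed to spot curvature and outliers
r <- cor(mtcars$disp, mtcars$hp)
round(r, 2)
```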
Besides changing the transparency, changing the point positions by (counterintuitively) adding a little random noise, called jittering, can help reveal each point:

plt_jittered <- ggplot(dat1, aes(x = MAP, y = MSP)) + geom_point(position = "jitter")
plt_jittered

You can also add labels to your plot with the geom function geom_text(). In this case, we want to label each point with its STATION name. The hjust and vjust arguments control the horizontal and vertical placement of the labels.

plt_label <- ggplot(dat1, aes(x = MAP, y = MSP, label = STATION)) + geom_point() + geom_text(hjust =, vjust =, size = 2.2, color = 'gray4')
plt_label
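Jitter is random, so the plot changes each time it is drawn. A sketch of a reproducible alternative using position_jitter() with a seed; mtcars (cyl vs. mpg, which overplot heavily) is a stand-in dataset, and width = 0.2 is an arbitrary choice.

```r
library(ggplot2)

# Shift points horizontally by at most 0.2; height = 0 keeps y exact,
# and seed makes the same jitter every time the plot is drawn
jit <- position_jitter(width = 0.2, height = 0, seed = 1)

plt_jittered_demo <- ggplot(mtcars, aes(x = cyl, y = mpg)) +
  geom_point(position = jit)

pts <- layer_data(plt_jittered_demo)

# All horizontal shifts stay within +/- 0.2 of the original values
range(pts$x - mtcars$cyl)
```

Setting height = 0 matters when y is a measured value you do not want distorted; only the categorical-like x positions are spread out.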
Can you tell the STATION IDs of the two outliers in the lower right corner of the plot?

However, do all biome types follow the same relationship between MAP and MSP? To find out, we need to add some visual aid to separate the types (e.g., color, shape):

plt_biome <- ggplot(dat1, aes(x = MAP, y = MSP, color =, shape = )) + geom_point()
plt_biome

How about a density plot in two dimensions? Try:

plt + geom_density2d()

Box plots

Like scatter plots, boxplots are a good way to visually check the relationship between two variables, when one variable is continuous (as y) and the other is categorical (as x). For instance, we can look at the general distribution of mean annual temperature () for each biome type (). The geom for a boxplot is geom_boxplot():

ggplot(dat1, aes(x =, y = )) + geom_boxplot()
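A sketch of mapping one grouping column to both color and shape, as in plt_biome. iris stands in for dat1, with Species in place of the biome column; using both aesthetics keeps the groups distinguishable even in grayscale.

```r
library(ggplot2)

plt_groups <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length,
                               color = Species, shape = Species)) +
  geom_point()

pts <- layer_data(plt_groups)

# Each species gets one colour and one shape
unique(pts[, c("colour", "shape")])
```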
If you still have time, add the following arguments inside the parentheses of geom_boxplot() and see what they do: varwidth = TRUE, notch = TRUE.

ggplot(dat1, aes(x =, y = )) + geom_boxplot(varwidth = TRUE) # box width proportional to the square root of each group's sample size
ggplot(dat1, aes(x =, y = )) + geom_boxplot(notch = TRUE) # add notches to the boxes
## notch went outside hinges. Try setting notch=FALSE.

Similarly, we can make boxplots of mean annual temperature () for each ecosystem (ECOSYS):

ggplot(dat1, aes(x = ECOSYS, y = )) + geom_boxplot()
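What the notches represent: ggplot2 draws them at median ± 1.58 × IQR / √n, and non-overlapping notches suggest that two medians differ. When the notch is wider than the box (small n or a wide IQR), you get the "notch went outside hinges" message above. This sketch checks the formula on iris as a stand-in dataset.

```r
library(ggplot2)

box_notched <- ggplot(iris, aes(x = Species, y = Sepal.Length)) +
  geom_boxplot(notch = TRUE, varwidth = TRUE)

stats <- layer_data(box_notched)

# Notch half-width = 1.58 * IQR / sqrt(n), with n = 50 flowers per species
n_per_group <- as.vector(table(iris$Species))
half_width <- 1.58 * (stats$upper - stats$lower) / sqrt(n_per_group)
cbind(stats[, c("notchlower", "middle", "notchupper")], half_width)

# varwidth = TRUE scales box widths by sqrt(n); iris groups are equal
# in size, so all three boxes end up the same width
stats$xmax - stats$xmin
```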
Since the ecosystem names are long and can easily overlap, we prefer to put ECOSYS on the y axis and draw horizontal boxplots:

ggplot(dat1, aes(x = ECOSYS, y = )) + geom_boxplot() + coord_flip() # horizontal: flip the x and y axes

Great! Now you can color the boxplots by type:

ggplot(dat1, aes(x = ECOSYS, y =, fill = )) + geom_boxplot(varwidth = TRUE) + coord_flip() # colored by groups

[Figure: panels (a) and (b), horizontal boxplots by ECOSYS]

Boxplots are generally useful, but they summarize each sample with only five numbers: the box shows the 25th, 50th (median) and 75th percentiles, and the whiskers extend toward the minimum and maximum (in ggplot2, only as far as the most extreme values within 1.5 × IQR of the box; points beyond that are drawn individually as outliers). To show more detail about the distribution, we can add jittered points or a violin plot:

ggplot(dat1, aes(x =, y = )) + geom_violin() + geom_boxplot(width = .1)
ggplot(dat1, aes(x =, y = )) + geom_boxplot() + geom_point(position = 'jitter', alpha = .2, size = 2)
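A sketch that checks the box against ordinary quantiles and shows the whisker rule, using iris as a stand-in dataset. R's quantile() with its default type 7 is assumed here to match what ggplot2 computes for the box.

```r
library(ggplot2)

box_plain <- ggplot(iris, aes(x = Species, y = Sepal.Width)) + geom_boxplot()
b <- layer_data(box_plain)

# 25th, 50th and 75th percentiles of each species, one column per species
q <- sapply(split(iris$Sepal.Width, iris$Species), quantile,
            probs = c(0.25, 0.5, 0.75))

# Box edges and middle line versus the quantiles
cbind(b[, c("lower", "middle", "upper")], t(q))

# Whiskers stop at the most extreme values within 1.5 * IQR of the box;
# anything beyond that is stored (and drawn) as an outlier
b[, c("ymin", "ymax")]
lengths(b$outliers)
```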
The first command narrows the boxplots and overlays them on the violin plot, and the second adds jittered points on top of the boxplots. Well done!

So far we have only analyzed one continuous variable at a time. Can we visualize multiple variables in one plot? You may remember that in lab2b we applied the gather() function to reshape a data frame from wide to long format. In ggplot2, x or y can only be mapped to a single variable, so we first need to gather the variables of interest into one column and then use the new data table for plotting. For example, let's make boxplots to check the distributions of three temperature variables: mean annual temperature (), mean warmest month temperature (MWMT) and mean coldest month temperature (MCMT).

dat2 <- gather(dat1, key = 'temp', value = 'value',, MCMT, MWMT)
head(dat2) # quick check of the new data table
## STATION ECOSYS MAP MSP temp value
## G-NF
## G-DMG
## G-MG
## G-DMG
## G-NF
## B-AP
ggplot(dat2, aes(x = temp, y = value, fill = )) + geom_boxplot() # different colors for types
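In tidyr 1.0.0 and later, gather() still works but is superseded by pivot_longer(), which does the same wide-to-long reshaping. A sketch with iris as a stand-in: three measurement columns are stacked into one "measure" column and one "value" column, then plotted as grouped boxplots.

```r
library(ggplot2)
library(tidyr)

# Equivalent gather() call:
# gather(iris, key = "measure", value = "value",
#        Sepal.Length, Sepal.Width, Petal.Length)
iris_long <- pivot_longer(iris,
                          cols = c(Sepal.Length, Sepal.Width, Petal.Length),
                          names_to = "measure", values_to = "value")

nrow(iris_long)  # 450: 150 flowers x 3 stacked columns

long_box <- ggplot(iris_long, aes(x = measure, y = value, fill = Species)) +
  geom_boxplot()
```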
Looks nice! If you think of the temperature variables as treatments, then differing patterns among and within groups are a strong clue of an interaction.

Multi-panel scatter plots in R

So far in this lab we have learnt 1-dimensional (histogram, density plot) and 2-dimensional (scatter plot, boxplot) exploratory graphics, which normally look at only one variable or one pair of variables at a time. If you have a data table with many potential independent variables, plotting them one by one is not efficient. To get a general idea of the relationships among the variables in a very short time, use the ggpairs() function of the GGally package:

#install.packages('GGally') # run this once if GGally is not installed
library(GGally)
ggpairs(dat1[, c('', 'MAP', 'MSP', '', '')], aes(color = ))

[Figure: ggpairs() scatterplot matrix with pairwise correlations (Cor) by group in the upper panels]

Voilà! Now you can see the plot matrix among , MAP, MSP and the other selected variables, with different colors distinguishing the types.
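The "Cor" values in the upper panels of ggpairs() are, by default, Pearson correlations, overall and per group. As a cross-check that needs no extra package, base R's cor() gives the same kind of numbers as a matrix. iris stands in for dat1 here, and Species for the grouping column.

```r
vars <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")

# Overall Pearson correlation matrix
overall <- cor(iris[, vars])
round(overall, 2)

# The same matrix within each group
per_species <- lapply(split(iris[, vars], iris$Species), cor)
round(per_species$setosa, 2)
```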
BIOSTATS 640 Spring 2018 Introduction to R and R-Studio Data Description Page 1. Start of Session. a. Preliminaries... b. Install Packages c. Attach Packages... 2. Load R Data.. a. Load R data frames...
More informationComputing With R Handout 1
Computing With R Handout 1 Getting Into R To access the R language (free software), go to a computing lab that has R installed, or a computer on which you have downloaded R from one of the distribution
More informationVisualizing univariate data 1
Visualizing univariate data 1 Xijin Ge SDSU Math/Stat Broad perspectives of exploratory data analysis(eda) EDA is not a mere collection of techniques; EDA is a new altitude and philosophy as to how we
More informationChapter 2 Assignment (due Thursday, October 5)
(due Thursday, October 5) Introduction: The purpose of this assignment is to analyze data sets by creating histograms and scatterplots. You will use the STATDISK program for both. Therefore, you should
More informationExcel Tips and FAQs - MS 2010
BIOL 211D Excel Tips and FAQs - MS 2010 Remember to save frequently! Part I. Managing and Summarizing Data NOTE IN EXCEL 2010, THERE ARE A NUMBER OF WAYS TO DO THE CORRECT THING! FAQ1: How do I sort my
More informationPlotting Graphs. Error Bars
E Plotting Graphs Construct your graphs in Excel using the method outlined in the Graphing and Error Analysis lab (in the Phys 124/144/130 laboratory manual). Always choose the x-y scatter plot. Number
More informationSession 3 Nick Hathaway;
Session 3 Nick Hathaway; nicholas.hathaway@umassmed.edu Contents Manipulating Data frames and matrices 1 Converting to long vs wide formats.................................... 2 Manipulating data in table........................................
More information= 3 + (5*4) + (1/2)*(4/2)^2.
Physics 100 Lab 1: Use of a Spreadsheet to Analyze Data by Kenneth Hahn and Michael Goggin In this lab you will learn how to enter data into a spreadsheet and to manipulate the data in meaningful ways.
More information03 - Intro to graphics (with ggplot2)
3 - Intro to graphics (with ggplot2) ST 597 Spring 217 University of Alabama 3-dataviz.pdf Contents 1 Intro to R Graphics 2 1.1 Graphics Packages................................ 2 1.2 Base Graphics...................................
More informationMath 121 Project 4: Graphs
Math 121 Project 4: Graphs Purpose: To review the types of graphs, and use MS Excel to create them from a dataset. Outline: You will be provided with several datasets and will use MS Excel to create graphs.
More informationChuck Cartledge, PhD. 20 January 2018
Big Data: Data Analysis Boot Camp Visualizing the Iris Dataset Chuck Cartledge, PhD 20 January 2018 1/31 Table of contents (1 of 1) 1 Intro. 2 Histograms Background 3 Scatter plots 4 Box plots 5 Outliers
More information3 Graphical Displays of Data
3 Graphical Displays of Data Reading: SW Chapter 2, Sections 1-6 Summarizing and Displaying Qualitative Data The data below are from a study of thyroid cancer, using NMTR data. The investigators looked
More informationGraphical Analysis of Data using Microsoft Excel [2016 Version]
Graphical Analysis of Data using Microsoft Excel [2016 Version] Introduction In several upcoming labs, a primary goal will be to determine the mathematical relationship between two variable physical parameters.
More informationAdvanced Plotting with ggplot2. Algorithm Design & Software Engineering November 13, 2016 Stefan Feuerriegel
Advanced Plotting with ggplot2 Algorithm Design & Software Engineering November 13, 2016 Stefan Feuerriegel Today s Lecture Objectives 1 Distinguishing different types of plots and their purpose 2 Learning
More informationplot(seq(0,10,1), seq(0,10,1), main = "the Title", xlim=c(1,20), ylim=c(1,20), col="darkblue");
R for Biologists Day 3 Graphing and Making Maps with Your Data Graphing is a pretty convenient use for R, especially in Rstudio. plot() is the most generalized graphing function. If you give it all numeric
More information3 Graphical Displays of Data
3 Graphical Displays of Data Reading: SW Chapter 2, Sections 1-6 Summarizing and Displaying Qualitative Data The data below are from a study of thyroid cancer, using NMTR data. The investigators looked
More informationName Date Types of Graphs and Creating Graphs Notes
Name Date Types of Graphs and Creating Graphs Notes Graphs are helpful visual representations of data. Different graphs display data in different ways. Some graphs show individual data, but many do not.
More informationOrientation Assignment for Statistics Software (nothing to hand in) Mary Parker,
Orientation to MINITAB, Mary Parker, mparker@austincc.edu. Last updated 1/3/10. page 1 of Orientation Assignment for Statistics Software (nothing to hand in) Mary Parker, mparker@austincc.edu When you
More informationGraphing on Excel. Open Excel (2013). The first screen you will see looks like this (it varies slightly, depending on the version):
Graphing on Excel Open Excel (2013). The first screen you will see looks like this (it varies slightly, depending on the version): The first step is to organize your data in columns. Suppose you obtain
More informationPart I, Chapters 4 & 5. Data Tables and Data Analysis Statistics and Figures
Part I, Chapters 4 & 5 Data Tables and Data Analysis Statistics and Figures Descriptive Statistics 1 Are data points clumped? (order variable / exp. variable) Concentrated around one value? Concentrated
More informationggplot in 3 easy steps (maybe 2 easy steps)
1 ggplot in 3 easy steps (maybe 2 easy steps) 1.1 aesthetic: what you want to graph (e.g. x, y, z). 1.2 geom: how you want to graph it. 1.3 options: optional titles, themes, etc. 2 Background R has a number
More informationPractical 1P1 Computing Exercise
Practical 1P1 Computing Exercise What you should learn from this exercise How to use the teaching lab computers and printers. How to use a spreadsheet for basic data analysis. How to embed Excel tables
More informationPoints Lines Connected points X-Y Scatter. X-Y Matrix Star Plot Histogram Box Plot. Bar Group Bar Stacked H-Bar Grouped H-Bar Stacked
Plotting Menu: QCExpert Plotting Module graphs offers various tools for visualization of uni- and multivariate data. Settings and options in different types of graphs allow for modifications and customizations
More informationSES123 Computer Methods Lab Procedures
SES123 Computer Methods Lab Procedures Introduction Science and engineering commonly involve numerical calculations, graphs, photographic images, and various types of figures. In this lab, you will use
More informationTemperature Patterns: Functions and Line Graphs
activity 3.1 Temperature Patterns: Functions and Line Graphs In this activity, you will work with examples in which curves obtained by joining known points of the graph of a function can help you understand
More informationC ontent descriptions
C ontent descriptions http://topdrawer.aamt.edu.au/reasoning/big-ideas/same-and-different Year Number and Algebra F 2 Establish understanding of the language and processes of counting by naming numbers
More information