Data Visualization in R

Size: px
Start display at page:

Download "Data Visualization in R"

Transcription

1 Data Visualization in R L. Torgo ltorgo@fc.up.pt Faculdade de Ciências / LIAAD-INESC TEC, LA Universidade do Porto Aug, 2017 Introduction Motivation for Data Visualization Humans are outstanding at detecting patterns and structures with their eyes Data visualization methods try to explore these capabilities In spite of all advantages visualization methods also have several problems, particularly with very large data sets L.Torgo (FCUP - LIAAD / UP) Data Visualization Aug, / 38

2 Introduction Outline of what we will learn Tools for univariate data Tools for bivariate data Tools for multivariate data Multidimensional scaling methods L.Torgo (FCUP - LIAAD / UP) Data Visualization Aug, / 38 R Graphics Introduction L.Torgo (FCUP - LIAAD / UP) Data Visualization Aug, / 38

3 Introduction to Standard Graphics Standard Graphics (the graphics package) R standard graphics, available through package graphics, includes several functions that provide standard statistical plots, like: Scatterplots Boxplots Piecharts Barplots etc. These graphs can be obtained tyipically by a single function call Example of a scatterplot plot(1:10,sin(1:10)) These graphs can be easily augmented by adding several elements to these graphs (lines, text, etc.) L.Torgo (FCUP - LIAAD / UP) Data Visualization Aug, / 38 Graphics Devices Introduction to Standard Graphics R graphics functions produce output that depends on the active graphics device The default and more frequently used device is the screen There are many more graphical devices in R, like the pdf device, the jpeg device, etc. The user just needs to open (and in the end close) the graphics output device she/he wants. R takes care of producing the type of output required by the device This means that to produce a certain plot on the screen or as a GIF graphics file the R code is exactly the same. You only need to open the target output device before! Several devices may be open at the same time, but only one is the active device L.Torgo (FCUP - LIAAD / UP) Data Visualization Aug, / 38

4 A few examples A scatterplot Introduction to Standard Graphics plot(seq(0,4*pi,0.1),sin(seq(0,4*pi,0.1))) The same but stored on a jpeg file jpeg('exp.jpg') plot(seq(0,4*pi,0.1),sin(seq(0,4*pi,0.1))) dev.off() And now as a pdf file pdf('exp.pdf',width=6,height=6) plot(seq(0,4*pi,0.1),sin(seq(0,4*pi,0.1))) dev.off() L.Torgo (FCUP - LIAAD / UP) Data Visualization Aug, / 38 Package ggplot2 Introduction to ggplot Package ggplot2 implements the ideas created by Wilkinson (2005) on a grammar of graphics This grammar is the result of a theoretical study on what is a statistical graphic ggplot2 builds upon this theory by implementing the concept of a layered grammar of graphics (Wickham, 2009) The grammar defines a statistical graphic as: a mapping from data into aesthetic attributes (color, shape, size, etc.) of geometric objects (points, lines, bars, etc.) L. Wilkinson (2005). The Grammar of Graphics. Springer. H. Wickham (2009). A layered grammar of graphics. Journal of Computational and Graphical Statistics. L.Torgo (FCUP - LIAAD / UP) Data Visualization Aug, / 38

5 Introduction to ggplot The Basics of the Grammar of Graphics Key elements of a statistical graphic: data aesthetic mappings geometric objects statistical transformations scales coordinate system faceting L.Torgo (FCUP - LIAAD / UP) Data Visualization Aug, / 38 Aesthetic Mappings Introduction to ggplot Controls the relation between data variables and graphic variables map the Temperature variable of a data set into the x variable in a scatter plot map the Species of a plant into the colour of dots in a graphic etc. L.Torgo (FCUP - LIAAD / UP) Data Visualization Aug, / 38

6 Geometric Objects Introduction to ggplot Controls what is shown in the graphics show each observation by a point using the aesthetic mappings that map two variables in the data set into the x, y variables of the plot etc. L.Torgo (FCUP - LIAAD / UP) Data Visualization Aug, / 38 Introduction to ggplot Statistical Transformations Allows us to calculate and do statistical analysis over the data in the plot Use the data and approximate it by a regression line on the x, y coordinates Count occurrences of certain values etc. L.Torgo (FCUP - LIAAD / UP) Data Visualization Aug, / 38

7 Introduction to ggplot Scales Maps the data values into values in the coordinate system of the graphics device L.Torgo (FCUP - LIAAD / UP) Data Visualization Aug, / 38 Coordinate System Introduction to ggplot The coordinate system used to plot the data Cartesian Polar etc. L.Torgo (FCUP - LIAAD / UP) Data Visualization Aug, / 38

8 Introduction to ggplot Faceting Split the data into sub-groups and draw sub-graphs for each group L.Torgo (FCUP - LIAAD / UP) Data Visualization Aug, / 38 Introduction to ggplot A Simple Example data(iris) library(ggplot2) ggplot(iris, aes(x=petal.length, y=sepal.length, colour=species ) ) + geom_point() Petal.Length Sepal.Length Species setosa versicolor virginica L.Torgo (FCUP - LIAAD / UP) Data Visualization Aug, / 38

9 Univariate Plots Distribution of Values of Nominal Variables The Distribution of Values of Nominal Variables The Barplot The Barplot is a graph whose main purpose is to display a set of values as heights of bars It can be used to display the frequency of occurrence of different values of a nominal variable as follows: First obtain the number of occurrences of each value Then use the Barplot to display these counts L.Torgo (FCUP - LIAAD / UP) Data Visualization Aug, / 38 Univariate Plots Distribution of Values of Nominal Variables Barplots in base graphics barplot(table(sales$insp), main='distribution of the Types of Inspection') Distribution of the Types of Inspection 1e+05 2e+05 3e+05 ok unkn fraud L.Torgo (FCUP - LIAAD / UP) Data Visualization Aug, / 38

10 Univariate Plots Distribution of Values of Nominal Variables Barplots in ggplot2 ggplot(sales,aes(x=insp)) + geom_bar() + ggtitle('distribution of the Types of Inspection') 4e+05 Distribution of the Types of Inspection 3e+05 count 2e+05 1e+05 ok unkn fraud Insp L.Torgo (FCUP - LIAAD / UP) Data Visualization Aug, / 38 Univariate Plots Distribution of Values of Nominal Variables Pie Charts Pie charts serve the same purpose as bar plots but present the information in the form of a pie. pie(table(sales$insp), main='distribution of the Types of Inspection') Distribution of the Types of Inspection unkn ok fraud Note: L.Torgo Avoid (FCUP - LIAAD pie / charts! UP) Data Visualization Aug, / 38

11 Univariate Plots Distribution of Values of Nominal Variables Pie Charts in ggplot ggplot(sales,aes(x=factor(1),fill=insp)) + geom_bar(width=1) + ggtitle('distribution of the Types of Inspection') + coord_polar(theta="y") Distribution of the Types of Inspection 4e+05 1 Insp factor(1) 3e+05 1e+05 ok unkn fraud 2e+05 count L.Torgo (FCUP - LIAAD / UP) Data Visualization Aug, / 38 Univariate Plots Distribution of the Values of a Continuous Variable The Distribution of Values of a Continuous Variable The Histogram The Histogram is a graph whose main purpose is to display how the values of a continuous variable are distributed It is obtained as follows: First the range of the variable is divided into a set of bins (intervals of values) Then the number of occurrences of values on each bin is counted Then this number is displayed as a bar L.Torgo (FCUP - LIAAD / UP) Data Visualization Aug, / 38

12 Univariate Plots Distribution of the Values of a Continuous Variable Two examples of Histograms in R hist(sales$,main='distribution of ', xlab='',ylab='nr. of Occurrencies') hist(sales$,main='distribution of ', xlab='',ylab='percentage',prob=true) Distribution of Distribution of Nr. of Occurrencies 1e+05 2e+05 3e+05 4e+05 Percentage 1e 08 2e 08 3e 08 4e 08 5e 08 1e+08 2e+08 3e+08 4e+08 1e+08 2e+08 3e+08 4e+08 L.Torgo (FCUP - LIAAD / UP) Data Visualization Aug, / 38 Univariate Plots Distribution of the Values of a Continuous Variable Two examples of Histograms in ggplot2 filter(sales,prod == "p1",!is.na()) %>% ggplot(aes(x=)) + geom_histogram(binwidth=2000) + ggtitle("distribution of of Prod P1") + ylab("nr. of Occurrencies") filter(sales,prod == "p1",!is.na()) %>% ggplot(sales,aes(x=)) + geom_histogram(binwidth=2000,aes(y=..density..)) + ggtitle("distribution of of Prod P1") + ylab("percentage") Distribution of of Prod P1 Distribution of of Prod P1 4e 04 3e Nr. of Occurrencies 50 Percentage 2e 04 1e L.Torgo (FCUP - LIAAD / UP) Data Visualization Aug, / 38

13 Univariate Plots Distribution of the Values of a Continuous Variable Showing the Distribution of Values Box Plots Box plots provide interesting summaries of a variable distribution For instance, they inform us of the interquartile range and of the outliers (if any) L.Torgo (FCUP - LIAAD / UP) Data Visualization Aug, / 38 Univariate Plots Distribution of the Values of a Continuous Variable An Example with base graphics boxplot(sales$,ylab='distribution of ities') Distribution of ities 1e+08 2e+08 3e+08 4e+08 L.Torgo (FCUP - LIAAD / UP) Data Visualization Aug, / 38

14 Univariate Plots Distribution of the Values of a Continuous Variable An Example with ggplot2 ggplot(sales,aes(x=factor(0),y=)) + geom_boxplot() + ylab("distribution of ities") + xlab("") + scale_x_discrete(breaks=null) 4e+08 Distribution of ities 3e+08 2e+08 1e+08 L.Torgo (FCUP - LIAAD / UP) Data Visualization Aug, / 38 Conditioned Graphs Conditioned Graphs Data sets frequently have nominal variables that can be used to create sub-groups of the data according to these variables values e.g. the sub-group of male clients of a company Some of the visual summaries described before can be obtained on each of these sub-groups Conditioned plots allow the simultaneous presentation of these sub-group graphs to better allow finding eventual differences between the sub-groups Base graphics do not have conditioning but ggplot2 has it through the concept of facets L.Torgo (FCUP - LIAAD / UP) Data Visualization Aug, / 38

15 Conditioned Graphs Conditioned histograms Conditioned Histograms Goal: Constrast the distribution of data sub-groups dt <- filter(sales,prod %in% c("p1","p2","p3"),!is.na()) ggplot(dt,aes(x=)) + geom_histogram(binwidth=2000) + facet_wrap(~ Prod) + ggtitle("distribution of ity by Product") Distribution of ity by Product p1 p2 p3 100 count L.Torgo (FCUP - LIAAD / UP) Data Visualization Aug, / 38 Conditioned Graphs Conditioned histograms Conditioned Histograms (2) dt <- filter(sales,prod %in% c("p226","p314","p319")) %>% mutate(unitprice=val/) ggplot(dt,aes(x=unitprice)) + geom_histogram(binwidth=10,aes(y=..density..)) + ggtitle("distribution of Unit Price") Distribution of Unit Price p226 p314 p density ok unkn fraud UnitPrice L.Torgo (FCUP - LIAAD / UP) Data Visualization Aug, / 38

16 Conditioned Graphs Conditioned Box Plots Conditioned Box Plots on base graphics boxplot( ~ Insp,sales, main='distribution of for the different transactions', xlab='type of Transaction',ylab='Distribution of ') ok unkn fraud 1e+08 2e+08 3e+08 4e+08 Distribution of for the different transactions Type of Transaction Distribution of L.Torgo (FCUP - LIAAD / UP) Data Visualization Aug, / 38 Conditioned Graphs Conditioned Box Plots Conditioned Box Plots on ggplot2 ggplot(sales,aes(x=insp,y=))+geom_boxplot() + ggtitle('distribution of for the different transactions') ## Warning: Removed rows containing non-finite values (stat_boxplot). 1e+08 2e+08 3e+08 4e+08 ok unkn fraud Insp Distribution of for the different transactions L.Torgo (FCUP - LIAAD / UP) Data Visualization Aug, / 38

17 Other Plots Scatterplots Scatterplots in base graphics The Scatterplot is the natural graph for showing the relationship between two numerical variables dt <- filter(sales,prod=="p11") plot(dt$,dt$val, main='relationship between Value and ity for P1', xlab='',ylab='val') Relationship between Value and ity for P1 Val 2e+05 4e+05 6e L.Torgo (FCUP - LIAAD / UP) Data Visualization Aug, / 38 Other Plots Scatterplots Scatterplots in ggplot2 ggplot(dt,aes(x=,y=val)) + geom_point() + ggtitle('relationship between Value and ity for P1') Relationship between Value and ity for P1 6e+05 4e+05 Val 2e+05 5e+04 1e+05 L.Torgo (FCUP - LIAAD / UP) Data Visualization Aug, / 38

18 Other Plots Scatterplots Conditioned Scatterplots using the ggplot2 package ggplot(dt,aes(x=,y=val)) + geom_point() + facet_wrap(~insp) ok unkn fraud 5e+04 1e+05 5e+04 1e+05 5e+04 1e+05 2e+05 4e+05 6e+05 Val L.Torgo (FCUP - LIAAD / UP) Data Visualization Aug, / 38 Other Plots Scatterplot matrices Scatterplot matrices through package GGally These graphs try to present all pairwise scatterplots in a data set. They are unfeasible for data sets with many variables. library(ggally) ggpairs(sales,columns=3:4, diag=list(continuous="density", discrete="bar"),axislabels="show") Corr: Val Val 1e+08 2e+08 3e+08 4e+08 1e+06 2e+06 3e+06 4e e+06 2e+06 3e+06 4e+06 L.Torgo (FCUP - LIAAD / UP) Data Visualization Aug, / 38

19 Other Plots Scatterplot matrices Scatterplot matrices involving nominal variables ggpairs(sales,columns=3:5,axislabels="show") Corr: Val Insp Val Insp 1e+082e+083e+084e+085e+08 1e+062e+063e+064e+06 ok unkn fraud e+06 2e+06 3e+06 4e+06 1e+05 2e+05 3e+05 1e+05 2e+05 3e+05 1e+05 2e+05 3e+05 L.Torgo (FCUP - LIAAD / UP) Data Visualization Aug, / 38 Other Plots Parallel plots Parallel Plots Parallel plots are also interesting for visualizing a data set ggparcoord(filter(sales,prod=="p226"),columns=3:4,groupcolumn="insp") Val variable value Insp ok unkn fraud L.Torgo (FCUP - LIAAD / UP) Data Visualization Aug, / 38

Data Visualization in R

Data Visualization in R Data Visualization in R L. Torgo ltorgo@fc.up.pt Faculdade de Ciências / LIAAD-INESC TEC, LA Universidade do Porto Oct, 216 Introduction Motivation for Data Visualization Humans are outstanding at detecting

More information

Introduction to Graphics with ggplot2

Introduction to Graphics with ggplot2 Introduction to Graphics with ggplot2 Reaction 2017 Flavio Santi Sept. 6, 2017 Flavio Santi Introduction to Graphics with ggplot2 Sept. 6, 2017 1 / 28 Graphics with ggplot2 ggplot2 [... ] allows you to

More information

An Introduction to R Graphics

An Introduction to R Graphics An Introduction to R Graphics PnP Group Seminar 25 th April 2012 Why use R for graphics? Fast data exploration Easy automation and reproducibility Create publication quality figures Customisation of almost

More information

Intro to R for Epidemiologists

Intro to R for Epidemiologists Lab 9 (3/19/15) Intro to R for Epidemiologists Part 1. MPG vs. Weight in mtcars dataset The mtcars dataset in the datasets package contains fuel consumption and 10 aspects of automobile design and performance

More information

Data Manipulation using dplyr

Data Manipulation using dplyr Data Manipulation in R Reading and Munging Data L. Torgo ltorgo@fc.up.pt Faculdade de Ciências / LIAAD-INESC TEC, LA Universidade do Porto Oct, 2017 Data Manipulation using dplyr The dplyr is a package

More information

Data Visualization Using R & ggplot2. Karthik Ram October 6, 2013

Data Visualization Using R & ggplot2. Karthik Ram October 6, 2013 Data Visualization Using R & ggplot2 Karthik Ram October 6, 2013 Some housekeeping Install some packages install.packages("ggplot2", dependencies = TRUE) install.packages("plyr") install.packages("ggthemes")

More information

Graphing Bivariate Relationships

Graphing Bivariate Relationships Graphing Bivariate Relationships Overview To fully explore the relationship between two variables both summary statistics and visualizations are important. For this assignment you will describe the relationship

More information

The following presentation is based on the ggplot2 tutotial written by Prof. Jennifer Bryan.

The following presentation is based on the ggplot2 tutotial written by Prof. Jennifer Bryan. Graphics Agenda Grammer of Graphics Using ggplot2 The following presentation is based on the ggplot2 tutotial written by Prof. Jennifer Bryan. ggplot2 (wiki) ggplot2 is a data visualization package Created

More information

Chuck Cartledge, PhD. 20 January 2018

Chuck Cartledge, PhD. 20 January 2018 Big Data: Data Analysis Boot Camp Visualizing the Iris Dataset Chuck Cartledge, PhD 20 January 2018 1/31 Table of contents (1 of 1) 1 Intro. 2 Histograms Background 3 Scatter plots 4 Box plots 5 Outliers

More information

Statistical transformations

Statistical transformations Statistical transformations Next, let s take a look at a bar chart. Bar charts seem simple, but they are interesting because they reveal something subtle about plots. Consider a basic bar chart, as drawn

More information

Plotting with Rcell (Version 1.2-5)

Plotting with Rcell (Version 1.2-5) Plotting with Rcell (Version 1.2-) Alan Bush October 7, 13 1 Introduction Rcell uses the functions of the ggplots2 package to create the plots. This package created by Wickham implements the ideas of Wilkinson

More information

Getting started with ggplot2

Getting started with ggplot2 Getting started with ggplot2 STAT 133 Gaston Sanchez Department of Statistics, UC Berkeley gastonsanchez.com github.com/gastonstat/stat133 Course web: gastonsanchez.com/stat133 ggplot2 2 Resources for

More information

An introduction to ggplot: An implementation of the grammar of graphics in R

An introduction to ggplot: An implementation of the grammar of graphics in R An introduction to ggplot: An implementation of the grammar of graphics in R Hadley Wickham 00-0-7 1 Introduction Currently, R has two major systems for plotting data, base graphics and lattice graphics

More information

Facets and Continuous graphs

Facets and Continuous graphs Facets and Continuous graphs One way to add additional variables is with aesthetics. Another way, particularly useful for categorical variables, is to split your plot into facets, subplots that each display

More information

Ggplot2 QMMA. Emanuele Taufer. 2/19/2018 Ggplot2 (1)

Ggplot2 QMMA. Emanuele Taufer. 2/19/2018 Ggplot2 (1) Ggplot2 QMMA Emanuele Taufer file:///c:/users/emanuele.taufer/google%20drive/2%20corsi/5%20qmma%20-%20mim/0%20classes/1-4_ggplot2.html#(1) 1/27 Ggplot2 ggplot2 is a plotting system for R, based on the

More information

Visualizing Data: Customization with ggplot2

Visualizing Data: Customization with ggplot2 Visualizing Data: Customization with ggplot2 Data Science 1 Stanford University, Department of Statistics ggplot2: Customizing graphics in R ggplot2 by RStudio s Hadley Wickham and Winston Chang offers

More information

Econ 2148, spring 2019 Data visualization

Econ 2148, spring 2019 Data visualization Econ 2148, spring 2019 Maximilian Kasy Department of Economics, Harvard University 1 / 43 Agenda One way to think about statistics: Mapping data-sets into numerical summaries that are interpretable by

More information

Visualizing the World

Visualizing the World Visualizing the World An Introduction to Visualization 15.071x The Analytics Edge Why Visualization? The picture-examining eye is the best finder we have of the wholly unanticipated -John Tukey Visualizing

More information

DATA VISUALIZATION WITH GGPLOT2. Coordinates

DATA VISUALIZATION WITH GGPLOT2. Coordinates DATA VISUALIZATION WITH GGPLOT2 Coordinates Coordinates Layer Controls plot dimensions coord_ coord_cartesian() Zooming in scale_x_continuous(limits =...) xlim() coord_cartesian(xlim =...) Original Plot

More information

Introduction to Data Visualization

Introduction to Data Visualization Introduction to Data Visualization Author: Nicholas G Reich This material is part of the statsteachr project Made available under the Creative Commons Attribution-ShareAlike 3.0 Unported License: http://creativecommons.org/licenses/by-sa/3.0/deed.en

More information

Package ggextra. April 4, 2018

Package ggextra. April 4, 2018 Package ggextra April 4, 2018 Title Add Marginal Histograms to 'ggplot2', and More 'ggplot2' Enhancements Version 0.8 Collection of functions and layers to enhance 'ggplot2'. The flagship function is 'ggmarginal()',

More information

LAB 1 INSTRUCTIONS DESCRIBING AND DISPLAYING DATA

LAB 1 INSTRUCTIONS DESCRIBING AND DISPLAYING DATA LAB 1 INSTRUCTIONS DESCRIBING AND DISPLAYING DATA This lab will assist you in learning how to summarize and display categorical and quantitative data in StatCrunch. In particular, you will learn how to

More information

EXPLORATORY DATA ANALYSIS. Introducing the data

EXPLORATORY DATA ANALYSIS. Introducing the data EXPLORATORY DATA ANALYSIS Introducing the data Email data set > email # A tibble: 3,921 21 spam to_multiple from cc sent_email time image 1 not-spam 0 1 0 0

More information

Acquisition Description Exploration Examination Understanding what data is collected. Characterizing properties of data.

Acquisition Description Exploration Examination Understanding what data is collected. Characterizing properties of data. Summary Statistics Acquisition Description Exploration Examination what data is collected Characterizing properties of data. Exploring the data distribution(s). Identifying data quality problems. Selecting

More information

1.3 Graphical Summaries of Data

1.3 Graphical Summaries of Data Arkansas Tech University MATH 3513: Applied Statistics I Dr. Marcel B. Finan 1.3 Graphical Summaries of Data In the previous section we discussed numerical summaries of either a sample or a data. In this

More information

Large data. Hadley Wickham. Assistant Professor / Dobelman Family Junior Chair Department of Statistics / Rice University.

Large data. Hadley Wickham. Assistant Professor / Dobelman Family Junior Chair Department of Statistics / Rice University. Large data Hadley Wickham Assistant Professor / Dobelman Family Junior Chair Department of Statistics / Rice University November 2010 1. The diamonds data 2. Histograms and bar charts 3. Frequency polygons

More information

A set of rules describing how to compose a 'vocabulary' into permissible 'sentences'

A set of rules describing how to compose a 'vocabulary' into permissible 'sentences' Lecture 8: The grammar of graphics STAT598z: Intro. to computing for statistics Vinayak Rao Department of Statistics, Purdue University Grammar? A set of rules describing how to compose a 'vocabulary'

More information

Data analysis case study using R for readily available data set using any one machine learning Algorithm

Data analysis case study using R for readily available data set using any one machine learning Algorithm Assignment-4 Data analysis case study using R for readily available data set using any one machine learning Algorithm Broadly, there are 3 types of Machine Learning Algorithms.. 1. Supervised Learning

More information

A Brief Introduction to Data Mining

A Brief Introduction to Data Mining A Brief Introduction to Data Mining L. Torgo ltorgo@dcc.fc.up.pt Departamento de Ciência de Computadores Faculdade de Ciências / Universidade do Porto Sept, 2014 Introduction Motivation for Data Mining?

More information

Statistics Lecture 6. Looking at data one variable

Statistics Lecture 6. Looking at data one variable Statistics 111 - Lecture 6 Looking at data one variable Chapter 1.1 Moore, McCabe and Craig Probability vs. Statistics Probability 1. We know the distribution of the random variable (Normal, Binomial)

More information

Basics of Plotting Data

Basics of Plotting Data Basics of Plotting Data Luke Chang Last Revised July 16, 2010 One of the strengths of R over other statistical analysis packages is its ability to easily render high quality graphs. R uses vector based

More information

Visual Analytics. Visualizing multivariate data:

Visual Analytics. Visualizing multivariate data: Visual Analytics 1 Visualizing multivariate data: High density time-series plots Scatterplot matrices Parallel coordinate plots Temporal and spectral correlation plots Box plots Wavelets Radar and /or

More information

Introduction to R for Epidemiologists

Introduction to R for Epidemiologists Introduction to R for Epidemiologists Jenna Krall, PhD Thursday, January 29, 2015 Final project Epidemiological analysis of real data Must include: Summary statistics T-tests or chi-squared tests Regression

More information

Data Mining: Exploring Data. Lecture Notes for Chapter 3

Data Mining: Exploring Data. Lecture Notes for Chapter 3 Data Mining: Exploring Data Lecture Notes for Chapter 3 Slides by Tan, Steinbach, Kumar adapted by Michael Hahsler Look for accompanying R code on the course web site. Topics Exploratory Data Analysis

More information

Lecture 4: Data Visualization I

Lecture 4: Data Visualization I Lecture 4: Data Visualization I Data Science for Business Analytics Thibault Vatter Department of Statistics, Columbia University and HEC Lausanne, UNIL 11.03.2018 Outline 1 Overview

More information

CIND123 Module 6.2 Screen Capture

CIND123 Module 6.2 Screen Capture CIND123 Module 6.2 Screen Capture Hello, everyone. In this segment, we will discuss the basic plottings in R. Mainly; we will see line charts, bar charts, histograms, pie charts, and dot charts. Here is

More information

Importing and visualizing data in R. Day 3

Importing and visualizing data in R. Day 3 Importing and visualizing data in R Day 3 R data.frames Like pandas in python, R uses data frame (data.frame) object to support tabular data. These provide: Data input Row- and column-wise manipulation

More information

Stat405. Displaying distributions. Hadley Wickham. Thursday, August 23, 12

Stat405. Displaying distributions. Hadley Wickham. Thursday, August 23, 12 Stat405 Displaying distributions Hadley Wickham 1. The diamonds data 2. Histograms and bar charts 3. Homework Diamonds Diamonds data ~54,000 round diamonds from http://www.diamondse.info/ Carat, colour,

More information

Lab5A - Intro to GGPLOT2 Z.Sang Sept 24, 2018

Lab5A - Intro to GGPLOT2 Z.Sang Sept 24, 2018 LabA - Intro to GGPLOT2 Z.Sang Sept 24, 218 In this lab you will learn to visualize raw data by plotting exploratory graphics with ggplot2 package. Unlike final graphs for publication or thesis, exploratory

More information

Rstudio GGPLOT2. Preparations. The first plot: Hello world! W2018 RENR690 Zihaohan Sang

Rstudio GGPLOT2. Preparations. The first plot: Hello world! W2018 RENR690 Zihaohan Sang Rstudio GGPLOT2 Preparations There are several different systems for creating data visualizations in R. We will introduce ggplot2, which is based on Leland Wilkinson s Grammar of Graphics. The learning

More information

Graphics in R. There are three plotting systems in R. base Convenient, but hard to adjust after the plot is created

Graphics in R. There are three plotting systems in R. base Convenient, but hard to adjust after the plot is created Graphics in R There are three plotting systems in R base Convenient, but hard to adjust after the plot is created lattice Good for creating conditioning plot ggplot2 Powerful and flexible, many tunable

More information

03 - Intro to graphics (with ggplot2)

03 - Intro to graphics (with ggplot2) 3 - Intro to graphics (with ggplot2) ST 597 Spring 217 University of Alabama 3-dataviz.pdf Contents 1 Intro to R Graphics 2 1.1 Graphics Packages................................ 2 1.2 Base Graphics...................................

More information

The diamonds dataset Visualizing data in R with ggplot2

The diamonds dataset Visualizing data in R with ggplot2 Lecture 2 STATS/CME 195 Matteo Sesia Stanford University Spring 2018 Contents The diamonds dataset Visualizing data in R with ggplot2 The diamonds dataset The tibble package The tibble package is part

More information

Brief Guide on Using SPSS 10.0

Brief Guide on Using SPSS 10.0 Brief Guide on Using SPSS 10.0 (Use student data, 22 cases, studentp.dat in Dr. Chang s Data Directory Page) (Page address: http://www.cis.ysu.edu/~chang/stat/) I. Processing File and Data To open a new

More information

Introduction to R and the tidyverse. Paolo Crosetto

Introduction to R and the tidyverse. Paolo Crosetto Introduction to R and the tidyverse Paolo Crosetto Lecture 1: plotting Before we start: Rstudio Interactive console Object explorer Script window Plot window Before we start: R concatenate: c() assign:

More information

Data Presentation. Figure 1. Hand drawn data sheet

Data Presentation. Figure 1. Hand drawn data sheet Data Presentation The purpose of putting results of experiments into graphs, charts and tables is two-fold. First, it is a visual way to look at the data and see what happened and make interpretations.

More information

Maps & layers. Hadley Wickham. Assistant Professor / Dobelman Family Junior Chair Department of Statistics / Rice University.

Maps & layers. Hadley Wickham. Assistant Professor / Dobelman Family Junior Chair Department of Statistics / Rice University. Maps & layers Hadley Wickham Assistant Professor / Dobelman Family Junior Chair Department of Statistics / Rice University July 2010 1. Introduction to map data 2. Map projections 3. Loading & converting

More information

Name Date Types of Graphs and Creating Graphs Notes

Name Date Types of Graphs and Creating Graphs Notes Name Date Types of Graphs and Creating Graphs Notes Graphs are helpful visual representations of data. Different graphs display data in different ways. Some graphs show individual data, but many do not.

More information

A Brief Introduction to Data Mining

A Brief Introduction to Data Mining A Brief Introduction to Data Mining L. Torgo ltorgo@dcc.fc.up.pt Departamento de Ciência de Computadores Faculdade de Ciências / Universidade do Porto Feb, 2017 What is Data Mining? Introduction A possible

More information

An Introduction to Minitab Statistics 529

An Introduction to Minitab Statistics 529 An Introduction to Minitab Statistics 529 1 Introduction MINITAB is a computing package for performing simple statistical analyses. The current version on the PC is 15. MINITAB is no longer made for the

More information

Univariate descriptives

Univariate descriptives Univariate descriptives Johan A. Elkink University College Dublin 18 September 2014 18 September 2014 1 / Outline 1 Graphs for categorical variables 2 Graphs for scale variables 3 Frequency tables 4 Central

More information

Getting Started with JMP at ISU

Getting Started with JMP at ISU Getting Started with JMP at ISU 1 Introduction JMP (pronounced like jump ) is the new campus-wide standard statistical package for introductory statistics courses at Iowa State University. JMP is produced

More information

Visualizing Data: Freq. Tables, Histograms

Visualizing Data: Freq. Tables, Histograms Visualizing Data: Freq. Tables, Histograms Engineering Statistics Section 1.2 Josh Engwer TTU 25 January 2016 Josh Engwer (TTU) Visualizing Data: Freq. Tables, Histograms 25 January 2016 1 / 23 Descriptive

More information

Graphical critique & theory. Hadley Wickham

Graphical critique & theory. Hadley Wickham Graphical critique & theory Hadley Wickham Exploratory graphics Are for you (not others). Need to be able to create rapidly because your first attempt will never be the most revealing. Iteration is crucial

More information

Data Visualization. Module 7

Data Visualization.  Module 7 Data Visualization http://datascience.tntlab.org Module 7 Today s Agenda A Brief Reminder to Update your Software A walkthrough of ggplot2 Big picture New cheatsheet, with some familiar caveats Geometric

More information

Introduction to R and Statistical Data Analysis

Introduction to R and Statistical Data Analysis Microarray Center Introduction to R and Statistical Data Analysis PART II Petr Nazarov petr.nazarov@crp-sante.lu 22-11-2010 OUTLINE PART II Descriptive statistics in R (8) sum, mean, median, sd, var, cor,

More information

TNM093 Tillämpad visualisering och virtuell verklighet. Jimmy Johansson C-Research, Linköping University

TNM093 Tillämpad visualisering och virtuell verklighet. Jimmy Johansson C-Research, Linköping University TNM093 Tillämpad visualisering och virtuell verklighet Jimmy Johansson C-Research, Linköping University Introduction to Visualization New Oxford Dictionary of English, 1999 visualize - verb [with obj.]

More information

Applied Regression Modeling: A Business Approach

Applied Regression Modeling: A Business Approach i Applied Regression Modeling: A Business Approach Computer software help: SAS SAS (originally Statistical Analysis Software ) is a commercial statistical software package based on a powerful programming

More information

ggplot2 basics Hadley Wickham Assistant Professor / Dobelman Family Junior Chair Department of Statistics / Rice University September 2011

ggplot2 basics Hadley Wickham Assistant Professor / Dobelman Family Junior Chair Department of Statistics / Rice University September 2011 ggplot2 basics Hadley Wickham Assistant Professor / Dobelman Family Junior Chair Department of Statistics / Rice University September 2011 1. Diving in: scatterplots & aesthetics 2. Facetting 3. Geoms

More information

STAT 1291: Data Science

STAT 1291: Data Science STAT 1291: Data Science Lecture 18 - Statistical modeling II: Machine learning Sungkyu Jung Where are we? data visualization data wrangling professional ethics statistical foundation Statistical modeling:

More information

Further Maths Notes. Common Mistakes. Read the bold words in the exam! Always check data entry. Write equations in terms of variables

Further Maths Notes. Common Mistakes. Read the bold words in the exam! Always check data entry. Write equations in terms of variables Further Maths Notes Common Mistakes Read the bold words in the exam! Always check data entry Remember to interpret data with the multipliers specified (e.g. in thousands) Write equations in terms of variables

More information

BIOSTATS 640 Spring 2018 Introduction to R Data Description. 1. Start of Session. a. Preliminaries... b. Install Packages c. Attach Packages...

BIOSTATS 640 Spring 2018 Introduction to R Data Description. 1. Start of Session. a. Preliminaries... b. Install Packages c. Attach Packages... BIOSTATS 640 Spring 2018 Introduction to R and R-Studio Data Description Page 1. Start of Session. a. Preliminaries... b. Install Packages c. Attach Packages... 2. Load R Data.. a. Load R data frames...

More information

Boxplot

Boxplot Boxplot By: Meaghan Petix, Samia Porto & Franco Porto A boxplot is a convenient way of graphically depicting groups of numerical data through their five number summaries: the smallest observation (sample

More information

Outline day 4 May 30th

Outline day 4 May 30th Graphing in R: basic graphing ggplot2 package Outline day 4 May 30th 05/2017 117 Graphing in R: basic graphing 05/2017 118 basic graphing Producing graphs R-base package graphics offers funcaons for producing

More information

Making Science Graphs and Interpreting Data

Making Science Graphs and Interpreting Data Making Science Graphs and Interpreting Data Eye Opener: 5 mins What do you see? What do you think? Look up terms you don t know What do Graphs Tell You? A graph is a way of expressing a relationship between

More information

University of Florida CISE department Gator Engineering. Visualization

University of Florida CISE department Gator Engineering. Visualization Visualization Dr. Sanjay Ranka Professor Computer and Information Science and Engineering University of Florida What is visualization? Visualization is the process of converting data (information) in to

More information

Package ggiraphextra

Package ggiraphextra Type Package Package ggiraphextra December 3, 2016 Title Make Interactive 'ggplot2'. Extension to 'ggplot2' and 'ggiraph' Version 0.1.0 Maintainer Keon-Woong Moon URL https://github.com/cardiomoon/ggiraphextra

More information

Introduction to Statistical Graphics Procedures

Introduction to Statistical Graphics Procedures Introduction to Statistical Graphics Procedures Selvaratnam Sridharma, U.S. Census Bureau, Washington, DC ABSTRACT SAS statistical graphics procedures (SG procedures) that were introduced in SAS 9.2 help

More information

ggplot2 for beginners Maria Novosolov 1 December, 2014

ggplot2 for beginners Maria Novosolov 1 December, 2014 ggplot2 for beginners Maria Novosolov 1 December, 214 For this tutorial we will use the data of reproductive traits in lizards on different islands (found in the website) First thing is to set the working

More information

Chapter 2: Looking at Multivariate Data

Chapter 2: Looking at Multivariate Data Chapter 2: Looking at Multivariate Data Multivariate data could be presented in tables, but graphical presentations are more effective at displaying patterns. We can see the patterns in one variable at

More information

STA 570 Spring Lecture 5 Tuesday, Feb 1

STA 570 Spring Lecture 5 Tuesday, Feb 1 STA 570 Spring 2011 Lecture 5 Tuesday, Feb 1 Descriptive Statistics Summarizing Univariate Data o Standard Deviation, Empirical Rule, IQR o Boxplots Summarizing Bivariate Data o Contingency Tables o Row

More information

Introduction to R for Beginners, Level II. Jeon Lee Bio-Informatics Core Facility (BICF), UTSW

Introduction to R for Beginners, Level II. Jeon Lee Bio-Informatics Core Facility (BICF), UTSW Introduction to R for Beginners, Level II Jeon Lee Bio-Informatics Core Facility (BICF), UTSW Basics of R Powerful programming language and environment for statistical computing Useful for very basic analysis

More information

Multivariate Data More Overview

Multivariate Data More Overview Multivariate Data More Overview CS 4460 - Information Visualization Jim Foley Last Revision August 2016 Some Key Concepts Quick Review Data Types Data Marks Basic Data Types N-Nominal (categorical) Equal

More information

Graphics in R. Jim Bentley. The following code creates a couple of sample data frames that we will use in our examples.

Graphics in R. Jim Bentley. The following code creates a couple of sample data frames that we will use in our examples. Graphics in R Jim Bentley 1 Sample Data The following code creates a couple of sample data frames that we will use in our examples. > sex = c(rep("female",12),rep("male",7)) > mass = c(36.1, 54.6, 48.5,

More information

8. MINITAB COMMANDS WEEK-BY-WEEK

8. MINITAB COMMANDS WEEK-BY-WEEK 8. MINITAB COMMANDS WEEK-BY-WEEK In this section of the Study Guide, we give brief information about the Minitab commands that are needed to apply the statistical methods in each week s study. They are

More information

WELCOME! Lecture 3 Thommy Perlinger

WELCOME! Lecture 3 Thommy Perlinger Quantitative Methods II WELCOME! Lecture 3 Thommy Perlinger Program Lecture 3 Cleaning and transforming data Graphical examination of the data Missing Values Graphical examination of the data It is important

More information

Learner Expectations UNIT 1: GRAPICAL AND NUMERIC REPRESENTATIONS OF DATA. Sept. Fathom Lab: Distributions and Best Methods of Display

Learner Expectations UNIT 1: GRAPICAL AND NUMERIC REPRESENTATIONS OF DATA. Sept. Fathom Lab: Distributions and Best Methods of Display CURRICULUM MAP TEMPLATE Priority Standards = Approximately 70% Supporting Standards = Approximately 20% Additional Standards = Approximately 10% HONORS PROBABILITY AND STATISTICS Essential Questions &

More information

Cluster Analysis and Visualization. Workshop on Statistics and Machine Learning 2004/2/6

Cluster Analysis and Visualization. Workshop on Statistics and Machine Learning 2004/2/6 Cluster Analysis and Visualization Workshop on Statistics and Machine Learning 2004/2/6 Outlines Introduction Stages in Clustering Clustering Analysis and Visualization One/two-dimensional Data Histogram,

More information

Data Mining: Exploring Data. Lecture Notes for Chapter 3

Data Mining: Exploring Data. Lecture Notes for Chapter 3 Data Mining: Exploring Data Lecture Notes for Chapter 3 1 What is data exploration? A preliminary exploration of the data to better understand its characteristics. Key motivations of data exploration include

More information

NOTES TO CONSIDER BEFORE ATTEMPTING EX 1A TYPES OF DATA

NOTES TO CONSIDER BEFORE ATTEMPTING EX 1A TYPES OF DATA NOTES TO CONSIDER BEFORE ATTEMPTING EX 1A TYPES OF DATA Statistics is concerned with scientific methods of collecting, recording, organising, summarising, presenting and analysing data from which future

More information

Statistical Methods. Instructor: Lingsong Zhang. Any questions, ask me during the office hour, or me, I will answer promptly.

Statistical Methods. Instructor: Lingsong Zhang. Any questions, ask me during the office hour, or  me, I will answer promptly. Statistical Methods Instructor: Lingsong Zhang 1 Issues before Class Statistical Methods Lingsong Zhang Office: Math 544 Email: lingsong@purdue.edu Phone: 765-494-7913 Office Hour: Monday 1:00 pm - 2:00

More information

Quick introduction to descriptive statistics and graphs in. R Commander. Written by: Robin Beaumont

Quick introduction to descriptive statistics and graphs in. R Commander. Written by: Robin Beaumont Quick introduction to descriptive statistics and graphs in R Commander Written by: Robin Beaumont e-mail: robin@organplayers.co.uk http://www.robin-beaumont.co.uk/virtualclassroom/stats/course1.html Date

More information

VCEasy VISUAL FURTHER MATHS. Overview

VCEasy VISUAL FURTHER MATHS. Overview VCEasy VISUAL FURTHER MATHS Overview This booklet is a visual overview of the knowledge required for the VCE Year 12 Further Maths examination.! This booklet does not replace any existing resources that

More information

User manual forggsubplot

User manual forggsubplot User manual forggsubplot Garrett Grolemund September 3, 2012 1 Introduction ggsubplot expands the ggplot2 package to help users create multi-level plots, or embedded plots." Embedded plots embed subplots

More information

Notes for week 3. Ben Bolker September 26, Linear models: review

Notes for week 3. Ben Bolker September 26, Linear models: review Notes for week 3 Ben Bolker September 26, 2013 Licensed under the Creative Commons attribution-noncommercial license (http: //creativecommons.org/licenses/by-nc/3.0/). Please share & remix noncommercially,

More information

Bar Charts and Frequency Distributions

Bar Charts and Frequency Distributions Bar Charts and Frequency Distributions Use to display the distribution of categorical (nominal or ordinal) variables. For the continuous (numeric) variables, see the page Histograms, Descriptive Stats

More information

1. Basic Steps for Data Analysis Data Editor. 2.4.To create a new SPSS file

1. Basic Steps for Data Analysis Data Editor. 2.4.To create a new SPSS file 1 SPSS Guide 2009 Content 1. Basic Steps for Data Analysis. 3 2. Data Editor. 2.4.To create a new SPSS file 3 4 3. Data Analysis/ Frequencies. 5 4. Recoding the variable into classes.. 5 5. Data Analysis/

More information

Data Mining: Exploring Data. Lecture Notes for Chapter 3. Introduction to Data Mining

Data Mining: Exploring Data. Lecture Notes for Chapter 3. Introduction to Data Mining Data Mining: Exploring Data Lecture Notes for Chapter 3 Introduction to Data Mining by Tan, Steinbach, Kumar What is data exploration? A preliminary exploration of the data to better understand its characteristics.

More information

Data Mining: Exploring Data. Lecture Notes for Data Exploration Chapter. Introduction to Data Mining

Data Mining: Exploring Data. Lecture Notes for Data Exploration Chapter. Introduction to Data Mining Data Mining: Exploring Data Lecture Notes for Data Exploration Chapter Introduction to Data Mining by Tan, Steinbach, Karpatne, Kumar 02/03/2018 Introduction to Data Mining 1 What is data exploration?

More information

Introduction to Lattice Graphics. Richard Pugh 4 th December 2012

Introduction to Lattice Graphics. Richard Pugh 4 th December 2012 Introduction to Lattice Graphics Richard Pugh 4 th December 2012 Agenda Overview of Lattice Functions Creating basic graphics Panelled Graphics Grouped Data Multiple Variables Writing Panel Functions Summary

More information

Project 11 Graphs (Using MS Excel Version )

Project 11 Graphs (Using MS Excel Version ) Project 11 Graphs (Using MS Excel Version 2007-10) Purpose: To review the types of graphs, and use MS Excel 2010 to create them from a dataset. Outline: You will be provided with several datasets and will

More information

Univariate Statistics Summary

Univariate Statistics Summary Further Maths Univariate Statistics Summary Types of Data Data can be classified as categorical or numerical. Categorical data are observations or records that are arranged according to category. For example:

More information

Table of Contents (As covered from textbook)

Table of Contents (As covered from textbook) Table of Contents (As covered from textbook) Ch 1 Data and Decisions Ch 2 Displaying and Describing Categorical Data Ch 3 Displaying and Describing Quantitative Data Ch 4 Correlation and Linear Regression

More information

Chapter 5: The beast of bias

Chapter 5: The beast of bias Chapter 5: The beast of bias Self-test answers SELF-TEST Compute the mean and sum of squared error for the new data set. First we need to compute the mean: + 3 + + 3 + 2 5 9 5 3. Then the sum of squared

More information

1.2. Pictorial and Tabular Methods in Descriptive Statistics

1.2. Pictorial and Tabular Methods in Descriptive Statistics 1.2. Pictorial and Tabular Methods in Descriptive Statistics Section Objectives. 1. Stem-and-Leaf displays. 2. Dotplots. 3. Histogram. Types of histogram shapes. Common notation. Sample size n : the number

More information

DATA VISUALIZATION WITH GGPLOT2. Grid Graphics

DATA VISUALIZATION WITH GGPLOT2. Grid Graphics DATA VISUALIZATION WITH GGPLOT2 Grid Graphics ggplot2 internals Explore grid graphics 35 30 Elements of ggplot2 plot 25 How do graphics work in R? 2 plotting systems mpg 20 15 base package grid graphics

More information

Using Built-in Plotting Functions

Using Built-in Plotting Functions Workshop: Graphics in R Katherine Thompson (katherine.thompson@uky.edu Department of Statistics, University of Kentucky September 15, 2016 Using Built-in Plotting Functions ## Plotting One Quantitative

More information

Install RStudio from - use the standard installation.

Install RStudio from   - use the standard installation. Session 1: Reading in Data Before you begin: Install RStudio from http://www.rstudio.com/ide/download/ - use the standard installation. Go to the course website; http://faculty.washington.edu/kenrice/rintro/

More information

Introduction to R Programming

Introduction to R Programming Course Overview Over the past few years, R has been steadily gaining popularity with business analysts, statisticians and data scientists as a tool of choice for conducting statistical analysis of data

More information

Exploratory Data Analysis EDA

Exploratory Data Analysis EDA Exploratory Data Analysis EDA Luc Anselin http://spatial.uchicago.edu 1 from EDA to ESDA dynamic graphics primer on multivariate EDA interpretation and limitations 2 From EDA to ESDA 3 Exploratory Data

More information