Financial Econometrics Practical

Practical 3: Plotting in R
NF Katzke

Table of Contents
1 Introduction
- Install ggplot2
- Get data Tidy
- Plotting Tidy data

1 Introduction

The aim of this tutorial is to introduce you to tidy analysis in R, which is essential for making plots easily and nicely.

Plotting in R

While there are many diverse packages that can be used for plotting in R, the one I use most is undoubtedly ggplot2. For dynamic plotting and financial series plots, there are also packages like dygraphs and plotly. What follows is a very basic, high-level view of ggplot2's plotting functionality. I suggest supplementing this tutorial by reading this post with examples, and saving or printing out this ggplot cheatsheet.

1.0.1 Install ggplot2

First things first, let's ensure ggplot2 is correctly installed on our machines:

```r
if (!require("devtools")) install.packages("devtools")
if (!require("rmsfuns")) devtools::install_github("nicktz/rmsfuns")
library(rmsfuns)
load_pkg("ggplot2")
```

1.1 Get data Tidy

Very important: ggplot2 wants your data to be in a tidy format. Basically (and I suggest reading Hadley's Tidy data paper), tidy data is summarised by Hadley Wickham as follows:

- Each variable has its own column.
- Each observation has its own row.
- Each type of observational unit forms a table.

The following is a perfect example of tidy data:

```r
pkgstoload <- c("lubridate", "tidyverse")
load_pkg(pkgstoload)

# Create some real looking fake data:
data <- bind_rows(
  bind_rows(
    data.frame(
      date = ymd( ),
      Universe = "JALSH",
      Tickers = rep(paste0(c("AAA", "BBB", "CCC", "DDD", "EEE", "FFF"), " SJ Equity"), 6),
      SubFactors = rep(c("ROE", "ROA", "EBIT.EV", "FCF.EV", "Volat.D.60", "TRR6M1M"), each = 6),
      Sectors = rep(c("Fin", "Ind", "Cons", "Fin", "Fin", "Ind"), 6),
      Currencies = "ZAR",
      Score = rnorm(36, 13, 4)) %>%
      as_tibble() %>%
      mutate(date = ymd(date)) %>%
      mutate(across(-c(date, Score), as.character)),
    data.frame(
      date = ymd( ),
      Universe = "SPGLOB",
      Tickers = rep(paste0(c("TTT", "UUU", "VVV", "XXX", "YYY", "ZZZ"), " SPGLOB Equity"), 6),
      SubFactors = rep(c("ROE", "ROA", "EBIT.EV", "FCF.EV", "Volat.D.60", "TRR6M1M"), each = 6),
      Sectors = rep(c("Fin", "Ind", "Cons", "Fin", "Fin", "Ind"), 6),
      Currencies = "Dollar",
      Score = rnorm(36)) %>%
      as_tibble() %>%
      mutate(date = ymd(date)) %>%
      mutate(across(-c(date, Score), as.character))),
  bind_rows(
    data.frame(
      date = ymd( ),
      Universe = "JALSH",
      Tickers = rep(paste0(c("AAA", "BBB", "CCC", "DDD", "EEE", "FFF"), " SJ Equity"), 6),
      SubFactors = rep(c("ROE", "ROA", "EBIT.EV", "FCF.EV", "Volat.D.60", "TRR6M1M"), each = 6),
      Sectors = rep(c("Fin", "Ind", "Cons", "Fin", "Fin", "Ind"), 6),
      Currencies = "ZAR",
      Score = rnorm(36, 10, 4)) %>%
      as_tibble() %>%
      mutate(date = ymd(date)) %>%
      mutate(across(-c(date, Score), as.character)),
    data.frame(
      date = ymd( ),
      Universe = "SPGLOB",
      Tickers = rep(paste0(c("TTT", "UUU", "VVV", "XXX", "YYY", "ZZZ"), " SPGLOB Equity"), 6),
      SubFactors = rep(c("ROE", "ROA", "EBIT.EV", "FCF.EV", "Volat.D.60", "TRR6M1M"), each = 6),
      Sectors = rep(c("Fin", "Ind", "Cons", "Fin", "Fin", "Ind"), 6),
      Currencies = "Dollar",
      Score = rnorm(36)) %>%
      as_tibble() %>%
      mutate(date = ymd(date)) %>%
      mutate(across(-c(date, Score), as.character)))) %>%
  filter(!is.na(Universe))
# View(data)
```

Notice that all the variables are in their own columns, and each observation has its own row. This is called a long format (many rows, few columns). We could also spread the data into a wide format (many columns). Let's do a little wrangling to see what a wide format looks like, and then how to bring it back to long format:

```r
# First, let's drop Sectors, Currencies and Universe for this illustration,
# else a whole lot of NAs are created when spreading (check yourself).
datawide <- data %>%
  select(-Sectors, -Currencies, -Universe) %>%
  spread(key = Tickers, value = Score) %>%
  mutate(Universe = "JALSH", AnotherColumn = "Random") # Add some noise columns
head(datawide)
```

```
## # A tibble: 6 x 16
##   date   SubFactors `AAA SJ Equity` `BBB SJ Equity` `CCC SJ Equity`
##   <date> <chr>                <dbl>           <dbl>           <dbl>
## EBIT.EV
## FCF.EV
## ROA
## ROE
## TRR6M1M
## Volat.D
## # ... with 11 more variables: `DDD SJ Equity` <dbl>, `EEE SJ
## #   Equity` <dbl>, `FFF SJ Equity` <dbl>, `TTT SPGLOB Equity` <dbl>, `UUU
## #   SPGLOB Equity` <dbl>, `VVV SPGLOB Equity` <dbl>, `XXX SPGLOB
## #   Equity` <dbl>, `YYY SPGLOB Equity` <dbl>, `ZZZ SPGLOB Equity` <dbl>,
## #   Universe <chr>, AnotherColumn <chr>
```

Notice that in the wrangle above, all the Tickers have their own column. But, by the definition of tidiness, each column must be a variable. As the wide columns have something in common (they are all tickers), they belong in the same column, called Tickers! So we need to make the wide data tidy by gathering all the ticker columns into a single column. I specifically mutated two noise columns to show you how to gather only the ticker columns:

```r
datatidyagain <- datawide %>%
  gather(key = Tickers, value = Scores, contains(" Equity"))
head(datatidyagain)
```

Note how easy that was once you know the three inputs above:

- key: the name of the new column that distinguishes the observations (here, Tickers);
- value: the name of the new column that holds the observation values (here, Scores);
- third, the columns to gather.

Note that I used contains(" Equity"), as all the Tickers end with " Equity". Selecting columns this way is useful; other helpers include ends_with(), one_of(), etc. (see the cheatsheet!). You can also pass in vectors to gather or select by; this would likely require using the package lazyeval though (I will help with this if required, as it can be complex).

If we now want to work per factor, e.g., to standardise each SubFactor's scores into Z-scores within each date and universe (or to focus only on ROE and calculate its mean), we simply pipe it in dplyr (don't forget the dplyr cheatsheet):

```r
data %>%
  group_by(date, Universe, SubFactors) %>%
  filter(!is.na(Score)) %>% # Filter only valid scores
  mutate(ZScore = (Score - mean(Score, na.rm = TRUE)) / sd(Score, na.rm = TRUE)) %>%
  ungroup() # ZScore column created
```

```
## # A tibble: 108 x 8
##   date   Universe Tickers       SubFactors Sectors Currencies
##   <date> <chr>    <chr>         <chr>      <chr>   <chr>
## JALSH AAA SJ Equity ROE Fin ZAR
## JALSH BBB SJ Equity ROE Ind ZAR
## JALSH CCC SJ Equity ROE Cons ZAR
## JALSH DDD SJ Equity ROE Fin ZAR
## JALSH EEE SJ Equity ROE Fin ZAR
## JALSH FFF SJ Equity ROE Ind ZAR
## JALSH AAA SJ Equity ROA Fin ZAR
## JALSH BBB SJ Equity ROA Ind ZAR
## JALSH CCC SJ Equity ROA Cons ZAR
## JALSH DDD SJ Equity ROA Fin ZAR
## # ... with 98 more rows, and 2 more variables: Score <dbl>, ZScore <dbl>
```

Notice that a ZScore column has now been added to our data frame with minimal effort. What you need to get right, though (and focus on), is correct grouping and accurate mutating. Tip: use ViewXL to check your calculation in Excel if you are uncertain.

Plotting Tidy data

To plot from a tidy format, I suggest using the powerful plotting platform ggplot2. It thinks as follows:

- tidy data as input;
- aesthetic mappings and geoms (which variables map to x, y, colour, etc., and is it a boxplot, line plot, scatterplot, etc.);
- faceting (e.g., repeating a plot type in a grid).

To plot the last tutorial's BRICS returns data in a line plot, e.g., do the following:

```r
# Get data:
retdata <- read_csv("...")
load_pkg("ggthemes") # Gives you nice themes to play with...
# Make data tidy:
retdata <- retdata %>% gather(key = Countries, value = TRI, -Date)
```
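The pipeline above (wide data made tidy, then mapped to aesthetics and faceted) can be tried without the BRICS file. Below is a minimal sketch on a small, made-up wide TRI table (the dates and values are hypothetical). It uses tidyr's pivot_longer(), which supersedes gather() from tidyr 1.0.0 onwards, and passes the columns to gather as a character vector through all_of(), which current tidyselect versions support directly.

```r
library(tidyr)
library(ggplot2)

# Hypothetical wide TRI table: one column per country (made-up values).
wide <- data.frame(
  Date = as.Date(c("2008-01-31", "2008-02-29", "2008-03-31", "2008-04-30")),
  brz  = c(100, 90, 95, 110),
  chn  = c(100, 85, 92, 105),
  zar  = c(100, 97, 99, 104)
)

# Columns to gather, supplied as a character vector via all_of().
country_cols <- c("brz", "chn", "zar")

# Tidy (long) format: one row per Date x country.
tidy_tri <- pivot_longer(wide, cols = all_of(country_cols),
                         names_to = "Countries", values_to = "TRI")

# Aesthetics: Date on x, TRI on y, one colour per country.
g_same <- ggplot(tidy_tri) +
  geom_line(aes(x = Date, y = TRI, colour = Countries))

# Faceting: the same line plot repeated in one panel per country.
g_facet <- g_same + facet_wrap(~Countries, scales = "free")
```

Calling print(g_same) or print(g_facet) draws the single-panel or the faceted version; the tidy table has 4 dates x 3 countries = 12 rows.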

```r
# Plot each country's TRI on the same plot:
ggplot(data = retdata) +
  geom_line(aes(x = Date, y = TRI, colour = Countries))
```

(Figure: TRI against Date for all countries on a single plot, coloured by Countries: brz, chn, ind, rus, zar.)

```r
# Plot each country's TRI on different plots:
g1 <- ggplot(data = retdata) +
  geom_line(aes(x = Date, y = TRI, colour = Countries)) +
  facet_wrap(~Countries, scales = "free") +
  theme_hc() # Type theme and you should see options pop up in RStudio...
print(g1)
```

(Figure: TRI against Date in separate panels for brz, chn, ind, rus and zar; legend: Countries.)

```r
# Remove scales = "free" to make the plots share the same scales...
# To keep the plot specifications, but only plot a subset of the data:
g1 %+% subset(retdata, Date > as.Date(' ')) + ggtitle("Post Crisis TRI")
```

(Figure: "Post Crisis TRI", TRI against Date in separate panels for brz, chn, ind, rus and zar.)

```r
g1 %+% subset(retdata, Date <= as.Date(' ')) + ggtitle("Pre Crisis TRI")
```

(Figure: "Pre Crisis TRI", TRI against Date in separate panels for brz, chn, ind, rus and zar.)

How amazing was that?!
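The %+% trick above swaps the data behind an existing plot while keeping its layers, facets and theme. A minimal self-contained sketch follows; the table is made up and the cut-off date is a hypothetical stand-in for the one used with the BRICS data.

```r
library(ggplot2)

# Hypothetical tidy TRI data (made-up values): two countries x four month-ends.
tidy_tri <- data.frame(
  Date      = rep(as.Date(c("2008-06-30", "2008-09-30", "2009-03-31", "2009-06-30")), times = 2),
  Countries = rep(c("brz", "zar"), each = 4),
  TRI       = c(100, 80, 70, 90, 100, 85, 75, 95)
)
cutoff <- as.Date("2009-01-01") # hypothetical cut-off, for illustration only

# Build the plot specification once.
g1 <- ggplot(data = tidy_tri) +
  geom_line(aes(x = Date, y = TRI, colour = Countries)) +
  facet_wrap(~Countries, scales = "free")

# Re-use it on subsets: %+% replaces the plot's default data.
post <- subset(tidy_tri, Date > cutoff)
pre  <- subset(tidy_tri, Date <= cutoff)
g_post <- g1 %+% post + ggtitle("Post Crisis TRI")
g_pre  <- g1 %+% pre + ggtitle("Pre Crisis TRI")
```

Because %+% binds more tightly than +, g1 %+% post + ggtitle(...) first swaps the data and then adds the title, exactly as in the calls above.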

Figure 1.1

Saving the plot is simple too:

```r
# Boxplot of the dataset we created earlier:
g <- ggplot(data = data) +
  geom_boxplot(aes(x = SubFactors, y = Score, fill = SubFactors))
print(g) # Printed in RStudio
```

(Figure: boxplots of Score for each SubFactor: EBIT.EV, FCF.EV, ROA, ROE, TRR6M1M, Volat.D.60; fill legend: SubFactors.)

```r
# Specify folder to save plot in:
pathloc <- file.path("C:", "Practical3Plot") # Specify your own location
# On a Mac, specify this path by hand...
build_path(pathloc, Silent = FALSE)
## [1] "C:/Practical3Plot"
ggsave(filename = file.path(pathloc, "plot.png"), plot = g,
       width = 6, height = 6, device = "png")
```
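If rmsfuns is not available, or a C: path makes no sense on your machine, base R's dir.create() can create the folder instead. A minimal sketch, assuming tempdir() as a stand-in location (so it runs anywhere) and a small made-up boxplot in place of g:

```r
library(ggplot2)

# Base-R alternative to build_path(): create the folder if it does not exist.
# tempdir() is only a stand-in; swap in your own location.
pathloc <- file.path(tempdir(), "Practical3Plot")
dir.create(pathloc, recursive = TRUE, showWarnings = FALSE)

# A small boxplot in the same style as g above (made-up scores).
scores <- data.frame(
  SubFactors = rep(c("ROE", "ROA"), each = 5),
  Score      = c(11, 13, 12, 15, 9, 4, 6, 5, 7, 3)
)
g <- ggplot(scores) +
  geom_boxplot(aes(x = SubFactors, y = Score, fill = SubFactors))

# width and height are in inches by default.
out_file <- file.path(pathloc, "plot.png")
ggsave(filename = out_file, plot = g, width = 6, height = 6, device = "png")
```

ggsave() can also infer the device from the file extension, so device = "png" is optional when the filename ends in .png.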

Play around with the documentation and examples here, keep the cheatsheet close by and, of course, Stack Overflow is your friend. As the course progresses, I will add more plots using ggplot2. For now, take note and play with the examples. You should now also be able to understand your Texevier template's graphing command (note how I added the figure's details in a function and sourced it in the text).
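The idea of keeping a figure's details in a function can be sketched as follows. The function name tri_plot() and its arguments are hypothetical and are not the Texevier template's own code; the function simply wraps the faceted TRI plot from this practical, so that a chunk in the write-up only needs a one-line call.

```r
library(ggplot2)

# Hypothetical helper: all plot details live here, so a document chunk
# only needs tri_plot(data, "Title").
tri_plot <- function(df, title = NULL) {
  ggplot(data = df) +
    geom_line(aes(x = Date, y = TRI, colour = Countries)) +
    facet_wrap(~Countries, scales = "free") +
    ggtitle(title)
}

# Made-up data for illustration.
example_df <- data.frame(
  Date      = rep(as.Date(c("2017-01-31", "2017-02-28")), times = 2),
  Countries = rep(c("brz", "zar"), each = 2),
  TRI       = c(100, 102, 100, 98)
)
p <- tri_plot(example_df, "Example TRI")
```

The function can live in its own script that is source()d at the top of the document, keeping the text chunks short.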


Homework 5: Spatial Games : Programming for Scientists Due: Thursday, March 3, 2016 at 11:59 PM Homework 5: Spatial Games 02-201: Programming for Scientists Due: Thursday, March 3, 2016 at 11:59 PM 1. Reading Read Ch. 8 and Ch. 9 of An Introduction to Programming in Go (on pointers and structs).

More information

One PageR Data Science. # Dates and time.

One PageR Data Science. # Dates and time. Graham.Williams@togaware.com 16th May 2018 Visit https://essentials.togaware.com/onepagers for more Essentials. Date and time data is common in many disciplines, particularly where our observations are

More information

Creating Functions in R_Instructor

Creating Functions in R_Instructor Creating Functions in R_Instructor October 18, 2017 In [57]: library(repr) options(repr.plot.width=4, repr.plot.height=3) 1 Creating Functions in R Abstracting your code into many small functions is key

More information

Package ggsubplot. February 15, 2013

Package ggsubplot. February 15, 2013 Package ggsubplot February 15, 2013 Maintainer Garrett Grolemund License GPL Title Explore complex data by embedding subplots within plots. LazyData true Type Package Author Garrett

More information

Rediscover Charts IN THIS CHAPTER NOTE. Inserting Excel Charts into PowerPoint. Getting Inside a Chart. Understanding Chart Layouts

Rediscover Charts IN THIS CHAPTER NOTE. Inserting Excel Charts into PowerPoint. Getting Inside a Chart. Understanding Chart Layouts 6 Rediscover Charts Brand new to Office 2007 is the new version of Charts to replace the old Microsoft Graph Chart and the Microsoft Excel Graph both of which were inserted as OLE objects in previous versions

More information

Equities and Fixed Income. Introduction Manual

Equities and Fixed Income. Introduction Manual Finance Data Thomson Reuters Eikon Equities and Fixed Income Introduction Manual Date Author 01-03-2017 Nicky Zaugg 17-10-2017 Nicky Zaugg Contents 1. Introduction... 3 1.1 When do I use Eikon?... 3 1.2

More information

Data Analyst Nanodegree Syllabus

Data Analyst Nanodegree Syllabus Data Analyst Nanodegree Syllabus Discover Insights from Data with Python, R, SQL, and Tableau Before You Start Prerequisites : In order to succeed in this program, we recommend having experience working

More information

Actual Major League Baseball Salaries ( )

Actual Major League Baseball Salaries ( ) Chapter 2: Organizing and Presenting Data (Page 31) Why do we use graphs? Organize Summarize Analyze Data In a nutshell, Graphs make it easier to: understand describe what is going on with the data Definition

More information

Data cleansing and wrangling with Diabetes.csv data set Shiloh Bradley Webster University St. Louis. Data Wrangling 1

Data cleansing and wrangling with Diabetes.csv data set Shiloh Bradley Webster University St. Louis. Data Wrangling 1 Data cleansing and wrangling with Diabetes.csv data set Shiloh Bradley Webster University St. Louis Data Wrangling 1 Data Wrangling 2 Executive Summary Through data wrangling, data is prepared for further

More information

ggplot2 basics Hadley Wickham Assistant Professor / Dobelman Family Junior Chair Department of Statistics / Rice University September 2011

ggplot2 basics Hadley Wickham Assistant Professor / Dobelman Family Junior Chair Department of Statistics / Rice University September 2011 ggplot2 basics Hadley Wickham Assistant Professor / Dobelman Family Junior Chair Department of Statistics / Rice University September 2011 1. Diving in: scatterplots & aesthetics 2. Facetting 3. Geoms

More information

lazyeval A uniform approach to NSE

lazyeval A uniform approach to NSE lazyeval A uniform approach to NSE July 2016 Hadley Wickham @hadleywickham Chief Scientist, RStudio Motivation Take this simple variant of subset() subset

More information

Data input & output. Hadley Wickham. Assistant Professor / Dobelman Family Junior Chair Department of Statistics / Rice University.

Data input & output. Hadley Wickham. Assistant Professor / Dobelman Family Junior Chair Department of Statistics / Rice University. Data input & output Hadley Wickham Assistant Professor / Dobelman Family Junior Chair Department of Statistics / Rice University June 2012 1. Working directories 2. Loading data 3. Strings and factors

More information

Stat405. More about data. Hadley Wickham. Tuesday, September 11, 12

Stat405. More about data. Hadley Wickham. Tuesday, September 11, 12 Stat405 More about data Hadley Wickham 1. (Data update + announcement) 2. Motivating problem 3. External data 4. Strings and factors 5. Saving data Slot machines they be sure casinos are honest? CC by-nc-nd:

More information

The Survey System Tutorial. CATI Surveys

The Survey System Tutorial. CATI Surveys The Survey System Tutorial CATI Surveys The Survey System offers two kinds of telephone interviewing: desktop CATI, in which interviewers are in a central location using PC software to connect to a local

More information

Making use of other Applications

Making use of other Applications AppGameKit 2 Collision Using Arrays Making use of other Applications Although we need game software to help makes games for modern devices, we should not exclude the use of other applications to aid the

More information

Package anomalize. April 17, 2018

Package anomalize. April 17, 2018 Type Package Title Tidy Anomaly Detection Version 0.1.1 Package anomalize April 17, 2018 The 'anomalize' package enables a ``tidy'' workflow for detecting anomalies in data. The main functions are time_decompose(),

More information

Quick introduction to descriptive statistics and graphs in. R Commander. Written by: Robin Beaumont

Quick introduction to descriptive statistics and graphs in. R Commander. Written by: Robin Beaumont Quick introduction to descriptive statistics and graphs in R Commander Written by: Robin Beaumont e-mail: robin@organplayers.co.uk http://www.robin-beaumont.co.uk/virtualclassroom/stats/course1.html Date

More information

Hadley Wickham. ggplot2. Elegant Graphics for Data Analysis. July 26, Springer

Hadley Wickham. ggplot2. Elegant Graphics for Data Analysis. July 26, Springer Hadley Wickham ggplot2 Elegant Graphics for Data Analysis July 26, 2016 Springer To my parents, Alison & Brian Wickham. Without them, and their unconditional love and support, none of this would have

More information

MEASURING WELLBEING EMPIRICAL PROJECT 4. Key concepts. LEARNING OBJECTIVES In this project you will:

MEASURING WELLBEING EMPIRICAL PROJECT 4. Key concepts. LEARNING OBJECTIVES In this project you will: EMPIRICAL PROJECT 4 MEASURING WELLBEING LEARNING OBJECTIVES In this project you will: check datasets for missing data sort data and assign ranks based on values distinguish between time series and cross

More information

Package tibble. August 22, 2017

Package tibble. August 22, 2017 Encoding UTF-8 Version 1.3.4 Title Simple Data Frames Package tibble August 22, 2017 Provides a 'tbl_df' class (the 'tibble') that provides stricter checking and better formatting than the traditional

More information

Assignment 0. Nothing here to hand in

Assignment 0. Nothing here to hand in Assignment 0 Nothing here to hand in The questions here have solutions attached. Follow the solutions to see what to do, if you cannot otherwise guess. Though there is nothing here to hand in, it is very

More information

CPSC 217 Midterm (Python 3 version)

CPSC 217 Midterm (Python 3 version) CPSC 217 Midterm (Python 3 version) Duration: 60 minutes 7 March 2011 This exam has 81 questions and 14 pages. This exam is closed book. No notes, books, calculators or electronic devices, or other assistance

More information

Package GetITRData. October 22, 2017

Package GetITRData. October 22, 2017 Package GetITRData October 22, 2017 Title Reading Financial Reports from Bovespa's ITR System Version 0.6 Date 2017-10-21 Reads quarterly and annual financial reports including assets, liabilities, income

More information

Making sense of census microdata

Making sense of census microdata Making sense of census microdata Tutorial 3: Creating aggregated variables and visualisations First, open a new script in R studio and save it in your working directory, so you will be able to access this

More information

Scripting Tutorial - Lesson 2

Scripting Tutorial - Lesson 2 Home TI-Nspire Authoring TI-Nspire Scripting HQ Scripting Tutorial - Lesson 2 Scripting Tutorial - Lesson 2 Download supporting files for this tutorial Texas Instruments TI-Nspire Scripting Support Page

More information

Transform Data! The Basics Part I!

Transform Data! The Basics Part I! Transform Data! The Basics Part I! arrange() arrange() Order rows from smallest to largest values arrange(.data, ) Data frame to transform One or more columns to order by (addi3onal columns will be used

More information