Financial Econometrics Practical
Practical 3: Plotting in R
NF Katzke

Table of Contents
1 Introduction
    Install ggplot2
    Get data Tidy
    Plotting Tidy data

1 Introduction

The aim of this tutorial is to introduce you to tidy analysis in R, which is essential for making plots easily and nicely.

Plotting in R

While there are many, extremely diverse packages that can be used for plotting in R, the one I use most is undoubtedly ggplot2. For dynamic plotting and financial series plots, there are also packages like dygraphs and plotly. Here follows a very basic, high-level view of ggplot2's plotting functionality. I suggest supplementing this tutorial by reading this post with examples, and saving or printing out this ggplot cheatsheet.
1.0.1 Install ggplot2

First things first, let's ensure ggplot2 is correctly installed on our machines:

    if (!require("devtools")) install.packages("devtools")
    if (!require("rmsfuns")) devtools::install_github("nicktz/rmsfuns")
    library(rmsfuns)
    load_pkg("ggplot2")

1.1 Get data Tidy

Very important: ggplot2 wants your data to be in a tidy format. Tidy data (and I suggest reading Hadley's Tidy data paper) is summarised by Hadley Wickham as follows:

- Each variable has its own column
- Each observation has its own row
- Each type of observation forms a table

The following is a perfect example of tidy data:

    pkgstoload <- c("lubridate", "tidyverse")
    load_pkg(pkgstoload)

    # Create some real looking fake data:
    data <-
      bind_rows(
        bind_rows(
          data.frame(
            date = ymd( ),
            Universe = "JALSH",
            Tickers = rep(paste0(c("AAA", "BBB", "CCC", "DDD", "EEE", "FFF"), " SJ Equity"), 6),
            SubFactors = rep(c("ROE", "ROA", "EBIT.EV", "FCF.EV", "Volat.D.60", "TRR6M1M"), each = 6),
            Sectors = rep(c("Fin", "Ind", "Cons", "Fin", "Fin", "Ind"), 6),
            Currencies = "ZAR",
            Score = rnorm(36, 13, 4)) %>%
            as_tibble() %>%
            mutate(date = ymd(date)) %>%
            # Make sure every column except date and Score is character:
            mutate_at(.vars = vars(-date, -Score), .funs = as.character),
          data.frame(
            date = ymd( ),
            Universe = "SPGLOB",
            Tickers = rep(paste0(c("TTT", "UUU", "VVV", "XXX", "YYY", "ZZZ"), " SPGLOB Equity"), 6),
            SubFactors = rep(c("ROE", "ROA", "EBIT.EV", "FCF.EV", "Volat.D.60", "TRR6M1M"), each = 6),
            Sectors = rep(c("Fin", "Ind", "Cons", "Fin", "Fin", "Ind"), 6),
            Currencies = "Dollar",
            Score = rnorm(36)) %>%
            as_tibble() %>%
            mutate(date = ymd(date)) %>%
            mutate_at(.vars = vars(-date, -Score), .funs = as.character)),
        bind_rows(
          data.frame(
            date = ymd( ),
            Universe = "JALSH",
            Tickers = rep(paste0(c("AAA", "BBB", "CCC", "DDD", "EEE", "FFF"), " SJ Equity"), 6),
            SubFactors = rep(c("ROE", "ROA", "EBIT.EV", "FCF.EV", "Volat.D.60", "TRR6M1M"), each = 6),
            Sectors = rep(c("Fin", "Ind", "Cons", "Fin", "Fin", "Ind"), 6),
            Currencies = "ZAR",
            Score = rnorm(36, 10, 4)) %>%
            as_tibble() %>%
            mutate(date = ymd(date)) %>%
            mutate_at(.vars = vars(-date, -Score), .funs = as.character),
          data.frame(
            date = ymd( ),
            Universe = "SPGLOB",
            Tickers = rep(paste0(c("TTT", "UUU", "VVV", "XXX", "YYY", "ZZZ"), " SPGLOB Equity"), 6),
            SubFactors = rep(c("ROE", "ROA", "EBIT.EV", "FCF.EV", "Volat.D.60", "TRR6M1M"), each = 6),
            Sectors = rep(c("Fin", "Ind", "Cons", "Fin", "Fin", "Ind"), 6),
            Currencies = "Dollar",
            Score = rnorm(36)) %>%
            as_tibble() %>%
            mutate(date = ymd(date)) %>%
            mutate_at(.vars = vars(-date, -Score), .funs = as.character))) %>%
      filter(!is.na(Universe))

    # View(data)

Notice that all the variables are in their own columns, and each observation has its own row. This is called a long format (many rows, few columns). We could also spread the data into a wide format (many columns). Let's do a little wrangling and look at what a wide format would look like, and then how to bring it back to long format:

    # First, let's drop Sectors and Currencies for this illustration,
    # else a whole lot of NA's are created when spreading (check for yourself).
    datawide <-
      data %>%
      select(-Sectors, -Currencies, -Universe) %>%
      spread(key = Tickers, value = Score) %>%
      mutate(Universe = "JALSH", AnotherColumn = "Random")  # Add some noise columns

    head(datawide)

    ## # A tibble: 6 x 16
    ##   date   SubFactors `AAA SJ Equity` `BBB SJ Equity` `CCC SJ Equity`
    ##   <date> <chr>      <dbl>           <dbl>           <dbl>
    ## EBIT.EV
    ## FCF.EV
    ## ROA
    ## ROE
    ## TRR6M1M
    ## Volat.D
    ## # ... with 11 more variables: `DDD SJ Equity` <dbl>, `EEE SJ
    ## #   Equity` <dbl>, `FFF SJ Equity` <dbl>, `TTT SPGLOB Equity` <dbl>, `UUU
    ## #   SPGLOB Equity` <dbl>, `VVV SPGLOB Equity` <dbl>, `XXX SPGLOB
    ## #   Equity` <dbl>, `YYY SPGLOB Equity` <dbl>, `ZZZ SPGLOB Equity` <dbl>,
    ## #   Universe <chr>, AnotherColumn <chr>

Notice that in the wrangle above, all the Tickers have their own column. But as mentioned, by the definition of tidiness, every column must be a variable. As the wide columns are all alike (they are all Tickers), they belong in a single column called Tickers! So we need to make the wide data tidy by gathering all the ticker columns into a single column. I specifically mutated two random columns to show you how to gather only the Ticker columns:

    datatidyagain <-
      datawide %>%
      gather(key = Tickers, value = Scores, contains(" Equity"))

    head(datatidyagain)

Note how easy that was if you know the three inputs above:

- key is the name of the column by which to distinguish the observations (here Tickers)
- value is the name of the column holding the observation values (here Scores)
- third, the columns to gather. Note I used contains(" Equity"), as all the Tickers end with Equity. Calling columns this way is useful, with other commands including ends_with, one_of(...), etc. (see cheatsheet!).

You can also pass in vectors to gather or select by. This would likely require using the package lazyeval, though (I will help with this if required, as it can be complex).

If we now want to work with the scores factor by factor, e.g. to standardise ROE around the mean ROE, we simply pipe the data through dplyr (don't forget the dplyr cheatsheet):

    data %>%
      group_by(date, Universe, SubFactors) %>%
      filter(!is.na(Score)) %>%  # Filter only valid scores
      mutate(ZScore = (Score - mean(Score, na.rm = TRUE)) / sd(Score, na.rm = TRUE)) %>%
      ungroup()  # ZScore column created

    ## # A tibble: 108 x 8
    ##   date   Universe Tickers SubFactors Sectors Currencies
    ##   <date> <chr>    <chr>   <chr>      <chr>   <chr>
    ## JALSH AAA SJ Equity ROE Fin ZAR
    ## JALSH BBB SJ Equity ROE Ind ZAR
    ## JALSH CCC SJ Equity ROE Cons ZAR
    ## JALSH DDD SJ Equity ROE Fin ZAR
    ## JALSH EEE SJ Equity ROE Fin ZAR
    ## JALSH FFF SJ Equity ROE Ind ZAR
    ## JALSH AAA SJ Equity ROA Fin ZAR
    ## JALSH BBB SJ Equity ROA Ind ZAR
    ## JALSH CCC SJ Equity ROA Cons ZAR
    ## JALSH DDD SJ Equity ROA Fin ZAR
    ## # ... with 98 more rows, and 2 more variables: Score <dbl>, ZScore <dbl>

Notice that a ZScore column has now been added to our dataframe with minimal effort. What you need to get right, though (and focus on), is grouping correctly and mutating accurately.

Tip: use ViewXL to check your calculation in Excel if you are uncertain.

Plotting Tidy data

To plot from a tidy format, I suggest using the powerful plotting platform ggplot2. It thinks as follows:

- tidy data as input
- aesthetic properties (is it a boxplot, line plot, scatterplot, etc.)
- faceting (e.g. repeating a plot type in a grid)

To plot the last tut's BRICS returns data in a line plot, e.g., do the following:

    # Get data:
    retdata <- read_csv("...")
    load_pkg("ggthemes")  # Gives you nice themes to play with...

    # Make data tidy:
    retdata <- retdata %>% gather(key = Countries, value = TRI, -Date)
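The location of the returns file is not given above, so the reshaping step can be tried on a small stand-in first. The sketch below is only an illustration: `toy_wide`, `toy_long` and `toy_long2` are hypothetical names, and every number in them is made up. It applies the same `gather()` call shape as above and, for comparison, `pivot_longer()`, which tidyr now offers as the newer way of writing the same reshape.

```r
library(tidyr)

# Hypothetical stand-in for a wide returns file: one Date column plus
# one column per country. All values are made up for illustration.
toy_wide <- data.frame(
  Date = as.Date("2020-01-01") + 0:2,
  brz  = c(100, 101, 103),
  zar  = c(100,  99, 102)
)

# Same call shape as in the text: every column except Date is gathered
# into a Countries (key) column and a TRI (value) column.
toy_long <- gather(toy_wide, key = Countries, value = TRI, -Date)

# pivot_longer() expresses the same wide-to-long reshape.
toy_long2 <- pivot_longer(toy_wide, cols = -Date,
                          names_to = "Countries", values_to = "TRI")

print(toy_long)
```

Both results have one row per Date and country, which is the shape the plotting calls that follow expect.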
    # Plot each country's TRI on the same plot:
    ggplot(data = retdata) +
      geom_line(aes(x = Date, y = TRI, colour = Countries))

[Figure: line plot of TRI against Date, one coloured line per country (brz, chn, ind, rus, zar).]

    # Plot each country's TRI on different plots:
    g1 <-
      ggplot(data = retdata) +
      geom_line(aes(x = Date, y = TRI, colour = Countries)) +
      facet_wrap(~Countries, scales = "free") +
      theme_hc()
    # Type theme and you should see options pop up in Rstudio...

    print(g1)

[Figure: TRI against Date in one panel per country (brz, chn, ind, rus, zar), coloured by Countries.]

    # Remove scales = "free" to make plots have similar scales...
    # To keep the plot specifications, but only plot a subset of the data:
    g1 %+% subset(retdata, Date > as.Date(' ')) + ggtitle("Post-Crisis TRI")

[Figure: the same faceted plot for the later subsample, titled "Post-Crisis TRI".]

    g1 %+% subset(retdata, Date <= as.Date(' ')) + ggtitle("Pre-Crisis TRI")

[Figure: the same faceted plot for the earlier subsample, titled "Pre-Crisis TRI".]

    # How amazing was that?!
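Another way to reuse one plot specification on different subsets is to wrap it in a small function. This is a sketch of that alternative rather than the method used above (which swaps the data with %+%): `plot_tri`, `toy_long` and `cutoff` are hypothetical names, the data are made up, and the split date is arbitrary.

```r
library(ggplot2)

# Made-up index data in tidy (long) form; column names mirror the text.
toy_long <- data.frame(
  Date      = rep(as.Date("2008-01-01") + seq(0, 330, by = 30), times = 2),
  Countries = rep(c("brz", "zar"), each = 12),
  TRI       = c(100 + cumsum(rep(1, 12)), 100 + cumsum(rep(-0.5, 12)))
)

# Hypothetical helper: one plot specification, reusable on any subset.
plot_tri <- function(df, title) {
  ggplot(data = df) +
    geom_line(aes(x = Date, y = TRI, colour = Countries)) +
    facet_wrap(~Countries, scales = "free") +
    ggtitle(title)
}

cutoff <- as.Date("2008-06-30")  # arbitrary split date for this sketch

early <- subset(toy_long, Date <= cutoff)
late  <- subset(toy_long, Date >  cutoff)

p_early <- plot_tri(early, "Early sample")
p_late  <- plot_tri(late,  "Late sample")
```

Calling `print(p_early)` or `print(p_late)` draws the respective plot. Dropping `scales = "free"` inside the helper would give all panels a common scale.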
Figure 1.1:

Saving the plot is simple too:

    # Boxplot of our created dataset earlier:
    g <-
      ggplot(data = data) +
      geom_boxplot(aes(x = SubFactors, y = Score, fill = SubFactors))

    print(g)  # printed in Rstudio

[Figure: boxplots of Score by SubFactors (EBIT.EV, FCF.EV, ROA, ROE, TRR6M1M, Volat.D.60), filled by SubFactors.]

    # Specify folder to save plot in:
    pathloc <- file.path("C:", "Practical3Plot")  # Specify your own location
    # On a mac - specify this path by hand...
    build_path(pathloc, Silent = FALSE)

    ## [1] "C:/Practical3Plot"

    ggsave(filename = file.path(pathloc, "plot.png"), plot = g, width = 6, height = 6, device = "png")
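If you are not on Windows, or do not have rmsfuns at hand, the same saving step can be sketched with base R only. The example below is an illustration under stated assumptions: it swaps `build_path()` for base R's `dir.create()`, uses `tempdir()` as a placeholder for your own folder, and builds a small made-up data set (`toy`, `g_toy` and `outfile` are hypothetical names).

```r
library(ggplot2)

# Small made-up data set, only so that there is something to save.
toy <- data.frame(
  SubFactors = rep(c("ROE", "ROA"), each = 10),
  Score      = c(seq(8, 17, length.out = 10), seq(-1, 1, length.out = 10))
)

g_toy <- ggplot(data = toy) +
  geom_boxplot(aes(x = SubFactors, y = Score, fill = SubFactors))

# Build the folder path from pieces; tempdir() stands in for your own location.
pathloc <- file.path(tempdir(), "Practical3Plot")
dir.create(pathloc, recursive = TRUE, showWarnings = FALSE)

# Save the plot object explicitly rather than relying on the last plot drawn.
outfile <- file.path(pathloc, "plot.png")
ggsave(filename = outfile, plot = g_toy, width = 6, height = 6, device = "png")

file.exists(outfile)
```

Because `file.path()` assembles the path from its parts, the same lines should work on Windows, Mac and Linux once `tempdir()` is replaced with a folder of your choice.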
Play around with the documentation and examples here, keep the cheatsheet close by and, of course, stackoverflow is your friend. As the course progresses, I will add plots using ggplot2. For now, take note and play with the examples. You should also now be able to understand your Texevier template's graphing command (note how I added the figure's details in a function and sourced it in text).
More informationPackage anomalize. April 17, 2018
Type Package Title Tidy Anomaly Detection Version 0.1.1 Package anomalize April 17, 2018 The 'anomalize' package enables a ``tidy'' workflow for detecting anomalies in data. The main functions are time_decompose(),
More informationQuick introduction to descriptive statistics and graphs in. R Commander. Written by: Robin Beaumont
Quick introduction to descriptive statistics and graphs in R Commander Written by: Robin Beaumont e-mail: robin@organplayers.co.uk http://www.robin-beaumont.co.uk/virtualclassroom/stats/course1.html Date
More informationHadley Wickham. ggplot2. Elegant Graphics for Data Analysis. July 26, Springer
Hadley Wickham ggplot2 Elegant Graphics for Data Analysis July 26, 2016 Springer To my parents, Alison & Brian Wickham. Without them, and their unconditional love and support, none of this would have
More informationMEASURING WELLBEING EMPIRICAL PROJECT 4. Key concepts. LEARNING OBJECTIVES In this project you will:
EMPIRICAL PROJECT 4 MEASURING WELLBEING LEARNING OBJECTIVES In this project you will: check datasets for missing data sort data and assign ranks based on values distinguish between time series and cross
More informationPackage tibble. August 22, 2017
Encoding UTF-8 Version 1.3.4 Title Simple Data Frames Package tibble August 22, 2017 Provides a 'tbl_df' class (the 'tibble') that provides stricter checking and better formatting than the traditional
More informationAssignment 0. Nothing here to hand in
Assignment 0 Nothing here to hand in The questions here have solutions attached. Follow the solutions to see what to do, if you cannot otherwise guess. Though there is nothing here to hand in, it is very
More informationCPSC 217 Midterm (Python 3 version)
CPSC 217 Midterm (Python 3 version) Duration: 60 minutes 7 March 2011 This exam has 81 questions and 14 pages. This exam is closed book. No notes, books, calculators or electronic devices, or other assistance
More informationPackage GetITRData. October 22, 2017
Package GetITRData October 22, 2017 Title Reading Financial Reports from Bovespa's ITR System Version 0.6 Date 2017-10-21 Reads quarterly and annual financial reports including assets, liabilities, income
More informationMaking sense of census microdata
Making sense of census microdata Tutorial 3: Creating aggregated variables and visualisations First, open a new script in R studio and save it in your working directory, so you will be able to access this
More informationScripting Tutorial - Lesson 2
Home TI-Nspire Authoring TI-Nspire Scripting HQ Scripting Tutorial - Lesson 2 Scripting Tutorial - Lesson 2 Download supporting files for this tutorial Texas Instruments TI-Nspire Scripting Support Page
More informationTransform Data! The Basics Part I!
Transform Data! The Basics Part I! arrange() arrange() Order rows from smallest to largest values arrange(.data, ) Data frame to transform One or more columns to order by (addi3onal columns will be used
More information