An Exploration of the Dark Arts. reshape/reshape2, plyr & ggplot2


1 An Exploration of the Dark Arts: reshape/reshape2, plyr & ggplot2

2 Hadley Wickham. Won the John M. Chambers Statistical Software Award (2006): ggplot and reshape. Author of 14 R packages since 2005. Basic philosophy appears to be unifying very useful functionality into a common syntax within R.

3 The Dark Arts: Wickham's wizardry. Unified a lot of R's (non-modeling) functionality into three packages. Allows very flexible data manipulation and visualization. Syntax different, yet logical. New and improved (and much faster).

4 Three unforgivable packages. Unforgivable if you don't know and (sometimes) use them: plyr (split-analyze-combine paradigm for analysis); reshape/reshape2 (data manipulation and aggregation); ggplot2 (visualization using the Grammar of Graphics).

5 Three unforgivable packages: plyr, reshape/reshape2, ggplot2

6-10 Long data indexed by variables (x1, x2, x3, x4, x5, ...): split, do something, put back together.

11 Do something: create summaries (reshape); plot by index (ggplot2)

12 Do something. Transform data: ddply(dat, .(by), function). Model by index: dlply(dat, .(by), function).

13 ddply: input data.frame, output data.frame. dlply: input data.frame, output list. In plyr function names the first letter gives the input type and the second the output type, so ldply takes a list and returns a data.frame.
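The naming convention can be checked directly. The following is a minimal sketch, assuming plyr is installed and using R's built-in warpbreaks data; the summary column name mean_breaks is illustrative and not taken from the slides. In plyr, d stands for data.frame and l for list, so data.frame-to-list is dlply.

```r
library(plyr)

# ddply: data.frame in, data.frame out (one summary row per tension level)
mean_breaks <- ddply(warpbreaks, "tension", summarise,
                     mean_breaks = mean(breaks))

# dlply: data.frame in, list out (one element per tension level)
pieces <- dlply(warpbreaks, "tension", function(x) nrow(x))

is.data.frame(mean_breaks)
is.list(pieces)
```

Both calls split on the same variable; only the shape of the combined result differs.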

14 Have time series or repeated measurements. Want baseline measurements for each subject.

    library(plyr)
    baseline <- ddply(data1, "mrno", MakeBaseline)
    # baseline <- ddply(data1, .(mrno), MakeBaseline)
    # baseline <- ddply(data1, ~mrno, MakeBaseline)

MakeBaseline takes the earliest observations from a particular subject.
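The slides do not show MakeBaseline or data1, so the sketch below invents both for illustration: a small hypothetical data frame with a subject id (mrno), a day column, and a temperature, plus an assumed MakeBaseline that keeps the row with the smallest day for each subject. It assumes plyr is installed.

```r
library(plyr)

# Hypothetical repeated-measures data; mrno is the subject id
data1 <- data.frame(
  mrno = c(1, 1, 1, 2, 2),
  day  = c(45, 0, 17, 30, 2),
  temp = c(99.1, 98.6, 100.2, 97.9, 98.4)
)

# Assumed implementation: keep the earliest observation of one subject
MakeBaseline <- function(df) {
  df[order(df$day)[1], , drop = FALSE]
}

# Split by subject, take the baseline row, and stack the results
baseline <- ddply(data1, "mrno", MakeBaseline)
```

Because ddply hands MakeBaseline one subject's rows at a time, the function itself never needs to know about the grouping.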


16 Want to add variables by id

    library(plyr)
    newdata <- ddply(data1, "mrno", transform, tempc = 5 * (temp - 32) / 9)


18 Want to summarize data by id

    library(plyr)
    ddply(baseball, "id", summarise,
          duration = max(year) - min(year),
          nteams = length(unique(team)))


20 Want to run models by index

    library(plyr)
    dlply(warpbreaks, "tension", function(x) lm(breaks ~ wool, data = x))
    dlply(warpbreaks, "tension", function(x) summary(lm(breaks ~ wool, data = x))$coef)
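A fitted model is not a data frame, so the per-group results are naturally held in a list, which means the data.frame-to-list function dlply. The sketch below assumes plyr is installed; it fits one model per tension level and then uses ldply (list in, data.frame out) to collect the coefficients. The object names models and coefs are illustrative.

```r
library(plyr)

# One linear model per tension level; the result is a list of lm objects
models <- dlply(warpbreaks, "tension", function(x) lm(breaks ~ wool, data = x))

# Collect the coefficient vector of each model into one data frame
coefs <- ldply(models, coef)
```

Keeping the models in a list leaves them available for later steps such as predict() or residual checks, while coefs gives a compact table for comparison across groups.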


22 lapply, aggregate, apply, mapply, tapply, by, with
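For comparison, the split-apply-combine steps can be written by hand with base R functions such as those named on this slide. This is a sketch using only base R and the built-in warpbreaks data; the object names are illustrative.

```r
# Split: one data frame per tension level
pieces <- split(warpbreaks, warpbreaks$tension)

# Apply: summarise each piece
summaries <- lapply(pieces, function(x) {
  data.frame(tension = x$tension[1], mean_breaks = mean(x$breaks))
})

# Combine: stack the per-group results
by_hand <- do.call(rbind, summaries)

# aggregate() performs the same three steps in a single call
by_aggregate <- aggregate(breaks ~ tension, data = warpbreaks, FUN = mean)
```

The base functions each have their own argument order and return type, which is the inconsistency a single family of functions such as ddply and dlply is meant to smooth over.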


24 Now faster with parallel computing

25 reshape Wide data

26 reshape Long data

27 reshape. Two basic functions: melt (wide to long) and cast (long to aggregate). Makes splitting data by variable much easier.

28 reshape [Table: wide data with columns id, age, sex, Dx1, Dx2, Dx3, Dx4; sex coded M/F]

29

    melt(data, id.vars = c("id", "age", "sex"))
    # melt(data, id.vars = 1:3)

[Output: long data with columns id, age, sex, variable, value; one row per subject and diagnosis column (Dx1 to Dx4), with missing values kept as NA]

30 reshape. This splitting then allows aggregating using cast.

    cast(melted, variable ~ sex, length)

[Output table: rows Dx1 to Dx4, columns M and F]

31 reshape. This splitting then allows aggregating using cast.

    cast(melted, variable ~ sex, function(x) sum(is.na(x)))

[Output table: rows Dx1 to Dx4, columns M and F]
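The melt and cast steps can be tried end to end on a small data set. The sketch below uses the newer reshape2 package, where the data-frame version of cast is called dcast; the data values are made up for illustration and are not the ones from the slides.

```r
library(reshape2)

# Hypothetical wide data: one row per subject, one column per diagnosis
wide <- data.frame(
  id  = 1:4,
  age = c(24, 31, 45, 52),
  sex = c("M", "M", "F", "F"),
  Dx1 = c(1, 0, 1, 1),
  Dx2 = c(0, 0, 1, NA),
  Dx3 = c(1, 1, 0, 0),
  Dx4 = c(0, NA, NA, NA)
)

# Wide to long: one row per subject and diagnosis column
melted <- melt(wide, id.vars = c("id", "age", "sex"))

# Long to aggregate: count missing values per diagnosis and sex
n_missing <- dcast(melted, variable ~ sex,
                   fun.aggregate = function(x) sum(is.na(x)),
                   value.var = "value")
```

Once the data are long, changing the aggregation is a matter of swapping the function or the formula; the melted data stay the same.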

32 ggplot2 plays really well with plyr and reshape. With plyr you can create strata-specific plot objects; with reshape you can easily create grouping variables.

    dlply(dat, .(index), function(x) {
      p <- qplot(x, y, data = x)
      return(p)
    })

    melted <- melt(data, id.vars = 1:3)
    ggplot(melted, aes(x, y, group = variable)) + geom_point()
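A strata-specific set of plots is a list of plot objects, which is a data.frame-to-list job in plyr terms. This sketch assumes plyr and ggplot2 are installed and uses the built-in warpbreaks data in place of the unspecified dat; it builds one scatter plot per tension level.

```r
library(plyr)
library(ggplot2)

# One scatter plot per tension level, returned as a named list
plots <- dlply(warpbreaks, "tension", function(x) {
  ggplot(x, aes(x = wool, y = breaks)) + geom_point()
})

# The list is named by the levels of the splitting variable
names(plots)
```

Each element can then be printed or saved on its own, for example with print(plots[[1]]).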

33 Example

    data(baseball)
    demo = baseball[baseball$year == 2007, c("id", "team", "ab", "r", "h")]

[Output: first rows of demo, columns id, team, ab, r, h, for francju01 (ATL), francju01 (NYN), zaungr01 (TOR), witasja01 (TBA), williwo02 (HOU), wickmbo01 (ARI)]

34 Example

    data(baseball)
    demo = baseball[baseball$year == 2007, c("id", "team", "ab", "r", "h")]
    demo2 = melt(demo, id = 1:2)

      id        team variable value
    1 francju01 ATL  ab       40
    2 francju01 NYN  ab       50
    3 zaungr01  TOR  ab       ...
    4 witasja01 TBA  ab       0
    5 williwo02 HOU  ab       59
    6 wickmbo01 ARI  ab       0

35 Example

    data(baseball)
    demo = baseball[baseball$year == 2007, c("id", "team", "ab", "r", "h")]
    demo2 = melt(demo, id = 1:2)
    demo3 = cast(demo2, team ~ variable, mean)

[Output: one row per team (ARI, ATL, BAL, BOS, CHA, CHN, ...), columns team, ab, r, h]

36 Example

    data(baseball)
    demo = baseball[baseball$year == 2007, c("id", "team", "ab", "r", "h")]
    demo2 = melt(demo, id = 1:2)
    demo3 = cast(demo2, team ~ variable, mean)
    demo4 = melt(demo3, id = "team")
    demo4$team = as.numeric(as.factor(demo4$team))
    qplot(team, value, color = variable, data = demo4, geom = "line")

[Output: first rows of demo4, columns team, value, variable]
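The whole example can be run as one script. The sketch below assumes plyr (which ships the baseball data), reshape2, and ggplot2 are installed. It departs from the slides in two labelled ways: it uses reshape2's dcast in place of reshape's cast, and ggplot() with geom_line() in place of qplot().

```r
library(plyr)      # provides the baseball data set
library(reshape2)
library(ggplot2)

# Keep the 2007 season and select columns by name
demo <- baseball[baseball$year == 2007, c("id", "team", "ab", "r", "h")]

# Wide to long: one row per player stint and statistic
demo2 <- melt(demo, id.vars = c("id", "team"))

# Long to aggregate: team means of each statistic
demo3 <- dcast(demo2, team ~ variable, fun.aggregate = mean,
               value.var = "value")

# Back to long form so that the statistic can be mapped to colour
demo4 <- melt(demo3, id.vars = "team")
demo4$team <- as.numeric(as.factor(demo4$team))

# One line per statistic across the numbered teams
p <- ggplot(demo4, aes(x = team, y = value, colour = variable)) +
  geom_line()
```

Printing p draws the plot. Converting team to a number only serves to give geom_line a continuous x axis; the resulting team order is alphabetical and carries no meaning.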


More information

Reproducible and dynamic access to OECD data

Reproducible and dynamic access to OECD data Reproducible and dynamic access to OECD data 2016-01-17 Introduction The OECD package allows the user to download data from the OECD s API in a dynamic and reproducible way. The package can be installed

More information

Unit 6 - Software Design and Development LESSON 3 KEY FEATURES

Unit 6 - Software Design and Development LESSON 3 KEY FEATURES Unit 6 - Software Design and Development LESSON 3 KEY FEATURES Last session 1. Language generations. 2. Reasons why languages are used by organisations. 1. Proprietary or open source. 2. Features and tools.

More information

an introduction to R for epidemiologists

an introduction to R for epidemiologists an introduction to R for epidemiologists basic analyses and indexing Charles DiMaggio, PhD, MPH, PA-C New York University Department of Surgery and Population Health NYU-Bellevue Division of Trauma and

More information

Lecture 3: Basics of R Programming

Lecture 3: Basics of R Programming Lecture 3: Basics of R Programming This lecture introduces you to how to do more things with R beyond simple commands. Outline: 1. R as a programming language 2. Grouping, loops and conditional execution

More information

Hot X: Algebra Exposed

Hot X: Algebra Exposed Hot X: Algebra Exposed Solution Guide for Chapter 10 Here are the solutions for the Doing the Math exercises in Hot X: Algebra Exposed! DTM from p.137-138 2. To see if the point is on the line, let s plug

More information

CSCI-GA Scripting Languages

CSCI-GA Scripting Languages CSCI-GA.3033.003 Scripting Languages 12/02/2013 OCaml 1 Acknowledgement The material on these slides is based on notes provided by Dexter Kozen. 2 About OCaml A functional programming language All computation

More information

Lecture 12: Transforming and Reshaping Data October 2014

Lecture 12: Transforming and Reshaping Data October 2014 Lecture 12: Transforming and Reshaping Data 36-350 6 October 2014 In Previous Episodes Accessing vectors, arrays, and data frames Applying a function across a vector, array, or data frame Extracting data

More information

Graphing with Excel. Mr. Heinrich/Mr. Flock R.O.W.V.A. High School, Oneida, IL Physics 4B

Graphing with Excel. Mr. Heinrich/Mr. Flock R.O.W.V.A. High School, Oneida, IL Physics 4B Graphing with Excel Mr. Heinrich/Mr. Flock R.O.W.V.A. High School, Oneida, IL Physics 4B For almost any project that requires the analysis and manipulation of data sets, the standard is Microsoft Office

More information

Below you ll find some formulas that will help you work through AdvancedExcel.xlsx.

Below you ll find some formulas that will help you work through AdvancedExcel.xlsx. Advanced Excel Jaimi Dowdell * IRE/NICAR *@JaimiDowdell * jaimi@ire.org Updated March 18, 2016 Below you ll find some formulas that will help you work through AdvancedExcel.xlsx. Exercise 1 Sometimes it

More information

Introduction to R programming a SciLife Lab course

Introduction to R programming a SciLife Lab course Introduction to R programming a SciLife Lab course 22 March 2017 What R really is? a programming language, a programming platform (= environment + interpreter), a software project driven by the core team

More information

Session 3 Nick Hathaway;

Session 3 Nick Hathaway; Session 3 Nick Hathaway; nicholas.hathaway@umassmed.edu Contents Manipulating Data frames and matrices 1 Converting to long vs wide formats.................................... 2 Manipulating data in table........................................

More information

Introductory Tutorial: Part 1 Describing Data

Introductory Tutorial: Part 1 Describing Data Introductory Tutorial: Part 1 Describing Data Introduction Welcome to this R-Instat introductory tutorial. R-Instat is a free, menu driven statistics software powered by R. It is designed to exploit the

More information

Package plusser. R topics documented: August 29, Date Version Title A Google+ Interface for R

Package plusser. R topics documented: August 29, Date Version Title A Google+ Interface for R Date 2014-04-27 Version 0.4-0 Title A Google+ Interface for R Package plusser August 29, 2016 plusser provides an API interface to Google+ so that posts, profiles and pages can be automatically retrieved.

More information