Introduction to R and the tidyverse

Paolo Crosetto

Lecture 3: merging & tidying data

Before we start: tidyverse
You should all have your laptops with you by now, so let's go back to the initial setup and cleanly install the tidyverse:

    install.packages("tidyverse")

Before we start: tidyverse
The tidyverse package installs lots of packages, in particular:
- ggplot2, dplyr -> seen earlier
- tidyr -> seen today
You load the package with library(tidyverse), and it loads all the needed packages for you.

Today's topics
Today we will deal with three topics:
1. getting data into (and out of) R
2. joining data from different tables
3. tidying data

importing data

getting data into R: packages
Up to now we have worked with datasets that come from packages. This is easy: install a package, then use the data that comes attached to it; all the hard work has been done for you. If you wish, you can copy the data into your workspace, e.g.

    library(nycflights13)
    df <- flights

getting data into R: other sources
Life is not always that easy. You might have data in the form of:
- (aaarg!) Excel files
- comma-separated (csv) files
- files coming from SPSS, SAS, STATA, ...
- text data from ASCII sources

getting data into R: readr vs haven
- when you load the tidyverse (library(tidyverse)), you automatically load readr
- readr gives you (verb) functions to load data into R nicely
- readr provides functions to load most text-based delimited files, especially .csv
- if you want to read a STATA, SAS or SPSS file, you need the package haven (library(haven))
- readr is automatically loaded by the tidyverse call; haven needs to be loaded explicitly (not shown here)

A simple example
- you can find some data here: goo.gl/kpycfh
- this is the Human Development Index (HDI), by country; higher numbers (nearer to 1) are better
- save the file to a place on disk that you know, as HDI.csv
- open it with a text editor: what do you see?

A simple example
Now that your data is saved, how do you import it into R? You use read_csv("path_to_file"). In my case:

    df <- read_csv("/home/paolo/dropbox/public/hdidata.csv")
    ## Parsed with column specification:
    ## cols(
    ##   `HDI Rank` = col_integer(),
    ##   Country = col_character(),
    ##   HDI = col_double()
    ## )
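If you do not have the HDI file at hand, the sketch below reproduces the same workflow on a throwaway file: it writes a tiny CSV with the same three column names to a temporary path (tempfile() stands in for your own path) and reads it back with read_csv(). The country names and index values are made-up placeholders, not real HDI data. Passing col_types explicitly is optional; it fixes the column types instead of letting read_csv() guess them.

```r
library(readr)

# Write a small, made-up CSV with the same columns as HDI.csv (values are placeholders)
path <- tempfile(fileext = ".csv")
writeLines(c(
  "HDI Rank,Country,HDI",
  "1,Country A,0.95",
  "2,Country B,0.93",
  "3,Country C,0.91"
), path)

# Spelling out col_types skips the type-guessing step
df <- read_csv(path, col_types = cols(
  `HDI Rank` = col_integer(),
  Country    = col_character(),
  HDI        = col_double()
))
df
```

Without col_types, read_csv() guesses the types from the first rows of the file and prints the column specification it used, as in the slide.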

there is more, but...
read_csv() just did a ton of things for you under the hood, but that does not really matter at your stage, so you can just live with the results. Other useful functions:
- if the separator is ; rather than , use read_csv2() (it also reads the comma as the decimal mark)
- if the separator is a TAB rather than , use read_tsv()
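A minimal sketch of the two variants, using two tiny made-up files written to temporary paths: read_csv2() expects ; as the separator and , as the decimal mark (a common export format in European locales), while read_tsv() expects tabs.

```r
library(readr)

# Semicolon-separated file with decimal commas
semi <- tempfile(fileext = ".csv")
writeLines(c("country;value", "A;0,5", "B;1,25"), semi)
df_semi <- read_csv2(semi, col_types = cols(country = col_character(),
                                            value   = col_double()))

# Tab-separated file with decimal points
tab <- tempfile(fileext = ".tsv")
writeLines(c("country\tvalue", "A\t0.5", "B\t1.25"), tab)
df_tab <- read_tsv(tab, col_types = cols(country = col_character(),
                                         value   = col_double()))

df_semi$value  # [1] 0.50 1.25
df_tab$value   # [1] 0.50 1.25
```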

some hints
- you can always export to .csv from any program, even from Excel!
- once you have exported to .csv, it is all downhill from there
- and it is even better to do so, because .csv is universal, while binary formats (.dta, .xls, ...) force you to have the appropriate tool to read them
- so try to keep a copy of your data in a text-based format: it is always readable, should everything go wrong
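Getting data out of R in a text-based format works the same way in reverse. A minimal sketch with readr's write_csv() on a small made-up table, saved to a temporary path and read back to check that nothing was lost:

```r
library(readr)
library(tibble)

# A small, made-up data frame to save in a plain-text format
scores <- tibble(id = 1:3, score = c(2.5, 3.0, 4.5))

path <- tempfile(fileext = ".csv")
write_csv(scores, path)

# The file is plain text: any editor or program can open it
readLines(path)

# Reading it back gives the same values
back <- read_csv(path, col_types = cols(id = col_integer(), score = col_double()))
back
```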

Joining datasets

data scattered around
You do not always have all the data you need in one dataset. It is usually scattered across several datasets that might or might not be linked (or linkable), e.g.:
- you might need to merge data coming from different sources (INSEE and Eurostat)
- or you might do some computations / summaries and want to merge them back

using the nycflights13 dataset again: planes

    library(nycflights13)
    planes <- nycflights13::planes
    planes

This prints a tibble of 3,322 rows and 9 variables: tailnum <chr>, year <int>, type <chr>, manufacturer <chr>, model <chr>, engines <int>, seats <int>, speed <int> and engine <chr>. For instance, plane N102UW is a fixed wing multi engine AIRBUS INDUSTRIE aircraft built in 1998.

using the nycflights13 dataset again: airports

    airports <- nycflights13::airports
    airports

This prints a tibble of 1,458 rows and 8 variables: faa <chr>, name <chr>, lat <dbl>, lon <dbl>, alt <int>, tz <dbl>, dst <chr> and tzone <chr>. The first rows are small airfields such as 04G (Lansdowne Airport), 06A (Moton Field Municipal Airport) and 06C (Schaumburg Regional).

using the nycflights13 dataset again: flights

    flights <- nycflights13::flights
    flights

This prints a tibble of 336,776 rows and 19 variables: year <int>, month <int>, day <int>, dep_time <int>, sched_dep_time <int>, dep_delay <dbl>, arr_time <int>, sched_arr_time <int>, arr_delay <dbl>, carrier <chr>, flight <int>, tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>, hour <dbl>, minute <dbl> and time_hour <dttm>.

inspect the datasets
- what do these datasets contain?
- what variables do they have in common?
- do they have some unique identifier (key)?
- how are they related to one another?

the datasets
- planes has information on each plane (model, type, year of construction, ...)
- airports has information on each airport (FAA code, name, location: lat, lon, ...)
- flights has information on each flight that departed from a NYC airport (JFK, LGA or EWR) in 2013

joining different datasets: example
Problem: do newer planes fly the longest routes from NYC? To answer this, you need to combine data from two sources:
- flights, to get each route's length in miles
- planes, to get the year the plane was built
How do you join the two data frames?

joining two datasets: key
First you need to find a unique identifier for your data: a key. A key takes a different value for every observation in the dataset. To find one, either you use your intuition or you check:

    planes %>% count(tailnum) %>% filter(n > 1)
    ## # A tibble: 0 x 2
    ## # ... with 2 variables: tailnum <chr>, n <int>

- count(var) counts how many times each value of var appears, and stores the count in a new variable n
- by filtering for n > 1 you check whether any value appears more than once; here none does, so tailnum is a key for planes
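To see what the check returns when a candidate key is not unique, here is the same count() / filter() pipeline on a small hypothetical table (toy_planes is made up for illustration) in which one tail number appears twice.

```r
library(dplyr)

# Hypothetical toy table: "N2" appears twice, so tailnum is NOT a valid key here
toy_planes <- tibble(
  tailnum = c("N1", "N2", "N2", "N3"),
  year    = c(2001L, 1999L, 1999L, 2010L)
)

# Keep only the values that occur more than once
dups <- toy_planes %>%
  count(tailnum) %>%
  filter(n > 1)
dups
```

A non-empty result, as here, means the column cannot be used on its own to identify rows.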

joining
Once you know the key, you can use the join family of functions. Imagine you have two datasets with variables and values as follows:
[Figure 1: two datasets, X and Y, linked by a key; some information overlaps between the two tables, but there is also new information, e.g. column D, found only in dataset Y]

joining
Joining always combines data from two tables into one. The syntax is always the same:

    xxx_join(left, right, by = "key")

- xxx is the type of join (full, inner, left, ...)
- left and right are the two data frames
- key is the unique identifier of observations (in one or both data frames)

the joining family
Different join functions make different assumptions about what to do with the data that are NOT matched:
- full_join() keeps everything and adds NA where there is no match
- inner_join() keeps only matched data
- left_join() keeps every row of the left table (right_join() does the same for the right table)
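The difference between the join functions is easiest to see on two tiny tables. In the sketch below, x and y are hypothetical tables that share the key values 1 and 2; key 3 exists only in x and key 4 only in y.

```r
library(dplyr)

# Two hypothetical tables sharing the key column "key"
x <- tibble(key = c(1, 2, 3), val_x = c("x1", "x2", "x3"))
y <- tibble(key = c(1, 2, 4), val_y = c("y1", "y2", "y4"))

inner <- inner_join(x, y, by = "key")  # keys 1, 2: matched rows only
left  <- left_join(x, y, by = "key")   # keys 1, 2, 3: all of x, val_y is NA for key 3
full  <- full_join(x, y, by = "key")   # keys 1, 2, 3, 4: everything, NA where unmatched

inner
left
full
```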

the default: left_join()
left_join() is the default choice because you usually add some variable to a large dataset. In our case (do newer planes fly the longest routes from NYC?), most of the information is in the flights dataset; we need only the year built from the planes dataset.

answering our question: joining

    distance <- flights %>% select(tailnum, distance)
    yearbuilt <- planes %>% select(tailnum, year)
    answer <- left_join(distance, yearbuilt, by = "tailnum")
    answer

The result is a tibble of 336,776 rows and 3 columns: tailnum <chr>, distance <dbl> and year <int>. year is NA whenever a tailnum has no match in planes; for example, the flight by N3ALAA (733 miles) gets year NA.

answering our question: the answer
There does not seem to be any connection between the year the plane was built and the length of the flight:

    answer %>%
      group_by(year) %>%
      summarise(dist = mean(distance, na.rm = TRUE)) %>%
      ggplot(aes(x = year, y = dist)) +
      geom_point() +
      geom_smooth(method = "lm")

[Figure: average flight distance (dist) by year built (year), with a linear fit]

joining exercise
How many flights from NYC land at an airport whose altitude is > 1000 m?
- note: altitude is in the airports df, in feet (1 metre = 3.28084 feet)
- flights are in the flights df

solution
A lot of flights, since Denver sits at 1600 m!

    alt_df <- airports %>%
      select(faa, alt) %>%
      mutate(alt = alt / 3.28084) %>%
      rename(dest = faa)
    answer <- left_join(flights, alt_df, by = "dest") %>%
      filter(alt > 1000)
    answer

The result has 10,291 rows and 20 columns: the 19 flights variables plus alt.
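The same steps can be checked on a tiny hypothetical version of the two tables (toy_airports and toy_flights are made up; the altitudes are illustrative values in feet). Dividing by 3.28084 converts feet to metres, and rename() gives the airport code the same name it has in flights, so that by = "dest" works.

```r
library(dplyr)

# Hypothetical stand-ins for airports (alt in feet) and flights
toy_airports <- tibble(faa = c("DEN", "SLC", "BOS"), alt = c(5431, 4227, 19))
toy_flights  <- tibble(flight = 1:4, dest = c("DEN", "BOS", "SLC", "DEN"))

alt_df <- toy_airports %>%
  mutate(alt = alt / 3.28084) %>%   # feet -> metres
  rename(dest = faa)                # same key name as in flights

high <- left_join(toy_flights, alt_df, by = "dest") %>%
  filter(alt > 1000)                # keep flights landing above 1000 m
high
```

Instead of renaming, you can also map differently named key columns directly in the join, with by = c("dest" = "faa").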

solution: a plot to see the impact of Denver

    answer %>% ggplot(aes(dest)) + geom_bar()

[Figure: bar chart of the number of flights (count) by destination (dest): ABQ, BZN, DEN, EGE, HDN, JAC, MTJ, SLC]

joining three datasets
How old are the planes that fly to airports whose altitude is > 1000 m?

joining three datasets, solution

    answer <- left_join(flights, yearbuilt, by = "tailnum")
    answer <- left_join(answer, alt_df, by = "dest")
    answer %>% filter(alt > 1000) %>% summarise(avgyear = mean(year.y, na.rm = TRUE))
    answer %>% filter(alt <= 1000) %>% summarise(avgyear = mean(year.y, na.rm = TRUE))

Each call returns a 1 x 1 tibble with avgyear, the average construction year of the planes. Because both flights and planes have a column called year, the join renames them year.x (year of the flight) and year.y (year the plane was built).
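The year.x / year.y names come from the default suffix argument of the dplyr joins. A short sketch on hypothetical toy tables shows the default names and how suffix can give the two columns more descriptive names (year_flight and year_built are names chosen for illustration).

```r
library(dplyr)

# Hypothetical toy data: both tables have a column called "year"
toy_flights <- tibble(tailnum = c("N1", "N2"), year = c(2013L, 2013L))
toy_planes  <- tibble(tailnum = c("N1", "N2"), year = c(1999L, 2008L))

# Default suffixes: year.x (from the left table) and year.y (from the right table)
joined <- left_join(toy_flights, toy_planes, by = "tailnum")
names(joined)

# Custom suffixes make the result easier to read
joined2 <- left_join(toy_flights, toy_planes, by = "tailnum",
                     suffix = c("_flight", "_built"))
mean(joined2$year_built, na.rm = TRUE)
```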

tidy data

messy data -> tidy data
"Happy families are all alike; every unhappy family is unhappy in its own way." (Leo Tolstoy)
- the data we have worked with so far are all well formatted
- this is not the case in real life
- we need to be able to format data in a convenient way
- if you work with the tools we've seen (dplyr, ggplot2), then you want tidy data

a simple dataset in four versions: table1

    table1
    ## # A tibble: 6 x 4
    ##   country      year  cases population
    ##   <chr>       <int>  <int>      <int>
    ## 1 Afghanistan  1999    745   19987071
    ## 2 Afghanistan  2000   2666   20595360
    ## 3 Brazil       1999  37737  172006362
    ## 4 Brazil       2000  80488  174504898
    ## 5 China        1999 212258 1272915272
    ## 6 China        2000 213766 1280428583

a simple dataset in four versions: table2

    table2
    ## # A tibble: 12 x 4
    ##    country      year type            count
    ##    <chr>       <int> <chr>           <int>
    ##  1 Afghanistan  1999 cases             745
    ##  2 Afghanistan  1999 population   19987071
    ##  3 Afghanistan  2000 cases            2666
    ##  4 Afghanistan  2000 population   20595360
    ##  5 Brazil       1999 cases           37737
    ##  6 Brazil       1999 population  172006362
    ##  7 Brazil       2000 cases           80488
    ##  8 Brazil       2000 population  174504898
    ##  9 China        1999 cases          212258
    ## 10 China        1999 population 1272915272
    ## 11 China        2000 cases          213766
    ## 12 China        2000 population 1280428583

a simple dataset in four versions: table3

    table3
    ## # A tibble: 6 x 3
    ##   country      year rate
    ## * <chr>       <int> <chr>
    ## 1 Afghanistan  1999 745/19987071
    ## 2 Afghanistan  2000 2666/20595360
    ## 3 Brazil       1999 37737/172006362
    ## 4 Brazil       2000 80488/174504898
    ## 5 China        1999 212258/1272915272
    ## 6 China        2000 213766/1280428583

a simple dataset in four versions: table4a (cases) and table4b (population)

    table4a  # cases
    ## # A tibble: 3 x 3
    ##   country     `1999` `2000`
    ## * <chr>        <int>  <int>
    ## 1 Afghanistan    745   2666
    ## 2 Brazil       37737  80488
    ## 3 China       212258 213766

    table4b  # population
    ## # A tibble: 3 x 3
    ##   country         `1999`     `2000`
    ## * <chr>            <int>      <int>
    ## 1 Afghanistan   19987071   20595360
    ## 2 Brazil       172006362  174504898
    ## 3 China       1272915272 1280428583

tidy, untidy data
Tidy data has the following characteristics:
- each variable has its own column
- each observation has its own row
- each value has its own cell
Have a look at the tables. What is an observation? What is a variable? Do you see problems in the tables?
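One payoff of the tidy layout is that a new variable is a single mutate() away. A minimal sketch on table1 (which ships with tidyr), computing cases per 10,000 people; the scale factor is an arbitrary choice for illustration.

```r
library(dplyr)
library(tidyr)  # provides the example table table1

# Every variable is a column, so a derived variable needs only mutate()
rates <- table1 %>%
  mutate(rate = cases / population * 10000)  # cases per 10,000 people
rates
```

Try computing the same rate from table2 or table4a/table4b: the untidy layouts require reshaping or joining first.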

tidy data: the tidyr package
- tidyr is part of the tidyverse; it is automatically loaded with library(tidyverse)
- tidyr provides 4 main verbs: gather vs. spread, separate vs. unite

gathering: from wide to long
Sometimes variables are in the column names: bad!

    table4a
    ## # A tibble: 3 x 3
    ##   country     `1999` `2000`
    ## * <chr>        <int>  <int>
    ## 1 Afghanistan    745   2666
    ## 2 Brazil       37737  80488
    ## 3 China       212258 213766

- year is a variable, but it is in the column names
- the content is cases, but it has no variable name

gathering
We need to reshape the data from wide to long, so that year becomes a variable and 1999 and 2000 become its values. We use gather(vars, key = ..., value = ...):
- vars: the columns whose names are not actually variables but values
- key: the name to give to the new column that will store the former column names
- value: the name to give to the new column that will store the values that were spread over several columns
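A minimal sketch of gather() on table4a, which ships with tidyr. Since tidyr 1.0, gather() is superseded by pivot_longer(), so the equivalent call is shown as well; that part assumes tidyr 1.0 or later. The two results hold the same rows, only in a different order.

```r
library(tidyr)  # also provides the example table table4a

# gather(): the year columns become values of a new "year" column,
# and their cells go into a new "cases" column
cases <- gather(table4a, `1999`, `2000`, key = "year", value = "cases")

# Same reshaping with pivot_longer() (tidyr >= 1.0), the successor of gather()
cases2 <- pivot_longer(table4a, c(`1999`, `2000`),
                       names_to = "year", values_to = "cases")
cases
cases2
```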

gathering
What happens if we provide NO arguments? Everything is gathered, and just two columns are left (key & value):

    table4a %>% gather()
    ## # A tibble: 9 x 2
    ##   key     value
    ##   <chr>   <chr>
    ## 1 country Afghanistan
    ## 2 country Brazil
    ## 3 country China
    ## 4 1999    745
    ## 5 1999    37737
    ## 6 1999    212258
    ## 7 2000    2666
    ## 8 2000    80488
    ## 9 2000    213766

gathering
What if we provide arguments?

    cases <- table4a %>%
      gather(`1999`, `2000`, key = year, value = cases) %>%
      arrange(country)
    cases
    ## # A tibble: 6 x 3
    ##   country     year   cases
    ##   <chr>       <chr>  <int>
    ## 1 Afghanistan 1999     745
    ## 2 Afghanistan 2000    2666
    ## 3 Brazil      1999   37737
    ## 4 Brazil      2000   80488
    ## 5 China       1999  212258
    ## 6 China       2000  213766

gathering
We can do the same for the population table (table4b):

    pop <- table4b %>% gather(`1999`, `2000`, key = year, value = population)
    pop
    ## # A tibble: 6 x 3
    ##   country     year  population
    ##   <chr>       <chr>      <int>
    ## 1 Afghanistan 1999    19987071
    ## 2 Brazil      1999   172006362
    ## 3 China       1999  1272915272
    ## 4 Afghanistan 2000    20595360
    ## 5 Brazil      2000   174504898
    ## 6 China       2000  1280428583

gathering
We can merge the two tables, and we'll get back to table1:

    left_join(cases, pop, by = c("country", "year"))
    ## # A tibble: 6 x 4
    ##   country     year   cases population
    ##   <chr>       <chr>  <int>      <int>
    ## 1 Afghanistan 1999     745   19987071
    ## 2 Afghanistan 2000    2666   20595360
    ## 3 Brazil      1999   37737  172006362
    ## 4 Brazil      2000   80488  174504898
    ## 5 China       1999  212258 1272915272
    ## 6 China       2000  213766 1280428583

spreading: from long to wide

    table2
    ## # A tibble: 12 x 4
    ##    country      year type            count
    ##    <chr>       <int> <chr>           <int>
    ##  1 Afghanistan  1999 cases             745
    ##  2 Afghanistan  1999 population   19987071
    ##  3 Afghanistan  2000 cases            2666
    ##  4 Afghanistan  2000 population   20595360
    ##  5 Brazil       1999 cases           37737
    ##  6 Brazil       1999 population  172006362
    ##  7 Brazil       2000 cases           80488
    ##  8 Brazil       2000 population  174504898
    ##  9 China        1999 cases          212258
    ## 10 China        1999 population 1272915272
    ## 11 China        2000 cases          213766
    ## 12 China        2000 population 1280428583

spreading: from long to wide
We need to reshape the data from long to wide, so that type gets split into the variables cases and population, and the count values get assigned to the proper column. We use spread(key, value):
- key: the (existing) column that contains the variable names
- value: the (existing) column that contains the values of the (to be created) variables
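A minimal sketch of spread() on table2, checked against table1 (both ship with tidyr). As with gather(), tidyr 1.0 introduced a successor, pivot_wider(); that line assumes tidyr 1.0 or later.

```r
library(tidyr)  # also provides table1 and table2

# spread(): the values of "type" become column names, filled from "count"
wide <- spread(table2, key = type, value = count)

# Same reshaping with pivot_wider() (tidyr >= 1.0), the successor of spread()
wide2 <- pivot_wider(table2, names_from = type, values_from = count)
wide
```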

spreading: from long to wide

    spread(table2, key = type, value = count)
    ## # A tibble: 6 x 4
    ##   country      year  cases population
    ## * <chr>       <int>  <int>      <int>
    ## 1 Afghanistan  1999    745   19987071
    ## 2 Afghanistan  2000   2666   20595360
    ## 3 Brazil       1999  37737  172006362
    ## 4 Brazil       2000  80488  174504898
    ## 5 China        1999 212258 1272915272
    ## 6 China        2000 213766 1280428583

separating: from one to more variables
What is wrong with this table?

    table3
    ## # A tibble: 6 x 3
    ##   country      year rate
    ## * <chr>       <int> <chr>
    ## 1 Afghanistan  1999 745/19987071
    ## 2 Afghanistan  2000 2666/20595360
    ## 3 Brazil       1999 37737/172006362
    ## 4 Brazil       2000 80488/174504898
    ## 5 China        1999 212258/1272915272
    ## 6 China        2000 213766/1280428583

separating
The variable rate contains two pieces of information: the number of cases and the population. We need to separate it into two (in this case) variables:

    separate(table3, col = rate, into = c("cases", "population"))
    ## # A tibble: 6 x 4
    ##   country      year cases  population
    ## * <chr>       <int> <chr>  <chr>
    ## 1 Afghanistan  1999 745    19987071
    ## 2 Afghanistan  2000 2666   20595360
    ## 3 Brazil       1999 37737  172006362
    ## 4 Brazil       2000 80488  174504898
    ## 5 China        1999 212258 1272915272
    ## 6 China        2000 213766 1280428583

separating
- separate() correctly guessed that the point to separate at was /
- but this is not always so easy
- so you can provide the actual separator character with sep =
- if we use the wrong one...

    separate(table3, col = rate, into = c("cases", "population"), sep = "7")
    ## Warning: Too many values at 3 locations: 1, 3, 5
    ## Warning: Too few values at 1 locations: 2

The rate column is cut at every 7. Rows 1, 3 and 5 contain more than one 7, so the extra pieces are dropped (745/19987071 becomes an empty cases value and a population of 45/1998), while row 2 (2666/20595360) contains no 7 at all, so its population is NA.

separating
separate() keeps the new variables as characters. This is safe, since it makes no assumptions, but sometimes it is better to have it create int or dbl variables:

    separate(table3, col = rate, into = c("cases", "population"), convert = TRUE)
    ## # A tibble: 6 x 4
    ##   country      year  cases population
    ## * <chr>       <int>  <int>      <int>
    ## 1 Afghanistan  1999    745   19987071
    ## 2 Afghanistan  2000   2666   20595360
    ## 3 Brazil       1999  37737  172006362
    ## 4 Brazil       2000  80488  174504898
    ## 5 China        1999 212258 1272915272
    ## 6 China        2000 213766 1280428583
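Putting the last two slides together, a minimal sketch that names the separator explicitly and converts the pieces in one call, on table3 as shipped with tidyr:

```r
library(tidyr)  # also provides table3

# Split "745/19987071" at the slash; convert = TRUE turns both pieces into integers
split_rate <- separate(table3, col = rate, into = c("cases", "population"),
                       sep = "/", convert = TRUE)
split_rate
```

Note that a character sep is interpreted as a regular expression, so a separator such as . has to be written as "\\.".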

uniting: from several to one variable

    table5
    ## # A tibble: 6 x 4
    ##   country     century year  rate
    ## * <chr>       <chr>   <chr> <chr>
    ## 1 Afghanistan 19      99    745/19987071
    ## 2 Afghanistan 20      00    2666/20595360
    ## 3 Brazil      19      99    37737/172006362
    ## 4 Brazil      20      00    80488/174504898
    ## 5 China       19      99    212258/1272915272
    ## 6 China       20      00    213766/1280428583

uniting
The complementary verb to separate() is unite():

    unite(table5, year, century, year)
    ## # A tibble: 6 x 3
    ##   country     year  rate
    ## * <chr>       <chr> <chr>
    ## 1 Afghanistan 19_99 745/19987071
    ## 2 Afghanistan 20_00 2666/20595360
    ## 3 Brazil      19_99 37737/172006362
    ## 4 Brazil      20_00 80488/174504898
    ## 5 China       19_99 212258/1272915272
    ## 6 China       20_00 213766/1280428583

By default, unite() uses _ as a separator.

uniting

    unite(table5, year, century, year, sep = "")
    ## # A tibble: 6 x 3
    ##   country     year  rate
    ## * <chr>       <chr> <chr>
    ## 1 Afghanistan 1999  745/19987071
    ## 2 Afghanistan 2000  2666/20595360
    ## 3 Brazil      1999  37737/172006362
    ## 4 Brazil      2000  80488/174504898
    ## 5 China       1999  212258/1272915272
    ## 6 China       2000  213766/1280428583
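The united year is still a character column. A minimal sketch that unites without a separator and then converts the result to integers, so that it matches the year column of table1 (both tables ship with tidyr):

```r
library(dplyr)
library(tidyr)  # also provides table5 and table1

# Glue century and year without a separator, then store the result as an integer
fixed <- table5 %>%
  unite(year, century, year, sep = "") %>%
  mutate(year = as.integer(year))
fixed
```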

exercise
Look at the (messy) Eurostat data on GDP and tidy it.


More information

Week 4. Big Data Analytics - data.frame manipulation with dplyr

Week 4. Big Data Analytics - data.frame manipulation with dplyr Week 4. Big Data Analytics - data.frame manipulation with dplyr Hyeonsu B. Kang hyk149@eng.ucsd.edu April 2016 1 Dplyr In the last lecture we have seen how to index an individual cell in a data frame,

More information

R. Muralikrishnan Max Planck Institute for Empirical Aesthetics Frankfurt. 08 June 2017

R. Muralikrishnan Max Planck Institute for Empirical Aesthetics Frankfurt. 08 June 2017 R R. Muralikrishnan Max Planck Institute for Empirical Aesthetics Frankfurt 08 June 2017 Introduction What is R?! R is a programming language for statistical computing and graphics R is free and open-source

More information

Spatial Ecology Lab 6: Landscape Pattern Analysis

Spatial Ecology Lab 6: Landscape Pattern Analysis Spatial Ecology Lab 6: Landscape Pattern Analysis Damian Maddalena Spring 2015 1 Introduction This week in lab we will begin to explore basic landscape metrics. We will simply calculate percent of total

More information

comma separated values .csv extension. "save as" CSV (Comma Delimited)

comma separated values .csv extension. save as CSV (Comma Delimited) What is a CSV and how do I import it? A CSV is a comma separated values file which allows data to be saved in a table structured format. CSVs look like normal spreadsheet but with a.csv extension. Traditionally

More information

AN INTRODUCTION TO R FOR MANAGEMENT SCHOLARS

AN INTRODUCTION TO R FOR MANAGEMENT SCHOLARS AN INTRODUCTION TO R FOR MANAGEMENT SCHOLARS 24 January 2017 Stefan Breet breet@rsm.nl www.stefanbreet.com TODAY What is R? How to use R? The Basics How to use R? The Data Analysis Process WHAT IS R? AN

More information

Assignment 0. Nothing here to hand in

Assignment 0. Nothing here to hand in Assignment 0 Nothing here to hand in The questions here have solutions attached. Follow the solutions to see what to do, if you cannot otherwise guess. Though there is nothing here to hand in, it is very

More information

Chapter 2 The SAS Environment

Chapter 2 The SAS Environment Chapter 2 The SAS Environment Abstract In this chapter, we begin to become familiar with the basic SAS working environment. We introduce the basic 3-screen layout, how to navigate the SAS Explorer window,

More information

How to Wrangle Data. using R with tidyr and dplyr. Ken Butler. March 30, / 44

How to Wrangle Data. using R with tidyr and dplyr. Ken Butler. March 30, / 44 1 / 44 How to Wrangle Data using R with tidyr and dplyr Ken Butler March 30, 2015 It is said that... 2 / 44 80% of data analysis: getting the data into the right form maybe 20% is making graphs, fitting

More information

The Data Journalist Chapter 7 tutorial Geocoding in ArcGIS Desktop

The Data Journalist Chapter 7 tutorial Geocoding in ArcGIS Desktop The Data Journalist Chapter 7 tutorial Geocoding in ArcGIS Desktop Summary: In many cases, online geocoding services are all you will need to convert addresses and other location data into geographic data.

More information

Introduction to Stata - Session 1

Introduction to Stata - Session 1 Introduction to Stata - Session 1 Simon, Hong based on Andrea Papini ECON 3150/4150, UiO January 15, 2018 1 / 33 Preparation Before we start Sit in teams of two Download the file auto.dta from the course

More information

Business Analytics Nanodegree Syllabus

Business Analytics Nanodegree Syllabus Business Analytics Nanodegree Syllabus Master data fundamentals applicable to any industry Before You Start There are no prerequisites for this program, aside from basic computer skills. You should be

More information

HOW TO EXPORT BUYER NAMES & ADDRESSES FROM PAYPAL TO A CSV FILE

HOW TO EXPORT BUYER NAMES & ADDRESSES FROM PAYPAL TO A CSV FILE HOW TO EXPORT BUYER NAMES & ADDRESSES FROM PAYPAL TO A CSV FILE If your buyers use PayPal to pay for their purchases, you can quickly export all names and addresses to a type of spreadsheet known as a

More information

Workshop. Import Workshop

Workshop. Import Workshop Import Overview This workshop will help participants understand the tools and techniques used in importing a variety of different types of data. It will also showcase a couple of the new import features

More information

Lecture 12: Data carpentry with tidyverse

Lecture 12: Data carpentry with tidyverse http://127.0.0.1:8000/.html Lecture 12: Data carpentry with tidyverse STAT598z: Intro. to computing for statistics Vinayak Rao Department of Statistics, Purdue University options(repr.plot.width=5, repr.plot.height=3)

More information

A framework for data-related skills

A framework for data-related skills Setting the stage for data science: integration of data management skills in introductory and second courses in statistics Nicholas J. Horton, Benjamin S. Baumer, and Hadley Wickham March 25, 2015 Statistics

More information

Introducing R/Tidyverse to Clinical Statistical Programming

Introducing R/Tidyverse to Clinical Statistical Programming Introducing R/Tidyverse to Clinical Statistical Programming MBSW 2018 Freeman Wang, @freestatman 2018-05-15 Slides available at https://bit.ly/2knkalu Where are my biases Biomarker Statistician Genomic

More information

Getting Our Feet Wet with Stata SESSION TWO Fall, 2018

Getting Our Feet Wet with Stata SESSION TWO Fall, 2018 Getting Our Feet Wet with Stata SESSION TWO Fall, 2018 Instructor: Cathy Zimmer 962-0516, cathy_zimmer@unc.edu 1) REMINDER BRING FLASH DRIVES! 2) QUESTIONS ON EXERCISES? 3) WHAT IS Stata SYNTAX? a) A set

More information

Introduction to Stata: An In-class Tutorial

Introduction to Stata: An In-class Tutorial Introduction to Stata: An I. The Basics - Stata is a command-driven statistical software program. In other words, you type in a command, and Stata executes it. You can use the drop-down menus to avoid

More information

A whirlwind introduction to using R for your research

A whirlwind introduction to using R for your research A whirlwind introduction to using R for your research Jeremy Chacón 1 Outline 1. Why use R? 2. The R-Studio work environment 3. The mock experimental analysis: 1. Writing and running code 2. Getting data

More information

CS130 Software Tools. Fall 2010 Intro to SPSS and Data Handling

CS130 Software Tools. Fall 2010 Intro to SPSS and Data Handling Software Tools Intro to SPSS and Data Handling 1 Types of Analyses When doing data analysis, we are interested in two types of summaries: Statistical Summaries (e.g. descriptive, hypothesis testing) Visual

More information

Introduction to Functions. Biostatistics

Introduction to Functions. Biostatistics Introduction to Functions Biostatistics 140.776 Functions The development of a functions in R represents the next level of R programming, beyond writing code at the console or in a script. 1. Code 2. Functions

More information

Lecture 1: MATLAB - advanced use cases

Lecture 1: MATLAB - advanced use cases Lecture 1: MATLAB - advanced use cases Data handling and analysis Juha Kuortti and Heikki Apiola February 10, 2018 Aalto University juha.kuortti@aalto.fi Importing and exporting data: basics Creating and

More information

LAB #1: DESCRIPTIVE STATISTICS WITH R

LAB #1: DESCRIPTIVE STATISTICS WITH R NAVAL POSTGRADUATE SCHOOL LAB #1: DESCRIPTIVE STATISTICS WITH R Statistics (OA3102) Lab #1: Descriptive Statistics with R Goal: Introduce students to various R commands for descriptive statistics. Lab

More information

Barchard Introduction to SPSS Marks

Barchard Introduction to SPSS Marks Barchard Introduction to SPSS 22.0 3 Marks Purpose The purpose of this assignment is to introduce you to SPSS, the most commonly used statistical package in the social sciences. You will create a new data

More information

An Introduction to Tidyverse

An Introduction to Tidyverse An Introduction to Tidyverse Joey Stanley Doctoral Candidate in Linguistics, University of Georgia joeystanley.com Presented at the UGA Willson Center DigiLab Friday, November 10, 2017 This is the third

More information

plot(seq(0,10,1), seq(0,10,1), main = "the Title", xlim=c(1,20), ylim=c(1,20), col="darkblue");

plot(seq(0,10,1), seq(0,10,1), main = the Title, xlim=c(1,20), ylim=c(1,20), col=darkblue); R for Biologists Day 3 Graphing and Making Maps with Your Data Graphing is a pretty convenient use for R, especially in Rstudio. plot() is the most generalized graphing function. If you give it all numeric

More information

What is Stata? A programming language to do sta;s;cs Strongly influenced by economists Open source, sort of. An acceptable way to manage data

What is Stata? A programming language to do sta;s;cs Strongly influenced by economists Open source, sort of. An acceptable way to manage data Introduc)on to Stata Training Workshop on the Commitment to Equity Methodology CEQ Ins;tute, Asian Development Bank, and The Ministry of Finance Dili May-June, 2017 What is Stata? A programming language

More information

Frances Provan i #)# #%'

Frances Provan i #)# #%' !"#$%&#& Frances Provan i ##+), &'!#( $& #)# *% #%' & SPSS Versions... 2 Some slide shorthand... 2 Did you know you could... 2 Nice newish graphs... 2 Population Pyramids... 2 Population Pyramids: categories...

More information

MIS 0855 Data Science (Section 006) Fall 2017 In-Class Exercise (Day 18) Finding Bad Data in Excel

MIS 0855 Data Science (Section 006) Fall 2017 In-Class Exercise (Day 18) Finding Bad Data in Excel MIS 0855 Data Science (Section 006) Fall 2017 In-Class Exercise (Day 18) Finding Bad Data in Excel Objective: Find and fix a data set with incorrect values Learning Outcomes: Use Excel to identify incorrect

More information

How to import text files to Microsoft Excel 2016:

How to import text files to Microsoft Excel 2016: How to import text files to Microsoft Excel 2016: You would use these directions if you get a delimited text file from a government agency (or some other source). This might be tab-delimited, comma-delimited

More information

Example how not to do it: JMP in a nutshell 1 HR, 17 Apr Subject Gender Condition Turn Reactiontime. A1 male filler

Example how not to do it: JMP in a nutshell 1 HR, 17 Apr Subject Gender Condition Turn Reactiontime. A1 male filler JMP in a nutshell 1 HR, 17 Apr 2018 The software JMP Pro 14 is installed on the Macs of the Phonetics Institute. Private versions can be bought from

More information

Code Plug Management: Contact List Import/Export. Version 1.0, Dec 16, 2015

Code Plug Management: Contact List Import/Export. Version 1.0, Dec 16, 2015 Code Plug Management: Contact List Import/Export Version 1.0, Dec 16, 2015 Background This presentation will show how to update and maintain contact lists in the CS750 The following applications will be

More information

Reference Guide. Adding a Generic File Store - Importing From a Local or Network ShipWorks Page 1 of 21

Reference Guide. Adding a Generic File Store - Importing From a Local or Network ShipWorks Page 1 of 21 Reference Guide Adding a Generic File Store - Importing From a Local or Network Folder Page 1 of 21 Adding a Generic File Store TABLE OF CONTENTS Background First Things First The Process Creating the

More information

CS130/230 Lecture 6 Introduction to StatView

CS130/230 Lecture 6 Introduction to StatView Thursday, January 15, 2004 Intro to StatView CS130/230 Lecture 6 Introduction to StatView StatView is a statistical analysis program that allows: o Data management in a spreadsheet-like format o Graphs

More information

Exploratory data analysis

Exploratory data analysis Lecture 4 STATS/CME 195 Matteo Sesia Stanford University Spring 2018 Contents Exploratory data analysis Exploratory data analysis What is exploratory data analysis (EDA) In this lecture we discuss how

More information

STAT 1291: Data Science

STAT 1291: Data Science STAT 1291: Data Science Lecture 18 - Statistical modeling II: Machine learning Sungkyu Jung Where are we? data visualization data wrangling professional ethics statistical foundation Statistical modeling:

More information

This lab will introduce you to MySQL. Begin by logging into the class web server via SSH Secure Shell Client

This lab will introduce you to MySQL. Begin by logging into the class web server via SSH Secure Shell Client Lab 2.0 - MySQL CISC3140, Fall 2011 DUE: Oct. 6th (Part 1 only) Part 1 1. Getting started This lab will introduce you to MySQL. Begin by logging into the class web server via SSH Secure Shell Client host

More information

Today Function. Note: If you want to retrieve the date and time that the computer is set to, use the =NOW() function.

Today Function. Note: If you want to retrieve the date and time that the computer is set to, use the =NOW() function. Today Function The today function: =TODAY() It has no arguments, and returns the date that the computer is set to. It is volatile, so if you save it and reopen the file one month later the new, updated

More information

An Introductory Guide to SpecTRM

An Introductory Guide to SpecTRM An Introductory Guide to SpecTRM SpecTRM (pronounced spectrum and standing for Specification Tools and Requirements Methodology) is a toolset to support the specification and development of safe systems

More information

TIPS AND TRICKS: IMPROVE EFFICIENCY TO YOUR SAS PROGRAMMING

TIPS AND TRICKS: IMPROVE EFFICIENCY TO YOUR SAS PROGRAMMING TIPS AND TRICKS: IMPROVE EFFICIENCY TO YOUR SAS PROGRAMMING Guillaume Colley, Lead Data Analyst, BCCFE Page 1 Contents Customized SAS Session Run system options as SAS starts Labels management Shortcut

More information

DATA SCIENCE AND MACHINE LEARNING

DATA SCIENCE AND MACHINE LEARNING DATA SCIENCE AND MACHINE LEARNING Introduction to Data Tables Associate Professor in Applied Statistics, Department of Mathematics, School of Applied Mathematical & Physical Sciences, National Technical

More information

1. Open the New American FactFinder using this link:

1. Open the New American FactFinder using this link: Exercises for Mapping and Using US Census Data MIT GIS Services, IAP 2012 More information, including a comparison of tools available through the MIT Libraries, can be found at: http://libraries.mit.edu/guides/types/census/tools-overview.html

More information

These are notes for the third lecture; if statements and loops.

These are notes for the third lecture; if statements and loops. These are notes for the third lecture; if statements and loops. 1 Yeah, this is going to be the second slide in a lot of lectures. 2 - Dominant language for desktop application development - Most modern

More information

Tutorial 4 - Attribute data in ArcGIS

Tutorial 4 - Attribute data in ArcGIS Tutorial 4 - Attribute data in ArcGIS COPY the Lab4 archive to your server folder and unpack it. The objectives of this tutorial include: Understand how ArcGIS stores and structures attribute data Learn

More information

Lecture 1: Overview

Lecture 1: Overview 15-150 Lecture 1: Overview Lecture by Stefan Muller May 21, 2018 Welcome to 15-150! Today s lecture was an overview that showed the highlights of everything you re learning this semester, which also meant

More information

Excel Functions & Tables

Excel Functions & Tables Excel Functions & Tables Winter 2012 Winter 2012 CS130 - Excel Functions & Tables 1 Review of Functions Quick Mathematics Review As it turns out, some of the most important mathematics for this course

More information

MiniBase Workbook. Schoolwires Centricity2

MiniBase Workbook. Schoolwires Centricity2 MiniBase Workbook Schoolwires Centricity2 Table of Contents Introduction... 1 Create a New MiniBase... 2 Add Records to the MiniBase:... 3 Add Records One at a Time... 3 Import Records:... 4 Deploy the

More information

Importing data sets in R

Importing data sets in R Importing data sets in R R can import and export different types of data sets including csv files text files excel files access database STATA data SPSS data shape files audio files image files and many

More information

Introduction to Stata Getting Data into Stata. 1. Enter Data: Create a New Data Set in Stata...

Introduction to Stata Getting Data into Stata. 1. Enter Data: Create a New Data Set in Stata... Introduction to Stata 2016-17 02. Getting Data into Stata 1. Enter Data: Create a New Data Set in Stata.... 2. Enter Data: How to Import an Excel Data Set.... 3. Import a Stata Data Set Directly from the

More information

Incident Response Programming with R. Eric Zielinski Sr. Consultant, Nationwide

Incident Response Programming with R. Eric Zielinski Sr. Consultant, Nationwide Incident Response Programming with R Eric Zielinski Sr. Consultant, Nationwide About Me? Cyber Defender for Nationwide Over 15 years in Information Security Speaker at various conferences FIRST, CEIC,

More information

Using the CRM Pivot Tables

Using the CRM Pivot Tables Using the CRM Pivot Tables Pivot tables have now been added to your CRM system: we hope that these will provide you with an easy way to produce charts and graphs straight from your CRM, using the most

More information

Importing vehicles into Glass-Net from ANY Dealer Management System

Importing vehicles into Glass-Net from ANY Dealer Management System Importing vehicles into Glass-Net from ANY Dealer Management System Step 1: Preparing your DMS CSV Ensure you download a Comma Separated Values (CSV) file from your Dealer Management System (DMS) and save

More information

Using vletter Handwriting Software with Mail Merge in Word 2007

Using vletter Handwriting Software with Mail Merge in Word 2007 Using vletter Handwriting Software with Mail Merge in Word 2007 Q: What is Mail Merge? A: The Mail Merge feature in Microsoft Word allows you to merge an address file with a form letter in order to generate

More information

Excel Functions & Tables

Excel Functions & Tables Excel Functions & Tables Fall 2014 Fall 2014 CS130 - Excel Functions & Tables 1 Review of Functions Quick Mathematics Review As it turns out, some of the most important mathematics for this course revolves

More information

Earthquake data in geonet.org.nz

Earthquake data in geonet.org.nz Earthquake data in geonet.org.nz There is are large gaps in the 2012 and 2013 data, so let s not use it. Instead we ll use a previous year. Go to http://http://quakesearch.geonet.org.nz/ At the screen,

More information

Analysis and visualization with v isone

Analysis and visualization with v isone Analysis and visualization with v isone Jürgen Lerner University of Konstanz Egoredes Summerschool Barcelona, 21. 25. June, 2010 About v isone. Visone is the Italian word for mink. In Spanish visón. visone

More information

SPSS TRAINING SPSS VIEWS

SPSS TRAINING SPSS VIEWS SPSS TRAINING SPSS VIEWS Dataset Data file Data View o Full data set, structured same as excel (variable = column name, row = record) Variable View o Provides details for each variable (column in Data

More information