AN INTRODUCTION TO R FOR MANAGEMENT SCHOLARS

1 AN INTRODUCTION TO R FOR MANAGEMENT SCHOLARS
24 January 2017
Stefan Breet, breet@rsm.nl

2 TODAY
- What is R?
- How to use R? The Basics
- How to use R? The Data Analysis Process

3 WHAT IS R? AN OVERVIEW

4 WHAT IS R?
R is a language and environment for statistical computing and graphics.
R is an integrated suite of software facilities for data manipulation, calculation and graphical display. It includes (R-Project.org, 2017):
- an effective data handling and storage facility,
- a suite of operators for calculations on arrays, in particular matrices,
- a large, coherent, integrated collection of intermediate tools for data analysis,
- graphical facilities for data analysis and display either on-screen or on hardcopy, and
- a well-developed, simple and effective programming language which includes conditionals, loops, user-defined recursive functions and input and output facilities.

5 THE HISTORY OF R
R in its current form is a dialect of the programming language S.
- S is developed by John Chambers and colleagues at Bell Labs as a statistical analysis environment for internal use.
- The first version of S is distributed outside of Bell Labs.
- Two books are published and the source code is licensed by AT&T for educational purposes.

6 THE HISTORY OF R
R in its current form is a dialect of the programming language S.
- 1988: The system is rewritten in C and resembles the system we use today. Features like functions are introduced and are described in the book The New S Language.
- Ross Ihaka and Robert Gentleman create R at the University of Auckland and closely model it on S.
- First public announcement of R.
- Ihaka & Gentleman make R free to use under the GNU General Public License.

7 THE HISTORY OF R
R in its current form is a dialect of the programming language S.
- The R Core Group is formed (including people associated with S-Plus); it controls R's source code.
- R version is released.
- R version is released.

8 THE POPULARITY OF R
R's popularity is rapidly increasing.
Figure: IEEE Spectrum's Programming Language Ranking (C, Java, Python, C++, R, C#, PHP, JavaScript, Ruby, Go, with Matlab and SAS further down). Sources: IEEE Spectrum; Muenchen, 2016.

9 COMPANIES THAT USE R
R is popular outside of academia as well.

10 ADVANTAGES OF USING R
Why do scientists, data analysts and companies like R?
- It is more than a piece of statistical software: it's a programming language! You can create your own objects, functions and packages.
- It's free and open source. Everybody can access the source code, build extra features (objects, functions, packages), and detect and fix bugs.
- It's extremely versatile. It is the most comprehensive statistical analysis environment available: from the most basic statistical tests to the most complex analyses or data visualisations, R can do it.

11 ADVANTAGES OF USING R
Why do scientists, data analysts and companies like R?
- It makes reproducible research easy. You can easily save, edit and share the code behind your analyses so others can reproduce them. Version control (via GitHub or Subversion) is easy to implement.
- Training & support. R has a huge community of users and contributors, which makes it easy to find support if you have a question. There are plenty of online courses and resources available if you want to learn more.
- Data visualisation. R has by far the best data visualisation tools. It's easy to create beautiful plots, images and figures, even interactive or dynamic ones.

12 DISADVANTAGES OF USING R
R has a couple of disadvantages compared to other statistical software:
- R has a steep learning curve in the beginning. Once you master the basics, however, learning the advanced material is easy.
- It can be relatively slow and more complex than other programming languages (such as Python).
- The data you can load into R is limited to the size of your computer's working memory (RAM). Rule of thumb: don't use datasets larger than half the size of your working memory. However, packages have been developed that can handle big datasets.
- There is no quality control on every package.

13 R VS. STATA
Both are popular among academics.
- R is a programming language, while STATA is a software package.
- R is free and open source, while STATA has to be purchased (and is expensive).
- STATA serves as a reliable basis (the quality of its applications is secured by the company), while R's range of applications is much larger.
- R has more users and therefore a larger support community.

14 R VS. PYTHON
Both are among the most popular programming languages.
- R is designed for academics and data analysts, while Python is designed for programmers (who might want to analyse data or apply statistical techniques).
- R focuses on better, user-friendly data analysis, statistics and graphical models, while Python focuses on productivity and readability.
- R is a great place to start: you can access Python from R via the rPython package.

15 PACKAGES
You can extend R by installing packages.
- Packages are collections of R functions, data and compiled code.
- Packages are stored in the library.
- R comes with a standard set of packages, and others are available for download.
- Before you can use a package, you have to load it into your R session.
- You can load packages (i.e., sets of functions) with library(package.name).
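A minimal sketch of the install-once, load-every-session workflow. It uses stats, a package that ships with every R installation, so it runs offline; installing a CRAN package such as readr is shown only as a comment because it needs an internet connection.

```r
# Packages are installed once (from CRAN) and loaded in every session.
# install.packages("readr")   # needs an internet connection; run once

# Check whether a package is installed without attaching it
pkg <- "stats"
is_installed <- requireNamespace(pkg, quietly = TRUE)

# Attach the package so its functions can be called by name
library(stats)

# Attached packages appear on the search path as "package:<name>"
attached <- search()
print(is_installed)
print("package:stats" %in% attached)
```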

16 EXAMPLES
The people behind RStudio have published several well-developed packages:
- R Markdown: create dynamic reports
- Shiny: create interactive web applications
- ggplot2: create beautiful plots

17 MORE EXAMPLES
There is a package for almost every type of statistical analysis (analysis: package):
- Regression Analysis (LM & GLM): stats
- Meta-Analysis: metafor, metaSEM
- Structural Equation Modeling: lavaan, OpenMx, sem
- Survival Analysis: survival
- Psychological, Psychometric & Personality Research: psych
- Text Mining / Analysis: tm
- Mediation Analysis: mediation
- Moderation Analysis: pequod
- Network Analysis: igraph, sna

18 HOW TO USE R THE BASICS

19 ALWAYS START WITH RSTUDIO
RStudio makes R easier to use.
Figure: The R Console compared with The RStudio Interface.

20 RSTUDIO
It includes a code editor, debugging tools and visualization tools.
Figure: The RStudio Interface, with four panes:
- Code Editor = edit and save your scripts
- Console = quickly try out code + view the output of your code
- Workspace = overview of all the R objects loaded into the working memory of your computer
- Plots = preview of your plots + overview of the files in your working directory + list of the packages on your computer + help documentation

21 TIDY DATA
Adhere to the principles of tidy data to ensure high-quality code and analyses.
A dataset is said to be tidy if it satisfies the following conditions:
1. Each variable forms a column.
2. Each observation forms a row.
3. Each type of observational unit forms a table.
Tidy data makes it easy to carry out data analysis.
More information: Wickham, H. Tidy Data. Journal of Statistical Software, 59(10).
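To make the first two conditions concrete, here is a hypothetical example (the firm names and numbers are made up). A wide table with one column per year breaks the "each variable forms a column" rule, because year is a variable hidden in the column names. Base R's reshape() can turn it into a tidy, long table; the tidyr package offers friendlier tools for the same job.

```r
# Hypothetical "messy" data: one row per firm, one column per year
wide <- data.frame(
  firm      = c("alpha", "beta"),
  perf_2015 = c(1.2, 0.8),
  perf_2016 = c(1.5, 0.9)
)

# Reshape to tidy (long) format: one row per firm-year observation,
# with separate columns for the variables firm, year and performance
tidy <- reshape(
  wide,
  direction = "long",
  varying   = c("perf_2015", "perf_2016"),
  v.names   = "performance",
  timevar   = "year",
  times     = c(2015, 2016),
  idvar     = "firm"
)
rownames(tidy) <- NULL
print(tidy)
```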

22 TIDY DATA (CONTINUED)
Make sure that others can understand your code.
- Always make sure that the names of variables, values and functions are descriptive and human-readable.
- Be consistent: use lower case, and do not separate words with a space but use a dot (.) or underscore (_) instead.
For example:
- entrepreneurial_behavior instead of EntBehav or EB
- log_tenure instead of Tenure(Log)
- firm_performance instead of Firm Performance
- male / female instead of 0 / 1
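The naming rules above can be partly automated. The sketch below cleans hypothetical column names with tolower() and gsub(); note that a cleaned abbreviation like "entbehav" is consistent but still not descriptive, so some renaming remains manual. The 0/1 coding (0 = male, 1 = female) is an assumption made for illustration.

```r
# Hypothetical raw column names, e.g. from a spreadsheet export
raw_names <- c("Firm Performance", "Tenure(Log)", "EntBehav")

# Lower-case everything and replace runs of other characters with "_"
clean_names <- tolower(raw_names)
clean_names <- gsub("[^a-z0-9]+", "_", clean_names)
clean_names <- gsub("^_|_$", "", clean_names)   # trim leading/trailing "_"
print(clean_names)   # "firm_performance" "tenure_log" "entbehav"

# "entbehav" is consistent but not descriptive: rename it by hand
clean_names[clean_names == "entbehav"] <- "entrepreneurial_behavior"

# Replace a 0/1 code with readable labels (assumes 0 = male, 1 = female)
gender_code <- c(0, 1, 1, 0)
gender <- factor(gender_code, levels = c(0, 1), labels = c("male", "female"))
print(gender)
```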

23 GETTING STARTED
Basic calculations & objects
R is basically a calculator on steroids. Run the following commands in the RStudio console:
4 + 4
6 * 6
You can store a value as an object. You can create new objects with the assignment operator <-:
x <- 4
y <- 6
8/ ^2
All R statements where you create objects have the same form:
object_name <- value
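A minimal console session with the arithmetic and assignment operators; the values are illustrative. Typed at the console, each expression prints its result, shown here as comments.

```r
# Arithmetic: typed at the console, each line prints its result
4 + 4     # 8
6 * 6     # 36

# Assignment: store values as named objects
x <- 4
y <- 6

# Objects can be used wherever a value is expected
x * y     # 24
y ^ 2     # 36
```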

24 OBJECTS & STRINGS
Since objects contain the values you assigned to them, you can use objects in your calculations:
x + y
x * y
x - y
x / y
You can store text as well by using double quotes:
my_name <- "stefan"
We call such a piece of text a character string. What happens if we compute this?
x + my_name
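To answer the question above: R refuses to add a number and a character string. The sketch below wraps the failing call in tryCatch() so the error message can be inspected instead of stopping the script, and then shows an explicit conversion with as.numeric().

```r
x <- 4
my_name <- "stefan"

# Arithmetic with a character string fails; tryCatch() captures the
# error message instead of stopping the script
result <- tryCatch(
  x + my_name,
  error = function(e) conditionMessage(e)
)
print(result)   # "non-numeric argument to binary operator"

# Numbers stored as text must be converted explicitly first
x + as.numeric("2")   # 6
```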

25 DATA TYPES
Understanding the different data types in R is really important!
x and my_name are vectors of a different class. You can check the type of an R object with the class() function:
class(x)         # x is numeric
class(my_name)   # my_name is character
There are six types of atomic vectors, the most basic/simple objects in R:
- Logical
- Integer and Double (together: Numeric)
- Character
- Complex
- Raw
Logical, numeric and character are the most important!
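class() reports numeric for both integers and doubles; typeof() shows the underlying atomic type. A short sketch covering all six types:

```r
x <- 4
my_name <- "stefan"

class(x)              # "numeric"
class(my_name)        # "character"

# typeof() reports the underlying atomic type
typeof(x)             # "double" (R stores plain numbers as doubles)
typeof(4L)            # "integer" (the L suffix creates an integer)
typeof(TRUE)          # "logical"
typeof("stefan")      # "character"
typeof(2 + 3i)        # "complex"
typeof(as.raw(255))   # "raw"
```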

26 COMBINING VALUES
Before we continue exploring the different data types, it's useful to learn how to combine values. You can combine (concatenate) values by using the c() function:
numbers <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
To see the result, type the name of the object and hit Enter, or use the print() function:
print(numbers)
Let's combine other data types:
Character: my_full_name <- c("stefan", "breet")
Logical: true_or_false <- c(TRUE, FALSE, TRUE, FALSE)
class(true_or_false)
Note: TRUE and FALSE must be written in capitals; lower-case true is read as the name of an object that does not exist.
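A vector made with c() holds a single type, so combining different types triggers automatic coercion. The sketch below shows the colon shortcut for sequences and what happens to mixed inputs.

```r
numbers <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

# The colon operator builds the same sequence more compactly
print(all(numbers == 1:10))   # TRUE

# c() needs a single type, so mixed inputs are coerced to the most
# flexible one: logical < integer < double < complex < character
mixed <- c(1, "stefan", TRUE)
print(mixed)          # "1" "stefan" "TRUE"
print(class(mixed))   # "character"

num_and_logical <- c(1, TRUE, FALSE)
print(num_and_logical)  # 1 1 0
```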

27 FUNCTIONS
You use functions to do most of your computations and analyses.
Every function consists of two parts: a name and a set of arguments (including an object), for example mean(object, ...).
Let's try out a couple of functions:
length(numbers)
mean(numbers)
median(numbers)
max(numbers)
min(numbers)
summary(numbers)
To see a description of the arguments, you can use the help function by putting a question mark (?) before the function name: ?mean
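Arguments beyond the object change how a function behaves. The na.rm argument used below is a documented argument of mean() (see ?mean); without it, a single missing value makes the result NA.

```r
numbers <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

length(numbers)   # 10
mean(numbers)     # 5.5
median(numbers)   # 5.5

# Extra arguments change a function's behaviour. With a missing value,
# mean() returns NA unless you set na.rm = TRUE
with_missing <- c(numbers, NA)
mean(with_missing)                 # NA
mean(with_missing, na.rm = TRUE)   # 5.5
```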

28 LOGICAL OPERATORS
Also known as conditional statements
You can compare R objects with logical operators. The result is a logical vector (i.e., TRUE / FALSE / NA).
Operator   Description
<          less than
<=         less than or equal to
>          greater than
>=         greater than or equal to
==         exactly equal to
!=         not equal to
x | y      x OR y
x & y      x AND y
Examples:
y < x
y > x
y == x
y != x
y > numbers
Create a new object z:
z <- 4
Check if z is equal to x or y:
z == x
z == y
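Comparisons work element by element on vectors, and the resulting logical vectors can be counted or used to select elements. A short sketch with the objects from the slides:

```r
x <- 4
y <- 6
numbers <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
z <- 4

y > numbers        # compared element by element: TRUE for 1 to 5
sum(y > numbers)   # 5: TRUE counts as 1 and FALSE as 0

z == x | z == y    # TRUE: z equals x
z > 3 & z < 5      # TRUE: both conditions hold

# Logical vectors can select elements
numbers[numbers > 7]   # 8 9 10
```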

29 LISTS
A list can contain different types of data.
Atomic vectors can only contain data of the same type (numeric, logical or character). Lists and data frames can contain different data types.
Let's make a list of the objects we created with the list() function and provide them with a name:
my_first_list <- list(my_name = my_full_name, numbers = numbers, logical = true_or_false)
You can access elements of the list with the $ sign by calling their name:
my_first_list$numbers
Or by using brackets [ ]. Access the first element as follows:
my_first_list[1]
And the second element:
my_first_list[2]
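One detail worth knowing about brackets: single brackets return a smaller list, while double brackets (or $) return the element itself. The sketch below rebuilds the list from the slides and checks both.

```r
my_full_name  <- c("stefan", "breet")
numbers       <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
true_or_false <- c(TRUE, FALSE, TRUE, FALSE)

my_first_list <- list(my_name = my_full_name,
                      numbers = numbers,
                      logical = true_or_false)

# Single brackets return a (smaller) list ...
first_as_list <- my_first_list[1]
class(first_as_list)     # "list"

# ... double brackets or $ return the element itself
first_element <- my_first_list[[1]]
class(first_element)     # "character"
identical(my_first_list$numbers, my_first_list[["numbers"]])   # TRUE

# Length of every element: 2, 10 and 4
lengths(my_first_list)
```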

30 LIST VS DATAFRAME
Let's take a closer look at our list:
Element name   Values                  Length
my_name        "stefan" "breet"        2
numbers        1 2 3 4 5 6 7 8 9 10    10
logical        TRUE FALSE TRUE FALSE   4
Note: every list element has a different length!

31 LIST VS DATAFRAME
A data frame can store different data types as well.
If every vector has the same length, we call the object a data frame (consisting of rows and columns) and we call the vectors variables. NA = Not Available = missing data.
my_name    numbers   logical
"stefan"   1         TRUE
"breet"    2         FALSE
NA         3         TRUE
NA         4         FALSE
NA         5         NA
NA         6         NA
NA         7         NA
NA         8         NA
NA         9         NA
NA         10        NA
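A sketch of how the padded table above can be built. data.frame() refuses vectors whose lengths cannot be matched; it does recycle a shorter vector whose length divides the longest one (2 into 10), so the error here comes from the length-4 vector. Padding with NA is one way (assumed here, not taken from the slides) to reach a common length; pad_with_na() is a hypothetical helper.

```r
my_full_name  <- c("stefan", "breet")
numbers       <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
true_or_false <- c(TRUE, FALSE, TRUE, FALSE)

# data.frame() refuses vectors whose lengths cannot be matched
attempt <- tryCatch(
  data.frame(my_name = my_full_name, numbers = numbers, logical = true_or_false),
  error = function(e) e
)
inherits(attempt, "error")   # TRUE

# Hypothetical helper: pad a vector with NA ("not available") to length n
pad_with_na <- function(v, n) {
  length(v) <- n   # extending a vector's length fills the new slots with NA
  v
}
n <- length(numbers)
my_first_df <- data.frame(my_name = pad_with_na(my_full_name, n),
                          numbers = numbers,
                          logical = pad_with_na(true_or_false, n))
dim(my_first_df)                 # 10 rows, 3 columns
sum(is.na(my_first_df$my_name))  # 8 missing names
```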

32 SUMMARY
What do you need to remember?
Data Structures: Atomic Vectors (Character, Numeric, Logical), Matrices, Lists, Data Frames
Operators:
- Assignment Operator: x <- value
- Arithmetic Operators: + - / * ^
- Logical Operators: < <= > >= == !=
Functions: function_name(argument1, argument2, argument3, ...)

33 HOW TO USE R THE DATA ANALYSIS PROCESS

34 ABALONE DATA
Data Analysis in R: An Example
Abalones are marine snails. Their meat is used for food, with prices in China reaching up to $85 per kilogram.
The dataset contains the physical measurements of 4176 abalones.
The research question: can we predict the age of abalones based on their physical measurements?

35 THE DATASET
Predicting the age of abalone from physical measurements
Variable name    Data type    Measurement   Description
Sex              nominal                    M, F, and I (infant)
Length           continuous   mm            longest shell measurement
Diameter         continuous   mm            perpendicular to length
Height           continuous   mm            with meat in shell
Whole weight     continuous   grams         whole abalone
Shucked weight   continuous   grams         weight of meat
Viscera weight   continuous   grams         gut weight (after bleeding)
Shell weight     continuous   grams         after being dried
Rings            integer                    +1.5 gives the age in years
Source: UCI Machine Learning Repository

36 ALWAYS CREATE AN RSTUDIO PROJECT
Creating a project makes file management easier.
Managing Files
The RStudio Project is the place where you store raw and tidy data, the scripts you use to process and analyse the data, and output such as plots, tables or dynamic reports. Create a new project with the name R Workshop in a new directory.
Loading and Saving Files
A project automatically sets the working directory, so you can use relative paths to access and save files.
Absolute path = /Users/Stefan/Documents/data.csv or C:/Documents/data.csv
Relative path = ./data.csv (the dot represents the working directory)
Note: you should never use absolute paths in your scripts!
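A sketch of why relative paths travel well. It assumes a hypothetical project folder created inside R's temporary directory and sets the working directory by hand, which is what opening an RStudio project does for you. The same "./data.csv" then works on any machine, while the absolute path printed by normalizePath() differs per computer.

```r
# Hypothetical project folder created inside R's temporary directory
project_dir <- file.path(tempdir(), "R Workshop")
dir.create(project_dir, showWarnings = FALSE)

# Opening an RStudio project sets the working directory for you;
# here we do it by hand and remember the old one
old_wd <- setwd(project_dir)

# Save and reload with a relative path: "./" is the working directory
write.csv(data.frame(id = 1:3), "./data.csv", row.names = FALSE)
found <- file.exists("./data.csv")
reloaded <- read.csv("./data.csv")

# The absolute path depends on this machine, which is why scripts
# should not hard-code it
print(normalizePath("./data.csv"))

setwd(old_wd)   # restore the previous working directory
```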

37 ALWAYS CREATE A SCRIPT
Store your code in a script so you can rerun it later.
- Use the console to experiment with code, but put it in a script as soon as you have written code that works and does what you want.
- Open a new R Script and save it with the name R Workshop.
- RStudio's script editor will also highlight syntax errors.
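Rerunning a saved script takes one call to source(). In this sketch the script is written to a temporary file (a stand-in for the "R Workshop" script you would save from the editor), and local = TRUE makes it run in the current environment.

```r
# Write a tiny script to a temporary file (in practice you save it from
# RStudio's editor, e.g. as "R Workshop.R" in your project folder)
script_path <- file.path(tempdir(), "R Workshop.R")
writeLines(c(
  "x <- 4",
  "y <- 6",
  "total <- x + y"
), script_path)

# source() reruns every line of the script from top to bottom;
# local = TRUE evaluates it in the current environment
source(script_path, local = TRUE)
print(total)   # 10
```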

38 HOW TO IMPORT DATA INTO R
You can import various file formats by using the appropriate package:
- Rectangular data (.csv / .tsv / .fwf): base R, key functions read.table(), read.csv(), read.csv2(); readr, key functions read_csv(), read_tsv(), read_table()
- Excel (.xls / .xlsx): readxl, key function read_excel()
- SPSS (.sav), SAS (.sas7bdat), STATA (.dta): haven, key functions read_sav(), read_sas(), read_dta()
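The difference between base R's read.csv() and read.csv2() is the file convention they expect. This sketch writes the same hypothetical rows (styled after the abalone data) in both conventions to temporary files and reads them back.

```r
# Hypothetical comma-separated file (rows styled after the abalone data)
csv_file <- tempfile(fileext = ".csv")
writeLines(c("sex,length,rings",
             "M,0.455,15",
             "F,0.53,9"), csv_file)
comma_data <- read.csv(csv_file)

# The same data in the European convention: ";" separates fields
# and "," marks decimals, which is what read.csv2() expects
csv2_file <- tempfile(fileext = ".csv")
writeLines(c("sex;length;rings",
             "M;0,455;15",
             "F;0,53;9"), csv2_file)
semicolon_data <- read.csv2(csv2_file)

str(comma_data)
all.equal(comma_data, semicolon_data)   # TRUE
```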

39 STEP 1: DATA PROCESSING
Let's import the dataset into R. We're going to use the readr package. Copy-paste the abalone.csv dataset into the working directory, then run:
# Load the readr package
library(readr)
# Load the dataset and assign it to an R object by giving it a name
abalone <- read_csv("./abalone.csv")
Note: every dataset imported or created with the readr or haven packages is stored as a special kind of data frame: the tibble. Tibbles have two advantages compared to standard data frames:
1. Tibbles print only the first 10 rows and the columns that fit on your screen (instead of all rows and all columns), so it's easier to work with large data.
2. Each column reports its type (e.g., character, double, integer, etc.).
If necessary, you can convert a tibble to a standard data frame with the as.data.frame() function.

40 INSPECTING A DATA FRAME
There are several ways in which you can inspect a data frame. Let's see what the data looks like:
print(abalone)
# View the column names
colnames(abalone)
Notice that the column names are missing: the file has no header row, but the read_csv() function automatically reads the first row as column names, so the first observation ends up as the header. Let's change that:
# Import the dataset without column names
abalone <- read_csv("./abalone.csv", col_names = FALSE)
# Rename the columns with variable names by assigning the names to the colnames attribute
colnames(abalone) <- c("sex", "length", "diameter", "height", "whole_weight", "shucked_weight", "viscera_weight", "shell_weight", "rings")
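The same pitfall exists in base R. This sketch uses read.csv(), whose header argument plays the role of readr's col_names, on a small hypothetical headerless file, and shows that one observation disappears when the first row is taken as column names.

```r
# Hypothetical file without a header row, like the raw abalone.csv
headerless <- tempfile(fileext = ".csv")
writeLines(c("M,0.455,15",
             "F,0.53,9",
             "I,0.33,7"), headerless)

# Default: the first observation is mistaken for column names
wrong <- read.csv(headerless)
nrow(wrong)       # 2 rows: one observation was lost
colnames(wrong)   # names built from the data values

# header = FALSE keeps every row; then assign descriptive names
abalone_toy <- read.csv(headerless, header = FALSE)
colnames(abalone_toy) <- c("sex", "length", "rings")
nrow(abalone_toy)   # 3 rows
```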

41 PROCESSING DATA
The first variable (sex) is stored as a character vector. For further analysis, however, we need to change it to a nominal variable with three levels (male, female, infant) with the as.factor() function:
# Change the class of the sex variable from character to factor
abalone$sex <- as.factor(abalone$sex)
# Check the class of the variable
class(abalone$sex)
Let's check the levels of this variable with levels():
levels(abalone$sex)
The levels (F, I, M) are not in accordance with the tidy data principles, so let's rename them:
# Rename the levels
levels(abalone$sex) <- c("female", "infant", "male")
# Recheck the levels
levels(abalone$sex)
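Renaming levels with levels() <- is positional, so it relies on as.factor() having sorted the codes alphabetically (F, I, M). This sketch on a small made-up vector shows that behaviour and a more explicit alternative with factor(levels = , labels = ).

```r
sex <- c("M", "F", "I", "M")

# as.factor() sorts the levels alphabetically: "F", "I", "M"
sex_factor <- as.factor(sex)
levels(sex_factor)

# Renaming levels is positional, so the new names must follow that order
levels(sex_factor) <- c("female", "infant", "male")
as.character(sex_factor)   # "male" "female" "infant" "male"

# A safer alternative maps each code to its label explicitly
sex_explicit <- factor(sex,
                       levels = c("M", "F", "I"),
                       labels = c("male", "female", "infant"))
identical(as.character(sex_factor), as.character(sex_explicit))  # TRUE
```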

42 DATA MANIPULATION WITH DPLYR
dplyr is the best package for data manipulation.
The dplyr package provides a function for each basic verb of data manipulation, such as filtering, selecting columns, arranging, mutating, etc. Every dplyr function takes the dataset as its first argument.
# Load the dplyr package
library(dplyr)
# Create a new variable called age with mutate():
abalone <- mutate(abalone, age = rings + 1.5)
# Select a subset of columns with select():
abalone <- select(abalone, age, sex, diameter, height, whole_weight)
# Sort the dataset by age with arrange():
abalone <- arrange(abalone, age)
# Filter the dataset by age with filter():
filter(abalone, age > 25)
# Pick the first ten rows with the slice() function:
slice(abalone, 1:10)
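The order of the steps matters: age must be created with mutate() before select() can keep it. As a hedged sketch of what each verb does (base R equivalents, not dplyr itself, so it runs without extra packages), here are the same operations on a small made-up data frame.

```r
# Hypothetical toy data with the same column names as the abalone example
abalone_toy <- data.frame(
  sex          = c("M", "F", "I", "F"),
  diameter     = c(0.365, 0.420, 0.255, 0.530),
  height       = c(0.095, 0.135, 0.080, 0.170),
  whole_weight = c(0.514, 0.677, 0.205, 1.200),
  rings        = c(15, 9, 7, 26)
)

# like mutate(): add a new column
abalone_toy$age <- abalone_toy$rings + 1.5
# like select(): keep a subset of columns
abalone_toy <- abalone_toy[, c("age", "sex", "diameter", "height", "whole_weight")]
# like arrange(): sort rows by age
abalone_toy <- abalone_toy[order(abalone_toy$age), ]
# like filter(): keep rows that meet a condition
old_abalones <- abalone_toy[abalone_toy$age > 25, ]
# like slice(): pick rows by position
first_two <- abalone_toy[1:2, ]
print(old_abalones)
```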

43 DATA VISUALISATION
Use ggplot2 to visualise your data.
R has built-in data visualisation tools (the plot() function), but the ggplot2 package provides better-quality plots and more options.
# Load the ggplot2 package
library(ggplot2)
You can create plots in two different ways:
1. Use the quick plot function qplot(), recommended for quick data inspection.
2. Build plots layer by layer with the ggplot() function, recommended for creating high-quality plots.
The easiest way to create plots with ggplot2 is by using qplot():
# Plot the dependent variable age:
qplot(x = age, data = abalone)
# Plot age vs diameter
qplot(x = diameter, y = age, data = abalone)
# Color the points by sex
qplot(x = diameter, y = age, colour = sex, data = abalone)
Note: qplot() has been deprecated since ggplot2 3.4.0; in current versions, prefer ggplot().
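For comparison, the built-in plot() function mentioned above can draw the same age vs. diameter scatter plot coloured by sex. This is a base R sketch on hypothetical toy values, written to a temporary PDF so it also runs without a screen; it does not reproduce ggplot2's output.

```r
# Hypothetical toy data in the shape of the processed abalone data
abalone_toy <- data.frame(
  sex      = factor(c("male", "female", "infant", "female", "male", "infant")),
  diameter = c(0.365, 0.420, 0.255, 0.530, 0.440, 0.300),
  age      = c(16.5, 10.5, 8.5, 27.5, 12.5, 9.5)
)

# Draw into a PDF file so the sketch also works without a screen
plot_file <- tempfile(fileext = ".pdf")
pdf(plot_file)

# Scatter plot of age vs diameter, one colour per level of sex
plot(age ~ diameter, data = abalone_toy,
     col = as.integer(abalone_toy$sex), pch = 19,
     xlab = "diameter", ylab = "age (years)")
legend("topleft", legend = levels(abalone_toy$sex),
       col = seq_along(levels(abalone_toy$sex)), pch = 19)

dev.off()
```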

44 THE PSYCH PACKAGE
A great package for management scholars is the psych package. It offers great functions for analysing data from experiments and questionnaires, and comes with a couple of handy functions such as describe().
# Load the psych package
library(psych)
# Use the describe() and describeBy() functions to inspect the dependent variable age
describe(abalone$age)
describeBy(abalone$age, group = abalone$sex)
The variable age is skewed and has a positive kurtosis. Let's log-transform this variable:
# Use the mutate() function (dplyr) and the log() function to create a new variable:
abalone <- mutate(abalone, log_age = log(age))
# Inspect the dependent variable again
describe(abalone$log_age)
# Visualise the variable:
qplot(x = log_age, data = abalone)
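Why a log transformation helps with right-skewed data: it compresses large values more than small ones. This sketch computes a simple moment-based skewness on hypothetical ages before and after log(). psych's describe() applies a small-sample adjustment by default, so its numbers may differ slightly; log() also requires strictly positive values, which holds for age = rings + 1.5.

```r
# Hypothetical right-skewed ages (many young animals, a few old ones)
age <- c(2, 3, 3, 4, 4, 4, 5, 5, 6, 8, 12, 20)

# Simple moment-based skewness: third central moment over sd cubed
skewness <- function(v) {
  centred <- v - mean(v)
  mean(centred^3) / mean(centred^2)^1.5
}

skew_raw <- skewness(age)
skew_log <- skewness(log(age))
print(c(raw = skew_raw, log = skew_log))   # skewness drops after log()
```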

45 REGRESSION ANALYSIS
It's really easy to conduct a regression analysis in R.
You can use the lm() (linear model) and glm() (generalised linear model) functions to conduct a regression analysis. You specify the regression formula as part of the function:
lm(formula = age ~ sex + diameter + whole_weight, data = abalone)
# Store the resulting model
model1 <- lm(formula = age ~ sex + diameter + whole_weight, data = abalone)
# View a summary of the model:
summary(model1)
# Visually inspect the model
plot(model1)
You can conduct moderation analysis by adapting the formula:
model2 <- lm(formula = age ~ sex + diameter + whole_weight + diameter*whole_weight, data = abalone)
In an R formula, diameter*whole_weight already expands to diameter + whole_weight + diameter:whole_weight, so the repeated main effects are harmless.
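A runnable sketch of the two models on simulated data (hypothetical values with a built-in interaction, not the abalone data). It shows which coefficients the * operator creates and uses anova() to test whether adding the interaction improves the fit.

```r
set.seed(42)

# Simulated data (hypothetical, not the abalone data) with a true interaction
n <- 200
sim <- data.frame(
  sex          = factor(sample(c("female", "infant", "male"), n, replace = TRUE),
                        levels = c("female", "infant", "male")),
  diameter     = runif(n, 0.1, 0.6),
  whole_weight = runif(n, 0.1, 2.0)
)
sim$age <- 5 + 10 * sim$diameter + 2 * sim$whole_weight +
  8 * sim$diameter * sim$whole_weight + rnorm(n, sd = 1)

model1 <- lm(age ~ sex + diameter + whole_weight, data = sim)
model2 <- lm(age ~ sex + diameter * whole_weight, data = sim)

# The * operator adds both main effects and the interaction term
names(coef(model2))

# F-test: does adding the interaction improve the fit?
anova(model1, model2)
```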

46 MODERATION ANALYSIS WITH THE PEQUOD PACKAGE
The pequod package makes it easy to conduct moderation analysis.
Its lmres() function fits moderated regressions with residual centering and reports diagnostics (collinearity); further functions provide simple slopes analysis and interaction plots.
# Load the pequod package
library(pequod)
# Run model 1
model1 <- lmres(formula = age ~ sex + diameter + whole_weight, data = abalone)
summary(model1)
# Run model 2
model2 <- lmres(formula = age ~ sex + diameter + whole_weight + diameter*whole_weight, data = abalone)
summary(model2)
# Conduct a simple slopes test by indicating the predictor variable and the moderator
ss <- simpleSlope(object = model2, pred = "diameter", mod1 = "whole_weight")
print(ss)
# Create an interaction plot
PlotSlope(ss)
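The idea behind a simple slopes analysis can be sketched in base R: with an interaction model, the slope of the predictor at a moderator value w is b_diameter + b_interaction * w. This sketch uses simulated (hypothetical) data, probes the moderator at its mean and at plus or minus one standard deviation, and computes slopes only (no standard errors or tests). It does not use pequod's residual centering, so treat it as an illustration rather than a replacement.

```r
set.seed(1)

# Simulated data (hypothetical) with a diameter x whole_weight interaction
n <- 300
sim <- data.frame(
  diameter     = runif(n, 0.1, 0.6),
  whole_weight = runif(n, 0.1, 2.0)
)
sim$age <- 5 + 10 * sim$diameter + 2 * sim$whole_weight +
  8 * sim$diameter * sim$whole_weight + rnorm(n, sd = 1)

model <- lm(age ~ diameter * whole_weight, data = sim)
b <- coef(model)

# Slope of diameter at moderator value w: b_diameter + b_interaction * w
simple_slope <- function(w) unname(b["diameter"] + b["diameter:whole_weight"] * w)

# Evaluate at the moderator's mean and +/- 1 standard deviation
w_mean <- mean(sim$whole_weight)
w_sd   <- sd(sim$whole_weight)
slopes <- c(low  = simple_slope(w_mean - w_sd),
            mean = simple_slope(w_mean),
            high = simple_slope(w_mean + w_sd))
print(slopes)
```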

47 HOW TO USE R RESOURCES

48 A COUPLE OF USEFUL RESOURCES
Books:
- R for Data Science (Grolemund & Wickham, 2016). Freely available online.
- Advanced R (Wickham, 2014). Freely available online.
Online courses:
- Coursera
- edX (e.g., Microsoft)
Websites:
- Quick-R
- Stack Overflow (for asking questions)


More information

An Introduction to R. Subhajit Dutta Stat-Math Unit. Indian Statistical Institute, Kolkata October 17, 2012

An Introduction to R. Subhajit Dutta Stat-Math Unit. Indian Statistical Institute, Kolkata October 17, 2012 An Introduction to R Subhajit Dutta Stat-Math Unit Indian Statistical Institute, Kolkata October 17, 2012 Why R? It is FREE!! Basic as well as specialized data analysis technique at your fingertips. Highly

More information

Session 26 TS, Predictive Analytics: Moving Out of Square One. Moderator: Jean-Marc Fix, FSA, MAAA

Session 26 TS, Predictive Analytics: Moving Out of Square One. Moderator: Jean-Marc Fix, FSA, MAAA Session 26 TS, Predictive Analytics: Moving Out of Square One Moderator: Jean-Marc Fix, FSA, MAAA Presenters: Jean-Marc Fix, FSA, MAAA Jeffery Robert Huddleston, ASA, CERA, MAAA Predictive Modeling: Getting

More information

Analyzing Economic Data using R

Analyzing Economic Data using R Analyzing Economic Data using R Introduction & Organization Sebastiano Manzan BUS 4093H Fall 2016 1 / 30 What is this course about? The goal of the course is to introduce you to the analysis of economic

More information

Software Development. Integrated Software Environment

Software Development. Integrated Software Environment Software Development Integrated Software Environment Source Code vs. Machine Code What is source code? Source code and object code refer to the "before" and "after" versions of a computer program that

More information

Data Manipulation. Module 5

Data Manipulation.   Module 5 Data Manipulation http://datascience.tntlab.org Module 5 Today s Agenda A couple of base-r notes Advanced data typing Relabeling text In depth with dplyr (part of tidyverse) tbl class dplyr grammar Grouping

More information

Individual Covariates

Individual Covariates WILD 502 Lab 2 Ŝ from Known-fate Data with Individual Covariates Today s lab presents material that will allow you to handle additional complexity in analysis of survival data. The lab deals with estimation

More information

An Introductory Tutorial: Learning R for Quantitative Thinking in the Life Sciences. Scott C Merrill. September 5 th, 2012

An Introductory Tutorial: Learning R for Quantitative Thinking in the Life Sciences. Scott C Merrill. September 5 th, 2012 An Introductory Tutorial: Learning R for Quantitative Thinking in the Life Sciences Scott C Merrill September 5 th, 2012 Chapter 2 Additional help tools Last week you asked about getting help on packages.

More information

Entering and Outputting Data 2 nd best TA ever: Steele H. Valenzuela February 2-6, 2015

Entering and Outputting Data 2 nd best TA ever: Steele H. Valenzuela February 2-6, 2015 Entering and Outputting Data 2 nd best TA ever: Steele H. Valenzuela February 2-6, 2015 Contents Things to Know Before You Begin.................................... 1 Entering and Outputting Data......................................

More information

STENO Introductory R-Workshop: Loading a Data Set Tommi Suvitaival, Steno Diabetes Center June 11, 2015

STENO Introductory R-Workshop: Loading a Data Set Tommi Suvitaival, Steno Diabetes Center June 11, 2015 STENO Introductory R-Workshop: Loading a Data Set Tommi Suvitaival, tsvv@steno.dk, Steno Diabetes Center June 11, 2015 Contents 1 Introduction 1 2 Recap: Variables 2 3 Data Containers 2 3.1 Vectors................................................

More information

Python for Data Analysis. Prof.Sushila Aghav-Palwe Assistant Professor MIT

Python for Data Analysis. Prof.Sushila Aghav-Palwe Assistant Professor MIT Python for Data Analysis Prof.Sushila Aghav-Palwe Assistant Professor MIT Four steps to apply data analytics: 1. Define your Objective What are you trying to achieve? What could the result look like? 2.

More information

Solving the Unsolvable Through Scientific Computing: Explorations in the Best Uses of Popular Mathematics Software

Solving the Unsolvable Through Scientific Computing: Explorations in the Best Uses of Popular Mathematics Software Solving the Unsolvable Through Scientific Computing: Explorations in the Best Uses of Popular Mathematics Software Talitha Washington, Howard University Edray Goins, Purdue University Luis Melara, Shippensburg

More information

R: A Gentle Introduction. Vega Bharadwaj George Mason University Data Services

R: A Gentle Introduction. Vega Bharadwaj George Mason University Data Services R: A Gentle Introduction Vega Bharadwaj George Mason University Data Services Part I: Why R? What do YOU know about R and why do you want to learn it? Reasons to use R Free and open-source User-created

More information

Introduction to R: Part I

Introduction to R: Part I Introduction to R: Part I Jeffrey C. Miecznikowski March 26, 2015 R impact R is the 13th most popular language by IEEE Spectrum (2014) Google uses R for ROI calculations Ford uses R to improve vehicle

More information

An introduction to ggplot: An implementation of the grammar of graphics in R

An introduction to ggplot: An implementation of the grammar of graphics in R An introduction to ggplot: An implementation of the grammar of graphics in R Hadley Wickham 00-0-7 1 Introduction Currently, R has two major systems for plotting data, base graphics and lattice graphics

More information

Introduction (SPSS) Opening SPSS Start All Programs SPSS Inc SPSS 21. SPSS Menus

Introduction (SPSS) Opening SPSS Start All Programs SPSS Inc SPSS 21. SPSS Menus Introduction (SPSS) SPSS is the acronym of Statistical Package for the Social Sciences. SPSS is one of the most popular statistical packages which can perform highly complex data manipulation and analysis

More information

Specialist ICT Learning

Specialist ICT Learning Specialist ICT Learning APPLIED DATA SCIENCE AND BIG DATA ANALYTICS GTBD7 Course Description This intensive training course provides theoretical and technical aspects of Data Science and Business Analytics.

More information

Mails : ; Document version: 14/09/12

Mails : ; Document version: 14/09/12 Mails : leslie.regad@univ-paris-diderot.fr ; gaelle.lelandais@univ-paris-diderot.fr Document version: 14/09/12 A freely available language and environment Statistical computing Graphics Supplementary

More information

STAT 213: R/RStudio Intro

STAT 213: R/RStudio Intro STAT 213: R/RStudio Intro Colin Reimer Dawson Last Revised February 10, 2016 1 Starting R/RStudio Skip to the section below that is relevant to your choice of implementation. Installing R and RStudio Locally

More information

Lab 1. Introduction to R & SAS. R is free, open-source software. Get it here:

Lab 1. Introduction to R & SAS. R is free, open-source software. Get it here: Lab 1. Introduction to R & SAS R is free, open-source software. Get it here: http://tinyurl.com/yfet8mj for your own computer. 1.1. Using R like a calculator Open R and type these commands into the R Console

More information

History, installation and connection

History, installation and connection History, installation and connection The men behind our software Jim Goodnight, CEO SAS Inc Ross Ihaka Robert Gentleman (Duncan Temple Lang) originators of R 2 / 75 History SAS From late 1960s, North Carolina

More information

Data analysis using Microsoft Excel

Data analysis using Microsoft Excel Introduction to Statistics Statistics may be defined as the science of collection, organization presentation analysis and interpretation of numerical data from the logical analysis. 1.Collection of Data

More information

Getting Started with R

Getting Started with R Getting Started with R STAT 133 Gaston Sanchez Department of Statistics, UC Berkeley gastonsanchez.com github.com/gastonstat/stat133 Course web: gastonsanchez.com/stat133 Tool Some of you may have used

More information

Statistics Statistical Computing Software

Statistics Statistical Computing Software Statistics 135 - Statistical Computing Software Mark E. Irwin Department of Statistics Harvard University Autumn Term Monday, September 19, 2005 - January 2006 Copyright c 2005 by Mark E. Irwin Personnel

More information

Computer lab 2 Course: Introduction to R for Biologists

Computer lab 2 Course: Introduction to R for Biologists Computer lab 2 Course: Introduction to R for Biologists April 23, 2012 1 Scripting As you have seen, you often want to run a sequence of commands several times, perhaps with small changes. An efficient

More information

MBV4410/9410 Fall Bioinformatics for Molecular Biology. Introduction to R

MBV4410/9410 Fall Bioinformatics for Molecular Biology. Introduction to R MBV4410/9410 Fall 2018 Bioinformatics for Molecular Biology Introduction to R Outline Introduce R Basic operations RStudio Bioconductor? Goal of the lecture Introduce you to R Show how to run R, basic

More information

Introduction to R Benedikt Brors Dept. Intelligent Bioinformatics Systems German Cancer Research Center

Introduction to R Benedikt Brors Dept. Intelligent Bioinformatics Systems German Cancer Research Center Introduction to R Benedikt Brors Dept. Intelligent Bioinformatics Systems German Cancer Research Center What is R? R is a statistical computing environment with graphics capabilites It is fully scriptable

More information

Outline for Today. Introduction to An Introduction to Computational Data Analysis for Biology. What is this Course About?

Outline for Today. Introduction to An Introduction to Computational Data Analysis for Biology. What is this Course About? Outline for Today Introduction to An Introduction to Computational Data Analysis for Biology http://jarrettbyrnes.info/biol697 Jarrett Byrnes UMass Boston 1. Why this course? 2. Who are we? 3. How will

More information

(c) What is the result of running the following program? x = 3 f = function (y){y+x} g = function (y){x =10; f(y)} g (7) Solution: The result is 10.

(c) What is the result of running the following program? x = 3 f = function (y){y+x} g = function (y){x =10; f(y)} g (7) Solution: The result is 10. Statistics 506 Exam 2 December 17, 2015 1. (a) Suppose that li is a list containing K arrays, each of which consists of distinct integers that lie between 1 and n. That is, for each k = 1,..., K, li[[k]]

More information

Lecture 1: Getting Started and Data Basics

Lecture 1: Getting Started and Data Basics Lecture 1: Getting Started and Data Basics The first lecture is intended to provide you the basics for running R. Outline: 1. An Introductory R Session 2. R as a Calculator 3. Import, export and manipulate

More information

On R for Statistics. Subhajit Dutta Stat-Math Unit. Indian Statistical Institute, Kolkata September 16, 2011

On R for Statistics. Subhajit Dutta Stat-Math Unit. Indian Statistical Institute, Kolkata September 16, 2011 On R for Statistics Subhajit Dutta Stat-Math Unit Indian Statistical Institute, Kolkata September 16, 2011 Why R? It is FREE!! Basic as well as specialized data analysis technique at your fingertips. Highly

More information

Biology 345: Biometry Fall 2005 SONOMA STATE UNIVERSITY Lab Exercise 2 Working with data in Excel and exporting to JMP Introduction

Biology 345: Biometry Fall 2005 SONOMA STATE UNIVERSITY Lab Exercise 2 Working with data in Excel and exporting to JMP Introduction Biology 345: Biometry Fall 2005 SONOMA STATE UNIVERSITY Lab Exercise 2 Working with data in Excel and exporting to JMP Introduction In this exercise, we will learn how to reorganize and reformat a data

More information

Introducion to R and parallel libraries. Giorgio Pedrazzi, CINECA Matteo Sartori, CINECA School of Data Analytics and Visualisation Milan, 09/06/2015

Introducion to R and parallel libraries. Giorgio Pedrazzi, CINECA Matteo Sartori, CINECA School of Data Analytics and Visualisation Milan, 09/06/2015 Introducion to R and parallel libraries Giorgio Pedrazzi, CINECA Matteo Sartori, CINECA School of Data Analytics and Visualisation Milan, 09/06/2015 Overview What is R R Console Input and Evaluation Data

More information

The History and Use of R. Joseph Kambourakis

The History and Use of R. Joseph Kambourakis The History and Use of R Joseph Kambourakis Ground Rules Interrupt me These are all my opinions and not of EMC or Big Data Analytics, Discovery & Visualization Meetup Slides will be available Joseph

More information

Introduction to R: Using R for Statistics and Data Analysis. BaRC Hot Topics

Introduction to R: Using R for Statistics and Data Analysis. BaRC Hot Topics Introduction to R: Using R for Statistics and Data Analysis BaRC Hot Topics http://barc.wi.mit.edu/hot_topics/ Why use R? Perform inferential statistics (e.g., use a statistical test to calculate a p-value)

More information

Fraud Detection Using Random Forest Algorithm

Fraud Detection Using Random Forest Algorithm Fraud Detection Using Random Forest Algorithm Eesha Goel Computer Science Engineering and Technology, GZSCCET, Bhatinda, India eesha1992@rediffmail.com Abhilasha Computer Science Engineering and Technology,

More information

Regression III: Advanced Methods

Regression III: Advanced Methods Lecture 2: Software Introduction Regression III: Advanced Methods William G. Jacoby Department of Political Science Michigan State University jacoby@msu.edu Getting Started with R What is R? A tiny R session

More information

Introducing R/Tidyverse to Clinical Statistical Programming

Introducing R/Tidyverse to Clinical Statistical Programming Introducing R/Tidyverse to Clinical Statistical Programming MBSW 2018 Freeman Wang, @freestatman 2018-05-15 Slides available at https://bit.ly/2knkalu Where are my biases Biomarker Statistician Genomic

More information

Introduction to Statistics using R/Rstudio

Introduction to Statistics using R/Rstudio Introduction to Statistics using R/Rstudio R and Rstudio Getting Started Assume that R for Windows and Macs already installed on your laptop. (Instructions for installations sent) R on Windows R on MACs

More information

Lastly, in case you don t already know this, and don t have Excel on your computers, you can get it for free through IT s website under software.

Lastly, in case you don t already know this, and don t have Excel on your computers, you can get it for free through IT s website under software. Welcome to Basic Excel, presented by STEM Gateway as part of the Essential Academic Skills Enhancement, or EASE, workshop series. Before we begin, I want to make sure we are clear that this is by no means

More information

The "R" Statistics library: Research Applications

The R Statistics library: Research Applications Edith Cowan University Research Online ECU Research Week Conferences, Symposia and Campus Events 2012 The "R" Statistics library: Research Applications David Allen Edith Cowan University Abhay Singh Edith

More information

Introduction to Functions. Biostatistics

Introduction to Functions. Biostatistics Introduction to Functions Biostatistics 140.776 Functions The development of a functions in R represents the next level of R programming, beyond writing code at the console or in a script. 1. Code 2. Functions

More information

R Short Course Session 1

R Short Course Session 1 R Short Course Session 1 Daniel Zhao, PhD Sixia Chen, PhD Department of Biostatistics and Epidemiology College of Public Health, OUHSC 10/23/2015 Outline Overview of the 5 sessions Pre-requisite requirements

More information

R and parallel libraries. Introduction to R for data analytics Bologna, 26/06/2017

R and parallel libraries. Introduction to R for data analytics Bologna, 26/06/2017 R and parallel libraries Introduction to R for data analytics Bologna, 26/06/2017 Outline Overview What is R R Console Input and Evaluation Data types R Objects and Attributes Vectors and Lists Matrices

More information

Statistics for Biologists: Practicals

Statistics for Biologists: Practicals Statistics for Biologists: Practicals Peter Stoll University of Basel HS 2012 Peter Stoll (University of Basel) Statistics for Biologists: Practicals HS 2012 1 / 22 Outline Getting started Essentials of

More information

LAB #1: DESCRIPTIVE STATISTICS WITH R

LAB #1: DESCRIPTIVE STATISTICS WITH R NAVAL POSTGRADUATE SCHOOL LAB #1: DESCRIPTIVE STATISTICS WITH R Statistics (OA3102) Lab #1: Descriptive Statistics with R Goal: Introduce students to various R commands for descriptive statistics. Lab

More information

R Basics / Course Business

R Basics / Course Business R Basics / Course Business We ll be using a sample dataset in class today: CourseWeb: Course Documents " Sample Data " Week 2 Can download to your computer before class CourseWeb survey on research/stats

More information

R basics workshop Sohee Kang

R basics workshop Sohee Kang R basics workshop Sohee Kang Math and Stats Learning Centre Department of Computer and Mathematical Sciences Objective To teach the basic knowledge necessary to use R independently, thus helping participants

More information

Data Input/Output. Introduction to R for Public Health Researchers

Data Input/Output. Introduction to R for Public Health Researchers Data Input/Output Introduction to R for Public Health Researchers Common new user mistakes we have seen 1. Working directory problems: trying to read files that R "can't find" RStudio can help, and so

More information

Introduction to R. Daniel Berglund. 9 November 2017

Introduction to R. Daniel Berglund. 9 November 2017 Introduction to R Daniel Berglund 9 November 2017 1 / 15 R R is available at the KTH computers If you want to install it yourself it is available at https://cran.r-project.org/ Rstudio an IDE for R is

More information

Prediction Using Regression Analysis

Prediction Using Regression Analysis Prediction Using Regression Analysis Shantanu Sarkar 1, Anuj Vaijapurkar 2, VimalKumar Bhardwaj 3,Swarnalatha P 4 1,2,3 School of Computer Science, VIT University, Vellore 4 Assistant Professor, School

More information

Introduction to Minitab 1

Introduction to Minitab 1 Introduction to Minitab 1 We begin by first starting Minitab. You may choose to either 1. click on the Minitab icon in the corner of your screen 2. go to the lower left and hit Start, then from All Programs,

More information

UNIT 4. Research Methods in Business

UNIT 4. Research Methods in Business UNIT 4 Preparing Data for Analysis:- After data are obtained through questionnaires, interviews, observation or through secondary sources, they need to be edited. The blank responses, if any have to be

More information

2015 Vanderbilt University

2015 Vanderbilt University Excel Supplement 2015 Vanderbilt University Introduction This guide describes how to perform some basic data manipulation tasks in Microsoft Excel. Excel is spreadsheet software that is used to store information

More information

Package quickreg. R topics documented:

Package quickreg. R topics documented: Package quickreg September 28, 2017 Title Build Regression Models Quickly and Display the Results Using 'ggplot2' Version 1.5.0 A set of functions to extract results from regression models and plot the

More information

Generalized Linear Models

Generalized Linear Models Generalized Linear Models Methods@Manchester Summer School Manchester University July 2 6, 2018 Software and Data www.research-training.net/manchester2018 Graeme.Hutcheson@manchester.ac.uk University of

More information

Introduction to Scripting Languages. October 2017

Introduction to Scripting Languages. October 2017 Introduction to Scripting Languages damien.francois@uclouvain.be October 2017 1 Goal of this session: Advocate the use of scripting languages and help you choose the most suitable for your needs 2 Agenda

More information

Intro to R. Fall Fall 2017 CS130 - Intro to R 1

Intro to R. Fall Fall 2017 CS130 - Intro to R 1 Intro to R Fall 2017 Fall 2017 CS130 - Intro to R 1 Intro to R R is a language and environment that allows: Data management Graphs and tables Statistical analyses You will need: some basic statistics We

More information

Intermediate Stata. Jeremy Craig Green. 1 March /29/2011 1

Intermediate Stata. Jeremy Craig Green. 1 March /29/2011 1 Intermediate Stata Jeremy Craig Green 1 March 2011 3/29/2011 1 Advantages of Stata Ubiquitous in economics and political science Gaining popularity in health sciences Large library of add-on modules Version

More information