AN INTRODUCTION TO R FOR MANAGEMENT SCHOLARS
1 AN INTRODUCTION TO R FOR MANAGEMENT SCHOLARS 24 January 2017 Stefan Breet breet@rsm.nl
2 TODAY What is R? / How to use R? The Basics / How to use R? The Data Analysis Process
3 WHAT IS R? AN OVERVIEW
4 WHAT IS R? R is a language and environment for statistical computing and graphics. R is an integrated suite of software facilities for data manipulation, calculation and graphical display. It includes (R-Project.org, 2017): an effective data handling and storage facility; a suite of operators for calculations on arrays, in particular matrices; a large, coherent, integrated collection of intermediate tools for data analysis; graphical facilities for data analysis and display either on-screen or on hardcopy; and a well-developed, simple and effective programming language which includes conditionals, loops, user-defined recursive functions and input and output facilities.
5 THE HISTORY OF R R in its current form is a dialect of the programming language S. S is developed by John Chambers and colleagues at Bell Labs as a statistical analysis environment for internal use. The first version of S is distributed outside of Bell Labs. Two books are published, and AT&T licenses the source code for educational purposes.
6 THE HISTORY OF R R in its current form is a dialect of the programming language S. 1988: the S system is rewritten in C and resembles the system we use today. Features like functions are introduced and are described in the book The New S Language. Ross Ihaka and Robert Gentleman create R at the University of Auckland and closely model it on S. First public announcement of R. Ihaka & Gentleman make R free to use under the GNU General Public License.
7 THE HISTORY OF R R in its current form is a dialect of the programming language S. The R Core Group is formed (including people associated with S-PLUS) and controls R's source code. R version … is released. R version … is released.
8 THE POPULARITY OF R R's popularity is rapidly increasing. IEEE Spectrum's Programming Language Ranking: C, Java, Python, C++, R, C#, PHP, JavaScript, Ruby, Go, …, Matlab, SAS. [Figures; sources: IEEE Spectrum; Muenchen, 2016]
9 COMPANIES THAT USE R R is popular outside of academia as well
10 ADVANTAGES OF USING R Why do scientists, data analysts and companies like R? It is more than a piece of statistical software: it's a programming language! You can create your own objects, functions and packages. It's free and open source. Everybody can access the source code, build extra features (objects, functions, packages), and detect and fix bugs. It's extremely versatile. It is one of the most comprehensive statistical analysis environments available: from the most basic statistical tests to the most complex analyses and data visualisations, R can do it.
11 ADVANTAGES OF USING R Why do scientists, data analysts and companies like R? It makes reproducible research easy. You can easily save, edit and share the code behind your analyses so others can reproduce them, and version control (via Git/GitHub or Subversion) is easy to implement. Training & support. R has a huge community of users and contributors, which makes it easy to find support if you have a question. There are plenty of online courses and resources available if you want to learn more. Data visualisation. R has excellent data visualisation tools: it's easy to create beautiful plots, images and figures, even interactive or dynamic ones.
12 DISADVANTAGES OF USING R R has a couple of disadvantages compared to other statistical software. R has a steep learning curve in the beginning; once you master the basics, however, learning the advanced stuff is easy. It can be relatively slow, and more complex than other programming languages (such as Python). The data you can load into R is limited by the size of your computer's working memory (RAM). Rule of thumb: don't use datasets larger than half the size of your RAM. However, packages have been developed that can handle big datasets. Not every package is quality-controlled.
13 R VS. STATA Both are popular among academics. R is a programming language, while Stata is a software package. R is free and open source, while Stata has to be purchased (and is expensive). Stata offers a reliable base (the company safeguards the quality of its features), while R's range of applications is much larger. R has more users and therefore a larger support community.
14 R VS. PYTHON Both are among the most popular programming languages. R is designed for academics and data analysts, while Python is designed for programmers (who might want to analyse data or apply statistical techniques). R focuses on user-friendly data analysis, statistics and graphical models, while Python focuses on productivity and readability. R is a great place to start: you can access Python from R via the rPython package.
15 PACKAGES You can extend R by installing packages. Packages are collections of R functions, data and compiled code, and they are stored in a library (a folder on your computer). R comes with a standard set of packages; others are available for download, e.g. from CRAN with install.packages("package.name"). Before you can use a package in a session, you have to load (attach) it with library(package.name).
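The install-then-load pattern can be sketched as follows, assuming you want to install a package only when it is missing. The example uses tools, which ships with every R installation, so it runs without an internet connection; pkg is just an illustrative variable name, and in practice you would put a CRAN package name such as "ggplot2" there.

```r
# Name of the package we want to use; "tools" ships with base R,
# so this example never needs a network connection
pkg <- "tools"

# Install from CRAN only if the package is not already in the library
if (!requireNamespace(pkg, quietly = TRUE)) {
  install.packages(pkg)
}

# Attach the package so its functions can be called by name;
# character.only = TRUE tells library() that pkg holds the name as a string
library(pkg, character.only = TRUE)

# Show the search path: attached packages appear as "package:<name>"
print(search())
```

In an interactive session you would usually run install.packages("ggplot2") once, and put library(ggplot2) at the top of every script that uses it.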
16 EXAMPLES The people behind RStudio have published several well-developed packages: R Markdown: create dynamic reports. Shiny: create interactive web applications. ggplot2: create beautiful plots.
17 MORE EXAMPLES There is a package for almost every type of statistical analysis:
Analysis | Package
Regression analysis (LM & GLM) | stats
Meta-analysis | metafor, metaSEM
Structural equation modeling | lavaan, OpenMx, sem
Survival analysis | survival
Psychological, psychometric & personality research | psych
Text mining / analysis | tm
Mediation analysis | mediation
Moderation analysis | pequod
Network analysis | igraph, sna
18 HOW TO USE R THE BASICS
19 ALWAYS START WITH RSTUDIO RStudio makes R easier to use. [Screenshots: the R console; the RStudio interface]
20 RSTUDIO It includes a code editor, debugging tools and visualisation tools. [Screenshot: the RStudio interface, with the Code Editor, Console, Workspace and Plots panes labelled]
Code Editor = edit and save your scripts
Console = quickly try out code + view the output of your code
Workspace = overview of all the R objects loaded into the working memory of your computer
Plots = preview of your plots + overview of the files in your working directory + list of the packages on your computer + help documentation
21 TIDY DATA Adhere to the principles of tidy data to ensure high-quality code and analyses. A dataset is tidy if it satisfies the following conditions:
1. Each variable forms a column.
2. Each observation forms a row.
3. Each type of observational unit forms a table.
Tidy data makes it easy to carry out data analysis. More information: Wickham, H. Tidy Data. Journal of Statistical Software, 59(10).
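As an illustration of the first two rules, the sketch below takes a hypothetical "wide" table (one column per year, so the year variable is hidden in the column names) and reshapes it into tidy form with base R's reshape(). The firms and values are made up for the example.

```r
# Hypothetical "wide" data: one row per firm, one column per year.
# The year is hidden in the column names, so this is NOT tidy.
wide <- data.frame(
  firm      = c("A", "B"),
  perf_2015 = c(0.10, 0.25),
  perf_2016 = c(0.12, 0.20)
)

# Reshape to long (tidy) form: one row per firm-year observation,
# with year and performance stored as separate variables (columns)
tidy <- reshape(
  wide,
  direction = "long",
  varying   = c("perf_2015", "perf_2016"),
  v.names   = "performance",
  timevar   = "year",
  times     = c(2015, 2016),
  idvar     = "firm"
)
rownames(tidy) <- NULL  # drop the "A.2015"-style row names reshape() adds

print(tidy)
```

Each row is now one observation (a firm in a year) and each column one variable (firm, year, performance).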
22 TIDY DATA (CONTINUED) Make sure that others can understand your code. Always make sure that the names of variables, values and functions are descriptive and human-readable. Be consistent: use lower case, and do not separate words by a space but use a dot (.) or underscore (_) instead. For example:
entrepreneurial_behavior instead of EntBehav or EB
log_tenure instead of Tenure(Log)
firm_performance instead of Firm Performance
male / female instead of 0 / 1
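Applying these naming rules to a hypothetical survey data frame might look like this. The data values are invented, and the coding 0 = male, 1 = female is an assumption made only for the example.

```r
# Hypothetical survey data with the kind of names the slide warns against
survey <- data.frame(
  EB                 = c(3.2, 4.1, 2.8),
  `Firm Performance` = c(0.12, 0.30, 0.05),
  Gender             = c(0, 1, 1),
  check.names = FALSE  # keep the space in "Firm Performance" as-is
)

# Replace them with descriptive, lower-case, underscore-separated names
names(survey) <- c("entrepreneurial_behavior", "firm_performance", "gender")

# Store the 0/1 code as readable labels (assumed coding: 0 = male, 1 = female)
survey$gender <- factor(survey$gender, levels = c(0, 1),
                        labels = c("male", "female"))

str(survey)
```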
23 GETTING STARTED Basic calculations & objects. R is basically a calculator on steroids. Run the following commands in the RStudio console:
4 + 4
6 * 6
8 / …
… ^ 2
You can store a value as an object. You can create new objects with the assignment operator <-:
x <- 4
y <- 6
All R statements where you create objects have the same form: object_name <- value
24 OBJECTS & STRINGS Since objects contain the values you assigned to them, you can use objects in your calculations:
x + y
x * y
x - y
x / y
You can store text as well by using double quotes:
my_name <- "stefan"
We call such a piece of text a character string. What happens if we compute this?
x + my_name
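The commands from the last two slides can be run together as one script. The sketch below reuses the slide's x, y and my_name, and wraps the last line in tryCatch() so the script keeps running and shows R's error message instead of stopping.

```r
# Arithmetic: R evaluates the expression and prints the result
4 + 4   # 8
6 * 6   # 36

# Assignment: store values as objects with <-
x <- 4
y <- 6

# Objects can be used in calculations just like the numbers they hold
x + y   # 10
x * y   # 24
x - y   # -2
x / y   # 0.6666667

# Text (a character string) goes between double quotes
my_name <- "stefan"

# Adding a number to a string fails; tryCatch() captures the error
# message instead of stopping the script
result <- tryCatch(x + my_name, error = function(e) conditionMessage(e))
print(result)
```

So the answer to the slide's question is an error: R refuses to do arithmetic on a character string.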
25 DATA TYPES Understanding the different data types in R is really important! x and my_name are vectors of a different class. You can check the class of an R object with the class() function:
class(x) (numeric)
class(my_name) (character)
There are six types of atomic vectors, the most basic/simple object in R: logical, integer, double, character, complex and raw. Integer and double vectors together form the numeric class. The most important types are numeric, character and logical.
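To see the difference between an object's class and its underlying storage type, the sketch below uses class() alongside typeof() (a base R function not mentioned on the slide) and creates one example of each of the six atomic types.

```r
x <- 4
my_name <- "stefan"

# class() reports the high-level class; typeof() reports the storage type
class(x)        # "numeric"
typeof(x)       # "double"
class(my_name)  # "character"

# One example of each of the six atomic types
typeof(TRUE)          # "logical"
typeof(4L)            # "integer" (the L suffix makes an integer)
typeof(4)             # "double"
typeof("stefan")      # "character"
typeof(2+3i)          # "complex"
typeof(as.raw(255))   # "raw"
```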
26 COMBINING VALUES Before we continue exploring the different data types, it's useful to learn how to combine values. You can combine/concatenate values by using the c() function:
numbers <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
To see the result, type the name of the object and hit Enter, or use the print() function:
print(numbers)
Let's combine other data types:
Character: my_full_name <- c("stefan", "breet")
Logical: true_or_false <- c(TRUE, FALSE, TRUE, FALSE)
class(true_or_false)
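A runnable version of the slide's examples follows. The last two lines are an addition: they show that c() coerces mixed input to one common type, since an atomic vector can hold only one type.

```r
# Combine ten numbers into one numeric vector
numbers <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
print(numbers)

# Character vector: each element needs its own quotes
my_full_name <- c("stefan", "breet")

# Logical vector: TRUE and FALSE must be written in capitals
true_or_false <- c(TRUE, FALSE, TRUE, FALSE)
class(true_or_false)  # "logical"

# c() coerces mixed input to a single common type
mixed <- c(1, "two", TRUE)
class(mixed)  # "character"
```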
27 FUNCTIONS You use functions to do most of your computations and analyses. Every function consists of two parts: a name and a set of arguments (including an object), e.g. mean(object, ...). Let's try out a couple of functions:
length(numbers)
mean(numbers)
median(numbers)
max(numbers)
min(numbers)
summary(numbers)
To see a description of the arguments, use the help function by putting a question mark (?) before the function name: ?mean
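Here are the slide's functions applied to the numbers vector, with the values they return. The na.rm argument of mean() is added to show how a named argument changes a function's behaviour.

```r
numbers <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

length(numbers)   # 10
mean(numbers)     # 5.5
median(numbers)   # 5.5
max(numbers)      # 10
min(numbers)      # 1
summary(numbers)  # min, quartiles, median, mean and max in one call

# Arguments can be named; na.rm = TRUE drops missing values first
mean(c(numbers, NA))                # NA
mean(c(numbers, NA), na.rm = TRUE)  # 5.5

# Open the help page describing mean()'s arguments
# ?mean
```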
28 LOGICAL OPERATORS Also known as conditional statements. You can compare R objects with logical operators. The result is a logical vector (i.e., TRUE / FALSE / NA).
Operator | Description
< | less than
<= | less than or equal to
> | greater than
>= | greater than or equal to
== | exactly equal to
!= | not equal to
x | y | x OR y
x & y | x AND y
Examples:
y < x
y > x
y == x
y != x
y > numbers
Create a new object z: z <- 4
Check if z is equal to x or y:
z == x
z == y
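With the slide's objects (x = 4, y = 6, numbers = 1 to 10, z = 4), the comparisons evaluate as shown in the comments. The last two lines combine conditions with | and & from the operator table.

```r
x <- 4
y <- 6
numbers <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

y < x    # FALSE
y > x    # TRUE
y == x   # FALSE (note the double equals sign)
y != x   # TRUE

# Comparing one value with a vector checks every element
y > numbers  # TRUE for 1 to 5, FALSE for 6 to 10

# Combine conditions with | (OR) and & (AND)
z <- 4
z == x | z == y  # TRUE: z equals x
z == x & z == y  # FALSE: z does not equal y
```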
29 LISTS A list can contain different types of data. Atomic vectors can only contain data of the same type (numeric, logical or character). Lists and data frames can contain different data types. Let's make a list of the objects we created with the list() function and provide them with a name:
my_first_list <- list(my_name = my_full_name, numbers = numbers, logical = true_or_false)
You can access elements of the list with the $ sign by calling their name:
my_first_list$numbers
Or by using brackets [ ]. Access the first element as follows:
my_first_list[1]
And the second element:
my_first_list[2]
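The sketch below rebuilds the slide's list and adds one detail the slide does not cover: single brackets return a smaller list, while double brackets (or $) return the element itself.

```r
my_full_name  <- c("stefan", "breet")
numbers       <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
true_or_false <- c(TRUE, FALSE, TRUE, FALSE)

# A named list can hold vectors of different types and lengths
my_first_list <- list(my_name = my_full_name,
                      numbers = numbers,
                      logical = true_or_false)

my_first_list$numbers   # the numeric vector itself
my_first_list[1]        # single brackets: a list containing the first element
my_first_list[[1]]      # double brackets: the first element itself

class(my_first_list[1])    # "list"
class(my_first_list[[1]])  # "character"

lengths(my_first_list)  # length of each element: 2, 10, 4
```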
30 LIST VS DATAFRAME Let's take a closer look at our list:
Element name | Values | Length
my_name | "stefan" "breet" | 2
numbers | 1 2 3 4 5 6 7 8 9 10 | 10
logical | TRUE FALSE TRUE FALSE | 4
Note: every list element has a different length!
31 LIST VS DATAFRAME A data frame can store different data types as well. If every vector has the same length, we call the object a data frame (consisting of rows and columns) and we call the vectors variables.
my_name | numbers | logical
"stefan" | 1 | TRUE
"breet" | 2 | FALSE
NA | 3 | TRUE
NA | 4 | FALSE
NA | 5 | NA
NA | 6 | NA
NA | 7 | NA
NA | 8 | NA
NA = Not Available = missing data
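One way to build the data frame shown above is to pad the shorter vectors with NA until all three have the same length. The sketch below does that by assigning to length(), which extends a vector with missing values of the right type.

```r
my_full_name  <- c("stefan", "breet")
numbers       <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
true_or_false <- c(TRUE, FALSE, TRUE, FALSE)

# data.frame() needs columns of equal length, so pad the shorter
# vectors with NA (missing) by increasing their length to 10
n <- length(numbers)
length(my_full_name)  <- n
length(true_or_false) <- n

my_df <- data.frame(my_name = my_full_name,
                    numbers = numbers,
                    logical = true_or_false)

print(my_df)
colSums(is.na(my_df))  # missing values per variable: 8, 0, 6
```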
32 SUMMARY What do you need to remember?
Data structures: atomic vectors (character, numeric, logical), matrices, lists, data frames
Operators: assignment operator x <- value; arithmetic operators + - / * ^; logical operators < <= > >= == !=
Functions: function_name(argument1, argument2, argument3, ...)
33 HOW TO USE R THE DATA ANALYSIS PROCESS
34 ABALONE DATA Data Analysis in R: An Example. Abalones are marine snails whose meat is used as food, with prices in China reaching up to $85 per kilogram. The dataset contains the physical measurements of 4176 abalones. The research question: can we predict the age of abalones based on their physical measurements?
35 THE DATASET Predicting the age of abalone from physical measurements
Variable name | Data type | Measurement | Description
Sex | nominal | | M, F, and I (infant)
Length | continuous | mm | longest shell measurement
Diameter | continuous | mm | perpendicular to length
Height | continuous | mm | with meat in shell
Whole Weight | continuous | grams | whole abalone
Shucked Weight | continuous | grams | weight of meat
Viscera Weight | continuous | grams | gut weight (after bleeding)
Shell Weight | continuous | grams | after being dried
Rings | integer | | +1.5 gives the age in years
Source: UCI Machine Learning Repository
36 ALWAYS CREATE AN RSTUDIO PROJECT Creating a project makes file management easier.
Managing files: the RStudio project is the place where you store raw and tidy data, the scripts you use to process and analyse the data, and output such as plots, tables or dynamic reports. Create a new project with the name R Workshop in a new directory.
Loading and saving files: a project automatically sets the working directory, so you can use relative paths to access and save files.
Absolute path = /Users/Stefan/Documents/data.csv or C:/Documents/data.csv
Relative path = ./data.csv (the dot represents the working directory)
Note: you should never use absolute paths in your scripts!
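Here is a small demonstration of relative paths, assuming a hypothetical project folder created in R's temporary directory. Opening an RStudio project sets the working directory automatically; outside RStudio, the sketch sets it once with setwd() and restores it afterwards. The scripts themselves then only use "./" paths.

```r
# Hypothetical project folder inside the session's temporary directory
project_dir <- file.path(tempdir(), "R Workshop")
dir.create(project_dir, showWarnings = FALSE)

# Opening an RStudio project sets the working directory for you;
# outside RStudio we set it here and remember the old one
old_wd <- setwd(project_dir)

# Relative paths: "./" means "the current working directory"
write.csv(data.frame(id = 1:3), "./data.csv", row.names = FALSE)
dat <- read.csv("./data.csv")
file.exists("./data.csv")  # TRUE

# Restore the previous working directory
setwd(old_wd)
```

Because the scripts only refer to "./data.csv", the whole project folder can be moved or shared without editing any paths.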
37 ALWAYS CREATE A SCRIPT Store your code in a script so you can rerun it later. Use the console to experiment with code, but put it in a script as soon as you have written code that works and does what you want. Open a new R script and save it with the name R Workshop. RStudio's script editor will also highlight syntax errors.
38 HOW TO IMPORT DATA INTO R You can import various file formats by using the appropriate package:
Rectangular data (.csv / .tsv / .fwf): base R, key functions read.table(), read.csv(), read.csv2(); readr, key functions read_csv(), read_tsv(), read_table()
Excel (.xls / .xlsx): readxl, key function read_excel()
SPSS (.sav), SAS (.sas7bdat), Stata (.dta): haven, key functions read_sav(), read_sas(), read_dta()
39 STEP 1: DATA PROCESSING Let's import the dataset into R. We're going to use the readr package to import the dataset:
# Load the readr package
library(readr)
Copy-paste the abalone.csv dataset into the working directory.
# Load the dataset and assign it to an R object by giving it a name
abalone <- read_csv("./abalone.csv")
Note: every dataset imported, created or saved with the readr or haven packages is stored as a special kind of data frame: the tibble. Tibbles have two advantages compared to standard data frames:
1. Tibbles print only the first 10 rows and all the columns that fit on your screen (instead of all the rows and all the columns), so it's easier to work with large data.
2. Each column reports its type (e.g., character, double, integer).
If necessary, you can convert a tibble to a data frame with the as.data.frame() function.
40 INSPECTING A DATA FRAME There are several ways in which you can inspect a data frame. Let's see what the data looks like:
print(abalone)
# View the column names
colnames(abalone)
Notice that the column names are missing: the file has no header row, but read_csv() automatically reads the first row as column names. Let's change that:
# Import the dataset without column names
abalone <- read_csv("./abalone.csv", col_names = FALSE)
# Rename the columns by assigning variable names to the colnames attribute
colnames(abalone) <- c("sex", "length", "diameter", "height", "whole_weight", "shucked_weight", "viscera_weight", "shell_weight", "rings")
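The import-and-rename steps can be tried without the real abalone.csv. The sketch below writes two hypothetical rows in the same nine-column layout (placeholder values, not the actual dataset) to a temporary file. It then reads them with col_names = FALSE and assigns the tidy names.

```r
library(readr)

# Hypothetical stand-in for abalone.csv: two rows in the same layout,
# with NO header row (placeholder values, not the real dataset)
csv_path <- tempfile(fileext = ".csv")
writeLines(c("M,0.455,0.365,0.095,0.514,0.2245,0.101,0.15,15",
             "F,0.530,0.420,0.135,0.677,0.2565,0.1415,0.21,9"),
           csv_path)

# col_names = FALSE stops read_csv() from using the first data row as names
abalone <- read_csv(csv_path, col_names = FALSE)

# Give every column a descriptive, tidy name
colnames(abalone) <- c("sex", "length", "diameter", "height",
                       "whole_weight", "shucked_weight", "viscera_weight",
                       "shell_weight", "rings")
print(abalone)
```

Without col_names = FALSE, the first data row would be used as the header and the dataset would silently lose one observation.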
41 PROCESSING DATA The first variable (sex) is stored as a character vector. For further analysis, however, we need to change it to a nominal variable (a factor) with three levels (male, female, infant) with the as.factor() function:
# Change the class of the sex variable from character to factor
abalone$sex <- as.factor(abalone$sex)
# Check the class of the variable
class(abalone$sex)
Let's check the levels of this variable with levels():
levels(abalone$sex)
The levels are not in accordance with the tidy data principles, so let's rename them:
# Rename the levels
levels(abalone$sex) <- c("female", "infant", "male")
# Recheck the levels
levels(abalone$sex)
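Why does the slide list the new labels as female, infant, male? The sketch below uses a hypothetical vector of sex codes to show that as.factor() sorts the levels alphabetically (F, I, M). Assigning to levels() then relabels them strictly by position.

```r
# Hypothetical sex codes in the abalone format
sex <- c("M", "F", "I", "M", "I")

# as.factor() sorts the levels alphabetically: "F", "I", "M"
sex <- as.factor(sex)
levels(sex)

# Relabel in that same order; the i-th new label replaces the i-th level
levels(sex) <- c("female", "infant", "male")
levels(sex)
table(sex)
```

Supplying the labels in any other order would silently mislabel the observations, so always check levels() before renaming.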
42 DATA MANIPULATION WITH DPLYR dplyr is an excellent package for data manipulation. The dplyr package provides a function for each basic verb of data manipulation, such as filtering, selecting columns, arranging and mutating. Every dplyr function takes the dataset as its first argument.
# Load the dplyr package
library(dplyr)
# Create a new variable called age with mutate()
abalone <- mutate(abalone, age = rings + 1.5)
# Select a subset of columns with select()
abalone <- select(abalone, age, sex, diameter, height, whole_weight)
# Sort the dataset by age with arrange()
abalone <- arrange(abalone, age)
# Filter the dataset by age with filter()
filter(abalone, age > 25)
# Pick the first ten rows with slice()
slice(abalone, 1:10)
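The dplyr steps can be run end to end on a hypothetical four-row version of the abalone data (placeholder values). The order matters: age has to be created with mutate() before select(), arrange() or filter() can refer to it.

```r
library(dplyr)

# Hypothetical mini version of the abalone data (placeholder values)
abalone <- data.frame(
  sex          = c("male", "female", "infant", "male"),
  diameter     = c(0.365, 0.420, 0.255, 0.440),
  height       = c(0.095, 0.135, 0.080, 0.150),
  whole_weight = c(0.514, 0.677, 0.205, 0.894),
  rings        = c(15, 9, 7, 26)
)

# Create age first, because the other steps refer to it
abalone <- mutate(abalone, age = rings + 1.5)
abalone <- select(abalone, age, sex, diameter, height, whole_weight)
abalone <- arrange(abalone, age)

filter(abalone, age > 25)  # rows with age above 25
slice(abalone, 1:2)        # the first two rows
```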
43 DATA VISUALISATION Use ggplot2 to visualise your data. R has built-in data visualisation tools (the plot() function), but the ggplot2 package provides better-quality plots and more options.
# Load the ggplot2 package
library(ggplot2)
You can create plots in two different ways:
1. Use the quick plot function qplot(), recommended for quick data inspection.
2. Build plots layer by layer with the ggplot() function, recommended for creating high-quality plots.
The easiest way to create plots with ggplot2 is the quick plot function qplot():
# Plot the dependent variable age
qplot(x = age, data = abalone)
# Plot age vs diameter
qplot(x = diameter, y = age, data = abalone)
# Colour the points by sex
qplot(x = diameter, y = age, colour = sex, data = abalone)
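The second, layered approach can be sketched as follows. It rebuilds the coloured scatter plot and the age histogram with ggplot() on a hypothetical five-row data set. Note that qplot() has since been deprecated in newer ggplot2 releases (3.4.0 onwards), so the layered form is the safer habit.

```r
library(ggplot2)

# Hypothetical mini abalone data (placeholder values)
abalone <- data.frame(
  sex      = c("male", "female", "infant", "male", "female"),
  diameter = c(0.365, 0.420, 0.255, 0.440, 0.380),
  age      = c(16.5, 10.5, 8.5, 27.5, 12.5)
)

# Histogram of the dependent variable (the layered version of qplot(x = age))
h <- ggplot(abalone, aes(x = age)) +
  geom_histogram(bins = 5)

# Layer 1: data + aesthetic mappings; layer 2: points; layer 3: labels
p <- ggplot(abalone, aes(x = diameter, y = age, colour = sex)) +
  geom_point() +
  labs(x = "Diameter", y = "Age (years)", colour = "Sex")

print(p)  # draws the scatter plot
```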
44 THE PSYCH PACKAGE A great package for management scholars is the psych package. It offers great functions for analysing data from experiments and questionnaires, and comes with a couple of handy functions such as describe().
# Load the psych package
library(psych)
# Use the describe() and describeBy() functions to inspect the dependent variable age
describe(abalone$age)
describeBy(abalone$age, group = abalone$sex)
The variable age is skewed and has a positive kurtosis. Let's log-transform this variable.
# Use the mutate() function (dplyr) and the log() function to create a new variable
abalone <- mutate(abalone, log_age = log(age))
# Inspect the dependent variable again
describe(abalone$log_age)
# Visualise the variable
qplot(x = log_age, data = abalone)
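Why does a log transformation help? The sketch below computes a simple moment-based skewness for a hypothetical right-skewed, positive variable before and after taking logs. The skewness() helper is written here for illustration only; psych::describe() reports a closely related statistic.

```r
# Hypothetical right-skewed, strictly positive variable (e.g., ages)
age <- c(2, 3, 3, 4, 4, 4, 5, 5, 6, 8, 12, 20)

# Hypothetical helper: moment-based sample skewness
# (third central moment divided by the second to the power 1.5)
skewness <- function(v) {
  d <- v - mean(v)
  mean(d^3) / mean(d^2)^1.5
}

log_age <- log(age)

skewness(age)      # clearly positive: long right tail
skewness(log_age)  # smaller: the log pulls in the right tail
```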
45 REGRESSION ANALYSIS It's really easy to conduct a regression analysis in R. You can use the lm() (linear model) and glm() (generalised linear model) functions to conduct a regression analysis. You specify the regression formula as part of the function:
lm(formula = age ~ sex + diameter + whole_weight, data = abalone)
# Store the resulting model
model1 <- lm(formula = age ~ sex + diameter + whole_weight, data = abalone)
# View a summary of the model
summary(model1)
# Visually inspect the model
plot(model1)
You can conduct moderation analysis by adapting the formula:
model2 <- lm(formula = age ~ sex + diameter + whole_weight + diameter*whole_weight, data = abalone)
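Here is a self-contained version of the two models, fitted to simulated data that only mimic the abalone variable names (these are NOT the real measurements). It also compares the nested models with anova(), a base R step not on the slide.

```r
# Simulated stand-in for the abalone data (NOT the real measurements)
set.seed(42)
n <- 200
abalone <- data.frame(
  sex          = factor(sample(c("female", "infant", "male"), n, replace = TRUE)),
  diameter     = runif(n, 0.1, 0.6),
  whole_weight = runif(n, 0.1, 2.0)
)
abalone$age <- 5 + 10 * abalone$diameter + 3 * abalone$whole_weight +
  4 * abalone$diameter * abalone$whole_weight + rnorm(n)

# Main-effects model
model1 <- lm(age ~ sex + diameter + whole_weight, data = abalone)

# Moderation: diameter * whole_weight expands to both main effects
# plus their product term diameter:whole_weight
model2 <- lm(age ~ sex + diameter * whole_weight, data = abalone)
summary(model2)

# Does adding the interaction improve the fit? (F-test on nested models)
anova(model1, model2)
```

The slide's formula, sex + diameter + whole_weight + diameter*whole_weight, gives the same model, because * already includes the main effects.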
46 MODERATION ANALYSIS WITH THE PEQUOD PACKAGE The pequod package makes it easy to conduct moderation analysis. It provides functions for moderated regression with residual centering (lmres()), diagnostics (collinearity), simple slopes analysis and interaction plots.
# Load the pequod package
library(pequod)
# Run model 1
model1 <- lmres(formula = age ~ sex + diameter + whole_weight, data = abalone)
summary(model1)
# Run model 2
model2 <- lmres(formula = age ~ sex + diameter + whole_weight + diameter*whole_weight, data = abalone)
summary(model2)
# Conduct a simple slopes test by indicating the predictor variable and the moderator
ss <- simpleSlope(object = model2, pred = "diameter", mod1 = "whole_weight")
print(ss)
# Create an interaction plot
PlotSlope(ss)
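To see what a simple slopes analysis computes, the sketch below works it out by hand with base R's lm() on simulated data. This is not pequod's implementation (pequod also offers residual centering and significance tests). In the model age = b0 + b1 diameter + b2 w + b3 diameter·w, the slope of diameter at moderator value w is b1 + b3·w. The helper simple_slope() is hypothetical.

```r
# Simulated data (NOT the real abalone measurements)
set.seed(1)
n <- 200
d <- data.frame(diameter     = runif(n, 0.1, 0.6),
                whole_weight = runif(n, 0.1, 2.0))
d$age <- 5 + 10 * d$diameter + 3 * d$whole_weight +
  4 * d$diameter * d$whole_weight + rnorm(n)

model2 <- lm(age ~ diameter * whole_weight, data = d)
b <- coef(model2)

# Hypothetical helper: simple slope of diameter at moderator value w
simple_slope <- function(w) {
  unname(b["diameter"] + b["diameter:whole_weight"] * w)
}

# Evaluate at the moderator's mean and at +/- 1 standard deviation
w_levels <- mean(d$whole_weight) + c(-1, 0, 1) * sd(d$whole_weight)
slopes <- sapply(w_levels, simple_slope)
names(slopes) <- c("-1 SD", "mean", "+1 SD")
print(slopes)

# Check: centring the moderator at its mean turns the diameter coefficient
# into the simple slope at the mean
d$ww_c <- d$whole_weight - mean(d$whole_weight)
model2_c <- lm(age ~ diameter * ww_c, data = d)
all.equal(unname(coef(model2_c)["diameter"]), slopes[["mean"]])
```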
47 HOW TO USE R RESOURCES
48 A COUPLE OF USEFUL RESOURCES Books: R for Data Science (Grolemund & Wickham, 2016), freely available online; Advanced R (Wickham, 2014), freely available online. Online courses: Coursera, edX (e.g., Microsoft). Websites: Quick-R, Stack Overflow (for asking questions).
STENO Introductory R-Workshop: Loading a Data Set Tommi Suvitaival, tsvv@steno.dk, Steno Diabetes Center June 11, 2015 Contents 1 Introduction 1 2 Recap: Variables 2 3 Data Containers 2 3.1 Vectors................................................
More informationPython for Data Analysis. Prof.Sushila Aghav-Palwe Assistant Professor MIT
Python for Data Analysis Prof.Sushila Aghav-Palwe Assistant Professor MIT Four steps to apply data analytics: 1. Define your Objective What are you trying to achieve? What could the result look like? 2.
More informationSolving the Unsolvable Through Scientific Computing: Explorations in the Best Uses of Popular Mathematics Software
Solving the Unsolvable Through Scientific Computing: Explorations in the Best Uses of Popular Mathematics Software Talitha Washington, Howard University Edray Goins, Purdue University Luis Melara, Shippensburg
More informationR: A Gentle Introduction. Vega Bharadwaj George Mason University Data Services
R: A Gentle Introduction Vega Bharadwaj George Mason University Data Services Part I: Why R? What do YOU know about R and why do you want to learn it? Reasons to use R Free and open-source User-created
More informationIntroduction to R: Part I
Introduction to R: Part I Jeffrey C. Miecznikowski March 26, 2015 R impact R is the 13th most popular language by IEEE Spectrum (2014) Google uses R for ROI calculations Ford uses R to improve vehicle
More informationAn introduction to ggplot: An implementation of the grammar of graphics in R
An introduction to ggplot: An implementation of the grammar of graphics in R Hadley Wickham 00-0-7 1 Introduction Currently, R has two major systems for plotting data, base graphics and lattice graphics
More informationIntroduction (SPSS) Opening SPSS Start All Programs SPSS Inc SPSS 21. SPSS Menus
Introduction (SPSS) SPSS is the acronym of Statistical Package for the Social Sciences. SPSS is one of the most popular statistical packages which can perform highly complex data manipulation and analysis
More informationSpecialist ICT Learning
Specialist ICT Learning APPLIED DATA SCIENCE AND BIG DATA ANALYTICS GTBD7 Course Description This intensive training course provides theoretical and technical aspects of Data Science and Business Analytics.
More informationMails : ; Document version: 14/09/12
Mails : leslie.regad@univ-paris-diderot.fr ; gaelle.lelandais@univ-paris-diderot.fr Document version: 14/09/12 A freely available language and environment Statistical computing Graphics Supplementary
More informationSTAT 213: R/RStudio Intro
STAT 213: R/RStudio Intro Colin Reimer Dawson Last Revised February 10, 2016 1 Starting R/RStudio Skip to the section below that is relevant to your choice of implementation. Installing R and RStudio Locally
More informationLab 1. Introduction to R & SAS. R is free, open-source software. Get it here:
Lab 1. Introduction to R & SAS R is free, open-source software. Get it here: http://tinyurl.com/yfet8mj for your own computer. 1.1. Using R like a calculator Open R and type these commands into the R Console
More informationHistory, installation and connection
History, installation and connection The men behind our software Jim Goodnight, CEO SAS Inc Ross Ihaka Robert Gentleman (Duncan Temple Lang) originators of R 2 / 75 History SAS From late 1960s, North Carolina
More informationData analysis using Microsoft Excel
Introduction to Statistics Statistics may be defined as the science of collection, organization presentation analysis and interpretation of numerical data from the logical analysis. 1.Collection of Data
More informationGetting Started with R
Getting Started with R STAT 133 Gaston Sanchez Department of Statistics, UC Berkeley gastonsanchez.com github.com/gastonstat/stat133 Course web: gastonsanchez.com/stat133 Tool Some of you may have used
More informationStatistics Statistical Computing Software
Statistics 135 - Statistical Computing Software Mark E. Irwin Department of Statistics Harvard University Autumn Term Monday, September 19, 2005 - January 2006 Copyright c 2005 by Mark E. Irwin Personnel
More informationComputer lab 2 Course: Introduction to R for Biologists
Computer lab 2 Course: Introduction to R for Biologists April 23, 2012 1 Scripting As you have seen, you often want to run a sequence of commands several times, perhaps with small changes. An efficient
More informationMBV4410/9410 Fall Bioinformatics for Molecular Biology. Introduction to R
MBV4410/9410 Fall 2018 Bioinformatics for Molecular Biology Introduction to R Outline Introduce R Basic operations RStudio Bioconductor? Goal of the lecture Introduce you to R Show how to run R, basic
More informationIntroduction to R Benedikt Brors Dept. Intelligent Bioinformatics Systems German Cancer Research Center
Introduction to R Benedikt Brors Dept. Intelligent Bioinformatics Systems German Cancer Research Center What is R? R is a statistical computing environment with graphics capabilites It is fully scriptable
More informationOutline for Today. Introduction to An Introduction to Computational Data Analysis for Biology. What is this Course About?
Outline for Today Introduction to An Introduction to Computational Data Analysis for Biology http://jarrettbyrnes.info/biol697 Jarrett Byrnes UMass Boston 1. Why this course? 2. Who are we? 3. How will
More information(c) What is the result of running the following program? x = 3 f = function (y){y+x} g = function (y){x =10; f(y)} g (7) Solution: The result is 10.
Statistics 506 Exam 2 December 17, 2015 1. (a) Suppose that li is a list containing K arrays, each of which consists of distinct integers that lie between 1 and n. That is, for each k = 1,..., K, li[[k]]
More informationLecture 1: Getting Started and Data Basics
Lecture 1: Getting Started and Data Basics The first lecture is intended to provide you the basics for running R. Outline: 1. An Introductory R Session 2. R as a Calculator 3. Import, export and manipulate
More informationOn R for Statistics. Subhajit Dutta Stat-Math Unit. Indian Statistical Institute, Kolkata September 16, 2011
On R for Statistics Subhajit Dutta Stat-Math Unit Indian Statistical Institute, Kolkata September 16, 2011 Why R? It is FREE!! Basic as well as specialized data analysis technique at your fingertips. Highly
More informationBiology 345: Biometry Fall 2005 SONOMA STATE UNIVERSITY Lab Exercise 2 Working with data in Excel and exporting to JMP Introduction
Biology 345: Biometry Fall 2005 SONOMA STATE UNIVERSITY Lab Exercise 2 Working with data in Excel and exporting to JMP Introduction In this exercise, we will learn how to reorganize and reformat a data
More informationIntroducion to R and parallel libraries. Giorgio Pedrazzi, CINECA Matteo Sartori, CINECA School of Data Analytics and Visualisation Milan, 09/06/2015
Introducion to R and parallel libraries Giorgio Pedrazzi, CINECA Matteo Sartori, CINECA School of Data Analytics and Visualisation Milan, 09/06/2015 Overview What is R R Console Input and Evaluation Data
More informationThe History and Use of R. Joseph Kambourakis
The History and Use of R Joseph Kambourakis Ground Rules Interrupt me These are all my opinions and not of EMC or Big Data Analytics, Discovery & Visualization Meetup Slides will be available Joseph
More informationIntroduction to R: Using R for Statistics and Data Analysis. BaRC Hot Topics
Introduction to R: Using R for Statistics and Data Analysis BaRC Hot Topics http://barc.wi.mit.edu/hot_topics/ Why use R? Perform inferential statistics (e.g., use a statistical test to calculate a p-value)
More informationFraud Detection Using Random Forest Algorithm
Fraud Detection Using Random Forest Algorithm Eesha Goel Computer Science Engineering and Technology, GZSCCET, Bhatinda, India eesha1992@rediffmail.com Abhilasha Computer Science Engineering and Technology,
More informationRegression III: Advanced Methods
Lecture 2: Software Introduction Regression III: Advanced Methods William G. Jacoby Department of Political Science Michigan State University jacoby@msu.edu Getting Started with R What is R? A tiny R session
More informationIntroducing R/Tidyverse to Clinical Statistical Programming
Introducing R/Tidyverse to Clinical Statistical Programming MBSW 2018 Freeman Wang, @freestatman 2018-05-15 Slides available at https://bit.ly/2knkalu Where are my biases Biomarker Statistician Genomic
More informationIntroduction to Statistics using R/Rstudio
Introduction to Statistics using R/Rstudio R and Rstudio Getting Started Assume that R for Windows and Macs already installed on your laptop. (Instructions for installations sent) R on Windows R on MACs
More informationLastly, in case you don t already know this, and don t have Excel on your computers, you can get it for free through IT s website under software.
Welcome to Basic Excel, presented by STEM Gateway as part of the Essential Academic Skills Enhancement, or EASE, workshop series. Before we begin, I want to make sure we are clear that this is by no means
More informationThe "R" Statistics library: Research Applications
Edith Cowan University Research Online ECU Research Week Conferences, Symposia and Campus Events 2012 The "R" Statistics library: Research Applications David Allen Edith Cowan University Abhay Singh Edith
More informationIntroduction to Functions. Biostatistics
Introduction to Functions Biostatistics 140.776 Functions The development of a functions in R represents the next level of R programming, beyond writing code at the console or in a script. 1. Code 2. Functions
More informationR Short Course Session 1
R Short Course Session 1 Daniel Zhao, PhD Sixia Chen, PhD Department of Biostatistics and Epidemiology College of Public Health, OUHSC 10/23/2015 Outline Overview of the 5 sessions Pre-requisite requirements
More informationR and parallel libraries. Introduction to R for data analytics Bologna, 26/06/2017
R and parallel libraries Introduction to R for data analytics Bologna, 26/06/2017 Outline Overview What is R R Console Input and Evaluation Data types R Objects and Attributes Vectors and Lists Matrices
More informationStatistics for Biologists: Practicals
Statistics for Biologists: Practicals Peter Stoll University of Basel HS 2012 Peter Stoll (University of Basel) Statistics for Biologists: Practicals HS 2012 1 / 22 Outline Getting started Essentials of
More informationLAB #1: DESCRIPTIVE STATISTICS WITH R
NAVAL POSTGRADUATE SCHOOL LAB #1: DESCRIPTIVE STATISTICS WITH R Statistics (OA3102) Lab #1: Descriptive Statistics with R Goal: Introduce students to various R commands for descriptive statistics. Lab
More informationR Basics / Course Business
R Basics / Course Business We ll be using a sample dataset in class today: CourseWeb: Course Documents " Sample Data " Week 2 Can download to your computer before class CourseWeb survey on research/stats
More informationR basics workshop Sohee Kang
R basics workshop Sohee Kang Math and Stats Learning Centre Department of Computer and Mathematical Sciences Objective To teach the basic knowledge necessary to use R independently, thus helping participants
More informationData Input/Output. Introduction to R for Public Health Researchers
Data Input/Output Introduction to R for Public Health Researchers Common new user mistakes we have seen 1. Working directory problems: trying to read files that R "can't find" RStudio can help, and so
More informationIntroduction to R. Daniel Berglund. 9 November 2017
Introduction to R Daniel Berglund 9 November 2017 1 / 15 R R is available at the KTH computers If you want to install it yourself it is available at https://cran.r-project.org/ Rstudio an IDE for R is
More informationPrediction Using Regression Analysis
Prediction Using Regression Analysis Shantanu Sarkar 1, Anuj Vaijapurkar 2, VimalKumar Bhardwaj 3,Swarnalatha P 4 1,2,3 School of Computer Science, VIT University, Vellore 4 Assistant Professor, School
More informationIntroduction to Minitab 1
Introduction to Minitab 1 We begin by first starting Minitab. You may choose to either 1. click on the Minitab icon in the corner of your screen 2. go to the lower left and hit Start, then from All Programs,
More informationUNIT 4. Research Methods in Business
UNIT 4 Preparing Data for Analysis:- After data are obtained through questionnaires, interviews, observation or through secondary sources, they need to be edited. The blank responses, if any have to be
More information2015 Vanderbilt University
Excel Supplement 2015 Vanderbilt University Introduction This guide describes how to perform some basic data manipulation tasks in Microsoft Excel. Excel is spreadsheet software that is used to store information
More informationPackage quickreg. R topics documented:
Package quickreg September 28, 2017 Title Build Regression Models Quickly and Display the Results Using 'ggplot2' Version 1.5.0 A set of functions to extract results from regression models and plot the
More informationGeneralized Linear Models
Generalized Linear Models Methods@Manchester Summer School Manchester University July 2 6, 2018 Software and Data www.research-training.net/manchester2018 Graeme.Hutcheson@manchester.ac.uk University of
More informationIntroduction to Scripting Languages. October 2017
Introduction to Scripting Languages damien.francois@uclouvain.be October 2017 1 Goal of this session: Advocate the use of scripting languages and help you choose the most suitable for your needs 2 Agenda
More informationIntro to R. Fall Fall 2017 CS130 - Intro to R 1
Intro to R Fall 2017 Fall 2017 CS130 - Intro to R 1 Intro to R R is a language and environment that allows: Data management Graphs and tables Statistical analyses You will need: some basic statistics We
More informationIntermediate Stata. Jeremy Craig Green. 1 March /29/2011 1
Intermediate Stata Jeremy Craig Green 1 March 2011 3/29/2011 1 Advantages of Stata Ubiquitous in economics and political science Gaining popularity in health sciences Large library of add-on modules Version
More information