1 Introduction to R
Adrienn Szabó
DMS Group, MTA SZTAKI
Aug 30

2 (Outline)
1. What is R?
   - What is R for?
   - Who is R for?
2. Basics
   - Data Structures
   - Control Structures
3. ExtRa stuff
   - R packages
   - Unit testing in R

3 What is R?
R is a dialect of the S language...
...but seriously, R is a...
- Programming language (free, open source)
- Computing environment (like Matlab)
- Community (quite an active one)
- Ecosystem (rapid conversion from data-science knowledge to productivity)
(S was developed at Bell Labs in the 1970s as an internal statistical analysis environment.)

4 What is R?
Some say that it's Not Really A Programming Language, but rather that R is an interactive environment for doing statistics. *
* python-displacing-r-as-the-programming-language-for-data-science

5 What for?
- statistical computing (and statistical tests)
- dataset exploration
- analysis (time series analysis, classification, clustering, etc.)
- linear and nonlinear modelling
- visualization
- recently the favourite tool of data scientists

6 What for?

7 R is not only for programmers!
A couple of titles from the latest useR! conference (more than 150 talks):
- Teaching R to high school students (and teachers)
- Visualizing Lack of Fit in Complex Regression Mode
- A real time, responsive Quantitative trading analysis Mobile App using R
- eegr: an R package to analyze electrophysiological (EEG) signals (MTA TTK)

8 More titles: BD
Because everyone has to deal with Big Data nowadays...
- PivotalR: A Package for Machine Learning on Big Data
- Massive Predictive Modeling (Oracle)
- Domino: A Platform-as-a-Service for Industrialized Data Analysis
- Plyrmr: a data manipulation DSL for big data

9 More titles: ML
Machine Learning is cool...
- 10 R packages to win Kaggle competitions
- The Arborist: a Scalable Decision Tree Implementation
- Representing Model Ensembles in PMML
- Distributed Matrix Exponentiation in R

10 More titles: Networks & Twitter & maps
Who doesn't want to study Twitter or social data?
- Simulating Influenza Transmission with Real Network Data
- Spatial Tweetstistics with R: Geographical Distribution of English Loan Words in Spanish Tweets
- Opportunities through the use of Open-Street-Map data in social sciences

11 More titles: RR
These folks seem to care about Reproducible Research as well...
- R and Reproducibility: a Proposal
- rctrack: An R Package that Automatically Collects and Archives Details for Reproducible Computing
- Fostering the next generation of open science with R
- Teaching data analysis in R through the lens of reproducibility (poster)

12 Why?

13 Features of R
- Free software
- Runs on almost any standard computing platform/OS
- Active development, about yearly releases + bugfixes
- Sophisticated graphics capabilities
- Useful for interactive work, but contains a powerful programming language for developing new tools (user -> programmer)

14 R vs. Python
- Python is more general-purpose, and it is easier to write programs in it

15 R vs. Python
- R has more ready-to-use "stats + data analytics" libraries


17 Getting help
Code: built-in man pages
    > ?library
    library                package:base                R Documentation

    Loading and Listing of Packages

    Description:
         library and require load add-on packages.

    Usage:
         library(package, help, pos = 2, lib.loc =...
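Beyond ?topic, a few other built-in lookup tools are worth knowing. The sketch below is an illustrative addition, not part of the original slides; it uses only functions from base R and the utils package, and the search terms are arbitrary examples.

```r
# Find functions whose names match a regular expression
found <- apropos("^read\\.csv")
print(found)

# Show the argument list of a function without opening its help page
print(args(matrix))

# These calls open the help viewer, so they only run in an interactive session
if (interactive()) {
  help("library")            # the same as ?library
  help.search("regression")  # the same as ??regression
  example("mean")            # run the examples from the help page of mean()
}
```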

18 Data structures
- Vector
- Matrix
- Array
- List
- Data frame

19 Numbers, assignment
To assign a single number, use the <- operator.
Note: the = operator works the same way in almost all cases, but its use is not advised.
Code: numbers
    > a <- 4
    > b <- 5
    > a + b
    [1] 9
    > a - b
    [1] -1
    > a ^ b
    [1] 1024

20 Vector
Vectors (similar to Lists in Java) can be created with the c() function (short for "concatenate"). A vector can hold any kind of value, but all items of one vector must be of the same type.
Code: vector examples
    > v1 <- c(1, 2, 3, 4, 5, 6)
    > v2 <- c(0.8, 0.1)
    > v1 + v2
    [1] 1.8 2.1 3.8 4.1 5.8 6.1
Magic!
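The "magic" in adding vectors of different lengths is R's recycling rule: the shorter vector is repeated until it matches the length of the longer one. The following sketch (an illustrative addition, with variable names chosen here) reproduces the result by hand.

```r
v1 <- c(1, 2, 3, 4, 5, 6)
v2 <- c(0.8, 0.1)

# Recycling done by hand: repeat v2 until it is as long as v1
v2_recycled <- rep(v2, length.out = length(v1))
manual_sum <- v1 + v2_recycled

# What R does automatically
auto_sum <- v1 + v2

print(auto_sum)
print(isTRUE(all.equal(manual_sum, auto_sum)))
```

If the longer length is not a multiple of the shorter one, R still recycles but issues a warning.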

21 Vector
You can concatenate items and vectors as you please.
Code: vector examples 2
    > v1 <- c(1, 2, 3)
    > v2 <- c(0.8, 0.1)
    > c(22, v1, -3.9, v2)
    [1] 22.0  1.0  2.0  3.0 -3.9  0.8  0.1
    > c(v1, "Sponge", "Bob")
    [1] "1"      "2"      "3"      "Sponge" "Bob"
Warning: the elements of the resulting vector may be converted to the more general type!
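The conversion follows a fixed order from less to more general types: logical, integer, numeric, character. A small sketch (an illustrative addition; the variable names are chosen here) makes the order visible with class().

```r
# Each call mixes two types; the result takes the more general one
logical_and_integer <- class(c(TRUE, 2L))
integer_and_double  <- class(c(2L, 3.5))
double_and_string   <- class(c(3.5, "a"))

print(c(logical_and_integer, integer_and_double, double_and_string))

# The conversion can be undone explicitly when the strings hold numbers
mixed <- c(1, 2, 3, "Sponge")
numbers_again <- as.numeric(mixed[1:3])
print(numbers_again)
```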

22 Vector
You can select a range of a vector to get a shorter one. Indexing begins with 1! Negative indices mean: leave these elements out!
Code: vector subsetting
    > v1 <- c(1, 2, 3, 4, 5, 6)
    > v1[2]
    [1] 2
    > v1[2:4]
    [1] 2 3 4
    > v1[c(-2,-5)]
    [1] 1 3 4 6
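Vectors can also be indexed with logical values, which is how filtering by a condition works. The sketch below is an illustrative addition with names chosen here; it also shows what happens when an index is past the end of the vector.

```r
v1 <- c(1, 2, 3, 4, 5, 6)

# A comparison yields a logical vector that selects the matching elements
big <- v1[v1 > 3]

# A short logical index is recycled: keep, skip, keep, skip, ...
every_other <- v1[c(TRUE, FALSE)]

# Indexing past the end does not fail; it yields NA
missing_item <- v1[10]

print(big)
print(every_other)
print(missing_item)
```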

23 (Nice random image 1)

24 Matrix
A matrix is a vector represented and accessible in two dimensions. It has a fixed element type and a fixed number of rows and columns.
Code: matrix examples
    > matrix(1:6, byrow=TRUE, nrow=2)
         [,1] [,2] [,3]
    [1,]    1    2    3
    [2,]    4    5    6
    > matrix(c(1,2,13,9,8,17,3,4,5), ncol = 3)
         [,1] [,2] [,3]
    [1,]    1    9    3
    [2,]    2    8    4
    [3,]   13   17    5
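That a matrix is "a vector in two dimensions" can be checked directly: giving a plain vector a dim attribute turns it into a matrix. This is a minimal sketch added for illustration; the variable names are chosen here.

```r
# Start with a plain vector and give it dimensions
v <- 1:6
dim(v) <- c(2, 3)

# The same values built with matrix(); the default fill order is by column
m <- matrix(1:6, nrow = 2)

# With byrow = TRUE the values are filled row by row instead
by_row <- matrix(1:6, nrow = 2, byrow = TRUE)

print(is.matrix(v))
print(all(v == m))
print(by_row)
```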

25 Matrix
You can give names to columns and/or rows.
Code: matrix naming
    > matrix(c(1:9), nrow=3, byrow=TRUE,
    +   dimnames=list(c("r1","r2","r3"), c("a","b","c")))
       a b c
    r1 1 2 3
    r2 4 5 6
    r3 7 8 9

26 Matrix
Subsetting works here as well...
Code: matrix subsetting
    > m1 <- matrix(c(1:9), nrow=3, byrow=TRUE,
    +   dimnames=list(c("r1","r2","r3"), c("a","b","c")))
    > m1[2,]
    a b c
    4 5 6
    > m1[,-2]
       a c
    r1 1 3
    r2 4 6
    r3 7 9

27 Matrix
Matrix operations are quite similar to vector operations. For example, a comparison returns a logical matrix of the same size.
Code: matrix example
    > m1 > 5
           a     b     c
    r1 FALSE FALSE FALSE
    r2 FALSE FALSE  TRUE
    r3  TRUE  TRUE  TRUE
    > m1[m1>5]
    [1] 7 8 6 9
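The order 7 8 6 9 may look surprising. It follows from the fact that R stores matrix values column by column, and a logical index walks through them in that storage order. The sketch below is an illustrative addition that rebuilds m1 and makes this visible.

```r
m1 <- matrix(1:9, nrow = 3, byrow = TRUE,
             dimnames = list(c("r1", "r2", "r3"), c("a", "b", "c")))

# Internal storage order: column a, then column b, then column c
storage_order <- as.vector(m1)

# Logical subsetting returns the matches in that same order
selected <- m1[m1 > 5]

# which() with arr.ind = TRUE reports the row and column of each match
positions <- which(m1 > 5, arr.ind = TRUE)

print(storage_order)
print(selected)
print(positions)
```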

28 (Nice random image 2)

29 Array
An array extends the matrix to any number of dimensions. It is a vector that is represented and accessible in a given number of dimensions. Let's arrange 20 integers from 0 to 19 in three dimensions: 2 x 5 x 2
Code: array example
    > a1 <- array(c(0:19), dim = c(2, 5, 2))

30 Array
Code: array example
    > array(c(0:19), dim = c(2, 5, 2))
    , , 1

         [,1] [,2] [,3] [,4] [,5]
    [1,]    0    2    4    6    8
    [2,]    1    3    5    7    9

    , , 2

         [,1] [,2] [,3] [,4] [,5]
    [1,]   10   12   14   16   18
    [2,]   11   13   15   17   19

31 Array
Subsetting works similarly to matrices, but we can specify the selected row/col indices for each dimension.
Code: array example
    > a1[-1, 1:4, ]
         [,1] [,2]
    [1,]    1   11
    [2,]    3   13
    [3,]    5   15
    [4,]    7   17
    > a1[-1, 1:4, 2]
    [1] 11 13 15 17
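Because an array is still a vector underneath, every element also has a single linear position. The sketch below (an illustrative addition; the index values are arbitrary) computes that position for a 2 x 5 x 2 array and shows how drop = FALSE keeps dimensions that subsetting would otherwise remove.

```r
a1 <- array(0:19, dim = c(2, 5, 2))

# Position of element [i, j, k] in the underlying vector:
# the first index varies fastest, then the second, then the third
i <- 2; j <- 3; k <- 2
linear <- i + (j - 1) * 2 + (k - 1) * 2 * 5

by_indices <- a1[i, j, k]
by_position <- a1[linear]
print(c(linear, by_indices, by_position))

# Dimensions of length one are dropped by default
dropped <- a1[-1, 1:4, ]
kept <- a1[-1, 1:4, , drop = FALSE]
print(dim(dropped))
print(dim(kept))
```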

32 (Nice random image 3)

33 List
A list is a generic vector that is allowed to include different types of objects, even other lists.
Code: list example
    > list(1, c(TRUE,FALSE), c("a","b","c"))
    [[1]]
    [1] 1

    [[2]]
    [1]  TRUE FALSE

    [[3]]
    [1] "a" "b" "c"

34 List
We can assign names to each entry by using named arguments.
Code: list example
    > myl <- list(x=1, y=c(TRUE,FALSE), z=c("a","b","c"))
    > myl
    $x
    [1] 1

    $y
    [1]  TRUE FALSE

    $z
    [1] "a" "b" "c"

35 List
To access the members of a list by name, use the dollar sign:
Code: list example
    > myl$x
    [1] 1
    > myl $ z
    [1] "a" "b" "c"
    > myl$alma
    NULL

36 List
To access the N-th member of a list, use double square brackets:
Code: list example
    > myl[[1]]
    [1] 1
    > myl [[ 3 ]]
    [1] "a" "b" "c"
    > myl[[4]]
    Error in myl[[4]] : subscript out of bounds

37 List
Names can be used inside double brackets as well:
Code: list example
    > myl[["x"]]
    [1] 1
    > elemname <- "z"
    > myl[[elemname]]
    [1] "a" "b" "c"

38 Subsetting a list
Use single-square-bracket notation to extract multiple members from a list and construct a new list:
Code: subsetting a list
    > myl["x"]
    $x
    [1] 1

    > myl[c("x","y")]
    $x
    [1] 1

    $y
    [1]  TRUE FALSE

39 Subsetting a list
Code: more examples of subsetting a list
    > myl[1]
    $x
    [1] 1

    > myl[c(TRUE, FALSE, TRUE)]
    $x
    [1] 1

    $z
    [1] "a" "b" "c"

40 Setting values of a list
Code: setting and adding list members
    > myl$x <- 0.6   # overwrite element
    > myl$m <- 4     # add a new named element
    > myl$y <- NULL  # delete by name
    > myl
    $x
    [1] 0.6

    $z
    [1] "a" "b" "c"

    $m
    [1] 4

41 Setting values of a list
Code: setting and adding list members
    > myl[[2]] <- NULL  # delete by index ("z")
    > myl[[1]] <- 0.8   # overwrite element
    > myl[[5]] <- 5     # add a new element
    # what do we get?

42 Setting values of a list
Code: setting and adding list members
    > myl
    $x
    [1] 0.8

    $m
    [1] 4

    [[3]]
    NULL

    [[4]]
    NULL

    [[5]]
    [1] 5
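Assigning to an index beyond the end of a list pads the gap with NULL entries, which is rarely what one wants. A minimal sketch of the difference, assuming the two-element list from the slide (the names padded, appended and with_null are chosen here for illustration):

```r
myl <- list(x = 0.8, m = 4)

# Assigning to position 5 fills positions 3 and 4 with NULL
padded <- myl
padded[[5]] <- 5
print(length(padded))

# Appending at length + 1 adds exactly one element
appended <- myl
appended[[length(appended) + 1]] <- 5
print(length(appended))

# Assigning NULL with $ or [[ ]] deletes an element;
# to store an actual NULL, assign list(NULL) with single brackets
with_null <- myl
with_null["y"] <- list(NULL)
print(names(with_null))
```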

43 Other list functions
Code: List functions
    > is.list(myl[1])     # [ ] -> sublist
    [1] TRUE
    > is.list(myl[[1]])   # [[ ]] -> element
    [1] FALSE
    > l2 <- as.list(c(a=1,b=2,c=3))  # vector to list
    > unlist(l2)                     # list to vector
    a b c
    1 2 3
    > unlist(list(a=1, b=2, c="hello"))
          a       b       c
        "1"     "2" "hello"

44 (Nice random image 4)

45 Factor
The term factor refers to a statistical data type used to store categorical variables. Use the function factor() to get factors from a vector of objects.
Code: using factors
    > sdata <- c("Male", "Female", "Female", "X", "Male")
    > sfactors <- factor(sdata)
    > sfactors
    [1] Male   Female Female X      Male
    Levels: Female Male X
A factor is stored internally as an integer vector with values 1, 2, ..., k, where k is the number of levels.
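The internal integer codes can be inspected directly. The sketch below is an illustrative addition (variable names chosen here) that shows the codes, the levels they point to, and a frequency count per level.

```r
sdata <- c("Male", "Female", "Female", "X", "Male")
sfactors <- factor(sdata)

# The integer codes stored inside the factor
codes <- as.integer(sfactors)

# The labels the codes refer to (sorted by default)
lev <- levels(sfactors)

# Looking the codes up in the levels gives back the original strings
roundtrip <- lev[codes]

# Number of observations per level
counts <- table(sfactors)

print(codes)
print(lev)
print(counts)
```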

46 Data frame
Data frames are similar to tables in a relational database. A data frame is a generalisation of a matrix and a list: different columns may have different modes, but all elements of a column must have the same mode (all numeric, all factor, or all character). Typically you'll load a csv file into a data frame object.

47 Data frame
A data frame has colnames() and rownames(); the length() of a data frame is the same as ncol(); nrow() gives the number of rows.
Code: data frame example
    > df <- data.frame(x = 1:3, y = c("a", "b", "c"))
    > str(df)
    'data.frame':	3 obs. of  2 variables:
     $ x: int  1 2 3
     $ y: Factor w/ 3 levels "a","b","c": 1 2 3
Note: in R versions before 4.0.0, data.frame() turns strings into factors by default. Use stringsAsFactors = FALSE to keep them as character vectors.

48 Data frame
Code: data frame example
    > myy <- c("a", "b", "c", "n")
    > df <- data.frame(x = 4:7, y = myy,
    +   stringsAsFactors = FALSE)
    > df
      x y
    1 4 a
    2 5 b
    3 6 c
    4 7 n
    > str(df)
    'data.frame':	4 obs. of  2 variables:
     $ x: int  4 5 6 7
     $ y: chr  "a" "b" "c" "n"
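Since a data frame is both list-like and matrix-like, both kinds of access work on it. A short sketch under that assumption, reusing the data frame from the slide (the names col_classes and big_rows are chosen here for illustration):

```r
df <- data.frame(x = 4:7, y = c("a", "b", "c", "n"),
                 stringsAsFactors = FALSE)

# List-like behaviour: one element per column
print(length(df) == ncol(df))
col_classes <- sapply(df, class)
print(col_classes)

# Matrix-like behaviour: [rows, columns] with a logical row filter
big_rows <- df[df$x > 5, ]
print(big_rows)
```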

49 (Nice random image 5)

50 Control structures
- if, else: test a condition
- for: execute a loop a fixed number of times
- while: execute a loop while a condition is true
- repeat: execute an infinite loop
- break: break the execution of a loop
- next: skip an iteration of a loop
- return: exit a function

51 Condition
This is a valid if/else structure.
Code: if and else
    if (x > 3) {
      y <- 10
    } else {
      y <- 0
    }
So is this one.
Code: alternative if and else
    y <- if (x > 3) {
      10
    } else {
      0
    }

52 Loop
We can iterate over lists or vectors.
Code: for loops
    x <- c("a", "b", "c", "d")
    for (letter in x) {
      print(letter)
    }
    # another example
    for (i in 1:3) print(x[i])
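The remaining loop keywords from the control-structure list (while, repeat, break, next) can be sketched in the same way. This example is an illustrative addition; the variable names and limits are arbitrary.

```r
# while: keep adding 1, 2, 3, ... until the total reaches 20
total <- 0
k <- 1
while (total < 20) {
  total <- total + k
  k <- k + 1
}
print(total)

# repeat runs forever unless break is called;
# next skips the rest of the current iteration
i <- 0
evens <- c()
repeat {
  i <- i + 1
  if (i > 10) break        # leave the loop
  if (i %% 2 == 1) next    # skip odd numbers
  evens <- c(evens, i)
}
print(evens)
```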

53 Functions
Functions are objects in their own right! Three components of a function:
- arguments
- body
- environment
Functions can return only a single object (but that object can be a list containing any objects). Call-by-value: modifying a function argument does not change the original value (some exceptions exist).
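These points can be checked with a few lines of code. The sketch below is an illustrative addition: add_one, min_max and bump are hypothetical helper names, and the environment example stands in for the "exceptions" mentioned above, since environments are passed by reference.

```r
# Call-by-value: the function works on a copy of its argument
add_one <- function(v) {
  v[1] <- v[1] + 1
  v
}
original <- c(10, 20)
changed <- add_one(original)
print(original)   # unchanged
print(changed)

# The three components of a function can be inspected
print(names(formals(add_one)))   # arguments
print(body(add_one))             # body
print(environment(add_one))      # environment

# Returning several values at once by wrapping them in a list
min_max <- function(v) list(min = min(v), max = max(v))
res <- min_max(c(3, 9, 1))
print(res$min)
print(res$max)

# An exception to call-by-value: environments are not copied
counter_env <- new.env()
counter_env$n <- 0
bump <- function(env) {
  env$n <- env$n + 1
  invisible(NULL)
}
bump(counter_env)
print(counter_env$n)
```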

54 Functions
Code: defining functions
    isSumAboveTen <- function(x, y) {
      if (x + y > 10) {
        return(TRUE)
      } else {
        return(FALSE)
      }
    }
    # Shorter: the last object will be returned
    isSumAbove10 <- function(x, y) {
      if (x + y > 10) TRUE else FALSE
    }
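The comparison itself already yields TRUE or FALSE, so the if/else can be left out entirely. A possible shorter variant (the name is_sum_above_ten is chosen here and is not from the slides) also works element by element on vectors, whereas if expects a single value.

```r
# The value of the last expression is returned
is_sum_above_ten <- function(x, y) {
  x + y > 10
}

single <- is_sum_above_ten(4, 5)
elementwise <- is_sum_above_ten(c(1, 9), c(2, 9))

print(single)
print(elementwise)
```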

55 Scripts
And now we can write nice little scripts to do whatever we want!
How to get the examples
    $ git clone git@info.ilab.sztaki.hu/tutorial
    $ ls tutorial/r
    example1 example2 example3 example4

56 Twitter example


58 Packages in R
When you install R, you get a base R system with basic functionality. By default it includes some basic packages (utils, stats, datasets, graphics, methods, tools, parallel, etc.). For more specific purposes you can either:
- find an existing add-on package on CRAN (or maybe on Bioconductor) that helps,
- or write your own package (just for yourself, or you can easily publish it as well)

59 CRAN
CRAN (The Comprehensive R Archive Network) is a network of FTP and web servers around the world that store identical and up-to-date versions of code and documentation for R. (Please use the CRAN mirror nearest to you to minimize network load.)
More about packages later...

60 Unit testing in R
There are some options:
- RUnit is the oldest one
- svUnit has a GUI
- testthat is actively developed, and smarter (but isn't compatible with either of the previous two)
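As a rough idea of what a testthat test looks like, here is a minimal sketch. It is an illustrative addition, not taken from the slides: it assumes the testthat package is installed, and the function under test (is_sum_above_ten) is a hypothetical example. If testthat is missing, the sketch falls back to base R's stopifnot().

```r
# Function under test (hypothetical example)
is_sum_above_ten <- function(x, y) {
  x + y > 10
}

if (requireNamespace("testthat", quietly = TRUE)) {
  # A test groups related expectations under a description
  testthat::test_that("sums above ten are detected", {
    testthat::expect_true(is_sum_above_ten(6, 5))
    testthat::expect_false(is_sum_above_ten(1, 2))
  })
} else {
  # Fallback without any add-on package
  stopifnot(is_sum_above_ten(6, 5), !is_sum_above_ten(1, 2))
}
```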

61 Sources
1 From the Coursera course "R programming" by Roger D. Peng
Abstracts.pdf
6 final-10-r-xc
introduction-to-r
8 Workshop2/Presentations/functions.pdf



More information

Mobile Computing Professor Pushpendra Singh Indraprastha Institute of Information Technology Delhi Java Basics Lecture 02

Mobile Computing Professor Pushpendra Singh Indraprastha Institute of Information Technology Delhi Java Basics Lecture 02 Mobile Computing Professor Pushpendra Singh Indraprastha Institute of Information Technology Delhi Java Basics Lecture 02 Hello, in this lecture we will learn about some fundamentals concepts of java.

More information

Lecture 1: Getting Started and Data Basics

Lecture 1: Getting Started and Data Basics Lecture 1: Getting Started and Data Basics The first lecture is intended to provide you the basics for running R. Outline: 1. An Introductory R Session 2. R as a Calculator 3. Import, export and manipulate

More information

Introduction to the R Language

Introduction to the R Language Introduction to the R Language Data Types and Basic Operations Starting Up Windows: Double-click on R Mac OS X: Click on R Unix: Type R Objects R has five basic or atomic classes of objects: character

More information

R in Linguistic Analysis. Week 2 Wassink Autumn 2012

R in Linguistic Analysis. Week 2 Wassink Autumn 2012 R in Linguistic Analysis Week 2 Wassink Autumn 2012 Today R fundamentals The anatomy of an R help file but first... How did you go about learning the R functions in the reading? More help learning functions

More information

1 Pencil and Paper stuff

1 Pencil and Paper stuff Spring 2008 - Stat C141/ Bioeng C141 - Statistics for Bioinformatics Course Website: http://www.stat.berkeley.edu/users/hhuang/141c-2008.html Section Website: http://www.stat.berkeley.edu/users/mgoldman

More information

the R environment The R language is an integrated suite of software facilities for:

the R environment The R language is an integrated suite of software facilities for: the R environment The R language is an integrated suite of software facilities for: Data Handling and storage Matrix Math: Manipulating matrices, vectors, and arrays Statistics: A large, integrated set

More information

R Basics / Course Business

R Basics / Course Business R Basics / Course Business We ll be using a sample dataset in class today: CourseWeb: Course Documents " Sample Data " Week 2 Can download to your computer before class CourseWeb survey on research/stats

More information

Privacy and Security in Online Social Networks Department of Computer Science and Engineering Indian Institute of Technology, Madras

Privacy and Security in Online Social Networks Department of Computer Science and Engineering Indian Institute of Technology, Madras Privacy and Security in Online Social Networks Department of Computer Science and Engineering Indian Institute of Technology, Madras Lecture 08 Tutorial 2, Part 2, Facebook API (Refer Slide Time: 00:12)

More information

Reading and wri+ng data

Reading and wri+ng data An introduc+on to Reading and wri+ng data Noémie Becker & Benedikt Holtmann Winter Semester 16/17 Course outline Day 4 Course outline Review Data types and structures Reading data How should data look

More information

R:If, else and loops

R:If, else and loops R:If, else and loops Presenter: Georgiana Onicescu January 19, 2012 Presenter: Georgiana Onicescu R:ifelse,where,looping 1/ 17 Contents Vectors Matrices If else statements For loops Leaving the loop: stop,

More information

Programming for Engineers Arrays

Programming for Engineers Arrays Programming for Engineers Arrays ICEN 200 Spring 2018 Prof. Dola Saha 1 Array Ø Arrays are data structures consisting of related data items of the same type. Ø A group of contiguous memory locations that

More information

Week 4. Big Data Analytics - data.frame manipulation with dplyr

Week 4. Big Data Analytics - data.frame manipulation with dplyr Week 4. Big Data Analytics - data.frame manipulation with dplyr Hyeonsu B. Kang hyk149@eng.ucsd.edu April 2016 1 Dplyr In the last lecture we have seen how to index an individual cell in a data frame,

More information

Scalable Machine Learning in R. with H2O

Scalable Machine Learning in R. with H2O Scalable Machine Learning in R with H2O Erin LeDell @ledell DSC July 2016 Introduction Statistician & Machine Learning Scientist at H2O.ai in Mountain View, California, USA Ph.D. in Biostatistics with

More information

Programming for Experimental Research. Flow Control

Programming for Experimental Research. Flow Control Programming for Experimental Research Flow Control FLOW CONTROL In a simple program, the commands are executed one after the other in the order they are typed. Many situations require more sophisticated

More information

Automation.

Automation. Automation www.austech.edu.au WHAT IS AUTOMATION? Automation testing is a technique uses an application to implement entire life cycle of the software in less time and provides efficiency and effectiveness

More information

SCHEME 8. 1 Introduction. 2 Primitives COMPUTER SCIENCE 61A. March 23, 2017

SCHEME 8. 1 Introduction. 2 Primitives COMPUTER SCIENCE 61A. March 23, 2017 SCHEME 8 COMPUTER SCIENCE 61A March 2, 2017 1 Introduction In the next part of the course, we will be working with the Scheme programming language. In addition to learning how to write Scheme programs,

More information

CS 31: Intro to Systems C Programming. Kevin Webb Swarthmore College September 13, 2018

CS 31: Intro to Systems C Programming. Kevin Webb Swarthmore College September 13, 2018 CS 31: Intro to Systems C Programming Kevin Webb Swarthmore College September 13, 2018 Reading Quiz Agenda Basics of C programming Comments, variables, print statements, loops, conditionals, etc. NOT the

More information

A brief introduction to coding in Python with Anatella

A brief introduction to coding in Python with Anatella A brief introduction to coding in Python with Anatella Before using the Python engine within Anatella, you must first: 1. Install & download a Python engine that support the Pandas Data Frame library.

More information

STATS 507 Data Analysis in Python. Lecture 2: Functions, Conditionals, Recursion and Iteration

STATS 507 Data Analysis in Python. Lecture 2: Functions, Conditionals, Recursion and Iteration STATS 507 Data Analysis in Python Lecture 2: Functions, Conditionals, Recursion and Iteration Functions in Python We ve already seen examples of functions: e.g., type()and print() Function calls take the

More information

Introducing Microsoft SQL Server 2016 R Services. Julian Lee Advanced Analytics Lead Global Black Belt Asia Timezone

Introducing Microsoft SQL Server 2016 R Services. Julian Lee Advanced Analytics Lead Global Black Belt Asia Timezone Introducing Microsoft SQL Server 2016 R Services Julian Lee Advanced Analytics Lead Global Black Belt Asia Timezone SQL Server 2016: Everything built-in built-in built-in built-in built-in built-in $2,230

More information

SCHEME 7. 1 Introduction. 2 Primitives COMPUTER SCIENCE 61A. October 29, 2015

SCHEME 7. 1 Introduction. 2 Primitives COMPUTER SCIENCE 61A. October 29, 2015 SCHEME 7 COMPUTER SCIENCE 61A October 29, 2015 1 Introduction In the next part of the course, we will be working with the Scheme programming language. In addition to learning how to write Scheme programs,

More information

Command Line and Python Introduction. Jennifer Helsby, Eric Potash Computation for Public Policy Lecture 2: January 7, 2016

Command Line and Python Introduction. Jennifer Helsby, Eric Potash Computation for Public Policy Lecture 2: January 7, 2016 Command Line and Python Introduction Jennifer Helsby, Eric Potash Computation for Public Policy Lecture 2: January 7, 2016 Today Assignment #1! Computer architecture Basic command line skills Python fundamentals

More information

NLP Final Project Fall 2015, Due Friday, December 18

NLP Final Project Fall 2015, Due Friday, December 18 NLP Final Project Fall 2015, Due Friday, December 18 For the final project, everyone is required to do some sentiment classification and then choose one of the other three types of projects: annotation,

More information

CS 31: Intro to Systems Arrays, Structs, Strings, and Pointers. Kevin Webb Swarthmore College March 1, 2016

CS 31: Intro to Systems Arrays, Structs, Strings, and Pointers. Kevin Webb Swarthmore College March 1, 2016 CS 31: Intro to Systems Arrays, Structs, Strings, and Pointers Kevin Webb Swarthmore College March 1, 2016 Overview Accessing things via an offset Arrays, Structs, Unions How complex structures are stored

More information

COMP 2718: Shell Scripts: Part 1. By: Dr. Andrew Vardy

COMP 2718: Shell Scripts: Part 1. By: Dr. Andrew Vardy COMP 2718: Shell Scripts: Part 1 By: Dr. Andrew Vardy Outline Shell Scripts: Part 1 Hello World Shebang! Example Project Introducing Variables Variable Names Variable Facts Arguments Exit Status Branching:

More information

STENO Introductory R-Workshop: Loading a Data Set Tommi Suvitaival, Steno Diabetes Center June 11, 2015

STENO Introductory R-Workshop: Loading a Data Set Tommi Suvitaival, Steno Diabetes Center June 11, 2015 STENO Introductory R-Workshop: Loading a Data Set Tommi Suvitaival, tsvv@steno.dk, Steno Diabetes Center June 11, 2015 Contents 1 Introduction 1 2 Recap: Variables 2 3 Data Containers 2 3.1 Vectors................................................

More information

Introducing Oracle R Enterprise 1.4 -

Introducing Oracle R Enterprise 1.4 - Hello, and welcome to this online, self-paced lesson entitled Introducing Oracle R Enterprise. This session is part of an eight-lesson tutorial series on Oracle R Enterprise. My name is Brian Pottle. I

More information

Machine Learning for Large-Scale Data Analysis and Decision Making A. Distributed Machine Learning Week #9

Machine Learning for Large-Scale Data Analysis and Decision Making A. Distributed Machine Learning Week #9 Machine Learning for Large-Scale Data Analysis and Decision Making 80-629-17A Distributed Machine Learning Week #9 Today Distributed computing for machine learning Background MapReduce/Hadoop & Spark Theory

More information

MATH36032 Problem Solving by Computer. More Data Structure

MATH36032 Problem Solving by Computer. More Data Structure MATH36032 Problem Solving by Computer More Data Structure Data from real life/applications How do the data look like? In what format? Data from real life/applications How do the data look like? In what

More information

R basics workshop Sohee Kang

R basics workshop Sohee Kang R basics workshop Sohee Kang Math and Stats Learning Centre Department of Computer and Mathematical Sciences Objective To teach the basic knowledge necessary to use R independently, thus helping participants

More information

ARTIFICIAL INTELLIGENCE AND PYTHON

ARTIFICIAL INTELLIGENCE AND PYTHON ARTIFICIAL INTELLIGENCE AND PYTHON DAY 1 STANLEY LIANG, LASSONDE SCHOOL OF ENGINEERING, YORK UNIVERSITY WHAT IS PYTHON An interpreted high-level programming language for general-purpose programming. Python

More information

Introduction to Functional Programming

Introduction to Functional Programming A Level Computer Science Introduction to Functional Programming William Marsh School of Electronic Engineering and Computer Science Queen Mary University of London Aims and Claims Flavour of Functional

More information

Ch.1 Introduction. Why Machine Learning (ML)? manual designing of rules requires knowing how humans do it.

Ch.1 Introduction. Why Machine Learning (ML)? manual designing of rules requires knowing how humans do it. Ch.1 Introduction Syllabus, prerequisites Notation: Means pencil-and-paper QUIZ Means coding QUIZ Code respository for our text: https://github.com/amueller/introduction_to_ml_with_python Why Machine Learning

More information

Lastly, in case you don t already know this, and don t have Excel on your computers, you can get it for free through IT s website under software.

Lastly, in case you don t already know this, and don t have Excel on your computers, you can get it for free through IT s website under software. Welcome to Basic Excel, presented by STEM Gateway as part of the Essential Academic Skills Enhancement, or EASE, workshop series. Before we begin, I want to make sure we are clear that this is by no means

More information

Other Loop Options EXAMPLE

Other Loop Options EXAMPLE C++ 14 By EXAMPLE Other Loop Options Now that you have mastered the looping constructs, you should learn some loop-related statements. This chapter teaches the concepts of timing loops, which enable you

More information

EE 301 Signals & Systems I MATLAB Tutorial with Questions

EE 301 Signals & Systems I MATLAB Tutorial with Questions EE 301 Signals & Systems I MATLAB Tutorial with Questions Under the content of the course EE-301, this semester, some MATLAB questions will be assigned in addition to the usual theoretical questions. This

More information

1 Lecture 5: Advanced Data Structures

1 Lecture 5: Advanced Data Structures L5 June 14, 2017 1 Lecture 5: Advanced Data Structures CSCI 1360E: Foundations for Informatics and Analytics 1.1 Overview and Objectives We ve covered list, tuples, sets, and dictionaries. These are the

More information

Programming Languages

Programming Languages Programming Languages Tevfik Koşar Lecture - XVIII March 23 rd, 2006 1 Roadmap Arrays Pointers Lists Files and I/O 2 1 Arrays Two layout strategies for arrays Contiguous elements Row pointers Row pointers

More information

CSC-140 Assignment 5

CSC-140 Assignment 5 CSC-140 Assignment 5 Please do not Google a solution to these problems, cause that won t teach you anything about programming - the only way to get good at it, and understand it, is to do it! 1 Introduction

More information