R Basics / Course Business


R Basics / Course Business
- We'll be using a sample dataset in class today: CourseWeb: Course Documents -> Sample Data -> Week 2
- You can download it to your computer before class
- The CourseWeb survey on research/stats background is still available. Thanks to everyone who's responded!

R Basics
- R commands & functions
- Reading in data
- Saving R scripts
- Descriptive statistics
- Subsetting data
- Assigning new values
- Referring to specific cells
- Types & type conversion
- NA values
- Getting help

R Commands
The simplest way to interact with R is by typing in commands at the > prompt, in either RStudio or plain R.

R as a Calculator
Typing in a simple calculation shows us the result:
- 608 + 28
What's 11527 minus 283?
Some more examples:
- 400 / 65 (division)
- 2 * 4 (multiplication)
- 5 ^ 2 (exponentiation)

Functions
More complex calculations can be done with functions:
- sqrt(64)
- Before the parentheses: what the function is (square root)
- In parentheses: what we want to perform the function on
- Can often read these left to right ("square root of 64")
What do you think this means?
- abs(-7)

Arguments
Some functions have settings ("arguments") that we can adjust:
- round(3.14): Rounds off to the nearest integer (zero decimal places)
- round(3.14, digits=1): One decimal place

Nested Functions
We can use multiple functions in a row, one inside another:
- sqrt(abs(-16))
- Square root of the absolute value of -16
Don't get scared when you see multiple parentheses:
- Can often just read left to right
- R first figures out the thing nested in the middle
Can you round off the square root of 7?

Using Multiple Numbers at Once
When we want to use multiple numbers, we concatenate them:
- c(2,6,16)
- A list of the numbers 2, 6, and 16
Sometimes a computation requires multiple numbers:
- mean(c(2,6,16))
Also a quick way to do the same thing to multiple different numbers:
- sqrt(c(16,100,144))
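
The function, argument, nesting, and c() ideas from the last few slides can be tried together in a short sketch like the one below. It uses only base R; the variable names rounded.root and roots are made up for illustration.

```r
# Arguments: round() has an optional 'digits' setting
round(3.14)             # nearest integer
round(3.14, digits=1)   # one decimal place

# Nested functions: R evaluates the innermost call first
sqrt(abs(-16))

# Rounding off the square root of 7 to two decimal places
rounded.root <- round(sqrt(7), digits=2)
rounded.root

# c() concatenates several numbers so a function can use them all at once
mean(c(2, 6, 16))
roots <- sqrt(c(16, 100, 144))
roots
```

Typing a name by itself (as in the last line) prints the stored value, which is a handy way to check each step.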

Course Documents: Sample Data: Week 2
Reading plausible versus implausible sentences:
- "Scott chopped the carrots with a knife."
- "Scott chopped the carrots with a spoon."
Measure reading time on the final word
Note: Simulated data; not a real experiment.

Course Documents: Sample Data: Week 2
Reading plausible versus implausible sentences
- Reading time on the critical word
- 36 subjects
- Each subject sees 30 items (sentences): half plausible, half implausible
- Interested in changes over time, so we'll track serial position (trial 1 vs. trial 2 vs. trial 3 ...)

Reading in Data
Make sure you have the dataset at this point if you want to follow along: Course Documents -> Sample Data -> Week 2

Reading in Data: RStudio
- Navigate to the folder in the lower-right pane
- More -> Set as Working Directory
- Open a comma-separated value file:
  experiment <- read.csv('week2.csv')
- experiment: Name of the dataframe we're creating (whatever we want to call this dataset)
- read.csv: The function name
- 'week2.csv': File name

Reading in Data: Regular R
- Read in a comma-separated value file:
  experiment <- read.csv('/Users/scottfraundorf/Desktop/week2.csv')
- experiment: Name of the dataframe we're creating (whatever we want to call this dataset)
- read.csv: The function name
- '/Users/scottfraundorf/Desktop/week2.csv': Folder & file name
- Drag & drop the file into R to get the full folder & file name

Looking at the Data: Summary
A "big picture" of the dataset:
- summary(experiment)
summary() is a very important function!
- Basic info & descriptive statistics
- Check to make sure the data are correct

Looking at the Data: Summary
A "big picture" of the dataset:
- summary(experiment)
We can use $ to refer to a specific column/variable in our dataset:
- summary(experiment$ItemName)

Looking at the Data: Raw Data
Let's look at the data:
- experiment
Ack! That's too much! How about just a few rows?
- head(experiment)
- head(experiment, n=10)
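
For anyone without week2.csv at hand, the read-and-inspect steps above can be rehearsed on a tiny stand-in file. This is only a sketch: the toy dataframe, its values, and the temporary file are invented for illustration and are not the course dataset; the column names Subject, Condition, and RT follow the ones used in these slides.

```r
# Hypothetical stand-in for week2.csv (made-up values, NOT the course data)
toy <- data.frame(
  Subject   = c('S01', 'S01', 'S02', 'S02'),
  Condition = c('Plausible', 'Implausible', 'Plausible', 'Implausible'),
  RT        = c(410, 655, 388, 702)
)

# Write it to a temporary CSV file so there is something to read back in
toy.file <- tempfile(fileext='.csv')
write.csv(toy, toy.file, row.names=FALSE)

# Read the file into a dataframe, then inspect it
experiment <- read.csv(toy.file)
summary(experiment)       # big picture of every column
summary(experiment$RT)    # just one column, using $
head(experiment, n=2)     # first two rows only
```

With the real file, only the read.csv() line changes: point it at 'week2.csv' after setting the working directory.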

Reading in Data: Other Formats
Excel:
- library(gdata)
- experiment <- read.xls('/Users/scottfraundorf/Desktop/week2.xls')
SPSS:
- library(foreign)
- experiment <- read.spss('/Users/scottfraundorf/Desktop/week2.spss', to.data.frame=TRUE)

R Scripts
Save & reuse commands with a script
- R: File -> New Document
- RStudio: File -> New File -> R Script

R Scripts
Run commands without typing them all again
RStudio:
- Code -> Run Region -> Run All: Run the entire script
- Code -> Run Line(s): Run just what you've highlighted/selected
R:
- Highlight the section of script you want to run
- Edit -> Execute
Keyboard shortcut for this:
- Ctrl+Enter (PC), Cmd+Enter (Mac)

R Scripts
Saves time when re-running analyses
Other advantages? Some:
- Documentation for yourself
- Documentation for others
- Reuse with new analyses/experiments
- Quicker to run: can automatically perform one analysis after another

R Scripts: Comments
Add # before a line to make it a comment
- Not commands to R, just notes to self (or other readers)
Can also add a # to make the rest of a line a comment:
- summary(experiment$Subject) #awesome

Descriptive Statistics
Remember how we referred to a particular variable in a dataframe?
- $
Combine that with functions:
- mean(experiment$RT)
- median(experiment$RT)
- sd(experiment$RT)
Or, for a categorical variable:
- levels(experiment$ItemName)
- summary(experiment$Subject)

Descriptive Statistics
We often want to look at a dependent variable as a function of some independent variable(s):
- tapply(experiment$RT, experiment$Condition, mean)
- Splits up the RTs by Condition, then gets the mean of each group
Try getting the mean RT for each item. How about the median RT for each subject?
To combine multiple results into one table, column-bind them with cbind():
- cbind(tapply(experiment$RT, experiment$Condition, mean), tapply(experiment$RT, experiment$Condition, sd))

Descriptive Statistics
Can have 2-way tables...
- tapply(experiment$RT, list(experiment$Subject, experiment$Condition), mean)
- 1st variable is rows, 2nd is columns
...or more:
- tapply(experiment$RT, list(experiment$ItemName, experiment$Condition, experiment$TestingRoom), mean)

Descriptive Statistics
Contingency tables for categorical variables:
- xtabs(~ Subject + Condition, data=experiment)
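
The tapply(), cbind(), and xtabs() calls above can be seen working end to end in a sketch like this one. The six-row dataframe is invented for illustration (it is not the course data), and condition.means, condition.sds, by.subject, and counts are hypothetical names.

```r
# Hypothetical toy data (made-up RTs, NOT the course dataset)
experiment <- data.frame(
  Subject   = c('S01', 'S01', 'S02', 'S02', 'S03', 'S03'),
  Condition = c('Plausible', 'Implausible', 'Plausible',
                'Implausible', 'Plausible', 'Implausible'),
  RT        = c(400, 600, 420, 640, 380, 620)
)

# Mean and SD of RT, split up by Condition
condition.means <- tapply(experiment$RT, experiment$Condition, mean)
condition.sds   <- tapply(experiment$RT, experiment$Condition, sd)

# Column-bind the two results into one table
cbind(Mean=condition.means, SD=condition.sds)

# 2-way table: Subject in rows, Condition in columns
by.subject <- tapply(experiment$RT,
                     list(experiment$Subject, experiment$Condition), mean)
by.subject

# Contingency table: how many trials per Subject x Condition cell
counts <- xtabs(~ Subject + Condition, data=experiment)
counts
```

Naming the columns inside cbind() (Mean=, SD=) is optional, but it makes the combined table easier to read.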

Subsetting Data
Often, we want to examine or use just part of a dataframe
Remember how we read our dataframe?
- experiment <- read.csv(...)
Create a new dataframe that's just a subset of experiment:
- experiment.longrtsremoved <- subset(experiment, RT < 2000)
- experiment.longrtsremoved: New dataframe name
- experiment: Original dataframe
- RT < 2000: Inclusion criterion (RT less than 2000 ms)

Subsetting Data: Logical Operators
Try getting just the observations with RTs of 200 ms or more:
- experiment.shortrtsremoved <- subset(experiment, RT >= 200)
Why not just delete the bad RTs from the spreadsheet?
- Easy to make a mistake / miss some of them
- Faster to have the computer do it
- We'd lose the original data
- No documentation of how we subsetted the data

Subsetting Data: AND and OR
What if we wanted only RTs between 200 and 2000 ms?
Could do two steps:
- experiment.temp <- subset(experiment, RT >= 200)
- experiment.badrtsremoved <- subset(experiment.temp, RT <= 2000)
One step with & for AND:
- experiment2 <- subset(experiment, RT >= 200 & RT <= 2000)

Subsetting Data: AND and OR
What if we wanted only RTs between 200 and 2000 ms?
One step with & for AND:
- experiment2 <- subset(experiment, RT >= 200 & RT <= 2000)
| means OR:
- experiment.badrts <- subset(experiment, RT < 200 | RT > 2000)
- Logical OR ("either or both")

Subsetting Data: == and !=
Get a match / equals:
- experiment.firsttrials <- subset(experiment, SerialPosition == 1)
- Note the DOUBLE equals sign
Words/categorical variables need quotes:
- experiment.implausiblesentences <- subset(experiment, Condition == 'Implausible')
!= means "not equal to":
- experiment.badsubjectremoved <- subset(experiment, Subject != 'S23')
- Drops subject S23

Subsetting Data: %in%
Sometimes our inclusion criteria aren't so mathematical
Suppose I just want the "Ducks", "Hornet", and "Panther" items
We can check against any arbitrary list:
- experiment.specialitems <- subset(experiment, ItemName %in% c('Ducks', 'Hornet', 'Panther'))
Or, keep just things that aren't in a list:
- experiment.nonnativespeakersremoved <- subset(experiment, Subject %in% c('S10', 'S23') == FALSE)

Logical Operators Review
Summary:
- >    Greater than
- >=   Greater than or equal to
- <    Less than
- <=   Less than or equal to
- &    AND
- |    OR
- ==   Equal to
- !=   Not equal to
- %in% Is this included in a list?
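
The operators in this review can be checked against a handful of rows, as in the sketch below. The five-row dataframe is made up for illustration (the item 'Walrus' and all RT values are hypothetical, not from the course data), as are the names of the resulting subsets.

```r
# Hypothetical toy data (NOT the course dataset)
experiment <- data.frame(
  Subject  = c('S10', 'S11', 'S12', 'S23', 'S23'),
  ItemName = c('Ducks', 'Hornet', 'Panther', 'Walrus', 'Ducks'),
  RT       = c(150, 480, 2600, 900, 200)
)

# & (AND): keep RTs between 200 and 2000 ms, inclusive
in.range <- subset(experiment, RT >= 200 & RT <= 2000)

# | (OR): keep RTs that are too short OR too long
out.of.range <- subset(experiment, RT < 200 | RT > 2000)

# != (not equal to): drop subject S23
no.s23 <- subset(experiment, Subject != 'S23')

# %in%: keep rows whose ItemName appears in a list
special <- subset(experiment, ItemName %in% c('Ducks', 'Hornet', 'Panther'))

# %in% ... == FALSE: keep rows whose Subject is NOT in the list
not.listed <- subset(experiment, Subject %in% c('S10', 'S23') == FALSE)

# Compare the number of rows that survive each criterion
nrow(in.range); nrow(out.of.range); nrow(no.s23)
nrow(special); nrow(not.listed)
```

Because each subset is saved under a new name, the original experiment dataframe stays untouched and the script documents exactly which rows were excluded.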

Assignment
Remember the pointing arrow used to create dataframes and subsets?
- e.g., experiment <- read.csv(...)
This is the assignment operator. It saves results or values in a variable:
- x <- sqrt(64)
- CriticalTrialsPerSubject <- 30
Remember, typing a name by itself shows you the current value:
- CriticalTrialsPerSubject
Assigning a new value overwrites the old one

Assignment
We can use this to create new columns in our dataframe:
- experiment$ExperimentNumber <- 1
- Here, the same number (1) is assigned to every trial
Or, compute a value for each row:
- experiment$LogSerialPosition <- log(experiment$SerialPosition)
- For each trial, takes the log of the serial position and saves that into LogSerialPosition
- Similar to an Excel formula

ifelse() IF YOU WANT DESSERT, EAT YOUR PEAS OR ELSE

ifelse()
ifelse(): Use a test to decide which of two values to assign:
- experiment$Half <- ifelse(experiment$SerialPosition <= 15, 1, 2)
- ifelse: Function name
- experiment$SerialPosition <= 15: The test
- 1: If serial position IS <= 15, Half is 1
- 2: If it's NOT, Half is 2
Possible to nest ifelse() if we need more than 2 categories:
- experiment$Third <- ifelse(experiment$SerialPosition <= 10, 1, ifelse(experiment$SerialPosition <= 20, 2, 3))

ifelse()
Instead of specific numbers, can use other columns or a formula:
- experiment$RT.Fixed <- ifelse(experiment$TestingRoom == 2, experiment$RT + 100, experiment$RT)
How can we check if this worked?
- head(experiment)
- Fixed RTs are indeed 100 ms longer for TestingRoom 2

Which do you like better?
Option 1:
- experiment$Half <- ifelse(experiment$SerialPosition <= 15, 1, 2)
- Shorter & faster to write
Option 2:
- CriticalTrialsPerSubject <- 30
- experiment$Half <- ifelse(experiment$SerialPosition <= (CriticalTrialsPerSubject / 2), 1, 2)
- Explains where the 15 comes from: helpful if we come back to this script later
- We can also refer to the CriticalTrialsPerSubject variable later in the script, and this ensures it's consistent
- Easy to update if we change the number of critical trials
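
A quick way to convince yourself that simple and nested ifelse() calls behave as described is to run them on a few serial positions, as in this sketch. The seven serial positions are chosen arbitrarily for illustration; the column names Half and Third follow the slides.

```r
# Named constant, as in the second option above
CriticalTrialsPerSubject <- 30

# Hypothetical toy data: a few serial positions around the cut-offs
experiment <- data.frame(SerialPosition = c(1, 10, 15, 16, 20, 21, 30))

# Two categories: first half of the session vs. second half
experiment$Half <- ifelse(
  experiment$SerialPosition <= (CriticalTrialsPerSubject / 2), 1, 2)

# Three categories: the "no" branch of the first test is another ifelse()
experiment$Third <- ifelse(experiment$SerialPosition <= 10, 1,
                    ifelse(experiment$SerialPosition <= 20, 2, 3))

# Inspect the result row by row
experiment
```

ifelse() works on the whole column at once, so each row gets its own answer without any explicit loop.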

Referring to Specific Cells
So far, we've seen how to:
- Create a new dataframe that's a subset of an existing dataframe
- Modify a dataframe by creating an entire column
What if we want to modify a dataframe by adjusting some existing values?
- e.g., replace all RTs above 2000 ms with the number 2000 ("fencing")
- Creating a new subset won't work because we want to change the original dataframe
- Need a way to edit specific values

Referring to Specific Cells
Use square brackets [ ] to refer to specific entries in a dataframe:
- Row, column
- experiment[3,7]
Omit the row or column number to mean all rows or all columns:
- experiment[3,]    Row 3, all columns
- experiment[,4]    All rows in column 4
Can also use column names:
- experiment[,'RT']    All rows in the RT column
Remember c()? We can check multiple rows:
- experiment[c(1:4),]

Logical Indexing
We can look at rows or columns that meet a specific criterion:
- experiment[experiment$RT < 200,]
Can use this as another way to subset:
- experiment.shortrtsremoved <- experiment[experiment$RT > 200, ]
- Actually, subset() just does this
But we can also set values this way:
- experiment[experiment$RT < 200, 'RT'] <- 200
- In the dataframe experiment, find the rows where RT < 200, and set the column RT to 200
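
The bracket notation and the "fencing" idea can be tried on a four-row example like the sketch below. The data are invented for illustration (not the course dataset); the upper fence at 2000 ms mirrors the example given earlier on the "Referring to Specific Cells" slide.

```r
# Hypothetical toy data (NOT the course dataset)
experiment <- data.frame(
  Subject = c('S01', 'S02', 'S03', 'S04'),
  RT      = c(150, 480, 2600, 900)
)

experiment[2, 2]        # row 2, column 2
experiment[c(1:2), ]    # rows 1 and 2, all columns
experiment[, 'RT']      # all rows in the RT column

# Logical indexing: look at the rows that meet a criterion
experiment[experiment$RT < 200, ]

# Fencing: overwrite only the offending cells in the ORIGINAL dataframe
experiment[experiment$RT < 200, 'RT'] <- 200     # lower fence
experiment[experiment$RT > 2000, 'RT'] <- 2000   # upper fence

experiment$RT
```

Unlike subset(), which returns a new dataframe, the two assignment lines change experiment itself and keep all four rows.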

Types
R treats continuous & categorical variables differently. These are different data types:
- Numeric
- Factor: Variable w/ a fixed set of categories (e.g., treatment vs. placebo)
- Character: Freely entered text (e.g., open response question)

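Types
R's heuristic when reading in data:
- Letters anywhere in the column -> factor
- No letters, purely numbers -> numeric
Note: this describes R before version 4.0. From R 4.0 on, read.csv() reads columns containing letters as character unless you set stringsAsFactors=TRUE.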

Type Conversion: Numeric -> Factor
Sometimes we need to correct this
- Room 4 is not "twice as much" as Room 2
Create a new column that's the factor (categorical) version of TestingRoom:
- experiment$Room.Factor <- as.factor(experiment$TestingRoom)
Or, just overwrite the old column:
- experiment$TestingRoom <- as.factor(experiment$TestingRoom)

Type Conversion: Character -> Factor
When ifelse() results in words, R creates a character variable rather than a factor
- Need to convert it
Wrong:
- experiment$FaveRoom <- ifelse(experiment$TestingRoom == 3, 'My favorite room', 'Not favorite')
Right:
- experiment$FaveRoom <- as.factor(ifelse(experiment$TestingRoom == 3, 'My favorite room', 'Not favorite'))

Type Conversion: Factor -> Numeric
To change a factor to a number, need to turn it into a character first:
- experiment$Age.Numeric <- as.numeric(as.character(experiment$Age))
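
The three conversions above, and the reason for going through as.character() first, can be seen in a small sketch. The dataframe is made up for illustration (not the course data); stringsAsFactors=TRUE is set explicitly so that Age starts out as a factor regardless of R version.

```r
# Hypothetical toy data; Age is deliberately stored as text
experiment <- data.frame(
  TestingRoom = c(2, 4, 3, 2),
  Age         = c('19', '22', '20', '31'),
  stringsAsFactors = TRUE    # make Age a factor, as in older versions of R
)

# Numeric -> factor: room numbers are labels, not quantities
experiment$TestingRoom <- as.factor(experiment$TestingRoom)
levels(experiment$TestingRoom)

# Character -> factor: wrap the ifelse() result in as.factor()
experiment$FaveRoom <- as.factor(
  ifelse(experiment$TestingRoom == 3, 'My favorite room', 'Not favorite'))

# Factor -> numeric, the WRONG way: returns the internal level codes
as.numeric(experiment$Age)

# Factor -> numeric, the right way: convert to character first
experiment$Age.Numeric <- as.numeric(as.character(experiment$Age))
experiment$Age.Numeric
```

Comparing the last two printed results shows why the intermediate as.character() step matters: without it, you get the position of each level rather than the age itself.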

NA
We might have run into some problems trying to change Age into a numerical variable...
NA means "not available":
- Characters that don't convert to numbers
- Missing data in a spreadsheet
- Invalid computations

NA
If we try to do computations on a set of numbers where any of them is NA, we get NA as a result:
- sd(experiment$Age.Numeric)
R wants you to think about how you want to treat these missing values

NA Solutions
To ignore the NAs when doing a specific computation, use na.rm=TRUE:
- mean(experiment$Age.Numeric, na.rm=TRUE)
To get a copy of the dataframe that excludes all rows with an NA (in any column):
- experiment.nonas <- na.omit(experiment)
Change NAs to something else with logical indexing:
- experiment[is.na(experiment$Age.Numeric)==TRUE, ]$Age.Numeric <- 23
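
The three NA strategies can be compared side by side in a sketch like the following. The data are invented for illustration (not the course dataset): one age is typed as a word so that the conversion produces an NA. The replacement value 23 is the one used on the slide; mean.age is a hypothetical name.

```r
# Hypothetical toy data: one Age was entered as a word
experiment <- data.frame(
  Subject = c('S01', 'S02', 'S03', 'S04'),
  Age     = c('19', '23', 'twenty', '21'),
  stringsAsFactors = TRUE
)

# 'twenty' cannot be converted, so it becomes NA (R also gives a warning,
# which is silenced here to keep the output tidy)
experiment$Age.Numeric <- suppressWarnings(
  as.numeric(as.character(experiment$Age)))

mean(experiment$Age.Numeric)     # NA, because one value is missing

# Solution 1: ignore NAs for this one computation
mean.age <- mean(experiment$Age.Numeric, na.rm=TRUE)

# Solution 2: a copy of the dataframe without any rows containing NA
experiment.nonas <- na.omit(experiment)

# Solution 3: replace the NAs with a specific value via logical indexing
experiment[is.na(experiment$Age.Numeric), 'Age.Numeric'] <- 23

experiment$Age.Numeric
```

is.na() already returns TRUE/FALSE, so the ==TRUE in the slide's version is optional; both forms select the same rows.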

Getting Help
Get help on a specific known function:
- ?sqrt
- ?write.csv
Try to find a function on a particular topic:
- ??logarithm
External resources:
- ling-r-lang-l <https://mailman.ucsd.edu/mailman/listinfo/ling-r-lang-l>
- Mailing list for using R in language research

Wrap-Up
Can use R for:
- Reading in data
- Descriptive statistics
- Subsetting data
- Creating new variables
Additional practice:
- Baayen (2008) p. 20
- Use the commands on pp. 1-2 to get the data
Next week: Mixed effects models
- Baayen, Davidson, & Bates (2008): Introduced mixed effects models to cog psych. Covers fixed and random effects