Lab 1. Introduction to R & SAS

R is free, open-source software. Get it here: http://tinyurl.com/yfet8mj for your own computer.

1.1. Using R like a calculator

Open R and type these commands into the R Console window. What you type is red, what the R software package returns is blue. If you enter something wrong or if you want to recall a previous command, use the up and down arrow keys to scroll through your command history. You can edit recalled commands.

To make it easier for you to identify what R commands are, I highlight them with courier font. To further distinguish arbitrary variable names and numbers, I highlight the arbitrary variable names and numbers in bold. You will have to modify those bolded names and numbers in order to adapt the code you learn here to your own data and problems in the future.

Working with numbers:

    2+3
    A=2+3
    A
    a    # oops, R is case sensitive
    B=7
    A+B
    C=A+B
    C

Some mathematical functions:

    sqrt(C)
    C^3
    log(C)
    log(C,10)
    abs(-5)

You can get help for any R function:

    ?log
    ?boxplot

Working with vectors:

    X=c(1,4,3,5,7)
    Y=c(5,7,9,4,8)
    mean(X)
    sd(X)
    X*10
    Z=Y+3
    Z
    boxplot(X,Y,Z)
    t.test(X,Y)
    t.test(X,Z)

Working with matrices (tables):

    K=as.data.frame(cbind(X,Y,Z))
    X=X*10
    K    # oops, nothing happened?
    K$X=K$X*10
    t(K)
    plot(K)

1.2. Working efficiently with R

It's not really convenient to work in R like this. You don't want to type your raw data into the R console, and you want to have an editable record of your statistical analysis or graphics scripts. In this section, we learn a basic setup that allows you to write and save R programs as text files, and load and save data from a working directory of your choice.

Make a folder, where all your files for a particular R session are saved:

- In Windows Explorer, navigate to a place you like, then right click and choose New > Folder, and name it (e.g.: C:\Lab1).

Create a Workspace Shortcut to R, which sets the new folder (C:\Lab1) as the working directory:

- If R is still open, close the program and cancel all warning messages and save prompts.
- Now re-open R, then save an empty workspace to your new directory: In the menu, go to File > Save Workspace > navigate to your new folder C:\Lab1 > choose the filename StartR.RData and hit Save. (Note that you need to include the extension .RData in the file name!)
- Close R, then re-open R by double-clicking the StartR file in your C:\Lab1 directory.
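
A quick way to confirm that the shortcut did its job, as a minimal sketch: after opening R through the StartR file, getwd() should report the folder that holds StartR.RData (C:\Lab1 in this lab), and list.files() should show its contents. On another computer or folder the printed path will differ.

```r
# Print the current working directory; after starting R via StartR
# this is expected to be the folder that holds StartR.RData
wd = getwd()
print(wd)

# List the files that R can open from there without specifying a path
print(list.files(wd))
```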

Create a script file with your R code:

- In the menu, go to File > New script
- In the new R-Editor window, write a few lines of code, for example:

      X=c(1,4,3,5,7)
      mean(X)

- Now save this script by hitting the save button: Choose a file name and add the .r extension, e.g. Script1.r, and by default this should save directly into your working directory C:\Lab1.
- Close R.

OK, while this may seem a little convoluted, this set-up is actually very convenient. Opening R with StartR will set the working directory to the location of the StartR file. You never have to worry about specifying working directories in your code and you can move files and folders to different computers without any problems. On every Windows-based computer that has R installed, you can just double-click the StartR file, load the script, and run your code.

You can also just quickly look at your code on any computer by opening the text file Script1.r in Notepad or Wordpad. In Windows Explorer, right-click Script1.r, choose Open with, then select Wordpad under Other Programs. If you put a checkmark at "Always use the selected program", you can open .r files by just double-clicking in the future.

Note that we will never use the workspace functionality of R. Simply ignore the "save workspace?" prompts.

Load and execute your script file in 10 seconds:

- Double-click the StartR file to open R
- Hit the Open File button, choose Script1.r and hit the Open button.
- Place your cursor anywhere in the first line of your script and hit Ctrl-R for Run (hold the Ctrl key and press the R key)
- The cursor automatically jumps to the next line and you can hit Ctrl-R again to execute the next command
- You can also highlight a larger section of the script (or all of it) and execute it with Ctrl-R

Of course, you can edit your script in R and re-save it:

- Make some changes
- Make sure that the script window is active (click on the script window if not)
- Choose File > Save or Save as.. to a new file

1.3. Importing and analyzing data in R

There are several ways to import data to R, but I highly recommend using Excel-generated CSV files (Comma Separated Values). There are several advantages to CSV files: (1) they are plain text files, which are good for long-term data archiving, (2) almost any software package (including R and SAS) can import them error-free, and (3) you can double-click CSV files to quickly open them in Excel for editing.

Create a CSV file in Excel:

- Open Excel and enter the dataset below:

      X  Y  Z
      1  5  8
      4  7  10
      3  9  12
      5  4  7
      7  8  11

- Save it as an Excel spreadsheet in your folder C:\Lab1\data.xls
- To save as a CSV file, choose Save as
- From the drop-down box "Save as type" choose CSV (Comma delimited) (.csv)
- Put it in the same location C:\Lab1\data.csv
- Dismiss all warning dialogues and close Excel

Import data to R for analysis:

- If you closed R, re-open it by double-clicking StartR
- Create a new script file by choosing File > New Script
- Save it with a new name File > Save as > C:\Lab1\correlation.r

Now let's enter some code (and re-save):

    dat=read.csv("data.csv")
    fix(dat)
    head(dat)
    str(dat)
    attach(dat)
    cor(X,Y)
    lm(Y~X)
    plot(Y~X)
    abline(lm(Y~X))
    plot(dat)

Execute your code line by line with Ctrl-R:

- fix() allows you to open the spreadsheet to see if the import worked fine. You can also edit a cell by double-clicking it. You need to close the spreadsheet window before you can continue executing code.
- head() and str() are alternate and perhaps better ways to check your import. head() is fast and does not stop the code execution, while str() gives you more information about the imported variable types, hinting at potential errors.
- attach() lets R know that this is the data table that we are working with right now, otherwise it won't find the variables X, Y and Z.
- cor() returns the Pearson correlation coefficient for two variables.
- lm() returns the regression equation. The symbol ~ always means "as a function of".
- plot(Y~X) gives us a scatter plot of Y as a function of X.
- abline() adds a straight line to the scatter plot, here the fitted regression line.
- plot(dat) creates scatter plots for all pairs of variables in the data table dat.

1.4. Working with your own data

We will normally use CSV file formats in this course, and I recommend that you do the same for your own datasets. Below are some useful rules that help with file management for this course, and for your own projects.

- Always keep your original spreadsheet (typically an Excel file) and include documentation in this file (where did the data come from? when was it collected? what are the rows, columns, variables?)
- Create a simplified CSV for analysis, which just has a single header row with simple variable names followed by data rows (no documentation or comments, no blank rows or columns). Save this CSV file with the same name in the same folder.
- In the CSV file, variable names can only contain letters, numbers, and underscores (A-Z, a-z, 0-9, _). Don't use spaces or symbols.
- Variable names should be no more than 8 characters long and must start with a letter. For clarity, I find it useful to choose variable names that have ALL UPPER CASE letters and use lower case letters for other R code.
- R and SAS may have trouble reading CSV files that were generated by other programs. In this case open and re-save the CSV files in Excel.
- Always put all the files you need for a particular analysis into one folder with a good descriptive name. Don't accumulate too many files in a folder, but rather make new folders for different purposes (a folder may for example contain the files: StartR.RData, data.xls, data.csv, anova.r, summary.r, graphics.r).
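
The naming rules above can be checked mechanically before an import. The following is a minimal sketch and not part of the original lab: check_names() is a hypothetical helper, and the example names are made up for illustration.

```r
# Hypothetical helper: TRUE for each name that starts with a letter,
# contains only letters, digits and underscores, and has at most 8 characters
check_names = function(x) {
  grepl("^[A-Za-z][A-Za-z0-9_]*$", x) & nchar(x) <= 8
}

# Made-up example names: the first two follow the rules, the last three do not
vars = c("HEIGHT", "DBH_2010", "tree height", "2ndYear", "PRECIPITATION")
ok = check_names(vars)
print(data.frame(name = vars, ok = ok))
```

The helper only reports which names break the rules; renaming is still best done in the spreadsheet, so that the CSV and the documentation stay in step.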

1.5. Importing other delimited text files

The read.csv() function is actually just one specific case of the more general read.table() function. You can specify the delimiter (for example, in Europe the semicolon rather than the comma is a more popular delimiter). Also, the tab-delimited text file is quite popular. The general syntax is the following.

    dat1=read.table("filename.txt", header=TRUE, sep=";", skip=0)

- header=TRUE means that you have a header row that contains the variable names. If that is not the case, set it to header=FALSE. If you don't have variable names, you can add them in with a second command after the read.table() command, e.g.:

      names(dat1)=c("varname1","varname2","varname3")

- sep=";" sets the character that marks column breaks, in this case a semicolon rather than a comma. For tab-delimited text files use sep="\t".
- skip=0 means that reading starts at the first line of the file. If there are several rows with comments at the top of the file, you can skip over them by specifying the number of rows to skip, e.g. skip=2
- Try to execute the command ?read.table for many more options that may at some point be of interest
- See if you can properly import the file aspen.txt, downloadable from the course website.

1.6. Importing data in other formats

Unfortunately, researchers work with a whole range of different data formats that you may need to import into R for analysis. Normally, the best way to handle this is to use the Text Import Wizard of Excel, available from the Open file dialogue. However, sometimes this is not possible because the file may be too large to be opened in Excel, or there may be too many files that you would have to open and re-save as CSVs. For these cases, we can use slightly different import routines directly into R.

A common format for data tables is the DBF file, developed for dBase II software, which was the first widely used database management system for microcomputers.
It is still widely used due to its adoption by ESRI in its popular ArcView and ArcGIS software. To import this file format, you need to install an extension for the R base package.

- From the menu in R, choose Packages, then Install packages.
- Next, you get a pop-up window, where you can choose a download site. They all have exactly the same contents, so you may pick something nearby, for example Canada (BC).
- Then, you get a pop-up window, where you can choose a package. Scroll down and double-click foreign, for an extension package that handles DBF files among others.

Now you are ready to import (and export) DBF files into R. The first command loads the extension package that we have just installed into the computer memory, then you can import and export DBF files with the subsequent commands. Download the aspen.dbf data from the course website.

    library(foreign)
    dat2=read.dbf("aspen.dbf")
    write.dbf(dat2,"aspen2.dbf")

If you need to import and export other exotic formats, go to http://rseek.org. This is essentially a version of Google restricted to the content of the R software project. If you search for dbf, you will see that the extension package foreign is the first hit, but there are many other options too. There is virtually nothing that R can't import via various extension packages developed by a huge open-source community.
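
If aspen.dbf is not at hand, the DBF export and import can be tried as a round trip on a temporary file. This is a minimal sketch under stated assumptions: the small table is made up, the object names dat_out, dat_in and dbf_file are placeholders, and the foreign package is assumed to be installed.

```r
library(foreign)   # provides read.dbf() and write.dbf()

# Small made-up table standing in for real data
dat_out = data.frame(X = c(1, 4, 3, 5, 7), Y = c(5, 7, 9, 4, 8))

# Write the table to a temporary DBF file, then read it back
dbf_file = tempfile(fileext = ".dbf")
write.dbf(dat_out, dbf_file)
dat_in = read.dbf(dbf_file)

# Inspect the re-imported table and compare one column with the original
str(dat_in)
print(all.equal(as.numeric(dat_in$X), dat_out$X))
```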

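Coming back to the read.table() options of section 1.5, they can also be tried without any course files. The sketch below is an illustration only: it writes a small semicolon-delimited file with two comment lines on top (all content is invented, and txt_file, dat1 and dat1b are placeholder names) and then reads it in two ways.

```r
# Write a small semicolon-delimited text file with two comment lines on top
# (made-up content, only for this illustration)
txt_file = tempfile(fileext = ".txt")
writeLines(c("Example file",
             "Values are invented",
             "SITE;HEIGHT;AGE",
             "A;12.5;30",
             "B;15.1;42",
             "C;9.8;25"), txt_file)

# Skip the two comment lines, use the next row as header, split columns at ";"
dat1 = read.table(txt_file, header = TRUE, sep = ";", skip = 2)
str(dat1)

# Same file, ignoring its header row: skip three lines and name the columns afterwards
dat1b = read.table(txt_file, header = FALSE, sep = ";", skip = 3)
names(dat1b) = c("varname1", "varname2", "varname3")
print(dat1b)
```

str() is the quickest way to see whether the delimiter was right: with a wrong sep, the whole row ends up in a single column.
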
1.7. Importing and analyzing data in SAS

You can't use SAS like a calculator, but running script files works much as it does in R. Instead of the extension .r, SAS scripts have the extension .sas, but they are really just text files. You can always right-click the file and choose Open with to see what's in the file with Notepad or Wordpad even if you don't have SAS installed on a particular computer.

The import code for CSVs in SAS is a little bit more complicated and you also have to specify the exact directory path. This is a pain if you move your files to other computers, or edit your directory names. Anyway, here is some code to try out:

Creating a script file for SAS:

- Start SAS from the start menu by either navigating to or searching for SAS 9.3 (English).
- When SAS opens you see three windows (file navigation on the left, the log window top right, and the script window bottom right).
- It's a good habit to first select and save that script window before you start working. If you made changes to the script, the script window will indicate this with a * next to the window header, and you can just hit the save button to save any changes.
- Enter this code to import a file (and customize what I highlighted in bold as needed):

      proc import out=dat
        datafile="C:\Lab1\data.csv"
        dbms=csv replace;
        getnames=yes;
        datarow=2;
        guessingrows=100000000;
      run;

- Select the script window with the mouse, and hit the run button to execute the script. The log window will point out any errors, so do keep an eye on it while things are running.
- You can see the imported file by double-clicking on the left panel Libraries > Work > DAT
- Close the DAT spreadsheet after confirming that the import went OK (otherwise code that uses this data table will not execute, just as in R)

Always use the import code above in SAS and never use the import wizard from the file menu. For most file formats, including CSV files, the import wizard is unfortunately very unreliable in SAS.
This stems from the fact that SAS will only look at the first couple of rows to determine the variable type. By setting the guessingrows parameter to a very large number (more than the number of rows in a data table), that problem is solved.

Some sample analysis in SAS: Add this to your script.

    proc reg data=dat;
      model Y=X;
    run;
    proc corr data=dat;
      var X Y Z;
    run;
    proc gplot data=dat;
      plot X*Y;
    run;

As in R, you can execute individual lines or sections by highlighting them with the mouse, and then hitting the run button. If you don't select a section, all the code is executed in SAS (in R, just a single line is executed). Watch the Log window for errors (in red). Use the bottom bar to navigate between Editor, Log, and Results windows. If you close a window, you can re-open it from the View menu.
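
For comparison, here are rough R counterparts of the first two SAS procedures, as a minimal sketch: the data values from section 1.3 are typed in directly so that the block runs without data.csv, and lm() and cor() stand in for proc reg and proc corr. The correspondence is approximate, since the SAS procedures print considerably more output by default.

```r
# Data from section 1.3, typed in so that the example runs on its own
dat = data.frame(X = c(1, 4, 3, 5, 7),
                 Y = c(5, 7, 9, 4, 8),
                 Z = c(8, 10, 12, 7, 11))

# proc reg; model Y=X;  ->  linear regression of Y on X
fit = lm(Y ~ X, data = dat)
print(coef(fit))

# proc corr; var X Y Z;  ->  matrix of Pearson correlations
r = cor(dat)
print(round(r, 3))
```

A counterpart of the gplot statement plot X*Y would be plot(X ~ Y, data = dat), which puts X on the vertical axis and Y on the horizontal axis.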