Computing With R Handout 1

Similar documents
Computing With R Handout 1

Chapter 3 - Displaying and Summarizing Quantitative Data

Physics 326G Winter Class 2. In this class you will learn how to define and work with arrays or vectors.

Getting started with simulating data in R: some helpful functions and how to use them Ariel Muldoon August 28, 2018

Lesson 18-1 Lesson Lesson 18-1 Lesson Lesson 18-2 Lesson 18-2

Applied Calculus. Lab 1: An Introduction to R

Chapter 3: Data Description - Part 3. Homework: Exercises 1-21 odd, odd, odd, 107, 109, 118, 119, 120, odd

Chapter 1. Looking at Data-Distribution

STA Module 2B Organizing Data and Comparing Distributions (Part II)

STA Learning Objectives. Learning Objectives (cont.) Module 2B Organizing Data and Comparing Distributions (Part II)

Exercise: Graphing and Least Squares Fitting in Quattro Pro

Chapter 2 Describing, Exploring, and Comparing Data

Practical 2: Using Minitab (not assessed, for practice only!)

The first few questions on this worksheet will deal with measures of central tendency. These data types tell us where the center of the data set lies.

Topic (3) SUMMARIZING DATA - TABLES AND GRAPHICS

Chapter 2: The Normal Distributions

CHAPTER 3: Data Description

ENCM 339 Fall 2017: Editing and Running Programs in the Lab

STA 570 Spring Lecture 5 Tuesday, Feb 1

Ch3 E lement Elemen ar tary Descriptive Descriptiv Statistics

An Introduction to the R Commander

No. of blue jelly beans No. of bags

MATH 112 Section 7.2: Measuring Distribution, Center, and Spread

appstats6.notebook September 27, 2016

Data Management Project Using Software to Carry Out Data Analysis Tasks

A Quick Introduction to R

Using a percent or a letter grade allows us a very easy way to analyze our performance. Not a big deal, just something we do regularly.

Exercise 1: Introduction to Stata

Page 1. Graphical and Numerical Statistics

Middle Years Data Analysis Display Methods

Statistics Worksheet 1 - Solutions

R for IR. Created by Narren Brown, Grinnell College, and Diane Saphire, Trinity University

Intro. Scheme Basics. scm> 5 5. scm>

Prepare a stem-and-leaf graph for the following data. In your final display, you should arrange the leaves for each stem in increasing order.

An introduction to plotting data

Week 7: The normal distribution and sample means

Introduction to R and R-Studio Toy Program #1 R Essentials. This illustration Assumes that You Have Installed R and R-Studio

CHAPTER 1. Introduction. Statistics: Statistics is the science of collecting, organizing, analyzing, presenting and interpreting data.

STAT 135 Lab 1 Solutions

MATH11400 Statistics Homepage

A. Incorrect! This would be the negative of the range. B. Correct! The range is the maximum data value minus the minimum data value.

Math 120 Introduction to Statistics Mr. Toner s Lecture Notes 3.1 Measures of Central Tendency

Chapter 3 Analyzing Normal Quantitative Data

Statistics can best be defined as a collection and analysis of numerical information.

Measures of Dispersion

(Refer Slide Time: 06:01)

Module 1: Introduction RStudio

Introduction to R: Part I

Today s Lecture. Factors & Sampling. Quick Review of Last Week s Computational Concepts. Numbers we Understand. 1. A little bit about Factors

An Introduction to R- Programming

> glucose = c(81, 85, 93, 93, 99, 76, 75, 84, 78, 84, 81, 82, 89, + 81, 96, 82, 74, 70, 84, 86, 80, 70, 131, 75, 88, 102, 115, + 89, 82, 79, 106)

Figure 1: The PMG GUI on startup

MAT 142 College Mathematics. Module ST. Statistics. Terri Miller revised July 14, 2015

Introduction to Minitab 1

The goal of this handout is to allow you to install R on a Windows-based PC and to deal with some of the issues that can (will) come up.

Minitab on the Math OWL Computers (Windows NT)

CSC 121: Computer Science for Statistics

Introduction to Matlab

STA Rev. F Learning Objectives. Learning Objectives (Cont.) Module 3 Descriptive Measures

18.02 Multivariable Calculus Fall 2007

Stat 290: Lab 2. Introduction to R/S-Plus

CHAPTER 2: Describing Location in a Distribution

STAT 20060: Statistics for Engineers. Statistical Programming with R

The first thing we ll need is some numbers. I m going to use the set of times and drug concentration levels in a patient s bloodstream given below.

Lab 1: Introduction, Plotting, Data manipulation

[1] CURVE FITTING WITH EXCEL

Lab #1: Introduction to Basic SAS Operations

Unit 7 Statistics. AFM Mrs. Valentine. 7.1 Samples and Surveys

Introduction to Spreadsheets

MATH 51: MATLAB HOMEWORK 3

Data Structures and Algorithms Dr. Naveen Garg Department of Computer Science and Engineering Indian Institute of Technology, Delhi.

IN-CLASS EXERCISE: INTRODUCTION TO R

MATH 1070 Introductory Statistics Lecture notes Descriptive Statistics and Graphical Representation

2 Sets. 2.1 Notation. last edited January 26, 2016

Graphing on Excel. Open Excel (2013). The first screen you will see looks like this (it varies slightly, depending on the version):

3. Data Analysis and Statistics

CHAPTER 2 DESCRIPTIVE STATISTICS

IT 403 Practice Problems (1-2) Answers

EN 001-4: Introduction to Computational Design. Matrices & vectors. Why do we care about vectors? What is a matrix and a vector?

1 Pencil and Paper stuff

Descriptive Statistics, Standard Deviation and Standard Error

Vocabulary. 5-number summary Rule. Area principle. Bar chart. Boxplot. Categorical data condition. Categorical variable.

CS1114: Matlab Introduction

INFO Wednesday, September 14, 2016

Section 2-2 Frequency Distributions. Copyright 2010, 2007, 2004 Pearson Education, Inc

SAMLab Tip Sheet #1 Translating Mathematical Formulas Into Excel s Language

Orientation Assignment for Statistics Software (nothing to hand in) Mary Parker,

SEER AKADEMI LINUX PROGRAMMING AND SCRIPTINGPERL 7

So..to be able to make comparisons possible, we need to compare them with their respective distributions.

Lecture 3 - Template and Vectors

FILE ORGANIZATION. GETTING STARTED PAGE 02 Prerequisites What You Will Learn

Here are a couple of warnings to my students who may be here to get a copy of what happened on a day that you missed.

Step 1: Syncing Your Library. After installing One Drive for Business, you will be prompted to Sync a library. NEXT: Select the Library URL

STA Module 4 The Normal Distribution

STA /25/12. Module 4 The Normal Distribution. Learning Objectives. Let s Look at Some Examples of Normal Curves

SAMLab Handout #5 Creating Graphs

Lab1: Introductory and Setup Activities

LAB #2: SAMPLING, SAMPLING DISTRIBUTIONS, AND THE CLT

Excel Primer CH141 Fall, 2017

Quantitative - One Population

Transcription:

Computing With R Handout 1 The purpose of this handout is to lead you through a simple exercise using the R computing language. It is essentially an assignment, although there will be nothing to hand in. If I hear no questions regarding this assignment, I will assume that you can successfully use R to the extent covered in this handout. Other assignments that will have results due will require the use of R and I will not ``go back'' and cover the material included here. So, please make certain you can follow and complete the little exercise included here. Installing R To install R on your computer, you will need to go to the official home page of The R Project for Statistical Computing, which can be found at http://www.r-project.org/. Depending on your operating system, you will have to follow the directions posted there, by first choosing your preferred CRAN mirror to download the software into your computer. This is fairly straightforward, but do let me know if you run into problems. You should start by installing the base package for now. As we go along and learn more about spatial statistics and need to install different packages, I will show you how to do it. Getting Into R To access the R language, go to room 322 Snedecor Hall or another computing lab that has R installed, or a computer on which you have downloaded R from one of the distribution locations. If this is a Windows machine there will probably be an ``R'' icon. Simply double click. If you are using a Linux or Apple machine there will also likely be an icon. If not, seek assistance in starting R as needed. 1

Once you have started R you should see the following image: The > is the command line prompt where you will enter commands. Entering Data by Hand We will not worry now about all the possible ways to import data into R and will cover different object structures (e.g., matrix, data frame, list) as needed through the semester. The basic assignment of contents (e.g., numbers) to objects (i.e., named variables) in R is to type at the command line. For example, to assign the value of 5 to an object called num1, type num1<-5 at the prompt. Then typing num1 should return the value 5. Do this, and your screen should look like: 2

To create a vector, use the concatenate function c with numbers separated by commas. Type vec1<-c(5,10,12,56,72,33). Type vec2<-c(2,14,100,22,48,86). To concatenate these vectors type vec3<-c(vec1,vec2). To determine the length of a vector type length(vec1). To produce the sum of the elements of vec3 type sum(vec3). To obtain the dot product of vec1 and vec2 type sum(vec1*vec2). To obtain the dot product a more direct way type vec1%*%vec2. To list the objects that you have created type ls(). All of this produces a screen such as: 3

To get rid of objects type rm(num1,vec1,vec2,vec3). Simulating From a Normal Distribution Simulate 100 values from a normal distribution with the following command, as save in an object called j (I use such objects for things I don t really want to save as in junk, so I often end up with objects j, j1, j2, jj, etc. and I don t care if these get overwritten with something else after I m done with them). > j <- rnorm(100, mean=25, sd=4) The above command simulates 100 values from a normal distribution with mean (expected value) 25 and standard deviation 4. We ll talk more about all of these topics (normal distributions, expected values, and standard deviations) later, but I thought you might have heard of them. Check this by length(j) and you should get back the value 100. Now produce one of the basic summaries of numerical data called a Stem and Leaf Plot, and compute values of what is called a Five Number Summary (although the version in R actually is a 6 number summary) with commands: 4

> stem(j) > summary(j) You should now see (your numbers will be different since they have just been randomly generated, but they should be similar): To figure out for yourself what a stem-and-leaf plot is, compare the display to the vector of values (just type j ). It might be easier if we first ordered the values in j with > j[order(j)] Note that this leaves the object j unchanged. If we want to save the ordered version of j (again as the object j) we would type > j<-j[order(j)] Compare the ordered values of the object j with the stem-and-leaf display and make sure you understand what this display is doing. Out of the 100 values in j, how many are less than or equal to the value in the summary given as 1 st Qu (this is the first quartile). Find out with (using your own value of the first quartile): 5

> sum(j <= 21.68) The second quartile is also called the median and is given in summary by M. Figure out what the median is, and similarly the third quartile. To get the sample mean (or sample average) and the sample variance, use > mean(j) > var(j) The standard deviation is the square root of the variance, so compare the value of the sample mean to the 25 you used to simulate the values and the square root of the variance to the value of 4 you used to simulate (try sqrt(var(j)) to do this). Histograms and the Normal Density Curve Another common graphical display of data (technically the empirical distribution ) you have probably heard of is the Histogram. To get a histogram of the values in j use the command > hist(j, prob=t, nclass=5) A graphics window should automatically open and you will now be looking at (after perhaps re-sizing the graphics window): 6

This is a pretty crude picture of the empirical distribution (compare to the stem-and-leaf plot turned on its side). Increase the number of bins or classes in the histogram as > hist(j, prob=t, nclass=15) This may actually be too many (the top is pretty ragged). Try something intermediate, such as 10 classes. There is no correct number of classes to use. There are some rules of thumb, but they don t necessarily apply to every data set. In fact, your data may look nicer with 15 classes than 10, or with 20 than 15, or whatever. The vertical scale of the bars is essentially relative frequency of values falling into the bins or classes given by the bars. The option prob=t in the hist command arranges things so that the total area of the bars in the histogram sum to 1 (why this is a good idea comes later in the class). 7

My values in the object j using 10 classes for the histogram looks like: When you are finished using R, at the command line type >q() and answer Yes only if you wish to have the objects you created today stored in R for your future use. If you type No, your work will simply be discarded. 8