Computing With R Handout 1

Similar documents
Computing With R Handout 1

Physics 326G Winter Class 2. In this class you will learn how to define and work with arrays or vectors.

Intro. Scheme Basics. scm> 5 5. scm>

Applied Calculus. Lab 1: An Introduction to R

Getting started with simulating data in R: some helpful functions and how to use them Ariel Muldoon August 28, 2018

Chapter 3: Data Description - Part 3. Homework: Exercises 1-21 odd, odd, odd, 107, 109, 118, 119, 120, odd

CHAPTER 3: Data Description

Using a percent or a letter grade allows us a very easy way to analyze our performance. Not a big deal, just something we do regularly.

Data Management Project Using Software to Carry Out Data Analysis Tasks

CSC 121: Computer Science for Statistics

STA 570 Spring Lecture 5 Tuesday, Feb 1

STA Module 2B Organizing Data and Comparing Distributions (Part II)

STA Learning Objectives. Learning Objectives (cont.) Module 2B Organizing Data and Comparing Distributions (Part II)

appstats6.notebook September 27, 2016

LAB #2: SAMPLING, SAMPLING DISTRIBUTIONS, AND THE CLT

Lesson 18-1 Lesson Lesson 18-1 Lesson Lesson 18-2 Lesson 18-2

MAT 142 College Mathematics. Module ST. Statistics. Terri Miller revised July 14, 2015

Practical 2: Using Minitab (not assessed, for practice only!)

IT 403 Practice Problems (1-2) Answers

Chapter 1. Looking at Data-Distribution

MATH 112 Section 7.2: Measuring Distribution, Center, and Spread

Week 7: The normal distribution and sample means

Unit 7 Statistics. AFM Mrs. Valentine. 7.1 Samples and Surveys

Math 120 Introduction to Statistics Mr. Toner s Lecture Notes 3.1 Measures of Central Tendency

A. Incorrect! This would be the negative of the range. B. Correct! The range is the maximum data value minus the minimum data value.

Prepare a stem-and-leaf graph for the following data. In your final display, you should arrange the leaves for each stem in increasing order.

Introduction to Matlab

Statistics can best be defined as a collection and analysis of numerical information.

An introduction to plotting data

No. of blue jelly beans No. of bags

CHAPTER 1. Introduction. Statistics: Statistics is the science of collecting, organizing, analyzing, presenting and interpreting data.

Statistics... Basic Descriptive Statistics

Statistics... Statistics is the study of how best to. 1. collect data. 2. summarize/describe data. 3. draw conclusions/inferences from data

Chapter 2 Describing, Exploring, and Comparing Data

CS1114: Matlab Introduction

Interactive Math Glossary Terms and Definitions

Chapter 3 - Displaying and Summarizing Quantitative Data

Chapter 2: The Normal Distributions

a. divided by the. 1) Always round!! a) Even if class width comes out to a, go up one.

MAT 003 Brian Killough s Instructor Notes Saint Leo University

Hot X: Algebra Exposed

Introduction to R and R-Studio Toy Program #1 R Essentials. This illustration Assumes that You Have Installed R and R-Studio

Ch3 E lement Elemen ar tary Descriptive Descriptiv Statistics

ENCM 339 Fall 2017: Editing and Running Programs in the Lab

Descriptive Statistics, Standard Deviation and Standard Error

Univariate Statistics Summary

Chapter 3 Analyzing Normal Quantitative Data

Exercise: Graphing and Least Squares Fitting in Quattro Pro

STA Module 4 The Normal Distribution

STA /25/12. Module 4 The Normal Distribution. Learning Objectives. Let s Look at Some Examples of Normal Curves

Topic (3) SUMMARIZING DATA - TABLES AND GRAPHICS

Middle Years Data Analysis Display Methods

Fathom Dynamic Data TM Version 2 Specifications

CHAPTER 2: Describing Location in a Distribution

CS115 - Module 10 - General Trees

Meeting 1 Introduction to Functions. Part 1 Graphing Points on a Plane (REVIEW) Part 2 What is a function?

Revising CS-M41. Oliver Kullmann Computer Science Department Swansea University. Linux Lab Swansea, December 13, 2011.

STAT STATISTICAL METHODS. Statistics: The science of using data to make decisions and draw conclusions

EN 001-4: Introduction to Computational Design. Matrices & vectors. Why do we care about vectors? What is a matrix and a vector?

1 Pencil and Paper stuff

STA Rev. F Learning Objectives. Learning Objectives (Cont.) Module 3 Descriptive Measures

2.1: Frequency Distributions and Their Graphs

Intro to Programming. Unit 7. What is Programming? What is Programming? Intro to Programming

Stat 290: Lab 2. Introduction to R/S-Plus

The first few questions on this worksheet will deal with measures of central tendency. These data types tell us where the center of the data set lies.

EE 301 Signals & Systems I MATLAB Tutorial with Questions

CS1114: Matlab Introduction

CHAPTER 6. The Normal Probability Distribution

SPSS 11.5 for Windows Assignment 2

SAMLab Tip Sheet #1 Translating Mathematical Formulas Into Excel s Language

R Programming Basics - Useful Builtin Functions for Statistics

MATH11400 Statistics Homepage

So..to be able to make comparisons possible, we need to compare them with their respective distributions.

In this lesson you are going to create a drawing program similar to Windows Paint. 1. Start with a new project and remove the default cat sprite.

COMP 110 Project 1 Programming Project Warm-Up Exercise

Chapter 6: DESCRIPTIVE STATISTICS

ELEC4042 Signal Processing 2 MATLAB Review (prepared by A/Prof Ambikairajah)

Measures of Dispersion

Prentice Hall Mathematics: Course Correlated to: Ohio Academic Content Standards for Mathematics (Grade 7)

Chapter 2. Descriptive Statistics: Organizing, Displaying and Summarizing Data

Exercise 1: Introduction to Stata

> glucose = c(81, 85, 93, 93, 99, 76, 75, 84, 78, 84, 81, 82, 89, + 81, 96, 82, 74, 70, 84, 86, 80, 70, 131, 75, 88, 102, 115, + 89, 82, 79, 106)

Exercise 2: Automata Theory

Statistical Graphics

Animations involving numbers

Introduction to Minitab 1

(Refer Slide Time: 06:01)

Animations that make decisions

Page 1. Graphical and Numerical Statistics

Chapter 3: Data Description Calculate Mean, Median, Mode, Range, Variation, Standard Deviation, Quartiles, standard scores; construct Boxplots.

CITS2401 Computer Analysis & Visualisation

Data and Function Plotting with MATLAB (Linux-10)

Our Changing Forests Level 2 Graphing Exercises (Google Sheets)

CMSC 201 Spring 2017 Project 1 Number Classifier

n = 1 What problems are interesting when n is just 1?

DOING MORE WITH EXCEL: MICROSOFT OFFICE 2013

Creating a Box-and-Whisker Graph in Excel: Step One: Step Two:

Statistics Worksheet 1 - Solutions

R practice. Eric Gilleland. 20th May 2015

Tutorial 3: Probability & Distributions Johannes Karreth RPOS 517, Day 3

Transcription:

Computing With R Handout 1 Getting Into R To access the R language (free software), go to a computing lab that has R installed, or a computer on which you have downloaded R from one of the distribution locations. If this is a Windows machine there will probably be an ``R'' icon. Simply double click. If you are using a Linux machine there will also likely be an icon. If not, seek assistance in starting R as needed. Once you have started R you should see the following image: The > is the command line prompt where you will enter commands. Entering Data by Hand We will not worry about all the possible ways to import data into R and will cover different object structures (e.g., matrix, data frame, list) as needed through the semester. But the basic assignment of contents (e.g., numbers) to objects (i.e., named variables) in R is to type at the command line. For example, to assign the value of 5 to an object 1

called num1, type num1<-5 at the prompt. Then typing num1 should return the value 5. Do this, and your screen should look like: To create a vector, use the concatenate function c with numbers separated by commas. Type vec1<-c(5,10,12,56,72,33). Type vec2<-c(2,14,100,22,48,86). ). To concatenate these vectors type vec3<-c(vec1,vec2). To determine the length of a vector type length(vec1). To produce the sum of the elements of vec3 type sum(vec3). To obtain the dot product of vec1 and vec2 type sum(vec1*vec2). To obtain the dot product a more direct way type vec1%*%vec2. To list the objects that you have created type ls(). All of this produces a screen such as: 2

To get rid of objects type rm(num1,vec1,vec2,vec3). Simulating From a Normal Distribution Simulate 100 values from a normal distribution with the following command, as save in an object called j (I use such objects for things I don t really want to save as in junk, so I often end up with objects j, j1, j2, jj, etc. and I don t care if these get overwritten with something else after I m done with them). > j <- rnorm(100, mean=25, sd=4) The above command simulates 100 values from a normal distribution with mean (expected value) 25 and standard deviation 4. We ll talk more about all of these topics (normal distributions, expected values, and standard deviations) later, but I thought you might have heard of them. Check this by length(j) and you should get back the value 100. Now produce one of the basic summaries of numerical data called a Stem and Leaf Plot, and compute values of what is called a Five Number Summary (although the version in R actually is a 6 number summary) with commands: 3

> stem(j) > summary(j) You should now see (your numbers will be different since they have just been randomly generated, but they should be similar): To figure out for yourself what a stem-and-leaf plot is, compare the display to the vector of values (just type j ). It might be easier if we first ordered the values in j with > j[order(j)] Note that this leaves the object j unchanged. If we want to save the ordered version of j (again as the object j) we would type > j<-j[order(j)] Compare the ordered values of the object j with the stem-and-leaf display and make sure you understand what this display is doing. Out of the 100 values in j, how many are less than or equal to the value in the summary given as 1 st Qu (this is the first quartile). Find out with (using your own value of the first quartile): 4

> sum(j <= 21.68) The second quartile is also called the median and is given in summary by M. Figure out what the median is, and similarly the third quartile. To get the sample mean (or sample average) and the sample variance, use > mean(j) > var(j) The standard deviation is the square root of the variance, so compare the value of the sample mean to the 25 you used to simulate the values and the square root of the variance to the value of 4 you used to simulate (try sqrt(var(j)) to do this). Histograms and the Normal Density Curve Another common graphical display of data (technically the empirical distribution ) you have probably heard of is the Histogram. To get a histogram of the values in j use the command > hist(j, prob=t, nclass=5) A graphics window should automatically open and you will now be looking at (after perhaps re-sizing the graphics window): 5

This is a pretty crude picture of the empirical distribution (compare to the stem-and-leaf plot turned on its side). Increase the number of bins or classes in the histogram as > hist(j, prob=t, nclass=15) This may actually be too many (the top is pretty ragged). Try something intermediate, such as 10 classes. There is no correct number of classes to use. There are some rules of thumb, but they don t necessarily apply to every data set. In fact, your data may look nicer with 15 classes than 10, or with 20 than 15, or whatever. The vertical scale of the bars is essentially relative frequency of values falling into the bins or classes given by the bars. The option prob=t in the hist command arranges things so that the total area of the bars in the histogram sum to 1 (why this is a good idea comes later in the class). My values in the object j using 10 classes for the histogram looks like: 6

Now lets plot the theoretical density for a normal distribution with expected value 10 and standard deviation 4 (or variance 16) over the top of the empirical distribution of the 100 values in j as depicted by the histogram. We ll talk about what exactly density functions are at great length later, so don t worry if this is not clear to you yet. Just follow the steps below. To plot the theoretical density function over the histogram we use several steps: 1. Create a vector of values at which to evaluate the density function. Do this with the following syntax (but you may need values other than the 160 and 350): > us<-0.1*(160:350) What the object us contains are equally spaced (by 0.1) values from 16 to 35. I picked 16 because the minimum value of my j was 16.48 (see the results of the summary command above) and the maximum value of my j was 34.73. The part of the command 160:350 generates integers from 160 to 350 as a vector. This is then multiplied by 0.1 with 0.1*(160:350). 2. Evaluate the normal density function with expected value 25 and standard deviation 4 at each value in the vector us and save as a new object (say j2 ): 7

j2<- dnorm(us, mean=25, sd=4) 3. Plot the density as a curve by drawing lines through the points contained in the vectors us and j2 as > lines(us, j2) You should now have something like: Try this exercise again varying 1. The number of values simulated from the normal distribution (the first argument in rnorm(... ) 2. The standard deviation of the normal distribution (change the 4 to 2 or 20 or...) 3. The number of bins or classes used to construct the histogram. 8