Statistical Data Analysis: R Tutorial

Dr A. J. Bevan (a.j.bevan at qmul.ac.uk)
August 29, 2012

Contents
1 Getting started
2 Basic calculations
3 More advanced calculations
4 Plotting data
5 The quantmod package

1 Getting started

R is an open source statistical programming language which can be downloaded from the following website: http://www.r-project.org/

This tool may be useful for this course, and some sectors value familiarity with R. You may wish to use these tutorials to introduce yourself to this package to bolster your CV.

In these notes, commands to be entered are shown on indented lines starting with the R prompt >, and the output they produce is shown on the lines that follow. To start R on a computer that already has this programme installed, type

    R

at the command line. This should result in the following output being displayed on the command line terminal:

    R version 2.15.0 (2012-03-30)
    Copyright (C) 2012 The R Foundation for Statistical Computing
    ISBN 3-900051-07-0
    Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

    R is free software and comes with ABSOLUTELY NO WARRANTY.
    You are welcome to redistribute it under certain conditions.
    Type 'license()' or 'licence()' for distribution details.

    Natural language support but running in an English locale

    R is a collaborative project with many contributors.
    Type 'contributors()' for more information and
    'citation()' on how to cite R or R packages in publications.

    Type 'demo()' for some demos, 'help()' for on-line help, or
    'help.start()' for an HTML browser interface to help.
    Type 'q()' to quit R.

To quit an R session, simply enter q() at the prompt. More information on the use of R can be found online at the R project website or by performing a Google search for R. In addition, there are a number of books available on the use of this package.

2 Basic calculations

You can perform simple arithmetic in R in a logical way using operators akin to those of any other programming language. For example:

    > 3*5
    [1] 15
    > 3/2
    [1] 1.5
    > 7-2+1
    [1] 6

In addition, one can define arrays of numbers (vectors, in R's terminology) and perform simple calculations on these sets of data. The following example creates an ordered list of data in an array with variable name x, and computes the arithmetic mean, variance and standard deviation for these data.

    > x <- c(1,2,3,4,5,6)
    > print(x)
    [1] 1 2 3 4 5 6
    > mean(x)
    [1] 3.5
    > var(x)
    [1] 3.5
    > sd(x)
    [1] 1.870829
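As a small illustration of the array handling described above, the following sketch shows the two assignment forms used in these notes (<- and =) and some element-by-element arithmetic on the same data. The variable names y and doubled are introduced here purely for illustration and are not part of the original notes.

```r
# Two equivalent ways of assigning the same data to a variable
x <- c(1, 2, 3, 4, 5, 6)
y = c(1, 2, 3, 4, 5, 6)
print(identical(x, y))   # TRUE: both forms create the same vector

# Arithmetic is applied to every element at once, with no explicit loop
doubled <- 2 * x
print(doubled)

# A few more summaries of the same data
print(length(x))   # number of elements
print(sum(x))      # sum of all elements
print(range(x))    # smallest and largest value
```

Note that the assignment arrow is typed as the two characters < and - with no space between them; written with a space, R would read it as a "less than" comparison rather than an assignment.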

There is a command line help function that enables you to find out more about a particular R command. For example, if you wish to query the help for the sd function, simply enter help(sd) at the R prompt. This will result in the following output.

    Standard Deviation

    Description:
         This function computes the standard deviation of the values in x.
         If na.rm is TRUE then missing values are removed before
         computation proceeds.

    Usage:
         sd(x, na.rm = FALSE)
    ...

Exercises
1. Define the variables θ = 3.2 and φ = 1.7 and print out these variables.
2. Compute θ + φ.
3. Compute θ − φ.
4. Compute θφ.
5. Compute θ/φ.
6. Define an array x containing the prime numbers between 1 and 20.
7. Print out the array x.
8. Compute the arithmetic mean, variance and standard deviation of x.

3 More advanced calculations

It is possible to perform more advanced calculations using additional functionality in R. For example, if you wanted to compute the standard deviation for some data x without using the built-in function sd(x), you could do so via the following steps:

    > x = c(0.5,0.9,1.2,1.5,1.8,2.0,3.4,4.1,5.0,5.1,7.5,8.5)
    > meanx = mean(x)
    > stdevx = sqrt( sum( (x - meanx)^2 ) / (sum( !is.na(x) ) - 1) )

The first line defines the data sample, the second computes the arithmetic mean of the data using the mean function, and the third line computes the sum over all $(x_i - \bar{x})^2$ terms and normalises that by $N - 1$. The term !is.na(x) is a boolean check that the value of each data point is defined as a number, as opposed to NaN (Not a Number) or the missing-value marker NA, so that the denominator is simply the number of valid data, less one (to include Bessel's correction for an unbiased variance).

Several features of R have been introduced in this example:

- Operations on sets of data stored in an array are performed without the need to iterate through the elements. This can be seen with the term x - meanx, where the scalar value meanx is subtracted from each element in x.

- The sum over an array of data is computed using the sum function.
- One can check whether a number is well defined using is.na(x).

If you've not encountered NaN before in computing, the following provides a simple illustration:

    > y = 0/0
    > print(y)
    [1] NaN
    > is.na(y)
    [1] TRUE

A number is NaN if it is ill defined, such as the ratio of zero divided by zero. (In R, a non-zero number divided by zero gives Inf or -Inf rather than NaN.)

This brief introduction should provide you with a good starting point for using R to analyse data. There is a lot more that you can do with this programme, and some additional information that you may find useful for project work is introduced in the rest of this brief note. If you want to learn more about how to use R, you are advised to look online or check the library for books on this topic.

Exercises
1. Using the data set Ω = {1, 1.2, 1.5, 1.7, 2.3}, compute the arithmetic mean without resorting to the built-in functions within R.
2. Using the data set Ω = {1, 1.2, 1.5, 1.7, 2.3}, compute the standard deviation without resorting to the built-in functions within R.
3. Using the data set Ω = {1, 1.2, 1.5, 1.7, 2.3}, compute the skew without resorting to the built-in functions within R.

4 Plotting data

The plot command can be used to plot data. However, one should always strive to produce a meaningful figure with appropriate axis labels so that it is obvious what has been plotted. Given some data stored in an array, one can simply plot the data as follows:

    > x <- c(1,2,3,4,5,6)
    > plot(x)

This will produce a simple graph with the index number along the horizontal axis and the value of the data at each index plotted on the vertical axis. Having produced such a plot, the next step is to format the axis labels and title of the plot so that it is suitable for external consumption. At the time of plotting one can specify a number of options, including main and sub-titles as well as axis labels.
For example,

    > x <- c(1,2,3,4,5,6)
    > plot(x, main="Main Title", sub="A sub title", xlab="x axis label", ylab="y axis label")

will produce a plot like the one shown in Figure 1.

It is possible to print out pdf files in R by opening a file with the pdf() command (by default, if no filename is specified, a file called Rplots.pdf will be created). Having opened the pdf file, one can plot data and finally close the file. The file is closed using the dev.off() command.

(Footnote to "suitable for external consumption": i.e. suitable for someone other than the creator of the plot to view.)

The following example creates a pdf file of the plot shown in Figure 1.

    > pdf()
    > plot(x, main="Main Title", sub="A sub title", xlab="x axis label", ylab="y axis label")
    > dev.off()

[Figure 1: An example plot created using the R commands given in the text.]

Exercises
1. Create a plot of the following data: Ω = {1, 1.2, 1.5, 1.7, 2.3} as a function of the index of each data point.
2. Add axis labels to the plot you've created.
3. Print the plot to a file. Try giving this file a name (pass a string argument to the pdf command).

5 The quantmod package

The quantmod package is a Quantitative Financial Modelling & Trading Framework for R. This package is described in full on the package web page: http://www.quantmod.com/

In order to use quantmod, the package first has to be installed. This can be done using

    > install.packages("quantmod")

Once installed, it is possible to download share prices for further manipulation. For example, if you wanted to view the share price of Lloyds, the trading symbol on the FTSE100 is LLOY.L. One can download share prices into a local variable called LLOY.L and compute the return on this stock as the local variable LLOYDS using the following code snippet.

    > require(quantmod)
    > setDefaults(getSymbols, src="yahoo")
    > getSymbols("LLOY.L")
    > plot(LLOY.L)
    > LLOYDS <- dailyReturn(LLOY.L)
    > plot(LLOYDS)

The function plot can be used to plot the data on screen, as was the case for the share price and the daily return above. Figure 2 shows the distributions of share price and daily return plotted for the requested period (back to January 2007, when the yahoo data became available).

[Figure 2: The share price (left) and daily return (right) for Lloyds TSB. The data were extracted using the quantmod package.]

The quantmod package has a lot more functionality than this simple illustration, and anyone interested in learning more should refer to the documentation and examples found on the web. More information (including an extensive source of documentation) can be found on the R webpage.
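To make the idea of a daily return concrete without needing an internet connection, the following sketch computes returns by hand in base R. It assumes the simple (arithmetic) definition, in which the return on day t is (P_t - P_{t-1}) / P_{t-1}; the prices are made-up values for illustration only, not real LLOY.L data, and the variable names price and ret are hypothetical.

```r
# Hypothetical closing prices on four consecutive days (illustrative values only)
price <- c(50.0, 51.0, 49.98, 52.479)

# Arithmetic daily return: change in price divided by the previous day's price.
# diff(price) gives P_t - P_{t-1}; price[-n] drops the last price, leaving P_{t-1}.
n <- length(price)
ret <- diff(price) / price[-n]

print(round(ret, 4))

# The returns can be plotted against their index, as in Section 4
plot(ret, main="Daily return", xlab="Day index", ylab="Return")
```

With four prices there are only three returns, since the first day has no previous price to compare against. How a package such as quantmod treats that first day is a detail to check in its own documentation.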