DSCI 325: Handout 24 Introduction to Writing Functions in R Spring 2017

Similar documents
DSCI 325: Handout 4 If-Then Statements in SAS Spring 2017

Statistics 251: Statistical Methods

DSCI 325: Handout 3 Creating and Redefining Variable in SAS Spring 2017

Correlation. January 12, 2019

Writing Functions! Part I!

Statistical Programming Camp: An Introduction to R

Density Curve (p52) Density curve is a curve that - is always on or above the horizontal axis.

MATH& 146 Lesson 8. Section 1.6 Averages and Variation

ENGR Fall Exam 1

Getting Started in R

CSI33 Data Structures

CSE 332 Spring 2013: Midterm Exam (closed book, closed notes, no calculators)

Midterm Examination , Fall 2011

Statistical Programming with R

A. Incorrect! This would be the negative of the range. B. Correct! The range is the maximum data value minus the minimum data value.

EPIB Four Lecture Overview of R

Getting Started in R

Introduction to R, Github and Gitlab

Applied Calculus. Lab 1: An Introduction to R

simpler Using R for Introductory Statistics

What You ll Learn Today

A Brief Introduction to Scheme (I)

Statistics Case Study 2000 M. J. Clancy and M. C. Linn

Today Function. Note: If you want to retrieve the date and time that the computer is set to, use the =NOW() function.

Today Function. Note: If you want to retrieve the date and time that the computer is set to, use the =NOW() function.

Statistics 133 Midterm Exam

Condition-Controlled Loop. Condition-Controlled Loop. If Statement. Various Forms. Conditional-Controlled Loop. Loop Caution.

S Basics. Statistics 135. Autumn Copyright c 2005 by Mark E. Irwin

CS164: Midterm I. Fall 2003

Handling Missing Values

Experiment 3 Microsoft Excel in Scientific Applications I

Assignments. Math 338 Lab 1: Introduction to R. Atoms, Vectors and Matrices

MAT 090 Brian Killough s Instructor Notes Strayer University

Programming R. Manuel J. A. Eugster. Chapter 4: Writing your own functions. Last modification on May 15, 2012

Stat 4510/7510 Homework 4

ROSE-HULMAN INSTITUTE OF TECHNOLOGY

Use the scantron sheet to enter the answer to questions (pages 1-6)

Advanced Econometric Methods EMET3011/8014

DSCI 325: Handout 9 Sorting and Options for Printing Data in SAS Spring 2017

SPSS TRAINING SPSS VIEWS

Islamic University of Gaza Faculty of Engineering Computer Engineering Department

CS150 - Assignment 5 Data For Everyone Due: Wednesday Oct. 16, at the beginning of class

Name CptS 111, Spring 2018 Programming Assignment #5 Date Name of program Brief description and/or purpose of the program; include sources

Lecture 3: Basics of R Programming

1 Dynamic Programming

Lecture 10. Memory in Python

Lecture 3: Basics of R Programming

DSCI 325: Handout 18 Introduction to Graphics in R

Interactive Math Glossary Terms and Definitions

Monte Carlo Integration

Graphics Calculator Skill Drill Series

CIS 110 Introduction To Computer Programming. February 29, 2012 Midterm

CIS 110 Introduction to Computer Programming. 17 December 2012 Final Exam

Getting started with simulating data in R: some helpful functions and how to use them Ariel Muldoon August 28, 2018

S CHAPTER return.data S CHAPTER.Data S CHAPTER

RDGL Reference Manual

Interval Estimation. The data set belongs to the MASS package, which has to be pre-loaded into the R workspace prior to use.

MATH 112 Section 7.2: Measuring Distribution, Center, and Spread

Q1: Functions / 33 Q2: Arrays / 47 Q3: Multiple choice / 20 TOTAL SCORE / 100 Q4: EXTRA CREDIT / 10

050 0 N 03 BECABCDDDBDBCDBDBCDADDBACACBCCBAACEDEDBACBECCDDCEA

Module 10: Imperative Programming, Modularization, and The Future

Looping Subtasks. We will examine some basic algorithms that use the while and if constructs. These subtasks include

COS 116 The Computational Universe Laboratory 8: Digital Logic II

Math 355: Linear Algebra: Midterm 1 Colin Carroll June 25, 2011

Package cgh. R topics documented: February 19, 2015

UNIVERSITY OF TORONTO Faculty of Arts and Science. Midterm Sample Solutions CSC324H1 Duration: 50 minutes Instructor(s): David Liu.

The linear mixed model: modeling hierarchical and longitudinal data

Ch. 3 Review. Name: Class: Date: Short Answer

Uploading Scantron Scores to CourseWeb

Math 214 Introductory Statistics Summer Class Notes Sections 3.2, : 1-21 odd 3.3: 7-13, Measures of Central Tendency

Data 8 Final Review #1

Administrativia. CS107 Introduction to Computer Science. Readings. Algorithms. Expressing algorithms

Introduction to MATLAB Programming

Chislehurst and Sidcup Grammar School Mathematics Department Year 9 Programme of Study

School of Agriculture and Natural Resources

1 Pencil and Paper stuff

CSCI Compiler Design

Day 4 Percentiles and Box and Whisker.notebook. April 20, 2018

Repetition Structures

COS 226 Midterm Exam, Spring 2009

Concepts Review. 2. A program is the implementation of an algorithm in a particular computer language, like C and C++.

Applying Improved Random Forest Explainability (RFEX 2.0) steps on synthetic data for variable features having a unimodal distribution

Grade 6 Math Vocabulary

CS 455 Midterm Exam 1 Spring 2011 [Bono] Feb. 17, 2011

Part I { Getting Started & Manipulating Data with R

Computing With R Handout 1

CSE341 Spring 2017, Midterm Examination April 28, 2017

Survey of Math: Excel Spreadsheet Guide (for Excel 2016) Page 1 of 9

Getting started with MATLAB

S206E Lecture 13, 5/22/2016, Grasshopper Math and Logic Rules

E7 University of California, Berkeley Fall Midterm Exam 10/07/2016 version:

9-1 GCSE Maths. GCSE Mathematics has a Foundation tier (Grades 1 5) and a Higher tier (Grades 4 9).

What is a Function? EF102 - Spring, A&S Lecture 4 Matlab Functions

C++ Programming Language Lecture 2 Problem Analysis and Solution Representation

Common R commands used in Data Analysis and Statistical Inference

R Programming: Worksheet 6

Object Oriented Programming. Java-Lecture 6 - Arrays

RSG online program documentation

Heteroskedasticity and Homoskedasticity, and Homoskedasticity-Only Standard Errors

Grade 9 Surface Area and Volume

Transcription:

DSCI 325: Handout 24 Introduction to Writing Functions in R Spring 2017 We have already used several existing R functions in previous handouts. For example, consider the Grades dataset. Once the data frame has been attached in R, we can call the mean() function as follows to compute the average of the Final Exam scores. > mean(final) [1] 69.1791 We can also call the median() function. > median(final) [1] 72 Before we discuss writing new functions in R, let s examine an existing function more closely. For more insight into the coding behind the median() function, consider the following R commands and output. > median function (x, na.rm = FALSE) UseMethod("median") <bytecode: 0x000000000b265d20> <environment: namespace:stats> > methods(median) [1] median.default > median.default function (x, na.rm = FALSE) { if (is.factor(x) is.data.frame(x)) stop("need numeric data") if (length(names(x))) names(x) <- NULL if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[false][na]) n <- length(x) if (n == 0L) return(x[false][na]) half <- (n + 1L)%/%2L if (n%%2l == 1L) sort(x, partial = half)[half] else mean(sort(x, partial = half + 0L:1L)[half + 0L:1L]) When determining how to use this function, note that this function contains two arguments: x and na.rm. For more information on these arguments, enter the following at the prompt. > help(median) 1

Description Compute the sample median. Usage median(x, na.rm = FALSE) Arguments x an object for which a method has been defined, or a numeric vector containing the values whose median is to be computed. na.rm a logical value indicating whether NA values should be stripped before the computation proceeds. Notice that by default, the na.rm argument is set to FALSE. This means that the following arguments both return the same result: > median(final) [1] 72 > median(final,na.rm=false) [1] 72 The previous example gives us some insight into how functions are programmed with arguments and called in R. You can learn a lot about R programming by examining the code behind functions that were written by others, and before you know it, you ll be writing your own. The remainder of this handout provides an introduction to the basics of writing your own functions in R. WRITING NEW FUNCTIONS IN R One of the advantages of using a scriptable language like R is the ability to write your own functions (or to modify existing functions). An R programmer can define their own functions using the function() and return() functions. Description These functions provide the base mechanisms for defining new functions in the R language. Usage function( arglist ) expr return(value) 2

Arguments arglist Empty or one or more name or name=expression terms. expr An expression. value An expression. Example 1: Creating a function to add two numbers Note that the sum() function already exists in R and can be used to add two numbers; for illustrative purposes, however, we will write our own simple function to do this. add_two <- function(a,b){ a+b To call the function, enter the following at the prompt. > add_two(3,9) [1] 12 Example 2: Creating a function to compute multiple quantities Next, suppose you want to write a function in R to both find the difference between two values and the ratio of those two values. You may start with the following. f1 <- function(a,b){ a-b a/b > f1(.5,.25) [1] 2 Note that the function returns the result of only the last of the computations. To report all of the results, you can use the return statement as shown below. f1 <- function(a,b){ return(c(a-b,a/b)) > f1(.5,.25) [1] 0.25 2.00 Finally, note that it may be more useful to assign names to the values calculated and to return a more flexible data type (such as a list object) to provide more information about the calculations that have been performed. The following programming statements return the results in a list. 3

f1 <- function(a,b){ Result1 = a-b Result2 = a/b list(difference=result1,ratio=result2) > f1(.5,.25) $Difference [1] 0.25 $Ratio [1] 2 Example 3: Computing the mean of a vector of numbers Once again, note that the mean() function already exists in R and can be used to find the average of a vector of numbers; for illustrative purposes, however, we will write our own simple function to do this. Suppose we want to name our function average. We can check to make sure that this is not already a keyword in R (we wouldn t want to overwrite an existing function!). >?average No documentation for average in specified packages and libraries: you could try??average Next, we can write the function as follows. average <- function(x){ s = sum(x) avg = s/length(x) return (avg) Note that this function can now be applied to calculate the average of the final exam scores from the Grades data frame. > attach(grades) > average(final) [1] 69.1791 Example 4: Computing the standard error of the mean and dealing with missing values Recall that the standard error of a sample mean is computed as follows: s n, where s is the sample standard deviation and n is the sample size. The R base package does not contain a function to compute this standard error, so let s write our own function named sem(). 4

sem <- function(x){ n = length(x) se = sd(x)/sqrt(n) return (se) > sem(final) [1] 1.756686 Next, import the data file Grades_missing. Note that in one case, a student did not take the final exam. > Grades_missing[1:3,c("FirstName", "LastName", "Final")] FirstName LastName Final 1 Aaron Albrecht 63 2 Aaron Alen 51 3 Abbey Antoff NA What impact will this have on using our sem() function? > sem(final) [1] NA Clearly, our function does not work with missing data. How do we fix it? First, let s remove the old function from our workspace. > rm(sem) Now, we can update our function. sem <- function(x){ n = length(na.omit(x)) se = sd(x,na.rm=true)/sqrt(n) return (se) > sem(final) [1] 1.76148 Example 5: Modifying the summary() function Recall that the summary() function can be used to calculate several summary statistics for the final exam scores. > detach(grades_missing) > attach(grades) > summary(final) Min. 1st Qu. Median Mean 3rd Qu. Max. 0.00 60.25 72.00 69.18 83.00 100.00 5

What if you would also like to see the standard deviation, sample size, and standard error of the mean? You can write your own function to add to the output provided by the summary() function. Here, we will name this revised function num.summary. num.summary <- function(x){ original = summary(x) s = sd(x,na.rm=true) n=length(na.omit(x)) se=s/sqrt(n) list(c(original,"std Dev" = s, "Sample Size" = n, "SE of mean" = se)) > num.summary(final) [[1]] Min. 1st Qu. Median Mean 3rd Qu. Max. Std Dev Sample Size SE of mean.000000 60.250000 72.000000 69.180000 83.000000 100.000000 20.335106 134.000000 1.756686 Finally, note that you could change the layout of the output, if desired. num.summary <- function(x){ Min = summary(x)[1] Q1 = summary(x)[2] Q2 = summary(x)[3] Q3 = summary(x)[5] Max = summary(x)[6] Mean = summary(x)[4] s = sd(x,na.rm=true) n=length(na.omit(x)) se=s/sqrt(n) cat(paste("min: ", Min)) cat(paste("q1: ", Q1)) cat(paste("median: ", Q2)) cat(paste("q3: ", Q3)) cat(paste("max: ", Max)) cat(paste("mean: ", Mean)) cat(paste("standard Deviation: ", s)) cat(paste("sample Size: ", n)) cat(paste("standard Error: ", se)) > num.summary(final) Min: 0 Q1: 60.25 Median: 72 Q3: 83 Max: 100 Mean: 69.18 Standard Deviation: 20.3351064067899 Sample Size: 134 Standard Error: 1.7566856355663 6

Example 6: Modifying the plot() function Suppose you want the plot() function to use a smaller gray-filled circle as the default plotting symbol instead of an open circle. You can modify the existing plot()function as follows. Note that you can pass an unspecified number of parameters to a function using the notation; however, when using the notation, be sure to carefully consider the order of the arguments in the list. my.plot <- function(..., pch.new=21, bg.new='gray', cex.new=.75 ) { plot(..., pch=pch.new, bg=bg.new, cex=cex.new ) > my.plot(exam1,final) > plot(exam1,final) 7

Tasks: 1. Recall that the formula for a sphere is 4 3πr 3. Write a function named sphere.volume that returns the volume of a sphere when given the radius r as a parameter. Then, call the function to return the volume of a sphere with a radius of 5. Call the function again to return the volume of a sphere with a radius of 10. 2. Write a function named my.plot which calls the plot() function but uses a blue open circle as the default plotting symbol, instead. Use your function to obtain a scatterplot of Exam 1 vs. Exam 2 scores from the Grades data set. 3. Recall that the syntax for the by() function. Usage by(data, INDICES, FUN,..., simplify = TRUE) Arguments data an R object, normally a data frame, possibly a matrix. INDICES a factor or a list of factors, each of length nrow(data). FUN a function to be applied to (usually data-frame) subsets of data.... further arguments to FUN. Suppose you always mess up the order of the arguments because you think it makes more sense to list the arguments in this order, instead: (1) the function to be applied, (2) the data frame (or subset of a data frame) to which you want to apply the function, and (3) the BY factor. Write a function named my.by() which allows you to enter the arguments in the order you desire. Then, import the NYC_trees data set. Use your my.by() function to compute the mean Foliage Density for each Condition. 8