Risk Management Using R, SoSe 2013


Nikolay Robinzonov, 14th June 2013

1. Problem (vectors and factors)

a) Create a vector containing the numbers 1 to 10. In this vector, replace all numbers greater than 4 with 5.
b) Create a sequence of length 5 starting at 0 with an increment of 2.5.
c) Create a sequence from 1 to 5 and repeat each element 4 times.
d) Create a vector called x containing the letters "a", "b", "c", "d" and "e". The vector should have length 30, with the frequencies chosen at random by the sample function. Remove all "e" letters from x. The functions which, == and != may be useful.
e) Make the x vector a factor, i.e., use as.factor. Remove the "d" level completely, so that levels(x) gives "a" "b" "c" and not "a" "b" "c" "d". You may want to consult the help page of the factor function. Rename the levels "a", "b" and "c" to "A", "B" and "C". Merge the levels "A" and "B" into a single category called "one" and rename level "C" to "two".

Solution

## a)
a <- 1:10           # same as seq(1, 10)
a[a > 4] <- 5
a
##  [1] 1 2 3 4 5 5 5 5 5 5

## b)
seq(0, length=5, by=2.5)
## [1]  0.0  2.5  5.0  7.5 10.0

## c)
rep(1:5, each=4)
##  [1] 1 1 1 1 2 2 2 2 3 3 3 3 4 4 4 4 5 5 5 5

## d)
set.seed(123)
x <- sample(letters[1:5], size=30, replace=TRUE)
x <- x[x != "e"]    # or x[-which(x == "e")]
x
##  [1] "b" "d" "c" "a" "c" "c" "c" "c" "d" "c" "a" "b" "a" "b" "d" "d" "d"
## [18] "d" "c" "c" "b" "a"

## e)
xf <- as.factor(x)
xf <- xf[xf != "d"]
levels(xf)          # level 'd' is still there

## [1] "a" "b" "c" "d"
xf <- xf[, drop=TRUE]
levels(xf)          # level 'd' is gone
## [1] "a" "b" "c"
xf <- factor(xf, labels=c("A", "B", "C"))
xf
##  [1] B C A C C C C C A B A B C C B A
## Levels: A B C

## one option with the 'car' package
library("car")
recode(xf, "c('A','B')='one'; else='two'")
##  [1] one two one two two two two two one one one one two two one one
## Levels: one two

## or alternatively
xx <- list(one=c("A", "B"), two=c("C"))
levels(xf) <- xx
xf
##  [1] one two one two two two two two one one one one two two one one
## Levels: one two

2. Problem (matrices)

a) Draw 9 values from the normal distribution (rnorm) with mean 10 and variance 4. Call this vector x. Make your results reproducible, e.g., with the set.seed function.
b) Create a matrix X from x with 3 columns.
c) Switch the first two columns of X and calculate the column and row averages (mean) as well as the standard deviations (sd). Make use of the apply function.
d) Using the functions lower.tri, upper.tri and diag, assign the following values to your X matrix:

$$X = \begin{pmatrix} 2 & 3 & 3 \\ 1 & 2 & 3 \\ 1 & 1 & 2 \end{pmatrix}$$

e) Compute $X + X'$ and $X + 10 I_3$. Can you also compute $(X'X)^{-1}$ and $(X + X')^{-1}$, i.e., with the solve function? Do you encounter any problems, and what would be the statistician's solution?

Solution

## a)
set.seed(123)
x <- rnorm(9, mean=10, sd=2)    # variance 4 means sd = 2

## b)
X <- matrix(x, ncol=3)

## c)
X <- X[, c(2, 1, 3)]    # column switch
colMeans(X)
apply(X, 2, mean)       # same as colMeans(X)
rowMeans(X)
apply(X, 1, mean)       # same as rowMeans(X)
apply(X, 1, sd)
apply(X, 2, sd)

## d)
X[lower.tri(X)] <- 1
X[upper.tri(X)] <- 3
diag(X) <- 2
X
##      [,1] [,2] [,3]
## [1,]    2    3    3
## [2,]    1    2    3
## [3,]    1    1    2

## e)
X + t(X)
##      [,1] [,2] [,3]
## [1,]    4    4    4
## [2,]    4    4    4
## [3,]    4    4    4
X + 10 * diag(3)
##      [,1] [,2] [,3]
## [1,]   12    3    3
## [2,]    1   12    3
## [3,]    1    1   12
solve(t(X) %*% X)

solve(X + t(X))    # X + t(X) is not invertible (singular)
## Error: Lapack routine dgesv: system is exactly singular: U[2,2] = 0
XX <- X + t(X) + diag(0.01, nrow(X))    # add a small constant to the diagonal
solve(XX)

3. Problem (dates & times)

(a) Make a sequence x of dates starting at January 1, 2013 and ending 90 days later. Extract all Fridays from x.
(b) Which date lies exactly 10,000 days in the past? Use Sys.time() for the current date.
(c) i) Load Rweek.mat using the readMat() function from the R.matlab package.
    ii) The file contains weekly (Thursday to Thursday) percentage net returns of the S&P 500, FTSE, and DAX indices over the period from January 1984 to August [...]. Assign the data to a data.frame called rweek, name the columns appropriately and add an additional column indicating the dates. Get the returns in 2004 only.
(d) i) Load the daxmin.dat data containing DAX minute observations in the time interval March 20-27 [...]. Use the paste and the strptime functions to obtain a vector with the date/time information. Create a data.frame called daxini containing the date/time vector and the minute levels of the DAX. Plot the time series.
    ii) Using the function aggregate, obtain the highest and the lowest daily observations. Do the same with the cast function from the reshape package. Afterwards, obtain the highest DAX levels per hour for each day.
    iii) Compute the squared percentage log-returns and plot the median values per minute. Try to reproduce Figure 1 as closely as possible.

Solution

## a)
x <- seq(from=as.Date("2013-01-01"), length=90, by="day")
x[weekdays(x) == "Friday"]
##  [1] "2013-01-04" "2013-01-11" "2013-01-18" "2013-01-25" "2013-02-01"
##  [6] "2013-02-08" "2013-02-15" "2013-02-22" "2013-03-01" "2013-03-08"
## [11] "2013-03-15" "2013-03-22" "2013-03-29"

## b)
as.Date(Sys.time())

Figure 1: Median value per minute of the squared log-returns of the German DAX for the period March 20-27 [...] (y-axis: median of the squared log-returns; x-axis: time of day from 09:30 to 17:00; the NYSE opening is marked).

as.Date(Sys.time() - 10000 * 60 * 60 * 24)    # Sys.time() counts in seconds

## c)
library(R.matlab, quietly=TRUE)
datei <- readMat("Rweek.mat")
rweek <- data.frame(datei$rweek)
names(rweek) <- c("sp500", "ftse", "dax")
## add a "time" column
time <- seq(as.Date(" "), by="week", length=nrow(rweek))    # start date (January 1984) missing in the source
rweek$time <- time
## get the 2004 returns
ind <- format(rweek$time, "%Y") == "2004"
## rweek[ind,]

## d)
daxini <- read.table(file="daxmin.dat", header=FALSE)
x <- paste(daxini[,1], daxini[,2])
z <- strptime(x, format="%Y%m%d %H:%M")
dax <- data.frame(datetime=z, val=daxini[,3])
plot(dax$datetime, dax$val, type="l")    # overnight jumps

[Plot: dax$val against dax$datetime, x-axis ticks from Mar 21 to Mar 27]

plot(1:nrow(dax), dax$val, type="l")    # fix gaps

[Plot: dax$val against 1:nrow(dax)]

## min and max values per day
dax$day <- as.Date(dax$datetime)
dax$day <- as.factor(dax$day)
## tapply(dax$val, dax$day, max)
## aggregate(list(max=dax$val), list(date=dax$day), max)
aggregate(list(rng=dax$val), list(date=dax$day), range)
##   date rng.1 rng.2
library(reshape)
cast(dax, day ~ ., value="val", range)

##   day X1 X2

## per hour per day
dax$hour <- factor(format(dax$datetime, "%H"))
cast(dax, hour ~ day, value="val", max)
## aggregate(list(max=dax$val), list(hour=dax$hour, Date=dax$day), max)
## tapply(dax$val, list(dax$hour, dax$day), max)

## log-returns
dax$ret <- c(NA, diff(log(dax$val))) * 100
dax$sqret <- dax$ret^2                               # squared log-returns
dax$min <- format(dax$datetime, "%H:%M")
mmin <- cast(dax, min ~ ., value="sqret", median)    # median level per minute
plot(1:nrow(mmin), mmin[,2], type="l", axes=FALSE, xlab="",
     ylab="median of the squared log-returns")
axis(2)
ii <- grep("(30|00)$", mmin[,1])                     # minutes ending in :30 or :00
axis(1, at=(1:nrow(mmin))[ii], labels=mmin[ii,1])
nyx <- which(mmin[,1] == "15:30")                    # x-coord.
nyy <- quantile(na.omit(mmin[,2]), 0.999)            # y-coord.
abline(v=nyx, col=2, lty=2)
text(nyx, nyy, "NYSE opens")

[Plot: median of the squared log-returns per minute, 09:30 to 17:00, with the NYSE opening marked]
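The per-minute median in part (d) iii) can also be computed with base R alone. The following is a minimal sketch on simulated prices: the data frame sim, its dates and its values are made up for illustration and are not the daxmin.dat data. Unlike the solution above, it takes the returns within each day, so the overnight jump does not enter the first minute.

```r
## Simulated minute prices for two days (hypothetical dates and values)
set.seed(1)
times <- c(seq(as.POSIXct("2000-01-03 09:00", tz = "UTC"), by = "min", length.out = 60),
           seq(as.POSIXct("2000-01-04 09:00", tz = "UTC"), by = "min", length.out = 60))
sim <- data.frame(datetime = times,
                  val = 8000 * exp(cumsum(rnorm(120, sd = 1e-4))))

## Percentage log-returns computed separately within each day
sim$day <- format(sim$datetime, "%Y-%m-%d", tz = "UTC")
sim$ret <- ave(log(sim$val), sim$day, FUN = function(v) c(NA, diff(v))) * 100
sim$sqret <- sim$ret^2

## Median squared return per minute of the day, taken across days
sim$min <- format(sim$datetime, "%H:%M", tz = "UTC")
mmin <- tapply(sim$sqret, sim$min, median, na.rm = TRUE)
head(mmin)
```

The first minute of each day has no return, so its median is NA; all other entries are non-negative by construction.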

4. Problem (functions)

(a) Create an empty list mylist and a numeric vector x using

set.seed(1234)
mylist <- list()
x <- sample(1:500, size=20)

Using a for-loop, assign five new elements to mylist which contain selected values from x according to the following rule: the first element contains all $x_i \in [1, 100]$, the second element contains all $x_i \in [101, 200]$, and so on until the fifth element, which contains all $x_i \in [401, 500]$. Sort the values of each element in ascending order.

(b) Write a function for computing the density of the normal distribution and compare it to the standard dnorm function. The density of the normal distribution is defined as

$$f(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}}. \qquad (1)$$

(c) The function embed is very useful for time series analysis since it returns the lagged values of multivariate time series. Try, for example, embed(cbind(1:10, 101:110), 3) or embed(1:10, 4) to understand what it does. Write another function, say embed2, which returns the lagged values of a multivariate time series up to a given lag length p. This new function should assign informative column names indicating the names of the time series and the respective lags. When applied to the rweek data set

head(rweek)
##   sp500 ftse dax

the output should look similar to

embed2(rweek, resp="dax", lag=2)
##   dax sp500.L1 sp500.L2 ftse.L1 ftse.L2 dax.L1 dax.L2

(d) Using the rweek data set, the previous function embed2, and the function lm, fit the following model

$$y_{DAX,t} = \beta_0 + \beta_1 y_{DAX,t-1} + \beta_2 y_{DAX,t-2} + \beta_3 y_{sp500,t-1} + \beta_4 y_{sp500,t-2} + \beta_5 y_{ftse,t-1} + \beta_6 y_{ftse,t-2} + \varepsilon_t, \qquad (2)$$

where $\varepsilon_t \sim N(0, \sigma^2)$.
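For part (c), it helps to look at what embed returns on a toy input before writing embed2. The sketch below uses small made-up objects (y, e and m are illustrative names): each row holds the current value followed by its lags, and the first dimension - 1 observations are dropped.

```r
## Univariate case: rows are (y_t, y_{t-1}, y_{t-2}) for t = 3, 4, 5
y <- 1:5
e <- embed(y, 3)
e
##      [,1] [,2] [,3]
## [1,]    3    2    1
## [2,]    4    3    2
## [3,]    5    4    3

## Matrix case: columns are ordered lag by lag,
## i.e. all series at lag 0 first, then all series at lag 1
m <- cbind(a = 1:5, b = 101:105)
embed(m, 2)
##      [,1] [,2] [,3] [,4]
## [1,]    2  102    1  101
## [2,]    3  103    2  102
## [3,]    4  104    3  103
## [4,]    5  105    4  104
```

Note that embed drops the column names of its input, which is why embed2 has to construct them itself.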

Solution

## a)
for(i in 1:5){
  z <- x[x > (i-1) * 100 & x <= i * 100]
  mylist[[i]] <- sort(z)
}

## b)
mynorm <- function(x, mu=0, sigma=1)
  1/sqrt(2*pi*sigma^2) * exp(-(x-mu)^2/(2*sigma^2))
mynorm(0.3)
## [1] 0.3813878
dnorm(0.3)
## [1] 0.3813878

## c)
embed2 <- function(y, resp, lag = 1){
  Names <- colnames(y)
  P <- lag + 1
  res <- list()
  for(z in Names){
    x <- as.matrix(y[, z, drop=FALSE])
    xlagged <- embed(x, P)
    colnames(xlagged) <- c(z, paste(z, 1:lag, sep = ".L"))
    ## keep the unlagged column of the response series
    if(z == resp) yresp <- xlagged[, 1, drop=FALSE]
    ## keep only the lagged columns of every series
    xlagged <- xlagged[, 2:P, drop=FALSE]
    res[[z]] <- xlagged
  }
  res <- do.call(cbind, res)
  res <- cbind(yresp, res)
  if(is.data.frame(y)) res <- as.data.frame(res)
  res
}
rweek <- rweek[, -4]    # drop the time column
round(head(embed2(rweek, resp="dax", lag=2)), 2)
##   dax sp500.L1 sp500.L2 ftse.L1 ftse.L2 dax.L1 dax.L2

## d)
dat <- embed2(rweek, resp="dax", lag=2)
fit <- lm(dax ~ ., data=dat)
summary(fit)
##
## Call:
## lm(formula = dax ~ ., data = dat)
##
## Residuals:
##  Min 1Q Median 3Q Max
##
## Coefficients:

##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)                                      **
## sp500.L1                                         *
## sp500.L2
## ftse.L1
## ftse.L2
## dax.L1
## dax.L2
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.98 on 1118 degrees of freedom
## Multiple R-squared: ,Adjusted R-squared:
## F-statistic: 1.11 on 6 and 1118 DF, p-value:
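The regression in (2) can be tried without the rweek file by using simulated returns. This is only a stand-in under stated assumptions: the three series are independent white noise, the length n and the object names sim, lagged, dat and fit are made up, and the lagged design is built directly with embed rather than with embed2.

```r
## Simulated "returns" (hypothetical data, not the rweek series)
set.seed(42)
n <- 200
sim <- data.frame(sp500 = rnorm(n), ftse = rnorm(n), dax = rnorm(n))

## Lags 1 and 2 of every series; embed() puts lag 0 in column 1
lagged <- lapply(names(sim), function(z) {
  e <- embed(sim[[z]], 3)[, 2:3, drop = FALSE]
  colnames(e) <- paste(z, 1:2, sep = ".L")
  e
})
dat <- data.frame(dax = sim$dax[3:n], do.call(cbind, lagged))

## Same model formula as in the solution to (d)
fit <- lm(dax ~ ., data = dat)
names(coef(fit))
nrow(dat)           # n - 2 rows remain after taking two lags
df.residual(fit)    # rows minus the 7 estimated coefficients
```

The same count is consistent with the summary above: 1118 residual degrees of freedom plus 7 coefficients gives 1125 rows after lagging.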


Logistic Regression. (Dichotomous predicted variable) Tim Frasier Logistic Regression (Dichotomous predicted variable) Tim Frasier Copyright Tim Frasier This work is licensed under the Creative Commons Attribution 4.0 International license. Click here for more information.

More information

Package BLCOP. February 15, 2013

Package BLCOP. February 15, 2013 Package BLCOP February 15, 2013 Type Package Title Black-Litterman and copula-opinion pooling frameworks Version 0.2.6 Date 22/05/2011 Author Maintainer Mango Solutions An

More information

Advanced Econometric Methods EMET3011/8014

Advanced Econometric Methods EMET3011/8014 Advanced Econometric Methods EMET3011/8014 Lecture 2 John Stachurski Semester 1, 2011 Announcements Missed first lecture? See www.johnstachurski.net/emet Weekly download of course notes First computer

More information

R practice. Eric Gilleland. 20th May 2015

R practice. Eric Gilleland. 20th May 2015 R practice Eric Gilleland 20th May 2015 1 Preliminaries 1. The data set RedRiverPortRoyalTN.dat can be obtained from http://www.ral.ucar.edu/staff/ericg. Read these data into R using the read.table function

More information

Comparing Fitted Models with the fit.models Package

Comparing Fitted Models with the fit.models Package Comparing Fitted Models with the fit.models Package Kjell Konis Acting Assistant Professor Computational Finance and Risk Management Dept. Applied Mathematics, University of Washington History of fit.models

More information

An introduction to R WS 2013/2014

An introduction to R WS 2013/2014 An introduction to R WS 2013/2014 Dr. Noémie Becker (AG Metzler) Dr. Sonja Grath (AG Parsch) Special thanks to: Dr. Martin Hutzenthaler (previously AG Metzler, now University of Frankfurt) course development,

More information

Section 4: FWL and model fit

Section 4: FWL and model fit Section 4: FWL and model fit Ed Rubin Contents 1 Admin 1 1.1 What you will need........................................ 1 1.2 Last week............................................. 2 1.3 This week.............................................

More information

Orange Juice data. Emanuele Taufer. 4/12/2018 Orange Juice data (1)

Orange Juice data. Emanuele Taufer. 4/12/2018 Orange Juice data (1) Orange Juice data Emanuele Taufer file:///c:/users/emanuele.taufer/google%20drive/2%20corsi/5%20qmma%20-%20mim/0%20labs/l10-oj-data.html#(1) 1/31 Orange Juice Data The data contain weekly sales of refrigerated

More information

The nor1mix Package. August 3, 2006

The nor1mix Package. August 3, 2006 The nor1mix Package August 3, 2006 Title Normal (1-d) Mixture Models (S3 Classes and Methods) Version 1.0-6 Date 2006-08-02 Author: Martin Mächler Maintainer Martin Maechler

More information

A Knitr Demo. Charles J. Geyer. February 8, 2017

A Knitr Demo. Charles J. Geyer. February 8, 2017 A Knitr Demo Charles J. Geyer February 8, 2017 1 Licence This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License http://creativecommons.org/licenses/by-sa/4.0/.

More information

Statistical Programming with R

Statistical Programming with R Statistical Programming with R Lecture 9: Basic graphics in R Part 2 Bisher M. Iqelan biqelan@iugaza.edu.ps Department of Mathematics, Faculty of Science, The Islamic University of Gaza 2017-2018, Semester

More information

The simpleboot Package

The simpleboot Package The simpleboot Package April 1, 2005 Version 1.1-1 Date 2005-03-31 LazyLoad yes Depends R (>= 2.0.0), boot Title Simple Bootstrap Routines Author Maintainer Simple bootstrap

More information

Sub-setting Data. Tzu L. Phang

Sub-setting Data. Tzu L. Phang Sub-setting Data Tzu L. Phang 2016-10-13 Subsetting in R Let s start with a (dummy) vectors. x

More information

Gov Troubleshooting the Linear Model II: Heteroskedasticity

Gov Troubleshooting the Linear Model II: Heteroskedasticity Gov 2000-10. Troubleshooting the Linear Model II: Heteroskedasticity Matthew Blackwell December 4, 2015 1 / 64 1. Heteroskedasticity 2. Clustering 3. Serial Correlation 4. What s next for you? 2 / 64 Where

More information

The R statistical computing environment

The R statistical computing environment The R statistical computing environment Luke Tierney Department of Statistics & Actuarial Science University of Iowa June 17, 2011 Luke Tierney (U. of Iowa) R June 17, 2011 1 / 27 Introduction R is a language

More information

II.Matrix. Creates matrix, takes a vector argument and turns it into a matrix matrix(data, nrow, ncol, byrow = F)

II.Matrix. Creates matrix, takes a vector argument and turns it into a matrix matrix(data, nrow, ncol, byrow = F) II.Matrix A matrix is a two dimensional array, it consists of elements of the same type and displayed in rectangular form. The first index denotes the row; the second index denotes the column of the specified

More information

Statistical Programming Worksheet 4

Statistical Programming Worksheet 4 Statistical Programming Worksheet 4 1. Cholesky Decomposition. (a) Write a function with argument n to generate a random symmetric n n-positive definite matrix. To do this: generate an n n matrix C whose

More information

R Programming: Worksheet 6

R Programming: Worksheet 6 R Programming: Worksheet 6 Today we ll study a few useful functions we haven t come across yet: all(), any(), `%in%`, match(), pmax(), pmin(), unique() We ll also apply our knowledge to the bootstrap.

More information

Regression Lab 1. The data set cholesterol.txt available on your thumb drive contains the following variables:

Regression Lab 1. The data set cholesterol.txt available on your thumb drive contains the following variables: Regression Lab The data set cholesterol.txt available on your thumb drive contains the following variables: Field Descriptions ID: Subject ID sex: Sex: 0 = male, = female age: Age in years chol: Serum

More information

Stat 4510/7510 Homework 4

Stat 4510/7510 Homework 4 Stat 45/75 1/7. Stat 45/75 Homework 4 Instructions: Please list your name and student number clearly. In order to receive credit for a problem, your solution must show sufficient details so that the grader

More information

ddhazard Diagnostics Benjamin Christoffersen

ddhazard Diagnostics Benjamin Christoffersen ddhazard Diagnostics Benjamin Christoffersen 2017-11-25 Introduction This vignette will show examples of how the residuals and hatvalues functions can be used for an object returned by ddhazard. See vignette("ddhazard",

More information

The nor1mix Package. June 12, 2007

The nor1mix Package. June 12, 2007 The nor1mix Package June 12, 2007 Title Normal (1-d) Mixture Models (S3 Classes and Methods) Version 1.0-7 Date 2007-03-15 Author Martin Mächler Maintainer Martin Maechler

More information

STATS PAD USER MANUAL

STATS PAD USER MANUAL STATS PAD USER MANUAL For Version 2.0 Manual Version 2.0 1 Table of Contents Basic Navigation! 3 Settings! 7 Entering Data! 7 Sharing Data! 8 Managing Files! 10 Running Tests! 11 Interpreting Output! 11

More information

Lab #7 - More on Regression in R Econ 224 September 18th, 2018

Lab #7 - More on Regression in R Econ 224 September 18th, 2018 Lab #7 - More on Regression in R Econ 224 September 18th, 2018 Robust Standard Errors Your reading assignment from Chapter 3 of ISL briefly discussed two ways that the standard regression inference formulas

More information

R Graphics. SCS Short Course March 14, 2008

R Graphics. SCS Short Course March 14, 2008 R Graphics SCS Short Course March 14, 2008 Archeology Archeological expedition Basic graphics easy and flexible Lattice (trellis) graphics powerful but less flexible Rgl nice 3d but challenging Tons of

More information

Stat 528 (Autumn 2008) Density Curves and the Normal Distribution. Measures of center and spread. Features of the normal distribution

Stat 528 (Autumn 2008) Density Curves and the Normal Distribution. Measures of center and spread. Features of the normal distribution Stat 528 (Autumn 2008) Density Curves and the Normal Distribution Reading: Section 1.3 Density curves An example: GRE scores Measures of center and spread The normal distribution Features of the normal

More information

Package mnormt. April 2, 2015

Package mnormt. April 2, 2015 Package mnormt April 2, 2015 Version 1.5-2 Date 2015-04-02 Title The Multivariate Normal and t Distributions Author Fortran code by Alan Genz, R code by Adelchi Azzalini Maintainer Adelchi Azzalini

More information

THE UNIVERSITY OF BRITISH COLUMBIA FORESTRY 430 and 533. Time: 50 minutes 40 Marks FRST Marks FRST 533 (extra questions)

THE UNIVERSITY OF BRITISH COLUMBIA FORESTRY 430 and 533. Time: 50 minutes 40 Marks FRST Marks FRST 533 (extra questions) THE UNIVERSITY OF BRITISH COLUMBIA FORESTRY 430 and 533 MIDTERM EXAMINATION: October 14, 2005 Instructor: Val LeMay Time: 50 minutes 40 Marks FRST 430 50 Marks FRST 533 (extra questions) This examination

More information

Extremely short introduction to R Jean-Yves Sgro Feb 20, 2018

Extremely short introduction to R Jean-Yves Sgro Feb 20, 2018 Extremely short introduction to R Jean-Yves Sgro Feb 20, 2018 Contents 1 Suggested ahead activities 1 2 Introduction to R 2 2.1 Learning Objectives......................................... 2 3 Starting

More information

LibPAS Graphs Table Trend/PI Trend Period Comparison PI Gap Graph/PI Summary Graph

LibPAS Graphs Table Trend/PI Trend Period Comparison PI Gap Graph/PI Summary Graph LibPAS Graphs Graphic drill-downs are available in the Table, Trend/PI, Trend, Period Comparison, and PI Gap Report Types. Graph/PI and Summary Graph Report Types were designed specifically as graph reports.

More information

SYS 6021 Linear Statistical Models

SYS 6021 Linear Statistical Models SYS 6021 Linear Statistical Models Project 2 Spam Filters Jinghe Zhang Summary The spambase data and time indexed counts of spams and hams are studied to develop accurate spam filters. Static models are

More information