Risk Management Using R, SoSe 2013


Nikolay Robinzonov, 14th June 2013

1. Problem (vectors and factors)

a) Create a vector containing the numbers 1 to 10. In this vector, replace all numbers greater than 4 with 5.
b) Create a sequence of length 5 starting at 0 with an increment of 2.5.
c) Create a sequence from 1 to 5 and repeat each element 4 times.
d) Create a vector called x with the letters "a", "b", "c", "d" and "e". The vector should have length 30, with the frequencies chosen randomly by the sample function. Remove all "e" letters from x. The functions which, == or != may be useful.
e) Make the x vector a factor, i.e., use as.factor. Remove the "d" level completely. This means that levels(x) should give you "a" "b" "c" and not "a" "b" "c" "d". You may want to consult the help page of the factor function. Rename the levels "a", "b" and "c" to "A", "B" and "C". Merge the levels "A" and "B" into a single category called "one" and rename level "C" to "two".

Solution

    ## a)
    a <- 1:10  # or seq(1, 10)
    a[a > 4] <- 5
    a
    ## [1] 1 2 3 4 5 5 5 5 5 5

    ## b)
    seq(0, length=5, by=2.5)
    ## [1]  0.0  2.5  5.0  7.5 10.0

    ## c)
    rep(1:5, each=4)
    ## [1] 1 1 1 1 2 2 2 2 3 3 3 3 4 4 4 4 5 5 5 5

    ## d)
    set.seed(123)
    x <- sample(letters[1:5], size=30, replace=TRUE)
    x <- x[x != "e"]  # or x[-which(x == "e")]
    x
    ##  [1] "b" "d" "c" "a" "c" "c" "c" "c" "d" "c" "a" "b" "a" "b" "d" "d" "d"
    ## [18] "d" "c" "c" "b" "a"

    ## e)
    xf <- as.factor(x)
    xf <- xf[xf != "d"]
    levels(xf)  # level 'd' is still there

    ## [1] "a" "b" "c" "d"
    xf <- xf[, drop=TRUE]
    levels(xf)  # level 'd' is gone
    ## [1] "a" "b" "c"
    xf <- factor(xf, labels=c("A", "B", "C"))
    xf
    ##  [1] B C A C C C C C A B A B C C B A
    ## Levels: A B C

    ## one option with the 'car' package
    library("car")
    recode(xf, "c('A','B')='one'; else='two'")
    ##  [1] one two one two two two two two one one one one two two one one
    ## Levels: one two

    ## or alternatively
    xx <- list(one=c("A", "B"), two=c("C"))
    levels(xf) <- xx
    xf
    ##  [1] one two one two two two two two one one one one two two one one
    ## Levels: one two

2. Problem (matrices)

a) Sample 9 times from the normal distribution (rnorm) with mean 10 and variance 4. Call this vector x. Make your results reproducible, e.g., with the set.seed function.
b) Create a matrix X from x with 3 columns.
c) Switch the first two columns in X and calculate the column and row averages (mean), as well as the standard deviations (sd). Make use of the apply function.
d) Using the functions lower.tri, upper.tri and diag, assign the following values to your X matrix:

$$X = \begin{pmatrix} 2 & 3 & 3 \\ 1 & 2 & 3 \\ 1 & 1 & 2 \end{pmatrix}$$

e) Compute $X + X'$ and $X + 10 I_3$. Can you also compute $(X'X)^{-1}$ and $(X + X')^{-1}$, i.e., use the solve function? Do you encounter any problems, and what would be the statistician's solution?

Solution

    ## a)
    set.seed(123)
    x <- rnorm(9, mean=10, sd=2)  # variance 4 means sd = 2

    ## b)
    X <- matrix(x, ncol=3)

    ## c)
    X <- X[, c(2,1,3)]  # column switch
    colMeans(X)
    apply(X, 2, mean)
    rowMeans(X)
    apply(X, 1, mean)
    apply(X, 1, sd)
    apply(X, 2, sd)

    ## d)
    X[lower.tri(X)] <- 1
    X[upper.tri(X)] <- 3
    diag(X) <- 2
    X
    ##      [,1] [,2] [,3]
    ## [1,]    2    3    3
    ## [2,]    1    2    3
    ## [3,]    1    1    2

    ## e)
    X + t(X)
    ##      [,1] [,2] [,3]
    ## [1,]    4    4    4
    ## [2,]    4    4    4
    ## [3,]    4    4    4
    X + 10 * diag(3)
    ##      [,1] [,2] [,3]
    ## [1,]   12    3    3
    ## [2,]    1   12    3
    ## [3,]    1    1   12
    solve(t(X) %*% X)

    solve(X + t(X))  # X + t(X) is not invertible (singular)
    ## Error: Lapack routine dgesv: system is exactly singular: U[2,2] = 0
    XX <- X + t(X) + diag(0.01, nrow(X))  # add a small constant to the diagonal
    solve(XX)

3. Problem (dates & times)

(a) Make a sequence x of dates starting at January 1, 2013 and ending 90 days later. Extract all Fridays from x.
(b) Find the date lying exactly 10,000 days in the past. Use Sys.time() for the current date.
(c) i) Load the Rweek.mat file using the readMat() function from the R.matlab package.
    ii) The file contains weekly (Thursday to Thursday) percentage net returns of the S&P 500, FTSE, and DAX indices over the period from January 1984 to August [...]. Assign the data to a data.frame called rweek, name the columns appropriately and add an additional column indicating the dates. Get the returns in 2004 only.
(d) i) Load the daxmin.dat data containing DAX minute observations in the time interval March 20-27, [...]. Use the paste and strptime functions to obtain a vector with the date/time information. Create a data.frame called daxini containing the date/time vector and the minute levels of the DAX. Plot the time series.
    ii) Using the function aggregate, obtain the highest and the lowest daily observations. Do the same with the cast function from the reshape package. Afterwards, obtain the highest DAX levels per hour for each day.
    iii) Compute the squared percentage log-returns and plot the median values per minute. Try to reproduce Figure 1 as closely as possible.

Solution

    ## a)
    x <- seq(from=as.Date("2013-01-01"), length=90, by="day")
    x[weekdays(x) == "Friday"]
    ##  [1] "2013-01-04" "2013-01-11" "2013-01-18" "2013-01-25" "2013-02-01"
    ##  [6] "2013-02-08" "2013-02-15" "2013-02-22" "2013-03-01" "2013-03-08"
    ## [11] "2013-03-15" "2013-03-22" "2013-03-29"

    ## b)
    as.Date(Sys.time())

Figure 1: Median value per minute of the squared log-returns of the German DAX for the period March 20-27, [...]. The plot marks the NYSE opening at 15:30.

    as.Date(Sys.time() - 10000 * 60 * 60 * 24)  # Sys.time() counts in seconds

    ## c)
    library(R.matlab, quietly=TRUE)
    datei <- readMat("Rweek.mat")
    rweek <- data.frame(datei$rweek)
    names(rweek) <- c("sp500", "ftse", "dax")
    ## add a "time" column
    time <- seq(as.Date("[...]"), by="week", length=nrow(rweek))
    rweek$time <- time
    ## get the 2004 returns
    ind <- format(rweek$time, "%Y") == "2004"
    ## rweek[ind,]

    ## d)
    daxini <- read.table(file="daxmin.dat", header=FALSE)
    x <- paste(daxini[,1], daxini[,2])
    z <- strptime(x, format="%Y%m%d %H:%M")
    dax <- data.frame(datetime=z, val=daxini[,3])
    plot(dax$datetime, dax$val, type="l")  # overnight jumps

[Plot: dax$val against dax$datetime]

    plot(1:nrow(dax), dax$val, type="l")  # fix gaps

[Plot: dax$val against 1:nrow(dax)]

    ## min and max values per day
    dax$day <- as.Date(dax$datetime)
    dax$day <- as.factor(dax$day)
    ## tapply(dax$val, dax$day, max)
    ## aggregate(list(max=dax$val), list(date=dax$day), max)
    aggregate(list(rng=dax$val), list(date=dax$day), range)
    ##   date rng.1 rng.2

    library(reshape)
    cast(dax, day ~ ., value="val", range)

    ##   day X1 X2

    ## per hour per day
    dax$hour <- factor(format(dax$datetime, "%H"))
    cast(dax, hour ~ day, value="val", max)
    aggregate(list(max=dax$val), list(hour=dax$hour, Date=dax$day), max)
    ## tapply(dax$val, list(dax$hour, dax$day), max)

    ## log-returns
    dax$ret <- c(NA, diff(log(dax$val))) * 100
    dax$sqret <- dax$ret^2  # squared log-returns
    dax$min <- format(dax$datetime, "%H:%M")
    mmin <- cast(dax, min ~ ., value="sqret", median)  # median level per minute
    plot(1:nrow(mmin), mmin[,2], type="l", axes=FALSE, xlab="",
         ylab="median of the squared log-returns")
    axis(2)
    ii <- grep("(30|00)$", mmin[,1])  # tick marks at full and half hours
    axis(1, at=(1:nrow(mmin))[ii], labels=mmin[ii,1])
    nyx <- which(mmin[,1] == "15:30")            # x-coord.
    nyy <- quantile(na.omit(mmin[,2]), 0.999)    # y-coord.
    abline(v=nyx, col=2, lty=2)
    text(nyx, nyy, "NYSE opens")

[Plot: median of the squared log-returns per minute, with a dashed vertical line labelled "NYSE opens" at 15:30]
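The per-day and per-minute summaries above can also be sketched in base R, without the reshape package. The example below is only an illustration under stated assumptions: the data frame toy, its dates and its price levels are made up, and tapply stands in for cast.

```r
# Hypothetical minute data: two days with three observations each (values are made up)
datetime <- as.POSIXct(c("2020-01-06 09:00", "2020-01-06 09:01", "2020-01-06 09:02",
                         "2020-01-07 09:00", "2020-01-07 09:01", "2020-01-07 09:02"),
                       format = "%Y-%m-%d %H:%M", tz = "UTC")
toy <- data.frame(datetime = datetime, val = c(100, 101, 99, 102, 104, 103))

# Grouping keys: calendar day and minute of the day
toy$day <- format(toy$datetime, "%Y-%m-%d")
toy$min <- format(toy$datetime, "%H:%M")

# Highest and lowest level per day
day_max <- tapply(toy$val, toy$day, max)
day_min <- tapply(toy$val, toy$day, min)

# Squared percentage log-returns, computed as in the solution above
toy$ret <- c(NA, diff(log(toy$val))) * 100
toy$sqret <- toy$ret^2

# Median squared log-return per minute of the day, ignoring the leading NA
med_min <- tapply(toy$sqret, toy$min, median, na.rm = TRUE)
```

As in the solution, the first return of the second day is an overnight return; whether to keep it depends on the question being asked.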

4. Problem (functions)

(a) Create an empty list mylist and a numeric vector x using

    set.seed(1234)
    mylist <- list()
    x <- sample(1:500, size=20)

Using a for-loop, assign five new elements to mylist which contain selected values from x according to the following rule: the first element contains all $x_i \in [1, 100]$, the second element contains all $x_i \in [101, 200]$, and so on until the fifth element, which contains all $x_i \in [401, 500]$. Sort the values of each element in ascending order.

(b) Write a function for computing the density of the normal distribution and compare it to the standard dnorm function. The density of the normal distribution is defined as

$$f(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}}. \qquad (1)$$

(c) The function embed is very useful for time series analysis since it returns the lagged values of multivariate time series. Try for example embed(cbind(1:10,101:110), 3) or embed(1:10, 4) to understand what it does. Write another function, say embed2, which returns the lagged values of a multivariate time series up to a given lag length p. This new function should assign informative column names indicating the names of the time series and the respective lags. When applied to the rweek data set

    head(rweek)
    ##   sp500 ftse dax

the output should look similar to

    embed2(rweek, resp="dax", lag=2)
    ##   dax sp500.L1 sp500.L2 ftse.L1 ftse.L2 dax.L1 dax.L2

(d) Using the rweek data set, the previous function embed2, and the function lm, fit the following model

$$y_{DAX,t} = \beta_0 + \beta_1 y_{DAX,t-1} + \beta_2 y_{DAX,t-2} + \beta_3 y_{sp500,t-1} + \beta_4 y_{sp500,t-2} + \beta_5 y_{ftse,t-1} + \beta_6 y_{ftse,t-2} + \varepsilon_t, \qquad (2)$$

where $\varepsilon_t \sim N(0, \sigma^2)$.
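To make the column layout of embed in part (c) concrete, here is a small sketch on toy inputs. The object names m and mm are illustrative only.

```r
# Univariate case: each row holds the current value followed by its first lag
m <- embed(1:5, 2)
m        # column 1 is x_t, column 2 is x_{t-1}

# Multivariate case: all series at time t come first, then all series at lag 1
mm <- embed(cbind(1:5, 11:15), 2)
mm       # columns: x1_t, x2_t, x1_{t-1}, x2_{t-1}
```

Because the first lag length minus one observations have no complete history, the result has fewer rows than the input. This is also why embed2 has to pick out the unlagged column of the response separately.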

Solution

    ## a)
    for(i in 1:5){
      z <- x[x > (i-1) * 100 & x <= i * 100]
      mylist[[i]] <- sort(z)
    }

    ## b)
    mynorm <- function(x, mu=0, sigma=1)
      1/sqrt(2*pi*sigma^2) * exp(-(x-mu)^2/(2*sigma^2))
    mynorm(0.3)
    dnorm(0.3)  # should return the same value as mynorm(0.3)

    ## c)
    embed2 <- function(y, resp, lag = 1){
      Names <- colnames(y)
      P <- lag + 1
      res <- list()
      for(z in Names){
        x <- as.matrix(y[, z, drop=FALSE])
        xlagged <- embed(x, P)  # columns: z at time t, then lags 1 to lag
        colnames(xlagged) <- c(z, paste(z, 1:lag, sep = ".L"))
        if(z == resp)
          yresp <- xlagged[, 1, drop=FALSE]  # unlagged response
        xlagged <- xlagged[, 2:P, drop=FALSE]  # keep the lagged columns only
        res[[z]] <- xlagged
      }
      res <- do.call(cbind, res)
      res <- cbind(yresp, res)
      if(is.data.frame(y))
        res <- as.data.frame(res)
      res
    }
    rweek <- rweek[,-4]  # drop the time column
    round(head(embed2(rweek, resp="dax", lag=2)), 2)
    ##   dax sp500.L1 sp500.L2 ftse.L1 ftse.L2 dax.L1 dax.L2

    ## d)
    dat <- embed2(rweek, resp="dax", lag=2)
    fit <- lm(dax ~ ., data=dat)
    summary(fit)
    ##
    ## Call:
    ## lm(formula = dax ~ ., data = dat)
    ##
    ## Residuals:
    ##   Min 1Q Median 3Q Max
    ##
    ## Coefficients:

    ##             Estimate Std. Error t value Pr(>|t|)
    ## (Intercept) [...] **
    ## sp500.L1    [...] *
    ## sp500.L2    [...]
    ## ftse.L1     [...]
    ## ftse.L2     [...]
    ## dax.L1      [...]
    ## dax.L2      [...]
    ## ---
    ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ##
    ## Residual standard error: 2.98 on 1118 degrees of freedom
    ## Multiple R-squared: [...], Adjusted R-squared: [...]
    ## F-statistic: 1.11 on 6 and 1118 DF, p-value: [...]
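One way to gain confidence in a lag regression like the one in part (d) is to run the same steps on simulated data where the true coefficients are known. The sketch below is an assumption-laden illustration, not part of the original solution: it simulates a hypothetical AR(1) series with coefficient 0.5 using arima.sim, builds the lag columns with embed directly, and fits the model with lm.

```r
set.seed(1)
n <- 500

# Hypothetical AR(1) series with true coefficient 0.5 (simulated, not the rweek data)
y <- as.numeric(arima.sim(list(ar = 0.5), n = n))

# embed(y, 3) returns the columns y_t, y_{t-1}, y_{t-2}
lagged <- embed(y, 3)
sim <- data.frame(y = lagged[, 1], y.L1 = lagged[, 2], y.L2 = lagged[, 3])

# Regress y_t on its first two lags
fit_sim <- lm(y ~ ., data = sim)
coef(fit_sim)
```

With this setup the estimate for y.L1 should land near 0.5 and the one for y.L2 near zero, up to sampling noise. Two observations are lost to the lags, so the fit uses n - 2 rows.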
