zoo: An S3 Class and Methods for Indexed Totally Ordered Observations

Size: px
Start display at page:

Download "zoo: An S3 Class and Methods for Indexed Totally Ordered Observations"

Transcription

1 zoo: An S3 Class and Methods for Indexed Totally Ordered Observations Achim Zeileis Wirtschaftsuniversität Wien Gabor Grothendieck GKX Associates Inc. Abstract A previous version to this introduction to the R package zoo has been published as Zeileis and Grothendieck (2005) in the Journal of Statistical Software. zoo is an R package providing an S3 class with methods for indexed totally ordered observations, such as discrete irregular time series. Its key design goals are independence of a particular index/time/date class and consistency with base R and the "ts" class for regular time series. This paper describes how these are achieved within zoo and provides several illustrations of the available methods for "zoo" objects which include plotting, merging and binding, several mathematical operations, extracting and replacing data and index, coercion and NA handling. A subclass "zooreg" embeds regular time series into the "zoo" framework and thus bridges the gap between regular and irregular time series classes in R. Keywords: totally ordered observations, irregular time series, regular time series, S3, R. 1. Introduction The R system for statistical computing (R Development Core Team 2005, org/) ships with a class for regularly spaced time series, "ts" in package stats, but has no native class for irregularly spaced time series. With the increased interest in computational finance with R over the last years several implementations of classes for irregular time series emerged which are aimed particularly at finance applications. These include the S3 classes "timeseries" in package fcalendar from the Rmetrics bundle (Wuertz 2004) and "irts" in package tseries (Trapletti 2005) and the S4 class "its" in package its (Heywood 2004). With these packages available, why would anybody want yet another package providing infrastructure for irregular time series? The above mentioned implementations have in common that they are restricted to a particular class for the time scale: the former implementation comes with its own time class "timedate" built on top of the "POSIXt" classes available in base R whereas the latter two use "POSIXct" directly. And this was the starting point for the zoo project: the first author of the present paper needed more general support for ordered observations, independent of a particular index class, for the package strucchange (Zeileis, Leisch, Hornik, and Kleiber 2002). Hence, the package was called zoo which stands for Z s ordered observations. Since the first release, a major part of the additions to zoo were provided by the second author of this paper, so that the name of the package does not really reflect the authorship anymore. Nevertheless, independence of a particular index class remained the most important design goal. While the package evolved to its current status, a second key design goal became more and more clear: to provide methods to standard generic functions for the "zoo" class that are similar to those for the "ts" class (and base R in general) such that the usage of zoo is very intuitive because few additional commands have to be learned. This paper describes how these design goals are implemented in zoo. The resulting package provides the "zoo" class which offers an extensive (and still growing) set of standard and new methods for working with indexed observations and talks to the classes "ts", "its", "irts" and "timeseries". It also bridges the gap between regular and irregular time series by providing coercion with (virtually) no loss of information between "ts" and "zoo". With these tools zoo provides the basic infrastructure

2 2 zoo: An S3 Class and Methods for Indexed Totally Ordered Observations for working with indexed totally ordered observations and the package can be either employed by users directly or can be a basic ingredient on top of which other more specialized applications can be built. The remainder of the paper is organized as follows: Section 2 explains how "zoo" objects are created and illustrates how the corresponding methods for plotting, merging and binding, several mathematical operations, extracting and replacing data and index, coercion and NA handling can be used. Section 3 outlines how other packages can build on this basic infrastructure. Section 4 gives a few summarizing remarks and an outlook on future developments. Finally, an appendix provides a reference card that gives an overview of the functionality contained in zoo. 2. The class "zoo" and its methods This section describes how "zoo" series can be created and subsequently manipulated, visualized, combined or coerced to other classes. In Section 2.1, the general class "zoo" for totally ordered series is described. Subsequently, in Section 2.2, the subclass "zooreg" for regular "zoo" series, i.e., series which have an index with a specified frequency, is discussed. The methods illustrated in the remainder of the section are mostly the same for both "zoo" and "zooreg" objects and hence do not have to be discussed separately. The few differences in merging and binding are briefly highlighted in Section Creation of "zoo" objects The simple idea for the creation of "zoo" objects is to have some vector or matrix of observations x which are totally ordered by some index vector. In time series applications, this index is a measure of time but every other numeric, character or even more abstract vector that provides a total ordering of the observations is also suitable. Objects of class "zoo" are created by the function zoo(x, order.by) where x is the vector or matrix of observations 1 and order.by is the index by which the observations should be ordered. It has to be of the same length as NROW(x), i.e., either the same length as x for vectors or the same number of rows for matrices. 2 The "zoo" object created is essentially the vector/matrix as before but has an additional "index" attribute in which the index is stored. 3 Both the observations in the vector/matrix x and the index order.by can, in principle, be of arbitrary classes. However, most of the following methods (plotting, aggregating, mathematical operations) for "zoo" objects are typically only useful for numeric observations x. Special effort in the design was put into independence from a particular class for the index vector. In zoo, it is assumed that combination c(), querying the length(), value matching MATCH(), subsetting [,, and, of course, ordering ORDER() work when applied to the index. In addition, an as.character() method might improve printed output 4 and as.numeric() could be used for computing distances between indexes, e.g., in interpolation. Both methods are not necessary for working with "zoo" objects but could be used if available. All these methods are available, e.g., for standard numeric and character vectors and for vectors of classes "Date", "POSIXct" or "times" from package chron, but not for the class "datetime" in fcalendar. In the last case, the solution is to provide methods for the above mentioned functions so that indexing "zoo" objects with "datetime" 1 In principle, more general objects can be indexed, but currently zoo does not support this. Development plans are that zoo should eventually support indexed factors, data frames and lists. 2 The only case where this restriction is not imposed is for zero-length vectors, i.e., vectors that only have an index but no data. 3 There is some limited support for indexed factors available in which case the "zoo" object also has an attribute "oclass" with the original class of x. This feature is still under development and might change in future versions. 4 If an as.character() method is already defined, but gives not the desired output for printing, then an index2char() method can be defined. This is a generic convenience function used for creating character representations of the index vector and it defaults to using as.character().

3 Achim Zeileis, Gabor Grothendieck 3 vectors works (see Section 3.3 for an example). To achieve this independence of the index class, new generic functions for ordering (ORDER()) and value matching (MATCH()) are introduced as the corresponding base functions order() and match() are non-generic. The default methods simply call the corresponding base functions, i.e., no new method needs to be introduced for a particular index class if the non-generic functions order() and match() work for this class. To illustrate the usage of zoo(), we first load the package and set the random seed to make the examples in this paper exactly reproducible. R> library("zoo") R> set.seed(1071) Then, we create two vectors z1 and z2 with "POSIXct" indexes, one with random observations R> z1.index <- ISOdatetime(2004, rep(1:2, 5), sample(28, 10), 0, + 0, 0) R> z1.data <- rnorm(10) R> z1 <- zoo(z1.data, z1.index) and one with a sine wave R> z2.index <- as.posixct(paste(2004, rep(1:2, 5), sample(1:28, + 10), sep = "-")) R> z2.data <- sin(2 * 1:10/pi) R> z2 <- zoo(z2.data, z2.index) Furthermore, we create a matrix Z with random observations and a "Date" index R> Z.index <- as.date(sample(12450:12500, 10)) R> Z.data <- matrix(rnorm(30), ncol = 3) R> colnames(z.data) <- c("aa", "Bb", "Cc") R> Z <- zoo(z.data, Z.index) In the examples above, the generation of indexes looks a bit awkward due to the fact the indexes need to be randomly generated (and there are no special functions for random indexes because these are rarely needed in practice). In real world applications, the indexes are typically part of the raw data set read into R so the code would be even simpler. See Section 3 for such examples. 5 Methods to several standard generic functions are available for "zoo" objects, such as print, summary, str, head, tail and [ (subsetting), a few of which are illustrated in the following. There are three printing code styles for "zoo" objects: vectors are by default printed in "horizontal" style R> z R> z1[3:7] Note, that in the code above a new as.date method, provided in zoo, is used to convert days since to class "Date". See the respective help page for more details.

4 4 zoo: An S3 Class and Methods for Indexed Totally Ordered Observations and matrices in "vertical" style R> Z Aa Bb Cc R> Z[1:3, 2:3] Bb Cc Additionally, there is a "plain" style which simply first prints the data and then the index. Above, we have illustrated that "zoo" series can be indexed like vectors or matrices respectively, i.e., with integers correponding to their observation number (and column number). But for indexed observations, one would obviously also like to be able to index with the index class. This is also available in [ which only uses vector/matrix-type subsetting if its first argument is of class "numeric", "integer" or "logical". R> z1[isodatetime(2004, 1, c(14, 25), 0, 0, 0)] If the index class happens to be "numeric", the index has to be either insulated in I() like z[i(i)] or the window() method can be used (see Section 2.6). Summaries and most other methods for "zoo" objects are carried out column wise, reflecting the rectangular structure. In addition, a summary of the index is provided. R> summary(z1) Index z1 Min. : :00:00 Min. : st Qu.: :00:00 1st Qu.: Median : :00:00 Median : Mean : :36:00 Mean : rd Qu.: :00:00 3rd Qu.: Max. : :00:00 Max. : R> summary(z)

5 Achim Zeileis, Gabor Grothendieck 5 Index Aa Bb Cc Min. : Min. : Min. : Min. : st Qu.: st Qu.: st Qu.: st Qu.: Median : Median : Median : Median : Mean : Mean : Mean : Mean : rd Qu.: rd Qu.: rd Qu.: rd Qu.: Max. : Max. : Max. : Max. : Creation of "zooreg" objects Strictly regular series are such series observations where the distance between the indexes of every two adjacent observations is the same. Such series can also be described by their frequency, i.e., the reciprocal value of the distance between two observations. As "zoo" can be used to store series with arbitrary type of index, it can, of course, also be used to store series with regular indexes. So why should this case be given special attention, in particular as there is already the "ts" class devoted entirely to regular series? There are two reasons: First, to be able to convert back and forth between "ts" and "zoo", the frequency of a certain series needs to be stored on the "zoo" side. Second, "ts" is limited to strictly regular series and the regularity is lost if some internal observations are omitted. Series that can be created by omitting some internal observations from strictly regular series will in the following be refered to as being (weakly) regular. Therefore, a class that bridges the gap between irregular and strictly regular series is needed and "zooreg" fills this gap. Objects of class "zooreg" inherit from class "zoo" but have an additional attribute "frequency" in which the frequency of the series is stored. Therefore, they can be employed to represent both strictly and weakly regular series. To create a "zooreg" object, either the command zoo() can be used or the command zooreg(). zoo(x, order.by, frequency) zooreg(data, start, end, frequency, deltat, ts.eps, order.by) If zoo() is called as in the previous section but with an additional frequency argument, it is checked whether frequency complies with the index order.by: if it does an object of class "zooreg" inheriting from "zoo" is returned. The command zooreg() takes mostly the same arguments as ts(). 6 In both cases, the index class is more restricted than in the plain "zoo" case. The index must be of a class which can be coerced to "numeric" (for checking its regularity) and when converted to numeric the index must be expressable as multiples of 1/frequency. Furthermore, adding/substracting a numeric to/from an observation of the index class, should return the correct value of the index class again, i.e., group generic functions Ops should be defined. 7 The following calls yield equivalent series R> zr1 <- zooreg(sin(1:9), start = 2000, frequency = 4) R> zr2 <- zoo(sin(1:9), seq(2000, 2002, by = 1/4), 4) R> zr1 2000(1) 2000(2) 2000(3) 2000(4) 2001(1) 2001(2) 2001(3) (4) 2002(1) R> zr2 6 Only if order.by is specified in the zooreg() call, then zoo(x, order.by, frequency) is called. 7 An application of non-numeric indexes for regular series are the classes "yearmon" and "yearqtr" which are designed for monthly and quarterly series respectively and are discussed in Section 3.4.

6 6 zoo: An S3 Class and Methods for Indexed Totally Ordered Observations 2000(1) 2000(2) 2000(3) 2000(4) 2001(1) 2001(2) 2001(3) (4) 2002(1) to which methods to standard generic functions for regular series can be applied, such as frequency, deltat, cycle. As stated above, the advantage of "zooreg" series is that they remain regular even if an internal observation is dropped: R> zr1 <- zr1[-c(3, 5)] R> zr1 2000(1) 2000(2) 2000(4) 2001(2) 2001(3) 2001(4) 2002(1) R> class(zr1) [1] "zooreg" "zoo" R> frequency(zr1) [1] 4 This facilitates NA handling significantly compared to "ts" and makes "zooreg" a much more attractive data type, e.g., for time series regression. zooreg() can also deal with non-numeric indexes provided that adding "numeric" observations to the index class preserves the class and does not coerce to "numeric". R> zooreg(1:5, start = as.date(" ")) To check whether a certain series is (strictly) regular, the new generic function is.regular(x, strict = FALSE) can be used: R> is.regular(zr1) [1] TRUE R> is.regular(zr1, strict = TRUE) [1] FALSE This function (and also the frequency, deltat and cycle) also work for "zoo" objects if the regularity can still be inferred from the data: R> zr1 <- as.zoo(zr1) R> zr

7 Achim Zeileis, Gabor Grothendieck 7 R> class(zr1) [1] "zoo" R> is.regular(zr1) [1] TRUE R> frequency(zr1) [1] 4 Of course, inferring the underlying regularity is not always reliable and it is safer to store a regular series as a "zooreg" object if it is intended to be a regular series. If a weakly regular series is coerced to "ts" the missing observations are filled with NAs (see also Section 2.8). For strictly regular series with numeric index, the class can be switched between "zoo" and "ts" without loss of information. R> as.ts(zr1) Qtr1 Qtr2 Qtr3 Qtr NA NA R> identical(zr2, as.zoo(as.ts(zr2))) [1] TRUE This enables direct use of functions such as acf, arima, stl etc. on "zooreg" objects as these methods coerce to "ts" first. The result only has to be coerced back to "zoo", if appropriate Plotting The plot method for "zoo" objects, in particular for multivariate "zoo" series, is based on the corresponding method for (multivariate) regular time series. It relies on plot and lines methods being available for the index class which can plot the index against the observations. By default the plot method creates a panel for each series R> plot(z) but can also display all series in a single panel R> plot(z, plot.type = "single", col = 2:4) In both cases additional graphical parameters like color col, plotting character pch and line type lty can be expanded to the number of series. But the plot method for "zoo" objects offers some more flexibility in specification of graphical parameters as in R> plot(z, type = "b", lty = 1:3, pch = list(aa = 1:5, Bb = 2, Cc = 4), + col = list(bb = 2, 4))

8 8 zoo: An S3 Class and Methods for Indexed Totally Ordered Observations The argument lty behaves as before and sets every series in another line type. The pch argument is a named list that assigns to each series a different vector of plotting characters each of which is expanded to the number of observations. Such a list does not necessarily have to include the names of all series, but can also specify a subset. For the remaining series the default parameter is then used which can again be changed: e.g., in the above example the col argument is set to display the series "Bb" in red and all remaining series in blue. The results of the multiple panel plots are depicted in Figure 2 and the single panel plot in Figure Merging and binding As for many rectangular data formats in R, there are both methods for combining the rows and columns of "zoo" objects respectively. For the rbind method the number of columns of the combined objects has to be identical and the indexes may not overlap. R> rbind(z1[5:10], z1[2:3]) The c method simply calls rbind and hence behaves in the same way. The cbind method by default combines the columns by the union of the indexes and fills the created gaps by NAs. R> cbind(z1, z2) z1 z NA NA Z Feb 02 Feb 12 Feb 22 Mar 03 Mar 13 Index Figure 1: Example of a single panel plot

9 Achim Zeileis, Gabor Grothendieck 9 Z Cc Bb Aa Feb 02 Feb 12 Feb 22 Mar 03 Mar 13 Index Z Cc Bb Aa Feb 02 Feb 12 Feb 22 Mar 03 Mar 13 Index Figure 2: Examples of multiple panel plots

10 10 zoo: An S3 Class and Methods for Indexed Totally Ordered Observations NA NA NA NA NA NA NA NA NA NA NA NA In fact, the cbind method is synonymous with the merge method 8 except that the latter provides additional arguments which allow for combining the columns by the intersection of the indexes using the argument all = FALSE R> merge(z1, z2, all = FALSE) z1 z Additionally, the filling pattern can be changed in merge, the naming of the columns can be modified and the return class of the result can be specified. In the case of merging of objects with different index classes, R gives a warning and tries to coerce the indexes. Merging objects with different index classes is generally discouraged if it is used nevertheless, it is the responsibility of the user to ensure that the result is as intended. If at least one of the merged/binded objects was a "zooreg" object, then merge tries to return a "zooreg" object. This is done by assessing whether there is a common maximal frequency and by checking whether the resulting index is still (weakly) regular. If non-"zoo" objects are included in merging, then merge gives plain vectors/factors/matrices the index of the first argument (if it is of the same length). Scalars are always added for the full index without missing values. R> merge(z1, pi, 1:10) z1 pi 1: Note, that in some situations the column naming in the resulting object is somewhat problematic in the cbind method and the merge method might provide better formatting of the column names.

11 Achim Zeileis, Gabor Grothendieck 11 Another function which performs operations along a subset of indexes is aggregate, which is discussed in this section although it does not combine several objects. Using the aggregate method, "zoo" objects are split into subsets along a coarser index grid, summary statistics are computed for each and then the reduced object is returned. In the following example, first a function is set up which returns for a given "Date" value the corresponding first of the month. This function is then used to compute the coarser grid for the aggregate call: in the first example, the grouping is computed explicitely by firstofmonth(index(z)) and the mean of the observations in the month is returned in the second example, only the function that computes the grouping (when applied to index(z)) is supplied and the first observation is used for aggregation. R> firstofmonth <- function(x) as.date(sub("..$", "01", format(x))) R> aggregate(z, firstofmonth(index(z)), mean) Aa Bb Cc R> aggregate(z, firstofmonth, head, 1) Aa Bb Cc Mathematical operations To allow for standard mathematical operations among "zoo" objects, zoo extends group generic functions Ops. These perform the operations only for the intersection of the indexes of the objects. As an example, the summation and logical comparison with < of z1 and z2 yield R> z1 + z R> z1 < z FALSE FALSE FALSE Additionally, methods for transposing t of "zoo" objects which coerces to a matrix before and computing cumulative quantities such as cumsum, cumprod, cummin, cummax which are all applied column wise. R> cumsum(z) Aa Bb Cc

12 12 zoo: An S3 Class and Methods for Indexed Totally Ordered Observations 2.6. Extracting and replacing the data and the index zoo provides several generic functions and methods to work on the data contained in a "zoo" object, the index (or time) attribute associated to it, and on both data and index. The data stored in "zoo" objects can be extracted by coredata which strips off all "zoo"-specific attributes and it can be replaced using coredata<-. Both are new generic functions 9 with methods for "zoo" objects as illustrated in the following example. R> coredata(z1) [1] [7] R> coredata(z1) <- 1:10 R> z The index associated with a "zoo" object can be extracted by index and modified by index<-. As the interpretation of the index as time in time series applications is natural, there are also synonymous methods time and time<-. Hence, the commands index(z2) and time(z2) return equivalent results. R> index(z2) [1] " CET" " CET" " CET" " CET" [5] " CET" " CET" " CET" " CET" [9] " CET" " CET" The index scale of z2 can be changed to that of z1 by R> index(z2) <- index(z1) R> z The start and the end of the index/time vector can be queried by start and end: R> start(z1) [1] " CET" R> end(z1) [1] " CET" 9 The coredata functionality is similar in spirit to the core function in its and value in tseries. However, the focus of those functions is somewhat narrower and we try to provide more general purpose generic functions. See the respective manual page for more details.

13 Achim Zeileis, Gabor Grothendieck 13 To work on both data and index/time, zoo provides window and window<- methods for "zoo" objects. In both cases the window is specified by window(x, index, start, end) where x is the "zoo" object, index is a set of indexes to be selected (by default the full index of x) and start and end can be used to restrict the index set. R> window(z, start = as.date(" ")) Aa Bb Cc R> window(z, index = index(z)[5:8], end = as.date(" ")) Aa Bb Cc The first example selects all observations starting from whereas the second selects from the from the 5th to 8th observation those up to The same syntax can be used for the corresponding replacement function. R> window(z1, end = as.posixct(" ")) <- 9:5 R> z Two methods that are standard in time series applications are lag and diff. These are available with the same arguments as the "ts" methods. 10 R> lag(z1, k = -1) R> merge(z1, lag(z1, k = 1)) z1 lag(z1, k = 1) diff also has an additional argument that also allows for geometric and not only allows arithmetic differences. Furthermore, note the sign of the lag in lag which behaves like the "ts" method, i.e., by default it is positive and shifts the observations forward, to obtain the more standard backward shift the lag has to be negative.

14 14 zoo: An S3 Class and Methods for Indexed Totally Ordered Observations NA R> diff(z1) Coercion to and from "zoo" Coercion to and from "zoo" objects is available for objects of various classes, in particular "ts", "irts" and "its" objects can be coerced to "zoo" and back if the index is of the appropriate class. 11 Coercion between "zooreg" and "zoo" is also available and is essentially dropping the "frequency" attribute or trying to add one, respectively. Furthermore, "zoo" objects can be coerced to vectors, matrices, lists and data frames (the latter dropping the index/time attribute). A simple example is R> as.data.frame(z) Aa Bb Cc NA handling Four methods for dealing with NAs (missing observations) in the observations are applicable to "zoo" objects: na.omit, na.contiguous, na.approx and na.locf. na.omit or its default method to be more precise returns a "zoo" object with incomplete observations removed. na.contiguous extracts the longest consecutive stretch of non-missing values. Furthermore, new generic functions na.approx and na.locf and corresponding default methods are introduced in zoo. The former replaces NAs by linear interpolation (using the function approx) and the name of the latter stands for last observation carried forward. It replaces missing observations by the most recent non-na prior to it. Leading NAs, which cannot be replaced by previous observations, are removed in both functions by default. R> z1[sample(1:10, 3)] <- NA R> z1 11 Coercion from "zoo" to "irts" is contained in the tseries package.

15 Achim Zeileis, Gabor Grothendieck NA NA NA R> na.omit(z1) R> na.contiguous(z1) R> na.approx(z1) R> na.approx(z1, 1:NROW(z1)) R> na.locf(z1) As the above example illustrates, na.approx uses by default the underlying time scale for interpolation. This can be changed, e.g., to an equidistant spacing, by setting the second argument of na.approx Rolling functions A typical task to be performed on ordered observations is to evaluate some function, e.g., computing the mean, in a window of observations that is moved over the full sample period. The resulting statistics are usually synonymously referred to as rolling/running/moving statistics. In zoo, the generic function rapply is provided along with a "zoo" and a "ts" method. The most important arguments are rapply(data, width, FUN) where the function FUN is applied to a rolling window of size width of the observations data. The function rapply currently only evaluates the function for windows of full size width, hence the result has width - 1 fewer observations than the original series. But it can be determined whether the lost observations should be padded with NAs and whether the result should be leftor right-aligned or centered (default) with respect to the original index.

16 16 zoo: An S3 Class and Methods for Indexed Totally Ordered Observations R> rapply(z, 5, sd) Aa Bb Cc R> rapply(z, 5, sd, na.pad = TRUE, align = "left") Aa Bb Cc NA NA NA NA NA NA NA NA NA NA NA NA To improve the performance of rapply(x, k, foo) for some frequently used functions foo, more efficient implementations rollfoo(x, k) are available (and also called by rapply). Currently, these are the generic functions rollmean, rollmedian and rollmax which have methods for "zoo" and "ts" series and a default method for plain vectors. R> rollmean(z2, 5, na.pad = TRUE) NA NA NA NA 3. Combining zoo with other packages The main purpose of the package zoo is to provide basic infrastructure for working with indexed totally ordered observations that can be either employed by users directly or can be a basic ingredient on top of which other packages can build. The latter is illustrated with a few brief examples involving the packages strucchange, tseries and fcalendar in this section. Finally, the classes "yearmon" and "yearqtr" (provided in zoo) are used for illustrating how zoo can be extended by creating a new index class strucchange: Empirical fluctuation processes The package strucchange provides a collection of methods for testing, monitoring and dating structural changes, in particular in linear regression models. Tests for structural change assess whether the parameters of a model remain constant over an ordering with respect to a specified variable, usually time. To adequately store and visualize empirical fluctuation processes which capture instabilities over this ordering, a data type for indexed ordered observations is required. This was the motivation for starting the zoo project.

17 Achim Zeileis, Gabor Grothendieck 17 A simple example for the need of "zoo" objects in strucchange which can not be (easily) implemented by other irregular time series classes available in R is described in the following. We assess the constancy of the electrical resistance over the apparent juice content of kiwi fruits. 12 The data set fruitohms is contained in the DAAG package (Maindonald and Braun 2004). The fitted ocus object contains the OLS-based CUSUM process for the mean of the electrical resistance (variable ohms) indexed by the juice content (variable juice). R> library("strucchange") R> library("daag") R> data("fruitohms") R> ocus <- gefp(ohms ~ 1, order.by = ~juice, data = fruitohms) R> plot(ocus) M fluctuation test Empirical fluctuation process juice Figure 3: Empirical M-fluctuation process for fruitohms data This OLS-based CUSUM process can be visualized using the plot method for "gefp" objects which builds on the "zoo" method and yields in this case the plot in Figure 3 showing the process which crosses its 5% critical value and thus signals a significant decrease in the mean electrical resistance over the juice content. For more information on the package strucchange and the function gefp see Zeileis et al. (2002) and Zeileis (2004) tseries: Historical financial data This section war written when tseries did not yet support "zoo" series directly. For historical reasons and completeness, the example is still included but for practical purposes it is not relevant anymore because, from version on, get.hist.quote returns a "zoo" series by default. A typical application for irregular time series which became increasingly important over the last years in computational statistics and finance is daily (or higher frequency) financial data. The package tseries provides the function get.hist.quote for obtaining historical financial data by 12 A different approach would be to test whether the slope of a regression of electrical resistance on juice content changes with increasing juice content, i.e., to test for instabilities in ohms ~ juice instead of ohms ~ 1. Both lead to similar results.

18 18 zoo: An S3 Class and Methods for Indexed Totally Ordered Observations querying Yahoo! Finance at an online portal quoting data provided by Reuters. The following code queries the quotes of Lucent Technologies starting from until : R> library("tseries") R> LU <- get.hist.quote(instrument = "LU", start = " ", + end = " ", origin = " ", retclass = "ts") In the returned LU object the irregular data is stored by extending it in a regular grid and filling the gaps with NAs. The time is stored in days starting from an origin, in this case specified to be , the origin used by the Date class. This series can be transformed easily into an irregular "zoo" series using a "Date" index. The log-difference returns for Lucent Technologies is depicted in Figure 4. R> LU <- as.zoo(lu) R> index(lu) <- as.date(index(lu)) R> LU <- na.omit(lu) 3.3. fcalendar: Indexes of class "timedate" Although the methods in zoo work out of the box for many index classes, it might be necessary for some index classes to provide c, length, ORDER and MATCH methods such that the methods in zoo work properly. An example for such an index class which requires a bit more attention is "timedate" from the fcalendar package. But after the necessary methods have been defined R> length.timedate <- function(x) prod(x@dim) R> ORDER.timeDate <- function(x,...) order(as.posixct(x),...) R> MATCH.timeDate <- function(x, table, nomatch = NA,...) match(as.posixct(x), + as.posixct(table), nomatch = nomatch,...) the class "timedate" can be used for indexing "zoo" objects. The following example illustrates how z2 can be transformed to use the "timedate" class. R> library("fcalendar") R> z2td <- zoo(coredata(z2), timedate(index(z2), FinCenter = "GMT")) R> z2td The classes "yearmon" and "yearqtr": Roll your own index One of the strengths of the zoo package is its independence of the index class, such that the index can be easily customized. The previous section already explained how an existing class ("timedate") can be used as the index if the necessary methods are created. This section has a similar but slightly different focus: it describes how new index classes can be created addressing a certain type of indexes. These classes are "yearmon" and "yearqtr" (already contained in zoo) which provide indexes for monthly and quarterly data respectively. As the code is virtually identical for both classes except that one has the frequency 12 and the other 4 we will only discuss "yearmon" explicitly.

19 Achim Zeileis, Gabor Grothendieck 19 R> plot(diff(log(lu))) diff(log(lu)) Close Low High Open Index Figure 4: Log-difference returns for Lucent Technologies Of course, monthly data can simply be stored using a numeric index just as the class "ts" does. The problem is that this does not have the meta-information attached that this is really specifying monthly data which is in "yearmon" simply added by a class attribute. Hence, the class creator is simply defined as yearmon <- function(x) structure(floor(12*x )/12, class = "yearmon") which is very similar to the as.yearmon coercion functions provided. As "yearmon" data is now explicitly declared to describe monthly data, this can be exploited for coercion to other time classes: either to coarser time scales such as "yearqtr" or to finer time

20 20 zoo: An S3 Class and Methods for Indexed Totally Ordered Observations scales such as "Date", "POSIXct" or "POSIXlt" which by default associate the first day within a month with a "yearmon" observation. Adding a format and as.character method produces human readable character representations of "yearmon" data and Ops and MATCH methods complete the methods needed for conveniently working with monthly data in zoo. Note, that all of these methods are very simple and rather obvious (as can be seen in the zoo sources), but prove very helpful in the following examples. First, we create a regular series zr3 with "yearmon" index which leads to improved printing compared to the regular series zr1 and zr2 from Section 2.2. R> zr3 <- zooreg(rnorm(9), start = yearmon(2000), frequency = 12) R> zr3 Jan 2000 Feb 2000 Mar 2000 Apr 2000 May 2000 Jun Jul 2000 Aug 2000 Sep This could be aggregated to quarterly data via R> aggregate(zr3, as.yearqtr, mean) 2000 Q Q Q The index can easily be transformed to "Date", the default being the first day of the month but which can also be changed to the last day of the month. R> as.date(index(zr3)) [1] " " " " " " " " " " [6] " " " " " " " " R> as.date(index(zr3), frac = 1) [1] " " " " " " " " " " [6] " " " " " " " " Furthermore, "yearmon" indexes can easily be coerced to "POSIXct" such that the series could be exported as a "its" or "irts" series. R> index(zr3) <- as.posixct(index(zr3)) R> as.irts(zr3) :00:00 GMT :00:00 GMT :00:00 GMT :00:00 GMT :00:00 GMT :00:00 GMT :00:00 GMT :00:00 GMT :00:00 GMT Again, this functionality makes switching between different time scales or index representations particularly easy and zoo provides the user with the flexibility to adjust a certain index to his/her problem of interest.

21 Achim Zeileis, Gabor Grothendieck Summary and outlook The package zoo provides an S3 class and methods for indexed totally ordered observations, such as both regular and irregular time series. Its key design goals are independence of a particular index class and compatibility with standard generics similar to the behaviour of the corresponding "ts" methods. This paper describes how these are implemented in zoo and illustrates the usage of the methods for plotting, merging and binding, several mathematical operations, extracting and replacing data and index, coercion and NA handling. An indexed object of class "zoo" can be thought of as data plus index where the data are essentially vectors or matrices and the index can be a vector of (in principle) arbitrary class. For (weakly) regular "zooreg" series, a "frequency" attribute is stored in addition. Therefore, objects of classes "ts", "its", "irts" and "timeseries" can easily be transformed into "zoo" objects the reverse transformation is also possible provided that the index fulfills the restrictions of the respective class. Hence, the "zoo" class can also be used as the basis for other classes of indexed observations and more specific functionality can be built on top of it. Furthermore, it bridges the gap between irregular and regular series, facilitating operations such as NA handling compared to "ts". Whereas a lot of effort was put into achieving independence of a particular index class, the types of data that can be indexed with "zoo" are currently limited to vectors and matrices, typically containing numeric values. Although, there is some limited support available for indexed factors, one important direction for future development of zoo is to add better support for other objects that can also naturally be indexed including specifically factors, data frames and lists. Computational details The results in this paper were obtained using R with the packages zoo 1.0 5, strucchange , fcalendar , tseries and DAAG R itself and all packages used are available from CRAN at References Heywood G (2004). its: Irregular Time Series. Portfolio & Risk Advisory Group and Commerzbank Securities. R package version Maindonald J, Braun WJ (2004). DAAG: Data Analysis and Graphics. R package version 0.46, URL R Development Core Team (2005). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN , URL http: // Trapletti A (2005). tseries: Time Series Analysis and Computational Finance. R package version Wuertz D (2004). Rmetrics: An Environment and Software Collection for Teaching Financial Engineering and Computational Finance. R package fcalendar, version , URL http: // Zeileis A (2004). Implementing a Class of Structural Change Tests: An Econometric Computing Approach. Report 7, Department of Statistics and Mathematics, Wirtschaftsuniversität Wien, Research Report Series. URL Zeileis A, Grothendieck G (2005). zoo: S3 Infrastructure for Regular and Irregular Time Series. Journal of Statistical Software, 14(6), URL

22 22 zoo: An S3 Class and Methods for Indexed Totally Ordered Observations Zeileis A, Leisch F, Hornik K, Kleiber C (2002). strucchange: An R Package for Testing for Structural Change in Linear Regression Models. Journal of Statistical Software, 7(2), URL

23 Achim Zeileis, Gabor Grothendieck 23 Creation zoo(x, order.by) A. Reference card creation of a "zoo" object from the observations x (a vector or a matrix) and an index order.by by which the observations are ordered. For computations on arbitrary index classes, methods to the following generic functions are assumed to work: combining c(), querying length length(), subsetting [, ordering ORDER() and value matching MATCH(). For pretty printing an as.character and/or index2char method might be helpful. Creation of regular series zoo(x, order.by, freq) zooreg(x, start, end, freq) Standard methods plot lines print summary str head, tail works as above but creates a "zooreg" object which inherits from "zoo" if the frequency freq complies with the index order.by. An as.numeric method has to be available for the index class. creates a "zooreg" series with a numeric index as above and has (almost) the same interface as ts(). plotting adding a "zoo" series to a plot printing summarizing (column-wise) displaying structure of "zoo" objects head and tail of "zoo" objects Coercion as.zoo as.class.zoo is.zoo coercion to "zoo" is available for objects of class "ts", "its", "irts" (plus a default method). coercion from "zoo" to other classes. Currently available for class in "matrix", "vector", "data.frame", "list", "irts", "its" and "ts". querying wether an object is of class "zoo" Merging and binding merge union, intersection, left join, right join along indexes cbind column binding along the intersection of the index c, rbind combining/row binding (indexes may not overlap) aggregate compute summary statistics along a coarser grid of indexes Mathematical operations Ops t cumsum group generic functions performed along the intersection of indexes transposing (coerces to "matrix" before) compute (columnwise) cumulative quantities: sums cumsum(), products cumprod(), maximum cummax(), minimum cummin().

24 24 zoo: An S3 Class and Methods for Indexed Totally Ordered Observations Extracting and replacing data and index index, time extract the index of a series index<-, time<- replace the index of a series coredata, coredata<- extract and replace the data associated with a "zoo" object lag lagged observations diff arithmetic and geometric differences start, end querying start and end of a series window, window<- subsetting of "zoo" objects using their index NA handling na.omit na.contiguous na.locf na.approx Rolling functions rapply rollmean omit NAs compute longest sequence of non-na observations impute NAs by carrying forward the last observation impute NAs by interpolation apply a function to rolling margin of an array more efficient functions for computing the rolling mean, median and maximum are rollmean(), rollmedian() and rollmax(), respectively Methods for regular series is.regular checks whether a series is weakly (or strictly if strict = TRUE) regular frequency, deltat extracts the frequency or its reciprocal value respectively from a series, for "zoo" series the functions try to determine the regularity and frequency in a data-driven way cycle gives the position in the cycle of a regular series

Data Structures: Times, Dates, Ordered Observations... and Beyond. zeileis/

Data Structures: Times, Dates, Ordered Observations... and Beyond.  zeileis/ Data Structures: Times, Dates, Ordered Observations... and Beyond Achim Zeileis Kurt Hornik http://statmath.wu-wien.ac.at/ zeileis/ Overview Data structures Principles Object orientation Times and dates

More information

Reading Data in zoo. Gabor Grothendieck GKX Associates Inc. Achim Zeileis Universität Innsbruck

Reading Data in zoo. Gabor Grothendieck GKX Associates Inc. Achim Zeileis Universität Innsbruck Reading Data in zoo Gabor Grothendieck GKX Associates Inc. Achim Zeileis Universität Innsbruck Abstract This vignette gives examples of how to read data in various formats in the zoo package using the

More information

Package zoo. February 15, 2013

Package zoo. February 15, 2013 Package zoo February 15, 2013 Version 1.7-9 Date 2012-11-04 Title S3 Infrastructure for Regular and Irregular Time Series (Z s ordered observations) Description An S3 class with methods for totally ordered

More information

pandas: Rich Data Analysis Tools for Quant Finance

pandas: Rich Data Analysis Tools for Quant Finance pandas: Rich Data Analysis Tools for Quant Finance Wes McKinney April 24, 2012, QWAFAFEW Boston about me MIT 07 AQR Capital: 2007-2010 Global Macro and Credit Research WES MCKINNEY pandas: 2008 - Present

More information

Introduction to R. UCLA Statistical Consulting Center R Bootcamp. Irina Kukuyeva September 20, 2010

Introduction to R. UCLA Statistical Consulting Center R Bootcamp. Irina Kukuyeva September 20, 2010 UCLA Statistical Consulting Center R Bootcamp Irina Kukuyeva ikukuyeva@stat.ucla.edu September 20, 2010 Outline 1 Introduction 2 Preliminaries 3 Working with Vectors and Matrices 4 Data Sets in R 5 Overview

More information

Managing chronological objects with timedate and timeseries

Managing chronological objects with timedate and timeseries Managing chronological objects with timedate and timeseries Yohan Chalabi and Diethelm Wuertz ITP ETH, Zurich Rmetrics Association, Zurich Finance Online, Zurich user! 2009 Outline 1 timedate Class timedate

More information

Stat 579: Objects in R Vectors

Stat 579: Objects in R Vectors Stat 579: Objects in R Vectors Ranjan Maitra 2220 Snedecor Hall Department of Statistics Iowa State University. Phone: 515-294-7757 maitra@iastate.edu, 1/23 Logical Vectors I R allows manipulation of logical

More information

Package datetimeutils

Package datetimeutils Type Package Title Utilities for Dates and Times Version 0.2-12 Date 2018-02-28 Package datetimeutils March 10, 2018 Maintainer Utilities for handling dates and times, such as selecting

More information

kernlab An R Package for Kernel Learning

kernlab An R Package for Kernel Learning kernlab An R Package for Kernel Learning Alexandros Karatzoglou Alex Smola Kurt Hornik Achim Zeileis http://www.ci.tuwien.ac.at/~zeileis/ Overview Implementing learning algorithms: Why should we write

More information

Introduction to R for Time Series Analysis Lecture Notes 2

Introduction to R for Time Series Analysis Lecture Notes 2 Introduction to R for Time Series Analysis Lecture Notes 2 1.0 OVERVIEW OF R R is a widely used environment for statistical analysis. The striking difference between R and most other statistical software

More information

Package glogis. R topics documented: March 16, Version Date

Package glogis. R topics documented: March 16, Version Date Version 1.0-1 Date 2018-03-16 Package glogis March 16, 2018 Title Fitting and Testing Generalized Logistic Distributions Description Tools for the generalized logistic distribution (Type I, also known

More information

Getting started with simulating data in R: some helpful functions and how to use them Ariel Muldoon August 28, 2018

Getting started with simulating data in R: some helpful functions and how to use them Ariel Muldoon August 28, 2018 Getting started with simulating data in R: some helpful functions and how to use them Ariel Muldoon August 28, 2018 Contents Overview 2 Generating random numbers 2 rnorm() to generate random numbers from

More information

Package slam. December 1, 2016

Package slam. December 1, 2016 Version 0.1-40 Title Sparse Lightweight Arrays and Matrices Package slam December 1, 2016 Data structures and algorithms for sparse arrays and matrices, based on inde arrays and simple triplet representations,

More information

Package chron. R topics documented: October 7, Version

Package chron. R topics documented: October 7, Version Version 2.3-51 Package chron October 7, 2017 Title Chronological Objects which can Handle Dates and Times Provides chronological objects which can handle dates and times. Depends R (>= 2.12.0) Imports

More information

Package slam. February 15, 2013

Package slam. February 15, 2013 Package slam February 15, 2013 Version 0.1-28 Title Sparse Lightweight Arrays and Matrices Data structures and algorithms for sparse arrays and matrices, based on inde arrays and simple triplet representations,

More information

Package ggseas. June 12, 2018

Package ggseas. June 12, 2018 Package ggseas June 12, 2018 Title 'stats' for Seasonal Adjustment on the Fly with 'ggplot2' Version 0.5.4 Maintainer Peter Ellis Provides 'ggplot2' 'stats' that estimate

More information

Some issues with R It is command-driven, and learning to use it to its full extent takes some time and effort. The documentation is comprehensive,

Some issues with R It is command-driven, and learning to use it to its full extent takes some time and effort. The documentation is comprehensive, R To R is Human R is a computing environment specially made for doing statistics/econometrics. It is becoming the standard for advanced dealing with empirical data, also in finance. Good parts It is freely

More information

EPIB Four Lecture Overview of R

EPIB Four Lecture Overview of R EPIB-613 - Four Lecture Overview of R R is a package with enormous capacity for complex statistical analysis. We will see only a small proportion of what it can do. The R component of EPIB-613 is divided

More information

GraphBLAS Mathematics - Provisional Release 1.0 -

GraphBLAS Mathematics - Provisional Release 1.0 - GraphBLAS Mathematics - Provisional Release 1.0 - Jeremy Kepner Generated on April 26, 2017 Contents 1 Introduction: Graphs as Matrices........................... 1 1.1 Adjacency Matrix: Undirected Graphs,

More information

Bernt Arne Ødegaard. 15 November 2018

Bernt Arne Ødegaard. 15 November 2018 R Bernt Arne Ødegaard 15 November 2018 To R is Human 1 R R is a computing environment specially made for doing statistics/econometrics. It is becoming the standard for advanced dealing with empirical data,

More information

CS113: Lecture 3. Topics: Variables. Data types. Arithmetic and Bitwise Operators. Order of Evaluation

CS113: Lecture 3. Topics: Variables. Data types. Arithmetic and Bitwise Operators. Order of Evaluation CS113: Lecture 3 Topics: Variables Data types Arithmetic and Bitwise Operators Order of Evaluation 1 Variables Names of variables: Composed of letters, digits, and the underscore ( ) character. (NO spaces;

More information

2.3 Algorithms Using Map-Reduce

2.3 Algorithms Using Map-Reduce 28 CHAPTER 2. MAP-REDUCE AND THE NEW SOFTWARE STACK one becomes available. The Master must also inform each Reduce task that the location of its input from that Map task has changed. Dealing with a failure

More information

MBV4410/9410 Fall Bioinformatics for Molecular Biology. Introduction to R

MBV4410/9410 Fall Bioinformatics for Molecular Biology. Introduction to R MBV4410/9410 Fall 2018 Bioinformatics for Molecular Biology Introduction to R Outline Introduce R Basic operations RStudio Bioconductor? Goal of the lecture Introduce you to R Show how to run R, basic

More information

Appendix A Getting Started with R

Appendix A Getting Started with R Appendix A Getting Started with R A.1 Installing R To get started with R, we first have to download and install R on our local system. To get R, go to the Comprehensive R Archive Network (CRAN) website

More information

The fimport Package. October 8, 2007

The fimport Package. October 8, 2007 The fimport Package October 8, 2007 Version 260.72 Date 1997-2007 Title Rmetrics - Economic and Financial Data Import Author Diethelm Wuertz and many others, see the SOURCE file Depends R (>= 2.4.0), fseries

More information

Effects of PROC EXPAND Data Interpolation on Time Series Modeling When the Data are Volatile or Complex

Effects of PROC EXPAND Data Interpolation on Time Series Modeling When the Data are Volatile or Complex Effects of PROC EXPAND Data Interpolation on Time Series Modeling When the Data are Volatile or Complex Keiko I. Powers, Ph.D., J. D. Power and Associates, Westlake Village, CA ABSTRACT Discrete time series

More information

Package tibble. August 22, 2017

Package tibble. August 22, 2017 Encoding UTF-8 Version 1.3.4 Title Simple Data Frames Package tibble August 22, 2017 Provides a 'tbl_df' class (the 'tibble') that provides stricter checking and better formatting than the traditional

More information

Basics of Computational Geometry

Basics of Computational Geometry Basics of Computational Geometry Nadeem Mohsin October 12, 2013 1 Contents This handout covers the basic concepts of computational geometry. Rather than exhaustively covering all the algorithms, it deals

More information

Model-based Recursive Partitioning

Model-based Recursive Partitioning Model-based Recursive Partitioning Achim Zeileis Torsten Hothorn Kurt Hornik http://statmath.wu-wien.ac.at/ zeileis/ Overview Motivation The recursive partitioning algorithm Model fitting Testing for parameter

More information

Introduction to Octave/Matlab. Deployment of Telecommunication Infrastructures

Introduction to Octave/Matlab. Deployment of Telecommunication Infrastructures Introduction to Octave/Matlab Deployment of Telecommunication Infrastructures 1 What is Octave? Software for numerical computations and graphics Particularly designed for matrix computations Solving equations,

More information

Sand Pit Utilization

Sand Pit Utilization Sand Pit Utilization A construction company obtains sand, fine gravel, and coarse gravel from three different sand pits. The pits have different average compositions for the three types of raw materials

More information

Earthquake data in geonet.org.nz

Earthquake data in geonet.org.nz Earthquake data in geonet.org.nz There is are large gaps in the 2012 and 2013 data, so let s not use it. Instead we ll use a previous year. Go to http://http://quakesearch.geonet.org.nz/ At the screen,

More information

Ecffient calculations

Ecffient calculations Ecffient calculations Vectorized computations The efficiency of calculations depends on how you perform them. Vectorized calculations, for example, avoid going trough individual vector or matrix elements

More information

Package PCICt. April 16, 2018

Package PCICt. April 16, 2018 Version 0.5-4.1 Date 2013-06-26 Package PCICt April 16, 2018 Title Implementation of POSIXct Work-Alike for 365 and 360 Day Calendars Author David Bronaugh for the Pacific Climate Impacts

More information

1 Matrices and Vectors and Lists

1 Matrices and Vectors and Lists University of Wollongong School of Mathematics and Applied Statistics STAT231 Probability and Random Variables 2014 Second Lab - Week 4 If you can t finish the log-book questions in lab, proceed at home.

More information

Monthly SEO Report. Example Client 16 November 2012 Scott Lawson. Date. Prepared by

Monthly SEO Report. Example Client 16 November 2012 Scott Lawson. Date. Prepared by Date Monthly SEO Report Prepared by Example Client 16 November 212 Scott Lawson Contents Thanks for using TrackPal s automated SEO and Analytics reporting template. Below is a brief explanation of the

More information

Package CausalImpact

Package CausalImpact Package CausalImpact September 15, 2017 Title Inferring Causal Effects using Bayesian Structural Time-Series Models Date 2017-08-16 Author Kay H. Brodersen , Alain Hauser

More information

Financial Time-Series Tools

Financial Time-Series Tools Financial Time-Series Tools xts and chartseries Jeffrey A. Ryan jeffrey.ryan @ insightalgo.com Joshua M. Ulrich joshua.m.ulrich @ gmail.com Presented by Jeffrey Ryan at: Rmetrics: Computational Finance

More information

Good Relations with R. Good Relations with R

Good Relations with R. Good Relations with R Good Relations with R Kurt Hornik David Meyer Motivation Meyer, Leisch & Hornik (2003), The Support Vector Machine under test, Neurocomputing: Large scale benchmark analysis of performance of SVMs for

More information

Vectors and Matrices. Chapter 2. Linguaggio Programmazione Matlab-Simulink (2017/2018)

Vectors and Matrices. Chapter 2. Linguaggio Programmazione Matlab-Simulink (2017/2018) Vectors and Matrices Chapter 2 Linguaggio Programmazione Matlab-Simulink (2017/2018) Matrices A matrix is used to store a set of values of the same type; every value is stored in an element MATLAB stands

More information

Appendix: Generic PbO programming language extension

Appendix: Generic PbO programming language extension Holger H. Hoos: Programming by Optimization Appendix: Generic PbO programming language extension As explained in the main text, we propose three fundamental mechanisms to be covered by a generic PbO programming

More information

8.1 R Computational Toolbox Tutorial 3

8.1 R Computational Toolbox Tutorial 3 8.1 R Computational Toolbox Tutorial 3 Introduction to Computational Science: Modeling and Simulation for the Sciences, 2 nd Edition Angela B. Shiflet and George W. Shiflet Wofford College 2014 by Princeton

More information

Chapter 15 Introduction to Linear Programming

Chapter 15 Introduction to Linear Programming Chapter 15 Introduction to Linear Programming An Introduction to Optimization Spring, 2015 Wei-Ta Chu 1 Brief History of Linear Programming The goal of linear programming is to determine the values of

More information

Introduction to Programming, Aug-Dec 2006

Introduction to Programming, Aug-Dec 2006 Introduction to Programming, Aug-Dec 2006 Lecture 3, Friday 11 Aug 2006 Lists... We can implicitly decompose a list into its head and tail by providing a pattern with two variables to denote the two components

More information

CHAPTER 4 STOCK PRICE PREDICTION USING MODIFIED K-NEAREST NEIGHBOR (MKNN) ALGORITHM

CHAPTER 4 STOCK PRICE PREDICTION USING MODIFIED K-NEAREST NEIGHBOR (MKNN) ALGORITHM CHAPTER 4 STOCK PRICE PREDICTION USING MODIFIED K-NEAREST NEIGHBOR (MKNN) ALGORITHM 4.1 Introduction Nowadays money investment in stock market gains major attention because of its dynamic nature. So the

More information

A.1 Numbers, Sets and Arithmetic

A.1 Numbers, Sets and Arithmetic 522 APPENDIX A. MATHEMATICS FOUNDATIONS A.1 Numbers, Sets and Arithmetic Numbers started as a conceptual way to quantify count objects. Later, numbers were used to measure quantities that were extensive,

More information

YOGYAKARTA STATE UNIVERSITY MATHEMATICS AND NATURAL SCIENCES FACULTY MATHEMATICS EDUCATION STUDY PROGRAM

YOGYAKARTA STATE UNIVERSITY MATHEMATICS AND NATURAL SCIENCES FACULTY MATHEMATICS EDUCATION STUDY PROGRAM YOGYAKARTA STATE UNIVERSITY MATHEMATICS AND NATURAL SCIENCES FACULTY MATHEMATICS EDUCATION STUDY PROGRAM TOPIC 1 INTRODUCING SOME MATHEMATICS SOFTWARE (Matlab, Maple and Mathematica) This topic provides

More information

Imputation for missing data through artificial intelligence 1

Imputation for missing data through artificial intelligence 1 Ninth IFC Conference on Are post-crisis statistical initiatives completed? Basel, 30-31 August 2018 Imputation for missing data through artificial intelligence 1 Byeungchun Kwon, Bank for International

More information

Matrix algebra. Basics

Matrix algebra. Basics Matrix.1 Matrix algebra Matrix algebra is very prevalently used in Statistics because it provides representations of models and computations in a much simpler manner than without its use. The purpose of

More information

Package datetime. May 3, 2017

Package datetime. May 3, 2017 Type Package Title Nominal Dates, Times, and Durations Version 0.1.2 Author Tim Bergsma Package datetime May 3, 2017 Maintainer Tim Bergsma Description Provides methods for working

More information

limma: A brief introduction to R

limma: A brief introduction to R limma: A brief introduction to R Natalie P. Thorne September 5, 2006 R basics i R is a command line driven environment. This means you have to type in commands (line-by-line) for it to compute or calculate

More information

Package narray. January 28, 2018

Package narray. January 28, 2018 Package narray January 28, 2018 Title Subset- And Name-Aware Array Utility Functions Version 0.4.0 Author Michael Schubert Maintainer Michael Schubert Stacking

More information

SAS (Statistical Analysis Software/System)

SAS (Statistical Analysis Software/System) SAS (Statistical Analysis Software/System) SAS Adv. Analytics or Predictive Modelling:- Class Room: Training Fee & Duration : 30K & 3 Months Online Training Fee & Duration : 33K & 3 Months Learning SAS:

More information

Package fimport. November 20, 2017

Package fimport. November 20, 2017 Package fimport November 20, 2017 Title Rmetrics - Importing Economic and Financial Data Date 2017-11-12 Version 3042.85 Author Diethelm Wuertz [aut], Tobias Setz [cre], Yohan Chalabi [ctb] Maintainer

More information

The Encoding Complexity of Network Coding

The Encoding Complexity of Network Coding The Encoding Complexity of Network Coding Michael Langberg Alexander Sprintson Jehoshua Bruck California Institute of Technology Email: mikel,spalex,bruck @caltech.edu Abstract In the multicast network

More information

Package Brobdingnag. R topics documented: March 19, 2018

Package Brobdingnag. R topics documented: March 19, 2018 Type Package Title Very Large Numbers in R Version 1.2-5 Date 2018-03-19 Author Depends R (>= 2.13.0), methods Package Brobdingnag March 19, 2018 Maintainer Handles very large

More information

DSC 201: Data Analysis & Visualization

DSC 201: Data Analysis & Visualization DSC 201: Data Analysis & Visualization Data Frames Dr. David Koop List, Array, or Series? [[1,2,3],[4,5,6]] 2 List, Array, or Series? [[1,2,3],[4,5,6]] 3 List, Array, or Series? Which should I use to store

More information

Fundamentals of the J Programming Language

Fundamentals of the J Programming Language 2 Fundamentals of the J Programming Language In this chapter, we present the basic concepts of J. We introduce some of J s built-in functions and show how they can be applied to data objects. The pricinpals

More information

Package arphit. March 28, 2019

Package arphit. March 28, 2019 Type Package Title RBA-style R Plots Version 0.3.1 Author Angus Moore Package arphit March 28, 2019 Maintainer Angus Moore Easily create RBA-style graphs

More information

Advanced Econometric Methods EMET3011/8014

Advanced Econometric Methods EMET3011/8014 Advanced Econometric Methods EMET3011/8014 Lecture 2 John Stachurski Semester 1, 2011 Announcements Missed first lecture? See www.johnstachurski.net/emet Weekly download of course notes First computer

More information

Implementing S4 objects in your package: Exercises

Implementing S4 objects in your package: Exercises Implementing S4 objects in your package: Exercises Hervé Pagès 17-18 February, 2011 Contents 1 Introduction 1 2 Part I: Implementing the GWASdata class 3 2.1 Class definition............................

More information

Overview of the intervals package

Overview of the intervals package Overview of the intervals package Richard Bourgon 06 June 2009 Contents 1 Introduction 1 2 Interpretation of objects 2 2.1 As a subset of Z or R........................ 3 2.2 As a set of meaningful, possibly

More information

Chapter 2: Number Systems

Chapter 2: Number Systems Chapter 2: Number Systems Logic circuits are used to generate and transmit 1s and 0s to compute and convey information. This two-valued number system is called binary. As presented earlier, there are many

More information

CHAPTER 4: MICROSOFT OFFICE: EXCEL 2010

CHAPTER 4: MICROSOFT OFFICE: EXCEL 2010 CHAPTER 4: MICROSOFT OFFICE: EXCEL 2010 Quick Summary A workbook an Excel document that stores data contains one or more pages called a worksheet. A worksheet or spreadsheet is stored in a workbook, and

More information

DSC 201: Data Analysis & Visualization

DSC 201: Data Analysis & Visualization DSC 201: Data Analysis & Visualization Data Frames Dr. David Koop 2D Indexing [W. McKinney, Python for Data Analysis] 2 Boolean Indexing names == 'Bob' gives back booleans that represent the elementwise

More information

Outline Introduction Sequences Ranges Views Interval Datasets. IRanges. Bioconductor Infrastructure for Sequence Analysis.

Outline Introduction Sequences Ranges Views Interval Datasets. IRanges. Bioconductor Infrastructure for Sequence Analysis. IRanges Bioconductor Infrastructure for Sequence Analysis November 24, 2009 1 Introduction 2 Sequences Background RLEs 3 Ranges Basic Manipulation Simple Transformations Ranges as Sets Overlap 4 Views

More information

Introduction to the R Language

Introduction to the R Language Introduction to the R Language Data Types and Basic Operations Starting Up Windows: Double-click on R Mac OS X: Click on R Unix: Type R Objects R has five basic or atomic classes of objects: character

More information

CSE 341 Section Handout #6 Cheat Sheet

CSE 341 Section Handout #6 Cheat Sheet Cheat Sheet Types numbers: integers (3, 802), reals (3.4), rationals (3/4), complex (2+3.4i) symbols: x, y, hello, r2d2 booleans: #t, #f strings: "hello", "how are you?" lists: (list 3 4 5) (list 98.5

More information

E-Companion: On Styles in Product Design: An Analysis of US. Design Patents

E-Companion: On Styles in Product Design: An Analysis of US. Design Patents E-Companion: On Styles in Product Design: An Analysis of US Design Patents 1 PART A: FORMALIZING THE DEFINITION OF STYLES A.1 Styles as categories of designs of similar form Our task involves categorizing

More information

Package adaptivetau. October 25, 2016

Package adaptivetau. October 25, 2016 Type Package Title Tau-Leaping Stochastic Simulation Version 2.2-1 Date 2016-10-24 Author Philip Johnson Maintainer Philip Johnson Depends R (>= 2.10.1) Enhances compiler Package adaptivetau

More information

Machine Learning: An Applied Econometric Approach Online Appendix

Machine Learning: An Applied Econometric Approach Online Appendix Machine Learning: An Applied Econometric Approach Online Appendix Sendhil Mullainathan mullain@fas.harvard.edu Jann Spiess jspiess@fas.harvard.edu April 2017 A How We Predict In this section, we detail

More information

Extremely short introduction to R Jean-Yves Sgro Feb 20, 2018

Extremely short introduction to R Jean-Yves Sgro Feb 20, 2018 Extremely short introduction to R Jean-Yves Sgro Feb 20, 2018 Contents 1 Suggested ahead activities 1 2 Introduction to R 2 2.1 Learning Objectives......................................... 2 3 Starting

More information

Chapter S:II. II. Search Space Representation

Chapter S:II. II. Search Space Representation Chapter S:II II. Search Space Representation Systematic Search Encoding of Problems State-Space Representation Problem-Reduction Representation Choosing a Representation S:II-1 Search Space Representation

More information

Math 227 EXCEL / MEGASTAT Guide

Math 227 EXCEL / MEGASTAT Guide Math 227 EXCEL / MEGASTAT Guide Introduction Introduction: Ch2: Frequency Distributions and Graphs Construct Frequency Distributions and various types of graphs: Histograms, Polygons, Pie Charts, Stem-and-Leaf

More information

Statistical Computing (36-350)

Statistical Computing (36-350) Statistical Computing (36-350) Lecture 1: Introduction to the course; Data Cosma Shalizi and Vincent Vu 29 August 2011 Why good statisticians learn how to program Independence: otherwise, you rely on someone

More information

3 Excel Tips for Marketing Efficiency

3 Excel Tips for Marketing Efficiency 3 Excel Tips for Marketing Efficiency 3 Excel Database Tips for Marketing Efficiency In these challenging times, companies continue to reduce staff to save money. Those who remain must do more. How to

More information

Package lgarch. September 15, 2015

Package lgarch. September 15, 2015 Type Package Package lgarch September 15, 2015 Title Simulation and Estimation of Log-GARCH Models Version 0.6-2 Depends R (>= 2.15.0), zoo Date 2015-09-14 Author Genaro Sucarrat Maintainer Genaro Sucarrat

More information

Package Formula. July 11, 2017

Package Formula. July 11, 2017 Version 1.2-2 Date 2017-07-11 Title Extended Model Formulas Package Formula July 11, 2017 Description Infrastructure for extended formulas with multiple parts on the right-hand side and/or multiple responses

More information

SYS 6021 Linear Statistical Models

SYS 6021 Linear Statistical Models SYS 6021 Linear Statistical Models Project 2 Spam Filters Jinghe Zhang Summary The spambase data and time indexed counts of spams and hams are studied to develop accurate spam filters. Static models are

More information

Interleaving Schemes on Circulant Graphs with Two Offsets

Interleaving Schemes on Circulant Graphs with Two Offsets Interleaving Schemes on Circulant raphs with Two Offsets Aleksandrs Slivkins Department of Computer Science Cornell University Ithaca, NY 14853 slivkins@cs.cornell.edu Jehoshua Bruck Department of Electrical

More information

Package statar. July 6, 2017

Package statar. July 6, 2017 Package statar July 6, 2017 Title Tools Inspired by 'Stata' to Manipulate Tabular Data Version 0.6.5 A set of tools inspired by 'Stata' to eplore data.frames ('summarize', 'tabulate', 'tile', 'pctile',

More information

PROBLEM SOLVING AND OFFICE AUTOMATION. A Program consists of a series of instruction that a computer processes to perform the required operation.

PROBLEM SOLVING AND OFFICE AUTOMATION. A Program consists of a series of instruction that a computer processes to perform the required operation. UNIT III PROBLEM SOLVING AND OFFICE AUTOMATION Planning the Computer Program Purpose Algorithm Flow Charts Pseudo code -Application Software Packages- Introduction to Office Packages (not detailed commands

More information

Sections in this manual

Sections in this manual 1 Sections in this manual Argus Analytics 2 The service 2 Benefits 2 Launching Argus Analytics 3 Search Interface breakdown 4 Add-in Navigation 5 Search: Free text & Facet 5 Search: Facet filter 6 Filters

More information

Using R to provide statistical functionality for QSAR modeling in CDK

Using R to provide statistical functionality for QSAR modeling in CDK Vol. 2/1, March 2005 7 Using R to provide statistical functionality for QSAR modeling in CDK A description of the integration of CDK and the R statistical environment Rajarshi Guha Introduction A previous

More information

1. To condense data in a single value. 2. To facilitate comparisons between data.

1. To condense data in a single value. 2. To facilitate comparisons between data. The main objectives 1. To condense data in a single value. 2. To facilitate comparisons between data. Measures :- Locational (positional ) average Partition values Median Quartiles Deciles Percentiles

More information

GRETL FOR TODDLERS!! CONTENTS. 1. Access to the econometric software A new data set: An existent data set: 3

GRETL FOR TODDLERS!! CONTENTS. 1. Access to the econometric software A new data set: An existent data set: 3 GRETL FOR TODDLERS!! JAVIER FERNÁNDEZ-MACHO CONTENTS 1. Access to the econometric software 3 2. Loading and saving data: the File menu 3 2.1. A new data set: 3 2.2. An existent data set: 3 2.3. Importing

More information

Package proxy. October 29, 2017

Package proxy. October 29, 2017 Package proxy October 29, 2017 Type Package Title Distance and Similarity Measures Version 0.4-19 Description Provides an extensible framework for the efficient calculation of auto- and crossproximities,

More information

OUTLINES. Variable names in MATLAB. Matrices, Vectors and Scalar. Entering a vector Colon operator ( : ) Mathematical operations on vectors.

OUTLINES. Variable names in MATLAB. Matrices, Vectors and Scalar. Entering a vector Colon operator ( : ) Mathematical operations on vectors. 1 LECTURE 3 OUTLINES Variable names in MATLAB Examples Matrices, Vectors and Scalar Scalar Vectors Entering a vector Colon operator ( : ) Mathematical operations on vectors examples 2 VARIABLE NAMES IN

More information

Package TimeWarp. R topics documented: July 22, 2016

Package TimeWarp. R topics documented: July 22, 2016 Type Package Title Date Calculations and Manipulation Version 1.0.15 Date 2016-07-19 Author Tony Plate, Jeffrey Horner, Lars Hansen Maintainer Tony Plate

More information

Spatial and multi-scale data assimilation in EO-LDAS. Technical Note for EO-LDAS project/nceo. P. Lewis, UCL NERC NCEO

Spatial and multi-scale data assimilation in EO-LDAS. Technical Note for EO-LDAS project/nceo. P. Lewis, UCL NERC NCEO Spatial and multi-scale data assimilation in EO-LDAS Technical Note for EO-LDAS project/nceo P. Lewis, UCL NERC NCEO Abstract Email: p.lewis@ucl.ac.uk 2 May 2012 In this technical note, spatial data assimilation

More information

Field Types and Import/Export Formats

Field Types and Import/Export Formats Chapter 3 Field Types and Import/Export Formats Knowing Your Data Besides just knowing the raw statistics and capacities of your software tools ( speeds and feeds, as the machinists like to say), it s

More information

4 Displaying Multiway Tables

4 Displaying Multiway Tables 4 Displaying Multiway Tables An important subset of statistical data comes in the form of tables. Tables usually record the frequency or proportion of observations that fall into a particular category

More information

Description/History Objects/Language Description Commonly Used Basic Functions. More Specific Functionality Further Resources

Description/History Objects/Language Description Commonly Used Basic Functions. More Specific Functionality Further Resources R Outline Description/History Objects/Language Description Commonly Used Basic Functions Basic Stats and distributions I/O Plotting Programming More Specific Functionality Further Resources www.r-project.org

More information

Programming R. Manuel J. A. Eugster. Chapter 4: Writing your own functions. Last modification on May 15, 2012

Programming R. Manuel J. A. Eugster. Chapter 4: Writing your own functions. Last modification on May 15, 2012 Manuel J. A. Eugster Programming R Chapter 4: Writing your own functions Last modification on May 15, 2012 Draft of the R programming book I always wanted to read http://mjaeugster.github.com/progr Licensed

More information

Graph Theory for Modelling a Survey Questionnaire Pierpaolo Massoli, ISTAT via Adolfo Ravà 150, Roma, Italy

Graph Theory for Modelling a Survey Questionnaire Pierpaolo Massoli, ISTAT via Adolfo Ravà 150, Roma, Italy Graph Theory for Modelling a Survey Questionnaire Pierpaolo Massoli, ISTAT via Adolfo Ravà 150, 00142 Roma, Italy e-mail: pimassol@istat.it 1. Introduction Questions can be usually asked following specific

More information

Introduction to R Benedikt Brors Dept. Intelligent Bioinformatics Systems German Cancer Research Center

Introduction to R Benedikt Brors Dept. Intelligent Bioinformatics Systems German Cancer Research Center Introduction to R Benedikt Brors Dept. Intelligent Bioinformatics Systems German Cancer Research Center What is R? R is a statistical computing environment with graphics capabilites It is fully scriptable

More information

Getting Started with montaj GridKnit

Getting Started with montaj GridKnit Getting Started with montaj GridKnit Today s explorers use regional and local geophysical compilations of magnetic, radiometric and apparent resistivity data to perform comprehensive geologic interpretations

More information

Lecture Notes on Ints

Lecture Notes on Ints Lecture Notes on Ints 15-122: Principles of Imperative Computation Frank Pfenning Lecture 2 August 26, 2010 1 Introduction Two fundamental types in almost any programming language are booleans and integers.

More information

Step-by-Step Guide to Relatedness and Association Mapping Contents

Step-by-Step Guide to Relatedness and Association Mapping Contents Step-by-Step Guide to Relatedness and Association Mapping Contents OBJECTIVES... 2 INTRODUCTION... 2 RELATEDNESS MEASURES... 2 POPULATION STRUCTURE... 6 Q-K ASSOCIATION ANALYSIS... 10 K MATRIX COMPRESSION...

More information

L3.4. Data Management Techniques. Frederic Desprez Benjamin Isnard Johan Montagnat

L3.4. Data Management Techniques. Frederic Desprez Benjamin Isnard Johan Montagnat Grid Workflow Efficient Enactment for Data Intensive Applications L3.4 Data Management Techniques Authors : Eddy Caron Frederic Desprez Benjamin Isnard Johan Montagnat Summary : This document presents

More information