foreach/iterators User's Guide

April 17, 2015
Palo Alto - Seattle - Dallas - Singapore - London

Copyright 2015 Revolution Analytics. All rights reserved. Revolution R, Revolution R Enterprise, RPE, RevoScaleR, DeployR, NetWorkSpaces, NWS, ParallelR, and Revolution Analytics are trademarks of Revolution Analytics. All other trademarks are the property of their respective owners.

Contents

1 Parallelizing Loops
  1.1 Using foreach
  1.2 Parallel Backends
      1.2.1 Using the doParallel parallel backend
      1.2.2 Getting information about the parallel backend
  1.3 Nesting Calls to foreach
  1.4 Using Iterators
      1.4.1 Some Special Iterators
      1.4.2 Writing Iterators

A Function and Class Reference
  A.1 iterators package: iapply, icount, idiv, iread.table, ireadLines, irnorm,
      isplit, iter, iterators-package, nextElem
  A.2 foreach package: foreach, foreach-ext, foreach-package, getDoParWorkers,
      setDoPar, registerDoSEQ
  A.3 doParallel package: doParallel-package, registerDoParallel
  A.4 doMC package: doMC-package, registerDoMC
  A.5 multicore package: children, fork, mclapply, multicore, parallel, process,
      sendMaster, signals

Chapter 1

Parallelizing Loops

One common approach to parallelization is to see if the iterations within a loop can be
performed independently, and if so, then try to run the iterations concurrently rather
than sequentially. The foreach and iterators packages can help you do this loop
parallelization quickly and easily.

1.1 Using foreach

The foreach package is a set of tools that allow you to run virtually anything that can
be expressed as a for-loop as a set of parallel tasks. One application of this is to
allow multiple simulations to run in parallel. As a simple example, consider the case of
simulating coin flips, which can be done by sampling with replacement from the vector
c("H", "T"). To run this simulation 10 times sequentially, use foreach with the %do%
operator:

> library(foreach)
> foreach(i=1:10) %do% sample(c("H", "T"), 10000, replace=TRUE)

Comparing the foreach output with that of a similar for loop shows one obvious
difference: foreach returns a list containing the value returned by each computation. A
for loop, by contrast, returns only the value of its last computation, and relies on
user-defined side effects to do its work.

We can parallelize the operation immediately by replacing %do% with %dopar%:

> foreach(i=1:10) %dopar% sample(c("H", "T"), 10000, replace=TRUE)

However, if we run this example, we see the following warning:

Warning message:
executing %dopar% sequentially: no parallel backend registered

To actually run in parallel, we need to have a parallel backend for foreach. Parallel
backends are discussed in the next section.
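As a quick aside, if you simply want to silence that warning while still running
sequentially, you can explicitly register the sequential backend with registerDoSEQ
(documented in the reference appendix). A minimal sketch:

library(foreach)
registerDoSEQ()   # explicitly register the sequential backend
res <- foreach(i=1:10) %dopar% sample(c("H", "T"), 10000, replace=TRUE)
length(res)       # still a list of 10 simulations, run sequentially, with no warning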

1.2 Parallel Backends

In order for loops coded with foreach to run in parallel, you must register a parallel
backend to manage the execution of the loop. Any type of mechanism for running code in
parallel could potentially have a parallel backend written for it. Currently, Revolution
R Enterprise includes the doParallel backend; this uses R's parallel package to run jobs
in parallel, using either of the component parallelization methods incorporated into the
parallel package: SNOW-like functionality using socket connections or multicore-like
functionality using forking (on Linux only). The doParallel package is a parallel
backend for foreach that is intended for parallel processing on a single computer with
multiple cores or processors.

Additional parallel backends are available from CRAN:

  - doMPI for use with the Rmpi package
  - doRedis for use with the rredis package
  - doMC provides access to the multicore functionality of the parallel package
  - doSNOW for use with the now superseded snow package

To use a parallel backend, you must first register it. Once a parallel backend is
registered, calls to %dopar% run in parallel using the mechanisms provided by the
parallel backend. However, the details of registering the parallel backends differ, so
we consider them separately.

1.2.1 Using the doParallel parallel backend

R's parallel package combines elements of snow and multicore; doParallel similarly
combines elements of both doSNOW and doMC. You can register doParallel with a cluster,
as with doSNOW, or with a number of cores, as with doMC. For example, here we create a
cluster and register it:

> library(doParallel)
> cl <- makeCluster(4)
> registerDoParallel(cl)
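A common pattern, shown here as a minimal sketch (detectCores and stopCluster come from
R's parallel package, which doParallel loads; they are not part of the text above), is
to size the cluster from the machine's core count and shut the workers down when done:

library(doParallel)
# Use one fewer worker than the machine has cores, but never fewer than one.
cl <- makeCluster(max(1, detectCores() - 1))
registerDoParallel(cl)

# ... foreach(...) %dopar% ... work goes here ...

stopCluster(cl)   # shut the workers down when you are finished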

Once you've registered the parallel backend, you're ready to run foreach code in
parallel. For example, to see how long it takes to run 10,000 bootstrap iterations in
parallel on all available cores, you can run the following code:

> x <- iris[which(iris[,5] != "setosa"), c(1,5)]
> trials <- 10000
> ptime <- system.time({
+   r <- foreach(icount(trials), .combine=cbind) %dopar% {
+     ind <- sample(100, 100, replace=TRUE)
+     result1 <- glm(x[ind,2] ~ x[ind,1], family=binomial(logit))
+     coefficients(result1)
+   }
+ })[3]
> ptime

1.2.2 Getting information about the parallel backend

To find out how many workers foreach is going to use, you can use the getDoParWorkers
function:

> getDoParWorkers()

This is a useful sanity check that you're actually running in parallel. If you haven't
registered a parallel backend, or if your machine only has one core, getDoParWorkers
will return 1. In either case, don't expect a speed improvement. The getDoParWorkers
function is also useful when you want the number of tasks to be equal to the number of
workers. You may want to pass this value to an iterator constructor, for example (see
the sketch at the end of this section).

You can also get the name and version of the currently registered backend:

> getDoParName()
> getDoParVersion()
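Returning to the point about passing the worker count to an iterator constructor, here
is a minimal sketch (it uses the idiv function documented in the reference appendix; the
variable names are invented for illustration) that splits a job into exactly one chunk
per registered worker:

library(foreach)
library(iterators)
# Split 1000 units of work into one chunk per worker; each task sees its chunk size.
chunk.sizes <- foreach(n=idiv(1000, chunks=getDoParWorkers()), .combine=c) %dopar% {
  n   # in a real job, process a chunk of n items here
}
chunk.sizes   # the chunk sizes add up to 1000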

1.3 Nesting Calls to foreach

An important feature of foreach is the nesting operator %:%. Like the %do% and %dopar%
operators, it is a binary operator, but it operates on two foreach objects. It also
returns a foreach object, which is essentially a special merger of its operands.

Let's say that we want to perform a Monte Carlo simulation using a function called sim.
The sim function takes two arguments, and we want to call it with all combinations of
the values that are stored in the vectors avec and bvec. The following doubly-nested for
loop does that. For testing purposes, the sim function is defined to return 10a + b
(although an operation this trivial is not worth executing in parallel):

sim <- function(a, b) 10 * a + b
avec <- 1:2
bvec <- 1:4

x <- matrix(0, length(avec), length(bvec))
for (j in 1:length(bvec)) {
  for (i in 1:length(avec)) {
    x[i,j] <- sim(avec[i], bvec[j])
  }
}
x

In this case, it makes sense to store the results in a matrix, so we create one of the
proper size called x, and assign the return value of sim to the appropriate element of x
each time through the inner loop.

When using foreach, we don't create a matrix and assign values into it. Instead, the
inner loop returns the columns of the result matrix as vectors, which are combined in
the outer loop into a matrix. Here's how to do that using the %:% operator:

x <- foreach(b=bvec, .combine='cbind') %:%
  foreach(a=avec, .combine='c') %do% {
    sim(a, b)
  }
x

This is structured very much like the nested for loop. The outer foreach is iterating
over the values in bvec, passing them to the inner foreach, which iterates over the
values in avec for each value of bvec. Thus, the sim function is called in the same way
in both cases. The code is slightly cleaner in this version, and has the advantage of
being easily parallelized.

When parallelizing nested for loops, there is always a question of which loop to
parallelize. The standard advice is to parallelize the outer loop. This results in
larger individual tasks, and larger tasks can often be performed more efficiently than
smaller tasks. However, if the outer loop doesn't have many iterations and the tasks are
already large, parallelizing the outer loop results in a small number of huge tasks,
which may not allow you to use all of your processors, and can also result in load
balancing problems. You could parallelize an inner loop instead, but that could be
inefficient because you're repeatedly waiting for all the results to be returned every
time through the outer loop. And if the tasks and number of iterations vary in size,
then it's really hard to know which loop to parallelize.

But in our Monte Carlo example, all of the tasks are completely independent of each
other, and so they can all be executed in parallel. You really want to think of the
loops as specifying a single stream of tasks. You just need to be careful to process all
of the results correctly, depending on which iteration of the inner loop they came from.
That is exactly what the %:% operator does: it turns multiple foreach loops into a
single loop. That is why there is only one %do% operator in the example above.

And when we parallelize that nested foreach loop by changing the %do% into a %dopar%, we
are creating a single stream of tasks that can all be executed in parallel:

x <- foreach(b=bvec, .combine='cbind') %:%
  foreach(a=avec, .combine='c') %dopar% {
    sim(a, b)
  }
x

Of course, we'll actually only run as many tasks in parallel as we have processors, but
the parallel backend takes care of all that. The point is that the %:% operator makes it
easy to specify the stream of tasks to be executed, and the .combine argument to foreach
allows us to specify how the results should be processed. The backend handles executing
the tasks in parallel. For more on nested foreach calls, see the vignette Nesting
Foreach Loops in the foreach package.

1.4 Using Iterators

An iterator is a special type of object that generalizes the notion of a looping
variable. When passed as an argument to a function that knows what to do with it, the
iterator supplies a sequence of values. The iterator also maintains information about
its state, in particular its current index.

The iterators package includes a number of functions for creating iterators, the
simplest of which is iter, which takes virtually any R object and turns it into an
iterator object. The simplest function that operates on iterators is the nextElem
function, which, when given an iterator, returns the next value of the iterator. For
example, here we create an iterator object from the sequence 1 to 10, and then use
nextElem to iterate through the values:

> i1 <- iter(1:10)
> nextElem(i1)
[1] 1
> nextElem(i1)
[1] 2

You can create iterators from matrices and data frames, using the by argument to specify
whether to iterate by row or column:

> istate <- iter(state.x77, by='row')
> nextElem(istate)
        Population Income Illiteracy Life Exp Murder HS Grad Frost  Area
Alabama        ...

> nextElem(istate)
       Population Income Illiteracy Life Exp Murder HS Grad Frost   Area
Alaska        ...

Iterators can also be created from functions, in which case the iterator can be an
endless source of values:

> ifun <- iter(function() sample(0:9, 4, replace=TRUE))
> nextElem(ifun)
[1] ...
> nextElem(ifun)
[1] ...

For practical applications, iterators can be paired with foreach to obtain parallel
results quite easily:

> x <- matrix(rnorm(1000000), ncol=1000)
> itx <- iter(x, by='row')
> foreach(i=itx, .combine=c) %dopar% mean(i)

1.4.1 Some Special Iterators

The notion of an iterator is new to R, but should be familiar to users of languages such
as Python. The iterators package includes a number of special functions that generate
iterators for some common scenarios. For example, the irnorm function creates an
iterator for which each value is drawn from a specified random normal distribution:

> library(iterators)
> itrn <- irnorm(1, count=10)
> nextElem(itrn)
[1] ...
> nextElem(itrn)
[1] ...

Similarly, the irunif, irbinom, and irpois functions create iterators which draw their
values from uniform, binomial, and Poisson distributions, respectively. We can then use
these functions just as we used irnorm:

> itru <- irunif(1, count=10)
> nextElem(itru)
[1] ...
> nextElem(itru)
[1] ...
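To tie these special iterators back to foreach, here is a minimal sketch (not from the
original text) that runs one task per random vector drawn from irnorm, without ever
materializing all of the data in memory at once:

library(foreach)
library(iterators)
# Each task receives one vector of 100 normal draws; 10 tasks in total.
maxima <- foreach(x=irnorm(100, count=10), .combine=c) %dopar% max(x)
maxima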

The icount function returns an iterator that counts starting from one:

> it <- icount(3)
> nextElem(it)
[1] 1
> nextElem(it)
[1] 2
> nextElem(it)
[1] 3

1.4.2 Writing Iterators

There will be times when you need an iterator that isn't provided by the iterators
package. That is when you need to write your own custom iterator. Basically, an iterator
is an S3 object whose base class is "iter", and which has iter and nextElem methods.

The purpose of the iter method is to return an iterator for the specified object. For
iterators, that usually just means returning itself, which seems odd at first. But the
iter method can be defined for other objects that don't define a nextElem method. We
call those objects iterables, meaning that you can iterate over them. The iterators
package defines iter methods for vectors, lists, matrices, and data frames, making those
objects iterables. By defining an iter method for iterators, they can be used in the
same context as an iterable, which can be convenient. For example, the foreach function
takes iterables as arguments. It calls the iter method on those arguments in order to
create iterators for them. By defining the iter method for all iterators, we can pass
iterators to foreach that we created using any method we choose. Thus, we can pass
vectors, lists, or iterators to foreach, and they are all processed by foreach in
exactly the same way.

The iterators package comes with an iter method defined for the "iter" class that simply
returns itself. That is usually all that is needed for an iterator. However, if you want
to create an iterator for some existing class, you can do that by writing an iter method
that returns an appropriate iterator. That will allow you to pass an instance of your
class to foreach, which will automatically convert it into an iterator. The alternative
is to write your own function that takes arbitrary arguments, and returns an iterator.
You can choose whichever method is most natural.

The most important method required for iterators is nextElem. This simply returns the
next value, or throws an error. Calling the stop function with the string
"StopIteration" indicates that there are no more values available in the iterator.

In most cases, you don't actually need to write the iter and nextElem methods; you can
inherit them. By inheriting from the class abstractiter, you can use the following
methods as the basis of your own iterators:

> iterators:::iter.iter

function (obj, ...)
{
    obj
}
<environment: namespace:iterators>
> iterators:::nextElem.abstractiter
function (obj, ...)
{
    obj$nextElem()
}
<environment: namespace:iterators>

The following function creates a simple iterator that uses these two methods:

iforever <- function(x) {
  nextEl <- function() x
  obj <- list(nextElem=nextEl)
  class(obj) <- c('iforever', 'abstractiter', 'iter')
  obj
}

Note that we called the internal function nextEl rather than nextElem to avoid masking
the standard nextElem generic function. That causes problems when you want your iterator
to call the nextElem method of another iterator, which can be quite useful.

We create an instance of this iterator by calling the iforever function, and then use it
by calling the nextElem method on the resulting object:

it <- iforever(42)
nextElem(it)
nextElem(it)

Notice that it doesn't make sense to implement this iterator by defining a new iter
method, since there is no natural iterable on which to dispatch. The only argument that
we need is the object for the iterator to return, which can be of any type. Instead, we
implement this iterator by defining a normal function that returns the iterator.

This iterator is quite simple to implement, and possibly even useful. Be careful,
however, how you use this iterator. If you pass it to foreach, it will result in an
infinite loop unless you pair it with a finite iterator. And never pass this iterator to
as.list without the n argument.

The iterator returned by iforever is a list that has a single element named nextElem,
whose value is a function that returns the value of x. Because we are subclassing
abstractiter, we inherit a nextElem method that will call this function, and because we
are subclassing iter, we inherit an iter method that will return itself.
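For instance, here is a minimal sketch (not from the original text) illustrating the
warning above: pairing iforever with a finite iterator such as icount keeps the loop
finite, because foreach stops as soon as any of its iterators is exhausted.

library(foreach)
library(iterators)
it <- iforever(42)
# icount(3) is exhausted after three values, so the combined loop runs three times.
foreach(i=icount(3), x=it, .combine=c) %do% (x + i)
# [1] 43 44 45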

Of course, the reason this iterator is so simple is because it doesn't contain any
state. Most iterators need to contain some state, or it will be difficult to make them
return different values and eventually stop. Managing the state is usually the real
trick to writing iterators.

As an example of writing a stateful iterator, let's modify the previous iterator to put
a limit on the number of values that it returns. I'll call the new function irep, and
give it another argument called times:

irep <- function(x, times) {
  nextEl <- function() {
    if (times > 0) {
      times <<- times - 1
    } else {
      stop('StopIteration')
    }
    x
  }
  obj <- list(nextElem=nextEl)
  class(obj) <- c('irep', 'abstractiter', 'iter')
  obj
}

Now let's try it out:

it <- irep(7, 6)
unlist(as.list(it))

The real difference between iforever and irep is in the function that gets called by the
nextElem method. This function not only accesses the values of the variables x and
times, but it also modifies the value of times. This is accomplished by means of the <<-
operator, and the magic of lexical scoping. Technically, this kind of function is called
a closure, and is a somewhat advanced feature of R.

The important thing to remember is that nextEl is able to get the value of variables
that were passed as arguments to irep, and it can modify those values using the <<-
operator. These are not global variables: they are defined in the enclosing environment
of the nextEl function. You can create as many iterators as you want using the irep
function, and they will all work as expected without conflicts.

Note that this iterator only uses the arguments to irep to store its state. If any other
state variables are needed, they can be defined anywhere inside the irep function. More
examples of writing iterators can be found in the vignette Writing Custom Iterators in
the iterators package.
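As a further sketch along those lines (the ivec name below is made up for illustration
and is not part of the iterators package), here is an iterator that walks over the
elements of a vector, keeping its position in a state variable defined inside the
constructor rather than in an argument:

ivec <- function(x) {
  i <- 0                      # extra state variable, local to the constructor
  nextEl <- function() {
    i <<- i + 1
    if (i > length(x))
      stop('StopIteration')
    x[[i]]
  }
  obj <- list(nextElem=nextEl)
  class(obj) <- c('ivec', 'abstractiter', 'iter')
  obj
}

it <- ivec(c('a', 'b', 'c'))
unlist(as.list(it))           # "a" "b" "c"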


Appendix A

Function and Class Reference

A.1 iterators package

iapply                     Array/Apply Iterator

Returns an iterator over an array, which iterates over the array in much the same manner
as the apply function.

Usage

iapply(X, MARGIN)

Arguments

X        the array to iterate over.
MARGIN   a vector of subscripts. 1 indicates the first dimension (rows), 2 indicates the
         second dimension (columns), etc.

Value

The apply iterator.

See Also

apply

Examples

a <- array(1:8, c(2, 2, 2))

# iterate over all the matrices
it <- iapply(a, 3)
as.list(it)

# iterate over all the columns of all the matrices
it <- iapply(a, c(2, 3))
as.list(it)

# iterate over all the rows of all the matrices
it <- iapply(a, c(1, 3))
as.list(it)


icount                     Counting Iterators

Returns an iterator that counts starting from one.

Usage

icount(count)
icountn(vn)

Arguments

count   number of times that the iterator will fire. If not specified, it will count
        forever.
vn      vector of counts.

Value

The counting iterator.

Examples

# create an iterator that counts from 1 to 3.
it <- icount(3)
nextElem(it)
nextElem(it)
nextElem(it)
try(nextElem(it))  # expect a StopIteration exception


idiv                       Dividing Iterator

Returns an iterator that returns pieces of a numeric value.

Usage

idiv(n, ..., chunks, chunkSize)

Arguments

n          number of times that the iterator will fire. If not specified, it will count
           forever.
...        unused.
chunks     the number of pieces that n should be divided into. This is useful when you
           know the number of pieces that you want. If specified, then chunkSize should
           not be.
chunkSize  the maximum size of the pieces that n should be divided into. This is useful
           when you know the size of the pieces that you want. If specified, then chunks
           should not be.

Value

The dividing iterator.

Examples

# divide the value 10 into 3 pieces
it <- idiv(10, chunks=3)
nextElem(it)
nextElem(it)
nextElem(it)
try(nextElem(it))  # expect a StopIteration exception

# divide the value 10 into pieces no larger than 3
it <- idiv(10, chunkSize=3)
nextElem(it)
nextElem(it)
nextElem(it)
nextElem(it)
try(nextElem(it))  # expect a StopIteration exception


iread.table                Iterator over Rows of a Data Frame Stored in a File

Returns an iterator over the rows of a data frame stored in a file in table format. It
is a wrapper around the standard read.table function.

Usage

iread.table(file, ..., verbose=FALSE)

Arguments

file      the name of the file to read the data from.
...       all additional arguments are passed on to the read.table function. See the
          documentation for read.table for more information.
verbose   logical value indicating whether or not to print the calls to read.table.

Value

The file reading iterator.

Note

In this version of iread.table, both the read.table arguments header and row.names must
be specified. This is because the default values of these arguments depend on the
contents of the beginning of the file. In order to make the subsequent calls to
read.table work consistently, the user must specify those arguments explicitly. A future
version of iread.table may remove this requirement.

See Also

read.table
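As a brief illustration (a minimal sketch; the temporary file, its columns, and the
choice of row.names=NULL are invented for this example and are not from the original
entry), iread.table can be used like this:

library(iterators)
# write a small example table to a temporary file
tf <- tempfile()
write.table(data.frame(x=1:3, y=c(10, 20, 30)), tf, row.names=FALSE)

# both header and row.names must be supplied explicitly (see the Note above)
it <- iread.table(tf, header=TRUE, row.names=NULL)
nextElem(it)   # first row, as a one-row data frame
nextElem(it)   # second row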

ireadLines                 Iterator over Lines of Text from a Connection

Returns an iterator over the lines of text from a connection. It is a wrapper around the
standard readLines function.

Usage

ireadLines(con, n=1, ...)

Arguments

con   a connection object or a character string.
n     integer. The maximum number of lines to read. Negative values indicate that one
      should read up to the end of the connection. The default value is passed on to the
      readLines function.

Value

The line reading iterator.

See Also

readLines

Examples

# create an iterator over the lines of COPYING
it <- ireadLines(file.path(R.home(), 'COPYING'))
nextElem(it)
nextElem(it)
nextElem(it)


irnorm                     Random Number Iterators

These functions return iterators that return random numbers of various distributions.
Each one is a wrapper around a standard R random number generator function.

Usage

irnorm(..., count)
irunif(..., count)
irbinom(..., count)
irnbinom(..., count)
irpois(..., count)

Arguments

count   number of times that the iterator will fire. If not specified, it will fire
        values forever.
...     arguments to pass to the underlying rnorm function.

Value

An iterator that is a wrapper around the corresponding random number generator function.

Examples

# create an iterator that returns three random numbers
it <- irnorm(1, count=3)
nextElem(it)
nextElem(it)
nextElem(it)
try(nextElem(it))  # expect a StopIteration exception


isplit                     Split Iterator

Returns an iterator that divides the data in the vector x into the groups defined by f.

Usage

isplit(x, f, drop=FALSE, ...)

Arguments

x      vector or data frame of values to be split into groups.
f      a factor or list of factors used to categorize x.
drop   logical indicating if levels that do not occur should be dropped.
...    currently ignored.

Value

The split iterator.

See Also

split

Examples

x <- rnorm(200)
f <- factor(sample(1:10, length(x), replace=TRUE))
it <- isplit(x, f)
expected <- split(x, f)
for (i in expected) {
  actual <- nextElem(it)
  stopifnot(actual$value == i)
}


iter                       Iterator Factory Functions

iter is a generic function used to create iterator objects.

Usage

iter(obj, ...)

## Default S3 method:
iter(obj, checkFunc=function(...) TRUE, recycle=FALSE, ...)

## S3 method for class 'iter'
iter(obj, ...)

## S3 method for class 'matrix'
iter(obj, by=c('column', 'cell', 'row'), chunksize=1L,
     checkFunc=function(...) TRUE, recycle=FALSE, ...)

## S3 method for class 'data.frame'
iter(obj, by=c('column', 'row'), checkFunc=function(...) TRUE, recycle=FALSE, ...)

## S3 method for class 'function'
iter(obj, checkFunc=function(...) TRUE, recycle=FALSE, ...)

Arguments

obj        an object from which to generate an iterator.
by         how to iterate.
chunksize  the number of elements of by to return with each call to nextElem.
checkFunc  a function which, when passed an iterator value, returns TRUE or FALSE. If
           FALSE, the value is skipped in the iteration.
recycle    a boolean describing whether the iterator should reset after running through
           all its values.
...        additional arguments affecting the iterator.

Value

The iterator.

Examples

# a vector iterator
i1 <- iter(1:3)
nextElem(i1)
nextElem(i1)
nextElem(i1)

# a vector iterator with a checkFunc
i1 <- iter(1:3, checkFunc=function(i) i %% 2 == 0)
nextElem(i1)

# a data frame iterator by column
i2 <- iter(data.frame(x=1:3, y=10, z=c('a', 'b', 'c')))
nextElem(i2)
nextElem(i2)
nextElem(i2)

# a data frame iterator by row
i3 <- iter(data.frame(x=1:3, y=10), by='row')
nextElem(i3)
nextElem(i3)
nextElem(i3)

# a function iterator
i4 <- iter(function() rnorm(1))
nextElem(i4)
nextElem(i4)
nextElem(i4)


iterators-package          The Iterators Package

The iterators package provides tools for iterating over various R data structures.
Iterators are available for vectors, lists, matrices, data frames, and files. By
following very simple conventions, new iterators can be written to support any type of
data source, such as database queries or dynamically generated data.

Details

Further information is available in the following help topics:

iter         Generic function used to create iterator objects.
nextElem     Generic function used to get the next element of an iterator.
icount       A function used to create a counting iterator.
idiv         A function used to create a number dividing iterator.
ireadLines   A function used to create a file reading iterator.

For a complete list of functions with individual help pages, use
library(help="iterators").


nextElem                   Get Next Element of Iterator

nextElem is a generic function used to produce values. If a checkFunc was specified to
the constructor, the potential iterated values will be passed to the checkFunc until the
checkFunc returns TRUE. When the iterator has no more values, it calls stop with the
message 'StopIteration'.

Usage

nextElem(obj, ...)

## S3 method for class 'containeriter'
nextElem(obj, ...)

## S3 method for class 'funiter'
nextElem(obj, ...)

Arguments

obj   an iterator object.
...   additional arguments that are ignored.

Value

The value.

Examples

it <- iter(c('a', 'b', 'c'))
print(nextElem(it))
print(nextElem(it))
print(nextElem(it))

A.2 foreach package

foreach                    foreach

%do% and %dopar% are binary operators that operate on a foreach object and an R
expression. The expression, ex, is evaluated multiple times in an environment that is
created by the foreach object, and that environment is modified for each evaluation as
specified by the foreach object. %do% evaluates the expression sequentially, while
%dopar% evaluates it in parallel. The results of evaluating ex are returned as a list by
default, but this can be modified by means of the .combine argument.

Usage

foreach(..., .combine, .init, .final=NULL, .inorder=TRUE, .multicombine=FALSE,
        .maxcombine=if (.multicombine) 100 else 2,
        .errorhandling=c('stop', 'remove', 'pass'),
        .packages=NULL, .export=NULL, .noexport=NULL, .verbose=FALSE)
when(cond)
e1 %:% e2
obj %do% ex
obj %dopar% ex
times(n)

Arguments

...            one or more arguments that control how ex is evaluated. Named arguments
               specify the name and values of variables to be defined in the evaluation
               environment. An unnamed argument can be used to specify the number of
               times that ex should be evaluated. At least one argument must be
               specified in order to define the number of times ex should be executed.
.combine       function that is used to process the tasks' results as they are
               generated. This can be specified as either a function or a non-empty
               character string naming the function. Specifying c is useful for
               concatenating the results into a vector, for example. The values cbind
               and rbind can combine vectors into a matrix. The values + and * can be
               used to process numeric data. By default, the results are returned in a
               list.
.init          initial value to pass as the first argument of the .combine function.
               This should not be specified unless .combine is also specified.
.final         function of one argument that is called to return the final result.
.inorder       logical flag indicating whether the .combine function requires the task
               results to be combined in the same order that they were submitted. If the
               order is not important, then setting .inorder to FALSE can give improved
               performance. The default value is TRUE.
.multicombine  logical flag indicating whether the .combine function can accept more
               than two arguments. If an arbitrary .combine function is specified, by
               default, that function will always be called with two arguments. If it
               can take more than two arguments, then setting .multicombine to TRUE
               could improve the performance. The default value is FALSE unless the
               .combine function is cbind, rbind, or c, which are known to take more
               than two arguments.
.maxcombine    maximum number of arguments to pass to the combine function. This is only
               relevant if .multicombine is TRUE.
.errorhandling specifies how a task evaluation error should be handled. If the value is
               stop, then execution will be stopped via the stop function if an error
               occurs. If the value is remove, the result for that task will not be
               returned, or passed to the .combine function. If it is pass, then the
               error object generated by task evaluation will be included with the rest
               of the results. It is assumed that the combine function (if specified)
               will be able to deal with the error object. The default value is stop.
.packages      character vector of packages that the tasks depend on. If ex requires an
               R package to be loaded, this option can be used to load that package on
               each of the workers. Ignored when used with %do%.
.export        character vector of variables to export. This can be useful when
               accessing a variable that isn't defined in the current environment. The
               default value is NULL.
.noexport      character vector of variables to exclude from exporting. This can be
               useful to prevent variables from being exported that aren't actually
               needed, perhaps because the symbol is used in a model formula. The
               default value is NULL.
.verbose       logical flag enabling verbose messages. This can be very useful for
               troubleshooting.
obj            foreach object used to control the evaluation of ex.
e1             foreach object to merge.
e2             foreach object to merge.
ex             the R expression to evaluate.
cond           condition to evaluate.
n              number of times to evaluate the R expression.

Details

The foreach and %do%/%dopar% operators provide a looping construct that can be viewed as
a hybrid of the standard for loop and lapply function. It looks similar to the for loop,
and it evaluates an expression, rather than a function (as in lapply), but its purpose
is to return a value (a list, by default), rather than to cause side-effects. This
facilitates parallelization, but looks more natural to people who prefer for loops to
lapply.

The %:% operator is the nesting operator, used for creating nested foreach loops. Type
vignette("nested") at the R prompt for more details.

Parallel computation depends upon a parallel backend that must be registered before
performing the computation. The parallel backends available will be system-specific, but
include doParallel, which uses R's built-in parallel package, doMC, which uses the
multicore package, and doSNOW. Each parallel backend has a specific registration
function, such as registerDoParallel or registerDoSNOW.

The times function is a simple convenience function that calls foreach. It is useful for
evaluating an R expression multiple times when there are no varying arguments. This can
be convenient for resampling, for example.

See Also

iter

Examples

# equivalent to rnorm(3)
times(3) %do% rnorm(1)

# equivalent to lapply(1:3, sqrt)
foreach(i=1:3) %do% sqrt(i)

# equivalent to colMeans(m)
m <- matrix(rnorm(9), 3, 3)
foreach(i=1:ncol(m), .combine=c) %do% mean(m[,i])

# normalize the rows of a matrix in parallel, with parentheses used to
# force proper operator precedence
# Need to register a parallel backend before this example will run
# in parallel
foreach(i=1:nrow(m), .combine=rbind) %dopar% (m[i,] / mean(m[i,]))

# simple (and inefficient) parallel matrix multiply
library(iterators)
a <- matrix(1:16, 4, 4)
b <- t(a)
foreach(b=iter(b, by='col'), .combine=cbind) %dopar%
  (a %*% b)

# split a data frame by row, and put them back together again without
# changing anything
d <- data.frame(x=1:10, y=rnorm(10))
s <- foreach(d=iter(d, by='row'), .combine=rbind) %dopar% d
identical(s, d)

# a quick sort function
qsort <- function(x) {
  n <- length(x)
  if (n == 0) {
    x
  } else {
    p <- sample(n, 1)
    smaller <- foreach(y=x[-p], .combine=c) %:% when(y <= x[p]) %do% y
    larger  <- foreach(y=x[-p], .combine=c) %:% when(y > x[p]) %do% y
    c(qsort(smaller), x[p], qsort(larger))
  }
}
qsort(runif(12))

foreach-ext                Foreach Extension Functions

These functions are used to write parallel backends for the foreach package. They should
not be used from normal scripts or packages that use the foreach package.

Usage

makeAccum(it)
accumulate(obj, result, tag, ...)
getexports(ex, e, env, good=character(0), bad=character(0))
getResult(obj, ...)
getErrorValue(obj, ...)
getErrorIndex(obj, ...)

Arguments

it       foreach iterator.
ex       call object to analyze.
e        local environment of the call object.
env      exported environment in which the call object will be evaluated.
good     names of symbols that are being exported.
bad      names of symbols that are not being exported.
obj      foreach iterator object.
result   task result to accumulate.
tag      tag of task result to accumulate.
...      unused.

Note

These functions are likely to change in future versions of the foreach package. When
they become more stable, they will be documented.

foreach-package            The Foreach Package

The foreach package provides a new looping construct for executing R code repeatedly.
The main reason for using the foreach package is that it supports parallel execution.
The foreach package can be used with a variety of different parallel computing systems,
including NetWorkSpaces and snow. In addition, foreach can be used with iterators, which
allows the data to be specified in a very flexible way.

Details

Further information is available in the following help topics:

foreach    Specify the variables to iterate over
%do%       Execute the R expression sequentially
%dopar%    Execute the R expression using the currently registered backend

To see a tutorial introduction to the foreach package, use vignette("foreach"). To see a
demo of foreach computing the sinc function, use demo(sincseq).

Some examples (in addition to those in the help pages) are included in the examples
directory of the foreach package. To list the files in the examples directory, use
list.files(system.file("examples", package="foreach")). To run the bootstrap example,
use source(system.file("examples", "bootseq.r", package="foreach")).

For a complete list of functions with individual help pages, use
library(help="foreach").


getDoParWorkers            Functions Providing Information on the dopar Backend

The getDoParWorkers function returns the number of execution workers there are in the
currently registered dopar backend. It can be useful when determining how to split up
the work to be executed in parallel. A 1 is returned by default.

The getDoParRegistered function returns TRUE if a dopar backend has been registered,
otherwise FALSE.

The getDoParName function returns the name of the currently registered dopar backend. A
NULL is returned if no backend is registered.

The getDoParVersion function returns the version of the currently registered dopar
backend. A NULL is returned if no backend is registered.

Usage

getDoParWorkers()
getDoParRegistered()
getDoParName()
getDoParVersion()

Examples

cat(sprintf('%s backend is registered\n',
            if (getDoParRegistered()) 'A' else 'No'))
cat(sprintf('Running with %d worker(s)\n', getDoParWorkers()))
(name <- getDoParName())
(ver <- getDoParVersion())
if (getDoParRegistered())
  cat(sprintf('Currently using %s [%s]\n', name, ver))


setDoPar                   setDoPar

The setDoPar function is used to register a parallel backend with the foreach package.
This isn't normally executed by the user. Instead, packages that provide a parallel
backend provide a function named registerDoPar that calls setDoPar using the appropriate
arguments.

Usage

setDoPar(fun, data=NULL, info=function(data, item) NULL)

Arguments

fun    A function that implements the functionality of %dopar%.
data   Data to be passed to the registered function.
info   Function that retrieves information about the backend.

See Also

%dopar%

registerDoSEQ              registerDoSEQ

The registerDoSEQ function is used to explicitly register a sequential parallel backend
with the foreach package. This will prevent a warning message from being issued if the
%dopar% function is called and no parallel backend has been registered.

Usage

registerDoSEQ()

See Also

registerDoSNOW

Examples

# specify that %dopar% should run sequentially
registerDoSEQ()

A.3 doParallel package

doParallel-package         The doParallel Package

The doParallel package provides a parallel backend for the foreach/%dopar% function
using R's parallel package.

Details

Further information is available in the following help topics:

registerDoParallel    register doParallel to be used by foreach/%dopar%

To see a tutorial introduction to the doParallel package, use
vignette("gettingstartedparallel"). To see a tutorial introduction to the foreach
package, use vignette("foreach"). To see a demo of doParallel computing the sinc
function, use demo(sincparallel).

Some examples (in addition to those in the help pages) are included in the examples
directory of the doParallel package. To list the files in the examples directory, use
list.files(system.file("examples", package="doParallel")). To run the bootstrap example,
use source(system.file("examples", "bootparallel.r", package="doParallel")). This is a
simple benchmark, executing both sequentially and in parallel. There are many more
examples that come with the foreach package, which will work with the doParallel package
if it is registered as the parallel backend.

For a complete list of functions with individual help pages, use
library(help="doParallel").


registerDoParallel         registerDoParallel

The registerDoParallel function is used to register the parallel backend with the
foreach package.

Usage

registerDoParallel(cl, cores=NULL, ...)

Arguments

cl      A cluster object as returned by makeCluster, or the number of nodes to be
        created in the cluster. If not specified, on Windows a three worker cluster is
        created and used.
cores   The number of cores to use for parallel execution. If not specified, the number
        of cores is set to the value of options("cores"), if specified, or to one-half
        the number of cores detected by the parallel package.
...     Package options. Currently, only the nocompile option is supported. If nocompile
        is set to TRUE, compiler support is disabled.

Details

R's parallel package provides functions for parallel execution of R code on machines
with multiple cores or processors or multiple computers. It is essentially a blend of
the snow and multicore packages. By default, the doParallel package uses snow-like
functionality on Windows systems and multicore-like functionality on Unix-like systems.
The snow-like functionality should work fine on Unix-like systems, but the
multicore-like functionality is limited to a single sequential worker on Windows
systems. On workstations with multiple cores running Unix-like operating systems, the
system fork call is used to spawn copies of the current process.
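A minimal usage sketch (not part of the original entry) showing both registration
styles described above:

library(doParallel)

# register with an explicit cluster object (snow-like; works on all platforms)
cl <- makeCluster(2)
registerDoParallel(cl)
foreach(i=1:4, .combine=c) %dopar% sqrt(i)
stopCluster(cl)

# or register with a core count (multicore-like on Unix-alikes)
registerDoParallel(cores=2)
getDoParWorkers()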

A.4 doMC package

doMC-package               The doMC Package

The doMC package provides a parallel backend for the foreach/%dopar% function using
Simon Urbanek's multicore package.

Details

Further information is available in the following help topics:

registerDoMC    register doMC to be used by foreach/%dopar%

To see a tutorial introduction to the doMC package, use vignette("gettingstartedmc"). To
see a tutorial introduction to the foreach package, use vignette("foreach"). To see a
demo of doMC computing the sinc function, use demo(sincmc).

Some examples (in addition to those in the help pages) are included in the examples
directory of the doMC package. To list the files in the examples directory, use
list.files(system.file("examples", package="doMC")). To run the bootstrap example, use
source(system.file("examples", "bootmc.r", package="doMC")). This is a simple benchmark,
executing both sequentially and in parallel. There are many more examples that come with
the foreach package, which will work with the doMC package if it is registered as the
parallel backend.

For a complete list of functions with individual help pages, use library(help="doMC").

registerDoMC               registerDoMC

The registerDoMC function is used to register the multicore parallel backend with the
foreach package.

Usage

registerDoMC(cores=NULL, ...)

Arguments

cores   The number of cores to use for parallel execution. If not specified, the number
        of cores is set to the value of options("cores"), if specified, or to
        approximately half the number of cores detected by the parallel or multicore
        package.
...     Package options. Currently, only the nocompile option is supported. If nocompile
        is set to TRUE, compiler support is disabled.

Details

The multicore package by Simon Urbanek provides functions for parallel execution of R
code on machines with multiple cores or processors, using the system fork call to spawn
copies of the current process. The multicore package, and therefore registerDoMC, should
not be used in a GUI environment, because multiple processes then share the same GUI.
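A minimal usage sketch (not part of the original entry), suitable for a non-GUI R
session on a Unix-like system:

library(doMC)
registerDoMC(cores=2)        # forked workers; not available on Windows
getDoParWorkers()
foreach(i=1:4, .combine=c) %dopar% i^2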

A.5 multicore package

children                   Functions for management of parallel children processes

children returns all currently active children.
readChild reads data from a given child process.
selectChildren checks children for available data.
readChildren checks children for available data and reads from the first child that has
available data.
sendChildStdin sends a string (or data) to the child's standard input.
kill sends a signal to a child process.

Usage

children()
readChild(child)
readChildren(timeout = 0)
selectChildren(children = NULL, timeout = 0)
sendChildStdin(child, what)
kill(process, signal = SIGINT)

Arguments

child      child process (object of the class childProcess) or a process ID (pid).
timeout    timeout (in seconds, fractions supported) to wait before giving up. Negative
           numbers mean wait indefinitely (strongly discouraged as it blocks R and may
           be removed in the future).
children   list of child processes, a single child process object, a vector of process
           IDs, or NULL. If NULL, behaves as if all currently known children were
           supplied.
what       character or raw vector. In the former case elements are collapsed using the
           newline character. (But no trailing newline is added at the end!)
process    process (object of the class process) or a process ID (pid).
signal     signal to send (one of the SIG... constants, see signals, or a valid integer
           signal number).

Value

children returns a list of child processes (or an empty list).

readChild and readChildren return a raw vector with a "pid" attribute if data were
available, an integer vector of length one with the process ID if a child terminated, or
NULL if the child no longer exists (no children at all for readChildren).

selectChildren returns TRUE if the timeout was reached, FALSE if an error occurred (e.g.
if the master process was interrupted), or an integer vector of process IDs with
children that have data available.

sendChildStdin sends the given content to the standard input (stdin) of the child
process. Note that if the master session was interactive, it will also be echoed on the
standard output of the master process (unless disabled). The function is
vector-compatible, so you can specify more than one child as a list or a vector of
process IDs.

kill returns TRUE.

Warning

This is a very low-level API for expert use only. If you are interested in user-level
parallel execution use mclapply, parallel and friends instead.

Author(s)

Simon Urbanek

See Also

fork, sendMaster, parallel, mclapply


fork                       Fork a copy of the current R process

fork creates a new child process as a copy of the current R process.
exit closes the current child process, informing the master process as necessary.

Usage

fork()
exit(exit.code = 0L, send = NULL)

Arguments

exit.code   process exit code. Currently it is not used by multicore, but other
            applications might. By convention 0 signifies clean exit, 1 an error.
send        if not NULL, send this data before exiting (equivalent to using sendMaster).

Details

The fork function provides an interface to the fork system call. In addition it sets up
a pipe between the master and child process that can be used to send data from the child
process to the master (see sendMaster), and the child's stdin is re-mapped to another
pipe held by the master process (see sendChildStdin).

If you are not familiar with the fork system call, do not use this function, since it
leads to very complex inter-process interactions among the R processes involved.

In a nutshell, fork spawns a copy (child) of the current process that can work in
parallel to the master (parent) process. At the point of forking both processes share
exactly the same state, including the workspace, global options, loaded packages, etc.
Forking is relatively cheap in modern operating systems and no real copy of the used
memory is created; instead both processes share the same memory and only modified parts
are copied. This makes fork an ideal tool for parallel processing since there is no need
to set up the parallel working environment, and data and code are shared automatically
from the start.

It is strongly discouraged to use fork in GUI or embedded environments, because it leads
to several processes sharing the same GUI, which will likely cause chaos (and possibly
crashes). Child processes should never use on-screen graphics devices.

Value

fork returns an object of the class childProcess (to the master) and masterProcess (to
the child).

exit never returns.

Warning

This is a very low-level API for expert use only. If you are interested in user-level
parallel execution use mclapply, parallel and friends instead.

Note

The Windows operating system lacks the fork system call, so fork cannot be used with
multicore on Windows.

Author(s)

Simon Urbanek

See Also

parallel, sendMaster

Examples

p <- fork()
if (inherits(p, "masterProcess")) {
  cat("I'm a child! ", Sys.getpid(), "\n")
  exit(, "I was a child")
}
cat("I'm the master\n")
unserialize(readChildren(1.5))


mclapply                   Parallel version of lapply

mclapply is a parallelized version of lapply; it returns a list of the same length as X,
each element of which is the result of applying FUN to the corresponding element of X.

Usage

mclapply(X, FUN, ..., mc.preschedule = TRUE, mc.set.seed = TRUE,
         mc.silent = FALSE, mc.cores = getOption("cores"))

Arguments

X               a vector (atomic or list) or an expressions vector. Other objects
                (including classed objects) will be coerced by as.list.
FUN             the function to be applied to each element of X.
...             optional arguments to FUN.
mc.preschedule  if set to TRUE then the computation is first divided into (at most) as
                many jobs as there are cores and then the jobs are started, each job
                possibly covering more than one value. If set to FALSE then one job is
                spawned for each value of X sequentially (if used with mc.set.seed=FALSE
                then random number sequences will be identical for all values). The
                former is better for short computations or a large number of values in
                X, the latter is better for jobs that have high variance of completion
                time and not too many values of X.
mc.set.seed     if set to TRUE then each parallel process first sets its seed to
                something different from other processes. Otherwise all processes start
                with the same (namely current) seed.
mc.silent       if set to TRUE then all output on stdout will be suppressed for all
                parallel processes spawned (stderr is not affected).
mc.cores        The number of cores to use, i.e. how many processes will be spawned (at
                most).

Details

mclapply is a parallelized version of lapply. By default (mc.preschedule=TRUE) the input
vector/list X is split into as many parts as there are cores (currently the values are
spread across the cores sequentially, i.e. first value to core 1, second to core 2, ...,
(core + 1)-th value to core 1, etc.) and then one process is spawned to each core and
the results are collected.

Due to the parallel nature of the execution, random numbers are not sequential (in the
random number sequence) as they would be in lapply. They are sequential for each spawned
process, but not for all jobs as a whole.

In addition, each process runs the job inside try(..., silent=TRUE), so if errors occur
they will be stored as try-error objects in the list.

Note: the number of file descriptors is usually limited by the operating system, so you
may have trouble using more than 100 cores or so (see ulimit -n or similar in your OS
documentation) unless you raise the limit of permissible open file descriptors (fork
will fail with "unable to create a pipe").
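A minimal usage sketch (not part of the original entry) of the user-level interface
recommended in the warnings above:

library(multicore)
# Apply a function to each element using two forked worker processes (Unix-alikes only).
res <- mclapply(1:4, function(i) i^2, mc.cores = 2)
unlist(res)   # 1 4 9 16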


CS 4410, Fall 2017 Project 1: My First Shell Assigned: August 27, 2017 Due: Monday, September 11:59PM

CS 4410, Fall 2017 Project 1: My First Shell Assigned: August 27, 2017 Due: Monday, September 11:59PM CS 4410, Fall 2017 Project 1: My First Shell Assigned: August 27, 2017 Due: Monday, September 11th @ 11:59PM Introduction The purpose of this assignment is to become more familiar with the concepts of

More information

Creating a Shell or Command Interperter Program CSCI411 Lab

Creating a Shell or Command Interperter Program CSCI411 Lab Creating a Shell or Command Interperter Program CSCI411 Lab Adapted from Linux Kernel Projects by Gary Nutt and Operating Systems by Tannenbaum Exercise Goal: You will learn how to write a LINUX shell

More information

Lab 03 - x86-64: atoi

Lab 03 - x86-64: atoi CSCI0330 Intro Computer Systems Doeppner Lab 03 - x86-64: atoi Due: October 1, 2017 at 4pm 1 Introduction 1 2 Assignment 1 2.1 Algorithm 2 3 Assembling and Testing 3 3.1 A Text Editor, Makefile, and gdb

More information

PLD Semester Exam Study Guide Dec. 2018

PLD Semester Exam Study Guide Dec. 2018 Covers material from Chapters 1-8. Semester Exam will be built from these questions and answers, though they will be re-ordered and re-numbered and possibly worded slightly differently than on this study

More information

9.2 Linux Essentials Exam Objectives

9.2 Linux Essentials Exam Objectives 9.2 Linux Essentials Exam Objectives This chapter will cover the topics for the following Linux Essentials exam objectives: Topic 3: The Power of the Command Line (weight: 10) 3.3: Turning Commands into

More information

Lab 4: Super Sudoku Solver CSCI 2101 Fall 2017

Lab 4: Super Sudoku Solver CSCI 2101 Fall 2017 Due: Wednesday, October 18, 11:59 pm Collaboration Policy: Level 1 Group Policy: Pair-Optional Lab 4: Super Sudoku Solver CSCI 2101 Fall 2017 In this week s lab, you will write a program that can solve

More information

SCHEME 8. 1 Introduction. 2 Primitives COMPUTER SCIENCE 61A. March 23, 2017

SCHEME 8. 1 Introduction. 2 Primitives COMPUTER SCIENCE 61A. March 23, 2017 SCHEME 8 COMPUTER SCIENCE 61A March 2, 2017 1 Introduction In the next part of the course, we will be working with the Scheme programming language. In addition to learning how to write Scheme programs,

More information

CS 118 Project Phase 2 P2P Networking

CS 118 Project Phase 2 P2P Networking CS 118 Project Phase 2 P2P Networking Due Monday, March 15 th at 11:59pm Boelter Hall 4428, Box D3/C4 and via Electronic Submission Overview In this phase you will extend your work from Phase 1 to create

More information

P2P Programming Assignment

P2P Programming Assignment P2P Programming Assignment Overview This project is to implement a Peer-to-Peer (P2P) networking project similar to a simplified Napster. You will provide a centralized server to handle cataloging the

More information

SEER AKADEMI LINUX PROGRAMMING AND SCRIPTINGPERL 7

SEER AKADEMI LINUX PROGRAMMING AND SCRIPTINGPERL 7 SEER AKADEMI LINUX PROGRAMMING AND SCRIPTINGPERL 7 Hi everyone once again welcome to this lecture we are actually the course is Linux programming and scripting we have been talking about the Perl, Perl

More information

Loop structures and booleans

Loop structures and booleans Loop structures and booleans Michael Mandel Lecture 7 Methods in Computational Linguistics I The City University of New York, Graduate Center https://github.com/ling78100/lectureexamples/blob/master/lecture07final.ipynb

More information

(Refer Slide Time: 01:12)

(Refer Slide Time: 01:12) Internet Technology Prof. Indranil Sengupta Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Lecture No #22 PERL Part II We continue with our discussion on the Perl

More information

CS450 - Structure of Higher Level Languages

CS450 - Structure of Higher Level Languages Spring 2018 Streams February 24, 2018 Introduction Streams are abstract sequences. They are potentially infinite we will see that their most interesting and powerful uses come in handling infinite sequences.

More information

Functional Programming. Pure Functional Programming

Functional Programming. Pure Functional Programming Functional Programming Pure Functional Programming Computation is largely performed by applying functions to values. The value of an expression depends only on the values of its sub-expressions (if any).

More information

Windows architecture. user. mode. Env. subsystems. Executive. Device drivers Kernel. kernel. mode HAL. Hardware. Process B. Process C.

Windows architecture. user. mode. Env. subsystems. Executive. Device drivers Kernel. kernel. mode HAL. Hardware. Process B. Process C. Structure Unix architecture users Functions of the System tools (shell, editors, compilers, ) standard library System call Standard library (printf, fork, ) OS kernel: processes, memory management, file

More information

Essentials for Scientific Computing: Bash Shell Scripting Day 3

Essentials for Scientific Computing: Bash Shell Scripting Day 3 Essentials for Scientific Computing: Bash Shell Scripting Day 3 Ershaad Ahamed TUE-CMS, JNCASR May 2012 1 Introduction In the previous sessions, you have been using basic commands in the shell. The bash

More information

Contents: 1 Basic socket interfaces 3. 2 Servers 7. 3 Launching and Controlling Processes 9. 4 Daemonizing Command Line Programs 11

Contents: 1 Basic socket interfaces 3. 2 Servers 7. 3 Launching and Controlling Processes 9. 4 Daemonizing Command Line Programs 11 nclib Documentation Release 0.7.0 rhelmot Apr 19, 2018 Contents: 1 Basic socket interfaces 3 2 Servers 7 3 Launching and Controlling Processes 9 4 Daemonizing Command Line Programs 11 5 Indices and tables

More information

Topics. Java arrays. Definition. Data Structures and Information Systems Part 1: Data Structures. Lecture 3: Arrays (1)

Topics. Java arrays. Definition. Data Structures and Information Systems Part 1: Data Structures. Lecture 3: Arrays (1) Topics Data Structures and Information Systems Part 1: Data Structures Michele Zito Lecture 3: Arrays (1) Data structure definition: arrays. Java arrays creation access Primitive types and reference types

More information

CS 360 Programming Languages Interpreters

CS 360 Programming Languages Interpreters CS 360 Programming Languages Interpreters Implementing PLs Most of the course is learning fundamental concepts for using and understanding PLs. Syntax vs. semantics vs. idioms. Powerful constructs like

More information

Computer Science 330 Operating Systems Siena College Spring Lab 5: Unix Systems Programming Due: 4:00 PM, Wednesday, February 29, 2012

Computer Science 330 Operating Systems Siena College Spring Lab 5: Unix Systems Programming Due: 4:00 PM, Wednesday, February 29, 2012 Computer Science 330 Operating Systems Siena College Spring 2012 Lab 5: Unix Systems Programming Due: 4:00 PM, Wednesday, February 29, 2012 Quote: UNIX system calls, reading about those can be about as

More information

Definition: A data structure is a way of organizing data in a computer so that it can be used efficiently.

Definition: A data structure is a way of organizing data in a computer so that it can be used efficiently. The Science of Computing I Lesson 4: Introduction to Data Structures Living with Cyber Pillar: Data Structures The need for data structures The algorithms we design to solve problems rarely do so without

More information

Computer Science 21b (Spring Term, 2015) Structure and Interpretation of Computer Programs. Lexical addressing

Computer Science 21b (Spring Term, 2015) Structure and Interpretation of Computer Programs. Lexical addressing Computer Science 21b (Spring Term, 2015) Structure and Interpretation of Computer Programs Lexical addressing The difference between a interpreter and a compiler is really two points on a spectrum of possible

More information

Computer Science 2500 Computer Organization Rensselaer Polytechnic Institute Spring Topic Notes: C and Unix Overview

Computer Science 2500 Computer Organization Rensselaer Polytechnic Institute Spring Topic Notes: C and Unix Overview Computer Science 2500 Computer Organization Rensselaer Polytechnic Institute Spring 2009 Topic Notes: C and Unix Overview This course is about computer organization, but since most of our programming is

More information

Package itertools2. R topics documented: August 29, 2016

Package itertools2. R topics documented: August 29, 2016 Package itertools2 August 29, 2016 Title itertools2: Functions creating iterators for efficient looping Version 0.1.1 Date 2014-08-08 Author John A. Ramey , Kayla Schaefer

More information

Fall 2017 Discussion 7: October 25, 2017 Solutions. 1 Introduction. 2 Primitives

Fall 2017 Discussion 7: October 25, 2017 Solutions. 1 Introduction. 2 Primitives CS 6A Scheme Fall 207 Discussion 7: October 25, 207 Solutions Introduction In the next part of the course, we will be working with the Scheme programming language. In addition to learning how to write

More information

Process Management 1

Process Management 1 Process Management 1 Goals of this Lecture Help you learn about: Creating new processes Programmatically redirecting stdin, stdout, and stderr (Appendix) communication between processes via pipes Why?

More information

Indicate the answer choice that best completes the statement or answers the question. Enter the appropriate word(s) to complete the statement.

Indicate the answer choice that best completes the statement or answers the question. Enter the appropriate word(s) to complete the statement. 1. C#, C++, C, and Java use the symbol as the logical OR operator. a. $ b. % c. ^ d. 2. errors are relatively easy to locate and correct because the compiler or interpreter you use highlights every error.

More information

JME Language Reference Manual

JME Language Reference Manual JME Language Reference Manual 1 Introduction JME (pronounced jay+me) is a lightweight language that allows programmers to easily perform statistic computations on tabular data as part of data analysis.

More information

CS 326: Operating Systems. Process Execution. Lecture 5

CS 326: Operating Systems. Process Execution. Lecture 5 CS 326: Operating Systems Process Execution Lecture 5 Today s Schedule Process Creation Threads Limited Direct Execution Basic Scheduling 2/5/18 CS 326: Operating Systems 2 Today s Schedule Process Creation

More information

Introduction to Asynchronous Programming Fall 2014

Introduction to Asynchronous Programming Fall 2014 CS168 Computer Networks Fonseca Introduction to Asynchronous Programming Fall 2014 Contents 1 Introduction 1 2 The Models 1 3 The Motivation 3 4 Event-Driven Programming 4 5 select() to the rescue 5 1

More information

Package pbapply. R topics documented: January 10, Type Package Title Adding Progress Bar to '*apply' Functions Version Date

Package pbapply. R topics documented: January 10, Type Package Title Adding Progress Bar to '*apply' Functions Version Date Type Package Title Adding Progress Bar to '*apply' Functions Version 1.3-4 Date 2018-01-09 Package pbapply January 10, 2018 Author Peter Solymos [aut, cre], Zygmunt Zawadzki [aut] Maintainer Peter Solymos

More information

Introduction to UNIX. Logging in. Basic System Architecture 10/7/10. most systems have graphical login on Linux machines

Introduction to UNIX. Logging in. Basic System Architecture 10/7/10. most systems have graphical login on Linux machines Introduction to UNIX Logging in Basic system architecture Getting help Intro to shell (tcsh) Basic UNIX File Maintenance Intro to emacs I/O Redirection Shell scripts Logging in most systems have graphical

More information

x = 3 * y + 1; // x becomes 3 * y + 1 a = b = 0; // multiple assignment: a and b both get the value 0

x = 3 * y + 1; // x becomes 3 * y + 1 a = b = 0; // multiple assignment: a and b both get the value 0 6 Statements 43 6 Statements The statements of C# do not differ very much from those of other programming languages. In addition to assignments and method calls there are various sorts of selections and

More information

Repetition Through Recursion

Repetition Through Recursion Fundamentals of Computer Science I (CS151.02 2007S) Repetition Through Recursion Summary: In many algorithms, you want to do things again and again and again. For example, you might want to do something

More information

Processes in linux. What s s a process? process? A dynamically executing instance of a program. David Morgan. David Morgan

Processes in linux. What s s a process? process? A dynamically executing instance of a program. David Morgan. David Morgan Processes in linux David Morgan What s s a process? process? A dynamically executing instance of a program 1 Constituents of a process its code data various attributes OS needs to manage it OS keeps track

More information

Introduction to Computer Programming for Non-Majors

Introduction to Computer Programming for Non-Majors Introduction to Computer Programming for Non-Majors CSC 2301, Fall 2015 Chapter 8 Part 1 The Department of Computer Science Chapter 8 Loop Structures and Booleans 2 Objectives To understand the concepts

More information

Introduction to Modern Fortran

Introduction to Modern Fortran Introduction to Modern Fortran p. 1/?? Introduction to Modern Fortran Advanced I/O and Files Nick Maclaren Computing Service nmm1@cam.ac.uk, ext. 34761 November 2007 Introduction to Modern Fortran p. 2/??

More information

Spring 2018 Discussion 7: March 21, Introduction. 2 Primitives

Spring 2018 Discussion 7: March 21, Introduction. 2 Primitives CS 61A Scheme Spring 2018 Discussion 7: March 21, 2018 1 Introduction In the next part of the course, we will be working with the Scheme programming language. In addition to learning how to write Scheme

More information

HW 1: Shell. Contents CS 162. Due: September 18, Getting started 2. 2 Add support for cd and pwd 2. 3 Program execution 2. 4 Path resolution 3

HW 1: Shell. Contents CS 162. Due: September 18, Getting started 2. 2 Add support for cd and pwd 2. 3 Program execution 2. 4 Path resolution 3 CS 162 Due: September 18, 2017 Contents 1 Getting started 2 2 Add support for cd and pwd 2 3 Program execution 2 4 Path resolution 3 5 Input/Output Redirection 3 6 Signal Handling and Terminal Control

More information

Part III Appendices 165

Part III Appendices 165 Part III Appendices 165 Appendix A Technical Instructions Learning Outcomes This material will help you learn how to use the software you need to do your work in this course. You won t be tested on it.

More information

CS354/CS350 Operating Systems Winter 2004

CS354/CS350 Operating Systems Winter 2004 CS354/CS350 Operating Systems Winter 2004 Assignment Three (March 14 VERSION) Design and Preliminary Testing Document Due: Tuesday March 23, 11:59 am Full Assignment Due: Tuesday March 30, 11:59 am 1 Nachos

More information

The Typed Racket Guide

The Typed Racket Guide The Typed Racket Guide Version 5.3.6 Sam Tobin-Hochstadt and Vincent St-Amour August 9, 2013 Typed Racket is a family of languages, each of which enforce

More information

Unix Processes. What is a Process?

Unix Processes. What is a Process? Unix Processes Process -- program in execution shell spawns a process for each command and terminates it when the command completes Many processes all multiplexed to a single processor (or a small number

More information

ACT-R RPC Interface Documentation. Working Draft Dan Bothell

ACT-R RPC Interface Documentation. Working Draft Dan Bothell AC-R RPC Interface Documentation Working Draft Dan Bothell Introduction his document contains information about a new feature available with the AC-R 7.6 + software. here is now a built-in RPC (remote

More information

CS61A Summer 2010 George Wang, Jonathan Kotker, Seshadri Mahalingam, Eric Tzeng, Steven Tang

CS61A Summer 2010 George Wang, Jonathan Kotker, Seshadri Mahalingam, Eric Tzeng, Steven Tang CS61A Notes Week 6B: Streams Streaming Along A stream is an element and a promise to evaluate the rest of the stream. You ve already seen multiple examples of this and its syntax in lecture and in the

More information

6.001 Notes: Section 4.1

6.001 Notes: Section 4.1 6.001 Notes: Section 4.1 Slide 4.1.1 In this lecture, we are going to take a careful look at the kinds of procedures we can build. We will first go back to look very carefully at the substitution model,

More information

Package biganalytics

Package biganalytics Version 1.1.14 Date 2016-02-17 Package biganalytics February 18, 2016 Title Utilities for 'big.matrix' Objects from Package 'bigmemory' Author John W. Emerson and Michael J. Kane

More information

Simulator. Chapter 4 Tutorial: The SDL

Simulator. Chapter 4 Tutorial: The SDL 4 Tutorial: The SDL Simulator The SDL Simulator is the tool that you use for testing the behavior of your SDL systems. In this tutorial, you will practice hands-on on the DemonGame system. To be properly

More information

Review of Fundamentals

Review of Fundamentals Review of Fundamentals 1 The shell vi General shell review 2 http://teaching.idallen.com/cst8207/14f/notes/120_shell_basics.html The shell is a program that is executed for us automatically when we log

More information

Package iotools. R topics documented: January 25, Version Title I/O Tools for Streaming

Package iotools. R topics documented: January 25, Version Title I/O Tools for Streaming Version 0.2-5 Title I/O Tools for Streaming Package iotools January 25, 2018 Author Simon Urbanek , Taylor Arnold Maintainer Simon Urbanek

More information

Implementing Coroutines with call/cc. Producer/Consumer using Coroutines

Implementing Coroutines with call/cc. Producer/Consumer using Coroutines Implementing Coroutines with call/cc Producer/Consumer using Coroutines Coroutines are a very handy generalization of subroutines. A coroutine may suspend its execution and later resume from the point

More information

CS61 Scribe Notes Lecture 18 11/6/14 Fork, Advanced Virtual Memory

CS61 Scribe Notes Lecture 18 11/6/14 Fork, Advanced Virtual Memory CS61 Scribe Notes Lecture 18 11/6/14 Fork, Advanced Virtual Memory Roger, Ali, and Tochi Topics: exploits fork shell programming rest of course announcements/ending (for later info) final (not as time

More information

The SPL Programming Language Reference Manual

The SPL Programming Language Reference Manual The SPL Programming Language Reference Manual Leonidas Fegaras University of Texas at Arlington Arlington, TX 76019 fegaras@cse.uta.edu February 27, 2018 1 Introduction The SPL language is a Small Programming

More information

Chapter 17. Iteration The while Statement

Chapter 17. Iteration The while Statement 203 Chapter 17 Iteration Iteration repeats the execution of a sequence of code. Iteration is useful for solving many programming problems. Interation and conditional execution form the basis for algorithm

More information

DCLI User's Guide. Data Center Command-Line Interface

DCLI User's Guide. Data Center Command-Line Interface Data Center Command-Line Interface 2.10.2 You can find the most up-to-date technical documentation on the VMware website at: https://docs.vmware.com/ If you have comments about this documentation, submit

More information

Hello, World! in C. Johann Myrkraverk Oskarsson October 23, The Quintessential Example Program 1. I Printing Text 2. II The Main Function 3

Hello, World! in C. Johann Myrkraverk Oskarsson October 23, The Quintessential Example Program 1. I Printing Text 2. II The Main Function 3 Hello, World! in C Johann Myrkraverk Oskarsson October 23, 2018 Contents 1 The Quintessential Example Program 1 I Printing Text 2 II The Main Function 3 III The Header Files 4 IV Compiling and Running

More information

Lecture 6: Arithmetic and Threshold Circuits

Lecture 6: Arithmetic and Threshold Circuits IAS/PCMI Summer Session 2000 Clay Mathematics Undergraduate Program Advanced Course on Computational Complexity Lecture 6: Arithmetic and Threshold Circuits David Mix Barrington and Alexis Maciel July

More information

Threads and Continuations COS 320, David Walker

Threads and Continuations COS 320, David Walker Threads and Continuations COS 320, David Walker Concurrency Concurrency primitives are an important part of modern programming languages. Concurrency primitives allow programmers to avoid having to specify

More information

Programming with MPI

Programming with MPI Programming with MPI p. 1/?? Programming with MPI Miscellaneous Guidelines Nick Maclaren Computing Service nmm1@cam.ac.uk, ext. 34761 March 2010 Programming with MPI p. 2/?? Summary This is a miscellaneous

More information

3. Process Management in xv6

3. Process Management in xv6 Lecture Notes for CS347: Operating Systems Mythili Vutukuru, Department of Computer Science and Engineering, IIT Bombay 3. Process Management in xv6 We begin understanding xv6 process management by looking

More information

Computer Science 322 Operating Systems Mount Holyoke College Spring Topic Notes: C and Unix Overview

Computer Science 322 Operating Systems Mount Holyoke College Spring Topic Notes: C and Unix Overview Computer Science 322 Operating Systems Mount Holyoke College Spring 2010 Topic Notes: C and Unix Overview This course is about operating systems, but since most of our upcoming programming is in C on a

More information

2.3 Algorithms Using Map-Reduce

2.3 Algorithms Using Map-Reduce 28 CHAPTER 2. MAP-REDUCE AND THE NEW SOFTWARE STACK one becomes available. The Master must also inform each Reduce task that the location of its input from that Map task has changed. Dealing with a failure

More information

For this chapter, switch languages in DrRacket to Advanced Student Language.

For this chapter, switch languages in DrRacket to Advanced Student Language. Chapter 30 Mutation For this chapter, switch languages in DrRacket to Advanced Student Language. 30.1 Remembering changes Suppose you wanted to keep track of a grocery shopping list. You could easily define

More information

Lecture 5. Essential skills for bioinformatics: Unix/Linux

Lecture 5. Essential skills for bioinformatics: Unix/Linux Lecture 5 Essential skills for bioinformatics: Unix/Linux UNIX DATA TOOLS Text processing with awk We have illustrated two ways awk can come in handy: Filtering data using rules that can combine regular

More information

Pace University. Fundamental Concepts of CS121 1

Pace University. Fundamental Concepts of CS121 1 Pace University Fundamental Concepts of CS121 1 Dr. Lixin Tao http://csis.pace.edu/~lixin Computer Science Department Pace University October 12, 2005 This document complements my tutorial Introduction

More information

Fishnet Assignment 1: Distance Vector Routing Due: May 13, 2002.

Fishnet Assignment 1: Distance Vector Routing Due: May 13, 2002. Fishnet Assignment 1: Distance Vector Routing Due: May 13, 2002. In this assignment, you will work in teams of one to four (try to form you own group; if you can t find one, you can either work alone,

More information