
Processing TOVS Polar Pathfinder Data Using the Distributed Batch Controller

James Du (a), Kenneth Salem (b), Axel Schweiger (c), and Miron Livny (d)

(a) Department of Computer Science, University of Maryland, College Park, MD USA
(b) Department of Computer Science, University of Waterloo, Waterloo, ON N2L 3G1 Canada
(c) Applied Physics Laboratory/Polar Science Center, University of Washington, 1013 NE 40th Street, Seattle, WA USA
(d) Computer Sciences Department, University of Wisconsin - Madison, 1210 W. Dayton St., Madison, WI USA

Other author information: K. Salem (correspondence): kmsalem@uwaterloo.ca; Telephone: x3485

ABSTRACT

The Distributed Batch Controller (DBC) supports scientific batch data processing. Batch jobs are distributed by the DBC over a collection of computing resources. Since these resources may be widely scattered, the DBC is well suited for collaborative research efforts whose resources are not centrally located. The DBC provides its users with centralized monitoring and control of distributed batch jobs. Version 1 of the DBC is currently being used by the TOVS Polar Pathfinder project to generate Arctic atmospheric temperature and humidity profiles. Profile-generating jobs are distributed and executed by the DBC on workstation clusters located at several sites across the United States. This paper describes the data processing requirements of the TOVS Polar Pathfinder project and how the DBC is being used to meet them. It also describes Version 2 of the DBC. DBC V2 is implemented in Java and uses a number of advanced Java features, such as threads and remote method invocation. It also incorporates a number of functional enhancements, including a flexible mechanism that supports interoperation of the DBC with a wider variety of execution resources, and an improved user interface.

Keywords: batch processing, TOVS Polar Pathfinder, DBC, Java

1. INTRODUCTION

The Distributed Batch Controller (DBC) is a tool for scientific data processing [1]. It executes batch data processing jobs in pools of workstations. While it can be used to control jobs in a single pool, it is intended for users who wish to combine more than one pool into a single computational engine. For example, a collaborative research team could use the DBC to combine the computational resources of individual team members. The workstations used to process DBC jobs remain autonomous and are not assumed to be dedicated to DBC-related work. Furthermore, they may be widely distributed. For example, the DBC system described later in this paper uses workstation pools located in Washington, Wisconsin, and Maryland. The DBC addresses computational issues that arise in this environment. For example, pools may join or leave the DBC system at any time, the processing power available at the pools may fluctuate with time, job input data must be shipped to remote pools and output data must be collected, and failures of various types must be detected and corrected.

Figure 1. Architecture of the DBC

In addition, the system provides a centralized administrative interface to allow its users to monitor and control distributed batch execution. The DBC is currently being used by the TOVS Polar Pathfinder project to generate Arctic atmospheric temperature and humidity profiles. This paper describes the project's data processing requirements and shows how they are supported by the DBC. It also describes DBC V2, a new Java-based implementation of the DBC architecture. DBC V2 incorporates a number of enhancements intended to make it more flexible and powerful than its predecessor.

2. DBC OVERVIEW

A single DBC job consists of a sequence of one or more job steps. A job step represents the execution of a program or a transfer of data from one location to another. Job steps may have formal parameters, which are typically used to represent input and output file names and run-time parameters for the job step program. A batch is a set of jobs. All of the jobs in a DBC batch are identical, except for their parameters. In the scientific domains to which the DBC is targeted, batches are often large and long-running.

Figure 1 illustrates the run-time architecture of a DBC system. A DBC system is used to process a single batch of jobs. Batches may be dynamic, in the sense that jobs can be added to or removed from a batch while it is being processed. Although a single DBC system always processes a single batch, several DBC systems could run concurrently and share the available resources. Computational power is provided to the DBC by one or more resource pools. Pools are autonomous and may join, leave, and rejoin the system at any time. A DBC agent controls jobs at each pool, and a single DBC master coordinates the agents. The master maintains information about the entire batch, assigns jobs to pools, and is informed when jobs are completed. If a pool leaves the system, the master may reassign its tasks to other pools. The master also supports user interfaces that provide a central point of control for a human administrator.

Each DBC pool includes one or more "execution resources" as well as some disk buffer space. Resources normally represent workstations, or clusters of workstations, that are capable of executing job steps. The DBC agent at each pool is responsible for arranging for the execution of the jobs assigned to it by the master.
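As a concrete, purely illustrative picture of this model, the sketch below represents a job as an ordered sequence of parameterized steps. The class and field names are hypothetical; they are not the DBC's actual data structures. The step names anticipate the TOVS Path-P job described in Section 3.

import java.util.Properties;
import java.util.Vector;

// Hypothetical representation of the DBC batch/job/step model.
class JobStep {
    String type;             // a data transfer step or a program execution step
    Properties parameters;   // formal parameters: file names, run-time arguments, ...

    JobStep(String type, Properties parameters) {
        this.type = type;
        this.parameters = parameters;
    }
}

class Job {
    Vector steps = new Vector();   // an ordered sequence of one or more job steps
    void addStep(JobStep s) { steps.addElement(s); }
}

class Batch {
    Vector jobs = new Vector();    // all jobs are identical except for their parameters
    void addJob(Job j) { jobs.addElement(j); }
}

A batch for the application described in Section 3 would contain one such job per orbital pass, each carrying its own HIRS and MSU file-name parameters.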

Executing a job involves moving its input data from the input repository to the pool's disk buffer, arranging for the execution of the job step programs on the pool's execution resources, and moving the resulting output data from the pool buffer to the output repository.

One type of execution resource supported by the DBC is a workstation cluster managed by the Condor resource management system [2,3]. Condor harnesses the computational power of unused workstations in the cluster. When a DBC agent executes a job using a Condor resource, the Condor system locates an idle workstation in its cluster (if one is available) and assigns the job to it. By interacting with Condor in this way, the DBC is able to turn the power of unused workstations in the cluster into a valuable computing resource.

Version 1 of the DBC is implemented in C and runs on a variety of UNIX platforms. Communication among DBC processes is implemented using the Parallel Virtual Machine (PVM) system [4]. DBC V2 is implemented in Java. We describe DBC V2 in more detail in Section 5.

3. PROCESSING POLAR PATHFINDER DATA

The DBC is being used as a data processing engine for the TOVS Polar Pathfinder (TOVS Path-P) project at the Polar Science Center (PSC) in the Applied Physics Laboratory at the University of Washington. This project, part of the NOAA/NASA Pathfinder program, is producing daily Arctic atmospheric temperature and humidity profiles from the High Resolution Infrared Sounder (HIRS) and the Microwave Sounding Unit (MSU) for an 18-year period. A modified version of the Improved Initialization Inversion (3I) retrieval algorithm [5,6] is used to determine geophysical products (temperature and humidity profiles) from satellite radiances. The algorithm performs the inversion of the radiative transfer equation starting from a "first guess" profile, which is selected from a library by minimization of brightness temperature differences. The final solution is found by correcting the first guess. To avoid forward radiative transfer calculations, this step uses precomputed Jacobian matrices that contain the partial derivatives of satellite brightness temperatures (TB) in different channels with respect to the temperature and humidity values at different levels in the atmosphere. The algorithm consists of five steps:

1. calibration of the input data (NOAA Level 1B),
2. interpolation of HIRS and MSU data to a common georeference system,
3. determination of the initial guess, cloud detection and clearing,
4. inversion of temperature and humidity profiles,
5. interpolation of algorithm results to standard levels.

Although the algorithm was designed to be fast enough for large-scale or global processing, the computational demands of creating an 18-year data set are still substantial. Profiles are being generated using data from approximately 100,000 orbital passes. Each execution of the profile retrieval program uses two Level 1B input files, one containing HIRS data and the other containing MSU data. The total size of these files is about 350 kilobytes compressed. The program generates about 1 megabyte (compressed) of output data. The time to execute the profile retrieval program ranges from about five minutes of CPU time per orbit on a dedicated Sun Sparc-20/60 to about twelve minutes per orbit on a Sparc-10. Assuming no I/O-related delays, sequential processing of this data set would occupy such workstations for approximately one to two years. Since the PSC team expects to reprocess the data two or three times to incorporate modifications in response to validation and user feedback, this application alone may represent three to five workstation-years of computing.
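The one-to-two-year figure quoted above follows directly from the per-orbit numbers; the following small Java calculation (illustrative only, using the rounded values from the text) reproduces it.

public class ProcessingEstimate {
    public static void main(String[] args) {
        int orbits = 100000;            // approximate number of orbital passes
        double fastMinutes = 5.0;       // CPU minutes per orbit on a Sparc-20/60
        double slowMinutes = 12.0;      // CPU minutes per orbit on a Sparc-10
        double minutesPerYear = 60.0 * 24.0 * 365.0;

        double fastYears = orbits * fastMinutes / minutesPerYear;   // roughly 1 year
        double slowYears = orbits * slowMinutes / minutesPerYear;   // roughly 2.3 years
        System.out.println("Sequential processing: " + fastYears + " to "
                + slowYears + " workstation-years");
        // Reprocessing the data set two or three times, as the PSC team expects,
        // scales this estimate up to the three to five workstation-years quoted above.
    }
}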
Each execution of the profile retrieval program is implemented as a single DBC job with one execution step and several data transfer steps. The data transfer steps bring the HIRS and MSU input files to the pool and send the output to the repository at the PSC. The execution step runs a simple csh script that uncompresses the input files, runs the profile retrieval program, and compresses the output. Alternatively, it would have been possible to treat each of the five steps of the algorithm, plus data compression and uncompression, as separate job steps. We elected to treat them as a single step to avoid the run-time overhead of scheduling many short steps.

Retrieval jobs are normally run in batches of 500 to 5000 jobs, and the DBC is typically shut down between batches. (This mode of operation is preferred by the PSC, and is not required by the DBC.) Initially, all of the jobs in a batch are submitted to the DBC using a program that specifies which input files are to be processed by each job. This program is written in C and uses the DBC's API to inform the system of the new jobs. (Jobs can also be submitted manually through the DBC console program, but the use of a program is preferable when large numbers of jobs are to be submitted.) The DBC console is then used to monitor and control the remainder of the batch execution. Pools can be added and removed using console commands, and progress statistics (number of executed jobs, failed jobs, etc.) can be displayed. Jobs can be retried or canceled. Detailed processing logs are kept in separate log and error files.

4. EXPERIENCE

To generate profiles, we have used a DBC system consisting of three widely-distributed workstation pools. The largest pool is at the University of Wisconsin Computer Science Department in Madison, Wisconsin, consisting of about 40 Sun/Solaris workstations and 30 Sun/SunOS 4.1.3 workstations. Its disk buffer has a 180 megabyte capacity. The second pool is at the Center of Excellence in Space Data and Information Systems (CESDIS), located at the NASA Goddard Space Flight Center in Maryland. It consists of 14 Sun/SunOS 4.1.3 workstations and a 160 megabyte disk buffer. The third pool consists of the PSC's own workstations, including two Sun/Solaris workstations, four HPUX workstations, and a 100 megabyte disk buffer. All three pools are shared by other users, so the number of workstations available for DBC jobs is normally much less than the total. A RAID storage device at the PSC in Seattle is used as both the input repository and the output repository for the system. The DBC master is also located at the PSC site. This DBC system has been used to process tens of thousands of retrieval jobs. The following are some observations concerning the performance and usability of the DBC system under this workload.

4.1. Performance

Figure 2. Progress of a Typical DBC Batch Run

The DBC system was able to make effective use of all three of its workstation pools. Figure 2 shows jobs completed as a function of time for one typical batch of 5000 jobs.

The figure shows the jobs completed by each pool, as well as the total. Because the pools at Wisconsin and the PSC each included two different types of machines, we have included two lines for each of those pools in the figure. Each line represents jobs completed by machines of a particular type in a particular pool. This batch took 86 hours to complete, for an overall throughput of 58 jobs per hour. The remote CESDIS and Wisconsin pools contributed substantially, processing 77% of all of the jobs. An analysis of job response times from this batch showed that workstation availability was the throughput bottleneck at all three pools in the system. In other words, the DBC system had no problem supplying sufficient jobs to utilize all of the idle workstations that Condor was able to give it from these pools. Data transfer, even to and from the remote pools, was not a performance-limiting factor. This was because the DBC was able to hide the data transfer latency by overlapping it with the processing time of other jobs. This is possible as long as the DBC has ample disk buffer space at each pool.

4.2. Failures and Recovery

Failures are a fact of life in a long-running distributed system like the DBC. Much of the implementation effort for DBC V1 went into failure detection and recovery. Nonetheless, failures still had an impact on the system's performance and ease of use.

Figure 3. DBC Throughput With and Without Pool Downtime

To examine the impact of pool failures, we analyzed data from a set of seven batches of jobs (about 15,000 jobs total). The first column of Figure 3 shows the average throughput for each pool, measured over all of the batches, and the second column shows the percentage of time that each pool was down while the DBC was running. The third column shows the throughput we would expect if pools never went down, assuming their average processing rate remained unchanged. If all pools had run without failure, the overall throughput of the DBC would have been 85.9 jobs per hour, more than twice the measured value of 38.6 jobs per hour.

Not all of the pool downtime was caused by failures. For example, during the batch execution shown in Figure 2, the Wisconsin pool was unavailable during the first thirty hours of the batch run because of a planned shutdown for installation and testing of a new version of the Condor software. Nonetheless, failures of DBC pools were not uncommon. We observed pool failures for all of the following reasons: operating system crashes (on DBC agent machines), local area network and network file server failures, Condor failures, broken network connections between the DBC agent and the master, and the DBC agents' own automatic shutdown mechanism, which is triggered by a burst of persistent job failures.

DBC agents shut themselves down if they detect a high rate of failures among local DBC jobs. This mechanism exists to prevent a problem pool from consuming large numbers of batch jobs. Jobs can fail for a variety of reasons. Some failures are due to application problems such as bad input data or bugs in the job step programs. Some are due to a timeout mechanism built into the DBC that is used to detect "hung" jobs. This mechanism can fail to distinguish between a job that is truly hung and one that is simply slow, perhaps because the underlying Condor system is heavily loaded.

Some failures are due to Internet problems that prevent the transfer of the job's input or output data to the pool. All of these failures tend to be bursty, and may result in a pool shutdown if a sufficiently large burst of failures occurs. When a pool fails for any reason other than the automatic shutdown mechanism, the master attempts several times to restart it before eventually giving up. If the restart is successful, jobs that were in progress at the time of the failure must be re-run (this is handled automatically), but downtime is normally minimal. Pools that succumb to the automatic shutdown mechanism are not restarted automatically, since jobs are likely to fail there again if the underlying problem is not corrected. When automatic restart fails or is not employed, it is up to the DBC user to restart the pool manually by issuing the appropriate command from the DBC console. Although our goal is fully automatic long-term operation, it is our experience that the DBC must still be monitored several times a day to determine whether manual intervention is required.

4.3. Metadata

If a system like the DBC is to be useful, it must provide its users not only with output from the batch jobs, but also with meta-data describing how that output was produced. These meta-data are important for several reasons:

Performance Monitoring and Analysis: The DBC user should be able to track how many jobs were processed by each pool and how long they took. As we have already noted, periodic run-time monitoring of these data is necessary to detect and correct some kinds of pool problems. Off-line meta-data analysis is also useful for identifying performance problems and for configuring the run-time system for future batches.

Interpretation of Results: Different jobs in a DBC batch may run on different types of machines, depending on the makeup of the pools in the run-time system. The type of machine on which a job runs may have an impact on the output data it generates. For example, floating point numeric calculations may differ slightly on different machines. To properly interpret the output of a batch job, a record of the environment in which each job was processed is essential.

Application Debugging: DBC application job step programs, like other programs, will not be free of errors. Such errors may manifest themselves more frequently in the heterogeneous execution environment provided by the DBC, since some environments may expose program bugs that others do not. When a job fails, some record of its state, e.g., a "core dump", should be available for subsequent post-mortem analysis. There must be some mechanism for a DBC user to collect the necessary state and environment information for failed jobs, which may be widely distributed.

Currently, the DBC makes several types of meta-data available to its users. Performance statistics are shipped from the pools and are available immediately through the DBC master. This allows the user to determine the status of any job or pool in the system, and the rate of job completion at the various pools. In addition, a variety of information about job execution environments and execution times is shipped to the master and logged. Tools are provided for subsequent analysis of the logs. All of the measurements presented in this paper were produced from analyses of DBC-generated logs. In DBC V1, debugging data, such as core files, are not shipped to the master because they are bulky and because they are not always needed. Instead, debugging data is saved in each pool's local disk buffer. The amount of data to save can be controlled by run-time parameters.
Options include saving everything (including bulky core files), saving just small files (typically stdout and stderr output), or saving nothing at all other than the output data (normal operation). Since the DBC assumes that it is allowed to use only a limited amount of disk space at each pool, the collection of debugging data can cause a gradual decline in a pool's performance. This is because the debugging data occupies buffer space that could otherwise have been used for job input and output data. This reduces the amount of data prefetching that can be performed, and may limit the maximum number of batch jobs that can execute concurrently in a pool.

5. DBC VERSION 2

Version 2 of the DBC is a complete re-implementation of the DBC architecture. DBC V2 supports a more flexible execution model than V1, which was restricted to one resource per pool and one execution step per job. It also provides an improved user interface and a standardized internal interface to external system components, such as Condor. This simplifies the task of extending the DBC to interact with new systems. Finally, DBC V2 is a lighter-weight implementation, in the sense that fewer DBC processes are required at each of the pools.

DBC V2 is implemented entirely in Java. Commonly cited advantages of Java include its software engineering features, which will hopefully lead to robust and maintainable code, and the promise of write-once-run-anywhere. Equally important, from our perspective, are several other features of the Java environment. First, Java provides machine-independent language support for threaded applications. Both the DBC master and the agents make heavy use of threads. In particular, Java threads eliminate the need for separate job monitor processes, which were required at each pool in DBC V1. Second, Java supports a distributed object model, in which one Java virtual machine can invoke methods of objects located in a remote virtual machine. In DBC V2, all communication between the master and the agents is accomplished using the Remote Method Invocation (RMI) mechanism. This has the effect of shielding the DBC from tasks such as parameter marshalling and unmarshalling. These tasks were performed explicitly by the DBC in Version 1, which relied on the PVM message passing facilities. Third, the Java environment provides standard interfaces to external components, such as operating systems. This hides, in principle if not always in practice, some of the heterogeneous nature of these components from the DBC.
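To illustrate the style of master-agent communication that RMI enables, here is a minimal sketch of a remote interface. The interface and method names are hypothetical and are not the actual DBC V2 interfaces; the point is that RMI-generated stubs take care of the argument marshalling that DBC V1 had to perform explicitly over PVM.

import java.rmi.Remote;
import java.rmi.RemoteException;

// Hypothetical remote interface that a DBC-like master might export to its agents.
interface BatchMaster extends Remote {
    // An agent at a pool asks the master for another job to run.
    JobDescription requestJob(String poolName) throws RemoteException;

    // An agent reports that a job has finished, successfully or not.
    void jobCompleted(String jobId, boolean success) throws RemoteException;
}

// Arguments and results of remote calls are marshalled automatically by RMI,
// provided they are serializable.
class JobDescription implements java.io.Serializable {
    String jobId;
    String[] stepTypes;               // e.g. gethirs, getmsu, execinversion, send_result
    java.util.Properties parameters;
}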
5.1. The Job Execution Model

DBC V2 supports a flexible job execution model. Under the model, a single logical job step may be executed differently depending on where it runs. This provides a simple way to take different types of computing resources and integrate them into a single DBC system. The job model requires that several things happen before the DBC runs jobs. First, job steps must be defined. Second, the resources available at each pool must be specified. Finally, the DBC must be told how to implement the various job steps using those resources. Figure 4 shows part of a DBC script that defines job steps, resources, and implementations for the profile retrieval application described in Section 3. Normally, such a script is run once, when the DBC system is being initialized. Once this is done, an arbitrary number of jobs can be created and submitted to the system for execution.

As was described in Section 2, each job consists of a sequence of job steps. The DBC provides two built-in generic job step types: program execution steps and data transfer steps. A DBC user customizes these generic types to define the job step types for a particular application. Customizing involves creating a unique name for the new step type and specifying default values for step parameters. For example, the four addjobsteptype commands in Figure 4 define four step types for the TOVS Path-P application:

1. gethirs - a data transfer step to get the HIRS input file
2. getmsu - a data transfer step to get the MSU input file
3. send_result - a data transfer step to move the resulting profile to the output repository
4. execinversion - a program execution step to execute the profile retrieval script using the HIRS and MSU input files

Job step types may be parameterized. In Figure 4, each step has a timeout parameter used by the system to detect step execution problems. Parameter values supplied with the job step type definition apply to all job steps of that type. It is possible to override such values for a particular job step, or for all steps of that type that are executed by a particular resource.

DBC resources can represent anything that is capable of executing DBC job steps. Each DBC pool must include one or more resources. When a DBC pool is first started, it is necessary to specify the execution resources that are available at that pool. The built-in resource types include Unix workstations and Condor-managed workstation clusters. In addition, there is an FTP resource, which represents the ability to transfer files to and from a pool.

First, define four types of job steps, two for transferring the HIRS and MSU input files, one for executing the inversion procedure, and one for transferring the resulting profile.

addjobsteptype gethirs XferStep direction[input] timeout[300]
addjobsteptype getmsu XferStep direction[input] timeout[100]
addjobsteptype send_result XferStep direction[output] timeout[600]
addjobsteptype execinversion ExecStep timeout[3600]

Next, create a DBC pool at foo.washington.edu. Initially, the pool has no resources and cannot process jobs.

startpool foo.washington.edu /home/dbc/buffer dbc

Now create a resource representing HPUX machines in the Condor cluster at foo.washington.edu.

addresource foo ConHPUX9 CondorResource runlimit[3]

Now create an implementation that will allow execinversion steps to run on the newly-created resource.

addimpl foo execinversion Condor5Exec ConHPUX9 \
    execpath[/home/dbc/iii/bin/pp_iii_run.hpux9.script] \
    cputype[hpux9] \
    requirements[(Arch == "HPPAR") && (OpSys == "HPUX9")] \
    execmode[vanilla] polling_freq[15]

Create another resource to represent the Solaris machines in the Condor cluster.

addresource foo ConSolaris CondorResource runlimit[4]

Allow execinversion steps to run on the Condor/Solaris machines.

addimpl foo execinversion Condor5Exec ConSolaris \
    execpath[/home/dbc/iii/bin/pp_iii_run.solaris.script] \
    cputype[solaris] \
    requirements[((Arch == "sun4m") || (Arch == "sun4c")) && (OpSys == "Solaris")] \
    execmode[vanilla] polling_freq[15]

Finally, create an FTP resource and implementations for the data transfer steps. The input and output data reside at bar.washington.edu.

addresource foo FTP FtpResource runlimit[4] remote_site[bar.washington.edu]
addimpl foo gethirs FtpXfer FTP
addimpl foo getmsu FtpXfer FTP
addimpl foo send_result FtpXfer FTP

Figure 4. Part of a DBC Script for the TOVS Path-P Application

Figure 4 includes three addresource commands, each of which defines a resource at the pool foo.washington.edu. (The pool, in turn, was defined using the startpool command.) Two of the resources represent Condor-managed workstation clusters, and the third is an FTP resource. Like job steps, resources may be parameterized. The runlimit parameter used in Figure 4 specifies an upper bound on the number of DBC job steps that may be running concurrently on a particular resource. This can be set to limit the load that the DBC can place on a particular machine or group of machines.

Because there are different types of resources, the exact procedure used to execute a particular job step will depend on the resource that executes it. For example, if the resource is a Condor cluster, the agent executes a job step by submitting the step's program to Condor for execution. To execute that same job step on a resource consisting of a single workstation, the agent would create a new process to run the step's program. The DBC uses "implementations" to encapsulate knowledge of how to execute a particular type of job step on a particular resource. The DBC includes built-in, parameterized implementations that map the built-in generic job step types to the built-in resource types. For example, the first addimpl command in Figure 4 defines an implementation that specifies how an execinversion job step is executed on the ConHPUX9 resource at the foo.washington.edu pool. An implementation parameter specifies the name of the csh script that performs the inversion. Additional parameters, some of which are passed through to the Condor system, are also specified.

By specifying the implementations that are to be used at each pool, the DBC user can control which job steps can be executed on which resources at each pool. In our example, implementations for execinversion steps are defined for both the ConHPUX9 and ConSolaris resources, so execinversion steps will be able to run on either resource. The DBC requires that at least one implementation be specified for each job step type at a pool before that pool will be allowed to process jobs. This ensures that it is possible to completely execute jobs at the pool.

The job model provides a layer of indirection that allows the DBC to exploit a heterogeneous collection of computing resources to process jobs. At the pool foo.washington.edu, the script used to perform an inversion procedure and the parameters passed to the underlying Condor system will depend on which resource is used to perform the inversion job step. This model separates the logical structure of the job from the details of how job steps are implemented at a particular pool. The model also simplifies the task of extending the DBC to support new types of resources. To support a new resource, it is necessary to create a new DBC resource type and new implementations to map job steps to the new resource type. For example, one could create a new resource type to represent workstations running the Windows NT operating system [7], plus a new implementation that maps program execution job steps to NT resources. Essentially, the new implementation would encapsulate the procedures necessary to run and monitor a program on an NT machine. The remainder of the DBC would remain unchanged.
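As a rough illustration of what such an extension might look like, the following Java sketch shows one possible shape for a resource abstraction and an NT-specific implementation. The interface and class names are hypothetical; they are not the DBC V2 classes, which are not shown in this paper.

// Hypothetical resource abstraction; not the actual DBC V2 interfaces.
interface Resource {
    String name();
    int runLimit();                       // analogous to the runlimit parameter
}

interface StepImplementation {
    // Start the given job step on the given resource and report success or failure.
    boolean execute(String stepType, java.util.Properties stepParams, Resource target)
        throws java.io.IOException;
}

// A sketch of an implementation that maps program execution steps to an NT machine,
// for example by launching the step's program through a remote shell service.
class NTExecImplementation implements StepImplementation {
    public boolean execute(String stepType, java.util.Properties stepParams, Resource target)
            throws java.io.IOException {
        String program = stepParams.getProperty("execpath");
        // How the program is actually started and monitored on the NT host is the
        // knowledge this class encapsulates; the rest of the system only sees
        // the StepImplementation interface.
        Process p = Runtime.getRuntime().exec(new String[] { "rsh", target.name(), program });
        try {
            return p.waitFor() == 0;
        } catch (InterruptedException e) {
            return false;
        }
    }
}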
5.2. User Interface

Two user interface programs are available for monitoring and controlling DBC systems. Both programs communicate with the DBC master through a standard DBC application programming interface.

The first program provides a text-based console that can be used to control and monitor a DBC system. Console commands allow jobs to be submitted, pools to be started and stopped, and error logging to be controlled. Pool status and job status can be checked. Console commands are also used to define job step types, pool resources, and implementations. These commands are normally read from a file when the DBC is first started.

The second program is DBCHttpd, which implements the HyperText Transfer Protocol. This program supports a graphical DBC monitor which can run in a Java-enabled web browser. A browser that connects to the DBCHttpd downloads a Java applet, which then retrieves current DBC status information from the DBCHttpd. This situation is illustrated in Figure 5. The applet displays a batch progress chart showing the number of jobs completed and aborted as a function of time. It can also display more detailed information about specific pools, such as the loads on the pools' resources. A snapshot of the monitor interface is shown in Figure 6.
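The applet's interaction with DBCHttpd is, in essence, periodic HTTP polling. The fragment below is a minimal sketch of that pattern in Java; the host, port, status URL, and response format are hypothetical, since the paper does not specify the request format that DBCHttpd actually uses.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

// Minimal polling loop: fetch a status page from an HTTP daemon and print it.
// The address and the path "/status" are placeholders, not the real DBCHttpd protocol.
public class StatusPoller {
    public static void main(String[] args) throws Exception {
        URL statusUrl = new URL("http://dbc-master.example.edu:8080/status");
        while (true) {
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(statusUrl.openStream()));
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);   // a real monitor would update its charts here
            }
            in.close();
            Thread.sleep(60 * 1000);        // poll once a minute
        }
    }
}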

Figure 5. The DBC HTTP Daemon

Figure 6. The Graphical Monitor

6. CONCLUSIONS

The DBC system demonstrates that a widely-distributed collection of workstation pools can be used to process TOVS Polar Pathfinder data. Such a system may be particularly valuable to collaborative research teams, which may control a variety of widely-distributed computing resources. By combining the power of several pools, the time required to complete large batch processing tasks can be substantially reduced. Such a system is flexible, and can provide computing cycles even when some pools are unavailable. The heterogeneity and distribution of the system's computing resources are potential performance impediments which the DBC has been designed to overcome. More information about the DBC, including a user's guide and the DBC V2 code, can be found at the DBC's World Wide Web site.

ACKNOWLEDGEMENTS

We are grateful to CESDIS for providing access to its workstations in support of the DBC system described here. The Polar Pathfinder team has been patient with the system. Victor Wiewiorowski, at the University of Waterloo, implemented the graphical front-end for DBC V1. Work on the DBC has been supported by NASA through its Applied Information Systems Research Program.

REFERENCES

1. C. Chen, K. Salem, and M. Livny, "The DBC: Processing scientific data over the internet," in 16th International Conference on Distributed Computing Systems, pp. 673-679, May 1996.
2. A. Bricker, M. Litzkow, and M. Livny, "Condor technical summary," Tech. Rep. TR 1069, Department of Computer Science, University of Wisconsin, Oct. 1991.
3. M. Litzkow and M. Livny, "Experience with the Condor distributed batch system," in Proc. of the IEEE Workshop on Experimental Distributed Systems, pp. 97-101, Oct. 1990.
4. A. Geist, A. Beguelin, J. Dongarra, W. Jiang, R. Manchek, and V. Sunderam, PVM: Parallel Virtual Machine - A User's Guide and Tutorial for Networked Parallel Computing, The MIT Press, Cambridge, MA, 1994.
5. A. Chedin, N. A. Scott, C. Wahiche, and P. Moulinier, "The Improved Initialization Inversion method: A high resolution physical method for temperature retrievals from satellites of the TIROS-N series," J. Clim. Appl. Meteorol. 24, pp. 128-134, 1985.
6. J. A. Francis, "Improvements of TOVS retrievals over sea ice and applications to estimating Arctic energy fluxes," Journal of Geophysical Research 99(D5), pp. 10395-10408, 1994.
7. H. Custer, Inside Windows NT, Microsoft Press, 1993.


How to Break Software by James Whittaker

How to Break Software by James Whittaker How to Break Software by James Whittaker CS 470 Practical Guide to Testing Consider the system as a whole and their interactions File System, Operating System API Application Under Test UI Human invokes

More information

File System Internals. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

File System Internals. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University File System Internals Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Today s Topics File system implementation File descriptor table, File table

More information

Chapter 11: Implementing File-Systems

Chapter 11: Implementing File-Systems Chapter 11: Implementing File-Systems Chapter 11 File-System Implementation 11.1 File-System Structure 11.2 File-System Implementation 11.3 Directory Implementation 11.4 Allocation Methods 11.5 Free-Space

More information

CUMULVS: Collaborative Infrastructure for Developing. Abstract. by allowing them to dynamically attach to, view, and \steer" a running simulation.

CUMULVS: Collaborative Infrastructure for Developing. Abstract. by allowing them to dynamically attach to, view, and \steer a running simulation. CUMULVS: Collaborative Infrastructure for Developing Distributed Simulations James Arthur Kohl Philip M. Papadopoulos G. A. Geist, II y Abstract The CUMULVS software environment provides remote collaboration

More information

Chapter 11: File System Implementation. Objectives

Chapter 11: File System Implementation. Objectives Chapter 11: File System Implementation Objectives To describe the details of implementing local file systems and directory structures To describe the implementation of remote file systems To discuss block

More information

1. Operating System Concepts

1. Operating System Concepts 1. Operating System Concepts 1.1 What is an operating system? Operating systems are an essential part of any computer system. An operating system (OS) is software, which acts as an intermediary between

More information

Operating Systems Overview. Chapter 2

Operating Systems Overview. Chapter 2 1 Operating Systems Overview 2 Chapter 2 3 An operating System: The interface between hardware and the user From the user s perspective: OS is a program that controls the execution of application programs

More information

Chapter 11: Implementing File Systems

Chapter 11: Implementing File Systems Chapter 11: Implementing File-Systems, Silberschatz, Galvin and Gagne 2009 Chapter 11: Implementing File Systems File-System Structure File-System Implementation ti Directory Implementation Allocation

More information

A Compact Computing Environment For A Windows PC Cluster Towards Seamless Molecular Dynamics Simulations

A Compact Computing Environment For A Windows PC Cluster Towards Seamless Molecular Dynamics Simulations A Compact Computing Environment For A Windows PC Cluster Towards Seamless Molecular Dynamics Simulations Yuichi Tsujita Abstract A Windows PC cluster is focused for its high availabilities and fruitful

More information

Chapter 12: File System Implementation

Chapter 12: File System Implementation Chapter 12: File System Implementation Silberschatz, Galvin and Gagne 2013 Chapter 12: File System Implementation File-System Structure File-System Implementation Directory Implementation Allocation Methods

More information

Figure 1-1: Local Storage Status (cache).

Figure 1-1: Local Storage Status (cache). The cache is the local storage of the Nasuni Filer. When running the Nasuni Filer on a virtual platform, you can configure the size of the cache disk and the copy-on-write (COW) disk. On Nasuni hardware

More information

[8] J. J. Dongarra and D. C. Sorensen. SCHEDULE: Programs. In D. B. Gannon L. H. Jamieson {24, August 1988.

[8] J. J. Dongarra and D. C. Sorensen. SCHEDULE: Programs. In D. B. Gannon L. H. Jamieson {24, August 1988. editor, Proceedings of Fifth SIAM Conference on Parallel Processing, Philadelphia, 1991. SIAM. [3] A. Beguelin, J. J. Dongarra, G. A. Geist, R. Manchek, and V. S. Sunderam. A users' guide to PVM parallel

More information

Native Marshalling. Java Marshalling. Mb/s. kbytes

Native Marshalling. Java Marshalling. Mb/s. kbytes Design Issues for Ecient Implementation of MPI in Java Glenn Judd, Mark Clement, Quinn Snell Computer Science Department, Brigham Young University, Provo, USA Vladimir Getov 2 2 School of Computer Science,

More information

Interme diate DNS. Local browse r. Authorit ative ... DNS

Interme diate DNS. Local browse r. Authorit ative ... DNS WPI-CS-TR-00-12 July 2000 The Contribution of DNS Lookup Costs to Web Object Retrieval by Craig E. Wills Hao Shang Computer Science Technical Report Series WORCESTER POLYTECHNIC INSTITUTE Computer Science

More information

Distributed Systems Principles and Paradigms

Distributed Systems Principles and Paradigms Distributed Systems Principles and Paradigms Chapter 03 (version February 11, 2008) Maarten van Steen Vrije Universiteit Amsterdam, Faculty of Science Dept. Mathematics and Computer Science Room R4.20.

More information

Reporting Interface. network link sensor machine machine machine. memory sensor. CPU sensor. 2.1 Sensory Subsystem. Sensor Data. Forecasting Subsystem

Reporting Interface. network link sensor machine machine machine. memory sensor. CPU sensor. 2.1 Sensory Subsystem. Sensor Data. Forecasting Subsystem Implementing a Performance Forecasting System for Metacomputing: The Network Weather Service (Extended Abstract) submitted to SC97 UCSD Technical Report TR-CS97-540 Rich Wolski Neil Spring Chris Peterson

More information

Tutorial 4: Condor. John Watt, National e-science Centre

Tutorial 4: Condor. John Watt, National e-science Centre Tutorial 4: Condor John Watt, National e-science Centre Tutorials Timetable Week Day/Time Topic Staff 3 Fri 11am Introduction to Globus J.W. 4 Fri 11am Globus Development J.W. 5 Fri 11am Globus Development

More information

Extra-High Speed Matrix Multiplication on the Cray-2. David H. Bailey. September 2, 1987

Extra-High Speed Matrix Multiplication on the Cray-2. David H. Bailey. September 2, 1987 Extra-High Speed Matrix Multiplication on the Cray-2 David H. Bailey September 2, 1987 Ref: SIAM J. on Scientic and Statistical Computing, vol. 9, no. 3, (May 1988), pg. 603{607 Abstract The Cray-2 is

More information

PROOF-Condor integration for ATLAS

PROOF-Condor integration for ATLAS PROOF-Condor integration for ATLAS G. Ganis,, J. Iwaszkiewicz, F. Rademakers CERN / PH-SFT M. Livny, B. Mellado, Neng Xu,, Sau Lan Wu University Of Wisconsin Condor Week, Madison, 29 Apr 2 May 2008 Outline

More information

StorSimple Storage Appliance Release Notes. Patch Release Software V2.0.1 ( )

StorSimple Storage Appliance Release Notes. Patch Release Software V2.0.1 ( ) StorSimple Storage Appliance Release Notes Patch Release Software V2.0.1 (2.0.2-84) May, 2012 Page 2 Table of Contents Welcome... 3 Issues Resolved in Version 2.0.1 (2.0.2-84)... 3 Release Notes for Version

More information

SCALABLE TRAJECTORY DESIGN WITH COTS SOFTWARE. x8534, x8505,

SCALABLE TRAJECTORY DESIGN WITH COTS SOFTWARE. x8534, x8505, SCALABLE TRAJECTORY DESIGN WITH COTS SOFTWARE Kenneth Kawahara (1) and Jonathan Lowe (2) (1) Analytical Graphics, Inc., 6404 Ivy Lane, Suite 810, Greenbelt, MD 20770, (240) 764 1500 x8534, kkawahara@agi.com

More information

Dynamic Process Management in an MPI Setting. William Gropp. Ewing Lusk. Abstract

Dynamic Process Management in an MPI Setting. William Gropp. Ewing Lusk.  Abstract Dynamic Process Management in an MPI Setting William Gropp Ewing Lusk Mathematics and Computer Science Division Argonne National Laboratory gropp@mcs.anl.gov lusk@mcs.anl.gov Abstract We propose extensions

More information

Name: Instructions. Problem 1 : Short answer. [48 points] CMU / Storage Systems 20 April 2011 Spring 2011 Exam 2

Name: Instructions. Problem 1 : Short answer. [48 points] CMU / Storage Systems 20 April 2011 Spring 2011 Exam 2 CMU 18-746/15-746 Storage Systems 20 April 2011 Spring 2011 Exam 2 Instructions Name: There are four (4) questions on the exam. You may find questions that could have several answers and require an explanation

More information

Blocking vs. Non-blocking Communication under. MPI on a Master-Worker Problem. Institut fur Physik. TU Chemnitz. D Chemnitz.

Blocking vs. Non-blocking Communication under. MPI on a Master-Worker Problem. Institut fur Physik. TU Chemnitz. D Chemnitz. Blocking vs. Non-blocking Communication under MPI on a Master-Worker Problem Andre Fachat, Karl Heinz Homann Institut fur Physik TU Chemnitz D-09107 Chemnitz Germany e-mail: fachat@physik.tu-chemnitz.de

More information

Chapter 4: Threads. Overview Multithreading Models Thread Libraries Threading Issues Operating System Examples Windows XP Threads Linux Threads

Chapter 4: Threads. Overview Multithreading Models Thread Libraries Threading Issues Operating System Examples Windows XP Threads Linux Threads Chapter 4: Threads Overview Multithreading Models Thread Libraries Threading Issues Operating System Examples Windows XP Threads Linux Threads Chapter 4: Threads Objectives To introduce the notion of a

More information

Virtuozzo Containers

Virtuozzo Containers Parallels Virtuozzo Containers White Paper An Introduction to Operating System Virtualization and Parallels Containers www.parallels.com Table of Contents Introduction... 3 Hardware Virtualization... 3

More information

A Framework for Building Parallel ATPs. James Cook University. Automated Theorem Proving (ATP) systems attempt to prove proposed theorems from given

A Framework for Building Parallel ATPs. James Cook University. Automated Theorem Proving (ATP) systems attempt to prove proposed theorems from given A Framework for Building Parallel ATPs Geo Sutclie and Kalvinder Singh James Cook University 1 Introduction Automated Theorem Proving (ATP) systems attempt to prove proposed theorems from given sets of

More information

W H I T E P A P E R : T E C H N I C AL. Symantec High Availability Solution for Oracle Enterprise Manager Grid Control 11g and Cloud Control 12c

W H I T E P A P E R : T E C H N I C AL. Symantec High Availability Solution for Oracle Enterprise Manager Grid Control 11g and Cloud Control 12c W H I T E P A P E R : T E C H N I C AL Symantec High Availability Solution for Oracle Enterprise Manager Grid Control 11g and Cloud Control 12c Table of Contents Symantec s solution for ensuring high availability

More information

PROCESSES AND THREADS THREADING MODELS. CS124 Operating Systems Winter , Lecture 8

PROCESSES AND THREADS THREADING MODELS. CS124 Operating Systems Winter , Lecture 8 PROCESSES AND THREADS THREADING MODELS CS124 Operating Systems Winter 2016-2017, Lecture 8 2 Processes and Threads As previously described, processes have one sequential thread of execution Increasingly,

More information

CS307: Operating Systems

CS307: Operating Systems CS307: Operating Systems Chentao Wu 吴晨涛 Associate Professor Dept. of Computer Science and Engineering Shanghai Jiao Tong University SEIEE Building 3-513 wuct@cs.sjtu.edu.cn Download Lectures ftp://public.sjtu.edu.cn

More information