
2005 IEEE International Symposium on Cluster Computing and the Grid

An Organizational Grid of Federated MOSIX Clusters

Amnon Barak, Amnon Shiloh, and Lior Amar
Department of Computer Science
The Hebrew University of Jerusalem
Jerusalem, Israel

Abstract

MOSIX is a cluster management system that uses process migration to allow a Linux cluster to perform like a parallel computer. Recently it has been extended with new features that could make a grid of Linux clusters run as a cooperative system of federated clusters. On one hand, it supports automatic workload distribution among connected clusters that belong to different owners, while still preserving the autonomy of each owner to disconnect its cluster from the grid at any time, without losing migrated processes from other clusters. Other new features of MOSIX include grid-wide automatic resource discovery; a precedence scheme for local processes and among guest processes (from other clusters); flood control; a secure run-time environment (sandbox), which prevents guest processes from accessing local resources in a hosting system; and support of cluster partitions. The resulting grid management system is suitable for creating an intra-organizational high-performance computational grid, e.g., in an enterprise or in a campus. The paper presents enhanced and new features of MOSIX and their performance.

Keywords: cluster computing, grid computing, process migration, organizational grid, sandbox

1 Introduction

Grid computing is an emerging technology that uses the Internet to allow sharing of computational and data resources among geographically dispersed users within and across institutional boundaries [6]. For example, scientific grid computing refers to applications with large resource requirements that cannot be satisfied by traditional clusters or supercomputers. Existing grid packages, such as Cactus [1, 11], Condor [17], Globus [7] and Nimrod/G [14], already provide essential grid services, such as batch scheduling, assignment of processes to nodes, checkpointing, process migration, inter-process communication and remote data access. Due to the diversity of grid resources, the disruptive environment and the unpredictable requirements of processes, some areas that need further improvement are automatic (on-line) management, including adaptive resource discovery and workload distribution, as well as support of a secure run-time environment for non-local processes.

MOSIX [3, 4, 13] is a cluster management system that allows a set of x86 nodes to perform like a single parallel computer. Users can run parallel (and sequential) applications by letting MOSIX automatically seek resources and migrate processes among nodes to improve the overall performance, without changing the run-time environment of the migrated processes.

This paper presents a grid management system that extends the cluster version of MOSIX with new features that could make a grid run as a cooperative system of federated clusters. Our system model consists of independent clusters, e.g., of different groups, whose owners wish to share their computational resources, while still maintaining control over their private resources. The main features of the resulting grid system are:

1. Automatic resource discovery: users need not know the details of the configuration or the state of any specific resource.
2. Preemptive (transparent) process migration within and across different clusters, and automatic load-balancing.
3. Adaptive management that responds to changes of available and required resources.
4. A secure run-time environment for guest processes.
5. Support of a flexible configuration: clusters can be partitioned or combined.
6. A run-time precedence for local over guest processes and among guest processes.
7. Flood prevention.
8. Support of a dynamic environment: clusters can be connected or disconnected at any time.

We note that the first four features were obtained by enhancing existing cluster features of MOSIX to the grid environment, while the last four are new features that were developed for that environment. The new system is particularly suitable to run compute-intensive and other applications with moderate amounts of I/O, over fast trusted networks, which are common in an enterprise or in a campus grid. For example, in our campus there are 8 MOSIX clusters, ranging from 14 to 50 (mostly dual-CPU) nodes. Each cluster is owned by a different group (owner) in various departments. These clusters are connected by a 1Gb/s campus-wide backbone, which also connects several other clusters (with almost 200 nodes) in student labs that are not used during nights and weekends. All these clusters could form a campus grid with as many as 500 processors, see Fig. 1. Note that the leaves in the figure are cluster partitions.

Figure 1: A 4-level campus grid.

The paper is organized as follows: Section 2 presents cluster management features that were enhanced to support grid computing. Section 3 presents new grid management features. Section 4 presents the performance of the new features. Section 5 describes related works and Section 6 summarizes our conclusions.

2 From a Cluster to a Grid

This section presents cluster features of MOSIX that were enhanced for a grid environment.

2.1 Automatic Resource Discovery

Resource discovery is performed by an on-line, hierarchical information dissemination scheme that provides each node with the latest information about the availability and the state of grid-wide resources. The scheme is based on a randomized gossip algorithm, in which each node regularly (e.g., every second) monitors the state of its resources, including the CPU speed and current load, free and used memory, and current rates of I/O and network throughput. Each node then sends its most recent information, including indirect information that it has about other nodes, to a randomly chosen node in its cluster. Also, selected information, e.g., on the least loaded nodes, is exchanged among different clusters at a rate that is proportional to the relative distance between the corresponding clusters. In this scheme, information about newly available resources, e.g., clusters that have just become available, is quickly disseminated across the grid, while information about nodes in disconnected clusters is quickly phased out and thus can no longer be used. In [15] we presented bounds for the age properties and the rates of propagation of the above information dissemination scheme.

2.2 Preemptive Process Migration

MOSIX supports automatic, grid-wide preemptive process migration that can migrate almost any process [4]. Migrations are supervised by adaptive on-line algorithms that continuously attempt to improve the performance, e.g., by load-balancing, or by migrating processes from slower to faster nodes. These algorithms are particularly useful for applications with unpredictable or changing resource requirements. Within a cluster, process migration amounts to copying the memory image of the process and setting its execution environment. To reduce network occupancy in cross-cluster migrations, the process memory image is compressed using the LZOP [12] algorithm. Migration decisions are based on (run-time) process profiling and the latest information on the availability of grid resources, as provided by the information dissemination scheme.
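
The dissemination step of the scheme of Section 2.1 can be sketched as follows. This is a simplified illustration rather than the MOSIX implementation: the record fields, the 30-second cut-off and the class name are our own assumptions. Each node keeps an age-stamped record per known node, pushes its whole vector to one randomly chosen peer per round, and a receiver keeps the younger copy of every record.

    import random

    class BulletinBoard:
        # A sketch (not the MOSIX code) of the gossip scheme of Section 2.1:
        # every node keeps an age-stamped record per known node and gossips
        # its whole vector to one randomly chosen peer in its cluster per round.
        def __init__(self, node_id, cluster_nodes):
            self.node_id = node_id
            self.cluster_nodes = cluster_nodes   # peers in the local cluster
            self.records = {}                    # node_id -> (age_sec, resources)

        def refresh_local(self, load, free_mem_mb):
            # A node always holds age-0 (freshest) information about itself.
            self.records[self.node_id] = (0.0, {"load": load,
                                                "free_mem_mb": free_mem_mb})

        def age_all(self, dt):
            # Records get older as time passes; very old ones are dropped, so
            # nodes of disconnected clusters phase out (the 30 s cut-off is an
            # arbitrary choice for this sketch).
            self.records = {n: (age + dt, res)
                            for n, (age, res) in self.records.items()
                            if age + dt < 30.0}

        def gossip_once(self, peers_by_id):
            # Push the whole (direct plus indirect) vector to one random peer.
            target = random.choice([n for n in self.cluster_nodes
                                    if n != self.node_id])
            peers_by_id[target].receive(self.records)

        def receive(self, remote_records):
            # For every node, keep whichever copy of its record is younger.
            for n, (age, res) in remote_records.items():
                if n not in self.records or age < self.records[n][0]:
                    self.records[n] = (age, res)

Cross-cluster exchange would use the same merge step, only for selected records and at a lower rate for more distant clusters, as described above.
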
Process profiling is performed by continuously collecting information about the characteristics of each process, e.g., its size, rates of system-calls, volume of IPC and I/O, etc. This information is then used by competitive on-line algorithms [9] to determine the best location for the process. These algorithms take into account the respective speed and current load of the nodes, the size of the migrated process vs. the free memory available in different nodes, and the characteristics of the processes. In this scheme, when the profile of a process changes or new resources become available, the system automatically responds by considering reassignment of processes to better locations.
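
A much simplified placement heuristic in the spirit of the above is sketched below. It is not the opportunity-cost algorithm of [9], and the node and process fields are illustrative assumptions: it first rules out nodes whose free memory cannot hold the process image and then prefers the node offering the largest CPU share.

    def pick_target_node(process, nodes):
        # Skip nodes that would have to page; among the rest, prefer the node
        # on which the process would get the largest CPU slice.
        best, best_share = None, 0.0
        for node in nodes:
            if node["free_mem_mb"] < process["size_mb"]:
                continue
            share = node["speed_mhz"] / (node["load"] + 1)
            if share > best_share:
                best, best_share = node, share
        return best                              # None means "stay where you are"

    nodes = [{"name": "a1", "speed_mhz": 3060, "load": 2, "free_mem_mb": 900},
             {"name": "b7", "speed_mhz": 3060, "load": 0, "free_mem_mb": 4000}]
    print(pick_target_node({"size_mb": 64}, nodes))   # prints the record of "b7"
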

2.3 A Virtual Run-Time Environment

The MOSIX Virtual run-time Environment (MVE) is a software layer that allows a migrated process to run in a remote node, away from the (home) node in which it was created [3, 4]. This is accomplished by intercepting the system-calls of such processes. If the process was not migrated, then its system-calls are performed locally. After a process is migrated, its few system-calls that are site-independent are performed in the remote node, while the rest are forwarded to the MVE layer in its home-node, which then performs the system-calls on behalf of the process as if it were running in the home-node. The main outcome of the MVE scheme is that each process, including a migrated process, seems to be running in its home-node, and all the processes of a user's session share the run-time environment of the home-node. As a result, the user gets the illusion of running on a single-node system. The drawback of this approach is increased communication overhead, which makes our system suitable to run compute-bound and other non-intensive I/O applications in an organizational grid.

The MVE layer guarantees that a migrated (guest) process cannot modify or even access local resources, other than CPU and memory, in a remote (hosting) node. As explained above, this is accomplished by intercepting the system-calls of guest processes. Special care is taken to ensure that the few system-calls that are performed locally cannot access resources in the hosting node. The rest are forwarded to the home-node of the process. The net result is a secure run-time environment (sandbox) for all guest processes.
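
Conceptually, the routing of system-calls by the MVE can be sketched as follows. This is an illustration only: the real layer works inside the Linux kernel, and the call names and the site-independent subset shown here are our own assumptions.

    SITE_INDEPENDENT = {"getpid", "brk", "sched_yield"}   # illustrative subset

    class HomeNodeLink:
        # Stands in for the channel back to the home-node of a migrated process.
        def forward(self, name, args):
            return ("performed-at-home", name, args)

    def execute_locally(name, args):
        return ("performed-here", name, args)

    def handle_syscall(process, name, args, home_link):
        # A non-migrated process behaves exactly as under plain Linux.
        if not process["migrated"]:
            return execute_locally(name, args)
        # A migrated (guest) process may run only site-independent calls locally;
        # anything that could touch host resources is forwarded to its home-node.
        if name in SITE_INDEPENDENT:
            return execute_locally(name, args)
        return home_link.forward(name, args)

    print(handle_syscall({"migrated": True}, "open", ("/etc/passwd",),
                         HomeNodeLink()))
    # -> ('performed-at-home', 'open', ('/etc/passwd',)): the guest never
    #    opens files of the hosting node.
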
3 New Grid Features

This section presents the new grid features, including a scheme for sharing cluster partitions, a method for allocating a run-time precedence among processes, flood control (to prevent overloading of nodes), and handling of long-running processes in a disruptive environment.

3.1 Sharing of Cluster Partitions

Each MOSIX cluster can be divided into several partitions, where a partition is a set of nodes that is usually allocated to one user (partition owner) at a time. The allocation of nodes to partitions could change from time to time, to reflect changing demands. Nodes, if any, that are not allocated form a special partition. Each user is expected to login only to nodes in its allocated (home) partition in its local (group's) cluster, where all of his/her processes are created. Besides the home partition, a user may (temporarily) get on loan additional partitions in other (remote) clusters. MOSIX supports process migration among nodes in all the user's partitions as if they were a single partition. Process migration can be done either automatically, using the load-balancing algorithms, or manually, by explicit requests.

To enable grid-wide resource sharing, each owner may designate some nodes in its home partition to host guest processes of other owners. The remaining (reserved) nodes are allocated exclusively for the owner's processes. Guest processes may migrate to designated nodes only if these nodes are not used for a prolonged time (a parameter). They are automatically moved out whenever an owner reclaims its nodes, or when processes with higher precedence are moved in (see below).

Since each partition can host processes from different owners, a partition-level automatic run-time precedence scheme was developed to distinguish between such guest processes.

3.2 The Precedence Scheme

The precedence scheme provides a run-time precedence among processes of different owners. Processes with a higher precedence preempt and push out all processes with a lower precedence. Note that a node may still share its resources among two or more owners with the same precedence. The precedence scheme consists of a precedence allocation method and an enforcement algorithm. Each partition owner is responsible for defining a precedence table. The table contains information about partitions whose processes are allowed to move in, and if so, also their precedence. By proper setting of the precedence table, it is possible to combine two partitions in different clusters (symmetrically or asymmetrically), to block migration from specific partitions, or even to attach a partition to several clusters. For example, the owner of partition P1 can get on loan a partition P2 in another cluster, if the owner of P2 sets the precedence of P1 processes to be equal to that of local processes. Note that processes of P2 are not considered as local processes in P1 and could even be blocked altogether.

The precedence algorithm has two responsibilities: to migrate out all processes with a lower precedence upon arrival of any process with a higher precedence, and to guarantee that processes with a higher precedence can always move in, even if processes with a lower precedence are already running there.
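
One possible encoding of a precedence table and of the admission/eviction decision is sketched below. This is illustrative only: it is not the MOSIX configuration syntax, and partition P3 is hypothetical. Smaller numbers denote higher precedence, and partitions absent from the table are blocked.

    # Precedence table of the owner of partition P2: smaller number means
    # higher precedence, and partitions that are absent are blocked.
    precedence_table_p2 = {
        "P2": 0,   # the owner's own processes
        "P1": 0,   # P1 got this partition on loan: same precedence as local
        "P3": 5,   # P3 processes are admitted only as lower-precedence guests
    }

    def admit(candidate_partition, running, table):
        # Decide whether a process from candidate_partition may move in and
        # which running processes must be pushed out to make room for it.
        if candidate_partition not in table:
            return False, []                     # blocked altogether
        prec = table[candidate_partition]
        evict = [p for p in running
                 if table.get(p["partition"], float("inf")) > prec]
        return True, evict

    running = [{"pid": 101, "partition": "P3"}, {"pid": 102, "partition": "P2"}]
    print(admit("P1", running, precedence_table_p2))
    # -> (True, [{'pid': 101, 'partition': 'P3'}]): P3's guest is pushed out,
    #    while the local P2 process stays.
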

3.3 Flood Control

Flooding could occur when a user creates a large number of processes to take advantage of grid resources (nodes), or when a large number of processes migrate back to their owner's partition, e.g., when clusters are disconnected. The first case is prevented by placing a (tunable) limit on the number of guest processes that are allowed to migrate to each node. Processes of a user that attempt to overload the grid beyond the allowable limit are not permitted to migrate. To prevent flooding by a large number of processes, including returning processes, each node has a limit on the number of local running processes. When this limit is reached, additional processes are automatically frozen and their memory images are stored in regular files. This method ensures that a large number of processes can be handled without exhausting the CPU and memory. Frozen processes are reactivated in a circular fashion in order to allow some work to be done without overloading the owner's nodes. When resources become available again, the load-balancing algorithm migrates running processes away, thus allowing reactivation of more frozen processes.
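
A toy sketch of this freezing policy is shown below; it is our own simplification, with an assumed limit value and class name. It keeps at most a fixed number of processes running and reactivates frozen ones in arrival order, a simplification of the circular reactivation described above (in MOSIX the memory image of a frozen process is written to a regular file).

    from collections import deque

    class FloodController:
        # At most `limit` processes run at once; the rest are frozen and later
        # reactivated when a running slot frees up.
        def __init__(self, limit):
            self.limit = limit
            self.running = []
            self.frozen = deque()

        def arrive(self, proc):
            if len(self.running) < self.limit:
                self.running.append(proc)
            else:
                self.frozen.append(proc)         # image would be written to disk

        def slot_freed(self, finished_proc):
            self.running.remove(finished_proc)
            if self.frozen:
                self.running.append(self.frozen.popleft())

    ctl = FloodController(limit=2)
    for pid in range(5):
        ctl.arrive(pid)
    print(ctl.running, list(ctl.frozen))         # [0, 1] [2, 3, 4]
    ctl.slot_freed(0)
    print(ctl.running, list(ctl.frozen))         # [1, 2] [3, 4]
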
3.4 Disruptive Configurations

In our grid, the owner of each cluster can connect (disconnect) it to (from) the grid at any time, and can also block or allow migration of guest processes from each remote cluster to each node. After a request is issued to disconnect a cluster, if guest processes are present then they are moved out, and if local processes were migrated to other clusters then they are brought back. Note that guest processes are not necessarily migrated back to their respective home-nodes, although they can always do so. For this reason, users are not expected to login and/or initiate processes from remote clusters, since if that is allowed and the clusters are disconnected, those processes may have nowhere to return. In order to manage a large number of returning processes, the scheme relies on the freezing mechanism described in the previous section.

Support of Long Processes

The process migration, freezing and gradual reactivation mechanisms provide support to grid applications that need to run for a long time, e.g., for days, weeks or even months. As explained above, before a remote cluster is disconnected, all guest processes are moved out. These processes are frozen and are gradually reactivated when grid resources become available again. For example, long processes that had been migrated to student farms during the night were returned to their home cluster in the morning, only to migrate back to the student farms the next night.

4 Performance

This section presents the performance of MOSIX and its new features. We ran tests in a grid with two clusters (C14 and C20) and a workstation that were located in the same building, as well as between nodes in different buildings and different campuses of our university.

The C14 cluster consisted of 14 identical (dual-Xeon 3.06GHz, 4GB RAM) diskless nodes, and C20 consisted of 20 identical (dual-Xeon 3.06GHz, 4GB RAM) diskless nodes. The workstation had a single Pentium IV 3GHz CPU, 1GB RAM, and a 40GB low-cost SATA-IDE disk. The nodes in each cluster were connected by an internal 1Gb/s Ethernet switch. The two clusters and the workstation were connected by a 1Gb/s switch.

4.1 Overhead of Migrated Processes

This test presents the overhead of running migrated processes in different remote clusters. This overhead includes the migration, the communication and that of the MVE. We used four real-life applications with increasing amounts of I/O. The first application, RC, is a CPU-intensive program that generates random sets of clauses over propositional variables and analyses the satisfiability, size and distribution of variables of each set and its subsets [5]. The second application, SW, produces all possible alignments between pairs of protein sequences using the Smith-Waterman algorithm [20]. SW uses a relatively small amount of I/O. JELlium solves the fundamental Schroedinger/Newton equations of motion of electrons and nuclei of a molecule by computing the combined electron-nuclear dynamics [2]. JELlium uses a moderate amount of I/O.

Lastly, BLAT is a bioinformatics tool for rapid mRNA/DNA and cross-species protein alignments [8]; it uses a moderate amount of I/O.

We used identical Xeon 3.06GHz nodes and ran each program using four different settings: first, as a local (native Linux) process; then as a migrated process to a remote node in a cluster that was located in the same building and was connected by a 1Gb/s Ethernet; then as a migrated process to a cluster across campus (located about 1 Km away) that was connected once by a 1Gb/s and again by a 100Mb/s Ethernet (for reference); and lastly as a migrated process to a cluster in a campus across town (located about 10 Km away) that was connected by a 100Mb/s Ethernet. We note that in the last four tests, each process was migrated to the remote node immediately after its creation and it performed all its I/O and site-dependent system-calls in its home-node, across the network.

The results of these tests (averaged over 5 runs) are shown in Table 1. In the table, the first four lines show the local run-time (Sec.) of each program, followed by the respective total amounts of I/O (MB), the block size (KB) used and the number of remote system-calls performed. The next two pairs of rows list the run-times and the corresponding slowdown (vs. the local time) between clusters in the same building and across a campus grid using a 1Gb/s Ethernet. The last two pairs of rows list the run-times and the corresponding slowdown in the above campus grid and a grid across town using a 100Mb/s Ethernet.

Table 1: Local vs. Remote Run-times

                               RC        SW        JEL       BLAT
    Local time (Sec.)
    Total I/O (MB)
    Block size                 -         32KB      32KB      64KB
    R-Syscalls                 3,050     16,700    7,200     7,800
    1Gb/s Ethernet
      Building time (Sec.)
      Slowdown                 0.32%     1.47%     1.16%     1.39%
      Campus time (Sec.)
      Slowdown                 0.50%     1.85%     1.18%     1.67%
    100Mb/s Ethernet
      Campus time (Sec.)
      Slowdown                 0.72%     3.61%     9.10%     7.82%
      Town time (Sec.)
      Slowdown                 1.15%     11.80%    12.16%    10.95%

From Table 1 it can be seen that with a 1Gb/s Ethernet the average slowdown of all the tested programs in the same campus was 1.2%, with less than 2% maximal slowdown for all the tested programs. The corresponding average slowdowns with a 100Mb/s Ethernet were 5.31% in the same campus and 9.02% across town. These results were expected due to the increased latencies and lower bandwidth of the 100Mb/s vs. 1Gb/s Ethernet, which affected mostly the programs that performed I/O. In comparison, the slowdown of the CPU-bound program (RC) increased only by 0.22% across campus and by 0.65% across town. These results confirm the claim that our system is suitable for applications with a moderate amount of I/O over fast networks in an enterprise or in a campus grid.

4.2 Grid-wide Load-balancing

In the next two tests we measured the time needed to balance the load between the C14 and C20 clusters. Initially, all the nodes of C20 were idle; this cluster had one reserved node and no limit on the number of guest processes in the remaining nodes. Cluster C14 was disconnected from the grid. We created a set of 66 identical CPU-bound processes with a high precedence in C14 (the processes were dispersed evenly among all the nodes). Each process was allocated an array of size 64MB, which was continuously accessed, so as to keep the whole array in memory. The number of processes was chosen such that when C14 is connected to the grid, each node in the two clusters (except the reserved node in C20) will have two processes (equilibrium).
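
As a quick check of this setup, the 66 processes follow directly from the target of two processes per participating node, using only the cluster sizes given above:

    # Two processes on every node of C14 and C20, except the reserved node.
    nodes_c14, nodes_c20, reserved_c20 = 14, 20, 1
    print(2 * (nodes_c14 + nodes_c20 - reserved_c20))   # -> 66
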
The test started by a cluster-connect command in C14, allowing processes to migrate to C20, and ended when equilibrium was reached. The measured time (averaged over 5 runs) to balance the load between the two clusters is shown in Table 2, Line 1. From the obtained result it follows that the average migration rate was one process every 1.4 Sec., slightly higher than the 1 Sec. residence time imposed on processes by the load-balancing algorithm to prevent process swinging.

Table 2: Load-balancing Times

    Initial No. of        Initial No. of        Equilibrium (migration)
    Processes in C14      Processes in C20      Time (Sec.)

4.2.1 The Precedence Scheme

This test measures the time needed by higher-precedence guest processes to replace lower-precedence guest processes, using the load-balancing and the precedence algorithms. As in the previous test, initially C14 was disconnected from the grid and it ran a set of 66 identical higher-precedence processes, while C20 had one reserved node and no limit on the number of guest processes in its remaining nodes. First we migrated 38 lower-precedence processes (same type and size as above) from the workstation to the nodes of C20. We then issued a cluster-connect command to C14, allowing processes to migrate to C20. We measured the time until all the lower-precedence processes were migrated back to the workstation, higher-precedence processes were migrated from C14 to C20, and equilibrium was reached. The result of this test (averaged over 5 runs) is shown in Table 2, Line 2. From the result it follows that the average migration rate was one process every 1.6 Sec. As can be expected, the measured time is more than double the migration time in the previous test (Table 2, Line 1), since twice the number of processes were migrated and also because all the lower-precedence processes were migrated to one workstation.

4.3 Disruptive Configurations

This test provides an estimate of the time to move out guest processes from a hosting cluster that is about to be disconnected from the grid. Note that the measured time is similar to the time needed to bring back (to an owner's cluster) an identical set of local processes that were migrated to other clusters. We created a set of identical CPU-bound processes (same as before) in all the nodes of the C14 cluster. We then forced all the processes to migrate to the C20 cluster, to simulate a scenario in which the nodes of C20 were faster than the nodes of C14. The test started by initiating a cluster-disconnect command in C20, which forced all the guest processes to move out, and ended when all the guest processes were running in C14.

The results of this test for different numbers of processes and different sizes are presented in Table 3. In the table, Column 1 lists the total number of guest processes; Column 2 lists the size of each process; Column 3 shows the average (over 4 runs) of the measured migration times, and Column 4 shows the migration rates (MB/s), obtained by dividing the total amount of migrated data (Column 1 times Column 2) by the total migration time (Column 3).

Table 3: Time to Evacuate a Cluster

    No. of        Process    Migration
    Processes     Size       Time         Rate
    40            64 MB      26 Sec       98 MB/Sec
                             101 Sec      101 MB/Sec
                             198 Sec      103 MB/Sec
                             397 Sec      103 MB/Sec
                             50 Sec       102 MB/Sec
                             192 Sec      106 MB/Sec
                             388 Sec      105 MB/Sec

The obtained results show that MOSIX can move out guest processes in a reasonable time, and they also provide an estimate of the delays that owners who participate in grid activities can expect when reclaiming a cluster. Note that process migration was performed at an average rate (weighted over all cases) that is over 93% of the 110 MB/s measured rate of TCP/IP between a pair of nodes in the different clusters. Finally, observe that the measured time to migrate out 40 processes from C20 (Table 3, Line 1) was half the time required to balance the load between C14 and C20 (Table 2, Line 1). As explained above, this is due to the restraining nature of the load-balancing algorithm, which prevents process swinging.
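
As a quick check, the rate in Table 3, Line 1 can be re-derived exactly as described above, by dividing the total migrated data by the migration time:

    # Table 3, Line 1: 40 processes of 64 MB each, evacuated in 26 seconds.
    processes, size_mb, seconds = 40, 64, 26
    print(round(processes * size_mb / seconds, 1))   # -> 98.5, i.e. the ~98 MB/Sec of the table
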
Handling a Large Number of Processes

This test demonstrates how long it takes to migrate back processes when the workstation in which they were created is about to be disconnected from the grid. We created a set of identical processes (same kind as before) in the workstation and migrated all the processes to the C20 and C14 clusters, so that there was an equal number of processes in each node. The test started by a cluster-disconnect command in the workstation, which forced all the remote processes to be suspended and moved back to the workstation according to the flood-control scheme described in Section 3.3. We measured the time until all the processes were frozen in a local disk of the workstation.

The results of this test for different numbers of processes and different sizes are presented in Table 4. In the table, Column 1 lists the total number of processes; Column 2 lists the size of each process; Column 3 shows the average (over 5 runs) of the total migration times until all the processes were stored in the workstation, and Column 4 shows the migration rates (in MB/s), which reflect the throughput of the (low-cost) disk at the workstation.

Table 4: Time to Disconnect a Workstation

    No. of        Process    Migration
    Processes     Size       Time         Rate
                             69 Sec       32.6 MB/Sec
                             266 Sec      33.1 MB/Sec
                             520 Sec      33.7 MB/Sec
                             1127 Sec     31.1 MB/Sec
                             143 Sec      30.9 MB/Sec
                             552 Sec      31.9 MB/Sec
    68            512 MB                  30.0 MB/Sec
                             299 Sec      29.9 MB/Sec
                             1197 Sec     29.4 MB/Sec

The results of this test show that MOSIX is capable of migrating back and storing a relatively large number of processes in a reasonable time. This means that even a user with one workstation could use grid resources when they are available and store processes locally until such resources become available again. The specific results provide an estimate of the time delays to disconnect a workstation from the grid. Obviously, the presented times are expected to be shorter if the migrations are back to a cluster instead of to a workstation.

5 Related Works

Several grid management projects address issues that are presented in this paper. The usefulness of automatic process migration for grid-wide performance improvements of communicating processes was shown in [18]. The presented method assumes knowledge of the application run-times. It relies on a user-level checkpointing library which was implemented on top of MPI. Cactus [11] uses the "worm unit", an independent entity which is aware of grid information services and resource brokers, to migrate tasks in a dynamic grid environment. Migration is performed by application-level checkpointing, which requires some modification of applications. Condor [17] supports "preemptive resume" scheduling by linking compatible, independent applications with a library which allows checkpointing and process migration. Its flocking mechanism allows aggregation of several pools into a single entity while preserving the owners' rights. Flocking is usually transparent to the users. This is accomplished by gateway machines that isolate each pool from the flock. The pool owner can connect or disconnect its pool from the grid at any time and share resources with any set of remote pools. Each such remote pool can be given a different priority.

A framework for creating and managing a secure run-time environment in a grid is presented in [10]. It is shown how such a remote environment could be interfaced with Globus [7]. We believe that the MVE layer could be incorporated with this framework. Astrolabe [19] and the Network Weather Service (NWS) [21] are two systems that provide updated information about grid resources. Astrolabe uses a hierarchical gossip protocol in which the rate of exchanged information depends on the proximity of nodes in a hierarchy tree. The algorithm that is used by our system is similar to the one used by Astrolabe. NWS monitors and predicts the performance of computational grid resources, including network-related resources. These predictions are used for adaptive application scheduling and migration decisions.

6 Conclusions and Future Work

The paper presented enhanced cluster and new grid features of MOSIX for automatic management of resources in an organizational grid. These features include resource discovery and workload distribution; a precedence scheme for local processes and among guest processes; support of a flexible configuration; preservation of running processes when clusters are disconnected; flood control; and a secure run-time environment for guest processes.
In addition to the convenience of use, the performance of the system over a campus grid was nearly that of a local cluster. The obtained results confirm the claim that our system is suitable for compute-bound and other applications with a moderate amount of I/O over fast networks in an enterprise or in a campus grid. A prototype grid with 8 clusters (almost 500 processors) is now under construction in our campus. Since not every cluster is used at all times, our system will allow better utilization of already existing clusters, e.g., in student labs, by users who need to run demanding applications but cannot afford to own large clusters.

The work described in this paper could be extended in several ways. First, we plan to incorporate an intermediate standard grid service layer, such as Globus [7], for secure communication, process and data migration.

Another possibility is to combine our system with a grid package that already uses Globus. An interesting extension of our work would be to migrate groups of communicating processes (a job) from one cluster to another, e.g., by generalization of the session migration of Zap [16], so that all inter-process communication, including remote system-calls within a group, is confined to the same cluster, thus further reducing the communication overhead between a migrated process and its home-node. Another subject that we plan to investigate is methods for a fair-share distribution of the grid resources among users and cluster owners, e.g., based on a credit system, to encourage owners to connect their clusters to the grid. A much more difficult challenge is to develop a secure run-time environment in which a hosting system could not interfere with guest processes.

Acknowledgments

We wish to thank R. Baer, A. Harel, J. Kent and E. Lozinskii for contributing applications for our tests and D. Braniss for his help. This research was supported in part by grants from the Ministry of Defense, from Mrs. B. Liber and from Dr. and Mrs. Silverston, Cambridge, UK.

References

[1] G. Allen, D. Angulo, I. Foster, G. Lanfermann, C. Liu, T. Radke, E. Seidel, and J. Shalf, "The Cactus Worm: Experiments with Dynamic Resource Discovery and Allocation in a Grid Environment," Int. J. of High Performance Applications and Supercomputing, 15(4), 2001.
[2] R. Baer, and R. Gould, "A Method for ab Initio Nonlinear Electron Density Evolution," J. Chem. Phys., 114(8), 2001.
[3] L. Amar, A. Barak, and A. Shiloh, "The MOSIX Direct File System Access Method for Supporting Scalable Cluster File Systems," Cluster Computing, 7(2), 2004.
[4] A. Barak, O. La'adan, and A. Shiloh, "Scalable Cluster Computing with MOSIX for Linux," Proc. 5th Annual Linux Expo, Raleigh, NC, May 1999.
[5] E. Birnbaum, and E. L. Lozinskii, "Consistent Subsets of Inconsistent Systems: Structure and Behavior," J. of Experimental and Theoretical Artificial Intelligence, 15(1), 2003.
[6] I. Foster, and C. Kesselman, eds., The Grid: Blueprint for a New Computing Infrastructure, Morgan-Kaufmann Publishers, Inc., San Francisco, CA.
[7] Globus, http://www.globus.org.
[8] W.J. Kent, "BLAT - The BLAST-Like Alignment Tool," Genome Res., 12(4), 2002.
[9] A. Keren, and A. Barak, "Opportunity Cost Algorithms for Reduction of I/O and Interprocess Communication Overhead in a Computing Cluster," IEEE Trans. Parallel and Dist. Systems, 14(1), 2003.
[10] K. Keahey, K. Doering, and I. Foster, "From Sandbox to Playground: Dynamic Virtual Environments in the Grid," Fifth IEEE/ACM Int. Workshop on Grid Computing (GRID'04), Pittsburgh, PA, Nov. 2004.
[11] G. Lanfermann, G. Allen, T. Radke, and E. Seidel, "Nomadic Migration: A New Tool for Dynamic Grid Computing," Proc. 10th IEEE Int. Symp. on High Performance Distributed Computing (HPDC-10'01), San Francisco, CA, Aug. 2001.
[12] LZOP.
[13] MOSIX, http://www.mosix.org.
[14] Nimrod: Tools for Distributed Parametric Modeling, nimrod/nimrodg.htm.
[15] I. Peer, A. Barak, and L. Amar, "A Gossip-Based Distributed Bulletin Board with Guaranteed Age Properties," accepted to the Int. J. on Parallel Programming.
[16] S. Osman, D. Subhraveti, G. Su, and J. Nieh, "The Design and Implementation of Zap: A System for Migrating Computing Environments," Proc. 5th Symp. on Operating Systems Design and Implementation, Boston, MA, Dec. 2002.
Livny, "Condor and Preemptive Resume Scheduling," Grid Resource Management: State of the Art and Future Trends, Kluwer Academic Publishers, 2003, pp [18] S. Vadhiyar, and J. Dongarra, "A performance Oriented Migration Framework for the Grid," Proc. 3rd IEEEI/ACM Int. Symp. on Cluster Computing and the Grid (CCGrid 2003), May 2003, pp [19] R. van Renesse, K. P. Birman, and W. Vogels, "Astrolabe: A Robust and Scalable Technology For Distributed Systems Monitoring, Management, and Data Mining," ACM Tran. on Computer Systems, 21(3), 2003, pp [20] T. F. Smith, and M. S. Waterman, 'Identification of Common Molecular Subsequences," J. of Mol. Biol., 147(1), 1981, pp [21] R. Wolski, "Experiences with Predicting Resource Performance On-line in Computational Grid Settings," ACM SIGMETRICS Performance Evaluation Review," 30(4), March 2003, pp


More information

CHAPTER 7 CONCLUSION AND FUTURE SCOPE

CHAPTER 7 CONCLUSION AND FUTURE SCOPE 121 CHAPTER 7 CONCLUSION AND FUTURE SCOPE This research has addressed the issues of grid scheduling, load balancing and fault tolerance for large scale computational grids. To investigate the solution

More information

High Availability through Warm-Standby Support in Sybase Replication Server A Whitepaper from Sybase, Inc.

High Availability through Warm-Standby Support in Sybase Replication Server A Whitepaper from Sybase, Inc. High Availability through Warm-Standby Support in Sybase Replication Server A Whitepaper from Sybase, Inc. Table of Contents Section I: The Need for Warm Standby...2 The Business Problem...2 Section II:

More information

UNICORE Globus: Interoperability of Grid Infrastructures

UNICORE Globus: Interoperability of Grid Infrastructures UNICORE : Interoperability of Grid Infrastructures Michael Rambadt Philipp Wieder Central Institute for Applied Mathematics (ZAM) Research Centre Juelich D 52425 Juelich, Germany Phone: +49 2461 612057

More information

Estimate performance and capacity requirements for Access Services

Estimate performance and capacity requirements for Access Services Estimate performance and capacity requirements for Access Services This document is provided as-is. Information and views expressed in this document, including URL and other Internet Web site references,

More information

Best Practices for Deploying a Mixed 1Gb/10Gb Ethernet SAN using Dell EqualLogic Storage Arrays

Best Practices for Deploying a Mixed 1Gb/10Gb Ethernet SAN using Dell EqualLogic Storage Arrays Dell EqualLogic Best Practices Series Best Practices for Deploying a Mixed 1Gb/10Gb Ethernet SAN using Dell EqualLogic Storage Arrays A Dell Technical Whitepaper Jerry Daugherty Storage Infrastructure

More information

EMC VPLEX with Quantum Stornext

EMC VPLEX with Quantum Stornext White Paper Application Enabled Collaboration Abstract The EMC VPLEX storage federation solution together with Quantum StorNext file system enables a stretched cluster solution where hosts has simultaneous

More information

LS-DYNA Best-Practices: Networking, MPI and Parallel File System Effect on LS-DYNA Performance

LS-DYNA Best-Practices: Networking, MPI and Parallel File System Effect on LS-DYNA Performance 11 th International LS-DYNA Users Conference Computing Technology LS-DYNA Best-Practices: Networking, MPI and Parallel File System Effect on LS-DYNA Performance Gilad Shainer 1, Tong Liu 2, Jeff Layton

More information

Image-Space-Parallel Direct Volume Rendering on a Cluster of PCs

Image-Space-Parallel Direct Volume Rendering on a Cluster of PCs Image-Space-Parallel Direct Volume Rendering on a Cluster of PCs B. Barla Cambazoglu and Cevdet Aykanat Bilkent University, Department of Computer Engineering, 06800, Ankara, Turkey {berkant,aykanat}@cs.bilkent.edu.tr

More information

GRB. Grid-JQA : Grid Java based Quality of service management by Active database. L. Mohammad Khanli M. Analoui. Abstract.

GRB. Grid-JQA : Grid Java based Quality of service management by Active database. L. Mohammad Khanli M. Analoui. Abstract. Grid-JQA : Grid Java based Quality of service management by Active database L. Mohammad Khanli M. Analoui Ph.D. student C.E. Dept. IUST Tehran, Iran Khanli@iust.ac.ir Assistant professor C.E. Dept. IUST

More information

CS370 Operating Systems

CS370 Operating Systems CS370 Operating Systems Colorado State University Yashwant K Malaiya Fall 2017 Lecture 10 Slides based on Text by Silberschatz, Galvin, Gagne Various sources 1 1 Chapter 6: CPU Scheduling Basic Concepts

More information

An Architecture For Computational Grids Based On Proxy Servers

An Architecture For Computational Grids Based On Proxy Servers An Architecture For Computational Grids Based On Proxy Servers P. V. C. Costa, S. D. Zorzo, H. C. Guardia {paulocosta,zorzo,helio}@dc.ufscar.br UFSCar Federal University of São Carlos, Brazil Abstract

More information

A Resource Discovery Algorithm in Mobile Grid Computing based on IP-paging Scheme

A Resource Discovery Algorithm in Mobile Grid Computing based on IP-paging Scheme A Resource Discovery Algorithm in Mobile Grid Computing based on IP-paging Scheme Yue Zhang, Yunxia Pei To cite this version: Yue Zhang, Yunxia Pei. A Resource Discovery Algorithm in Mobile Grid Computing

More information

Solving the N-Body Problem with the ALiCE Grid System

Solving the N-Body Problem with the ALiCE Grid System Solving the N-Body Problem with the ALiCE Grid System Dac Phuong Ho 1, Yong Meng Teo 2 and Johan Prawira Gozali 2 1 Department of Computer Network, Vietnam National University of Hanoi 144 Xuan Thuy Street,

More information

Problems for Resource Brokering in Large and Dynamic Grid Environments

Problems for Resource Brokering in Large and Dynamic Grid Environments Problems for Resource Brokering in Large and Dynamic Grid Environments Cătălin L. Dumitrescu Computer Science Department The University of Chicago cldumitr@cs.uchicago.edu (currently at TU Delft) Kindly

More information

OPERATING SYSTEM. Functions of Operating System:

OPERATING SYSTEM. Functions of Operating System: OPERATING SYSTEM Introduction: An operating system (commonly abbreviated to either OS or O/S) is an interface between hardware and user. OS is responsible for the management and coordination of activities

More information

Introduction to Operating Systems. Chapter Chapter

Introduction to Operating Systems. Chapter Chapter Introduction to Operating Systems Chapter 1 1.3 Chapter 1.5 1.9 Learning Outcomes High-level understand what is an operating system and the role it plays A high-level understanding of the structure of

More information

What s New in VMware vsphere 4.1 Performance. VMware vsphere 4.1

What s New in VMware vsphere 4.1 Performance. VMware vsphere 4.1 What s New in VMware vsphere 4.1 Performance VMware vsphere 4.1 T E C H N I C A L W H I T E P A P E R Table of Contents Scalability enhancements....................................................................

More information

Mark Sandstrom ThroughPuter, Inc.

Mark Sandstrom ThroughPuter, Inc. Hardware Implemented Scheduler, Placer, Inter-Task Communications and IO System Functions for Many Processors Dynamically Shared among Multiple Applications Mark Sandstrom ThroughPuter, Inc mark@throughputercom

More information

Data Center Interconnect Solution Overview

Data Center Interconnect Solution Overview CHAPTER 2 The term DCI (Data Center Interconnect) is relevant in all scenarios where different levels of connectivity are required between two or more data center locations in order to provide flexibility

More information

The Google File System

The Google File System October 13, 2010 Based on: S. Ghemawat, H. Gobioff, and S.-T. Leung: The Google file system, in Proceedings ACM SOSP 2003, Lake George, NY, USA, October 2003. 1 Assumptions Interface Architecture Single

More information

Cluster Abstraction: towards Uniform Resource Description and Access in Multicluster Grid

Cluster Abstraction: towards Uniform Resource Description and Access in Multicluster Grid Cluster Abstraction: towards Uniform Resource Description and Access in Multicluster Grid Maoyuan Xie, Zhifeng Yun, Zhou Lei, Gabrielle Allen Center for Computation & Technology, Louisiana State University,

More information

Special Topics: CSci 8980 Edge History

Special Topics: CSci 8980 Edge History Special Topics: CSci 8980 Edge History Jon B. Weissman (jon@cs.umn.edu) Department of Computer Science University of Minnesota P2P: What is it? No always-on server Nodes are at the network edge; come and

More information

Managing Performance Variance of Applications Using Storage I/O Control

Managing Performance Variance of Applications Using Storage I/O Control Performance Study Managing Performance Variance of Applications Using Storage I/O Control VMware vsphere 4.1 Application performance can be impacted when servers contend for I/O resources in a shared storage

More information

Introduction to Operating Systems. Chapter Chapter

Introduction to Operating Systems. Chapter Chapter Introduction to Operating Systems Chapter 1 1.3 Chapter 1.5 1.9 Learning Outcomes High-level understand what is an operating system and the role it plays A high-level understanding of the structure of

More information

Using Transparent Compression to Improve SSD-based I/O Caches

Using Transparent Compression to Improve SSD-based I/O Caches Using Transparent Compression to Improve SSD-based I/O Caches Thanos Makatos, Yannis Klonatos, Manolis Marazakis, Michail D. Flouris, and Angelos Bilas {mcatos,klonatos,maraz,flouris,bilas}@ics.forth.gr

More information

Introduction. Application Performance in the QLinux Multimedia Operating System. Solution: QLinux. Introduction. Outline. QLinux Design Principles

Introduction. Application Performance in the QLinux Multimedia Operating System. Solution: QLinux. Introduction. Outline. QLinux Design Principles Application Performance in the QLinux Multimedia Operating System Sundaram, A. Chandra, P. Goyal, P. Shenoy, J. Sahni and H. Vin Umass Amherst, U of Texas Austin ACM Multimedia, 2000 Introduction General

More information

Scalable Hybrid Search on Distributed Databases

Scalable Hybrid Search on Distributed Databases Scalable Hybrid Search on Distributed Databases Jungkee Kim 1,2 and Geoffrey Fox 2 1 Department of Computer Science, Florida State University, Tallahassee FL 32306, U.S.A., jungkkim@cs.fsu.edu, 2 Community

More information

NEC Express5800 A2040b 22TB Data Warehouse Fast Track. Reference Architecture with SW mirrored HGST FlashMAX III

NEC Express5800 A2040b 22TB Data Warehouse Fast Track. Reference Architecture with SW mirrored HGST FlashMAX III NEC Express5800 A2040b 22TB Data Warehouse Fast Track Reference Architecture with SW mirrored HGST FlashMAX III Based on Microsoft SQL Server 2014 Data Warehouse Fast Track (DWFT) Reference Architecture

More information

PARALLEL PROGRAM EXECUTION SUPPORT IN THE JGRID SYSTEM

PARALLEL PROGRAM EXECUTION SUPPORT IN THE JGRID SYSTEM PARALLEL PROGRAM EXECUTION SUPPORT IN THE JGRID SYSTEM Szabolcs Pota 1, Gergely Sipos 2, Zoltan Juhasz 1,3 and Peter Kacsuk 2 1 Department of Information Systems, University of Veszprem, Hungary 2 Laboratory

More information

Introduction to Operating. Chapter Chapter

Introduction to Operating. Chapter Chapter Introduction to Operating Systems Chapter 1 1.3 Chapter 1.5 1.9 Learning Outcomes High-level understand what is an operating system and the role it plays A high-level understanding of the structure of

More information