Multi-site Jobs Management System (MJMS): A tool to manage multi-site MPI applications execution in Grid environment


Jaime Frey, Francesco Gregoretti, Giuliano Laccetti, Almerico Murli and Gennaro Oliva

Department of Computer Science, University of Wisconsin, Madison, WI
jfrey@cs.wisc.edu

Institute for High Performance Computing and Networking ICAR-CNR, Naples, Italy
{francesco.gregoretti,gennaro.oliva}@na.icar.cnr.it

University of Naples Federico II, Naples, Italy
{giuliano.laccetti,almerico.murli}@dma.unina.it

This work has been partially supported by the Italian Ministry of Education, University and Research (MIUR) within the activities of the WP9 work package "Grid Enabled Scientific Libraries", part of the MIUR FIRB RBNE01KNFP Grid.it project.

Abstract. Multi-site parallel applications consist of multiple distributed processes running on one or more potentially heterogeneous resources in different locations. This class of applications can effectively use the Grid for high-performance computing. We propose a Multi-site Jobs Management System (MJMS) based on the Globus Toolkit, the MPICH-G2 library and Condor-G for the effective, reliable and secure execution of multi-site parallel applications in the Grid environment. The system allows the user to submit execution requests specifying application requirements and preferences in a high-level language (following the Condor submit file syntax), freeing her from the tasks of resource discovery and co-allocation: it selects available computing resources according to the application requirements and interacts with the local management systems through the Condor-G system.

I. Introduction and state of the art

Multi-site parallel applications consist of multiple distributed processes running on one or more potentially heterogeneous resources in different locations.
This class of applications can effectively use the Grid for high-performance computing by grouping computing resources for a single application execution. In such a distributed environment programmers can benefit from the use of the message passing paradigm, and in particular of the MPI standard, because it is easy to understand and use, architecture-independent and portable. Furthermore, MPI is a widely-used standard for high performance computing and can therefore make Grid application development more accessible to programmers with parallel computing skills. The first MPI standard (MPI-1) was released in 1994 and since then many libraries and tools have been developed based on it. There are several grid-enabled MPI libraries [1]: MagPIe, MPICH-G2, MPI Connect, MetaMPICH, Stampi, PACX-MPI, etc. Many of these implementations allow the grouping of multiple machines, potentially based on heterogeneous architectures, for the execution of MPI programs, and the use of vendor-supplied MPI libraries over high performance networks for intra-machine messaging. All of the available libraries support co-allocation as well as process synchronization, but they don't provide any execution management tools. Thus, users must explicitly specify the resources to be used for the execution and directly handle possible failures. Among these libraries, MPICH-G2 [2] is one of the most complete and widespread, and many of its features are very useful in a Grid environment. MPICH-G2 provides the user with an advanced interconnection topology description with multiple levels of depth, which gives a detailed view of the underlying executing environment. It uses the Globus Security Infrastructure (GSI) [3] for authorization and authentication, the Grid Resource Allocation and Management (GRAM) protocol [4] for resource allocation, and the DUROC library [5] for process synchronization.
The adoption of job management systems like Condor-G [6] and the DataGrid WMS [7] has facilitated the use of the Grid for sequential and parallel application execution. These systems accept user execution requests consisting of input files, execution binaries, environment requirements and other parameters, without necessarily naming the specific resources to run the job. They transparently handle resource selection, resource allocation, file transfers, application monitoring and failure events on the users' behalf. Moreover, during the life cycle of the jobs, users can query the systems to retrieve detailed information about the current and past status of their jobs. The use of a job management system therefore hides the complexity of the Grid by raising the architecture abstraction level. Existing management systems don't allow the execution of multi-site parallel applications because they don't handle requests for multiple resources within a single job execution. Furthermore, the available systems don't interact with existing grid-enabled MPI libraries for process synchronization. We propose an execution management tool called the Multi-site Jobs Management System (MJMS) based on the Globus Toolkit [8], the MPICH-G2 library and Condor-G for the effective, reliable and secure execution of multi-site parallel applications in the Grid environment.

II. MJMS: Multi-site Jobs Management System

MJMS manages the execution of parallel MPI applications on multiple resources in a Grid environment. It interacts with the Condor-G daemons, exploiting the Condor-G system features for job execution management, and uses the services provided by DUROC to handle job synchronization across sites. The Condor-G system has many features that are useful for our purposes: it is capable of interacting with the GRAM service provided by the Globus Toolkit for job submission and monitoring; it provides a high-level language called ClassAds (Classified Advertisements) [9] to describe job requirements and preferences as well as Grid resource characteristics; it has some fault-tolerance features; it can manage distributed I/O operations; and it provides a robust mechanism called Matchmaking to match a job request with a computing resource. However, the Condor-G system has three limitations that prevent its use for our class of applications: it does not support the execution of MPICH-G2 jobs; its Matchmaking mechanism matches each single job request with a single resource, precluding the co-allocation of a consistent collection of resources for a single MPI execution; and the condor_collector daemon, responsible for collecting all the information about the status and characteristics of the available Grid resources, doesn't interact with the Globus Information System (MDS) [10].
The system we propose allows users to submit execution requests for multi-site parallel applications by specifying requirements and preferences in a high-level language following the Condor submit file syntax (figure 1). Requirements and preferences should reflect the application's needs in terms of the computational and communication costs estimated by the user. The requirements and preferences of parallel and multi-site parallel applications may differ from those of sequential applications: parallel application performance can depend on both communication and computational costs. For this reason users should be able to specify network characteristics such as bandwidth and latency, alongside CPU performance, among the job requirements and preferences. A user can utilize multiple Grid resources for a single MPI job execution by splitting her application into subjobs. MJMS assigns each subjob to a single Grid resource. MJMS accepts the user's submit description file and locates a matching pool of available resources among those published by the Globus Information System, according to the application requirements and preferences.
(subjob1)
universe = globus
executable = bcg_dist
arguments = s3rmt3m3.mtx 3 bs3rmt3m3.mtx
machine_count = 17
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
transfer_input_files = s3rmt3m3.mtx,s3rmt3m3_rhs.mtx
output = outfile.$(cluster)
error = errfile.$(cluster)
log = logfile.$(cluster)
environment = P4_SETS_ALL_ENVVARS=1; MGF_PRIVATE_SLAN=1; LD_LIBRARY_PATH=/opt/globus-2.4.3/lib/
globusrsl = (jobtype=mpi) (count=17) (label=subjob 0)
requirements = (OpSys == "LINUX" && Arch == "i686") && (ClusterNodeCount >= 17)
rank = 10000/ClusterNetLatency + 10*ClusterCPUsSpecFloat
queue

(subjob2)
universe = globus
executable = bcg_dist
arguments = s3rmt3m3.mtx 3 bs3rmt3m3.mtx
machine_count = 4
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
transfer_input_files = s3rmt3m3.mtx,s3rmt3m3_rhs.mtx
output = outfile.$(cluster)
error = errfile.$(cluster)
log = logfile.$(cluster)
environment = P4_SETS_ALL_ENVVARS=1; MGF_PRIVATE_SLAN=1; LD_LIBRARY_PATH=/opt/globus-2.4.3/lib/
globusrsl = (jobtype=mpi) (count=4) (label=subjob 17)
requirements = (OpSys == "LINUX" && Arch == "i686") && (ClusterNodeCount >= 4)
queue

(subjob3)
universe = globus
executable = bcg_dist
arguments = s3rmt3m3.mtx 3 bs3rmt3m3.mtx
machine_count = 14
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
transfer_input_files = s3rmt3m3.mtx,s3rmt3m3_rhs.mtx
output = outfile.$(cluster)
error = errfile.$(cluster)
log = logfile.$(cluster)
environment = P4_SETS_ALL_ENVVARS=1; MGF_PRIVATE_SLAN=1; LD_LIBRARY_PATH=/opt/globus-2.4.3/lib/
globusrsl = (jobtype=mpi) (count=14) (label=subjob 21)
+Subnet1 = subjob1.subnet
requirements = (OpSys == "LINUX" && Arch == "i686") && (ClusterNodeCount >= 14)
rank = 10*ClusterCPUsSpecFloat
queue

Fig. 1. MJMS submit description file example.

For resource location MJMS uses Gangmatching [11], an extended version of the Condor Matchmaking [9] capable of matching multiple resource requests within a single job execution request.
Once compatible resources have been located, MJMS submits the subjobs to the Condor-G system, specifying the target resource for each subjob. Process synchronization is achieved by a Coordinator barrier: each process placed in execution stops at the barrier. Once all processes have been successfully started, MJMS starts the computation by releasing the barrier. Input and output file staging and subjob logging are handled by the Condor-G system. The MJMS system architecture, depicted in figure 2, consists of three main components: the Coordinator, the GangMatchMaker and the Advertiser.

Fig. 2. MJMS architecture.

A. Coordinator

The Coordinator accepts user submit description files, invokes the GangMatchMaker to locate a pool of matching resources and delegates the management of the subjobs' execution to the Condor-G system. An MJMS job may be in one of the following states:

ACCEPTED: the job has been accepted by the system;
MATCHED: a pool of matching resources has been located;
UNMATCHED: no set of resources matches the job request;
BARRIERED: not all subjobs are yet held at the DUROC-controlled barrier;
RUNNING: the subjobs have been released from the barrier.

When the job is ACCEPTED all its subjobs are enqueued and held in the Condor-G queue; target Grid resources have not yet been identified. The DUROC services are initialized and a synchronization barrier for all the subjobs is set up by modifying subjob environment variables. The synchronization method depends on the grid-enabled MPI library used: MPICH-G2 process synchronization is implemented through the DUROC library, which uses environment variables. The Coordinator creates a single ClassAd containing the Requirements and Rank of all subjobs as stated in the corresponding submit description file. The resulting ClassAd is sent to the GangMatchMaker, which is responsible for locating a pool of matching Grid resources. If the GangMatchMaker doesn't find compatible resources the job state becomes UNMATCHED; otherwise it becomes MATCHED and the Coordinator sets the target Grid resources. This is done by setting each subjob's GlobusResource attribute to the Gatekeeper contact string of the matching resource. The Coordinator then releases the subjobs from the Condor-G queue and the job state becomes BARRIERED.
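The Coordinator's job life cycle described above can be sketched as a small state machine. This is an illustrative sketch, not MJMS code; the state names follow the text, while gang_match(), release_from_queue(), all_subjobs_started() and release_barrier() are hypothetical placeholders for the GangMatchMaker call, the Condor-G queue release, the DUROC barrier check and the barrier release:

```python
# Illustrative sketch of the MJMS Coordinator job life cycle.
# All callbacks are hypothetical placeholders, not real MJMS/Condor-G APIs.

ACCEPTED, MATCHED, UNMATCHED, BARRIERED, RUNNING = (
    "ACCEPTED", "MATCHED", "UNMATCHED", "BARRIERED", "RUNNING")

def coordinate(subjobs, gang_match, release_from_queue,
               all_subjobs_started, release_barrier):
    state = ACCEPTED                      # subjobs enqueued and held in Condor-G
    resources = gang_match(subjobs)       # one ClassAd with all subjob ports
    if resources is None:
        return UNMATCHED                  # no consistent gang of resources found
    state = MATCHED
    for job, res in zip(subjobs, resources):
        # Target each subjob at its matched resource's Gatekeeper contact.
        job["GlobusResource"] = res["gatekeeper_url"]
    release_from_queue(subjobs)           # subjobs start, then stop at the barrier
    state = BARRIERED
    if all_subjobs_started(subjobs):
        release_barrier()                 # the computation begins
        state = RUNNING
    return state
```

A toy run with stub callbacks returns "RUNNING" once every subjob reports it has reached the barrier, and "UNMATCHED" when the matchmaker yields nothing.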
Once all subjobs are running on the selected resources, the Coordinator starts the computation by releasing the synchronization barrier. The job state becomes RUNNING and the execution management of all subjobs is from this point onward in the charge of the Condor-G system.

B. GangMatchMaker

The GangMatchMaker uses the Gangmatching model to locate resources for multi-site job execution. In the co-allocation problem for a multi-site parallel application, we need a matchmaking scheme able to match the subjobs' execution requests with the available resources while simultaneously taking into account the subjobs' requirements, preferences and interdependencies. Moreover, each subjob has to be matched with a different resource. The Gangmatching model is a possible solution to this multilateral matchmaking problem. It is an extended version of the bilateral Condor Matchmaking and makes use of the ClassAds library for job and resource description. The ClassAd representing the job execution request is generated by the Coordinator starting from the submit description file. A job ClassAd for the multi-site parallel application problem, generated from the submit description file of figure 1, is illustrated in figure 3. The most notable feature of this example is the Ports attribute. In the Gangmatching model the bilateral Matchmaking algorithm operates between ports of ClassAds instead of entire ClassAds. The Ports attribute defines the number and characteristics of the matching offers required for that ClassAd to be satisfied [11]. In the example above, the port matching requirements are not influenced by the attributes of the other ports, but one could require the Requirements of a port to depend on the attributes of a preceding port. As a consequence of label scoping, the dependency relations between ports are limited: labels of successive ports may not be referred to by preceding ports.
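The port-based matching just described can be sketched as a naive backtracking search over resource offers. This is a simplified illustration, with ClassAds modelled as plain dicts and Requirements as Python predicates; it is not the real ClassAds library API, and the attribute names are a reduced subset of those in figure 1:

```python
# Naive Gangmatching sketch: assign each request port a distinct resource
# offer satisfying that port's Requirements, backtracking on failure.

def gang_match(ports, offers):
    """Return one offer per port with no offer reused, or None."""
    def search(i, used, gang):
        if i == len(ports):
            return gang                    # every port matched: a consistent gang
        for j, offer in enumerate(offers):
            if j in used:
                continue                   # each subjob needs a different resource
            # Later ports may inspect earlier matches via `gang`
            # (e.g. a same-LAN constraint like figure 3's Subnet1).
            if ports[i]["requirements"](offer, gang):
                result = search(i + 1, used | {j}, gang + [offer])
                if result is not None:
                    return result
        return None                        # backtrack
    return search(0, frozenset(), [])

# Ports mirroring figure 3's node-count constraints (simplified):
ports = [
    {"requirements": lambda o, g: o["ClusterNodeCount"] >= 17},
    {"requirements": lambda o, g: o["ClusterNodeCount"] >= 4},
    {"requirements": lambda o, g: o["ClusterNodeCount"] >= 14},
]
offers = [
    {"Name": "vega",    "ClusterNodeCount": 17},
    {"Name": "altair",  "ClusterNodeCount": 11},
    {"Name": "beocomp", "ClusterNodeCount": 14},
]
```

With these offers the search assigns Vega, Altair and Beocomp to the three ports; removing Beocomp leaves the third port unmatchable and the whole gang fails, which mirrors the all-or-nothing nature of co-allocation.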
Example ClassAds of resource offers are illustrated in figures 4, 5 and 6. The resource ClassAds are very similar to their bilateral matchmaking counterparts, except for the presence of a port that explicitly indicates the imperative to match only one request port. No Requirements are expressed in the example resource ClassAds; complex Requirements may be expressed in a resource offer ClassAd to implement sophisticated resource management policies, and such Requirements could refer to attributes of the job ports.

[ Type = "Job";
  Ports = {
    [ Label = "subjob1";
      Requirements = subjob1.Type == "Machine" && subjob1.Arch == "i686"
                     && subjob1.OpSys == "LINUX" && subjob1.ClusterNodeCount >= 17;
      Rank = 10000 / subjob1.ClusterNetMPI0Latency
             + 10 * subjob1.ClusterCPUsSpecFloat ],
    [ Label = "subjob2";
      Requirements = subjob2.Type == "Machine" && subjob2.Arch == "i686"
                     && subjob2.OpSys == "LINUX" && subjob2.ClusterNodeCount >= 4 ],
    [ Label = "subjob3";
      Subnet1 = subjob1.Subnet;
      Requirements = subjob3.Type == "Machine" && subjob3.Arch == "i686"
                     && subjob3.OpSys == "LINUX" && subjob3.ClusterNodeCount >= 14;
      Rank = 10 * subjob3.ClusterCPUsSpecFloat ] } ]

Fig. 3. ClassAd example describing a multi-site parallel job.

[ MyType = "Machine";
  Name = "vega.na.icar.cnr.it";
  gatekeeper_url = "vega.na.icar.cnr.it:2122";
  Arch = "i686"; OpSys = "LINUX"; Subnet = ...;
  ClusterNodeCount = 17;
  ClusterCPUType = "Intel(R) Pentium(R) 4 CPU 1500MHz";
  ClusterNetMPIBandwidth = 88; ClusterNetMPI0Latency = 38;
  ClusterCPUsSpecFloat = 558; ClusterCPUsSpecInt = 535;
  Ports = { [ Label = "subjob" ] } ]

Fig. 4. ClassAd extract describing a cluster named Vega.

[ MyType = "Machine";
  Name = "altair.dma.unina.it";
  gatekeeper_url = "altair.dma.unina.it:2122";
  Arch = "i686"; OpSys = "LINUX"; Subnet = ...;
  ClusterNodeCount = 11;
  ClusterCPUType = "Pentium Pro";
  ClusterNetMPIBandwidth = 74; ClusterNetMPI0Latency = 117;
  ClusterCPUsSpecFloat = 6; ClusterCPUsSpecInt = 8;
  Ports = { [ Label = "subjob" ] } ]

Fig. 5. ClassAd extract describing a cluster named Altair.

[ MyType = "Machine";
  Name = "beocomp.dma.unina.it";
  gatekeeper_url = "beocomp.dma.unina.it:2122";
  Arch = "i686"; OpSys = "LINUX"; Subnet = ...;
  ClusterNodeCount = 14;
  ClusterCPUType = "Pentium II (Deschutes)";
  ClusterNetMPIBandwidth = 82; ClusterNetMPI0Latency = 65;
  ClusterCPUsSpecFloat = 13; ClusterCPUsSpecInt = 17;
  Ports = { [ Label = "subjob" ] } ]

Fig. 6. ClassAd extract describing a cluster named Beocomp.

The Gangmatching model allows the job request ClassAd to convey information about matched resources to the resource ClassAds through a specific port. The subjob3 port can refer to an attribute of the resource ClassAd already matched with a preceding port, and conveys the value of this attribute to the resource ClassAds through the Ports[2].attribute syntax. For example, the job ClassAd of figure 3 conveys the value of the Subnet attribute of the resource ClassAd already matched with the subjob1 port through the Ports[2].Subnet1 expression. A user can request that subjob1 and subjob3 run on resources belonging to the same LAN by adding the expression subjob3.Subnet == Subnet1 to the subjob3 Requirements.

The algorithm implemented by the GangMatchMaker is the simple naive algorithm of [11]. It marshals a consistent gang of ClassAds for the job ClassAds in the system. In our example the gang consists of four ads: one job and three clusters; the match is the tree illustrated in figure 7.

Fig. 7. Gangmatching operation for our job example: each port of the job ClassAd is bilaterally matched with the port of one cluster ClassAd (Vega, Altair, Beocomp), and the four ads together form the gang.

C. Advertiser

The Advertiser is responsible for periodically sending ClassAds representing the status and characteristics of the available resources to the GangMatchMaker and to the Condor-G system. The Advertiser gathers this information by periodically querying a set of Globus MDS servers, extracting the information that is relevant to MJMS.
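The Advertiser's cycle can be sketched as a loop that turns MDS query results into resource ClassAds. In this sketch query_mds() and send_classad() are hypothetical stand-ins for an LDAP query against a Globus MDS server and for the update sent to the GangMatchMaker and the Condor-G collector; the attribute names mirror the resource ClassAds of figures 4-6:

```python
# Illustrative Advertiser cycle; not the real MJMS daemon.

RELEVANT = ("Name", "gatekeeper_url", "Arch", "OpSys", "Subnet",
            "ClusterNodeCount", "ClusterCPUsSpecFloat", "ClusterCPUsSpecInt",
            "ClusterNetMPI0Latency", "ClusterNetMPIBandwidth")

def mds_to_classad(entry):
    """Keep only the attributes MJMS matching uses and tag the ad type."""
    ad = {k: entry[k] for k in RELEVANT if k in entry}
    ad["MyType"] = "Machine"
    return ad

def advertise_once(query_mds, send_classad):
    """One Advertiser cycle; MJMS runs this periodically per MDS server."""
    for entry in query_mds():              # one entry per published Grid resource
        send_classad(mds_to_classad(entry))
```

Keeping the conversion separate from the query makes it easy to extend the relevant-attribute list when the MDS schema grows, as it does below for the network measurements.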

Parallel and multi-site parallel applications generally have different requirements from those of sequential applications. In particular, a user may be interested in information about the high-performance network of a parallel computing resource. For this reason a user should be able to specify in her job requirements and preferences conditional expressions on the network characteristics (i.e. MPI communication bandwidth and latency for some packet size). The Information System provided by the Globus Toolkit doesn't supply this information, so we extended it with a custom schema and a corresponding custom Information Provider. A sample of the information supplied by the MJMS Information Provider is shown in figure 8.

system-network-vendor: Quadrics
cluster-network-model: Elan
cluster-network-version: 4
cluster-network-mpi-0-latency: 2.63
cluster-network-mpi-1024-latency: 6.32
cluster-network-mpi-64-bandwidth: ...
cluster-network-mpi-1024-bandwidth: ...

Fig. 8. Network-related information provided by the MJMS Information Provider.

Latency and bandwidth measurements are static information; we used values obtained with the Intel MPI Benchmark suite [12]. The MJMS schema defines where information is published and retrieved in the MDS tree.

D. The role of Condor-G

The management of the subjobs' execution through the Condor-G system continues as follows:

The Condor-G Scheduler creates a new GridManager daemon to submit and manage all subjobs;
The GridManager's job submission request results in the creation of one Globus JobManager daemon on each Grid resource;
The JobManager daemon connects to the GridManager using the Globus GASS service [13] in order to transfer the executable binary and standard input files, and to provide streaming of standard output and error;
The JobManager submits the subjob to the local management system of the target resource;
The JobManager sends subjob status updates back to the GridManager, which in turn updates the Scheduler.

In order to allow Condor-G to make remote requests on behalf of the user, a proxy credential has to be created. The Condor-G system uses the Globus GSI authentication and authorization system, which makes it possible to authenticate a user just once, so she need not reauthenticate herself each time a Condor-G computation management service accesses target remote resources. Condor-G is able to tolerate four types of failure events: a crash of the Globus JobManager; a crash of the machine that manages the target resource (on which the GateKeeper and JobManager are executing); a crash of the GridManager or of the machine on which the GridManager is executing; and failures in the network connecting the two machines. Condor-G is able to transfer any files needed by a job for the computation from the machine where MJMS submitted the subjobs to the target resources. In the same way it can transfer output files back to MJMS. The user simply specifies in the submit description file when and which files have to be transferred.

III. Preliminary Experiences

MJMS was used to run an iterative solver of sparse linear systems of equations with multiple right-hand sides in the Grid environment. In this context the block version of the Conjugate Gradient algorithm (BCG) [14] becomes attractive: with respect to the standard parallel Conjugate Gradient algorithm, it reduces the number of synchronization points and increases the size of messages, improving latency tolerance. It is therefore better suited to a distributed Grid environment. The BCG algorithm simultaneously produces a solution vector for each right-hand side vector. It consists of eight concurrent tasks with different computational complexity.
To correctly balance the computational load we introduced a two-level parallelism scheme in its implementation. The first level reflects the task decomposition, while the second is introduced within the most computationally intensive tasks. The resulting multi-site job consists of different subjobs, each grouping one or more tasks; each subjob must be assigned to a computing resource. For the implementation we used a grid-enabled MPI library called MGF [15], which extends MPICH-G2 and allows transparent and efficient usage of Grid resources, with particular attention to clusters with private networks. We used MJMS to assign the subjobs to the available parallel computing resources according to their computational costs and inter-task communication costs. This let us achieve a better resource workload balance, which improved the overall speedup.

IV. Discussion and future work

MJMS was designed on top of the Condor-G system, which is a personal desktop agent. Even so, its features can be useful in a multi-institution, multi-user environment. The DataGrid Workload Management System, which is based on the Condor-G system, is used in the EGEE/LCG production Grid [16] by thousands of users and hundreds of institutions, and is proof of Condor-G's flexibility. The use of MJMS in a multi-user and multi-institutional environment can be achieved, as done for the EGEE/LCG Grid, by managing resource selection on the basis of Virtual Organization (VO) membership and by using a system to verify user VO association such as VOMS [17].

An important issue concerning the management of multi-site parallel applications that emerged during the MJMS implementation is how to handle possible subjob or process failures. A robust management system should allow application fault recovery. A possible way to handle faults for MPI applications, easily integrable in our system, is to provide process-level fault tolerance at the MPI API level like that provided by FT-MPI [18]. Another important issue related to the use of our system in a real production Grid is the interaction with the local batch management systems. The GangMatchMaker should take the batch system queues into account when matching subjobs with resources. Queue characteristics and status can be published by the Globus Information System and therefore easily included in the ClassAd representing the computing resource. Resource selection becomes hard when there are not enough free resources available and some subjobs need to be queued on the local batch systems before running.

The MJMS implementation has been based on MPICH-G2 and pre-WS GRAM, but its design is independent of the MPI implementation and Grid technology. The MJMS implementation can therefore be extended to support other grid-enabled MPI libraries and other Grid systems usable through Condor-G (NorduGrid, Unicore, WS GRAM and remote Condor pools). A limitation of the ClassAd Gangmatching used in MJMS is that the decomposition of a job into subjobs is constrained: the number of subjobs is fixed in the job ClassAd. The GangMatchMaker should allow an aggregate matching policy to increase scheduling flexibility. A proposed extension to Gangmatching called Set-Matching [19] would overcome that limitation. With Set-Matching, a job ClassAd can request to be matched with an unspecified number of resource ClassAds.
In addition to Requirements on individual resource ClassAds, the job ClassAd can place Requirements on aggregate properties of the set of matching resources. For example, the job ClassAd can request a certain number of total CPUs and the MatchMaker will add resources to the set until that number is reached.

V. Conclusions

One of the interesting and exciting fields of application of Grid Computing has always been High-Performance Computing (HPC). Multi-site parallel applications can achieve high performance rates by grouping HPC resources for a single application execution. Therefore we argue that this class of applications can effectively use the Grid for HPC. The use of Globus Grid technologies with their middleware design has given uniform access to computing resources, while the use of the MPI standard has facilitated application development. MJMS was developed to support multi-site application execution and testing. Our system has all the basic features for job management: it allows the submission of multi-site jobs to the Grid and raises the abstraction level by transparently handling the job throughout its entire life cycle. MJMS has been designed to be extensible with additional features such as fault tolerance and multi-user usage, and to be independent of any particular grid-enabled MPI implementation and Grid technology.

VI. Acknowledgment

The authors would like to thank the Condor Team at the Computer Sciences Department of the University of Wisconsin-Madison.

References

[1] C. Lee, S. Matsuoka, D. Talia, A. Sussmann, M. Mueller, G. Allen, and J. Saltz, "A grid programming primer".
[2] N. T. Karonis, B. Toonen, and I. Foster, "MPICH-G2: a grid-enabled implementation of the Message Passing Interface", J. Parallel Distrib. Comput., vol. 63, no. 5.
[3] I. Foster, C. Kesselman, G. Tsudik, and S. Tuecke, "A security architecture for computational grids", in Proceedings of the 5th ACM Conference on Computer and Communications Security. ACM Press, 1998.
[4] K. Czajkowski, I. Foster, N. Karonis, C. Kesselman, S. Martin, W. Smith, and S. Tuecke, "A resource management architecture for metacomputing systems", Lecture Notes in Computer Science, vol. 1459.
[5] K. Czajkowski, I. T. Foster, and C. Kesselman, "Resource co-allocation in computational grids", in HPDC.
[6] J. Frey, T. Tannenbaum, I. Foster, M. Livny, and S. Tuecke, "Condor-G: a computation management agent for multi-institutional grids", in Proceedings of the Tenth IEEE Symposium on High Performance Distributed Computing (HPDC), San Francisco, California, August 2001.
[7] G. Avellino, S. Beco, B. Cantalupo, A. Maraschini, F. Pacini, M. Sottilaro, A. Terracina, D. Colling, F. Giacomini, E. Ronchieri, A. Gianelle, M. Mazzucato, R. Peluso, M. Sgaravatto, A. Guarise, R. M. Piro, A. Werbrouck, D. Kouril, A. Krenek, L. Matyska, M. Mulac, J. Pospísil, M. Ruda, Z. Salvet, J. Sitera, J. Skrabal, M. Vocu, M. Mezzadri, F. Prelz, S. Monforte, and M. Pappalardo, "The DataGrid Workload Management System: challenges and results", J. Grid Comput., vol. 2, no. 4.
[8] I. Foster and C. Kesselman, "Globus: a metacomputing infrastructure toolkit", The International Journal of Supercomputer Applications and High Performance Computing, vol. 11, no. 2.
[9] R. Raman, M. Livny, and M. Solomon, "Matchmaking: distributed resource management for high throughput computing", in Proceedings of the Seventh IEEE International Symposium on High Performance Distributed Computing (HPDC7), Chicago, IL, July.
[10] K. Czajkowski, S. Fitzgerald, I. Foster, and C. Kesselman, "Grid information services for distributed resource sharing".
[11] R. Raman, M. Livny, and M. Solomon, "Policy driven heterogeneous resource co-allocation with Gangmatching", in HPDC '03: Proceedings of the 12th IEEE International Symposium on High Performance Distributed Computing. Washington, DC, USA: IEEE Computer Society, 2003, p. 80.
[12] Intel MPI Benchmark suite. Code and documentation available from downloads/mpibenchmarks.htm.
[13] J. Bester, I. Foster, C. Kesselman, J. Tedesco, and S. Tuecke, "GASS: a data movement and access service for wide area computing systems", in Proceedings of the Sixth Workshop on Input/Output in Parallel and Distributed Systems. Atlanta, GA: ACM Press, 1999.
[14] D. P. O'Leary, "Parallel implementation of the block conjugate gradient algorithm", Parallel Computing, vol. 5, no. 1-2.
[15] F. Gregoretti, G. Laccetti, A. Murli, G. Oliva, and U. Scafuri, "MGF: a grid-enabled MPI library with a delegation mechanism to improve collective operations", in PVM/MPI, ser. Lecture Notes in Computer Science, B. D. Martino, D. Kranzlmüller, and J. Dongarra, Eds. Springer, 2005.
[16] D. Kranzlmüller, "The EGEE project: a multidisciplinary, production-level grid", in HPCC, ser. Lecture Notes in Computer Science, L. T. Yang, O. F. Rana, B. D. Martino, and J. Dongarra, Eds. Springer, 2005, p. 2.
[17] R. Alfieri, R. Cecchini, V. Ciaschini, L. dell'Agnello, Á. Frohner, A. Gianoli, K. Lörentey, and F. Spataro, "VOMS, an authorization system for virtual organizations", in European Across Grids Conference, ser. Lecture Notes in Computer Science, F. F. Rivera, M. Bubak, A. G. Tato, and R. Doallo, Eds. Springer, 2003.
[18] E. Gabriel, G. Fagg, A. Bukovsky, T. Angskun, and J. Dongarra, "A fault tolerant communication library for grid environments", in Proceedings of the 17th Annual ACM International Conference on Supercomputing (ICS '03), International Workshop on Grid Computing.
[19] C. Liu, L. Yang, I. Foster, and D. Angulo, "Design and evaluation of a resource selection framework for grid applications", in Proceedings of the 11th IEEE Symposium on High-Performance Distributed Computing, July 2002.


More information

Multiple Broker Support by Grid Portals* Extended Abstract

Multiple Broker Support by Grid Portals* Extended Abstract 1. Introduction Multiple Broker Support by Grid Portals* Extended Abstract Attila Kertesz 1,3, Zoltan Farkas 1,4, Peter Kacsuk 1,4, Tamas Kiss 2,4 1 MTA SZTAKI Computer and Automation Research Institute

More information

DiPerF: automated DIstributed PERformance testing Framework

DiPerF: automated DIstributed PERformance testing Framework DiPerF: automated DIstributed PERformance testing Framework Ioan Raicu, Catalin Dumitrescu, Matei Ripeanu, Ian Foster Distributed Systems Laboratory Computer Science Department University of Chicago Introduction

More information

Globus Toolkit Firewall Requirements. Abstract

Globus Toolkit Firewall Requirements. Abstract Globus Toolkit Firewall Requirements v0.3 8/30/2002 Von Welch Software Architect, Globus Project welch@mcs.anl.gov Abstract This document provides requirements and guidance to firewall administrators at

More information

A Distributed Media Service System Based on Globus Data-Management Technologies1

A Distributed Media Service System Based on Globus Data-Management Technologies1 A Distributed Media Service System Based on Globus Data-Management Technologies1 Xiang Yu, Shoubao Yang, and Yu Hong Dept. of Computer Science, University of Science and Technology of China, Hefei 230026,

More information

Managing CAE Simulation Workloads in Cluster Environments

Managing CAE Simulation Workloads in Cluster Environments Managing CAE Simulation Workloads in Cluster Environments Michael Humphrey V.P. Enterprise Computing Altair Engineering humphrey@altair.com June 2003 Copyright 2003 Altair Engineering, Inc. All rights

More information

First evaluation of the Globus GRAM Service. Massimo Sgaravatto INFN Padova

First evaluation of the Globus GRAM Service. Massimo Sgaravatto INFN Padova First evaluation of the Globus GRAM Service Massimo Sgaravatto INFN Padova massimo.sgaravatto@pd.infn.it Draft version release 1.0.5 20 June 2000 1 Introduction...... 3 2 Running jobs... 3 2.1 Usage examples.

More information

Grid Computing. MCSN - N. Tonellotto - Distributed Enabling Platforms

Grid Computing. MCSN - N. Tonellotto - Distributed Enabling Platforms Grid Computing 1 Resource sharing Elements of Grid Computing - Computers, data, storage, sensors, networks, - Sharing always conditional: issues of trust, policy, negotiation, payment, Coordinated problem

More information

Grid Architectural Models

Grid Architectural Models Grid Architectural Models Computational Grids - A computational Grid aggregates the processing power from a distributed collection of systems - This type of Grid is primarily composed of low powered computers

More information

IMAGE: An approach to building standards-based enterprise Grids

IMAGE: An approach to building standards-based enterprise Grids IMAGE: An approach to building standards-based enterprise Grids Gabriel Mateescu 1 and Masha Sosonkina 2 1 Research Computing Support Group 2 Scalable Computing Laboratory National Research Council USDOE

More information

Managing MPICH-G2 Jobs with WebCom-G

Managing MPICH-G2 Jobs with WebCom-G Managing MPICH-G2 Jobs with WebCom-G Padraig J. O Dowd, Adarsh Patil and John P. Morrison Computer Science Dept., University College Cork, Ireland {p.odowd, adarsh, j.morrison}@cs.ucc.ie Abstract This

More information

Resource Allocation in computational Grids

Resource Allocation in computational Grids Grid Computing Competence Center Resource Allocation in computational Grids Riccardo Murri Grid Computing Competence Center, Organisch-Chemisches Institut, University of Zurich Nov. 23, 21 Scheduling on

More information

A Compact Computing Environment For A Windows PC Cluster Towards Seamless Molecular Dynamics Simulations

A Compact Computing Environment For A Windows PC Cluster Towards Seamless Molecular Dynamics Simulations A Compact Computing Environment For A Windows PC Cluster Towards Seamless Molecular Dynamics Simulations Yuichi Tsujita Abstract A Windows PC cluster is focused for its high availabilities and fruitful

More information

Design and Implementation of Condor-UNICORE Bridge

Design and Implementation of Condor-UNICORE Bridge Design and Implementation of Condor-UNICORE Bridge Hidemoto Nakada National Institute of Advanced Industrial Science and Technology (AIST) 1-1-1 Umezono, Tsukuba, 305-8568, Japan hide-nakada@aist.go.jp

More information

30 Nov Dec Advanced School in High Performance and GRID Computing Concepts and Applications, ICTP, Trieste, Italy

30 Nov Dec Advanced School in High Performance and GRID Computing Concepts and Applications, ICTP, Trieste, Italy Advanced School in High Performance and GRID Computing Concepts and Applications, ICTP, Trieste, Italy Why the Grid? Science is becoming increasingly digital and needs to deal with increasing amounts of

More information

CSF4:A WSRF Compliant Meta-Scheduler

CSF4:A WSRF Compliant Meta-Scheduler CSF4:A WSRF Compliant Meta-Scheduler Wei Xiaohui 1, Ding Zhaohui 1, Yuan Shutao 2, Hou Chang 1, LI Huizhen 1 (1: The College of Computer Science & Technology, Jilin University, China 2:Platform Computing,

More information

MSF: A Workflow Service Infrastructure for Computational Grid Environments

MSF: A Workflow Service Infrastructure for Computational Grid Environments MSF: A Workflow Service Infrastructure for Computational Grid Environments Seogchan Hwang 1 and Jaeyoung Choi 2 1 Supercomputing Center, Korea Institute of Science and Technology Information, 52 Eoeun-dong,

More information

A Parallel Programming Environment on Grid

A Parallel Programming Environment on Grid A Parallel Programming Environment on Grid Weiqin Tong, Jingbo Ding, and Lizhi Cai School of Computer Engineering and Science, Shanghai University, Shanghai 200072, China wqtong@mail.shu.edu.cn Abstract.

More information

Design of Distributed Data Mining Applications on the KNOWLEDGE GRID

Design of Distributed Data Mining Applications on the KNOWLEDGE GRID Design of Distributed Data Mining Applications on the KNOWLEDGE GRID Mario Cannataro ICAR-CNR cannataro@acm.org Domenico Talia DEIS University of Calabria talia@deis.unical.it Paolo Trunfio DEIS University

More information

Scheduling Large Parametric Modelling Experiments on a Distributed Meta-computer

Scheduling Large Parametric Modelling Experiments on a Distributed Meta-computer Scheduling Large Parametric Modelling Experiments on a Distributed Meta-computer David Abramson and Jon Giddy Department of Digital Systems, CRC for Distributed Systems Technology Monash University, Gehrmann

More information

Grid Compute Resources and Job Management

Grid Compute Resources and Job Management Grid Compute Resources and Job Management How do we access the grid? Command line with tools that you'll use Specialised applications Ex: Write a program to process images that sends data to run on the

More information

An Evaluation of Alternative Designs for a Grid Information Service

An Evaluation of Alternative Designs for a Grid Information Service An Evaluation of Alternative Designs for a Grid Information Service Warren Smith, Abdul Waheed *, David Meyers, Jerry Yan Computer Sciences Corporation * MRJ Technology Solutions Directory Research L.L.C.

More information

A RESOURCE MANAGEMENT FRAMEWORK FOR INTERACTIVE GRIDS

A RESOURCE MANAGEMENT FRAMEWORK FOR INTERACTIVE GRIDS A RESOURCE MANAGEMENT FRAMEWORK FOR INTERACTIVE GRIDS Raj Kumar, Vanish Talwar, Sujoy Basu Hewlett-Packard Labs 1501 Page Mill Road, MS 1181 Palo Alto, CA 94304 USA { raj.kumar,vanish.talwar,sujoy.basu}@hp.com

More information

GEMS: A Fault Tolerant Grid Job Management System

GEMS: A Fault Tolerant Grid Job Management System GEMS: A Fault Tolerant Grid Job Management System Sriram Satish Tadepalli Thesis submitted to the faculty of the Virginia Polytechnic Institute and State University in partial fulfillment of the requirements

More information

Formalization of Objectives of Grid Systems Resources Protection against Unauthorized Access

Formalization of Objectives of Grid Systems Resources Protection against Unauthorized Access Nonlinear Phenomena in Complex Systems, vol. 17, no. 3 (2014), pp. 272-277 Formalization of Objectives of Grid Systems Resources Protection against Unauthorized Access M. O. Kalinin and A. S. Konoplev

More information

WSRF Services for Composing Distributed Data Mining Applications on Grids: Functionality and Performance

WSRF Services for Composing Distributed Data Mining Applications on Grids: Functionality and Performance WSRF Services for Composing Distributed Data Mining Applications on Grids: Functionality and Performance Domenico Talia, Paolo Trunfio, and Oreste Verta DEIS, University of Calabria Via P. Bucci 41c, 87036

More information

Grid Computing Middleware. Definitions & functions Middleware components Globus glite

Grid Computing Middleware. Definitions & functions Middleware components Globus glite Seminar Review 1 Topics Grid Computing Middleware Grid Resource Management Grid Computing Security Applications of SOA and Web Services Semantic Grid Grid & E-Science Grid Economics Cloud Computing 2 Grid

More information

The GridWay. approach for job Submission and Management on Grids. Outline. Motivation. The GridWay Framework. Resource Selection

The GridWay. approach for job Submission and Management on Grids. Outline. Motivation. The GridWay Framework. Resource Selection The GridWay approach for job Submission and Management on Grids Eduardo Huedo Rubén S. Montero Ignacio M. Llorente Laboratorio de Computación Avanzada Centro de Astrobiología (INTA - CSIC) Associated to

More information

Abstract. Introduction. Application of Grid Techniques in the CFD Field

Abstract. Introduction. Application of Grid Techniques in the CFD Field P V Application of Grid Techniques in the CFD Field 1! 2 " # $%& ' '( " )*+,-.$ / 0 )12 3 " #2 4 $ 56 879 :3; WX '$WZY {P }V '(

More information

PBS PRO: GRID COMPUTING AND SCHEDULING ATTRIBUTES

PBS PRO: GRID COMPUTING AND SCHEDULING ATTRIBUTES Chapter 1 PBS PRO: GRID COMPUTING AND SCHEDULING ATTRIBUTES Bill Nitzberg, Jennifer M. Schopf, and James Patton Jones Altair Grid Technologies Mathematics and Computer Science Division, Argonne National

More information

A Resource Discovery Algorithm in Mobile Grid Computing Based on IP-Paging Scheme

A Resource Discovery Algorithm in Mobile Grid Computing Based on IP-Paging Scheme A Resource Discovery Algorithm in Mobile Grid Computing Based on IP-Paging Scheme Yue Zhang 1 and Yunxia Pei 2 1 Department of Math and Computer Science Center of Network, Henan Police College, Zhengzhou,

More information

A Generalized MapReduce Approach for Efficient mining of Large data Sets in the GRID

A Generalized MapReduce Approach for Efficient mining of Large data Sets in the GRID A Generalized MapReduce Approach for Efficient mining of Large data Sets in the GRID Matthias Röhm, Matthias Grabert and Franz Schweiggert Institute of Applied Information Processing Ulm University Ulm,

More information

Today s Readings, Part I

Today s Readings, Part I esource Information and Description Last Time: Applications» High Energy Physics Massive Data Analysis and Simulation» Encyclopedia of Life: Massive Computation» LEAD: Linked Environments for Atmospheric

More information

AssistConf: a Grid configuration tool for the ASSIST parallel programming environment

AssistConf: a Grid configuration tool for the ASSIST parallel programming environment AssistConf: a Grid configuration tool for the ASSIST parallel programming environment R. Baraglia 1, M. Danelutto 2, D. Laforenza 1, S. Orlando 3, P. Palmerini 1,3, P. Pesciullesi 2, R. Perego 1, M. Vanneschi

More information

Advanced Job Submission on the Grid

Advanced Job Submission on the Grid Advanced Job Submission on the Grid Antun Balaz Scientific Computing Laboratory Institute of Physics Belgrade http://www.scl.rs/ 30 Nov 11 Dec 2009 www.eu-egee.org Scope User Interface Submit job Workload

More information

Performance Analysis of Applying Replica Selection Technology for Data Grid Environments*

Performance Analysis of Applying Replica Selection Technology for Data Grid Environments* Performance Analysis of Applying Replica Selection Technology for Data Grid Environments* Chao-Tung Yang 1,, Chun-Hsiang Chen 1, Kuan-Ching Li 2, and Ching-Hsien Hsu 3 1 High-Performance Computing Laboratory,

More information

A P2P Approach for Membership Management and Resource Discovery in Grids1

A P2P Approach for Membership Management and Resource Discovery in Grids1 A P2P Approach for Membership Management and Resource Discovery in Grids1 Carlo Mastroianni 1, Domenico Talia 2 and Oreste Verta 2 1 ICAR-CNR, Via P. Bucci 41 c, 87036 Rende, Italy mastroianni@icar.cnr.it

More information

Computational Mini-Grid Research at Clemson University

Computational Mini-Grid Research at Clemson University Computational Mini-Grid Research at Clemson University Parallel Architecture Research Lab November 19, 2002 Project Description The concept of grid computing is becoming a more and more important one in

More information

Knowledge Discovery Services and Tools on Grids

Knowledge Discovery Services and Tools on Grids Knowledge Discovery Services and Tools on Grids DOMENICO TALIA DEIS University of Calabria ITALY talia@deis.unical.it Symposium ISMIS 2003, Maebashi City, Japan, Oct. 29, 2003 OUTLINE Introduction Grid

More information

Advanced School in High Performance and GRID Computing November Introduction to Grid computing.

Advanced School in High Performance and GRID Computing November Introduction to Grid computing. 1967-14 Advanced School in High Performance and GRID Computing 3-14 November 2008 Introduction to Grid computing. TAFFONI Giuliano Osservatorio Astronomico di Trieste/INAF Via G.B. Tiepolo 11 34131 Trieste

More information

MPICH-G2 performance evaluation on PC clusters

MPICH-G2 performance evaluation on PC clusters MPICH-G2 performance evaluation on PC clusters Roberto Alfieri Fabio Spataro February 1, 2001 1 Introduction The Message Passing Interface (MPI) [1] is a standard specification for message passing libraries.

More information

glite Grid Services Overview

glite Grid Services Overview The EPIKH Project (Exchange Programme to advance e-infrastructure Know-How) glite Grid Services Overview Antonio Calanducci INFN Catania Joint GISELA/EPIKH School for Grid Site Administrators Valparaiso,

More information

Replica Selection in the Globus Data Grid

Replica Selection in the Globus Data Grid Replica Selection in the Globus Data Grid Sudharshan Vazhkudai 1, Steven Tuecke 2, and Ian Foster 2 1 Department of Computer and Information Science The University of Mississippi chucha@john.cs.olemiss.edu

More information

Top-down definition of Network Centric Operating System features

Top-down definition of Network Centric Operating System features Position paper submitted to the Workshop on Network Centric Operating Systems Bruxelles 16-17 march 2005 Top-down definition of Network Centric Operating System features Thesis Marco Danelutto Dept. Computer

More information

Customized way of Resource Discovery in a Campus Grid

Customized way of Resource Discovery in a Campus Grid 51 Customized way of Resource Discovery in a Campus Grid Damandeep Kaur Society for Promotion of IT in Chandigarh (SPIC), Chandigarh Email: daman_811@yahoo.com Lokesh Shandil Email: lokesh_tiet@yahoo.co.in

More information

Application of Grid techniques in the CFD field

Application of Grid techniques in the CFD field Application of Grid techniques in the CFD field Xiaobo Yang and Mark Hayes Cambridge escience Centre, University of Cambridge Wilberforce Road, Cambridge CB3 0WA, UK Email: {xy216,mah1002}@cam.ac.uk Abstract

More information

Grid services. Enabling Grids for E-sciencE. Dusan Vudragovic Scientific Computing Laboratory Institute of Physics Belgrade, Serbia

Grid services. Enabling Grids for E-sciencE. Dusan Vudragovic Scientific Computing Laboratory Institute of Physics Belgrade, Serbia Grid services Dusan Vudragovic dusan@phy.bg.ac.yu Scientific Computing Laboratory Institute of Physics Belgrade, Serbia Sep. 19, 2008 www.eu-egee.org Set of basic Grid services Job submission/management

More information

High Throughput WAN Data Transfer with Hadoop-based Storage

High Throughput WAN Data Transfer with Hadoop-based Storage High Throughput WAN Data Transfer with Hadoop-based Storage A Amin 2, B Bockelman 4, J Letts 1, T Levshina 3, T Martin 1, H Pi 1, I Sfiligoi 1, M Thomas 2, F Wuerthwein 1 1 University of California, San

More information

Usage of LDAP in Globus

Usage of LDAP in Globus Usage of LDAP in Globus Gregor von Laszewski and Ian Foster Mathematics and Computer Science Division Argonne National Laboratory, Argonne, IL 60439 gregor@mcs.anl.gov Abstract: This short note describes

More information

Future Developments in the EU DataGrid

Future Developments in the EU DataGrid Future Developments in the EU DataGrid The European DataGrid Project Team http://www.eu-datagrid.org DataGrid is a project funded by the European Union Grid Tutorial 4/3/2004 n 1 Overview Where is the

More information

MOHA: Many-Task Computing Framework on Hadoop

MOHA: Many-Task Computing Framework on Hadoop Apache: Big Data North America 2017 @ Miami MOHA: Many-Task Computing Framework on Hadoop Soonwook Hwang Korea Institute of Science and Technology Information May 18, 2017 Table of Contents Introduction

More information

A Resource Discovery Algorithm in Mobile Grid Computing based on IP-paging Scheme

A Resource Discovery Algorithm in Mobile Grid Computing based on IP-paging Scheme A Resource Discovery Algorithm in Mobile Grid Computing based on IP-paging Scheme Yue Zhang, Yunxia Pei To cite this version: Yue Zhang, Yunxia Pei. A Resource Discovery Algorithm in Mobile Grid Computing

More information

Tutorial 4: Condor. John Watt, National e-science Centre

Tutorial 4: Condor. John Watt, National e-science Centre Tutorial 4: Condor John Watt, National e-science Centre Tutorials Timetable Week Day/Time Topic Staff 3 Fri 11am Introduction to Globus J.W. 4 Fri 11am Globus Development J.W. 5 Fri 11am Globus Development

More information

GridSphere s Grid Portlets

GridSphere s Grid Portlets COMPUTATIONAL METHODS IN SCIENCE AND TECHNOLOGY 12(1), 89-97 (2006) GridSphere s Grid Portlets Michael Russell 1, Jason Novotny 2, Oliver Wehrens 3 1 Max-Planck-Institut für Gravitationsphysik, Albert-Einstein-Institut,

More information

Introduction to GT3. Introduction to GT3. What is a Grid? A Story of Evolution. The Globus Project

Introduction to GT3. Introduction to GT3. What is a Grid? A Story of Evolution. The Globus Project Introduction to GT3 The Globus Project Argonne National Laboratory USC Information Sciences Institute Copyright (C) 2003 University of Chicago and The University of Southern California. All Rights Reserved.

More information

CMS HLT production using Grid tools

CMS HLT production using Grid tools CMS HLT production using Grid tools Flavia Donno (INFN Pisa) Claudio Grandi (INFN Bologna) Ivano Lippi (INFN Padova) Francesco Prelz (INFN Milano) Andrea Sciaba` (INFN Pisa) Massimo Sgaravatto (INFN Padova)

More information

A Decoupled Scheduling Approach for the GrADS Program Development Environment. DCSL Ahmed Amin

A Decoupled Scheduling Approach for the GrADS Program Development Environment. DCSL Ahmed Amin A Decoupled Scheduling Approach for the GrADS Program Development Environment DCSL Ahmed Amin Outline Introduction Related Work Scheduling Architecture Scheduling Algorithm Testbench Results Conclusions

More information

How to Run Scientific Applications over Web Services

How to Run Scientific Applications over Web Services How to Run Scientific Applications over Web Services email: Diego Puppin Nicola Tonellotto Domenico Laforenza Institute for Information Science and Technologies ISTI - CNR, via Moruzzi, 60 Pisa, Italy

More information

Layered Architecture

Layered Architecture The Globus Toolkit : Introdution Dr Simon See Sun APSTC 09 June 2003 Jie Song, Grid Computing Specialist, Sun APSTC 2 Globus Toolkit TM An open source software toolkit addressing key technical problems

More information

Minutes of WP1 meeting Milan (20 th and 21 st March 2001).

Minutes of WP1 meeting Milan (20 th and 21 st March 2001). Minutes of WP1 meeting Milan (20 th and 21 st March 2001). Meeting held INFN Milan, chaired by Francesco Prelz. Welcome (F. Prelz) Report from GGF1 (M. Sgaravatto) (http://www.pd.infn.it/~sgaravat/grid/report_amsterdam.ppt)

More information

Globus Toolkit 4 Execution Management. Alexandra Jimborean International School of Informatics Hagenberg, 2009

Globus Toolkit 4 Execution Management. Alexandra Jimborean International School of Informatics Hagenberg, 2009 Globus Toolkit 4 Execution Management Alexandra Jimborean International School of Informatics Hagenberg, 2009 2 Agenda of the day Introduction to Globus Toolkit and GRAM Zoom In WS GRAM Usage Guide Architecture

More information

visperf: Monitoring Tool for Grid Computing

visperf: Monitoring Tool for Grid Computing visperf: Monitoring Tool for Grid Computing DongWoo Lee 1, Jack J. Dongarra 2, and R.S. Ramakrishna 1 1 Department of Information and Communication Kwangju Institute of Science and Technology, Republic

More information

A Comparison of Conventional Distributed Computing Environments and Computational Grids

A Comparison of Conventional Distributed Computing Environments and Computational Grids A Comparison of Conventional Distributed Computing Environments and Computational Grids Zsolt Németh 1, Vaidy Sunderam 2 1 MTA SZTAKI, Computer and Automation Research Institute, Hungarian Academy of Sciences,

More information

THE VEGA PERSONAL GRID: A LIGHTWEIGHT GRID ARCHITECTURE

THE VEGA PERSONAL GRID: A LIGHTWEIGHT GRID ARCHITECTURE THE VEGA PERSONAL GRID: A LIGHTWEIGHT GRID ARCHITECTURE Wei Li, Zhiwei Xu, Bingchen Li, Yili Gong Institute of Computing Technology of Chinese Academy of Sciences Beijing China, 100080 {zxu, liwei, libingchen,

More information

High Performance Computing Course Notes Grid Computing I

High Performance Computing Course Notes Grid Computing I High Performance Computing Course Notes 2008-2009 2009 Grid Computing I Resource Demands Even as computer power, data storage, and communication continue to improve exponentially, resource capacities are

More information

A Time-To-Live Based Reservation Algorithm on Fully Decentralized Resource Discovery in Grid Computing

A Time-To-Live Based Reservation Algorithm on Fully Decentralized Resource Discovery in Grid Computing A Time-To-Live Based Reservation Algorithm on Fully Decentralized Resource Discovery in Grid Computing Sanya Tangpongprasit, Takahiro Katagiri, Hiroki Honda, Toshitsugu Yuba Graduate School of Information

More information

An Engineering Computation Oriented Visual Grid Framework

An Engineering Computation Oriented Visual Grid Framework An Engineering Computation Oriented Visual Grid Framework Guiyi Wei 1,2,3, Yao Zheng 1,2, Jifa Zhang 1,2, and Guanghua Song 1,2 1 College of Computer Science, Zhejiang University, Hangzhou, 310027, P.

More information

UNIT IV PROGRAMMING MODEL. Open source grid middleware packages - Globus Toolkit (GT4) Architecture, Configuration - Usage of Globus

UNIT IV PROGRAMMING MODEL. Open source grid middleware packages - Globus Toolkit (GT4) Architecture, Configuration - Usage of Globus UNIT IV PROGRAMMING MODEL Open source grid middleware packages - Globus Toolkit (GT4) Architecture, Configuration - Usage of Globus Globus: One of the most influential Grid middleware projects is the Globus

More information

EGEE and Interoperation

EGEE and Interoperation EGEE and Interoperation Laurence Field CERN-IT-GD ISGC 2008 www.eu-egee.org EGEE and glite are registered trademarks Overview The grid problem definition GLite and EGEE The interoperability problem The

More information

Inteligencia Artificial. Revista Iberoamericana de Inteligencia Artificial ISSN:

Inteligencia Artificial. Revista Iberoamericana de Inteligencia Artificial ISSN: Inteligencia Artificial. Revista Iberoamericana de Inteligencia Artificial ISSN: 1137-3601 revista@aepia.org Asociación Española para la Inteligencia Artificial España Kus, Waclaw; Burczynski, Tadeusz

More information

A Study of High Performance Computing and the Cray SV1 Supercomputer. Michael Sullivan TJHSST Class of 2004

A Study of High Performance Computing and the Cray SV1 Supercomputer. Michael Sullivan TJHSST Class of 2004 A Study of High Performance Computing and the Cray SV1 Supercomputer Michael Sullivan TJHSST Class of 2004 June 2004 0.1 Introduction A supercomputer is a device for turning compute-bound problems into

More information

DiPerF: automated DIstributed PERformance testing Framework

DiPerF: automated DIstributed PERformance testing Framework DiPerF: automated DIstributed PERformance testing Framework Catalin Dumitrescu, Ioan Raicu, Matei Ripeanu, Ian Foster Distributed Systems Laboratory Computer Science Department University of Chicago Introduction

More information

PoS(EGICF12-EMITC2)081

PoS(EGICF12-EMITC2)081 University of Oslo, P.b.1048 Blindern, N-0316 Oslo, Norway E-mail: aleksandr.konstantinov@fys.uio.no Martin Skou Andersen Niels Bohr Institute, Blegdamsvej 17, 2100 København Ø, Denmark E-mail: skou@nbi.ku.dk

More information

GRB. Grid-JQA : Grid Java based Quality of service management by Active database. L. Mohammad Khanli M. Analoui. Abstract.

GRB. Grid-JQA : Grid Java based Quality of service management by Active database. L. Mohammad Khanli M. Analoui. Abstract. Grid-JQA : Grid Java based Quality of service management by Active database L. Mohammad Khanli M. Analoui Ph.D. student C.E. Dept. IUST Tehran, Iran Khanli@iust.ac.ir Assistant professor C.E. Dept. IUST

More information

PROOF-Condor integration for ATLAS

PROOF-Condor integration for ATLAS PROOF-Condor integration for ATLAS G. Ganis,, J. Iwaszkiewicz, F. Rademakers CERN / PH-SFT M. Livny, B. Mellado, Neng Xu,, Sau Lan Wu University Of Wisconsin Condor Week, Madison, 29 Apr 2 May 2008 Outline

More information

Implementing a Hardware-Based Barrier in Open MPI

Implementing a Hardware-Based Barrier in Open MPI Implementing a Hardware-Based Barrier in Open MPI - A Case Study - Torsten Hoefler 1, Jeffrey M. Squyres 2, Torsten Mehlan 1 Frank Mietke 1 and Wolfgang Rehm 1 1 Technical University of Chemnitz 2 Open

More information

Workload Management on a Data Grid: a review of current technology

Workload Management on a Data Grid: a review of current technology Workload Management on a Data Grid: a review of current technology INFN/TC-xx-yy 17th April 2001 Cosimo Anglano, Stefano Barale, Stefano Beco, Salvatore Cavalieri, Flavia Donno, Luciano Gaido, Antonia

More information

What s new in HTCondor? What s coming? HTCondor Week 2018 Madison, WI -- May 22, 2018

What s new in HTCondor? What s coming? HTCondor Week 2018 Madison, WI -- May 22, 2018 What s new in HTCondor? What s coming? HTCondor Week 2018 Madison, WI -- May 22, 2018 Todd Tannenbaum Center for High Throughput Computing Department of Computer Sciences University of Wisconsin-Madison

More information

ADAPTIVE GRID COMPUTING FOR MPI APPLICATIONS

ADAPTIVE GRID COMPUTING FOR MPI APPLICATIONS ADAPTIVE GRID COMPUTING FOR MPI APPLICATIONS Juemin Zhang and Waleed Meleis Department of Electrical and Computer Engineering Northeastern University Boston, MA 02115 {jzhang, meleis}@ece.neu.edu ABSTRACT

More information

Grid Resources Search Engine based on Ontology
