ARC middleware

The NorduGrid Collaboration


Abstract

The paper describes the existing components of ARC, discusses some of the new components, functionalities and enhancements currently under development, and outlines the architectural and technical choices that have been made to ensure scalability, stability and high performance.

1 Introduction

Computational Grids are not a novel idea. The Condor project [1] led the way in 1988 by setting the goal of creating a computing environment with heterogeneous distributed resources. Further efforts to create a layer for secure, coordinated access to geographically distributed computing and storage resources resulted in the appearance of the Globus Toolkit [2] and its Grid Security Infrastructure (GSI) [3] in the late 1990s. The words "Grid" and "middleware" were coined to describe the new phenomenon of single-login access to widely distributed computing resources. Addressing the needs of the Nordic research community (Denmark, Finland, Iceland, Norway and Sweden), the NorduGrid collaboration [4] started in 2002 to develop a software solution for wide-area computing and data handling [5], making use of the by then de facto standard Globus Toolkit 2. This new open-source middleware is now called the Advanced Resource Connector (ARC).

Scientific and academic computing in the Nordic countries has a specific feature: it is carried out by a large number of small and medium-size facilities of different kinds and ownership. An important requirement from the very start of ARC development has therefore been to keep the middleware portable, compact and manageable on both the server and the client side. The client package currently occupies only 14 Megabytes; it is available for most current Linux distributions and can be installed at any available location by a non-privileged user. For a computing service, only three main processes are needed: those of the file transfer service (GridFTP, see Section 3.1), the Grid Manager (see Section 3.3) and the Local Information Service (see Section 3.6.1).

Recently, ARC development and deployment has been funded by the Nordic Natural Science Research Council (NOS-N), with the aim to provide a foundation for a large-scale, multi-disciplinary Nordic Data Grid Facility [6]. ARC is the middleware of choice for this facility and for many other Grid infrastructures, such as the Estonian Grid [7], Swegrid [8], DCGC [9], the Swiss ATLAS Grid [10] and others. The middleware supports production-level systems that have been in non-stop operation since 2002 [11], and has proven its record in demanding High Energy Physics computing tasks [12,13], being the first middleware ever to provide services to such a massive world-wide data processing effort.

This paper gives a comprehensive technical overview of the ARC middleware, its architecture and components: the existing core ones as well as the recently introduced, beta-quality services. The rest of the paper is organised as follows. In Section 2, the basic architecture of ARC is presented. Section 3 discusses various services and Section 4 describes the client components and user interfaces. Sections 5 and 6 are more development-oriented, discussing the developer interfaces and libraries, recent additions to ARC, and software dependencies and packaging, respectively. A summary is provided in Section 7.

2 Architecture

The architecture and design of the ARC middleware have been driven by the requirements and specifics of the user community that initiated the project, the Nordic ATLAS [14] groups. The basic set of high-level requirements is no different from that of any other community, be it users, resource owners or software developers: the system must be performant, reliable, scalable, secure and simple. However, the environment that led to the creation of ARC has some specifics that can be outlined as follows:

An excellent academic network (NORDUNet [15]), providing a solid basis for deployment of wide-area computing architectures based on Grid technologies;

A multitude of computing resources of various capacity, origin and ownership, effectively excluding the possibility of a centralized, uniform facility;

Prevalence of trivially parallel, serial batch tasks, requiring Linux flavours of operating system;

The necessity to operate with large amounts (Terabytes) of data, stored in files and accessed by the tasks via POSIX calls.

Customers' requirements impose strict rules on the middleware development. Independently of the nature of the customer, be it an end-user or a resource owner, the guiding principles are the following:

(1) No single point of failure, no bottlenecks.
(2) The Grid is self-organizing, with no need for centralized management.
(3) The Grid is robust and fault-tolerant, capable of providing stable round-the-clock services for years.
(4) Grid tools and utilities are non-intrusive, have a small footprint, do not require special underlying system configuration and are easily portable.
(5) No extra manpower is needed to maintain and utilize the Grid layer.
(6) Tools and utilities respect local resource owner policies, in particular security-related ones.

ARC developers take a conscious approach of creating a solution that addresses the users' needs while meeting the resource owners' strict demands. This requires a delicate balance between the users' wish to get transparent, single-login access to as many resources as possible world-wide, and the system administrators' need to keep unwanted customers and expensive or insecure services at bay. Simplicity is the key word: applied correctly, it ensures robustness, stability, performance and ease of control. Development of the ARC middleware follows the philosophy "start with something simple that works for users and add functionality gradually". This allows ARC to provide customers with a working setup at very early stages, and to define the priorities for adding functionality and enhancements based on the feedback from the users.

Middleware development was preceded by an extended period of try-outs of the solutions available at the time and of collecting feedback from the users. This process identified a set of more specific user requirements, described in the following Section.

2.1 Requirements

Perhaps the most important requirement that affects the ARC architecture stems from the fact that most user tasks deal with massive data processing. A typical job expects that a certain set of input data is available via a POSIX open call. Moreover, such a job produces another set of data as a result of its activity. These input and output data might well reach several Gigabytes in size. The most challenging task of the Grid is thus not just to find free cycles for a job, but to ensure that the necessary input is staged in for it, and all the output is properly staged out. To date, ARC is the only Grid solution which adequately addresses this demand.

Another important requirement is that of a simple client that can be easily installed at any location by any user. Such a client must provide all the functionality necessary to work in the Grid environment and yet remain small in size and simple to install. The functionality includes, quite obviously, Grid job submission, monitoring and manipulation, the possibility to declare and submit batches of jobs with minimal overhead, and means to work with the data located on the Grid. The client must come primarily as a lightweight command line interface (CLI), with an optional extension to a GUI or a portal. The requirement of CLI portability and a non-root installation procedure practically implies its distribution as a binary archived package (tar, gzip). A user must be able to use clients installed on different workstations interchangeably, without the necessity to instantiate them as permanently active services. Naturally, the configuration process must be reduced to an absolute minimum, with the default configuration covering most of the use cases. Last, but not least, the clients must remain functional in a firewalled environment.

There is another perspective on the Grid: that of the resource owners. The idea of providing rather expensive and often sensitive services to a set of unknown users worldwide is not particularly appealing to the system administrators, facility owners and funding bodies, whose primary task is to serve local user communities. A set of strict agreements defining a limited user base and specific services results effectively in a limited and highly regulated set of dedicated computing resources, only remotely resembling the World Wide Grid idea. NorduGrid's approach is to provide resource owners with a non-invasive and yet secure set of tools, requiring minimal installation and maintenance efforts while preserving the existing structures and respecting local usage policies. Each resource owner willing to share some resources on the Grid has a specific set of requirements to be met by the middleware. The most common requests are:

The solution must be firewall-friendly (e.g., minimize the number of necessary connections).

Services must have easy crash recovery procedures.

Allowed Grid-specific resource usage (load, disk space) should be adjustable and manageable.

Accounting possibilities must be foreseen.

Services must be able to run under a non-privileged account.

A single configuration procedure for site-specific services is highly desirable.

2.2 Design

The ARC architecture is carefully planned and designed to satisfy the above-mentioned needs of end-users and resource providers. A major effort is made to render the resulting Grid system stable by design. This is achieved by identifying three mandatory components (see Figure 1):

(1) The Computing Service, implemented as a GridFTP-Grid Manager pair of services (see Sections 3.1 and 3.3). The Grid Manager (GM) is the heart of the Grid, instantiated as a new service at the front-end of each computing resource (usually a cluster of computers). It serves as a gateway to the computing resource through a GridFTP channel. GM provides an interface to the local resource management system, facilitates job manipulation and data management, and allows for accounting and other essential functions.

(2) The Information System (see Section 3.6), which serves as the nervous system of the Grid. It is realized as a hierarchical distributed database: the information is produced and stored at each service (computing, storage), while the distributed system of Index Services maintains the list of known services.

(3) The Brokering Client (see Section 4.2.1), the brain of the Grid, which is deployed as a client part in as many instances as users need. It is equipped with powerful brokering capabilities, being able to distribute the workload across the Grid. It also provides client functionality for all the Grid services, and yet is required to be lightweight and installable by any user at an arbitrary location in a matter of a few minutes.

Fig. 1. Components of the ARC architecture: the mandatory elements (Computing Services, Resource Index Services, Brokering Clients) and the optional elements (Logging Services, Data Indexing Services, Monitoring Clients, User Databases, Storage Services), each of which can be present in one or more instances (1..*).

In this scheme, the Grid is defined as a set of resources registering to the common information system. Grid users are those who are authorized to use at least one of the Grid resources by means of Grid tools. Services and users must be properly certified by trusted agencies (Section 3.7). Users interact with the Grid via their personal clients, which interpret the tasks, query the Information System whenever necessary, discover suitable resources, and forward task requests to the appropriate ones, along with the user's proxy and other necessary data. If the task consists of job execution, it is submitted to a computing resource, where the Grid Manager interprets the job description, prepares the requested environment, stages in the necessary data, and submits the job to the local resource management system (LRMS).

Grid jobs within the ARC system possess a dedicated area on the computing resource, called the session directory, which effectively implements limited sandboxing for each job. The location of each session directory is a valid URL and serves as the unique Job Identifier. In most configurations this guarantees that the entire contents of the session directory are available to the authorised persons during the lifetime of the Grid job. Job states are reported in the Information System. Users can monitor the job status and manipulate the jobs with the help of the client tools, which fetch the data from the Information System and forward the necessary instructions to the Grid Manager. The Grid Manager takes care of staging out and registering the data (if requested), submitting logging and accounting information, and eventually cleaning up the space used by the job. Client tools can also be used to retrieve job outputs at any time.

The reliable performance of such a system is achieved by using several innovative approaches:

Usage of the specialized GridFTP server (see Section 3.1) for job submission provides for job and resource management flexibility, full GSI benefits, firewall-friendliness, and a lightweight client.

The Grid Manager carries most of the workload, thus enabling a pure front-end Grid model, where no Grid-related software (other than the LRMS) has to be installed on cluster worker nodes. The Grid Manager is highly configurable.

The data management functionality of the Grid Manager includes reliable file transfer, input data caching and registration of output data to data indexing services via built-in interfaces.

The system has no central element or service: the information tree is multi-rooted, and matchmaking is performed by individual clients.

The information system appropriately describes the natural Grid entities and resources in a dedicated LDAP schema. The information about Grid jobs, users and resource characteristics is kept in local LDAP databases at every resource and is not cached in higher-level databases: the imperative is to always have fresh data, or not to have it at all.

There is no intermediate service between the client and the resource: the Brokering Client performs the matchmaking and the actual Grid job submission, as well as providing clients for the other Grid services. Users can share clients, or have several user interfaces per user.

All the services are stateful, allowing easy failure recovery and synchronisation when necessary.

Figure 1 shows several optional components which are not required for an initial Grid setup, but are essential for providing proper Grid services. Most important are the Storage Services and the Data Indexing Services. As long as the users' jobs do not manipulate large amounts of data, these are not necessary. However, when Terabytes of data are being processed, they have to be reliably stored and properly indexed, allowing further usage. Data are stored at Storage Elements (SE), which in ARC come in two kinds: the regular disk-based SE is simply a GridFTP interface to a file system (see Section 3.4), while the Smart Storage Element (SSE) is a Web service providing extended functionality, including file movement and automatic file indexing (see Section 3.5). File indexing itself needs a specially structured database with a proper GSI-enabled interface. ARC currently has interfaces to several data indexing systems: the Globus Replica Catalog (RC) [16], the Globus Replica Location Service (RLS) [17] and gLite Fireman [18]. RC and RLS were developed by the Globus project with the goal of facilitating the cataloguing and consequent locating of data sources. The original Replica Catalog was not GSI-enabled; however, ARC developers have added the possibility to perform securely authenticated connections. Fireman is being developed as a component of the gLite [19] Data Management System.

Another non-critical component that nevertheless often comes among the top-ranked in user requirements is the Grid Monitor (Section 4.4). In the ARC solution, it is only a client without capabilities to manipulate resources or jobs, providing information only to the end-users. It relies entirely on the Information System, being basically a specialized user-friendly LDAP browser. Monitoring in ARC does not require special sensors, agents or dedicated databases: it simply re-uses the Information System infrastructure.

While monitoring offers real-time snapshots of the system, historical information has to be made persistent for further analysis, accounting, auditing and similar purposes. To provide for this, a job's usage record is submitted by the Grid Manager to a Logging Service (Logger) (see Section 6.2). Each group of users that desires to maintain such records can run a dedicated Logger.

For all the structure described above to serve a particular user, this user should be known (authenticated) to the Grid and be authorised to use one or more services according to the policies adopted by the service providers. The ARC user management scheme is identical to most other Grid solutions: typically, Grid users are grouped in Virtual Organisations (VO), which provide their member lists via various User Databases or through a Virtual Organization Membership Service (VOMS [20]). ARC services that require authorisation, such as the Computing Service or a Storage Service, subscribe to a subset of these databases (depending on local policies) and regularly retrieve the lists of authorised user credentials.

The overall architecture does not impose any special requirements upon computing resources, apart from the preferable presence of a shared filesystem (e.g. NFS) between the head node and the compute (worker) nodes of a cluster. If there is no shared area, the LRMS has to provide means for staging data between the head node and the compute nodes. Cluster-wise, ARC implements a pure front-end model: nothing has to be installed on the worker nodes.

Being developed mainly for the purposes of serial batch jobs, ARC presently does not support a range of other computational tasks. Most notably, there is no solution for interactive analysis, such as graphical visualisation or other kinds of tasks requiring real-time user input. Although it attempts to look to an end-user not much different from a conventional batch system, ARC has a substantial difference: the results of a batch job do not appear in the user's local area until the user fetches them manually. The ARC job submission client does not provide automatic job resubmission in the case of trivial failures (such as network outages), nor does it implement job re-scheduling from one computing service to another. Higher-level tools or Grid services can be developed to assist in such tasks (e.g., the Job Manager, see Section 6). ARC is not suitable for jobs that are expected to execute instantly: this would require distributed advance resource reservation, which is not implemented in the Grid world. A related shortcoming is that ARC provides no solution for cross-cluster parallel jobs. None of these limitations stems from the architectural solution: rather, removing them would require creating new dedicated services, quite possibly within the framework of the current design.

3 Services

ARC provides a reliable, lightweight, production-quality and scalable implementation of fundamental Grid services such as Grid job submission and management, Grid resource management, resource characterization, resource aggregation, and data management. A Grid-enabled computing resource runs a non-intrusive Grid layer composed of three main services: the specialized GridFTP server (Section 3.1), the Grid Manager (Section 3.3), and the Local Information Service (Section 3.6.1). The first two constitute the Computing Service as such.

A Grid-enabled storage resource makes use of either of the two types of Grid storage layers provided by ARC: a conventional solution based upon the GridFTP server (Section 3.4), or the Web service based (Section 3.2) Smart Storage Element (Section 3.5). Storage resources also run Local Information Services. Computing and storage resources are connected to the Grid via the registration service (Section 3.6.3), which links a resource to an Information Index Service (Section 3.6.2), this way implementing resource aggregation.

3.1 ARC GridFTP server

For job submission and data access, a GridFTP interface was chosen: ARC provides its own GridFTP server (GFS), with a GridFTP interface implemented using the Globus GSIFTP Control Connection application programming interface (API) of the Globus Toolkit libraries. The reason for introducing this server was the absence of GridFTP servers with flexible authorization. The only available implementation was a modified WU-FTPD FTP server distributed as part of the Globus Toolkit. With its whole configuration based on the UNIX identities of users, it was incapable of providing adequate authorization in a distributed environment.

GFS consists of two parts. The front-end part implements the GridFTP protocol, authentication, authorization and configuration of the service. The back-end part implements the file system visible through the GridFTP interface. This solution also adds the possibility to serve various kinds of information through the GridFTP interface and to reduce the number of transport protocols used in the system, hence decreasing the complexity of the software. GFS does not implement all the latest features of the GridFTP protocol, but it is compatible with the tools provided by the Globus Toolkit and other Grid projects based on it.

The front-end part takes care of identifying the connecting client and applying access rules (see Section 3.7). In the configuration, a virtual tree of directories is defined, with directories assigned to various back-ends. The activation of the virtual directories is performed in the configuration, according to the Grid identity of a client. Therefore, each client may be served by various combinations of services. Information about the Grid identity of the user is propagated through the whole chain down to the back-ends. This makes it possible to write back-ends capable of making internal authorization decisions based entirely on the Grid identity of the connecting client, avoiding the necessity to manage dedicated local user accounts.
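To illustrate the idea of the virtual directory tree, the fragment below sketches how two virtual directories could be assigned to different back-end plugins. It is a sketch only: the section and option names follow the general style of ARC server configuration files, but the paths, mount point and exact spelling are assumptions, not a verbatim configuration.

    # Sketch of a GFS virtual tree (option names and paths are assumptions).
    [gridftpd/store]
    path="/store"                  # virtual directory seen by GridFTP clients
    plugin="fileplugin.so"         # plain file-system back-end
    mount="/scratch/grid/store"    # physical location on the server

    [gridftpd/jobs]
    path="/jobs"                   # virtual view of job session directories
    plugin="jobplugin.so"          # back-end interfacing to the Grid Manager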

The back-end part is implemented as a set of plugins. Each plugin is a dynamic library which implements a particular C++ class with methods corresponding to the FTP operations needed to create a view of a virtual file-system-like structure visible through the GridFTP interface.

Currently, the ARC distribution comes with three plugins. Two of them, called fileplugin and gaclplugin, implement different ways of accessing local file systems (see Section 3.4 for a description). Their virtual file system closely corresponds to the underlying physical one. The third plugin is called jobplugin: it generates a virtual view of the session directories managed by the Grid Manager service and is described in Section 3.3.

3.2 HTTPSd framework

ARC comes with a framework for building lightweight HTTP, HTTP over SSL (HTTPS) and GSI (HTTPg) services and clients (for a summary of these mechanisms, see [21]). It consists of C++ classes and a server implementation. These classes implement a subset of the HTTP protocol, including the GET, PUT and POST methods, content ranges, and other most often used features. The implementation is not feature-rich; instead, it is lightweight and targeted at transferring content with little overhead. It has some support for gSOAP [22] integrated in the implementation of the POST method, thus allowing the creation of, and communication with, Web Services. This framework does not support message-level security.

The server implements configuration, authentication and authorization in a way common to all the network services of ARC. It listens on two TCP/IP ports for HTTPS and HTTPg connections. The configuration assigns various plugin services to paths. The services themselves are dynamic libraries which provide configuration and creation functions and supply an implementation of a C++ class responsible for handling HTTP requests.

The client part is a thin layer which allows one to pump data through the GET and PUT methods, and provides functionality matching the server part. The DataMove library, described in Section 5.2, uses its functionality to provide multi-stream access to HTTP(S/g) content.

ARC is currently delivered with two HTTP(S/g) plugins, both of them Web Services: the Smart Storage Element (Section 3.5) and the Logger (Section 6.2). Others are being developed; among them is a Web Service for job submission and control, intended to replace the jobplugin of the GFS. It also provides an interface for ordinary Web browsers to monitor the execution of Grid jobs.

3.3 Grid Manager

The purpose of the ARC Grid Manager (GM) is to take the actions on a computing cluster front-end that are needed to execute a job. In ARC, a job is defined as a set of input files, a main executable, the requested resources (including pre-installed software packages) and a set of produced files. The process of gathering the input files, executing the main executable, and transferring/storing the output files is carried out by GM (Figure 2).

Fig. 2. A typical setup of the ARC Grid Manager: job submission and data stage-in/stage-out pass through the GridFTP server on the front-end; the Grid Manager controls the downloader and uploader modules, the input file cache (linking or copying cached files) and the job session directories, which are shared with the computing nodes (e.g. over NFS), and submits jobs to the LRMS.

GM processes job descriptions written in the Extended Resource Specification Language (XRSL, see Section 4.1). Each job gets a directory on the execution cluster, the session directory. This directory is created by GM as a part of the job preparation process. The job is supposed to use this directory for all input and output data, and possibly other auxiliary files. All the file paths in the job description are specified relative to the session directory. There is no notion of the user's home directory, user ID, or other attributes common in shell access environments. Jobs executed on ARC-enabled clusters should not rely on anything specific to a user account, not even on the existence of such an account.

In the most common installations, the space for the session directory resides in an area shared with all the computing nodes by means of the Network File System. In this case, all the data that a job keeps in the session directory are continuously available to the user through the jobplugin of the GFS, as described below. It is also possible to have a session directory prepared on the front-end computer and then moved to a computing node. This limits the amount of remotely available information during the job execution to that generated by GM itself.

Input files are gathered in the session directory by the downloader module of GM. In the most common configuration this module is run by GM directly on the front-end.

To limit the load on the front-end machine, it is possible to control the number of simultaneous instances of the downloader module and the number of downloadable files. GM can also be instructed to run the downloader on a computing node as a part of the job execution. Such a configuration is desirable on computing clusters whose front-end has slow or limited disk space.

GM supports various kinds of sources for input as well as for output files. These include pseudo-URLs expressed in terms of data registered in Indexing Services. All types of supported URLs can be found in Section 5.2. A client can also push files to the session directory as a part of the job preparation. This way, the user can remotely run a snapshot of an environment prepared and tested on a local workstation, without the trouble of arranging space on storage services.

GM handles a cache of input files. Every URL requested for download is first checked against the contents of the cache. If a file associated with a given URL is already cached, the remote storage is contacted to check the file's validity and the access permissions based on the user's credentials. If a file stored in the cache is found to be valid, it is either copied to the session directory or a soft link is created, depending on the configuration of GM. The caching feature can be turned off either by the resource administrator or by the user submitting the job.

GM can automatically register the files stored in its cache to a Data Indexing Service. This information is then used by a brokering client (see Section 4.2) to direct jobs to a cluster with a high probability of fast access to the input data. This functionality has proved to be very useful when running many jobs which process the same input data.

It is possible to instruct GM to modify the files' URLs and hence access them through more suitable protocols and servers. This feature makes it possible to have so-called local Storage Elements in a way transparent to users. Together with the possibility to have this information published in the Information System (see Section 3.6), this may significantly reduce the job preparation costs.

After the contents of the session directory have been prepared, GM communicates with the Local Resource Management System (LRMS) to run the main executable of the job. GM uses a set of back-end shell scripts to perform such LRMS operations as job submission, detection of the exit state of the main executable, etc. Currently supported LRMS types include various PBS [23] flavours, Condor [1] and SGE [24]. There is also support for job execution on the front-end, meant for testing purposes only.

The output files generated by the job have to be listed in the job description together with their final destinations, or indicated to be kept on the computing resource for eventual retrieval.

After the job's main executable has stopped running, the output files are delivered to the specified destinations by the uploader module. The uploader is similar to the downloader described above. The session directories are preserved only temporarily after the session is finished. Hence, the user cannot directly re-use any produced data in sequential jobs and has to take care of retrieving output files kept on the computing resource as soon as possible.

Support for software runtime environments (RTE, see also Section 4.1) is also a part of GM. RTEs are meant to provide unified access to pre-installed software packages. In GM, support for RTEs is implemented through shell scripts executed a few times during the job's lifetime: after the preparation phase of the job, before running the main executable, and after it has finished (a sketch of such a script is given at the end of this Section). The scripts may set up environments needed by pre-installed software packages, and even modify the job description itself. To standardize the creation of new RTEs and to make it easier for users to find out how to use the available RTEs, a registration portal was established [25] by the NorduGrid partner, the Nordic Data Grid Facility.

A job passes through several states in GM, and each state can be assigned an external plugin. This way, the functionality of GM can be significantly extended with external components to include e.g. logging, fine-grained authorization, accounting, filtering, etc.

GM itself does not open a network port. Instead, it reads the information about job requests from the local file system. This information is delivered through the GridFTP channel by using a customized plugin for the ARC GridFTP server, named jobplugin. Information about the jobs served by GM and the contents of their session directories is presented by the same plugin through the virtual file system. Session directories are mapped to directories, and actions to various FTP commands and files. Access to every piece of information is protected, and flexible access control over the job data is possible. By default, every piece of information about a job is accessible only by a client which presents the same credentials as those used to submit the job. By adding GACL [26] rules to the job description, a user can allow other users to see the results of the job and even to manage it. Thus users can form groups sharing responsibility for submitted jobs.

GM has several other, less significant, features. These include notification of job status changes, support for site-local credentials through customizable plugins, reporting jobs to a centralized logging service (see Section 6.2), etc.
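As an illustration of the RTE mechanism, the sketch below shows what such a shell script might look like. The package name, the install path and the numeric stage-argument convention (0 after job preparation on the front-end, 1 on the node before the main executable, 2 after it finishes) are assumptions made for the purpose of the example.

    #!/bin/sh
    # Sketch of an RTE script for a hypothetical package APPS/PHYS/ANALYSIS-1.0.
    # GM is assumed to invoke the script with a stage argument at the three
    # points in the job's lifetime mentioned above.
    case "$1" in
      0) ;;                                        # after job preparation on the front-end
      1) export ANALYSIS_ROOT=/opt/analysis-1.0    # hypothetical install path
         export PATH="$ANALYSIS_ROOT/bin:$PATH"    # before the main executable runs
         ;;
      2) ;;                                        # after the main executable has finished
    esac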

3.4 Classic Storage Element

Most Storage Elements at sites equipped with ARC are GridFTP servers. A significant number of them use the ARC GridFTP server with one of the following plugins.

The fileplugin presents the underlying file system in a way similar to ordinary FTP servers, with file access based on a mapping between the client certificate subject and a local UNIX identity. It is also possible to specify static access restrictions for pre-defined directories directly in the configuration of the GridFTP server (GFS). In addition, there is a way to completely disregard the UNIX identities of the clients, relying only on the capability of GFS to create virtual file system trees, or to mix both approaches. This plugin is meant to be used as a replacement for an ordinary FTP server in user-mapping mode, or for user communities which do not need complex dynamic access control inside dedicated storage pools.

The gaclplugin uses GACL [26], an XML-based access control language, to control access to files and directories on a storage element. Each directory and file has an associated GACL document stored in a dedicated file. In order not to break compatibility with the FTP protocol, modification of GACL rules is performed by downloading and uploading those files. The GACL files themselves are not visible through FTP file listing commands. Each GACL document consists of entries with a list of identities and the access rules associated with them. There is no predefined set of supported types of identities. Currently, ARC can recognize an X.509 certificate subject (Distinguished Name); VOMS [20] names, groups, roles and capabilities; and membership of a predefined Virtual Organization (lists of members must be created by a third-party utility). GACL permissions control the possibility to read and write objects, list the contents of directories or get information about a stored file. The admin permission allows one to obtain and modify the contents of the GACL documents. A GridFTP server with the gaclplugin is most convenient for providing flexible dynamic access control based purely on Grid identities.
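For illustration, a minimal GACL document of the kind described above might grant one person full access and any authenticated user read-only access. The DN below is hypothetical, and the element spelling is taken from the GACL language [26] as we understand it; treat the fragment as a sketch rather than a normative example.

    <gacl>
      <entry>
        <person>
          <!-- hypothetical certificate subject -->
          <dn>/O=Grid/O=NorduGrid/OU=hep.lu.se/CN=Jane Doe</dn>
        </person>
        <allow><read/><list/><write/><admin/></allow>
      </entry>
      <entry>
        <any-user/>
        <allow><read/><list/></allow>
      </entry>
    </gacl>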

3.5 Smart Storage Element

A Smart Storage Element (SSE) is a data storage service that performs a basic set of the most important data management functions without user intervention. It is not meant to be used in an isolated way. Instead, its main purpose is to form a data storage infrastructure together with similar SSEs and Data Indexing Services, such as the Globus Replica Catalog or Replica Location Service. The main features of SSE and of the data storage infrastructure which can be implemented using SSE are as follows (see also Figure 3):

SSE can be used as a stand-alone service, but preferably should be part of a full infrastructure, complemented with Data Indexing Services (DIS). A DIS can be either a common-purpose or an application-specific one. The modular internal architecture of SSE allows adding (at the source code level) interfaces to previously unsupported DIS types.

SSE has no internal structure for storing data units (files). Files are identified by the identity used in a DIS (Logical File Name, GUID, Logical Path, etc.).

All operations on data units (creation, replication, deletion of a replica, etc.) are done upon a single request from a client through SSE. Hence most operations are atomic from the client's point of view. Nevertheless, objects created during those operations do not always have a final state and may evolve with time. For example, a created file can be filled with data, then validated, and then registered.

Entry points to a DIS infrastructure should be used by clients only for operations not involving data storage units. If an operation involves any modification of SSE content which has to be registered in the DIS, the communication with the DIS is done by the SSE itself.

Access to data units in SSE is based on the Grid identity of the client and is controlled by GACL rules, for flexibility.

Data transfer between SSEs is done upon request from a client, by the SSEs themselves (third-party transfers using a pull model).

The interface to SSE is based on SOAP over HTTPS/HTTPg (Web Services), with data transfer through pure HTTPS/HTTPg. It is implemented as a plugin for the HTTPSd framework (see Section 3.2).

The SOAP methods implemented in SSE allow it to:

Create a new data unit and prepare it to be either filled by the client, or to have its contents pulled from an external source. The SSE can pull data both from inside the infrastructure (replication) and from external sources, using the DataMove library (see Section 5.2) for data transfer; all the corresponding protocols are supported.

Get information about stored data units; regular-expression search for names is possible.

Inspect and modify the access rules of data units.

Update the metadata of stored units. This method is mostly used to provide checksums needed for content validation.

Remove data units.

Fig. 3. An example architecture of a Data Storage Infrastructure: clients control the infrastructure through the Data Indexing Services and download/upload data to the SSEs, which register their content in the indexing services and transfer data among themselves.

Recently, an SRM v1.1 [27] interface has been implemented for SSE. Work to provide support for the current standard, SRM v2.1.1, is in progress.

3.6 Information system

The ARC middleware implements a scalable, production-quality, dynamic, LDAP-based distributed information system via a set of coupled resource lists (Index Services) and local LDAP databases. The implementation of the ARC information system is based on OpenLDAP [28,29,30] and its modifications provided by the Globus MDS framework [31,32]. NorduGrid plans to replace the Globus LDAP modifications with native OpenLDAP functionalities. The system consists of three main components:

(1) Local Information Services (LIS),
(2) Index Services (IS),
(3) Registration Processes (RP).

The components and their connections, shown in the overview in Figure 4, are described in the following subsections.

Fig. 4. Overview of the ARC information system components. Clients issue type-1 queries (dotted arrows) against Index Services, which maintain a dynamic list of contact URLs of the LISs and of further Index Services registering to them (registration processes are marked by dashed arrows). Clients then perform type-2 queries (solid arrows) to discover resource properties stored in the LISs.

3.6.1 Local Information Service

Local Information Services (LIS) are responsible for resource (computing or storage) description and characterization. We note that the description of the Grid jobs running on a resource is also a part of the resource characterization: job status monitoring within ARC is done by querying the LIS components of the information system. The local information is generated on the resource, and optionally cached locally within the LDAP server. Upon client requests, the information is presented to the clients via the LDAP interface. A LIS is basically nothing more than a specially populated and customized OpenLDAP database. Figure 5, together with the description below, gives a detailed account of the internal structure of the LIS.

The dynamic resource state information is generated on the resource. Small and efficient programs, called information providers, are used to collect local state information from the batch system, from the local Grid layer (e.g., the Grid Manager or the GridFTP server), or from the local operating system (e.g., information available in the /proc area). Currently, ARC information providers are interfaced with the same systems as GM, namely the UNIX fork, the PBS family [23] (OpenPBS, PBS-Pro, Torque), Condor [1] and Sun Grid Engine (SGE) [24] batch systems.

The output of the information providers (generated in LDIF format) is used to populate the LIS. A special OpenLDAP back-end, the Globus GRIS [32], is used to store the LDIF output of the information providers. The custom GRIS back-end implements two functionalities: it is capable of caching the providers' output, and upon a client query it triggers the information providers unless the data is already available in the cache.

The caching feature of the OpenLDAP back-end provides protection against overloading the local resource by continuously triggering the information providers.

The information stored in the LISs follows the ARC information model. This model gives a natural representation of Grid resources and Grid entities. It is implemented via an LDAP schema by specifying an LDAP Directory Information Tree (DIT). The schema naturally describes Grid-enabled clusters, queues and storage elements. The schema, unlike other contemporary Grid schemas, also describes Grid jobs and authorized Grid users. A detailed description of this information model is given in the online manual [33].

Fig. 5. Internal structure of the Local Information Service.

3.6.2 Index Services

Index Services are used to maintain dynamic lists of available resources, containing the LDAP contact information of the registrants. A registrant can be either a LIS or another Index Service. The content of the Index Service, that is, the information about the registrants, is pushed by the registrants via the registration process and is periodically purged by the Index Service, this way maintaining a dynamic registrant list. Index Services are implemented as special-purpose LDAP databases: the registration entries are stored as pseudo-LDAP entries in the Globus GIIS LDAP back-end [32]. Although the registration information is presented in a valid LDIF format, the registration entries do not constitute a legitimate LDAP tree, therefore they are referred to as the pseudo-LDAP registration entries of the Index Service. A specially crafted LDAP query is required to obtain the registration entries. Besides the registrant tables, no other information is stored in the ARC indices.

The current implementation, based on such a misuse of the GIIS back-end, is planned to be replaced by something more suitable: the simplistic functionality of maintaining a dynamic list of registrant tables could be trivially implemented, e.g., in native LDAP. Furthermore, the ARC Index Service does not implement any internal query machinery: Index Services are basically plain lists of contact URLs of other Index Services and LISs.

3.6.3 Registration Processes, Topology

Individual LISs and Index Services need to be connected and organized into some sort of topological structure in order to create a coherent information system. ARC utilizes registration processes and Index Services to build a distributed information system out of the individual LISs and Index Services. During the registration process, the registrant (lower level) sends its registration packet to an Index Service. The packet contains information about the host (the registrant) initiating the process and the type of the information service running on the registrant (whether it is a LIS or an IS). The registration message is basically an LDAP contact URL of the information service running on the registrant. Registrations are always initiated by the registrants (bottom-up model) and are performed periodically, thus the registration mechanism follows a periodic push model. Technically, the registrations are implemented as periodic ldapadd operations.

The LISs and the Index Services of ARC are organized into a multi-level tree hierarchy via the registration process. The LISs that describe storage or computing resources represent the lowest level of the tree-like topology. Resources register to first-level Index Services, which in turn register to second-level Index Services, and so forth. The registration chain ends at the Top Level Indices, which represent the root of the hierarchy tree. The structure is built from bottom to top: the lower level always registers to the higher one.

The tree-like hierarchical structure is motivated by the natural geographical organization, where resources belonging to the same region register under a region index, region indices register to the appropriate country index, while country indices are grouped together and register to the top-level Grid Index Services. Besides the geographical structuring, there are some Index Services which group resources by a specific application area or organization. These application/organization Index Services either link themselves under a country index or register directly to a Top Level Index. In order to avoid any single point of failure, ARC suggests a multi-rooted tree with several top-level Indices. Figure 6 shows a simplified schematic view of the hierarchical multi-rooted tree topology of ARC-connected resources and Index Services.

3.6.4 Operational overview

Grid clients such as monitors, Web portals or user interfaces (Section 4) perform two types of queries against the Information System:

(1) During the resource discovery process, clients query Index Services in order to collect the list of LDAP contact URLs of the Local Information Services describing Grid resources.

(2) During a direct resource query, clients contact the LISs directly, making use of the obtained LDAP contact URLs.

Once the client has collected the list of resource LDAP contact URLs (by performing type-1 queries against the Index Services), the second phase of the information collection begins: type-2 queries are issued against the LISs to gather resource descriptions. Both types of queries are carried out and served via the LDAP protocol (a sketch of both is given below). Unlike other similar-purpose systems (the Globus Toolkit 2 GIIS, the Globus Toolkit 4 aggregator [34], or R-GMA [35]), ARC has no service which caches or aggregates resource-specific information on a higher level. ARC Index Services are not used to store resource information; they only maintain a dynamic list of LDAP URLs. Figure 4 presents an overview of the ARC information system components, showing the relations between the Index Services, the LISs, the registration processes and the two types of queries.

Fig. 6. Resources (LISs) and Index Services are linked via the registration process, creating a hierarchical multi-rooted tree topology. The hierarchy has several levels: the ω lowest level (populated by LISs), the γ organisation/project level, the β country/international project level and the α top (root) Grid level. The tree is multi-rooted: there are several α-level Index Services, and in bigger countries there are several β- or country-level Index Services.
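To make the two query types concrete, the sketch below uses the standard OpenLDAP command-line client. The host names are hypothetical; port 2135, the mds-vo-name=local,o=grid base and the nordugrid-cluster object class follow the conventions of the ARC information system, while the giisregistrationstatus attribute is assumed here to be the "specially crafted" handle for reading registration entries.

    # Type-1 query (sketch): ask an Index Service for its registrants,
    # i.e. the LDAP contact URLs of LISs and lower-level Index Services.
    ldapsearch -x -h index.example.org -p 2135 \
        -b 'mds-vo-name=Sweden,o=grid' giisregistrationstatus

    # Type-2 query (sketch): contact a resource's LIS directly and
    # retrieve its cluster description.
    ldapsearch -x -h cluster.example.org -p 2135 \
        -b 'mds-vo-name=local,o=grid' '(objectClass=nordugrid-cluster)'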

3.7 Security framework of services

The security model of the ARC middleware is built upon the Grid Security Infrastructure (GSI) [3]. Currently, for communication ARC services do not use message-level security, but rather transport-level security (see [21]). The ARC services that make use of authentication and authorization are those built on top of the GridFTP server and the HTTPSd framework.

The authentication part makes use of Certificate Authority (CA) signing policies, whereby sites can limit the signing namespace of a CA. Moreover, ARC provides a utility (grid-update-crls) to retrieve certificate revocation lists on a regular basis.

In a basic and largely obsolete authorization model, ARC services support a simple mapping of X.509 certificate subjects (Distinguished Names) to UNIX users through the grid-mapfile, sketched below. Automatic grid-mapfile creation can be done using the nordugridmap utility, which can fetch Virtual Organization membership information from LDAP databases, and simple X.509 subject lists through FTP and HTTP(S). In addition, it is possible to authorize local users and to specify filters and black-lists with this utility.

For advanced authorization, most of the ARC services are capable of acting purely on the Grid identity of the connecting client, without relying on local identities. Those services support a unified set of rules which allows them to identify users, group them and choose an adequate service level. In addition to the personal Grid identity of a user, other identification mechanisms are supported, among them the Virtual Organization Membership Service (VOMS) and general VO membership (through external information gathering tools). It is also possible to use external identification plugins, thus providing a high level of flexibility.

In addition to access-level authorization, many services make use of the clients' Grid identities internally to provide fine-grained access control. Currently, all such services use the GACL language for that purpose. Fine-grained access control is available in the GridFTP server's file access and job control plugins and in the SSE service. The job handling service, the Grid Manager, also has a pluggable authorization mechanism that can provide fine-grained authorization and accounting during job execution (see Section 3.3) through third-party plugins.
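For illustration, a grid-mapfile is simply a list of certificate subjects and the local accounts they are mapped to; the DNs and account names below are hypothetical.

    # grid-mapfile sketch: each line maps a certificate subject (DN)
    # to a local UNIX account (all entries are hypothetical).
    "/O=Grid/O=NorduGrid/OU=hep.lu.se/CN=Jane Doe" griduser1
    "/O=Grid/O=NorduGrid/OU=fys.uio.no/CN=Ola Nordmann" griduser2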

4 User interfaces and clients

Users interact with the ARC services on the Grid resources using various client tools. With the help of the basic command line interface (CLI, Section 4.2), users can submit jobs to the Grid, monitor their progress and perform basic job and file manipulations. Jobs submitted through the command line interface must be described in terms of the Extended Resource Specification Language (Section 4.1). The basic command line interface for job handling is similar to the set of commands found in a local resource management system. In addition, there are commands that can be used for basic data management, like creating and registering files on storage elements and administering access rights to the data. In addition to the CLI, the ARC client package includes some commands from the Globus Toolkit, in particular those needed for certificate handling and data indexing (see Section 4.3).

Several other clients interacting with the ARC services have been developed as well. One example is the ARC Grid Monitor (see Section 4.4), which presents the information from the Information System in a Web browser.

4.1 Resource Specification Language

The XRSL language used by ARC to describe job options is an extension of the core Globus RSL 1.0 [36]. The extensions were prompted by the specifics of the architectural solution, and resulted in the introduction of new attributes. In addition, a differentiation between two levels of job option specification was introduced. The user-side XRSL is the set of attributes specified by a user in a job description file. This file is interpreted and pre-processed by the brokering client (see Section 4.2), and after the necessary modifications is passed to the Grid Manager (GM). The GM-side XRSL is the set of attributes prepared by the client, ready to be interpreted by GM. The purpose of XRSL in ARC is therefore dual: it is used not only by users to describe job requirements, but also by the client and GM as a communication language.

The major challenge for many applications is pre- and post-staging of a considerable number of files, often of large size. To reflect this, two new attributes were introduced in XRSL: inputfiles and outputfiles. Each of them is a list of local-remote file name or URL pairs. Input files local to the submission node are pushed to the session directory by the client, while remote ones are staged in by the Grid Manager (see Section 3.3).
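As an illustration, a minimal user-side XRSL job description using these attributes might look as follows. The executable, file names, URLs and runtime environment name are hypothetical; inputfiles and outputfiles are the attributes discussed above, while the remaining attribute names are assumed to follow the usual XRSL conventions.

    (* Sketch of a user-side XRSL description for a serial analysis job. *)
    &(executable="run_analysis.sh")
     (inputfiles=("events.dat" "gsiftp://se.example.org/data/events.dat"))
     (outputfiles=("result.dat" "gsiftp://se.example.org/results/result.dat"))
     (runtimeenvironment="APPS/PHYS/ANALYSIS-1.0")
     (cputime="600")
     (stdout="out.txt")
     (stderr="err.txt")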


Data Management 1. Grid data management. Different sources of data. Sensors Analytic equipment Measurement tools and devices Data Management 1 Grid data management Different sources of data Sensors Analytic equipment Measurement tools and devices Need to discover patterns in data to create information Need mechanisms to deal

More information

L3.4. Data Management Techniques. Frederic Desprez Benjamin Isnard Johan Montagnat

L3.4. Data Management Techniques. Frederic Desprez Benjamin Isnard Johan Montagnat Grid Workflow Efficient Enactment for Data Intensive Applications L3.4 Data Management Techniques Authors : Eddy Caron Frederic Desprez Benjamin Isnard Johan Montagnat Summary : This document presents

More information

PARALLEL PROGRAM EXECUTION SUPPORT IN THE JGRID SYSTEM

PARALLEL PROGRAM EXECUTION SUPPORT IN THE JGRID SYSTEM PARALLEL PROGRAM EXECUTION SUPPORT IN THE JGRID SYSTEM Szabolcs Pota 1, Gergely Sipos 2, Zoltan Juhasz 1,3 and Peter Kacsuk 2 1 Department of Information Systems, University of Veszprem, Hungary 2 Laboratory

More information

NUSGRID a computational grid at NUS

NUSGRID a computational grid at NUS NUSGRID a computational grid at NUS Grace Foo (SVU/Academic Computing, Computer Centre) SVU is leading an initiative to set up a campus wide computational grid prototype at NUS. The initiative arose out

More information

Grid Programming: Concepts and Challenges. Michael Rokitka CSE510B 10/2007

Grid Programming: Concepts and Challenges. Michael Rokitka CSE510B 10/2007 Grid Programming: Concepts and Challenges Michael Rokitka SUNY@Buffalo CSE510B 10/2007 Issues Due to Heterogeneous Hardware level Environment Different architectures, chipsets, execution speeds Software

More information

ThinAir Server Platform White Paper June 2000

ThinAir Server Platform White Paper June 2000 ThinAir Server Platform White Paper June 2000 ThinAirApps, Inc. 1999, 2000. All Rights Reserved Copyright Copyright 1999, 2000 ThinAirApps, Inc. all rights reserved. Neither this publication nor any part

More information

A New Grid Manager for NorduGrid

A New Grid Manager for NorduGrid Master s Thesis A New Grid Manager for NorduGrid A Transitional Path Thomas Christensen Rasmus Aslak Kjær June 2nd, 2005 Aalborg University Aalborg University Department of Computer Science, Frederik Bajers

More information

Cloud Computing. Up until now

Cloud Computing. Up until now Cloud Computing Lecture 4 and 5 Grid: 2012-2013 Introduction. Up until now Definition of Cloud Computing. Grid Computing: Schedulers: Condor SGE 1 Summary Core Grid: Toolkit Condor-G Grid: Conceptual Architecture

More information

Technical Overview. Access control lists define the users, groups, and roles that can access content as well as the operations that can be performed.

Technical Overview. Access control lists define the users, groups, and roles that can access content as well as the operations that can be performed. Technical Overview Technical Overview Standards based Architecture Scalable Secure Entirely Web Based Browser Independent Document Format independent LDAP integration Distributed Architecture Multiple

More information

Distributed Computing Environment (DCE)

Distributed Computing Environment (DCE) Distributed Computing Environment (DCE) Distributed Computing means computing that involves the cooperation of two or more machines communicating over a network as depicted in Fig-1. The machines participating

More information

Installing and Administering a Satellite Environment

Installing and Administering a Satellite Environment IBM DB2 Universal Database Installing and Administering a Satellite Environment Version 8 GC09-4823-00 IBM DB2 Universal Database Installing and Administering a Satellite Environment Version 8 GC09-4823-00

More information

An Introduction to GPFS

An Introduction to GPFS IBM High Performance Computing July 2006 An Introduction to GPFS gpfsintro072506.doc Page 2 Contents Overview 2 What is GPFS? 3 The file system 3 Application interfaces 4 Performance and scalability 4

More information

Understanding StoRM: from introduction to internals

Understanding StoRM: from introduction to internals Understanding StoRM: from introduction to internals 13 November 2007 Outline Storage Resource Manager The StoRM service StoRM components and internals Deployment configuration Authorization and ACLs Conclusions.

More information

LCG-2 and glite Architecture and components

LCG-2 and glite Architecture and components LCG-2 and glite Architecture and components Author E.Slabospitskaya www.eu-egee.org Outline Enabling Grids for E-sciencE What are LCG-2 and glite? glite Architecture Release 1.0 review What is glite?.

More information

Chapter 4:- Introduction to Grid and its Evolution. Prepared By:- NITIN PANDYA Assistant Professor SVBIT.

Chapter 4:- Introduction to Grid and its Evolution. Prepared By:- NITIN PANDYA Assistant Professor SVBIT. Chapter 4:- Introduction to Grid and its Evolution Prepared By:- Assistant Professor SVBIT. Overview Background: What is the Grid? Related technologies Grid applications Communities Grid Tools Case Studies

More information

A Federated Grid Environment with Replication Services

A Federated Grid Environment with Replication Services A Federated Grid Environment with Replication Services Vivek Khurana, Max Berger & Michael Sobolewski SORCER Research Group, Texas Tech University Grids can be classified as computational grids, access

More information

A Simple Mass Storage System for the SRB Data Grid

A Simple Mass Storage System for the SRB Data Grid A Simple Mass Storage System for the SRB Data Grid Michael Wan, Arcot Rajasekar, Reagan Moore, Phil Andrews San Diego Supercomputer Center SDSC/UCSD/NPACI Outline Motivations for implementing a Mass Storage

More information

A RESOURCE MANAGEMENT FRAMEWORK FOR INTERACTIVE GRIDS

A RESOURCE MANAGEMENT FRAMEWORK FOR INTERACTIVE GRIDS A RESOURCE MANAGEMENT FRAMEWORK FOR INTERACTIVE GRIDS Raj Kumar, Vanish Talwar, Sujoy Basu Hewlett-Packard Labs 1501 Page Mill Road, MS 1181 Palo Alto, CA 94304 USA { raj.kumar,vanish.talwar,sujoy.basu}@hp.com

More information

Overview SENTINET 3.1

Overview SENTINET 3.1 Overview SENTINET 3.1 Overview 1 Contents Introduction... 2 Customer Benefits... 3 Development and Test... 3 Production and Operations... 4 Architecture... 5 Technology Stack... 7 Features Summary... 7

More information

Layered Architecture

Layered Architecture The Globus Toolkit : Introdution Dr Simon See Sun APSTC 09 June 2003 Jie Song, Grid Computing Specialist, Sun APSTC 2 Globus Toolkit TM An open source software toolkit addressing key technical problems

More information

Configuring the Oracle Network Environment. Copyright 2009, Oracle. All rights reserved.

Configuring the Oracle Network Environment. Copyright 2009, Oracle. All rights reserved. Configuring the Oracle Network Environment Objectives After completing this lesson, you should be able to: Use Enterprise Manager to: Create additional listeners Create Oracle Net Service aliases Configure

More information

Control-M and Payment Card Industry Data Security Standard (PCI DSS)

Control-M and Payment Card Industry Data Security Standard (PCI DSS) Control-M and Payment Card Industry Data Security Standard (PCI DSS) White paper PAGE 1 OF 16 Copyright BMC Software, Inc. 2016 Contents Introduction...3 The Need...3 PCI DSS Related to Control-M...4 Control-M

More information

THE GLOBUS PROJECT. White Paper. GridFTP. Universal Data Transfer for the Grid

THE GLOBUS PROJECT. White Paper. GridFTP. Universal Data Transfer for the Grid THE GLOBUS PROJECT White Paper GridFTP Universal Data Transfer for the Grid WHITE PAPER GridFTP Universal Data Transfer for the Grid September 5, 2000 Copyright 2000, The University of Chicago and The

More information

g-eclipse A Framework for Accessing Grid Infrastructures Nicholas Loulloudes Trainer, University of Cyprus (loulloudes.n_at_cs.ucy.ac.

g-eclipse A Framework for Accessing Grid Infrastructures Nicholas Loulloudes Trainer, University of Cyprus (loulloudes.n_at_cs.ucy.ac. g-eclipse A Framework for Accessing Grid Infrastructures Trainer, University of Cyprus (loulloudes.n_at_cs.ucy.ac.cy) EGEE Training the Trainers May 6 th, 2009 Outline Grid Reality The Problem g-eclipse

More information

ARC integration for CMS

ARC integration for CMS ARC integration for CMS ARC integration for CMS Erik Edelmann 2, Laurence Field 3, Jaime Frey 4, Michael Grønager 2, Kalle Happonen 1, Daniel Johansson 2, Josva Kleist 2, Jukka Klem 1, Jesper Koivumäki

More information

dcache Introduction Course

dcache Introduction Course GRIDKA SCHOOL 2013 KARLSRUHER INSTITUT FÜR TECHNOLOGIE KARLSRUHE August 29, 2013 dcache Introduction Course Overview Chapters I, II and Ⅴ christoph.anton.mitterer@lmu.de I. Introduction To dcache Slide

More information

Grid Computing Systems: A Survey and Taxonomy

Grid Computing Systems: A Survey and Taxonomy Grid Computing Systems: A Survey and Taxonomy Material for this lecture from: A Survey and Taxonomy of Resource Management Systems for Grid Computing Systems, K. Krauter, R. Buyya, M. Maheswaran, CS Technical

More information

DIRAC pilot framework and the DIRAC Workload Management System

DIRAC pilot framework and the DIRAC Workload Management System Journal of Physics: Conference Series DIRAC pilot framework and the DIRAC Workload Management System To cite this article: Adrian Casajus et al 2010 J. Phys.: Conf. Ser. 219 062049 View the article online

More information

ESET Remote Administrator 6. Version 6.0 Product Details

ESET Remote Administrator 6. Version 6.0 Product Details ESET Remote Administrator 6 Version 6.0 Product Details ESET Remote Administrator 6.0 is a successor to ESET Remote Administrator V5.x, however represents a major step forward, completely new generation

More information

Adaptive Cluster Computing using JavaSpaces

Adaptive Cluster Computing using JavaSpaces Adaptive Cluster Computing using JavaSpaces Jyoti Batheja and Manish Parashar The Applied Software Systems Lab. ECE Department, Rutgers University Outline Background Introduction Related Work Summary of

More information

The University of Oxford campus grid, expansion and integrating new partners. Dr. David Wallom Technical Manager

The University of Oxford campus grid, expansion and integrating new partners. Dr. David Wallom Technical Manager The University of Oxford campus grid, expansion and integrating new partners Dr. David Wallom Technical Manager Outline Overview of OxGrid Self designed components Users Resources, adding new local or

More information

PoS(EGICF12-EMITC2)081

PoS(EGICF12-EMITC2)081 University of Oslo, P.b.1048 Blindern, N-0316 Oslo, Norway E-mail: aleksandr.konstantinov@fys.uio.no Martin Skou Andersen Niels Bohr Institute, Blegdamsvej 17, 2100 København Ø, Denmark E-mail: skou@nbi.ku.dk

More information

Service Mesh and Microservices Networking

Service Mesh and Microservices Networking Service Mesh and Microservices Networking WHITEPAPER Service mesh and microservice networking As organizations adopt cloud infrastructure, there is a concurrent change in application architectures towards

More information

WHITE PAPER Cloud FastPath: A Highly Secure Data Transfer Solution

WHITE PAPER Cloud FastPath: A Highly Secure Data Transfer Solution WHITE PAPER Cloud FastPath: A Highly Secure Data Transfer Solution Tervela helps companies move large volumes of sensitive data safely and securely over network distances great and small. We have been

More information

WebSphere Application Server, Version 5. What s New?

WebSphere Application Server, Version 5. What s New? WebSphere Application Server, Version 5 What s New? 1 WebSphere Application Server, V5 represents a continuation of the evolution to a single, integrated, cost effective, Web services-enabled, J2EE server

More information

ZL UA Exchange 2013 Archiving Configuration Guide

ZL UA Exchange 2013 Archiving Configuration Guide ZL UA Exchange 2013 Archiving Configuration Guide Version 8.0 January 2014 ZL Technologies, Inc. Copyright 2014 ZL Technologies, Inc.All rights reserved ZL Technologies, Inc. ( ZLTI, formerly known as

More information

Client tools know everything

Client tools know everything Scheduling, clients Client tools know everything Application database Certificate Certificate Authorised users directory Certificate Policies Grid job management service Data Certificate Certificate Researcher

More information

Introduction to Programming and Computing for Scientists

Introduction to Programming and Computing for Scientists Oxana Smirnova (Lund University) Programming for Scientists Lecture 4 1 / 44 Introduction to Programming and Computing for Scientists Oxana Smirnova Lund University Lecture 4: Distributed computing Most

More information

EUROPEAN MIDDLEWARE INITIATIVE

EUROPEAN MIDDLEWARE INITIATIVE EUROPEAN MIDDLEWARE INITIATIVE VOMS CORE AND WMS SECURITY ASSESSMENT EMI DOCUMENT Document identifier: EMI-DOC-SA2- VOMS_WMS_Security_Assessment_v1.0.doc Activity: Lead Partner: Document status: Document

More information

An Architecture For Computational Grids Based On Proxy Servers

An Architecture For Computational Grids Based On Proxy Servers An Architecture For Computational Grids Based On Proxy Servers P. V. C. Costa, S. D. Zorzo, H. C. Guardia {paulocosta,zorzo,helio}@dc.ufscar.br UFSCar Federal University of São Carlos, Brazil Abstract

More information

GT-OGSA Grid Service Infrastructure

GT-OGSA Grid Service Infrastructure Introduction to GT3 Background The Grid Problem The Globus Approach OGSA & OGSI Globus Toolkit GT3 Architecture and Functionality: The Latest Refinement of the Globus Toolkit Core Base s User-Defined s

More information

Chapter 3. Design of Grid Scheduler. 3.1 Introduction

Chapter 3. Design of Grid Scheduler. 3.1 Introduction Chapter 3 Design of Grid Scheduler The scheduler component of the grid is responsible to prepare the job ques for grid resources. The research in design of grid schedulers has given various topologies

More information

Extended Search Administration

Extended Search Administration IBM Lotus Extended Search Extended Search Administration Version 4 Release 0.1 SC27-1404-02 IBM Lotus Extended Search Extended Search Administration Version 4 Release 0.1 SC27-1404-02 Note! Before using

More information

ForeScout Extended Module for Carbon Black

ForeScout Extended Module for Carbon Black ForeScout Extended Module for Carbon Black Version 1.0 Table of Contents About the Carbon Black Integration... 4 Advanced Threat Detection with the IOC Scanner Plugin... 4 Use Cases... 5 Carbon Black Agent

More information

IBM Tivoli Directory Server

IBM Tivoli Directory Server Build a powerful, security-rich data foundation for enterprise identity management IBM Tivoli Directory Server Highlights Support hundreds of millions of entries by leveraging advanced reliability and

More information

X100 ARCHITECTURE REFERENCES:

X100 ARCHITECTURE REFERENCES: UNION SYSTEMS GLOBAL This guide is designed to provide you with an highlevel overview of some of the key points of the Oracle Fusion Middleware Forms Services architecture, a component of the Oracle Fusion

More information

MONTE CARLO SIMULATION FOR RADIOTHERAPY IN A DISTRIBUTED COMPUTING ENVIRONMENT

MONTE CARLO SIMULATION FOR RADIOTHERAPY IN A DISTRIBUTED COMPUTING ENVIRONMENT The Monte Carlo Method: Versatility Unbounded in a Dynamic Computing World Chattanooga, Tennessee, April 17-21, 2005, on CD-ROM, American Nuclear Society, LaGrange Park, IL (2005) MONTE CARLO SIMULATION

More information

Knowledge Discovery Services and Tools on Grids

Knowledge Discovery Services and Tools on Grids Knowledge Discovery Services and Tools on Grids DOMENICO TALIA DEIS University of Calabria ITALY talia@deis.unical.it Symposium ISMIS 2003, Maebashi City, Japan, Oct. 29, 2003 OUTLINE Introduction Grid

More information

Vortex Whitepaper. Simplifying Real-time Information Integration in Industrial Internet of Things (IIoT) Control Systems

Vortex Whitepaper. Simplifying Real-time Information Integration in Industrial Internet of Things (IIoT) Control Systems Vortex Whitepaper Simplifying Real-time Information Integration in Industrial Internet of Things (IIoT) Control Systems www.adlinktech.com 2017 Table of Contents 1. Introduction........ P 3 2. Iot and

More information

Textual Description of webbioc

Textual Description of webbioc Textual Description of webbioc Colin A. Smith October 13, 2014 Introduction webbioc is a web interface for some of the Bioconductor microarray analysis packages. It is designed to be installed at local

More information

Grid Computing with Voyager

Grid Computing with Voyager Grid Computing with Voyager By Saikumar Dubugunta Recursion Software, Inc. September 28, 2005 TABLE OF CONTENTS Introduction... 1 Using Voyager for Grid Computing... 2 Voyager Core Components... 3 Code

More information

OpenIAM Identity and Access Manager Technical Architecture Overview

OpenIAM Identity and Access Manager Technical Architecture Overview OpenIAM Identity and Access Manager Technical Architecture Overview Overview... 3 Architecture... 3 Common Use Case Description... 3 Identity and Access Middleware... 5 Enterprise Service Bus (ESB)...

More information

A distributed tier-1. International Conference on Computing in High Energy and Nuclear Physics (CHEP 07) IOP Publishing. c 2008 IOP Publishing Ltd 1

A distributed tier-1. International Conference on Computing in High Energy and Nuclear Physics (CHEP 07) IOP Publishing. c 2008 IOP Publishing Ltd 1 A distributed tier-1 L Fischer 1, M Grønager 1, J Kleist 2 and O Smirnova 3 1 NDGF - Nordic DataGrid Facilty, Kastruplundgade 22(1), DK-2770 Kastrup 2 NDGF and Aalborg University, Department of Computer

More information

Grid Computing Middleware. Definitions & functions Middleware components Globus glite

Grid Computing Middleware. Definitions & functions Middleware components Globus glite Seminar Review 1 Topics Grid Computing Middleware Grid Resource Management Grid Computing Security Applications of SOA and Web Services Semantic Grid Grid & E-Science Grid Economics Cloud Computing 2 Grid

More information

igrid: a Relational Information Service A novel resource & service discovery approach

igrid: a Relational Information Service A novel resource & service discovery approach igrid: a Relational Information Service A novel resource & service discovery approach Italo Epicoco, Ph.D. University of Lecce, Italy Italo.epicoco@unile.it Outline of the talk Requirements & features

More information

Sentinet for BizTalk Server SENTINET

Sentinet for BizTalk Server SENTINET Sentinet for BizTalk Server SENTINET Sentinet for BizTalk Server 1 Contents Introduction... 2 Sentinet Benefits... 3 SOA and API Repository... 4 Security... 4 Mediation and Virtualization... 5 Authentication

More information

Oracle Warehouse Builder 10g Release 2 Integrating Packaged Applications Data

Oracle Warehouse Builder 10g Release 2 Integrating Packaged Applications Data Oracle Warehouse Builder 10g Release 2 Integrating Packaged Applications Data June 2006 Note: This document is for informational purposes. It is not a commitment to deliver any material, code, or functionality,

More information

Enterprise Directories and Security Management: Merging Technologies for More Control

Enterprise Directories and Security Management: Merging Technologies for More Control Enterprise Directories and Security Management: Merging Technologies for More Control Contents Introduction...3 Directory Services...3 A brief history of LDAP...3 LDAP today...4 Free and commercially available

More information

MONITORING OF GRID RESOURCES

MONITORING OF GRID RESOURCES MONITORING OF GRID RESOURCES Nikhil Khandelwal School of Computer Engineering Nanyang Technological University Nanyang Avenue, Singapore 639798 e-mail:a8156178@ntu.edu.sg Lee Bu Sung School of Computer

More information

Outline. Definition of a Distributed System Goals of a Distributed System Types of Distributed Systems

Outline. Definition of a Distributed System Goals of a Distributed System Types of Distributed Systems Distributed Systems Outline Definition of a Distributed System Goals of a Distributed System Types of Distributed Systems What Is A Distributed System? A collection of independent computers that appears

More information

Grid Middleware and Globus Toolkit Architecture

Grid Middleware and Globus Toolkit Architecture Grid Middleware and Globus Toolkit Architecture Lisa Childers Argonne National Laboratory University of Chicago 2 Overview Grid Middleware The problem: supporting Virtual Organizations equirements Capabilities

More information

MythoLogic: problems and their solutions in the evolution of a project

MythoLogic: problems and their solutions in the evolution of a project 6 th International Conference on Applied Informatics Eger, Hungary, January 27 31, 2004. MythoLogic: problems and their solutions in the evolution of a project István Székelya, Róbert Kincsesb a Department

More information

SAML-Based SSO Solution

SAML-Based SSO Solution About SAML SSO Solution, page 1 Single Sign on Single Service Provider Agreement, page 2 SAML-Based SSO Features, page 2 Basic Elements of a SAML SSO Solution, page 3 Cisco Unified Communications Applications

More information

SAS Environment Manager A SAS Viya Administrator s Swiss Army Knife

SAS Environment Manager A SAS Viya Administrator s Swiss Army Knife Paper SAS2260-2018 SAS Environment Manager A SAS Viya Administrator s Swiss Army Knife Michelle Ryals, Trevor Nightingale, SAS Institute Inc. ABSTRACT The latest version of SAS Viya brings with it a wealth

More information

TFS WorkstationControl White Paper

TFS WorkstationControl White Paper White Paper Intelligent Public Key Credential Distribution and Workstation Access Control TFS Technology www.tfstech.com Table of Contents Overview 3 Introduction 3 Important Concepts 4 Logon Modes 4 Password

More information

Scientific Computing with UNICORE

Scientific Computing with UNICORE Scientific Computing with UNICORE Dirk Breuer, Dietmar Erwin Presented by Cristina Tugurlan Outline Introduction Grid Computing Concepts Unicore Arhitecture Unicore Capabilities Unicore Globus Interoperability

More information

Grid Scheduling Architectures with Globus

Grid Scheduling Architectures with Globus Grid Scheduling Architectures with Workshop on Scheduling WS 07 Cetraro, Italy July 28, 2007 Ignacio Martin Llorente Distributed Systems Architecture Group Universidad Complutense de Madrid 1/38 Contents

More information

IBM Spectrum Protect Version Introduction to Data Protection Solutions IBM

IBM Spectrum Protect Version Introduction to Data Protection Solutions IBM IBM Spectrum Protect Version 8.1.2 Introduction to Data Protection Solutions IBM IBM Spectrum Protect Version 8.1.2 Introduction to Data Protection Solutions IBM Note: Before you use this information

More information

Index Introduction Setting up an account Searching and accessing Download Advanced features

Index Introduction Setting up an account Searching and accessing Download Advanced features ESGF Earth System Grid Federation Tutorial Index Introduction Setting up an account Searching and accessing Download Advanced features Index Introduction IT Challenges of Climate Change Research ESGF Introduction

More information

Entrust. Discovery 2.4. Administration Guide. Document issue: 3.0. Date of issue: June 2014

Entrust. Discovery 2.4. Administration Guide. Document issue: 3.0. Date of issue: June 2014 Entrust Discovery 2.4 Administration Guide Document issue: 3.0 Date of issue: June 2014 Copyright 2010-2014 Entrust. All rights reserved. Entrust is a trademark or a registered trademark of Entrust, Inc.

More information

Connection Broker Managing User Connections to Workstations and Blades, OpenStack Clouds, VDI, and More

Connection Broker Managing User Connections to Workstations and Blades, OpenStack Clouds, VDI, and More Connection Broker Managing User Connections to Workstations and Blades, OpenStack Clouds, VDI, and More Quick Start Using Leostream with Citrix XenDesktop 7 and HDX Version 8.1 January 14, 2016 Contacting

More information

A Survey Paper on Grid Information Systems

A Survey Paper on Grid Information Systems B 534 DISTRIBUTED SYSTEMS A Survey Paper on Grid Information Systems Anand Hegde 800 North Smith Road Bloomington Indiana 47408 aghegde@indiana.edu ABSTRACT Grid computing combines computers from various

More information

Monitoring ARC services with GangliARC

Monitoring ARC services with GangliARC Journal of Physics: Conference Series Monitoring ARC services with GangliARC To cite this article: D Cameron and D Karpenko 2012 J. Phys.: Conf. Ser. 396 032018 View the article online for updates and

More information

OnCommand Cloud Manager 3.2 Deploying and Managing ONTAP Cloud Systems

OnCommand Cloud Manager 3.2 Deploying and Managing ONTAP Cloud Systems OnCommand Cloud Manager 3.2 Deploying and Managing ONTAP Cloud Systems April 2017 215-12035_C0 doccomments@netapp.com Table of Contents 3 Contents Before you create ONTAP Cloud systems... 5 Logging in

More information

Xerox Device Data Collector 1.1 Security and Evaluation Guide

Xerox Device Data Collector 1.1 Security and Evaluation Guide Xerox Device Data Collector 1.1 Security and Evaluation Guide 2009 Xerox Corporation. All rights reserved. Xerox, WorkCentre, Phaser and the sphere of connectivity design are trademarks of Xerox Corporation

More information

Building a Real-time Notification System

Building a Real-time Notification System Building a Real-time Notification System September 2015, Geneva Author: Jorge Vicente Cantero Supervisor: Jiri Kuncar CERN openlab Summer Student Report 2015 Project Specification Configurable Notification

More information