ARC middleware

The NorduGrid Collaboration


Abstract

The paper describes the existing components of ARC, discusses some of the new components, functionalities and enhancements currently under development, and outlines the architectural and technical choices that have been made to ensure scalability, stability and high performance.

1 Introduction

Computational Grids are not a novel idea. The Condor project [1] led the way in 1988 by setting the goal of creating a computing environment with heterogeneous distributed resources. Further efforts to create a layer for secure, coordinated access to geographically distributed computing and storage resources resulted in the appearance of the Globus Toolkit [2] and its Grid Security Infrastructure (GSI) [3] in the late 1990s. The words "Grid" and "middleware" were coined to describe the new phenomenon of single-login access to widely distributed computing resources. Addressing the needs of the Nordic research community (Denmark, Finland, Iceland, Norway and Sweden), the NorduGrid collaboration [4] started in 2002 to develop a software solution for wide-area computing and data handling [5], making use of the by then de facto standard Globus Toolkit 2. This new open-source middleware is now called the Advanced Resource Connector (ARC).

Scientific and academic computing in the Nordic countries has a specific feature: it is carried out by a large number of small and medium-size facilities of different kinds and ownership. An important requirement from the very start of ARC development has therefore been to keep the middleware portable, compact and manageable on both the server and the client side. The client package currently occupies only 14 Megabytes; it is available for most current Linux distributions and can be installed at any available location by a non-privileged user. For a computing service, only three main processes are needed: those of the file transfer service (GridFTP, see Section 3.1), the Grid Manager (see Section 3.3) and the Local Information Service (see Section 3.6.1).

Recently, ARC development and deployment has been funded by the Nordic Natural Science Research Council (NOS-N), with the aim to provide a foundation for a large-scale, multi-disciplinary Nordic Data Grid Facility [6]. ARC is the middleware of choice for this facility and for many other Grid infrastructures, such as the Estonian Grid [7], Swegrid [8], DCGC [9], the Swiss ATLAS Grid [10] and others. The middleware supports production-level systems that have been in non-stop operation since 2002 [11], and has proven its record in demanding High Energy Physics computing tasks [12,13], being the first middleware ever to provide services to such a massive world-wide data processing effort.

This paper gives a comprehensive technical overview of the ARC middleware, its architecture and components: the existing core ones as well as the recently introduced, beta-quality services. The rest of the paper is organised as follows. In Section 2, the basic architecture of ARC is presented. Section 3 discusses various services and Section 4 describes the client components and user interfaces. Sections 5 and 6 are more development-oriented, discussing the developer interfaces and libraries, recent additions to ARC, and software dependencies and packaging, respectively. A summary is provided in Section 7.

2 Architecture

The architecture and design of the ARC middleware have been driven by the requirements and specifics of the user community that initiated the project, the Nordic ATLAS [14] groups. The basic set of high-level requirements is no different from that of any other community, be it users, resource owners or software developers: the system must be performant, reliable, scalable, secure and simple. However, the environment that led to the creation of ARC has some specifics that can be outlined as follows:

An excellent academic network (NORDUNet [15]), providing a solid basis for deployment of wide-area computing architectures based on Grid technologies;

A multitude of computing resources of various capacity, origin and ownership, effectively excluding the possibility of a centralized, uniform facility;

Prevalence of trivially parallel, serial batch tasks, requiring Linux flavours of operating system;

The necessity to operate with large amounts (Terabytes) of data, stored in files and accessed by the tasks via POSIX calls.

Customers' requirements impose strict rules on the middleware development. Independently of the nature of the customer, be it an end-user or a resource owner, the guiding principles are the following:

(1) No single point of failure, no bottlenecks.
(2) The Grid is self-organizing, with no need for centralized management.
(3) The Grid is robust and fault-tolerant, capable of providing stable round-the-clock services for years.
(4) Grid tools and utilities are non-intrusive, have a small footprint, do not require special underlying system configuration and are easily portable.
(5) No extra manpower is needed to maintain and utilize the Grid layer.
(6) Tools and utilities respect local resource owner policies, in particular security-related ones.

ARC developers take a conscious approach of creating a solution that addresses the users' needs while meeting the resource owners' strict demands. This requires a delicate balance between the users' wish to get transparent, single-login access to as many resources as possible world-wide, and the system administrators' need to keep unwanted customers and expensive or insecure services at bay. Simplicity is the key word: applied correctly, it ensures robustness, stability, performance and ease of control. Development of the ARC middleware follows the philosophy "start with something simple that works for users and add functionality gradually". This allows ARC to provide customers with a working setup at very early stages, and to define the priorities for adding functionality and enhancements based on the feedback from the users.

Middleware development was preceded by an extended period of try-outs of the solutions available at the time and of collecting feedback from the users. This process identified a set of more specific user requirements, described in the following Section.

2.1 Requirements

Perhaps the most important requirement that affects the ARC architecture stems from the fact that most user tasks deal with massive data processing. A typical job expects that a certain set of input data is available via a POSIX open call. Moreover, such a job produces another set of data as a result of its activity. These input and output data might well reach several Gigabytes in size. The most challenging task of the Grid is thus not just to find free cycles for a job, but to ensure that the necessary input is staged in for it, and all the output is properly staged out. To date, ARC is the only Grid solution which adequately addresses this demand.

Another important requirement is that of a simple client that can be easily installed at any location by any user. Such a client must provide all the functionality necessary to work in the Grid environment and yet remain small in size and simple to install. The functionality includes, quite obviously, Grid job submission, monitoring and manipulation, the possibility to declare and submit batches of jobs with minimal overhead, and means to work with the data located on the Grid. The client must come primarily as a lightweight command line interface (CLI), with an optional extension to a GUI or a portal. The requirement of CLI portability and a non-root installation procedure practically implies its distribution as a binary archived package (tar, gzip). A user must be able to use clients installed on different workstations interchangeably, without the necessity to instantiate them as permanently active services. Naturally, the configuration process must be reduced to an absolute minimum, with the default configuration covering most of the use cases. Last, but not least, the clients must remain functional in a firewalled environment.

There is another perspective on the Grid: that of the resource owners. The idea of providing rather expensive and often sensitive services to a set of unknown users worldwide is not particularly appealing to the system administrators, facility owners and funding bodies, whose primary task is to serve local user communities. A set of strict agreements defining a limited user base and specific services results effectively in a limited and highly regulated set of dedicated computing resources, only remotely resembling the World Wide Grid idea. NorduGrid's approach is to provide resource owners with a non-invasive and yet secure set of tools, requiring minimal installation and maintenance efforts while preserving the existing structures and respecting local usage policies. Each resource owner willing to share some resources on the Grid has a specific set of requirements to be met by the middleware. The most common requests are:

The solution must be firewall-friendly (e.g., minimize the number of necessary connections).

Services must have easy crash recovery procedures.

Allowed Grid-specific resource usage (load, disk space) should be adjustable and manageable.

Accounting possibilities must be foreseen.

Services must be able to run under a non-privileged account.

A single configuration procedure for site-specific services is highly desirable.

2.2 Design

The ARC architecture is carefully planned and designed to satisfy the above-mentioned needs of end-users and resource providers. A major effort is made to render the resulting Grid system stable by design. This is achieved by identifying three mandatory components (see Figure 1):

(1) The Computing Service, implemented as a GridFTP-Grid Manager pair of services (see Sections 3.1 and 3.3). The Grid Manager (GM) is the heart of the Grid, instantiated as a new service at the front-end of each computing resource (usually a cluster of computers). It serves as a gateway to the computing resource through a GridFTP channel. GM provides an interface to the local resource management system, facilitates job manipulation and data management, and allows for accounting and other essential functions.

(2) The Information System (see Section 3.6), which serves as the nervous system of the Grid. It is realized as a hierarchical distributed database: the information is produced and stored at each service (computing, storage), while the distributed system of Index Services maintains the list of known services.

(3) The Brokering Client (see Section 4.2.1), the brain of the Grid, which is deployed as a client part in as many instances as users need. It is equipped with powerful brokering capabilities, being able to distribute the workload across the Grid. It also provides client functionality for all the Grid services, and yet is required to be lightweight and installable by any user at an arbitrary location in a matter of a few minutes.

Fig. 1. Components of the ARC architecture: the mandatory elements (Computing Services, Resource Index Services, Brokering Clients) and the optional elements (Logging Services, Data Indexing Services, Monitoring Clients, User Databases, Storage Services), each of which can be present in one or more instances (1..*).

In this scheme, the Grid is defined as a set of resources registering to the common information system. Grid users are those who are authorized to use at least one of the Grid resources by means of Grid tools. Services and users must be properly certified by trusted agencies (Section 3.7). Users interact with the Grid via their personal clients, which interpret the tasks, query the Information System whenever necessary, discover suitable resources, and forward task requests to the appropriate ones, along with the user's proxy and other necessary data. If the task consists of job execution, it is submitted to a computing resource, where the Grid Manager interprets the job description, prepares the requested environment, stages in the necessary data, and submits the job to the local resource management system (LRMS).

Grid jobs within the ARC system possess a dedicated area on the computing resource, called the session directory, which effectively implements limited sandboxing for each job. The location of each session directory is a valid URL and serves as the unique Job Identifier. In most configurations this guarantees that the entire contents of the session directory are available to the authorised persons during the lifetime of the Grid job. Job states are reported in the Information System. Users can monitor the job status and manipulate the jobs with the help of the client tools, which fetch the data from the Information System and forward the necessary instructions to the Grid Manager. The Grid Manager takes care of staging out and registering the data (if requested), submitting logging and accounting information, and eventually cleaning up the space used by the job. Client tools can also be used to retrieve job outputs at any time.

The reliable performance of such a system is achieved by using several innovative approaches:

Usage of the specialized GridFTP server (see Section 3.1) for job submission provides for job and resource management flexibility, full GSI benefits, firewall-friendliness, and a lightweight client.

The Grid Manager carries most of the workload, thus enabling a pure front-end Grid model, where no Grid-related software (other than the LRMS) has to be installed on cluster worker nodes. The Grid Manager is highly configurable.

The data management functionality of the Grid Manager includes reliable file transfer, input data caching and registration of output data to data indexing services via built-in interfaces.

The system has no central element or service: the information tree is multi-rooted, and matchmaking is performed by individual clients.

The information system appropriately describes the natural Grid entities and resources in a dedicated LDAP schema. The information about Grid jobs, users and resource characteristics is kept in local LDAP databases at every resource and is not cached in higher-level databases: the imperative is to always have fresh data, or not to have it at all.

There is no intermediate service between the client and the resource: the Brokering Client performs the matchmaking and the actual Grid job submission, as well as providing clients for the other Grid services. Users can share clients, or have several user interfaces per user.

All the services are stateful, allowing easy failure recovery and synchronisation when necessary.

Figure 1 shows several optional components which are not required for an initial Grid setup, but are essential for providing proper Grid services. Most important are the Storage Services and the Data Indexing Services. As long as the users' jobs do not manipulate large amounts of data, these are not necessary. However, when Terabytes of data are being processed, they have to be reliably stored and properly indexed, allowing further usage. Data are stored at Storage Elements (SE), which in ARC come in two kinds: the regular disk-based SE is simply a GridFTP interface to a file system (see Section 3.4), while the Smart Storage Element (SSE) is a Web service providing extended functionality, including file movement and automatic file indexing (see Section 3.5). File indexing itself needs a specially structured database with a proper GSI-enabled interface. ARC currently has interfaces to several data indexing systems: the Globus Replica Catalog (RC) [16], the Globus Replica Location Service (RLS) [17] and gLite Fireman [18]. RC and RLS were developed by the Globus project with the goal of facilitating the cataloguing and consequent locating of data sources. The original Replica Catalog was not GSI-enabled; however, ARC developers have added the possibility to perform securely authenticated connections. Fireman is being developed as a component of the gLite [19] Data Management System.

Another non-critical component that nevertheless often comes among the top-ranked in user requirements is the Grid Monitor (Section 4.4). In the ARC solution, it is only a client without capabilities to manipulate resources or jobs, providing information only to the end-users. It relies entirely on the Information System, being basically a specialized user-friendly LDAP browser. Monitoring in ARC does not require special sensors, agents or dedicated databases: it simply re-uses the Information System infrastructure.

While monitoring offers real-time snapshots of the system, historical information has to be made persistent for further analysis, accounting, auditing and similar purposes. To provide for this, a job's usage record is submitted by the Grid Manager to a Logging Service (Logger) (see Section 6.2). Each group of users that desires to maintain such records can run a dedicated Logger.

For all the structure described above to serve a particular user, this user should be known (authenticated) to the Grid and be authorised to use one or more services according to the policies adopted by the service providers. The ARC user management scheme is identical to most other Grid solutions: typically, Grid users are grouped in Virtual Organisations (VO), which provide their member lists via various User Databases or through a Virtual Organization Membership Service (VOMS [20]). ARC services that require authorisation, such as the Computing Service or a Storage Service, subscribe to a subset of these databases (depending on local policies) and regularly retrieve the lists of authorised user credentials.

The overall architecture does not impose any special requirements upon computing resources, apart from the preferable presence of a shared filesystem (e.g. NFS) between the head node and the compute (worker) nodes of a cluster. If there is no shared area, the LRMS has to provide means for staging data between the head node and the compute nodes. Cluster-wise, ARC implements a pure front-end model: nothing has to be installed on the worker nodes.

Being developed mainly for the purposes of serial batch jobs, ARC presently does not support a range of other computational tasks. Most notably, there is no solution for interactive analysis, such as graphical visualisation or other kinds of tasks requiring real-time user input. Although it attempts to look to an end-user not much different from a conventional batch system, ARC has a substantial difference: the results of a batch job do not appear in the user's local area until the user fetches them manually. The ARC job submission client does not provide automatic job resubmission in the case of trivial failures (such as network outages), nor does it implement job re-scheduling from one computing service to another. Higher-level tools or Grid services can be developed to assist in such tasks (e.g., the Job Manager, see Section 6). ARC is not suitable for jobs that are expected to execute instantly: this would require distributed advance resource reservation, which is not implemented in the Grid world. A related shortcoming is that ARC provides no solution for cross-cluster parallel jobs. None of these limitations stems from the architectural solution: rather, removing them would require creating new dedicated services, quite possibly within the framework of the current design.

3 Services

ARC provides a reliable, lightweight, production-quality and scalable implementation of fundamental Grid services such as Grid job submission and management, Grid resource management, resource characterization, resource aggregation, and data management. A Grid-enabled computing resource runs a non-intrusive Grid layer composed of three main services: the specialized GridFTP server (Section 3.1), the Grid Manager (Section 3.3), and the Local Information Service (Section 3.6.1). The first two constitute the Computing Service as such.

A Grid-enabled storage resource makes use of either of the two types of Grid storage layers provided by ARC: a conventional solution based upon the GridFTP server (Section 3.4), or the Web service based (Section 3.2) Smart Storage Element (Section 3.5). Storage resources also run Local Information Services. Computing and storage resources are connected to the Grid via the registration service (Section 3.6.3), which links a resource to an Information Index Service (Section 3.6.2), this way implementing resource aggregation.

3.1 ARC GridFTP server

For job submission and data access, a GridFTP interface was chosen: ARC provides its own GridFTP server (GFS), with a GridFTP interface implemented using the Globus GSIFTP Control Connection application programming interface (API) of the Globus Toolkit libraries. The reason for introducing this server was the absence of GridFTP servers with flexible authorization. The only available implementation was a modified WU-FTPD FTP server distributed as part of the Globus Toolkit. With its whole configuration based on the UNIX identities of users, it was incapable of providing adequate authorization in a distributed environment.

GFS consists of two parts. The front-end part implements the GridFTP protocol, authentication, authorization and configuration of the service. The back-end part implements the file system visible through the GridFTP interface. This solution also adds the possibility to serve various kinds of information through the GridFTP interface and to reduce the number of transport protocols used in the system, hence decreasing the complexity of the software. GFS does not implement all the latest features of the GridFTP protocol, but it is compatible with the tools provided by the Globus Toolkit and other Grid projects based on it.

The front-end part takes care of identifying the connecting client and applying access rules (see Section 3.7). In the configuration, a virtual tree of directories is defined, with directories assigned to various back-ends. The activation of the virtual directories is performed in the configuration, according to the Grid identity of a client. Therefore, each client may be served by various combinations of services. Information about the Grid identity of the user is propagated through the whole chain down to the back-ends. This makes it possible to write back-ends capable of making internal authorization decisions based entirely on the Grid identity of the connecting client, avoiding the necessity to manage dedicated local user accounts.
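To illustrate the idea of the virtual directory tree, the fragment below sketches how two virtual directories could be assigned to different back-end plugins. It is a sketch only: the section and option names follow the general style of ARC server configuration files, but the paths, mount point and exact spelling are assumptions, not a verbatim configuration.

    # Sketch of a GFS virtual tree (option names and paths are assumptions).
    [gridftpd/store]
    path="/store"                  # virtual directory seen by GridFTP clients
    plugin="fileplugin.so"         # plain file-system back-end
    mount="/scratch/grid/store"    # physical location on the server

    [gridftpd/jobs]
    path="/jobs"                   # virtual view of job session directories
    plugin="jobplugin.so"          # back-end interfacing to the Grid Manager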

The back-end part is implemented as a set of plugins. Each plugin is a dynamic library which implements a particular C++ class with methods corresponding to the FTP operations needed to create a view of a virtual file-system-like structure visible through the GridFTP interface.

Currently, the ARC distribution comes with three plugins. Two of them, called fileplugin and gaclplugin, implement different ways of accessing local file systems (see Section 3.4 for a description). Their virtual file system closely corresponds to the underlying physical one. The third plugin is called jobplugin: it generates a virtual view of the session directories managed by the Grid Manager service and is described in Section 3.3.

3.2 HTTPSd framework

ARC comes with a framework for building lightweight HTTP, HTTP over SSL (HTTPS) and GSI (HTTPg) services and clients (for a summary of these mechanisms, see [21]). It consists of C++ classes and a server implementation. These classes implement a subset of the HTTP protocol, including the GET, PUT and POST methods, content ranges, and other most often used features. The implementation is not feature-rich; instead, it is lightweight and targeted at transferring content with little overhead. It has some support for gSOAP [22] integrated in the implementation of the POST method, thus allowing the creation of, and communication with, Web Services. This framework does not support message-level security.

The server implements configuration, authentication and authorization in a way common to all the network services of ARC. It listens on two TCP/IP ports for HTTPS and HTTPg connections. The configuration assigns various plugin services to paths. The services themselves are dynamic libraries which provide configuration and creation functions and supply an implementation of a C++ class responsible for handling HTTP requests.

The client part is a thin layer which allows one to pump data through the GET and PUT methods, and provides functionality matching the server part. The DataMove library, described in Section 5.2, uses its functionality to provide multi-stream access to HTTP(S/g) content.

ARC is currently delivered with two HTTP(S/g) plugins, both of them Web Services: the Smart Storage Element (Section 3.5) and the Logger (Section 6.2). Others are being developed; among them is a Web Service for job submission and control, intended to replace the jobplugin of the GFS. It also provides an interface for ordinary Web browsers to monitor the execution of Grid jobs.

3.3 Grid Manager

The purpose of the ARC Grid Manager (GM) is to take the actions on a computing cluster front-end that are needed to execute a job. In ARC, a job is defined as a set of input files, a main executable, the requested resources (including pre-installed software packages) and a set of produced files. The process of gathering the input files, executing the main executable, and transferring/storing the output files is carried out by GM (Figure 2).

Fig. 2. A typical setup of the ARC Grid Manager: job submission and data stage-in/stage-out pass through the GridFTP server on the front-end; the Grid Manager controls the downloader and uploader modules, the input file cache (linking or copying cached files) and the job session directories, which are shared with the computing nodes (e.g. over NFS), and submits jobs to the LRMS.

GM processes job descriptions written in the Extended Resource Specification Language (XRSL, see Section 4.1). Each job gets a directory on the execution cluster, the session directory. This directory is created by GM as a part of the job preparation process. The job is supposed to use this directory for all input and output data, and possibly other auxiliary files. All the file paths in the job description are specified relative to the session directory. There is no notion of the user's home directory, user ID, or other attributes common in shell access environments. Jobs executed on ARC-enabled clusters should not rely on anything specific to a user account, not even on the existence of such an account.

In the most common installations, the space for the session directory resides in an area shared with all the computing nodes by means of the Network File System. In this case, all the data that a job keeps in the session directory are continuously available to the user through the jobplugin of the GFS, as described below. It is also possible to have a session directory prepared on the front-end computer and then moved to a computing node. This limits the amount of remotely available information during the job execution to that generated by GM itself.

Input files are gathered in the session directory by the downloader module of GM. In the most common configuration this module is run by GM directly on the front-end.

To limit the load on the front-end machine, it is possible to control the number of simultaneous instances of the downloader module and the number of downloadable files. GM can also be instructed to run the downloader on a computing node as a part of the job execution. Such a configuration is desirable on computing clusters whose front-end has slow or limited disk space.

GM supports various kinds of sources for input as well as for output files. These include pseudo-URLs expressed in terms of data registered in Indexing Services. All types of supported URLs can be found in Section 5.2. A client can also push files to the session directory as a part of the job preparation. This way, the user can remotely run a snapshot of an environment prepared and tested on a local workstation, without the trouble of arranging space on storage services.

GM handles a cache of input files. Every URL requested for download is first checked against the contents of the cache. If a file associated with a given URL is already cached, the remote storage is contacted to check the file's validity and the access permissions based on the user's credentials. If a file stored in the cache is found to be valid, it is either copied to the session directory or a soft link is created, depending on the configuration of GM. The caching feature can be turned off either by the resource administrator or by the user submitting the job.

GM can automatically register the files stored in its cache to a Data Indexing Service. This information is then used by a brokering client (see Section 4.2) to direct jobs to a cluster with a high probability of fast access to the input data. This functionality has proved to be very useful when running many jobs which process the same input data.

It is possible to instruct GM to modify the files' URLs and hence access them through more suitable protocols and servers. This feature makes it possible to have so-called local Storage Elements in a way transparent to users. Together with the possibility to have this information published in the Information System (see Section 3.6), this may significantly reduce the job preparation costs.

After the contents of the session directory have been prepared, GM communicates with the Local Resource Management System (LRMS) to run the main executable of the job. GM uses a set of back-end shell scripts to perform such LRMS operations as job submission, detection of the exit state of the main executable, etc. Currently supported LRMS types include various PBS [23] flavours, Condor [1] and SGE [24]. There is also support for job execution on the front-end, meant for testing purposes only.

The output files generated by the job have to be listed in the job description together with their final destinations, or indicated to be kept on the computing resource for eventual retrieval.

After the job's main executable has stopped running, the output files are delivered to the specified destinations by the uploader module. The uploader is similar to the downloader described above. The session directories are preserved only temporarily after the session is finished. Hence, the user cannot directly re-use any produced data in sequential jobs and has to take care of retrieving output files kept on the computing resource as soon as possible.

Support for software runtime environments (RTE, see also Section 4.1) is also a part of GM. RTEs are meant to provide unified access to pre-installed software packages. In GM, support for RTEs is implemented through shell scripts executed a few times during the job's lifetime: after the preparation phase of the job, before running the main executable, and after it has finished (a sketch of such a script is given at the end of this Section). The scripts may set up environments needed by pre-installed software packages, and even modify the job description itself. To standardize the creation of new RTEs and to make it easier for users to find out how to use the available RTEs, a registration portal was established [25] by the NorduGrid partner, the Nordic Data Grid Facility.

A job passes through several states in GM, and each state can be assigned an external plugin. This way, the functionality of GM can be significantly extended with external components to include e.g. logging, fine-grained authorization, accounting, filtering, etc.

GM itself does not open a network port. Instead, it reads the information about job requests from the local file system. This information is delivered through the GridFTP channel by using a customized plugin for the ARC GridFTP server, named jobplugin. Information about the jobs served by GM and the contents of their session directories is presented by the same plugin through the virtual file system. Session directories are mapped to directories, and actions to various FTP commands and files. Access to every piece of information is protected, and flexible access control over the job data is possible. By default, every piece of information about a job is accessible only by a client which presents the same credentials as those used to submit the job. By adding GACL [26] rules to the job description, a user can allow other users to see the results of the job and even to manage it. Thus users can form groups sharing responsibility for submitted jobs.

GM has several other, less significant, features. These include notification of job status changes, support for site-local credentials through customizable plugins, reporting jobs to a centralized logging service (see Section 6.2), etc.
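As an illustration of the RTE mechanism, the sketch below shows what such a shell script might look like. The package name, the install path and the numeric stage-argument convention (0 after job preparation on the front-end, 1 on the node before the main executable, 2 after it finishes) are assumptions made for the purpose of the example.

    #!/bin/sh
    # Sketch of an RTE script for a hypothetical package APPS/PHYS/ANALYSIS-1.0.
    # GM is assumed to invoke the script with a stage argument at the three
    # points in the job's lifetime mentioned above.
    case "$1" in
      0) ;;                                        # after job preparation on the front-end
      1) export ANALYSIS_ROOT=/opt/analysis-1.0    # hypothetical install path
         export PATH="$ANALYSIS_ROOT/bin:$PATH"    # before the main executable runs
         ;;
      2) ;;                                        # after the main executable has finished
    esac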

3.4 Classic Storage Element

Most Storage Elements at sites equipped with ARC are GridFTP servers. A significant number of them use the ARC GridFTP server with one of the following plugins.

The fileplugin presents the underlying file system in a way similar to ordinary FTP servers, with file access based on a mapping between the client certificate subject and a local UNIX identity. It is also possible to specify static access restrictions for pre-defined directories directly in the configuration of the GridFTP server (GFS). In addition, there is a way to completely disregard the UNIX identities of the clients, relying only on the capability of GFS to create virtual file system trees, or to mix both approaches. This plugin is meant to be used as a replacement for an ordinary FTP server in user-mapping mode, or for user communities which do not need complex dynamic access control inside dedicated storage pools.

The gaclplugin uses GACL [26], an XML-based access control language, to control access to files and directories on a storage element. Each directory and file has an associated GACL document stored in a dedicated file. In order not to break compatibility with the FTP protocol, modification of GACL rules is performed by downloading and uploading those files. The GACL files themselves are not visible through FTP file listing commands. Each GACL document consists of entries with a list of identities and the access rules associated with them. There is no predefined set of supported types of identities. Currently, ARC can recognize an X.509 certificate subject (Distinguished Name); VOMS [20] names, groups, roles and capabilities; and membership of a predefined Virtual Organization (lists of members must be created by a third-party utility). GACL permissions control the possibility to read and write objects, list the contents of directories or get information about a stored file. The admin permission allows one to obtain and modify the contents of the GACL documents. A GridFTP server with the gaclplugin is most convenient for providing flexible dynamic access control based purely on Grid identities.
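For illustration, a minimal GACL document of the kind described above might grant one person full access and any authenticated user read-only access. The DN below is hypothetical, and the element spelling is taken from the GACL language [26] as we understand it; treat the fragment as a sketch rather than a normative example.

    <gacl>
      <entry>
        <person>
          <!-- hypothetical certificate subject -->
          <dn>/O=Grid/O=NorduGrid/OU=hep.lu.se/CN=Jane Doe</dn>
        </person>
        <allow><read/><list/><write/><admin/></allow>
      </entry>
      <entry>
        <any-user/>
        <allow><read/><list/></allow>
      </entry>
    </gacl>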

3.5 Smart Storage Element

A Smart Storage Element (SSE) is a data storage service that performs a basic set of the most important data management functions without user intervention. It is not meant to be used in an isolated way. Instead, its main purpose is to form a data storage infrastructure together with similar SSEs and Data Indexing Services, such as the Globus Replica Catalog or Replica Location Service. The main features of SSE and of the data storage infrastructure which can be implemented using SSE are as follows (see also Figure 3):

SSE can be used as a stand-alone service, but preferably should be part of a full infrastructure, complemented with Data Indexing Services (DIS). A DIS can be either a common-purpose or an application-specific one. The modular internal architecture of SSE allows adding (at the source code level) interfaces to previously unsupported DIS types.

SSE has no internal structure for storing data units (files). Files are identified by the identity used in a DIS (Logical File Name, GUID, Logical Path, etc.).

All operations on data units (creation, replication, deletion of a replica, etc.) are done upon a single request from a client through SSE. Hence most operations are atomic from the client's point of view. Nevertheless, objects created during those operations do not always have a final state and may evolve with time. For example, a created file can be filled with data, then validated, and then registered.

Entry points to a DIS infrastructure should be used by clients only for operations not involving data storage units. If an operation involves any modification of SSE content which has to be registered in the DIS, the communication with the DIS is done by the SSE itself.

Access to data units in SSE is based on the Grid identity of the client and is controlled by GACL rules, for flexibility.

Data transfer between SSEs is done upon request from a client, by the SSEs themselves (third-party transfers using a pull model).

The interface to SSE is based on SOAP over HTTPS/HTTPg (Web Services), with data transfer through pure HTTPS/HTTPg. It is implemented as a plugin for the HTTPSd framework (see Section 3.2).

The SOAP methods implemented in SSE allow it to:

Create a new data unit and prepare it to be either filled by the client, or to have its contents pulled from an external source. The SSE can pull data both from inside the infrastructure (replication) and from external sources, using the DataMove library (see Section 5.2) for data transfer; all the corresponding protocols are supported.

Get information about stored data units; regular-expression search for names is possible.

Inspect and modify the access rules of data units.

Update the metadata of stored units. This method is mostly used to provide checksums needed for content validation.

Remove data units.

Fig. 3. An example architecture of a Data Storage Infrastructure: clients control the infrastructure through the Data Indexing Services and download/upload data to the SSEs, which register their content in the indexing services and transfer data among themselves.

Recently, an SRM v1.1 [27] interface has been implemented for SSE. Work to provide support for the current standard, SRM v2.1.1, is in progress.

3.6 Information system

The ARC middleware implements a scalable, production-quality, dynamic, LDAP-based distributed information system via a set of coupled resource lists (Index Services) and local LDAP databases. The implementation of the ARC information system is based on OpenLDAP [28,29,30] and its modifications provided by the Globus MDS framework [31,32]. NorduGrid plans to replace the Globus LDAP modifications with native OpenLDAP functionalities. The system consists of three main components:

(1) Local Information Services (LIS),
(2) Index Services (IS),
(3) Registration Processes (RP).

The components and their connections, shown in the overview in Figure 4, are described in the following subsections.

Fig. 4. Overview of the ARC information system components. Clients issue type-1 queries (dotted arrows) against Index Services, which maintain a dynamic list of contact URLs of the LISs and of further Index Services registering to them (registration processes are marked by dashed arrows). Clients then perform type-2 queries (solid arrows) to discover resource properties stored in the LISs.

3.6.1 Local Information Service

Local Information Services (LIS) are responsible for resource (computing or storage) description and characterization. We note that the description of the Grid jobs running on a resource is also a part of the resource characterization: job status monitoring within ARC is done by querying the LIS components of the information system. The local information is generated on the resource, and optionally cached locally within the LDAP server. Upon client requests, the information is presented to the clients via the LDAP interface. A LIS is basically nothing more than a specially populated and customized OpenLDAP database. Figure 5, together with the description below, gives a detailed account of the internal structure of the LIS.

The dynamic resource state information is generated on the resource. Small and efficient programs, called information providers, are used to collect local state information from the batch system, from the local Grid layer (e.g., the Grid Manager or the GridFTP server), or from the local operating system (e.g., information available in the /proc area). Currently, ARC information providers are interfaced with the same systems as GM, namely the UNIX fork, the PBS family [23] (OpenPBS, PBS-Pro, Torque), Condor [1] and Sun Grid Engine (SGE) [24] batch systems.

The output of the information providers (generated in LDIF format) is used to populate the LIS. A special OpenLDAP back-end, the Globus GRIS [32], is used to store the LDIF output of the information providers. The custom GRIS back-end implements two functionalities: it is capable of caching the providers' output, and upon a client query it triggers the information providers unless the data is already available in the cache.

The caching feature of the OpenLDAP back-end provides protection against overloading the local resource by continuously triggering the information providers.

The information stored in the LISs follows the ARC information model. This model gives a natural representation of Grid resources and Grid entities. It is implemented via an LDAP schema by specifying an LDAP Directory Information Tree (DIT). The schema naturally describes Grid-enabled clusters, queues and storage elements. The schema, unlike other contemporary Grid schemas, also describes Grid jobs and authorized Grid users. A detailed description of this information model is given in the online manual [33].

Fig. 5. Internal structure of the Local Information Service.

3.6.2 Index Services

Index Services are used to maintain dynamic lists of available resources, containing the LDAP contact information of the registrants. A registrant can be either a LIS or another Index Service. The content of the Index Service, that is, the information about the registrants, is pushed by the registrants via the registration process and is periodically purged by the Index Service, this way maintaining a dynamic registrant list. Index Services are implemented as special-purpose LDAP databases: the registration entries are stored as pseudo-LDAP entries in the Globus GIIS LDAP back-end [32]. Although the registration information is presented in a valid LDIF format, the registration entries do not constitute a legitimate LDAP tree, therefore they are referred to as the pseudo-LDAP registration entries of the Index Service. A specially crafted LDAP query is required to obtain the registration entries. Besides the registrant tables, no other information is stored in the ARC indices.

The current implementation, based on such a misuse of the GIIS back-end, is planned to be replaced by something more suitable: the simplistic functionality of maintaining a dynamic list of registrant tables could be trivially implemented, e.g., in native LDAP. Furthermore, the ARC Index Service does not implement any internal query machinery: Index Services are basically plain lists of contact URLs of other Index Services and LISs.

3.6.3 Registration Processes, Topology

Individual LISs and Index Services need to be connected and organized into some sort of topological structure in order to create a coherent information system. ARC utilizes registration processes and Index Services to build a distributed information system out of the individual LISs and Index Services. During the registration process, the registrant (lower level) sends its registration packet to an Index Service. The packet contains information about the host (the registrant) initiating the process and the type of the information service running on the registrant (whether it is a LIS or an IS). The registration message is basically an LDAP contact URL of the information service running on the registrant. Registrations are always initiated by the registrants (bottom-up model) and are performed periodically, thus the registration mechanism follows a periodic push model. Technically, the registrations are implemented as periodic ldapadd operations.

The LISs and the Index Services of ARC are organized into a multi-level tree hierarchy via the registration process. The LISs that describe storage or computing resources represent the lowest level of the tree-like topology. Resources register to first-level Index Services, which in turn register to second-level Index Services, and so forth. The registration chain ends at the Top Level Indices, which represent the root of the hierarchy tree. The structure is built from bottom to top: the lower level always registers to the higher one.

The tree-like hierarchical structure is motivated by the natural geographical organization, where resources belonging to the same region register under a region index, region indices register to the appropriate country index, while country indices are grouped together and register to the top-level Grid Index Services. Besides the geographical structuring, there are some Index Services which group resources by a specific application area or organization. These application/organization Index Services either link themselves under a country index or register directly to a Top Level Index. In order to avoid any single point of failure, ARC suggests a multi-rooted tree with several top-level Indices. Figure 6 shows a simplified schematic view of the hierarchical multi-rooted tree topology of ARC-connected resources and Index Services.

3.6.4 Operational overview

Grid clients such as monitors, Web portals or user interfaces (Section 4) perform two types of queries against the Information System:

(1) During the resource discovery process, clients query Index Services in order to collect the list of LDAP contact URLs of the Local Information Services describing Grid resources.

(2) During a direct resource query, clients contact the LISs directly, making use of the obtained LDAP contact URLs.

Once the client has collected the list of resource LDAP contact URLs (by performing type-1 queries against the Index Services), the second phase of the information collection begins: type-2 queries are issued against the LISs to gather resource descriptions. Both types of queries are carried out and served via the LDAP protocol (a sketch of both is given below). Unlike other similar-purpose systems (the Globus Toolkit 2 GIIS, the Globus Toolkit 4 aggregator [34], or R-GMA [35]), ARC has no service which caches or aggregates resource-specific information on a higher level. ARC Index Services are not used to store resource information; they only maintain a dynamic list of LDAP URLs. Figure 4 presents an overview of the ARC information system components, showing the relations between the Index Services, the LISs, the registration processes and the two types of queries.

Fig. 6. Resources (LISs) and Index Services are linked via the registration process, creating a hierarchical multi-rooted tree topology. The hierarchy has several levels: the ω lowest level (populated by LISs), the γ organisation/project level, the β country/international project level and the α top (root) Grid level. The tree is multi-rooted: there are several α-level Index Services, and in bigger countries there are several β- or country-level Index Services.
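To make the two query types concrete, the sketch below uses the standard OpenLDAP command-line client. The host names are hypothetical; port 2135, the mds-vo-name=local,o=grid base and the nordugrid-cluster object class follow the conventions of the ARC information system, while the giisregistrationstatus attribute is assumed here to be the "specially crafted" handle for reading registration entries.

    # Type-1 query (sketch): ask an Index Service for its registrants,
    # i.e. the LDAP contact URLs of LISs and lower-level Index Services.
    ldapsearch -x -h index.example.org -p 2135 \
        -b 'mds-vo-name=Sweden,o=grid' giisregistrationstatus

    # Type-2 query (sketch): contact a resource's LIS directly and
    # retrieve its cluster description.
    ldapsearch -x -h cluster.example.org -p 2135 \
        -b 'mds-vo-name=local,o=grid' '(objectClass=nordugrid-cluster)'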

3.7 Security framework of services

The security model of the ARC middleware is built upon the Grid Security Infrastructure (GSI) [3]. Currently, for communication ARC services do not use message-level security, but rather transport-level security (see [21]). The ARC services that make use of authentication and authorization are those built on top of the GridFTP server and the HTTPSd framework.

The authentication part makes use of Certificate Authority (CA) signing policies, whereby sites can limit the signing namespace of a CA. Moreover, ARC provides a utility (grid-update-crls) to retrieve certificate revocation lists on a regular basis.

In a basic and largely obsolete authorization model, ARC services support a simple mapping of X.509 certificate subjects (Distinguished Names) to UNIX users through the grid-mapfile, sketched below. Automatic grid-mapfile creation can be done using the nordugridmap utility, which can fetch Virtual Organization membership information from LDAP databases, and simple X.509 subject lists through FTP and HTTP(S). In addition, it is possible to authorize local users and to specify filters and black-lists with this utility.

For advanced authorization, most of the ARC services are capable of acting purely on the Grid identity of the connecting client, without relying on local identities. Those services support a unified set of rules which allows them to identify users, group them and choose an adequate service level. In addition to the personal Grid identity of a user, other identification mechanisms are supported, among them the Virtual Organization Membership Service (VOMS) and general VO membership (through external information gathering tools). It is also possible to use external identification plugins, thus providing a high level of flexibility.

In addition to access-level authorization, many services make use of the clients' Grid identities internally to provide fine-grained access control. Currently, all such services use the GACL language for that purpose. Fine-grained access control is available in the GridFTP server's file access and job control plugins and in the SSE service. The job handling service, the Grid Manager, also has a pluggable authorization mechanism that can provide fine-grained authorization and accounting during job execution (see Section 3.3) through third-party plugins.
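For illustration, a grid-mapfile is simply a list of certificate subjects and the local accounts they are mapped to; the DNs and account names below are hypothetical.

    # grid-mapfile sketch: each line maps a certificate subject (DN)
    # to a local UNIX account (all entries are hypothetical).
    "/O=Grid/O=NorduGrid/OU=hep.lu.se/CN=Jane Doe" griduser1
    "/O=Grid/O=NorduGrid/OU=fys.uio.no/CN=Ola Nordmann" griduser2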

4 User interfaces and clients

Users interact with the ARC services on the Grid resources using various client tools. With the help of the basic command line interface (CLI, Section 4.2), users can submit jobs to the Grid, monitor their progress and perform basic job and file manipulations. Jobs submitted through the command line interface must be described in terms of the Extended Resource Specification Language (Section 4.1). The basic command line interface for job handling is similar to the set of commands found in a local resource management system. In addition, there are commands that can be used for basic data management, like creating and registering files on storage elements and administering access rights to the data. In addition to the CLI, the ARC client package includes some commands from the Globus Toolkit, in particular those needed for certificate handling and data indexing (see Section 4.3).

Several other clients interacting with the ARC services have been developed as well. One example is the ARC Grid Monitor (see Section 4.4), which presents the information from the Information System in a Web browser.

4.1 Resource Specification Language

The XRSL language used by ARC to describe job options is an extension of the core Globus RSL 1.0 [36]. The extensions were prompted by the specifics of the architectural solution, and resulted in the introduction of new attributes. In addition, a differentiation between two levels of job option specification was introduced. The user-side XRSL is the set of attributes specified by a user in a job description file. This file is interpreted and pre-processed by the brokering client (see Section 4.2), and after the necessary modifications is passed to the Grid Manager (GM). The GM-side XRSL is the set of attributes prepared by the client, ready to be interpreted by GM. The purpose of XRSL in ARC is therefore dual: it is used not only by users to describe job requirements, but also by the client and GM as a communication language.

The major challenge for many applications is pre- and post-staging of a considerable number of files, often of large size. To reflect this, two new attributes were introduced in XRSL: inputfiles and outputfiles. Each of them is a list of local-remote file name or URL pairs. Input files local to the submission node are pushed to the session directory by the client, while remote ones are staged in by the Grid Manager (see Section 3.3).
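As an illustration, a minimal user-side XRSL job description using these attributes might look as follows. The executable, file names, URLs and runtime environment name are hypothetical; inputfiles and outputfiles are the attributes discussed above, while the remaining attribute names are assumed to follow the usual XRSL conventions.

    (* Sketch of a user-side XRSL description for a serial analysis job. *)
    &(executable="run_analysis.sh")
     (inputfiles=("events.dat" "gsiftp://se.example.org/data/events.dat"))
     (outputfiles=("result.dat" "gsiftp://se.example.org/results/result.dat"))
     (runtimeenvironment="APPS/PHYS/ANALYSIS-1.0")
     (cputime="600")
     (stdout="out.txt")
     (stderr="err.txt")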


Data Management 1. Grid data management. Different sources of data. Sensors Analytic equipment Measurement tools and devices Data Management 1 Grid data management Different sources of data Sensors Analytic equipment Measurement tools and devices Need to discover patterns in data to create information Need mechanisms to deal

More information

L3.4. Data Management Techniques. Frederic Desprez Benjamin Isnard Johan Montagnat

L3.4. Data Management Techniques. Frederic Desprez Benjamin Isnard Johan Montagnat Grid Workflow Efficient Enactment for Data Intensive Applications L3.4 Data Management Techniques Authors : Eddy Caron Frederic Desprez Benjamin Isnard Johan Montagnat Summary : This document presents

More information

PARALLEL PROGRAM EXECUTION SUPPORT IN THE JGRID SYSTEM

PARALLEL PROGRAM EXECUTION SUPPORT IN THE JGRID SYSTEM PARALLEL PROGRAM EXECUTION SUPPORT IN THE JGRID SYSTEM Szabolcs Pota 1, Gergely Sipos 2, Zoltan Juhasz 1,3 and Peter Kacsuk 2 1 Department of Information Systems, University of Veszprem, Hungary 2 Laboratory

More information

NUSGRID a computational grid at NUS

NUSGRID a computational grid at NUS NUSGRID a computational grid at NUS Grace Foo (SVU/Academic Computing, Computer Centre) SVU is leading an initiative to set up a campus wide computational grid prototype at NUS. The initiative arose out

More information

Grid Programming: Concepts and Challenges. Michael Rokitka CSE510B 10/2007

Grid Programming: Concepts and Challenges. Michael Rokitka CSE510B 10/2007 Grid Programming: Concepts and Challenges Michael Rokitka SUNY@Buffalo CSE510B 10/2007 Issues Due to Heterogeneous Hardware level Environment Different architectures, chipsets, execution speeds Software

More information

ThinAir Server Platform White Paper June 2000

ThinAir Server Platform White Paper June 2000 ThinAir Server Platform White Paper June 2000 ThinAirApps, Inc. 1999, 2000. All Rights Reserved Copyright Copyright 1999, 2000 ThinAirApps, Inc. all rights reserved. Neither this publication nor any part

More information

A New Grid Manager for NorduGrid

A New Grid Manager for NorduGrid Master s Thesis A New Grid Manager for NorduGrid A Transitional Path Thomas Christensen Rasmus Aslak Kjær June 2nd, 2005 Aalborg University Aalborg University Department of Computer Science, Frederik Bajers

More information

Cloud Computing. Up until now

Cloud Computing. Up until now Cloud Computing Lecture 4 and 5 Grid: 2012-2013 Introduction. Up until now Definition of Cloud Computing. Grid Computing: Schedulers: Condor SGE 1 Summary Core Grid: Toolkit Condor-G Grid: Conceptual Architecture

More information

Technical Overview. Access control lists define the users, groups, and roles that can access content as well as the operations that can be performed.

Technical Overview. Access control lists define the users, groups, and roles that can access content as well as the operations that can be performed. Technical Overview Technical Overview Standards based Architecture Scalable Secure Entirely Web Based Browser Independent Document Format independent LDAP integration Distributed Architecture Multiple

More information

Distributed Computing Environment (DCE)

Distributed Computing Environment (DCE) Distributed Computing Environment (DCE) Distributed Computing means computing that involves the cooperation of two or more machines communicating over a network as depicted in Fig-1. The machines participating

More information

Installing and Administering a Satellite Environment

Installing and Administering a Satellite Environment IBM DB2 Universal Database Installing and Administering a Satellite Environment Version 8 GC09-4823-00 IBM DB2 Universal Database Installing and Administering a Satellite Environment Version 8 GC09-4823-00

More information

An Introduction to GPFS

An Introduction to GPFS IBM High Performance Computing July 2006 An Introduction to GPFS gpfsintro072506.doc Page 2 Contents Overview 2 What is GPFS? 3 The file system 3 Application interfaces 4 Performance and scalability 4

More information

Understanding StoRM: from introduction to internals

Understanding StoRM: from introduction to internals Understanding StoRM: from introduction to internals 13 November 2007 Outline Storage Resource Manager The StoRM service StoRM components and internals Deployment configuration Authorization and ACLs Conclusions.

More information

LCG-2 and glite Architecture and components

LCG-2 and glite Architecture and components LCG-2 and glite Architecture and components Author E.Slabospitskaya www.eu-egee.org Outline Enabling Grids for E-sciencE What are LCG-2 and glite? glite Architecture Release 1.0 review What is glite?.

More information

Chapter 4:- Introduction to Grid and its Evolution. Prepared By:- NITIN PANDYA Assistant Professor SVBIT.

Chapter 4:- Introduction to Grid and its Evolution. Prepared By:- NITIN PANDYA Assistant Professor SVBIT. Chapter 4:- Introduction to Grid and its Evolution Prepared By:- Assistant Professor SVBIT. Overview Background: What is the Grid? Related technologies Grid applications Communities Grid Tools Case Studies

More information

A Federated Grid Environment with Replication Services

A Federated Grid Environment with Replication Services A Federated Grid Environment with Replication Services Vivek Khurana, Max Berger & Michael Sobolewski SORCER Research Group, Texas Tech University Grids can be classified as computational grids, access

More information

A Simple Mass Storage System for the SRB Data Grid

A Simple Mass Storage System for the SRB Data Grid A Simple Mass Storage System for the SRB Data Grid Michael Wan, Arcot Rajasekar, Reagan Moore, Phil Andrews San Diego Supercomputer Center SDSC/UCSD/NPACI Outline Motivations for implementing a Mass Storage

More information

A RESOURCE MANAGEMENT FRAMEWORK FOR INTERACTIVE GRIDS

A RESOURCE MANAGEMENT FRAMEWORK FOR INTERACTIVE GRIDS A RESOURCE MANAGEMENT FRAMEWORK FOR INTERACTIVE GRIDS Raj Kumar, Vanish Talwar, Sujoy Basu Hewlett-Packard Labs 1501 Page Mill Road, MS 1181 Palo Alto, CA 94304 USA { raj.kumar,vanish.talwar,sujoy.basu}@hp.com

More information

Overview SENTINET 3.1

Overview SENTINET 3.1 Overview SENTINET 3.1 Overview 1 Contents Introduction... 2 Customer Benefits... 3 Development and Test... 3 Production and Operations... 4 Architecture... 5 Technology Stack... 7 Features Summary... 7

More information

Layered Architecture

Layered Architecture The Globus Toolkit : Introdution Dr Simon See Sun APSTC 09 June 2003 Jie Song, Grid Computing Specialist, Sun APSTC 2 Globus Toolkit TM An open source software toolkit addressing key technical problems

More information

Configuring the Oracle Network Environment. Copyright 2009, Oracle. All rights reserved.

Configuring the Oracle Network Environment. Copyright 2009, Oracle. All rights reserved. Configuring the Oracle Network Environment Objectives After completing this lesson, you should be able to: Use Enterprise Manager to: Create additional listeners Create Oracle Net Service aliases Configure

More information

Control-M and Payment Card Industry Data Security Standard (PCI DSS)

Control-M and Payment Card Industry Data Security Standard (PCI DSS) Control-M and Payment Card Industry Data Security Standard (PCI DSS) White paper PAGE 1 OF 16 Copyright BMC Software, Inc. 2016 Contents Introduction...3 The Need...3 PCI DSS Related to Control-M...4 Control-M

More information

THE GLOBUS PROJECT. White Paper. GridFTP. Universal Data Transfer for the Grid

THE GLOBUS PROJECT. White Paper. GridFTP. Universal Data Transfer for the Grid THE GLOBUS PROJECT White Paper GridFTP Universal Data Transfer for the Grid WHITE PAPER GridFTP Universal Data Transfer for the Grid September 5, 2000 Copyright 2000, The University of Chicago and The

More information

g-eclipse A Framework for Accessing Grid Infrastructures Nicholas Loulloudes Trainer, University of Cyprus (loulloudes.n_at_cs.ucy.ac.

g-eclipse A Framework for Accessing Grid Infrastructures Nicholas Loulloudes Trainer, University of Cyprus (loulloudes.n_at_cs.ucy.ac. g-eclipse A Framework for Accessing Grid Infrastructures Trainer, University of Cyprus (loulloudes.n_at_cs.ucy.ac.cy) EGEE Training the Trainers May 6 th, 2009 Outline Grid Reality The Problem g-eclipse

More information

ARC integration for CMS

ARC integration for CMS ARC integration for CMS ARC integration for CMS Erik Edelmann 2, Laurence Field 3, Jaime Frey 4, Michael Grønager 2, Kalle Happonen 1, Daniel Johansson 2, Josva Kleist 2, Jukka Klem 1, Jesper Koivumäki

More information

dcache Introduction Course

dcache Introduction Course GRIDKA SCHOOL 2013 KARLSRUHER INSTITUT FÜR TECHNOLOGIE KARLSRUHE August 29, 2013 dcache Introduction Course Overview Chapters I, II and Ⅴ christoph.anton.mitterer@lmu.de I. Introduction To dcache Slide

More information

Grid Computing Systems: A Survey and Taxonomy

Grid Computing Systems: A Survey and Taxonomy Grid Computing Systems: A Survey and Taxonomy Material for this lecture from: A Survey and Taxonomy of Resource Management Systems for Grid Computing Systems, K. Krauter, R. Buyya, M. Maheswaran, CS Technical

More information

DIRAC pilot framework and the DIRAC Workload Management System

DIRAC pilot framework and the DIRAC Workload Management System Journal of Physics: Conference Series DIRAC pilot framework and the DIRAC Workload Management System To cite this article: Adrian Casajus et al 2010 J. Phys.: Conf. Ser. 219 062049 View the article online

More information

ESET Remote Administrator 6. Version 6.0 Product Details

ESET Remote Administrator 6. Version 6.0 Product Details ESET Remote Administrator 6 Version 6.0 Product Details ESET Remote Administrator 6.0 is a successor to ESET Remote Administrator V5.x, however represents a major step forward, completely new generation

More information

Adaptive Cluster Computing using JavaSpaces

Adaptive Cluster Computing using JavaSpaces Adaptive Cluster Computing using JavaSpaces Jyoti Batheja and Manish Parashar The Applied Software Systems Lab. ECE Department, Rutgers University Outline Background Introduction Related Work Summary of

More information

The University of Oxford campus grid, expansion and integrating new partners. Dr. David Wallom Technical Manager

The University of Oxford campus grid, expansion and integrating new partners. Dr. David Wallom Technical Manager The University of Oxford campus grid, expansion and integrating new partners Dr. David Wallom Technical Manager Outline Overview of OxGrid Self designed components Users Resources, adding new local or

More information

PoS(EGICF12-EMITC2)081

PoS(EGICF12-EMITC2)081 University of Oslo, P.b.1048 Blindern, N-0316 Oslo, Norway E-mail: aleksandr.konstantinov@fys.uio.no Martin Skou Andersen Niels Bohr Institute, Blegdamsvej 17, 2100 København Ø, Denmark E-mail: skou@nbi.ku.dk

More information

Service Mesh and Microservices Networking

Service Mesh and Microservices Networking Service Mesh and Microservices Networking WHITEPAPER Service mesh and microservice networking As organizations adopt cloud infrastructure, there is a concurrent change in application architectures towards

More information

WHITE PAPER Cloud FastPath: A Highly Secure Data Transfer Solution

WHITE PAPER Cloud FastPath: A Highly Secure Data Transfer Solution WHITE PAPER Cloud FastPath: A Highly Secure Data Transfer Solution Tervela helps companies move large volumes of sensitive data safely and securely over network distances great and small. We have been

More information

WebSphere Application Server, Version 5. What s New?

WebSphere Application Server, Version 5. What s New? WebSphere Application Server, Version 5 What s New? 1 WebSphere Application Server, V5 represents a continuation of the evolution to a single, integrated, cost effective, Web services-enabled, J2EE server

More information

ZL UA Exchange 2013 Archiving Configuration Guide

ZL UA Exchange 2013 Archiving Configuration Guide ZL UA Exchange 2013 Archiving Configuration Guide Version 8.0 January 2014 ZL Technologies, Inc. Copyright 2014 ZL Technologies, Inc.All rights reserved ZL Technologies, Inc. ( ZLTI, formerly known as

More information

Client tools know everything

Client tools know everything Scheduling, clients Client tools know everything Application database Certificate Certificate Authorised users directory Certificate Policies Grid job management service Data Certificate Certificate Researcher

More information

Introduction to Programming and Computing for Scientists

Introduction to Programming and Computing for Scientists Oxana Smirnova (Lund University) Programming for Scientists Lecture 4 1 / 44 Introduction to Programming and Computing for Scientists Oxana Smirnova Lund University Lecture 4: Distributed computing Most

More information

EUROPEAN MIDDLEWARE INITIATIVE

EUROPEAN MIDDLEWARE INITIATIVE EUROPEAN MIDDLEWARE INITIATIVE VOMS CORE AND WMS SECURITY ASSESSMENT EMI DOCUMENT Document identifier: EMI-DOC-SA2- VOMS_WMS_Security_Assessment_v1.0.doc Activity: Lead Partner: Document status: Document

More information

An Architecture For Computational Grids Based On Proxy Servers

An Architecture For Computational Grids Based On Proxy Servers An Architecture For Computational Grids Based On Proxy Servers P. V. C. Costa, S. D. Zorzo, H. C. Guardia {paulocosta,zorzo,helio}@dc.ufscar.br UFSCar Federal University of São Carlos, Brazil Abstract

More information

GT-OGSA Grid Service Infrastructure

GT-OGSA Grid Service Infrastructure Introduction to GT3 Background The Grid Problem The Globus Approach OGSA & OGSI Globus Toolkit GT3 Architecture and Functionality: The Latest Refinement of the Globus Toolkit Core Base s User-Defined s

More information

Chapter 3. Design of Grid Scheduler. 3.1 Introduction

Chapter 3. Design of Grid Scheduler. 3.1 Introduction Chapter 3 Design of Grid Scheduler The scheduler component of the grid is responsible to prepare the job ques for grid resources. The research in design of grid schedulers has given various topologies

More information

Extended Search Administration

Extended Search Administration IBM Lotus Extended Search Extended Search Administration Version 4 Release 0.1 SC27-1404-02 IBM Lotus Extended Search Extended Search Administration Version 4 Release 0.1 SC27-1404-02 Note! Before using

More information

ForeScout Extended Module for Carbon Black

ForeScout Extended Module for Carbon Black ForeScout Extended Module for Carbon Black Version 1.0 Table of Contents About the Carbon Black Integration... 4 Advanced Threat Detection with the IOC Scanner Plugin... 4 Use Cases... 5 Carbon Black Agent

More information

IBM Tivoli Directory Server

IBM Tivoli Directory Server Build a powerful, security-rich data foundation for enterprise identity management IBM Tivoli Directory Server Highlights Support hundreds of millions of entries by leveraging advanced reliability and

More information

X100 ARCHITECTURE REFERENCES:

X100 ARCHITECTURE REFERENCES: UNION SYSTEMS GLOBAL This guide is designed to provide you with an highlevel overview of some of the key points of the Oracle Fusion Middleware Forms Services architecture, a component of the Oracle Fusion

More information

MONTE CARLO SIMULATION FOR RADIOTHERAPY IN A DISTRIBUTED COMPUTING ENVIRONMENT

MONTE CARLO SIMULATION FOR RADIOTHERAPY IN A DISTRIBUTED COMPUTING ENVIRONMENT The Monte Carlo Method: Versatility Unbounded in a Dynamic Computing World Chattanooga, Tennessee, April 17-21, 2005, on CD-ROM, American Nuclear Society, LaGrange Park, IL (2005) MONTE CARLO SIMULATION

More information

Knowledge Discovery Services and Tools on Grids

Knowledge Discovery Services and Tools on Grids Knowledge Discovery Services and Tools on Grids DOMENICO TALIA DEIS University of Calabria ITALY talia@deis.unical.it Symposium ISMIS 2003, Maebashi City, Japan, Oct. 29, 2003 OUTLINE Introduction Grid

More information

Vortex Whitepaper. Simplifying Real-time Information Integration in Industrial Internet of Things (IIoT) Control Systems

Vortex Whitepaper. Simplifying Real-time Information Integration in Industrial Internet of Things (IIoT) Control Systems Vortex Whitepaper Simplifying Real-time Information Integration in Industrial Internet of Things (IIoT) Control Systems www.adlinktech.com 2017 Table of Contents 1. Introduction........ P 3 2. Iot and

More information

Textual Description of webbioc

Textual Description of webbioc Textual Description of webbioc Colin A. Smith October 13, 2014 Introduction webbioc is a web interface for some of the Bioconductor microarray analysis packages. It is designed to be installed at local

More information

Grid Computing with Voyager

Grid Computing with Voyager Grid Computing with Voyager By Saikumar Dubugunta Recursion Software, Inc. September 28, 2005 TABLE OF CONTENTS Introduction... 1 Using Voyager for Grid Computing... 2 Voyager Core Components... 3 Code

More information

OpenIAM Identity and Access Manager Technical Architecture Overview

OpenIAM Identity and Access Manager Technical Architecture Overview OpenIAM Identity and Access Manager Technical Architecture Overview Overview... 3 Architecture... 3 Common Use Case Description... 3 Identity and Access Middleware... 5 Enterprise Service Bus (ESB)...

More information

A distributed tier-1. International Conference on Computing in High Energy and Nuclear Physics (CHEP 07) IOP Publishing. c 2008 IOP Publishing Ltd 1

A distributed tier-1. International Conference on Computing in High Energy and Nuclear Physics (CHEP 07) IOP Publishing. c 2008 IOP Publishing Ltd 1 A distributed tier-1 L Fischer 1, M Grønager 1, J Kleist 2 and O Smirnova 3 1 NDGF - Nordic DataGrid Facilty, Kastruplundgade 22(1), DK-2770 Kastrup 2 NDGF and Aalborg University, Department of Computer

More information

Grid Computing Middleware. Definitions & functions Middleware components Globus glite

Grid Computing Middleware. Definitions & functions Middleware components Globus glite Seminar Review 1 Topics Grid Computing Middleware Grid Resource Management Grid Computing Security Applications of SOA and Web Services Semantic Grid Grid & E-Science Grid Economics Cloud Computing 2 Grid

More information

igrid: a Relational Information Service A novel resource & service discovery approach

igrid: a Relational Information Service A novel resource & service discovery approach igrid: a Relational Information Service A novel resource & service discovery approach Italo Epicoco, Ph.D. University of Lecce, Italy Italo.epicoco@unile.it Outline of the talk Requirements & features

More information

Sentinet for BizTalk Server SENTINET

Sentinet for BizTalk Server SENTINET Sentinet for BizTalk Server SENTINET Sentinet for BizTalk Server 1 Contents Introduction... 2 Sentinet Benefits... 3 SOA and API Repository... 4 Security... 4 Mediation and Virtualization... 5 Authentication

More information

Oracle Warehouse Builder 10g Release 2 Integrating Packaged Applications Data

Oracle Warehouse Builder 10g Release 2 Integrating Packaged Applications Data Oracle Warehouse Builder 10g Release 2 Integrating Packaged Applications Data June 2006 Note: This document is for informational purposes. It is not a commitment to deliver any material, code, or functionality,

More information

Enterprise Directories and Security Management: Merging Technologies for More Control

Enterprise Directories and Security Management: Merging Technologies for More Control Enterprise Directories and Security Management: Merging Technologies for More Control Contents Introduction...3 Directory Services...3 A brief history of LDAP...3 LDAP today...4 Free and commercially available

More information

MONITORING OF GRID RESOURCES

MONITORING OF GRID RESOURCES MONITORING OF GRID RESOURCES Nikhil Khandelwal School of Computer Engineering Nanyang Technological University Nanyang Avenue, Singapore 639798 e-mail:a8156178@ntu.edu.sg Lee Bu Sung School of Computer

More information

Outline. Definition of a Distributed System Goals of a Distributed System Types of Distributed Systems

Outline. Definition of a Distributed System Goals of a Distributed System Types of Distributed Systems Distributed Systems Outline Definition of a Distributed System Goals of a Distributed System Types of Distributed Systems What Is A Distributed System? A collection of independent computers that appears

More information

Grid Middleware and Globus Toolkit Architecture

Grid Middleware and Globus Toolkit Architecture Grid Middleware and Globus Toolkit Architecture Lisa Childers Argonne National Laboratory University of Chicago 2 Overview Grid Middleware The problem: supporting Virtual Organizations equirements Capabilities

More information

MythoLogic: problems and their solutions in the evolution of a project

MythoLogic: problems and their solutions in the evolution of a project 6 th International Conference on Applied Informatics Eger, Hungary, January 27 31, 2004. MythoLogic: problems and their solutions in the evolution of a project István Székelya, Róbert Kincsesb a Department

More information

SAML-Based SSO Solution

SAML-Based SSO Solution About SAML SSO Solution, page 1 Single Sign on Single Service Provider Agreement, page 2 SAML-Based SSO Features, page 2 Basic Elements of a SAML SSO Solution, page 3 Cisco Unified Communications Applications

More information

SAS Environment Manager A SAS Viya Administrator s Swiss Army Knife

SAS Environment Manager A SAS Viya Administrator s Swiss Army Knife Paper SAS2260-2018 SAS Environment Manager A SAS Viya Administrator s Swiss Army Knife Michelle Ryals, Trevor Nightingale, SAS Institute Inc. ABSTRACT The latest version of SAS Viya brings with it a wealth

More information

TFS WorkstationControl White Paper

TFS WorkstationControl White Paper White Paper Intelligent Public Key Credential Distribution and Workstation Access Control TFS Technology www.tfstech.com Table of Contents Overview 3 Introduction 3 Important Concepts 4 Logon Modes 4 Password

More information

Scientific Computing with UNICORE

Scientific Computing with UNICORE Scientific Computing with UNICORE Dirk Breuer, Dietmar Erwin Presented by Cristina Tugurlan Outline Introduction Grid Computing Concepts Unicore Arhitecture Unicore Capabilities Unicore Globus Interoperability

More information

Grid Scheduling Architectures with Globus

Grid Scheduling Architectures with Globus Grid Scheduling Architectures with Workshop on Scheduling WS 07 Cetraro, Italy July 28, 2007 Ignacio Martin Llorente Distributed Systems Architecture Group Universidad Complutense de Madrid 1/38 Contents

More information

IBM Spectrum Protect Version Introduction to Data Protection Solutions IBM

IBM Spectrum Protect Version Introduction to Data Protection Solutions IBM IBM Spectrum Protect Version 8.1.2 Introduction to Data Protection Solutions IBM IBM Spectrum Protect Version 8.1.2 Introduction to Data Protection Solutions IBM Note: Before you use this information

More information

Index Introduction Setting up an account Searching and accessing Download Advanced features

Index Introduction Setting up an account Searching and accessing Download Advanced features ESGF Earth System Grid Federation Tutorial Index Introduction Setting up an account Searching and accessing Download Advanced features Index Introduction IT Challenges of Climate Change Research ESGF Introduction

More information

Entrust. Discovery 2.4. Administration Guide. Document issue: 3.0. Date of issue: June 2014

Entrust. Discovery 2.4. Administration Guide. Document issue: 3.0. Date of issue: June 2014 Entrust Discovery 2.4 Administration Guide Document issue: 3.0 Date of issue: June 2014 Copyright 2010-2014 Entrust. All rights reserved. Entrust is a trademark or a registered trademark of Entrust, Inc.

More information

Connection Broker Managing User Connections to Workstations and Blades, OpenStack Clouds, VDI, and More

Connection Broker Managing User Connections to Workstations and Blades, OpenStack Clouds, VDI, and More Connection Broker Managing User Connections to Workstations and Blades, OpenStack Clouds, VDI, and More Quick Start Using Leostream with Citrix XenDesktop 7 and HDX Version 8.1 January 14, 2016 Contacting

More information

A Survey Paper on Grid Information Systems

A Survey Paper on Grid Information Systems B 534 DISTRIBUTED SYSTEMS A Survey Paper on Grid Information Systems Anand Hegde 800 North Smith Road Bloomington Indiana 47408 aghegde@indiana.edu ABSTRACT Grid computing combines computers from various

More information

Monitoring ARC services with GangliARC

Monitoring ARC services with GangliARC Journal of Physics: Conference Series Monitoring ARC services with GangliARC To cite this article: D Cameron and D Karpenko 2012 J. Phys.: Conf. Ser. 396 032018 View the article online for updates and

More information

OnCommand Cloud Manager 3.2 Deploying and Managing ONTAP Cloud Systems

OnCommand Cloud Manager 3.2 Deploying and Managing ONTAP Cloud Systems OnCommand Cloud Manager 3.2 Deploying and Managing ONTAP Cloud Systems April 2017 215-12035_C0 doccomments@netapp.com Table of Contents 3 Contents Before you create ONTAP Cloud systems... 5 Logging in

More information

Xerox Device Data Collector 1.1 Security and Evaluation Guide

Xerox Device Data Collector 1.1 Security and Evaluation Guide Xerox Device Data Collector 1.1 Security and Evaluation Guide 2009 Xerox Corporation. All rights reserved. Xerox, WorkCentre, Phaser and the sphere of connectivity design are trademarks of Xerox Corporation

More information

Building a Real-time Notification System

Building a Real-time Notification System Building a Real-time Notification System September 2015, Geneva Author: Jorge Vicente Cantero Supervisor: Jiri Kuncar CERN openlab Summer Student Report 2015 Project Specification Configurable Notification

More information