Grid Infrastructure For Collaborative High Performance Scientific Computing

Computing For Nation Development, February 08-09, 2008
Bharati Vidyapeeth's Institute of Computer Applications and Management, New Delhi

R. Jehadeesan
Scientific Officer (E), Computer Division, Indira Gandhi Centre for Atomic Research, Kalpakkam
jeha@igcar.gov.in

ABSTRACT
Grid computing is a form of distributed computing that involves coordinating and sharing computing, application, data, storage, or network resources across geographically isolated sites. A Grid is a collective framework of nodes which contribute a combination of resources to Grid users. A computational Grid is an infrastructure that provides dependable, consistent and inexpensive high-end computing capability. This paper describes the Grid infrastructure deployed for enabling high-performance computing by sharing heterogeneous computational resources spread across different units of an R&D establishment. It gives a technical overview of Grid architecture, technology and standards, and explains in detail the services implemented for successful deployment of the computing Grid. The Grid middleware, its fundamental components and the various functionalities offered are covered. Grid-enabled high-performance scientific and engineering applications exploit the potential of the computational Grid, resulting in increased productivity, reduced time and a better price-to-performance ratio for computing resources.

KEYWORDS
Grid computing, Grid middleware, Grid architecture, Grid services, Grid security, Resource management, Workload management, Data management, Resource broker, Computing element, Worker node, Storage element, User interface, Virtual organization, Resource sharing, Grid job workflow, Grid monitoring, DAEGrid, High performance computing.

I. INTRODUCTION
Grid computing is an innovative aspect of distributed computing that evolved with the objective of providing a coordinated, heterogeneous, resource-sharing computing environment. Today's high-performance scientific and engineering applications demand large-scale numerical computation, data analysis and collaboration at various levels. Clusters and Grids are increasingly widespread solutions to large-scale computing challenges. A Grid is a distributed computing infrastructure with shared heterogeneous services and resources, accessible by users and applications to solve complex computational problems and to provide access to huge data storage. Grid computing focuses on sharing resources among geographically distributed sites in an organized and uniform manner, and on the development of pioneering, high-performance oriented applications. It deals with the unique challenges of security, scalability and manageability in distributed computing. The immense benefits of Grid technology include scalable high-end computing capability, economical computing cost and efficiency. An enterprise Grid infrastructure that can be shared by geographically disparate groups across the organization creates a more productive enterprise environment with efficient use of computing resources. Exploitation of under-utilized resources, balanced resource utilization and the potential for massive parallel processing capacity are the attractive features of Grid computing. In an R&D organization involved in scientific and engineering research activities, there are ever-growing requirements for computational capability and data storage to solve challenging scientific and engineering problems.
Grid computing addresses this challenge by employing powerful clusters locally and interconnecting them through wide area networks. This paper details the Grid architecture, technology and Grid services provided by the Grid infrastructure deployed to share computing power, applications, and data and storage resources across four organizational units.

II. DAEGrid
The Department of Atomic Energy (DAE) has a number of R&D organizations working in the field of nuclear science and technology, carrying out research and development activities at the frontiers of nuclear physics, nuclear engineering, material science, mechanical engineering, control systems and related disciplines. Some R&D units operate supercomputing clusters to solve highly compute-intensive problems in these fields and hold large amounts of data on local storage worth sharing with other units. The need was felt to extend these high-end computing facilities beyond geographical boundaries to meet the requirements of modern research and collaboration in multidisciplinary fields. An intra-DAE Grid network has been set up to provide a scalable, wide-area computing platform which enables sharing of computing and information resources among the constituent units in a secure manner. This Grid network interconnects four major R&D units of DAE over a high-speed fibre-optic network, aggregates the computational resources at the Grid sites and provides them to users for efficient sharing. It enables collaborative research across DAE organizations and facilitates development of Grid-enabled applications in advanced fields of science and technology.

III. GRID ARCHITECTURE
DAEGrid is based on the WLCG/EGEE [7] (Worldwide LHC Computing Grid / Enabling Grids for E-sciencE) model of Grid computing and utilizes the gLite [5] middleware for providing Grid services. The architecture of the computing Grid is shown in Figure 1. It defines the essential Grid services provided by the infrastructure and a set of conforming interfaces for managing resources in a single, unified Grid environment. The basic functionalities and services that should be available for deploying a computing Grid include Compute Resource Services, Workload Management, Storage/Data Management, Information & Monitoring Services, Virtual Organization & Security Management, and a User Interface. The resource centres which provide the computing infrastructure and resources for the Grid are referred to as Grid sites.

Figure 1. DAE Computing Grid Architecture

The users of a Grid infrastructure are divided into Virtual Organizations (VOs). A Virtual Organization is an abstract entity for grouping users, institutions and resources in the same administrative domain. The VO Management service manages VO members and authorizes them to use the resources meant for that VO. The Compute Resource service provides access to the local resource manager, or batch system, to utilize the computing resources of a Grid site. The Workload Management mechanism manages jobs and provides the global resource management service for the Grid. The Storage and Data Management service provides access to mass data storage resources at a Grid site and takes care of file management activities involving file transfer and catalogues. The Information and Monitoring services provide information about Grid resources and monitor their status. The User Interface service provides a consistent interface to the Grid with a set of client tools used for job submission, resource listing, data management and status monitoring.

IV. GRID MIDDLEWARE AND SERVICES
An essential component of a Grid infrastructure is the Grid middleware, which acts as a service layer between Grid resources and Grid applications/users. It performs the set of fundamental services involved in the deployment, management and usage of resources, and provides users and applications with a consistent, user-friendly interface. In the recent past, numerous Grid middleware products have emerged, leading to problems of interoperability and standards; no widely accepted, usable, interoperable standard has yet evolved to meet the expectations of the Grid community. The DAEGrid infrastructure adopts the gLite middleware developed by the EGEE Grid project. It consists of a packaged suite of functional components providing a basic set of Grid services, including job management, information & monitoring, and data management. gLite originated from contributions of different Grid projects, namely Condor [10], EDG (European Data Grid), Globus [9], VDT [11] (Virtual Data Toolkit) and LCG.

The services provided by the middleware can be classified into site services and global services. Site services pertain to the functionalities provided by the individual sites which form part of the Grid. Global services are the common functionalities utilized by all Grid sites. Building the infrastructure of a Grid site includes deployment of a Computing Element with Worker Nodes, Storage Elements, a User Interface and an Information Service. More than one Computing Element or Storage Element service can run at a site, depending on the availability of resources.
The global service elements of the computing Grid include the Workload Management System, VO Management Service, Information & Monitoring Service, and Data Management (File Catalogue & Transfer Service). Some of the global services, such as Workload Management, VO Management and File Catalogues, can be deployed at multiple sites based on the Grid users' requirements. The organization of the Grid services provided by the computing Grid infrastructure is shown in Figure 2. The role and features of each middleware component are described below.

Figure 2. Organization of Computing Grid Middleware Services

A. COMPUTING ELEMENT
A Computing Element (CE) is a set of computing resources localized at a site. It is essentially a computing cluster for executing jobs. It includes a head node called the Grid Gate (GG), which acts as a generic interface to the cluster; a Local Resource Management System (LRMS), or batch system; and a collection of compute nodes called Worker Nodes (WNs), the nodes where the jobs are actually run. The gateway node is responsible for accepting jobs and dispatching them for execution on the WNs via the LRMS. It handles job submission (including staging of required files), cancellation, suspension and resumption, job status enquiry and notification. It also makes use of logging and book-keeping services to keep track of jobs during their lifetime. The LRMS is a batch job queuing and scheduling system responsible for managing the local computing resources and executing jobs on them (the WNs). The gLite CE interface supports different LRMS software, namely OpenPBS/PBSPro, LSF, Maui/Torque, BQS and Condor. The Maui/Torque batch system has been configured in the CEs of the DAEGrid sites. The CE also publishes information about the resources available at the site, and their current status, to the Grid Information System.

B. STORAGE ELEMENT
A Storage Element (SE) provides uniform access to data storage resources. A Storage Element may control simple disk servers, large disk arrays or tape-based mass storage systems. Usually each Grid site provides one or more SEs. Storage Elements can support different data access protocols and interfaces. The Storage Resource Manager (SRM) service is used to manage most of the storage resources. The SRM interface defines a set of functions and services that a storage system provides independently of the underlying mass storage implementation. It supports capabilities like transparent file migration from disk to tape, file pinning, space reservation, etc. The dCache interface consists of one or more disk pools and a server presenting the files under a single virtual file system tree; it is widely employed as a disk buffer front-end to many mass storage systems. In addition, the Disk Pool Manager (DPM) can be used for fairly small SEs with disk-based storage. A GSI-secure FTP protocol (GridFTP) is used for whole-file transfers; it provides secure, fast file transfer to and from an SE. The Remote File Input/Output protocol (RFIO) is used for direct remote access to files stored in the SEs. The gsidcap protocol is the GSI-enabled version of the dCache native access protocol. gLite I/O is a set of POSIX-like I/O services for accessing Grid files via their logical names.

C. USER INTERFACE
The User Interface (UI) is the access point to the computing Grid. Each user has a personal account on this machine, and the user certificate is installed on it. The user is authenticated and authorized to use the Grid resources, and accesses the functionalities offered by the Information, Workload and Data Management systems. The UI provides command-line tools for Grid users to perform the following activities:
- Listing of resources suitable for execution of a given job
- Finding the status of different resources
- Job submission, job status enquiry, job cancellation
- Retrieval of job outputs and job logging information
- File management operations (copy, replicate and delete)
The UI provides a set of commands for submitting and managing everything from simple jobs to advanced job types.
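For illustration, a typical session on the UI using the gLite 3 command-line tools might proceed as follows; this is a sketch in which the VO name, job identifier and file names are placeholders, and exact command names can vary with the middleware version:

    # create a VOMS proxy for the user's VO (here "daegrid" is illustrative)
    voms-proxy-init --voms daegrid

    # list the CEs that match the requirements of a job described in job.jdl
    glite-wms-job-list-match -a job.jdl

    # submit the job, query its status and retrieve its output sandbox
    glite-wms-job-submit -a job.jdl
    glite-wms-job-status https://wms.example.org:9000/JobId
    glite-wms-job-output https://wms.example.org:9000/JobId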
The types of job submission supported by the UI include single jobs, job collections, checkpointable jobs, parametric jobs, MPI jobs and interactive jobs. A high-level description language called the Job Description Language (JDL) is used to describe jobs and their desired characteristics and constraints. The high-level data management client tools hide the complexities of the storage implementation and of the transport and access protocols, and enable users to move data in and out of the Grid, replicate files between Storage Elements and interact with the File Catalogue. Low-level User Interface APIs are also available to allow development of Grid-enabled applications. Grid portals provide a user-friendly environment for submission and monitoring of jobs and remove the difficulty of using a complex command-line interface.

D. WORKLOAD MANAGEMENT SYSTEM
The Workload Management System (WMS) is the core service which accepts a user job, determines a site that fulfils the job's resource requirements and submits the job to that site. It dispatches the job to the most appropriate Computing Element in the Grid and provides facilities to manage jobs. It also records the job status and retrieves the output. The WMS is also referred to as the Resource Broker (RB). The user interacts with the WMS/RB using a command-line interface or APIs. The job being submitted is described by the user in JDL. The JDL script defines which executable to run and its command-line arguments, the input files needed, the output files to be generated and the files to be moved to and from the worker node, in addition to stating any specific requirements on the CE and the worker node (a sample JDL file is sketched below).

The process of finding a suitable CE for the job is called match-making. It involves the following steps:
- Each CE in the Grid is assigned a rank based on its status information, derived from the number of currently running and queued jobs; the highest rank is assigned to the least loaded CE.
- Among all available CEs, those which fulfil the requirements articulated by the user, and those which are close to the input files specified on the Grid, are selected.
- The CE with the highest rank among the selected CEs is chosen for job dispatch.
The RB interacts with the File Catalogues through the DLI service to locate the Grid input files specified in the JDL script.
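As an illustration, a minimal JDL description of a sequential job might look like the following; the executable, file names and the requirement threshold are hypothetical, while the Rank expression shown is the commonly used default based on the GLUE estimated response time:

    Executable     = "run_simulation.sh";
    Arguments      = "input.dat";
    StdOutput      = "std.out";
    StdError       = "std.err";
    InputSandbox   = {"run_simulation.sh", "input.dat"};
    OutputSandbox  = {"std.out", "std.err"};
    Requirements   = other.GlueCEPolicyMaxCPUTime > 1440;
    Rank           = -other.GlueCEStateEstimatedResponseTime;

Submitting such a file from the UI returns a job identifier that is subsequently used for status queries and output retrieval, as in the command session shown earlier.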

The Logging and Bookkeeping (LB) service, which normally runs on the RB, tracks submitted jobs during their lifetime. It gathers events from the WMS components and the CEs and records the current status and the complete history of each job. This logging information about submitted jobs can be retrieved via UI commands and is useful for verifying success or analysing job failures.

E. DATA MANAGEMENT SERVICES
The Data Management services are used to locate and access data files, to copy files between the UI, CE, WN and SE, and to replicate files between SEs. A variety of data management client tools is provided to upload/download files to/from the Grid, replicate data and interact with the file catalogues.

File Catalogue
The file is the primary unit of data in the Grid. Users and applications generally use Logical File Names (LFNs) to refer to files in the Grid. A file is uniquely identified internally by a Global Unique Identifier (GUID). The Storage URL (SURL) and Transport URL (TURL) contain information about where a physical replica is located and how it can be accessed. LFNs and GUIDs identify files irrespective of their location; the File Catalogue service is used for this purpose. The mappings between LFNs, GUIDs and SURLs are stored in the File Catalogue system, while the actual files are stored in Storage Elements. The LCG File Catalogue (LFC) is the catalogue service in use; it offers a hierarchical view of the logical file name space. It translates an LFN to a SURL via a GUID and locates the site where the referred file resides. It supports a transactional API called the Data Location Interface (DLI), which provides a generic interface to a file catalogue.

File Transfer Service
The File Transfer Service (FTS) helps users carry out reliable file transfer operations across the SEs of Grid sites. FTS is the low-level data movement service which performs asynchronous (batch mode), reliable file replication from source to destination. It interacts with the SRM interface for dealing with storage resources and manages the basic-level data transfer with the GridFTP protocol. It maintains a persistent transfer queue, thus providing reliable data transfer even across communication link interruptions. It does not depend on the File Catalogue for resolving file names, and hence SURLs are used to specify the source and destination. The FTS service is not currently implemented in DAEGrid.
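As an illustration, typical data management operations from the UI with the LCG/LFC client tools might look like the following sketch; the VO name, host names and file names are placeholders:

    # copy a local file to an SE and register it in the LFC under a logical name
    lcg-cr --vo daegrid -d se01.example.org \
           -l lfn:/grid/daegrid/user/results.dat file:/home/user/results.dat

    # replicate the file to another SE
    lcg-rep --vo daegrid -d se02.example.org lfn:/grid/daegrid/user/results.dat

    # copy the file back from the Grid to local disk
    lcg-cp --vo daegrid lfn:/grid/daegrid/user/results.dat file:/home/user/results.copy

    # browse the logical namespace (LFC_HOST must point to the catalogue server)
    lfc-ls -l /grid/daegrid/user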
F. INFORMATION AND MONITORING SERVICES
The Information Service (IS) provides information about the Grid resources and their status. The published information is used for resource discovery, monitoring and accounting purposes. This information conforms to a common conceptual data model, the GLUE Schema (Grid Laboratory for a Uniform Environment), which describes the attributes and values of CEs and SEs and their binding information. The following two information services are used for Grid resource monitoring and discovery.

Monitoring and Discovery Service (MDS)
The MDS is used for resource discovery and for publishing resource status. It implements the GLUE Schema using an open-source implementation of LDAP (Lightweight Directory Access Protocol), a specialized database optimized for reading, browsing and searching information. No Grid credentials are required to access MDS data, either by users (for reading) or by services (for writing/publishing information). The MDS architecture is based on the Grid Resource Information Server (GRIS) and the Berkeley Database Information Index (BDII) server. The GRIS is an LDAP server which runs on the resource (CE or SE) and publishes the relevant static and dynamic information about that resource. The resource information provided includes the number of CPUs, running jobs and waiting jobs; the amount of memory; OS details; the type of storage; and the used and available space. The BDII service is another LDAP server which runs at each site and collects information from the local GRISs. There is also a global, or top-level, BDII which is configured to query the site BDIIs at every site and acts as a cache by storing information about the Grid status in its database. It gives the status of the overall Grid resources.

Relational Grid Monitoring Architecture (R-GMA)
R-GMA is used for accounting, monitoring and publication of system-level and user-level information. It is an implementation of the Grid Monitoring Architecture (GMA) and presents a relational view of the collected data. The model is based on a global distributed relational database and supports advanced query operations. R-GMA is an alternative information system to MDS and uses the same GLUE schema. The R-GMA architecture consists of Producers, which provide the information; Consumers, which request information from Producers; and a Registry, which mediates the communication between the Producers and the Consumers. The Producers and Consumers are services running at each site, which interact with the global Registry service to answer users' queries.
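For illustration, the information published through this system can be inspected from the UI; the commands below are a sketch in which the VO name and BDII host are placeholders:

    # summary of the CEs and SEs visible to a VO (lcg-infosites ships with the gLite UI)
    lcg-infosites --vo daegrid ce
    lcg-infosites --vo daegrid se

    # the raw GLUE data can be queried directly from the top-level BDII over LDAP
    ldapsearch -x -H ldap://bdii.example.org:2170 -b "o=grid" \
               '(objectClass=GlueCE)' GlueCEUniqueID GlueCEStateFreeCPUs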

G. VO MANAGEMENT SERVICE
The Grid is organized into Virtual Organizations (VOs); a VO is a dynamic collection of individuals and institutions sharing resources in a flexible, secure and coordinated manner. The Virtual Organization Membership Service (VOMS) is used to manage information about the roles and privileges of users within a VO. In order to use the Grid infrastructure, a user must choose a VO and become a member by registering some personal data and accepting the usage rules. Membership of a VO grants specific privileges to a user, and a user can belong to more than one VO. The VO must ensure that all its members have provided the necessary information and have accepted the usage rules. The user information is stored in a database maintained by the VO. The short-term proxies required for authentication and authorization are annotated with an Attribute Certificate obtained from VOMS. The Attribute Certificate contains information about the VO, group membership and roles. A single VOMS server can serve multiple VOs. The VOMS Administrator web interface is used for managing VO membership through a web browser.

V. GRID SECURITY
The Grid middleware employs the Grid Security Infrastructure (GSI) to enable secure authentication and communication over an open network. GSI is based on public-key encryption, X.509 certificates, and the Secure Sockets Layer (SSL) communication protocol. The authorization of a user on a specific Grid resource is handled through VOMS.

Certification Authority (CA)
In order to access Grid resources, a user needs a digital X.509 certificate issued by a CA trusted by the organizations involved in the Grid. It is the responsibility of the CA to issue and manage certificates. The Registration Authority is the service delegated by the CA to validate the identity of the user and the legitimacy of the certification request at each site. Obtaining a certificate is a pre-requirement for joining any VO. Grid resources are also issued certificates to allow them to authenticate themselves to users and other services.

Proxy
The user's identity is required to run jobs on remote sites. The user certificate is used to generate and sign a temporary certificate, called a proxy, which is used for the actual authentication to Grid services. A user needs a valid proxy to submit jobs; those jobs carry their own copies of the proxy so that they can authenticate with Grid services as they run. A VOMS proxy is an extension of the proxy which contains additional information about the VO, the groups the user belongs to within the VO, and any roles the user is entitled to hold. The proxy has a short lifetime to reduce security risks. For long-running jobs, the job proxy may expire before the job has finished, causing the job to fail. To avoid this, a proxy renewal mechanism keeps the job proxy valid for as long as needed. The MyProxy server is a proxy credential repository which allows the user to create and store a long-term proxy. The WMS is then able to use this long-term proxy to periodically renew the proxy of a submitted job before it expires, until the job ends (a command-level sketch of this proxy handling is given after the workflow below).

VI. LOGICAL WORKFLOW FOR JOBS
The sequence of steps involved in job submission and processing in the Grid is described below; Figure 3 illustrates the logical job workflow.
- The user obtains a certificate from the CA, registers in a VO and gets an account on a UI.
- The user creates a proxy certificate on the UI to authenticate himself in subsequent secure interactions.
- The user submits a job from the UI to the RB. Any local input files specified in the JDL file are first copied from the UI to the RB.
- The WMS (RB) finds the appropriate CE to execute the job. It consults the BDII to determine the status of computational and storage resources, and the File Catalogue to find the location of any required input files.
- The RB readies the job for submission, creating a wrapper script and the required parameters to pass to the selected CE.
- The CE receives the request and sends the job for execution to the LRMS.
- The LRMS handles the execution of the job on the local Worker Nodes. The input files are copied from the RB to the WNs where the job is executed.
- The running job can access Grid files on an SE using the supported protocols (RFIO/gsidcap).
- After successful completion of the job, the output file(s) are transferred back to the RB node.
- The user can retrieve the output of the job to the UI.

Figure 3. Job Workflow in the Computing Grid
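As a rough command-level sketch of the proxy handling described in Section V, in which the VO name and MyProxy host are placeholders:

    # create a short-lived VOMS proxy for job submission and inspect it
    voms-proxy-init --voms daegrid
    voms-proxy-info --all

    # store a long-term credential on a MyProxy server so that the WMS can
    # renew the job proxy of long-running jobs before it expires
    myproxy-init -s myproxy.example.org -d -n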
If the chosen site does not accept or run the job, automatic resubmission to another suitable CE takes place. The job is aborted if the number of successive failed resubmissions reaches a maximum limit. All events during job submission and execution are logged in the LB. The user can query the LB from the UI for the job status; it is also possible to query the BDII for the status of the resources.

VII. GRID APPLICATIONS
Application software should conform in its architecture to the overall design of the Grid and should make use of the set of core tools, libraries and services which integrate and interoperate with the Grid middleware. Application software development has to meet a set of high-level requirements on language, platform and distributed computing for successful deployment on the Grid.
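For parallel applications of the kind listed below, the UI job types include MPI jobs; a hypothetical JDL description of a 16-process MPI run might look like the following (the attribute names follow the older gLite MPICH conventions and may differ in newer releases, and all file names are illustrative):

    JobType        = "MPICH";
    NodeNumber     = 16;
    Executable     = "cfd_solver";
    Arguments      = "case.inp";
    InputSandbox   = {"cfd_solver", "case.inp"};
    OutputSandbox  = {"std.out", "std.err"};
    StdOutput      = "std.out";
    StdError       = "std.err";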

This Grid infrastructure enables collaborative development of scientific computing applications across different organizations. User applications range from sequential jobs to multi-CPU parallel jobs, and from in-house developed scientific codes to commercial engineering applications involving dense floating-point operations. Some of the specific-purpose applications for which the computing Grid is effectively used are:
- Highly compute-intensive, number-crunching scientific applications in the areas of computational molecular dynamics, high energy physics, material modelling, reactor core calculations & safety analysis, weather forecasting and simulation studies.
- Engineering applications in the areas of finite element analysis, computational fluid dynamics and multiphysics modelling.
- Experimental data processing and analysis.
All computing components in the Grid are based on the Scientific Linux (SLC) operating system. Applications are primarily developed in the C/C++ and FORTRAN languages. The supported compilers, scientific/mathematical libraries, parallelization tools and Grid-related APIs are installed and configured to establish a well-defined development environment for Grid-enabled applications.

VIII. CONCLUSION
The architectural design of the Grid set up for enabling high-performance computing through sharing heterogeneous computational resources spread across different units of an R&D establishment has been detailed. The Grid infrastructure, the middleware and its functionalities have been explained. This paper describes the basic set of Grid services required for the User Interface, Compute Resource Management, Workload Management, Storage & Data Management, Information Management and Security Enforcement in a computing Grid deployment. The organization of services and the interaction between the functional elements are illustrated along with the underlying software framework. Modern research in advanced scientific and engineering areas calls for solving high-end computational problems using distributed resources in a coordinated, uniform way. Grid-enabled high-performance scientific and engineering applications exploit the potential of the computational Grid, resulting in increased productivity, reduced time and a better price-to-performance ratio for computing resources. Grid computing is an evolving area of computing, where standards and technology are still being developed. There is scope for improvement in enhancing the middleware services with a more intuitive user interface and advanced information and monitoring capabilities, supporting a wider range of batch systems, and providing interoperability among different middleware standards.

IX. FUTURE SCOPE
The infrastructure and design of the DAEGrid implementation is adequate for facilitating collaborative scientific research among the constituent units. To improve performance, utilization and reliability, some of the Grid services such as the WMS, VOMS, Proxy and File Catalogue will be deployed or replicated at multiple sites. As the computing and storage requirements of the scientific & engineering community grow explosively, they demand periodic enhancement of the processing power and storage capacity of the computing and storage elements respectively.

REFERENCES
[1] Fran Berman, Geoffrey Fox and Tony Hey, Grid Computing: Making the Global Infrastructure a Reality; Wiley; 2003.
[2] Joshy Joseph, Craig Fellenstein, Grid Computing; IBM Press; 2003.
[3] Lucio Grandinetti, Grid Computing: The New Frontier of High Performance Computing, 14; Elsevier; 2005.
[4] Mark Baker, Rajkumar Buyya and Domenico Laforenza, Grids and Grid technologies for wide-area distributed computing, Software: Practice and Experience, August 2002.
[5] Stephen Burke, Simone Campana, Antonio Delgado Peris, Flavia Donno, Patricia Méndez Lorenzo, Roberto Santinelli, Andrea Sciabà, gLite 3 User Guide, Worldwide LHC Computing Grid, January 2007.
[6] Introduction to Grid Computing with Globus, IBM Redbook, http://ibm.com/redbooks, September 2003.
[7] Worldwide LHC Computing Grid (WLCG), http://lcg.cern.ch/lcg
[8] EGEE Homepage, http://www.eu-egee.org
[9] The Globus Alliance, http://www.globus.org
[10] Condor Project, http://www.cs.wisc.edu/condor
[11] Virtual Data Toolkit, http://vdt.cs.wisc.edu