Outline: NorduGrid Introduction ATLAS software preparation and distribution Interface between NorduGrid and Condor NGlogger graphical interface On behalf of: Ugur Erkarslan, Samir Ferrag, Morten Hanshaugen (USIT), Aleksandr Konstantinov, prof. Farid Ould Saada, Katarina Pajchel, prof. Alex Read, Haakon Riiser, Are Strandli (UC Gjøvik), Sturle Sunde (USIT)
NorduGrid Introduction The Experimental Particle Physics (EPP) group at the University of Oslo is one of the main contributors to the NorduGrid collaboration. It contributes at virtually all levels: management, coordination and external contacts through the steering board chairman; middleware development and maintenance; hardware (clusters and storage elements); software; applications; education.
NorduGrid Introduction The NorduGrid middleware, the Advanced Resource Connector (ARC), is designed to support a dynamic, heterogeneous Grid facility spanning different computing resources and user communities. In contrast with other Grids like EDG, LCG or EGEE, NorduGrid was built incrementally from the bottom up, with one of the goals being a continuously running system. The middleware connects existing Local Resource Management Systems (LRMSs) and both dedicated and non-dedicated resources.
NorduGrid Introduction The ARC provides a complete Grid service, but it does not replace all levels of the job handling infrastructure. The ARC has been deployed at a number of computing resources running different Linux flavors and using various LRMSs, such as PBS and Condor.
NorduGrid Introduction Despite this diversity, the different sites present information about their resources in a uniform way, and the user can access the computing resources through a simple and universal interface. The user interface speaks the so-called extended resource specification language (XRSL).
NorduGrid Introduction NorduGrid is used by different scientific communities, including physics, chemistry, biology and informatics. List of runtime environments on one of the SweGrid clusters (Bluesmoke).
NorduGrid Introduction Storage The Smart Storage Element (SSE) is a replacement for the current simple storage element. The SSE is based on standard protocols. It provides: flexible access control; data integrity between resources; support for autonomous and reliable data replication.
NorduGrid Introduction Hardware The Storage Elements provided and maintained by EPP are a substantial contribution to NorduGrid's total storage facilities: ~8 TB (including elements of 1700 GB and 950 GB), about 22% of the total storage capacity (~36 TB), spread over 6 of the 37 storage elements currently available in NorduGrid.
NorduGrid Introduction Education: a bachelor project on cluster and Grid computing (accounting and banking systems) by students from Gjøvik University College.
DC2 More than 5000 CPUs in NorduGrid. Effectively, ~800 CPUs are dedicated to ATLAS; the rest are shared. For comparison: LCG ~3800 CPUs, Grid3 ~2000 CPUs. NorduGrid has proved to be very efficient and stable. Despite the large amounts of data, there have been no reports of serious problems with storage and data transport.
ATLAS software The source code is ~10 GB, and new releases are frequent. NorduGrid operates on different platforms and compiler versions, so the ATLAS runtime environment needs to be distributed throughout the Grid. NorduGrid has developed an ATLAS software distribution method using RPM binary packages. This reduces the size of the full runtime environment to ~2.2 GB. Installation and validation are easy using the KitValidation tool provided by ATLAS. The ATLAS code was compiled and run for the first time under Red Hat Enterprise Linux in Oslo. The group has provided the software packages for the RHEL3 platform and their distribution to the local resources.
CONDOR NorduGrid Interface Condor is a Local Resource Management System. Condor is used by USIT (the university's central IT department) and locally by the Experimental Particle Physics (EPP) group. It allows better utilization of the available resources, especially their idle time; Condor is well suited for environments with non-dedicated machines. The EPP group, in collaboration with the university, wanted to contribute to the Data Challenges (DC) at ATLAS, especially the ongoing DC2. Prior to the completion of the Condor/NorduGrid project, the only batch system that NorduGrid was interfaced to was PBS (Portable Batch System). The ATLAS DC jobs would be submitted with the NorduGrid middleware, creating the need for a bridge between Condor and NorduGrid.
Overview of implementation The Condor/NorduGrid interface consists of two distinct parts: the Grid Manager (GM) interface, whose task is primarily to convert the job description into Condor's format, submit the job, store some information required by the Information System and, finally, notify the GM on job completion; and the Information System interface, a set of scripts called at regular intervals to poll the status of the cluster and its jobs.
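The first task of the GM interface, translating a job description into Condor's submit-description format, can be sketched as below. This is a minimal illustration, not the actual GM code: the attribute names, the mapping table and the assumption that both sides express memory in MB are all illustrative.

```python
# Sketch of the kind of translation the GM interface performs: mapping a
# few (already-parsed) XRSL attributes onto Condor submit-description
# lines. Attribute names and the mapping are illustrative assumptions.

def xrsl_to_condor(xrsl: dict) -> str:
    """Translate a small XRSL attribute dict into a Condor submit
    description (illustrative subset only)."""
    mapping = {
        "executable": "executable",
        "arguments": "arguments",
        "stdout": "output",
        "stderr": "error",
        "memory": "request_memory",  # assumed: MB on both sides
    }
    lines = ["universe = vanilla"]
    for xrsl_key, condor_key in mapping.items():
        if xrsl_key in xrsl:
            lines.append(f"{condor_key} = {xrsl[xrsl_key]}")
    lines.append("queue")
    return "\n".join(lines)

job = {"executable": "myprog", "stdout": "out.txt", "memory": 400}
print(xrsl_to_condor(job))
```

The real interface additionally stages input/output files and records the job state for the Information System; only the description translation is shown here.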
Cluster organization Diagram: the Grid Manager (GM) connects to the Condor Central Manager, which coordinates a pool of submit/execute nodes.
The Condor pools are available for all Grid users through the standard user interface.
DC2 has been a demanding test: huge amounts of I/O (several hundred MB per job), 10-30 hours of CPU per job, at least 400 MB of memory required, and a runtime environment of 2.2 GB. There were some problems in the beginning, but most of them were not directly related to Condor or NorduGrid; they were mostly hardware related, such as lack of disk space, or bottlenecks such as software distribution through NFS.
Summary Performance during one of the stable periods: lowest failure rate. The two Condor pools of non-dedicated machines are now significant contributors to the Data Challenges. The users of the desktop computers allocated to the pool were amazed by how little they were disturbed in their daily work.
NGLogger The Logger service is one of the Web services implemented by NorduGrid, based on gSOAP and the Globus IO API. It provides a front-end to an underlying MySQL database that stores and retrieves information about computing resource usage (jobs). The Logger service is part of the NorduGrid middleware. It provides information complementary to the Monitor, showing the history of NorduGrid usage, whereas the Monitor shows the current state of the system. Jobs are removed from the system as soon as their lifetime expires and thus disappear from the Monitor; their information is persisted in the logging service.
The NGLogger is a graphical Web-based interface to the underlying MySQL database. It is implemented using PHP4, JavaScript and JPGraph (based on the GD library). Queries can be made by cluster, time period, application or user. Content of the Logger database: the columns shown are temporary; the database is being extended.
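The "query by cluster and time period" view can be illustrated with a small SQL example. This is a self-contained sketch: sqlite3 stands in for the real MySQL backend, and the table and column names are invented for illustration, not the actual logger schema.

```python
# Sketch of the kind of query NGLogger issues against the usage
# database. sqlite3 replaces MySQL so the example runs standalone;
# the schema (table "jobs", its columns) is an illustrative assumption.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE jobs (
    cluster TEXT, owner TEXT, finished TEXT, cpu_hours REAL)""")
conn.executemany("INSERT INTO jobs VALUES (?, ?, ?, ?)", [
    ("bluesmoke", "alice", "2004-06-01", 12.0),
    ("bluesmoke", "bob",   "2004-06-15", 25.5),
    ("oslo-grid", "alice", "2004-07-02", 8.0),
])

# Job counts and CPU usage per cluster within a given time period --
# the aggregation behind the graphical interface's cluster view.
cur = conn.execute(
    """SELECT cluster, COUNT(*), SUM(cpu_hours)
       FROM jobs
       WHERE finished BETWEEN '2004-06-01' AND '2004-06-30'
       GROUP BY cluster""")
for cluster, njobs, hours in cur:
    print(cluster, njobs, hours)
```

The same GROUP BY pattern, keyed on user or application instead of cluster, covers the other query modes mentioned above.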
The information may be incomplete: not all clusters send information to the logger, and the logging function may be switched off by the user. In March and April there were problems with the logger itself, not with NorduGrid.
The Condor NorduGrid interface was installed and tested in late spring; DC2 started in May. Clusters: Oslo Grid Cluster, UIO Grid.
DC2 More than 40 000 successful jobs
User search form. Time distribution for DC2 jobs
Conclusion We have achieved the goals that were set two years ago. A growing set of active users from various scientific areas takes advantage of NorduGrid, as it is both an accessible and user-friendly resource. Since DC1 (summer 2002), NorduGrid has been in continuous 24/7 operation and has developed into one of the largest production-quality Grids. The EPP group is making a substantial contribution to ATLAS through the NorduGrid activities. The HEP community involved in NorduGrid is approaching the point where we will be ready to tackle real data when the LHC starts.
Extra slides
NorduGrid Introduction Despite this diversity, the different sites present information about their resources in a uniform way, and the user can access the computing resources through a simple and universal interface. The user interface speaks the so-called extended resource specification language (XRSL). Schematic example of a NorduGrid job:

(&
  (executable="myprog")
  (inputfiles=
    ("myprog" "http://www.myserver.org/myfiles/myprog")
    ("myinputfile" "gsiftp://www.mystorage.org/data/file007.dat")
  )
  (outputfiles=
    ("myoutput" "gsiftp://www.mystorage.org/results/file007.res")
  )
  (disk=1000)
  (notify="e myname@mydomain.org")
)
NorduGrid Introduction Once a proxy is obtained, jobs can be submitted to NorduGrid using simple commands, and the user can keep track of the job through the whole process.

Submitting a job: > ngsub -f myjob.xrsl
(the user can now follow the job on the Monitor)
Checking the status of the job: > ngstat [options] [jobs]
Retrieving the results of a finished job: > ngget [options] [jobs]