ATLAS NorduGrid related activities

Outline:
- NorduGrid introduction
- ATLAS software preparation and distribution
- Interface between NorduGrid and Condor
- NGLogger graphical interface

On behalf of: Ugur Erkarslan, Samir Ferrag, Morten Hanshaugen (USIT), Aleksandr Konstantinov, prof. Farid Ould-Saada, Katarina Pajchel, prof. Alex Read, Haakon Riiser, Are Strandli (UC Gjøvik), Sturle Sunde (USIT)

NorduGrid Introduction
The Experimental Particle Physics (EPP) group at the University of Oslo is one of the main contributors to the NorduGrid collaboration. It contributes at virtually all levels:
- management, coordination and external contacts through the steering board chairman
- middleware development and maintenance
- hardware: clusters and storage elements
- software
- applications
- education

NorduGrid Introduction
The NorduGrid middleware, the Advanced Resource Connector (ARC), is designed to support a dynamic, heterogeneous Grid facility spanning different computing resources and user communities. In contrast with other grids like EDG, LCG or EGEE, NorduGrid was built incrementally from the bottom up, with one of the goals being a continuously running system. The middleware connects existing Local Resource Management Systems (LRMSs) and both dedicated and non-dedicated resources.

NorduGrid Introduction
ARC provides a complete Grid service, but it does not replace all levels of the job-handling infrastructure. ARC has been deployed on a number of computing resources running different Linux flavours and using various LRMSs such as PBS and Condor. Despite this diversity, the different sites can present information about their resources in a uniform way, and the user can access the computing resources through a simple and universal interface. Jobs are described in the so-called extended resource specification language (XRSL); a schematic XRSL job example is shown in the extra slides.

NorduGrid Introduction
NorduGrid is used by different scientific communities, including physics, chemistry, biology and informatics. (Slide shows the list of runtime environments on one of the SweGrid clusters, Bluesmoke.)

NorduGrid Introduction: Storage
The Smart Storage Element (SSE) is a replacement for the current simple storage element. The SSE is based on standard protocols and provides:
- flexible access control
- data integrity between resources
- support for autonomous and reliable data replication

NorduGrid Introduction: Hardware
The storage elements provided and maintained by EPP are a substantial contribution to NorduGrid's total storage facilities: 6 of the 37 storage elements currently available in NorduGrid, totalling ~8 TB, i.e. ~22% of the total storage capacity (~36 TB). Individual elements include 1700 GB and 950 GB units.

NorduGrid Introduction: Education
Bachelor projects on cluster and grid computing, and on accounting and banking systems, carried out by students from Gjøvik University College.

DC2
More than ~5000 CPUs are available in NorduGrid; effectively ~800 CPUs are dedicated to ATLAS, while the rest are shared (for comparison: LCG ~3800 CPUs, Grid3 ~2000 CPUs). NorduGrid has proved to be very efficient and stable. Despite the large amounts of data, there have been no reports of serious problems with storage or data transport.

The ATLAS software source code is ~10 GB and new releases are frequent. NorduGrid operates on different platforms and compiler versions, so the ATLAS runtime environment needs to be distributed throughout the grid. NorduGrid has developed an ATLAS software distribution method using RPM binary packages, which reduces the size of the full runtime environment to ~2.2 GB and allows easy installation and validation using the KitValidation tool provided by ATLAS. The ATLAS code was compiled and run for the first time under Red Hat Enterprise Linux in Oslo; the group has provided the software packages for the RHEL3 platform and their distribution on the local resources.
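
The installation and validation steps described above can be scripted per site. The following is a minimal illustrative sketch only, not the actual NorduGrid tooling: the RPM file names and the validation command are hypothetical placeholders, and only standard rpm commands are used.

#!/usr/bin/env python
# Illustrative sketch: install ATLAS runtime-environment RPMs and validate them.
# Package file names and the validation script are hypothetical placeholders.
import subprocess

RPMS = ["atlas-release-8.0.5.rpm", "atlas-external-8.0.5.rpm"]  # hypothetical names

def is_installed(rpm_file):
    # 'rpm -qp --qf %{NAME}' prints the package name contained in the .rpm file
    name = subprocess.check_output(
        ["rpm", "-qp", "--qf", "%{NAME}", rpm_file]).decode().strip()
    # 'rpm -q <name>' returns non-zero if the package is not installed
    return subprocess.call(["rpm", "-q", name]) == 0

def install(rpm_file):
    subprocess.check_call(["rpm", "-ivh", rpm_file])

def validate():
    # Placeholder for running ATLAS KitValidation against the installed release
    return subprocess.call(["./KitValidation.sh"]) == 0  # hypothetical script name

if __name__ == "__main__":
    for rpm in RPMS:
        if not is_installed(rpm):
            install(rpm)
    print("validation %s" % ("passed" if validate() else "failed"))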

Condor-NorduGrid Interface
Condor is a Local Resource Management System. It is used by USIT (the university's central IT department) and locally by the Experimental Particle Physics (EPP) group. It allows better utilization of the available resources, especially idle time, and is well suited for environments with non-dedicated machines. The EPP group, in collaboration with the university, wanted to contribute to the ATLAS Data Challenges (DC), especially the ongoing DC2. Prior to the completion of the Condor/NorduGrid project, the only batch system NorduGrid was interfaced to was PBS (Portable Batch System). The ATLAS DC jobs would be submitted with the NorduGrid middleware, creating the need for a bridge between Condor and NorduGrid.

Overview of implementation
The Condor/NorduGrid interface consists of two distinct parts:
- the Grid Manager (GM) interface, whose task is primarily to convert the job description into Condor's format, submit the job, store some information required by the Information System and, finally, notify the GM on job completion
- the Information System interface, a set of scripts called at regular intervals to poll the status of the cluster and its jobs
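
To make the GM interface's translation step concrete, here is a minimal sketch of how a few XRSL attributes might be mapped onto a Condor submit description. It is not the actual GM code: the attribute subset, file names and the assumption that input files have already been staged in by the GM are all illustrative.

# Illustrative sketch of the translation performed by the GM interface:
# a few XRSL attributes are mapped onto a Condor submit description.
# Not the actual NorduGrid code; the attribute subset is an assumption.

# A tiny, already-parsed XRSL job (cf. the schematic XRSL example in the extra slides)
xrsl = {
    "executable": "myprog",
    "inputfiles": ["myprog", "myinputfile"],   # assumed staged in by the GM beforehand
    "outputfiles": ["myoutput"],
}

def to_condor_submit(job):
    """Return the text of a Condor submit description for the given job."""
    lines = [
        "universe = vanilla",
        "executable = %s" % job["executable"],
        "transfer_input_files = %s" % ", ".join(job["inputfiles"]),
        "should_transfer_files = YES",
        "when_to_transfer_output = ON_EXIT",
        "output = job.out",
        "error  = job.err",
        "log    = job.log",   # the job log is what status-polling scripts would inspect
        "queue",
    ]
    return "\n".join(lines)

print(to_condor_submit(xrsl))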

Cluster organization
(Diagram: a Condor pool with a Central Manager coordinating a number of Submit/Execute machines, and the Grid Manager (GM) acting as the entry point from the Grid.)

The Condor pools are available to all Grid users through the standard user interface.

DC2 has been a demanding test:
- huge amounts of I/O (several hundred MB per job)
- 10-30 hours of CPU per job
- at least 400 MB of memory required
- a runtime environment of 2.2 GB
There were some problems in the beginning, but most of them were not directly related to Condor or NorduGrid; they were mostly hardware-related, such as lack of disk space, or bottlenecks such as software distribution through NFS.

Summary
(Plot: performance during one of the stable periods, with the lowest failure rate.) The two Condor pools of non-dedicated machines are now significant contributors to the Data Challenges. The users of the desktop computers allocated to the pool were amazed by how little they were disturbed in their daily work.

NGLogger
The Logger service is one of the Web services implemented by NorduGrid, based on gSOAP and the Globus IO API. It provides a front end to an underlying MySQL database that stores and retrieves information about computing resource usage (jobs). The Logger service is part of the NorduGrid middleware. Its information is complementary to the Monitor: the Monitor shows the current state of the system, whereas the Logger shows the history of NorduGrid usage. Jobs are removed from the system as soon as their lifetime expires and thus disappear from the Monitor, but their information is persisted in the logging service.

NGLogger is a graphical Web-based interface to the underlying MySQL database. It is implemented using PHP4, JavaScript and JpGraph (based on the GD library). Queries can start from different entry points into the database, such as cluster, time period, application or user. (Table: content of the Logger database. The columns shown are temporary; the database is being extended.)
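
As an illustration of the kind of query the NGLogger front end issues, here is a minimal Python sketch against the logger's MySQL database. The table and column names (usagerecords, cluster, start_time), host and credentials are hypothetical placeholders, not the actual NorduGrid schema.

# Illustrative sketch of a logger-style query: count jobs per cluster in a time period.
# Table/column names and connection details are hypothetical placeholders.
import MySQLdb  # MySQL-python bindings

conn = MySQLdb.connect(host="logger.example.org", user="reader",
                       passwd="secret", db="nglogger")
cur = conn.cursor()
cur.execute(
    "SELECT cluster, COUNT(*) "
    "FROM usagerecords "                      # hypothetical table name
    "WHERE start_time BETWEEN %s AND %s "
    "GROUP BY cluster",
    ("2004-05-01", "2004-09-30"))
for cluster, njobs in cur.fetchall():
    print("%-40s %6d jobs" % (cluster, njobs))
conn.close()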

The information may be incomplete:
- not all clusters send information to the logger
- the logging function may be switched off by the user
- March-April: problems with the logger itself, not with NorduGrid

The Condor-NorduGrid interface was installed and tested in late spring; DC2 started in May. (Plots: the Oslo Grid Cluster and UIO Grid.)

DC2
More than 40 000 successful jobs.

(Slide: DC2 plots.)

(Slide: user search form; time distribution for DC2 jobs.)

Conclusion
We have achieved the goals that were set two years ago. A growing set of active users from various scientific areas take advantage of NorduGrid, as it is both an accessible and user-friendly resource. Since DC1 (summer 2002), NorduGrid has been in continuous 24/7 operation and has developed into one of the largest production-quality grids. The EPP group is making a substantial contribution to ATLAS through its NorduGrid activities. The HEP community involved in NorduGrid is approaching the point where we will be ready to tackle the real data when the LHC starts.

Extra slides

NorduGrid Introduction
Despite this diversity, the different sites can present information about their resources in a uniform way, and the user can access the computing resources through a simple and universal interface. Jobs are described in the so-called extended resource specification language (XRSL). Schematic example of a NorduGrid job description:

(& (executable="myprog")
   (inputfiles=
     ("myprog" "http://www.myserver.org/myfiles/myprog")
     ("myinputfile" "gsiftp://www.mystorage.org/data/file007.dat") )
   (outputfiles=
     ("myoutput" "gsiftp://www.mystorage.org/results/file007.res") )
   (disk=1000)
   (notify="e myname@mydomain.org") )

NorduGrid Introduction
Once a proxy is obtained, jobs can be submitted to NorduGrid using simple commands, and the user can keep track of the job through the whole process.

Submitting (the user can then follow the job on the Monitor):
  > ngsub -f myjob.xrsl
Checking the status of the job:
  > ngstat [options] [jobs]
Retrieving the results of a finished job:
  > ngget [options] [jobs]
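
For completeness, a minimal Python sketch of how the command-line workflow above might be scripted. It simply shells out to the ng* commands shown on the slide; the way the job identifier is extracted from ngsub's output, and the state names checked, are assumptions made for illustration.

# Minimal sketch of scripting the ng* workflow shown above.
# Job-id extraction and state names are assumptions; consult the client docs.
import subprocess
import time

def submit(xrsl_file):
    # Assume ngsub prints a line containing the job identifier as its last token.
    out = subprocess.check_output(["ngsub", "-f", xrsl_file]).decode()
    print(out.strip())
    return out.split()[-1]

def wait_and_fetch(job_id, poll=300):
    while True:
        status = subprocess.check_output(["ngstat", job_id]).decode()
        print(status.strip())
        # FINISHED/FAILED used here as assumed terminal state names
        if "FINISHED" in status or "FAILED" in status:
            break
        time.sleep(poll)          # poll every few minutes
    subprocess.check_call(["ngget", job_id])   # retrieve the results

if __name__ == "__main__":
    wait_and_fetch(submit("myjob.xrsl"))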