Operating the Distributed NDGF Tier-1

Operating the Distributed NDGF Tier-1. Michael Grønager, Technical Coordinator, NDGF. International Symposium on Grid Computing 08, Taipei, April 10th, 2008.

Talk Outline: What is NDGF? Why a distributed Tier-1? Services: Computing, Storage, Databases, VO Specific. Operation. Results.

Nordic DataGrid Facility: A co-operative Nordic data and computing grid facility. Nordic production grid, leveraging national grid resources. Common policy framework for the Nordic production grid. Joint Nordic planning and coordination. Operates the Nordic storage facility for major projects. Co-ordinates & hosts major eScience projects (i.e., the Nordic WLCG Tier-1). Develops grid middleware and services. NDGF 2006-2010, funded (2 M€/year) by the National Research Councils of the Nordic countries (NOS-N: DK, FI, NO, SE).

Nordic DataGrid Facility: Nordic participation in Big Science: WLCG, the Worldwide LHC Computing Grid. Gene databases for bio-informatics sciences. Screening of reservoirs suitable for CO2 sequestration. ESS, the European Spallation Source. Astronomy projects. Other...

Why a Distributed Tier-1? Computer centers are small and distributed. Even the biggest adds up to 7. Strong Nordic HEP community. Technical reasons: added redundancy, only one 24x7 center, fast inter-Nordic network.

Organization Tier-1 related

Tier-1 Services: Storage (tape and disk). Computing, well connected to storage. Network, part of the LHC OPN. Databases: 3D for e.g. ATLAS, LFC for indexing files. File Transfer Service. Information systems. Monitoring. Accounting. VO Services: ATLAS specific, ALICE specific.

Resources at Sites: Storage is distributed. Computing is distributed. Many services are distributed. But the sites are heterogeneous...

Computing: A distributed compute center uses a grid as its LRMS... It needs to run on all kinds of Linux distributions, use resources optimally, and be easy to deploy. NorduGrid/ARC! Already deployed, runs on all Linux flavors, and uses resources optimally: ARC uses the CE for data handling, whereas gLite keeps nodes idle during upload/download.
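To make the ARC job flow concrete, here is a minimal sketch of submitting a trivial job to an ARC compute element with the classic NorduGrid client. This is illustrative only: the cluster name is hypothetical and the ngsub client is assumed to be installed; the point is that staging is declared in the xRSL, so the CE rather than the worker node handles data movement.

    # Minimal sketch: submit a trivial job to an ARC CE with ngsub.
    # The cluster name below is hypothetical.
    import subprocess

    xrsl = ('&(executable="/bin/echo")'
            '(arguments="hello from NDGF")'
            '(stdout="hello.out")'
            '(jobName="ndgf-demo")')

    result = subprocess.run(
        ["ngsub", "-c", "example-ce.ndgf.org", "-e", xrsl],
        capture_output=True, text=True, check=False)
    print(result.stdout or result.stderr)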

Storage: dCache. Java based, so it runs even on Windows! Separation between resources and services. Open source. Pools at the sites; doors and admin nodes centrally. Part of the development: added GridFTP2 to bypass door nodes in transfers, plus various improvements and tweaks for distributed use. Central services at the GEANT endpoint.
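To make the door/pool split tangible, here is a minimal sketch of writing a file into the distributed dCache instance through a central SRM door, assuming the srmcp client is available; the door host and pnfs path are hypothetical. With GridFTP2 the door only negotiates the transfer, and the data then flows directly to a pool node at one of the sites.

    # Sketch: copy a local file to dCache via a (hypothetical) central SRM door.
    import subprocess

    local_file = "file:////tmp/raw_events.dat"
    remote_url = ("srm://srm.example-ndgf.org:8443"
                  "/pnfs/example-ndgf.org/data/atlas/raw_events.dat")

    subprocess.run(["srmcp", local_file, remote_url], check=True)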

Network: Dedicated 10GE to CERN via GEANT (LHC OPN) from Örestaden. Dedicated 10GE NORDUnet connectivity between the participating Tier-1 sites. NDGF has its own AS: AS39590. [Diagram: national NRENs and sites in FI, SE, DK and NO, central hosts, CERN, HPC2N, PDC, NSC.]

Other Tier-1 Services: Catalogue: RLS & LFC. FTS, the File Transfer Service. 3D, Distributed Database Deployment. Accounting: SGAS -> APEL. Service Availability Monitoring via ARC-CE SAM sensors.
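As an illustration of how two of these central services are used, here is a minimal sketch of listing a directory in the LFC catalogue and submitting a replication job to FTS. Host names and paths are hypothetical; the lfc-ls and glite-transfer-submit clients are assumed to be installed on the submitting host.

    # Sketch: query the LFC catalogue and submit an FTS transfer (hypothetical hosts).
    import os
    import subprocess

    os.environ["LFC_HOST"] = "lfc.example-ndgf.org"   # hypothetical catalogue host

    # List a (hypothetical) ATLAS directory in the catalogue.
    subprocess.run(["lfc-ls", "-l", "/grid/atlas/data"], check=False)

    # Ask FTS to replicate one file between two (hypothetical) SRM endpoints.
    subprocess.run([
        "glite-transfer-submit",
        "-s", "https://fts.example-ndgf.org:8443/glite-data-transfer-fts/services/FileTransfer",
        "srm://srm.example-t2.org/pnfs/example-t2.org/data/atlas/f1",
        "srm://srm.example-ndgf.org/pnfs/example-ndgf.org/data/atlas/f1",
    ], check=False)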

ATLAS Services: So far part of Dulcinea. Moving to PanDA via aCT (the ARC Control Tower, aka the "fat pilot"). PanDA improves gLite performance through better data handling (similar to ARC). Moving from RLS to LFC.

ALICE Services: Many VO Boxes, one per site: Aalborg, Bergen, Copenhagen, Helsinki, Jyväskylä, Linköping, Lund, Oslo, Umeå. Central VO Box integrating the distributed dCache with xrootd. Ongoing efforts to integrate ALICE and ARC.
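What the xrootd integration buys ALICE, sketched minimally: a file stored in the distributed dCache instance can be read through a single central xrootd door. The host name and file path below are hypothetical, and the xrdcp client is assumed to be installed.

    # Sketch: read an ALICE file from dCache through a (hypothetical) central xrootd door.
    import subprocess

    subprocess.run([
        "xrdcp",
        "root://xrootd.example-ndgf.org//pnfs/example-ndgf.org/data/alice/run1234/esd.root",
        "/tmp/esd.root",
    ], check=False)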

NDGF Facility - 2008Q1

Operation: 1st line support (in operation): NORDUnet NOC, 24x7. 2nd line support (in operation): NDGF Operator on Duty, 8x365. 3rd line support (in operation): NDGF operation staff and sys admins at the sites. Shared tickets with NUNOC.

People

Results - Accounting: According to the EGEE Accounting Portal for 2007: NDGF contributed 4% of all EGEE. NDGF was the 5th biggest EGEE site. NDGF was the 3rd biggest ATLAS Tier-1 worldwide. NDGF was the biggest European ATLAS Tier-1.

Results - Reliability: NDGF has been running SAM tests since 2007Q3. Overall 2007Q4 reliability was 96%, which made us the most reliable Tier-1 in the world.
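For context, a quarterly reliability figure like the 96% above is typically computed in WLCG/SAM reporting as the time in which the SAM tests passed divided by total time minus scheduled downtime. A toy sketch with made-up numbers, not NDGF's actual figures:

    # Toy sketch of the reliability calculation used in WLCG/SAM reports.
    def reliability(hours_up: float, hours_total: float, hours_sched_down: float) -> float:
        """Return reliability as a fraction in [0, 1]."""
        return hours_up / (hours_total - hours_sched_down)

    quarter_hours = 92 * 24              # roughly one quarter
    up = 0.95 * quarter_hours            # hours with SAM tests passing (made up)
    sched = 0.01 * quarter_hours         # scheduled downtime (made up)
    print(f"reliability = {reliability(up, quarter_hours, sched):.1%}")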

Results - Efficiency: The efficiency of the NorduGrid cloud (NDGF plus Tier-2/3s using ARC) was 93%. The result was mainly due to high middleware efficiency and high reliability, which in turn came from the distributed setup and a professional operation team.

Worries: Can reconstruction run on a distributed setup? High data throughput, low CPU consumption. NDGF, TRIUMF and BNL reprocessed M5 data in February during CCRC08-1. Shown to work; the bottleneck was the 3D DB (which runs on only one machine).

Looking ahead... The distributed Tier-1 is a success: high efficiency, high reliability, passed the CCRC08-1 tests. Partnering with EGEE on: operation (taking part in CIC on Duty) and interoperability. Tier-2s are being set up. CMS will use gLite interoperability to run on ARC.

Thanks! Questions?