First Experience with LCG. CERN openlab Board of Sponsors, 3rd April 2009

First Experience with LCG Operation and the future... CERN openlab Board of Sponsors, 3rd April 2009. Ian Bird, LCG Project Leader

The LHC Computing Challenge
- Signal/Noise: 10^-9
- Data volume: high rate x large number of channels x 4 experiments = 15 PetaBytes of new data each year
- Compute power: event complexity x number of events x thousands of users = 100k of (today's) fastest CPUs, 45 PB of disk storage
- Worldwide analysis & funding: computing funded locally in major regions & countries; efficient analysis everywhere -> GRID technology
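
A quick back-of-envelope check of these figures (a minimal sketch in Python; the ~10^7 seconds of effective data taking per year is an assumed value, not stated on the slide):

```python
# Rough consistency check of the slide's numbers; live_seconds is an assumption.
PB = 1e15                 # bytes per petabyte (decimal convention)
GB = 1e9                  # bytes per gigabyte

annual_volume = 15 * PB   # "15 PetaBytes of new data each year" (from the slide)
live_seconds = 1e7        # assumed effective data-taking time per year (~4 months)

avg_rate_gb_s = annual_volume / live_seconds / GB
print(f"Average recording rate: {avg_rate_gb_s:.1f} GB/s")
# -> ~1.5 GB/s, the same order as the 1.25 GB/sec (ions) quoted on the Tier 0 slide
```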

WLCG: what and why?
- A distributed computing infrastructure to provide the production and analysis environments for the LHC experiments
- Managed and operated by a worldwide collaboration between the experiments and the participating computer centres
- The resources are distributed for funding and sociological reasons
- Our task is to make use of the resources available to us, no matter where they are located
- We know it would be simpler to put all the resources in 1 or 2 large centres; this is not an option... today

Tier 0 at CERN: acquisition, first-pass processing, storage & distribution - 1.25 GB/sec (ions)

Tier 0 - Tier 1 - Tier 2
- Tier-0 (CERN): data recording; initial data reconstruction; data distribution
- Tier-1 (11 centres): permanent storage; re-processing; analysis
- Tier-2 (~130 centres): simulation; end-user analysis
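
A minimal sketch (hypothetical Python, not actual WLCG software) of the role split listed above, useful for seeing how a workflow maps onto the tiers:

```python
# Hypothetical encoding of the tier roles listed above; not real WLCG code.
TIER_ROLES = {
    "Tier-0 (CERN)":         ["data recording", "initial reconstruction", "data distribution"],
    "Tier-1 (11 centres)":   ["permanent storage", "re-processing", "analysis"],
    "Tier-2 (~130 centres)": ["simulation", "end-user analysis"],
}

def tiers_for(task):
    """Return the tiers whose role list includes the given task."""
    return [tier for tier, roles in TIER_ROLES.items() if task in roles]

print(tiers_for("simulation"))     # ['Tier-2 (~130 centres)']
print(tiers_for("re-processing"))  # ['Tier-1 (11 centres)']
```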

First events

Tier-0 and Tier-1 sites: CERN, Amsterdam/NIKHEF-SARA, Bologna/CNAF, Taipei/ASGC, TRIUMF, BNL, NDGF, FNAL, Lyon/CCIN2P3, FZK, Barcelona/PIC, RAL



CERN + Tier 1 accounting, 2008: monthly charts of CPU time delivered (MSI2K-days), ratio of CPU to wall-clock times (%), disk storage used (TeraBytes), and tape storage used (TeraBytes).
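
The ratio of CPU to wall-clock times is the standard job efficiency metric; below is a minimal illustration of how such a ratio is computed from job accounting records (the record fields are invented for the example, not the actual APEL schema):

```python
# Illustrative CPU efficiency calculation; field names are invented, not the APEL schema.
def cpu_efficiency(records):
    """Aggregate CPU / wall-clock ratio over a set of job accounting records."""
    cpu = sum(r["cpu_seconds"] for r in records)
    wall = sum(r["wall_seconds"] for r in records)
    return cpu / wall if wall else 0.0

jobs = [
    {"cpu_seconds": 3200, "wall_seconds": 3600},
    {"cpu_seconds": 1500, "wall_seconds": 3600},  # e.g. a job stalled waiting for data
]
print(f"CPU : wall-clock ratio = {cpu_efficiency(jobs):.0%}")  # 65%
```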

CMS Data Transfer History (May 6th 2008)

10M files test @ ATLAS (from S. Campana)

Number of jobs/month (chart, by experiment: ALICE, ATLAS, CMS, LHCb) - now at the level of ~500k jobs/day. Main outstanding issues are related to service/site reliability.

CPU delivered, Aug. 08 to Jan. 09, from the APEL accounting portal (numbers in MSI2k):

             ALICE   ATLAS     CMS    LHCb    Total   Share
  Tier-1s     6.24   32.03   30.73    2.50    71.50   34.3%
  Tier-2s     9.61   52.23   55.04   20.14   137.02   65.7%
  Total      15.85   84.26   85.77   22.64   208.52
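
The Tier-1/Tier-2 shares in the table follow directly from the per-experiment numbers; a small sketch reproducing them:

```python
# Reproduce the shares in the table above (numbers in MSI2k, Aug. 08 to Jan. 09).
tier1 = {"ALICE": 6.24, "ATLAS": 32.03, "CMS": 30.73, "LHCb": 2.50}
tier2 = {"ALICE": 9.61, "ATLAS": 52.23, "CMS": 55.04, "LHCb": 20.14}

t1_total = sum(tier1.values())     # 71.50
t2_total = sum(tier2.values())     # 137.02
grand_total = t1_total + t2_total  # 208.52

print(f"Tier-1 share: {t1_total / grand_total:.1%}")  # 34.3%
print(f"Tier-2 share: {t2_total / grand_total:.1%}")  # 65.7%
```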

Database replication - LCG 3D
- The 3D project is now finished; it runs as a production service
- In full production: several GB/day of user data can be sustained to all Tier 1s
- ~100 DB nodes at CERN and several tens of nodes at Tier 1 sites - a very large distributed database deployment
- Used for several applications: experiment calibration data; replicating (central, read-only) file catalogues

Reliabilities (charts: site reliability for CERN + Tier 1s, Jul-06 to Dec-08, showing the average, the average of the 8 best sites, and the target; Tier 2 reliabilities, showing the average, the top 50% and the top 20%)
- Improvement during CCRC and later is encouraging
- Tests do not show the full picture: e.g. they hide experiment-specific issues, and the OR of service instances is probably too simplistic (see the sketch below)
- Plans: a) publish VO-specific tests regularly; b) rethink the algorithm for combining service instances
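
To make the "OR of service instances" point concrete: with an OR, a service counts as available if any one of its instances passes the test, which can hide a largely degraded service. The sketch below contrasts the OR with a stricter fraction-based rule; both rules are illustrative and neither is the actual SAM/WLCG algorithm.

```python
# Illustrative availability rules; neither is the real SAM/WLCG combination algorithm.
def or_combination(instance_results):
    """Service is 'up' if any instance passed its test (the simplistic OR)."""
    return any(instance_results)

def fraction_combination(instance_results, threshold=0.5):
    """Stricter rule: require at least `threshold` of the instances to pass."""
    return sum(instance_results) / len(instance_results) >= threshold

srm_instances = [True, False, False]        # one of three storage endpoints healthy
print(or_combination(srm_instances))        # True  -> counted as fully available
print(fraction_combination(srm_instances))  # False -> the degradation is visible
```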

Improving Reliability
- Testing
- Task forces / challenges
- Monitoring: appropriate and followed up

Multi (many)-core:
- Better memory use efficiency
- Challenges: co-scheduling many similar processes onto a single box; parallelizing the applications (multi-thread, MPI, ...) - see the sketch after this list

New technologies: clouds + virtualisation
- Use as overflow resources for peak periods - demonstrated
- Running our facilities as clouds: use of virtualisation, management tools, etc.
- Buying resources directly... needs education of the funding agencies
- Lessons: simplicity of interfaces, usage patterns, ...
- Virtualisation helps improve service reliability; it simplifies facility management (tbc) and leaves applications to deal with dependencies
- Grid -> grid of cloud-like objects

Filesystems: Lustre, Hadoop, NFS 4.1, etc.
- Can we use these to improve our service reliability? Usability?

Messaging systems
- Use for integrating systems - web services (across languages, etc.) did not deliver
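
As an illustration of the co-scheduling point above, a minimal sketch that runs many similar, independent event-processing tasks as a pool of processes on one box (the workload function is a placeholder, not experiment code):

```python
# Minimal co-scheduling sketch; process_event is a stand-in workload, not LHC software.
from multiprocessing import Pool, cpu_count

def process_event(event_id):
    """Placeholder for reconstructing or analysing one event."""
    return sum(i * i for i in range(10_000)) % (event_id + 1)

if __name__ == "__main__":
    events = range(1000)
    # One worker per core: many similar processes co-scheduled on a single box.
    with Pool(processes=cpu_count()) as pool:
        results = pool.map(process_event, events)
    print(f"Processed {len(results)} events on {cpu_count()} cores")
```

In practice the memory gain the slide alludes to comes from sharing common read-only data between such similar processes (e.g. forking after initialisation), which this sketch does not attempt to show.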

Enabling Grids for E-sciencE (EGEE-III): What are the limitations & possible solutions?


Challenges (cont.)
- Simplification of data management - clouds don't help much here
- Abstraction: SRM has added complexity; how much is required? How can we simplify? What are the lessons to learn? (see the sketch after this list)
- Database access: grid authn/authz would help...
- New Tier 0 centre: we will run out of power; a new centre is planned - will it be ready when we need it???
- Moving from EGEE to a European sustainable grid infrastructure, whilst maintaining a solid service
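
To make the abstraction question concrete, below is a hypothetical sketch of a deliberately thin storage interface, far smaller than the full SRM specification; everything in it (names, methods) is invented for discussion, not an existing API:

```python
# Hypothetical minimal storage interface, for discussion only (not SRM, not a real API).
from abc import ABC, abstractmethod

class SimpleStorage(ABC):
    """A deliberately thin abstraction: only the operations experiments always need."""

    @abstractmethod
    def put(self, local_path, lfn):
        """Upload a local file and register it under a logical file name."""

    @abstractmethod
    def get(self, lfn, local_path):
        """Download the file behind a logical file name."""

    @abstractmethod
    def delete(self, lfn):
        """Remove the file and its catalogue entry."""

    @abstractmethod
    def stat(self, lfn):
        """Return basic metadata, e.g. {'size': ..., 'checksum': ...}."""
```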

EGEE -> EGI + NGIs
- EGI blueprint published in December; endorsed in January by 20 NGIs
- The March policy board selected Amsterdam as the location of EGI.org (the body responsible for managing EGI)
- Initiation of the transition process to create an EGI council
- MoU to be prepared as an interim measure to identify NGIs prepared to commit as described in the blueprint (starting with Letters of Intent)
- Anticipate the 1st council meeting in May
- Task force to be established for preparation of EGI proposals, for EC calls anticipated to close in November
- EGEE has outlined a fairly detailed transition plan for the final year of the project, but can only go so far

EGI and WLCG
- WLCG cannot take the risk of assuming EGI will be in place at the end of EGEE-III
- We plan to ensure that the services provided to us today by EGEE are assured by our Tier 1 sites
- Support the formation of a gLite consortium to support the middleware
- In parallel we work with EGI_DS and the EC to try to ensure that EGI and the NGIs will deliver what we need

Conclusions
- We have built a working system that will be used for first data taking
- But it has taken a lot longer than anticipated... and was a lot harder... and the reality of grids does not quite match the hype...
- We now have an opportunity to rethink how we want this to develop in the future, with clearer ideas of what is needed
- We must consider the risks, maintainability, reliability, and complexity
- The change of funding model and new technologies provide opportunities
- Challenges: data management, reliability, ...
- We should remember: our goal is to enable the experiments' computing, not necessarily to develop computer science (unless we have to...)

WLCG timeline 2009-2010 (chart spanning Jan 2009 to Feb 2011):
- CCRC'09; further tests to be scheduled
- Start-up (SU), pp running, heavy ions (HI)?
- Switch to SL5/64-bit completed?
- 2009 capacity commissioned; 2010 capacity commissioned
- Deployment of glexec/SCAS; CREAM; SRM upgrades; SL5 WN
- EGEE-III ends; EGI...???