LHC Computing Grid today: Did it work?


Did it work? Sept. 9th 2011. KIT, University of the State of Baden-Wuerttemberg and National Research Center of the Helmholtz Association, www.kit.edu

Large Hadron Collider and Experiments
The LHC is addressing fundamental questions of contemporary particle physics and cosmology:
- Origin of mass: the Higgs mechanism
- What is dark matter?
- Matter-antimatter asymmetry?
- Matter in the early universe?
- Hints for quantum gravity?
- Dark energy?

Particle physics is international teamwork. Working @ CERN: 290 institutes from Europe (~6349 users) and 318 institutes from elsewhere (~3766 users).

Particle physics is international (status: 2008).

Grid for LHC
Given the international and collaborative nature of HEP, computing must be distributed: it harvests intellectual contributions from all partners, and there are also funding issues.
Early studies in 1999 (MONARC study group) suggested a hierarchical approach, following the typical data-reduction schemes usually adopted in data analysis in high-energy physics.
The Grid paradigm came at the right time and was adopted by LHC physicists as the baseline for distributed computing.
Physicists made major contributions to developments in Grid computing; other HEP communities also benefited and contributed.

The Worldwide LHC Computing Grid in action (source: http://rtm.hep.ph.ic.ac.uk/webstart.php): a truly international, world-spanning Grid for LHC data processing, simulation and analysis.

Organisation of the Worldwide LHC Computing Grid: Grids can't work without an organisational structure representing all parties involved.

Structure of the LHC Grid
A grid with hierarchies and different tasks at different levels. In addition, it is a Grid of Grids, with interoperability between different middlewares:
- EGEE middleware in most of Europe
- Open Science Grid in the USA
- NorduGrid in Northern Europe
- AliEn by the ALICE collaboration

LHC experiments are huge data sources
~100 million detector cells, an LHC collision rate of 40 MHz and 10-12 bit/cell correspond to ~1000 TByte/s of raw data.
Zero-suppression and the trigger reduce this to 100 kHz (100 GB/s digitized) and finally 300 Hz (250 MB/s), i.e. only some 100 MByte/s delivered to the world-wide physics community.
Grid computing is ideal for the job!
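As a rough cross-check, the reduction chain can be followed with simple arithmetic. The sketch below uses only the rates quoted on the slide; the ~1 MB digitized event size is inferred from those rates, not stated explicitly.

```python
# Rough arithmetic for the LHC trigger/data-reduction chain (illustrative only).
# The ~1 MB digitized event size is inferred from 100 GB/s at 100 kHz.

collision_rate_hz = 40e6        # LHC bunch-crossing rate: 40 MHz
trigger_rate_hz = 100e3         # first trigger stage output: ~100 kHz
recorded_rate_hz = 300          # events written out for physics: ~300 Hz

bandwidth_after_trigger = 100e9     # ~100 GB/s digitized
bandwidth_recorded = 250e6          # ~250 MB/s to the worldwide community

event_size = bandwidth_after_trigger / trigger_rate_hz
print(f"implied event size: {event_size/1e6:.1f} MB")                    # ~1 MB

print(f"overall rate reduction: {collision_rate_hz/recorded_rate_hz:.0f}x")
print(f"recorded data per day: {bandwidth_recorded*86400/1e12:.1f} TB")
```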

What particle physicists do on the Grid
- CPU-intensive simulation of particle-physics reactions and of the detector response
- Processing (= reconstruction) of large data volumes
- I/O-intensive filtering and distribution of data
- Transfer to local clusters and workstations for final physics interpretation

Why the Grid is well suited for HEP
Key characteristics of experimental HEP codes:
- modest memory requirement (~2 GB) and modest floating-point needs: they perform well on PCs
- independent events: easy parallelism (see the sketch below)
- large data collections (TB to PB) shared by very large collaborations
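The "independent events" point is what makes HEP workloads embarrassingly parallel: each collision event can be processed without reference to any other. A minimal sketch, with a hypothetical reconstruct() function standing in for real experiment code:

```python
# Minimal sketch of event-level parallelism: every event is independent,
# so processing is trivially split across worker processes.
from multiprocessing import Pool

def reconstruct(event):
    # Placeholder for an experiment's real reconstruction code,
    # which would turn raw detector hits into physics objects.
    return {"event_id": event["id"], "n_hits": len(event["hits"])}

if __name__ == "__main__":
    # Toy "raw data": in reality events would be read from Grid storage.
    events = [{"id": i, "hits": list(range(i % 5))} for i in range(1000)]
    with Pool(processes=4) as pool:
        results = pool.map(reconstruct, events)
    print(f"reconstructed {len(results)} independent events")
```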

WLCG: a Grid with a hierarchical structure
- Tier-0 (the accelerator centre): data acquisition and initial processing, long-term data curation, distribution of data to T1/T2.
- 11 Tier-1 centres: Canada TRIUMF (Vancouver), France IN2P3 (Lyon), Spain PIC (Barcelona), Germany Forschungszentrum Karlsruhe, Taiwan Academia Sinica (Taipei), Italy CNAF (Bologna), UK CLRC (Oxford), Netherlands NIKHEF/SARA (Amsterdam), US FermiLab (Illinois) and Brookhaven (NY), plus a distributed Tier-1 in the Nordic countries. The Tier-1s are online to the data-acquisition process with high availability, provide managed mass storage and a grid-enabled data service, host data-intensive analysis and give national and regional support.
- Tier-2: ~150 centres in 60 federations in 35 countries; end-user (physicist, research group) analysis, in collaboration with the T3s (= institute resources), where the discoveries are made; Monte Carlo simulation.
- Tier-3: several hundred grid-enabled PC clusters at institutes.
(A compact summary of these roles is sketched below.)
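The division of labour between the tiers can be collected in a small data structure. The wording below merely paraphrases the slide and is only illustrative:

```python
# Illustrative summary of the WLCG tier roles described above.
WLCG_TIERS = {
    "Tier-0": {
        "where": "CERN (the accelerator centre)",
        "tasks": ["data acquisition and initial processing",
                  "long-term data curation",
                  "distribution of data to Tier-1/Tier-2"],
    },
    "Tier-1": {
        "where": "11 national centres (e.g. GridKa, IN2P3, FermiLab)",
        "tasks": ["online to the data-acquisition process, high availability",
                  "managed mass storage, grid-enabled data service",
                  "data-intensive analysis, national/regional support"],
    },
    "Tier-2": {
        "where": "~150 centres in 60 federations in 35 countries",
        "tasks": ["end-user analysis", "Monte Carlo simulation"],
    },
    "Tier-3": {
        "where": "institute resources (grid-enabled PC clusters)",
        "tasks": ["final physics interpretation by local groups"],
    },
}

for tier, info in WLCG_TIERS.items():
    print(f"{tier}: {info['where']}")
```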

Grid structure example: the CMS computing model (7 Tier-1 centres, ~50 Tier-2 centres).
The LHC experiments
- typically share the big Tier-1s and take responsibility for experiment-specific services,
- have a large number of Tier-2s, usually supporting only one experiment,
- have an even larger number of Tier-3s without any obligations towards WLCG.

Typical workflows on the Grid

How big is WLCG today?
[Charts: CPU (kHS06) and disk (PB) per tier (T0, T1, T2) for 2010-2012.]
Total CPU 2011: 1500 kHS06, approximately equivalent to 150,000 CPU cores.
Total disk 2011: 130 PB, and the same amount again in tape storage 2011: 130 PB (40 PB at CERN, none at T2s). The 2012 numbers are still being negotiated!
The largest science Grid in the world.
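The quoted CPU figures imply roughly 10 HS06 per core, a handy conversion. A quick consistency check using only the numbers on the slide:

```python
# Back-of-the-envelope check of the 2011 WLCG resource figures quoted above.
total_cpu_khs06 = 1500          # total CPU in 2011, in kHS06
equivalent_cores = 150_000      # quoted equivalent number of CPU cores
total_disk_pb = 130             # total disk in 2011, PB
total_tape_pb = 130             # total tape in 2011, PB

hs06_per_core = total_cpu_khs06 * 1000 / equivalent_cores
print(f"~{hs06_per_core:.0f} HS06 per core")                      # ~10 HS06/core
print(f"disk + tape: {total_disk_pb + total_tape_pb} PB of storage in total")
```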

A closer look at the surroundings of GridKa
GridKa supports >20 T2s in 6 countries (including ALICE T2 sites in Russia) and provides ~15% of the WLCG T1 resources: the most complex T2 cloud of any T1 in WLCG.

The Worldwide LHC Computing Grid: after almost 2 years of experience with LHC operation, how well did it work?

Almost 2 years of experience: did it work? It is up to the users to give feedback. (D. Charlton, ATLAS, EPS-HEP 2011)

Did it work? (2) (G. Tonelli, CMS, EPS-HEP 2011)

Did it work? Obviously it did!
- The Grid infrastructure for the LHC performed extremely well.
- Physics results were produced from freshly recorded data.
- But: the effort for running the computing infrastructure is high!
There are challenges ahead. Let's have a look in detail...

Data Export Rate from CERN

Daily data export from CERN

Data Export from a T1: example CMS @ KIT

Data Import T2 -> T1 (MC): example CMS @ KIT

Processed Jobs: ATLAS

Processed Jobs, success rates (ATLAS)

Processed Jobs: CMS

Processed Jobs, success rates (CMS)

Job success rates @ T2s (CMS T2s shown here; ATLAS looks similar)
T2s see a mixture of
- MC production jobs
- user analysis and skimming jobs
Job success rates are not in all cases above 90%, and a 90% success rate would be considered very low for a classical computer centre. This must improve. It is not easy to disentangle failures of the system from user errors (a toy illustration of this bookkeeping follows below).
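One reason the numbers are hard to interpret is that a "failed" grid job may fail for site reasons (storage, worker node, middleware) or for user reasons (bad configuration, crashing analysis code). A hedged sketch of the kind of accounting involved; the error categories and exit conventions here are invented for the example:

```python
# Illustrative only: toy classification of grid-job outcomes into site vs. user
# failures, plus a success rate. The error labels are made up for this example.
from collections import Counter

SITE_ERRORS = {"storage_unreachable", "worker_node_lost", "software_area_missing"}
USER_ERRORS = {"crash_in_user_code", "bad_configuration"}

def classify(job):
    if job["status"] == "done":
        return "success"
    if job["error"] in SITE_ERRORS:
        return "site_failure"
    if job["error"] in USER_ERRORS:
        return "user_failure"
    return "unknown"

jobs = [
    {"status": "done",   "error": None},
    {"status": "done",   "error": None},
    {"status": "failed", "error": "storage_unreachable"},
    {"status": "failed", "error": "bad_configuration"},
]

counts = Counter(classify(j) for j in jobs)
success_rate = counts["success"] / len(jobs)
print(counts, f"success rate: {success_rate:.0%}")
```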

Site Reliability (examples)

T2 performance: availability of CMS T2 sites. There are sites with performance issues! Typically, the less well performing sites are very small.

T2 performance: ATLAS example

T2 performance: ATLAS (cont'd). This is an example of a time-resolved measurement of site availability; the message is similar to the previous one. This kind of graph helps the people responsible for a site to monitor it and act on problems, and is provided centrally by WLCG and the experiments.

Again: does it work? YES!
- Routinely running ~150,000 jobs simultaneously on the Grid.
- Shipping over 100 TB/day to the T1 centres; data distribution to the T2s works well.
- Some T2s have performance issues.
- Very little is known about T3 usage and success rates (responsibility of the institutes).
- Plenty of resources were available at LHC start-up; we are now approaching resource-limited operation.
- Users have adapted to the Grid world: the Grid is routinely used as a huge batch system, and the output is transferred home.
but...

Does it work?
Message: it worked better than expected by many, but running such a complex computing infrastructure as the WLCG is tedious (and expensive!). Reliability and cost of operation can be improved by
- simplified and more robust middleware,
- redundancy of services and sites, which requires dynamic placement of data and investment in network bandwidth,
- automated monitoring and triggering of actions,
- use of commercially supported approaches to distributed computing: private clouds are particularly important for shared resources at universities; eventually, simple tasks (simulation, statistics calculations) could be off-loaded to commercial clouds.
Many of these new developments were addressed at this school. Let's have a look at some future developments...

LHCONE
The ATLAS and CMS computing models differ slightly; CMS is already more distributed. The aim of the LHCONE project is better trans-regional networking for data analysis, complementary to the LHCOPN network connecting the LHC T1s, by interconnecting open exchange points between regional networks:
- flat(er) hierarchy: any site has access to any other site's data
- dynamic data caching: pull data on demand (sketched below)
- remote data access: jobs may use data remotely
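The "dynamic data caching" idea, pulling a dataset only when a job at the site actually asks for it, can be illustrated in a few lines. This is only a sketch: fetch_from_remote_site() and the dataset name are hypothetical stand-ins for a real transfer system and a real dataset.

```python
# Sketch of pull-on-demand data caching in the spirit of LHCONE-style access.
# fetch_from_remote_site() is a hypothetical stand-in for a real transfer tool.

local_cache = {}

def fetch_from_remote_site(dataset_name):
    # In reality: a wide-area transfer or remote read over the network.
    return f"<contents of {dataset_name}>"

def open_dataset(dataset_name):
    """Return the dataset, pulling it into the local cache on first access."""
    if dataset_name not in local_cache:
        local_cache[dataset_name] = fetch_from_remote_site(dataset_name)
    return local_cache[dataset_name]

open_dataset("/example/Run2011A/SomeDataset")   # first access: pulled from remote
open_dataset("/example/Run2011A/SomeDataset")   # second access: served locally
print(f"{len(local_cache)} dataset(s) cached locally")
```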

LHCONE (2): schematic layout of the LHCONE network infrastructure. A dedicated HEP network infrastructure: what is the cost?

Virtualisation
You have heard a lot about clouds and virtualisation at this school. In a nutshell:
- Clouds offer Infrastructure as a Service.
- Easy provision of resources on demand, even by putting (private) cloud resources behind a classical batch queue (e.g. the ROCED project developed at EKP, KIT); a sketch of this idea follows below.
- Independence of local hardware and operating system (Scientific Linux 5 for the Grid middleware and experiment software).
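The "cloud resources behind a classical batch queue" idea, as pursued by projects such as ROCED, boils down to watching the queue and booting or shutting down virtual machines accordingly. A minimal, purely illustrative sketch, not the actual ROCED code:

```python
# Purely illustrative sketch of elastic cloud provisioning behind a batch queue,
# in the spirit of (but not taken from) tools like ROCED.

def decide_vm_action(idle_jobs, running_vms, jobs_per_vm=8, max_vms=50):
    """Return how many VMs to start (positive) or stop (negative)."""
    needed_vms = -(-idle_jobs // jobs_per_vm)      # ceiling division
    target = min(needed_vms, max_vms)
    return target - running_vms

# Example: 100 idle jobs, 5 VMs already running -> boot 8 more (13 in total).
print(decide_vm_action(idle_jobs=100, running_vms=5))    # 8
# No idle jobs -> shut the 5 running VMs down again.
print(decide_vm_action(idle_jobs=0, running_vms=5))      # -5
```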

CernVM & CernVM-FS
CernVM is a virtual machine ("Virtual Software Appliance") based on Scientific Linux with the CERN software environment. CernVM-FS is a client-server file system based on HTTP, implemented as a user-space file system and optimized for read-only access to software repositories, with a performant caching mechanism. It allows a CernVM instance to efficiently access software installed remotely.
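In practice a CernVM-FS repository appears as a read-only directory tree, conventionally mounted under /cvmfs/, so experiment software can be used as if it were installed locally. A hedged example; whether the repository shown is actually mounted depends on the local client configuration:

```python
# Check whether a CernVM-FS repository is visible on this machine.
# /cvmfs/<repository> is the conventional mount point; availability depends
# on the local CernVM-FS client setup.
import os

repo = "/cvmfs/cms.cern.ch"   # example repository name

if os.path.isdir(repo):
    entries = os.listdir(repo)[:5]
    print(f"{repo} is mounted; first entries: {entries}")
else:
    print(f"{repo} is not mounted; install and configure the CernVM-FS client first")
```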

A recap of this school
This week, you have heard about many of the new developments:
- J. Templon, Grid and Cloud: "Grids need Clouds to prosper, Clouds need Grids to scale"
- O. Synge, Virtualisation
- P. Millar, Data Storage
- C. Witzig, European Grid Projects
- T. Beckers, Storage Architectures for Petaflops Computing
- S. Reißer, Grid User Support
- N. Abdennadher, Combining Grid, Cloud and Volunteer Computing
- U. Schwickerath, Cloud Computing
- S. Maffioletti, ARC for Developers
- T. Metsch and B. Schott, Sustainable DCI Operations
- A. Aeschlimann, Grid and Cloud Security
and a number of hands-on workshops in parallel sessions.
The HEP Grid runs fine in its initial version, but virtualisation and clouds offer new possibilities for resource increase, efficiency, cost effectiveness, operation and reliability. It's up to you, the participants of this school, to shape the future! Thanks to all speakers, session teams and organizers, also from my side.