Scientific data processing at global scale: The LHC Computing Grid. Fabio Hernandez

Scientific data processing at global scale: The LHC Computing Grid. Chengdu (China), July 5th 2011

Who I am 2 Computer science background. Working in the field of computing for high-energy physics since 1992: software development for scientific data management (data transfer over high-throughput networks, mass storage & retrieval, cataloguing, ...) and operations of IT services for research. Involved in grid computing projects since 2000, in particular in planning, prototyping, deploying and operating the computing infrastructure for the LHC: technical leadership of the French contribution to the LHC Computing Grid; served on the management board and grid deployment board of the WLCG collaboration. Served as deputy director of the IN2P3 computing centre, which hosts and operates the French WLCG tier-1. Since August 2010, visiting the IHEP computing centre.

Contents 3 LHC overview; LHC Computing Grid; Data distribution; Data processing; Perspectives; Questions & Comments

4 [Photo: CERN]

5 [Diagram: the LHC ring, about 100 m underground, with the four detectors and their raw data rates: ALICE 100 MB/s, ATLAS 320 MB/s, CMS 220 MB/s, LHCb 50 MB/s.] Source: CERN

6 Scientific research at global scale. High-energy physics research: the size, complexity and cost of the instruments (particle accelerators and detectors) require large, highly skilled teams to build and operate the instruments and the research infrastructure. Teams, instruments and funding are multinational and highly distributed, requiring a coordinated effort to answer specific research questions.

7

Data acquisition 8 Design event rate: 40 million collisions per second. After hardware- and software-based filtering, about 1 out of 200,000 collisions is stored.

Experiment        Data rate [MB/s]
ALICE             100
ATLAS             320
CMS               220
LHCb              50
All experiments   690

This amounts to roughly 7 PB of additional raw data per nominal year*, excluding derived and simulated data. * Accelerator duty cycle assumption: 14 hours/day, 200 days/year.
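As a back-of-envelope check of the 7 PB figure, a small sketch using only the aggregate rate and the duty-cycle assumptions quoted on the slide:

```python
# Back-of-envelope check of the "~7 PB of raw data per nominal year" figure,
# using the aggregate rate and duty-cycle assumptions quoted on the slide.

aggregate_rate_mb_s = 690          # MB/s, all four experiments combined
seconds_per_day = 14 * 3600        # accelerator assumed live 14 hours/day
days_per_year = 200                # and 200 days/year

annual_volume_mb = aggregate_rate_mb_s * seconds_per_day * days_per_year
annual_volume_pb = annual_volume_mb / 1e9   # 1 PB = 10^9 MB (decimal units)

print(f"{annual_volume_pb:.1f} PB/year")    # ~7.0 PB
```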

9 LHC Computing Grid: Architecture. A grid-based infrastructure composed of 150+ geographically dispersed sites linked by high-speed networks, integrating the resources into a coherent environment that can be used by any collaboration member, from desktops to clusters in universities to high-performance computing centres in national laboratories.
Tier-0 (1 site): the host laboratory (CERN): data acquisition and initial processing, long-term data curation, distribution to the tier-1 centres; operated 24x7.
Tier-1 (11 sites): national computing centres, viewed as online to the data acquisition process: grid-enabled data service backed by a mass-storage system, data-heavy analysis; operated 24x7.
Tier-2 & Tier-3 (140+ sites): university and research group clusters: simulation, interactive and batch end-user analysis.
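A minimal sketch of the tiered model as a plain data structure; the tier names, site counts and responsibilities are those listed above, while the class and field names are purely illustrative:

```python
from dataclasses import dataclass

@dataclass
class Tier:
    """One level of the WLCG tiered computing model (illustrative only)."""
    name: str
    sites: str
    responsibilities: list[str]

WLCG_TIERS = [
    Tier("Tier-0", "1 site (CERN)",
         ["data acquisition and initial processing",
          "long-term data curation",
          "distribution to tier-1 centres"]),
    Tier("Tier-1", "11 sites",
         ["grid-enabled data service backed by mass storage",
          "data-heavy analysis"]),
    Tier("Tier-2/3", "140+ sites",
         ["simulation",
          "interactive and batch end-user analysis"]),
]

for tier in WLCG_TIERS:
    print(f"{tier.name} ({tier.sites}): " + "; ".join(tier.responsibilities))
```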

10 Worldwide collaboration: 49 countries, 159 sites, 2100+ users*. * As of June 2011. Source: WLCG GStat

Scale: available resources 11 Aggregated resources made available by the participating sites for the year 2011, for all LHC experiments:

Resource   Capacity    Equivalent
CPU        1.5M HS06   ~175,000 recent CPU cores*
Disk       132 PB      66,000 disk spindles of 2 TB each
Tape       130 PB

* Intel Xeon 5650 @ 2.66 GHz. Source: WLCG Resource, Balance and Usage
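A quick sanity check of the two equivalences quoted above (HS06 is the HEP-SPEC06 benchmark unit used by WLCG; the figures are the ones from the slide):

```python
# Quick check of the equivalences quoted on the slide.

cpu_hs06 = 1.5e6                  # total CPU capacity in HS06
cores = 175_000                   # quoted equivalent number of CPU cores
print(f"~{cpu_hs06 / cores:.1f} HS06 per core")          # ~8.6 HS06 per core

disk_pb = 132                     # total disk capacity in PB
spindle_tb = 2                    # quoted spindle size in TB
print(f"{disk_pb * 1000 // spindle_tb:,.0f} spindles")   # 66,000 spindles
```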

Data distribution: infrastructure 12 CERN is connected to the tier-1 sites through dedicated and redundant 10 Gbps links (the LHC Optical Private Network, LHCOPN), provided by the national academic and research networks; their usage is restricted to LHC data exchange. General-purpose research networks are used for data exchange with tier-2s and tier-3s. Source: LHCOPN
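For scale, a rough estimate of what one such dedicated link can carry per day, assuming the full 10 Gbps and ignoring protocol overhead and link sharing:

```python
# Rough capacity of one dedicated 10 Gbps tier-0 -> tier-1 link
# (theoretical maximum, ignoring protocol overhead and link sharing).

link_gbps = 10
bytes_per_second = link_gbps * 1e9 / 8          # 1.25 GB/s
tb_per_day = bytes_per_second * 86400 / 1e12    # seconds per day -> TB

print(f"{bytes_per_second / 1e9:.2f} GB/s, ~{tb_per_day:.0f} TB/day per link")
# -> 1.25 GB/s, ~108 TB/day
```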

13 Data exchange: example of CMS. Volume: average aggregate daily data exchange over 60+ sites: 500 TB. Rate: average aggregate data exchange rate: 1.1 GB/s. Source: CMS PhEDEx

14 Grid usage: simulation + data processing. [Chart: LHC Computing Grid monthly job throughput and consumed CPU time, all LHC experiments (ALICE, ATLAS, CMS, LHCb), January 2010 to April 2011; left axis: number of jobs (millions), right axis: CPU time (equivalent number of CPU cores, thousands); sustained level of about 1 million jobs per day.] Source: EGEE Accounting Portal

15 Experiment-specific higher-level layers. The experiments have developed and deployed higher-level layers on top of the infrastructure-level grid services. Goal: hide the grid plumbing from the end-user and add experiment-specific logic. Examples: data placement, data and meta-data cataloguing, dataset bookkeeping systems, job management & workflow engines, monitoring & alerting, site status monitoring, long-lived agents, network link commissioning, accounting.

EGI & WLCG Operations Tools 16 Site registry: GOCDB, RAL (UK). European Grid Infrastructure headquarters: EGI (NL). Ticketing system: GGUS, KIT (DE). EGI operations portal: CC-IN2P3 (FR). Dashboards, monitoring, GridMap, security: CERN (CH). Resource usage: EGEE Accounting Portal, CESGA (ES). These central tools are complemented by experiment-specific tools and procedures, and by the operations tools deployed by each of the national grid infrastructures in Europe and by the Open Science Grid in the USA.

17 Achievements & lessons learned. The global grid infrastructure is a reality and is delivering: it is used routinely by thousands of users for processing LHC data, including analysis, and allows the experiments to deliver physics results a short time after data taking. The same platform is also used by other sciences, albeit at a smaller scale. Network traffic is higher than initially expected; the network is very reliable, and redundancy is key. A priori data placement does not scale well: sending jobs to the sites hosting the data presupposes multiple copies of the same data around the world, and a lot of that data is never used, so refreshing all those caches generates considerable network traffic and load on the storage services. Balance between centralization and distribution: network reliability makes centralization (with adequate redundancy) an easier option to manage than full distribution. Complexity and sustainability: the ongoing support effort is high for experiments, sites and grid infrastructure operations. Adapted from: Ian Bird, LCG-France meeting, May 30, 2011

Perspectives 18 Evolution of the computing models: from a priori to dynamic data placement based on popularity: popular data is replicated when jobs are sent to a site, and unused data is removed from the caches. Use of remote WAN I/O: read a file remotely over the long-distance network, or download a missing file from a dataset when needed. Evolution of site interconnection: use open network exchange points to allow tier-2s and tier-3s to exchange data among themselves and with tier-1s, without overloading the general research and education networks with LHC data.
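A minimal sketch of the popularity-based placement idea described above; the thresholds, class and method names are illustrative assumptions, not part of any experiment's actual data management system:

```python
# Toy popularity-based replica management: replicate datasets that are
# frequently accessed, evict replicas that have not been used recently.
# Thresholds and structure are illustrative only.

import time

class ReplicaManager:
    def __init__(self, hot_threshold=100, idle_seconds=30 * 86400):
        self.access_count = {}    # dataset -> number of recent accesses
        self.last_access = {}     # dataset -> timestamp of last access
        self.hot_threshold = hot_threshold
        self.idle_seconds = idle_seconds

    def record_access(self, dataset):
        self.access_count[dataset] = self.access_count.get(dataset, 0) + 1
        self.last_access[dataset] = time.time()

    def datasets_to_replicate(self):
        """Popular datasets: candidates for extra replicas at busy sites."""
        return [d for d, n in self.access_count.items() if n >= self.hot_threshold]

    def replicas_to_evict(self, now=None):
        """Replicas unused for a long time: candidates for cache cleanup."""
        now = now or time.time()
        return [d for d, t in self.last_access.items()
                if now - t > self.idle_seconds]
```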

Perspectives (cont.) 19 Efficient usage of the available resources: the available resources are starting to constrain the experiments. Evolution of the hardware building blocks: many-core machines, GPUs. Cloud technology & virtualization.

20

21 Questions & Comments

22 Backup Slides

Resource distribution 23 [Charts: WLCG distribution of resources (CPU, disk, tape) per tier, year 2011, split among Tier-0, Tier-1 and Tier-2.]

CMS data distribution: IHEP 24 [Charts: CMS data transfers between IHEP and the other sites, in both directions (other sites -> IHEP and IHEP -> other sites).]

Grid middleware 25 [Diagram: layered middleware stack. Applications layer: end-user applications, VO-specific middleware. Infrastructure layer: higher-level services, foundation services.]

Grid middleware (cont.) 26 Foundation services. Authentication: individuals, hosts and services use X.509 certificates issued by accredited certification authorities. Authorization: virtual organization membership provides information on the user's relationship with his/her virtual organization. Computing element: remote job submission to the site's batch system. Storage element: inter-site file transfer to and from disk- and tape-based storage. Information system. Accounting.

Grid middleware (cont.) 27 Higher-level services. Workload management: job scheduling. Data management: scheduled file transfers, file replica management, meta-data management. Virtual organization software installation.
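To illustrate what "file replica management" involves at this layer, a minimal sketch of a replica catalogue mapping logical file names to their physical copies at different sites; the names, URLs and structure are illustrative, not those of any actual middleware:

```python
# Toy replica catalogue: maps a logical file name (LFN) to the physical
# replicas of that file at different sites. Illustrative only; real grid
# catalogues are far more elaborate.

from collections import defaultdict

class ReplicaCatalogue:
    def __init__(self):
        self._replicas = defaultdict(set)   # LFN -> set of (site, URL)

    def register(self, lfn, site, url):
        """Record that a physical copy of `lfn` exists at `site`."""
        self._replicas[lfn].add((site, url))

    def unregister(self, lfn, site):
        """Remove all replicas of `lfn` registered at `site`."""
        self._replicas[lfn] = {(s, u) for s, u in self._replicas[lfn] if s != site}

    def locate(self, lfn):
        """Return the sites (and URLs) holding a copy of `lfn`."""
        return sorted(self._replicas.get(lfn, set()))

# Hypothetical usage with made-up site and storage URL:
catalogue = ReplicaCatalogue()
catalogue.register("/grid/cms/data/run1234/file001.root", "Tier1-A",
                   "srm://se.tier1-a.example.org/data/file001.root")
print(catalogue.locate("/grid/cms/data/run1234/file001.root"))
```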