Data Handling for LHC: Plans and Reality

Data Handling for LHC: Plans and Reality. Tony Cass, Leader, Database Services Group, Information Technology Department. 11th July 2012. 1

Outline: HEP, CERN, LHC and LHC Experiments; LHC Computing Challenge; The Technique (in outline, in more detail); Towards the Future; Summary. 2

Familiar, but not Fundamental. Periodic Table courtesy of Wikipedia. 4

The Standard Model: Fundamental and well tested, but... Why do particles have mass? Why is there no antimatter? Are these the only particles? A 4th generation? (LEP discovery) Do fermions have bosonic partners and vice-versa? How does gravity fit in? 6

Other interesting questions (LEP, LHC): How do quarks and gluons behave at ultra-high temperatures and densities? What is dark matter? Supersymmetric particles? 7

How to find the answers? Smash things together! Images courtesy of HyperPhysics. 8

CERN Methodology: The fastest racetrack on the planet. Trillions of protons will race around the 27 km ring in opposite directions over 11,000 times a second, travelling at 99.999999991 per cent of the speed of light. 9

Energy of a 1 TeV Proton. 10

Energy of 7 TeV Beams: Two nominal beams together can melt ~1,000 kg of copper. Current beams: ~100 kg of copper. 11

CERN Methodology The emptiest space in the solar system To accelerate protons to almost the speed of light requires a vacuum as empty as interplanetary space. There is 10 times more atmosphere on the moon than there will be in the LHC. 12

CERN Methodology One of the coldest places in the universe With an operating temperature of about -271 degrees Celsius, just 1.9 degrees above absolute zero, the LHC is colder than outer space. 13

CERN Methodology The hottest spots in the galaxy When two beams of protons collide, they will generate temperatures 1000 million times hotter than the heart of the sun, but in a minuscule space. 14

CERN Methodology: The biggest, most sophisticated detectors ever built. To sample and record the debris from up to 600 million proton collisions per second, scientists are building gargantuan devices that measure particles with micron precision. 15

Compact Detectors! 16

Outline: HEP, CERN, LHC and LHC Experiments; LHC Computing Challenge; The Technique (in outline, in more detail); Towards the Future; Summary. 19

We are looking for rare events! Number of events = Luminosity × Cross-section. 2010 Luminosity: 45 pb⁻¹; ~70 billion pb → ~3 trillion events!* (*N.B. only a very small fraction saved; ~250x more events to date.) Higgs (m_H = 120 GeV): 17 pb → ~750 events, e.g. potentially ~1 Higgs in every 300 billion interactions! (Emily Nurse, ATLAS) 20
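
The event-count arithmetic is easy to verify. The following is a minimal Python sketch, not part of the talk, that simply evaluates N = L × σ with the numbers copied from the slide:

```python
# Minimal sketch (not from the talk; numbers copied from the slide):
# expected event counts from N = integrated luminosity x cross-section.

def expected_events(luminosity_pb_inv: float, cross_section_pb: float) -> float:
    """N = L [pb^-1] * sigma [pb]."""
    return luminosity_pb_inv * cross_section_pb

lumi_2010 = 45.0        # pb^-1, 2010 integrated luminosity
sigma_total = 70e9      # pb, "70 billion pb" of pp interactions
sigma_higgs = 17.0      # pb, Higgs production at m_H = 120 GeV

print(f"All interactions: {expected_events(lumi_2010, sigma_total):.1e}")   # ~3e12
print(f"Higgs events:     {expected_events(lumi_2010, sigma_higgs):.0f}")   # ~750
```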

So the four LHC Experiments:
ATLAS - general purpose: origin of mass, supersymmetry; ~2,000 scientists from 34 countries.
CMS - general purpose: origin of mass, supersymmetry; ~1,800 scientists from over 150 institutes.
ALICE - heavy-ion collisions, to create quark-gluon plasmas; ~50,000 particles in each collision.
LHCb - to study the differences between matter and antimatter; will detect over 100 million b and b-bar mesons each year. 21

So the four LHC Experiments 22

... generate lots of data. The accelerator generates 40 million particle collisions (events) every second at the centre of each of the four experiments' detectors. 23

... generate lots of data: reduced by online computers to a few hundred good events per second, which are recorded on disk and magnetic tape at 100-1,000 MegaBytes/sec, ~15 PetaBytes per year for all four experiments. Current forecast: ~23-25 PB/year, 100-120M files/year, ~20-25K 1 TB tapes/year. The archive will need to store 0.1 EB in 2014, ~1 billion files in 2015. [Chart: CASTOR data written, 01/01/2010 to 29/6/2012, in PB, by experiment (ALICE, AMS, ATLAS, CMS, COMPASS, LHCB, NA48, NA61, NTOF, USER). Image: ATLAS Z→μμ event from 2012 data with 25 reconstructed vertices.] 24
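
As a back-of-the-envelope cross-check of the volume and rate figures (averaging naively over a full calendar year is my own simplification; the machine does not actually run year-round):

```python
# Back-of-the-envelope sketch (my own arithmetic, not from the talk): what
# sustained recording rate does ~15 PB/year imply, averaged naively over a
# full calendar year?

SECONDS_PER_YEAR = 3.15e7

annual_volume_pb = 15.0                                    # "~15 PetaBytes per year"
avg_rate_mb_s = annual_volume_pb * 1e9 / SECONDS_PER_YEAR  # PB -> MB, per second

print(f"~{annual_volume_pb:.0f} PB/year corresponds to ~{avg_rate_mb_s:.0f} MB/s sustained")
# ~476 MB/s, comfortably inside the 100-1,000 MB/s recording range quoted above.
```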

Outline: HEP, CERN, LHC and LHC Experiments; LHC Computing Challenge; The Technique (in outline, in more detail); Towards the Future; Summary. 25

What is the technique? Break up a Massive Data Set 26

What is the technique? into lots of small pieces and distribute them around the world 27

What is the technique? analyse in parallel 28

What is the technique? gather the results 29

What is the technique? ... and discover the Higgs boson. Nice result, but is it novel? 30

Is it Novel? Maybe not novel as such, but the implementation is: Terascale computing that is widely appreciated! 31

Outline: HEP, CERN, LHC and LHC Experiments; LHC Computing Challenge; The Technique (in outline, in more detail); Towards the Future; Summary. 32

Requirements! Computing Challenges: from ~100,000 PCs; 15 PB/year to tape; O(100 PB) disk cache; Worldwide Collaboration (Tier1s), a problem and a solution.
Summary of Computing Resource Requirements, all experiments, 2008 (LCG TDR, June 2005):
  CPU (MSPECint2000): CERN 25, All Tier-1s 56, All Tier-2s 61, Total 142
  Disk (PetaBytes):   CERN 7,  All Tier-1s 31, All Tier-2s 19, Total 57
  Tape (PetaBytes):   CERN 18, All Tier-1s 35,                 Total 53
(4,000 HS06 = 1 MSPECint2000.) 33

Timely Technology! The WLCG project was deployed to meet LHC computing needs. The EDG and EGEE projects organised development in Europe (OSG and others in the US): the Grid. 34

Grid Middleware Basics. Compute Element: standard interface to local workload management systems (batch scheduler). Storage Element: standard interface to local mass storage systems. Resource Broker: tool to analyse user job requests (input data sets, CPU time, data output requirements) and route these to sites according to data and CPU time availability. Many implementations of the basic principles: Globus, VDT, EDG/EGEE, NorduGrid, OSG. 35
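
As a purely illustrative sketch of the Resource Broker idea described above (this is not WLCG middleware code; the Site and Job structures, site names and slot counts are all invented), a toy matchmaker might look like this:

```python
# Illustrative sketch only (not WLCG middleware): the matchmaking idea behind
# a Resource Broker -- route a job to a site that holds its input dataset and
# has free CPU slots. The Site/Job structures, names and numbers are invented.

from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class Site:
    name: str
    free_slots: int
    datasets: set = field(default_factory=set)

@dataclass
class Job:
    input_dataset: str
    cpu_hours: float

def broker(job: Job, sites: list[Site]) -> Site | None:
    """Pick a site with the input data and free slots, preferring the least loaded."""
    candidates = [s for s in sites if job.input_dataset in s.datasets and s.free_slots > 0]
    return max(candidates, key=lambda s: s.free_slots, default=None)

sites = [
    Site("CERN",   free_slots=10,  datasets={"atlas.raw.2012"}),
    Site("FNAL",   free_slots=250, datasets={"cms.reco.2011"}),
    Site("GridKa", free_slots=40,  datasets={"atlas.raw.2012"}),
]
print(broker(Job("atlas.raw.2012", cpu_hours=8.0), sites).name)   # -> GridKa
```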

Job Scheduling in Practice. Issue: Grid sites generally want to maintain a high average CPU utilisation; this is easiest if there is a local queue of work to select from when another job ends. Users, however, are generally interested in turnround times as well as job throughput, and turnround is reduced if jobs are held centrally until a processing slot is known to be free at a target site. Solution: pilot job frameworks. Per-experiment code submits a job which chooses a work unit to run from a per-experiment queue when it is allocated an execution slot at a site. Pilot job frameworks separate out the site's responsibility for allocating CPU resources from the experiment's responsibility for allocating priority between different research sub-groups. 36
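
A toy version of the pilot idea, assuming a hypothetical HTTP task-queue endpoint and payload format (neither is taken from any real experiment framework), could be sketched as:

```python
# Illustrative sketch only: the pilot-job idea. A site batch system runs this
# "pilot"; once it holds a CPU slot it asks the experiment's central task
# queue for real work. The queue endpoint and payload format are invented.

import json
import subprocess
import urllib.request

TASK_QUEUE = "https://example.org/taskqueue/next"    # hypothetical endpoint

def fetch_work():
    """Ask the experiment's central queue for the highest-priority work unit."""
    try:
        with urllib.request.urlopen(TASK_QUEUE, timeout=30) as resp:
            return json.load(resp)                   # e.g. {"cmd": ["athena", "job.py"]}
    except OSError:
        return None

def main():
    while True:
        work = fetch_work()
        if work is None:                             # nothing to do: release the slot
            break
        subprocess.run(work["cmd"], check=False)     # run the payload in this slot

if __name__ == "__main__":
    main()
```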

Data Issues: reception and long-term storage; delivery for processing and export; distribution; metadata distribution. [Diagram rates: 700 MB/s, 700 MB/s, 420 MB/s, 2600 MB/s (3600 MB/s), (>4000 MB/s), 1430 MB/s.] Scheduled work only, and we need the ability to support 2x for recovery! 37

(Mass) Storage Systems. After evaluation of commercial alternatives in the late 1990s, two tape-capable mass storage systems have been developed for HEP: CASTOR, an integrated mass storage system, and dCache, a disk pool manager that interfaces to multiple tape archives (Enstore @ FNAL, IBM's TSM). dCache is also used as a basic disk storage manager at Tier2s, along with the simpler DPM. 38

A Word About Tape. Our data set may be massive, but it is made up of many small files (~195 MB average, only increasing slowly after LHC startup), which is bad for tape speeds: average write drive speed < 40 MB/s (cf. native drive speeds of 120-160 MB/s), with only small increases with new drive generations. [Plots: CERN Archive file size distribution, in %; drive write performance vs file size, CASTOR tape format (ANSI AUL), IBM AUL and SUN AUL.] 39

Tape Drive Efficiency. So we have to change the tape writing policy. [Plots: drive write performance, buffered vs non-buffered tape marks, and average drive performance (MB/s) for CERN Archive files, comparing CASTOR present (3 syncs/file), CASTOR new (1 sync/file) and CASTOR future (1 sync per 4 GB).] 40
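
The effect of the sync policy can be modelled with one line of arithmetic: effective speed = bytes written / (bytes / native speed + number of syncs × per-sync penalty). The native speed and the 2.5-second sync penalty below are illustrative assumptions, not CERN measurements:

```python
# Minimal model (assumed numbers, not CERN measurements): why flushing
# ("sync") the drive per file kills throughput for small files, and why
# buffered tape marks with one sync per few GB recover the native speed.

def effective_speed_mb_s(file_size_mb: float, syncs_per_file: float,
                         native_mb_s: float = 140.0, sync_penalty_s: float = 2.5) -> float:
    """Sustained MB/s when writing many files of the given size."""
    write_time = file_size_mb / native_mb_s
    sync_time = syncs_per_file * sync_penalty_s
    return file_size_mb / (write_time + sync_time)

for policy, syncs in [("3 syncs/file", 3), ("1 sync/file", 1), ("1 sync per 4 GB", 200 / 4096)]:
    print(f"{policy:>16}: {effective_speed_mb_s(200, syncs):6.1f} MB/s for 200 MB files")
# ~22, ~51 and ~129 MB/s respectively with these assumed numbers.
```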

Users aren't the only writers! Bulk data storage requires space. Fortunately, tape capacity will continue to double every 2-3 years (35 and 50 TB tape demonstrations in 2010), and CERN has ~50K slots: ~0.25 EB with new T10KC cartridges. Unfortunately, you have to copy data from old cartridges to new or you run out of slots, and data rates for repack will soon exceed LHC rates: 2012: 55 PB = 1.7 GB/s sustained; 2015: 120 PB = 3.8 GB/s sustained. To repack in 1 year: ~55 drives @ 35 MB/s (small files, <500M), ~28 drives @ 63 MB/s, ~18 drives @ 104 MB/s. C.f. pp LHC rates: ~0.7 GB/s; PbPb peak rate of 2.5 GB/s. [Plot: time to migrate 55 PB (2012), in drive/days, by file size (<10K up to >2G) and tape-mark policy (3 TM/file, 1 TM/file, TM/4GB).] And: all LEP data fits on ~150 cartridges, or 30 new T10KCs. Automatic data duplication becomes a necessity. 41
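
A rough check of the repack arithmetic (this ignores mount, seek and duty-cycle overheads, so the drive counts land slightly below the slide's figures):

```python
# Rough check of the repack arithmetic quoted on the slide. Overheads are
# ignored, so the drive counts come out a little below "~55 / ~28 / ~18".

SECONDS_PER_YEAR = 3.156e7

def sustained_gb_s(petabytes: float) -> float:
    """Average GB/s needed to copy the given volume within one year."""
    return petabytes * 1e6 / SECONDS_PER_YEAR            # PB -> GB, per second

def drives_needed(petabytes: float, drive_mb_s: float) -> float:
    """Drives running flat out at drive_mb_s to sustain that copy rate."""
    return sustained_gb_s(petabytes) * 1e3 / drive_mb_s

print(f"55 PB in a year  -> {sustained_gb_s(55):.1f} GB/s sustained")   # ~1.7 GB/s
print(f"120 PB in a year -> {sustained_gb_s(120):.1f} GB/s sustained")  # ~3.8 GB/s
for speed in (35, 63, 104):
    print(f"repack 55 PB @ {speed} MB/s per drive: ~{drives_needed(55, speed):.0f} drives")
```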

Media Verification. Data in the archive cannot just be written and forgotten about. Q: can you retrieve my file? A: let me check... err, sorry, we lost it. Proactive and regular verification of archive data is required: ensure cartridges can be mounted; ensure data can be read and verified against metadata (checksum, size, ...). Do not wait until media migration to detect problems. Opportunistic scanning when resources are available.

Storage vs Recall Efficiency. Efficient data acceptance: have lots of input streams, spread across a number of storage servers, wait until the storage servers are ~full, and write the data from each storage server to tape. Result: data recorded at the same time is scattered over many tapes. How is the data read back? Generally, files are grouped by time of creation. How to optimise for this? Group files on to a small number of tapes. Oops... 43
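
One way to express the optimisation suggested above is to batch files by creation time into tape-sized groups before migration. The sketch below is illustrative only; the cartridge capacity and file list are invented:

```python
# Illustrative sketch: group files by creation time into tape-sized batches
# before migration, so files likely to be recalled together land on a small
# number of tapes. Capacity and file list are invented.

TAPE_CAPACITY_GB = 5000   # assumed cartridge capacity

def plan_migration(files):
    """files: list of (name, created_day, size_gb) -> list of per-tape batches."""
    batches, current, used = [], [], 0.0
    # Sort by creation time so temporally related files stay together.
    for name, day, size in sorted(files, key=lambda f: f[1]):
        if used + size > TAPE_CAPACITY_GB and current:
            batches.append(current)
            current, used = [], 0.0
        current.append(name)
        used += size
    if current:
        batches.append(current)
    return batches

files = [("run1.raw", "2012-06-01", 2000), ("run2.raw", "2012-06-01", 2000),
         ("run3.raw", "2012-06-02", 2000), ("run4.raw", "2012-06-03", 2000)]
print(plan_migration(files))   # -> [['run1.raw', 'run2.raw'], ['run3.raw', 'run4.raw']]
```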

Keep users away from tape. 44

CASTOR & EOS 45

Data Access Realism. Mass storage systems work well for recording, export and retrieval of production data. Good: this is what they were designed for! But some features of the CASTOR system developed at CERN are unused or ill-adapted: experiments want to manage data availability; file sizes, file-placement policies and access patterns interact badly (alleviated by experiment management of data transfer between tape and disk); analysis use favours low latency over guaranteed data rates (aggravated by experiment management of data: automated replication of busy datasets is disabled). But we should not be too surprised: storage systems were designed many years before analysis patterns were understood (if they are even today...). 46

Data Distribution. The LHC experiments need to distribute millions of files between the different sites. The File Transfer Service automates this: handling failures of the underlying distribution technology (GridFTP); ensuring effective use of the bandwidth with multiple streams; and managing the bandwidth use, ensuring that ATLAS, say, is guaranteed 50% of the available bandwidth between two sites if there is data to transfer. 47
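
The two behaviours described here, retrying failed transfers and sharing a channel between experiments according to configured fractions, can be caricatured in a few lines. This is not FTS code; the share values, queue format and 80% success probability are all invented:

```python
# Illustrative sketch only, not FTS code: retry failed transfers on an
# unreliable channel and share the channel between experiments ("VOs")
# according to configured fractions.

import random

CHANNEL_SHARES = {"atlas": 0.5, "cms": 0.3, "lhcb": 0.2}   # assumed policy

def pick_vo(queues):
    """Weighted choice among VOs that still have files queued."""
    active = {vo: share for vo, share in CHANNEL_SHARES.items() if queues.get(vo)}
    vos, weights = zip(*active.items())
    return random.choices(vos, weights=weights)[0]

def transfer(f) -> bool:
    return random.random() > 0.2            # stand-in for a GridFTP transfer

def drain(queues, max_retries=3):
    while any(queues.values()):
        vo = pick_vo(queues)
        f = queues[vo].pop(0)
        if not transfer(f):
            if f.get("attempts", 0) + 1 < max_retries:
                queues[vo].append({**f, "attempts": f.get("attempts", 0) + 1})
            # else: give up on this file after max_retries attempts

queues = {"atlas": [{"lfn": f"/atlas/file{i}"} for i in range(5)],
          "cms":   [{"lfn": f"/cms/file{i}"}   for i in range(3)],
          "lhcb":  []}
drain(queues)
```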

Data Distribution. FTS uses the Storage Resource Manager as an abstract interface to the different storage systems. A good idea, but this is not (IMHO) a complete storage abstraction layer, and anyway it cannot hide fundamental differences in approaches to MSS design. There is a lot of interest in the Amazon S3 interface these days; it doesn't try to do as much as SRM, but HEP should try to adopt de facto standards. Once you have distributed the data, a file catalogue is needed to record which files are available where. LFC, the LCG File Catalogue, was designed for this role as a distributed catalogue to avoid a single point of failure, but other solutions are also used. And as many other services rely on CERN, the need for a distributed catalogue is no longer (seen as) so important. 48

Looking more widely I. Only a small subset of the data distributed is actually used, and experiments don't know a priori which datasets will be popular (CMS sees 8 orders of magnitude in access rate between the most and least popular). Dynamic data replication: create copies of popular datasets at multiple sites. 49
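
A minimal sketch of the dynamic-replication idea; the thresholds, replica caps and dataset names are invented, and real systems use more elaborate popularity metrics:

```python
# Illustrative sketch only: add replicas of datasets whose recent access
# count crosses a threshold, and shrink back when interest fades.
# Thresholds and dataset names are invented.

ACCESS_THRESHOLD = 100        # weekly accesses that earn one extra replica
MAX_REPLICAS = 5

def replication_plan(popularity, replicas):
    """popularity: dataset -> weekly accesses; replicas: dataset -> replica count."""
    plan = {}
    for dataset, accesses in popularity.items():
        current = replicas.get(dataset, 1)
        # One extra copy per ACCESS_THRESHOLD accesses, capped at MAX_REPLICAS.
        wanted = min(MAX_REPLICAS, max(1, accesses // ACCESS_THRESHOLD + 1))
        if wanted != current:
            plan[dataset] = wanted
    return plan

popularity = {"data12_8TeV.AOD": 750, "mc11_7TeV.NTUP": 3, "user.test.skim": 0}
print(replication_plan(popularity, {"data12_8TeV.AOD": 2, "mc11_7TeV.NTUP": 2}))
# -> {'data12_8TeV.AOD': 5, 'mc11_7TeV.NTUP': 1}
```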

Looking more widely II. Network capacity is readily available and it is reliable, so let's simply copy data from another site if it is not available locally, rather than recalling from tape or failing the job. Inter-connectedness is increasing, with the design of LHCONE to deliver (multi-)10Gb links between Tier2s. [Diagram: the MONARC 2000 hierarchical model, with CERN (n.10^7 MIPS, m PByte, robot), FNAL (4.10^7 MIPS, 110 TByte, robot) and universities (n.10^6 MIPS, m TByte, robot) connected by 622 Mbit/s and N x 622 Mbit/s links, plus desktops at each site.] A fibre was cut during tests in 2009: capacity was reduced, but alternative links took over. 50

Metadata Distribution. Conditions data is needed to make sense of the raw data from the experiments: data on items such as temperatures, detector voltages and gas compositions is needed to turn the ~100M pixel image of the event into a meaningful description in terms of particles, tracks and momenta. This data is in an RDBMS, Oracle at CERN, and presents interesting distribution challenges: one cannot tightly couple databases across the loosely coupled WLCG sites, for example. Oracle Streams technology was improved to deliver the necessary performance, and HTTP caching systems were developed to address the need for cross-DBMS distribution. [Chart: average Streams throughput (LCR/s) for row sizes of 100B, 500B and 1000B, comparing Oracle 10g, Oracle 11gR2 and Oracle 11gR2 (optimized).] 51

Job Execution Environment. Jobs submitted to sites depend on large, rapidly changing libraries of experiment-specific code. Major problems ensue if updated code is not distributed to every server across the grid (remember, there are x0,000 servers...), and shared filesystems can become a bottleneck if used as a distribution mechanism within a site. Approaches: the pilot job framework can check whether the execution host has the correct environment; or a global caching file system, CernVM-FS. [Annotations, 2011 vs today: ATLAS today: 22/1.8M files; ATLAS today: 921/115 GB.] 52
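
To illustrate the caching idea behind a global read-only software file system such as CernVM-FS, here is a toy model only; it is not the real protocol, catalogue format or on-disk layout, and all names and contents are invented:

```python
# Toy illustration of the idea behind a global caching software file system:
# a catalogue maps paths to content hashes, content is fetched once from a
# server on first access, then served from the local cache, and identical
# files are stored only once. Not CernVM-FS's real design.

import hashlib

def digest(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()

REMOTE_OBJECTS = {}                          # pretend server-side object store

def publish(catalogue, path, content: bytes):
    """Experiment librarian publishes a file into the repository."""
    h = digest(content)
    REMOTE_OBJECTS[h] = content
    catalogue[path] = h

class WorkerNodeCache:
    def __init__(self):
        self.local = {}                      # hash -> content, the node's cache
    def open(self, catalogue, path) -> bytes:
        h = catalogue[path]
        if h not in self.local:              # first access: fetch from the server
            self.local[h] = REMOTE_OBJECTS[h]
        return self.local[h]                 # later accesses: served locally

catalogue = {}
publish(catalogue, "/sw/athena/17.2.0/setup.sh", b"export ATLAS_RELEASE=17.2.0\n")
node = WorkerNodeCache()
print(node.open(catalogue, "/sw/athena/17.2.0/setup.sh").decode(), end="")
```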

Outline: HEP, CERN, LHC and LHC Experiments; LHC Computing Challenge; The Technique (in outline, in more detail); Towards the Future; Summary. 53

Learning from our mistakes. We have just completed a review of WLCG operations and services, based on 2+ years of operations, with the aim to simplify and harmonise during the forthcoming long shutdown. Key areas to improve are data management & access and exploiting many/multi-core architectures, especially with use of virtualisation. (Towards the Future topics: Clouds; Identity Management.) 54

Integrating With The Cloud? [Diagram, slide courtesy of Ulrich Schwickerath: User; Central Task Queue; payload pull; instance requests; VO service; Sites A, B and C; Shared Image Repository (VMIC); image maintainer; cloud bursting to a commercial cloud.] 57

Grid Middleware Basics. Compute Element: standard interface to local workload management systems (batch scheduler). Storage Element: standard interface to local mass storage systems. Resource Broker: tool to analyse user job requests (input data sets, CPU time, data output requirements) and route these to sites according to data and CPU time availability. Many implementations of the basic principles: Globus, VDT, EDG/EGEE, NorduGrid, OSG. 60

Trust! 61

One step beyond? 62

Outline: HEP, CERN, LHC and LHC Experiments; LHC Computing Challenge; The Technique (in outline, in more detail); Towards the Future; Summary. 63

Summary. WLCG has delivered the capability to manage and distribute the large volumes of data generated by the LHC experiments, and the excellent WLCG performance has enabled physicists to deliver results rapidly. HEP datasets may not be the most complex or (any longer) the most massive, but in addressing the LHC computing challenges the community has delivered the world's largest computing Grid, practical solutions to requirements for large-scale data storage, distribution and access, and a global trust federation enabling world-wide collaboration. 64

Thank You! And thanks to Vlado Bahyl, German Cancio, Ian Bird, Jakob Blomer, Eva Dafonte Perez, Fabiola Gianotti, Frédéric Hemmer, Jan Iven, Alberto Pace and Romain Wartel of CERN, Elisa Lanciotti of PIC and K. De, T. Maeno, and S. Panitkin of ATLAS for various unattributed graphics and slides. 65