Worldwide Production Distributed Data Management at the LHC. Brian Bockelman MSST 2010, 4 May 2010


At the LHC http://op-webtools.web.cern.ch/opwebtools/vistar/vistars.php?usr=lhc1 (Slide: gratuitous detector pictures.)

Forewarning... The LHC is a particle accelerator with 4 primary detectors: ATLAS, CMS, LHCb, and ALICE. Each detector is massive, and the collaborations are massive (CMS and ATLAS each have over 1,000 workers). I work for CMS, so my knowledge is biased towards that experiment.

Mission for Today* (*all in about 25 minutes). Today, I want to convince you that: LHC experiments have a massive data management challenge. Data management is based on files and explicit file movement / pre-placement. After many prototypes, the LHC formed around a common hierarchical model. We have evolved this model over the last 5 years.

The Problem

Setting the Scale CMS on a Sunday: 52 destination sites (most are run on an 8x5 basis) regularly moving data at over 2200 MB/s. (Plot: transfer rate to destination sites, peaking above 2000 MB/s.)

Setting the Scale In the last 30 days: 2.5 PB moved, 69 sites worldwide. Overall: 16.7M files in the system, 28 PB of replicas in the system. Only 1.5 months of serious data taking; the data management challenge is just beginning.

Data Management at an LHC experiment Everything fits neatly into one of two categories; today, we discuss offline. Online: centralized; the detectors are on this network; computing work critical to the operation of the detectors; highly secured, highly walled-off from the outside world. Offline: very large, thus massively decentralized; the majority of computing resources; the majority of users; processing, archival storage, analysis of data, etc. - the grid.

Offline Activities Data-Ops: transformation of raw data into usable data formats, grouping data according to physics content. Simulation: simulating collisions in the detectors. Analysis: filtering data for interesting events and converting to simpler data formats. Physics: deep understanding of event content; interactive and done on laptops.

The Solution

CMS Circa 1996 Each link was the cached output of an object DB. (Diagram from CERN/LHCC 96-45.)

LHC, circa 2001 The LHC experiments prototyped a tiered model of data distribution around 2001, based upon grid technologies. T0: CERN (>5000 cores; many PB of disk and tape). T1: regional centers around the world (1000-5000 cores; PBs of disk and tape). T2: smaller, mostly university, computational centers (300-1000 cores; 100s of TB of disk).

LCG hierarchy This diagram has been re-drawn hundreds of times in thousands of presentations. (Diagram: the T0 connects to the T1s - e.g. US T1, FR T1, UK T1 - all T1s connect to each other, and T2s only connect to their regional T1.)

The WLCG In addition to the tiered data distribution, the LHC experiments adopted the idea of a common computing fabric - the Worldwide LHC Computing Grid (WLCG). It is composed of smaller grids with names like EGEE (EU), OSG (US), and NorduGrid. It took a while to figure out what the grid means.

LCG Data Management In the tiered model, data moves downward*. The #1 axiom of LCG data management is that jobs move to data. The indivisible unit of data in particle physics is an event - basically, a readout of the detector when 2 particles collide. (*Experiment data; simulation data is the exception, but it is an order of magnitude smaller.)

LCG Data Management Events are grouped into files; this is the smallest unit computing deals with. Data is explicitly managed in terms of files moving from point A to point B. The files / explicit-movement metaphor was in response to the field's 5-10 years of object database/streaming experience. Wonderful reference: http://wwwdb.cs.wisc.edu/cidr/cidr2005/papers/p06.pdf

LCG TDR, 2005 From 2001 to 2005, the LCG experiments built lots of research prototypes around this model. In 2005, the LCG Technical Design Report set out the baseline services for the grid. Idea: during 2005-2007, bring these services into production through a series of increasingly large service challenges.

Service Challenges (Plot: average data transfer volume in terabytes per day, logarithmic scale from 0.1 to 1,000 TB/day, 2004-01 through 2010-01, spanning DC04, SC2, SC3, SC4, CSA06, the load test, CCRC08, and the debug/general/beam periods; the expected burst level in the first year is about 100 TB/day.)

LCG Baseline Services The LCG has the following data management services: File Catalogues: register file replicas centrally. File Transfer Service: given two URLs, transfer from point A to point B. Storage Element/SRM: the ability to interact with large storage systems through a uniform grid protocol. Databases for detector conditions - the configuration of detector subsystems at a given point in time.
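To make the File Catalogue idea concrete, here is a minimal sketch of a replica catalogue that maps a logical file name (LFN) to the storage URLs (SURLs) of its replicas. The class, method names, and paths are hypothetical illustrations, not the actual LFC or experiment API.

```python
# Minimal sketch of a replica catalogue: a logical file name (LFN) maps to
# the storage URLs (SURLs) of its replicas at different sites.
# All names and paths here are hypothetical, not a real catalogue API.

class ReplicaCatalogue:
    def __init__(self):
        self._replicas = {}   # LFN -> set of SURLs

    def register(self, lfn, surl):
        """Record that a replica of `lfn` exists at `surl`."""
        self._replicas.setdefault(lfn, set()).add(surl)

    def replicas(self, lfn):
        """Return all known replica locations for `lfn`."""
        return sorted(self._replicas.get(lfn, set()))


catalogue = ReplicaCatalogue()
catalogue.register("/store/data/Run2010A/MinBias/RECO/file0001.root",
                   "srm://srm.example-t1.org/cms/store/data/Run2010A/MinBias/RECO/file0001.root")
catalogue.register("/store/data/Run2010A/MinBias/RECO/file0001.root",
                   "srm://se.example-t2.edu/cms/store/data/Run2010A/MinBias/RECO/file0001.root")
print(catalogue.replicas("/store/data/Run2010A/MinBias/RECO/file0001.root"))
```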

Data Placement Each experiment builds its own services on top of the baseline, which can represent quite different approaches. CMS: about 1/2 of the data is subscribed centrally; about 1/2 is subscribed by physics groups. Fairly decentralized! Full mesh of transfer links (including T2 to T2). ATLAS: all data centrally subscribed; transfer links same as the original model. Both experiments use the File Transfer Service (FTS) baseline service to automatically move the data once the experiment decides the source and destination.
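As a sketch of what "the experiment decides, FTS moves" looks like, here is a toy loop that expands a placement decision into per-file source/destination pairs and hands them to a transfer submission call. The submit_transfer function, the URLs, and the subscription structure are invented stand-ins, not the real FTS client interface.

```python
# Hypothetical sketch of how an experiment layer hands source/destination
# URL pairs to a transfer service once placement has been decided.
# `submit_transfer` stands in for the real FTS submission call.

def submit_transfer(source_surl, dest_surl):
    # In reality this would call the FTS service; here we just log the request.
    print(f"TRANSFER {source_surl} -> {dest_surl}")

# The placement system decides "this dataset should live at a given T2"
# and expands that decision into per-file source/destination pairs.
subscription = {
    "dataset": "/MinBias/Run2010A-v1/RECO",
    "destination_prefix": "srm://se.example-t2.edu/cms",
    "files": [
        ("srm://srm.example-t1.org/cms/store/data/file0001.root",
         "/store/data/file0001.root"),
        ("srm://srm.example-t1.org/cms/store/data/file0002.root",
         "/store/data/file0002.root"),
    ],
}

for source, lfn in subscription["files"]:
    submit_transfer(source, subscription["destination_prefix"] + lfn)
```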

Crisis in Storage

Problems in Data Management Implementations reveal many issues: Explicit movement of files is hard. Everything can fail! This confuses and frustrates users to no end. Explicit placement implies keeping location data (file catalog). Global file catalogs are almost always out of sync with our local systems. We are not a walled garden - we must (safely) export data off the grid to laptops and institutes.

Catalogs, oh my! We have up to 3 layers of catalog info: 1. Each experiment maintains a catalog of file/dataset locations. 2. Each site maintains a mapping from experiment names to filesystem names. 3. The filesystem maintains a mapping of names to locations on one of many disk servers.
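A toy resolution of a name through those three layers, assuming a simple prefix-rewrite rule at the site level; all site names, rules, and paths below are made up for illustration.

```python
# Toy resolution of a logical file name (LFN) through the three catalogue layers.
# Layer 1: the experiment catalogue says which sites hold the file.
# Layer 2: each site maps the experiment namespace to its filesystem namespace.
# Layer 3: the filesystem maps that path to physical disk servers (opaque to us).
# All rules and paths below are illustrative, not real site configuration.

experiment_catalogue = {
    "/store/data/Run2010A/file0001.root": ["T1_FR_Example", "T2_US_Example"],
}

site_namespace_rules = {
    "T1_FR_Example": ("/store/", "/dcache/cms/store/"),
    "T2_US_Example": ("/store/", "/hadoop/cms/store/"),
}

def resolve(lfn):
    for site in experiment_catalogue.get(lfn, []):
        old_prefix, new_prefix = site_namespace_rules[site]
        pfn = lfn.replace(old_prefix, new_prefix, 1)
        yield site, pfn   # the filesystem maps pfn to disk servers internally

for site, pfn in resolve("/store/data/Run2010A/file0001.root"):
    print(site, pfn)
```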

Catalogs Each transition between catalogs introduces inconsistencies. Every day, humans must deal with these by hand. The software is produced by three different layers - experiment, grid, and site - and the layers don't auto-sync with each other. Solutions? None yet... Glad we didn't design DNS!
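The by-hand work usually boils down to comparing two listings; a minimal sketch with invented file lists (in practice the inputs would be a catalogue dump and a storage namespace dump):

```python
# Minimal sketch of a catalogue-vs-storage consistency check:
# "dark data" exists on disk but not in the catalogue,
# "missing data" is catalogued but absent from disk.
# Both listings here are invented for illustration.

catalogue_files = {
    "/store/data/file0001.root",
    "/store/data/file0002.root",
}
storage_files = {
    "/store/data/file0002.root",
    "/store/data/file0003.root",
}

dark_data = storage_files - catalogue_files      # on disk, unknown to catalogue
missing_data = catalogue_files - storage_files   # in catalogue, lost from disk

print("Dark data:", sorted(dark_data))
print("Missing data:", sorted(missing_data))
```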

Get the data out What to do with the output of user jobs? Getting the data off the grid and closer to users is essential; science happens when playing with data on laptops. Downloading data must be brain-dead easy, so Joe Q. Physicist can download data. HTTP(S)? Some users have institutional computing resources larger than our data centers, so we must be able to protect ourselves from (inadvertent) DOS attacks. Our storage elements are a bottleneck for users - good (protects sites from overload) or bad (slows science)?
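For illustration, the kind of "brain-dead easy" download we would like users to have: a plain HTTPS GET authenticated with the user's grid certificate. The endpoint URL and certificate paths are placeholders, not a real service.

```python
# Sketch of a simple HTTPS download authenticated with a grid user certificate.
# The URL and certificate file paths are placeholders.
import shutil
import ssl
import urllib.request

context = ssl.create_default_context()
context.load_cert_chain(certfile="usercert.pem", keyfile="userkey.pem")

url = "https://se.example-t2.edu/store/user/jqphysicist/output.root"
with urllib.request.urlopen(url, context=context) as response, \
        open("output.root", "wb") as local_file:
    shutil.copyfileobj(response, local_file)   # stream the file to local disk
```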

Conditions distribution Conditions are metadata about the detector when the event data was recorded. Relatively small, can change quickly; constantly being improved. Experiments started by streaming database contents to sites. This turned into copying SQLite (or other format) database files around. A huge pain to track lots of small files!

Conditions distribution Conditions data is now transported through a hierarchy of HTTP caches. The top level is a Java web servlet at CERN; each site has a local squid cache. Data transfer and access are 100% transparent to the user, fast, and repeated accesses are low-latency. Thought: can such a system scale from metadata to the real event data? Users understand how to use HTTP(S) better than grid protocols! Can we think of the LHC as a content-distribution network?
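Seen from the client side, the cache hierarchy is just plain HTTP through the site squid; a sketch with a placeholder proxy address and query URL (the real servlet's query encoding is not shown):

```python
# Sketch of a conditions lookup through the local HTTP cache hierarchy:
# the client speaks plain HTTP; the site squid answers repeated requests from
# cache, and only cache misses travel back to the central servlet at CERN.
# Proxy address and URL are placeholders.
import urllib.request

proxy_handler = urllib.request.ProxyHandler(
    {"http": "http://squid.example-site.edu:3128"})
opener = urllib.request.build_opener(proxy_handler)

url = "http://conditions.example.cern.ch/query?run=132440&tag=BeamSpot_v1"
response = opener.open(url)
payload = response.read()
print(len(payload), "bytes of conditions data")
```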

Futures In the 1990s, we had a distributed object-oriented database model of DM. In the 2000s, we had a hierarchical(ish) explicit file management and movement model. Solutions to central workflows are in hand. In the 2010s, it is about the physicist! Implicit instead of explicit placement? Reduce complexity to increase reliability for the end-user?

Futures We have a hard rule about data being in place before jobs are sent. Not very cloud-like. Done because in 1996 we thought the networks were fragile - but today's networks are pretty impressive! How can we take advantage of opportunistic CPU cycles if we can't stream data? Can we start jobs if 95% of the data is in place? 75%? Is it quicker to retransfer files from another site instead of pulling them from tape? Should one copy always be on disk to avoid tape latency?
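The tape-versus-retransfer question can be made concrete with a back-of-the-envelope cost model; a sketch under made-up latency and bandwidth assumptions:

```python
# Back-of-the-envelope comparison of the two ways to get a file back on disk:
# recall it from local tape, or re-transfer it from another site over the WAN.
# All numbers are illustrative assumptions, not measurements.

def tape_recall_seconds(file_gb, queue_wait_s=1800, drive_mb_per_s=120):
    """Time to stage a file from tape: queue/mount wait plus streaming time."""
    return queue_wait_s + (file_gb * 1024) / drive_mb_per_s

def wan_retransfer_seconds(file_gb, wan_mb_per_s=400, setup_s=30):
    """Time to re-fetch the file from a remote disk replica over the network."""
    return setup_s + (file_gb * 1024) / wan_mb_per_s

file_gb = 4.0  # a typical multi-GB event file
print("tape recall:    %.0f s" % tape_recall_seconds(file_gb))
print("WAN retransfer: %.0f s" % wan_retransfer_seconds(file_gb))
```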

Conclusions The LHC has been working on managing data at the petascale for 15+ years. This led to a common model that solidified in 2005, and we have been working to make it production-ready since. The CS research is 1/5 of the effort necessary to make something reliable in production. There has been continuous evolution of our data models to this point - and all signs point to continued innovation.

Departing Thoughts The most important physics is done on the laptop, not the cluster. Data management at the LHC is about reliably and quickly transforming 1 PB of data into the 1 GB the user cares about.