Data Access and Data Management


Data Access and Data Management in grids Jos van Wezel

Overview
- Background [KIT, GridKa]
- Practice [LHC, gLite]
- Data storage systems [dCache and others]
- Data and metadata

Intro
- KIT = FZK + University of Karlsruhe; Steinbuch Centre for Computing (SCC)
- GridKa: German Tier-1 for the LHC
- About me: worked for GridKa for 5 years; since 01/08 this includes data storage at SCC, with a storage team of 12 people
- Data-intensive computing, disk and tape systems, SAN, dCache, SRM
- European Large Scale Data Center: storage of scientific data

Storage of LHC data
- Disk storage: home-grown parallel file systems with their own access protocols: dCache, CASTOR, DPM, xrootd
- Tape storage: site dependent; sites re-use what is available (TSM, HPSS, Enstore)
- Transfer: GridFTP for WAN transfers (Globus based, including authentication); SRM for protocol negotiation (a transfer sketch follows below)
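
As a rough illustration of the GridFTP transfer layer mentioned above, this Python sketch shells out to the standard globus-url-copy client; the hostnames and paths are placeholders, not real endpoints.

```python
# Minimal sketch of a WAN transfer with the standard Globus GridFTP client.
# The endpoints and paths are placeholders; a valid grid proxy
# (e.g. from voms-proxy-init) is assumed to exist already.
import subprocess

src = "gsiftp://gridftp.source.example.org/pnfs/example/raw/file.root"
dst = "gsiftp://gridftp.dest.example.org/pnfs/example/raw/file.root"

# -vb reports transfer performance, -p 4 opens 4 parallel TCP streams
subprocess.run(["globus-url-copy", "-vb", "-p", "4", src, dst], check=True)
```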

Data (reduction) in HEP
- RAW: detector data (tracks, hits), size ~2.5 MB, needs reconstruction
- RECO: detailed reconstructed information (particles), suitable for detector studies and reconstruction code development
- AOD (Analysis Object Data): the most used objects for analysis; this is what end scientists use to run their jobs
- Skims: selected channels for focused research groups and individual scientists; they are just pointers to AOD objects
- Ntuples: suitable for download to a laptop and for running ntuple analysis (apply selections, cuts)
- Calibration and condition data must be available to every job: side-band data flow using distributed databases
- Data reduction chain (figure, not to scale): relative sizes of the data tiers, with RECO about 0.2 of RAW and AOD about 0.2 of RECO
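
To make the reduction factors concrete, here is a small Python calculation. It assumes the quoted ~2.5 MB refers to one RAW event and takes the 0.2 factors from the slide; the per-year event count is an invented, illustrative number.

```python
# Illustrative data-tier arithmetic based on the factors quoted above:
# RECO ~ 0.2 x RAW, AOD ~ 0.2 x RECO. The 2.5 MB RAW event size and the
# event count per year are assumptions for the sake of the example.
raw_mb = 2.5
reco_mb = 0.2 * raw_mb           # ~0.5 MB
aod_mb = 0.2 * reco_mb           # ~0.1 MB

events = 2e9                     # assumed number of events kept per year
for tier, size_mb in [("RAW", raw_mb), ("RECO", reco_mb), ("AOD", aod_mb)]:
    total_tb = events * size_mb / 1e6   # MB -> TB (decimal units, see last slide)
    print(f"{tier:4s}: {size_mb:.2f} MB/event, ~{total_tb:,.0f} TB total")
```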

What is current in LHC data handling
- Data reduction: the experiment's data is split into data tiers, each matched to a particular activity or step in the workflow.
- Grid-wide replicated, disk-based storage of the data most useful for analysis; limited, scheduled access to the rest of the data, which is normally kept only on tape.
- Grid jobs are sent to where the data is: no WAN access to data by jobs; WAN transfers are mostly bulk, scheduled transfers.
- Non-event data is stored using completely different, highly reliable and proven technology: Oracle Streams, web portals, MySQL, Apache, Squid.
- Metadata is very experiment specific and hardly anyone trusts canned solutions: almost everything, from databases to web portals, is developed in house.
- Relational databases are a natural choice for a metadata store, but database development needs knowledge and development skills (read: time, money).

Storage in the grid
- Assume that each computing element (worker node) has local access to storage resources.
- Data-light applications: jobs can be scheduled anywhere; a simple cp of redirected stdout/stderr will do; shared home or work directory.
- Data-intensive applications: jobs are scheduled where the data is, or jobs are scheduled after access to local data has been prepared.
- Storage preparation: space for the data is reserved; data is transferred (to the site or to local scratch); data is pinned, i.e. locked on a specific storage tier (disk, tape, DVD).
- Interaction with storage goes via the storage services offered by the SE: SRM and file transfer.

SRM: managing storage in the grid
- provides access to storage resources (via SURLs)
- web-service based protocol
- policy layer on top of the (independent) site policy layer
- mediates file transfers, relays to file transfer services
- file pinning
- space reservation
- optimised file placement
- transfer protocol negotiation (see the sketch below)
- handles abort, suspend and resume of transfers
- ACL and directory management for SURLs
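
The core SRM interaction (turn a storage URL into a transfer URL for a negotiated protocol, fetch, then release the pin) can be sketched roughly as below. The SrmClient class and its methods are hypothetical placeholders for whatever SRM client library or command-line tools a site provides; the call names only mirror the SRM v2.2 operations (srmPrepareToGet / srmReleaseFiles).

```python
# Conceptual sketch of an SRM read: negotiate a TURL for a SURL, copy the
# file, then release the pin. SrmClient is a hypothetical placeholder,
# not a real library.

class SrmClient:
    """Hypothetical wrapper around a site's SRM client tools."""
    def prepare_to_get(self, surl, protocols):
        # Ask the SE for a TURL using one of the offered protocols;
        # the SE pins the file (staging it from tape if necessary).
        raise NotImplementedError

    def release_files(self, surl):
        # Tell the SE that the pin is no longer needed.
        raise NotImplementedError

def fetch(surl, copy):
    srm = SrmClient()
    turl = srm.prepare_to_get(surl, protocols=["gsiftp", "dcap"])
    try:
        copy(turl, "/local/scratch/")   # e.g. globus-url-copy or dccp
    finally:
        srm.release_files(surl)
```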

SE - Grid Storage Element (diagram): an SE exposes SRM and data transfer (GridFTP) over the Internet to remote sites (ESRF, Diamond, ANKA/SCC in the figure), with the physical storage behind it; VO access goes through a file repository/catalog that maps filenames to SURLs. Note: all these connections need public IPs.

SRM does not cut it
- The SRM possibilities are in principle all fair and nice, but:
- the implementations are partly driven by eclectic use cases
- implementation of (some of) the features is slow
- interoperability between implementations (intra and inter) is troublesome (mostly gLite grids use SRM)
- Grid-enabled (SRM) storage is not successful: a lot of problems using SRM/GridFTP/Globus technologies, e.g. transfer problems

Data management middleware: SRM implementations
- dCache
- DPM (Disk Pool Manager)
- CASTOR
- StoRM (STOrage Resource Manager)
- BeStMan
They provide SRM for storage management, transport protocols, a unified namespace over various storage, and monitoring and accounting.

DPM (Disk Pool Manager)
- developed at CERN for smaller sites
- configuration is stored in a database
- support for ACLs
- France == DPM (except for the French WLCG Tier-1)
- reportedly reliable and stable operation

StoRM (Storage Resource Manager)
- data and metadata live in a local file system: local filename = site filename
- based originally on GPFS, now with adapters for Lustre or XFS
- full POSIX access from worker nodes
- policy-based HSM interface to HPSS and TSM
- currently few installations

CASTOR
- developed at CERN
- rather complicated: everything is an LSF job
- all data lands on tape: optimised for tape
- CASTOR itself handles robotics and drives
- installations at RAL, INFN and Taiwan

dCache
- DESY and FNAL development
- pool nodes manage the storage; separate data and metadata paths
- the pool manager manages storage and directs traffic to the pools
- back-end for MSS (tape)
- various protocols (gsiftp, dcap, SRM, xrootd)
- widely established; scales to large sites [but huge resource footprint (CPU, admin)]

Grid (gLite) data management (component diagram): VO-specific data management on top of the compute elements and the FileTransferService; SRM, GFAL, GridFTP and NFS as the transfer and metadata interfaces; below them the file system / cache and metadata layers (e.g. dCache over physical disk) and the tape system.

GFAL: the Grid File Access Library, gLite's data management library offering POSIX-like access to grid storage (hiding SRM, GridFTP, rfio and dcap behind one interface).

Non-SRM solutions
- xrootd: SLAC based; recently integrated in the ROOT environment, probably also integration in CASTOR; no metadata controller; the redirector selects asynchronously from 1 to 64 hosts; can be set up with hierarchical redirectors (managers); tape backend
- iRODS: policy-based storage system; experience and development also at CC-IN2P3
- Cluster file systems: offer large, multi-volume storage; GPFS (interface to tape), Lustre (more or less open source)
- Gfarm: reference implementation of the grid filesystem; a global filesystem that federates storage over WAN links and re-uses local (worker-node) disks; used in the Asia-Pacific region
- Hot: cloud, dust, mist and nebula storage

Cloud computing and storage
- Simplify the marketing hype: I see 3 layers through the cloud.
- Layer 1: applications and content (end-user interfaces); example: Hotmail
- Layer 2: platform and operating system (grids, interoperability, programming), facilitates layer 1; examples: heroku.com, mosso.com (CloudFS)
- Layer 3: infrastructure (virtual hosting, virtual storage, large-scale storage), facilitates layer 2; examples: Amazon, gogrid.com, linode.com
- Concentrate on layer 3: it will probably give you at least a reliable infrastructure.
- Cloud infrastructure offerings: S3, GoGrid, Microsoft SSDS
- Forget cloud storage. This is as much as I can say today; ask me again in a year.

Revisit
- Raw data: transport via GridFTP; storage with DPM, dCache, disk and tape systems
- Metadata: transport/distribution via experiment-specific software; use of proven technology; LFC

Meta data
- Data replication, relocation and migration are all based on metadata.
- How do you find the proper data without using filenames like this: /mount/datatape/fdr08_run2/raw/fdr08_run2.0052301.physics_jet.daq.raw.o3/fdr08_run2.0052301.physics_jet.daq.raw.o3._lb0004._0001.1
- Currently there is limited metadata handling support; limited means: unaware of specific requirements, basically just file names (inode info).
- The WLCG experiments have cooked their own solutions: AMI (ATLAS), RefDB (CMS), AliEn (ALICE).
- OGSA-DAI: GGF standard for Grid data(base) access.
- AMGA (ARDA Metadata Grid Application): several interfaces and front-ends; can replace the LFC and handle other relational metadata; strong security (in use for BioMed and ATLAS/LHCb Ganga); file-system-like arrangement of metadata.
- Starting from scratch? Probably no way around some development; tools to develop workflows exist (see previous talks); follow the data (a minimal catalogue sketch follows below).
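
Since an earlier slide calls relational databases the natural choice for a metadata store, and this one warns that you may have to build your own, here is a minimal sketch of such a catalogue using Python's built-in sqlite3. The table layout and attributes (run number, data tier, SURL) are invented for illustration, not any experiment's real schema.

```python
# Minimal sketch of a relational metadata catalogue: map a logical file name
# plus physics attributes to one or more SURLs. Schema and values are
# illustrative assumptions only.
import sqlite3

con = sqlite3.connect("metadata.db")
con.executescript("""
CREATE TABLE IF NOT EXISTS dataset (
    lfn    TEXT PRIMARY KEY,   -- logical file name
    tier   TEXT,               -- RAW / RECO / AOD / ...
    run    INTEGER,
    stream TEXT
);
CREATE TABLE IF NOT EXISTS replica (
    lfn  TEXT REFERENCES dataset(lfn),
    surl TEXT                  -- srm://... storage URL at some site
);
""")
con.execute("INSERT OR REPLACE INTO dataset VALUES (?,?,?,?)",
            ("fdr08_run2.0052301.physics_jet.0001", "RAW", 52301, "physics_jet"))
con.execute("INSERT INTO replica VALUES (?,?)",
            ("fdr08_run2.0052301.physics_jet.0001",
             "srm://se.example.org/pnfs/example/raw/0001"))
con.commit()

# Find all RAW replicas of run 52301 instead of guessing path names
for lfn, surl in con.execute(
        "SELECT r.lfn, r.surl FROM replica r JOIN dataset d USING (lfn) "
        "WHERE d.run = ? AND d.tier = 'RAW'", (52301,)):
    print(lfn, "->", surl)
```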

Summary / Conclusions
- Reduce working data sizes: match the size to the requirement of the step in the workflow.
- Increase file sizes: this reduces the effect of the large per-file handling overhead in transfers to/from sites and tape (see the worked example below).
- Bulk data transfer/handling methods exist and can be used; use with caution; FTS and SRM can be done without; see Derek's talk for examples from CMS.
- Data storage: the new Internet flowers are still too fragile; for long-term storage you need tape; count "long" in number of accesses per year.
- Metadata transfer/handling methods are not generic: they must be developed and will probably need lots of thought and development.
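
A rough effective-throughput calculation shows why larger files help. The 5 s per-file overhead (SRM negotiation, transfer setup, catalogue updates) and the 100 MB/s link speed are assumed numbers for illustration only.

```python
# Effective throughput vs. file size with a fixed per-file overhead.
# The 5 s overhead and 100 MB/s rate are assumptions for illustration.
overhead_s = 5.0          # per-file cost: SRM negotiation, transfer setup, ...
rate_mb_s = 100.0         # sustained transfer rate once a file is moving

for size_mb in (10, 100, 1_000, 10_000):
    total_s = overhead_s + size_mb / rate_mb_s
    eff = size_mb / total_s
    print(f"{size_mb:>6} MB file -> effective {eff:6.1f} MB/s "
          f"({eff / rate_mb_s:.0%} of the link)")
```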

Summary / Questions
- 3. Expected GridFTP performance: 50% of line capacity.
- 7. File transfer service needed: yes; suggest PhEDEx, if still alive.
- 9. Small files in the grid: don't do that; try to assemble, tar, block, zip (see the sketch below).
- 11. Intranet-to-Internet traffic: open the GridFTP data port range for known sites via ACLs.
- 13. Access to available data: set up GridFTP with round-robin DNS.
- 14. See 7.
- 17. Is LFC sufficient: not sure, but look at AMGA.
- 18. Naming scheme: yes, although purists hate this.
- 19. Tape needed: archival of data on live disk is costly.
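
As a sketch of the "assemble small files before putting them on the grid" advice, the following uses Python's standard tarfile module; the directory and archive names are placeholders.

```python
# Pack many small files into one compressed archive before transfer,
# then list its contents. Paths are placeholders.
import tarfile
from pathlib import Path

src_dir = Path("ntuples/run52301")          # many small files
archive = Path("run52301_ntuples.tar.gz")   # one large file for the grid

with tarfile.open(archive, "w:gz") as tar:
    for f in sorted(src_dir.glob("*.root")):
        tar.add(f, arcname=f.name)

with tarfile.open(archive, "r:gz") as tar:
    print(f"{archive}: {len(tar.getnames())} members")
```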

Storage requirements: what you should know before you write your proposal or attach to the grid
- High availability (how high is high? well, the costs are high): every 9 after 99 doubles the costs; are maintenance windows possible? (see the estimate below)
- High reliability (again, how high is high?): can you sustain a reboot now and then? should be taken care of via software (failover, round-robin DNS, etc.)
- Persistency: how long to keep the data
- High data rates (again... it is getting boring): from WNs to storage, from storage to archive
- Interface to the storage API: use open and accepted standards, compatible with existing Grid storage (e.g. gLite); X.509 end-to-end security and VO access control
- Access pattern to and from WNs and the repository: size of files, size of reads and writes, proportion of reads to writes
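
To put the "every 9 doubles the costs" remark in perspective, here is the allowed downtime per year at each availability level; the cost-doubling factor is only the slide's rule of thumb, reproduced as an assumption.

```python
# Allowed downtime per year for a given availability, plus the slide's
# rule of thumb that every extra 9 beyond 99% roughly doubles the cost.
HOURS_PER_YEAR = 365 * 24

for extra_nines, availability in enumerate([0.99, 0.999, 0.9999, 0.99999]):
    downtime_h = (1 - availability) * HOURS_PER_YEAR
    cost_factor = 2 ** extra_nines        # rule of thumb from the slide
    print(f"{availability:.5%} available: {downtime_h:7.2f} h/year downtime, "
          f"~{cost_factor}x the 99% cost")
```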

Some hardware globals
- For large filesystems (>0.5 TB), forget EXT; use: XFS (comes with SL5), ZFS (Solaris only; would the grid exist without Linux?), GPFS (rocks), Lustre (rocks, mostly).
- Watch out for silent data corruption: use checksums; see http://cern.ch/peter.kelemen/talk/2007/kelemen-2007-c5-Silent_Corruptions.pdf and various vendor initiatives (a checksum sketch follows below).
- Use RAID6: reduce rebuild risk, reduce time to failure, reduce silent data corruption.
- Waiting for T10 DIF (SCSI Data Integrity Field).
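
A minimal example of the "use checksums" advice, using Python's standard hashlib and zlib modules. In practice a site would store the checksum in the catalogue or SE and verify after each transfer; here it is just computed and compared locally, and the paths are placeholders.

```python
# Verify a file after transfer with checksums to catch silent corruption.
# MD5 and Adler-32 are checksums commonly carried by grid storage systems;
# the paths below are placeholders.
import hashlib
import zlib

def md5sum(path, chunk=1 << 20):
    h = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()

def adler32(path, chunk=1 << 20):
    value = 1                      # zlib's Adler-32 starting value
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            value = zlib.adler32(block, value)
    return f"{value & 0xffffffff:08x}"

src, dst = "/data/source/file.root", "/scratch/file.root"
if md5sum(src) != md5sum(dst) or adler32(src) != adler32(dst):
    raise RuntimeError("checksum mismatch: bad transfer or silent corruption")
```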

Storage costs: storage speed * storage size = constant (your budget?)
- High-speed disk storage: TB scale; disk units of 400 GB (SAS); intra-cluster InfiniBand or SAN; 1000-2000 /TB + 1200 /TB power/cooling (at 0.15 /kWh)
- Bulk disk storage: PB scale; disk units of 1 TB (SATA); inter-cluster Ethernet; 600-1500 /TB + 600 /TB power/cooling
- Tape storage: PB (to EB) scale; cartridges of 1 TB; Ethernet or SAN; 60-120 /TB + 1 /TB power/cooling
- Cloud storage: GB scale; wide area; 1000-3000 /TB (http://www.aw20.co.uk/storagecosts.cfm)
- Costs depend on the size of the procurement, startup or established vendor, and the maintenance model.
- Read my lips: 1 kB of disk space is 1000 and not 1024 bytes (see the calculations below).
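
Two small calculations tied to the last points: a cost-per-PB comparison using the midpoints of the ranges quoted above (currency units as on the slide; the midpoint choice is my assumption), and the decimal-versus-binary difference behind the "1 kB = 1000 bytes" remark.

```python
# Rough cost-per-PB comparison using the midpoints of the quoted ranges
# (hardware + power/cooling per TB). The figures are the slide's; taking
# midpoints is an assumption for illustration.
tiers = {
    "high-speed disk": (1500, 1200),
    "bulk disk":       (1050, 600),
    "tape":            (90, 1),
}
for name, (hw_per_tb, power_per_tb) in tiers.items():
    per_pb = (hw_per_tb + power_per_tb) * 1000   # 1 PB = 1000 TB (decimal)
    print(f"{name:16s}: ~{per_pb:>9,} per PB")

# Why '1 kB = 1000 bytes' matters: a '1 TB' (decimal) drive holds about
# 9% less than a binary TiB would suggest.
print(1e12 / 2**40)   # ~0.909, i.e. a 1 TB drive is ~0.91 TiB
```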