Data Storage. Paul Millar dcache


Overview Introducing storage How storage is used Challenges and future directions 2

(Magnetic) Hard Disks 3

Tape systems 4

Disk enclosures 5

RAID systems 6

Types of RAID 7

(Local) File systems Ext3, Ext4, XFS,... 8

(Local) File systems ZFS, BtrFS 9

Cluster filesystems 10

Storage Element 11

Storage Element 12

Example of redirection 13

Protocols. Transferring data: redirecting the client is important! LAN access (for worker nodes): NFS v4.1, dcap, rfio, xrootd, (HTTP?). WAN access (for transferring data): GridFTP, HTTP, WebDAV, (xrootd?). Management: SRM v2.2. Standardisation: GSI vs SSL/TLS. 14
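
The redirection step matters because it keeps the protocol door from becoming a data bottleneck: the client asks one well-known endpoint, then fetches the bytes from whichever pool actually holds them. Below is a minimal sketch of that pattern over HTTP/WebDAV; the door URL, port and file name are hypothetical, and a real deployment would add X.509 or token authentication.

```python
# Sketch only: fetch a file through an HTTP/WebDAV door and follow its
# redirect to the pool node that actually serves the data.
# The door URL and file name are hypothetical placeholders.
import requests

door_url = "https://se.example.org:2880/data/user/example/file.root"

# Ask the door without following redirects, so the pool location is visible.
reply = requests.head(door_url, allow_redirects=False)
if reply.status_code in (301, 302, 307):
    pool_url = reply.headers["Location"]   # pool node holding the replica
else:
    pool_url = door_url                    # door serves the data itself

# Stream the payload from the pool so the door never carries the bytes.
with requests.get(pool_url, stream=True) as resp:
    resp.raise_for_status()
    with open("file.root", "wb") as out:
        for chunk in resp.iter_content(chunk_size=1 << 20):
            out.write(chunk)
```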

Grid storage. Lots of sites (so, lots of SEs); data appears in multiple locations. Current Grid-level services: FTS (moving data), file catalogues (finding the files). The experiment provides file grouping (data sets) and the access framework (software). Unfortunately, this adds a layer of indirection between end-users and sites(!!) 15
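
That layer of indirection is essentially a mapping from logical file names to physical replicas. The toy sketch below (invented names and URLs, not a real catalogue API) shows the lookup a user or framework performs before any data can be read or handed to FTS for transfer.

```python
# Toy replica catalogue: logical file name (LFN) -> physical replicas at sites.
# All entries here are invented for illustration.
CATALOGUE = {
    "/grid/someexp/data11/AOD/file001": [
        "gsiftp://se1.example.org/exp/data11/AOD/file001",
        "gsiftp://dcache.example.de/exp/data11/AOD/file001",
    ],
}

def resolve(lfn):
    """Return the physical replicas currently registered for an LFN."""
    return CATALOGUE.get(lfn, [])

# A transfer request handed to FTS would name one replica as the source and
# a path on the destination storage element as the target.
print(resolve("/grid/someexp/data11/AOD/file001"))
```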

Grid Storage: catalogues Diagram from P. Fuhrmann 16

Grid problems. Communication: VOs have many storage providers, sites (typically) have many VOs, and VOs have many users. Diagnosing problems is hard: a networking problem could involve the end-user and VO, the source and destination storage elements (the sites), FTS, catalogue(s), network providers, ... Use of non-standards doesn't help! 17

Monte Carlo Diagram from Dr. G. Stewart 18

Data taking Diagram from Dr. G. Stewart 19

Reconstruction Diagram from Dr. G. Stewart 20

Chaotic analysis Diagram from P. Fuhrmann 21

Grid storage in context 22

WLCG site storage capacity (bar chart: total vs. used capacity in petabytes for CERN, BNL, FNAL, IN2P3, KIT, CNAF, PIC) 23

+ some non-WLCG sites (bar chart: total vs. used capacity in petabytes for CERN, Facebook, FNAL, BNL, KIT, IN2P3, WayBackMachine, CNAF, PIC, WoW) 24

Distributed storage (bar chart: total vs. used capacity in petabytes for WLCG, Amazon S3, Google/Gmail) 25

Sites + IBM Almaden (bar chart: total vs. used capacity in petabytes for Facebook, IBM Almaden, CERN, BNL, IN2P3, FNAL, KIT, CNAF, PIC, WayBackMachine, WoW) 26

Current challenges and Future directions 27

Dynamic data placement. Example from ATLAS: data was copied based on what people thought would be useful. Turns out they didn't know! Lots of data was copied but never read. Try replicating based on use instead. Example policy: when a T2 pulls in a file from a T1, make two additional replicas elsewhere. So far, this is working pretty well. 28
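
A sketch of that trigger-on-access policy, with invented site names and a plain list standing in for the experiment's transfer system, just to make the idea concrete:

```python
# Sketch of the usage-driven policy described above; site names and the
# transfer queue are illustrative, not a real experiment API.
import random

TIER2_SITES = ["T2_SITE_A", "T2_SITE_B", "T2_SITE_C", "T2_SITE_D"]

def on_file_pulled(lfn, source_t1, requesting_t2, transfer_queue):
    """When a T2 pulls a file from a T1, schedule two extra replicas."""
    candidates = [site for site in TIER2_SITES if site != requesting_t2]
    for target in random.sample(candidates, k=min(2, len(candidates))):
        transfer_queue.append({"lfn": lfn, "source": source_t1, "destination": target})

queue = []
on_file_pulled("/grid/someexp/data11/AOD/file001", "T1_SITE_X", "T2_SITE_A", queue)
print(queue)   # two replication requests to sites other than T2_SITE_A
```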

Standardisation. HEP storage requirements aren't that enormous any more; others are finding solutions, so don't reinvent the wheel! In EMI we're switching from Grid-specific protocols to standards: GSI to SSL/TLS, GridFTP to HTTP/WebDAV, custom LAN protocols to NFS v4.1. 29

The death of MONARC. MONARC is a rigid tier structure: T0, T1, T2. Rationale: the network will be a bottleneck. Reality: a proliferation of classifications (non-geographical T1s, large T2s, T3s, experiment clouds), and the backbone network isn't a bottleneck. Gradual relaxing of the rules; eventually: any file from anywhere. 30

Global namespace 31

Future of tape. Only HEP really uses tape storage in-band; elsewhere it is used for archiving data. We still need tape for archive, but data processing is moving (almost) completely onto disk: fetching from tape will be like a copy, and tape will be write once, read never. 32

Disks: where are the SSDs? SSDs are flash memory in a block-device format, much faster than magnetic disks for reading (writing is slower). The predicted introduction into data centres hasn't happened (yet). Why? Errors are sudden and unpredictable, they're still expensive, and software support isn't here (yet). 33

Satellites: structured storage. SSDs for random access (analysis); magnetic disks for archival storage. Support? In filesystems: ZFS; in cluster filesystems: GPFS; in storage systems: EOS and dcache (soon). 34

Disks: new technologies. Diagram from "Storage Class Memory: Technology and Uses", David A. Pease, IBM Almaden Research Center. 35

Data integrity. More data means corruption is more likely to occur. Detecting corruption: disk (T10 DIF), RAID systems (scrubbing), filesystems (ZFS, BtrFS), Storage Element (file-level checksums when uploading; scrubbing), tape (proposed). 36
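
At the storage-element layer, the file-level check at upload time can look like the sketch below. ADLER32 is a checksum widely used for Grid transfers; the expected value would be supplied by the client or looked up in a catalogue, and the function names here are illustrative.

```python
# Sketch: verify a file-level checksum at upload time.
import zlib

def adler32_of(path, chunk_size=1 << 20):
    """Compute the ADLER32 checksum of a file, streaming it in chunks."""
    value = 1  # ADLER32 seed
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            value = zlib.adler32(chunk, value)
    return "%08x" % (value & 0xFFFFFFFF)

def accept_upload(path, expected_checksum):
    """Reject an upload whose stored bytes don't match the declared checksum."""
    return adler32_of(path) == expected_checksum.lower()
```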

What is EMI? EMI is an EU-funded project to provide Grid software. It combines four technologies (ARC, dcache, glite, UNICORE) under a single responsibility, allowing mix-and-match usage and consolidation. The first major release, EMI-1, is now available. 37

Thank you! EMI is partially funded by the European Commission under Grant Agreement RI-261611 38

(Magnetic) Hard disks. A block device (addressable units of fixed size). Characteristics: streaming is fast, random access is slow, and the more concurrent activity, the poorer the overall throughput. Failure modes are well understood; the bath-tub (J-)curve is wrong! See Google's conference paper on SMART. 39
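
The streaming-versus-random gap is easy to demonstrate: the rough sketch below reads the same number of 4 KiB blocks sequentially and at random offsets from one file. The path is a placeholder; for meaningful numbers the file should be larger than RAM and the page cache dropped between runs.

```python
# Rough sketch: time sequential vs. random 4 KiB reads from a large file.
# PATH is a hypothetical test file, not a real benchmark target.
import os
import random
import time

PATH = "/data/testfile"    # placeholder: a file much larger than RAM
BLOCK = 4096
READS = 10000

def timed_reads(offsets):
    fd = os.open(PATH, os.O_RDONLY)
    start = time.monotonic()
    for offset in offsets:
        os.pread(fd, BLOCK, offset)
    os.close(fd)
    return time.monotonic() - start

size = os.path.getsize(PATH)
sequential = [i * BLOCK for i in range(READS)]
scattered = [random.randrange(0, size - BLOCK) for _ in range(READS)]

print("sequential: %.2f s" % timed_reads(sequential))
print("random:     %.2f s" % timed_reads(scattered))
```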

Storage: from small to big Disk RAID Filesystems Tape Cluster filesystems HSM Storage element Grid 40

Tape / disk separation. Motivations: avoid accidental staging; want a clear separation (separately addressable) between disk and tape. Data is stored on either disk or tape, never both. Part of a move away from including tape in the normal data-flow. 41

HSM storage. Files migrate to slower media, based on policies or explicit commands. Commercially available (TSM, SAMFS/QFS, ...). 42
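
As a toy illustration of policy-driven migration, the sketch below pushes files that have not been read for a configurable period to a slower tier via a hypothetical "hsm-put" helper. Real HSMs (TSM, SAMFS/QFS, and similar) use their own tooling and keep a proper namespace stub rather than simply removing the disk copy.

```python
# Toy sketch: migrate files idle for AGE_LIMIT seconds to the slower tier.
# "hsm-put" and the pool path are hypothetical placeholders.
import os
import subprocess
import time

DISK_POOL = "/pool/disk"              # placeholder disk-pool directory
AGE_LIMIT = 90 * 24 * 3600            # example policy: idle for 90 days

def migrate_cold_files():
    now = time.time()
    for name in os.listdir(DISK_POOL):
        path = os.path.join(DISK_POOL, name)
        if os.path.isfile(path) and now - os.path.getatime(path) > AGE_LIMIT:
            subprocess.run(["hsm-put", path], check=True)   # copy to tape tier
            os.remove(path)   # a real HSM would leave a stub in the namespace
```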

Networking. 10G is now cheap enough for ubiquitous use; sites are rolling out (or have rolled out) 10G, which needs Cat 6a (or Cat 6) cabling. 40G and 100G (MUX) exist: too expensive for host connections, but could be used for switch interconnects. 43