dcache: sneaking up on NFS4.1

dcache: sneaking up on NFS4.1
Tigran Mkrtchyan, Björn Böttcher, Patrick Fuhrmann, for the dcache Team
Support and funding by

What is dcache.org?
Head of dcache.org: Patrick Fuhrmann
Core Team (DESY and Fermi): Andrew Baranovski, Bjoern Boettcher, Ted Hesselroth, Alex Kulyavtsev, Iryna Koslova, Dmitri Litvintsev, David Melkumyan, Dirk Pleiter, Martin Radicke, Owen Synge, Neha Sharma, Vladimir Podstavkov
Head of Development FNAL: Timur Perelmutov
Head of Development DESY: Tigran Mkrtchyan
External Development: Gerd Behrmann (NDGF), Jonathan Schaeffer (IN2P3)
Support and Help: Abhishek Singh Rana (SDSC), Greig Cowan (gridpp), Stijn De Weirdt (Quattor), Maarten Litmaath (CERN), Flavia Donno (CERN)

Plan for today: the LHC Tier model and the SE. What is a dcache SE? Why NFS 4.1?

The LHC Tier model and the SE.

LHC (Data Management) Tier Structure, significantly oversimplified (diagram): Tier 0 at CERN; Tier 1 centres at NDGF, BNL, Amsterdam, Fermilab, Lyon, Karlsruhe, Barcelona and 4 more; Tier 2 sites such as Aachen, Munich and DESY.

The LHC Storage Element, the storage management workhorse: streaming data import and export, POSIX-like access from local worker nodes, and managing storage.

The Idea of a Grid Storage Element (diagram): a File Transfer Service speaks the SRM protocol to both the Tier 0 and the far-away Tier 1 Storage Element for space reservation, space attribute selection, protocol negotiation and network shaping; the data itself moves over a wide area transport, while the local compute cluster uses local area access to its Storage Element.

LHC Storage Grid. Intentionally not mentioned here: information providers, protocols, file catalogs.

The Idea of an (LCG) Grid Storage Element: SRM (Storage Resource Management) for space and protocol management; a wide area transport protocol (in use: gsiftp; discussed: http(s)); and a local access protocol ((gsi)dcap or rfio and xroot), which is not at all a standard.

What do we need a grid storage element for? We need to serve large amounts of data locally: access from the local Compute Element, a huge number of simultaneously open files, POSIX-like access (what does that mean?). We need to exchange large amounts of data with remote sites: streaming protocols, optimized for high-latency (wide area) links, possibly controlling 'link reservation'.

What do we need a grid storage element for? (cont.) We need to allow storage control: space reservation to guarantee maximum streaming, defining space properties (TAPE, ONLINE, ...), transport protocol negotiation. We need to publish SE-specific information, since clients need to select the 'best' SE or CE for a job: availability, available space (max, used, free, ...), supported spaces (tape, disk, ...), and which VO owns which space.

dcache in a Nutshell

dcache in a Nutshell, black (yellow) box view (diagram): the dcache core with its namespace provider and ACLs sits behind several protocol engines: information protocol(s); storage management protocol(s) (SRM 1.1 and 2.2); data and namespace protocols (dcap, (NFS 4.1), gsiftp, ftp (V2), xroot, (http)); and namespace-only access via NFS 2/3. Tape storage backends: OSM, Enstore, TSM, HPSS, DMF.

dcache in a Nutshell: strict name space and data storage separation, allowing consistent name space operations (mv, rm, mkdir etc.), consistent access control per directory or file, management of multiple internal and external copies of the same file, and convenient name space management via nfs (or http). (Diagram: the file system view /pnfs/desy.de/atlas/myfile maps to copies of the same file id, 001067645, on dcache disks and on external (HSM) storage.)

dcache in a Nutshell: overload and meltdown protection via a request scheduler. Primary storage pool selection by protocol, IP, directory and I/O direction; secondary selection by system load and available space considerations. Separate I/O queues per protocol (load balancing). Supported protocols: (gsi)ftp, (gsi)dcap, xroot, SRM, nfs2/3 (name space only).

dcache in a Nutshell (request flow diagram): an I/O request passes the space manager, a dispatcher by request attributes and a dispatcher by pool cost before it reaches the scheduler and I/O queues of a pool. Pool candidates are selected by protocol, client IP number/net, data flow direction, name space attributes (directory) and SRM spaces; each pool runs per-protocol queues (nfs4.1, dcap, gsiftp). A sketch of the cost-based step follows below.
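The secondary, cost-based step lends itself to a small illustration. The sketch below is illustrative only and not dcache's actual cost module: the pool_t structure, the pool_cost formula (load plus used-space fraction) and select_pool are invented for this example.

```c
/* Illustrative sketch of cost-based pool selection, in the spirit of the
 * dispatcher described above. Not dcache's actual implementation: pool_t,
 * pool_cost() and select_pool() are invented for this example. */
#include <stddef.h>
#include <stdio.h>

typedef struct {
    const char *name;
    double load;         /* current load, e.g. number of active movers */
    double free_bytes;   /* free space on the pool */
    double total_bytes;  /* total space on the pool */
} pool_t;

/* Lower cost is better: busy pools and nearly full pools become expensive. */
static double pool_cost(const pool_t *p)
{
    double used_fraction = 1.0 - p->free_bytes / p->total_bytes;
    return p->load + used_fraction;
}

/* Pick the cheapest pool among the candidates that survived the primary
 * selection (protocol, client IP, data flow direction, directory). */
static const pool_t *select_pool(const pool_t *candidates, size_t n)
{
    const pool_t *best = NULL;
    for (size_t i = 0; i < n; i++)
        if (best == NULL || pool_cost(&candidates[i]) < pool_cost(best))
            best = &candidates[i];
    return best;
}

int main(void)
{
    pool_t candidates[] = {
        { "pool-a", 0.8, 2e12, 10e12 },   /* busy and rather full  */
        { "pool-b", 0.2, 6e12, 10e12 },   /* idle and mostly empty */
    };
    const pool_t *chosen = select_pool(candidates, 2);
    printf("selected %s\n", chosen ? chosen->name : "(none)");
    return 0;
}
```

In the real system the chosen pool then serves the request from the per-protocol I/O queue shown in the diagram.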

In a Nutshell: dcache partitioning for very large installations, with different tuning parameters for different parts of dcache. File hopping: on automated hot spot detection, by configuration (read-only, write-only, stage-only pools), on arrival (configurable), outside/inside firewalls. Resilient management: at least n but never more than m copies of a file (sketched below).
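The resilient-management rule boils down to a small decision function; again an illustrative sketch with invented names, not the resilient manager's real interface.

```c
/* Illustrative sketch of the resilient-management invariant: keep the number
 * of copies of a file within [min_copies, max_copies]. Names are invented. */
typedef enum { REPLICA_OK, REPLICA_MAKE_COPY, REPLICA_DROP_COPY } replica_action_t;

static replica_action_t resilient_check(int copies, int min_copies, int max_copies)
{
    if (copies < min_copies)
        return REPLICA_MAKE_COPY;  /* below the lower bound: create another copy */
    if (copies > max_copies)
        return REPLICA_DROP_COPY;  /* above the upper bound: remove a surplus copy */
    return REPLICA_OK;             /* within [n, m]: nothing to do */
}
```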

In a Nutshell: file hopping (diagram showing write pools, transfer pools and a read-only cache between the client, the backup store and the main tertiary store). RAW data gets a second copy in the backup store; non-RAW data is immediately copied to the read pools; the read-only cache replicates on high load.

In a Nutshell: HSM support (TSM, HPSS, DMF, Enstore, OSM) with automated migration and restore; working on a central flush facility and on support for multiple, non-overlapping HSM systems (the NDGF approach). Misc: graphical user interface, command line interface, Jpython interface, SRM watch, and NEW: monitoring plots.

dcache and the LHC storage management: dcache is in use at 8 Tier 1 centers: FZK (Karlsruhe, DE), IN2P3 (Lyon, FR), BNL (New York, US), Fermilab (Chicago, US), SARA (Amsterdam, NL), PIC (Spain), TRIUMF (Canada) and NDGF (NorduGrid), and at about 60 Tier 2s. dcache is part of VDT (OSG). We are expecting > 20 PB per site beyond 2011; dcache will hold the largest share of the LHC data.

Is this useful for non-LCG applications? Weak points: http(s) not really supported; security might not be sufficient; POSIX-like is NOT POSIX (file system driver).

POSIX-like is NOT POSIX (diagrams: the linked-library and preload-library stacks, with the application sitting on dcap and libc next to a regular xfs/ext3 file system and talking to the SE). Linked library: the application needs to be linked with the dcap library. Preload library: the application stays unchanged, but this doesn't work in all cases (statically linked binaries, some C++ applications). See the sketch below.
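For the linked-library case, client code looks roughly like the sketch below. It assumes libdcap's POSIX-like C API (dc_open/dc_read/dc_close from dcap.h) and uses an invented door host and path; consult the dcap documentation for the exact interface.

```c
/* Sketch of the linked-library approach: the application is compiled against
 * libdcap (-ldcap) and uses the dc_* calls instead of the plain POSIX ones.
 * Host, port and path in the dcap URL are placeholders. */
#include <dcap.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int fd = dc_open("dcap://door.example.org:22125/pnfs/example.org/data/myfile",
                     O_RDONLY);
    if (fd < 0) {
        perror("dc_open");
        return 1;
    }

    char buf[4096];
    ssize_t n = dc_read(fd, buf, sizeof buf);
    printf("read %zd bytes via dcap\n", n);

    dc_close(fd);
    return 0;
}
```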

And this is real POSIX (diagram: the application sits on libc and a kernel driver, next to xfs, talking to the SE). The application doesn't need to be changed, and the kernel driver comes with the OS; but dcache.org doesn't want to write and support kernel drivers.
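With a kernel NFS 4.1 mount the same task needs no special library at all; the sketch below uses only standard POSIX calls on a hypothetical mount point.

```c
/* The "real POSIX" case: the dcache name space is mounted by the kernel NFS
 * client, so an unmodified application uses ordinary system calls.
 * The mount point below is hypothetical. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    const char *path = "/pnfs/example.org/data/myfile";

    struct stat st;
    if (stat(path, &st) != 0) {          /* plain stat(), no dcap library */
        perror("stat");
        return 1;
    }
    printf("size: %lld bytes\n", (long long)st.st_size);

    int fd = open(path, O_RDONLY);       /* plain open()/read()/close() */
    if (fd < 0) {
        perror("open");
        return 1;
    }
    char buf[4096];
    ssize_t n = read(fd, buf, sizeof buf);
    printf("read %zd bytes via the kernel NFS client\n", n);
    close(fd);
    return 0;
}
```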

Solution is on the way... NFS 4.1

And another project: NFS 4 within CITI, University of Michigan. Introduction of RFC 3530: "We are developing an implementation of NFSv4 and NFSv4.1 for Linux. The Network File System (NFS) version 4 is a distributed filesystem protocol which owes heritage to NFS protocol version 2, RFC 1094, and version 3, RFC 1813. Unlike earlier versions, the NFS version 4 protocol supports traditional file access while integrating support for file locking and the mount protocol. In addition, support for strong security (and its negotiation), compound operations, client caching, and internationalization have been added. Of course, attention has been applied to making NFS version 4 operate well in an Internet environment."

(Quotes are stolen from the CITI wiki.) And what is NFS 4.1?! NFSv4.1 extends NFSv4 with two major components: sessions and pnfs. Parallel: exactly what we need! IETF road map: draft 19 is expected to follow the Austin Bakeathon and be issued as an RFC following the 71st IETF Meeting in Philadelphia (March 2008); this will freeze the specification of sessions, generic pnfs protocol issues, and the pnfs file layout. March: exactly when we need it! Who are the NFS 4 (pnfs) partners? All the known storage big shots: GPFS (IBM), Sun, EMC, Panasas, NetApp, Lustre (Sun), and dcache. Exactly what our clients need!

NFS 4.1: more details. dcache is invited to the regular bakeathons. CITI, IBM and others are working on the Linux client implementation; a stable client implementation is essential for industry to sell their products, so we profit. At last week's bakeathon, except for sparse files, the dcache server could interact reliably with all client implementations. Currently, NFS 4.1 is only available as a special pool within dcache; we are refurbishing the generic pool in order to integrate NFS 4.1.

Why NFS 4.1, the project perspective: POSIX clients come for free (provided by all major OS vendors); NFS 4.1 is aware of distributed data; it will make dcache attractive to other (non-HEP) communities; LCG could consider dropping the LAN protocol zoo (dcap, rfio, xroot).

Why NFS 4.1, the technical perspective: NFS 4.1 is aware of distributed data. It is faster (optimized), e.g. through compound RPC calls: a 'stat' produces 3 RPC calls in v3 but only one in v4. GSS authentication gives built-in, mandatory security at file system level, plus ACLs. dcache can keep track of client operations: OPEN/CLOSE semantics (so the system can keep track of open files), 'dead' client discovery (by client-to-server pings), and smart client caching.

NFS 4.1 in dcache, technically (diagram): the NFS 4.1 client talks to an NFS 4.1 door for name space operations and directly to the NFS 4.1 data servers (I/O only), all coordinated by the dcache core.

NFS 4.1 in dcache: time-line (diagram covering 2007 through 2008). Milestones along the axis: the 14th bakeathon (Sep), NFS 4 name space, the 15th bakeathon, draft 19, the 71st and 72nd IETF meetings; a special pnfs pool in dcache, the dcache pool system refurbished, the dcache server working fine with the Solaris and Linux clients, NFS (pnfs) in the generic dcache pool; and by December 2008, clients in the official Solaris and Linux distributions and the dcache server in production.

Goal: industry standards in HEP? (Diagram: two Storage Elements exchanging data over http(s), each accessed locally via nfs 4.1.)

Summary. NFS 4.1 (pnfs) is just an additional protocol for dcache. NFS 4.1 simplifies LAN POSIX access to dcache; applications don't need special treatment any more. NFS4.1/dCache is attractive for non-HEP communities. We expect a production system by the end of 2008. BUT: success and acceptance are not guaranteed yet.

Further reading: www.dcache.org, www.citi.umich.edu/projects/nfsv4/

Some more hot topics

The NDGF Challenge: gsiftp protocol version II (diagram). The NDGF Tier 1 is distributed over Finland, Denmark, Sweden and Norway; a head node in Denmark runs Chimera and the SRM, and data comes in from CERN.

The NDGF Challenge: multi-site HSM support (diagrams). Single-site approach: flush to the HSM, restore to any pool. Multi-site approach (pools in Norway, Sweden, ...): not all pools can access all HSM systems.

The wonderful world of SRM 2.2 (only if there is a lot of time left).

SRM 2.2: the SRM in dcache supports CUSTODIAL (T1Dx) and NON-CUSTODIAL (T0D1) storage, dynamic space reservation, late pool binding for spaces, and more.

SRM 2.2 (the space token): diagram comparing how it used to be (dcache <= 1.7) with how it will be with 1.8. A space token (ST) carries a space size, a retention policy and an access latency (custodial T1D0, T0D1) and is mapped onto a link group. Remark: the space of a space token is assigned to a pool at the moment the file is opened, not when the space token is created.