A scalable storage element and its usage in HEP

dCache: A scalable storage element and its usage in HEP
Martin Radicke, Patrick Fuhrmann
AstroGrid-D Meeting at MPE, 14-15 November 2006, Garching

Introduction to dCache

Project overview
* joint venture between DESY and Fermilab, 10 FTE in total
* in development since 2000, in production since 2003
* MoUs with: LHC, OSG, D-Grid (HEPCG, DGI)
* DESY provides first-line support (1 FTE): trouble ticket system, user forum mailing list, central documentation, workshops
* source code available at www.dcache.org

The basic idea (diagram): dCache sits between the client and the HSM. Files written by clients are 'precious' until they are migrated to the HSM; when disk space is needed, migrated copies become 'cached' and may be removed; when a file is requested that is no longer on disk, it is staged back from the HSM and kept as a 'cached' copy.

What is dCache?
dCache is managed storage:
* a storage element (SE) with full Storage Resource Manager (SRM) support
* a variety of data access protocols: local area: dcap, xroot; wide area: GridFTP, (HTTP)
* information providing: GIP (LCG), JClarens (OSG)
* a frontend to tertiary storage systems; supported: OSM, TSM, Enstore, HPSS

dCache key features
* combines hundreds of commodity disk servers into a huge, petabyte-scale data store
* strictly separates the namespace from the data repositories
* increased fault tolerance: allows several copies of a single file for distributed data access
* internal load balancing using cost metrics and inter-pool transfers
* automatic file replication on high load (hotspot detection)

Inside dCache: components
* PNFS (/pnfs/<site>/<vo>/...): provides the single-rooted namespace service; one central instance; can be viewed and modified via an NFS 2/3 mount or FTP commands.
* SRM (Storage Resource Manager): identified by the SURL; contacted by clients to initiate a file transfer; selects a door based on load and the agreed transfer protocol.
* Doors: protocol-specific entry points to the file repository; identified by the TURL.
* Pools: disk servers holding (caching) 0..n copies of the files known to dCache; get selected and perform the actual file transfers (via processes called movers); migrate files to and from tertiary storage.
A minimal sketch of how these components interact on a read request follows.
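To make the component interplay concrete, here is a minimal Python sketch of a read request as described above; the class and function names, hosts and ports are illustrative assumptions, not real dCache APIs.

```python
# Conceptual sketch of a dCache read request (illustrative only; these
# names are not real dCache classes or APIs).

from dataclasses import dataclass

@dataclass
class DoorInfo:
    host: str
    port: int
    protocol: str     # e.g. "gsiftp" or "dcap"
    load: float       # current load reported by the door

class SrmEndpoint:
    """Client-facing entry point, addressed by the SURL."""

    def __init__(self, doors):
        self.doors = doors

    def prepare_to_get(self, pnfs_path, client_protocols):
        """Resolve a namespace path into a TURL on the least loaded matching door."""
        usable = [d for d in self.doors if d.protocol in client_protocols]
        door = min(usable, key=lambda d: d.load)
        return f"{door.protocol}://{door.host}:{door.port}{pnfs_path}"

# The client then contacts the door named in the TURL; the door asks the
# PoolManager for a pool, and that pool's mover performs the transfer.
srm = SrmEndpoint([DoorInfo("door1.example.org", 2811, "gsiftp", 0.3),
                   DoorInfo("door2.example.org", 22125, "dcap", 0.1)])
turl = srm.prepare_to_get("/pnfs/example.org/data/myvo/file1.root", {"gsiftp"})
print(turl)   # gsiftp://door1.example.org:2811/pnfs/example.org/data/myvo/file1.root
```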

Pool selection & load balancing
Run by the PoolManager (the heart of dCache):
* select the set of pools matching the following criteria: protocol, dataflow direction (put, get, restore from HSM, pool-to-pool), directory subtree, client IP address
* out of these, select the target pool with the lowest cost:
  get request: cost = CPU load (number of movers)
  put request: cost = free space + CPU load
A minimal sketch of this selection logic is shown below.
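The cost-based selection can be sketched in a few lines of Python; the Pool fields, the matching rules and the exact space-cost term are simplified assumptions derived from the criteria above, not the real PoolManager implementation.

```python
# Simplified sketch of cost-based pool selection (not the actual
# PoolManager code; field names and the space-cost term are assumptions,
# interpreting the slide's "free space" term as a cost that grows as the
# pool fills up).

from dataclasses import dataclass

@dataclass
class Pool:
    name: str
    active_movers: int   # used as the CPU-load term
    free_bytes: int
    total_bytes: int
    protocols: set       # e.g. {"dcap", "gsiftp"}
    subtrees: tuple      # namespace subtrees this pool may serve

def matching_pools(pools, protocol, path):
    """Step 1: keep pools that match protocol and directory subtree."""
    return [p for p in pools
            if protocol in p.protocols
            and any(path.startswith(s) for s in p.subtrees)]

def cost(pool, direction):
    """Step 2: get = CPU load; put = space cost + CPU load."""
    cpu_cost = pool.active_movers
    if direction == "get":
        return cpu_cost
    space_cost = 1.0 - pool.free_bytes / pool.total_bytes
    return space_cost + cpu_cost

def select_pool(pools, protocol, path, direction):
    candidates = matching_pools(pools, protocol, path)
    return min(candidates, key=lambda p: cost(p, direction)) if candidates else None
```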

LAN file transfer protocols
xroot:
* transparent access from within the ROOT toolkit; dCache behaves like an xrootd server cluster
* the protocol implementation makes full use of dCache core features (load balancing, HSM backend)
* token-based authorization
DCap, the native dCache protocol:
* advanced tuning options
* new: passive mode (to gain access from behind firewalls/NAT)
* authenticated flavour available (gsidcap)
A short usage example follows.
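As a usage illustration, a file stored in dCache can be opened over dcap straight from ROOT (here via PyROOT); the door host, port and /pnfs path are placeholders, and ROOT must be built with dCache (TDCacheFile) support for the dcap:// scheme to resolve.

```python
# Hedged example: reading a file from dCache over its native dcap protocol.
# Door host, port and the /pnfs path are placeholders for a real site.

import ROOT

url = "dcap://dcache-door.example.org:22125/pnfs/example.org/data/myvo/run001/events.root"
f = ROOT.TFile.Open(url)        # ROOT selects the dCache plugin from the URL scheme
if f and not f.IsZombie():
    f.ls()                      # list the objects stored in the file
    f.Close()
```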

Storage Resource Manager (SRM)
An interface for standardized access to SEs. Main functions:
* prepares transfers (resolves SURL -> TURL)
* negotiates protocols (in theory)
* initiates pre-staging
* provides directory functions (e.g. ls)
Clients: command-line tools, the Generic File Access Library, the File Transfer Service. A client-side sketch follows.
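A typical client-side interaction can be sketched by driving the srmcp command-line tool from Python; the SE hostname, port and paths below are placeholders, and the exact options depend on the installed SRM client version.

```python
# Hedged sketch: copy a file out of a dCache SE with the srmcp client.
# Hostname, port and paths are placeholders, not a real endpoint.

import subprocess

surl = "srm://dcache-se.example.org:8443/pnfs/example.org/data/myvo/run001/events.root"
dest = "file:////tmp/events.root"

# srmcp asks the SRM to resolve the SURL into a TURL, negotiates a transfer
# protocol (e.g. GridFTP) and then performs the actual copy.
subprocess.run(["srmcp", surl, dest], check=True)
```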

SRM v2.2
* space reservation (space tokens): can be set per SE and per Virtual Organisation
* storage classes:
  - the file is never written to the HSM (no HSM)
  - the file is written to the HSM but kept on disk forever
  - the file is written to the HSM and kept on disk for a certain time (pinning)
* a full SRM v2.2 implementation for dCache is under development

dCache scaling

Optimizing the internal dataflow
* smart handling of bunched requests by a fast pool selection unit
* intelligent HSM backend migration: a centrally controlled mechanism to optimize HSM interaction; an alternate flushing strategy to reduce tape mounting times
* file hopping: automatic data set replication on hotspot detection; allows write-only pools, with replication on demand or on arrival (see the sketch after this list)
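The file-hopping idea can be illustrated with a short Python sketch; the threshold, the dictionary fields and the copy hook are assumptions for illustration, not the actual dCache mechanism.

```python
# Conceptual sketch of file hopping: replicate hot files to another read
# pool once a pool becomes too busy (all names and thresholds assumed).

HOT_MOVER_THRESHOLD = 20   # assumed load limit (active movers) per pool

def detect_hotspots(pools):
    """pools: list of dicts like {"name": "pool1", "movers": 25, "hot_files": [...]}"""
    return [p for p in pools if p["movers"] > HOT_MOVER_THRESHOLD]

def replicate_hot_files(hot_pool, read_pools, copy_file):
    """Copy the hot pool's busiest files to the least loaded read pool."""
    target = min(read_pools, key=lambda p: p["movers"])
    for file_id in hot_pool["hot_files"]:
        copy_file(file_id, source=hot_pool["name"], destination=target["name"])
```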

Smart flushing & file hopping (diagram): RAW data arrives from the client into the master store, with a second copy going to the backup store; non-RAW data is immediately copied to the read pools; on high load, files are replicated into the read-only cache, from which they are served to clients.

SE scaling in the near future (chart: speed and storage space, present vs. future)
* number of 'opens' per second: 50 opens/s -> 500 opens/s
* number of pools: 1000 pools (Manchester) -> 5000 pools
* number of bytes per second: 300 TB/day -> 1-2 PB/day
* amount of disk space: 300 TB (Fermilab) -> 2 PB

The dCache SE in HEP

LHC Data Grid schema (diagram): storage elements (SE) at Tiers 0-2; the Tier 0 at CERN feeds the Tier 1 centres (IN2P3, GridKa, etc.) at roughly 300 MByte/s; Tier 2 sites such as DESY serve the compute elements (CE) of each grid site via LAN transfers.

dCache at DESY (diagram): an internal dCache instance used by local clients (the experiments) and a grid-facing dCache SE share the PNFS namespace (/pnfs/<site>/<vo>/...); the SE is reached from the Grid via the SRM protocol through File Transfer Service (FTS) channels and is published in the information system.

Throughput at DESY (central dCache)
* almost all data sets written into dCache are migrated to magnetic tape (raw data)
* most data sets requested for reading are still on disk: high cache efficiency

Installation & configuration

Deployment
* latest production version: dCache 1.7.0
* binaries for Linux, Solaris, (WinXP) available from http://www.dcache.org/: direct download of RPMs, APT/YUM repositories, source code
* manual installation (the standard method)
* new: automatic installation/upgrade via YAIM: a full single-host dCache instance in 10 minutes; can handle more complex setups (multi-host)

Administration and monitoring
* feature-rich ssh admin interface with the full command set available
* GUI application: topology browser; visualises cost tuning and central flush management
* web interface for monitoring: status of each service, queue lengths, pool and HSM usage, dataflow rules
* billing database for customized statistics

D-Grid Integration Project (DGI)
* collaboration with the DGI, providing dCache SEs as core D-Grid resources accessible via the VO 'dgtest'
* existing installations: FZ Jülich (terabyte scale, tape backend), RWTH Aachen, FZK Karlsruhe, DESY Hamburg

Future development
* full SRM 2.2 compliance, expected in spring 2007
* extended information system (HEPCG): how long does it take to get a file ready for I/O?
* Chimera (improved file system engine): better performance for large-scale installations; ACLs, quotas; NFS 4.1 (including data transport)
* improved HSM connectivity (central staging)

Agenda: 2nd dCache Workshop, 18-19 January 2007, DESY, Hamburg
* The SRM 2.2 implementation
* Space tokens and storage spaces
* dCache operational issues
* YAIM (Quattor) installation
* Meet the FERMI and DESY developers
* Reports from some of the Tier 1 sites
Register at: https://indico.desy.de/conferencedisplay.py?confid=138
Contact (specific help for your installation or your customized dCache instance): www.dcache.org, support@dcache.org, user-forum@dcache.org