Storage Resource Sharing with CASTOR.

Transcription:

Slide 1: Storage Resource Sharing with CASTOR
Olof Barring, Benjamin Couturier, Jean-Damien Durand, Emil Knezo, Sebastien Ponce (CERN); Vitali Motyakov (IHEP)
Contact: ben.couturier@cern.ch
16/4/2004

Slide 2: Introduction
CERN, the European Laboratory for Particle Physics (Geneva), is celebrating its 50th anniversary.
2007: start-up of the Large Hadron Collider (LHC), with four international experiments: ALICE, ATLAS, CMS and LHCb.
High-energy proton or heavy-ion collisions (up to 14 TeV for proton beams and 1150 TeV for lead-ion beams).
Far larger volumes of data to store and analyse than in previous experiments: up to 4 GB/s, about 10 petabytes per year in 2008.

Slide 3: An LHC Experiment
(Figure only.)

Slide 4: The LHC Computing Grid
(Diagram of the tiered LHC computing model: a Tier 0 at CERN; Tier 1 centres in countries such as the USA, the UK, Italy, China, Germany, France and Japan; Tier 2 laboratories and universities; Tier 3 physics-department clusters and desktops; with overlaid grids for a regional group and for a physics study group.)

Slide 5: CASTOR
CASTOR, the CERN Advanced STORage Manager, is a Hierarchical Storage Manager (HSM) used to store user and physics files. It manages both secondary (disk) and tertiary (tape) storage.
History: development started in 1999, based on SHIFT, CERN's tape and disk management system in use since the beginning of the 1990s (SHIFT was awarded a 21st Century Achievement Award by Computerworld in 2001). CASTOR has been in production since the beginning of 2001.
http://cern.ch/castor/

Slide 6: CASTOR deployment
CASTOR teams at CERN: development team (5 people), operations team (4 people).
Hardware setup at CERN:
- Disk servers: ~370 disk servers, ~300 TB of staging pools, ~35 stagers (disk pool managers).
- Tapes and libraries: ~90 tape drives (50 of them STK 9940B); 2 sets of 5 Powderhorn silos (2 x 27500 cartridges); 1 Timberwolf (600 cartridges); 1 L700 (288 cartridges).
Also deployed at other HEP institutes, e.g. PIC Barcelona and CNAF Bologna.

Slide 7: CASTOR Data Evolution
(Plot of the total data stored in CASTOR over time; the 1 PB and 2 PB levels are marked, and a drop is annotated as being due to the deletion of 0.75 million ALICE MDC4 files of 1 GB each.)

Slide 8: CASTOR Architecture
(Architecture diagram.) Central services, backed by Oracle or MySQL: the CASTOR name server (logical name space), the Volume Manager (VMGR) (PVL) and the Volume and Drive Queue Manager (VDQM) (PVL). The application talks through the RFIO API to the stager (disk pool manager). The disk subsystem consists of rfiod (disk mover) and the disk cache; the tape subsystem consists of tpdaemon (PVR) and Remote Tape Copy (tape mover).

Slide 9: Main Characteristics
- POSIX-like client interface: RFIO, widely used in the HEP community (see the sketch after this list).
- Modular and highly distributed: a set of central servers, a disk subsystem and a tape subsystem.
- Allows for tape resource sharing.
- Grid interfaces: GridFTP and the Storage Resource Manager (SRM v1.1), a cooperation between CERN, FNAL, JLAB and LBNL (cf. http://sdm.lbl.gov/srm-wg/).
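To illustrate the POSIX-like client interface, here is a minimal sketch of reading a file through RFIO. It assumes the RFIO C API (rfio_open, rfio_read, rfio_close) and an rfio_api.h header as shipped with CASTOR; the exact header name, error-reporting conventions (serrno/rfio_perror) and link flags differ between releases, and the /castor path used here is purely illustrative.

/* Sketch: reading a CASTOR file through the POSIX-like RFIO API.
 * Assumes the RFIO client library and header from a CASTOR
 * installation (names may vary between releases). */
#include <stdio.h>
#include <fcntl.h>
#include "rfio_api.h"   /* assumed header providing rfio_open/read/close */

int main(void)
{
    char buf[64 * 1024];
    int  nread;

    /* Paths can be local, remote (host:/path) or HSM files in the
     * CASTOR name space (/castor/cern.ch/...); this one is made up. */
    int fd = rfio_open("/castor/cern.ch/user/b/bob/example.data",
                       O_RDONLY, 0);
    if (fd < 0) {
        /* Real code would report the RFIO error (serrno/rfio_perror). */
        fprintf(stderr, "rfio_open failed\n");
        return 1;
    }

    /* Same calling convention as read(2). */
    while ((nread = rfio_read(fd, buf, sizeof(buf))) > 0) {
        fwrite(buf, 1, (size_t)nread, stdout);
    }

    rfio_close(fd);
    return 0;
}

The point of the sketch is only that applications use the familiar open/read/close pattern; the RFIO layer decides whether the file is local, on a remote disk server or in the HSM.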

Slide 10: Platform/Hardware Support
Multiplatform support: Linux, Solaris, AIX, HP-UX, Digital UNIX, IRIX; the clients and some of the servers also run on Windows NT/2000.
Supported drives: DLT/SDLT, LTO, IBM 3590, STK 9840, STK 9940A/B (plus older drives already supported by SHIFT).
Supported libraries: SCSI libraries, ADIC Scalar, IBM 3494, IBM 3584, Odetics, Sony DMS24, STK Powderhorn (with ACSLS), STK L700 (with SCSI or ACSLS).

Slide 11: Requirements for LHC (1)
CASTOR currently performs satisfactorily:
- ALICE Data Challenge: 300 MB/s for a week; 600 MB/s maintained for half a day.
- High request load: tens of thousands of requests per day per TB of disk cache.
However, when the LHC starts in 2007:
- A single stager should scale up to 500-1000 requests per second.
- Expected system configuration: ~4 PB of disk cache, 10 PB stored on tape per year (peak rate of 4 GB/s; see the estimate below), ~10000 disks, ~150 tape drives.
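For context, a rough estimate of the average rate implied by 10 PB per year (assuming 1 PB = 10^15 bytes and data taking spread over a full calendar year):

\[
\frac{10 \times 10^{15}\,\mathrm{B}}{3.15 \times 10^{7}\,\mathrm{s}} \approx 3.2 \times 10^{8}\,\mathrm{B/s} \approx 0.32\,\mathrm{GB/s}
\]

so the quoted 4 GB/s peak is more than ten times the yearly average; the system has to be sized for bursts rather than for the mean rate.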

Slide 12: Requirements for LHC (2)
Various types of workload:

  Workload                  | Access pattern    | Amount of data involved                 | Quality of service?            | Frequency
  Tier 0: Data recording    | Sequential write  | Up to 1 GB/s for ALICE (2 GB/s total)   | Necessary to avoid losing data | Once
  Tier 0: Reconstruction    | Sequential R/W    | Complete dataset (up to 2 GB/s as well) | -                              | A few times per year
  Tier 0: Data to Tier 1s   | Sequential read   | Part of the dataset                     | -                              | Once per tier
  Tier 1: Analysis          | Random            | Random                                  | -                              | Always running

Slide 13: Stager Central Role
(Same architecture diagram as slide 8, highlighting that the stager (disk pool manager) sits at the centre: between the application's RFIO API, the central services (name server, VMGR, VDQM), the disk subsystem (rfiod and disk cache) and the tape subsystem (tpdaemon and Remote Tape Copy).)

Slide 14: Limitations of the current system
The CASTOR stager plays a crucial role, but the current version limits CASTOR's performance:
- The stager catalogue is limited in size (~100 000 files in cache).
- Disk resources have to be dedicated to each stager, which leads to sub-optimal use of disk resources and difficult configuration management: it is hard to switch disk resources quickly when the workload changes, and the setup is not very resilient to disk server errors.
- The current stager design does not scale.
- The tape mover API is not flexible enough to allow dynamic scheduling of disk access.

Slide 15: Vision
With clusters of hundreds of disk and tape servers, automated storage management increasingly faces the same problems as the management of CPU clusters: (storage) resource management, (storage) resource sharing, (storage) request scheduling, configuration and monitoring.
The stager is the main gateway to all resources managed by CASTOR.
Vision: a Storage Resource Sharing Facility.

Slide 16: New CASTOR Stager Architecture
(Architecture diagram.) The application uses the RFIO/stage API, with control and data travelling over separate TCP connections. Requests enter through a Request Handler and are stored in a request repository and file catalogue (Oracle or MySQL). A Resource Management Interface delegates scheduling to a third-party scheduler (LSF or Maui), steered by a Policy Engine; a Garbage Collector manages the disk cache. Recall and Migration components move data between the disk cache (served by rfiod, the disk mover) and the CASTOR tape archive components (VDQM, VMGR, RTCOPY).

Slide 17: Database Centric Architecture
A set of stateless, multithreaded daemons accessing a database (see the sketch below):
- Request state is stored in an RDBMS (Oracle and MySQL are supported), which allows for large stager catalogues with reasonable performance.
- Locking and transactions are handled by the RDBMS, so the stager is as scalable as the database it uses; database clustering solutions can be used if necessary.
- Easier administration: standard backup procedures, and fewer problems restarting after a failure.
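A minimal sketch of the idea behind the stateless daemons, with the database as the only holder of request state. The db_* helpers and process_request below are hypothetical stand-ins for SQL executed against the Oracle/MySQL request table (e.g. selecting and locking one pending row inside a transaction); they are not part of the CASTOR code base and are simulated here with an in-memory array so the sketch compiles.

/* Sketch of a stateless stager worker: all request state lives in the
 * database; the daemon claims one pending request at a time, processes
 * it, and records the result. */
#include <stdio.h>

struct request {
    int  id;
    char castor_path[256];
    int  done;           /* 0 = pending, 1 = processed */
};

/* Stand-in for the request table in the RDBMS (illustrative paths). */
static struct request fake_db[] = {
    {1, "/castor/cern.ch/user/a/alice/file1", 0},
    {2, "/castor/cern.ch/user/b/bob/file2",   0},
};

/* Hypothetical: claim one pending request; in the real system the
 * RDBMS, not the daemon, guarantees that concurrent daemons never
 * pick the same row (row locking inside a transaction). */
static struct request *db_claim_next_request(void)
{
    for (size_t i = 0; i < sizeof(fake_db) / sizeof(fake_db[0]); i++)
        if (!fake_db[i].done)
            return &fake_db[i];
    return NULL;
}

/* Hypothetical: commit the state change back to the database. */
static void db_mark_done(struct request *r)
{
    r->done = 1;
}

/* Stand-in for the real work: scheduling the disk access, recalling
 * the file from tape, and so on. */
static void process_request(const struct request *r)
{
    printf("processing request %d for %s\n", r->id, r->castor_path);
}

int main(void)
{
    struct request *r;

    /* The daemon keeps no state between iterations: if it crashes,
     * another instance simply resumes from the database. */
    while ((r = db_claim_next_request()) != NULL) {
        process_request(r);
        db_mark_done(r);
    }
    return 0;
}

The design point is that any number of such daemons can run in parallel and be restarted freely, because consistency is enforced by the database rather than by the daemons themselves.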

Slide 18: Resource Scheduling
Externalized scheduling interface, currently developed for:
- Maui (version 3.2.4), http://supercluster.org/maui
- LSF 5.1 (Platform Computing, http://www.platform.com)
This leverages advanced features from CPU scheduling: fair share, advance reservations, load balancing and accounting.
It will make it possible to:
- share all disk resources, with a fair share between the LHC experiments;
- exploit all resources at their maximum performance (avoiding hotspots on disk servers);
- follow the evolution of scheduling systems.

Slide 19: Other improvements
- Improved request handling: request throttling becomes possible.
- Improved security: strong authentication using GSI or Kerberos.
- Modified tape mover interface: allows just-in-time scheduling of the disk resources when copying to/from tape.
- Controlled access to disk resources: all user access to disk resources will have to be scheduled through the stager, so that the I/O on disk servers can be controlled.

Slide 20: Conclusion
The current stager at CERN has shown its limitations. The new CASTOR stager proposal aims for:
- a pluggable framework for intelligent, policy-controlled file access scheduling;
- an evolvable storage resource sharing facility framework rather than a total solution;
- file access request execution/control and local resource allocation delegated to the disk servers.
Status: currently in development; a proof of concept/prototype has been implemented.