irods usage at CC-IN2P3 Jean-Yves Nief

Similar documents
CC-IN2P3 activity. irods in production: irods developpements in Lyon: SRB to irods migration. Hardware setup. Usage. Prospects.

Managing Petabytes of data with irods. Jean-Yves Nief CC-IN2P3 France

irods usage at CC-IN2P3: a long history

DTM a lightweight computing virtualization system based on irods. Yonny Cardenas, Pascal Calvat, Jean-Yves Nief, Thomas Kachelhoffer

LCG data management at IN2P3 CC FTS SRM dcache HPSS

A Simple Mass Storage System for the SRB Data Grid

Benoit DELAUNAY Benoit DELAUNAY 1

An overview of irods clients. Ton Smeele

DSpace Fedora. Eprints Greenstone. Handle System

irods Scalable Architecture

irods 4.0 and Beyond Presented at the irods & DDN User Group Meeting 2014

Proceedings. irods User Group Meeting Policy-Based Data Management Sharing and Preservation

Scientific data processing at global scale The LHC Computing Grid. fabio hernandez

D0 Grid: CCIN2P3 at Lyon

Centre de Calcul de l Institut National de Physique Nucléaire et de Physique des Particules. Singularity overview. Vanessa HAMAR

SRB Logical Structure

irods and Objectstorage UGM 2016, Chapel Hill / Othmar Weber, Bayer Business Services / v0.2

A scalable storage element and its usage in HEP

Policy Based Distributed Data Management Systems

SnapCenter Software 4.0 Concepts Guide

Data Sharing with Storage Resource Broker Enabling Collaboration in Complex Distributed Environments. White Paper

Streamlining CASTOR to manage the LHC data torrent

SAM at CCIN2P3 configuration issues

Exploring cloud storage for scien3fic research

The INFN Tier1. 1. INFN-CNAF, Italy

Distributed Data Management with Storage Resource Broker in the UK

Data Intensive processing with irods and the middleware CiGri for the Whisper project Xavier Briand

Distributing BaBar Data using the Storage Resource Broker (SRB)

Initial. Initial Virtual. Initial. Initial. Initial Virtual. Application Background. Application Background. Application Background

CC-IN2P3: A High Performance Data Center for Research

Transcontinental Persistent Archive Prototype

Storage Resource Sharing with CASTOR.

SECURITY DOCUMENT. 550archi

and the GridKa mass storage system Jos van Wezel / GridKa

Introduction to The Storage Resource Broker

High Availability irods System (HAIRS)

Oracle Database 11g: New Features for Administrators DBA Release 2

Virtualization. A very short summary by Owen Synge

Remote power and console management in large datacenters

Linux Administration

Ivane Javakhishvili Tbilisi State University High Energy Physics Institute HEPI TSU

LAMBDA The LSDF Execution Framework for Data Intensive Applications

Simplifying Collaboration in the Cloud

Oracle 10g and IPv6 IPv6 Summit 11 December 2003

IBM C IBM FileNet Content Manager V5.2, Specialist.

Geographical failover for the EGEE-WLCG Grid collaboration tools. CHEP 2007 Victoria, Canada, 2-7 September. Enabling Grids for E-sciencE

Storage on the Lunatic Fringe. Thomas M. Ruwart University of Minnesota Digital Technology Center Intelligent Storage Consortium

Oracle Database 11g: New Features for Administrators Release 2

Swedish National Storage Infrastructure for Academic Research with irods

ORACLE 11g R2 New Features

IT Certification Exams Provider! Weofferfreeupdateserviceforoneyear! h ps://

Backup and Restore. Cisco Unified Contact Center Express Operations Guide, Release 10.0(1) 1

SAP HANA. HA and DR Guide. Issue 03 Date HUAWEI TECHNOLOGIES CO., LTD.

Existing Tools in HEP and Particle Astrophysics

Data Management Components for a Research Data Archive

CERN Lustre Evaluation

Data Grid Services: The Storage Resource Broker. Andrew A. Chien CSE 225, Spring 2004 May 26, Administrivia

SnapCenter Software 4.1 Concepts Guide

ELFms industrialisation plans

Storage for HPC, HPDA and Machine Learning (ML)

Intransa StorStac System Software

Database Services at CERN with Oracle 10g RAC and ASM on Commodity HW

Data Access and Data Management

HPSS Treefrog Summary MARCH 1, 2018

Knowledge-based Grids

Mitigating Risk of Data Loss in Preservation Environments

EMC Forum EMC ViPR and ECS: A Lap Around Software-Defined Services

Developing Microsoft Azure Solutions (70-532) Syllabus

Developing Microsoft Azure Solutions (70-532) Syllabus

COMPUTE CANADA GLOBUS PORTAL

Creating a Multi-data Center (MDC) System

A Quick guide to AFS Simon Wilkinson school of Informatics University of Edinburgh

CLOUD 102 The Long Range Forecast is Cloudy, with a Chance Of Virtualization Ron Clifton, James Kelso & Thomas Crowe III

Overview of ATLAS PanDA Workload Management

IBM WebSphere Application Server 8. Clustering Flexible Management

Isilon: Raising The Bar On Performance & Archive Use Cases. John Har Solutions Product Manager Unstructured Data Storage Team

Providing a first class, enterprise-level, backup and archive service for Oxford University

GRID COMPUTING APPLIED TO OFF-LINE AGATA DATA PROCESSING. 2nd EGAN School, December 2012, GSI Darmstadt, Germany

Solution Brief: Archiving Avid Interplay Projects using NLT and XenData

Computing. DOE Program Review SLAC. Rainer Bartoldus. Breakout Session 3 June BaBar Deputy Computing Coordinator

Symantec Backup Exec 2012

A national approach for storage scale-out scenarios based on irods

Big Computing and the Mitchell Institute for Fundamental Physics and Astronomy. David Toback

Azure Webinar. Resilient Solutions March Sander van den Hoven Principal Technical Evangelist Microsoft

IBM Spectrum Protect Version Introduction to Data Protection Solutions IBM

I Tier-3 di CMS-Italia: stato e prospettive. Hassen Riahi Claudio Grandi Workshop CCR GRID 2011

File Protection Whitepaper

File Protection Whitepaper

Fusion Registry 9 SDMX Data and Metadata Management System

ElasterStack 3.2 User Administration Guide - Advanced Zone

StratusLab Cloud Distribution Installation. Charles Loomis (CNRS/LAL) 3 July 2014

irods at TACC: Secure Infrastructure for Open Science Chris Jordan

Backup Solution. User Guide. Issue 01 Date

Petabytes of Preservation on Tape Jason Pierson Oct 2012

Configuring the Oracle Network Environment. Copyright 2009, Oracle. All rights reserved.

KillTest. 半年免费更新服务

MySQL HA Solutions Selecting the best approach to protect access to your data

Managing the Cisco APIC-EM and Applications

DESY site report. HEPiX Spring 2016 at DESY. Yves Kemp, Peter van der Reest. Zeuthen,

PostgreSQL in Mission-Critical Financial Systems May 20th 2010

Transcription:

irods usage at CC-IN2P3 Jean-Yves Nief

Talk overview What is CC-IN2P3? Who is using irods? irods administration: Hardware setup. irods interaction with other services: Mass Storage System, backup system, Fedora Commons etc... irods clients usage. Architecture examples with collaborating sites. Rules examples. SRB to irods migration. To-do list and prospects. irods at CC-IN2P3 03/25/10 2

CC-IN2P3 activities Federate computing needs of the french scientific community in: Nuclear and particle physics. Astrophysics and astroparticles. Computing services to international collaborations: - CERN (LHC), Fermilab, SLAC,. Opened now to biology, Arts & Humanities. IRFU irods at CC-IN2P3 03/25/10 3

irods setup @ CC-IN2P3 In production since early 2008. 14 servers: 2 icat servers (metacatalog): Linux SL4, Linux SL5 12 data servers (520 TB): Sun Thor x454 with Solaris 10, DELL v510 with Linux SL5. Metacatalog on a dedicated Oracle 11g cluster. Monitoring and restart of the services fully automated (crontab + Nagios). Automatic weekly reindexing of the icat databases. Accounting: daily report on our web site. irods at CC-IN2P3 03/25/10 4

irods setup @ CC-IN2P3 Data server Data server Data server Data server DNS alias: load balanced. redundancy improved. avoid single point of failure. scalability improved. icat server icat server DNS alias: ccirods irods at CC-IN2P3 03/25/10 5

irods monitoring: Nagios irods at CC-IN2P3 03/25/10 6

irods interaction with other services Mass storage system: HPSS. Using compound resources. Interfaced using the universal MSS driver (RFIO protocol used). Staging requests ordered by tapes using Treqs. Backup system: TSM. Used for projects who do not have the possibility to replicate precious data on other sites. Fedora Commons: Storage backend based on irods using FUSE. Rules to register irods files into Fedora. External databases: Rules using RDA (see Yonny s talk). irods at CC-IN2P3 03/25/10 7

irods clients Clients: from laptop to batch farms. Authentication: password or X509 certificates. icommands: most popular. From any platform: Windows, Mac OSX, Linux (RH, CentOS, Debian ), Solaris 10. Java APIs: interaction with irods within workflows. C APIs: direct access to files (open, read, write) to do «random access». Drivers for some viewer such as OsiriX (biomedical apps). FUSE for legacy web sites and Fedora Commons. Windows explorer and idrop. irods at CC-IN2P3 03/25/10 8

Who is using irods? High energy and nuclear physics: BaBar: data management of the entire data set between SLAC and CC-IN2P3: total foreseen 2PBs. dchooz: neutrino experiment (France, USA, Japan etc ): 400 TBs. Astroparticle and astrophysics: AMS: cosmic ray experiment on the International Space Station (280 TBs). TREND, BAOradio: radioastronomy (170 TBs). Biology and biomedical applications: phylogenetics, neuroscience, cardiology (50 TBs). Arts and Humanities: Adonis (46 TBs). irods at CC-IN2P3 03/25/10 9

Who is using irods? irods at CC-IN2P3 03/25/10 10

Architecture example: BaBar archival in Lyon of the entire BaBar data set (total of 2 PBs). automatic transfer from tape to tape: 3 TBs/day (no limitation). automatic recovery of faulty transfers. ability for a SLAC admin to recover files directly from the CC-IN2P3 zone if data lost at SLAC. irods at CC-IN2P3 03/25/10 11

Architecture example: dchooz irods at CC-IN2P3 03/25/10 12

Architecture example: embryogenesis and neuroscience irods at CC-IN2P3 03/25/10 13

Rules examples (I) Delayed replication to the MSS: Data on disk cache replication into MSS asynchronously (1h later) using a delayexec rule. Recovery mechanism: retries until success, delay between each retries is doubled at each round. ACL management: Rules needed for fine granularity access rights management. Eg: 3 groups of users (admins, experts, users). ACLs on /<zone-name>/*/rawdata => admins : r/w, experts + users : r ACLs on all others subcollections => admins + experts : r/w, users : r External database interface: using RDA to build rules for the DTM tool (see Yonny s talk). irods at CC-IN2P3 03/25/10 14

Rules examples (II) Fedora Commons: Tar balls content stored in irods are automatically registered into Fedora Commons. 1. Automatic untar of the files + checksum on the irods side: msitarfileextract. 2. Automatic registration in Fedora-commons (delayed rule): msiexeccmd of a java application. Automatic metadata extraction from DICOM files (neuroscience ): A given predefined list of metadata is extracted from the files using DCMTK (thanks to Yonny), then user metadata are created for each file. irods at CC-IN2P3 03/25/10 15

SRB to irods migration SRB still used: 3.7 PBs so far. Migration to irods already made for BioEmergence (embryogenesis) in 2010: Data workflow was using Jargon: transparent. Migration from Scommands to icommands was needed. 2 hours of downtime to complete the migration (scripts were needed). Needs to migrate all the other projects by the end of 2012, beginning of 2013: SRB is deeply embedded in data management workflows and projects can t live without SRB. Main issue: migration should be as «transparent» as possible in order to keep up with the data activity. irods at CC-IN2P3 03/25/10 16

To-do list Complete SRB to irods migration. Connection control: Connections can come from anywhere especially batch farms on the data grid. Servers can be overwhelmed (network, disk activity for hundreds of connection in //). Causes clients to exit with an error not good. Improved version of CCMS (connection control) is needed. Conversion to rules of the scripts used to manage cache space on compound resources. Dealing with filename with accentuated characters for icommands on Windows. Provide a light weight transfer tool for every single users (ship files between CC- IN2P3 to distant site). Centralized administration through a GUI (15 instances of irods running so far). irods at CC-IN2P3 03/25/10 17

Prospects 1.6 PBs in irods as of Feb 2012 (should be 5 PBs at the end of this year). Future projects: Biomedical field: research in cardiology, MS (anonymization) with data from > 10 hospitals. Private companies (data encryption needed?). Astrophysics. Grid: irods officially promoted by the French NGI. irods at CC-IN2P3 03/25/10 18

Acknowledgement Thanks to: Pascal Calvat. Yonny Cardenas. Rachid Lemrani. Thomas Kachelhoffer. Pierre-Yves Jallud. irods at CC-IN2P3 03/25/10 19