dCache Introduction Course

GridKa School 2013, Karlsruher Institut für Technologie, Karlsruhe, August 29, 2013
dCache Introduction Course: Overview of Chapters I, II and V
christoph.anton.mitterer@lmu.de

I. Introduction To dCache
Slide 2

Storage Management Systems
dCache is a so-called storage management system. Such systems manage files like normal filesystems (for example btrfs, ext4 or XFS) do, but usually on a much higher logical level. Storage management systems are characterised by the following main attributes:
- They manage much larger amounts of data than normal filesystems usually do.
- They integrate many storage media, possibly of different types, into one system.
- They utilise Hierarchical Storage Management (for example hard disk → tape).
- They provide mechanisms to automatically increase performance and balance the load (for example auto-replication).
- They provide means for resilience and high availability.
- They provide advanced control systems to manage the data as well as the data flow.
- They may provide interfaces to several access protocols (for example POSIX or SRM).
As a component of a computing centre or a grid, an instance of a storage management system (including the actual data) is often referred to as a Storage Element (in contrast to the Compute Element).
Slide 3

Overview And History
dCache is a highly sophisticated storage management system written mostly in Java. It has been actively developed at DESY since 2003, with major contributions from FNAL and NDGF. It uses standard open-source technologies in many places. It is released under the GNU Affero General Public License version 3, with some parts under the GNU Lesser General Public License version 3. It is one of the main storage management systems used within the European Middleware Initiative and the LHC Computing Grid, and it is used in various other places. Support is available through the developers, dedicated support groups and mailing lists.
Major release branches of dCache include:
- 2.2: Old stable branch.
- 2.6: Current stable branch.
- git: Development branch.
Slide 4

Functionalities And Features
The main functionalities and features of dCache include:
- Management of enormous amounts of data (no hard limit, tested in the petabyte range).
- Hierarchical Storage Management.
- Scalability, reasonable performance, modular design and portability.
- Highly configurable and customisable, allowing very powerful setups.
- Some support for the use of different database management systems.
- Use of normal filesystems (for example btrfs, ext4 or XFS) to store the actual data on the storage nodes.
- X.509-based authentication through TLS/SSL or the Grid Security Infrastructure, and powerful authorisation systems with support for access control lists.
- Kerberos-based authentication.
- Firewall-friendliness by utilising the concept of door nodes.
- Access interfaces for NFS 4.1 (pNFS), HTTP and WebDAV, GridFTP (gsiftp), xrootd (proofd) and SRM (versions 1 to 2.2), in addition to its legacy native protocols DCAP and gsidcap.
- Scriptable administration interface with terminal-based and graphical front-ends.
Slide 5

Functionalities And Features
- Detailed logging and debugging as well as accounting and statistics.
- XML information system with detailed live information about the cluster.
- Web interface with live summaries of the most important information and some administration functionality.
- Checksumming of data.
- Easy migration and management of data via the migration module.
- Resilience and high availability, which can be implemented in different ways by keeping multiple replicas of the same files.
- A powerful cost calculation system that allows the data flow to be controlled (from and to pools, between pools, and also between pools and tape).
- Load balancing and performance tuning by hot-pool replication (via cost calculation and replicas created by pool-to-pool transfers).
- Garbage collection of replicas, depending on their flags, age, et cetera.
- Space management and support for space tokens.
- Et cetera.
Slide 6

Future And Planned Developments
dCache is under active development, with the current focus on the following topics:
- Improving or rewriting parts of the existing code.
- Improving the NFS 4.1 (pNFS) interface, to support GSI and other features.
- Further deploying standardised technologies in many areas of dCache.
- New modules for advanced functionality like handling of missing files or alerting.
- Et cetera.
Slide 7

Organisations That Are Using dCache
Some organisations that are using dCache include:
Slide 8

Contact Addresses
You should note the following contact addresses:
- dCache website: http://www.dcache.org/ (includes the dCache Book and a dCache wiki)
- dCache support: support@dcache.org
- dCache user mailing list: user-forum@dcache.org
- Mass Storage Support Group of the Helmholtz-Gemeinschaft Deutscher Forschungszentren: through the mailing list german-support@dcache.org, or through the Global Grid User Support: http://ggus.eu/
Slide 9

dCache's Structure
dCache is a modularised system that consists of many different components, where each component fulfils a dedicated task. Depending on the specific task, a component may occur exactly once or multiple times per cluster. A dCache cluster is structured by the following two classes of objects:
- Domains: A domain consists of one or more cells, grouping them (typically logically) together. Each domain corresponds to exactly one Java VM and is thus bound to exactly one node of the cluster. Nodes can generally run multiple domains.
- Services: A service is the actual entity that fulfils a specific task and thus corresponds roughly to one of the components mentioned above. Depending on the service type there may be multiple instances per cluster, possibly each running with a different set of settings.
Slide 10
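To make the domain/service structure more concrete, here is a minimal sketch of a layout file in the style used by the 2.x series (as documented in the dCache Book). The file name, domain names and the choice of services are illustrative assumptions, not part of the course material:

    # /etc/dcache/layouts/corenode.conf: hypothetical layout for a core node
    [dCacheDomain]                  # one domain = one Java VM on this node
    [dCacheDomain/poolmanager]      # each service becomes one or more cells in the domain
    [dCacheDomain/pnfsmanager]
    [dCacheDomain/gplazma]

    [adminDomain]                   # a second domain on the same node
    [adminDomain/admin]

Each bracketed section without a slash declares a domain (and thus a JVM); a section of the form [domain/service] instantiates a service inside that domain.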

dCache's Structure
Technically, there is actually a third class of objects:
- Cells: A service is provided by one or more cells (possibly from different domains), which it groups together semantically. Cells may be mandatory or optional for the service's functionality. Each cell belongs to exactly one domain.
Slide 11

Service Classes
The different types of services are divided into the following classes:
- Core services: These implement basic or common tasks for the whole cluster, for example inter-cell communication, access to the file hierarchy provider, or accounting. Many types of core services can exist only once per cluster.
- Door services ("doors"): These serve as gateways to the actual data and provide accessing clients with an interface speaking a given access protocol. (Depending on the protocol, its configuration and the client's request, the actual data flows either directly between the client and the pool or indirectly via the door.) The different types of doors can exist multiple times per cluster, typically placed on different nodes (so-called door nodes).
- Pool services ("pools"): These manage the actual data on the storage nodes (so-called pool nodes) and can even communicate directly with the accessing clients. Of course, there may be many pools per cluster, and even per node.
Slide 12
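Continuing the hedged layout sketch from above, a dedicated door node and a pool node might look roughly as follows. The domain names, pool names and paths are again invented for illustration, and the exact pool property names differ between dCache releases:

    # hypothetical layout for a door node
    [gridftpDomain]
    [gridftpDomain/gridftp]         # GridFTP door
    [webdavDomain]
    [webdavDomain/webdav]           # WebDAV/HTTP door

    # hypothetical layout for a pool node
    [poolDomain]
    [poolDomain/pool]
    name=pool1
    path=/srv/dcache/pool1          # one pool per volume/filesystem
    [poolDomain/pool]
    name=pool2
    path=/srv/dcache/pool2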

Important Core Service And Cell Types
The following describes some of the more important core-service and cell types:
- Core cells in every domain: The RoutingMgr and lm cells are used with dCache's native cell messaging (and to some extent also with its successor, JMS). They maintain the routes to the domains and cells known to the specific local domain, as well as network communication information. In the default configuration, the dCacheDomain's RoutingMgr cell has knowledge of all routes within the cluster, and its lm cell acts as a server to all the other lm (client) cells. The System cells can be used to explore the respective domain itself.
- poolmanager service: Manages and configures the pools as well as the data flow ("pool selection") from and to pools, between pools, and also between pools and tape.
Slide 13

Important Core Service And Cell Types
- pnfsmanager service (so called for historical reasons): The general interface to the file hierarchy (and thus to its database), providing an interface to the other cells. It allows access to all the file metadata, including pathnames, file properties, ACLs and PNFS-IDs.
- admin service: dCache's administration interface.
- gplazma service: Provides functionality for access control that is used by the doors in order to implement their access-control system. See chapter V.
- spacemanager service: Handles space management and space reservation. See chapter VI.
- pinmanager service: Handles the pinning of files (mainly used with HSM systems and in combination with SRM).
Slide 14
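As a small illustration of the admin service, the following sketch shows how the administration interface is typically reached over SSH. The host name is a placeholder, the port is a common default (older setups used an ssh1 interface on port 22223 instead), and the set of available commands depends on the dCache version:

    # connect to the admin interface (ssh2 variant, commonly on port 22224)
    ssh -p 22224 admin@admin.example.org

    # inside the admin shell (prompts omitted):
    cd PoolManager      # switch to the PoolManager cell
    info                # ask the cell for general information about itself
    ..                  # leave the cell again
    logoff              # close the admin session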

Important Core Service And Cell Types
- httpd service: The web interface and web server.
- info service: The information system, which provides all kinds of live information to other services.
- billing service: The accounting system.
- statistics service: The statistics collection system.
- loginbroker service: Maintains a list of doors together with their protocol, version, fully qualified domain name, port and some other data. The srm-loginbroker service is a very similar service specifically for SRM.
- dir service: Provides the listing of directories for the legacy protocols DCAP and gsidcap.
Slide 15

Access Protocols And Door Types
dCache provides accessing clients with interfaces (so-called doors) for a number of protocols. A door usually (but not necessarily) runs in a single domain, and in principle all types of doors can have multiple instances per cluster. Currently, there are door types for the following protocols:
- NFS 3: NFS version 3 is a network filesystem protocol; this door allows rather limited access to the file hierarchy itself and to the file metadata.
- NFS 4.1 (pNFS): NFS version 4.1 is a network filesystem protocol defining parallel NFS, which allows full (and direct) access to the files.
- WebDAV: The WebDAV protocol, which is a superset of HTTP.
- DCAP: The dCache Access Protocol, dCache's legacy native access protocol.
- gsidcap: The GSI-secured version of DCAP.
Slide 16

Access Protocols And Door Types
- FTP: The File Transfer Protocol.
- GridFTP (gsiftp): A GSI-secured version of FTP targeted at grid environments, which is very common in most grids.
- xrootd (proofd): The protocol of xrootd, the extended ROOT daemon, which is used within the ROOT and PROOF frameworks and in some other places.
- SRM: The Storage Resource Manager is a meta-protocol targeted at storage management systems, covering many specific features like space tokens, lifetimes and pinning. For the actual data transport, another transfer protocol (like GridFTP or gsidcap) is used.
Having multiple SRM doors per cluster, or more than one NFS 4.1 (pNFS) door per node, is possible but a somewhat advanced setup. All other door types can easily occur multiple times per cluster and node.
Slide 17
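To illustrate how clients talk to some of these doors, the sketch below shows a few typical client-side commands. The door host name and namespace path are placeholders, and the port numbers are common dCache defaults that a site may well have changed:

    # DCAP: copy a file out of dCache with the dccp client (dcap commonly on port 22125)
    dccp dcap://door.example.org:22125/pnfs/example.org/data/user/file.root /tmp/file.root

    # GridFTP: GSI-secured copy with globus-url-copy (gsiftp commonly on port 2811)
    globus-url-copy gsiftp://door.example.org:2811/pnfs/example.org/data/user/file.root \
        file:///tmp/file.root

    # WebDAV/HTTP: plain download with curl (the WebDAV door commonly listens on port 2880)
    curl -O http://door.example.org:2880/pnfs/example.org/data/user/file.root

    # NFS 4.1 (pNFS): mount the namespace and access files directly
    mount -t nfs4 -o minorversion=1 door.example.org:/ /dcache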

File Hierarchy Provider
In order to make the actual data accessible through a usual file hierarchy and to allow files to be identified via a pathname, dCache makes use of a special service, namely the file hierarchy provider. In the past, the separate pnfsd was used for this, but it has been replaced by Chimera, which provides much better performance and more advanced features. Chimera keeps track of all pathname-to-PNFS-ID mappings, the pool locations of each file's replicas, as well as all the other file metadata (for example file times, traditional POSIX file permission modes, ACLs and PNFS tags). All this information is stored in a database, managed by one of the supported database management systems. dCache's components can access the file hierarchy and its metadata via the previously mentioned PnfsManager. It is furthermore possible to access both via NFS mounts (typically at /pnfs/).
Slide 18

PNFS-IDs
In most places, dCache itself uses the so-called PNFS-IDs to identify and manage the actual files. Basically, there are two types of PNFS-IDs:
- Legacy PNFS-IDs: These are 96-bit numbers that were used with dCache's legacy file hierarchy provider, pnfsd. They are still used in clusters that were migrated from pnfsd to Chimera.
- Current PNFS-IDs: With Chimera a new number format was introduced, now having 144 bits. The name PNFS-ID was retained for historical reasons.
PNFS-IDs are usually written as hexadecimal numbers, for example:
- 000100000000000152203B20 (legacy PNFS-ID)
- 0000AF1AF0735E30448C87FAF373325DEDA0 (current PNFS-ID)
It is worth noting that all replicas of a given file share the same PNFS-ID.
Slide 19
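On an NFS-mounted namespace, the PNFS-ID of a file can be looked up with the special "dot commands" inherited from pnfsd. The following sketch assumes a namespace mounted at /pnfs/example.org whose Chimera backend supports the pnfs-style .(id)(<name>) lookup, and it reuses the example ID from the slide purely as sample output:

    cd /pnfs/example.org/data/user

    # ask the namespace for the PNFS-ID of a file (quote the magic name for the shell)
    cat ".(id)(file.root)"
    # expected output, format only: 0000AF1AF0735E30448C87FAF373325DEDA0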

Databases
Apart from the actual data (which is stored as files in a filesystem), dCache in most cases uses a database to store its internal information (working and state data). The following databases are typically encountered:
- dcache: Holds miscellaneous data, including that of the PinManager and the SpaceManager as well as of any gplazma service(s) and SRM door(s).
- chimera: Holds the data of dCache's file hierarchy provider, Chimera.
- replicas: Holds the data of dCache's ReplicaManager.
- billing: Holds the data of dCache's billing system. This is only used when billingToDb is set to yes in the configuration.
Currently, PostgreSQL is the only fully supported DBMS, but in the future generic database interfaces will be implemented that allow other DBMSs to be used.
Slide 20
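As a rough illustration of the database side, the sketch below creates a PostgreSQL role and the databases named above with the standard PostgreSQL command-line tools. The role name is an assumption, and the actual procedure for a given release (including schema initialisation and access configuration) is the one described in the dCache Book:

    # run as the postgres superuser on the database node
    createuser --no-superuser --no-createrole --createdb dcache   # hypothetical role name
    createdb -O dcache chimera    # namespace data of the file hierarchy provider Chimera
    createdb -O dcache dcache     # miscellaneous data (PinManager, SpaceManager, gplazma, SRM)
    createdb -O dcache billing    # only needed when billingToDb is set to yes
    createdb -O dcache replicas   # only needed when the ReplicaManager is used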

Typical Cluster Setups: Core Nodes
In a small cluster, the following setup might be used: all core services, all databases and all used doors run on one single node.
+ Cheap, as it requires only one node.
+ Might be easier to administrate.
- Puts a high load on this node.
- Definitely not suitable for bigger clusters; one could even reconsider whether a storage management system is the right choice at all.
- Single point of failure.
- Might be difficult to maintain, especially if more functionality or features are to be added.
Slide 21

Typical Cluster Setups: Core Nodes
In a medium cluster, the following setup might be used: all core services and most databases run on one node, while the SRM door and its related databases run on another.
+ Reasonable distribution of the load.
+ Great benefit for relatively few additionally required hardware resources.
- There are still many components, and probably large databases, on the non-SRM node.
This only makes sense if SRM is heavily used. Analogously, the pnfsmanager service and its related database (chimera) could be split off.
Slide 22

Typical Cluster Setups: Core Nodes
In a large cluster, the following setup might be used: most core services run on one node, but the pnfsmanager service as well as the SRM door each have a dedicated node. The big databases have dedicated nodes, too. The billing files and statistics files, as well as the dCache log files from the various nodes, are written to a remote filesystem.
+ Perfect distribution of the load.
- Requires a lot of hardware resources just for the core nodes.
- Might be difficult to administrate.
Splitting off the SRM door and its related databases only makes sense if SRM is heavily used.
Slide 23

Typical Cluster Setups: Door Nodes
There are basically two ways to run a door:
- Using a dedicated node.
- Attaching it to another node, for example a core node or a pool node.
In addition, in both cases it is possible to have either one or multiple doors per node. The answer to these choices depends on at least the following factors:
- Is the specific protocol heavily used or not?
- Does the door produce such a high load that it needs a dedicated node, or can it coexist with the services on another node?
- Does the actual data flow directly between the client and the pool, or indirectly through the door (thus producing a high network load on the door node)?
Slide 24

Typical Cluster Setups: Pool Nodes
By their nature, pools usually run on dedicated nodes (the storage nodes). It is, however, possible to run one or multiple pools per node. The best pool allocation depends on at least the following factors:
- Usually, there should be one pool per volume (for example a single hard disk or the storage exported by a RAID controller).
- Pools should, however, be neither too small nor too big, which is why it can make sense to merge or split volumes into logical volumes, each running a separate pool.
- Administration of the cluster might be easier if all pools have similar sizes.
Slide 25

Data Access Model
The following is a very brief description of the access process, that is, the procedure that takes place when a client reads data from or writes data to the cluster:
1. The client connects to a door using the desired protocol.
2. dCache determines the pool from which the data should be read or to which the data should be written.
3. Depending on the protocol, its configuration and the client's request, the actual data flows either directly between the client and the pool or indirectly via the door.
Where new connections are opened, they may be initiated either by the client or by the server (which can be the door or the pool). Of course, there are many other steps, like the authentication and authorisation process, accounting, et cetera.
Slide 26

Finis coronat opus. ("The end crowns the work.")