Distributed Data Management with Storage Resource Broker in the UK

Similar documents
Introduction to The Storage Resource Broker

A Simple Mass Storage System for the SRB Data Grid

Knowledge-based Grids

Mitigating Risk of Data Loss in Preservation Environments

Data Sharing with Storage Resource Broker Enabling Collaboration in Complex Distributed Environments. White Paper

Cheshire 3 Framework White Paper: Implementing Support for Digital Repositories in a Data Grid Environment

High Performance Computing Course Notes Grid Computing I

T-Systems Solutions for Research. Data Management and Security. T-Systems Solutions for Research GmbH

Data Grid Services: The Storage Resource Broker. Andrew A. Chien CSE 225, Spring 2004 May 26, Administrivia

THE GLOBUS PROJECT. White Paper. GridFTP. Universal Data Transfer for the Grid

Delivering Data Management for Engineers on the Grid 1

The International Journal of Digital Curation Issue 1, Volume

Distributing BaBar Data using the Storage Resource Broker (SRB)

DSpace Fedora. Eprints Greenstone. Handle System

Grid Computing. MCSN - N. Tonellotto - Distributed Enabling Platforms

Grid Architectural Models

Scalable, Reliable Marshalling and Organization of Distributed Large Scale Data Onto Enterprise Storage Environments *

GSI Online Credential Retrieval Requirements. Jim Basney

30 Nov Dec Advanced School in High Performance and GRID Computing Concepts and Applications, ICTP, Trieste, Italy

Scaling a Global File System to the Greatest Possible Extent, Performance, Capacity, and Number of Users

Kenneth A. Hawick P. D. Coddington H. A. James

Deploying the TeraGrid PKI

Scientific Data Curation and the Grid

DATA MANAGEMENT SYSTEMS FOR SCIENTIFIC APPLICATIONS

Microsoft Windows Server 2008, Active Directory, Configuring (70-640)

Scientific data processing at global scale The LHC Computing Grid. fabio hernandez

Research and Design Application Platform of Service Grid Based on WSRF

SRB Logical Structure

YourSRB: A cross platform interface for SRB and Digital Libraries

Oracle Fusion Middleware

A Metadata Catalog Service for Data Intensive Applications

Grid services. Enabling Grids for E-sciencE. Dusan Vudragovic Scientific Computing Laboratory Institute of Physics Belgrade, Serbia

Storage Resource Sharing with CASTOR.

AMGA metadata catalogue system

Metadata for Data Discovery: The NERC Data Catalogue Service. Steve Donegan

NeuroLOG WP1 Sharing Data & Metadata

A VO-friendly, Community-based Authorization Framework

Day 1 : August (Thursday) An overview of Globus Toolkit 2.4

Web Services for Visualization

IBM Tivoli Access Manager for e-business V6.1.1 Implementation

Implementing a Data Publishing Service via DSpace. Jon W. Dunn, Randall Floyd, Garett Montanez, Kurt Seiffert

VMware View Upgrade Guide

Promoting Open Standards for Digital Repository. case study examples and challenges

UNICORE Globus: Interoperability of Grid Infrastructures

Collection-Based Persistent Digital Archives - Part 1

Long-term preservation for INSPIRE: a metadata framework and geo-portal implementation

IRODS: the Integrated Rule- Oriented Data-Management System

Requirements for data catalogues within facilities

The NASA/GSFC Advanced Data Grid: A Prototype for Future Earth Science Ground System Architectures

Web Services Based Instrument Monitoring and Control

Data Management for the World s Largest Machine

Conference The Data Challenges of the LHC. Reda Tafirout, TRIUMF

Digital Curation and Preservation: Defining the Research Agenda for the Next Decade

The Logical Data Store

Oracle Fusion Middleware Planning an Installation of Oracle Fusion Middleware. 12c ( )

ARCHER Data Services Service Layer

Managing Petabytes of data with irods. Jean-Yves Nief CC-IN2P3 France

Leveraging High Performance Computing Infrastructure for Trusted Digital Preservation

SwinDeW-G (Swinburne Decentralised Workflow for Grid) System Architecture. Authors: SwinDeW-G Team Contact: {yyang,

Preparing for High-Luminosity LHC. Bob Jones CERN Bob.Jones <at> cern.ch

Astrophysics and the Grid: Experience with EGEE

Experiences with the new ATLAS Distributed Data Management System

Outline. Infrastructure and operations architecture. Operations. Services Monitoring and management tools

COMPUTE CANADA GLOBUS PORTAL

ACET s e-research Activities

Database Assessment for PDMS

EGEE and Interoperation

Oracle Fusion Middleware

Grid Programming: Concepts and Challenges. Michael Rokitka CSE510B 10/2007

Designing and Deploying Messaging Solutions with Microsoft Exchange Server 2010

PoS(EGICF12-EMITC2)106

ER/Studio Enterprise Portal 1.1 Installation Guide

Data Management Components for a Research Data Archive

REFERENCE ARCHITECTURE Quantum StorNext and Cloudian HyperStore

The National Fusion Collaboratory

[MS20414]: Implementing an Advanced Server Infrastructure

Geographical failover for the EGEE-WLCG Grid collaboration tools. CHEP 2007 Victoria, Canada, 2-7 September. Enabling Grids for E-sciencE

Anatomy of the BIRN The Biomedical Informatics Research Network

glite Grid Services Overview

By Ian Foster. Zhifeng Yun

Designing Windows Server 2008 Network and Applications Infrastructure

Knowledge Discovery Services and Tools on Grids

Windows Server 2016 MCSA Bootcamp

DISTRIBUTED DATABASES

EPCC Sun Data and Compute Grids Project Update

Implementing an Advanced Server Infrastructure

Travelling securely on the Grid to the origin of the Universe

Grids and Security. Ian Neilson Grid Deployment Group CERN. TF-CSIRT London 27 Jan

Vision deliver a fast, easy to deploy and operate, economical solution that can provide high availability solution for exchange server

THE EUCLID ARCHIVE SYSTEM: A DATA-CENTRIC APPROACH TO BIG DATA

COURSE OUTLINE MOC : PLANNING AND ADMINISTERING SHAREPOINT 2016

irods usage at CC-IN2P3 Jean-Yves Nief

Technical Overview. Access control lists define the users, groups, and roles that can access content as well as the operations that can be performed.

CC-IN2P3 activity. irods in production: irods developpements in Lyon: SRB to irods migration. Hardware setup. Usage. Prospects.

Microsoft Implementing an Advanced Server Infrastructure

An Adaptable Framework for Integrating and Querying Sensor Data

EUROPEAN MIDDLEWARE INITIATIVE

Configuring Advanced Windows Server 2012 Services (ITNW 1372)

MCSE Server Infrastructure. This Training Program prepares and enables learners to Pass Microsoft MCSE: Server Infrastructure exams

Andrea Sciabà CERN, Switzerland

Transcription:

Distributed Data Management with Storage Resource Broker in the UK Michael Doherty, Lisa Blanshard, Ananta Manandhar, Rik Tyer, Kerstin Kleese @ CCLRC, UK Abstract The Storage Resource Broker (SRB) is a data management product developed by the San Diego Supercomputing Centre (SDSC). This software is used for distributed data management and can form the basis of a data grid. This paper will cover an overview of SRB and will then highlight how CCLRC has implemented SRB software across a number of UK e-science projects to build data grids. Further to this, the session will describe how the Data Management Group at CCLRC are becoming involved in the development of this software and are helping shape the next generation of SRB to meet the needs of UK e-science. Introduction to SRB The Storage Resource Broker (SRB) is a software product developed by the San Diego Supercomputing Centre (SDSC) [1]. It allows users to access files and database objects seamlessly across a distributed environment. In simple terms, the actual physical location and way the data is stored is abstracted from the user, who is presented with an interface similar to a regular file system. Further to this, the system allows the user to add user defined metadata describing the scientific content of the information to resources managed by the system, such that meaningful queries can be performed across data in multiple sites. Access can be via graphical or command line based tools. SRB also includes a host of management features that include data replication, access control and storage optimization. The SRB system is comprised of 4 major components: The Metadata Catalogue (MCAT) database The MCAT SRB The SRB The SRB Client Each SRB system must consist of a single MCAT database and one or more SRB s and SRB clients. This is known as the SRB Domain. Note there has been some confusion in the community that SRB runs off a single world wide MCAT database. This is not true. Anyone wishing to set up an SRB is free to create there own MCAT database and hence SRB Domain. Software and instructions on how to do this can be obtained from SDSC home page [2]. SRB Components: The key software components identified above are described in more details below. Figure 1 outlines the major components and can be used as a guide in the descriptions that follow. MCAT Database: The MCAT database is a metadata repository that provides a mechanism for storing information used by the SRB

system. This includes both internal system data required for running the system and application (user) metadata regarding data sets being brokered by SRB. SRB makes a clear distinction between these two types of data. An example of system metadata might be the physical location of a particular file and how to access it. An example of application metadata may be the energy and luminosity of X-ray from a synchrotron. The MCAT database is hosted within a Relational Database Management System such as Oracle, DB2 or PostgreSQL. Thus to be successful in an SRB implementation, a professionally managed database is required. SRB A b a c e MCAT Database MCAT SRB Client SRB B Figure 1: SRB Components MCAT SRB At least one SRB must be installed on the node that can access the f d g MCAT database. This is known as the MCAT SRB. The MCAT SRB performs the following data management operations in conjunction with the MCAT database: Stores metadata on Data sets, Users, Resources and Access Methods. Maintains replica information for data and containers. Containers provide a method of storing related small data items physically close to each other to optimize storage. Provides Collection abstraction for data. A Collection is a way of logically grouping data together across multiple physical sites analogous to a directory or folder in a normal file system. Provides a Global User name space and authentication. Provides Authorization through Access Control Lists and tickets. Maintains audit trail on data and collections. Provides Resource Transparency to enable logical resources. Each of these topics is a large scale subject itself and can not be discussed here fully. Further information can be found at the SRB home page [2]. SRB The SRB is a middleware application that accepts requests from clients and obtains the necessary data sets. It queries the MCAT SRB to gather information on datasets and supplies this back to the SRB client. The SRB server can operate in a federated mode, whereby it can request another SRB server to obtain the necessary data on its behalf. The data is then transparently passed back to

the client. This is shown in Figure 1 and can be explained as follows: SRB Client contacts the local SRB A and requests a file. This is looked up via the MCAT SRB and MCAT database and then passed back to SRB A. As the file is located on SRB B, SRB A sends the request to SRB B who then services the client directly. SRB Client This is an end user tool that provides a user interface to send requests to the SRB server. There are 3 main implementations of this: command line, MS Windows (InQ shown in Figure 2) or web based (MySRB). A recent addition is the MySRB server component. This allows access to storage via a thin client such as Internet Explorer, Netscape or Mozilla. Figure 2 InQ Interface The MySRB server is actually an application server middleware component that runs as a library in an Apache Web. This acts as a client to service multiple thin client sessions and is based on CGI scripts. This option is proving popular as it allows access to the system from anywhere with an internet connection. Perhaps the most recent significant development is that SRB 2.1 includes a Web Services interface in the Matrix component. This effectively allows you to build an open distributed application that utilizes SRB as a storage mechanism. An SRB programmers API is also available for programmers who wish to develop there own interfaces to SRB. C is currently supported and a new java interface was introduced in SRB 2.1 SRB in CCLRC The Data Management Group in CCLRC started working with SRB in November 2002 after a fact finding mission to the USA. There was an immediate requirement for a storage based product that allowed the addition of searchable metadata within CCLRC. The specific requirement was initially for use with the CCLRC data portal [3], however other projects subsequently developed. While other data storage projects, and specifically GIGGLE [4] show much promise (and should be monitored), it was felt that only SRB offered a production quality system that could be implemented in the time scales available. Having selected the SRB system, initial testing began in January 2003, with full scale systems becoming available in April 2003. This was tied to the introduction of the CCLRC e- Science Database Service, described elsewhere in these proceedings. While several test systems exist, the main production systems described here are those using the CCLRC Database Service [5]. Projects in Development All of the following projects are either using or investigating SRB as a the basis of data grids. CCLRC Data Management Group are providing both SRB infrastructure and expertise. CMS [ ] - largest project using CCLRC SRB services is the

CERN CMS experiment. The CMS detector is a particle physics experiment based at CERN. Atlas Data Store [ ] currently has an on-line, nominal capacity of 1 Petabyte, based on an STK 9930 (PowderHorn) tape robot. The ADS team are currently migrating from IBM 3590 (10GB) tape drives to STK 9940B (200GB) drives. CCLRC have developed an interface to access ADS through SRB e-minerals project [] as part of a range of services provided for running large environmental calculations on the UK grid and storage and management of data e-materials UK e-science project - similar to e-minerals above. National Crystallographic Service (Southampton University): For both data management and storage in the ADS. Test system already set up. British Atmospheric Data Centre (BADC) archive. For both data management and storage in the ADS. Test system already set up. NERC Data Grid. Details to be confirmed. SRB Future There are many areas that CCLRC are commencing work with in collaboration with SDSC. In particular: Web Services Deploying federated SRB with GSI authentication Web Services The first is in the area of Web Services. As mentioned above, SRB 2.1 contains a new component called Matrix. This component is currently in a beta stage and the CCLRC Data Management Group have been testing this software. We now have a fully functional web services client developed and can access all standard features of SRB. The benefit of a web services interface is that it allows other applications and most notably web and grid based portal applications to utilize SRB. Thus once can effectively create a virtual application that is distributed across services. Deploying federated SRB with GSI What is SRB federation In the latest version of SRB new mechanisms have been put in place so that different SRB installations can interact with each other. Each SRB installation is thought of as a zone. Therefore it is now possible to make multiple zones interact with each other to create a federation. This brings the benefit of allowing users to access resources and data across zones as opposed to just a single SRB installation. GSI Authentication GSI uses X.509 certificates to securely authenticate users across the network. This same GSI authentication system that is used for Globus commands can also be used for SRB commands. [http://www.npaci.edu/thrusts/di/srb/ CurrentSRB/README.gsi.htm] Deploying SRB federation with GSI authentication mechanism provides the following benefits: GSI authentication framework can provide secure cross zone interaction As GSI can support the notion of delegation, a third party on behalf of the user may log onto SRB using a certificate. The authenticity of the

user would be guaranteed by the Certification Authority. Also the user would be guaranteed of the 's authenticity during the process of mutual authentication. With the use of user delegation to SRB, SRB may continue to use user's privileges in accessing remote files thus reducing the need for administrator rights between zones. (currently SRB administrator's privileges are used to access files) User's identity is maintained for any transaction. users can log on to any SRB within the federation as the user's authenticity is guaranteed by the CA future interoperability with other applications that share the same authentication mechanism is enabled Conclusions CCLRC have successfully implemented SRB across a number of projects. We believe this is the only production ready system at this moment in time. The group are now actively collaborating with SDSC and aim to help others in the UK e-science program bring SRB into their projects.