A Simple Mass Storage System for the SRB Data Grid

Similar documents
Data Grid Services: The Storage Resource Broker. Andrew A. Chien CSE 225, Spring 2004 May 26, Administrivia

Knowledge-based Grids

Digital Curation and Preservation: Defining the Research Agenda for the Next Decade

Mitigating Risk of Data Loss in Preservation Environments

Distributed Data Management with Storage Resource Broker in the UK

SRB Logical Structure

DATA MANAGEMENT SYSTEMS FOR SCIENTIFIC APPLICATIONS

Data Sharing with Storage Resource Broker Enabling Collaboration in Complex Distributed Environments. White Paper

Collection-Based Persistent Digital Archives - Part 1

irods Scalable Architecture

DSpace Fedora. Eprints Greenstone. Handle System

T-Systems Solutions for Research. Data Management and Security. T-Systems Solutions for Research GmbH

The International Journal of Digital Curation Issue 1, Volume

MetaData Management Control of Distributed Digital Objects using irods. Venkata Raviteja Vutukuri

DataCutter Joel Saltz Alan Sussman Tahsin Kurc University of Maryland, College Park and Johns Hopkins Medical Institutions

SDS: A Scalable Data Services System in Data Grid

Introduction to The Storage Resource Broker

Distributing BaBar Data using the Storage Resource Broker (SRB)

IRODS: the Integrated Rule- Oriented Data-Management System

Scaling a Global File System to the Greatest Possible Extent, Performance, Capacity, and Number of Users

an Object-Based File System for Large-Scale Federated IT Infrastructures

IBM Spectrum Protect Version Introduction to Data Protection Solutions IBM

IBM Tivoli Storage Manager Version Introduction to Data Protection Solutions IBM

Metadaten Workshop 26./27. März 2007 Göttingen. Chimera. a new grid enabled name-space service. Martin Radicke. Tigran Mkrtchyan

Oracle Secure Backup 12.1 Technical Overview

THE GLOBUS PROJECT. White Paper. GridFTP. Universal Data Transfer for the Grid

Data Grids, Digital Libraries, and Persistent Archives

Data Movement & Tiering with DMF 7

irods usage at CC-IN2P3 Jean-Yves Nief

XtreemFS a case for object-based storage in Grid data management. Jan Stender, Zuse Institute Berlin

Transcontinental Persistent Archive Prototype

irods for Data Management and Archiving UGM 2018 Masilamani Subramanyam

Glamour: An NFSv4-based File System Federation

Technical White Paper

Leveraging High Performance Computing Infrastructure for Trusted Digital Preservation

Implementing Trusted Digital Repositories

A Close-up Look at Potential Future Enhancements in Tivoli Storage Manager

COURSE OUTLINE MOC : PLANNING AND ADMINISTERING SHAREPOINT 2016

Managing Petabytes of data with irods. Jean-Yves Nief CC-IN2P3 France

and the GridKa mass storage system Jos van Wezel / GridKa

HPSS Treefrog Summary MARCH 1, 2018

Oracle Enterprise Manager. 1 Before You Install. System Monitoring Plug-in for Oracle Unified Directory User's Guide Release 1.0

XtreemStore A SCALABLE STORAGE MANAGEMENT SOFTWARE WITHOUT LIMITS YOUR DATA. YOUR CONTROL

Chapter 4:- Introduction to Grid and its Evolution. Prepared By:- NITIN PANDYA Assistant Professor SVBIT.

A Metadata Catalog Service for Data Intensive Applications

Technical Overview. Access control lists define the users, groups, and roles that can access content as well as the operations that can be performed.

Storage Challenges at the San Diego Supercomputer Center

Managing Large Scale Data for Earthquake Simulations

Delivering Data Management for Engineers on the Grid 1

Scientific data management

Course Outline: Course : Core Solutions Microsoft SharePoint Server 2013

COURSE OUTLINE IT TRAINING

CMB-207-1I Citrix Desktop Virtualization Fast Track

Scalable, Reliable Marshalling and Organization of Distributed Large Scale Data Onto Enterprise Storage Environments *

Overview Guide. Mainframe Connect 15.0

Replication Server Heterogeneous Edition

Cheshire 3 Framework White Paper: Implementing Support for Digital Repositories in a Data Grid Environment

Understanding StoRM: from introduction to internals

Introduction Data Management Jan Just Keijser Nikhef Grid Tutorial, November 2008

GlobalSearch Security Definition Guide

Storage Virtualization. Eric Yen Academia Sinica Grid Computing Centre (ASGC) Taiwan

Layered Architecture

CS2506 Quick Revision

Table 9. ASCI Data Storage Requirements

LCG data management at IN2P3 CC FTS SRM dcache HPSS

Mass Storage at the PSC

Configuring the Oracle Network Environment. Copyright 2009, Oracle. All rights reserved.

SAN, HPSS, Sam-QFS, and GPFS technology in use at SDSC

Oracle Identity Manager 11gR2-PS2 Hands-on Workshop Tech Deep Dive Upgrade

Andy Kowalski Ian Bird, Bryan Hess

Basics of the High Performance Storage System

MCSA: Windows Server MCSA 2016 Windows 2016 Server 2016 MCSA 2016 MCSA : Installation, Storage, and Compute with Windows Server 2016

Day 1 : August (Thursday) An overview of Globus Toolkit 2.4

The NASA/GSFC Advanced Data Grid: A Prototype for Future Earth Science Ground System Architectures

Policy Based Distributed Data Management Systems

Course : Planning and Administering SharePoint 2016

Oracle Associate User With Schema Difference Between

Site Caching Services Installation Guide

Fusion Registry 9 SDMX Data and Metadata Management System

Grid Computing. MCSN - N. Tonellotto - Distributed Enabling Platforms

Oracle Database 11g: New Features for Administrators Release 2

DB2 for z/os: Programmer Essentials for Designing, Building and Tuning

Oracle Database 10g: New Features for Administrators Release 2

HPSS Installation Guide

SQL Server DBA Course Content

Enabling Cross-Platform File Replication with Data Integrity

Windows 7 Overview. Windows 7. Objectives. The History of Windows. CS140M Fall Lake 1

SwiftStack Object Storage

1 Copyright 2011, Oracle and/or its affiliates. All rights reserved.

AIM Enterprise Platform Software IBM z/transaction Processing Facility Enterprise Edition 1.1.0

A: PLANNING AND ADMINISTERING SHAREPOINT 2016

Oracle Enterprise Manager Ops Center. Introduction. Creating Oracle Solaris 11 Zones 12c Release 2 ( )

An Introduction to the Mass Storage System Reference Model, Version 5

glite Grid Services Overview

Optimizing and Managing File Storage in Windows Environments

The Materials Data Facility

IBM Tivoli Identity Manager V5.1 Fundamentals

X100 ARCHITECTURE REFERENCES:

BMC Configuration Management (Marimba) Best Practices and Troubleshooting. Andy Santosa Senior Technical Support Analyst

CA ARCserve Backup. Benefits. Overview. The CA Advantage

Transcription:

A Simple Mass Storage System for the SRB Data Grid Michael Wan, Arcot Rajasekar, Reagan Moore, Phil Andrews San Diego Supercomputer Center SDSC/UCSD/NPACI

Outline Motivations for implementing a Mass Storage System in a data grid Simple MSS design An Overview of SRB architecture and features Simple MSS implementation

Motivations for a MSS Cost of licensing commercial MSS systems. Efficiency and performance Tightly integrated with SRB data grid functions Use SRB feature such as file replication, server directed parallel I/O, latency management. Elimination of duplication of features Better disk cache utilization Single authentication scheme SRB data grid abstractions - logical name space, storage repository abstraction, logical storage resource naming and metadata management Provide a storage system that spans remote caches and distributed archival devices

Simple MSS - Design Goal A distributed farm of disk cache resources backed by a tape library Use a pool of distributed cache resources to form one large cache resource A tape library system to control the mounting and dismounting of tapes. Support Storage Tech silo running ACSLS software Physical location (on cache or tape) transparency User uses same access mechanism to access data independent of data location

Simple MSS - Design Goal Files should always be staged to cache before I/O operations Tool to manage disk cache Large files should be stored in segments. handle files of very large size parallel data transfer between tapes and cache system can be implemented (not yet implemented)

Simple MSS Components Needed A client-server architecture authentication scheme a framework for client-server and server-server communication A federated server system - cache and tape resources located on different hosts. A metadata server that maintains logical POSIX-like name space mapping of each logical file name to its physical location.

Simple MSS Components Needed Server infrastructure and meta data that allow files stored in the MSS to appear the same as other files stored on disk caches translate user requests to physical actions using metadata driver functions for basic tape I/O operations. driver functions for basic cache I/O operations. Functions to stage files from tape to cache and dump files from cache to tapes. Data transfer between the cache system and clients.

SRB Architecture Federated middleware system Client/server model Federated servers with uniform interfaces Access to all resources in the federation MCAT metadata catalog Information repository abstraction Client access through SRB server

Simplified SRB Model Application Resource, User MCAT User Defined Dublin Core Application Meta-data C, C++, Linux I/O Archives HPSS, ADSM, UniTree, DMF Unix Shell HRM Java, NT Browsers SRB File Systems Unix, NT, Mac OSX Prolog Python Web Databases DB2, Oracle, Sybase Third-party copy Remote Proxies DataCutter

SRB Server Design Three layer architecture Top layer (client abstraction layer) Interacts with clients and other servers through tcp/ip sockets User authentication Handle function requests parses requests and calls handlers in middle and bottom layers. Middle layer (logical layer) Input parameters are in their logical representations (logical name space, logical resource name) Queries MCAT, translates from logical to physical representations (e.g., host address, type of resources, actual path where the file is located) Calls functions in the bottom (physical) layer to access resources.

SRB Server Design Bottom layer (physical layer), handles three types of resources File system Drivers for the 17 POSIX functions (creat, open, read, write..) FS supported : UNIX, HPSS, DPSS, UniTree, gridftp (to be released), SRB s own tape library system (to be released) Include/exfSw.h defines driver function mapping DB large objects Similar to files except stored in DB Drivers for Oracle, DB-2 and Illustra DB tables Access external DB tables (query, insert, )

Federated SRB Operation Logical Name Or Attribute Condition Read Application Peer-to-peer Brokering Parallel Data Access SRB server 1 6 3 5/6 SRB server 4 SRB agent 2 5 SRB agent 1.Logical-to-Physical mapping 2. Identification of Replicas 3.Access & Audit Control R1 MCAT Data Access R2 Server(s) Spawning

SRB Concepts and Features Abstraction of User Space Single sign-on No need for UNIX account on every systems Multiple authentication schemes supported Certificates, (secure) passwords, tickets, group permissions Robust access control User level, grant access to multiple users Group level Tickets

SRB Concepts and Features Abstraction of Data and Collections Logical Name space - UNIX like directories (collections) and files (data) Mapping of logical name to physical attributes - host address, physical path. Single logical name space mapped to data on multiple resources POSIX like API for making collections (mkdir) and data creation (creat) Data replication replica on different resource same logical name but different replica number

SRB Concepts and Features Virtualisation of Resources Mapping of a logical resource name to physical attributes: Resource Location, Type & Access transparency Client use a single logical name to specify a resource Resource group (logical resource) bundling of resources, automatic replication, transparent caching/archival storage Uniform Access Methods Use same APIs, Command Line, GUI Browsers, Web-Access (Portal,WSDL, CGI) to access various resources

SRB Concepts and Features Performance enhancement Client and server-driven Parallel I/O strategies Interface with HPSS s mover protocol for parallel I/O Container physical grouping of small files for tape I/O

Containers Physical Grouping of Objects Similar to tar but many more features Each incontainer object appears in logical name space Can be accessed as normal object Data objects in a container moved together To aid access patterns To take advantage of resource characteristics Automatic staging(caching) and Archiving - sync to backend Family of Containers New container automatically created when filled Containers for Collections Associate a collection with a container

Access Control Access controls on logical names Datasets Collections Resources Multi-level access Read, Annotate,Write, Curate, Own Access control for users and groups Ticket-based access control Audit access

Simple MSS Components Needed A tape library server to schedule, mount and dismount tapes. A tape database that tracks the usage of all tapes controlled by the MSS. A set of tape management utilities to initialize and migrate data

Simple MSS Implementation Use many existing features in SRB Federated client/server architecture MCAT - logical POSIX-like name space Parallel I/O for data movement Mechanisms added to the SRB A tape library server for mounting and dismounting tapes Drivers for Tape I/O in SRB server metadata needed to manage files on the MSS. functions to stage files from tape to cache. Tape management functions and utilities.

Simple MSS Implementation Compound resource Appear as a single resource to user Internally: A pool of distributed cache resource for front end A tape library system resource for back end

Simple MSS Implementation Compound object Object stored in compound resource Behave as a single normal SRB object I/O always always done on cache resource first When object opened for read/write: Check whether already on cache. Stage to one of the cache resources. Dirty bit set if object modified API for synchronizing dirty copies to tape and purging cache copies by sys admin File locking on object during I/O

Simple MSS Implementation Tape library server An independent server For STK silo running ACSLS software Use the same SRB server framework Authentication scheme Client-server function call Scheduling, mounting and dismounting tapes Handle multiple drive types Handles only 3 function calls mount tape dismount tape report priorities by drive types (based on number of drives idling and in use)

Simple MSS Implementation Other supporting features Drivers in SRB servers for tape I/O operations A database schema and metadata that tracks tape usage Tape labels controlled by MSS Current tape positions Bytes written for each tape Full flag Functions to insert, modify and query the tape meta data Tape management utilities - tape initialization, data migration, tape metadata manipulation and query, etc.

Comparison to IEEE MSSRM Provides similar functionality Implementation - similarities and differences. Differences attributed to SRB abstraction mechanisms SRB MSS uses the underlying File System for managing data storage Reference Model - maps bitfiles to the logical and physical volume abstractions Name Server Similar to SRB s POSIX-like name space Reference model s Bitfile Server, Storage Server and Mover are combined into a single SRB resource server Similar Migration-Purge facility

Simple MSS - Status First version released with SRB 2.0 (2/19/03) Validated through significant testing. About to enter production at SDSC Performance on SDSC resources Cache to client - up to 40 Mbytes/sec Tape to cache - 5-7 Mbytes/sec (3590 tape)

More Information http://www.npaci.edu/dice