Introduction to The Storage Resource Broker

Similar documents
Distributed Data Management with Storage Resource Broker in the UK

Mitigating Risk of Data Loss in Preservation Environments

Knowledge-based Grids

SRB Logical Structure

A Simple Mass Storage System for the SRB Data Grid

Cheshire 3 Framework White Paper: Implementing Support for Digital Repositories in a Data Grid Environment

T-Systems Solutions for Research. Data Management and Security. T-Systems Solutions for Research GmbH

Data Sharing with Storage Resource Broker Enabling Collaboration in Complex Distributed Environments. White Paper

DATA MANAGEMENT SYSTEMS FOR SCIENTIFIC APPLICATIONS

Data Grids, Digital Libraries, and Persistent Archives

irods for Data Management and Archiving UGM 2018 Masilamani Subramanyam

Data Grid Services: The Storage Resource Broker. Andrew A. Chien CSE 225, Spring 2004 May 26, Administrivia

Introduction to Grid Computing

DSpace Fedora. Eprints Greenstone. Handle System

Scalable, Reliable Marshalling and Organization of Distributed Large Scale Data Onto Enterprise Storage Environments *

The International Journal of Digital Curation Issue 1, Volume

Transcontinental Persistent Archive Prototype

Distributing BaBar Data using the Storage Resource Broker (SRB)

Chapter 4:- Introduction to Grid and its Evolution. Prepared By:- NITIN PANDYA Assistant Professor SVBIT.

Collection-Based Persistent Digital Archives - Part 1

IRODS: the Integrated Rule- Oriented Data-Management System

CA485 Ray Walshe Google File System

Simplifying Collaboration in the Cloud

CSE 124: Networked Services Fall 2009 Lecture-19

Policy Based Distributed Data Management Systems

irods and Objectstorage UGM 2016, Chapel Hill / Othmar Weber, Bayer Business Services / v0.2

Metadaten Workshop 26./27. März 2007 Göttingen. Chimera. a new grid enabled name-space service. Martin Radicke. Tigran Mkrtchyan

YourSRB: A cross platform interface for SRB and Digital Libraries

Inca as Monitoring. Kavin Kumar Palanisamy Indiana University Bloomington

Datagridflows: Managing Long-Run Processes on Datagrids

Data Movement & Storage Using the Data Capacitor Filesystem

TITLE: PRE-REQUISITE THEORY. 1. Introduction to Hadoop. 2. Cluster. Implement sort algorithm and run it using HADOOP

Arun Jagatheesan Reagan Moore San Diego Supercomputer Center (SDSC) University of California, San Diego {arun,

Enabling Interaction and Quality in a Distributed Data DRIS

CSE 124: Networked Services Lecture-16

Digital Curation and Preservation: Defining the Research Agenda for the Next Decade

Web Services Based Instrument Monitoring and Control

Hadoop File System S L I D E S M O D I F I E D F R O M P R E S E N T A T I O N B Y B. R A M A M U R T H Y 11/15/2017

SAN, HPSS, Sam-QFS, and GPFS technology in use at SDSC

AMGA metadata catalogue system

Scientific data processing at global scale The LHC Computing Grid. fabio hernandez

Data Management for Distributed Scientific Collaborations Using a Rule Engine

CC-IN2P3 activity. irods in production: irods developpements in Lyon: SRB to irods migration. Hardware setup. Usage. Prospects.

Managing Petabytes of data with irods. Jean-Yves Nief CC-IN2P3 France

Managing Large Scale Data for Earthquake Simulations

Pegasus. Automate, recover, and debug scientific computations. Rafael Ferreira da Silva.

Pandektis: Implementing a repository of greek historical and cultural material with DSpace

irods usage at CC-IN2P3 Jean-Yves Nief

Implementing Trusted Digital Repositories

Anatomy of the BIRN The Biomedical Informatics Research Network

Managing Large Distributed Data Sets using the Storage Resource Broker

Data Management 1. Grid data management. Different sources of data. Sensors Analytic equipment Measurement tools and devices

pcj-blast - highly parallel similarity search implementation

A Metadata Catalog Service for Data Intensive Applications

Toward Scalable Monitoring on Large-Scale Storage for Software Defined Cyberinfrastructure

Data Staging: Moving large amounts of data around, and moving it close to compute resources

Map-Reduce. Marco Mura 2010 March, 31th

XtreemFS a case for object-based storage in Grid data management. Jan Stender, Zuse Institute Berlin

HPSS RAIT. A high performance, resilient, fault-tolerant tape data storage class. 1

Distributed Filesystem

HPC File Systems and Storage. Irena Johnson University of Notre Dame Center for Research Computing

5th International Digital Curation Conference December 2009

Storage Challenges at the San Diego Supercomputer Center

Building a Reference Implementation for Long-Term Preservation

Cloud Computing and Hadoop Distributed File System. UCSB CS170, Spring 2018

Web Object Scaler. WOS and IRODS Data Grid Dave Fellinger

Scaling Massive Content Stores in the Cloud. CloudExpo New York June Alfresco Founder & CTO

Building the Archives of the Future: Self-Describing Records

CineGrid Exchange. Building A Global Networked Testbed for Distributed Media Management and Preservation

Data Staging: Moving large amounts of data around, and moving it close to compute resources

The Earth System Grid: A Visualisation Solution. Gary Strand

GlobalSearch Security Definition Guide

Write a technical report Present your results Write a workshop/conference paper (optional) Could be a real system, simulation and/or theoretical

Database Assessment for PDMS

Scaling a Global File System to the Greatest Possible Extent, Performance, Capacity, and Number of Users

Georgia Institute of Technology ECE6102 4/20/2009 David Colvin, Jimmy Vuong

Oracle TimesTen Scaleout: Revolutionizing In-Memory Transaction Processing

ACET s e-research Activities

Grid Computing. MCSN - N. Tonellotto - Distributed Enabling Platforms

Managing large-scale workflows with Pegasus

November 7, DAN WILSON Global Operations Architecture, Concur. OpenStack Summit Hong Kong JOE ARNOLD

Migrating from Oracle to Espresso

CS60021: Scalable Data Mining. Sourangshu Bhattacharya

Introduction Data Management Jan Just Keijser Nikhef Grid Tutorial, November 2008

Data Replication: Automated move and copy of data. PRACE Advanced Training Course on Data Staging and Data Movement Helsinki, September 10 th 2013

Leveraging High Performance Computing Infrastructure for Trusted Digital Preservation

Matrex Table of Contents

IT Infrastructure: Poised for Change

Web Services for Visualization

A BigData Tour HDFS, Ceph and MapReduce

HDFS: Hadoop Distributed File System. CIS 612 Sunnie Chung

Requirements for data catalogues within facilities

Take Risks But Don t Be Stupid! Patrick Eaton, PhD

LHC and LSST Use Cases

Working with Scientific Data in ArcGIS Platform

Distributed Systems. Bina Ramamurthy. 6/13/2005 B.Ramamurthy 1

System Description. System Architecture. System Architecture, page 1 Deployment Environment, page 4

Apache Ignite and Apache Spark Where Fast Data Meets the IoT

ECE 7650 Scalable and Secure Internet Services and Architecture ---- A Systems Perspective

Using Resources of Multiple Grids with the Grid Service Provider. Micha?Kosiedowski

Transcription:

http://www.nesc.ac.uk/training http://www.ngs.ac.uk Introduction to The Storage Resource Broker http://www.pparc.ac.uk/ http://www.eu-egee.org/

Policy for re-use This presentation can be re-used for academic purposes. However if you do so then please let trainingsupport@nesc.ac.uk know. We need to gather statistics of re-use: no. of events, number of people trained. Thank you!! 2

Acknowledgements This tutorial selects slides from several sources, specifically from talks given by Wayne Schroeder (SDSC) and Peter Berrisford (RAL) 3

Goal Introduce use of the SRB for distributed file management on the NGS This is the focus of the practical that follows To explore further: http://www.sdsc.edu/srb/ and http://datacentral.sdsc.edu/user_guide.html For a full SRB tutorial, at NIEES last January, see: http://www.niees.ac.uk/events/srb2006 4

What is SRB? Storage Resource Broker (SRB) is a software product developed by the San Diego Supercomputing Centre (SDSC). Allows users to access files and database objects across a distributed environment. Actual physical location and way the data is stored is abstracted from the user Allows the user to add user defined metadata describing the scientific content of the information 5

What is SRB? The SRB is an integrated solution which includes: a logical namespace, interfaces to a wide variety of storage systems, high performance data movement (including parallel I/O), fault-tolerance and fail-over, WAN-aware performance enhancements (bulk operations), storage-system-aware performance enhancements ('containers' to aggregate files), metadata ingestion and queries (a MetaData Catalog (MCAT)), user accounts, groups, access control, audit trails, GUI administration tool data management features, replication user tools (including a Windows GUI tool (inq), a set of SRB Unix commands, and Web (mysrb)), and APIs (including C, C++, Java, and Python). SRB Scales Well (many millions of files, terabytes) Supports Multiple Administrative Domains / MCATs (srbzones) And includes SDSC Matrix: SRB-based data grid workflow management system to create, access and manage workflow process pipelines. 6

SRB Scalability Project Instance As of 7/24/2003 Data_size (in Count (files) GB) Users As of 9/12/2003 Data_size (in Count (files) GB) Storage Resource Broker (SRB) Data brokered by SDSC instances of SRB** As of 10/01/2003 Data_size (in Count (files) GB) NPACI 6,050.00 2,317,368 367 8,350.00 2,903,386 372 8,570.00 2,946,526 375 8,822.00 2,995,432 377 Digsky 46,100.00 5,719,025 68 46,100.00 5,719,025 68 46,100.00 5,719,025 68 42,786.00 6,076,982 69 DigEmbryo 720.00 45,365 23 720.00 45,365 23 720.00 45,365 23 720.00 45,365 23 HyperLter 215.00 5,097 27 215.00 5,097 28 215.00 5,097 28 215.00 5,097 28 Hayden 7,078.00 59,399 142 7,130.00 59,781 158 7,830.00 59,983 158 7,835.00 60,001 168 Portal 968.00 27,250 316 1,133.00 31,717 339 1,141.00 32,396 342 1,244.00 34,094 352 SLAC 1,790.00 254,974 43 1,790.00 254,974 43 1,790.00 254,974 43 2,108.00 294,149 43 NARA/Collection 52.80 79,195 51 53.10 79,112 55 53.90 79,118 55 67.00 82,031 56 NSDL/SIO Exp 232.00 15,809 23 315.00 27,170 23 418.00 39,253 23 603.00 87,191 26 TRA 90.60 2,385 25 92.00 2,378 26 92.00 2,387 26 92.00 2,387 26 LDAS/SALK 498.00 9,858 60 737.00 12,898 66 767.00 12,906 66 824.00 13,016 66 BIRN 121.00 237,283 138 273.00 675,531 145 382.00 2,048,132 156 389.00 1,084,749 167 AfCS 95.30 18,762 20 99.00 19,714 20 102.00 20,260 20 107.00 21,295 21 UCSDLib 1,084.00 138,415 29 1,085.00 138,421 29 1,085.00 138,421 29 1,085.00 138,421 29 NSDL/CI 278.00 993,886 113 379.00 2,596,090 114 379.00 2,596,090 114 465.00 2,948,903 114 SCEC 12.60 18,660 38 7,561.00 1,249,144 39 9,680.00 1,561,396 40 12,274.00 1,721,241 43 TeraGrid 623.00 36,508 1,978 1,664.00 47,644 1,942 1,745.00 49,106 2,073 10,603.00 433,938 2,229 TOTAL 66,008.30 9,979,239 3,461 77,696.10 13,867,447 3,490 81,069.90 15,610,435 3,639 90,239.00 16,044,292 3,837 Users Users As of 11/14/2003 Data_size (in GB) Count (files) Users 66 TB 9.97 million 3 thousand 77 TB 13.9 million 3 thousand 81 TB 15.6 million thousand 90 TB 16 million 3 thousand ** Does not cover data brokered by SRB spaces administered outside SDSC. Does not cover databases; covers only files stored in file systems and archival storage systems Does not cover shadow- 7

User sees a virtual filesytem: Command line (S-Commands) MS Windows (InQ) Web based (MySRB). Java (JARGON) Web Services (MATRIX) Storage Resource Broker SRB server SRB server SRB server Resource Driver Resource Driver Resource Driver Filesystems in different administrative 8 domains

How SRB Works c MCAT Database d 4 major components: The Metadata Catalogue (MCAT) SRB A Server b e f MCAT Server SRB B Server The MCAT-Enabled SRB Server The SRB Storage Server The SRB Client a g SRB Client 9

SRB on the NGS SRB provides NGS users with a virtual filesystem Accessible from all core nodes and from the UI / desktop (will provide) redundancy mirrored catalogue server Replica files Support for application metadata associated with files fuller metadata support from the R-commands 10

Practical Overview Use of the Scommands Commands for unix based access to srb Strong analogy to unix file commands Accessing files from multiple (two) sites 11