Introduction Data Management Jan Just Keijser Nikhef Grid Tutorial, November 2008

Introduction Data Management Jan Just Keijser Nikhef Grid Tutorial, 13-14 November 2008

Outline
- Introduction
- SRM
- Storage Elements in gLite
- LCG File Catalog (LFC)
- Information System

Introduction
- Grid infrastructures are typically used to analyse and manipulate large amounts of data coming from scientific instruments and other sources
- Examples: LOFAR, MAGIC, ...

Introduction
- Example: Large Hadron Collider
  - Produces ~15 PByte/year
  - Grid computing for data storage and processing
  - Depends on EGEE and OSG infrastructure
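
To put the ~15 PByte/year figure in perspective, it can be converted into an average sustained rate. The sketch below is only a back-of-the-envelope calculation: it assumes decimal petabytes (10^15 bytes) and data production spread evenly over a 365-day year, neither of which is stated on the slide.

```python
# Rough conversion of a yearly data volume into an average sustained rate.
# Assumptions (not from the slides): 1 PByte = 10**15 bytes and uniform
# production over a 365-day year.

SECONDS_PER_YEAR = 365 * 24 * 3600


def average_rate_mb_per_s(pbytes_per_year: float) -> float:
    """Return the average rate in MByte/s (10**6 bytes) for a yearly volume."""
    bytes_per_year = pbytes_per_year * 10**15
    return bytes_per_year / SECONDS_PER_YEAR / 10**6


if __name__ == "__main__":
    print(f"{average_rate_mb_per_s(15):.0f} MByte/s")
```

Under these assumptions the average comes out at a few hundred MByte/s, sustained around the clock; peak rates during data taking would be higher.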

Introduction
- Data is stored at CERN and 11 other (tier1) sites
- Data is processed at CERN, the 11 tier1 sites and ~100 tier2 sites

Introduction
- Data management tools enable the use and sharing of data in a grid environment

Introduction
Storage Infrastructures
- Disk
- Hierarchical Storage Management (HSM)
  - The hierarchy consists of different types of storage media, such as disk systems or tape, each type representing a different level of cost and speed of retrieval
  - Policy-based management of file backup and archiving, without the user needing to be aware of when files are being retrieved from or stored on backup storage media
  - Example: files that have not been used for some time are automatically migrated from disk to tape
  - HSM software: TSM, DMF, CASTOR, Enstore, HPSS, ...
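
The migration example above can be illustrated with a toy model. This is a minimal sketch of one possible idle-time policy, not how TSM, CASTOR or any other HSM product works; the class `StoredFile`, the functions and the 30-day threshold are all hypothetical.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 86400


@dataclass
class StoredFile:
    """Hypothetical record for one file managed by a toy HSM."""
    name: str
    last_access: float  # seconds since the epoch
    tier: str = "disk"  # either "disk" or "tape"


def migrate_idle_files(files, now, max_idle_days=30):
    """Move disk files idle for longer than max_idle_days to tape.

    Returns the names of the files migrated in this pass.
    """
    cutoff = now - max_idle_days * SECONDS_PER_DAY
    migrated = []
    for f in files:
        if f.tier == "disk" and f.last_access < cutoff:
            f.tier = "tape"
            migrated.append(f.name)
    return migrated


def open_file(f, now):
    """Access a file, recalling it from tape first if necessary."""
    if f.tier == "tape":
        f.tier = "disk"  # a real HSM would stage the data back from tape here
    f.last_access = now
    return f
```

The point of the sketch is that the caller of `open_file` never needs to know which tier a file was on; the recall happens behind the same interface.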

Introduction
- How do we link users, user programs and the data, given that the data is distributed over different storage systems?

Introduction
Data management in the Grid environment needs:
- A system that keeps track of the location of all files and of the copies of those files
- A uniform interface for all storage systems

SRM
- Uniform access to heterogeneous storage resources on the Grid: SRM (Storage Resource Managers)
- SRM is a control protocol for:
  - Space reservation
  - File management
  - Replication
  - Protocol negotiation

SRM
- SRM implementation
  - The SRM interface is implemented as a web service
  - Implementations exist for dCache, DPM, SRB, ...
- SRM examples
  - srmLs
  - srmPrepareToPut
  - srmBringOnline
  - srmCopy
  - srmGetTransferProtocols
- The user never gets to see this, since SRM is hidden by the gLite client software

Storage Elements in gLite
- DPM
  - SRM
  - Data transfer protocols: gridftp, secure rfio
  - Storage type: disk
- dCache
  - SRM
  - Data transfer protocols: gridftp, gsidcap, xrootd
  - Storage type: disk, HSM
- StoRM
  - SRM
  - Data transfer protocols: gridftp, rfio
  - Storage type: disk
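
Because each storage element type speaks a different set of transfer protocols, a client and an SE have to agree on one (the "protocol negotiation" mentioned on the SRM slide). The sketch below models that idea in its simplest form: the client offers an ordered preference list and the first protocol the SE supports wins. The table is taken from the slide; the function `negotiate_protocol` is hypothetical and does not reproduce the actual SRM message exchange.

```python
# Transfer protocols per storage element type, as listed on the slide.
SE_PROTOCOLS = {
    "DPM": ["gridftp", "secure rfio"],
    "dCache": ["gridftp", "gsidcap", "xrootd"],
    "StoRM": ["gridftp", "rfio"],
}


def negotiate_protocol(client_preferences, se_type):
    """Return the first client-preferred protocol the SE type supports.

    Returns None when the client and the SE share no protocol.
    """
    supported = set(SE_PROTOCOLS[se_type])
    for protocol in client_preferences:
        if protocol in supported:
            return protocol
    return None


if __name__ == "__main__":
    prefs = ["xrootd", "gridftp"]
    for se_type in SE_PROTOCOLS:
        print(se_type, "->", negotiate_protocol(prefs, se_type))
```

Since gridftp appears for all three SE types on the slide, a client that lists it as a fallback always finds a common protocol in this model.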

LFC
- The LFC keeps track of the location of copies (replicas) of files on the Grid

LFC
Name conventions
- Logical File Name (LFN)
  - An alias created by a user to refer to some item of data, e.g. lfn:/grid/tutor/mydir/myfile
  - Unix-like namespace
- Globally Unique Identifier (GUID)
  - A non-human-readable unique identifier for an item of data, e.g. guid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6
- Site URL (SURL)
  - The location of an actual piece of data on a storage system, e.g. srm://pcrd24.cern.ch/flatfiles/cms/output10_1
- Transport URL (TURL)
  - Locator of a replica plus access protocol, understood by an SE, e.g. rfio://lxshare0209.cern.ch//data/alice/ntuples.dat
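
The four kinds of name can be told apart by their prefix. The following sketch classifies a name using only the schemes that appear in this tutorial; the function `classify_grid_name` and the set `TRANSFER_SCHEMES` are illustrative assumptions, not part of any gLite API, and a real client would recognise more transfer protocols.

```python
# URL schemes treated as transfer protocols in this sketch (an assumption,
# limited to schemes mentioned in the tutorial).
TRANSFER_SCHEMES = {"rfio", "gsiftp", "gsidcap"}


def classify_grid_name(name: str) -> str:
    """Return 'LFN', 'GUID', 'SURL' or 'TURL' based on the name's prefix."""
    scheme, separator, _rest = name.partition(":")
    if not separator:
        raise ValueError(f"no scheme in {name!r}")
    scheme = scheme.lower()
    if scheme == "lfn":
        return "LFN"
    if scheme == "guid":
        return "GUID"
    if scheme == "srm":
        return "SURL"
    if scheme in TRANSFER_SCHEMES:
        return "TURL"
    raise ValueError(f"unrecognised scheme {scheme!r}")


if __name__ == "__main__":
    for example in (
        "lfn:/grid/tutor/mydir/myfile",
        "guid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6",
        "srm://pcrd24.cern.ch/flatfiles/cms/output10_1",
        "rfio://lxshare0209.cern.ch//data/alice/ntuples.dat",
    ):
        print(classify_grid_name(example), example)
```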

Naming conventions
How do they fit together?
- The LFC holds the mapping LFN-GUID-SURL
[Diagram: LFN 1 ... LFN i -> GUID -> SURL 1 ... SURL j; each SURL resolves to its own TURLs (SURL 1 -> TURL 11 ... TURL 1k, SURL j -> TURL j1 ... TURL jl). The LFC covers the LFN-GUID-SURL part.]
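
The LFN-GUID-SURL mapping can be pictured as two dictionaries: several LFNs point to one GUID, and one GUID points to several SURLs. The class below is a toy in-memory model meant only to show that shape; `ToyCatalog` and its methods are hypothetical and unrelated to the real LFC client API, and the `example.org` host names are placeholders.

```python
import uuid


class ToyCatalog:
    """In-memory stand-in for the LFN -> GUID -> SURL mapping."""

    def __init__(self):
        self._guid_by_lfn = {}    # many LFNs may share one GUID
        self._surls_by_guid = {}  # one GUID may have many replicas

    def register(self, lfn, surl):
        """Register a new file under an LFN with its first replica."""
        guid = "guid:" + str(uuid.uuid4())
        self._guid_by_lfn[lfn] = guid
        self._surls_by_guid[guid] = [surl]
        return guid

    def add_alias(self, existing_lfn, new_lfn):
        """Add another LFN for the same underlying file."""
        self._guid_by_lfn[new_lfn] = self._guid_by_lfn[existing_lfn]

    def add_replica(self, lfn, surl):
        """Record an additional copy of the file."""
        self._surls_by_guid[self._guid_by_lfn[lfn]].append(surl)

    def list_replicas(self, lfn):
        """Return all SURLs known for the file behind this LFN."""
        return list(self._surls_by_guid[self._guid_by_lfn[lfn]])


if __name__ == "__main__":
    catalog = ToyCatalog()
    catalog.register("lfn:/grid/tutor/mydir/myfile", "srm://se1.example.org/data/myfile")
    catalog.add_alias("lfn:/grid/tutor/mydir/myfile", "lfn:/grid/tutor/alias")
    catalog.add_replica("lfn:/grid/tutor/alias", "srm://se2.example.org/data/myfile")
    print(catalog.list_replicas("lfn:/grid/tutor/mydir/myfile"))
```

TURLs are deliberately absent from the model: as the slide indicates, the catalog stops at the SURL, and the TURL is obtained from the storage element when a transfer is set up.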

LFC
The LFN acts as the main key in the database. It has:
- Symbolic links to it (additional LFNs)
- A unique identifier (GUID)
- System metadata
- Information on replicas
- One field of user metadata

LFC
Two kinds of LFC
- Central LFC
  - For each VO, one site on the grid publishes a global catalog, which records entries (file replicas or dataset entities) across the whole grid
- Local LFC
  - Local catalogs record only the file replicas stored at that site's SEs

LFC
- Integrated GSI authentication and authorization
- Access Control Lists (Unix permissions and POSIX ACLs)
- Sessions (multiple operations inside a single transaction)
- Bulk operations (inside transactions)

LFC
LFC interfaces: interaction with the WMS (RB)
- The InputSandbox and OutputSandbox should only be used for small amounts of data; large files should be on SEs
- The RB can locate Grid files, which allows for data-based matchmaking
- JDL file:
  - InputData = "lfn:/grid/tutor/myfile";
    - the LFNs / GUIDs needed by the job as input to the process
    - tells the RB to schedule the job on a CE close to an SE holding the file
    - glite-brokerinfo getInputData returns the list of files in the InputData attribute
  - OutputSE = "srm.grid.sara.nl";
    - location of an SE where the output data will be stored
  - DataAccessProtocol = "gsiftp";
    - the list of protocols that the application is able to speak for accessing the files listed in InputData
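
Data-based matchmaking, as described above, means preferring a CE that is close to an SE holding the job's input files. The sketch below shows one naive way to rank CEs on that basis. It is not the WMS algorithm: the function `rank_compute_elements`, both input dictionaries and the `example.org` host names are hypothetical.

```python
def rank_compute_elements(input_lfns, replica_sites, close_ses):
    """Rank CEs by how many input files have a replica on a close SE.

    input_lfns    -- LFNs listed in the job's InputData
    replica_sites -- dict mapping an LFN to the SE hosts holding a replica
    close_ses     -- dict mapping a CE host to the SE hosts close to it
    Returns CE hosts, best match first; ties are broken alphabetically.
    """
    scores = {}
    for ce, ses in close_ses.items():
        near = set(ses)
        scores[ce] = sum(
            1 for lfn in input_lfns if near & set(replica_sites.get(lfn, ()))
        )
    return sorted(scores, key=lambda ce: (-scores[ce], ce))


if __name__ == "__main__":
    replicas = {"lfn:/grid/tutor/myfile": ["srm.grid.sara.nl"]}
    close = {
        "ce.a.example.org": ["se.a.example.org"],
        "ce.b.example.org": ["srm.grid.sara.nl"],
    }
    print(rank_compute_elements(["lfn:/grid/tutor/myfile"], replicas, close))
```

A real broker would combine this data locality score with other requirements and ranks from the JDL, such as free slots or queue length.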

LFC
LFC interfaces
- Command-line interface and C/C++/Python API
- lcg_utils command-line tools and API
  - Combined operations on the LFC and data
- GFAL
  - Provides a POSIX-like interface for file I/O operations
- More to come in the next talk!

Information system
Finding out where to put your data: BDII
- The BDII collects information on all nodes running grid services in the EGEE infrastructure
- Based on LDAP
- The environment variable LCG_GFAL_INFOSYS needs to be set to a BDII
  - Example: bdii.grid.sara.nl:2170

Information system
lcg-infosites
- Example: finding an SE:

    > lcg-infosites --vo tutor se
    Avail Space(Kb)  Used Space(Kb)  Type  SEs
    ----------------------------------------------------------
    1320000000       n.a             n.a   gb-se-ams.els.sara.nl
    1320000000       n.a             n.a   gb-se-wur.els.sara.nl
    536868064        2848            n.a   se.grid.rug.nl
    104856555        1044            n.a   srm.grid.sara.nl

- Example: finding an LFC:

    > lcg-infosites --vo tutor lfc
    lfc.grid.sara.nl
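
When the SE listing is consumed by a script rather than read by eye, it has to be parsed. The sketch below parses the sample output shown on this slide, assuming the four-column layout seen there (available space, used space, type, host); the function `parse_se_listing` is hypothetical, and other versions of the tool may print a different layout.

```python
# Sample text copied from the slide; the column layout is assumed from it.
SAMPLE_OUTPUT = """\
Avail Space(Kb) Used Space(Kb) Type SEs
----------------------------------------------------------
1320000000 n.a n.a gb-se-ams.els.sara.nl
1320000000 n.a n.a gb-se-wur.els.sara.nl
536868064 2848 n.a se.grid.rug.nl
104856555 1044 n.a srm.grid.sara.nl
"""


def parse_se_listing(text):
    """Parse rows of 'avail used type host' into a list of dicts.

    Values reported as 'n.a' become None. Header and separator lines
    are skipped because they do not have four whitespace-separated fields.
    """
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 4 or not (fields[0].isdigit() or fields[0] == "n.a"):
            continue
        avail, used, se_type, host = fields
        rows.append({
            "host": host,
            "avail_kb": int(avail) if avail.isdigit() else None,
            "used_kb": int(used) if used.isdigit() else None,
            "type": None if se_type == "n.a" else se_type,
        })
    return rows


if __name__ == "__main__":
    for row in parse_se_listing(SAMPLE_OUTPUT):
        print(row["host"], row["avail_kb"])
```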

Information system
lcg-info
- For more advanced searches, for example finding out where to put your files:

    > lcg-info --vo tutor --list-se --query='se=srm.grid.sara.nl' --attrs=path
    - SE: srm.grid.sara.nl
      - Path /pnfs/grid.sara.nl/data/tutor

Links
- gLite User Guide: https://edms.cern.ch/file/722398//glite-3-userguide.html

Questions?