Consorzio COMETA - Progetto PI2S2. DMS API gLite. Salvatore Scifo, Consorzio COMETA (PI2S2), Catania. Introductory Course on Grid Computing


Consorzio COMETA - Progetto PI2S2 FESR
DMS API gLite
Salvatore Scifo, Consorzio COMETA (PI2S2), Catania
Introductory Course on Grid Computing, Lesson 13: gLite middleware (Data Management, Part 4)
Catania, 6 May 2008
www.consorzio-cometa.it

DMS Overview

The Grid DM Challenge

Challenges:
- Heterogeneity: data are stored on different storage systems using different access technologies.
- Distribution: data are stored in different locations. In most cases there is no shared file system or common namespace, so data need to be moved between locations.
- Data description: data are stored as files, so there must be a way to describe files and locate them according to their contents.

Required services:
- A common interface to storage resources: Storage Resource Manager (SRM).
- Keeping track of where data is stored: File and Replica Catalogs.
- Scheduled, reliable file transfer: File Transfer Service.
- A way to describe file contents and query them: Metadata Service.

Introduction

Assumptions:
- Users and programs produce and require data.
- The lowest granularity of the data is the file level: we deal with files rather than data objects or tables (data = files).

Files:
- Are mostly write once, read many.
- Are located in Storage Elements (SEs).
- May have several replicas at different sites.
- Are accessible by Grid users and applications from anywhere.
- Are locatable by the WMS (data requirements in the JDL).

The WMS can also send small amounts of data to and from jobs through the Input and Output Sandbox. Files may be copied between local filesystems (WNs, UIs) and the Grid (SEs).

File naming conventions

- Logical File Name (LFN): an alias created by a user to refer to some item of data, e.g. lfn:/grid/gilda/20030203/run2/track1
- Globally Unique Identifier (GUID): a non-human-readable unique identifier for an item of data, e.g. guid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6
- Site URL (SURL), also called Physical File Name (PFN) or Site FN: the location of an actual piece of data on a storage system, e.g. srm://grid009.ct.infn.it/dpm/ct.infn.it/gilda/output10_1 (SRM)
- Transport URL (TURL): a temporary locator of a replica plus an access protocol, understood by an SE, e.g. rfio://lxshare0209.cern.ch//data/alice/ntuples.dat
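The four kinds of name can usually be told apart by their prefix. The following is a minimal sketch of that idea, not part of any gLite API: the class GridFileNames and its method kindOf are hypothetical, and treating every other scheme:// prefix as a TURL is a simplifying assumption.

```java
import java.util.Locale;

/** Hypothetical helper (not a gLite class): classifies a grid file name by its prefix. */
class GridFileNames {

    enum Kind { LFN, GUID, SURL, TURL, UNKNOWN }

    static Kind kindOf(String name) {
        if (name == null) {
            return Kind.UNKNOWN;
        }
        String s = name.trim().toLowerCase(Locale.ROOT);
        if (s.startsWith("lfn:")) {
            return Kind.LFN;
        }
        if (s.startsWith("guid:")) {
            return Kind.GUID;
        }
        // Storage-system locations use the srm:// or sfn:// prefix.
        if (s.startsWith("srm://") || s.startsWith("sfn://")) {
            return Kind.SURL;
        }
        // Assumption: any other "protocol://" locator is a transport URL.
        if (s.matches("[a-z][a-z0-9+.-]*://.*")) {
            return Kind.TURL;
        }
        return Kind.UNKNOWN;
    }

    public static void main(String[] args) {
        String[] samples = {
            "lfn:/grid/gilda/20030203/run2/track1",
            "guid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6",
            "srm://grid009.ct.infn.it/dpm/ct.infn.it/gilda/output10_1",
            "rfio://lxshare0209.cern.ch//data/alice/ntuples.dat"
        };
        for (String sample : samples) {
            System.out.println(kindOf(sample) + "  " + sample);
        }
    }
}
```

The sample names in main are the ones quoted on the slide above.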

gLite Storage Element

gLite File Catalog (LFC)
[Figure: a gLite UI, the LFC file catalog, and several Storage Elements (SEs)]

gLite Metadata Catalog (AMGA)

C++ multiprocess server:
- Backends: Oracle, MySQL, PostgreSQL, SQLite
- Front ends:
  - TCP streaming: high performance; client API for C++, Java, Python, Perl, Ruby
  - SOAP (web services): interoperability
- Scalability

Standalone Python library implementation:
- Data stored on the file system

GridFTP/GsiFTP: Grid File Transfer Protocol

FTP overview

- File Transfer Protocol (FTP) is one of the oldest and most widely used protocols on the Internet. Its task is to transfer files between hosts on a network.
- It is based on a client/server architecture.
- It lets users access files on remote systems with a very simple set of commands.
- It authenticates with a username and password, which are sent unencrypted.
- It is therefore considered insecure and should be used only when necessary.
- A suitable substitute for FTP is sftp, from the OpenSSH suite of tools.

GridFTP: the Globus solution

- GridFTP is a high-performance, secure, reliable FTP optimized for high-bandwidth wide-area networks.
- It is based on the Internet FTP protocol and implements extensions for high-performance operation that were either already specified in the FTP specification but not commonly implemented, or proposed as extensions by the Globus team.
- GridFTP uses basic Grid security on both the control (command) and data channels.
- Other features include multiple data channels for parallel transfers, partial file transfers, third-party (direct server-to-server) transfers, reusable data channels, and command pipelining.
- Both C and Java APIs are available (Globus Toolkit 4.0).

GridFTP commands

get      Download a remote file from the server to the local machine
put      Upload a file from the local machine to the remote server
cd       Change the current directory on the remote server
dir      List the contents of the remote directory. The asterisk (*) and the question mark (?) may be used as wildcards
pwd      Print the working directory
delete   Delete (remove) a file in the current remote directory (same as rm in UNIX)
ls       List the names of the files in the current remote directory
quit     Close the FTP connection and exit
mkdir    Make a new directory within the current remote directory
rmdir    Remove (delete) a directory in the current remote directory

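GridFTP Java API sample

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

import org.globus.common.CoGProperties;
import org.globus.ftp.GridFTPClient;
import org.globus.ftp.Session;
import org.globus.ftp.exception.ClientException;
import org.globus.ftp.exception.ServerException;
import org.globus.gsi.GlobusCredential;
import org.globus.gsi.GlobusCredentialException;
import org.globus.gsi.gssapi.GlobusGSSCredentialImpl;
import org.gridforum.jgss.ExtendedGSSCredential;
import org.ietf.jgss.GSSCredential;
import org.ietf.jgss.GSSException;

public class GSIFTPTest {

    public static void main(String[] args) throws Exception {
        testConnection();
    }

    public static void testConnection() throws ServerException, IOException,
            GlobusCredentialException, GSSException, ClientException {
        // Directory that holds the trusted CA certificates
        CoGProperties.getDefault().setCaCertLocations("D:\\ca");

        // Connect to the GridFTP server on the default port 2811
        GridFTPClient client = new GridFTPClient("aliserv6.ct.infn.it", 2811);

        // Load the user's proxy certificate and wrap it as a GSS credential
        GlobusCredential globusCredential =
                new GlobusCredential(new FileInputStream("C:\\x509up_u3017"));
        ExtendedGSSCredential gssCred =
                new GlobusGSSCredentialImpl(globusCredential, GSSCredential.INITIATE_AND_ACCEPT);
        GSSCredential credential = gssCred;
        client.authenticate(credential);

        // The server listens for the data connection and the client opens it
        client.setPassive();
        client.setLocalActive();
        client.setType(Session.TYPE_IMAGE); // binary transfer

        // Download a remote file to the local disk
        client.get("/dpm/ct.trigrid.it/home/trigrid/generated/2007-11-13/file375c2a75-225f-4888-8df2-820eeae6af67",
                new File("C:/primo3.txt"));

        // Upload a local file to the server (third argument: append)
        client.put(new File("d:/primo3.txt"),
                "/dpm/ct.infn.it/home/gilda/tony/primo3_sammy.txt", true);

        client.close();
    }
}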

GFAL: Grid File Access Library

GFAL: main features

- Provides a POSIX-like interface for file I/O operations
- Based on shared libraries (both threaded and unthreaded versions)
- Needs only one header file (gfal_api.h) to write C applications
- Supports low-level protocols (dCache, CASTOR, ...)
- Fully compliant with the GSI (Grid Security Infrastructure)

GFAL: available APIs

C API (function oriented):
- The header file gfal_api.h must be included in the application source code to get the function prototypes.
- Function names are obtained by prepending gfal_ to the POSIX names, for example gfal_open, gfal_read, gfal_close.
- The argument lists and return values are identical to those of the POSIX functions.
- On failure, the variable errno is set to the POSIX error code.

Java API (a wrapper around the C API, object oriented):
It provides three main Java classes that hide the underlying C functions and must be imported into Java applications:
- GFalFile: handles files and reads/writes them
- GFalDirectory: manages directories (create, delete, list)
- GFalUtilities: manages files (rename, stat, lstat, delete)

GFAL Java API sample

// The capitalization of identifiers was lost on the slide and is restored here
// following Java conventions. FileToByteArray is a helper class not shown on the slide.
public class GFALUploadFileTest {

    public static void main(String[] args) throws Exception {
        if (args == null || args.length < 4) {
            System.out.println("\nWrong Arguments Passed!");
            System.out.println("\nUsage: GFALUploadFileTest LocalFileSystemName seurl LFN amode");
            System.out.println("\nEx: GFALUploadFileTest /home/user/file.dat aliserv6.ct.infn.it lfn:/grid/gilda/user/file.dat 644");
            System.exit(-1);
        }

        GFalFile gfalFile = new GFalFile();
        String fileName = args[0];
        String seUrl = args[1];
        String lfn = args[2];
        String mod = args[3];
        FileToByteArray fileTba;
        String SURL = null;

        try {
            // Read the local file into memory and create the remote file on the SE
            fileTba = new FileToByteArray(fileName);
            gfalFile.createFile(seUrl, Integer.parseInt(mod.trim()), false, false);
            int ret = gfalFile.writeFile(fileTba.toByteArray());
            if (ret == -1) {
                SURL = "No SURL has been provided!";
                throw new Exception("Error has been detected during file writing onto SE : " + seUrl);
            } else {
                // On success, fetch the SURL, close the file and register the LFN in the LFC
                SURL = gfalFile.getSURL();
                gfalFile.closeFile();
                gfalFile.lfcRegisterFile(lfn);
            }
        } catch (Exception e) {
            System.out.println("\n" + e.getMessage());
            if (SURL == null) {
                SURL = "No SURL has been provided!";
            }
        }
        System.out.println("\nFollowing SURL has been created : " + SURL);
    }
}

LFC/LCG-Utils (File Catalog)

LFC features

- Timeouts and retries from the client
- User-exposed transactional API (with automatic rollback on failure)
- Hierarchical namespace and namespace operations (for LFNs)
- Integrated GSI authentication and authorization
- Access control lists (Unix permissions and POSIX ACLs)
- Checksums
- Integration with VOMS

LFC APIs

[Diagram: the API layers available on top of the LFC server]
- LCG UTIL C API
- GFAL C API
- LCG UTIL Java API
- GFAL Java API (wrapper)
- LFC client C API
- LFC server
- Python API
- CLI: lfc-ls, lfc-mkdir, lfc-setacl, ...

LFC commands

lfc-chmod        Change the access mode of an LFC file/directory
lfc-chown        Change the owner and group of an LFC file/directory
lfc-delcomment   Delete the comment associated with a file/directory
lfc-getacl       Get the access control lists of a file/directory
lfc-ln           Make a symbolic link to a file/directory
lfc-ls           List the file/directory entries in a directory
lfc-mkdir        Create a directory
lfc-rename       Rename a file/directory
lfc-rm           Remove a file/directory
lfc-setacl       Set the access control lists of a file/directory
lfc-setcomment   Add or replace a comment

lcg-utils commands

Replica management:
lcg-cp    Copy a grid file to a local destination
lcg-cr    Copy a file to an SE and register the file in the catalog
lcg-del   Delete one file
lcg-rep   Replicate a file between SEs and register the replica
lcg-gt    Get the TURL for a given SURL and transfer protocol
lcg-sd    Set the file status to Done for a given SURL in an SRM request

File catalog interaction:
lcg-aa    Add an alias in the LFC for a given GUID
lcg-ra    Remove an alias in the LFC for a given GUID
lcg-rf    Register in the LFC a file placed in an SE
lcg-uf    Unregister in the LFC a file placed in an SE
lcg-la    List the aliases for a given SURL, GUID or LFN
lcg-lg    Get the GUID for a given LFN or SURL
lcg-lr    List the replicas for a given GUID, SURL or LFN
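The relationship these commands operate on is a mapping from an LFN to a GUID and from that GUID to one or more SURLs. The toy class below sketches that mapping in memory so the effect of a copy-and-register, a replication, and a replica listing can be followed step by step. It is an illustration only: ToyReplicaCatalog and its methods are hypothetical, no file is transferred, and the second storage element host (se2.example.org) is made up.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

/** Toy in-memory catalog (not the LFC API): models LFN -> GUID -> SURLs only. */
class ToyReplicaCatalog {

    private final Map<String, String> guidByLfn = new HashMap<>();
    private final Map<String, List<String>> surlsByGuid = new HashMap<>();

    /** Registers the first replica under a new LFN (compare lcg-cr); returns the new GUID. */
    String register(String lfn, String surl) {
        if (guidByLfn.containsKey(lfn)) {
            throw new IllegalStateException("LFN already registered: " + lfn);
        }
        String guid = "guid:" + UUID.randomUUID();
        guidByLfn.put(lfn, guid);
        List<String> surls = new ArrayList<>();
        surls.add(surl);
        surlsByGuid.put(guid, surls);
        return guid;
    }

    /** Records one more replica of an existing file (compare lcg-rep). */
    void addReplica(String lfn, String surl) {
        surlsByGuid.get(guidOf(lfn)).add(surl);
    }

    /** Returns the GUID behind an LFN (compare lcg-lg). */
    String guidOf(String lfn) {
        String guid = guidByLfn.get(lfn);
        if (guid == null) {
            throw new IllegalArgumentException("Unknown LFN: " + lfn);
        }
        return guid;
    }

    /** Returns a copy of the replica list of an LFN (compare lcg-lr). */
    List<String> listReplicas(String lfn) {
        return new ArrayList<>(surlsByGuid.get(guidOf(lfn)));
    }

    public static void main(String[] args) {
        ToyReplicaCatalog catalog = new ToyReplicaCatalog();
        String lfn = "lfn:/grid/gilda/20030203/run2/track1";
        catalog.register(lfn, "srm://grid009.ct.infn.it/dpm/ct.infn.it/gilda/output10_1");
        // Hypothetical second storage element
        catalog.addReplica(lfn, "srm://se2.example.org/dpm/example.org/gilda/output10_1");
        System.out.println(catalog.guidOf(lfn));
        System.out.println(catalog.listReplicas(lfn));
    }
}
```

Users keep working with the stable LFN while the set of physical replicas behind it grows or shrinks.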

LFC/LCG Java sample

public class CopyAndRegister {

    public static void main(String[] args) {
        int numArgs = args.length;
        if (numArgs != 4) {
            System.out.println("Copies and registers file to grid");
            System.out.print("Usage: CopyAndRegister sourceFilePath ");
            System.out.print("gridDestinationDir gridDestFileName SEName_withSfnOrSrmPrefix");
            System.out.println();
            System.exit(-1);
        }

        // args[1] is the destination directory in the LFC namespace
        DataStorageInterface dsi = new LFCDataStorage();
        DirectoryItem di = new LFCDirectoryItem("", args[1], null, dsi);

        // Copy the local file args[0] to the SE args[3] and register it as args[2]
        boolean success = di.copyAndRegister(args[0], args[2], args[3]);
        if (success) {
            System.out.print("File " + args[0] + " copied and registered as: ");
            System.out.print(args[1] + "/" + args[2] + ".");
            System.out.println();
        } else {
            System.err.println("Unable to copy and register file.");
            System.exit(-1);
        }
    }
}

AMGA (Metadata Catalog)

gLite Metadata Catalog (AMGA)

Grids often contain millions of files spread over several storage sites. Users and applications need an efficient mechanism to:
- find the files they are interested in;
- discover and query information about their contents.

This is provided by:
- associating descriptive attributes (metadata) with files;
- exposing this information in catalogues that users and client applications can access and search.

Metadata concepts

In AMGA, metadata is a list of attributes associated with entries according to a user-defined schema.
- Schema: a set of attributes
- Entry: the abstraction of a directory or file mapped by the metadata server
- Collection: a set of entries associated with a schema
- Attribute: a typed key/value pair associated with entries
  - Type: the type (int, float, string, ...)
  - Name/Key: the name of the attribute
  - Value: the value of an entry's attribute

To better understand how AMGA works, think of a relational database:

AMGA         Database
schema       database schema
collection   table
attribute    column
entry        row
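The collection/table analogy can be made concrete with a small in-memory model. This is only a sketch of the concepts, not the AMGA client API: the class MetadataCollection and its methods are hypothetical, and all values are kept as strings for brevity even though AMGA attributes are typed.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of an AMGA-style collection (hypothetical, not the AMGA API). */
class MetadataCollection {

    private final String path;
    // Schema: attribute name -> declared type (the "columns")
    private final Map<String, String> schema = new LinkedHashMap<>();
    // Entries: entry name -> attribute values (the "rows")
    private final Map<String, Map<String, String>> entries = new LinkedHashMap<>();

    MetadataCollection(String path) {
        this.path = path;
    }

    void addAttribute(String name, String type) {
        schema.put(name, type);
    }

    /** Adds an entry; every supplied attribute must belong to the schema. */
    void addEntry(String entry, Map<String, String> values) {
        for (String key : values.keySet()) {
            if (!schema.containsKey(key)) {
                throw new IllegalArgumentException("Unknown attribute: " + key);
            }
        }
        entries.put(entry, new LinkedHashMap<>(values));
    }

    /** Returns the full paths of all entries whose attribute equals the given value. */
    List<String> find(String attribute, String value) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> e : entries.entrySet()) {
            if (value.equals(e.getValue().get(attribute))) {
                hits.add(path + "/" + e.getKey());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        MetadataCollection images = new MetadataCollection("/gilda/images");
        images.addAttribute("owner", "string");
        images.addAttribute("size", "int");

        Map<String, String> values = new LinkedHashMap<>();
        values.put("owner", "sammy");
        values.put("size", "1256777");
        images.addEntry("test1", values);

        System.out.println(images.find("owner", "sammy"));
    }
}
```

A lookup by attribute value is what turns the catalogue into a way of finding files by content rather than by name.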

AMGA commands

dir [path] / ls [path]         Return all subdirectories and files in the given path
pwd                            Return the current directory
cd directory                   Change directory
createdir /parentdir/dir       Create the directory dir if it does not yet exist but parentdir already does
rmdir path                     Remove the directory given by path
rm pattern                     Remove all entries matching pattern
addentry entry value           Add a new entry and initialize some attributes
removentries path              Remove the entry at the given path
listattr path                  Return a list of all attributes of the given file/directory
setattr entry value            Set one or more attributes of an entry to the given values
getattr pattern (attribute)    Return the entries and all the attributes for every file matching pattern
removeattr dir attribute       Remove an attribute from a directory

API overview

The AMGA Java API can be used in three ways.

Low level:
- Direct use of the MDServerConnection and MDServerConnectionContext objects.
- The developer must know the exact syntax of every AMGA command and write a lot of code to send commands and parse results.

Medium level:
- The API offers a complete object model for all logical entities of AMGA (Collection, Entry, Attribute, User, Group, ...).
- It offers a complete set of collection builders to execute queries and searches and to iterate over or manipulate the results.
- It also provides the class arda.md.javaclient.MDClient, which wraps the most used commands and uses all the components mentioned above.
- The user must still know the properties of the MDClient object and the correct behavior of its methods. No design patterns are provided, and the dependency on command syntax remains.

High level:
- The developer can adopt a high-level engine that wraps all the APIs in an object-oriented framework.
- The framework hides the complexity of the APIs by implementing several object-oriented design patterns, which improves code reuse and leaves the developer free to think about business logic only.

AMGA Java API sample

// Fragment: identifier capitalization was lost on the slide and is restored here
// following Java conventions (factoryQRY by analogy with GSAF's factoryCMD).
...
MDServerConnection conn = null;
ConnectionPool pool = null;
try {
    String path = "/gilda";
    pool = ConnectionPool.getInstance();
    conn = pool.getConnection();

    // Utility class
    if (DirUtils.dirExist(conn, path)) {
        CollectionElement dir = new CollectionElement(path, CollectionElement.COLLECTION);
        AmgaContextBase context = new AmgaContextBase();
        context.setConnection(conn);
        context.setCollection(dir);
        AmgaManager manager = new AmgaManager();

        // Factory pattern: the query object is created and used dynamically
        Collection col = manager.executeQuery(
                AmgaCollectionsFactory.factoryQRY(AmgaOperationCodes.AMGA_OPERATION_ENTRY_LIST),
                context);
        Iterator iterator = col.iterator();

        pool.releaseConnection(conn);
        conn = null;

        // Work with the collection or iterate over it as needed
        ...

GSAF (Grid Storage Access Framework)

Development troubles

- Grid data services are independent of each other (SOA).
- They work in standalone mode (API fragmentation).
- No coherence between them is guaranteed (uncontrollable resources).
- How can this access layer be built?

Effects of fragmentation:
- Software engineers must consider a vertical architecture.
- Applications must themselves take care, at the business-logic level, of the atomicity, coherence, and synchronization of data manipulation (development effort).
- There is a knowledge gap between traditional development and grid development.
- Code is replicated for different use cases.

The GSAF idea

We have common features and common problems, so we need a design pattern.

GSAF:
- is built on top of the Grid Metadata Service and the Grid Data Service;
- collects and implements functionality shared among applications, according to the "write once, use anywhere" principle;
- reduces the knowledge gap by hiding the complexity and fragmentation of the underlying APIs, and by exposing a unified interface that is closer to the developer's way of thinking (design patterns) than to Grid details (API syntaxes);
- works as a black box, providing classes and methods for the applications above it, and interfaces for extending the implemented capabilities.

GSAF software model

The design uses several object-oriented design patterns: Singleton, Strategy, Factory Method, Template Method, Iterator, and Composite. This keeps the software architecture clean and simple (especially for adopter applications), with a high degree of cohesion and decoupling. The application sees only GSAF components.
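One way a framework can take atomicity off the application's hands is a composite "macro" command that undoes the steps already completed when a later step fails. The sketch below illustrates that general idea only. It is not GSAF code: MacroCommandSketch, Step, step and runAll are hypothetical names, and the steps merely write to a log instead of contacting grid services.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

/** Hypothetical sketch (not GSAF code): a composite command with rollback. */
class MacroCommandSketch {

    /** One reversible operation, e.g. "upload file" or "add metadata entry". */
    interface Step {
        void execute(List<String> log) throws Exception;
        void undo(List<String> log);
    }

    /** Factory for demo steps; 'fail' simulates an error from a grid service. */
    static Step step(final String name, final boolean fail) {
        return new Step() {
            @Override
            public void execute(List<String> log) throws Exception {
                if (fail) {
                    throw new Exception(name);
                }
                log.add("do " + name);
            }

            @Override
            public void undo(List<String> log) {
                log.add("undo " + name);
            }
        };
    }

    /** Runs the steps in order; on failure, undoes the completed ones in reverse order. */
    static boolean runAll(List<Step> steps, List<String> log) {
        Deque<Step> done = new ArrayDeque<>();
        for (Step s : steps) {
            try {
                s.execute(log);
                done.push(s);
            } catch (Exception e) {
                log.add("failed " + e.getMessage());
                while (!done.isEmpty()) {
                    done.pop().undo(log);
                }
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        boolean ok = runAll(Arrays.asList(
                step("upload", false),
                step("register", false),
                step("metadata", true)), log);
        System.out.println(ok + " " + log);
    }
}
```

With such a construct the application issues one operation and either all services are updated or none is, which is the kind of coherence the separate grid APIs do not provide on their own.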

// Identifier capitalization was lost on the slide and is restored here following Java conventions.
public class SynchUploadTestCase {

    public void macroUpload() throws Exception {
        // AMGA client configuration file
        String amgaConfFile = "/tmp/mdjavaclient.conf";
        String currentUserProxy = "/tmp/x509_up3017";

        MDServerConnectionContext mdServerContext = new MDServerConnectionContext(amgaConfFile);
        MDServerConnection conn = new MDServerConnection(mdServerContext);
        DataStorageInterface dsi = new LFCDataStorage();
        dsi.setProxyPath(currentUserProxy);

        GSAFContext gsafCtx = new GSAFContext();
        gsafCtx.setConnection(conn);
        gsafCtx.setDsi(dsi);

        // Name of the file in the LFC catalog
        String lfcFileName = "test1";
        // Virtual LFC directory where the file must be uploaded
        String lfcDirPath = "/grid/trigrid/unict/tmp100";
        // Storage element where the file is saved
        String seHostName = "adat.ct.trigrid.it";
        // Path of the file on the local hard drive
        String localFilePath = "/home/sammy/001v.jpg";

        FileElement file = new FileElement(lfcFileName, lfcDirPath, seHostName, localFilePath);
        gsafCtx.setFile(file);

        // Current AMGA collection
        String mdCollection = lfcDirPath;
        // AMGA entry name
        String mdEntry = lfcFileName;
        CollectionElement entryElement = new CollectionElement(mdEntry, CollectionElement.ENTRY);
        CollectionElement dirElement = new CollectionElement(mdCollection, CollectionElement.COLLECTION);
        gsafCtx.setCollection(dirElement);
        gsafCtx.setEntry(entryElement);

        // Providing the metadata values is up to the application
        String[] keys = {"fileid", "owner", "size"};
        String[] values = {"10", "sammy", "1256777"};
        AttributeSet attributes = new AttributeSet(keys, values);
        gsafCtx.setAttributes(attributes);

        // A single macro command uploads the file and stores its metadata
        GSAFCommand op = GSAFOperationFactory.factoryCMD(GSAFOperationCodes.GSAF_OPERATION_MACRO_FILE_UPLOAD);
        op.execute(gsafCtx);
    }
}

References

- GFAL exercises (C/Java): https://grid.ct.infn.it/twiki/bin/view/pi2s2/usinggfal
- AMGA API: http://amga.web.cern.ch/amga/api_java13.html
- GSAF: https://grid.ct.infn.it/twiki/bin/view/pi2s2/gsaf
- LFC: https://grid.ct.infn.it/twiki/bin/view/gilda/lfcjavaapi

Thank you very much for your kind attention! Questions?