Consorzio COMETA - Progetto PI2S2 FESR
DMS API gLite
Salvatore Scifo, Consorzio COMETA (PI2S2) - Catania
Introductory Course on Grid Computing, Lesson 13 - gLite middleware (Data Management - Part 4)
Catania, 6 May 2008
www.consorzio-cometa.it

DMS Overview

The Grid DM Challenge

Heterogeneity
- Data are stored on different storage systems using different access technologies.
- Need: a common interface to storage resources (Storage Resource Manager, SRM).

Distribution
- Data are stored in different locations; in most cases there is no shared file system or common namespace.
- Data need to be moved between different locations.
- Need: keeping track of where data is stored (File and Replica Catalogs).
- Need: scheduled, reliable file transfer (File Transfer Service).

Data description
- Data are stored as files, so there must be a way to describe files and locate them according to their contents.
- Need: a way to describe file contents and query them (Metadata Service).

Introduction

Assumptions:
- Users and programs produce and require data.
- The lowest granularity of the data is the file level (we deal with files rather than data objects or tables): data = files.

Files:
- Mostly write once, read many.
- Located in Storage Elements (SEs).
- Several replicas of one file may exist at different sites.
- Accessible by Grid users and applications from anywhere.
- Locatable by the WMS (data requirements in JDL).

In addition:
- The WMS can send small amounts of data to and from jobs through the Input and Output Sandbox.
- Files may be copied between local file systems (WNs, UIs) and the Grid (SEs).

File Naming Conventions

Logical File Name (LFN)
  An alias created by a user to refer to some item of data, e.g. lfn:/grid/gilda/20030203/run2/track1
Globally Unique Identifier (GUID)
  A non-human-readable unique identifier for an item of data, e.g. guid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6
Site URL (SURL), also called Physical File Name (PFN) or Site FN
  The location of an actual piece of data on a storage system, e.g. srm://grid009.ct.infn.it/dpm/ct.infn.it/gilda/output10_1 (SRM)
Transport URL (TURL)
  A temporary locator of a replica plus its access protocol, understood by an SE, e.g. rfio://lxshare0209.cern.ch//data/alice/ntuples.dat

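The four kinds of name can be told apart by their prefix. The following is a minimal sketch of that idea, not part of any gLite API: the class GridFileNames and its Kind enum are hypothetical, the sfn:// prefix is assumed to be an alternative SURL scheme, and any other scheme:// form is simply treated as a TURL.

```java
import java.util.Locale;

/** Classifies grid file names by their prefix (illustrative sketch, not a gLite API). */
final class GridFileNames {

    enum Kind { LFN, GUID, SURL, TURL, UNKNOWN }

    private GridFileNames() { }

    static Kind classify(String name) {
        String lower = name.toLowerCase(Locale.ROOT);
        if (lower.startsWith("lfn:")) {
            return Kind.LFN;
        }
        if (lower.startsWith("guid:")) {
            return Kind.GUID;
        }
        // Assumption: srm:// and sfn:// both denote a site URL.
        if (lower.startsWith("srm://") || lower.startsWith("sfn://")) {
            return Kind.SURL;
        }
        // Simplification: any other scheme://... is treated as a transport URL (rfio, gsiftp, ...).
        if (lower.matches("^[a-z][a-z0-9+.-]*://.*")) {
            return Kind.TURL;
        }
        return Kind.UNKNOWN;
    }

    public static void main(String[] args) {
        String[] samples = {
            "lfn:/grid/gilda/20030203/run2/track1",
            "guid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6",
            "srm://grid009.ct.infn.it/dpm/ct.infn.it/gilda/output10_1",
            "rfio://lxshare0209.cern.ch//data/alice/ntuples.dat"
        };
        for (String sample : samples) {
            System.out.println(classify(sample) + "  " + sample);
        }
    }
}
```
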
gLite Storage Element

gLite File Catalog (LFC)
[Figure: a gLite UI, the File Catalog and three SEs]

gLite Metadata Catalog (AMGA)

- C++ multiprocess server
- Backends: Oracle, MySQL, PostgreSQL, SQLite
- Front ends:
  - TCP streaming: high performance; client API for C++, Java, Python, Perl, Ruby
  - SOAP (web services): interoperability
- Scalability
- Standalone Python library implementation: data stored on the file system

GridFTP/GsiFTP: Grid File Transfer Protocol

FTP Overview

- File Transfer Protocol (FTP) is one of the oldest and most widely used protocols on the Internet. Its task is to transfer files between hosts on a network.
- It is based on a client/server architecture.
- It allows users to access files on remote systems using a very simple set of commands.
- It authenticates with a username and password, which are sent unencrypted.
- It is therefore considered insecure and should be used only when necessary.
- A suitable substitute for FTP is sftp, from the OpenSSH suite of tools.

GridFTP: the Globus Solution

- GridFTP is a high-performance, secure, reliable FTP optimized for high-bandwidth wide-area networks.
- It is based on the Internet FTP protocol and implements extensions for high-performance operation that were either already specified in the FTP specification but not commonly implemented, or proposed as extensions by the Globus team.
- GridFTP uses basic Grid security on both control (command) and data channels.
- Other features include:
  - multiple data channels for parallel transfers
  - partial file transfers
  - third-party (direct server-to-server) transfers
  - reusable data channels
  - command pipelining
- Both C and Java APIs are available (Globus Toolkit 4.0).

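Parallel and partial transfers both rest on addressing a file by byte range. The sketch below shows only that arithmetic: how a file of a given size could be divided into contiguous ranges, one per data channel. It is not how GridFTP negotiates or schedules its data channels, and the class TransferRanges is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Splits a file into contiguous byte ranges, one per data channel (illustrative only). */
final class TransferRanges {

    /** Half-open byte range [start, end). */
    static final class Range {
        final long start;
        final long end;

        Range(long start, long end) {
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return "[" + start + ", " + end + ")";
        }
    }

    private TransferRanges() { }

    static List<Range> split(long fileSize, int channels) {
        if (fileSize < 0 || channels <= 0) {
            throw new IllegalArgumentException("fileSize must be >= 0 and channels > 0");
        }
        List<Range> ranges = new ArrayList<>();
        long base = fileSize / channels;   // bytes every channel gets
        long extra = fileSize % channels;  // the first 'extra' channels get one more byte
        long offset = 0;
        for (int i = 0; i < channels; i++) {
            long length = base + (i < extra ? 1 : 0);
            if (length == 0) {
                break; // fewer bytes than channels: remaining channels stay idle
            }
            ranges.add(new Range(offset, offset + length));
            offset += length;
        }
        return ranges;
    }

    public static void main(String[] args) {
        // A 10-byte file over 4 channels.
        System.out.println(split(10, 4));
    }
}
```
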
GridFTP Commands

Command  Description
get      Download a remote file from the server to the local machine
put      Upload a file from the local machine to the remote server
cd       Change the current directory on the remote server
dir      List the contents of the remote directory; the asterisk (*) and the question mark (?) may be used as wildcards
pwd      Print the working directory
delete   Delete (remove) a file in the current remote directory (same as rm in UNIX)
ls       List the names of the files in the current remote directory
quit     Close the FTP connection and exit
mkdir    Make a new directory within the current remote directory
rmdir    Remove (delete) a directory in the current remote directory

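GridFTP Java API Sample

    // Imports assume the CoG JGlobus 1.x package layout.
    import java.io.File;
    import java.io.FileInputStream;
    import java.io.IOException;

    import org.globus.common.CoGProperties;
    import org.globus.ftp.GridFTPClient;
    import org.globus.ftp.Session;
    import org.globus.ftp.exception.ClientException;
    import org.globus.ftp.exception.ServerException;
    import org.globus.gsi.GlobusCredential;
    import org.globus.gsi.GlobusCredentialException;
    import org.globus.gsi.gssapi.GlobusGSSCredentialImpl;
    import org.gridforum.jgss.ExtendedGSSCredential;
    import org.ietf.jgss.GSSCredential;
    import org.ietf.jgss.GSSException;

    public class GSIFTPTest {

        public static void main(String[] args) throws Exception {
            testConnection();
        }

        public static void testConnection() throws ServerException, IOException,
                GlobusCredentialException, GSSException, ClientException {
            // Directory holding the trusted CA certificates.
            CoGProperties.getDefault().setCaCertLocations("D:\\ca");

            // Connect to the GridFTP server.
            GridFTPClient client = new GridFTPClient("aliserv6.ct.infn.it", 2811);

            // Load the user's proxy certificate and wrap it as a GSS credential.
            GlobusCredential globusCredential =
                    new GlobusCredential(new FileInputStream("C:\\x509up_u3017"));
            ExtendedGSSCredential gssCred =
                    new GlobusGSSCredentialImpl(globusCredential, GSSCredential.INITIATE_AND_ACCEPT);
            GSSCredential credential = gssCred;
            client.authenticate(credential);

            // The server listens (passive), the client connects (active); binary transfer type.
            client.setPassive();
            client.setLocalActive();
            client.setType(Session.TYPE_IMAGE);

            // Download a remote file, then upload a local one (last argument: append).
            client.get("/dpm/ct.trigrid.it/home/trigrid/generated/2007-11-13/file375c2a75-225f-4888-8df2-820eeae6af67",
                    new File("C:/primo3.txt"));
            client.put(new File("d:/primo3.txt"),
                    "/dpm/ct.infn.it/home/gilda/tony/primo3_sammy.txt", true);

            client.close();
        }
    }
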
GFAL: Grid File Access Library

GFAL: Main Features

- Provides a POSIX-like interface for file I/O operations
- Based on shared libraries (both threaded and unthreaded versions)
- Needs only one header file (gfal_api.h) to write C applications
- Supports low-level protocols (dcache, CASTOR, ...)
- Fully compliant with the GSI (Grid Security Infrastructure)

GFAL: Available APIs

C API (function oriented)
- The header file gfal_api.h must be included in the application source code to get the function prototypes.
- The function names are obtained by prepending gfal_ to the POSIX names, for example gfal_open, gfal_read, gfal_close...
- The argument lists and the return values are identical to those of the POSIX functions.
- On failure, the variable errno is set to the POSIX error code.

Java API (C API wrapper, object oriented)
It provides three main Java classes, to be imported in Java applications, that hide the underlying C functions:
- GFalFile: handles files and reads/writes them
- GFalDirectory: handles and manages directories (create, delete, list)
- GFalUtilities: manages files (rename, stat, lstat, delete)

GFAL Java API Sample

    // The imports for GFalFile and FileToByteArray depend on the GFAL Java wrapper distribution.
    // The capitalisation of the wrapper's method names was lost on the slide and is restored
    // here by Java convention (an assumption); check it against the wrapper's API.
    public class GFALUploadFileTest {

        public static void main(String[] args) throws Exception {
            if (args == null || args.length < 4) {
                System.out.println("\nWrong Arguments Passed!");
                System.out.println("\nUsage: GFALUploadFileTest LocalFileSystemName seurl LFN amode");
                System.out.println("\nE.g.: GFALUploadFileTest /home/user/file.dat aliserv6.ct.infn.it lfn:/grid/gilda/user/file.dat 644");
                System.exit(-1);
            }

            GFalFile gfalFile = new GFalFile();
            String fileName = args[0];
            String seurl = args[1];
            String lfn = args[2];
            String mod = args[3];
            FileToByteArray fileTba;
            String surl = null;

            try {
                // Read the local file, create the remote file on the SE and write the bytes.
                fileTba = new FileToByteArray(fileName);
                gfalFile.createFile(seurl, Integer.parseInt(mod.trim()), false, false);
                int ret = gfalFile.writeFile(fileTba.toByteArray());
                if (ret == -1) {
                    surl = "No SURL has been provided!";
                    throw new Exception("Error has been detected during file writing onto SE : " + seurl);
                } else {
                    // On success, fetch the SURL, close the file and register the LFN in the LFC.
                    surl = gfalFile.getSURL();
                    gfalFile.closeFile();
                    gfalFile.lfcRegisterFile(lfn);
                }
            } catch (Exception e) {
                System.out.println("\n" + e.getMessage());
                if (surl == null) {
                    surl = "No SURL has been provided!";
                }
            }
            System.out.println("\nFollowing SURL has been created : " + surl);
        }
    }

LFC/LCG-Utils (File Catalog)

LFC Features

- Timeouts and retries from the client
- User-exposed transactional API (with automatic rollback on failure)
- Hierarchical namespace and namespace operations (for LFNs)
- Integrated GSI authentication and authorization
- Access Control Lists (Unix permissions and POSIX ACLs)
- Checksums
- Integration with VOMS

LFC APIs

[Figure: client APIs layered on top of the LFC server]
- LCG UTIL C API, GFAL C API
- LCG UTIL Java API, GFAL Java API (wrapper)
- LFC CLIENT C API
- Python API
- CLI: lfc-ls, lfc-mkdir, lfc-setacl, ...
- LFC SERVER

LFC Commands

Command         Description
lfc-chmod       Change the access mode of an LFC file/directory
lfc-chown       Change the owner and group of an LFC file/directory
lfc-delcomment  Delete the comment associated with a file/directory
lfc-getacl      Get file/directory access control lists
lfc-ln          Make a symbolic link to a file/directory
lfc-ls          List file/directory entries in a directory
lfc-mkdir       Create a directory
lfc-rename      Rename a file/directory
lfc-rm          Remove a file/directory
lfc-setacl      Set file/directory access control lists
lfc-setcomment  Add/replace a comment

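Access modes such as the 644 used elsewhere in these slides are Unix permission bits written in octal. Assuming lfc-chmod follows the usual Unix convention, the sketch below converts an octal mode string into the familiar rwx form; the class UnixMode is hypothetical and not part of the LFC API. Note that parsing "644" as a decimal number gives a different value from parsing it as octal.

```java
/** Converts an octal permission string such as "644" to "rw-r--r--" (illustrative sketch). */
final class UnixMode {

    private UnixMode() { }

    static String toSymbolic(String octal) {
        int mode = Integer.parseInt(octal, 8); // base 8, not base 10
        if (mode < 0 || mode > 0777) {
            throw new IllegalArgumentException("Mode out of range: " + octal);
        }
        char[] flags = {'r', 'w', 'x'};
        StringBuilder symbolic = new StringBuilder(9);
        // Walk the nine permission bits from owner-read (bit 8) down to other-execute (bit 0).
        for (int bit = 8; bit >= 0; bit--) {
            boolean set = (mode & (1 << bit)) != 0;
            symbolic.append(set ? flags[(8 - bit) % 3] : '-');
        }
        return symbolic.toString();
    }

    public static void main(String[] args) {
        System.out.println(toSymbolic("644"));           // rw-r--r--
        System.out.println(toSymbolic("755"));           // rwxr-xr-x
        System.out.println(Integer.parseInt("644"));     // 644 (decimal reading)
        System.out.println(Integer.parseInt("644", 8));  // 420 (octal reading)
    }
}
```
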
lcg-utils Commands

Replica Management
Command  Description
lcg-cp   Copies a grid file to a local destination
lcg-cr   Copies a file to an SE and registers the file in the catalog
lcg-del  Deletes one file
lcg-rep  Replicates a file between SEs and registers the replica
lcg-gt   Gets the TURL for a given SURL and transfer protocol
lcg-sd   Sets the file status to Done for a given SURL in an SRM request

File Catalog Interaction
Command  Description
lcg-aa   Adds an alias in the LFC for a given GUID
lcg-ra   Removes an alias in the LFC for a given GUID
lcg-rf   Registers in the LFC a file placed in an SE
lcg-uf   Unregisters in the LFC a file placed in an SE
lcg-la   Lists the aliases for a given SURL, GUID or LFN
lcg-lg   Gets the GUID for a given LFN or SURL
lcg-lr   Lists the replicas for a given GUID, SURL or LFN

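The commands above all operate on the same relationships: several LFNs (aliases) point to one GUID, and one GUID points to several SURLs (replicas). The following in-memory sketch models only that bookkeeping. ToyReplicaCatalog and its methods are hypothetical, the host names are placeholders, and nothing here talks to a real LFC or SE.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

/** In-memory toy catalog: LFN -> GUID -> SURLs. Not the LFC API. */
final class ToyReplicaCatalog {

    private final Map<String, String> guidByLfn = new HashMap<>();
    private final Map<String, Set<String>> surlsByGuid = new HashMap<>();

    /** Catalog side of lcg-cr / lcg-rf: first replica, new GUID, first LFN. */
    String register(String lfn, String surl) {
        if (guidByLfn.containsKey(lfn)) {
            throw new IllegalStateException("LFN already registered: " + lfn);
        }
        String guid = "guid:" + UUID.randomUUID();
        guidByLfn.put(lfn, guid);
        Set<String> surls = new LinkedHashSet<>();
        surls.add(surl);
        surlsByGuid.put(guid, surls);
        return guid;
    }

    /** Catalog side of lcg-rep: one more SURL for the same GUID. */
    void addReplica(String lfn, String surl) {
        surlsByGuid.get(guidOf(lfn)).add(surl);
    }

    /** Like lcg-aa: one more LFN (alias) for an existing GUID. */
    void addAlias(String guid, String aliasLfn) {
        if (!surlsByGuid.containsKey(guid)) {
            throw new IllegalArgumentException("Unknown GUID: " + guid);
        }
        if (guidByLfn.containsKey(aliasLfn)) {
            throw new IllegalStateException("LFN already registered: " + aliasLfn);
        }
        guidByLfn.put(aliasLfn, guid);
    }

    /** Like lcg-lg: GUID for a given LFN. */
    String guidOf(String lfn) {
        String guid = guidByLfn.get(lfn);
        if (guid == null) {
            throw new IllegalArgumentException("Unknown LFN: " + lfn);
        }
        return guid;
    }

    /** Like lcg-lr: replicas for a given LFN. */
    List<String> listReplicas(String lfn) {
        return new ArrayList<>(surlsByGuid.get(guidOf(lfn)));
    }

    public static void main(String[] args) {
        ToyReplicaCatalog catalog = new ToyReplicaCatalog();
        String lfn = "lfn:/grid/gilda/user/file.dat";
        String guid = catalog.register(lfn, "srm://se1.example.org/gilda/file.dat");
        catalog.addReplica(lfn, "srm://se2.example.org/gilda/file.dat");
        catalog.addAlias(guid, "lfn:/grid/gilda/user/alias.dat");
        // The alias resolves to the same two replicas as the original LFN.
        System.out.println(catalog.listReplicas("lfn:/grid/gilda/user/alias.dat"));
    }
}
```
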
LFC/LCG Java Sample

    // The capitalisation of copyAndRegister was lost on the slide and is restored here by
    // Java convention (an assumption); check it against the LFC Java API.
    public class CopyAndRegister {

        public static void main(String[] args) {
            int numArgs = args.length;
            if (numArgs != 4) {
                System.out.println("Copies and registers file to grid");
                System.out.print("Usage: CopyAndRegister sourceFilePath ");
                System.out.print("gridDestinationDir gridDestFileName SEName_withSfnOrSrmPrefix");
                System.out.println();
                System.exit(-1);
            }

            // args[1] is the destination directory in the LFC namespace.
            DataStorageInterface dsi = new LFCDataStorage();
            DirectoryItem di = new LFCDirectoryItem("", args[1], null, dsi);

            // Copy the local file (args[0]) to the SE (args[3]) and register it as args[2].
            boolean success = di.copyAndRegister(args[0], args[2], args[3]);
            if (success) {
                System.out.print("File " + args[0] + " copied and registered as: ");
                System.out.print(args[1] + "/" + args[2] + ".");
                System.out.println();
            } else {
                System.err.println("Unable to copy and register file.");
                System.exit(-1);
            }
        }
    }

AMGA (Metadata Catalog)

gLite Metadata Catalog (AMGA)

- Grids often contain millions of files spread over several storage sites.
- Users and applications need an efficient mechanism to find the files they are interested in and to discover and query information about their contents.
- This is provided by associating descriptive attributes (metadata) with files and by exposing this information in catalogues that users and client applications can access and search.

Metadata Concepts

AMGA metadata is a list of attributes associated with entries according to a user-defined schema.
- Schema: a set of attributes
- Entry: the abstraction of a directory/file mapped by the metadata server
- Collection: a set of entries associated with a schema
- Attribute: a typed key/value pair associated with entries
  - Type: the type (int, float, string, ...)
  - Name/Key: the name of the attribute
  - Value: the value of an entry's attribute

To better understand how AMGA works, think of a database:

AMGA        Database
schema      database schema
collection  table
attribute   column
entry       row

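The collection/table analogy can be made concrete with a small in-memory model: a collection holds a schema (typed attributes, the columns) and a set of entries (the rows). This is only a sketch of the concepts; ToyCollection is hypothetical and does not use the AMGA API, and its type checks are deliberately crude. The sample attribute names and values are borrowed from the GSAF upload example later in these slides.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** A collection = schema (typed attributes) + entries (rows). Illustrative only. */
final class ToyCollection {

    enum Type { INT, FLOAT, STRING }

    private final Map<String, Type> schema = new LinkedHashMap<>();
    private final Map<String, Map<String, String>> entries = new LinkedHashMap<>();

    /** Adds an attribute (a column) to the schema. */
    void addAttribute(String name, Type type) {
        schema.put(name, type);
    }

    /** Adds an entry (a row); every key must exist in the schema and match its type. */
    void addEntry(String entry, String[] keys, String[] values) {
        if (keys.length != values.length) {
            throw new IllegalArgumentException("keys and values differ in length");
        }
        Map<String, String> row = new LinkedHashMap<>();
        for (int i = 0; i < keys.length; i++) {
            Type type = schema.get(keys[i]);
            if (type == null) {
                throw new IllegalArgumentException("Attribute not in schema: " + keys[i]);
            }
            check(type, values[i]);
            row.put(keys[i], values[i]);
        }
        entries.put(entry, row);
    }

    /** Returns the value of one attribute of one entry, or null if it was never set. */
    String getAttribute(String entry, String key) {
        Map<String, String> row = entries.get(entry);
        if (row == null) {
            throw new IllegalArgumentException("No such entry: " + entry);
        }
        return row.get(key);
    }

    // Throws NumberFormatException (an IllegalArgumentException) on a type mismatch.
    private static void check(Type type, String value) {
        switch (type) {
            case INT:
                Long.parseLong(value);
                break;
            case FLOAT:
                Double.parseDouble(value);
                break;
            default:
                break;
        }
    }

    public static void main(String[] args) {
        ToyCollection collection = new ToyCollection();
        collection.addAttribute("fileid", Type.INT);
        collection.addAttribute("owner", Type.STRING);
        collection.addAttribute("size", Type.INT);
        collection.addEntry("test1",
                new String[] {"fileid", "owner", "size"},
                new String[] {"10", "sammy", "1256777"});
        System.out.println(collection.getAttribute("test1", "owner"));
    }
}
```
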
AMGA Commands

Command                      Description
dir [path] / ls [path]       Returns all subdirectories and files in the given path
pwd                          Returns the current directory
cd directory                 Changes directory
createdir /parentdir/dir     Creates the directory dir if it does not yet exist but parentdir already does
rmdir path                   Removes the directory given by path
rm pattern                   Removes all entries matching pattern
addentry entry value         Adds a new entry and initializes some attributes
removentries path            Removes the entry at the given path
listattr path                Returns a list of all attributes of the given file/directory
setattr entry value          Sets one or more attributes of an entry to the given values
getattr pattern (attribute)  Returns the entries and all the attributes for every file matching pattern
removeattr dir attribute     Removes an attribute from a directory

API Overview

The AMGA Java API can be used at three levels:

LOW LEVEL
- Direct use of the MDServerConnection and MDServerConnectionContext objects.
- The developer must know the exact syntax of all AMGA commands and write a lot of code to send commands and parse results.

MEDIUM LEVEL
- The API provides a complete object model for all logical entities of AMGA (Collection, Entry, Attribute, User, Group, ...).
- It provides a complete list of collection builders to execute queries and searches and to iterate over or manipulate results.
- It also provides the class arda.md.javaclient.MDClient, which wraps the most used commands and uses all the components mentioned above.
- The user must still know the properties of the MDClient object and the correct behavior of its methods. No design patterns are provided and the dependency on command syntax is still present.

HIGH LEVEL
- The developer can adopt a high-level engine that wraps all the APIs in an OO framework.
- The framework hides the complexity of the APIs by implementing several OO design patterns, improving code reuse and leaving the developer free to think about business logic only.

AMGA Java API Sample

    // Fragment. The capitalisation of several class and method names was lost on the slide
    // and is restored here by Java convention (an assumption); check it against the API.
    ...
    MDServerConnection conn = null;
    ConnectionPool pool = null;
    try {
        String path = "/gilda";
        pool = ConnectionPool.getInstance();
        conn = pool.getConnection();

        // Utility class
        if (DirUtils.dirExist(conn, path)) {
            CollectionElement dir = new CollectionElement(path, CollectionElement.COLLECTION);
            AmgaContextBase context = new AmgaContextBase();
            context.setConnection(conn);
            context.setCollection(dir);
            AmgaManager manager = new AmgaManager();

            // Factory pattern: the query object is created dynamically and then executed
            Collection col = manager.executeQuery(
                    AmgaCollectionsFactory.factoryQRY(AmgaOperationCodes.AMGA_OPERATION_ENTRY_LIST),
                    context);
            Iterator iterator = col.iterator();
            pool.releaseConnection(conn);
            conn = null;
            // work with your collection or iterate over it as you please
            ...

GSAF (Grid Storage Access Framework)

Development Troubles

- Grid data services are independent of each other (SOA).
- They work in stand-alone mode (API fragmentation).
- No coherence among them is guaranteed (uncontrollable resources).

How to build this access layer?

Fragmentation effects:
- Software engineers must consider a vertical architecture.
- Applications must themselves take care, at the business-logic level, of the atomicity, coherence and synchronization of data manipulation (development effort).
- Knowledge gap (traditional development vs. grid development).
- Code replication for different use cases.

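To see what taking care of atomicity at the business-logic level means in practice, consider an upload that must both store a file and record its metadata. Since the two services know nothing of each other, the application has to undo the first step itself when the second fails. The sketch below shows this compensation logic with in-memory stand-ins; the interfaces FileStore and MetadataStore and the class CoherentUpload are hypothetical and are not GSAF, LFC or AMGA types.

```java
import java.util.HashMap;
import java.util.Map;

/** Application-level compensation across two independent services (illustrative only). */
final class CoherentUpload {

    interface FileStore {
        void put(String lfn, byte[] data);
        void delete(String lfn);
        boolean exists(String lfn);
    }

    interface MetadataStore {
        void addEntry(String entry, Map<String, String> attributes);
    }

    /** Trivial in-memory stand-in for a storage service. */
    static final class InMemoryFiles implements FileStore {
        private final Map<String, byte[]> files = new HashMap<>();

        @Override
        public void put(String lfn, byte[] data) {
            files.put(lfn, data.clone());
        }

        @Override
        public void delete(String lfn) {
            files.remove(lfn);
        }

        @Override
        public boolean exists(String lfn) {
            return files.containsKey(lfn);
        }
    }

    private CoherentUpload() { }

    /** Stores the file, then the metadata; removes the file again if the metadata step fails. */
    static void upload(FileStore files, MetadataStore metadata,
                       String lfn, byte[] data, Map<String, String> attributes) {
        files.put(lfn, data);
        try {
            metadata.addEntry(lfn, attributes);
        } catch (RuntimeException e) {
            files.delete(lfn); // compensate: no file without metadata
            throw e;
        }
    }

    public static void main(String[] args) {
        InMemoryFiles files = new InMemoryFiles();
        MetadataStore failing = (entry, attributes) -> {
            throw new IllegalStateException("metadata service unavailable");
        };
        try {
            upload(files, failing, "lfn:/grid/gilda/user/file.dat", new byte[] {1, 2, 3},
                    new HashMap<String, String>());
        } catch (IllegalStateException e) {
            System.out.println("Upload failed: " + e.getMessage());
        }
        System.out.println("File left behind: " + files.exists("lfn:/grid/gilda/user/file.dat"));
    }
}
```

Every application that combines the two services would need this kind of code, which is the replication that a shared access layer is meant to remove.
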
GSAF Idea

We have common features and common problems: we need a design pattern.

GSAF:
- is built on top of the Grid Metadata Service and the Grid Data Service
- collects and implements functionalities shared among applications, according to the "write once, use anywhere" principle
- reduces the knowledge gap by hiding the complexity and fragmentation of the underlying APIs and exposing a unified interface closer to the developer's way of thinking (design patterns) than to Grid details (API syntaxes)
- works as a black box, providing
  - classes and related methods for the applications located above it
  - interfaces to extend the implemented capabilities

GSAF Software Model

- The design covers several object-oriented design patterns: Singleton, Strategy, Factory Method, Template Method, Iterator and Composite.
- This ensures a very clean and simple software architecture (especially for adopter applications) with a high degree of cohesion and decoupling.
- The application sees only GSAF components.

    // The capitalisation of several method names was lost on the slide and is restored here
    // by Java convention (an assumption); check it against the GSAF API.
    public class SynchUploadTestCase {

        public void macroUpload() throws Exception {
            // AMGA client configuration file
            String amgaConfFile = "/tmp/mdjavaclient.conf";
            String currentUserProxy = "/tmp/x509_up3017";

            MDServerConnectionContext mdServerContext = new MDServerConnectionContext(amgaConfFile);
            MDServerConnection conn = new MDServerConnection(mdServerContext);
            DataStorageInterface dsi = new LFCDataStorage();
            dsi.setProxyPath(currentUserProxy);

            GSAFContext gsafCtx = new GSAFContext();
            gsafCtx.setConnection(conn);
            gsafCtx.setDSI(dsi);

            // name of the file in the LFC catalog
            String lfcFileName = "test1";
            // virtual LFC directory where the file must be uploaded
            String lfcDirPath = "/grid/trigrid/unict/tmp100";
            // storage element where the file is saved
            String seHostName = "adat.ct.trigrid.it";
            // path of the file on the local hard drive
            String localFilePath = "/home/sammy/001v.jpg";

            FileElement file = new FileElement(lfcFileName, lfcDirPath, seHostName, localFilePath);
            gsafCtx.setFile(file);

            // current AMGA collection
            String mdCollection = lfcDirPath;
            // AMGA entry name
            String mdEntry = lfcFileName;

            CollectionElement entryElement = new CollectionElement(mdEntry, CollectionElement.ENTRY);
            CollectionElement dirElement = new CollectionElement(mdCollection, CollectionElement.COLLECTION);
            gsafCtx.setCollection(dirElement);
            gsafCtx.setEntry(entryElement);

            // providing the attribute values is an application matter
            String[] keys = {"fileid", "owner", "size"};
            String[] values = {"10", "sammy", "1256777"};
            AttributeSet attributes = new AttributeSet(keys, values);
            gsafCtx.setAttributes(attributes);

            // The factory returns the command for the requested operation code.
            GSAFCommand op = GSAFOperationFactory.factoryCMD(GSAFOperationCodes.GSAF_OPERATION_MACRO_FILE_UPLOAD);
            op.execute(gsafCtx);
        }
    }

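The last two statements of the sample follow a recognisable shape: a factory maps an operation code to a command object, and the caller runs it against a context without knowing the concrete class. The sketch below shows that shape in isolation. All names in it (PatternSketch, Context, Command, OperationCode, create) are hypothetical stand-ins and say nothing about how GSAF is implemented internally.

```java
import java.util.ArrayList;
import java.util.List;

/** Factory + command shape in isolation (hypothetical names, not GSAF classes). */
final class PatternSketch {

    /** Carries everything a command needs; here just a file name and a log. */
    static final class Context {
        final List<String> log = new ArrayList<>();
        String fileName;
    }

    interface Command {
        void execute(Context ctx);
    }

    enum OperationCode { UPLOAD, DELETE }

    private PatternSketch() { }

    /** Factory: the caller asks by code and never names a concrete command class. */
    static Command create(OperationCode code) {
        switch (code) {
            case UPLOAD:
                return ctx -> ctx.log.add("upload " + ctx.fileName);
            case DELETE:
                return ctx -> ctx.log.add("delete " + ctx.fileName);
            default:
                throw new IllegalArgumentException("Unsupported operation: " + code);
        }
    }

    public static void main(String[] args) {
        Context ctx = new Context();
        ctx.fileName = "test1";
        Command op = create(OperationCode.UPLOAD);
        op.execute(ctx);
        System.out.println(ctx.log);
    }
}
```

Adding an operation then means adding one code and one command; callers keep the same two lines.
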
References

- GFAL exercises (C/Java): https://grid.ct.infn.it/twiki/bin/view/pi2s2/usinggfal
- AMGA API: http://amga.web.cern.ch/amga/api_java13.html
- GSAF: https://grid.ct.infn.it/twiki/bin/view/pi2s2/gsaf
- LFC: https://grid.ct.infn.it/twiki/bin/view/gilda/lfcjavaapi

Thank you very much for your kind attention!

Questions?