Grid Data Management, Week #4
Hardi Teder (hardi@eenet.ee), University of Tartu, March 6th, 2013
Overview: Grid data management; where does the data come from?; Grid data management tools
Grid foundations
Where does the data come from? Example: the CMS experiment at CERN's LHC. CERN: European Organization for Nuclear Research; LHC: Large Hadron Collider; CMS: Compact Muon Solenoid.
Grid acronyms: EGI Glossary, http://www.egi.eu/about/glossary/; EGI Security Policy Glossary of Terms, https://documents.egi.eu/public/showdocument?docid=71. A Google search also helps.
Large Hadron Collider (LHC)
Smash things together, see what happens!
Discover particles. Quarks: up, down, charm, strange, top, bottom. Leptons: electron, muon, tau, electron neutrino, muon neutrino, tau neutrino.
CMS detector: took ~2000 scientists and engineers more than 20 years to design and build; is about 15 metres wide and 21.5 metres long; weighs about 14,000 t, twice as much as the Eiffel Tower; uses the largest, most powerful magnet of its kind ever made.
Collisions in CMS
CMS in production: volume ~250 TB/day among dozens of Tier sites; ~19M logical files (the total number of replicas so far is ~27M); throughput 2-2.5 GB/s aggregate (weekly averages) in peak weeks in 2012.
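A quick back-of-the-envelope check of the CMS figures above, assuming decimal units (1 TB = 10^12 B, 1 GB = 10^9 B):

```latex
\begin{aligned}
\frac{27 \times 10^{6}\ \text{replicas}}{19 \times 10^{6}\ \text{logical files}} &\approx 1.42\ \text{replicas per logical file}\\
2\ \text{GB/s} \times 86\,400\ \text{s/day} &= 172.8\ \text{TB/day}\\
2.5\ \text{GB/s} \times 86\,400\ \text{s/day} &= 216\ \text{TB/day}
\end{aligned}
```

So a sustained 2-2.5 GB/s corresponds to roughly 170-220 TB/day, the same order of magnitude as the ~250 TB/day volume quoted.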
Worldwide LHC Computing Grid (WLCG): Tier0 at CERN; 11 Tier1 sites; 138 Tier2 sites.
WLCG: 15 petabytes of data generated annually.
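To put the annual WLCG volume into a rate, assuming decimal units (1 PB = 10^15 B) and a 365-day year:

```latex
\begin{aligned}
\frac{15 \times 10^{15}\ \text{B}}{365 \times 86\,400\ \text{s}} &= \frac{15 \times 10^{15}\ \text{B}}{3.1536 \times 10^{7}\ \text{s}} \approx 4.76 \times 10^{8}\ \text{B/s} \approx 0.48\ \text{GB/s}\\
\frac{15\ \text{PB}}{365\ \text{days}} &\approx 41\ \text{TB/day}
\end{aligned}
```

That is, on average about half a gigabyte of new data every second, around the clock.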
There are more data-intensive projects: DNA experiments; radio telescopes; sensor networks; digitizing books, documents and images.
Grid foundations
Data management: data access and transfer; data replication and management.
Simple, automatic, multi-protocol file transfer tools, integrated with the Resource Management service: move data from the local machine to the remote machine where the job is executed (input file staging); move the output files from the remote machine back to the local machine (output file staging); pull the executable from a remote location.
For secure, high-performance, reliable file transfer over modern WANs: GridFTP.
ARC Computing Element (CE): universal frontend for different batch systems; standard and custom interfaces; status information publishing; file handling.
ARC CE and data handling: data are moved by the users and/or by ARC; frequently used files are cached at the execution sites; cached files are indexed.
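The caching idea above can be illustrated with a small bash sketch of a URL-indexed cache. It is not ARC's implementation: the directory layout (files named after the SHA-1 hash of their source URL, under a two-character subdirectory), the function names `cache_path` and `cache_get`, and the use of a local file in place of a real Grid transfer are all assumptions made for illustration. It relies on GNU coreutils `sha1sum`.

```shell
#!/usr/bin/env bash
# Illustrative URL-indexed file cache (hypothetical layout, not ARC's code).
# Each cached file is stored under the SHA-1 hash of its source URL, so the
# URL alone is enough to find a cached copy.

# cache_path CACHE_DIR URL: print where the cached copy of URL would be stored
cache_path() {
  local cache_dir=$1 url=$2 hash
  hash=$(printf '%s' "$url" | sha1sum | cut -d' ' -f1)
  # Use the first two hex digits as a subdirectory to keep directories small
  printf '%s/%s/%s\n' "$cache_dir" "${hash:0:2}" "${hash:2}"
}

# cache_get CACHE_DIR URL LOCAL_SRC: make sure URL is in the cache and print
# "hit" if it was already there, "miss" if it had to be fetched.
# LOCAL_SRC stands in for the real transfer (e.g. over gsiftp), which this
# sketch does not perform.
cache_get() {
  local cache_dir=$1 url=$2 src=$3 path
  path=$(cache_path "$cache_dir" "$url")
  if [[ -f $path ]]; then
    echo hit
  else
    mkdir -p "$(dirname "$path")"
    cp "$src" "$path"
    echo miss
  fi
}
```

Calling `cache_get` twice with the same URL prints `miss` and then `hit`: the second request reuses the cached copy instead of transferring the file again, which is the point of caching frequently used files at the execution site.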
ARC CE internals: all services run only on the frontend; Grid users are mapped to local identities; /tmp/user is used for files which are actively used.
ARC UI data manipulation: arcls lists the contents of a remote directory (specified by a URL) and shows some attributes of its objects; arccp copies files over the Grid; arcrm erases files and directories at any location specified by a valid URL; arcmkdir creates directories, if the protocol of the specified URL supports it.
ARC URLs, supported protocols:
ftp: ordinary File Transfer Protocol (FTP)
gsiftp: GridFTP, the Globus-enhanced FTP protocol with security, encryption, etc., developed by The Globus Alliance
http: ordinary Hypertext Transfer Protocol (HTTP) with PUT and GET methods, using multiple streams
https: HTTP with SSL v3
httpg: HTTP with Globus GSI
ldap: ordinary Lightweight Directory Access Protocol (LDAP) [9]
lfc: LFC catalog and indexing service of gLite [1]
srm: Storage Resource Manager (SRM) service [7]
root: Xrootd protocol (read-only, available in ARC 2.0.0 and later)
file: local to the host; file name with a full path
A URL can be used:
In standard form: protocol://[host[:port]]/file
Or, to enhance performance, with options: protocol://[host[:port]][;option[;option[...]]]/file
For indexing services, with a logical file name (lfn): protocol://[url[ url[...]]@]host[:port][;option[;option[...]]]/lfn[:metadataoption[:metadataoption[...]]]
With common options: protocol://[;commonoption[;commonoption]][url[ url[...]]@]host[:port][;option[;option[...]]]/lfn[:metadataoption[:metadataoption[...]]]
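To make the first two URL forms above concrete, here is a hedged bash sketch that splits a URL of the form protocol://[host[:port]][;option[;option[...]]]/file into its parts. The function name `parse_arc_url` and the `|`-separated output are illustrative choices; the sketch does not handle the replica-list (`url@`) or metadata-option forms, and it does not validate option names (`threads=4;secure=yes` in the test is only an example value).

```shell
#!/usr/bin/env bash
# parse_arc_url URL: print "protocol|host|port|options|path".
# Empty fields mean the part was omitted; options are ';'-separated.
parse_arc_url() {
  local url=$1
  # protocol :// host [:port] [;option...] [/path]
  local re='^([A-Za-z0-9+.-]+)://([^/:;]*)(:([0-9]+))?((;[^/;]*)*)(/.*)?$'
  if [[ $url =~ $re ]]; then
    local opts=${BASH_REMATCH[5]}
    printf '%s|%s|%s|%s|%s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}" \
      "${BASH_REMATCH[4]}" "${opts#;}" "${BASH_REMATCH[7]}"
  else
    echo "not a URL of the form protocol://host[:port][;options]/file" >&2
    return 1
  fi
}
```

For example, `parse_arc_url gsiftp://lscf.nbi.dk:2811/jobs/1323842831451666535/job.out` prints `gsiftp|lscf.nbi.dk|2811||/jobs/1323842831451666535/job.out`. Note that a URL containing `;` options must be quoted on the command line, because an unquoted `;` ends the shell command.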
URL examples
ARC UI:
arcls lfc://lfc.balticgrid.org/grid/balticgrid/bgcc2013/lab4/
arcls -l gsiftp://se.grid.eenet.ee/storage/balticgrid/bgcc2013
xRSL, to store the job output on a storage element:
(outputfiles=("jobhugeoutputfile.tgz" "gsiftp://se.grid.eenet.ee/storage/balticgrid/bgcc2013/user/"))
GridFTP: the gsiftp protocol offers the functionality of FTP, with added support for GSI (Grid Security Infrastructure). It is supported by all VOs in the Grid. Example:
arccp gsiftp://lscf.nbi.dk:2811/jobs/1323842831451666535/job.out job.out
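The arccp example above spells out port 2811, the standard GridFTP port. When a URL omits the port, the protocol's default applies. The bash sketch below, with the hypothetical function names `default_port` and `with_default_port`, fills in the well-known defaults for a few of the protocols listed earlier (ftp 21, gsiftp 2811, http 80, https 443, ldap 389) and leaves every other URL unchanged; it is an illustration, not ARC client code.

```shell
#!/usr/bin/env bash
# default_port PROTOCOL: print the well-known port, or fail if unknown here
default_port() {
  case $1 in
    ftp) echo 21 ;;
    gsiftp) echo 2811 ;;
    http) echo 80 ;;
    https) echo 443 ;;
    ldap) echo 389 ;;
    *) return 1 ;;
  esac
}

# with_default_port URL: insert ":port" after the host when the URL has no
# explicit port and the protocol's default is known; otherwise echo URL as-is
with_default_port() {
  local url=$1 port
  # protocol :// host (no port) followed by ";options", "/path" or nothing
  local re='^([A-Za-z0-9+.-]+)://([^/:;]+)([/;].*)?$'
  if [[ $url =~ $re ]] && port=$(default_port "${BASH_REMATCH[1]}"); then
    printf '%s://%s:%s%s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}" \
      "$port" "${BASH_REMATCH[3]}"
  else
    printf '%s\n' "$url"
  fi
}
```

For example, `with_default_port gsiftp://lscf.nbi.dk/jobs/1323842831451666535/job.out` prints the URL with `:2811` inserted after the host, matching the form used in the arccp example.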
File Catalogue (LFC): users and applications need to locate files (or replicas) on the Grid. The File Catalogue is the service which maintains the mappings between LFN(s) (logical file names), GUIDs (globally unique identifiers) and SURL(s) (storage URLs). Examples: lfc://lfc.balticgrid.org/grid/balticgrid/bgcc2013/lab4/p4_data.test, Lfc:P4_data.test
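The LFN to GUID to SURL mappings described above can be modelled with two bash associative arrays. This is a toy, in-memory sketch (bash 4 or later): the function names `register_replica` and `list_replicas`, the GUID `guid-0001` and the `srm://` SURLs used in the test are made up, and the real LFC is a networked service queried with its own client tools.

```shell
#!/usr/bin/env bash
# Toy, in-memory model of the mappings kept by a file catalogue such as LFC:
#   LFN (logical file name) -> GUID (unique identifier) -> SURLs (replicas)
# Illustration only: the real LFC is a networked service with its own clients.
declare -A LFN_TO_GUID=()    # logical name -> GUID
declare -A GUID_TO_SURLS=()  # GUID -> space-separated list of storage URLs

# register_replica LFN GUID SURL: record that SURL holds a copy of the file
# identified by GUID, and that LFN is a logical name for that GUID
register_replica() {
  local lfn=$1 guid=$2 surl=$3
  LFN_TO_GUID[$lfn]=$guid
  # Append the SURL, adding a separating space only if the list is non-empty
  GUID_TO_SURLS[$guid]+="${GUID_TO_SURLS[$guid]:+ }$surl"
}

# list_replicas LFN: print every known SURL for LFN, one per line;
# return 1 if the LFN is not in the catalogue
list_replicas() {
  local lfn=$1 guid
  local -a surls
  guid=${LFN_TO_GUID[$lfn]-}
  [[ -n $guid ]] || return 1
  read -ra surls <<< "${GUID_TO_SURLS[$guid]}"
  printf '%s\n' "${surls[@]}"
}
```

Registering two replicas of p4_data.test under one GUID and then calling `list_replicas` on its LFN prints both SURLs, one per line: a catalogue lookup is two steps, from name to GUID and from GUID to the physical copies.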
Relationships between tables
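LFC environment:
#!/bin/bash
# Source this script (instead of executing it) so the variables stay set in your shell
# Information system (BDII) used to discover Grid services
export LCG_GFAL_INFOSYS=bdii.balticgrid.org:2170
# Use LFC as the file catalogue, hosted on lfc.balticgrid.org
export LCG_CATALOG_TYPE=lfc
export LFC_HOST=lfc.balticgrid.org
echo -e 'Printing variables: LCG_GFAL_INFOSYS; LCG_CATALOG_TYPE; LFC_HOST \n'
echo "$LCG_GFAL_INFOSYS"; echo "$LCG_CATALOG_TYPE"; echo "$LFC_HOST"
# Default LFC directory, used to resolve relative file names
export LFC_HOME=/grid/balticgrid/BGCC2012/Hardi_Teder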
Clean up after yourself: delete the files you don't use any more (e.g. with arcrm).
References
Pictures used from: CMS experiment public presentations, http://cms.web.cern.ch/org/cms-presentations-public; NorduGrid repository, http://svn.nordugrid.org/trac/nordugrid/browser/doc/trunk/figures; FREEIMAGES.co.uk, www.freeimages.co.uk.
More information about ARC data management: http://www.nordugrid.org/papers.html
Thank you!
More information: Hardi Teder, hardi@eenet.ee; http://courses.cs.ut.ee/2013/cloud; ati.gtla@lists.ut.ee