Dynamic Federations
Seamless aggregation of standard-protocol-based storage endpoints
Fabrizio Furano, Patrick Fuhrmann, Paul Millar, Daniel Becker, Adrien Devresse, Oliver Keeble, Ricardo Brito da Rocha, Alejandro Alvarez
Credits to ShuTing Liao (ASGC)
WLCG Computing Model
[Diagram: Workers, App, Data, Cernvmfs]
18 Sept 2012 F.Furano - Dynamic federations
Storage Federations: Motivations
- Currently data lives on islands of storage
  - catalogues are the maps
  - FTS/gridFTP are the delivery companies
  - Experiment frameworks populate the islands
  - Jobs are directed to places where the needed data is, or should be...
- Almost all data lives on more than one island
- Assumption: perfect storage (unlikely to impossible), perfect experiment workflow and catalogues (unlikely)
- Strict locality has some limitations
  - a single missing file can derail the whole job or series of jobs
  - -> Failover to data on another island could help
- Replica catalogues impose limitations, too
  - e.g. synchronization is difficult, performance too
- Quest for direct, Web-like forms of data access
- Great plus: other use cases may be fulfilled, e.g. site caching, sharing storage amongst sites
Storage federations
- What's the goal?
  - Make different storage clusters be seen as one
  - Make global file-based data access seamless
- How should this be done?
  - Dynamically
    - easy to set up and maintain
    - no complex metadata persistency
    - no DB babysitting (keep it for the experiment's metadata)
    - no replica catalogue inconsistencies, by design
  - Light config constraints on participating storage
  - Using standards
    - No strange APIs, everything looks familiar
    - Global direct access to global data
The basic idea
- We see this (aggregation):
  /dir1
  /dir1/file1
  /dir1/file2 (with 2 replicas)
  /dir1/file3
- All the metadata interactions are hidden
- NO persistency needed here, just efficiency and parallelism
- Storage/MD endpoint 1: /dir1/file1, /dir1/file2
- Storage/MD endpoint 2: /dir1/file2, /dir1/file3
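The aggregation on this slide can be sketched in a few lines. This is only an illustration of the idea, not how the federator is implemented: the endpoint names and the function `aggregate_listings` are hypothetical, and the listings are assumed to be already fetched and expressed in the common name space.

```python
from collections import defaultdict

def aggregate_listings(listings):
    """Merge per-endpoint listings into one view: path -> endpoints holding it."""
    merged = defaultdict(set)
    for endpoint, paths in listings.items():
        for path in paths:
            merged[path].add(endpoint)
    # Sort paths and endpoint names so the merged view is deterministic.
    return {path: sorted(eps) for path, eps in sorted(merged.items())}

# Hypothetical endpoint names; the file layout mirrors the slide.
listings = {
    "endpoint1": ["/dir1/file1", "/dir1/file2"],
    "endpoint2": ["/dir1/file2", "/dir1/file3"],
}

view = aggregate_listings(listings)
for path, endpoints in view.items():
    print(path, len(endpoints), "replica(s)")
```

Nothing is stored between calls: the merged view is rebuilt from whatever the endpoints report, which is the sense in which no persistency is needed.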
Dynamic HTTP Federations
- Federation
  - Simplicity, redundancy, storage/network efficiency, elasticity, performance
- Dynamic: does everything on the fly, no DB
- Focus on HTTP/DAV
  - Standard clients everywhere
  - One protocol for everything (WAN/LAN)
  - Transparent redirection
- Use cases
  - Easy, direct job/user data access, WAN friendly
  - Access missing files after job starts
  - Friend sites can share storage
  - Cache integration (future)
What is federated?
- We federate (meta)data repositories that are compatible
  - HTTP interface
  - Name space (modulo simple prefixes)
    - Including catalogues
  - Permissions (they don't contradict across sites)
  - Content (same key or filename means same file [modulo translations])
- Dynamically and transparently discovering metadata
  - looks like a unique, very fast file metadata system
  - properly presenting the aggregated metadata views
  - redirecting clients to the geographically closest endpoint
    - Local SE is preferred
    - The system can also load a Geo plugin
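Redirection to the geographically closest endpoint could look roughly like the sketch below. The slides do not say how the Geo plugin ranks endpoints, so great-circle distance is an assumption here, as are the function names and the approximate site coordinates.

```python
import math

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def closest_replica(client, replicas):
    """Return the name of the replica site nearest to the client."""
    return min(replicas, key=lambda name: haversine_km(client, replicas[name]))

# Approximate, illustrative coordinates (assumption, not from the slides).
REPLICAS = {
    "CERN": (46.2, 6.1),    # Geneva area
    "DESY": (53.6, 10.0),   # Hamburg area
    "ASGC": (25.0, 121.5),  # Taipei area
}

print(closest_replica((48.9, 2.3), REPLICAS))    # a client near Paris
print(closest_replica((35.7, 139.7), REPLICAS))  # a client near Tokyo
```

A real deployment would first have to map the client IP address to a location; that step is left out of this sketch.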
What is federated?
- Technically, TODAY we can aggregate:
  - SEs with DAV/HTTP interfaces
    - dCache, DPM
    - Future: Xrootd? EOS? StoRM?
  - Catalogues with DAV/HTTP interfaces
    - LFC supported
    - Future: Experiment catalogues could be integrated
  - Cloud DAV/HTTP/S3 services
  - Anything else that happens to have an HTTP interface
    - Caches
  - Native LFC and DPM databases
Why HTTP/DAV?
- It's everywhere
  - A very widely adopted technology
- It has the right features
  - Redirection, WAN friendly
- Convergence
  - Transfers and data access
  - No other protocols required
- We (humans) like browsers, they give an experience of simplicity
- Open to direct access and integrated web apps
DPM/HTTP
- DPM has invested significantly in HTTP as part of the EMI project
  - New HTTP/DAV interface
  - Parallel WAN transfers
  - 3rd party copy
  - Solutions for replica fallback
    - Global access and metalink
- Performance evaluations
  - Experiment analyses
  - Hammercloud
  - Synthetic tests
  - Root tests
Demo
- We have set up a stable demo testbed, using HTTP/DAV
  - Head node at DESY: http://federation.desy.de/myfed/
  - a DPM instance at CERN
  - a DPM instance at ASGC (Taiwan)
  - a dCache instance at DESY
  - a Cloud storage account by Deutsche Telekom
- The feeling it gives is surprising
  - Metadata performance is on average higher than when contacting the endpoints directly
- We see the directories as merged, as if it were only one system
- There's one test file in 3 sites, i.e. 3 replicas: /myfed/atlas/fabrizio/hand-shake.jpg
  - Clients in EU get the one from DESY/DT/CERN
  - Clients in Asia get the one from ASGC
- There's a directory whose content is interleaved between CERN and DESY: http://federation.desy.de/myfed/dteam/ugrtest/interleaved/
- There's a directory where all the files are in two places: http://federation.desy.de/myfed/dteam/ugrtest/all/
Example
[Diagram: Client; Frontend (Apache2+DMLite); Aggregator (UGR); plugins: DMLite, DAV/HTTP, HTTP; backends: LFC or DB, LFC, and SEs reached over plain DAV/HTTP]
Design and performance
- Full parallelism
  - Composes the aggregated metadata views on the fly by managing parallel tasks of information location
  - Never stacks up latencies!
- The endpoints are treated in a completely independent way
  - No global locks/serialisations!
  - Thread pools and producer/consumer queues are used extensively (e.g. to stat N items in M endpoints while X clients wait for some items)
- Aggressive metadata caching
  - The metadata caching keeps the performance high
  - Peak raw cache performance is ~500K->1M hits/s per core
- A relaxed, hash-based, in-memory partial name space
  - Juggles info in order to always contain what's needed
  - Keep them in an LRU fashion and we have a fast 1st level namespace cache
  - Stalls clients for the minimum time that is necessary to juggle their information bits
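The "never stacks up latencies" point can be illustrated with a small sketch: one stat request is fanned out to all endpoints through a thread pool, so the wait is roughly the slowest endpoint rather than the sum of all of them. The endpoint table, the simulated latencies and the function names are hypothetical; the real system uses its own thread pools and queues.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical endpoints: name -> (simulated latency in seconds, path -> size).
ENDPOINTS = {
    "cern": (0.10, {"/dir1/file1": 100, "/dir1/file2": 200}),
    "desy": (0.10, {"/dir1/file2": 200, "/dir1/file3": 300}),
    "asgc": (0.10, {"/dir1/file1": 100}),
}

def stat_on_endpoint(name, path):
    """Simulate a remote stat: sleep for the latency, then look the path up."""
    latency, files = ENDPOINTS[name]
    time.sleep(latency)
    return name, files.get(path)

def federated_stat(path, pool):
    """Ask every endpoint in parallel; keep the ones that hold the path."""
    futures = [pool.submit(stat_on_endpoint, name, path) for name in ENDPOINTS]
    found = {}
    for future in futures:
        name, size = future.result()
        if size is not None:
            found[name] = size
    return found

with ThreadPoolExecutor(max_workers=len(ENDPOINTS)) as pool:
    start = time.perf_counter()
    replicas = federated_stat("/dir1/file2", pool)
    elapsed = time.perf_counter() - start

print(replicas, f"{elapsed:.2f}s")
```

Queried one after the other, the three simulated endpoints would take about 0.3 s; in parallel the call returns in about the time of a single one.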
Server architecture
- Clients come and are distributed through:
  - different machines (DNS alias)
  - different processes (Apache config)
- Clients are served by the UGR. They can browse/stat or be redirected for action.
- The architecture is multi/manycore friendly and uses a fast parallel caching scheme
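As a rough picture of a first-level namespace cache kept in LRU fashion, here is a generic thread-safe LRU map. It is not the UGR cache, whose internals are not described in these slides; the class name `MetadataCache` and its methods are made up for the example.

```python
import threading
from collections import OrderedDict

class MetadataCache:
    """Small thread-safe LRU map from path to metadata (illustrative only)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()
        self._lock = threading.Lock()

    def get(self, path):
        with self._lock:
            if path not in self._items:
                return None
            self._items.move_to_end(path)  # mark as most recently used
            return self._items[path]

    def put(self, path, info):
        with self._lock:
            self._items[path] = info
            self._items.move_to_end(path)
            if len(self._items) > self.capacity:
                self._items.popitem(last=False)  # evict least recently used

cache = MetadataCache(capacity=2)
cache.put("/dir1/file1", {"size": 100})
cache.put("/dir1/file2", {"size": 200})
cache.get("/dir1/file1")                 # file1 becomes most recently used
cache.put("/dir1/file3", {"size": 300})  # capacity exceeded: file2 is evicted
print(cache.get("/dir1/file2"))
```

A single lock is the simplest choice for a sketch; a cache meant to serve many cores would likely shard or relax the locking.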
Name translation
- A sophisticated scheme of name translation is key to being able to federate almost any source of metadata
- UGR implements algorithmic translations and can accommodate non-algorithmic ones as well
- A plugin could also query an external service (e.g. an LFC or a private DB)
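An algorithmic translation can be as simple as swapping a prefix, and a non-algorithmic one can be a lookup. The sketch below shows both under stated assumptions: the endpoint prefix `/dpm/example.org/home`, the lookup table and the function names are invented for illustration and do not come from UGR.

```python
def make_prefix_translator(federation_prefix, endpoint_prefix):
    """Build a function mapping federation paths to one endpoint's paths."""
    def to_endpoint(path):
        # Match whole path components only, so /myfed does not match /myfedX.
        if path != federation_prefix and not path.startswith(federation_prefix + "/"):
            return None
        return endpoint_prefix + path[len(federation_prefix):]
    return to_endpoint

def translate(path, algorithmic, lookup_table):
    """Try the table first (non-algorithmic), then the prefix rule."""
    if path in lookup_table:
        return lookup_table[path]
    return algorithmic(path)

# Hypothetical endpoint prefix and lookup entry.
to_dpm = make_prefix_translator("/myfed", "/dpm/example.org/home")
table = {"/myfed/special/file": "/legacy/store/0001"}

print(translate("/myfed/atlas/file1", to_dpm, table))
print(translate("/myfed/special/file", to_dpm, table))
print(translate("/other/file", to_dpm, table))
```

In the same spirit, the table lookup could be replaced by a query to an external service such as a catalogue.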
Design and performance
- Horizontally scalable deployment
  - Multithreaded
  - DNS balanceable
- High performance DAV client implementation
  - Wraps DAV calls into a POSIX-like API, saving the difficulty of composing requests/responses
  - Performance is privileged: uses libneon with session caching
  - Compound list/stat operations are supported
  - Loaded by the core as a location plugin
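To give a feeling for what wrapping DAV calls into a POSIX-like API involves, the sketch below turns a WebDAV PROPFIND multistatus body into stat-like records. It is written in Python for brevity and is not the libneon-based client mentioned above; `StatInfo`, `parse_multistatus` and the sample response are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import NamedTuple

DAV = "{DAV:}"  # ElementTree spelling of the WebDAV XML namespace

class StatInfo(NamedTuple):
    path: str
    size: int
    is_dir: bool

# A made-up multistatus body of the kind a PROPFIND on a directory returns.
SAMPLE = """<D:multistatus xmlns:D="DAV:">
  <D:response>
    <D:href>/myfed/dir1/</D:href>
    <D:propstat>
      <D:prop><D:resourcetype><D:collection/></D:resourcetype></D:prop>
      <D:status>HTTP/1.1 200 OK</D:status>
    </D:propstat>
  </D:response>
  <D:response>
    <D:href>/myfed/dir1/file1</D:href>
    <D:propstat>
      <D:prop><D:resourcetype/><D:getcontentlength>1024</D:getcontentlength></D:prop>
      <D:status>HTTP/1.1 200 OK</D:status>
    </D:propstat>
  </D:response>
</D:multistatus>"""

def parse_multistatus(xml_text):
    """Return one StatInfo per <response> element in a multistatus body."""
    entries = []
    for resp in ET.fromstring(xml_text).findall(DAV + "response"):
        href = resp.findtext(DAV + "href")
        prop = resp.find(DAV + "propstat").find(DAV + "prop")
        rtype = prop.find(DAV + "resourcetype")
        is_dir = rtype is not None and rtype.find(DAV + "collection") is not None
        size = int(prop.findtext(DAV + "getcontentlength", default="0"))
        entries.append(StatInfo(href, size, is_dir))
    return entries

entries = parse_multistatus(SAMPLE)
for entry in entries:
    print(entry)
```

One directory-level PROPFIND yields both the listing and the per-entry metadata, which is the kind of compound list/stat operation the slide refers to.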
A performance test
- Two endpoints: DESY and CERN (poor VM)
- One UGR frontend at DESY
- Swarm of test clients at CERN
- 10K files in a 4-level-deep directory
  - Files exist on both endpoints
- The test (written in C++) invokes stat only once per file, using many parallel clients doing stat() at the maximum pace from 3 machines
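The shape of such a test can be sketched as follows. The original test was written in C++ and ran against real endpoints; this Python version only shows the structure, with a stub in place of the remote stat. The directory layout (fan-out of 10 at each of 4 levels, giving 10,000 files), the number of clients and all names are assumptions.

```python
import itertools
import time
from concurrent.futures import ThreadPoolExecutor

def make_paths(fanout=10, depth=4, root="/test"):
    """Yield fanout**depth file paths, e.g. /test/d0/d0/d0/f0 (assumed layout)."""
    for parts in itertools.product(range(fanout), repeat=depth):
        dirs = "/".join(f"d{p}" for p in parts[:-1])
        yield f"{root}/{dirs}/f{parts[-1]}"

def fake_stat(path):
    """Stub standing in for a remote stat(); a real test would issue a request."""
    return len(path)

def run_benchmark(stat_fn, paths, n_clients):
    """Stat every path exactly once using n_clients workers; return (count, rate)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_clients) as pool:
        results = list(pool.map(stat_fn, paths))
    elapsed = time.perf_counter() - start
    return len(results), len(results) / elapsed

paths = list(make_paths())
count, rate = run_benchmark(fake_stat, paths, n_clients=32)
print(f"{count} stats, {rate:.0f} stats/s")
```

With a stub the rate says nothing about a federation; the number only becomes meaningful once `fake_stat` is replaced by a call to a real frontend.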
The result, WAN access
[Figure: plot of the WAN access test results]
Another test, LAN, cache impact
[Figure: plot of the LAN test showing the cache impact]
Another test, LAN, access patterns
[Figure: plot of the LAN test for different access patterns]
Get started
- Get it here: https://svnweb.cern.ch/trac/lcgdm/wiki/dynafeds
- What you can do with it:
  - Easy, direct job/user data access, WAN friendly
  - Access missing files after job starts
  - Friend sites can share storage
  - Diskless sites
  - Federating catalogues
    - Combining catalogue-based and catalogue-free data
Next steps
- Release our beta, as the nightlies are good
- More massive tests, with many endpoints, possibly distant
  - We are now looking for partners
- Precise performance measurements
- Refine the handling of the death of the endpoints
- Immediate sensing of changes in the endpoints' content, e.g. add, delete
  - SEMsg in EMI2 SYNCAT would be the right thing in the right place
- Some more practical experience (getting used to the idea, using SQUIDs, CVMFS, EOS, clouds,... <put your item here>)
References
- Wiki page and packages: https://svnweb.cern.ch/trac/lcgdm/wiki/dynafeds
- CHEP papers
  - Federation: http://cdsweb.cern.ch/record/1460525?ln=en
  - DPM & dmlite: https://cdsweb.cern.ch/record/1458022?ln=en
  - HTTP/dav: https://cdsweb.cern.ch/record/1457962?ln=en
Conclusions
- Dynamic Federations: an efficient, persistency-free, easily manageable approach to federate remote storage endpoints
- HTTP, standard, WAN and cloud friendly
- Interoperating with and augmenting the xrootd ones is desirable and productive
- Work in progress, status is very advanced: demoable, installable, documented.
Thank you
Questions?
EMI is partially funded by the European Commission under Grant Agreement INFSO-RI-261611