Grid Data Management


Grid Data Management, Week #4
Hardi Teder hardi@eenet.ee
University of Tartu, March 6th 2013

Overview
- Grid Data Management
- Where does the data come from?
- Grid Data Management tools

Grid foundations

Where does the data come from?
An example: the CMS experiment at CERN's LHC
- CERN: European Organization for Nuclear Research
- LHC: Large Hadron Collider
- CMS: Compact Muon Solenoid

Grid acronyms
- EGI Glossary: http://www.egi.eu/about/glossary/ (a Google search also helps)
- EGI Security Policy Glossary of Terms: https://documents.egi.eu/public/showdocument?docid=71

Large Hadron Collider (LHC)

Smash things together, see what happens!

Discover particles
- Quarks: up, down, charm, strange, top, bottom
- Leptons: electron, electron neutrino, muon, muon neutrino, tau, tau neutrino

Large Hadron Collider (LHC)

CMS detector
- Took ~2000 scientists and engineers more than 20 years to design and build
- Is about 15 metres wide and 21.5 metres long
- Weighs about 14,000 t, twice as much as the Eiffel Tower
- Uses the largest, most powerful magnet of its kind ever made


Collisions in CMS

CMS in production
- volume: ~250 TB/day moved among dozens of Tiers
- files: ~19M logical files (~27M including replicas)
- throughput: 2-2.5 GB/s aggregate (weekly averages) in peak weeks in 2012
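As a quick sanity check on these numbers, ~250 TB/day implies an average rate of roughly 2.9 GB/s, the same order as the quoted 2-2.5 GB/s weekly averages:

```shell
#!/bin/sh
# Back-of-the-envelope check: what aggregate rate does ~250 TB/day imply?
# 250 TB/day divided by 86400 s/day, reported in GB/s (decimal units).
awk 'BEGIN { printf "%.1f GB/s\n", 250e12 / 86400 / 1e9 }'
```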

Worldwide LHC Computing Grid (WLCG)
- Tier0 at CERN
- 11 Tier1 sites
- 138 Tier2 sites

WLCG
~15 Petabytes of data generated annually

There are more projects
- DNA experiments
- Radio telescopes
- Sensor networks
- Digitizing data: books, documents, images

Grid foundations

Data management
Data access and transfer
- Simple, automatic multi-protocol file transfer tools, integrated with the Resource Management service
- Move data from the local machine to the remote machine where the job is executed (input file staging)
- Move the output files from the remote machine back to the local machine (output file staging)
- Pull the executable from a remote location
- For secure, high-performance, reliable file transfer over modern WANs: GridFTP
Data replication and management
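The staging steps above can be sketched as follows, with plain cp standing in for a real GridFTP transfer and temporary directories standing in for the two machines (all paths and file names here are hypothetical):

```shell
#!/bin/sh
# Sketch of input/output file staging. Local directories stand in for
# the submit host and the execution host; cp stands in for GridFTP.
set -e

submit_dir=$(mktemp -d)    # "local machine"
exec_dir=$(mktemp -d)      # "remote machine" where the job runs

echo "input data" > "$submit_dir/job.in"

# Input file staging: move data to where the job will run.
cp "$submit_dir/job.in" "$exec_dir/job.in"

# The job runs on the execution host and produces an output file.
(cd "$exec_dir" && tr a-z A-Z < job.in > job.out)

# Output file staging: bring the result back to the local machine.
cp "$exec_dir/job.out" "$submit_dir/job.out"

cat "$submit_dir/job.out"   # prints "INPUT DATA"
```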

ARC Computing Element (CE)
- Universal frontend for different batch systems
- Standard and custom interfaces
- Status information publishing
- File handling

ARC CE and data handling
- Data are moved by the users and/or by the ARC CE
- Frequently used files are cached at the execution sites
- Cached files are indexed
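The caching idea can be sketched like this; keying the cache by a hash of the source URL illustrates the general technique only, and is not ARC's actual cache layout (the URL and paths are hypothetical):

```shell
#!/bin/sh
# Sketch: keep downloaded input files under a key derived from the
# source URL, so repeated jobs reuse the local copy. Illustrative only;
# ARC's real cache layout and index differ.
cache_dir=$(mktemp -d)

fetch_cached() {
    url=$1
    key=$(printf '%s' "$url" | sha1sum | cut -d' ' -f1)
    cached="$cache_dir/$key"
    if [ ! -f "$cached" ]; then
        # A real implementation would download here (e.g. via GridFTP);
        # we just record the URL to keep the sketch self-contained.
        printf 'data for %s\n' "$url" > "$cached"
    fi
    printf '%s\n' "$cached"
}

a=$(fetch_cached "gsiftp://se.example.org/data/file1")
b=$(fetch_cached "gsiftp://se.example.org/data/file1")
[ "$a" = "$b" ] && echo "cache hit"
```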

ARC CE internals
- All services run only on the frontend
- Grid users are mapped to local identities
- /tmp/user is used for files which are actively used

ARC UI data manipulation
- arcls: list contents and view some attributes of objects in a remote directory specified by a URL
- arccp: a tool to copy files over the Grid
- arcrm: allows users to erase files and directories at any location specified by a valid URL
- arcmkdir: allows users to create directories, if the protocol of the specified URL supports it

ARC URLs
- ftp: ordinary File Transfer Protocol (FTP)
- gsiftp: GridFTP, the Globus-enhanced FTP protocol with security, encryption, etc., developed by the Globus Alliance
- http: ordinary Hyper-Text Transfer Protocol (HTTP) with PUT and GET methods, using multiple streams
- https: HTTP with SSL v3
- httpg: HTTP with Globus GSI
- ldap: ordinary Lightweight Directory Access Protocol (LDAP) [9]
- lfc: LFC catalog and indexing service of gLite [1]
- srm: Storage Resource Manager (SRM) service [7]
- root: Xrootd protocol (read-only, available in ARC 2.0.0 and later)
- file: file local to the host, file name with a full path

A URL can be used:
In standard form:
  protocol://[host[:port]]/file
Or, to enhance performance, with options:
  protocol://[host[:port]][;option[;option[...]]]/file
For index services, with locations and metadata options:
  protocol://[url[|url[...]]@]host[:port][;option[;option[...]]]/lfn[:metadataoption[:metadataoption[...]]]
  protocol://[;commonoption[;commonoption]][url[|url[...]]@]host[:port][;option[;option[...]]]/lfn[:metadataoption[:metadataoption[...]]]
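Splitting the standard form protocol://[host[:port]]/file can be sketched with POSIX parameter expansion; this is a minimal illustration, not what ARC clients actually do, and the URL below is hypothetical:

```shell
#!/bin/sh
# Split a URL of the standard form protocol://[host[:port]]/file into
# its parts using POSIX parameter expansion. Sketch only; real ARC
# clients also handle options, index services and metadata options.
url="gsiftp://se.grid.eenet.ee:2811/storage/balticgrid/file.dat"

protocol=${url%%://*}            # text before "://"
rest=${url#*://}                 # text after "://"
hostport=${rest%%/*}             # up to the first "/"
path=/${rest#*/}                 # everything after host[:port]
host=${hostport%%:*}
port=${hostport#*:}              # equals $hostport when no port is given

echo "$protocol $host $port $path"
```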

URL examples
ARC UI:
  arcls lfc://lfc.balticgrid.org/grid/balticgrid/bgcc2013/lab4/
  arcls -l gsiftp://se.grid.eenet.ee/storage/balticgrid/bgcc2013
XRSL, to store the job output to storage:
  (outputfiles=("jobhugeoutputfile.tgz" "gsiftp://se.grid.eenet.ee/storage/balticgrid/bgcc2013/user/"))

GridFTP
The GSIFTP protocol offers the functionality of FTP, but with support for GSI. Supported by all VOs in the Grid.
  arccp gsiftp://lscf.nbi.dk:2811/jobs/1323842831451666535/job.out job.out

File Catalogue (LFC)
Users and applications need to locate files (or replicas) on the Grid. The File Catalogue is the service which maintains mappings between LFN(s), GUID and SURL(s).
  lfc://lfc.balticgrid.org/grid/balticgrid/bgcc2013/lab4/p4_data.test
  lfc:p4_data.test
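The mapping chain can be sketched as a tiny lookup: an LFN resolves to a GUID, and the GUID resolves to one or more SURLs. A flat text table stands in for the real catalogue database, and all names below are hypothetical:

```shell
#!/bin/sh
# Sketch of the LFC mapping chain LFN -> GUID -> SURL(s), using a flat
# text table in place of a real catalogue database. Hypothetical data:
# column 1 is the LFN, column 2 the GUID, the rest are replica SURLs.
catalog=$(mktemp)
cat > "$catalog" <<'EOF'
/grid/balticgrid/lab4/p4_data.test guid-1234 srm://se1.example.org/data/p4 srm://se2.example.org/data/p4
EOF

lfn_to_guid()   { awk -v lfn="$1" '$1 == lfn { print $2 }' "$catalog"; }
guid_to_surls() { awk -v g="$1" '$2 == g { for (i = 3; i <= NF; i++) print $i }' "$catalog"; }

guid=$(lfn_to_guid /grid/balticgrid/lab4/p4_data.test)
echo "GUID: $guid"
guid_to_surls "$guid"      # lists every replica's SURL
```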

Relationships between tables

LFC environment
  #!/bin/bash
  export LCG_GFAL_INFOSYS=bdii.balticgrid.org:2170
  export LCG_CATALOG_TYPE=lfc
  export LFC_HOST=lfc.balticgrid.org
  echo -e 'Printing variables: LCG_GFAL_INFOSYS; LCG_CATALOG_TYPE; LFC_HOST \n'
  echo $LCG_GFAL_INFOSYS; echo $LCG_CATALOG_TYPE; echo $LFC_HOST
  export LFC_HOME=/grid/balticgrid/BGCC2012/Hardi_Teder

Clean up after yourself
Delete the files you don't use any more

References
I used several pictures from:
- CMS experiment public presentations: http://cms.web.cern.ch/org/cms-presentations-public
- NorduGrid repository: http://svn.nordugrid.org/trac/nordugrid/browser/doc/trunk/figures
- FREEIMAGES.co.uk: www.freeimages.co.uk
More information about ARC Data Management: http://www.nordugrid.org/papers.html

Thank you
More information from:
Hardi Teder hardi@eenet.ee
http://courses.cs.ut.ee/2013/cloud
ati.gtla@lists.ut.ee