Experience of Using Data Grid Simulation Packages


Experience of Using Data Grid Simulation Packages. Nechaevskiy A.V. (SINP MSU), Korenkov V.V. (LIT JINR). Dubna, 2008

Content: Operation of the LCG DataGrid. Errors of the FTS services of the Grid. Primary goals of the Grid simulation systems. The OptorSim and GridSim simulators. Results of the LCG DataGrid simulation with OptorSim. Grid solution for the LHC experiments: Tier-2s and Tier-1s are inter-connected by general purpose research networks; any Tier-2 may access data at any Tier-1. (Diagram: Tier-1 centres BNL, Nordic, IN2P3, GridKa, TRIUMF, ASCC, FNAL, CNAF, SARA, PIC, RAL.)

LHC experiments support. The error descriptions used in FTS monitoring are: Scope - the source of the error (SOURCE - source site, DESTINATION - destination site, TRANSFER - during transfer); Category - the error class (FILE-EXIST, NO-SPACE-LEFT, TRANSFER-TIMEOUT, etc.); Phase - the stage of the transfer life cycle at which the error occurred (ALLOCATION, TRANSFER-PREPARATION, TRANSFER, etc.); Message - the detailed description of the error. We have a list of more than 400 different error patterns, and it changes over time. The main faults identified during the monitoring period were timeouts, program errors, application-specific errors and user errors. Examples:
SOURCE during PREPARATION phase: [REQUEST_TIMEOUT] failed to prepare source file in 180 seconds
TRANSFER during TRANSFER phase: [TRANSFER_TIMEOUT] gridftp_copy_wait: Connection timed out. The server sent an error response: 425 425 Can't open data connection. timed out() failed
DESTINATION during PREPARATION phase: [CONNECTION] failed to contact on remote SRM [srm]. Givin' up after 3 tries
Detailed error descriptions: https://twiki.cern.ch/twiki/bin/view/lcg/transferoperationspopularerrors
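As a rough illustration of how such error reports break down into scope, phase and category, the following Java sketch parses lines of the form quoted above. The class and method names are invented for this example; this is not the actual FTS monitoring code.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative classifier: maps a raw FTS error line to (scope, phase, category, message)
// by matching the "<SCOPE> during <PHASE> phase: [<CATEGORY>] <message>" shape quoted above.
// Names here are assumptions for the sketch, not the real FTS monitoring tooling.
public class FtsErrorClassifier {

    public record Classified(String scope, String phase, String category, String message) {}

    private static final Pattern FORMAT =
        Pattern.compile("(SOURCE|DESTINATION|TRANSFER) during (\\w+) phase: \\[(\\w+)\\] (.*)");

    public static Classified classify(String rawError) {
        Matcher m = FORMAT.matcher(rawError);
        if (m.matches()) {
            return new Classified(m.group(1), m.group(2), m.group(3), m.group(4));
        }
        return new Classified("UNKNOWN", "UNKNOWN", "UNKNOWN", rawError);
    }

    public static void main(String[] args) {
        String line = "SOURCE during PREPARATION phase: "
                    + "[REQUEST_TIMEOUT] failed to prepare source file in 180 seconds";
        System.out.println(classify(line));
    }
}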

The primary goals addressed by DataGrid simulation tools. Grid simulators: SimGrid, OptorSim, GridSim. Simulation allows various experiments to be carried out on the object under investigation; it allows a number of unexpected situations to be predicted and prevented; it makes it possible to determine, with minimum variation, the equipment needed for data transfer and data storage to satisfy the project requirements; it also makes it possible to check how the system works, to find its "bottlenecks", and much more.

Requirements for a Grid simulator. It is obvious that a simulator must include: simulation of the operation of the DataGrid's basic elements (storage elements (SE), resource brokers (RB), replica catalogues (RC), network, users, sites); a simulation time much shorter than the real running time of the DataGrid; different kinds of statistics (for example, volume of data transfers, throughput, etc.); simulation of equipment failures; and results that are comparable to the real situation.
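Purely as an illustration of the elements listed above, they could be modelled with Java interfaces such as the following. All interface and method names are assumptions for the sketch and do not correspond to OptorSim or GridSim classes.

// Hypothetical element interfaces for a DataGrid simulator
// (sketched in one file for brevity; names are illustrative only).
interface StorageElement {              // SE
    boolean store(String fileName, long sizeMB);
    boolean hasFile(String fileName);
    long freeSpaceMB();
}

interface ReplicaCatalogue {            // RC: logical name -> physical locations
    void register(String lfn, String pfn);
    java.util.List<String> lookup(String lfn);
}

interface ResourceBroker {              // RB: decides where a job runs
    String chooseSite(String jobDescription);
}

interface NetworkLink {                 // link with finite, possibly failing, bandwidth
    double bandwidthMbps();
    void setFailed(boolean failed);     // hook for equipment-failure simulation
    boolean isFailed();
}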

OptorSim. OptorSim allows various optimisation algorithms and replication strategies to be evaluated. Implemented in Java. Configuration files are used to set the simulation parameters. The source code is available: edg-wp2.web.cern.ch/edgwp2/optimization/optorsim.html

Implementation of the Replica Catalogue in the LCG and in OptorSim. LCG: the file catalogue LFC stores the information about all the files and their replicas in the LCG; it is one of the critical services. Logical File Name (LFN): an alias created by a user to refer to some item of data, e.g. lfn:cms/20030203/run2/track1. Globally Unique Identifier (GUID): a non-human-readable unique identifier for an item of data, e.g. guid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6. Site URL (SURL) / Physical File Name (PFN) / Site File Name (SFN): the location of an actual piece of data on a storage system, e.g. srm://srm.cern.ch/castor/cern.ch/grid/cms/output10_1. OptorSim: file information is stored in the Replica Catalogue (as in the LCG). The Replica Catalogue is a list of mappings from LFNs to their physical file names (LFN and PFN in the LCG). The Replica Manager manages data replication and registers files in the Replica Catalogue (in the LCG the cataloguing of files is implemented in the LFC). The "best" placement of a replica is defined before the transfer; this allows sites to copy files from different sources in order to avoid heavy loading of the resources.
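To make the LFN-to-PFN mapping concrete, here is a minimal catalogue sketch in Java. The class and method names are invented for illustration; this is neither the LFC nor OptorSim's replica catalogue.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy replica catalogue: one logical file name (LFN) maps to the list of
// physical replicas (PFN/SURL) currently registered for it.
class ToyReplicaCatalogue {
    private final Map<String, List<String>> replicas = new HashMap<>();

    // Register a new physical replica of a logical file.
    void register(String lfn, String pfn) {
        replicas.computeIfAbsent(lfn, k -> new ArrayList<>()).add(pfn);
    }

    // All known physical locations of a logical file (empty if unknown).
    List<String> lookup(String lfn) {
        return replicas.getOrDefault(lfn, List.of());
    }

    public static void main(String[] args) {
        ToyReplicaCatalogue rc = new ToyReplicaCatalogue();
        rc.register("lfn:cms/20030203/run2/track1",
                    "srm://srm.cern.ch/castor/cern.ch/grid/cms/output10_1");
        System.out.println(rc.lookup("lfn:cms/20030203/run2/track1"));
    }
}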

OptorSim's graphical interface. The statistics are available as tables, graphs and diagrams.

GridSim. GridSim allows various classes of heterogeneous resources, users, applications and brokers to be simulated. Implemented in Java. Configuration files are used to set the simulation parameters. The source code is available, and there are many examples of GridSim usage: http://www.gridbus.org/gridsim/

The simulation details. The CERN-RDIG segment is a part of the global LCG structure. The GEANT2 network is used for the heavy data traffic between CERN, the RDIG sites and the other participants. The routers also carry foreign traffic, which is represented as background traffic in the simulation. Four RDIG sites were considered: JINR, SINP (Moscow State University), IHEP and ITEP.
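As a hedged illustration of how such a setup can be written down, the sketch below describes a shared CERN-GEANT2 link whose usable bandwidth is reduced by background traffic. The site names follow the slide, but the link capacities and background-traffic fractions are made-up placeholders, not the values used in the study.

import java.util.List;

// Toy description of the simulated CERN-RDIG segment: a shared GEANT2 link
// from CERN whose usable bandwidth is reduced by background (foreign) traffic.
// Capacities and background load below are placeholders, not measured values.
public class CernRdigTopology {

    record Link(String from, String to, double capacityMbps, double backgroundLoadFraction) {
        double effectiveMbps() {
            return capacityMbps * (1.0 - backgroundLoadFraction);
        }
    }

    public static void main(String[] args) {
        List<Link> links = List.of(
            new Link("CERN", "GEANT2", 1000.0, 0.60),   // shared international link
            new Link("GEANT2", "JINR",  100.0, 0.30),
            new Link("GEANT2", "SINP",  100.0, 0.30),
            new Link("GEANT2", "IHEP",  100.0, 0.30),
            new Link("GEANT2", "ITEP",  100.0, 0.30));

        for (Link l : links) {
            System.out.printf("%s -> %s : %.0f Mbps effective%n",
                              l.from(), l.to(), l.effectiveMbps());
        }
    }
}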

Simulation results. Transferring 500-700 GB of data takes 12-14 hours at throughputs of 6-12 Mb/s; this is close to the real situation. The volumes of the data transfers can vary from several gigabytes to hundreds of gigabytes per hour, but the channel throughputs in OptorSim are fixed. The possibility to simulate various equipment failures and other errors is absent in OptorSim. (Plot: throughput of the CERN-JINR channel and amount of data transferred on 02.02.2008.)
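For reference, the quoted transfer time follows from volume divided by throughput; the small sketch below does this arithmetic. Whether the slide's "Mb/s" means megabits or megabytes per second is not spelled out in the transcript, so both readings are computed as an assumption of the example.

// Simple transfer-time estimate: hours needed to move a given data volume
// over a channel of a given rate. Both megabit and megabyte readings of the
// slide's "Mb/s" are shown, since the transcript does not make the unit explicit.
public class TransferTimeEstimate {

    static double hoursAtMegabytesPerSec(double volumeGB, double rateMBps) {
        return volumeGB * 1000.0 / rateMBps / 3600.0;
    }

    static double hoursAtMegabitsPerSec(double volumeGB, double rateMbps) {
        return volumeGB * 8000.0 / rateMbps / 3600.0;
    }

    public static void main(String[] args) {
        double volumeGB = 600.0;  // middle of the 500-700 GB range
        System.out.printf("600 GB at 12 MB/s : %.1f h%n", hoursAtMegabytesPerSec(volumeGB, 12));
        System.out.printf("600 GB at 12 Mb/s : %.1f h%n", hoursAtMegabitsPerSec(volumeGB, 12));
    }
}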

Conclusion. The main errors of the LCG, including the FTS errors, were considered. The existing simulation toolkits do not provide the possibility to simulate the various sorts of errors in the Grid. Simulation of the various sorts of errors in Grid networks is necessary.

Questions?