Experience of using Data Grid simulation packages. Nechaevskiy A.V. (SINP MSU), Korenkov V.V. (LIT JINR). Dubna, 2008
Content
- Operation of the LCG DataGrid.
- Errors of the FTS services of the Grid.
- Primary goals of Grid simulation systems.
- The OptorSim and GridSim simulators.
- Results of the LCG DataGrid simulation with OptorSim.
The Grid solution for LHC experiments
Tier-2s and Tier-1s are interconnected by the general purpose research networks. Any Tier-2 may access data at any Tier-1.
Tier-1 centres: BNL, Nordic, IN2P3, GridKa, TRIUMF, ASCC, FNAL, CNAF, SARA, PIC, RAL.
LHC experiments support
Error descriptions used in FTS monitoring:
- Scope: the source of the error (SOURCE: source site, DESTINATION: destination site, TRANSFER: during transfer).
- Category: the error class (FILE-EXIST, NO-SPACE-LEFT, TRANSFER-TIMEOUT, etc.).
- Phase: the stage in the transfer life cycle at which the error occurred (ALLOCATION, TRANSFER-PREPARATION, TRANSFER, etc.).
- Message: the detailed description of the error.
We have a list of more than 400 distinct error patterns, and it changes over time. The main fault types identified during the monitoring period are timeouts, program errors, application-specific errors, and user errors.
Examples:
- SOURCE during PREPARATION phase: [REQUEST_TIMEOUT] failed to prepare source file in 180 seconds
- TRANSFER during TRANSFER phase: [TRANSFER_TIMEOUT] gridftp_copy_wait: Connection timed out
- The server sent an error response: 425 425 Can't open data connection. timed out() failed
- DESTINATION during PREPARATION phase: [CONNECTION] failed to contact on remote SRM [srm]. Givin' up after 3 tries
Detailed error descriptions: https://twiki.cern.ch/twiki/bin/view/lcg/transferoperationspopularerrors
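The scope, phase, category and message fields can be pulled out of an error line mechanically. The following is a minimal sketch, assuming every line follows the "SCOPE during PHASE phase: [CATEGORY] message" layout of the examples above; the names `FtsError`, `parse_fts_error` and `count_categories` are illustrative and not part of FTS or its monitoring tools.

```python
import re
from collections import Counter
from typing import Iterable, NamedTuple, Optional


class FtsError(NamedTuple):
    scope: str      # SOURCE, DESTINATION or TRANSFER
    phase: str      # e.g. PREPARATION, TRANSFER
    category: str   # e.g. REQUEST_TIMEOUT
    message: str    # free-text description


# Assumed layout: "<SCOPE> during <PHASE> phase: [<CATEGORY>] <message>"
_PATTERN = re.compile(
    r"^(?P<scope>[A-Z]+) during (?P<phase>[A-Z_-]+) phase: "
    r"\[(?P<category>[A-Z_-]+)\] (?P<message>.*)$"
)


def parse_fts_error(line: str) -> Optional[FtsError]:
    """Return the parsed fields, or None if the line does not fit the layout."""
    match = _PATTERN.match(line.strip())
    if match is None:
        return None
    return FtsError(**match.groupdict())


def count_categories(lines: Iterable[str]) -> Counter:
    """Tally error categories, silently skipping lines that do not parse."""
    parsed = (parse_fts_error(line) for line in lines)
    return Counter(err.category for err in parsed if err is not None)
```

Lines that do not match, such as a bare server response, come back as `None`, so a list of patterns that changes over time can be tallied without the parser failing on unknown formats.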
The primary goals solved by DataGrid simulation tools
Grid simulators: SimGrid, OptorSim, GridSim.
- Simulation allows various experiments to be carried out on the system under study.
- Simulation makes it possible to predict and prevent a number of unexpected situations.
- Simulation makes it possible to determine the minimum configuration of data transfer and data storage equipment that meets the project requirements.
- Simulation also makes it possible to check the operation of the system and identify its bottlenecks, among other things.
Requirements for a Grid simulator
A simulator must provide:
- simulation of the operation of the basic DataGrid elements: storage elements (SE), resource brokers (RB), replica catalogues (RC), network, users, and sites;
- a simulation time much shorter than the real running time of the DataGrid;
- various kinds of statistics (for example, volume of data transferred, throughput);
- simulation of equipment failures;
- results that are comparable to the real situation.
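To make the statistics and failure requirements concrete, here is a toy sketch of a transfer loop with injected failures. It is not taken from OptorSim or GridSim. The failure model (a fixed per-transfer probability, with a failed transfer wasting a random share of its time), the 1 GB = 1000 MB conversion, and all names are assumptions chosen for illustration.

```python
import random
from dataclasses import dataclass
from typing import Iterable


@dataclass
class TransferStats:
    attempted: int = 0
    failed: int = 0
    gb_moved: float = 0.0        # volume actually delivered
    busy_seconds: float = 0.0    # channel time spent, including failed attempts


def simulate_transfers(file_sizes_gb: Iterable[float],
                       throughput_mb_s: float,
                       failure_prob: float,
                       seed: int = 0) -> TransferStats:
    """Run each transfer once over a fixed-throughput channel.

    Assumption: a transfer fails with probability `failure_prob`; a failed
    transfer delivers nothing but still occupies the channel for a random
    fraction of its nominal duration.
    """
    rng = random.Random(seed)    # seeded so that runs are repeatable
    stats = TransferStats()
    for size_gb in file_sizes_gb:
        stats.attempted += 1
        duration = size_gb * 1000.0 / throughput_mb_s   # seconds, 1 GB = 1000 MB
        if rng.random() < failure_prob:
            stats.failed += 1
            stats.busy_seconds += duration * rng.random()
        else:
            stats.gb_moved += size_gb
            stats.busy_seconds += duration
    return stats
```

Calling `simulate_transfers([1.0] * 100, throughput_mb_s=10.0, failure_prob=0.1)` returns the counters from which delivered volume and effective throughput (`gb_moved * 1000 / busy_seconds`) can be derived.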
OptorSim
- OptorSim makes it possible to evaluate various optimisation algorithms and replication strategies.
- Implemented in Java.
- Configuration files are used to set the simulation parameters.
- The source code is available.
edg-wp2.web.cern.ch/edgwp2/optimization/optorsim.html
Implementation of the Replica Catalogue in the LCG and in OptorSim
LCG:
- The LFC file catalogue stores information about all files and their replicas in the LCG. It is one of the critical services.
- Logical File Name (LFN): an alias created by a user to refer to some item of data, e.g. lfn:cms/20030203/run2/track1
- Globally Unique Identifier (GUID): a non-human-readable unique identifier for an item of data, e.g. guid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6
- Site URL (SURL) / Physical File Name (PFN) / Site File Name (SFN): the location of an actual piece of data on a storage system, e.g. srm://srm.cern.ch/castor/cern.ch/grid/cms/output10_1
OptorSim:
- File information is stored in the Replica Catalogue (the same as in the LCG).
- The Replica Catalogue is a list of mappings from LFNs to their physical file names (LFN and PFN in the LCG).
- The Replica Manager manages data replication and registers files in the Replica Catalogue (in the LCG, file cataloguing is implemented in the LFC).
- The "best" replica placement is determined before the transfer. This lets sites copy files from different sources and so avoids overloading individual resources.
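The LFN, GUID and SURL relationship described above can be sketched as two small mappings. This is an illustrative model only, not the LFC or OptorSim API: the class and function names are hypothetical, and the "best" replica is reduced to picking the lowest value of a caller-supplied cost function.

```python
import uuid
from typing import Callable, Dict, List


class ReplicaCatalogue:
    """Toy catalogue: one GUID per LFN, any number of SURLs per GUID."""

    def __init__(self) -> None:
        self._guid_by_lfn: Dict[str, str] = {}
        self._surls_by_guid: Dict[str, List[str]] = {}

    def register(self, lfn: str, surl: str) -> str:
        """Record a replica of `lfn` at `surl` and return the file's GUID."""
        guid = self._guid_by_lfn.get(lfn)
        if guid is None:
            # First replica of this file: mint a new identifier.
            guid = "guid:" + str(uuid.uuid4())
            self._guid_by_lfn[lfn] = guid
            self._surls_by_guid[guid] = []
        if surl not in self._surls_by_guid[guid]:
            self._surls_by_guid[guid].append(surl)
        return guid

    def replicas(self, lfn: str) -> List[str]:
        """Return all known SURLs for `lfn` (empty if the LFN is unknown)."""
        guid = self._guid_by_lfn.get(lfn)
        if guid is None:
            return []
        return list(self._surls_by_guid[guid])


def best_replica(surls: List[str], cost: Callable[[str], float]) -> str:
    """Pick the replica with the lowest cost (hypothetical selection rule)."""
    return min(surls, key=cost)
```

A cost function might estimate transfer time from each source site; choosing the source before the transfer starts is what spreads the load across sites.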
OptorSim graphical interface
Statistics are available as tables, graphs, and diagrams.
GridSim
- GridSim makes it possible to simulate various classes of heterogeneous resources, users, applications, and brokers.
- Implemented in Java.
- Configuration files are used to set the simulation parameters.
- The source code is available.
- Many examples of GridSim usage are available.
http://www.gridbus.org/gridsim/
The simulation details
- The CERN-RDIG segment is part of the global LCG structure.
- The GEANT2 network carries the bulk data traffic between CERN, the RDIG sites, and other participants.
- The routers also carry external traffic, which is represented as background traffic in the simulation.
- Four RDIG sites were considered: JINR, SINP (Moscow State University), IHEP, and ITEP.
Simulation results
- Transferring 500-700 GB of data takes 12-14 hours at a throughput of 6-12 Mb/s. This is close to reality.
- Data transfer volumes can vary from several gigabytes to hundreds of gigabytes per hour, but channel throughput in OptorSim is fixed.
- OptorSim has no means of simulating equipment failures or other errors.
Figure: throughput of the CERN-JINR channel and the volume of data transferred on 02.02.2008.
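The relation between volume, throughput and transfer time quoted above is simple arithmetic and can be checked directly. The sketch below assumes decimal units (1 GB = 1000 MB) and provides both a megabyte and a megabit reading of the throughput unit, because the slide does not say which is meant; the function names are illustrative.

```python
def transfer_hours_mbyte(volume_gb: float, throughput_mbyte_s: float) -> float:
    """Hours needed to move `volume_gb` at a rate given in megabytes per second."""
    seconds = volume_gb * 1000.0 / throughput_mbyte_s
    return seconds / 3600.0


def transfer_hours_mbit(volume_gb: float, throughput_mbit_s: float) -> float:
    """Hours needed to move `volume_gb` at a rate given in megabits per second."""
    seconds = volume_gb * 1000.0 * 8.0 / throughput_mbit_s
    return seconds / 3600.0
```

Under the megabyte reading, 500 GB at 12 MB/s takes roughly 11.6 hours, which is near the quoted 12-14 hour range. Under the megabit reading the same transfer would take roughly 93 hours, so the megabyte interpretation appears to fit the reported figures better.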
Conclusion
- The main errors of the LCG, including the FTS errors, were considered.
- The simulation toolkits do not make it possible to simulate the various kinds of errors that occur in the Grid.
- Simulation of the various kinds of errors in Grid networks is necessary.
Questions?