Analysis of internal network requirements for the distributed Nordic Tier-1


G Behrmann 1, L Fischer 2, M Gamst 3, M Grønager 1 and J Kleist 4

1 NDGF - Nordic DataGrid Facility, Kastruplundgade 22(1), DK-2770 Kastrup
2 NORDUnet, Kastruplundgade 22(1), DK-2770 Kastrup
3 Technical University of Denmark, Department of Management Engineering, Produktionstorvet, DTU - Building 424, DK-2800 Kgs. Lyngby
4 NDGF and Aalborg University, Department of Computer Science, Selma Lagerlöfsvej 300, DK-9220 Aalborg SØ

E-mail: behrmann@ndgf.org, lars@nordu.net, gamst@man.dtu.dk, gronager@ndgf.org, kleist@ndgf.org

Published in J. Phys.: Conf. Ser. 219 (2010) 052001, 17th International Conference on Computing in High Energy and Nuclear Physics (CHEP09).

Abstract. The Tier-1 facility operated by the Nordic DataGrid Facility (NDGF) differs significantly from other Tier-1s in several aspects: it is not located at one or a few locations but is instead distributed throughout the Nordic countries, and it is not under the governance of a single organisation but is instead built from resources under the control of a number of different national organisations. Being physically distributed makes the design and implementation of the networking infrastructure a challenge. NDGF has its own internal OPN connecting the sites participating in the distributed Tier-1. To assess the suitability of the network design and the capacity of the links, we present a model of the internal bandwidth needs for the NDGF Tier-1 and its associated Tier-2 sites. The model takes the different types of workloads into account and can handle different kinds of data management strategies. It has already been used to dimension the internal network structure of NDGF. We also compare the model with real-life measurements.

1. Introduction

Dimensioning the network for a Tier-1 is always a challenge, but in the case of the Nordic Tier-1 operated by NDGF, there is the extra challenge that the Tier-1 is distributed. The NDGF Tier-1 consists of the 7 biggest Nordic compute centers, the dTier-1s, with associated Tier-2 resources as far away as Slovenia. Resources (storage and computing) are widely scattered, with a few central services. This gives a lot of advantages in redundancy, especially for 24x7 data taking, as reported in [1]. Figure 1 shows the participating centers in the Nordic countries, including the resources they have available as of Q2 2009.

NDGF uses an internal OPN between all dTier-1 sites and the Slovenian Tier-2; other Tier-2 sites are connected via the national research networks. Figure 2 shows how NDGF sites are interconnected, with special emphasis on Sweden; red lines depict dedicated private network lines and magenta lines depict public lines. As can be seen from the figure, the main network infrastructure forms a star. This gives us the advantage that it is easy to calculate the load on the links from the central NDGF router to each country: one just needs to consider the resources in a country as a single site, as sketched below.
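To illustrate that star-topology remark, here is a minimal sketch in Python that collapses a country's sites into a single pseudo-site whose aggregate capacities can then be fed into the model developed in section 2; the dict keys and the example numbers are hypothetical, not from the paper.

```python
# Hypothetical sketch: towards the central NDGF router a country's sites
# behave like one site, so their capacities can simply be aggregated.
def merge_country(sites: list[dict]) -> dict:
    """Sum the disk, tape and compute capacities of a country's sites."""
    merged = {"disk": 0.0, "tape": 0.0, "compute": 0.0}
    for site in sites:
        for key in merged:
            merged[key] += site.get(key, 0.0)
    return merged

# e.g. two invented Norwegian sites collapse into one pseudo-site:
norway = merge_country([
    {"disk": 400.0, "tape": 0.0,    "compute": 800.0},
    {"disk": 600.0, "tape": 1000.0, "compute": 1200.0},
])
```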

Figure 1. NDGF Distributed Storage and Computational setup.

Figure 2. NDGF Network Layout.

2. Calculating network need

The aim of the model is to get an estimate of what the bandwidth requirements are. For links within the Tier-1 we are interested in knowing if the 10 Gbps dedicated lines are sufficient, and for the public links to the Tier-2s we are first of all interested in knowing if the load can be carried without disturbing other uses of the network. We will construct a model where the network load is driven by the amount of available compute power, assuming that there is enough storage available in the system.

2.1. Basic assumptions

Below we present the basic assumptions of the model:

- All worker nodes are occupied up to their efficiency, i.e. we assume that there are enough jobs and that all data is available in the system.
- Data is randomly and uniformly distributed over all storage sites, i.e. the chance that a piece of data is available at a specific storage site is proportional to the size of the storage site in relation to the total amount of storage in the system.
- The characteristics of the different job types are known, i.e. we know in advance how much data a job consumes and generates, and we know how many CPU resources are required for each job type.
- The job mix at a site is known in advance, i.e. we know what percentage of the available compute time a specific job type occupies.
- Jobs are spread uniformly over time at a site. This means that we don't have bursts of a specific job type.
- Traffic flows directly between the compute site where a job is executed and the storage site; no intermediate servers are involved. This is in fact the case for the NDGF setup.
- The caching mechanism in the ARC grid middleware [2] is not taken into account. The ARC middleware employed by NDGF for its ATLAS computations includes a caching mechanism that can quite significantly reduce network traffic. Modelling the impact of the caching mechanism is rather difficult without any empirical evidence on what effect the cache has on different job types.

Later in this paper we will discuss what impact changes to those assumptions will have on the model.

2.2. Site characteristics

For a site $s$ we assume to know the following characteristics:

- Amount of tape installed: $T_s$.
- Amount of disk installed: $D_s$.
- Tier-$i$ compute resources: $C^s_i$.
- Tier-$i$ efficiency: $e^s_i \in [0; 1]$.

Some NDGF sites act as both Tier-1 and Tier-2 centers; this is why we allow a site to have a number of compute resources and efficiencies. We let $T$ and $D$ denote the total amount of tape and disk in the system, respectively.

2.3. Job characteristics

Just as for sites, we need to know the characteristics of the jobs that are executed. NDGF runs a number of different job types; for each job $j$ we assume to know:

- Amount of CPU seconds to run a job: $R_j$.
- Amount of data read from disk while executing: $DI_j$.
- Amount of data read from tape while executing: $TI_j$.
- Amount of data written to disk while executing: $DO_j$.
- Amount of data written to tape while executing: $TO_j$.

Each site runs a specific jobmix, i.e. a job type $j$ is supposed to occupy a certain fraction of the available compute time. We let $J$ denote the total set of job types in the system, and let $f^s_{ij}$ denote the fraction for job type $j$ on the Tier-$i$ resources at site $s$ (we assume that $\sum_{j \in J} f^s_{ij} = 1$).

We can now calculate the amount of data that needs to be read from and written to disk and tape at a site $s$ to keep its worker nodes occupied. Each job $j$ needs $DI_j$ data from disk and runs for $R_j$ CPU seconds (this could be any general measurement of CPU performance like KSI2K or HEP-SPEC06). Therefore $DI_j/R_j$ denotes the amount of data a job $j$ requires per CPU second. We then just need to multiply with the amount of CPU resources available to that job type; for a Tier-$i$ resource this number is given by $f^s_{ij} e^s_i C^s_i$, i.e. the fraction of the resource that runs jobs of type $j$ times the efficiency times the total amount of computational resources:

$$DI^s_C = \sum_{j \in J} \sum_{i \in \{1,2\}} f^s_{ij}\, e^s_i\, C^s_i\, \frac{DI_j}{R_j}$$

Similarly we can calculate values for tape read $TI^s_C$, disk write $DO^s_C$ and tape write $TO^s_C$:

$$TI^s_C = \sum_{j \in J} \sum_{i \in \{1,2\}} f^s_{ij}\, e^s_i\, C^s_i\, \frac{TI_j}{R_j}$$

$$DO^s_C = \sum_{j \in J} \sum_{i \in \{1,2\}} f^s_{ij}\, e^s_i\, C^s_i\, \frac{DO_j}{R_j}$$

$$TO^s_C = \sum_{j \in J} \sum_{i \in \{1,2\}} f^s_{ij}\, e^s_i\, C^s_i\, \frac{TO_j}{R_j}$$

2.4. Bandwidth requirements

At a site $s$, parts of the data can be read and written locally. If we assume a uniform distribution of data over the sites, the part that can be read and written locally corresponds to the fraction of storage available at $s$ in relation to the total amount of storage in the system; the remainder must cross the network:

$$BI^s_C = DI^s_C\, \frac{D - D_s}{D} + TI^s_C\, \frac{T - T_s}{T}$$

and similar for output. Furthermore, other sites will read and write to the disk and tape systems at site $s$. Again the amount corresponds to the relation between the installed disk and tape capacity at $s$ and the total installed capacity:

$$BI^s_O = \sum_{t \in S \setminus \{s\}} \left( DO^t_C\, \frac{D_s}{D} + TO^t_C\, \frac{T_s}{T} \right)$$

Finally, we need to take traffic external to NDGF into account. Again the same argument as before applies: the traffic to a site is the fraction of the total external traffic $BI_E$ that corresponds to the fraction of the total storage that the site has:

$$BI^s_E = BI_E\, \frac{D_s + T_s}{D + T}$$

And the input bandwidth requirement for a site $s$ becomes $BI^s = BI^s_C + BI^s_O + BI^s_E$. For bandwidth out of a site, we derive a similar formula:

$$BO^s = DO^s_C\, \frac{D - D_s}{D} + TO^s_C\, \frac{T - T_s}{T} + \sum_{t \in S \setminus \{s\}} \left( DI^t_C\, \frac{D_s}{D} + TI^t_C\, \frac{T_s}{T} \right) + BO_E\, \frac{D_s + T_s}{D + T}$$

The bandwidth requirement for a site is then the maximum of $BI^s$ and $BO^s$. As output, this model gives the average network throughput at a site if all compute resources are occupied up to their efficiency with a random mix of jobs. It will not take bursts into account, nor will it include any overhead caused by transport protocols.
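To make the calculation concrete, here is a minimal sketch of the model in Python, assuming hypothetical units (e.g. data volumes in GB and run times in CPU seconds); the class, field and function names are our own illustrative choices, not from the paper.

```python
# A minimal sketch of the model of section 2; names and units are illustrative.
from dataclasses import dataclass

@dataclass
class JobType:
    run_time: float   # R_j: CPU seconds to run one job
    disk_in: float    # DI_j: data read from disk
    disk_out: float   # DO_j: data written to disk
    tape_in: float    # TI_j: data read from tape
    tape_out: float   # TO_j: data written to tape

@dataclass
class Site:
    disk: float        # D_s: installed disk
    tape: float        # T_s: installed tape
    compute: dict      # tier i -> C^s_i, compute resources per tier
    efficiency: dict   # tier i -> e^s_i in [0, 1]
    jobmix: dict       # tier i -> {job name: f^s_ij}, fractions summing to 1

def site_rates(site: Site, jobs: dict) -> dict:
    """DI_C, TI_C, DO_C, TO_C: average data rates to keep the site occupied."""
    rates = {"DI": 0.0, "TI": 0.0, "DO": 0.0, "TO": 0.0}
    for tier, capacity in site.compute.items():
        eff = site.efficiency[tier]
        for name, frac in site.jobmix[tier].items():
            j = jobs[name]
            cpu = frac * eff * capacity    # f^s_ij * e^s_i * C^s_i
            rates["DI"] += cpu * j.disk_in / j.run_time
            rates["TI"] += cpu * j.tape_in / j.run_time
            rates["DO"] += cpu * j.disk_out / j.run_time
            rates["TO"] += cpu * j.tape_out / j.run_time
    return rates

def bandwidth(sites: dict, jobs: dict, bi_ext: float = 0.0,
              bo_ext: float = 0.0) -> dict:
    """Per-site requirement max(BI_s, BO_s); assumes total disk and tape > 0."""
    D = sum(s.disk for s in sites.values())
    T = sum(s.tape for s in sites.values())
    rates = {name: site_rates(s, jobs) for name, s in sites.items()}
    result = {}
    for name, s in sites.items():
        r, share = rates[name], (s.disk + s.tape) / (D + T)
        # BI_C: data the site's own jobs must fetch from remote storage.
        bi = r["DI"] * (D - s.disk) / D + r["TI"] * (T - s.tape) / T
        # BI_O: data other sites' jobs write to storage hosted here.
        bi += sum(rates[t]["DO"] * s.disk / D + rates[t]["TO"] * s.tape / T
                  for t in sites if t != name)
        bi += bi_ext * share   # BI_E: share of traffic external to NDGF
        bo = r["DO"] * (D - s.disk) / D + r["TO"] * (T - s.tape) / T
        bo += sum(rates[t]["DI"] * s.disk / D + rates[t]["TI"] * s.tape / T
                  for t in sites if t != name)
        bo += bo_ext * share
        result[name] = max(bi, bo)
    return result
```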

3. Main results

Table 1 shows the outcome of using the model on the current NDGF setup. In Appendix A we list the values for the job types as well as the fractions each job type is expected to take up on a Tier-1 and Tier-2 site, respectively. As can be seen from Table 1, most of the results are within what we can expect to carry over a 10 Gbps line; the only worrying link is the one between the NDGF central router and Sweden. For the Swedish link a more careful analysis based on real-life measurements will need to be carried out.

Site/Country    Network load
DCSC/KU         0.8 Gbps
Denmark         0.8 Gbps
CSC             0.5 Gbps
Jyv             0.0 Gbps
Finland         0.5 Gbps
UiB             0.8 Gbps
UiO             0.7 Gbps
Norway          1.3 Gbps
HPC2N           1.9 Gbps
LUNARC          0.3 Gbps
PDC             1.2 Gbps
NSC             1.0 Gbps
UPPMAX          0.3 Gbps
Sweden          3.8 Gbps
PIKOLIT         0.6 Gbps
Slovenia        0.6 Gbps

Table 1. Network load between the countries' central routers and the NDGF main router.

4. Discussion

The results presented here can only be considered a first approximation of the network load NDGF can expect internally. Some of the assumptions behind the model can rightfully be criticised for being too simple. Especially the assumption of a uniform distribution of job types over time is questionable. One way to deal with that assumption would be to only consider the job type that causes the highest network load (see the sketch below); this would make the model a better fit for worst-case loads.

However, it will be more important to take the caching mechanism of ARC into account, as this mechanism has been reported to have a significant impact on how many times popular files are downloaded to a site. In order to extend the model to take caching into account, a more in-depth analysis of the caching mechanism of ARC needs to be performed first.
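A minimal sketch of that worst-case variant, reusing the hypothetical JobType/Site representation from the earlier sketch: instead of weighting job types by the jobmix, each tier is assumed to run only the job type that moves the most data per CPU second.

```python
# Worst-case variant: assume each tier runs only its most network-hungry
# job type instead of the jobmix average. Reuses JobType/Site from above.
def worst_case_rates(site, jobs):
    rates = {"DI": 0.0, "TI": 0.0, "DO": 0.0, "TO": 0.0}
    for tier, capacity in site.compute.items():
        eff = site.efficiency[tier]
        # Pick the job type with the highest total data moved per CPU second.
        name = max(site.jobmix[tier],
                   key=lambda n: (jobs[n].disk_in + jobs[n].disk_out +
                                  jobs[n].tape_in + jobs[n].tape_out)
                                 / jobs[n].run_time)
        j = jobs[name]
        cpu = eff * capacity          # the whole tier runs only this job type
        rates["DI"] += cpu * j.disk_in / j.run_time
        rates["TI"] += cpu * j.tape_in / j.run_time
        rates["DO"] += cpu * j.disk_out / j.run_time
        rates["TO"] += cpu * j.tape_out / j.run_time
    return rates
```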

At the time of writing it is not yet possible to compare the complete model to real-world measurements. However, a few observations can be made: i) the limiting factor is at the moment not lack of network bandwidth; in the cases observed so far it is the lack of bandwidth between the compute element and the network, an issue that is solvable by upgrading the hardware at sites. ii) The assumption about a uniform mix of jobs does not hold; we observe that jobs of certain types come in bursts. iii) The ARC caching mechanism has a dramatic effect on the amount of data transferred. This has been observed by correlating the load on the dCache system with the amount of data needed to carry out computations; in cases where data are cached, the load on dCache decreases.

We are currently investigating the deployment of the monitoring services needed to perform better measurements of the used bandwidth and to have the necessary information to test the validity of our model.

References

[1] Field L, Grønager M, Johansson D and Kleist J Proceedings of CHEP09 (submitted for publication)
[2] Ellert M, Grønager M, Konstantinov A, Kónya B, Lindemann J, Livenson I, Nielsen J, Niinimäki M, Smirnova O and Wäänänen A 2007 Future Generation Computer Systems 23 219-240

Appendix A. Job type descriptions

The table presented here forms the basis for the calculation of the percentage of time that is spent on each job type at a site. The numbers cannot be used directly as is, since not every site supports the same VOs.

Job name        Tier-1  Tier-2  Run time  Disk in  Disk out  Tape in  Tape out
ALICE analysis  20%     50%     1         1000     10        0        0
ALICE recon     40%     0%      5         10       100       1000     0
ALICE MC        40%     50%     15        10       10000     0        0
ATLAS analysis  20%     50%     1         100      100       0        0
ATLAS recon     40%     0%      1         10       100       1000     0
ATLAS MC        40%     50%     12        100      500       0        0
CMS analysis    20%     50%     1         100      100       0        0
CMS recon       40%     0%      2         100      100       2000     0
CMS MC          40%     50%     12        100      500       0        0
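As a worked example of how this table could feed the model sketches above, the snippet below wires the ATLAS rows into the hypothetical JobType/Site classes; the two sites, their capacities and the jobmix split are invented for illustration, and the units are those of the table, which the paper leaves unspecified.

```python
# Hypothetical example: two invented sites running the ATLAS job types from
# the table above, fed through the bandwidth() sketch from section 2.
jobs = {
    "ATLAS analysis": JobType(run_time=1,  disk_in=100, disk_out=100,
                              tape_in=0,    tape_out=0),
    "ATLAS recon":    JobType(run_time=1,  disk_in=10,  disk_out=100,
                              tape_in=1000, tape_out=0),
    "ATLAS MC":       JobType(run_time=12, disk_in=100, disk_out=500,
                              tape_in=0,    tape_out=0),
}
tier1_mix = {"ATLAS analysis": 0.2, "ATLAS recon": 0.4, "ATLAS MC": 0.4}
tier2_mix = {"ATLAS analysis": 0.5, "ATLAS MC": 0.5}

sites = {
    "site-a": Site(disk=1000.0, tape=2000.0, compute={1: 500.0},
                   efficiency={1: 0.9}, jobmix={1: tier1_mix}),
    "site-b": Site(disk=500.0, tape=1000.0, compute={2: 300.0},
                   efficiency={2: 0.8}, jobmix={2: tier2_mix}),
}
print(bandwidth(sites, jobs))   # average per-site requirement, max(BI_s, BO_s)
```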