A data Grid testbed environment in Gigabit WAN with HPSS

Computing in High Energy and Nuclear Physics, 24-28 March 2003, La Jolla, California

Atsushi Manabe, Setsuya Kawabata, Youhei Morita, Takashi Sasaki, Hiroyuki Sato, Yoshiyuki Watase and Shigeo Yashiro
High Energy Accelerator Research Organization (KEK), Tsukuba, Japan

Tetsuro Mashimo, Hiroshi Matsumoto, Hiroshi Sakamoto, Junichi Tanaka and Ikuo Ueda
International Center for Elementary Particle Physics, University of Tokyo (ICEPP), Tokyo, Japan

Kohki Ishikawa, Yoshihiko Itoh and Satomi Yamamoto
IBM Japan, Ltd.

Tsutomu Miyashita
KSK-ALPA Co., Ltd.

For data analysis of large-scale experiments such as LHC Atlas and other Japanese high energy and nuclear physics projects, we have constructed a Grid test bed at ICEPP and KEK. These institutes are connected to the national scientific gigabit network backbone called SuperSINET. In our test bed, we have installed NorduGrid middleware, which is based on Globus, and connected the 12 TB HPSS at KEK as a large-scale data store. Atlas simulation data at ICEPP has been transferred and accessed over SuperSINET. We have tested various performance aspects and characteristics of HPSS through this high-speed WAN. The measurements include a comparison between the case in which computing and storage resources are tightly coupled over a low-latency LAN and the case in which they are separated by a long-distance WAN.

1. Introduction

In the Atlas Japan collaboration, the International Center for Elementary Particle Physics, University of Tokyo (ICEPP) will build a Tier-1 regional center and the High Energy Accelerator Research Organization (KEK) will build a Tier-2 regional center for the Atlas experiment of the Large Hadron Collider (LHC) project at CERN. The two institutes are connected by the SuperSINET network, an ultra-high-speed Internet backbone for Japanese academic research. On this network, a test bed using Grid technologies was constructed to study the requisite functionality and issues for the tiered regional centers.

High Performance Storage System (HPSS) [4] with high-density digital tape libraries could be a key component to handle the petabytes of data produced by the Atlas experiment and to share such data among the regional collaborators. The HPSS parallel and concurrent data transfer mechanisms, which support disk, tape and tape libraries, are efficient and scale to support huge data archives. This paper describes the integration of HPSS into a Grid architecture and the measurement of HPSS performance in use over a high-speed WAN.

2. Test bed system

The computer resources for the test bed were installed at the ICEPP and KEK sites. One Grid server at each site and the HPSS servers at KEK were connected to SuperSINET. The SuperSINET backbone connects research institutes at 10 Gbps, operating optical cross-connects for fiber/wavelength switching, and the two sites are directly connected at 1 Gbps. All resources, including the network, were isolated from other users and dedicated to the test. Figure 1 and Table I show our hardware setup.

Three storage system components were employed. One disk storage server each at KEK and ICEPP shared its host with the Grid server. The remaining HPSS software components used part of the KEK central computer system. The HPSS data flow is depicted in Fig. 2. The HPSS servers comprise core servers, disk movers and tape movers, tightly coupled by an IBM SP2 cluster network switch. For the original (Kerberos) pftp server measurement, pftpd ran on the core HPSS server. For the GSI-enabled HPSS server, described in Section 4, pftpd ran on the same processors as the disk movers.
The disk movers were directly connected to the test bed LAN through their network interface cards, and the HPSS disk movers were dedicated to the test. NorduGrid middleware ran on the Grid servers. The other computing elements (CE) acted as Portable Batch System (PBS) [1] clients and did not require the NorduGrid middleware to be installed. NorduGrid [5] is a pioneering Grid project in Scandinavia that adds the upper-layer functionality necessary for HEP computing on top of the Globus toolkit. The middleware was simple to understand and offered sufficient functionality for our test bed study. Table II shows the versions of the middleware used in the test bed.
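As an illustration of how a user job enters this stack, a minimal sketch using the NorduGrid command-line client from a user PC is given below; the trivial xRSL string is only a placeholder (an xRSL description of the kind actually used in the tests appears in Section 4), and the exact output format of the client is an assumption.

# Minimal sketch (not taken from the paper) of submitting a job to the test
# bed with the NorduGrid command-line client. ngsub is the NorduGrid job
# submission command; the xRSL string and the output handling below are
# illustrative assumptions only.
import subprocess

xrsl = '&(executable="/bin/echo")(arguments="hello testbed")'

# The client consults the Globus MDS information system, picks a Grid server
# (ICEPP or KEK), and the grid-manager there hands the job to the local PBS.
result = subprocess.run(["ngsub", xrsl], check=True, capture_output=True, text=True)
print(result.stdout)  # prints the job id; ngstat/ngget take it as an argument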

Table I: Test bed hardware

  ICEPP  Grid and PBS server:         1x Athlon 1.7 GHz, 2 CPUs
         CE (PBS clients):            4x Pentium III 1.4 GHz, 2 CPUs
  KEK    Grid and PBS server:         1x Pentium III 1 GHz, 2 CPUs
         CE (PBS clients):            5x Pentium III 1 GHz, 2 CPUs
         HPSS disk mover:             2x Power3 375 MHz
         HPSS tape mover and library: 19x Power3 375 MHz, IBM 3590

Table II: Test bed software

  Software (at ICEPP/KEK)  Version
  Globus                   2.2.2
  NorduGrid                0.3.12
  PBS                      2.3.16
  HPSS                     4.3

Figure 1: Layout of the test bed hardware (Grid test bed environment with HPSS through the GbE WAN; the ICEPP and KEK sites, about 60 km apart, are directly connected at 1 Gbps; each site runs the NorduGrid grid-manager and gridftp-server together with Globus MDS, a replica catalogue at ICEPP, and a PBS server with PBS clients; user PCs connect on the ICEPP side; the SE at ICEPP is a 0.2 TB disk cache, while the SE at KEK is the 12 TB HPSS, whose servers are shared by many users).

Figure 2: HPSS players: CE (GridFTP client; 2-CPU Pentium III 1 GHz, RedHat 7.2, Globus 2.2), disk movers (GSI-pftp server; 2-CPU Power3 375 MHz, AIX 4.3, HPSS 4.3, Globus 2.0), tape movers (2-CPU Power3 375 MHz, AIX 4.3, HPSS 4.3) and IBM 3590 tape drives (14 MB/s, 40 GB).

3. HPSS performance over a high-speed WAN

3.1. Basic network performance

Before the end-to-end measurements, the basic Gigabit Ethernet performance between the IBM HPSS servers at KEK and a host at ICEPP through the WAN, and a host on the KEK LAN, was measured using netperf [2]; the results are shown in Figure 3. The round trip time (RTT) averaged 3 to 4 ms, and the network quality was quite good and free from packet loss. The TCP socket buffer size of the HPSS server was 256 kB (the size was fixed to optimize the IBM SP2 switching network) and 64 MB in the clients at both KEK (over the LAN) and ICEPP (over the WAN). The processors running the HPSS servers limited the maximum raw TCP transfer speed, and, as seen in the graph, the network performance varied with the socket buffer size. Beyond 0.5 MB, network access through both the LAN and the WAN became almost equivalent and saturated.

Figure 3: Basic GbE network transfer speed measured with netperf (transfer speed in Mbit/s vs. client TCP buffer size; HPSS mover to KEK client over the LAN and HPSS mover to ICEPP client over the WAN; HPSS mover buffer size = 256 kB).

Figure 4 shows the network performance as a function of the number of simultaneous transfer sessions through the WAN and the LAN. When the socket buffer size was 100 kB, up to 4 parallel stream sessions improved the network throughput. With buffer sizes of 1 MB or more, multiple stream sessions did not improve the aggregate network transfer speed, and network utilization was limited by the performance of the processors running the HPSS servers.

Figure 4: Network performance vs. the number of TCP stream sessions (aggregate transfer speed from the ICEPP client to the KEK HPSS mover for client buffer sizes of 1 MB and 100 kB; HPSS mover buffer size = 256 kB).
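These observations are consistent with a simple bandwidth-delay-product estimate. As a rough illustration (assuming a 1 Gbps bottleneck and the 3-4 ms RTT quoted above, and ignoring the HPSS server processor limit), the sketch below reproduces the ~0.5 MB saturation point and the benefit of a few parallel streams at a 100 kB buffer size:

# Rough throughput model for a single GbE WAN path (illustrative only).
# Assumes a ~1 Gbps bottleneck and the upper end of the 3-4 ms RTT quoted in
# Section 3.1; the measured curves in Figures 3 and 4 are additionally
# limited by the HPSS server processors, which this sketch ignores.

LINK_MBPS = 1000.0   # bottleneck link speed, Mbit/s (GbE)
RTT_S = 0.004        # round trip time, seconds

# Bandwidth-delay product: the socket buffer needed to keep the pipe full.
bdp_bytes = (LINK_MBPS * 1e6 / 8) * RTT_S
print(f"bandwidth-delay product ~ {bdp_bytes / 1e6:.2f} MB")   # ~0.5 MB

def throughput_mbps(buffer_bytes: float, streams: int = 1) -> float:
    """Aggregate TCP throughput limited by window/RTT, capped by the link."""
    per_stream = (buffer_bytes * 8 / 1e6) / RTT_S   # Mbit/s if window-limited
    return min(LINK_MBPS, streams * per_stream)

# One stream with a 100 kB buffer is window-limited; a handful of such
# streams (or a single buffer of 0.5 MB or more) saturates the link.
for n in (1, 2, 4, 8):
    print(n, "x 100 kB streams:", round(throughput_mbps(100e3, n)), "Mbit/s")
print("1 x 1 MB stream:   ", round(throughput_mbps(1e6)), "Mbit/s")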

3.2. HPSS client API

Figure 5 shows the data transfer speed obtained with the HPSS client API, comparing access from the LAN and over the WAN. The transfer was between the disk of the HPSS disk mover and the client host memory. The transferred file size was 2 GB in all cases, and the disk access speed on the disk mover was 80 MB/s. The figure shows that, even with a larger API buffer, the access speed over the WAN was about half of the LAN access speed, both for reading from and writing to the HPSS server.

To increase HPSS performance in future tests, the newer pdata protocol provided in HPSS 4.3 can be employed; this will improve pget performance. To get the same effect on pputs, the pdata-push protocol provided in HPSS 5.1 is required. The existing mover and pdata protocols are driven by the HPSS mover, with the mover requesting each data packet by sending a pdata header to the client mover; the client mover then sends the data. This exchange creates latency on a WAN. The pdata-push protocol allows the client mover to determine the HPSS movers that will be the target of all data packets when the data transfer is set up. This protocol eliminates the pdata header interchange and allows the client to simply flush data buffers to the appropriate mover. The result is that the data is streamed to the HPSS mover by TCP at whatever rate can be delivered by the client-side mover and written to the HPSS mover devices.

Figure 5: HPSS client API transfer speed between the HPSS disk and the client memory (MB/s vs. API buffer size in MB; ICEPP and KEK client reads and writes over the WAN and LAN; file size = 2 GB).

3.3. pftp-pftpd transfer speed

Figure 6 shows the data transfer speed using HPSS pftp from the HPSS disk mover to the client /dev/null dummy device. Again, as in the previous HPSS client API transfer, even with a pftp buffer size of 64 MB, the access speed from the WAN was about half of the LAN access speed. In addition, enabling single-file transfer with multiple TCP streams using the pftp pwidth option was not effective in our situation. In our server layout, two disk mover hosts each had two RAID disks; therefore, up to 4 concurrent file transfers could achieve higher network utilization and overall throughput, and this was seen in both the WAN and LAN access cases.

In the same figure (Fig. 6), the data transfer speed is also shown from the HPSS disk mover to the client disks, which had a writing performance of 35-45 MB/s. Although the disks in both the server and client hosts had access speeds exceeding 30 MB/s and the network transfer speed exceeded 80 MB/s, the overall transfer speed dropped to about 20 MB/s. This is because access to these three resources was not executed in parallel but in series.

Figure 7 shows the elapsed time for access to data in the tape library. Thanks to HPSS functionality and an adequate number of tape movers and tape drives, the system data throughput scaled with the number of concurrent file transfers. If all the tapes are off the drives, the performance scales only up to two concurrent transfers, since the library had only two accessors.

The comparison (Fig. 8) of writing to the HPSS disk mover from the client over the WAN and the LAN is rather complicated. In our setup, the HPSS server had 4 independent disks but the client had only one. Reading multiple files in parallel from a single disk (N files to N files: reading N files simultaneously at the client and writing N files to the server) slows down the aggregate access because of contention of the disk heads.
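The serialized-access argument for the disk-to-disk case in Fig. 6 can be illustrated with a small model (an illustration only; the stage rates below are assumptions based on the approximate figures quoted in this section, not new measurements): when the mover disk read, the network transfer and the client disk write handle each chunk one after another rather than overlapping, the effective rate is set by the sum of the per-stage times, which lands near the observed ~20 MB/s.

# Illustrative model of the pftp transfer from the HPSS mover disk to the
# client disk (Fig. 6). Stage rates are assumptions taken from the
# approximate figures quoted in Sections 3.2-3.3.

STAGE_RATES_MB_S = {
    "mover disk read":   80.0,
    "GbE network":       80.0,
    "client disk write": 40.0,   # client disks measured around 35-45 MB/s
}

def serialized_rate(rates):
    """Each chunk is read, then sent, then written, one stage at a time."""
    return 1.0 / sum(1.0 / r for r in rates)

def pipelined_rate(rates):
    """Stages overlap, so the slowest stage sets the pace."""
    return min(rates)

rates = list(STAGE_RATES_MB_S.values())
print(f"serialized stages: {serialized_rate(rates):.0f} MB/s")  # ~20 MB/s, as observed
print(f"pipelined stages : {pipelined_rate(rates):.0f} MB/s")   # limited by the client disk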
Figure 6: pftpd-pftp read from the HPSS mover disk to the client /dev/null and to the client disk (transfer speed vs. number of concurrent file transfers; KEK client over the LAN and ICEPP client over the WAN; pftp buffer = 64 MB; client disk write speed 35-45 MB/s; client disk speed at KEK = 48 MB/s, at ICEPP = 33 MB/s).

Figure 7: pftpd-pftp read to the client disk from the tape archive (elapsed time in seconds vs. number of concurrent file transfers, for data on the HPSS mover disk cache, data on a mounted tape and data on an unmounted tape).

Figure 8: pftpd-pftp (GSI-pftp) write from the client disk to the HPSS mover disk (aggregate speed vs. number of parallel file transfers; ICEPP and KEK clients reading from a single file or from multiple files: N files to N files, 1 file to N files, N files to 1 file, and 1 file to 1 file with pwidth).

4. GSI-enabled pftp

GridFTP [3] is a standard protocol for building a data Grid and supports the features of the Grid Security Infrastructure (GSI), multiple data channels for parallel transfers, partial file transfers, third-party transfers, and reusable and authenticated data channels. The pftp and ftp provided with the HPSS software were not required or designed to support a data Grid infrastructure. For future releases, HPSS Collaboration members have introduced data Grid pftp requirements, and the HPSS Technical Committee (TC) has convened a Grid Working Group to propose a development plan. As an interim and partial HPSS data Grid interface solution, the HPSS Collaboration is distributing the GSI-enabled pftp developed by Lawrence Berkeley National Laboratory (LBL). The HPSS TC is also working with the GridFTP development project underway at Argonne National Laboratory.

To acquire an HPSS data Grid interface for our test bed, we requested and received a copy of LBL's recently developed GSI-enabled pftp. The protocol itself is pftp, but it supports the GSI-enabled AUTH and ADAT FTP commands.

Table III: Commands in each FTP protocol

  GridFTP only:          SPAS, SPOR, ERET, ESTO, SBUF, DCAU
  GSI-enabled pftp only: PBSZ, PCLO, PORPN, PPOR, PROT, PRTR, PSTO
  Common to both:        AUTH, ADAT and the RFC 959 commands

As shown in Table III, which lists the commands in each FTP protocol, GSI-enabled pftp and GridFTP have different command sets for parallel transfer, buffer management and Data Channel Authentication (DCA), but the base command set is common. Fortunately, the functions unique to each protocol are optional, and the two protocols are able to communicate with each other. By installing and testing the GSI-enabled pftp, we confirmed that the GSI-enabled pftp daemon from LBL could be successfully accessed from gsincftp and globus-url-copy (standard Globus clients).

&(executable=gsim1)
 (arguments="-d")
 (inputfiles=("Bdata.in" "gsiftp://dt5s.cc:2811/hpss/manabe/data2"))
 (stdout=datafiles.out)
 (join=true)
 (maxcputime="36")
 (middleware="nordugrid")
 (jobname="hpss access test")
 (stdlog="grid_debug")
 (ftpthreads=1)

A sample xRSL job description.

For the measurement of a 2 GB file accessed from HPSS, GSI-enabled pftp and the normal Kerberos pftp had equivalent elapsed times. Figure 9 shows the aggregate transfer speed as a function of the number of independent simultaneous file transfers. However, when the GSI-enabled pftpd server does not run on the HPSS disk mover where the accessed data resides, the transfer speed halved. With the original pftp, where pftpd runs on the HPSS core server, the data path is established directly between the pftp client and the disk mover. With GSI-enabled pftp, on the other hand, the data flows from the disk mover, via pftpd, to the client host. When the disk mover and the pftpd server do not reside on the same host, two successive network transfers are incurred.
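The halving can be illustrated with the same serialized-stage reasoning as in Section 3.3 (an illustration only; the per-hop rate is an assumed GbE-class figure, not a number from the paper): when pftpd is not on the disk mover, every buffer crosses the network twice, mover to pftpd host and then pftpd host to client, one hop after the other.

# Illustrative model of the GSI-pftp data path when pftpd and the disk mover
# are on different hosts: each buffer makes two successive network transfers
# (mover -> pftpd host, then pftpd host -> client). The per-hop rate is an
# assumed GbE-class figure, not a measurement from the paper.

HOP_MB_S = 80.0  # assumed per-hop transfer rate

def one_hop(rate_mb_s: float) -> float:
    # pftpd on the disk mover (or classic pftpd on the core server):
    # data goes straight from the mover to the client.
    return rate_mb_s

def two_hops(rate_mb_s: float) -> float:
    # pftpd on another host: the two transfers happen one after the other,
    # so the time per byte doubles and the effective rate halves.
    return 1.0 / (1.0 / rate_mb_s + 1.0 / rate_mb_s)

print(f"pftpd on the disk mover : {one_hop(HOP_MB_S):.0f} MB/s")
print(f"pftpd on a separate host: {two_hops(HOP_MB_S):.0f} MB/s (halved, cf. Figure 9)")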

Figure 9: Read performance from the GSI-enabled pftpd with a GridFTP client (aggregate transfer speed vs. number of parallel file transfers, comparing the original pftp-pftpd path, GSIftp-GSIpftpd with pftpd running on the disk mover, and GSIftp-GSIpftpd with pftpd running on a host other than the disk mover).

5. Summary

ICEPP and KEK configured a NorduGrid test bed with an HPSS storage server over a high-speed domestic GbE WAN. Performance was measured several times to compare LAN and WAN access. From these measurements, we found that network latency affected the HPSS pftp and client API data transfer speeds. The GSI-enabled pftpd developed by LBL was successfully adapted as the interface between the Grid infrastructure and HPSS.

Our paper is a report on work in progress. Final results require that the questions concerning raw TCP performance, server/client protocol traffic and the pftp data protocol be evaluated further; that any necessary modifications or parameter changes be obtained from our HPSS team members; and that the measurements be taken again. Further understanding of the scalability and the limits of multi-disk-mover configurations would be gained by measuring HPSS network utilization and by using higher-performance network interface adapters, system software and infrastructure, and processor configurations.

References

[1] http://www.openpbs.org
[2] http://www.netperf.org
[3] http://www.globus.org/datagrid/gridftp.html
[4] http://www.sdsc.edu/hpss/
[5] http://www.nordugrid.org; NorduGrid papers can also be found in these proceedings.
[6] S. Yashiro et al., "Data transfer using buffered I/O API with HPSS", CHEP 2001, Beijing, 2001.