New results from CASPUR Storage Lab


CASPUR / GARR / CERN / CNAF / CSP

New results from CASPUR Storage Lab

Andrei Maslennikov, CASPUR Consortium
June 2003

Participated:

CASPUR: M. Goretti, A. Maslennikov (*), M. Mililotti, G. Palumbo
ACAL FCS (UK): N. Houghton
GARR: M. Carboni
CERN: M. Gug, G. Lee, R. Többicke, A. Van Praag
CISCO: L. Pomelli
CNAF: P.P. Ricci, S. Zani
CSP Turin: R. Boraso
Nishan (UK): S. Macfall

(*) Project Coordinator

Sponsors:

- E4 Computer (Italy): loaned 6 SuperMicro servers (motherboards and assembly) - excellent hardware quality and support
- Intel: donated 12 x 2.8 GHz Xeon CPUs
- San Valley Systems: loaned two SL1000 units - good remote CE support during the tests
- ACAL FCS / Nishan: loaned two 4300 units - active participation in the tests, excellent support
- CISCO Systems: loaned two MDS 9216 switches with FCIP modules - active participation in the tests, excellent support

Contents

- Goals
- Components and test setup
- Measurements:
  - SAN over WAN
  - NAS protocols
  - IBM GPFS
  - Sistina GFS
- Final remarks
- Vendors contact info

Goals for this test series

1. Feasibility study for a SAN-based Distributed Staging System
2. Comparison of the well-known NAS protocols on the latest commodity hardware
3. Evaluation of the new versions of IBM GPFS and Sistina GFS as a possible underlying technology for a scalable NFS server

Remote Staging
Feasibility study for a SAN-based Distributed Staging System

- Most large centers keep the bulk of their data on tape and use some kind of disk caching (staging, HSM, etc.) to access these data.
- Sharing data stores between several centers is frequently requested, so some mechanism that allows access to data stored on remote tapes has to be proposed.
- Suppose now that your centre has implemented a tape<->disk migration system, and you have to extend it to access data located on remote tape drives. Let us see how this can be achieved.

Remote Staging - Solution 1

To access a remote tape file, stage it onto a remote disk, then copy it via the network to a local disk.

[Diagram: Local Site (disk) <-- network -- Remote Site (tape -> disk)]

Disadvantages:
- Two-step operation: more time is needed, and it is harder to orchestrate
- Wasted remote disk space

Remote Staging - Solution 2

Use a "tape server": a process residing on a remote host that has access to the tape drive. The data are read remotely and then piped via the network directly to the local disk (a minimal shell sketch follows below).

[Diagram: Local Site (disk) <-- network -- Remote Site (tape server -> tape)]

Disadvantages:
- A remote machine is needed
- The architecture is quite complex
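For illustration only (this is not the staging software used in the tests), a remote tape read piped straight onto a local disk could look like the sketch below; the host name, tape device and staging path are assumptions.

    # Hypothetical Solution 2: a remote "tape server" host reads the tape and the
    # data are piped over the network directly onto the local staging disk.
    REMOTE=tapeserver.remote.example                # assumed remote host holding the tape drive
    ssh "$REMOTE" "dd if=/dev/nst0 bs=256k" | dd of=/stage/pool/file001 bs=256k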

Remote Staging - Solution 3

Access the remote tape drive as a native device on the SAN, then use it as if it were a local unit attached to one of your local data movers (see the one-line sketch below).

[Diagram: Local Site (tape, disk) -- SAN -- Remote Site (tape)]

Benefits:
- Makes the staging software a lot simpler: the local, field-tested solution applies
- Best performance is guaranteed (provided the remote drive can be used locally at its native speed)
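By contrast, once the remote drive is visible as a native SAN device, the ordinary local command works unchanged; the device name and path in this sketch are assumptions.

    # Hypothetical Solution 3: the remote FC tape drive, reached through the SAN-over-IP
    # gateway, appears as a local tape device and is read directly.
    dd if=/dev/nst1 of=/stage/pool/file001 bs=256k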

Remote Staging

Goal: verify whether FC tape drives can be used at native speed over the WAN, using SAN-over-WAN interconnection middleware.

- In 2002 we had already tried to reach this goal. In particular, we used the CISCO 5420 iSCSI appliance to access an FC tape over a 400 km distance. We were able to write at the native speed of the drive, but the read performance was very poor.
- This year we were able to assemble a setup which implements a symmetric SAN interconnection, and we used it to repeat these tests.

NAS protocols

Goal: benchmark the well-known NAS protocols on modern commodity hardware.

- We run these tests on a regular basis, as we wish to know what performance we can currently count on and how the different protocols compare on the same hardware base.
- Our test setup was visibly more powerful than last year's, so we were expecting to obtain better numbers.
- We compared two remote-copy protocols, RFIO and Atrans (cacheless AFS), and two protocols that provide transparent file access, NFS and AFS (the sketch below illustrates the two access styles).
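As a concrete illustration of the two access styles (not the benchmark commands themselves; the host, mount points and file names are hypothetical):

    # Remote-copy style (RFIO): the file is explicitly copied to local disk before use.
    rfcp diskserver:/shift/data/run001.dat /scratch/run001.dat

    # Transparent access (NFS, AFS): applications read the file in place.
    dd if=/mnt/nfs/data/run001.dat of=/dev/null bs=1000k
    dd if=/afs/example.cell/data/run001.dat of=/dev/null bs=1000k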

Scalable NFS Server

Goal: evaluate the new versions of IBM GPFS and Sistina GFS as a possible underlying technology for a scalable NFS server.

- In 2002 we had already tried both GPFS and GFS.
- GFS 5.0 showed interesting performance figures, but we observed several issues with it: unbalanced performance with multiple clients, and an exponential increase of the load on the lock server as the number of clients grew.
- GPFS 1.2 showed poor performance in the case of concurrent writing on several storage nodes.
- We used GFS 5.1.1 and GPFS 1.3.0-2 during this test session.

Components - 1

High-end Linux units for both servers and clients:
- 6x SuperMicro Superserver 7042M-6 and 2x HP Proliant DL380, each with:
  - 2 Pentium IV Xeon 2.8 GHz CPUs
  - SysKonnect 9843 Gigabit Ethernet NIC (fibre)
  - QLogic QLA2300 2 Gbit Fibre Channel HBA
  - Myrinet HBA

Disk systems:
- 4x Infortrend IFT-6300 IDE-to-FC arrays, each with:
  - 12 x Maxtor DiamondMax Plus 9 200 GB IDE disks (7200 rpm)
  - Dual Fibre Channel outlet at 2 Gbit
  - 256 MB cache

Components - 2

Tape drives:
- 4x LTO/FC (IBM Ultrium 3580)

Network:
- 12-port NPI Keystone GE switch (fibre)
- 28-port Dell 5224 GE switches (fibre / copper)
- Myricom Myrinet 8-port switch
- Fast geographical link (Rome-Bologna, 400 km) with a guaranteed throughput of 1 Gbit/s

SAN:
- Brocade 2400, 2800 (1 Gbit) and 3800 (2 Gbit) switches
- San Valley Systems SL1000 IP-SAN Gateways
- Nishan IPS 4300 multiprotocol IP Storage Switches
- CISCO MDS 9216 switches with the DS-X9308-SMIP module

Components - 3: new devices

We were loaned three new devices: one from San Valley Systems, one from Nishan Systems, and one from CISCO Systems. All of these units provide a SAN-over-IP interconnect function and are suitable for wide-area SAN connectivity. Let me give some more detail on these units.

San Valley Systems IP-SAN Gateway SL-700 / SL-1000

- 1 or 4 wire-speed Fibre Channel-to-Gigabit Ethernet channels
- Uses UDP, and hence delegates the handling of a network outage to the application
- Easy to configure
- Allows fine-grained traffic shaping (step size 200 Kbit, from 1 Gbit/s down to 1 Mbit/s) and QoS
- Connecting two SANs over IP with a pair of SL1000 units is in all aspects equivalent to connecting the two SANs with a simple fibre cable
- Approximate cost: 20 KUSD/unit (SL-700, 1 channel), 30 KUSD/unit (SL-1000, 4 channels)
- Recommended number of units per site: 1

Nishan IPS 3300 / 4300 multiprotocol IP Storage Switch

- 2 or 4 wire-speed iFCP ports for SAN interconnection over IP
- Uses TCP and is capable of seamlessly handling network outages
- Allows traffic shaping at predefined bandwidths (8 steps, 1 Gbit to 10 Mbit) and QoS
- Implements an intelligent router function: allows multiple fabrics from different vendors to be interconnected and makes them look like a single SAN
- When interconnecting two or more separately managed SANs, maintains their independent administration
- Approximate cost: 33 KUSD/unit (6 universal FC/GE ports + 2 iFCP ports, IPS 3300), 48 KUSD/unit (12 universal FC/GE ports + 4 iFCP ports, IPS 4300)
- Recommended number of units per site: 2 (to provide redundant routing)

CISCO IP Storage Services Module (DS-X9308-SMIP) for the MDS 9216 16-port modular FC switch

- Provides integration of IP Storage Services into the Cisco MDS 9000 FC switches (one MDS 9216 switch incorporates 16 FC ports)
- 8 Gigabit Ethernet IP storage interfaces:
  - Wire-rate FCIP on all ports simultaneously
  - Up to 24 simultaneous FCIP links per module
- The industry-standard FCIP protocol uses TCP/IP to provide reliable transport
- VSAN and VLAN services increase the stability and security of WAN-connected SAN elements
- Pricing is provided through reseller partners (IBM, HP and others)
- Recommended number of units per site: 1

CASPUR Storage Lab

[Diagram: test setup. Rome: FC SAN with disks and tapes, one HP DL380 and six SM 7042M-6 servers on Gigabit IP and Myrinet, connected through SL1000, IPS 4300 and MDS 9216 gateways. Bologna: FC SAN, one HP DL380 and the same set of gateways on Gigabit IP. The two sites are linked by a 1 Gbit WAN over 400 km.]

Series 1: accessing remote SAN devices

[Diagram: a host at one site accesses disks and tape drives on the other site's FC SAN through a pair of gateways (SL1000, IPS 4300 or MDS 9216) across the 1 Gbit, 400 km WAN.]

Series 1 - current results

- We were able to operate the tape drives at their native speed for both reads and writes: 15 MB/s for LTO and 25 MB/s for another, faster drive.
- For disk devices we observed a small (5%) loss of performance on writes and a more visible (up to 12%) loss on reads, on all three units. The performance drop on reads increases with the distance between the units: from 6% at 1 m up to 10-12% at 400 km.
- Several powerful devices used simultaneously over the geographical link easily grab the whole available bandwidth of the GigE link between the two appliances (a 1 Gbit/s link carries roughly 125 MB/s, so a few fast disk arrays or a handful of tape drives are already enough to fill it). Seen so far for the Nishan and San Valley units only; CISCO will be tested in the near future.
- In the case of Nishan (TCP-based SAN interconnection) we witnessed a successful job completion after an emulated 1-minute network outage. A similar test will also be performed on the CISCO unit.

Distributed Staging based on direct tape drive access is POSSIBLE!

Series 2: comparison of NAS protocols

[Diagram: two Infortrend IFT-6300 arrays attached via 2 Gbit FC to an SM 7042 server, which serves an SM 7042 client over Gigabit Ethernet. RAID 50, local R/W circa 150 MB/s.]

Series 2 - details

Some settings:
- Kernels on the server: 2.4.18-27 (RedHat 7.3, 8.0)
- Kernel on the client: 2.4.20-9 (RedHat 9)
- AFS: the cache was set up on a 400 MB ramdisk (one possible way to configure this is sketched below)
- The ext2 filesystem was used on the server

Problems encountered:
- Poor array performance on reads with kernel 2.4.20-9
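One possible way to realise the "AFS cache on ramdisk" setting is sketched below; the slides do not show the actual commands, so the tmpfs mount, the cacheinfo values and the init script name are assumptions.

    # Hypothetical sketch: place the 400 MB OpenAFS client cache on a memory-backed filesystem.
    mount -t tmpfs -o size=400m tmpfs /usr/vice/cache
    echo '/afs:/usr/vice/cache:400000' > /usr/vice/etc/cacheinfo   # cache size in 1 KB blocks
    /etc/init.d/afs restart     # restart the AFS client so afsd picks up the new cache (script name may differ)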

Series 2 - more detail

Write tests:
- Measured the average time needed to transfer 20 x 1.9 GB files from memory on the client to the disk of the file server, including the time needed to run the sync command on both the client and the server at the end of the operation (a scripted version of this loop is sketched below):

  20 x {dd if=/dev/zero of=<filename on server> bs=1000k count=1900}

  T = T_dd + max(T_sync_client, T_sync_server)

Read tests:
- Measured the average time needed to transfer 20 x 1.9 GB files from the disk on the server to memory on the client (output directly to /dev/null). Because of the large number of files in use and a file size comparable with the available RAM on both the client and the server machines, caching effects were negligible.
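A scripted version of the write loop could look like the following sketch; the mount point is an assumption, and the server-side sync that enters T above is omitted for simplicity.

    #!/bin/sh
    # Hypothetical reconstruction of the Series 2 write test: 20 x 1.9 GB files are
    # written from client memory to the server-backed directory, and the elapsed
    # time includes a final sync on the client.
    DEST=/mnt/server/bench      # assumed mount point of the filesystem under test
    start=$(date +%s)
    i=1
    while [ "$i" -le 20 ]; do
        dd if=/dev/zero of="$DEST/file$i" bs=1000k count=1900 2>/dev/null
        i=$((i + 1))
    done
    sync
    end=$(date +%s)
    # 20 files x 1900 MB each, divided by the elapsed seconds
    echo "average write rate: $(( 20 * 1900 / (end - start) )) MB/s"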

Series 2 - current results (MB/s)
[SM 7042, 2 GB RAM on both server and client]

                         Write   Read
Pure disk                 146    151
RFIO                      111    111
NFS                        87     82
AFS cacheless (Atrans)     67     56
AFS                        47     30

Series 3a: IBM GPFS

[Diagram: four SM 7042M-6 nodes interconnected via Myrinet, all attached to the FC SAN with 4x IFT-6300 disk arrays; the GPFS filesystem is also exported via NFS.]

Series 3a - details

GPFS installation:
- GPFS version 1.3.0-2
- Kernel 2.4.18-27.8.0smp
- Myrinet as the server interconnection network
- All nodes see all disks (NSDs)

What was measured:
1) Read and write transfer rates (memory <-> GPFS filesystem) for large files
2) Read and write rates (memory on an NFS client <-> GPFS exported via NFS; see the export sketch below)
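For reference, exporting an already-mounted cluster filesystem over NFS on a Linux node of this era amounts to an /etc/exports entry plus an exportfs run; the mount point /gpfs1, the client network and the export options below are assumptions, not the lab's actual configuration.

    # Hypothetical sketch: export a mounted GPFS filesystem via NFS.
    echo '/gpfs1 192.168.1.0/255.255.255.0(rw,async,no_root_squash)' >> /etc/exports
    exportfs -ra                # re-read /etc/exports and publish the export
    # an NFS client would then mount it, e.g.:
    #   mount -o rw,nfsvers=3,tcp server1:/gpfs1 /mnt/gpfs1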

Series 3a - GPFS native (MB/s)

         1 node   2 nodes   3 nodes
Read       96       135       157
Write     117       127       122

R/W speed for a single disk array: 123 / 78 MB/s

Series 3a - GPFS exported via NFS (MB/s)

1 node exporting:
         1 client   2 clients   3 clients   9 clients
Read        35          44          44          44
Write       55          73          83          88

2 nodes exporting:
         2 clients   4 clients   6 clients
Read        60          72          85
Write       90         106         106

3 nodes exporting:
         3 clients   6 clients   9 clients
Read        84         113         120
Write      107         106         106

Series 3b: Sistina GFS

[Diagram: five SM 7042M-6 nodes attached to the FC SAN with 4x IFT-6300 disk arrays; one node acts as the lock server, and the GFS filesystem is also exported via NFS.]

Series 3b - details

GFS installation:
- GFS version 5.1.1
- Kernel: SMP 2.4.18-27.8.0.gfs (may be downloaded from Sistina together with the trial distribution; includes all the required drivers)

Problems encountered:
- Exporting GFS via the kernel-based NFS daemon does not work well (I/Os on NFS clients were ending in error). Sistina was able to reproduce the bug on our setup and has already partially fixed it: the error rate went down visibly. We therefore report here the results obtained with a generally less performant but stable user-space nfsd.

What was measured:
1) Read and write transfer rates (memory <-> GFS filesystem) for large files
2) The same for the case memory on an NFS client <-> GFS exported via NFS

Series 3b - GFS native (MB/s)

NB: out of the 5 nodes, 1 node was running the lock server process and 4 nodes were doing only I/O.

         1 client   2 clients   3 clients   4 clients
Read       122         230         291         330
Write      156         245         297         300

R/W speed for a single disk array: 123 / 78 MB/s

Series 3b - GFS exported via NFS (MB/s)

1 node exporting:
         1 client   2 clients   4 clients   8 clients
Read        54          67          78          93
Write       56          64          64          61

3 nodes exporting:
         3 clients   6 clients   9 clients
Read       145         194         207
Write      164         190         185

4 nodes exporting:
         8 clients
Read       250
Write      236

NB: the user-space NFSD was used.

Final remarks

- We are proceeding with the test program. Currently under test: new middleware from CISCO and a new tape drive from Sony. We are also expecting a new iSCSI appliance from HP and an LTO-2 drive.
- We are open to any collaboration.

Vendors contact info

Supermicro servers for Italy:
- E4 Computer: Vincenzo Nuti - vincenzo.nuti@e4company.com

FC over IP:
- San Valley Systems: John McCormack - john.mccormack@sanvalley.com
- Nishan Systems: Stephen Macfall - smacfall@nishansystems.com
- ACAL FCS: Nigel Houghton - nigelhoughton@acalfcs.com
- CISCO Systems: Luciano Pomelli - lpomelli@cisco.com