Storage Evaluations at BNL

Storage Evaluations at BNL
HEPiX at DESY, Spring 2007
Robert Petkus, RHIC/USATLAS Computing Facility, Brookhaven National Laboratory

State of Affairs
- Explosive disk storage growth trajectory over the next 5 years (>30 PB).
- Projected storage requirements for distributed fileservers (dcache, xrootd) may require more farm nodes than needed for computation alone.
  - Management and scalability concerns
  - The model of using distributed, dcache-managed disk space on compute nodes may not prove viable or cost effective
- Separate and consolidate the distributed storage component (dcache, xrootd) of the farm onto dedicated storage servers?
- Differentiate between two tiers of distributed storage: read-only vs. write (HA).
- Increasing demand for data center real estate, power, and cooling.
- Aging centralized storage infrastructure (Panasas, NFS).

Expected Computing Capacity Evolution: chart of projected capacity in KSI2K (0-70,000) for RHIC and USATLAS, FY'06 through FY'12 (chart: A. Chan)

Expected Storage Capacity Evolution: chart of projected disk storage in TB (0-40,000) for RHIC and USATLAS, FY'06 through FY'12 (chart: A. Chan)

Test Methodology
- Create a standard test/production configuration
- Local I/O profiles using IOzone and fileop (a scripted sweep is sketched after this list)
- dcache and xrootd read bandwidth
- dcache write pool stress testing (GridFTP in / PFTP out)
- Mock production using pre-packaged applications, in alliance with various experimental groups
- Obtain the maximum I/O capacity of each system
- Identify the highest-performing and most versatile storage systems
- Focus is on high-density disk arrays and NAS systems (e.g., SunFire x4500, Scalable Informatics, Nexsan SATABeast, Terrazilla, Xtore, 3PAR)
- Compare and contrast: SATA vs. SAS, Solaris/ZFS vs. Linux/ext3/XFS, hardware vs. software RAID, separate disk array vs. local disk
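As a rough illustration of how the local I/O sweep above can be scripted, the sketch below drives IOzone over a small grid of file and record sizes. It assumes the iozone binary is installed and on PATH; the target directory and the size values are illustrative placeholders, not the parameters used in these tests.

```python
#!/usr/bin/env python3
"""Illustrative IOzone sweep driver (assumes iozone is on PATH)."""
import subprocess

TARGET_DIR = "/data/test"              # hypothetical mount point of the filesystem under test
FILE_SIZES = ["1g", "2g", "4g", "8g"]  # sweep past RAM size to expose buffer-cache effects
RECORD_SIZES = ["64k", "256k", "1m"]

for fsize in FILE_SIZES:
    for rsize in RECORD_SIZES:
        # -i 0: write/rewrite, -i 1: read/re-read; -f places the test file
        # on the filesystem being evaluated.
        cmd = ["iozone", "-i", "0", "-i", "1",
               "-s", fsize, "-r", rsize,
               "-f", TARGET_DIR + "/iozone.tmp"]
        print("running:", " ".join(cmd))
        subprocess.run(cmd, check=True)
```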

SunFire x4500 Thumper
- The SunFire x4500 is a promising potential dcache write pool node, NFS server, and/or iSCSI target.
- NAS solution consisting of (2) dual-core AMD Opteron 285 processors (2.6 GHz) and (48) 500 GB SATA II drives, yielding 24 TB raw / 20.5 TB usable capacity (a rough capacity check follows this list)
- (6) paired SATA controllers connected to HyperTransport PCI-X 2.0 tunnels
- (2) dual-Gb Ethernet controllers
- 16 GB RAM
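As a quick sanity check on the capacities quoted above, the back-of-the-envelope sketch below reproduces the raw figure and estimates the usable space implied by the eight-raidz-set layout described on the next slide. The usable estimate is only approximate and will not match the quoted 20.5 TB exactly (boot/spare disks, reserved space, and GB-vs-GiB conventions all shift it).

```python
"""Back-of-the-envelope capacity check for the x4500 figures quoted above."""

DISKS = 48
DISK_GB = 500                # SATA II drives, decimal gigabytes

raw_tb = DISKS * DISK_GB / 1000.0
print(f"raw capacity:    {raw_tb:.1f} TB")      # 24.0 TB, matching the slide

PARITY_DISKS = 8             # one parity disk per raidz set in the 8-set layout
usable_tb = (DISKS - PARITY_DISKS) * DISK_GB / 1000.0
print(f"usable estimate: {usable_tb:.1f} TB")   # ~20 TB vs. the 20.5 TB quoted
```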

2 Thumper Test Configurations

Solaris 10 update 2 (6/06), kernel 118855-19, ZFS filesystem
- A single ZFS storage pool and file system was created from (8) raidz (RAID-5) sets in (5+1) and (6+1) configurations
- Each RAID member disk resides on a different controller (a placement sketch follows this list)
- 4 channel-bonded interfaces

Fedora Core 6, x86_64, 2.6.19 kernel
- (8) RAID-5 sets created using mdadm; again, each RAID member disk on a different controller. 64k chunk.
- A single RAID-0 (stripe) volume built with LVM2
- EXT3 (not 64-bit mode ext4) and XFS (2.8.11) file systems created on the volume
- 4 channel-bonded interfaces
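The sketch below illustrates the "one RAID member per controller" placement rule in a generic way: it groups 48 hypothetical Solaris-style device names by controller and deals them into eight uniform 6-disk sets, so that losing a controller costs each set at most one member. The device names and the uniform (5+1) layout are simplifications for illustration, not the actual x4500 device tree or the mixed (5+1)/(6+1) layout used in the test.

```python
"""Sketch of the 'each RAID member on a different controller' placement rule."""

CONTROLLERS = 6
DISKS_PER_CONTROLLER = 8      # 6 x 8 = 48 drives

# Hypothetical Solaris-style device names (c<ctrl>t<target>d0), grouped by controller.
disks = [[f"c{c}t{t}d0" for t in range(DISKS_PER_CONTROLLER)]
         for c in range(CONTROLLERS)]

# Deal one disk from each controller into each RAID set, so every set has
# exactly one member per controller.
raid_sets = [[disks[c][s] for c in range(CONTROLLERS)]
             for s in range(DISKS_PER_CONTROLLER)]

for i, members in enumerate(raid_sets):
    # These member lists would feed 'zpool create ... raidz <members>' on
    # Solaris or 'mdadm --create --level=5 ...' on Linux.
    print(f"set {i}: {' '.join(members)}")
```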

Thumper IOzone Results: Aggregate ZFS

Thumper IOzone Results: Aggregate XFS

Thumper IOzone Results: Aggregate EXT3

Thumper IOzone Comparative Results

Thumper IOzone Comparative Results

Thumper IOzone Comparative Results

Thumper IOzone Comparative Results

Thumper ZFS Fileop Tests: chart of ZFS metadata performance, ops/sec (up to ~1,000,000) vs. number of files (0-50), for mkdir, rmdir, create, read, write, close, stat, access, chmod, readdir, link, unlink, and delete

Thumper XFS Fileop Tests: chart of XFS metadata performance, fileops/sec (up to ~350,000) vs. number of files (0-35), for the same operations (mkdir, rmdir, create, read, write, close, stat, access, chmod, readdir, link, unlink, delete)

Thumper EXT3 Fileop Tests: chart of EXT3 metadata performance, fileops/sec (up to ~375,000) vs. number of files (0-35), for the same set of operations

Thumper dcache Tests (ZFS - dccp), O. Rind
- Test 1, 3:45 PM: 30 nodes read data (same file) from dcache at 400 MB/sec.
- Test 2, 4:15 PM: 30 nodes write data to dcache at ~250 MB/sec.

Thumper dcache Tests (ZFS - dccp), O. Rind
- Test 3, 4:40 PM: 30 nodes read mixed data from dcache @ ~350 MB/sec.
- Test 4, 4:50 PM: 30 nodes simultaneous read and write @ 200 MB/sec. Expect that this would be average mixed performance.

Thumper dcache Tests (ZFS - dccp), O. Rind
- Test 5: 75 clients sequentially writing 3x1.5 GB files (green line) + 75 clients sequentially reading 4x1.5 GB randomly selected files (blue line) @ 250-300 MB/sec (a client driver sketch follows).
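A client-side driver for this kind of concurrent dccp read test might look like the sketch below. The dcap door URL and pnfs paths are hypothetical placeholders, the client count is taken from the 30-node scenario above, and it assumes the dccp client is installed where the script runs.

```python
"""Sketch of a concurrent dccp read driver (door URL and paths are placeholders)."""
import subprocess
from concurrent.futures import ThreadPoolExecutor

DOOR = "dcap://dcache-door.example.org:22125"   # hypothetical dcap door
FILES = [f"/pnfs/example.org/data/testfile{i}" for i in range(30)]

def read_one(path):
    # Copy to /dev/null: only the read bandwidth matters, not the data itself.
    return subprocess.run(["dccp", DOOR + path, "/dev/null"]).returncode

# 30 concurrent readers, loosely mimicking "30 nodes read data from dcache".
with ThreadPoolExecutor(max_workers=30) as pool:
    results = list(pool.map(read_one, FILES))

print("failed transfers:", sum(1 for rc in results if rc != 0))
```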

Thumper dcache Tests (ZFS - GridFTP), O. Rind
- Test 6: 75 clients sequentially writing 3x1.5 GB files (green line) + 75 clients sequentially reading 4x1.5 GB randomly selected files (blue line)

Thumper Test Observations
- CPU and buffer cache effects are clearly visible in the IOzone graphs.
- Thumper delivers stunning I/O throughput in the 1-10 GB file size range (dcache, xrootd).
- Read performance parity between Linux XFS/EXT3 and Solaris 10/ZFS
- Solaris/ZFS is dominant in sequential write performance; XFS is on par for files > 4 GB; EXT3 performance is abysmal
  - EXT3 is not 64-bit (4 KB block size limited); need to retest with the 64-bit EXT3 extents patchset (EXT4)
  - EXT3 would be preferable if performance were equalized (inclusion in the kernel, active development)
- ZFS shows a performance drop in random writes for files > 4 GB
  - A problem with write sequentialization for very large files?
- ZFS consumes all available physical memory, which isn't released even after I/O activity has ceased
  - No convenient knob to throttle memory consumption (need to use mdb)
  - However, memory is freed when needed

Thumper Test Observations
- File operation tests (up to 50 files):
  - ZFS wins for close, access, stat, and chmod
  - XFS wins for read and delete
  - XFS == ZFS for writes
- ZFS: need to test stat for thousands of files
  - Recent dialogue on the dcache list regarding slow pool start-up using ZFS: a problem with Java and ZFS?
- ZFS recommended if compatible with the software stack (yes for dcache, no for Lustre)
- ZFS is easy to set up and administer, with integrated volume management
- Fault tolerance: end-to-end checksums, no RAID-5 write hole
- Architectural point of failure: storage will not be externally accessible if the CPU/memory module fails
  - Need an external HyperTransport SATA controller? Does this really matter for dcache/xrootd read nodes?

Thumper Test Observations
- Scalability concerns: will managing many racks be a hassle? At the PB scale, does a SAN backend make more sense?
- Looking forward to a quad-core Thumper and, hopefully, SAS Thumpers.
- Need to test:
  - 10GE: reap the benefits of TOE and eventually RDMA/iWARP on supported cards
  - Performance cost of RAID-6?
  - XFS dcache performance
  - EXT4
  - GridFTP in / PFTP out
  - Xrootd

Other Test Systems
- We extensively tested the Scalable Informatics Jackrabbit: a similarly dense 5U, 48-disk system (46 x 750 GB data disks) with (2) dual-core Opteron processors, 12 GB RAM (our test unit), and (2) Areca SATA II RAID controllers, each with 1 GB cache
  - Our evaluation system was not a polished product, but it performed well
  - There were several evaluation flaws that the vendor claims disadvantaged their system in our tests, including:
    - We installed SL4.4 (2.4x) over a 2.6x kernel
    - The system was configured RAID-6, vs. RAID-5 on Thumper
- More systems in-house (or on the way) for testing:
  - Xstor 16-bay SAS JBOD; a 4U 48-bay SAS array on the way
  - Aberdeen Terrazilla 6U 32-bay SATA II array with integrated server
  - Nexsan SATABeast 4U 48-bay SATA II, dual controller
  - DDN SAF4248 4U 48-bay SATA II plus S2A controllers, on the way

Questions, Comments?