X1 StorNext SAN. Jim Glidewell Information Technology Services Boeing Shared Services Group

Similar documents
StorNext SAN on an X1

Early X1 Experiences at Boeing. Jim Glidewell Information Technology Services Boeing Shared Services Group

Shared Object-Based Storage and the HPC Data Center

Integrating Fibre Channel Storage Devices into the NCAR MSS

Backing Up and Restoring Multi-Terabyte Data Sets

Data Movement & Tiering with DMF 7

Distributed File Systems Part IV. Hierarchical Mass Storage Systems

A GPFS Primer October 2005

Chapter 7. GridStor Technology. Adding Data Paths. Data Paths for Global Deduplication. Data Path Properties

EMC CLARiiON Backup Storage Solutions

StorNext 3.0 Product Update: Server and Storage Virtualization with StorNext and VMware

FOUR WAYS TO LOWER THE COST OF REPLICATION

designed. engineered. results. Parallel DMF

Simplified and Consolidated Parallel Media File System Solution

Backup and archiving need not to create headaches new pain relievers are around

Evaluating Cloud Storage Strategies. James Bottomley; CTO, Server Virtualization

S SNIA Storage Networking Management & Administration

Veritas NetBackup Appliance Family OVERVIEW BROCHURE

IBM Storwize V7000 Unified

Optimizing Quality of Service with SAP HANA on Power Rapid Cold Start

Parallel File Systems Compared

Challenges in making Lustre systems reliable

What's in this guide... 4 Documents related to NetBackup in highly available environments... 5

How to recover a failed Storage Spaces

IBM V7000 Unified R1.4.2 Asynchronous Replication Performance Reference Guide

DELL EMC DATA DOMAIN SISL SCALING ARCHITECTURE

Storage + VDI: the results speak for themselves at Nuance Communications

DASH COPY GUIDE. Published On: 11/19/2013 V10 Service Pack 4A Page 1 of 31

Dell Fluid Data solutions. Powerful self-optimized enterprise storage. Dell Compellent Storage Center: Designed for business results

IT Certification Exams Provider! Weofferfreeupdateserviceforoneyear! h ps://

DELL POWERVAULT MD FAMILY MODULAR STORAGE THE DELL POWERVAULT MD STORAGE FAMILY

Dell EMC CIFS-ECS Tool

Storage Technology Requirements of the NCAR Mass Storage System

Exam : S Title : Snia Storage Network Management/Administration. Version : Demo

BaBar Computing: Technologies and Costs

Performance Benchmark of the SGI TP9400 RAID Storage Array

4. Are you only looking for a D/R as a service target for the VMware and IBM systems? Yes

Installing Acronis Backup Advanced Edition

Rapid database cloning using SMU and ZFS Storage Appliance How Exalogic tooling can help

An ESS implementation in a Tier 1 HPC Centre

IBM TotalStorage Enterprise Storage Server (ESS) Model 750

Opendedupe & Veritas NetBackup ARCHITECTURE OVERVIEW AND USE CASES

CRAY X1 System Administration UNICOS/mp 2.4 Update

Mass Storage at the PSC

Exam Objectives for MCSA Installation, Storage, and Compute with Windows Server 2016

High-Energy Physics Data-Storage Challenges

Driving Data Warehousing with iomemory

CA485 Ray Walshe Google File System

GFS: The Google File System. Dr. Yingwu Zhu

COPYRIGHTED MATERIAL. Windows Server 2008 Storage Services. Chapter. in this chapter:

Storage Systems. Storage Systems

Performance of relational database management

The Microsoft Large Mailbox Vision

HP D2D & STOREONCE OVERVIEW

An Introduction to GPFS

Free up rack space by replacing old servers and storage

Overcoming Obstacles to Petabyte Archives

KillTest. 半年免费更新服务

Exam : Title : Storage Sales V2. Version : Demo

1Z0-433

Parallel File Systems for HPC

Sun Lustre Storage System Simplifying and Accelerating Lustre Deployments

Coordinating Parallel HSM in Object-based Cluster Filesystems

Local and Remote Data Protection for Microsoft Exchange Server 2007

System input-output, performance aspects March 2009 Guy Chesnot

Data Archiving Using Enhanced MAID

Chapter 10: Mass-Storage Systems

User Services. Chuck Niggley Computer Sciences Corp. NASA Ames Research Center CUG Summit 2002 May 21, CUG Summit 2002, Manchester, England

XenData Product Brief: SX-550 Series Servers for LTO Archives

Real-time Protection for Microsoft Hyper-V

IBM TotalStorage Enterprise Storage Server Model 800

Automated Storage Tiering on Infortrend s ESVA Storage Systems

Chapter 10: Mass-Storage Systems. Operating System Concepts 9 th Edition

AUTOMATED RESTORE TESTING FOR TIVOLI STORAGE MANAGER

Storage Area Networks: Performance and Security

IBM Tivoli Storage Manager V5.1 Implementation Exam # Sample Test

VERITAS Volume Manager for Windows 2000

Parallel File Systems. John White Lawrence Berkeley National Lab

Continuous data protection. PowerVault DL Backup to Disk Appliance

Rio-2 Hybrid Backup Server

EMC Disk Library Automated Tape Caching Feature

IBM System Storage DS5020 Express

Chapter 11: Implementing File

The Modern Virtualized Data Center

Chapter 11: Implementing File Systems. Operating System Concepts 9 9h Edition

Architecting For Availability, Performance & Networking With ScaleIO

The advantages of architecting an open iscsi SAN

Architecting Storage for Semiconductor Design: Manufacturing Preparation

Introduction to High Performance Parallel I/O

VERITAS Volume Replicator. Successful Replication and Disaster Recovery

Filesystem. Disclaimer: some slides are adopted from book authors slides with permission

EMC Solutions for Backup to Disk EMC Celerra LAN Backup to Disk with IBM Tivoli Storage Manager Best Practices Planning

HCI: Hyper-Converged Infrastructure

Replicating Mainframe Tape Data for DR Best Practices

Scale-out Object Store for PB/hr Backups and Long Term Archive April 24, 2014

SGI InfiniteStorage Data Migration Facility (DMF) Virtualizing Storage to Enable Active Archiving

Storage Devices for Database Systems

Oracle EXAM - 1Z Oracle Exadata Database Machine Administration, Software Release 11.x Exam. Buy Full Product

Using QNAP Local and Remote Snapshot To Fully Protect Your Data

Install ISE on a VMware Virtual Machine

Transcription:

X1 StorNext SAN Jim Glidewell Information Technology Services Boeing Shared Services Group

Current HPC Systems Page 2 Two Cray T-90's A 384 CPU Origin 3800 Three 256-CPU Linux clusters Cray X1 X1 SAN - Cray User Group - May 2004

Longtime DMF Site Started using DMF in 1990 on Cray systems 16 EScon STK 9840A drives shared between two T-90s Past drives include 4490, 4490E, and Timberline 37 terabytes of migrated T-90 data Using DMF since 1998 on SGI Origins Six fiber-channel STK 9840B Previously used Redwood drives 38 terabytes of migrated data Page 3 X1 SAN - Cray User Group - May 2004

Hardware Fully configured single chassis X1 (64 MSPs, 512GB) 26 TB of LSI RAID disk 7TB currently configured into SAN Roughly 1600MB/sec total bandwidth across all controllers Three Storagetek 9310 Powderhorn silos Using ACSLS 7.0 Serving two T-90s, Origin 3800, Linux cluster, and X1 ADIC SNMS SAN Hosted on Dell 2650 servers running Linux Six fiber-channel STK 9840C drives STK drives selected for compatibility with other services in the datacenter Chose 9840C over 9940C for faster load time ADIC tape libraries used at other Boeing locations Page 4 X1 SAN - Cray User Group - May 2004

SAN Hardware Page 5 X1 SAN - Cray User Group - May 2004

Basic StorNext SAN Architecture Page 6 X1 SAN - Cray User Group - May 2004

X1 Node OS X1 Node X1 Node... IOD 0 IOD 1 IOD 2 CNS-2 #1 CNS-2 #2 Boeing SAN Cabling A0 B0 A0 B0 A0 B0 RS200s A2 B2 A2 B2 A2 B2 A2 B2 CPES 9310 to Cluster SAN 12 SFM0 000d6 40 ports SFM1 000d7 40 ports O3000 SN Ethernet ACSLS FSS0 FSS1 Customer Net 3d06 RS200 A2 B2 3d01 RS200 A2 B2 Customer Net SN Ethernet X1 Node Test IOD 3 CNS-2 #3 A0 B0 X1-JS RS200 A2 B2 12 to Cluster SAN ACSLS Customer Network StorNext network SCSI/FC IP/FC Page 7 X1 SAN - Cray User Group - May 2004

SAN Cabling During Installation Page 8 X1 SAN - Cray User Group - May 2004

Terminology SNMS - StorNext Management Suite Manages migration, backup, monitors space, etc. SNFS -StorNext File System Journaled file system - server and client Appears as local file system to host Division between SNMS and SNFS seems fuzzy SNMS is essential for a functioning file system Terms such as migrated, online, offline, etc. are same as DMF When files go offline, they are truncated by SNMS Page 9 X1 SAN - Cray User Group - May 2004

Setup and Configuration Goal was maximal performance with heterogeneous mix of file sizes, data access, and access patterns We did not have adequate guidance during our initial attempts at configuration Our original plan had over 20 filesystems defined We were told after the machine arrived that a reasonable limit was 8, four if failover was desired Setup took months - even with a Cray SAN analyst on site most of the time A permanent Cray test bed would have been useful Tuning guides, both general and client specific, would be helpful Training was adequate, but not in the depth we felt we needed Initial performance was less than expected Page 10 X1 SAN - Cray User Group - May 2004

Performance Performance (especially writes) less than expected Much lower than same disks directly attached Appears to be an I/O bottleneck Mount options have hidden some of the performance problems Same mount options on Linux made performance worse... X1 client very slow to remove large number of files Other clients do not have this problem Deviated from Cray standard configuration to improve performance Enabled CTQ (Command Tag Queueing) Turned on write caching on LSI RAID controllers Page 11 X1 SAN - Cray User Group - May 2004

SAN vs. Direct Attached Disk Reads Writes Direct SAN Direct SAN cp 130 MB/s 250 MB/s 120 MB/s 30 MB/s cvcp 367 MB/s 175 MB/s 300 MB/s 97 MB/s Page 12 X1 SAN - Cray User Group - May 2004

Operational Issues Not as resilient as we had hoped SNMS needs to be running for mv's to work Recycling the MSM component may abort outstanding requests Huge log files can fill /usr, requiring an eventual reboot The SNMS <->ACSLS (STK library) interface needs work in order to meet our needs SNMS wants to control entire library (which is shared...) Disaster recovery should not require whole library to be audited Tape mounts pause as we enter tapes into the library Still trying to understand the process of adding/removing tapes Page 13 X1 SAN - Cray User Group - May 2004

Operational Issues continued fsmedcopy is slow and awkward More testing of this feature is needed If file system server rebooted, clients sometimes fail to recover Remount the SAN filesystem Reboot the host system (!) Tested many failure modes before production Long list of SAN-related SPRs Page 14 X1 SAN - Cray User Group - May 2004

User Interface Issues No dmget equivalent on host This could be a significant problem in the future Request for this functionality has been made to ADIC GUI access is preferred method of user control Slow response to metadata actions Page 15 X1 SAN - Cray User Group - May 2004

Production Experience So Far SAN production started April 25th No service interruptions in the first two weeks of production Most of our users never noticed A good sign in a transition of this type No user complaints Despite some operational issues, StorNext provides valuable features for our site Ability to manage data in excess of our physical disk capacity Trickle backup Option of sharing file systems between heterogeneous hosts Familiar DMF-like characteristics Support for duplicate copies of backup media Cray and ADIC support was a key factor in our decision to move forward with production Page 16 X1 SAN - Cray User Group - May 2004

Linux server platform Linux, as a StorNext server platform, appears immature for Boeing needs StorNext only recently released for Linux Linux I/O is somewhat primitive for the needs of a StorNext file server Limited number of failover paths (max_scsi_lun) Good long-term strategy Page 17 X1 SAN - Cray User Group - May 2004

Conclusions Setup was significantly more difficult than expected For both Cray and us Took months instead of weeks The lack of a StorNext test bed had serious impacts Earlier direct involvement by ADIC would have been beneficial Lessons learned at our site are likely to benefit others Boeing would have benefited from a more "turnkey" solution to managed storage Performance issues (particularly writes) need to be addressed for our environment Linux as a server platform is in line with long-term HPC strategy, but is still immature StorNext appears to be a solid product, but needs more polish for our needs Ease of setup HPC features (dmget, ACLs, etc.) Handling of exceptions We have been in production since April 25th Cray has made significant efforts to make this installation succeed Page 18 X1 SAN - Cray User Group - May 2004

Coming soon Page 19 X1 SAN - Cray User Group - May 2004