GPFS Best Practices. "If you can do it, it ain't braggin." Raymond L. Paden, Ph.D. HPC Technical Architect Deep Computing. 26 Oct 09 version 1.2.


Best Practices "If yu can d it, it ain't braggin." Raymnd L. Paden, Ph.D. HPC Technical Architect Deep Cmputing 26 Oct 09 versin 1.2.0 Dizzy Dean raypaden@us.ibm.cm 877-669-1853 512-858-4261

Special Notices from IBM Legal

This presentation was produced in the United States. IBM may not offer the products, programs, services or features discussed herein in other countries, and the information may be subject to change without notice. Consult your local IBM business contact for information on the products, programs, services, and features available in your area. Any reference to an IBM product, program, service or feature is not intended to state or imply that only IBM's product, program, service or feature may be used. Any functionally equivalent product, program, service or feature that does not infringe on any of IBM's intellectual property rights may be used instead of the IBM product, program, service or feature.

Information in this presentation concerning non-IBM products was obtained from the suppliers of these products, published announcement material or other publicly available sources. Sources for non-IBM list prices and performance numbers are taken from publicly available information, including D.H. Brown, vendor announcements, vendor WWW home pages, the SPEC Home Page, the GPC (Graphics Processing Council) Home Page and the TPC (Transaction Processing Performance Council) Home Page. IBM has not tested these products and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products.

IBM may have patents or pending patent applications covering subject matter in this presentation. The furnishing of this presentation does not give you any license to these patents. Send license inquiries, in writing, to IBM Director of Licensing, IBM Corporation, New Castle Drive, Armonk, NY 10504-1785 USA.

All statements regarding IBM's future direction and intent are subject to change or withdrawal without notice, and represent goals and objectives only. Contact your local IBM office or IBM authorized reseller for the full text of a specific Statement of General Direction.

The information contained in this presentation has not been submitted to any formal IBM test and is distributed "AS IS". While each item may have been reviewed by IBM for accuracy in a specific situation, there is no guarantee that the same or similar results will be obtained elsewhere. The use of this information or the implementation of any techniques described herein is a customer responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. Customers attempting to adapt these techniques to their own environments do so at their own risk. IBM is not responsible for printing errors in this presentation that result in pricing or information inaccuracies.

The information contained in this presentation represents the current views of IBM on the issues discussed as of the date of publication. IBM cannot guarantee the accuracy of any information presented after the date of publication. IBM products are manufactured from new parts, or new and serviceable used parts. Regardless, our warranty terms apply.

Any performance data contained in this presentation was determined in a controlled environment. Therefore, the results obtained in other operating environments may vary significantly. Some measurements quoted in this presentation may have been made on development-level systems. There is no guarantee these measurements will be the same on generally-available systems. Some measurements quoted in this presentation may have been estimated through extrapolation. Actual results may vary. Users of this presentation should verify the applicable data for their specific environment.

Microsoft, Windows, Windows NT and the Windows logo are registered trademarks of Microsoft Corporation in the United States and/or other countries. UNIX is a registered trademark in the United States and other countries, licensed exclusively through The Open Group. LINUX is a registered trademark of Linus Torvalds. Intel and Pentium are registered trademarks, and MMX, Itanium, Pentium II Xeon and Pentium III Xeon are trademarks, of Intel Corporation in the United States and/or other countries. Other company, product and service names may be trademarks or service marks of others.

Something Old, Something New: What is GPFS?

GPFS = General Parallel File System. All of GPFS's rivals do some of these things; none of them do all of them!
- General: supports a wide range of applications and configurations
- Cluster: from large (4000+ nodes in a multi-cluster) to small (only 1 node) clusters
- Parallel: user data and metadata flow between all nodes and all disks in parallel
- HPC: supports high performance applications
- Flexible: tuning parameters allow GPFS to be adapted to many environments
- Capacity: from high (4+ PB) to low capacity (only 1 disk)
- Global: works across multiple nodes, clusters and labs (i.e., LAN, SAN, WAN)
- Heterogeneous: native on AIX, Linux and Windows, as well as NFS and CIFS; works with almost any block storage device
- Shared disk: all user data and metadata are accessible from any disk to any node
- RAS: reliability, accessibility, serviceability
- Ease of use: GPFS is not a black box, yet it is relatively easy to use and manage
- Basic file system features: POSIX API, journaling, both parallel and non-parallel access
- Advanced features: ILM, integration with tape, disaster recovery, SNMP, snapshots, robust NFS support, hints

What is GPFS? A Typical Example

Aggregate performance and capacity:
- Data rate: streaming rate < 5 GB/s; 4 KB transaction rate < 40,000 IOP/s
- Usable capacity < 240 TB

[Diagram: 64 x3550 client nodes (x3550-01 through x3550-64) on an IB LAN; 4 NSD servers (Server-01 through Server-04), each with a 4xDDR IB connection to the LAN and 2xFC8 connections to the storage; five 60-disk drawers behind the storage controller.]

LAN configuration:
- Performance scales linearly in the number of storage servers
- Add capacity without increasing the number of servers
- Add performance by adding more servers and/or storage
- Inexpensively scale out the number of clients

Though not shown, a cluster like this will generally include an administrative network.

What GPFS is Not

GPFS is not a client/server file system like NFS, CIFS (Samba) or AFS/DFS with a single file server. GPFS nodes can be NFS or CIFS servers, but GPFS treats them like any other application.

[Diagram: clients accessing a single file server (e.g., NFS or Samba) over a LAN, contrasted with clients accessing disk over a SAN through a dedicated metadata server.]

GPFS is not a SAN file system with a dedicated metadata server. GPFS can run in a SAN-file-system-like mode, but it does not have a dedicated metadata server. GPFS avoids the bottlenecks introduced by centralized file and/or metadata servers.

What GPFS is Not

GPFS is not a niche file system for IBM System p products.
- Yesterday: GPFS was a parallel file system for IBM SP systems.
- Today: GPFS is a general purpose clustered parallel file system, tunable for many workloads on many configurations (e.g., Winterhawk, BlueGene/P, POWER6 p595, iDataPlex, BladeCenter/H).

Where Is GPFS Used Today?

GPFS is a mature product with established market presence. It has been generally available since 1998, with research and development starting in 1991. Applications include:
- Aerospace and Automotive
- Banking and Finance
- Bio-informatics and Life Sciences
- Defense
- Digital Media
- EDA (Electronic Design Automation)
- General Business
- National Labs
- Petroleum
- SMB (Small and Medium sized Business)
- Universities
- Weather Modeling

Local Area Network (LAN) Architecture: Clients Access Disks Through the Servers via the LAN

[Diagram: Clients #1-#6 and Servers #1-#4 on a LAN fabric (e.g., Ethernet, IB) carrying user data, metadata, tokens, heartbeat, etc.; each server has two connections through dual RAID controllers in the storage controller to RAID arrays A1-A12, presented as LUNs L1-L12 with primary/backup server assignments (e.g., L1-L3 primary on Server #1, backup on Server #2).]

NSD (Network Shared Disk): the SW layer in GPFS providing a "virtual" view of a disk; NSDs are virtual disks which correspond to LUNs in the NSD servers with a bijective mapping.

LUN (Logical Unit): an abstraction of a disk (AIX: hdisk; Linux: SCSI device). LUNs map to RAID arrays in a disk controller or to "physical disks" in a server.

Redundancy:
- Each server has 2 connections to the disk controller, providing redundancy.
- No single points of failure: primary/backup servers for each LUN, controller/host connection failover, dual RAID controllers.
- Each LUN can have up to 8 servers. If a server fails, the next one in the list takes over. In this example there are 2 servers per LUN, a primary and a backup.
- A SAN switch can be added if desired.

Zoning: the process by which RAID sets are assigned to controller ports and HBAs. GPFS achieves its best performance by mapping each RAID array to a single LUN in the host.

Twin tailing: for redundancy, each RAID array is zoned to appear as a LUN on 2 or more hosts.
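As an illustration of the primary/backup NSD server mapping described above, here is a minimal sketch of a GPFS 3.x style disk descriptor file and the commands that turn LUNs into NSDs and build a file system. The device names, server hostnames, NSD names and file system name are hypothetical, and the exact descriptor syntax depends on the GPFS release in use.

    # diskdesc.txt -- one line per LUN (hypothetical names)
    # Format: DiskName:PrimaryServer:BackupServer:DiskUsage:FailureGroup:DesiredName
    /dev/sdb:server01:server02:dataAndMetadata:1:nsd_L1
    /dev/sdc:server02:server01:dataAndMetadata:2:nsd_L4
    /dev/sdd:server03:server04:dataAndMetadata:3:nsd_L7
    /dev/sde:server04:server03:dataAndMetadata:4:nsd_L10

    # Create the NSDs, then a file system striped across them
    # (mmcrnsd rewrites the descriptor file so it can be reused by mmcrfs)
    mmcrnsd -F diskdesc.txt
    mmcrfs /gpfs fs1 -F diskdesc.txt -B 1M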

Storage Area Network (SAN) Architecture: Client/Servers Access Disk via the SAN

[Diagram: Client/Servers #1-#6, each with all LUNs L1-L12 mounted, attached to a LAN fabric (e.g., Ethernet, IB) carrying tokens, heartbeat, etc., and to a SAN fabric (FC or IB) carrying user data and metadata to dual RAID controllers and RAID arrays A1-A12.]

- All nodes act both as client and server. All LUNs are mounted on all nodes. 1 Gb/s LAN connections are sufficient.
- GPFS is not a SAN file system; it merely can run in a SAN-centric mode.
- Multiple HBAs increase redundancy and cost.
- No single points of failure: all LUNs mounted on all nodes, SAN connection (FC or IB) failover, dual RAID controllers.
- Zoning maps all RAID sets (LUNs) to all nodes.

CAUTION: Not recommended for SANs with > 32 host ports. Scaling beyond this requires special tuning (e.g., set the queue depth very small).
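In this SAN-centric mode every node sees the LUNs directly, so the disk descriptors can simply omit the NSD servers; a node with local access to the device then does its I/O over the SAN rather than over the LAN. A minimal sketch, with hypothetical device and file system names:

    # diskdesc_san.txt -- no NSD servers listed; every node accesses the LUNs over the SAN
    /dev/sdb:::dataAndMetadata:1:nsd_lun01
    /dev/sdc:::dataAndMetadata:2:nsd_lun02

    mmcrnsd -F diskdesc_san.txt
    mmcrfs /gpfs fs_san -F diskdesc_san.txt -B 1M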

GPFS Best Practices

The following pages provide examples of what are, and what are not, best practices. The examples are based on iDataPlex solutions using currently available storage products. The principles motivating these examples can be extended to other server technologies (e.g., rack optimized, blade, System x, System p, and so forth).

Recommended Storage Solutions

For reasons of best practices, the DS5300 and DCS9900 are generally the preferred HPC storage solutions. For customers with smaller capacity and/or lower performance requirements that are unlikely to grow over time into larger/faster solutions, the DS3200, DS3400, DS5020 and DS5100 are acceptable. The DS8300, XIV and the 3U idx storage servers should not in general be used for HPC solutions.

COMMENT: The distinction between large and small capacity, or high and low performance, is not exact. However, as a guideline, if more than 240 SATA disks or 128 FC or SAS disks are needed, or if more than 3 GB/s is needed, then a DS5300 or DCS9900 is recommended.

GPFS Building Blocks

A convenient design strategy for GPFS solutions is to define a "storage building block", the smallest increment of storage and servers by which a storage system can grow. A GPFS storage solution then consists of 1 or more storage building blocks. This allows customers to conveniently expand their storage solution in increments of storage building blocks (i.e., a "build as you grow" strategy). This approach is feasible because GPFS scales linearly in the number of disks, storage controllers, servers, clients, and so forth.
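A sketch of the "build as you grow" step, assuming an existing file system fs1 and a new building block whose LUNs are described in newblock.txt (all names hypothetical). mmadddisk adds the new NSDs to the file system; an optional restripe rebalances existing data across the enlarged set of disks:

    # newblock.txt describes the LUNs of the new building block, e.g.
    # /dev/sdf:server05:server06:dataAndMetadata:5:nsd_bb2_1
    mmcrnsd -F newblock.txt          # turn the new LUNs into NSDs
    mmadddisk fs1 -F newblock.txt    # add them to the existing file system
    mmrestripefs fs1 -b              # optional: rebalance existing data over all disks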

Using GPFS in iDataPlex as an IB/LAN File System: External NSD Servers

[Diagram: compute nodes (dx340 or dx360) on an IB LAN with a blocking factor ~= 3:1; 4 external NSD servers, each with 4xDDR IB to the LAN and 2xFC8 to a DCS9900 couplet; 5 x 60-disk drawers of SATA disk, 300 x 1 TB disks, 30 x 8+2P RAID 6 arrays.]

- GPFS: IB LAN via RDMA. DCS9900: FC8 host interface.
- Advantage: since the NSD servers are external to the idx frames, they are not limited by the blocking factor.
- The FC8 host connections could be replaced with IB host connections. In that case, the DCS9900 could even be attached to the IB LAN, but that only increases the IB switch port count with little added benefit.

Analysis, 84 nodes (1 frame):
- aggregate rates: write < 5.4 GB/s, read < 3.6 GB/s
- peak rate per node: 500 to 1500 MB/s depending on blocking
- average rate per node: write < 64 MB/s, read < 40 MB/s

Analysis, 336 nodes (4 frames):
- aggregate rates: write < 5.4 GB/s, read < 3.6 GB/s
- peak rate per node: 500 to 1500 MB/s depending on blocking
- average rate per node: write < 16 MB/s, read < 10 MB/s

Capacity: raw < 300 TB, usable < 240 TB
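For the "IB LAN via RDMA" path above, GPFS must be told to use InfiniBand verbs RDMA. A minimal sketch, assuming Mellanox-style port names such as mlx4_0 port 1 (the port name is an assumption; use the names reported on your nodes):

    # Enable GPFS RDMA over the IB LAN (takes effect when the GPFS daemon restarts)
    mmchconfig verbsRdma=enable
    mmchconfig verbsPorts="mlx4_0/1"
    # Verify the settings
    mmlsconfig verbsRdma verbsPorts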

Using GPFS in iDataPlex as an IB/LAN File System: Internal NSD Servers

Look! You may need to reduce the blocking factor or eliminate it.

[Diagram: compute nodes (dx340 or dx360) on an IB LAN with a blocking factor ~= 2:1; I/O nodes (dx340 or dx360 M2) inside the idx frames acting as NSD servers, attached to a DCS9900 via its IB host interface using SRP; 5 x 60-disk drawers of SATA disk, 300 disks, 30 x 8+2P RAID 6 arrays.]

Limitation: the blocking factor may limit storage server performance.

Caveats and warnings:
- idx I/O nodes have a limited number of ports.
- Do not overload the IB switch ASICs by attaching the NSD server IB connections to the same line card.
- If the message passing traffic is heavy, the blocking factor may limit NSD server performance; in the worst case, it may be necessary to reduce or eliminate the blocking factor.

Alternative solution: use the FC8 DCS9900 host interfaces with 2xFC8 adapters in the NSD servers. This bypasses the ASIC and blocking factor issues.

Analysis, 84 nodes (1 frame):
- aggregate rates: write < 5.4 GB/s, read < 3.6 GB/s
- peak rate per node: 750 to 1500 MB/s depending on blocking
- average rate per node: write < 64 MB/s, read < 40 MB/s

Analysis, 336 nodes (4 frames):
- aggregate rates: write < 5.4 GB/s, read < 3.6 GB/s
- peak rate per node: 750 to 1500 MB/s depending on blocking
- average rate per node: write < 16 MB/s, read < 10 MB/s

Capacity: raw < 300 TB, usable < 240 TB

Using GPFS in iDataPlex as an Ethernet/LAN File System: External NSD Servers

[Diagram: compute nodes (dx320) on an Ethernet LAN with a blocking factor ~= 2:1; external NSD servers attached to a DS5300 via FC8; 8 x EXP5000 drawers of 15Krpm FC disk, 128 disks (450 GB each), 24 x 4+P RAID 5 arrays.]

- GPFS: Ethernet LAN. DS5300: FC8 host interface.
- Advantage: since the NSD servers are external to the idx frames, they are not limited by the blocking factor.

Analysis, 84 nodes (1 frame):
- aggregate rates: write < 4.5 GB/s, read < 5.5 GB/s
- peak rate per node < 80 MB/s
- average rate per node < 33 MB/s; cannot fully utilize controller BW (see note below)

Analysis, 336 nodes (4 frames):
- aggregate rates: write < 4.5 GB/s, read < 5.5 GB/s
- peak rate per node < 80 MB/s
- average rate per node: write < 13 MB/s, read < 16 MB/s

Capacity: raw < 56 TB, usable < 42 TB

Note: the DS5300 can deliver up to 5.5 GB/s, but with only 4 uplinks, only 1400 MB/s can be used, assuming 50% of the uplinks are devoted to GPFS. At least 4 frames are needed for this solution to be meaningful.

Alternative solution: use 4 NSD servers, each with a dual-port (bonded) Ethernet adapter; in theory this should work, but in practice load balancing over bonded ports is difficult to manage.

Using GPFS in iDataPlex as an Ethernet/LAN File System: Internal NSD Servers

Look! Bypass the BNT switches.

[Diagram: compute nodes (dx320) on an Ethernet LAN with a blocking factor ~= 2:1; I/O nodes (dx340 or dx360 M2) inside the idx frames acting as NSD servers, attached to a DS5300 via FC8; 8 x EXP5000 drawers of 15Krpm FC disk, 128 disks (450 GB each), 24 x 4+P RAID 5 arrays.]

Limitation: the I/O nodes must be used in a nonstandard way to avoid blocking factor limitations.

Caveat: bypassing the BNT switch avoids the bandwidth restrictions caused by the blocking factor.

Alternative solution: use 4 NSD servers, each with a dual-port (bonded) Ethernet adapter; in theory this should work, but in practice load balancing over bonded ports is difficult to manage.

Analysis, 84 nodes (1 frame):
- aggregate rates: write < 4.5 GB/s, read < 5.5 GB/s
- peak rate per node < 80 MB/s
- average rate per node < 33 MB/s; cannot fully utilize controller BW (see note below)

Analysis, 336 nodes (4 frames):
- aggregate rates: write < 4.5 GB/s, read < 5.5 GB/s
- peak rate per node < 80 MB/s
- average rate per node: write < 13 MB/s, read < 16 MB/s

Capacity: raw < 56 TB, usable < 42 TB

Note: the DS5300 can deliver up to 5.5 GB/s, but with only 4 uplinks, only 1400 MB/s can be used, assuming 50% of the uplinks are devoted to GPFS. At least 4 frames are needed for this solution to be meaningful.

Using GPFS in iDataPlex as a SAN File System

[Diagram: compute nodes (dx340 or dx360) on an IB LAN with a blocking factor ~= 3:1, each node attached directly to the DCS9900 IB host interface via SRP; 5 x 60-disk drawers of SATA disk, 300 disks, 30 x 8+2P RAID 6 arrays.]

Limitation: the queue depth must be reduced when scaling out the size of the SAN. Other special tuning may be required to guarantee stability.

Caveats and warnings:
- Do not scale this solution beyond 3 frames (252 nodes) without assistance from the GPFS development team. (Currently the largest GPFS SAN in production is 256 nodes.)
- Reduce the queue depth on clusters with more than 32 SAN attachments (e.g., set the queue depth to 1 or 2 on a cluster with 256 nodes).

Analysis, 84 nodes (1 frame):
- aggregate rates: write < 5.4 GB/s, read < 3.6 GB/s
- peak rate per node: 500 to 1500 MB/s, depending on the impact of blocking
- average rate per node: write < 64 MB/s, read < 40 MB/s

Analysis, 252 nodes (3 frames):
- aggregate rates: write < 5.4 GB/s, read < 3.6 GB/s
- peak rate per node: 500 to 1500 MB/s, depending on the impact of blocking
- average rate per node: write < 21 MB/s, read < 14 MB/s

Capacity: raw < 300 TB, usable < 240 TB
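The queue depth referred to above is an OS/HBA setting, not a GPFS parameter. On Linux it can be inspected and lowered per SCSI device through sysfs; a minimal sketch (the device name is hypothetical, and a persistent setting normally belongs in the HBA driver's module options rather than a one-off echo):

    # Show the current queue depth for one SAN LUN
    cat /sys/block/sdb/device/queue_depth
    # Lower it on a large SAN (e.g., 256 nodes) so the controller is not overrun
    echo 2 > /sys/block/sdb/device/queue_depth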

Using iDataPlex Storage Nodes as GPFS Servers

[Diagram: compute nodes (dx340 or dx360) on an IB LAN with a blocking factor ~= 3:1; idx 3U storage nodes (dx340 or dx360 M2) with internal SAS disk acting as GPFS servers.]

Limitations:
- Single point of failure design.
- The blocking factor may limit storage server performance.

Analysis, 84 nodes (1 frame):
- aggregate rates: write < 800 MB/s, read < 1200 MB/s
- peak rate per node: 750 to 1500 MB/s, depending on the impact of blocking
- average rate per node: write < 10 MB/s, read < 15 MB/s
- Capacity (SAS, 450 GB/disk): raw < 11 TB; usable (no mirroring) < 7 TB; usable (mirroring) < 3.5 TB

Analysis, 336 nodes (4 frames), assuming 2 storage nodes per frame:
- aggregate rates: write < 3.0 GB/s, read < 4.5 GB/s
- peak rate per node: 750 to 1500 MB/s, depending on the impact of blocking
- average rate per node: write < 10 MB/s, read < 15 MB/s
- Capacity (SAS, 450 GB/disk): raw < 44 TB; usable (no mirroring) < 28 TB; usable (mirroring) < 14 TB

The aggregate rates are based on the following data rates for a 4+P RAID 5 SAS LUN: write < 200 MB/s, read < 300 MB/s.

Warnings:
- idx 3U storage nodes are a single point of failure; spanning a file system across multiple storage nodes increases the risk exposure. This is an issue for all file systems and is not considered a best practice. If this must be done, use mirroring and storage pools to manage the risk.
- Do not use 5+P RAID 5 or 10+2P RAID 6 configurations.
- GPFS has not been tested with ServeRAID controllers.
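The "mirroring" mentioned in the warnings is GPFS file system replication. A minimal sketch of how it is usually expressed (all names hypothetical): each storage node's LUNs are assigned their own failure group in the disk descriptors, and the file system is created with two data and two metadata replicas so that the copies land in different failure groups:

    # mirrored_disks.txt -- one failure group per storage node, e.g.
    # /dev/sdb:stor01::dataAndMetadata:1:nsd_a1
    # /dev/sdc:stor02::dataAndMetadata:2:nsd_b1
    mmcrnsd -F mirrored_disks.txt
    # -m/-r set default metadata/data replicas; -M/-R set the maximums
    mmcrfs /gpfs fsmirror -F mirrored_disks.txt -B 1M -m 2 -M 2 -r 2 -R 2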

DS3000 Series

DS3200:
- 3-Gbps SAS connect to host, direct-attach
- For System x
- 2U, 12 disks, dual power supplies
- Support for SAS disks or SATA disks; expansion via expansion enclosures
- Starting under $4,500 US

DS3400:
- 4-Gbps Fibre connect to host, direct-attach or SAN
- For System x and BladeCenters
- 2U, 12 disks, dual power supplies
- Support for SAS disks or SATA disks; expansion via expansion enclosures
- Starting under $6,500 US

WARNINGS: DS3000 controllers are ideal for smaller storage configurations (n.b., do not exceed 4 x DS3400s within a single cluster). Also, they do not make good metadata stores for GPFS.

DS3400 Small Performance Optimized Solution with Maximum Scaling

[Diagram: 4 NSD servers (Server-01 through Server-04, 8 cores, 6 DIMMs), each with 2 x FC8, attached through a 24-port FC8 SAN switch (FC4 toward the storage) to 4 x DS3400 (Controller-A/Controller-B); each DS3400 holds 12 x 15Krpm SAS disks (450 GB/disk) plus an expansion drawer with 10 x 15Krpm SAS disks (450 GB/disk); an Ethernet switch provides the LAN.]

COMMENT: Using 2xFC8 per server instead of 4xFC4 per server with a SAN switch simplifies cabling.

Aggregate capacity and performance:
- Capacity: 88 disks @ 450 GB/disk; raw < 38 TB, usable < 28 TB; includes 4 hot spares per DS3400, which is excessive.
- Performance: streaming write < 2500 MB/s, read < 3000 MB/s; IOP: write < 35,000 IOP/s, read < 40,000 IOP/s.

WARNING: Scaling beyond 4 x DS3400s is not recommended because RAID rebuilds over multiple DS3400s significantly impede performance. If scaling beyond this is required, then deploy multiple file systems or storage pools to limit the impact of RAID rebuilds.

DS3400 Small Capacity Optimized Solution with Maximum Scaling

WARNING: Scaling beyond 4 x DS3400s is not recommended because RAID rebuilds over multiple DS3400s significantly impede performance. If scaling beyond this is required, then deploy multiple file systems or storage pools to limit the impact of RAID rebuilds.

[Diagram: 4 NSD servers (Server-01 through Server-04, 8 cores, 6 DIMMs) on an Ethernet LAN, each with 2 x FC4 connections per DS3400 (FC4 cabling to the DS3400s not fully shown); 4 x DS3400 (Controller-A/Controller-B), each with four drawers of 1 TB SATA disks (12 + 12 + 10 + 10).]

Server: 8 cores, 6 GB RAM; 2 dual-port 8 Gb/s FC HBAs (2xFC8), at most 1500 MB/s per adapter; IB HCA (4xDDR), at most 1500 MB/s per HCA.

Capacity optimized: use 4 drawers of 1 TB SATA disk per DS3400.
- Capacity per DS3400: 42 disks @ 1 TB/disk in an 8+2P RAID 6 configuration; raw = 42 TB, usable = 32 TB, includes 2 hot spares.
- Aggregate capacity: raw = 168 TB, usable = 128 TB, includes 8 hot spares.
- Aggregate performance: streaming rate < 3 GB/s.
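One way to express the "storage pools" mitigation from the warning above is to place each DS3400's arrays in their own GPFS storage pool via the last field of the disk descriptors, then steer file placement with a policy, so a rebuild on one DS3400 only affects the files in its pool. A minimal sketch with hypothetical pool, NSD and rule names (metadata must stay in the default system pool, so only dataOnly disks go into the named pools):

    # Disk descriptors: the final field assigns each DS3400's LUNs to a separate pool
    # /dev/sdb:server01:server02:dataOnly:1:nsd_ds1_1:pool_ds3400_1
    # /dev/sdc:server03:server04:dataOnly:2:nsd_ds2_1:pool_ds3400_2

    # Placement policy: temporary files go to pool 2, everything else to pool 1
    cat > policy.txt <<'EOF'
    RULE 'scratch' SET POOL 'pool_ds3400_2' WHERE UPPER(NAME) LIKE '%.TMP'
    RULE 'default' SET POOL 'pool_ds3400_1'
    EOF
    mmchpolicy fs1 policy.txt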

DS5000 Series

DS5300 (model 1818-53A), 4U: controller support modules (fans, power supplies), controllers, interconnect module (batteries, midplane).

EXP5000 (model 1818-D1A), 3U: power/cooling, controllers (ESMs), drives; FC/SATA, 16 disks per disk enclosure.

DS5300 Example: GPFS DS5300 Building Block

[Diagram: 8 NSD servers (Server-01 through Server-08, 8 cores, 6 DIMMs), each with one FC8 connection to the DS5300 (8 x FC8 total; SAN switch not required) and Ethernet connections to an Ethernet switch; the DS5300 has controllers A and B with 8 host ports each; disk drawers attach behind the controllers. The administrative network is not illustrated in this diagram.]

Disk drawer options:
- Option #1: 128 x 15Krpm FC disks
- Option #2: 256 x SATA disks

Performance analysis:
- DS5300 streaming data rate (256 x SATA or 128 x 15Krpm disks): write < 4.5 GB/s, read < 5.5 GB/s
- DS5300 IOP rate, 256 x SATA disks: write < 3,600 IOP/s, read < 12,000 IOP/s
- DS5300 IOP rate, 128 x 15Krpm disks: write < 9,000 IOP/s, read < 36,000 IOP/s
- Potential aggregate LAN rate: 8 links < 5.6 GB/s (725 MB/s per link is possible, but 700 MB/s is required)
- Potential aggregate FC8 rate: 8 x FC8 < 6.0 GB/s (780 MB/s per FC8 is possible, but 700 MB/s is required)

COMMENT: This is a "safe" configuration in the sense that meeting projected performance rates can reasonably be expected (n.b., there are more than enough servers, FC8 ports and LAN ports to do the job). If HBA failover is required, then 8 dual-port HBAs may be adopted (thereby requiring a SAN switch). If 2xFC8 adapters are adopted, then peak performance can be maintained during failure conditions.

DS5300 Benchmark Results: To Be Completed

Detailed benchmarks will be available soon and shared with you when they are completed. Preliminary analysis based on other benchmarks suggests the following can be expected:
- 128 x 15Krpm FC disks @ 4+P RAID 5 (8 x EXP5000): streaming write < 4.5 GB/s, read < 5.5 GB/s; IOP rates: write < 9,000 IOP/s, read < 36,000 IOP/s
- 240 x SATA disks @ 8+2P RAID 6 (4 x EXP5060): streaming write < 4.5 GB/s, read < 5.5 GB/s; IOP rates: write < 3,600 IOP/s, read < 12,000 IOP/s

The EXP5060 will GA by EOY 2009. It supports up to 60 disks per 4U enclosure.

Limitations of the DS5100

The DS5100 is intended as a replacement for the DS4800 with similar performance (i.e., at most ~1600 MB/s), providing an intermediate solution between the DS3400/DS4700 and the DS5300. The DS5100 has an "Enhanced Performance" feature that can double its performance (i.e., at most ~3200 MB/s). However, a DS5100 with the "Enhanced Performance" feature costs more than an equivalently configured DS5300.

DCS9900 Hardware

[Photos: front view of a 4U couplet ("dual RAID controller"); rear view of half of a couplet (2U); a 45U frame.]
- 10 disk enclosures per frame
- 60 disks per enclosure
- 4U per enclosure

DCS9900 Example: GPFS DCS9900 Building Block

[Diagram: 4 NSD servers (8 cores, 6 DIMMs), each with a 4xDDR IB connection to an IB switch (GPFS LAN) and a 2xFC8 connection to the DCS9900 (SAN switch not required); an Ethernet switch for administration; the DCS9900 couplet consists of two 2U RAID controllers (C1 and C2) with 8 x FC8 host connections and drive ports to 60-bay disk trays (4U each).]

- The 2xFC8 HBAs can be replaced by dual-port 4xDDR IB HCAs using SRP. The IB host ports can either be directly attached to the servers or connected to a dedicated IB SAN switch. It is also possible to use an IB switch for a combined LAN and SAN, but this has been discouraged in the past. As a best practice, it is not recommended to use an IB SAN with more than 32 ports.
- Disk trays, minimum required to saturate DCS9900 BW: 160 x SAS disks (72 TB) or 300 x SATA disks (300 TB).
- Peak sustained DCS9900 performance: streaming data rate < 5.6 GB/s; noncached IOP rate < 40,000 IOP/s.
- 4xDDR IB HCA (Host Channel Adapter): potential peak data rate per HCA < 1500 MB/s; required peak data rate per HCA < 1400 MB/s.
- 2xFC8 (dual-port 8 Gbit/s Fibre Channel): potential peak data rate per 2xFC8 < 1560 MB/s; required peak data rate per 2xFC8 < 1400 MB/s.

COMMENT: More disks (for a total of 1200) can be added to this solution, but it will not increase performance.

DCS9900 Benchmark Results

GPFS parameters: blocksize (streaming) = 4096K; blocksize (IOP) = 256K; pagepool = 1G; maxMBpS = 4000

DCS9900 parameters: 8+2P RAID 6 SATA; cache size = 1024K; cache prefetch = 0; cache writeback = ON

Streaming job: record size = 4M; file size = 32G; number of tasks = 1 to 16; access pattern = sequential

IOP job: record size = 4K; total data accessed = 10G; number of tasks = 32; access pattern = small file (4K to 16K)

Test configuration: 4 NSD servers, no clients; P6 p520, 4 cores, 4.2 GHz, 8 GB RAM, 2xFC8

COMMENT: The disparity between read and write performance observed below is much less pronounced when using 15Krpm SAS drives. For example, using 160 SAS tiers: write ~= 5700 MB/s, read ~= 4400 MB/s. This disparity can be removed using cluster block allocation for SATA disk, but this is not recommended.

COMMENT: Benchmarks on a system configured using 160 x 15Krpm SAS drives delivered the following streaming data rates: write < 5.6 GB/s, read < 4.4 GB/s.

Access pattern vs. number of tiers:

    Tiers:                    1       4       8       16      32      64
    Streaming write (MB/s):   270     790     1400    2700    4800    5400
    Streaming read (MB/s):    220     710     1200    1600    2900    3600
    IOP write (IOP/s):        7,500   13,500  30,000  30,400  41,000
    IOP read (IOP/s):         3,800   5,900   27,300  27,300  33,500
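The GPFS parameters listed above map directly onto standard GPFS commands. A minimal sketch of how such a configuration is typically applied (file system and stanza file names are hypothetical; pagepool and maxMBpS are per-node settings, and the block size is fixed when the file system is created):

    # Allow 4 MB blocks, then apply the node-level tuning used in the benchmark
    mmchconfig maxblocksize=4M
    mmchconfig pagepool=1G
    mmchconfig maxMBpS=4000
    # Streaming file system with a 4 MB block size; a separate file system
    # with a smaller block size (256 KB) for the small-file / IOP workload
    mmcrfs /gpfs/stream fs_stream -F nsd_stream.txt -B 4M
    mmcrfs /gpfs/iop fs_iop -F nsd_iop.txt -B 256K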

Conclusion

GPFS is a best-of-class product with good features and broad market acceptance. Properly used with careful design, it can provide a best-of-class storage solution. Please contact me with questions if you need technical assistance with GPFS.