A ClusterStor update. Torben Kling Petersen, PhD. Principal Architect, HPC


Sonexion (ClusterStor) - STILL the fastest file system on the planet! Total system throughput in excess of 1.1 TB/s!

Software Releases - Overview: New in 1.4
- GridRAID (formerly called PDRAID), exclusively for Cray Sonexion 1600 (until May 2014)
- New and improved Monitoring Dashboard: high-level view into the entire storage system, including node status, file system throughput, inventory, and top system statistics
- SSU+n systems: one SSU enclosure plus up to three ESU enclosures. With the SSU+n feature (maximum n=3), up to 3 Expansion Storage Units (ESUs) can be added to each SSU
- GUI guest account: a built-in "guest" account for read-only access to ClusterStor Manager
- NIS GUI support: added GUI support for configuring NIS as an option for Lustre users

Declustered Parity RAID Geometry

[Figure: example PD-RAID array, with data, parity, and spare chunks distributed across Disk 0 through Disk 9]

PD-RAID geometry for an array is defined as P drive (N+K+A), for example 41 (8+2+2), where:
- P is the total number of disks in the array
- N is the number of data blocks per stripe
- K is the number of parity blocks per stripe
- A is the number of distributed spare disk drives

ClusterStor GridRAID Declustered Parity Geometry

PD-RAID geometry for an array is defined as P drive (N+K+A), for example 41 (8+2+2):
- P is the total number of disks in the array
- N is the number of data blocks per stripe
- K is the number of parity blocks per stripe
- A is the number of distributed spare disk drives
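To make the geometry concrete, here is a minimal Python sketch (an illustration only, not Seagate's actual GridRAID placement algorithm) that rotates the N+K chunks of each stripe across the P-drive pool defined above, with capacity equivalent to A drives held back as distributed spare space:

    # Simplified sketch of a declustered (PD-RAID) geometry P(N+K+A),
    # e.g. 41 (8+2+2). Illustration only; not the real GridRAID layout.
    def pdraid_layout(P=41, N=8, K=2, A=2, stripes=5):
        """Rotate N data + K parity chunks of each stripe across the
        P-drive pool; capacity equal to A whole drives stays reserved
        as distributed spare space."""
        assert N + K <= P - A, "stripe must fit within the non-spare drive count"
        width = N + K
        layout = []
        for s in range(stripes):
            start = (s * width) % P                      # rotate the stripe start
            drives = [(start + i) % P for i in range(width)]
            chunks = [f"D{s}.{i}" for i in range(N)] + [f"P{s}.{j}" for j in range(K)]
            layout.append(list(zip(drives, chunks)))     # (drive index, chunk) pairs
        return layout

    for stripe in pdraid_layout():
        print(stripe)

Because consecutive stripes start on different drives, the chunks of any single failed drive end up scattered across many survivors, which is what the rebuild comparison on the next slide exploits.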

ClusterStor OS - mandatory to effectively implement high-capacity drives and solutions.

ClusterStor GridRAID features and benefits:
- De-clustered RAID 6, up to 400% faster time to repair (rebuild of a 6 TB drive: MD-RAID ~33.3 hours, GridRAID ~9.5 hours). Benefit: recover from a disk failure and return to full data protection faster.
- "Repeal Amdahl's Law" (the speed of a parallel system is gated by the performance of the slowest component). Benefit: minimizes application impact on widely striped file performance.
- Minimized file system fragmentation compared with traditional RAID. Benefit: improved allocation and layout maximizes sequential data placement.
- 4-to-1 reduction in Object Storage Targets. Benefit: simplifies scalability challenges.
- ClusterStor Integrated Management. Benefit: end-to-end CLI and GUI configuration, monitoring, and management reduces OpEx.

[Figure: GridRAID parity rebuild disk pools #1 through #4 distributed across two Object Storage Servers]
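The repair-time gap can be reasoned about with a back-of-the-envelope model. The sketch below assumes an illustrative per-drive rebuild rate of 50 MB/s and an effective-parallelism factor chosen to reproduce the slide's figures; only the 6 TB capacity and the ~33.3 h / ~9.5 h results come from the slide.

    # Rough rebuild-time model. The per-drive rate and parallelism factors
    # are illustrative assumptions, not ClusterStor engineering data.
    def rebuild_hours(capacity_tb, per_drive_mb_s, effective_parallelism):
        """Hours to reconstruct one drive's worth of data when the rebuild
        streams to/from `effective_parallelism` drives at once."""
        total_mb = capacity_tb * 1_000_000
        return total_mb / (per_drive_mb_s * effective_parallelism) / 3600

    print("MD-RAID :", round(rebuild_hours(6, 50, 1.0), 1), "h")   # single spare drive is the bottleneck
    print("GridRAID:", round(rebuild_hours(6, 50, 3.5), 1), "h")   # rebuild work shared across the pool

The point of the model is simply that a traditional rebuild is throttled by the one replacement drive, whereas a declustered rebuild spreads reads and writes across the whole disk pool.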

Performance of MD-RAID (IOR)

[Chart: CS6000 + MD-RAID, IOR with 32x 1.8.8 clients. Read and write throughput (MB/s, 0 to 8,000) versus number of client threads (4 to 1,536 across 32 clients). Reads used Direct I/O with -t=32; writes used buffered I/O with -t=8.]

Effects of GridRAID (IOR Direct I/O)

[Chart: CS6000 + GridRAID, IOR with 8x 1.8.9 clients. Read (32/128) and write (32/128) throughput (MB/s, 0 to 8,000) versus number of client threads (1 to 96 across 8 clients). Client settings: max_rpcs_in_flight = 32, max_dirty_mb = 128.]
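The footnoted settings are standard Lustre client tunables (osc.*.max_rpcs_in_flight and osc.*.max_dirty_mb). A minimal sketch of applying them on a client via lctl set_param, wrapped in Python, is shown below; exact parameter paths can vary between Lustre releases, so treat this as an assumption about the setup rather than the exact procedure used for the benchmark.

    # Minimal sketch: apply the client-side tunables noted on the slide.
    # Assumes the standard `lctl set_param` interface on Lustre clients;
    # parameter names follow the usual osc.* convention and may differ by release.
    import subprocess

    TUNABLES = {
        "osc.*.max_rpcs_in_flight": 32,   # concurrent RPCs per OST (slide footnote)
        "osc.*.max_dirty_mb": 128,        # dirty write-back cache per OST (slide footnote)
    }

    for param, value in TUNABLES.items():
        subprocess.run(["lctl", "set_param", f"{param}={value}"], check=True)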

CIFS/NFS Gateway

[Diagram: CIFS/NFS clients connect over 10/40 GbE to a CIFS/NFS gateway cluster; the gateway cluster and native Lustre clients connect over an InfiniBand/40 GbE Lustre LNet network to ClusterStor 1500/6000.]

Announcing ClusterStor 9000 - Engineered Solution Platform

ClusterStor Management Unit (CMU):
- HA system management servers
- HA Lustre management servers
- HA metadata management servers
- 2 x 24-port management network switches
- 2 x 36-port FDR IB or 40 GbE data network switches

CS9000 - up to 50% faster:
- 1 Scalable Storage Unit (SSU): Read 8,553 MB/s, Write 8,514 MB/s
- 1 SSU + 1 Expansion Unit (ESU): Read 12,344 MB/s, Write 12,075 MB/s

Scalable Storage Unit (SSU):
- 2 x HA storage servers
- 82 x SAS drives (2x OSTs)
- 2 x high-capacity SSDs
- GridRAID

Initial CS9000 Single SSU Performance Results

Client threads | Write (MB/s), 3 runs   | Read (MB/s), 3 runs
4              | 2439, 2316, 2370       | 3073, 3117, 3115
8              | 3353, 3310, 3253       | 4538, 4528, 4533
16             | 4527, 4465, 4451       | 6418, 6390, 6425
32             | 7611, 7271, 7243       | 8243, 8250, 8235
64             | 8494, 8522, 8553       | 8528, 8534, 8515
128            | 7934, 8015, 8024       | 8431, 8482, 8452

16 client nodes on FDR IB, 4 or 8 threads per node. IOR parameters: Direct I/O mode, file per process, transfer size 64 MB.
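For readability, the three runs at each thread count can be averaged; the short sketch below does that with the Write/Read figures transcribed from the table above.

    # Average the three IOR runs per client-thread count (data from the slide, MB/s).
    threads = [4, 8, 16, 32, 64, 128]
    write = [[2439, 2316, 2370], [3353, 3310, 3253], [4527, 4465, 4451],
             [7611, 7271, 7243], [8494, 8522, 8553], [7934, 8015, 8024]]
    read  = [[3073, 3117, 3115], [4538, 4528, 4533], [6418, 6390, 6425],
             [8243, 8250, 8235], [8528, 8534, 8515], [8431, 8482, 8452]]

    for t, w, r in zip(threads, write, read):
        print(f"{t:>4} threads: write {sum(w)/3:7.0f} MB/s, read {sum(r)/3:7.0f} MB/s")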

ClusterStor 6000 vs 9000

Object Storage Servers:
- CS6000: Sandy Bridge 8-core, 1.8 GHz, 32 GB memory @ 1600 MHz
- CS9000: Ivy Bridge 10-core, 1.9 GHz, 32 GB memory @ 1866 MHz

Enclosure:
- CS6000: 5U84 Titan
- CS9000: RAS-enhanced 5U84 Titan (side card FRU)

RAID:
- CS6000: MD-RAID only
- CS9000: GridRAID only

Disk:
- CS6000: 4/3/2 TB
- CS9000: 4/3/2 TB (6/5 TB TBD, pending availability)

SAS lane configuration:
- CS6000: x8 SAS per 42 HDDs
- CS9000: x12 SAS per 42 HDDs

Flash accelerator & journals:
- CS6000: 2 x 100 GB SSDs (1+1)
- CS9000: 2 x 800 GB SSDs (1+1)

ESU/EBOD expansion:
- CS6000: yes, up to 3 in the field
- CS9000: yes, up to 1 in the field

IOR performance:
- CS6000: 6 GB/s per SSU with 4 TB drives
- CS9000: 8.5 GB/s per SSU with 4 TB drives; > 9 GB/s per SSU with 5+ TB drives

ClusterStor Management Unit (CMU):
- CS6000: MDS/MGS nodes 2 x Sandy Bridge 8-core, 2.7 GHz, 64 GB memory @ 1600 MHz; management nodes single Sandy Bridge 8-core, 2.7 GHz, 32 GB memory @ 1600 MHz
- CS9000: MDS/MGS nodes 2 x Ivy Bridge 10-core, 3.3 GHz, 64 GB memory @ 1866 MHz; management nodes single Ivy Bridge 8-core, 2.6 GHz, 32 GB memory @ 1600 MHz

Management tools etc.

ClusterStor dashboard

The company that put ENTERPRISE into Lustre. Often imitated, never beaten!