A ClusterStor Update
Torben Kling Petersen, PhD
Principal Architect, HPC
Sonexion (ClusterStor): STILL the fastest file system on the planet! Total system throughput in excess of 1.1 TB/s!
Software Releases - Overview: New in 1.4
- GridRAID (formerly called PDRAID), exclusive to Cray Sonexion 1600 (until May 2014)
- New and improved Monitoring Dashboard: high-level view into the entire storage system - node status, file system throughput, inventory, top system statistics
- SSU+n systems, containing one SSU enclosure and up to three ESU enclosures: with the SSU+n feature (maximum n=3), up to 3 Expansion Storage Units (ESUs) can be added to each SSU
- GUI guest account: a built-in "guest" account for read-only access to ClusterStor Manager
- NIS GUI support: added GUI support for configuring NIS as an option for Lustre users
Declustered Parity RAID - Geometry
[Diagram: a 10-disk PD-RAID array (Disk 0 through Disk 9) with data, parity, and distributed spare blocks interleaved across all drives; geometry defined on the next slide]
ClusterStor Grid RAID: Declustered Parity - Geometry
PD-RAID geometry for an array is defined as:
- P drive (N+K+A) - example: 41 (8+2+2)
- P is the total number of disks in the array
- N is the number of data blocks per stripe
- K is the number of parity blocks per stripe
- A is the number of distributed spare disk drives
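To make the geometry concrete, here is a minimal illustrative sketch (ours, not part of the ClusterStor tooling) that derives basic properties of a P drive (N+K+A) layout for the 41 (8+2+2) example above; the function name and the derived quantities are our own choices.

```python
# Illustrative sketch only (not ClusterStor code): derive basic properties
# of a PD-RAID geometry "P drive (N+K+A)", e.g. the 41 (8+2+2) example.

def pdraid_geometry(P: int, N: int, K: int, A: int) -> dict:
    """Summarize a declustered-parity geometry.

    P: total number of disks in the array
    N: data blocks per stripe
    K: parity blocks per stripe
    A: distributed spare disk drives
    """
    stripe_width = N + K                  # blocks written per stripe
    assert stripe_width <= P - A, "stripe must fit in non-spare capacity"
    data_fraction = N / (N + K)           # usable share of each stripe
    return {
        "stripe_width": stripe_width,
        "data_fraction": data_fraction,   # 0.8 for an 8+2 stripe
        "usable_drive_equivalents": (P - A) * data_fraction,
        "rebuild_fanout": P - 1,          # survivors that can share a rebuild
    }

print(pdraid_geometry(41, 8, 2, 2))
# stripe_width 10, data_fraction 0.8, usable_drive_equivalents 31.2,
# rebuild_fanout 40
```

The rebuild_fanout line is the key contrast with classic RAID: because spares are distributed, every surviving disk can contribute to reconstruction instead of a single replacement drive absorbing all writes.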
ClusterStor OS - Mandatory to effectively implement high-capacity drives and solutions
Feature / Benefit:
- ClusterStor Grid RAID (de-clustered RAID 6): up to 400% faster time to repair; rebuilding a 6TB drive takes ~33.3 hours under MD RAID vs. ~9.5 hours under Grid RAID (worked arithmetic below) / Recover from a disk failure and return to full data protection faster
- Repeal Amdahl's Law (the speed of a parallel system is gated by the performance of the slowest component) / Minimizes application impact on widely striped file performance
- Minimize file system fragmentation / Improved allocation and layout maximizes sequential data placement
- 4-to-1 reduction in Object Storage Targets / Simplifies scalability challenges
- ClusterStor integrated management / End-to-end CLI and GUI configuration, monitoring and management reduces OpEx
[Diagram: traditional RAID vs. Grid RAID parity rebuild disk pools across Object Storage Servers]
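The quoted rebuild times are consistent with simple arithmetic. A back-of-the-envelope check, assuming a ~50 MB/s sustained rebuild rate to a single spare drive (our assumption; the slide only states the resulting hours):

```python
# Back-of-the-envelope check of the rebuild times quoted above (our
# arithmetic, not vendor code). A classic MD-RAID rebuild is gated by the
# write throughput of the single replacement drive; a declustered rebuild
# spreads the work across every surviving drive in the pool.

drive_bytes = 6e12                 # 6TB drive
md_rate = 50e6                     # ~50 MB/s to one spare drive (assumed)

md_hours = drive_bytes / md_rate / 3600
print(f"MD RAID:   {md_hours:.1f} h")            # ~33.3 h, as on the slide

speedup = 33.3 / 9.5               # ratio of the slide's own figures, ~3.5x
print(f"Grid RAID: {md_hours / speedup:.1f} h")  # ~9.5 h ("up to 400% faster")
```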
Performance of MD-RAID (IOR)
[Chart: CS6000 + MD-RAID, IOR with 32x 1.8.8 clients; read and write throughput in MB/s (0-8,000) vs. number of client threads (4 to 1,536 across 32 clients). Read: Direct I/O, -t=32; write: buffered I/O, -t=8]
Effects of Grid-RAID (IOR Direct I/O)
[Chart: CS6000 + Grid-RAID, IOR with 8x 1.8.9 clients; read and write throughput in MB/s (0-8,000) vs. number of client threads (1 to 96 across 8 clients), with max_rpcs_in_flight = 32 and max_dirty_mb = 128 (the "32/128" in the series labels); applying these settings is sketched below]
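For reference, a hedged sketch of how the footnoted client settings could be applied and a comparable IOR run launched. The mount point, process count, and transfer/block sizes are placeholders rather than the exact benchmark configuration; lctl set_param and the IOR flags shown are standard, but consult site documentation before reusing this.

```python
# Hedged sketch (ours): apply the Lustre client tunables from the chart
# footnote and launch an IOR run similar to the ones plotted above.
import subprocess

# Client-side RPC tunables quoted on the slide (requires root on the client).
for param in ("osc.*.max_rpcs_in_flight=32", "osc.*.max_dirty_mb=128"):
    subprocess.run(["lctl", "set_param", param], check=True)

# IOR in Direct I/O, file-per-process mode:
#   -B  use O_DIRECT (bypass the client page cache)
#   -F  one file per process
#   -t  transfer size, -b per-process block size (placeholder values)
subprocess.run([
    "mpirun", "-np", "64",                # e.g. 8 clients x 8 threads
    "ior", "-w", "-r", "-B", "-F",
    "-t", "32m", "-b", "4g",
    "-o", "/mnt/lustre/ior_testfile",     # placeholder Lustre mount point
], check=True)
```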
CIFS/NFS Gateway
[Diagram: clients reach a CIFS/NFS gateway cluster over 10/40 GbE; the gateway cluster and native Lustre clients connect over InfiniBand/40 GbE (Lustre LNet) to ClusterStor 1500/6000]
Announcing ClusterStor 9000 - Engineered Solution Platform
ClusterStor Management Unit (CMU):
- HA System Management Servers
- HA Lustre Management Servers
- HA Metadata Management Servers
- 2 x 24-port management network switches
- 2 x 36-port FDR IB or 40GbE data network switches
Scalable Storage Units (SSU):
- 2 x HA storage servers, 82 x SAS drives (2x OSTs), 2 x high-capacity SSDs, Grid-RAID
CS9000 - up to 50% faster:
- 1 Storage Unit (SSU): Read 8,553 / Write 8,514 MB/sec
- 1 Storage Unit (SSU) + 1 Expansion Unit (ESU): Read 12,344 / Write 12,075 MB/sec
Initial CS9000 Single SSU Performance Results
IOR throughput in MB/s, three runs per thread count:
Threads:   4              | 8              | 16             | 32             | 64             | 128
Write:  2439/2316/2370 | 3353/3310/3253 | 4527/4465/4451 | 7611/7271/7243 | 8494/8522/8553 | 7934/8015/8024
Read:   3073/3117/3115 | 4538/4528/4533 | 6418/6390/6425 | 8243/8250/8235 | 8528/8534/8515 | 8431/8482/8452
16 client nodes, FDR IB, 4 or 8 threads per node. IOR parameters: Direct I/O mode, file per process, transfer size 64MB.
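A tiny helper (our addition, with the data transcribed from the table above) that folds the three runs per thread count and reports the peak single-SSU figures:

```python
# Summarize the three IOR runs per thread count from the slide's table
# (data transcribed from the slide; the helper itself is ours).
runs = {
    4:   {"write": [2439, 2316, 2370], "read": [3073, 3117, 3115]},
    8:   {"write": [3353, 3310, 3253], "read": [4538, 4528, 4533]},
    16:  {"write": [4527, 4465, 4451], "read": [6418, 6390, 6425]},
    32:  {"write": [7611, 7271, 7243], "read": [8243, 8250, 8235]},
    64:  {"write": [8494, 8522, 8553], "read": [8528, 8534, 8515]},
    128: {"write": [7934, 8015, 8024], "read": [8431, 8482, 8452]},
}
for op in ("write", "read"):
    peak, threads = max((max(v[op]), t) for t, v in runs.items())
    print(f"peak {op}: {peak} MB/s at {threads} threads")
# peak write: 8553 MB/s at 64 threads; peak read: 8534 MB/s at 64 threads
```

Both curves peak at 64 aggregate threads and tail off slightly at 128, consistent with the ~8.5 GB/sec per SSU figure on the comparison slide that follows.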
ClusterStor 6000 vs 9000
Specification: ClusterStor 6000 | ClusterStor 9000
- Object Storage Servers: Sandy Bridge 8C, 1.8GHz w/ 32GB memory @ 1600MHz | Ivy Bridge 10C, 1.9GHz w/ 32GB memory @ 1866MHz
- Enclosure: 5U 84 Titan | RAS-enhanced 5U 84 Titan (side card FRU)
- RAID: MDRAID only | Grid RAID only
- Disk: 4/3/2TB | 4/3/2TB (6/5TB TBD, pending availability)
- SAS lane config: x8 SAS per 42 HDDs | x12 SAS per 42 HDDs
- Flash accelerator & journals: 2 x 100GB SSDs (1+1) | 2 x 800GB SSDs (1+1)
- ESU/EBOD expansion: yes, up to 3 in the field | yes, up to 1 in the field
- IOR performance: 6GB/sec per SSU (4TB drives) | 8.5GB/sec per SSU (4TB drives), >9GB/sec per SSU with 5+TB drives
- ClusterStor Mgmt Unit (CMU): MDS/MGS nodes 2x Sandy Bridge 8C, 2.7GHz w/ 64GB memory @ 1600MHz; MGMT nodes single Sandy Bridge 8C, 2.7GHz w/ 32GB memory @ 1600MHz | MDS/MGS nodes 2x Ivy Bridge 10C, 3.3GHz w/ 64GB memory @ 1866MHz; MGMT nodes single Ivy Bridge 8C, 2.6GHz w/ 32GB memory @ 1600MHz
Management tools etc.
ClusterStor dashboard
The company that put ENTERPRISE into Lustre
Often imitated, never beaten!