CSCS HPC storage. Hussein N. Harake

Points to Cover
- XE6 External Storage (DDN SFA10K, SRP, QDR)
- PCI-E SSD Technology
- RamSan-620 Technology

XE6 External Storage
- Installed Q4 2010
- In production Q1 2011
- 5 enclosures
- 300 x 2 TB hard drives
- Two UPS units for each singlet
- Lustre 1.8.4 filesystem
- IB QDR network
- 4 IO servers

[Diagram: the Cray XE6 connects through a Voltaire 4036 IB switch to IO servers 1-4; five 4x links attach the IO servers to (5) quad SAS disk controllers]

4 x IO servers:
- 4 OSSs & 1 MDS server
- 2 TB SATA drives
- SRP protocol
- IB QDR network
- 4 LNET routers
- Heartbeat & multipath implementation
- Lustre 1.8.4
- 8.4 GB/s raw using ddn-ost-survey

XE6 External Storage: IOR results (28 clients, write cache ON, read cache ON)

# of LUNs   Block Size   Write (MB/s)   Read (MB/s)
1           1M           7110           6154
4           1M           7013           6274
8           1M           7753           6053
14          1M           7508           5388
28          1M           6465           5270
1           4M           6472           6444
4           4M           6968           6722
8           4M           7597           6020
14          4M           7533           6080
28          4M           8162           5969

IOR was used with MPIIO (HDF5 didn't show any improvement).

PCI-E SSD Technology
- PCI-E Virident card based on SLC SSDs: 1 x N400, 400 GB
- PCI-E Fusion-io card based on SLC SSDs: 1 x ioDrive Duo, 320 GB
- PCI-E WarpDrive based on SLC SSDs: 1 x SLP-300, 300 GB

PCI-E SSD Technology
- Two benchmark tools were used: FIO and IOR
- MPIIO, HDF5 and POSIX interfaces
- XFS and GPFS filesystems
- Two Supermicro servers with two sockets each
- PCI-E Gen 2 16x
- AMD Opteron Magny-Cours, 8 cores
- 16 GB DDR3 memory

PCI-E SSD Technology: FIO parameters

iodepth=256
iodepth_batch_complete=8
iodepth_batch_submit=8
ioengine=libaio
direct=1
rw=randwrite or rw=randread
numjobs=4

- iodepth: number of outstanding requests for async IO
- iodepth_batch_complete and iodepth_batch_submit: iodepth batching control
- direct=1: bypass any memory buffering (direct IO)
- libaio: Linux native asynchronous IO
- randwrite: random write
- randread: random read
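As a rough sketch of how these parameters fit together, they correspond to a fio invocation along the following lines (the job name and device path are hypothetical, and bs=4k is an assumed block size since the slide does not state one):

fio --name=ssd-randwrite \
    --filename=/dev/fioa \
    --ioengine=libaio --direct=1 \
    --iodepth=256 --iodepth_batch_submit=8 --iodepth_batch_complete=8 \
    --rw=randwrite --bs=4k --numjobs=4 \
    --group_reporting
# /dev/fioa is a placeholder device node for one of the PCI-E cards;
# switch --rw=randwrite to --rw=randread for the random-read runs.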

PCI-E SSD Technology: IOR parameters

-a POSIX -B -b 1G -t 4K -e -s 1 -i 1 -F -C

- -a POSIX: IO engine
- -B: bypassing I/O buffers
- -b: file size to be written per task
- -t: transfer size
- -e: fsync
- -s: number of segments
- -i: number of repetitions of the test
- -F: file-per-process
- -C: changes task ordering to n+1
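For context, a minimal sketch of how these flags combine into an actual run; the MPI process count and the output path are illustrative, not taken from the slides:

mpirun -np 8 ./IOR -a POSIX -B -b 1G -t 4K -e -s 1 -i 1 -F -C -o /mnt/xfs/ior.testfile
# -o names the test file; with -F each MPI rank writes its own file.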

PCI-E SSD Technology: comparing file systems

Using one card:
- XFS delivered ~100% more IOPS than GPFS
- GPFS and XFS showed the same throughput

Using two cards in the same server:
- IOPS increased by ~90% on XFS and ~30% on GPFS
- Throughput increased by ~90% on both XFS and GPFS

Using two cards in two different servers over IB:
- GPFS delivered ~40% less performance than the local-server case

Although GPFS did not sustain the raw hardware performance, a single card is capable of delivering the same IOPS on GPFS as a DS5300.

PCI-E SSD Technology: things we learned
- Marketing numbers are based on raw benchmarks; expect a 30 to 50% performance penalty once RAID and a filesystem are added
- Bypass any kind of IO buffers if you want to see the real performance capability of your HW
- Don't change any configuration parameters on the HW controller if it already holds data
- Each card requires a minimum of one core, or more
- 2 to 4 GB of memory is required for each card
- Check the kind of RAID supported by your HW
- Check whether the HW includes a battery for flushing the data in case of losing power
- Tools and utilities are required to report errors, health checks, logs, etc.
- Replacing a defective SSD makes more sense than replacing the entire card

Benchmarking RamSan-620 on GPFS Hussein N. Harake CSCS - TI

Infrastructure
- Seven dual-socket clients/servers: two GPFS IO servers, five GPFS clients
- PCI-E 16x
- 32 GB of memory
- IB QDR network
- SUSE SLES 10 / Red Hat OS
- GPFS 3.4.x
- 4 x FC 4 Gb/s dual ports

RamSan-620: Fast Random Access file system

[Architecture diagram: a TMS RamSan-620 (SLC SSDs) is attached through two FC switches to Store1 and Store2, the two dual-CPU GPFS IO servers (two 4 Gb/s FC ports and two 40 Gb/s IB ports each), which serve the GPFS clients over the IB network; link legend: FC 8 Gb/s, Ethernet 1 Gb/s, IB QDR 40 Gb/s]

Benchmark: FIO from the IO servers
- FIO was run from the GPFS IO servers
- The IB network was not used
- GPFS clients were not involved
- Random-read IOPS, 4K block size
- Random-write IOPS, 4K block size
- Throughput: random write, 1M block size
- Throughput: random read, 1M block size
- 8 devices, each exported through an FC port
- GPFS NSDs were used to handle data and metadata
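A minimal sketch of the server-side random-read IOPS run, assuming it targeted the eight FC-exported devices directly (the multipath device names are hypothetical; fio accepts a colon-separated device list):

fio --name=ramsan-randread \
    --filename=/dev/dm-0:/dev/dm-1:/dev/dm-2:/dev/dm-3:/dev/dm-4:/dev/dm-5:/dev/dm-6:/dev/dm-7 \
    --ioengine=libaio --direct=1 \
    --rw=randread --bs=4k --iodepth=256 --numjobs=4 \
    --group_reporting
# --bs=1m with --rw=randread or --rw=randwrite would cover the throughput runs.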

Results from the IO servers
- Random read, 4K block size: each interface delivers ~95K IOPS, 380K IOPS in total

Results from the IO servers
- Random write, 4K block size: each interface delivers ~83K IOPS, 332K IOPS in total

Benchmark: FIO from the GPFS clients
- The IB network was used
- Using RDMA on GPFS showed some improvement
- FIO ran only on the clients, not on the GPFS servers
- Random-read IOPS, 4K block size
- Random-write IOPS, 4K block size
- Throughput: random write, 1M block size
- Throughput: random read, 1M block size

Results from the GPFS clients
- Random read, 4K block size: each interface delivers ~43K IOPS, 190K IOPS in total

Results from the GPFS clients
- Random write, 4K block size: each interface delivers ~18K IOPS, 72K IOPS in total

Results: throughput
- 2.1 GB/s write throughput using a 1 MB block size
- 1.5 GB/s read throughput using a 1 MB block size
- Client results could be improved by adding more clients
- GPFS proved to be a scalable solution on the RamSan
- Not all components are hot-swappable