
BeeGFS: The Parallel Cluster File System. Container Workshop, ISC, July 2018 (28.7.18). www.beegfs.io. Marco Merkel, VP Worldwide Sales, Consulting

HPC & Cognitive Workloads Demand Today: flash storage, HDD storage, shingled storage & cloud, GPU, network.

Traditional System Architecture(s)

BeeGFS Inventor: The Fraunhofer Gesellschaft (FhG). Largest organization for applied research in Europe: 66 institutes in Germany, plus research units and offices around the globe. Staff: ~24,000 employees. Annual budget: ~2 billion. Inventor of the mp3 audio codec. The Fraunhofer Center for High-Performance Computing: part of the Fraunhofer Institute for Industrial Mathematics (ITWM), located in Kaiserslautern, Germany. Staff: ~260 employees.

ThinkParQ: A Fraunhofer spin-off, founded in 2014 specifically for BeeGFS. Based in Kaiserslautern (right next to the Fraunhofer HPC Center). Consulting, professional services & support for BeeGFS. Cooperative development together with Fraunhofer (Fraunhofer continues to maintain a core BeeGFS HPC team). First point of contact for BeeGFS.

What is BeeGFS? BeeGFS is a hardware-independent parallel file system (aka software-defined parallel storage), designed for performance-critical environments. [Diagram: files #1-#3 under /mnt/beegfs/dir1 are split into chunks (1, 2, 3) striped across Storage Servers #1 to #5, with the metadata (M) of each file held on Metadata Server #1.] Simply grow capacity and performance to the level that you need.

BeeGFS Architecture. Client service: native Linux module to mount the file system. Storage service: stores the (distributed) file contents. Metadata service: maintains striping information for files; not involved in data access between file open/close. Management service: service registry and watchdog. Graphical administration and monitoring service: GUI to perform administrative tasks and monitor system information; can be used for a Windows-style installation.
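
To make the service layout concrete, below is a minimal setup sketch, assuming the beegfs-setup-* helper scripts and systemd units shipped with the standard BeeGFS packages; the hostname (mgmt01), data paths and numeric IDs are examples only.

# Management service: service registry and watchdog.
$ /opt/beegfs/sbin/beegfs-setup-mgmtd -p /data/beegfs/mgmtd
$ systemctl start beegfs-mgmtd
# Metadata service: metadata target (e.g. on SSD), registered at the management host.
$ /opt/beegfs/sbin/beegfs-setup-meta -p /data/beegfs/meta -s 2 -m mgmt01
$ systemctl start beegfs-meta
# Storage service: storage target on a local file system (e.g. RAID or NVMe).
$ /opt/beegfs/sbin/beegfs-setup-storage -p /mnt/raid1/beegfs_storage -s 3 -i 301 -m mgmt01
$ systemctl start beegfs-storage
# Client: mounts the file system (default mount point /mnt/beegfs).
$ /opt/beegfs/sbin/beegfs-setup-client -m mgmt01
$ systemctl start beegfs-helperd beegfs-client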

Enterprise Features. The BeeGFS enterprise features require a support contract: high availability, quota enforcement, access control lists (ACLs), storage pools, burst-buffer function with BeeOND. Support benefits: competent level-3 support (next business day), access to the customer login area, special repositories with early updates and hotfixes, additional documentation and how-to guides, direct contact to the file system development team, guaranteed next-business-day response, optional remote login for faster problem solving, fair & simple pricing model.

BeeGFS v6: Buddy Mirroring. New in BeeGFS v6: built-in replication for high availability. Flexible setting per directory, individually for metadata and/or storage. Buddies can be in different racks or different fire zones.
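
As an illustration of how buddy mirroring is typically enabled, here is a hedged beegfs-ctl sketch; the exact workflow (e.g. whether clients must be unmounted while enabling metadata mirroring) should be checked against the documentation of the installed BeeGFS version, and the directory path is an example.

# Define storage and metadata buddy groups automatically.
$ beegfs-ctl --addmirrorgroup --automatic --nodetype=storage
$ beegfs-ctl --addmirrorgroup --automatic --nodetype=meta
# Enable metadata mirroring for the file system.
$ beegfs-ctl --mirrormd
# Enable storage buddy mirroring per directory: new files below it are replicated.
$ beegfs-ctl --setpattern --buddymirror /mnt/beegfs/important_data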

BeeGFS v7: Storage Pools. New in BeeGFS v7: support for different types of storage (storage pools), modification event logging, statistics in a time series database.
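
A hedged sketch of how storage pools are typically handled with beegfs-ctl; the pool description, target IDs, pool ID and directory below are made up for illustration.

# Group a set of flash targets into their own pool.
$ beegfs-ctl --addstoragepool --desc="flash" --targets=101,102,103,104
# List the configured pools and their targets.
$ beegfs-ctl --liststoragepools
# Pin a directory to a pool: new files below it are placed on that pool's targets.
$ beegfs-ctl --setpattern --storagepoolid=2 /mnt/beegfs/scratch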

Storage + Compute: Converged Setup. Compute nodes act as storage servers for small systems.

BeeOND: BeeGFS On Demand. Create a parallel file system instance on the fly; start/stop with one simple command. Use cases: cloud computing, test systems, cluster compute nodes, etc. Can be integrated into the cluster batch system (e.g. Univa Grid Engine). Common use case: a per-job parallel file system that aggregates the performance and capacity of the local SSDs/disks in the compute nodes of a job, takes load off the global storage, and speeds up "nasty" I/O patterns.

The easiest way to set up a parallel file system:
# GENERAL USAGE
$ beeond start -n <nodefile> -d <storagedir> -c <clientmount>
# EXAMPLE
$ beeond start -n $NODEFILE -d /local_disk/beeond -c /my_scratch
Starting BeeOND Services...
Mounting BeeOND at /my_scratch...
Done.

Live per-client and per-user Statistics
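
These live statistics can also be queried on the command line; a minimal sketch assuming the beegfs-ctl tool from the standard beegfs-utils package (interval and node type are examples).

# Per-client I/O statistics as seen by the storage servers, refreshed every 5 seconds.
$ beegfs-ctl --clientstats --interval=5 --nodetype=storage
# Per-user statistics, e.g. to find the user generating the most metadata load.
$ beegfs-ctl --userstats --interval=5 --nodetype=meta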

Typical Hardware for BeeGFS www.beegfs.io

Different building blocks for BeeGFS. Entry-Level Building Block: 8 CPU cores @ 2 GHz, 2x 400 GB SSD for OS and BeeGFS metadata, 24x 4 TB HDDs, 128 GB RAM. High-End Building Block: 36 CPU cores, 24x NVMe drives with ZFS, 1 TB RAM. [Chart: maximum throughput in GB/s; Entry-Level ~2.5 read / ~2.4 write, High-End ~11.6 read / ~11.8 write.] 85% of BeeGFS customers use InfiniBand.

Key Aspects: Maximum Performance & Scalability, High Flexibility, Robust & Easy to Use.

Linear throughput scalability for any I/O pattern. [Chart 1: sequential read/write throughput in MB/s with up to 20 storage servers and 160 application processes; write and read throughput scale linearly with the number of storage servers.] [Chart 2: strided unaligned shared-file writes with 20 servers and 24 to 768 application processes; BeeGFS sustains its shared-file write throughput, whereas for Lustre & GPFS the throughput decreases with more processes. Source: John Bent, Los Alamos National Lab, PLFS whitepaper.]

Key Aspects: Performance & Scalability, Flexibility, Robust & Easy to Use. Very intensive suite of release stress tests and in-house production use before public release. "The move from a 256-node system to a 1000-node system did not result in a single hitch, and similarly for the move to a 2000-node system." Applications access BeeGFS as a normal (very fast) file system mount point and do not need to implement a special API. Server daemons run in user space, on top of standard local file systems (ext4, xfs, zfs, ...); no kernel patches, so kernel, system and BeeGFS updates are trivially simple. Packages for Red Hat, SUSE, Debian and derivatives. Runs on shared-nothing hardware. Graphical monitoring tool: Admon, the Administration & Monitoring Interface.

Key Aspects: Performance & Scalability, Flexibility. Multiple BeeGFS services (any combination) can run together on the same machine. Flexible striping per file / per directory (see the sketch below). Add servers at runtime. On-the-fly creation of file system instances (BeeOND). Runs on ancient and modern Linux distros/kernels. NFS & Samba re-export possible. Runs on different CPU architectures.
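
To illustrate the per-directory striping flexibility, a hedged beegfs-ctl sketch; the chunk size, target count and path are examples.

# Show the current stripe pattern and storage targets of a directory or file.
$ beegfs-ctl --getentryinfo /mnt/beegfs/dir1
# New files in this directory: 1 MiB chunks, striped across 4 storage targets.
$ beegfs-ctl --setpattern --chunksize=1m --numtargets=4 /mnt/beegfs/dir1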

BeeGFS Flexibility Spectrum. BeeGFS is designed for all kinds and sizes: it scales seamlessly from a few to thousands of nodes. Idea: make use of what you already have for burst buffering, namely the SSDs in the compute nodes.

Flexibility: CPU Architectures

Installations, certified vendors, plugins & integrations (e.g. Singularity).

Use Cases & Building blocks www.beegfs.io

Test Result @ Juron OpenPOWER NVMe Cluster. The cluster consists of 18 nodes; each node had the following configuration: 2 IBM POWER8 processors (up to 4.023 GHz, 2x 10 cores, 8 threads/core), 256 GB DDR4 memory, non-volatile memory: HGST Ultrastar SN100 Series NVMe SSD, partitioned into 745 GB for storage and 260 GB for metadata, and 1 Mellanox Technologies MT27700 Family [ConnectX-4] EDR InfiniBand adapter. All nodes were connected to a single EDR InfiniBand switch. CentOS 7.3, Linux kernel 3.10.0-514.21.1.el7. Results (2 weeks of test time): measured values are equivalent to 104% (write) & 97% (read) of the NVMe manufacturer specs. For file stat operations the throughput increases up to 100,000 ops/s per metadata server; with 16 metadata servers, the system delivers 300,000 file creates & over 1,000,000 stat ops per second. IO-500 annual list: place #4 with only 8 nodes; compare the price of the setups ranked above #4.

BeeOND at the Alfred Wegener Institute (AWI). Example: Omni-Path cluster at AWI. Global BeeGFS storage on spinning disks (4 servers @ 20 GB/s); 300 compute nodes with a 500 MB/s SSD each, giving 150 GB/s of aggregate BeeOND speed for free. Create BeeOND on the SSDs at job startup, integrated into the Slurm prolog/epilog scripts (see the sketch below): stage in input data, work on BeeOND, stage out the results.
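
A simplified sketch of such a Slurm integration, assuming per-job node files, a local SSD mount on each node and a global BeeGFS under /mnt/beegfs; the script layout, environment variables and staging paths are illustrative and depend on the site's Slurm configuration.

# prolog (sketch, run once per job, e.g. on the first node of the allocation):
NODEFILE=/tmp/beeond.nodes.$SLURM_JOB_ID
scontrol show hostnames "$SLURM_JOB_NODELIST" > "$NODEFILE"
beeond start -n "$NODEFILE" -d /local_ssd/beeond -c /mnt/beeond
# optional stage-in from the global BeeGFS to the job-local BeeOND
cp -r /mnt/beegfs/projects/$SLURM_JOB_ID/input /mnt/beeond/

# epilog (sketch): stage out results and tear the per-job instance down again.
cp -r /mnt/beeond/results /mnt/beegfs/projects/$SLURM_JOB_ID/
beeond stop -n "/tmp/beeond.nodes.$SLURM_JOB_ID"   # check beeond --help for cleanup flags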

Cray CS400 at the Alfred Wegener Institute: Broadwell CPUs, Omni-Path interconnect, 0.5 TB SSD in each node.

BeeOND IOR, 50 TB:
                   Stripe size 1, local   Stripe 4    Stripe size 1, any
308 nodes, write   160 GB/s               161 GB/s    160 GB/s
308 nodes, read    167 GB/s               164 GB/s    167 GB/s

Omni-Path streaming performance into BeeGFS: 1 thread: 8.4 GB/s; 2 threads: 10.8 GB/s; 4 threads: 12.2 GB/s.

TSUBAME 3.0 in Tokyo: TSUBAME 3.0 for hybrid AI & HPC, with BeeOND on 1 PB of NVMe (raw device bandwidth >1 TB/s).

BeeGFS Key Advantages: Maximum Performance & Scalability, High Flexibility, Robust & Easy to Use. Excellent performance based on a highly scalable architecture. Flexible and robust high-availability options. Easy to use and easy to manage. Suitable for very different application domains & I/O patterns. Fair L3 pricing model (per storage node, per year).

Turnkey Solutions

Questions? // Keep in touch: Web, Mail, Twitter, LinkedIn, Newsletter. marco.merkel@thinkparq.com