Technology Testing at CSCS including BeeGFS Preliminary Results. Hussein N. Harake CSCS-ETHZ

Agenda
- About CSCS
- About the Systems Integration (SI) Unit
- Technology Overview: DDN IME, DDN WOS, OpenStack
- BeeGFS Case Study: What is BeeGFS?, Test System Layout, Tuning, Monitoring, Benchmark tools, Results
- Next Steps
- Q&A

CSCS (Swiss National Supercomputing Centre)
- Founded in 1991
- Enables world-class research with a scientific user lab
- Available to domestic and international researchers through a transparent, peer-reviewed allocation process
- Open to academia and also available to users from industry and the business sector
- Operated by ETH Zurich and located in Lugano

24 years of supercomputers at CSCS
- 1991: NEC SX3, 5.5 GF (Adula)
- 1996: NEC SX4, 10 GF (Gottardo)
- 1999: NEC SX5, 64 GF (Prometeo)
- 2002: IBM SP4, 1.3 TF (Venus)
- 2005: Cray XT3, 5.8 TF (Palu)
- 2006: IBM P5, 4.5 TF (Blanc)
- 2009-12: Cray XE6, 402 TF (Monte Rosa)
- 2012-13: Cray XC30, 7.7 PF (Piz Daint)
- 2014: XC30, 1.25 PF (Piz Daint extension)

Data Centre
- 2000 sq.m machine room
- 20 MW of power and cooling capacity
- Lake water cooling - 700 liters/s

Overview of the Systems Integration (SI) Unit
Unit missions:
- Managing projects
- Relations with vendors
- Evaluating technologies
- Software deployments

Technology Overview - DDN IME (image courtesy of DDN)

Technology Overview - DDN WOS (1) (image courtesy of DDN)

Technology Overview - DDN WOS (2)

Technology Overview - DDN WOS (3)

Technology Overview - OpenStack (image source: https://www.openstack.org/software/)

BeeGFS Case Study

What is BeeGFS?
- Parallel filesystem, HPC oriented
- Formerly called FhGFS
- Alternative to Lustre and GPFS
- Developed by Fraunhofer, open-source
- Support delivered by ThinkParq
(Image courtesy of BeeGFS)

Basic Features of BeeGFS
- Supports failover for data and metadata using tools like Pacemaker and Heartbeat
- Replication failover mechanism
- Supports multiple data and metadata servers and targets
- Supports quota
- Robinhood can be used to scan the entire filesystem
- BeeGFS On Demand filesystem (BeeOND)
- Easy to deploy and manage
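The slides do not show commands for these features. As a hedged aside, most of them can be inspected with the beegfs-ctl utility that ships with BeeGFS (option names may differ between versions, and the user name below is a placeholder):

beegfs-ctl --listnodes --nodetype=meta --details      # list the metadata servers
beegfs-ctl --listtargets --nodetype=storage --state   # list storage targets and their state
beegfs-ctl --getquota --uid someuser                  # query quota usage for a user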

BeeOND
- Creates a filesystem on demand
- Uses the hard drives / SSDs on every compute node
- The filesystem gets created by submitting a job to the scheduler; we are working on confirming SLURM support
- Memory could be used instead of SSDs
- We used 20 SSDs on 20 nodes for our tests
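As a minimal sketch of how a BeeOND instance is typically started and stopped with the beeond helper script (the node list, local storage path and mount point are placeholders, and exact options depend on the installed version):

# Start BeeOND on the nodes listed in nodefile: -d is the local SSD path used
# for storage and metadata, -c is the client mount point on every node
beeond start -n nodefile -d /data/beeond -c /mnt/beeond

# ... run jobs against /mnt/beeond ...

# Tear the instance down; -L and -d request log and data cleanup (verify with the version's help)
beeond stop -n nodefile -L -d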

Benefits of BeeOND
- Benefits from otherwise unused space on the compute nodes
- No impact on the parallel filesystem
- Real utilization of the high-speed network
- Filesystem scales with the compute nodes
- Open point: what is the overhead on the compute nodes?

Test System Layout
- DDN 7700: one couplet (two controllers), one enclosure with 60 drives; 6 SSDs in one RAID volume plus 6 * 9-drive RAID 5 volumes; 4 * FDR links
- Two x86 servers: dual socket SB, 128 GB memory, 2 * FDR links
- Fabric: 1 * FDR links

Tuning the servers

echo 5 > /proc/sys/vm/dirty_background_ratio
echo 20 > /proc/sys/vm/dirty_ratio
echo 50 > /proc/sys/vm/vfs_cache_pressure
echo 262144 > /proc/sys/vm/min_free_kbytes
echo always > /sys/kernel/mm/transparent_hugepage/enabled
echo always > /sys/kernel/mm/transparent_hugepage/defrag

for dev in dm-0 dm-1 dm-2 dm-3 dm-4 dm-5 dm-6
do
    echo deadline > /sys/block/$dev/queue/scheduler
    echo 4096 > /sys/block/$dev/queue/nr_requests
    echo 32768 > /sys/block/$dev/queue/read_ahead_kb
    echo 32767 > /sys/block/$dev/queue/max_sectors_kb
done

echo performance | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
echo 1 > /proc/sys/vm/zone_reclaim_mode

Documentation for the tuned parameters:
https://www.kernel.org/doc/documentation/sysctl/vm.txt
https://access.redhat.com/solutions/46111
http://www.slideshare.net/rampalliraj/linux-kernel-io-schedulers?from_action=save
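These echo commands take effect immediately but do not survive a reboot. As a side note not covered in the slides, the vm.* values above could be persisted with a sysctl drop-in such as the sketch below (the file name is illustrative); the block-device, transparent-hugepage and CPU-governor settings are not sysctls and would need, for example, a udev rule, a tuned profile, or a boot-time script.

# /etc/sysctl.d/90-beegfs-storage.conf (hypothetical file name)
vm.dirty_background_ratio = 5
vm.dirty_ratio = 20
vm.vfs_cache_pressure = 50
vm.min_free_kbytes = 262144
vm.zone_reclaim_mode = 1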

Monitoring client activities (1)

Monitoring server activities (2)

Benchmark tools
- mdtest - measuring metadata performance: https://sourceforge.net/projects/mdtest/
- IOzone - read and write throughput: http://www.iozone.org
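The exact command lines used are not given in the slides; as a hedged illustration, invocations of this shape are typical for the two tools (process counts, sizes and paths below are placeholders, not the parameters used in these tests):

# IOzone throughput mode: 64 processes across the clients listed in clients.txt,
# 4 GB file per process, 1 MB records, write/rewrite (-i 0) and read/reread (-i 1)
iozone -i 0 -i 1 -+m clients.txt -t 64 -s 4g -r 1m

# mdtest via MPI: 20 ranks, 10000 items per rank, 3 iterations, unique working dir per rank
mpirun -np 20 mdtest -d /mnt/beeond/mdtest -n 10000 -i 3 -u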

IOzone results on /beegfs

Test running: Children see throughput for 64 initial writers = 5032700.90 kb/sec
    Min throughput per process = 63754.09 kb/sec
    Max throughput per process = 103798.58 kb/sec
    Avg throughput per process = 78635.95 kb/sec
    Min xfer = 12880896.00 kb

Test running: Children see throughput for 64 rewriters = 4996297.63 kb/sec
    Min throughput per process = 68781.82 kb/sec
    Max throughput per process = 90666.23 kb/sec
    Avg throughput per process = 78067.15 kb/sec
    Min xfer = 16473088.00 kb

Test running: Children see throughput for 64 readers = 4225632.91 kb/sec
    Min throughput per process = 40047.24 kb/sec
    Max throughput per process = 77678.61 kb/sec
    Avg throughput per process = 66025.51 kb/sec
    Min xfer = 10813440.00 kb

Test running: Children see throughput for 64 re-readers = 4253662.00 kb/sec
    Min throughput per process = 56998.73 kb/sec
    Max throughput per process = 76042.87 kb/sec
    Avg throughput per process = 66463.47 kb/sec
    Min xfer = 15729664.00 kb

Mdtest results on BeeOND: charts of directory creation and directory stat rates (directories per second) versus the number of MDSs (1, 2, 4, 8, 16, 20).

Mdtest results on BeeOND: charts of file creation, file stat and file removal rates (files per second) versus the number of MDSs.

Next steps
- Scaling on a bigger cluster
- Verifying the failover procedures
- Verifying the BeeOND overhead on compute nodes
- Using NVMe instead of SSDs
- Using tmpfs
- Creating BeeOND through SLURM jobs
- Using Robinhood to scan millions of files
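Creating BeeOND through SLURM jobs is listed above as future work, and the slides do not show how it would be done. One possible approach, sketched under the assumption that the beeond script is installed on the compute nodes (node count, paths and application name are placeholders):

#!/bin/bash
#SBATCH --nodes=20
#SBATCH --time=01:00:00

# Build a nodefile from the job's allocation
scontrol show hostnames "$SLURM_JOB_NODELIST" > nodefile

# Start a BeeOND instance on the allocated nodes (local SSD path and mount point are placeholders)
beeond start -n nodefile -d /data/beeond -c /mnt/beeond

# Run the application against the on-demand filesystem
srun ./my_io_heavy_app /mnt/beeond

# Tear the instance down at the end of the job (check the installed version's help for cleanup flags)
beeond stop -n nodefile -L -d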

Q&A hussein@cscs.ch