Demonstration Milestone Completion for the LFSCK 2 Subproject 3.2 on the Lustre* File System FSCK Project of the SFS-DEV-001 contract.

Revision History

Date        Revision        Author
26/02/2014  Original        R. Henwood
13/03/2014  Clarifications  R. Henwood

Introduction

The following milestone completion document applies to Subproject 3.2 LFSCK 2: MDT-OST Consistency. This project is recorded in the OpenSFS Lustre software development contract SFS-DEV-001, agreed August 3, 2011. The LFSCK 2: MDT-OST Consistency code is functionally complete. The purpose of this milestone is to verify that the code performs acceptably in a production-like environment. In addition to completing all the Test Scenarios (demonstrated for the Implementation Milestone), LFSCK 2: MDT-OST Consistency performance will be measured as agreed on ticket LU-3423.

All tests were executed on the OpenSFS Functional Test Cluster. Details of the hardware are available in Appendix A. For all tests, Lustre software Master with the LFSCK 2 patches was used.

Correctness

1. sanity-lfsck.sh
Test will be executed within Autotest and results automatically recorded in Maloo. Test will be automatically completed, triggered by a Gerrit check-in with commit message "Test-Parameters: envdefinitions=enable_quota=yes mdtcount=2 testlist=sanity-lfsck". All test cases must pass.

2. sanity-scrub.sh
Test will be executed within Autotest and results automatically recorded in Maloo. Test will be automatically completed, triggered by a Gerrit check-in with commit message "Test-Parameters: envdefinitions=enable_quota=yes testlist=sanity-scrub". All test cases must pass.

3. Standard review tests
The standard collection of review tests (currently including sanity, sanityn, replay-single, conf-sanity, recovery-small, replay-ost-single, insanity, sanity-quota, sanity-sec, lustre-rsync-test, lnet-selftest, and mmp) will be executed within Autotest and results automatically recorded in Maloo. Tests will be automatically completed, triggered by a Gerrit check-in. All test cases should pass except for some known test failures unrelated to the LFSCK functionality.

Result

These tests have been completed successfully and the results are recorded in the Implementation milestone: http://wiki.opensfs.org/images/e/ee/lfsck_mdt-ostconsistency_implementation.pdf

Demonstration context

All tests require a populated directory on the file system. The directory will be created and populated with the following properties (a population sketch follows this list):

1. Create 'L' test root directories, where 'L' is equal to the MDT count. The root directory 'dir-X' is located on MDT-X.
2. Set the default stripe size to 64KB. Set the default stripe count to 'M'.
3. Create 'N' sub-directories under each of the test root directories.
4. Under each sub-directory create 8K files. Each file is 64 * 'M' KB in size.
5. Once created, the populated directory structure with N sub-directories (described above) is known in the procedures below as PopD_n.
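The population above can be scripted with standard Lustre client tools. The following is a minimal sketch, not the original test harness; the mount point /mnt/lustre and the illustrative values L=2, M=4, N=8 are assumptions, as are the directory and file names.

    #!/bin/bash
    # Minimal sketch of the Demonstration Context population (assumed paths/values).
    MNT=/mnt/lustre        # client mount point (assumption)
    L=2                    # number of MDTs / test root directories
    M=4                    # default stripe count
    N=8                    # sub-directories per test root
    FILES_PER_DIR=8192     # "8K files" per sub-directory
    SIZE_KB=$((64 * M))    # each file is 64 * M KB

    for ((x = 0; x < L; x++)); do
        lfs mkdir -i ${x} ${MNT}/dir-${x}                # place dir-X on MDT-X
        lfs setstripe -S 64k -c ${M} ${MNT}/dir-${x}     # 64KB stripe size, stripe count M
        for ((n = 0; n < N; n++)); do
            mkdir ${MNT}/dir-${x}/sub-${n}
            for ((f = 0; f < FILES_PER_DIR; f++)); do
                dd if=/dev/zero of=${MNT}/dir-${x}/sub-${n}/file-${f} \
                   bs=1k count=${SIZE_KB} 2>/dev/null
            done
        done
    done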

Measure performance of LFSCK 2 against a single MDT device without inconsistencies.

1. Install Lustre software Master with the LFSCK patches.
2. Create the Demonstration Context with:
3. L={1}
4. M={1, 2, 4, 8}
5. N={4, 8, 16, 32}.
6. LFSCK must run at full speed, without any other Lustre file system load (see the sketch following this procedure):
   lctl lfsck_start -M ${fsname}-MDT0000 -t namespace
7. Record the following:
   a. Start time, end time
   b. lfsck_namespace statistics before and after the run (lctl get_param -n mdd.${fsname}-MDT0000.lfsck_namespace)
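The full-speed run and the before/after statistics capture in steps 6 and 7 can be scripted as below. This is a minimal sketch, assuming the file system name lustre and MDT index 0000, run on the MDS; it uses only the lctl lfsck_start and lctl get_param commands quoted above, and the completion check against a "status:" field in the lfsck_namespace output is an assumption about that output's format.

    #!/bin/bash
    # Minimal sketch: time a full-speed namespace LFSCK run and capture statistics.
    FSNAME=lustre                       # file system name (assumption)
    MDT=${FSNAME}-MDT0000               # target MDT (assumption: index 0000)
    PARAM=mdd.${MDT}.lfsck_namespace

    lctl get_param -n ${PARAM} > lfsck_before.txt     # statistics before the run
    START=$(date +%s)
    lctl lfsck_start -M ${MDT} -t namespace           # full speed: no speed limit

    # Poll until the scan reports completion (assumes a "status:" line in the output).
    while ! lctl get_param -n ${PARAM} | grep -q "status: completed"; do
        sleep 5
    done
    END=$(date +%s)
    lctl get_param -n ${PARAM} > lfsck_after.txt      # statistics after the run

    echo "LFSCK wall time: $((END - START)) seconds"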

Result

Figure 1: LFSCK 2: MDT-OST consistency running against a single consistent MDT. (OST-objects processed per second versus stripe count, for directories of 32K, 64K, 128K, and 256K files.)

Checking a completely consistent file system performs well and compares favorably with previous measurements of LFSCK. For each of the varying stripe count measurements, consistent performance is observed. Varying the total number of files in the tested directory shows a measurable decrease in performance.

Measure performance of LFSCK 2 against a single MDT device with inconsistencies.

1. Install Lustre software Master with the LFSCK patches.
2. Create the Demonstration Context with:
3. L={1}
4. M={1, 2, 4, 8}
5. N={4, 8, 16, 32}.
6. On the OSS, set fail_loc to skip the XATTR_NAME_FID set, to simulate the case of MDT-OST inconsistency (see the sketch following this procedure).
7. LFSCK must run at full speed, without any other Lustre file system load:
   lctl lfsck_start -M ${fsname}-MDT0000 -t namespace
8. Record the following:
   a. Start time, end time
9. lfsck_namespace statistics before and after the run (lctl get_param -n mdd.${fsname}-MDT0000.lfsck_namespace)
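Step 6 injects inconsistencies by enabling an OSS fault-injection point before the test files are created, so that OST objects are written without the XATTR_NAME_FID back-pointer. The following is a minimal sketch only: the fail_loc code point is not stated in this document and is left as a placeholder, and the OSS host name and the ordering (enable, populate, clear) are assumptions.

    #!/bin/bash
    # Minimal sketch: simulate MDT-OST inconsistency while populating the test directory.
    # FAIL_LOC_VALUE is a placeholder; substitute the fail_loc code point that skips
    # the XATTR_NAME_FID set on the OSS (not specified in this document).
    OSS=oss01                            # OSS host name (assumption)
    FAIL_LOC_VALUE=0x0000                # placeholder only

    ssh ${OSS} lctl set_param fail_loc=${FAIL_LOC_VALUE}   # enable fault injection
    # ... populate the Demonstration Context here (see the earlier population sketch) ...
    ssh ${OSS} lctl set_param fail_loc=0                    # clear fault injection

    # Then run the namespace LFSCK at full speed and record the statistics,
    # exactly as in the consistent-MDT procedure above.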

Result

Figure 2: LFSCK 2: MDT-OST consistency repairing dangling references on a single MDT with varying stripe count. (OST-objects processed per second versus stripe count, for directories of 32K, 64K, 128K, and 256K files.)

Checking and fixing an inconsistent file system performs well. For each of the varying stripe count measurements, consistent performance is observed. Varying the total number of files in the tested directory shows a small but measurable decrease in performance. Comparing scanning a consistent file system (previous result) with fixing inconsistencies shows a measurable reduction in performance, as one would expect.

Impact of LFSCK on small file create performance on a single MDT without inconsistencies.

1. Install Lustre software Master with the LFSCK patches.
2. Create the Demonstration Context with:
3. L={1}.
4. M={8}
5. N={2}
6. Populate each sub-directory with 1M files.
7. Run LFSCK with speed limited to {20%, 40%, 60%, 80%, 100%} of full speed (see the sketch following this procedure).
8. Simultaneously, use md_test to create an additional 1M files and measure the create performance.
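The percentage speed limits in step 7 can be expressed with the -s option of lctl lfsck_start, which caps the number of objects scanned per second, after first measuring the unthrottled rate. This is a minimal sketch under assumptions: the file system name lustre, MDT index 0000, and a previously measured full-speed rate in FULL_SPEED; the concurrent create workload is not reproduced here.

    #!/bin/bash
    # Minimal sketch: run LFSCK throttled to a percentage of the measured full speed
    # while a separate create workload runs against the same MDT.
    FSNAME=lustre                         # file system name (assumption)
    MDT=${FSNAME}-MDT0000                 # target MDT (assumption)
    FULL_SPEED=8000                       # objects/sec at 100% speed, measured earlier (assumption)

    for PCT in 20 40 60 80 100; do
        LIMIT=$(( FULL_SPEED * PCT / 100 ))
        lctl lfsck_start -M ${MDT} -t namespace -s ${LIMIT}   # -s caps objects scanned per second
        # ... start the file-create workload here and record its create rate ...
        lctl lfsck_stop -M ${MDT}                              # stop before the next iteration
    done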

Result

Figure 3: LFSCK 2: MDT-OST consistency impact on create performance. (Create speed in MDT-objects/sec versus LFSCK speed limit, as a percentage of maximum LFSCK speed.)

Creating files on a consistent file system that has LFSCK currently executing performs well. As a general result, as LFSCK scanning speed is increased, file creates become slightly slower. This is consistent with expected behavior. It is worth noting that even at 100% LFSCK speed the create performance is still a large fraction of the performance without LFSCK running.

Multiple MDT Demonstration

LFSCK 2 is tested against multiple MDTs. Configurations with one, two, three, and four MDTs are created. This is effectively the Demonstration Context above, with values of L={1,2,3,4}.

Impact of LFSCK 2 running against multiple MDT devices without inconsistencies.

1. Install Lustre software Master with the LFSCK patches.
2. Create the Demonstration Context with:
3. L={1,2,3,4}
4. M={1, 2, 4, 8}
5. N={4, 8}.
6. LFSCK must run at full speed, without any other Lustre file system load (see the sketch following this procedure):
   lctl lfsck_start -M ${fsname}-MDT0000 -t namespace
7. Record the following:
   a. Start time, end time
8. lfsck_namespace statistics before and after the run (lctl get_param -n mdd.${fsname}-MDT0000.lfsck_namespace)
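For the multiple MDT configurations the scan is started and monitored on each MDT in turn. The following is a minimal sketch, assuming the file system name lustre, MDT indices 0000 through 0003 hosted on a single MDS (as in the test cluster of Appendix A), and, again, a "status:" field in the lfsck_namespace output.

    #!/bin/bash
    # Minimal sketch: start a namespace LFSCK on each MDT and collect per-MDT statistics.
    FSNAME=lustre                               # file system name (assumption)
    MDT_COUNT=4                                 # L = 4 in the largest configuration

    for ((i = 0; i < MDT_COUNT; i++)); do
        MDT=$(printf "%s-MDT%04x" ${FSNAME} ${i})
        lctl lfsck_start -M ${MDT} -t namespace
    done

    for ((i = 0; i < MDT_COUNT; i++)); do
        MDT=$(printf "%s-MDT%04x" ${FSNAME} ${i})
        PARAM=mdd.${MDT}.lfsck_namespace
        # Wait for completion, then save this MDT's statistics (assumes a "status:" field).
        while ! lctl get_param -n ${PARAM} | grep -q "status: completed"; do
            sleep 5
        done
        lctl get_param -n ${PARAM} > lfsck_${MDT}.txt
    done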

Result

Figure 4: LFSCK 2 MDT-OST consistency performance on multiple consistent MDTs. (OST-objects checked per second versus MDT count, for 32K and 64K files per MDT.)

LFSCK 2 works across multiple consistent MDTs. The scanning performance against a consistent file system across multiple MDTs appears to scale as would be expected. It can also be observed that the result for a single MDT is similar to the results previously recorded in this document.

Impact of LFSCK 2 running against multiple MDT devices during inconsistency repair.

1. Install Lustre software Master with the LFSCK patches.
2. Create the Demonstration Context with:
3. L={1,2,3,4}.
4. M={8}
5. N={2,4}
6. Populate each sub-directory with 1M files.
7. Run LFSCK with speed limited to {20%, 40%, 60%, 80%, 100%} of full speed.
8. Simultaneously, use md_test to create an additional 1M files and measure the create performance.

Result

Figure 5: LFSCK 2 MDT-OST repair of dangling references on multiple MDTs with varying stripe count. (OST-objects scanned per second versus MDT count, for 32K and 64K files per MDT.)

LFSCK 2 works across multiple inconsistent MDTs. The scanning performance against an inconsistent file system with multiple MDTs does not appear to show linear scaling for 64K files/mdt, but does show linear scaling for 32K files/mdt from one to three MDTs. It can also be observed that the result for a single MDT is similar to the results previously recorded in this document.

Conclusion

LFSCK 2: MDT-OST consistency has successfully completed both the functional Acceptance and Performance tests. The performance results recorded herein illustrate that performance expectations are met or exceeded during online operation and under load. In addition, LFSCK 2 has been shown to meet or exceed expectations running in both single and multiple MDT environments.

* Other names and brands may be the property of others.

Appendix A: OpenSFS Functional Test Cluster specification

client
(2) Intel E5620 2.4GHz Westmere (total 8 cores)
(1) 64GB DDRIII 1333MHz ECC/REG (8x8GB modules installed)
(1) on-board dual 10/100/1000T ports
(8) hot swap drive bays for SATA/SAS
(6) PCI-e slots 8X
(3) QDR 40Gb QSFP to QSFP IB cables
(3) Mellanox QDR 40Gb QSFP single port

OSS server
(1) Intel E5620 2.4GHz Westmere (total 8 cores)
(1) 32GB DDRIII 1333MHz ECC/REG (8x8GB modules installed)
(1) on-board dual 10/100/1000T ports
(1) on-board VGA
(1) on-board IPMI 2.0 via 3rd LAN
(1) 500GB SATA enterprise 24x7
(1) 40GB SSD OCZ SATA
(8) hot swap drive bays for SATA/SAS
(6) PCI-e slots 8X
(3) QDR 40Gb QSFP to QSFP IB cables
(3) Mellanox QDR 40Gb QSFP single port

MDS server
(1) Intel E5620 2.4GHz Westmere (total 8 cores)
(1) 32GB DDRIII 1333MHz ECC/REG (8x8GB modules installed)
(1) on-board dual 10/100/1000T ports
(1) on-board VGA
(1) on-board IPMI 2.0 via 3rd LAN
(1) 500GB SATA enterprise 24x7
(1) 40GB SSD OCZ SATA
(8) hot swap drive bays for SATA/SAS
(6) PCI-e slots 8X
(3) QDR 40Gb QSFP to QSFP IB cables
(3) Mellanox QDR 40Gb QSFP single port