Acceptance Small (acc-sm) Testing on Lustre

Revision History

Author          Date      Description of Document Change
Sheila Barthel  10/21/08  First draft
Sheila Barthel  10/28/08  Second draft
Sheila Barthel  11/11/08  Third (final) draft

Overview

The Lustre QE group and developers use acceptance-small (acc-sm) tests to catch bugs early in the development cycle. Within the Lustre group, acc-sm tests are run on YALA, an automated test system. This document describes acc-sm testing and is intended to encourage wider acc-sm testing in the Lustre community.

What is acc-sm testing and why do we use it for Lustre?

Acceptance small (acc-sm) testing is a suite of test cases used to verify different aspects of Lustre functionality.

What tests comprise the acc-sm test suite?

Each CVS branch contains a lustre/tests sub-directory; all acc-sm tests are stored there. The acceptance-small.sh file contains a list of all tests in the acc-sm suite. To get the list, run:

    $ grep TESTSUITE_LIST acceptance-small.sh

The acc-sm tests are listed below, by CVS branch.

b1_6 branch (17 test suites):

    $ grep TESTSUITE_LIST acceptance-small.sh
    export TESTSUITE_LIST="RUNTESTS SANITY DBENCH BONNIE IOZONE FSX SANITYN LFSCK LIBLUSTRE REPLAY_SINGLE CONF_SANITY RECOVERY_SMALL REPLAY_OST_SINGLE REPLAY_DUAL INSANITY SANITY_QUOTA PERFORMANCE_SANITY"

b1_8_gate branch (18 test suites; adds REPLAY_VBR):

    $ grep TESTSUITE_LIST acceptance-small.sh
    export TESTSUITE_LIST="RUNTESTS SANITY DBENCH BONNIE IOZONE FSX SANITYN LFSCK LIBLUSTRE REPLAY_SINGLE CONF_SANITY RECOVERY_SMALL REPLAY_OST_SINGLE REPLAY_DUAL REPLAY_VBR INSANITY SANITY_QUOTA PERFORMANCE_SANITY"

HEAD branch (19 test suites; adds SANITY_SEC and SANITY_GSS):

    $ grep TESTSUITE_LIST acceptance-small.sh
    export TESTSUITE_LIST="RUNTESTS SANITY DBENCH BONNIE IOZONE FSX SANITYN LFSCK LIBLUSTRE REPLAY_SINGLE CONF_SANITY RECOVERY_SMALL REPLAY_OST_SINGLE REPLAY_DUAL INSANITY SANITY_QUOTA SANITY_SEC SANITY_GSS PERFORMANCE_SANITY"

To see the test cases in a particular acc-sm test, run:

    $ grep run_ <test_script>

For example, to see the test cases that comprise the SANITY test (here trimmed to the last three with tail):

    $ grep run_ sanity.sh | tail -3
    run_test 130c "FIEMAP (2-stripe file with hole)"
    run_test 130d "FIEMAP (N-stripe file)"
    run_test 130e "FIEMAP (test continuation FIEMAP calls)"

For each acc-sm test, what does it measure or show?

Here is a brief description of each acc-sm test.

RUNTESTS - A basic regression test with unmount/remount.
SANITY - Verifies Lustre operation under normal operating conditions.
DBENCH - Dbench benchmark, simulating N clients to produce a filesystem load.
BONNIE - Bonnie++ benchmark for creating, reading and deleting many small files.
IOZONE - IOzone benchmark for generating and measuring a variety of file operations.
FSX - File system exerciser.
SANITYN - Verifies operations from two clients under normal operating conditions.

LFSCK - Tests e2fsck and lfsck to detect and fix filesystem corruption.
LIBLUSTRE - Runs a test linked against the LibLustre client library.
REPLAY_SINGLE - Verifies recovery after an MDS failure.
CONF_SANITY - Verifies various Lustre configurations (including incorrect ones), where the system must behave correctly.
RECOVERY_SMALL - Verifies RPC replay after a communications failure.
REPLAY_OST_SINGLE - Verifies recovery after an OST failure.
REPLAY_DUAL - Verifies recovery from two clients after a server failure.
INSANITY - Tests multiple concurrent failure conditions.
SANITY_QUOTA - Verifies filesystem quotas.
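
To get a quick sense of each suite's size, the grep technique shown above can be turned into a loop. This is a minimal sketch, run from the lustre/tests directory; it assumes the usual lower-case script names (e.g., sanity.sh, replay-single.sh) and that test cases are defined on lines beginning with run_test:

    # Count the test cases defined in several acc-sm scripts (a sketch).
    for t in sanity.sh sanityn.sh replay-single.sh conf-sanity.sh insanity.sh; do
        echo "$t: $(grep -c '^run_test' $t) test cases"
    done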

How do you get the acc-sm tests?

The acc-sm test suite is stored in the lustre/tests sub-directory on each CVS branch (b1_6, b1_8 and HEAD).

Do you have to run every acc-sm test?

No. You can choose to run only specific acc-sm tests, either with or without the acceptance-small.sh (acc-sm) wrapper script. Here are several examples.

To run only the RUNTESTS test:

    ACC_SM_ONLY=RUNTESTS sh acceptance-small.sh

- OR -

    sh runtests

To run only test_1 and test_2 of the SANITYN tests:

    ACC_SM_ONLY=SANITYN ONLY="1 2" sh acceptance-small.sh

To run the replay-single.sh tests, excluding the test_3* and test_4* tests:

    ACC_SM_ONLY=REPLAY_SINGLE REPLAY_SINGLE_EXCEPT="3 4" sh acceptance-small.sh

To run the conf-sanity.sh script without the acceptance-small.sh wrapper, skipping tests 1 through 15:

    CONF_SANITY_EXCEPT="$(seq 15)" sh conf-sanity.sh

You can also name several suites at once in ACC_SM_ONLY (see the sketch at the end of this section).

Do the acc-sm tests have to be run in a specific order?

The test order is defined in the acceptance-small.sh script and in each test script. Users do not have to (and should not) do anything to change the order of tests.

Who runs the acc-sm tests?

Currently, the QE group and Lustre developers run acc-sm as the main test suite for Lustre testing. Acc-sm tests are run on YALA, the automated test system, with test reports submitted to Buffalo (a web interface for browsing Lustre test results). We welcome external contributions to the acc-sm testing effort, whether to the Lustre code base or on new testing platforms.
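
As noted above, ACC_SM_ONLY can name more than one test suite. A minimal sketch, assuming the wrapper treats ACC_SM_ONLY as a space-separated list of the TESTSUITE_LIST names shown earlier:

    # Run just the RUNTESTS, SANITY and SANITYN suites (a sketch).
    ACC_SM_ONLY="RUNTESTS SANITY SANITYN" sh acceptance-small.sh

Only the named suites are run, in the order defined by acceptance-small.sh.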

What type of Lustre environment is needed to run the acc-sm tests? Is anything special needed?

The default Lustre configuration for acc-sm testing is one combined MGS/MDT, one OST and one client. Several clusters with this default configuration are available in YALA, the automated Lustre test system.

To run the acc-sm test suite on a non-default Lustre configuration, modify the default settings in the acc-sm configuration file, lustre/tests/cfg/local.sh. The configuration variables include mds_host, ost_host, OSTCOUNT, MDS_MOUNT_OPTS and OST_MOUNT_OPTS, among others. To create your own configuration file, copy cfg/local.sh to cfg/my_config.sh:

    cp cfg/local.sh cfg/my_config.sh

Edit the necessary variables in the configuration file (my_config.sh) and run acc-sm.

What are the steps to run acc-sm?

There are two methods to run the acc-sm tests.

1. Check out a Lustre branch (b1_6, b1_8 or HEAD).
2. Change directory to lustre/tests:
   cd lustre/tests
3. Run acc-sm on a local, default Lustre configuration (1 MGS/MDT, 1 OST and 1 client):
   sh acceptance-small.sh 2>&1 | tee /tmp/output

- OR -

1. Install the lustre-tests RPM (available at lts-head:/var/cache/cfs/package/rpm/lustre).
2. Change directory to lustre/tests:
   cd /usr/lib/lustre/tests
3. Create your own configuration file and edit it for your configuration:
   cp cfg/local.sh cfg/my_config.sh
4. Run acc-sm on a local Lustre configuration.

Here is an example of running acc-sm on a non-default Lustre configuration (the MDS is sfire7, the OST is sfire8, OSTCOUNT=1, etc.). In this example, only the SANITY test cases are run.

    ACC_SM_ONLY=SANITY mds_host=sfire7 ost_host=sfire8 MDSDEV1=/dev/sda1 OSTCOUNT=1 OSTDEV1=/dev/sda1 MDSSIZE=5000000 OSTSIZE=5000000 MDS_MOUNT_OPTS="-o user_xattr" OST_MOUNT_OPTS="-o user_xattr" REFORMAT="--reformat" PDSH="pdsh -S -w" sh acceptance-small.sh
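
The same settings can be kept in a configuration file instead of being passed on the command line each time. A minimal sketch of cfg/my_config.sh, using only variables named in this document; the hostnames, devices and sizes are illustrative values:

    # cfg/my_config.sh -- example acc-sm configuration (a sketch;
    # hostnames, devices and sizes below are illustrative values)
    mds_host=sfire7                  # MDS node
    ost_host=sfire8                  # OST node
    OSTCOUNT=1                       # number of OSTs
    MDSDEV1=/dev/sda1                # MDS device
    OSTDEV1=/dev/sda1                # OST device
    MDSSIZE=5000000                  # MDS target size
    OSTSIZE=5000000                  # OST target size
    MDS_MOUNT_OPTS="-o user_xattr"
    OST_MOUNT_OPTS="-o user_xattr"
    PDSH="pdsh -S -w"                # remote shell for the servers

With the file in place, select it by name when running acc-sm:

    NAME=my_config sh acceptance-small.sh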

How do you run acc-sm on a mounted Lustre system?

To run acc-sm on a Lustre system that is already mounted, use the correct configuration file (matching the mounted Lustre system) and run acc-sm as:

    SETUP=: CLEANUP=: FORMAT=: NAME=<config> sh acceptance-small.sh

How do you run acc-sm with and without reformat?

By default, the acc-sm test suite does not reformat Lustre. If you want to reformat Lustre, run acc-sm with REFORMAT="--reformat":

    REFORMAT="--reformat" sh acceptance-small.sh

If needed, you can also run acc-sm with WRITECONF="writeconf":

    WRITECONF="writeconf" sh acceptance-small.sh

How do you run acc-sm in a Lustre configuration with several clients?

The default configuration file for acc-sm is cfg/local.sh, which uses only one client (local). To use additional remote clients, specify the RCLIENTS list and use the cfg/ncli.sh configuration file (or your own copy of the ncli configuration):

    NAME=ncli RCLIENTS=<space-separated list of remote clients> sh acceptance-small.sh

For example:

    NAME=ncli RCLIENTS="client2 client3 client11" sh acceptance-small.sh

What is the SLOW variable and how is it used with acc-sm?

The SLOW variable selects whether the longer acc-sm tests are run. By default, it is set to SLOW=no, which skips some of the longer acc-sm tests so that an acc-sm run completes in less than 2 hours. To run all of the acc-sm tests, set the variable to SLOW=yes:

    SLOW=yes sh acceptance-small.sh

What is the FAIL_ON_ERROR variable and how is it used with acc-sm?

The FAIL_ON_ERROR variable controls whether acc-sm stops or continues after a test failure. If the variable is set to true (FAIL_ON_ERROR=true), acc-sm stops after test_n fails and test_n+1 does not run. If the variable is set to false (FAIL_ON_ERROR=false), acc-sm continues after test_n fails and test_n+1 does run:

    FAIL_ON_ERROR=false sh acceptance-small.sh
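
These variables combine freely with the others shown earlier. For example, a sketch of a short, keep-going run that logs its output, reusing the tee idiom from the steps above:

    # Skip the slow tests, keep going past failures, and save a log (a sketch).
    SLOW=no FAIL_ON_ERROR=false sh acceptance-small.sh 2>&1 | tee /tmp/acc-sm.log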

What is the PDSH variable and how is it used with acc-sm?

The PDSH variable is used to provide remote shell access. If acc-sm is run on a Lustre configuration with remote servers, specify PDSH like this:

    PDSH="pdsh -S -w" sh acceptance-small.sh

If the client has no access to the servers, you can run acc-sm without PDSH, but the tests that need PDSH access are skipped. A summary report is generated that lists the skipped tests.

What is the CMD configuration for HEAD?

For the HEAD branch, specify the MDSCOUNT variable (the number of MDTs). By default, the variable is set to 1. If you have a Lustre configuration with several MDT nodes, specify them in the configuration file as mds1_host, mds2_host, and so on. By default, all of these variables are set to the mds_host value. (A sketch appears at the end of this document.)

What do we do with the acc-sm test results?

Acc-sm test results are sent to Buffalo, a web interface for Lustre test results. The default Buffalo display shows a summary of tests run on different hardware configurations for various CVS branches over the past 24 hours, with links to the individual reports. For more information on reporting test results to Buffalo, see Buffalizing Tests.

If an acc-sm test fails, the failure is investigated. If the investigation reveals a Lustre defect, a bug is opened in Bugzilla to fix the problem.
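
For the CMD configuration described above, here is a minimal sketch of the relevant configuration-file fragment and invocation. The mds1_host/mds2_host pattern is the one named in that answer; the hostnames are illustrative:

    # Fragment of a custom configuration file: two MDTs for CMD testing on HEAD
    MDSCOUNT=2          # number of MDTs
    mds1_host=sfire7    # first MDT node (example hostname)
    mds2_host=sfire9    # second MDT node (example hostname)

    NAME=my_config sh acceptance-small.sh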