Fan Yong; Zhang Jinghai. High Performance Data Division


How Can Lustre * Snapshots Be Used?
Undo/undelete/recover file(s) from the snapshot: a file was removed by mistake, or an application failure left the data invalid
Quickly back up the filesystem before a system upgrade: a Lustre/kernel upgrade may hit trouble and need to be rolled back
Prepare a consistent, frozen view of the data for backup tools: ensures the system is consistent for the whole backup
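For example, once a snapshot has been created and mounted on the servers (see the lctl commands on the following slides), a mistakenly removed file can be copied back from a read-only client mount. The MGS node name, snapshot fsname, and paths below are illustrative assumptions:

    # Mount the snapshot read-only on a client, using the new fsname
    # reported by "lctl snapshot_list" for this snapshot
    mount -t lustre -o ro mgsnode@tcp:/<snapshot_fsname> /mnt/snapshot
    # Copy the lost file back into the live filesystem
    cp -a /mnt/snapshot/project/data.txt /mnt/myfs/project/data.txt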

Phase I: ZFS-based Lustre * Snapshot. Targeted for the Community Lustre 2.10 release.

ZFS-based Lustre * Snapshot Overview
A ZFS snapshot is created on each target, with a new fsname
Mounted as a separate read-only Lustre filesystem on the client(s)
Architecture details: http://wiki.lustre.org/lustre_snapshots
[Architecture diagram: lctl snapshot commands in userspace drive Lustre control (lctl API into the Lustre kernel) and ZFS control; the MGS coordinates the MDSs and OSSs, each running the Lustre kernel and the ZFS tool set]
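Conceptually, each backend target dataset receives a plain ZFS snapshot; the pool/dataset names below are illustrative assumptions:

    # One ZFS snapshot per target, all sharing the snapshot name
    zfs snapshot mdt0pool/mdt0@mysnapshot
    zfs snapshot ost0pool/ost0@mysnapshot
    # The snapshot datasets are later served read-only under the
    # new fsname assigned to the snapshot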

Global Write Barrier
Freezes the system while the snapshot pieces are created on every target
Write barrier on MDTs only
No orphans, no dangling references
New lctl commands for the global write barrier (see the sketch below):
    lctl barrier_freeze <fsname> [timeout (seconds)]
    lctl barrier_thaw <fsname>
    lctl barrier_stat <fsname>
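A manual barrier cycle might look like the following sketch; the fsname "myfs" and the 30-second timeout are illustrative assumptions:

    # Block metadata modifications on all MDTs of "myfs",
    # with a 30-second timeout as a safety net
    lctl barrier_freeze myfs 30
    # Check the barrier state
    lctl barrier_stat myfs
    # ... create the per-target snapshots while writes are frozen ...
    # Release the barrier
    lctl barrier_thaw myfs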

Two-Phase Global Write Barrier Setup
1. User action: run lctl barrier_freeze
2. MGS action: start FREEZE1
3. MDT action: each MDT (MDT_0 ... MDT_n) gets the barrier lock from the MGS, blocks client modifications, and flushes RPCs
4. Each MDT notifies the MGS that FREEZE1 is done
5. The MGS waits until all FREEZE1 notifications arrive, then starts FREEZE2
6. Each MDT gets the barrier lock from the MGS again and syncs/commits local transactions
7. Each MDT notifies the MGS that FREEZE2 is done
8. The MGS waits until all FREEZE2 notifications arrive: barrier done! (The OSTs, OST_0 ... OST_m ... OST_x, appear in the diagram but take no part in the barrier.)

Fork/Erase Configuration Logs
The snapshot is independent from the original filesystem
A new filesystem name (fsname) is assigned to the snapshot
The fsname is part of the configuration log names
The fsname appears in the configuration log entries
New lctl commands to fork/erase configuration logs (see the sketch below):
    lctl fork_lcfg <fsname> <new_fsname>
    lctl erase_lcfg <fsname>
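Used directly, the commands might look like this sketch; the fsnames "myfs" and "myfs2" are illustrative assumptions:

    # Fork the configuration logs of "myfs" under the new fsname "myfs2"
    lctl fork_lcfg myfs myfs2
    # Erase the forked logs once they are no longer needed
    lctl erase_lcfg myfs2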

Mount Snapshot Read-only
More than just -o ro: any modification of a ZFS snapshot can trigger a backend failure/assertion
Open the ZFS dataset in read-only mode
Do NOT start the cross-server sync thread, pre-create thread, or quota thread
Skip sequence file initialization, orphan cleanup, and recovery
Ignore last_rcvd modification
Deny transaction creation
Forbid LFSCK

Userspace Interfaces: lctl snapshot_xxx
Create snapshot:
    lctl snapshot_create [-b|--barrier] [-c|--comment comment] <-F|--fsname fsname> [-h|--help] <-n|--name ssname> [-r|--rsh remote_shell] [-t|--timeout timeout]
Destroy snapshot:
    lctl snapshot_destroy [-f|--force] <-F|--fsname fsname> [-h|--help] <-n|--name ssname> [-r|--rsh remote_shell]
Modify snapshot attributes:
    lctl snapshot_modify [-c|--comment comment] <-F|--fsname fsname> [-h|--help] <-n|--name ssname> [-N|--new new_ssname] [-r|--rsh remote_shell]
List the snapshots:
    lctl snapshot_list [-d|--detail] <-F|--fsname fsname> [-h|--help] [-n|--name ssname] [-r|--rsh remote_shell]
Mount snapshot:
    lctl snapshot_mount <-F|--fsname fsname> [-h|--help] <-n|--name ssname> [-r|--rsh remote_shell]
Umount snapshot:
    lctl snapshot_umount <-F|--fsname fsname> [-h|--help] <-n|--name ssname> [-r|--rsh remote_shell]
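A full snapshot lifecycle might look like this sketch, run on the MGS node; the fsname "myfs", the snapshot name "mysnapshot", and the comment text are illustrative assumptions:

    # Create a snapshot with the global write barrier enabled
    lctl snapshot_create -F myfs -n mysnapshot -b -c "before upgrade"
    # Inspect it (shows the new fsname assigned to the snapshot)
    lctl snapshot_list -F myfs -n mysnapshot -d
    # Mount it on the servers, use it, then unmount and destroy it
    lctl snapshot_mount -F myfs -n mysnapshot
    lctl snapshot_umount -F myfs -n mysnapshot
    lctl snapshot_destroy -F myfs -n mysnapshot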

Snapshot Tracking: lsnapshot.log
Snapshot logs: /var/log/lsnapshot.log

    # cat /var/log/lsnapshot.log
    Sun Mar 13 14:46:05 2016 (32688:jt_snapshot_create:1138:lustre:ssh): Create snapshot mysnapshot successfully with comment <This is a test>, barrier <enable>, timeout <60>
    Sun Mar 13 14:48:27 2016 (515:jt_snapshot_modify:1521:lustre:ssh): Modify snapshot mysnapshot successfully with name <newsnapshot>, comment <The old name is mysnapshot>
    Sun Mar 13 14:49:13 2016 (632:jt_snapshot_mount:2013:lustre:ssh): The snapshot newsnapshot is mounted
    Sun Mar 13 14:53:03 2016 (894:jt_snapshot_modify:1521:lustre:ssh): Modify snapshot newsnapshot successfully with name <(null)>, comment <Change comment online>
    Sun Mar 13 14:53:20 2016 (973:jt_snapshot_umount:2167:lustre:ssh): the snapshot newsnapshot have been umounted

Write Barrier Scalability
Test configuration:
    CPU: Intel Xeon E5620 @ 2.40GHz, 4 cores * 2 sockets, HT
    RAM: 64GB DDR3
    Network: InfiniBand QDR
    Storage: SATA disk arrays, 2 MDTs per MDS, 4 OSTs per OSS
[Chart: barrier_freeze time (seconds) vs. MDT count (1, 2, 4, 8), for idle and busy filesystems; the freeze time stays roughly flat, at approximately 1.4 to 2.0 seconds, as the MDT count grows]

Snapshot I/O Scalability
[Chart: Snapshot Scalability with MDTs, snapshot_create time (seconds) vs. MDT count (1, 2, 4, 8)]
[Chart: Snapshot Scalability with OSTs, snapshot_create time (seconds) vs. OST count (2, 4, 8, 16)]
[Both charts plot four series: idle+barrier, idle-barrier, busy+barrier, busy-barrier; all runs complete in roughly 1 to 3.5 seconds]

I/O Performance With Snapshots
Limited impact on metadata performance
    Measured via mds-survey on a single MDT
    Slight benefit, since changed blocks are not freed
No significant impact on I/O performance
    Measured via obdfilter-survey on one OST
    Not Lustre * specific; ZFS is COW based
[Chart: Metadata Performance Impact, objects/second with vs. without a snapshot: destroy 15926 vs. 14817, create 14903 vs. 13500]
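This kind of measurement can be reproduced with the mds-survey script from lustre-iokit; the target name, thread counts, file count, and test list below are illustrative assumptions rather than the presenters' exact settings:

    # Run create/destroy phases against a single MDT
    thrlo=4 thrhi=16 file_count=100000 targets="myfs-MDT0000" \
        tests_str="create destroy" sh mds-survey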

Next Steps for the Snapshot Feature
Phase I: targeted to land in the Community Lustre 2.10 release
Phase II: Lustre * integrated snapshot
    Depends on user requirements versus other Lustre features, performance, etc.
    A more controllable and relatively independent solution
    Reuses the Phase I global write barrier
    Integrates snapshot creation/mount/unmount into the OSD
    Identifies files/objects in each snapshot as part of the File Identifier (FID)

Legal Notices and Disclaimers
Intel technologies' features and benefits depend on system configuration and may require enabled hardware, software, or service activation. Learn more at intel.com, or from the OEM or retailer. No computer system can be absolutely secure. Tests document performance of components on a particular test, in specific systems. Differences in hardware, software, or configuration will affect actual performance. Consult other sources of information to evaluate performance as you consider your purchase. For more complete information about performance and benchmark results, visit http://www.intel.com/performance. Intel disclaims all express and implied warranties, including, without limitation, the implied warranties of merchantability, fitness for a particular purpose, and non-infringement, as well as any warranty arising from course of performance, course of dealing, or usage in trade. Intel, the Intel logo and others are trademarks of Intel Corporation in the U.S. and/or other countries. *Other names and brands may be claimed as the property of others. © 2016 Intel Corporation.