Data life cycle monitoring using RoBinHood at scale. Gabriele Paciucci Solution Architect Bruno Faccini Senior Support Engineer September LAD

Similar documents
Optimization of Lustre* performance using a mix of fabric cards

Small File I/O Performance in Lustre. Mikhail Pershin, Joe Gmitter Intel HPDD April 2018

Zhang, Hongchao

Intel. Rack Scale Design: A Deeper Perspective on Software Manageability for the Open Compute Project Community. Mohan J. Kumar Intel Fellow

Andreas Dilger High Performance Data Division RUG 2016, Paris

Intel Unite Plugin Guide for VDO360 Clearwater

Intel Unite Solution. Linux* Release Notes Software version 3.2

Re-Architecting Cloud Storage with Intel 3D XPoint Technology and Intel 3D NAND SSDs

Evolving Small Cells. Udayan Mukherjee Senior Principal Engineer and Director (Wireless Infrastructure)

12th ANNUAL WORKSHOP 2016 NVME OVER FABRICS. Presented by Phil Cayton Intel Corporation. April 6th, 2016

An introduction to today s Modular Operating System

Lustre * Features In Development Fan Yong High Performance Data Division, Intel CLUG

Intel & Lustre: LUG Micah Bhakti

ENVISION TECHNOLOGY CONFERENCE. Functional intel (ia) BLA PARTHAS, INTEL PLATFORM ARCHITECT

Clear CMOS after Hardware Configuration Changes

Intel Unite. Intel Unite Firewall Help Guide

Fast-track Hybrid IT Transformation with Intel Data Center Blocks for Cloud

Live Migration of vgpu

Intel Unite Solution Version 4.0

Intel Unite Solution. Plugin Guide for Protected Guest Access

M.2 Evolves to Storage Benefit

Virtuozzo Hyperconverged Platform Uses Intel Optane SSDs to Accelerate Performance for Containers and VMs

Re- I m a g i n i n g D a t a C e n t e r S t o r a g e a n d M e m o r y

Lustre HSM at Cambridge. Early user experience using Intel Lemur HSM agent

Building an Open Memory-Centric Computing Architecture using Intel Optane Frank Ober Efstathios Efstathiou Oracle Open World 2017 October 3, 2017

SPDK China Summit Ziye Yang. Senior Software Engineer. Network Platforms Group, Intel Corporation

The Transition to PCI Express* for Client SSDs

Intel Xeon W-3175X Processor Thermal Design Power (TDP) and Power Rail DC Specifications

Intel Unite Solution Intel Unite Plugin for WebEx*

Fan Yong; Zhang Jinghai. High Performance Data Division

NVMe Over Fabrics: Scaling Up With The Storage Performance Development Kit

LNet Roadmap & Development. Amir Shehata Lustre * Network Engineer Intel High Performance Data Division

Andreas Dilger. Principal Lustre Engineer. High Performance Data Division

Jim Harris Principal Software Engineer Intel Data Center Group

Intel Enterprise Edition Lustre (IEEL-2.3) [DNE-1 enabled] on Dell MD Storage

Intel Software Guard Extensions Platform Software for Windows* OS Release Notes

Intel QuickAssist for Windows*

Intel Xeon Processor E v3 Family

DIY Security Camera using. Intel Movidius Neural Compute Stick

Intel Unite Solution Version 4.0

SELINUX SUPPORT IN HFI1 AND PSM2

Ben Walker Data Center Group Intel Corporation

Intel Unite Solution. Plugin Guide for Protected Guest Access

Drive Recovery Panel

Intel Open Network Platform Release 2.0 Hardware and Software Specifications Application Note. SDN/NFV Solutions with Intel Open Network Platform

Jomar Silva Technical Evangelist

Intel SSD DC P3700 & P3600 Series

Sample for OpenCL* and DirectX* Video Acceleration Surface Sharing

Intel Cache Acceleration Software (Intel CAS) for Linux* v2.9 (GA)

Achieving 2.5X 1 Higher Performance for the Taboola TensorFlow* Serving Application through Targeted Software Optimization

WITH INTEL TECHNOLOGIES

High Performance Computing The Essential Tool for a Knowledge Economy

Extremely Fast Distributed Storage for Cloud Service Providers

Lustre* is designed to achieve the maximum performance and scalability for POSIX applications that need outstanding streamed I/O.

Andreas Dilger, Intel High Performance Data Division LAD 2017

Software Evaluation Guide for ImTOO* YouTube* to ipod* Converter Downloading YouTube videos to your ipod

Running Docker* Containers on Intel Xeon Phi Processors

Intel Omni-Path Fabric Manager GUI Software

Intel Xeon Phi Coprocessor. Technical Resources. Intel Xeon Phi Coprocessor Workshop Pawsey Centre & CSIRO, Aug Intel Xeon Phi Coprocessor

Intel Omni-Path Fabric Manager GUI Software

Making Nested Virtualization Real by Using Hardware Virtualization Features

Ravindra Babu Ganapathi

THE STORAGE PERFORMANCE DEVELOPMENT KIT AND NVME-OF

Extending Energy Efficiency. From Silicon To The Platform. And Beyond Raj Hazra. Director, Systems Technology Lab

Jason Waxman General Manager High Density Compute Division Data Center Group

Intel Analysis of Speculative Execution Side Channels

Accelerating NVMe-oF* for VMs with the Storage Performance Development Kit

Intel Software Guard Extensions SDK for Linux* OS. Installation Guide

Intel SSD Data center evolution

Analyze and Optimize Windows* Game Applications Using Intel INDE Graphics Performance Analyzers (GPA)

Intel vpro Technology Virtual Seminar 2010

Välkommen. Intel Anders Huge

Modernizing Meetings: Delivering Intel Unite App Authentication with RFID

Intel Unite Solution Intel Unite Plugin for Ultrasonic Join

OpenCL* and Microsoft DirectX* Video Acceleration Surface Sharing

MICHAL MROZEK ZBIGNIEW ZDANOWICZ

Intel Media Server Studio 2018 R1 Essentials Edition for Linux* Release Notes

Intel Cluster Ready Allowed Hardware Variances

Lustre Beyond HPC. Presented to the Lustre* User Group Beijing October 2013

Intel Atom Processor D2000 Series and N2000 Series Embedded Application Power Guideline Addendum January 2012

CREATING A COMMON SOFTWARE VERBS IMPLEMENTATION

Intel Unite Solution Version 4.0

Dynamic Power Optimization for Higher Server Density Racks A Baidu Case Study with Intel Dynamic Power Technology

Intel VTune Amplifier XE

Maximize Performance and Scalability of RADIOSS* Structural Analysis Software on Intel Xeon Processor E7 v2 Family-Based Platforms

Theory and Practice of the Low-Power SATA Spec DevSleep

SSDs Going Mainstream Do you believe? Tom Rampone Intel Vice President/General Manager Intel NAND Solutions Group

Intel Solid State Drive Data Center Family for PCIe* in Baidu s Data Center Environment

LED Manager for Intel NUC

Intel Compute Card Slot Design Overview

Intel Cluster Toolkit Compiler Edition 3.2 for Linux* or Windows HPC Server 2008*

IXPUG 16. Dmitry Durnov, Intel MPI team

Intel Speed Select Technology Base Frequency - Enhancing Performance

Intel Unite. Enterprise Test Environment Setup Guide

Andreas Dilger, Intel High Performance Data Division Lustre User Group 2017

Tutorial: Finding Hotspots with Intel VTune Amplifier - Linux* Intel VTune Amplifier Legal Information

Intel Core TM Processor i C Embedded Application Power Guideline Addendum

Improving Driver Performance A Worked Example

Intel Cache Acceleration Software (Intel CAS) for Linux* v2.8 (GA)

NVMFS: A New File System Designed Specifically to Take Advantage of Nonvolatile Memory

Transcription:

Data life cycle monitoring using RoBinHood at scale Gabriele Paciucci Solution Architect Bruno Faccini Senior Support Engineer September 2015 - LAD

Agenda Motivations Hardware and software setup The first scan problem Changelog injection DU, FIND Conclusion 9/16/2015 2

Motivations During the last few years several customers started to adopt RoBinHood (RBH) to monitor and manage (HSM) the life of the data into Lustre*. Intel currently supports RBH, included in the Intel Enterprise Edition for Lustre* software. The objective of this presentation is to guide the audience on how to size, test and troubleshoot RBH to handle a very large number of files. 9/16/2015 3

Hardware Setup 4x Object Storage Server: 2x Intel Xeon E5-2643v2 CPU 64GB of RAM. 4x Intel SSD DC P3700 Series 400GB Mellanox* FDR card and Intel True Scale Fabric QDR card. 1x Metadata Server: 2x Intel Xeon E5-2643v2 CPU 64GB of RAM. 2x Intel SSD DC P3700 Series 400GB Mellanox* FDR card and Intel True Scale Fabric QDR card. 1x Policy Engine 2x Intel Xeon E5-2643v2 CPU 64GB of RAM. 2x Intel SSD DC P3700 Series 400GB Mellanox* FDR card and Intel True Scale Fabric QDR card. 4x Object Storage Server Policy Engine 1x Metadata Server 9/16/2015 4

Software stack used Intel Enterprise Edition for Lustre* 2.3 Lustre* version 2.5.37.7 Kernel version 2.6.32-504.30.3.el6_lustre.x86_64 Patched version of RBH 2.5.5 RBH accounting has been disabled for performance and to benefit recent commit bd6aa4f (multi-threading DB batch operations) MySQL Version 5.1.73 included in the CentOS* distro Community version 5.6.26 from Oracle Processor hw crc32 checksuming to reduce the buf_calc_page_new_checksum() 9/16/2015 5

innodb buffer entries/sec How to design the RBH server 3000 First scan operation High frequency CPU to increase metadata queries As much memory as possible 80% of available RAM for innodb_buffer_log 2500 2000 1500 1000 500 0 70 60 50 40 30 20 10 0 default innodb_buffer_log MySQL memory allocation 10M 48M 81M number of files 9/16/2015 6

SSD devices are essential MySQL on Intel NVM MySQL on local HDD First scan operation with RBH on the same server. In the chart metadata operations per second collected by Intel Manager for Lustre* 9/16/2015 7

MySQL software tuning Maintain the configuration simple Use mysqltuner for advice slow_query_log=1 Disable sequential optimization for MySQL and save CPU query_cache_size=8m thread_cache_size=4 innodb_buffer_pool_size= < 80% of the RAM > tmp_table_size=64m Disable transaction protection in MySQL if your server is safe enough max_heap_table_size=64m innodb_flush_neighbors=0 innodb_flush_log_at_trx_commit=2 9/16/2015 8

entries/sec First scan operation RBH calls path2fid, lstat, getstripe, hsm_state to collect information about entries The metadata server must be fast enough Many operation modes are possible: Single scan server Single scan server multiple threads Multiple scan servers 3000 2500 2000 1500 1000 500 0 First scan speed 1M 10M 50M number of files scanned We run out of innodb_buffer_log 9/16/2015 9

Changelog operations RBH uses the Lustre* changelog to identify changes in the file system after the first scan. We need (again) a fast metadata server Changelog OFF Changelog ON 35282 file creation/sec 33730 file creation/sec 9/16/2015 10

Changelog injection test bed Dataset 10M/50M/80M, 1M files per directory Real small files (64/32K) created by mdtest $MDTEST_BIN -I 1000000 -w 64000 -F C -u -d $DIR Used 4 to 16 clients Tested only CREATE operations on the file system Tools used to troubleshoot: perf mysqltuner Intel Manager for Lustre* iostat, vmstat and other Linux tools 9/16/2015 11

ops per second Speed measured during the experiments Changelog operations Increasing the number of clients file creation speed is increasing The RBH speed remain stable but below the speed of the file system 25000 20000 15000 10000 5000 0 13 49 81 millions of files creation speed read speed 81Millions files experiment 9/16/2015 12

How to troubleshoot RBH Activate the stats on RBH EntryProcessor Pipeline Stats GET_INFO_DB GET_INFO_FS DB_APPLY Idle threads We want to maintain the pipe full without overloading Increasing the n. threads and the objects to process is not always a good idea 9/16/2015 13

0 6 12 18 24 30 36 42 48 54 60 66 72 78 84 90 96 102 108 114 120 126 132 Troubleshooting The latency measured in the RBH s report file give us great insights In our environment the latency of RBH to access to the file system is bigger than the DB operations A Lustre* tuning is necessary to decrease the latency 1.6 1.4 1.2 1 0.8 0.6 0.4 0.2 ms/op 0 Lustre* metadata operations from this client are slower than DB RBH operations during 49M experiment execution time 12000 10000 8000 6000 4000 2000 0 GET_INFO_DB GET_INFO_FS DB_APPLY read speed 9/16/2015 14

Lustre* metrics and tunables Increasing the capacity of RBH to perform metadata operations: lctl set_param llite.*.statahead_max lctl set_param ldlm.namespaces.*.lru_size lctl set_param ldlm.namespaces.*.lru_max_age sysctl -w lnet.debug=0 lctl set_param mdc.*.max_rpcs_in_flight (remember to increse peer_credits if necessary) 9/16/2015 15

RobinHood helps SysAdmin 49 millions 81 millions du -h 5h 40m 12sec Still working!!! rbsh-lhsm-du -H -d 1m 39sec 1m 46sec lfs find --ost 0 6h 49m 16sec Still working!!! rbh-lhsm-find --ost 0 33m 54sec 36m 52sec 9/16/2015 16

Conclusion We successfully tested the capabilities of RBH to report the utilization of file system busy with millions of files MySQL must be very well designed and tuned Manage a RBH instance for more than 100M of files could be a problem Next steps: Verify the impact of deep tree layouts Try to mimic a more realistic workload including different metadata operation Verify the scalability of RBH in a HSM environment 9/16/2015 17

Notices and Disclaimers Intel technologies features and benefits depend on system configuration and may require enabled hardware, software or service activation. Performance varies depending on system configuration. No computer system can be absolutely secure. Check with your system manufacturer or retailer or learn more at intel.com. Tests document performance of components on a particular test, in specific systems. Differences in hardware, software, or configuration will affect actual performance. Consult other sources of information to evaluate performance as you consider your purchase. For more complete information about performance and benchmark results, visit http://www.intel.com/performance. This document contains information on products, services and/or processes in development. All information provided here is subject to change without notice. Contact your Intel representative to obtain the latest forecast, schedule, specifications and roadmaps. Statements in this document that refer to Intel s plans and expectations for the quarter, the year, and the future, are forward-looking statements that involve a number of risks and uncertainties. A detailed discussion of the factors that could affect Intel s results and plans is included in Intel s SEC filings, including the annual report on Form 10-K. The products described may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request. No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document. Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability, fitness for a particular purpose, and noninfringement, as well as any warranty arising from course of performance, course of dealing, or usage in trade. Intel does not control or audit third-party benchmark data or the web sites referenced in this document. You should visit the referenced web site and confirm whether referenced data are accurate. Intel, the Intel logo, and Intel Xeon trademarks of Intel Corporation in the United States and other countries. *Other names and brands may be claimed as the property of others. 2015 Intel Corporation. 9/16/2015 18

Intel Confidential 9/16/2015 19