Improving overall Robinhood performance for use on large-scale deployments Colin Faber

Similar documents
RobinHood Project Status

TokuDB vs RocksDB. What to choose between two write-optimized DB engines supported by Percona. George O. Lorch III Vlad Lesin

Percona Server for MySQL 8.0 Walkthrough

MySQL Performance Optimization and Troubleshooting with PMM. Peter Zaitsev, CEO, Percona Percona Technical Webinars 9 May 2018

What s New in MySQL 5.7 Geir Høydalsvik, Sr. Director, MySQL Engineering. Copyright 2015, Oracle and/or its affiliates. All rights reserved.

Robin Hood 2.5 on Lustre 2.5 with DNE

RocksDB Key-Value Store Optimized For Flash

Lustre overview and roadmap to Exascale computing

A Brief Introduction of TiDB. Dongxu (Edward) Huang CTO, PingCAP

MongoDB and Mysql: Which one is a better fit for me? Room 204-2:20PM-3:10PM

Database Architectures

MySQL Performance Optimization and Troubleshooting with PMM. Peter Zaitsev, CEO, Percona

The Google File System

Parallel File Systems for HPC

ScaleArc Performance Benchmarking with sysbench

ITS. MySQL for Database Administrators (40 Hours) (Exam code 1z0-883) (OCP My SQL DBA)

MySQL for Developers Ed 3

NoVA MySQL October Meetup. Tim Callaghan VP/Engineering, Tokutek

Switching to Innodb from MyISAM. Matt Yonkovit Percona

EMC Documentum 6.5 with SQL Server 2008 Reaches New Heights in Scalability and Performance, Driving Lower Customer TCO

LUG 2012 From Lustre 2.1 to Lustre HSM IFERC (Rokkasho, Japan)

MySQL & NoSQL: The Best of Both Worlds

Implementing a Hierarchical Storage Management system in a large-scale Lustre and HPSS environment

How To Rock with MyRocks. Vadim Tkachenko CTO, Percona Webinar, Jan

Choosing Hardware and Operating Systems for MySQL. Apr 15, 2009 O'Reilly MySQL Conference and Expo Santa Clara,CA by Peter Zaitsev, Percona Inc

Load Testing Tools. for Troubleshooting MySQL Concurrency Issues. May, 23, 2018 Sveta Smirnova

MongoDB Schema Design

Data storage on Triton: an introduction

What's new in MySQL 5.5? Performance/Scale Unleashed

Scaling DreamFactory

Delegates must have a working knowledge of MariaDB or MySQL Database Administration.

Scaling Without Sharding. Baron Schwartz Percona Inc Surge 2010

Kaminario Powering Cloud-Scale Applications with All-Flash. Sundip Arora Director, Product Marketing

Google File System. Arun Sundaram Operating Systems

Data Movement & Tiering with DMF 7

Next-Generation Cloud Platform

MySQL for Developers Ed 3

The ATLAS EventIndex: Full chain deployment and first operation

MongoDB Schema Design for. David Murphy MongoDB Practice Manager - Percona

Massive Scalability With InterSystems IRIS Data Platform

Jargons, Concepts, Scope and Systems. Key Value Stores, Document Stores, Extensible Record Stores. Overview of different scalable relational systems

Which technology to choose in AWS?

Pivot3 Acuity with Microsoft SQL Server Reference Architecture

Automating Information Lifecycle Management with

MySQL In the Cloud. Migration, Best Practices, High Availability, Scaling. Peter Zaitsev CEO Los Angeles MySQL Meetup June 12 th, 2017.

The Role of Database Aware Flash Technologies in Accelerating Mission- Critical Databases

MySQL for Database Administrators Ed 3.1

BIS Database Management Systems.

MIS Database Systems.

MySQL Database Administrator Training NIIT, Gurgaon India 31 August-10 September 2015

/ Cloud Computing. Recitation 6 October 2 nd, 2018

Performance improvements in MySQL 5.5

<Insert Picture Here> MySQL Cluster What are we working on

State of the Dolphin Developing new Apps in MySQL 8

Manual Trigger Sql Server 2008 Insert Multiple Rows

IBM Tivoli Storage Manager for HP-UX Version Installation Guide IBM

CSE 333 Lecture 9 - storage

To Shard or Not to Shard That is the question! Peter Zaitsev April 21, 2016

Andreas Dilger, Intel High Performance Data Division Lustre User Group 2017

Four Steps to Unleashing The Full Potential of Your Database

RobinHood Policy Engine

Data Management. Parallel Filesystems. Dr David Henty HPC Training and Support

DDN s Vision for the Future of Lustre LUG2015 Robert Triendl

big picture parallel db (one data center) mix of OLTP and batch analysis lots of data, high r/w rates, 1000s of cheap boxes thus many failures

Jyotheswar Kuricheti

DISTRIBUTED DATABASE OPTIMIZATIONS WITH NoSQL MEMBERS

MySQL for Developers. Duration: 5 Days

Extreme Storage Performance with exflash DIMM and AMPS

MySQL Replication Options. Peter Zaitsev, CEO, Percona Moscow MySQL User Meetup Moscow,Russia

MySQL Database Scalability

Migrating Oracle Databases To Cassandra

MySQL 5.7 For Operational DBAs an Introduction. Peter Zaitsev, CEO, Percona February 16, 2016 Percona Technical Webinars

MySQL for Developers. Duration: 5 Days

Abstract. The Challenges. ESG Lab Review InterSystems IRIS Data Platform: A Unified, Efficient Data Platform for Fast Business Insight

Why Choose Percona Server For MySQL? Tyler Duzan

System Characteristics

IT Best Practices Audit TCS offers a wide range of IT Best Practices Audit content covering 15 subjects and over 2200 topics, including:

Evaluation of Chelsio Terminator 6 (T6) Unified Wire Adapter iscsi Offload

Database Architecture 2 & Storage. Instructor: Matei Zaharia cs245.stanford.edu

Oracle Database 12c: JMS Sharded Queues

InnoDB Scalability Limits. Peter Zaitsev, Vadim Tkachenko Percona Inc MySQL Users Conference 2008 April 14-17, 2008

Lustre* is designed to achieve the maximum performance and scalability for POSIX applications that need outstanding streamed I/O.

Exadata Implementation Strategy

The TokuFS Streaming File System

Introduction to K2View Fabric

Optimizing Performance for Partitioned Mappings

Manual Trigger Sql Server 2008 Insert Multiple Rows At Once

Lustre * Features In Development Fan Yong High Performance Data Division, Intel CLUG

The Design and Optimization of Database

IBM V7000 Unified R1.4.2 Asynchronous Replication Performance Reference Guide

Accelerate MySQL for Demanding OLAP and OLTP Use Cases with Apache Ignite. Peter Zaitsev, Denis Magda Santa Clara, California April 25th, 2017

NewSQL Databases. The reference Big Data stack

Georgia Institute of Technology ECE6102 4/20/2009 David Colvin, Jimmy Vuong

Update The Statistics On A Single Table+sql Server 2005

Performance of relational database management

Product Overview. Technical Summary, Samples, and Specifications

Fast Forward I/O & Storage

Survey of Oracle Database

Performance comparisons and trade-offs for various MySQL replication schemes

Triton file systems - an introduction. slide 1 of 28

Transcription:

Improving overall Robinhood performance for use on large-scale deployments Colin Faber <colin.faber@seagate.com> 2017 Seagate Technology LLC 1

WHAT IS ROBINHOOD? Robinhood is a versatile policy engine and tool kit used manage large-scale file systems Maintains on going replication of target FS metadata Utilizes external relational database for metadata management Utilizes Lustre Changelog feature Utilizes Lustre HSM management features Versatile and can be used in non-hsm, and non-lustre environments Provides tool clones which utilize relational database rather than FS directly Robinhood was originally developed by CEA the French Alternative Energies and Atomic Energy Commission, and is actively utilized by many HPC sites, as well as by many HPC vendors for use in their products. Seagate, Cray, DDN, DKRZ, Kaust, NCSA, Oracle, as well as many others. 2

HOW SEAGATE USES ROBINHOOD Utilized directly in various Lustre offerings ClusterStor A200 tiered active archive object store ClusterStor L300N / 9000 / 1500 Currently supports large scale file systems which utilize Robinhood for various tasks 100 s of millions of files and directories Huge amounts of file system activity Stale file detection and removal HSM workloads Seagate is committed to supporting and improving Robinhood as the current, best Lustre changelog consumer and policy engine available. 3

CHALLENGES WITH ROBINHOOD Relies on Lustre changelogs which can be difficult to manage, requiring occasional rescans of the target file system Seagate has invested considerable development resources to harden the Changelog feature Current relational database model leaves room for optimization Backing database choice for Robinhood (MySQL with InnoDB engine) has scaling issues Relational database transactions are inherently inefficient Environments with high file counts and high change rates result in processing backlogs Alternative database engines (Percona TokuDB / PostgreSQL, and various others) may be a better fit for large-scale deployments 4

MITIGATING CHALLENGES WITH ROBINHOOD To mitigate challenges encountered in our Robinhood deployments, our development work has focused on hardening and improving Robinhood Code improvements for manageability Break up large routines and logic blocks Introduce a development package to build new features against Database tuning and experiments Lustre client tuning and experiments Developed several value-add components through a plug-in architecture we re developing Determined an appropriate sizing guide and techniques for policy engine systems 5

SIZING A POLICY ENGINE SYSTEM APPROPRIATELY A Robinhood-based policy engine system is mostly a large relational database Each inode on your file system equals 4 or more records in the database Running reports, triggering policies, and utilizing external tools which ship with Robinhood all generate complex queries to which the database has to respond A low-powered, low-core count, low CPU frequency machine will result in a low performance Robinhood policy engine Large-scale file systems need large-scale database systems for Robinhood to operate against 6

DATABASE TUNING Database tuning is critical. InnoDB tuning using Percona s automatic tuning tool works great! https://tools.percona.com/wizard After many experiments we found that our custom tuning was only moderately better than the Percona tuning wizard. Sysbench version 0.5 with complex OLTP workload is a great way to exercise your database before getting Robinhood involved Repeatable / industry standard database benchmark suite Various tuning strategies can be employed and tested Tuning itself can be scripted, allowing for hands off performance discovery 7

DATABASE SYSTEM SIZING (CPU) Surprise surprise! A faster CPU makes for a faster database server A faster CPU unsurprisingly means better performance 8

DATABASE SYSTEM SIZING (continued) Storage matters, but not as much as CPU and memory CPU core count and clock speed are the biggest factors in performance More memory means more relational data in cache Storage should be appropriate for the file system size 1+ KB per inode + InnoDB log file size == database size on disk The higher the IOP rate, the better Many small random block reads and writes Solid State / NVMe block device preferred For us, EXT4 is the file system of choice Various things can be tuned, with slight performance gains 9

DATABASE SYSTEM SIZING (continued) A software RAID solution can work well for database use Needs extra CPU cycles for block processing Larger chunk size help GPT aligned partitions help Fast Seagate drives help :) 10

ROBINHOOD TUNING Simple tuning makes for vast performance improvements Disabling accounting helps significantly Robinhood database threads should be based on sysbench benchmarking findings Lustre client should be tuned for Robinhood Disable various levels of caching Increase RPCs in flight Tune statahead Flush inode cache regularly 11

SEAGATE ROBINHOOD CONTRIBUTIONS Code refactor Seagate has completed large amounts of code refactor work changing the internal framework to allow for easier changes in the core application Plug-in architecture New framework provides development packages and allows for plugin style development within Robinhood record flow systems Performance optimizations New plug-ins developed by Seagate contain enhancements to record flow management, improving overall operational performance Plug-in framework allows for further incremental changes improving performance 12

CHANGELOG UPLOAD ENHANCEMENT (CUE) CUE sorts, categorizes and reduces total changelog record processing requirements without losing information Reducing changelog record flow reduces DB and FS load Sorting and categorizing allows for additional enhancements Various opportunistic bulk queries ( WHERE IN() ) Allows for alternative database strategies ( NoSQL sharding, etc ) Improves overall Robinhood performance and reduces hardware requirements Provides a path forward to scale further 13

CUE RESULTS SO FAR Average performance improvements of a 400+ % reduction in record processing time and a 40 % reduction in database query load for generalized workloads (with feature, without feature lower is better) 14

SQL OPTIMIZATION AND SIMPLIFICATION Current Robinhood SQL implementation Optimized for single transaction operations Uses expensive / complex operations LEFT JOIN type operations (requires rescan of table space) Disallows SHARDing type horizontal scaling strategies Currently only works MySQL / SQLite RDBMS Simplification targets: Optimize for single transaction bulk operations Reduce SQL processing time in common workloads (DELETE / UPDATE) Provide pathway toward alternative relational database engines Allow for per-operation / per-function based testing and profiling 15

NEXT STEPS Test utilizing multiple concurrent client mounts to the same file system, on the same client to attempt to avoid Lustre bottlenecks (Single MDC Semaphore when performing FID lookups fixed in recent lustre versions) Consolidate SQL query sets into batch transactions where possible Further refinements to the CUE plugin Investigation and porting work into alternative database systems which are designed for the types of workloads commonly seen within Robinhood deployments Percona s TokuDB engine PostgreSQL Various NoSQL implementations 16

Questions? Thank you! 2017 Seagate Technology LLC 17

LUSTRE CLIENT TUNING CHEAT SHEET Sample tuning utilized for scale testing Overall system setup: vm.vfs_cache_pressure to 150 Lustre setup: llite.*.xattr_cache to 0 ldlm.namespaces.*osc*.lru_max_age to 1200 ldlm.namespaces.*osc*.lru_size to 100 osc/*/max_rpcs_in_flight and mdc/*/max_rpcs_in_flight to 256 llite.*.statahead_max to 4 Crontab: @daily echo 2 > /proc/sys/vm/drop_caches 18