Experiences in Clustering CIFS for IBM Scale Out Network Attached Storage (SONAS)

Experiences in Clustering CIFS for IBM Scale Out Network Attached Storage (SONAS). Dr. Jens-Peter Akelbein, Mathias Dietz, Christian Ambach, IBM Germany R&D. 2011 Storage Developer Conference. All Rights Reserved.

IBM SONAS Overview
- Enterprise-class network attached storage
- Scale-out design offering linear scaling of performance and capacity, growing up to 14.4 petabytes
- Provides a single global repository to manage multiple petabytes of storage
- Up to a billion files in a single file system
- Policy-driven file lifecycle management (disk and tape)
- Supports CIFS, NFS, HTTP, FTP
- Provides centralized management by CLI or GUI
- Enables disaster recovery and business continuity with asynchronous replication
- NDMP designed to provide full and incremental backup of files
- Integrated Tivoli Storage Manager for backup and HSM
- Data secured through antivirus integration with the most commonly deployed ISV applications

Active/Active nodes
Reduced redundancy, active/passive (6 nodes, 8 servers):
- After the first node failure there is a 3-in-7 chance that the next failure will cause an outage
- You still have 33% more power consumption
100% redundancy, active/passive (6 nodes, 12 servers):
- After the first node failure there is a 1-in-11 chance that the next failure will cause an outage
- You have twice the power consumption
CTDB, 0% redundant servers (6 nodes consolidated as one clustered server):
- The first node failure just means there are 5 nodes left for NAS protocol I/O (16.6% bandwidth reduction)
- A second node failure never causes an outage (33.3% bandwidth reduction)
- A third node failure again causes no outage
Six all-active nodes provide vastly higher redundancy than even the most expensive 12-server active/passive configuration.

SONAS hardware structure
- Windows and UNIX clients (CIFS, NFS, HTTP, FTP) connect to the SONAS system over the customer IP network
- Three types of nodes: management node, interface nodes, and storage nodes
- Up to 30 interface nodes (tier 1, I/O)
- Up to 30 storage pods (tier 2, capacity)
- Nodes are interconnected by a private Ethernet management network and an InfiniBand cluster network
- Each storage pod contains one or two high-density 4U x 60 HDD RAID controllers, with one high-density 4U x 60 HDD storage expansion unit per RAID controller (second RAID controller optional)

Some challenges in clustering CIFS
- Monitoring many nodes within a cluster (SONAS has a maximum of 92) instead of only one or two
- Keeping the configuration synchronized across all nodes in the cluster
- Concurrent code upgrades of the software stack
- Performance depends on a correct implementation of in-band functions that require interaction between different nodes in the cluster
- Workload-specific scenarios

Monitoring clustered CIFS
[Architecture diagram: Windows and UNIX clients reach the SONAS system over the customer IP network. Each interface node runs Samba, CTDB, and GPFS; tracing and logging happen per node, while events and errors are forwarded over the private Ethernet management network to the management node, where the administrator monitors the cluster.]

Consistent CIFS configuration
- Larger clusters may contain nodes in various states: nodes may be taken offline, added, or removed
- Samba stores its configuration in smb.conf by default; distributing smb.conf across the cluster can result in various error situations that impact consistency
- CTDB already distributes persistent TDBs (locking.tdb, secrets.tdb, etc.) for Samba across the cluster
- => Registry-based configuration was introduced with Samba 3.2; it is stored as a clustered registry database in CTDB, and CTDB pushes the configuration consistently across the cluster (a minimal configuration example follows below)
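As a minimal sketch of how registry-based configuration is typically enabled in a CTDB cluster (assuming a standard clustered Samba 3.2+ setup; the file paths are examples and option handling can differ between releases), every node gets a small stub smb.conf that points at the registry, and the shared options are then maintained once with the net conf tool:

    # stub /etc/samba/smb.conf on every node
    [global]
        clustering = yes
        include = registry

    # maintain the shared configuration once, from any node
    net conf import /tmp/smb.conf.master    # seed the registry from an existing file
    net conf setparm global "log level" 1   # change a single option cluster-wide
    net conf list                           # show the effective configuration

CTDB then replicates the underlying clustered registry database to all nodes, so no node can run with a stale or diverging smb.conf.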

How about other protocols?
- Other protocols (NFS, HTTP, FTP) use configuration files, too. Why not use the same mechanism and share them via CTDB?
- Store the config files in custom registry keys in CTDB; CTDB keeps registry keys synchronized across all nodes
- Push the configuration from the registry key into the local configuration file on all nodes
- CTDB provides the rc.local hook, which calls a script each time an event script is run; the rc.local script pushes the configuration locally
- Problems start with timing, as the runtime of rc.local is hardly predictable
- Other problems lead to deadlocks that are hard to debug, and to an unclear separation of duties between protocol clustering and service configuration management
- => A Configuration Management Daemon was introduced that runs independently of CTDB (see the sketch below)
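The configuration-push idea can be sketched as follows. This is purely illustrative and not the SONAS Configuration Management Daemon; fetch_registry_value() is a hypothetical placeholder for reading a configuration blob out of the clustered registry, and the file paths and reload commands are assumptions:

    import subprocess
    import time
    from pathlib import Path

    # Hypothetical helper: read the service's config blob from the clustered
    # registry (e.g. a custom CTDB registry key). Not a real CTDB API call.
    def fetch_registry_value(key: str) -> str:
        raise NotImplementedError("replace with the cluster's registry access")

    SERVICES = {
        # registry key       (local file,        reload command -- assumed)
        "nfs/exports":       ("/etc/exports",     ["exportfs", "-ra"]),
        "ftp/vsftpd.conf":   ("/etc/vsftpd.conf", ["systemctl", "reload", "vsftpd"]),
    }

    def push_once() -> None:
        for key, (path, reload_cmd) in SERVICES.items():
            desired = fetch_registry_value(key)
            local = Path(path)
            # Only rewrite and reload when the content actually changed,
            # so the push stays idempotent and cheap to run periodically.
            if not local.exists() or local.read_text() != desired:
                local.write_text(desired)
                subprocess.run(reload_cmd, check=False)

    if __name__ == "__main__":
        while True:          # runs independently of CTDB event scripts
            push_once()
            time.sleep(30)   # poll interval; the real daemon may be event-driven

Running this as its own daemon, rather than from the rc.local hook, keeps the timing independent of CTDB's event scripts and gives a clean separation between protocol clustering and service configuration management.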

Concurrent Code Upgrades
Requirements: ideally zero downtime of the NAS services provided by the cluster.
Challenges for CIFS using Samba: keeping changing data structures consistent across different versions. Samba and CTDB do not have versioned data structures for communication so far, so there is no guarantee that different versions are compatible.
Solution for code upgrades (two-node example; node 2 keeps providing service while node 1 is upgraded; a sketch of the sequence follows below):
- disconnect node 1
- upgrade node 1
- disallow any Samba/CTDB configuration changes by disabling services on node 2
- dump the TDBs on node 2 and send the backups to node 1
- (short downtime) shut down CTDB on node 2, restart CTDB on node 1 as disabled, and restore all relevant persistent TDBs on node 1
- allow configuration changes again by fully restarting CTDB; node 1 now provides service
- upgrade node 2; node 2 rejoins the cluster
Extension for n nodes: update each half of the cluster in turn.
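For illustration only, the two-node sequence could be driven by a small orchestration script like the one below; it is not the SONAS upgrade tooling. run_on() is a hypothetical ssh helper, the list of persistent TDBs and the upgrade/freeze commands are placeholders, and while ctdb disable/shutdown/backupdb/restoredb/enable are existing ctdb commands, exact options (including starting ctdbd disabled) vary between versions:

    import subprocess

    def run_on(node: str, *cmd: str) -> None:
        """Hypothetical helper: run a command on the given cluster node via ssh."""
        subprocess.run(["ssh", node, *cmd], check=True)

    PERSISTENT_TDBS = ["secrets.tdb", "passdb.tdb", "idmap2.tdb"]  # illustrative list

    def upgrade_two_node_cluster(node1: str, node2: str) -> None:
        # 1. Take node 1 out of service and upgrade its software stack.
        run_on(node1, "ctdb", "disable")
        run_on(node1, "upgrade-nas-stack")            # placeholder for the real upgrade step

        # 2. Freeze configuration changes while node 2 still serves clients.
        run_on(node2, "disable-config-changes")       # placeholder

        # 3. Dump persistent TDBs on node 2 and copy the backups to node 1.
        for tdb in PERSISTENT_TDBS:
            run_on(node2, "ctdb", "backupdb", tdb, f"/tmp/{tdb}.bak")
            run_on(node2, "scp", f"/tmp/{tdb}.bak", f"{node1}:/tmp/")

        # 4. Short downtime: stop CTDB on node 2, start node 1 disabled, restore TDBs there.
        run_on(node2, "ctdb", "shutdown")
        run_on(node1, "ctdbd", "--start-as-disabled")  # assumed; normally via service scripts
        for tdb in PERSISTENT_TDBS:
            run_on(node1, "ctdb", "restoredb", f"/tmp/{tdb}.bak")

        # 5. Node 1 takes over service; then upgrade node 2 and let it rejoin.
        run_on(node1, "ctdb", "enable")
        run_on(node2, "upgrade-nas-stack")
        run_on(node2, "ctdbd")                         # rejoin the cluster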

Central project file access patterns
Movie rendering companies using Maya rendering software.
Workload:
- The customer uses a central Maya project file which contains links to the textures and RIB files
- At the same time, multiple users try to open the Maya project file in their 3D modeling application for reading
- The Maya project file is also loaded at the beginning of the rendering process by the rendering farm
Problem:
- Slow start-up of the rendering application
- Some clients saw time-outs when accessing the project file

DMASTER Ping-Pong
When the same file is accessed through multiple cluster nodes, Samba needs to check the record for the file in brlock.tdb on each read request. This leads to many DMASTER requests/responses that put load on CTDB and increase latency.
Workaround: make sure all clients access the file through the same node.
[Diagram: clients reading the same file through different nodes cause the record's data master (dmaster) role to ping-pong between nodes, with repeated REQ_CALL / REQ_DMASTER / REPLY_DMASTER messages routed via the record's location master (lmaster) in the CTDB cluster.]

CTDB read-only locks
- Allow multiple nodes to hold a read-only copy of a record in their local TDB; in principle this is a caching protocol
- Only the DMASTER can perform updates on the record
- When a record gets updated, the other nodes must invalidate their local copies and fetch a fresh copy
- During a recovery, local copies are thrown away
(A toy model of this revoke-on-update scheme is sketched below.)
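To make the revoke-on-update scheme concrete, here is a toy in-memory model; it is not CTDB's wire protocol or data layout, just the caching idea: readers hold read-only copies, and the single writer (the DMASTER) revokes them before applying an update.

    class RecordCache:
        """Toy model of CTDB read-only locks: one dmaster, many read-only copies."""

        def __init__(self, value: bytes):
            self.value = value               # authoritative copy, held by the dmaster
            self.readonly_holders = set()    # nodes currently caching a read-only copy

        def read(self, node: str) -> bytes:
            # A reader fetches a read-only copy once and can then serve
            # subsequent reads locally without contacting the dmaster.
            self.readonly_holders.add(node)
            return self.value

        def update(self, new_value: bytes) -> None:
            # Only the dmaster updates the record; before doing so it must
            # revoke (invalidate) every outstanding read-only copy.
            for node in list(self.readonly_holders):
                self.invalidate(node)
            self.value = new_value

        def invalidate(self, node: str) -> None:
            # The node throws away its local copy and will re-fetch the record
            # on its next access (local copies are also dropped after a recovery).
            self.readonly_holders.discard(node)


    if __name__ == "__main__":
        rec = RecordCache(b"lock state v1")
        rec.read("node0")
        rec.read("node2")
        rec.update(b"lock state v2")     # revokes node0 and node2 copies first
        assert rec.readonly_holders == set()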

Record distribution in (C)TDB
- The key for locking records in a cluster is based on inode numbers, and inode numbers are very similar to each other
- tdb_hash() does a bad job of distributing such entries among the hash buckets: half of the hash buckets were unused in the local TDB
- LMASTER assignment was also very uneven; only half of the nodes were ever used as LMASTER
Solution: replace tdb_hash with the Jenkins hash
- balanced distribution of records among hash buckets in the local TDB
- balanced LMASTER assignments for records in CTDB
(A small experiment illustrating the effect is sketched below.)
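The effect can be illustrated with a small Python experiment. The weak hash below is a deliberately poor stand-in (it is not the actual tdb_hash), the well-mixing hash is the Jenkins one-at-a-time function in the spirit of the Jenkins hash that replaced it, and the keys model densely allocated, very similar inode numbers:

    import struct
    from collections import Counter

    def weak_hash(key: bytes) -> int:
        # Deliberately poor stand-in (NOT the real tdb_hash): it only looks at the
        # first four bytes of the key, so keys that differ elsewhere all collide.
        return struct.unpack_from("<I", key)[0]

    def jenkins_oaat(key: bytes) -> int:
        # Jenkins one-at-a-time hash: every byte is mixed into all 32 bits.
        h = 0
        for b in key:
            h = (h + b) & 0xFFFFFFFF
            h = (h + ((h << 10) & 0xFFFFFFFF)) & 0xFFFFFFFF
            h ^= h >> 6
        h = (h + ((h << 3) & 0xFFFFFFFF)) & 0xFFFFFFFF
        h ^= h >> 11
        h = (h + ((h << 15) & 0xFFFFFFFF)) & 0xFFFFFFFF
        return h

    def buckets_used(hash_fn, keys, nbuckets=128):
        return len(Counter(hash_fn(k) % nbuckets for k in keys))

    if __name__ == "__main__":
        # Record keys modelled on locking records: a fixed prefix (e.g. a device id)
        # followed by densely allocated, very similar inode numbers.
        keys = [struct.pack("<IQ", 42, inode) for inode in range(5_000_000, 5_001_000)]
        print("weak hash:    buckets used", buckets_used(weak_hash, keys), "of 128")
        print("jenkins hash: buckets used", buckets_used(jenkins_oaat, keys), "of 128")

In this sketch the stand-in hash collapses all 1000 keys into a single bucket, while the Jenkins-style hash spreads them across essentially all 128 buckets, mirroring the unused-bucket and uneven-LMASTER symptoms described above.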

ID mapping performance (1)
Customer from the industrial sector, located in Germany.
Workload:
- Research departments around the world storing data
- Large Active Directory with many users that are members of many groups (>300)
Problem:
- The customer ran into time-outs and connection losses during session setup
- The time-outs only happened during the very first login of a user to the SONAS cluster
Initial analysis:
- Creating the ID mappings for the user and groups took longer than the CIFS client connect timeout
- If many new users access SONAS at the same time, some of them fail (timeout)
- The same happens if a user is a member of many groups: the connection fails (time-out)

ID mapping performance (2)
Samba stores the ID mappings in a persistent TDB (idmap_tdb2) that is clustered by CTDB and uses transactions to ensure integrity. Allocating an ID from the high water mark and actually writing the mapping was not one but two transactions, and a separate transaction was started for each group the user is a member of.
Solutions (a sketch of the batching idea follows below):
- Provide a tool to pre-fill the ID map database
- Wrap high-water-mark allocation and writing the mapping into a single transaction (Samba 3.4.7)
- For the user-in-many-groups problem, map many SIDs to gids at once in one single transaction (Samba 3.6)
- Implement a new ID mapping module based on idmap_rid but with less configuration overhead (idmap_autorid)
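A hypothetical sketch of the batching idea follows; sqlite3 merely stands in for the clustered idmap_tdb2 (which it is not). The point is that the high-water-mark bump and all SID-to-gid writes for a user's groups happen inside one transaction, so the expensive cluster-wide transaction cost is paid once per login instead of once per group:

    import sqlite3

    def map_sids_batch(db: sqlite3.Connection, sids: list[str]) -> dict[str, int]:
        """Allocate gids for all unmapped SIDs of one user in a single transaction."""
        mappings = {}
        with db:  # one transaction for the high-water-mark bump AND all mapping writes
            hwm = db.execute("SELECT value FROM hwm WHERE name = 'GROUP HWM'").fetchone()[0]
            for sid in sids:
                row = db.execute("SELECT gid FROM idmap WHERE sid = ?", (sid,)).fetchone()
                if row:                       # mapping already exists, reuse it
                    mappings[sid] = row[0]
                    continue
                hwm += 1                      # allocate the next id from the high-water mark
                db.execute("INSERT INTO idmap (sid, gid) VALUES (?, ?)", (sid, hwm))
                mappings[sid] = hwm
            db.execute("UPDATE hwm SET value = ? WHERE name = 'GROUP HWM'", (hwm,))
        return mappings

    if __name__ == "__main__":
        db = sqlite3.connect(":memory:")
        db.execute("CREATE TABLE idmap (sid TEXT PRIMARY KEY, gid INTEGER)")
        db.execute("CREATE TABLE hwm (name TEXT PRIMARY KEY, value INTEGER)")
        db.execute("INSERT INTO hwm VALUES ('GROUP HWM', 10000)")
        db.commit()
        user_group_sids = [f"S-1-5-21-1-2-3-{rid}" for rid in range(500, 800)]  # >300 groups
        print(len(map_sids_batch(db, user_group_sids)), "mappings created in one transaction")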

ID mapping performance (3)
[Chart: time to create 1000 ID mappings after the Samba improvements (3.2.1 vs 3.4.7 vs 3.6), measured with 4 nodes, 3 nodes, 1 node, and 1 node with cache.]
Other improvements: create the ID mappings for all member groups in a single transaction (Samba 3.6).

Summary
- Clustering CIFS involves more aspects to consider than other protocols
- Several improvements have been made in Samba and CTDB over the past years with regard to reliability, availability, serviceability, and performance
- Upcoming enhancements in open source address further improvements for special workloads
- All improvements were made in the SONAS software stack without depending on special hardware

Questions?

True single namespace
[Diagram: traditional NAS filers #1 through #n, each with its own used and free capacity, data moves, and capacity additions, contrasted with scale-out NAS using a single namespace.]
What does your NAS infrastructure look like with traditional NAS filers?
- Data growth leads to adding filers again and again
- With a few petabytes you end up with many small islands; anyone here with more than 100 NAS filers?
- How much of your capacity is really used? How much maintenance needs to be spent on a large number of filers?
True single namespace:
- Everything is in one large file system with one namespace; no data moves required
- The capacity of the one large file system grows by adding storage
- All free capacity is shared; no data aggregation and no data replication needed
- Active/active: all nodes have access to all data