From an open storage solution to a clustered NAS appliance


Dr.-Ing. Jens-Peter Akelbein, Manager Storage Systems Architecture, IBM Deutschland R&D GmbH

IBM SONAS Overview
- Enterprise-class network attached storage
- Scale-out design offers linear scaling of performance and capacity growth, up to 14.4 petabytes
- Provides a single global repository to manage multiple petabytes of storage
- Up to a billion files in a single file system
- Policy-driven file lifecycle management (disk and tape)
- Supports CIFS, NFS, HTTP, FTP
- Provides centralized management by CLI or GUI
- Enables disaster recovery and business continuity with asynchronous replication
- NDMP designed to provide full and incremental backup of files
- Integrated Tivoli Storage Manager for backup and HSM
- Secures data with antivirus integration for the most commonly deployed ISV applications

Active/Active nodes
- Reduced redundancy (Active/Passive): 6 nodes, 8 servers. After the first node failure there is a 3-in-7 chance that the next failure will cause an outage, and you still have 33% more power consumption.
- 100% redundancy (Active/Passive): 6 nodes, 12 servers. After the first node failure there is a 1-in-11 chance that the next failure will cause an outage, and you have twice the power consumption.
- CTDB, 0% redundant servers: 6 nodes consolidated as one server. The first node failure just means there are 5 nodes left for NAS protocol I/O (16.6% bandwidth reduction). A second node failure never causes an outage (33.3% bandwidth reduction). A third node failure: again, no outage.

6 all-active nodes provide vastly higher redundancy than even the most expensive 12-server Active/Passive configuration.

True single namespace

What does your NAS infrastructure look like with traditional NAS filers?
- Data growth leads to adding filers again and again (Filer #1, #2, #3, ..., #n, each with its own used and free capacity, plus data moves between them)
- If you have a few petabytes you end up with many, many small islands - anyone here running more than 100 NAS filers?
- How much of your capacity is really used?
- How much maintenance needs to be spent on such a large number of filers?

Scale Out NAS with a true single namespace
- Everything is in one large file system with one namespace; no data moves required
- The capacity of the one large file system grows by adding storage
- All free capacity is shared; no data aggregation and no data replication needed
- Active/Active: all nodes have access to all data

SONAS hardware structure

- Windows and UNIX clients (CIFS, NFS, HTTP, FTP) connect through the customer IP network
- Three types of nodes: one management node, interface nodes (up to 30, tier 1 I/O), and storage nodes
- Nodes are connected by a private Ethernet management network and an InfiniBand cluster network
- Up to 30 storage pods provide tier 2 capacity; a storage pod consists of one or two high-density 4U x 60 HDD RAID controllers, each with one high-density 4U x 60 HDD disk enclosure as expansion unit (the second RAID controller is optional)

Software stack of the SONAS appliance

[Diagram: the administrator connects to the management node over the customer IP network, alongside the Windows and UNIX clients; each interface node runs Samba, CTDB, and GPFS on the shared disks; events, errors, tracing, and logging flow from the interface nodes to the management node over the private Ethernet management network]

From an open solution to an Enterprise storage appliance
- Integrated system design as one box
- Zero-downtime hardware replacement procedures
- No root access
- Automated operation of hardware resources (CPU, memory, local disk utilization, I/O performance)
- Balanced I/O workload on the nodes
- Centralized command line interface
- Centralized graphical user interface including a health center
- Centralized hardware and software monitoring (RAS: Reliability, Availability, Serviceability), First Time Data Capture
- Centralized messaging (SNMP, CIM) with message filtering
- Centralized logging for all software components
- Centralized install and upgrade including appropriate package formats
- Centralized performance monitoring

Some challenges in clustering CIFS

Areas of improvement that had to be mastered when clustering CIFS across multiple nodes:
- Monitoring several nodes within a cluster (SONAS has a maximum of 92) instead of only one or two
- Keeping the configuration synchronized across all nodes in the cluster
- Concurrent code upgrade of the software stack
- Performance depends on the right implementation of in-band functions requiring interaction between different nodes in the cluster
- Workload-specific scenarios

Consistent CIFS configuration
- Larger clusters may contain nodes in various states: nodes may be taken offline, nodes can be added or removed
- Samba stores its configuration in smb.conf by default; distributing smb.conf across the cluster can result in various error situations impacting consistency
- CTDB already distributes persistent TDBs (locking.tdb, secrets.tdb, etc.) for Samba across the cluster
=> A registry configuration backend was developed for Samba 3.2, stored as a clustered registry in CTDB; CTDB pushes the configuration consistently across the cluster
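As a minimal sketch (not the exact SONAS setup), registry-based configuration keeps only a stub in smb.conf and manages everything else through the clustered registry with the net conf tool:

    # /etc/samba/smb.conf - stub only; all other parameters live in the
    # registry, which CTDB replicates across the cluster
    [global]
        clustering = yes
        include = registry

    # edit and inspect the registry-based configuration from any node
    net conf setparm global "server string" "Clustered NAS"
    net conf list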

How about other protocols?
- Other protocols (NFS, HTTP, FTP) use configuration files, too; why not share them via the same mechanism, CTDB?
- Store the config files in custom registry keys in CTDB; CTDB keeps the registry keys synchronized across all nodes
- Push the configuration from the registry key into the local configuration file on all nodes
- CTDB provides the rc.local hook, calling a script each time an event script is run; the rc.local script pushes the configuration locally (sketched below)
- The problems start with timing, as the runtime of rc.local is hardly predictable
- Other problems lead to deadlocks that are hard to debug, and to an unclear separation of duty between clustering the protocols and managing the service configuration
=> A Configuration Management Daemon was introduced that runs independently of CTDB
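A rough sketch of the push step, assuming the NFS exports file is kept in a hypothetical registry key (the key name, value name, and paths are invented for illustration):

    #!/bin/sh
    # rc.local-style hook: copy the exports file from the clustered
    # registry into the local file system, then activate it
    net registry getvalueraw 'HKLM\Software\NAS\nfs' exports > /etc/exports.tmp \
        && mv /etc/exports.tmp /etc/exports \
        && exportfs -ra   # make the kernel NFS server re-read /etc/exports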

Concurrent Code Upgrades

Requirement: ideally zero downtime of the NAS services provided by the cluster

Challenge for CIFS using Samba: keeping changing data structures consistent across different versions. Samba and CTDB do not have versioned data structures for communication so far; there is no guarantee that different versions are compatible.

Solution for code upgrades (two-node example; a command-level sketch follows below):
1. Disconnect node 1
2. Upgrade node 1
3. Disallow any Samba/CTDB configuration changes by disabling services on node 2
4. Dump the TDBs on node 2 and send the backups to node 1
5. Shut down CTDB on node 2
6. Restart CTDB on node 1 in disabled state
7. Restore all relevant persistent TDBs on node 1
8. Allow configuration changes again by fully restarting CTDB
9. Upgrade node 2
10. Node 2 rejoins the cluster

Extension for n nodes: update each half of the cluster in turn.

[Timeline: node 2 provides service, a short downtime during the switchover, then node 1 provides service]

SMB V2.0 introduces durable file handles and SMB V2.2 introduces SMB Multichannel; with these, sessions can be moved transparently between nodes.
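The dump-and-restore step can be pictured with the ctdb tool roughly as follows (the database name and paths are examples; the actual appliance procedure is more involved):

    # on node 2: take the node out of public service, then back up a
    # persistent TDB and ship it to the already-upgraded node
    ctdb disable
    ctdb backupdb secrets.tdb /tmp/secrets.tdb.bak
    scp /tmp/secrets.tdb.bak node1:/tmp/

    # on node 1, after the upgrade: restore the backup into the new cluster
    ctdb restoredb /tmp/secrets.tdb.bak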

Central project file access patterns

Scenario: movie rendering companies using the Maya rendering software

Workload:
- The customer uses a central Maya project file which contains links to the textures and RIB files
- At the same time, multiple users try to open the Maya project file for reading in their 3D modeling application
- The Maya project file is also loaded at the beginning of the rendering process by the rendering farm

Problem:
- Slow start-up of the rendering application
- Some clients saw time-outs when accessing the project file

DMASTER Ping-Pong

When the same file is accessed through multiple cluster nodes, Samba needs to check the record for the file in brlock.tdb on each read request. This leads to many DMASTER requests/responses that put load on CTDB and increase the latency: each read from another node triggers a REQ_CALL to the location master (lmaster), followed by REQ_DMASTER and REPLY_DMASTER messages that move the data master (dmaster) role for the record to that node, and the next read from the other node moves it straight back - ping, pong.

Workaround: make sure all clients access the file through the same cluster node
Final solution: implement CTDB read-only locks

CTDB read-only locks
- Allow multiple nodes to hold a read-only copy of a record in their local TDB; in principle this is a caching protocol
- Only the DMASTER can perform updates on the record
- If a record gets updated, the other nodes need to invalidate their local copy and fetch a fresh one
- During a recovery, local copies are thrown away
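In CTDB releases that ship this feature, read-only record support is switched on per database; a possible invocation (the database is an example):

    # allow nodes to hold read-only copies of records in this TDB
    ctdb setdbreadonly locking.tdb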

ID mapping performance (1)

Scenario: a customer from the industrial sector located in Germany

Workload:
- Research departments around the world storing data
- A large Active Directory with many users that are members of many groups (>300)

Problem:
- The customer ran into time-outs and connection losses during session setup
- The time-outs only happen during the very first login of a user to the SONAS cluster

Initial analysis:
- Creating the ID mappings for the user and groups took longer than the CIFS client connect timeout
- If many new users access SONAS at the same time, some of them fail (time-out)
- The same happens if a user is in many groups: the connection fails (time-out)

ID Mapping performance (2)

Samba stores the ID mappings in a persistent TDB (idmap_tdb2) that is clustered by CTDB and uses transactions to ensure integrity. Allocating an ID from the high water mark and actually writing the mapping was not a single transaction but two, and a separate transaction was started for each group the user is a member of.

Solutions:
- Provide a tool to pre-fill the ID map database
- Wrap allocating from the high water mark and writing the mapping into a single transaction (Samba 3.4.7)
- For the user-in-many-groups problem: map many SIDs to GIDs at once in a single transaction (Samba 3.6)
- Implement a new ID mapping module based on idmap_rid but with less configuration overhead (idmap_autorid); see the configuration sketch below
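For illustration, a minimal idmap_autorid configuration in smb.conf (the ranges are example values): autorid derives IDs deterministically from the RID within a per-domain range, so first logins no longer need one transaction per group:

    [global]
        # autorid allocates an ID range per domain automatically
        idmap config * : backend = autorid
        idmap config * : range = 1000000-19999999
        idmap config * : rangesize = 1000000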

ID Mapping performance (3)

[Chart: time to create ID mappings for 1000 IDs (mm:ss, scale 00:00 to 06:00), comparing Samba 3.2.1, 3.4.7, and 3.6, measured on 4 nodes, 3 nodes, 1 node, and 1 node with cache]

Questions?