A Field's Introduction to SUSE Enterprise Storage TUT91098


A Field's Introduction to SUSE Enterprise Storage TUT91098
Robert Grosschopff, Senior Systems Engineer, robert.grosschopff@suse.com
Martin Weiss, Senior Consultant, martin.weiss@suse.com
Joao Luis, Senior Software Engineer, joao@suse.com

SUSE Enterprise Storage Introduction and Overview (The Pre-Sales Phase)

What is SUSE Enterprise Storage? Based on Ceph
2003: Sage Weil's PhD thesis at UCSC
2006: Open sourced
2007-2011: Incubated by DreamHost
2012: Inktank was founded
2013: SUSE announces plans for Ceph
2014: Red Hat acquires Inktank
2015: SUSE Enterprise Storage 1.0
November 2015: SUSE Enterprise Storage 2.0
January 2016: SUSE Enterprise Storage 2.1
July 2016: SUSE Enterprise Storage 3.0
End of 2016: SUSE Enterprise Storage 4.0

Design Criteria for Ceph Development / Architecture
Fault Tolerance (No Single Point of Failure)
Scalability
Performance
Automated Management: Self-Managing, Self-Healing
Flexibility (100% Software Based)
Multiple Access Protocols
Runs on Commodity Hardware

Commodity Hardware Suitable Server? - Well, it depends

Expanding Storage Suitable JBODs? - Again, it depends

Acronyms
RADOS - Reliable Autonomic Distributed Object Store
CRUSH - Controlled Replication Under Scalable Hashing
RBD - RADOS Block Device
RGW - RADOS Gateway
CephFS - Ceph Filesystem
OSD - Object Storage Daemon
MON - Ceph Monitor
MDS - Metadata Server

Design Principles for Implementation
Fault Tolerance: Infrastructure (MONs, OSDs, MDS, Gateways)
Redundancy vs. Space Efficiency: Replication (Size) or Erasure Coding (K+M) - see the sketch below
Configurable Redundancy
Location Awareness / Data Distribution
Performance: Bandwidth, Latency, IOPS
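
To make the replication vs. erasure-coding trade-off concrete, here is a minimal sketch of creating one pool of each kind with the ceph CLI. The pool names, PG counts and the 4+2 profile are illustrative assumptions, not values from the talk:

  # Replicated pool: 3 full copies of every object ("Size = 3")
  ceph osd pool create rep-pool 128 128 replicated
  ceph osd pool set rep-pool size 3
  # Erasure-coded pool: K=4 data chunks + M=2 coding chunks
  # (survives 2 simultaneous failures at ~1.5x storage overhead instead of 3x)
  ceph osd erasure-code-profile set ec-4-2 k=4 m=2
  ceph osd pool create ec-pool 128 128 erasure ec-4-2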

SUSE Enterprise Storage Components MON, OSD, MDS, Gateways

Ceph - Components
RGW: A web services gateway for object storage, compatible with S3 and Swift
RBD: A reliable, fully distributed block device with cloud platform integration
CephFS: A distributed file system with POSIX semantics and scale-out metadata management
LIBRADOS: A library allowing apps to directly access RADOS (C, C++, Java, Python, Ruby, PHP) - see the example below
RADOS: A software-based, reliable, autonomous, distributed object store comprised of self-healing, self-managing, intelligent storage nodes and lightweight monitors
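
As a minimal illustration of direct object access to RADOS (here via the rados CLI rather than the librados library itself), assuming a pool named rep-pool already exists:

  # Store, list and retrieve a raw RADOS object (pool and object names are examples)
  echo "hello ceph" > /tmp/hello.txt
  rados -p rep-pool put demo-object /tmp/hello.txt
  rados -p rep-pool ls
  rados -p rep-pool get demo-object /tmp/hello-copy.txt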

Use Cases
Cloud (OpenStack) storage backend: Cinder / Glance
Object store: RADOS, S3 and Swift compatible
Block device: Linux clients (RBD), hypervisors (e.g., KVM / QEMU) - see the RBD sketch below
iSCSI: VMware ESXi, Linux, Windows and other iSCSI-based clients
CephFS (distributed scale-out file system): Linux clients, NFS Ganesha (future), Samba (future)
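
A minimal sketch of the RBD use case from a Linux client, assuming a pool named rbd-pool and the rbd kernel module; names and sizes are illustrative:

  # Create a 10 GiB image, map it on the client, and put a filesystem on it
  rbd create rbd-pool/vol01 --size 10240
  rbd map rbd-pool/vol01          # returns a device such as /dev/rbd0
  mkfs.xfs /dev/rbd0
  mount /dev/rbd0 /mnt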

Challenges and Expectation Setting
Customers need to understand the latency challenge of scale-out storage. Small systems do NOT perform!
Do not start too small; increase size and load over time.
Roll out slowly and increase the load slowly!
Give the system time to burn in, and test.

SES Field Experiences (The Post-Sales Phase)

The 7 Ps and what to do Proper Prior Planning Prevents P*** Poor Performance ;-)

Consulting Approach
Analysis of the current environment: time synchronization and name resolution, network infrastructure, storage infrastructure and clients
Analysis of the requirements and the use case: RTO / RPO / SLA, fault tolerance, performance and scalability
Design the solution and select the hardware: number of servers, number of disks, memory, CPU, networking
Implement the solution in an automated and repeatable fashion: AutoYaST, SMT / SUSE Manager, configuration management
Know-how transfer: the customer needs to understand and operate the cluster
Test, test, test... and test! Performance / latency / bandwidth; bottom-up testing from disk, file system and journal up to OSD bench and RADOS bench (see the sketch below); fault tolerance
No cluster that was not tested will work as expected or required!
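
A bottom-up benchmarking sketch using the built-in Ceph tools; the pool name and durations are illustrative assumptions (fio against the raw disks and file systems would sit below this level):

  # Per-OSD write benchmark (writes about 1 GiB through the OSD backend by default)
  ceph tell osd.0 bench
  # Cluster-level RADOS benchmark: 60 s of writes, then sequential reads of the same objects
  rados bench -p rep-pool 60 write --no-cleanup
  rados bench -p rep-pool 60 seq
  # Remove the benchmark objects afterwards
  rados -p rep-pool cleanup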

Design Approaches (1) - Server Hardware
SLES certified!
RAID 1 for the OS
JBOD for OSD disks
Sufficient memory + CPU

Design Approaches (2) - Performance and Size
Disk / RAID controller / HBA: RAID controllers for OSD disks do not make much sense; performance varies with cache on the HBA
Journal on SSD or on the same disk
Number of nodes and disks, size of disks, type of disks (density)
With a co-located journal, plan for roughly the raw IOPS per disk / 4 (journal in & journal out, XFS journal, XFS data); e.g., a SATA disk delivering ~100 random write IOPS yields only ~25 effective client IOPS
Number of disks / number of OSD hosts? Impact of failures

Design Approaches (3) - Network
No SPOF - Ceph is network sensitive!
Optimize for bandwidth and low latency
Separation of networks - two or more networks! (see the sketch below)
iSCSI on a separate network; multipathing instead of bonding
Connect all primary NICs to the same switch
When possible use 802.3ad
Configure jumbo frames
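
A minimal sketch of separating the public and cluster networks and enabling jumbo frames; the subnets and the interface name are illustrative assumptions:

  # In /etc/ceph/ceph.conf on all nodes ([global] section) - example subnets:
  #   public network  = 192.168.10.0/24    # client and MON traffic
  #   cluster network = 192.168.20.0/24    # OSD replication / recovery traffic
  # Jumbo frames on the storage-facing interface (switch ports must match):
  ip link set eth1 mtu 9000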

Design Approaches (4)
Creating pools: size, backfill, scrubbing
Using cache tiers: persistent read cache, fast writes, but more complexity (see the sketch below)
Erasure coding and RBD
Placing and using gateways (iSCSI, RGW): same node or a separate node (a separate node is better but more expensive)
Access protocols: RADOS, RGW, iSCSI, CephFS
MON, OSD and rack hardware (and disk) placement
Fault tolerance: single datacenter or multiple datacenters (2+1)
Size, scrubbing, backfill
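
A minimal cache-tier sketch (Jewel-era commands), assuming an erasure-coded base pool ec-pool and a replicated SSD pool cache-pool already exist; the names are illustrative:

  # Attach cache-pool in front of ec-pool in writeback mode
  ceph osd tier add ec-pool cache-pool
  ceph osd tier cache-mode cache-pool writeback
  ceph osd tier set-overlay ec-pool cache-pool
  # The cache tier needs a hit-set configuration to track object usage
  ceph osd pool set cache-pool hit_set_type bloom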

Real Customer Setups (1)
Many customers start with 4 to 9 or more servers (1 / 2 / 3 data centers): 3 MONs, 4 OSD nodes, additional gateways
The more disks and the more servers, the better!
Min. 8 disks per server; min. 32 disks in the cluster (4 nodes with 8 disks each)
Min. 1-2 GB RAM per TB of disk space (e.g., a node with 8 x 4 TB disks should have at least 32-64 GB of RAM) - memory can only be replaced with more memory!
More memory for erasure coding and cache tiering; more memory for recovery if a server has a low number of OSDs / disks
1.5 GHz of CPU per OSD disk; more CPU for erasure coding and for recovery

Real Customer Setups (2)
Pure SATA: 4 nodes with 16 SATA disks each = 64 SATA disks; backup solution; one pool with Size = 2 and another pool with Size = 3; 1 GBit network only
Mixed SSD and SATA: 4 nodes with 4 x SSD + 4 x SATA disks each = 16 x SSD + 16 x SATA
Scenario 1: cache tier with 16 x SSD + EC pool 2:1 with 16 x SATA
Scenario 2: SSD pool with 16 x SSD + SATA pool with 16 x SATA
Scenario 3: SSD pool with 8 x SSD + SATA pool with 16 x SATA with journal on SSD at a 2:1 ratio
10 Gbit network and iSCSI

Real Customer Setups (3)
HBA / RAID controller: ensure compatibility between disks and controller; use JBOD and do not use the RAID controller cache
Enterprise SSDs differ: some lose performance after a while, or disconnect
Mainboard setup: upgrade all firmware to the latest version; disable all components that are not required; turn off all power-saving features

Implementation Best Practices
Concept before implementation
Ensure a repeatable installation with staging: AutoYaST, SMT / SUSE Manager (staging!)
Configuration management: Salt / Chef / Crowbar / DeepSea (see the sketch below)
Latest supported drivers should be installed
Btrfs for the OS, XFS for data / OSDs / MONs
Fault tolerance testing, performance testing, tuning, documentation
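
For the Salt/DeepSea path, a hedged sketch of the staged deployment, assuming DeepSea is installed on the Salt master and a policy.cfg has been prepared; this is the generic DeepSea stage sequence, not a procedure shown in the talk:

  # Run the DeepSea orchestration stages on the Salt master
  salt-run state.orch ceph.stage.0   # prepare / update the nodes
  salt-run state.orch ceph.stage.1   # discover hardware
  salt-run state.orch ceph.stage.2   # configure (uses policy.cfg)
  salt-run state.orch ceph.stage.3   # deploy MONs and OSDs
  salt-run state.orch ceph.stage.4   # deploy services (RGW, iSCSI, CephFS, ...)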

Operating and Troubleshooting Common Issues (The Usage and Post-Implementation Phase)

Nothing is sexier than a Healthy Cluster
<example of healthy cluster output from ceph -s / ceph health>
* This is what you always want your cluster to look like *
* Sometimes it will not be, and you should know whether what you are looking at is a problem or normal operation *
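
The commands behind that placeholder are simply the status checks below; HEALTH_OK is the expected state on a healthy cluster, while the exact output layout depends on the release:

  ceph health    # expect: HEALTH_OK
  ceph -s        # full status: monitor quorum, OSDs up/in, PG states (active+clean)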

Common Warnings
Monitors: clocks are skewed; a monitor is down
OSDs: an OSD is down; flag x is set (e.g., noup, noout, nobackfill, noscrub)
PGs: PGs are degraded; PGs are undersized; PGs are stuck (unclean, inactive, degraded, undersized, stale)
Operations: an operation is blocked for x seconds
(see the inspection commands below)
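
A few hedged first-look commands for these warnings; all are standard Ceph CLI calls:

  ceph health detail           # which OSDs / PGs are behind each warning
  ceph osd tree                # which OSDs are down, and where they sit in the CRUSH tree
  ceph osd dump | grep flags   # which osdmap flags (noout, noup, ...) are set
  ceph pg dump_stuck unclean   # list stuck PGs (also accepts inactive / stale / undersized)
  ceph osd unset noout         # clear a flag once maintenance is finished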

Less Common Warnings
Monitors: the data directory is getting full
OSDs: an OSD is near full; a pool is full; pool object quota over threshold; pool byte quota over threshold

The Scary HEALTH_ERR
Monitors: the monitor data directory has no available space
OSDs: no OSDs; an OSD is full; a pool is critically over the quota for max objects; a pool is critically over the quota for max bytes
PGs: stuck for more than x seconds
(capacity checks below)
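
Hedged capacity checks that usually come first when an OSD or the monitor store is filling up:

  ceph df                       # per-pool and global usage
  ceph osd df                   # per-OSD utilization and variance
  df -h /var/lib/ceph/mon/*     # free space under the monitor data directory (on a MON node)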

What may have caused this?
Were there power issues? Are you suffering from network issues?
Did the hardware change? Did you add new nodes? Did you remove nodes?
Were there any configuration changes? Did the CRUSH map change?

Usual Suspects - OSDs
Down because they cannot be started
Down because they die: too many OSDs per server for the available RAM? Catastrophic network issues that lead to OOM?
Down/out because of osdmap flags (noup, noin)?
(see the checks below)
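
Hedged checks for a down OSD, assuming systemd-managed daemons (SLES 12 based nodes) and OSD id 1 as an example:

  systemctl status ceph-osd@1        # did the daemon start at all?
  journalctl -u ceph-osd@1 -n 100    # last log lines from the failed start
  dmesg | grep -i oom                # was the OSD killed by the OOM killer?
  ceph osd dump | grep flags         # are noup / noin set, keeping OSDs down or out?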

Usual Suspects - PGs
Not being remapped: because of CRUSH rules, OSDs being down, or osdmap flags
Operations are stuck: OSDs cannot talk to each other, OSDs are overloaded, osdmap flags, CRUSH rules
Down: OSDs are down
Degraded: OSDs are down, or CRUSH rules
Incomplete: OSDs are down
(see the PG inspection sketch below)
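
A hedged sketch for digging into a problem PG; the PG id 1.2f is a placeholder, and all commands are standard Ceph tooling:

  ceph pg dump_stuck unclean             # which PGs are stuck, and on which OSDs
  ceph pg 1.2f query                     # why this PG is not active+clean (peering state, blocked by ...)
  # Inspect the CRUSH rules the PG's pool is using
  ceph osd getcrushmap -o /tmp/crushmap.bin
  crushtool -d /tmp/crushmap.bin -o /tmp/crushmap.txt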

When All Else Fails - Contacting Support
What we will need for a speedy resolution:
What changed in the system since the last healthy state
supportconfig output and logs from all the affected nodes
Logs from the monitors and the affected OSDs, at appropriate debug levels - otherwise logs are close to useless!
Set debug levels via injectargs, the admin socket, or ceph.conf:
ceph tell osd.1 injectargs --debug-osd 10
ceph daemon osd.1 config set debug_osd 10

Debug Levels
Monitors: debug ms = 1, debug mon = 10, debug paxos = 10
OSDs: debug ms = 1, debug osd = 10, debug filestore = 10, debug journal = 10, debug monc = 10
e.g.:
ceph tell mon.* injectargs '--debug-ms 1 --debug-mon 10'
ceph tell osd.* injectargs '--debug-ms 1 --debug-osd 10'