The Design and Implementation of AQuA: An Adaptive Quality of Service Aware Object-Based Storage Device


The Design and Implementation of AQuA: An Adaptive Quality of Service Aware Object-Based Storage Device
Joel Wu and Scott Brandt
Department of Computer Science, University of California Santa Cruz
MSST 2006, May 17, 2006

What is Storage QoS?
Storage system quality of service can have multiple dimensions: availability, performance, reliability, and even security. We define QoS in terms of performance (data rate).
QoS support: the ability to assure a certain level of performance (data rate) that users can obtain from the storage system.

Why storage QoS?
Storage-bound applications with timing constraints. (Diagram: a video player repeatedly reads, decodes, and displays data, and each iteration must finish before its period's deadline.)
Consolidation of storage results in larger and more complex shared storage systems: unrelated workloads and multiple organizations/users share the same storage resource.
During overload, applications must compete for access and interfere with each other.

How to achieve QoS?
Provisioning: ensure enough resources to go around (automated design tools).
Limitations of provisioning: it requires detailed knowledge of expected workloads and is slow to react. Storage workloads are transient and bursty, so provisioning for the worst-case scenario can be prohibitively expensive.
Adequate provisioning is necessary but not sufficient; an additional solution is needed to provide QoS assurance.

Object-Based Storage System (Ceph)
Object-based model: offloads handling of low-level storage details to the storage devices; storage devices are accessed through an object interface; metadata management is decoupled from data management.
Three major components in Ceph: clients, a metadata server cluster, and Object-Based Storage Devices (OSDs).
Ceph OSD: an intelligent and autonomous storage device with P2P capability, consisting of CPU, memory, NIC, and block-based disk(s).
(Diagram: clients connect over high-speed networks to the metadata server cluster (MDS) and the OSDs.)

QoS support in Ceph
QoS-capable: the ability to provide storage bandwidth assurance.
QoS framework for Ceph: provide an underlying framework that can be utilized to achieve higher-level QoS goals, built from QoS-aware OSDs.
Provide assurance for different classes. Class: a generic term for an aggregate of storage traffic sharing the same QoS goal, with semantics and granularity defined by the administrator.
Limit interference between different classes.

Streams in non-striped distributed storage
Stream: an end-to-end data path.
(Diagram: flows in a non-striped traditional distributed storage system across the interconnect network.)

QoS mechanism for individual device
QoS mechanism per device: Façade, Zygaria, Sundaram03, XFS GRIO2.
(Diagram: a QoS mechanism attached to each storage device behind the interconnect network.)

QoS mechanism in SLEDS
SLEDS [Chambliss 03].
(Diagram: QoS mechanisms placed in the data path ahead of the interconnect network rather than at each device.)

Throttling model
Generalization of QoS: a resource has a capacity that is redistributed among a number of consumers to satisfy QoS goals.
Specification: allows the desired quality level to be specified (IOPS, bytes/sec, latency, request rate); can be associated with different entities (clients/hosts, groups of clients, application classes).
Monitor: monitors the rate at which different entities are receiving data; can observe different parameters of the system (queue length, average completion time, response time, throughput, bandwidth).
Enforcer: the mechanism that shapes bandwidth by throttling (I/O scheduler, leaky bucket).
Controlling technique: decides when and how much to throttle bandwidth (heuristics, control-theoretic).

Throttling model
(Diagram: workloads pass through the enforcer to the storage devices; the monitor feeds observations back to the controlling technique, which consults the specification.)
1. Specify the desired rate (may involve admission control).
2. Monitor the actual rate received.
3. If the actual rate received is less than the desired rate, throttle competing traffic.
4. Ease throttling if the actual rate received is satisfactory.
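The four-step loop above can be sketched as a simple feedback controller. This is a hypothetical illustration of the general throttling model, not the AQuA code; the class name, `step_mbs` parameter, and units are invented:

```python
class ThrottleController:
    """Feedback loop from the throttling model: monitor the actual rate a
    reserved class receives, throttle competing traffic when the reservation
    is missed, and ease throttling when it is satisfied."""

    def __init__(self, desired_rate_mbs, step_mbs=1.0):
        self.desired = desired_rate_mbs   # 1. specification: desired rate
        self.throttle = 0.0               # MB/s currently held back from competitors
        self.step = step_mbs              # adjustment granularity (invented)

    def tick(self, observed_rate_mbs):
        # 2. monitor the actual rate received
        if observed_rate_mbs < self.desired:
            # 3. reservation missed: throttle competing traffic harder
            self.throttle += self.step
        elif self.throttle > 0:
            # 4. reservation satisfied: ease throttling to reclaim throughput
            self.throttle = max(0.0, self.throttle - self.step)
        return self.throttle
```

Each `tick` represents one monitoring interval; the returned value is how hard competing traffic is currently being throttled.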

Striping of Objects in Ceph
Files are broken up into objects and striped across OSDs; the mapping is performed by the pseudo-random RUSH algorithm. The goal is load balancing.
(Diagram: file → objects → placement groups → OSDs, with RUSH performing the mappings.)
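The mapping pipeline can be illustrated with a simplified stable-hash placement. This is a stand-in for the real RUSH algorithm, which additionally handles weighted devices and near-minimal data movement when the cluster changes; the function name and parameters are invented:

```python
import hashlib

def place_object(file_id: int, object_idx: int, num_pgs: int, num_osds: int) -> int:
    """Map file -> object -> placement group -> OSD with a stable hash.
    A simplified illustration of Ceph's pseudo-random placement, not RUSH itself."""
    # objects of a file are striped: each stripe unit gets its own object key
    obj_key = f"{file_id}.{object_idx}".encode()
    pg = int.from_bytes(hashlib.sha1(obj_key).digest()[:4], "big") % num_pgs
    # the placement group is then mapped pseudo-randomly onto an OSD
    osd = int.from_bytes(hashlib.sha1(str(pg).encode()).digest()[:4], "big") % num_osds
    return osd
```

Because the mapping is a pure function of the identifiers, any client can compute an object's location without consulting a central table, which is what enables the load-balanced striping described above.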

Streams in Ceph
Striping: accessing a single file may involve multiple OSDs.
Peer-to-peer replication/recovery traffic flows between OSDs.
(Diagram: data flows in Ceph across the interconnect network among clients, the MDS, and OSDs.)

Ceph QoS Architecture
(Diagram: the QoS administrator provides the specification; clients access AQuA OSDs across the interconnect network, with an enforcer (ENF) inside each OSD.)

Basic QoS support in OSD
The fundamental capability that all QoS-aware systems possess is the ability to shape bandwidth to redistribute resources.
Higher-level goals can be decomposed into why, when, how, which, and how much to shape traffic (the enforcer component).
Giving the OSD this capability pushes and encapsulates the complexity into the OSD.

AQuA OSD: Adaptive Quality of Service Aware Object-Based Storage Device
Enforces bandwidth allocation among classes.
Based on the Object-Based File System (OBFS), a small and efficient file system for managing block-based hard disks [Wang 2004].
Object Disk I/O Scheduler (ODIS): a QoS-aware scheduler incorporated into the object-based file system.
AQuA = OBFS + ODIS + Bandwidth Maximizer.
(Diagram: the object interface API sits above OBFS and ODIS in user space; requests pass through the Linux kernel's block device driver to the block-based disk.)

OBFS Internal Structure [Wang 2004]
(Diagram: user-space reads/writes pass through the object cache; the flush daemon feeds the disk request queue (elevator order), which the disk I/O daemon services in kernel space; completed requests return via the completion queue and completion daemon.)

Object Disk I/O Scheduler (ODIS)
Limits interference between classes; allows reservations and reclaims unused bandwidth.
Replaces the standard elevator scheduler in OBFS.
n QoS queues, 1 best-effort queue, and 1 dispatch queue; the number of QoS queues is dynamic.
Each queue has an associated worker thread. The specification of a class determines the rate at which requests are moved to the dispatch queue, which is kept in elevator order.
(Diagram: the QoS queues 1..n and the best-effort queue drain via worker threads into the dispatch queue.)
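The queue structure can be sketched as follows. This is an illustrative simplification, not the AQuA implementation: the class uses a synchronous `tick` in place of real worker threads, rates are in requests per tick rather than bytes, and elevator ordering of the dispatch queue is omitted:

```python
from collections import deque

class Odis:
    """Sketch of ODIS queueing: per-class QoS queues drain into a shared
    dispatch queue at rates set by each class's specification; traffic
    without a class lands in the best-effort queue."""

    def __init__(self):
        self.qos = {}               # class name -> (request queue, rate per tick)
        self.best_effort = deque()
        self.dispatch = deque()

    def add_class(self, name, rate):
        # the number of QoS queues is dynamic: classes can be added at runtime
        self.qos[name] = (deque(), rate)

    def submit(self, name, request):
        if name in self.qos:
            self.qos[name][0].append(request)
        else:
            self.best_effort.append(request)

    def tick(self):
        # each class's worker moves up to `rate` requests per tick
        for queue, rate in self.qos.values():
            for _ in range(min(rate, len(queue))):
                self.dispatch.append(queue.popleft())
        # best-effort traffic trickles through when it has work pending
        if self.best_effort:
            self.dispatch.append(self.best_effort.popleft())
```

The key property shown is that a class's specified rate, not arrival order, governs how quickly its requests reach the dispatch queue, which is what isolates classes from each other.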

HTB Implementation
Implemented with hierarchical token buckets (HTB) [NOSSDAV 05].
(Diagram: a root node G with leaf buckets b1..b4 replenished at rates r1..r4.)
Each node has an associated bucket and token rate; each token represents 1 KB of bandwidth.
The root node represents the aggregate bandwidth (token rate) of the disk (G) and is replenished at a fixed rate; leaf node tokens are replenished at a rate corresponding to each reservation.
The root node facilitates sharing and reclamation of unused bandwidth.
The HTB can have more than two levels.
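A two-level hierarchy of this kind can be sketched as below. This is an illustration of the hierarchical-token-bucket idea under assumed semantics (every request consumes root tokens; a leaf whose reservation is exhausted rides on the root's spare capacity); class names, parameters, and the borrowing rule are invented, not taken from AQuA:

```python
class TokenBucket:
    def __init__(self, rate_kb, burst_kb):
        self.rate = rate_kb        # tokens (KB) added per tick
        self.tokens = burst_kb     # start full
        self.burst = burst_kb      # cap on accumulated tokens

    def replenish(self):
        self.tokens = min(self.burst, self.tokens + self.rate)

    def take(self, kb):
        if self.tokens >= kb:
            self.tokens -= kb
            return True
        return False

class Htb:
    """Two-level hierarchical token bucket sketch: the root models the
    disk's aggregate bandwidth G, leaves model per-class reservations."""

    def __init__(self, global_rate_kb, burst_kb):
        self.root = TokenBucket(global_rate_kb, burst_kb)
        self.leaves = {}

    def add_leaf(self, name, rate_kb, burst_kb):
        self.leaves[name] = TokenBucket(rate_kb, burst_kb)

    def tick(self):
        self.root.replenish()
        for leaf in self.leaves.values():
            leaf.replenish()

    def admit(self, name, kb):
        # the root gates the aggregate: no request proceeds without global tokens
        if not self.root.take(kb):
            return False
        # spend the class's own reservation first; if it is exhausted, the
        # request rides on root tokens left unused by other classes
        self.leaves[name].take(kb)
        return True
```

The root bucket is what makes reclamation work: a class that outruns its leaf rate is still admitted as long as the aggregate budget has tokens other classes did not use.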

Stateful consumer
The disk drive is stateful: the total resource (available bandwidth) is not fixed but depends on the workload.
How to assure resource allocation when the amount of available resource varies?
1. Disk model (RT schedulers)
2. Proportional allocation (Sundaram 03, Cello, YFQ)
3. Assume fixed resource (DFS, XFS GRIO2, Zygaria)
4. Adaptation / throttling model (Façade, SLEDS)

QoS assurance vs. total throughput
Estimation of total bandwidth is a tradeoff between tightness of QoS assurance and total throughput.
Aggressive estimate: looser QoS assurance and over-commitment, but higher utilization.
Conservative estimate: tighter QoS assurance, but underutilization and reduced total throughput.
AQuA:
1. A conservative estimate of total bandwidth ensures stringent QoS assurance (bandwidth allocated by ODIS).
2. Dynamic adaptation minimizes underutilization of the disk (the bandwidth maximizer attempts to maximize total throughput).

Bandwidth Maximizer
(Plot: total throughput over time, with A = achievable throughput, G = global token rate (the estimate of total bandwidth), and O = observed (actual) rate.)
Potential for bandwidth maximization:
1. When demand is not capped by either A or G (O < A and O < G).
2. When demand is capped by A, and A < G.
3. When demand is capped by G, and G < A.
Proof-of-concept implementation: the heuristic adjusts G by monitoring the system's status. If throughput is capped by G and no QoS commitments are violated, it increases G. If throughput is capped by G, G has been increased above its original value, and some QoS commitments are violated, it decreases G. G will never drop below its original value.
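The described heuristic condenses to one adjustment rule per monitoring interval. The sketch below follows the three conditions stated above; the function name and `step` size are invented, not from the proof-of-concept code:

```python
def adjust_global_rate(g, g_original, capped_by_g, qos_violated, step=1.0):
    """One step of the bandwidth-maximizer heuristic: grow the global token
    rate G when it is the bottleneck and all reservations are met; shrink it,
    never below the original conservative estimate, when growth caused a
    QoS violation; otherwise leave it alone."""
    if capped_by_g and not qos_violated:
        return g + step                    # throughput limited by G: probe upward
    if capped_by_g and qos_violated and g > g_original:
        return max(g_original, g - step)   # back off, but keep the conservative floor
    return g
```

Keeping `g_original` as a floor is what preserves the stringent assurance: adaptation can only add throughput on top of the conservative estimate, never eat into the reserved bandwidth.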

AQuA: Results
(Plot: throughput (MB/s) over time (s) for Workload 1 (0-120 s), Workload 2 (30-120 s), and Workload 3 (60-120 s), without reservation and assurance.)

AQuA: Results
(Plots: throughput (MB/s) over time (s) for the same three workloads, with ODIS alone and with ODIS plus the bandwidth maximizer; reservations at 5, 15, and 20 MB/s.)

Conclusion and Future Work
QoS-aware OSD: the bandwidth-shaping mechanism is encapsulated within the OSD by combining OBFS with a QoS-aware scheduler, and an adaptive heuristic minimizes underutilization. This is the basic building block of the overall QoS framework.
Future work: shift from OBFS to EBOFS (Extent-Based Object File System); more intelligent adaptation methods; a global QoS framework.