A QoS-aware Scheduling Scheme for Software-Defined Storage Oriented iSCSI Target


Xianghu Meng 1,2, Xuewen Zeng 1, Xiao Chen 1, Xiaozhou Ye 1,*
1 National Network New Media Engineering Research Center, Institute of Acoustics, Chinese Academy of Sciences, Beijing 100190, P.R. China
2 University of Chinese Academy of Sciences, Beijing 100190, P.R. China
*Corresponding author (e-mail: yexz@dsp.ac.cn)

Abstract: Software-defined Storage (SDS), which uses virtualization and pooling technologies, provides a new and flexible way to deploy a shared Storage Area Network (SAN). However, sharing of disk resources can cause performance degradation in an SDS virtualization environment. Concurrent I/O applications which have diverse latency requirements and send requests to different logic units (LUNs) co-located on the same shared storage target will lead to unpredictable performance for the SDS. This paper focuses on providing differentiated service to different applications and ensuring Quality of Service (QoS) of time-critical applications in the iSCSI SAN of SDS. An analysis model is presented to describe this problem. Based on this analysis model, a QoS-aware scheduling framework for the SDS oriented iSCSI target (SDS-QoS), incorporated with the iSCSI storage controller, is proposed. SDS-QoS employs a priority-based strategy to ensure predictable latency performance for critical applications. SDS-QoS consists of two components: latency estimation and priority adjustment. It schedules I/O requests by dynamically adjusting their priorities based on estimated latencies, thus making latency-sensitive applications achieve their required deadlines and guaranteeing their QoS. Experiments on real-world I/O workloads demonstrate that it can improve the achievement rate of required deadlines from 36.7% to 83.3%.

Key words: Software-defined storage, quality of service, network storage, scheduling algorithm, I/O performance

1. INTRODUCTION

In the era of cloud computing and big data, there has been an increase in the amount of network data, creating new challenges for storage systems [1]. Software-defined storage (SDS) [2] typically includes a form of storage virtualization that decouples the physical storage hardware from the software that manages it. SDS has the potential to provide shared storage to clients through a storage area network (SAN); this can solve the capacity, availability and manageability problems related to storage. However, the inherent sharing of storage resources can lead to performance degradation. Multiple applications requesting different logic unit numbers (LUNs) located on the same iSCSI [3] server will contend for the same disk resource. This contention depends on a number of factors - the type of storage, the configuration of the LUNs and the number of LUNs co-located on the same iSCSI target - and may result in unpredictable delays and performance loss. I/O requests which have been optimized on different iSCSI initiators can be out of order as they reach the iSCSI target server concurrently, leading to increased latency [4]. The holistic performance of all I/O operations decreases when the number of LUNs working concurrently on the target increases. This is because concurrent I/O of different LUNs contend with each other for disk access, leading to interference at the disk level. Resolving such issues is pertinent for applications to achieve their service level objectives (SLOs). A major concern of shared storage systems is the reduced system efficiency due to the interference of different workloads with different performance requirements in terms of latency and throughput.
A prerequisite for efficient sharing is that concurrent applications are isolated from each other, so that interactive or time-critical applications such as online video streaming applications and transaction-based applications are not affected by other I/O intensive workloads [5]. For latency-sensitive I/O applications, undesirable delays may occur if all concurrent I/O applications are allocated equal resources regardless of their I/O requirements. Providing differentiated services to different I/O workloads is therefore necessary. Many previous works [14-17] have studied the problem of QoS in the SDS environment, but little work has been done on the iSCSI SAN, whose performance improvements can lead to further improvement of the whole SDS. It is significantly more challenging to provide performance isolation and QoS guarantees among various applications [14], and we are interested in enabling quantitative QoS guarantees for various types of time-critical applications. In this paper, we present an I/O scheduling scheme, SDS-QoS (iSCSI-based SDS oriented QoS scheduling). This scheme provides differentiated services to different I/O applications requesting multiple LUNs co-located on the same shared storage, based on their latency requirements.

For the iSCSI storage stack, block-level schedulers are severely limited by their inability to gather information from, and exert control over, other levels [6]. Therefore, we add a QoS-aware scheduling scheme into the iSCSI target controller above the block-level scheduler, working in cooperation with it.

The main contributions of this work are summarized as follows: a) We propose an analysis model to demonstrate that concurrent I/O applications requesting multiple LUNs co-located on the same shared storage target in an SDS environment lead to performance degradation. b) We propose a dynamic scheduling scheme, SDS-QoS, which is QoS-aware and provides differentiated services to different applications to ensure QoS. c) We show through benchmarking tools and real-world I/O workloads that the proposed method can make latency-sensitive applications reach their required deadlines and guarantee their QoS.

2. RELATED WORKS

There is a large body of research [9-20] on providing differentiated services among different I/O applications on shared storage systems. These studies, however, have mostly been based on classical operating systems and have rarely considered the iSCSI SAN in SDS environments. SCAN [7], an elevator algorithm widely used in Linux, does not consider time constraints in the scheduling process. Earliest Deadline First (EDF) [8] is usually used in hard real-time environments where I/O requests have completion deadlines. However, EDF has a heavy overhead in terms of magnetic head seeking time and disk rotational latency, leading to poor disk utilization. The QoS concept was first formally introduced to the storage field by Christopher R. Lumb [9] in Façade. Façade is a virtual storage controller which sits between hosts and storage devices in the network and throttles individual I/O requests from multiple clients [9]. Previous studies [10-13] have used different scheduling methods to alleviate I/O contention and guarantee specific QoS in shared storage environments. However, only a limited number of the aforementioned studies have focused on policies in the data plane. The data plane is a very large bucket that provides different types of storage such as NAS, object, and block storage, and its impact on the overall SDS is critical. The present study differs significantly from these studies since this work is focused on the data plane, specifically the iSCSI SAN. Previous studies [14-16] provided QoS in multi-tiered storage systems that combine solid state drives (SSD) and traditional hard disk drives (HDD) to build hierarchical storage. Typically, IOFlow [17] designed an SDS architecture and borrowed several SDN ideas, applying them to the shared storage concept. However, it focused only on I/O requests from Virtual Machines (VMs) to the storage. Studies [15, 18, 19] provided differentiated I/O services to multiple applications in the VM hypervisor. Qiwen Zhai designed an I/O scheduling algorithm for a soft real-time service oriented iSCSI storage system in the iSCSI initiator [20]. While the above works have achieved specific QoS based on a VM hypervisor or the client side, limited attention has been given to the iSCSI target controller. There are two major differences between the present study and previous studies: a) the present study achieves QoS of I/O applications in the iSCSI storage controller, as opposed to previous studies which use iSCSI initiators or the VM hypervisor; b) scheduling on LUNs in the current study is fine-grained, as opposed to the coarse-grained scheduling on VMs.

3. ANALYSIS MODEL AND SOFTWARE ARCHITECTURE

3.1 Analysis Model

An I/O request R is represented by the attribute set R = (D, S, P, L, W).
D represents the relative deadline; S is the total requested data size; P stands for the priority of R; L indicates the estimated or actual latency of R; and W is the disk bandwidth allocated to R. Some applications, such as soft real-time services, do not have determined deadlines. In such cases, D can be represented by a fuzzy interval [d, D]. The QoS of the completion time of a request can be represented by a membership function of fuzzy deadlines, as shown in Fig. 1. If an I/O application finishes within the lower bound of the interval (d), it will achieve an optimal QoS (the optimal value is set to 1). If the actual latency exceeds d, this request will not be cancelled immediately as long as the latency stays within D; however, the QoS degrades with this increase in latency. If the latency exceeds D, the QoS is set to 0. In this paper, the QoS is defined as Eq. 1, which will be used as a metric later.

QoS(t) = \begin{cases} 1, & t \le d \\ \dfrac{D - t}{D - d}, & d < t \le D \\ 0, & t > D \end{cases}    (1)
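To make the metric concrete, the following short Python sketch (illustrative only; the function and variable names are ours, not from the paper) evaluates the fuzzy-deadline membership function of Eq. 1 for a measured latency t:

```python
def qos(t: float, d: float, D: float) -> float:
    """Fuzzy-deadline QoS of Eq. 1: 1 inside the soft deadline d,
    degrading linearly between d and the hard bound D, 0 beyond D."""
    if t <= d:
        return 1.0
    if t <= D:
        return (D - t) / (D - d)
    return 0.0

# Example: a request with fuzzy deadline [50 ms, 100 ms]
print(qos(40, 50, 100))   # 1.0  -> optimal QoS
print(qos(75, 50, 100))   # 0.5  -> degraded but still acceptable
print(qos(120, 50, 100))  # 0.0  -> hard bound D missed
```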

Figure 1 Membership function with a fuzzy deadline (QoS versus latency, with breakpoints at d and D)

We discuss the parallel scheduling model for n concurrent I/O requests, with the assumption that each LUN is targeted by a single request. The disk service time of a request with data size s is expressed as T = T_{tran} + \delta, where T_{tran} is the transfer time proportional to s, and \delta is the overhead (including the seeking time and rotation time), which is a fixed value for each transfer process. Assuming the n concurrent requests have sizes s_1, s_2, ..., s_n, respectively, the overall storage device bandwidth is written as W. Each request accesses a dedicated LUN, thus there are also n LUNs. In parallel scheduling, if all concurrent requests have the same priority, then each one occupies W/n bandwidth on average. When we take different priorities (p_1, p_2, ..., p_n) into consideration, a request with priority p_i occupies p_i W / n. The response time of the first request (the one that completes first) can be written as:

T_1 = \frac{n s_1}{p_1 W}.    (2)

When i-1 requests have been completed, the average bandwidth for the remaining requests will be W/(n-i+1), i > 1. Therefore, the response time of the i-th request will be:

T_i = T_{i-1} + \frac{(s_i - s_{i-1})(n+1-i)}{p_i W},    (3)

where 1 < i \le n. We can further expand this to write:

T_i = \frac{n s_1}{p_1 W} + \sum_{k=2}^{i} \frac{(s_k - s_{k-1})(n+1-k)}{p_k W} = \frac{1}{W} \sum_{k=1}^{i-1} \frac{s_k}{p_k} + \frac{(n+1-i) s_i}{p_i W}.    (4)

The mean response time can be described as:

E = \frac{1}{n W} \sum_{i=1}^{n} \frac{(2n - 2i + 1) s_i}{p_i}, \quad n \ge 2.    (5)

If all concurrent applications have the same priority, then we can write p_i = 1 (1 \le i \le n). Eq. 4 indicates that a request with a higher priority p_i (a larger value of p_i represents a higher priority) will have a lower response time T_i. Therefore, in order to meet its deadline, a time-critical application should be assigned a high priority. From Eq. 5, we can observe that the average response time of all applications is proportional to the number of LUNs; we can validate this from Fig. 2(c).
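A minimal numerical sketch of this model (our own illustration, under the reconstructed forms of Eqs. 2-5 above) computes the per-request response times and their mean, and shows the mean growing roughly linearly with the number of LUNs when sizes and priorities are equal:

```python
def response_times(sizes, priorities, W):
    """Per-request response times under the parallel-scheduling model.
    Requests are assumed ordered so that request 1 finishes first (Eqs. 2-4)."""
    n = len(sizes)
    times = []
    for i in range(1, n + 1):
        s, p = sizes[:i], priorities[:i]
        # Eq. 4: T_i = (1/W) * sum_{k<i} s_k/p_k + (n+1-i)*s_i/(p_i*W)
        t = sum(sk / pk for sk, pk in zip(s[:-1], p[:-1])) / W \
            + (n + 1 - i) * s[-1] / (p[-1] * W)
        times.append(t)
    return times

W = 200.0  # aggregate disk bandwidth in MB/s (illustrative value)
for n in (2, 4, 8):
    sizes = [100.0] * n          # equal 100 MB requests, one per LUN
    prios = [1.0] * n            # equal priorities, p_i = 1
    ts = response_times(sizes, prios, W)
    print(n, round(sum(ts) / n, 2))  # mean response time grows ~ n (Eq. 5)
```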

Figure 2 Performance degradation with multiple LUNs: (a) IOPS, (b) throughput (MB/s), and (c) average latency (ms), each plotted against the number of LUNs (1-8)

To analyze the effect of concurrent I/O under an SDS environment, this research examines the variations in the accumulated disk performance with different numbers of LUNs co-located on the same target. As shown in Fig. 2, the holistic IOPS (Fig. 2a) and throughput (Fig. 2b) of all I/O operations decrease, while the average response time (Fig. 2c) increases, as the number of concurrently working LUNs increases.

3.2 Software Architecture

In this section we present the design architecture of SDS-QoS in the SDS framework and describe the features of the SDS framework with QoS enabled. We focus on an application scenario where concurrent I/O applications with varied latency requirements run on multiple LUNs co-located on a single shared storage. The relative layout of SDS-QoS with respect to the overall SDS framework is shown in Fig. 3. SDS-QoS functions as a module of the iSCSI controller which manages the storage resources on the iSCSI server. SDS-QoS is composed of a Latency Estimation module and a Priority Adjustment module, whose functions are shown in detail in Fig. 4.

Figure 3 SDS-QoS in the SDS framework (control plane and data plane: LUNs, iSCSI controller with SDS-QoS, block scheduler, physical storage)

The iSCSI Enterprise Target (IET) [23] is an open source iSCSI target controller software which includes a user space part and a kernel space part. Most data processing work is performed in the kernel space. The kernel space contains two main components: the NTHREAD and the WTHREAD. The NTHREAD module is responsible for receiving I/O requests from clients and passing the iSCSI PDUs to the WTHREAD. The WTHREAD module is responsible for the read/write processing of the iSCSI data and for building up the block requests. Hence, the most appropriate place to integrate SDS-QoS is into the WTHREAD. The main goals of SDS-QoS are a) to guarantee prioritized services to time-critical applications and b) to ensure admissible services for applications without strict latency requirements. This is achieved by assigning appropriate disk-access priorities to I/O applications according to their latency requirements. The attributes of an application, such as the data size and the deadline, can be specified by the user. It is assumed that these attributes remain unchanged throughout the period of execution. Deadlines and sizes are extracted from the iSCSI PDUs and then passed to SDS-QoS as input. The Latency Estimation module first predicts the latencies of requests based on the data sizes and system state, and then passes the latency values to the Priority Adjustment module. The Priority Adjustment module is responsible for deciding the priority values based on the deadline requirements and the predicted latency values. Finally, the priority values are submitted to the underlying block scheduler for scheduling the disk I/O. The aforementioned process is performed at every scheduling cycle, which can be tuned as needed based on the access pattern of the application. In our work, we use a scheduling interval of 3 seconds, which is the best empirical value on our platform. We use a feedback-based strategy to adaptively adjust the priority of each application according to its allocated disk bandwidth on the system, and re-allocate the bandwidth based on the adjusted priority at every interval. An advantage of our approach is that the priorities are not static and can be dynamically updated according to the state of the system. Moreover, our approach does not need to modify the operating system and can be easily integrated into the iSCSI controller as an extension to support QoS.

Figure 4 SDS-QoS design architecture (I/O requests enter the iSCSI controller; the NTHREAD passes iSCSI PDUs to the WTHREAD, which feeds deadlines and sizes to SDS-QoS; Latency Estimation produces latencies, Priority Adjustment produces priorities for the block scheduler, and allocated bandwidths feed back to Latency Estimation)
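The following sketch (our own simplified outline, not the IET kernel code; the function names and process bookkeeping are assumptions) shows the shape of this feedback cycle: every scheduling interval the controller samples per-process bandwidth, estimates latencies, adjusts priorities, and hands the result to the block scheduler.

```python
import time

SCHEDULING_INTERVAL = 3.0  # seconds, the empirical value used in the paper

def scheduling_loop(processes, sample_bandwidth, estimate_latency,
                    adjust_priorities, apply_priorities):
    """Feedback-based SDS-QoS cycle: bandwidth sample -> latency estimate ->
    priority adjustment -> block scheduler, repeated every interval."""
    while True:
        bandwidths = sample_bandwidth(processes)        # e.g. from iotop statistics
        latencies = {p: estimate_latency(p, bandwidths[p]) for p in processes}
        priorities = adjust_priorities(processes, latencies)
        apply_priorities(priorities)                    # e.g. via ionice / CFQ classes
        time.sleep(SCHEDULING_INTERVAL)
```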

4. DESIGN AND IMPLEMENTATION

SDS-QoS mainly consists of two modules: the Latency Estimation module and the Priority Adjustment module. The design architecture is illustrated in Fig. 4. The attributes of the I/O applications (deadlines and data sizes) are provided as inputs to SDS-QoS, and it generates the priority values of the different applications as the output. The output is submitted to the block scheduler to be taken into consideration for disk scheduling. The detailed design is described below.

4.1 Latency Estimation

To provide differentiated services for all applications, we need to compare the requirements specified by the applications with their actual performance. In order to ensure that a running application meets its required deadline, it is necessary to obtain a clear estimate of the application's completion time to decide whether the application needs higher-priority service. The Latency Estimation module performs this based on the current disk bandwidth allocated to an application. It takes the total size of each I/O application as input, and it also obtains the bandwidth and the size of the remaining data to be used in the next scheduling cycle. The present work uses the Linux iotop utility to collect information regarding the disk I/O (such as disk read/write bandwidth and priority) of each running application. For each scheduling period, iotop is invoked to record these disk utilization statistics. If one application finishes, the remaining applications obtain longer disk access times and hence higher bandwidth, leading to a decrease in their predicted latencies. At the start of each scheduling cycle, the predicted latency values are computed and provided to the Priority Adjustment module.

4.2 Priority Adjustment

Based on the requirements of the I/O applications, the Priority Adjustment module enables service differentiation for each application. For each scheduling period, after obtaining the estimated latency values from the Latency Estimation module, the Priority Adjustment module compares the estimated latency and the I/O requirement of each application to decide whether its required deadline can be met in the current state. If the Priority Adjustment module decides that a critical application's latency requirement cannot be met and the expected deadline may be missed, then this module dynamically adjusts the priorities of the active applications so that the bandwidth allocated to the critical application can satisfy its required performance. The Priority Adjustment module then submits the assigned priority values to the underlying block scheduler, which executes the actual disk scheduling and allocates disk access time accordingly. During the service time of an application, it has exclusive disk access. The higher the priority an application has, the longer its disk access time, leading to a higher disk bandwidth. Therefore, the assigned priority of an I/O application in turn affects its allocated bandwidth in the next scheduling period. After the priorities are adjusted, new bandwidths for the next scheduling period are allocated among all the I/O applications and passed back to the Latency Estimation module as feedback for re-computation of the latencies. During every scheduling period, SDS-QoS repeats the above procedure once. Within the system bounds and hardware limitations, the Priority Adjustment module provides a locally optimal solution for disk resource allocation along with a QoS guarantee. It not only provides QoS to critical applications, but also guarantees acceptable service to non-critical ones.
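The latency-estimation step of Section 4.1 can be sketched as follows (our own assumptions about the bookkeeping; the paper does not give code): the predicted completion time is the time already elapsed plus the remaining data divided by the bandwidth currently observed for the process, e.g. as reported by iotop.

```python
from dataclasses import dataclass

@dataclass
class IoProcess:
    pid: int
    total_size: float      # bytes requested in total (user-specified)
    processed: float       # bytes completed so far (from iotop statistics)
    elapsed: float         # seconds since the process started
    deadline: float        # required relative deadline in seconds

def estimate_latency(proc: IoProcess, bandwidth: float) -> float:
    """Predicted completion time = time elapsed + remaining data / current bandwidth."""
    remaining = max(proc.total_size - proc.processed, 0.0)
    if bandwidth <= 0.0:
        return float("inf")  # no progress observed in this cycle
    return proc.elapsed + remaining / bandwidth

# Example: 1 GB request, 300 MB done after 20 s, currently receiving 25 MB/s
p = IoProcess(pid=1234, total_size=1e9, processed=3e8, elapsed=20.0, deadline=60.0)
print(estimate_latency(p, 25e6))  # ~48 s -> the 60 s deadline is still reachable
```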
Sometimes the Priority Adjustment module may detect a critical application whose required deadline cannot be satisfied under the current system state even if the highest priority is assigned to it. In such cases the priority of this application is not elevated; this prevents degradation of the performance of the other applications.

4.3 Implementation

The block scheduler has no knowledge of the I/O characteristics of an application except the size and location of its requests. Under such circumstances, the proposed SDS-QoS scheme figures out the expected priority values and submits this information to the underlying block scheduler. In order to achieve prioritized services, this work employs the Completely Fair Queuing (CFQ) [21] scheduling algorithm, which is commonly used in the Linux kernel. Each application is a process running on the host. The purpose of the CFQ algorithm is to provide fairness to all processes in terms of disk bandwidth. The CFQ algorithm uses a series of per-process queues to group synchronous requests and then allocates time slices for each of the queues to access the disk. The length of the time slice and the number of requests a queue can submit depend on the I/O priority of the given process. CFQ has three priority classes: idle (low priority), best-effort (medium priority) and real-time (high priority). CFQ assigns the longest time slices to real-time processes, while idle processes receive the shortest. A process receives exclusive disk access during its time slice. Therefore, a process with a higher priority naturally receives a higher disk I/O bandwidth. A new process is assigned the default priority of best-effort, but we can modify this using the ionice command in Linux.
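As an illustration of how such priority changes can be applied from user space (a sketch under our assumptions; the paper only states that ionice is used), the standard util-linux ionice tool can move a running process between CFQ's real-time, best-effort and idle classes:

```python
import subprocess

# CFQ scheduling classes accepted by ionice: 1 = realtime, 2 = best-effort, 3 = idle
REALTIME, BEST_EFFORT, IDLE = 1, 2, 3

def set_io_priority(pid: int, io_class: int, level: int = 4) -> None:
    """Change the CFQ I/O class of a running process via ionice.
    Note: the realtime class normally requires root privileges."""
    cmd = ["ionice", "-c", str(io_class)]
    if io_class in (REALTIME, BEST_EFFORT):
        cmd += ["-n", str(level)]  # priority level 0 (highest) .. 7 (lowest)
    cmd += ["-p", str(pid)]
    subprocess.run(cmd, check=True)

# Example: promote a time-critical writer, demote a background one
# set_io_priority(1234, REALTIME, 0)
# set_io_priority(5678, IDLE)
```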

Table 1 Terminology for the SDS-QoS algorithm

Attribute         Notation
Process           R = <R1, R2, ..., RN>
Deadline          D = <D1, D2, ..., DN>
Total Data Size   S = <S1, S2, ..., SN>
Disk Bandwidth    W = <W1, W2, ..., WN>
Latency           L = <L1, L2, ..., LN>
Priority          P = <P1, P2, ..., PN>
Time Elapsed      T = <T1, T2, ..., TN>

The pseudo code of the SDS-QoS scheduling algorithm is shown in Algorithm 1. The notation used in the pseudo code is listed in Table 1. The priority of a real-time process is the highest, that of an idle process is the lowest, and a best-effort process has the default priority. The scheduling algorithm executes at every scheduling period, and all the active I/O processes are considered for scheduling. The inputs to the algorithm are the deadlines and total data sizes of all applications, and the output is the computed priorities. Our algorithm uses a feedback mechanism: on one hand, it computes the priorities according to the current bandwidth of each application during every scheduling period; on the other hand, the priorities assigned in previous periods also affect the bandwidths. Algorithm 1 is divided into two parts: a) The first part is the Latency Estimation, whose responsibility is to calculate the predicted latency of each active process according to the current bandwidth Wi and ProcessedData (which can be obtained using iotop). For a new process, the initial value of ProcessedData is set to 0 and its default priority is set to best-effort. b) The second part is the Priority Adjustment, which is responsible for assigning suitable priority values to the active processes. First it compares the predicted latency with the expected deadline of each process and tries to identify a critical process which is likely to miss its deadline. If such a process exists, it receives preferential treatment. If there is no such process, all the processes are likely to achieve their deadlines, and hence there is no need to adjust the priorities. For each scheduling period, SDS-QoS selects a single critical process (if any). If more than one process is likely to miss its deadline, then the one (Ri) with the earliest deadline is selected (Line 10). Next, it finds potential victims of Ri: a victim process Rj is one which has a later deadline than that of Ri and is likely to achieve its required deadline (Line 12). If there is more than one such process Rj, the module chooses the most suitable victim, whose priority is higher than the lowest and whose available execution time before its deadline is the longest (Lines 13-14). The priority of such a victim process Rj is decreased to make way for the critical process Ri in the bandwidth allocation (Lines 15-16). If such a process Rj does not exist and the current priority of Ri is not the highest (Line 18), then the priority of Ri is incremented (Line 19) to enable the critical process to receive more bandwidth. If there is no suitable victim process and the priority of Ri is already the highest (Line 20), then Pi is adjusted to be the lowest. This is because it is impossible for the critical process to meet its deadline given the current state of the system (it already has the highest priority), and we must guarantee that the other processes do not suffer from performance loss.
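Algorithm 1 itself is not reproduced in this transcription, so the following Python sketch is our own re-expression of the priority-adjustment logic described above (names such as LOWEST/HIGHEST and the use of slack to pick the victim are assumptions; the block-scheduler interaction is elided).

```python
LOWEST, DEFAULT, HIGHEST = 0, 1, 2   # roughly: idle, best-effort, real-time

def adjust_priorities(procs, predicted_latency):
    """One Priority Adjustment cycle over the active processes.
    procs: objects with .deadline and .priority; predicted_latency: dict per process."""
    # Critical process: earliest deadline among those predicted to miss it.
    late = [p for p in procs if predicted_latency[p] > p.deadline]
    if not late:
        return                                   # every deadline looks reachable
    crit = min(late, key=lambda p: p.deadline)

    # Victim candidates: later deadline than the critical process, still on time,
    # and not already at the lowest priority.
    victims = [p for p in procs
               if p is not crit
               and p.deadline > crit.deadline
               and predicted_latency[p] <= p.deadline
               and p.priority > LOWEST]
    if victims:
        # Prefer the victim with the most slack before its own deadline.
        victim = max(victims, key=lambda p: p.deadline - predicted_latency[p])
        victim.priority -= 1                     # demote the victim
    elif crit.priority < HIGHEST:
        crit.priority += 1                       # promote the critical process
    else:
        crit.priority = LOWEST                   # deadline unreachable: step aside
```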

5. EXPERIMENTS AND ANALYSIS

5.1 Algorithmic Validity

To test the validity of the algorithm, we test a scenario where two concurrently running applications are targeted at different LUNs on a single iSCSI target. Both of these processes perform simple random writes. We use IOMeter to create the two workloads. The only difference between the applications is that the first process, A, is time-sensitive while the other process, B, is not. Initially, both their priorities are set to best-effort by default. We first give the measured latencies for the default case, where both applications use the default priority. Under the best case, the time-critical application is assigned the highest priority while the other application is assigned the lowest priority. From Table 2 we can observe that there is a significant decrease in the latency of application A in the best case (which is also the lower bound for the time-critical application A) compared to the default case, while the latency of B shows only a slight increase in the best case of A.

Table 2 Values of latency (seconds)

Case          Application A   Application B
Default case  74.12           74.38
Best case     47.05           75.30

Based on the information in Table 2, we conduct 10 experiments with different values of the deadline for A.

Figure 5 Actual latency with different required deadlines (deadline of A, latency of A and latency of B over experiments 1-10)

From Fig. 5 we can see that for experiments 1-3, the deadlines for application A are greater than the latency of the default case. Therefore both applications can easily satisfy the deadline requirements without any priority adjustment. In experiments 4-6, the deadlines are earlier than the default latency but still greater than the best-case latency, so SDS-QoS adjusts the priority of B to the lowest in order to reduce the latency of A. Hence, the actual latency of A meets its deadline requirement, and B suffers only a slight increase in latency. For experiments 7-10, the deadlines are less than the latency of the best case, which means that they cannot be met in any case (due to the current system limitations). In this case SDS-QoS adjusts the priority of A to the lowest, which makes its latency exceed the required deadline. From the above experiments we can verify that SDS-QoS can make latency-sensitive applications achieve their required deadlines and can guarantee their quality of service (QoS) within the system capacity.

5.2 Quality of Service (QoS)

This paper focuses on the QoS of different I/O requests. Hence we use the overall QoS of all the requests as a key performance indicator. The overall QoS can be deduced from Eq. 1 and can be written as:

QoS_{overall} = \sum_{i=1}^{n} QoS_i,    (6)

where n is the number of concurrent requests. The overall QoS reflects the real-time performance. For comparison, we also conduct experiments using classic algorithms: a) EDF, which has the characteristic of optimal deadline satisfaction, and b) SCAN, which is an elevator algorithm and has optimal throughput (throughput performance is discussed in the next subsection). The results for overall QoS are shown in Fig. 6. EDF performs best in terms of overall QoS because it is a hard real-time scheduler and only considers the required deadlines of requests; therefore it achieves more than 90% of all deadlines. The overall QoS of SCAN is the worst due to its lack of consideration for latency; it only considers the seek address order. SDS-QoS performs much better than SCAN, with an increase of about 45% in performance, because SDS-QoS considers the latency requirement and adjusts the priority of a latency-sensitive application to meet its deadline. We also observe that the difference between SDS-QoS and EDF is small.

Figure 6 Test on overall QoS for different algorithms (EDF, SCAN and SDS-QoS versus the number of requests)

5.3 Overall Throughput

The present work focuses on QoS performance, but it is also able to provide satisfactory throughput performance, which is another key performance indicator. Experimental results on overall throughput with different numbers of LUNs are shown in Fig. 7.

Figure 7 Overall throughput (MB/s) of SCAN, SDS-QoS and EDF versus the number of LUNs

As mentioned above, the SCAN algorithm has the optimal throughput performance; as an elevator algorithm it only considers the movement direction of the magnetic head and the seek address to minimize disk seek time, thus SCAN provides a high I/O throughput. EDF has the worst throughput performance, because it does not consider the disk seek address and spends much more time on seeking. SDS-QoS is able to provide a much higher overall throughput than EDF while ensuring the QoS of applications. Moreover, SDS-QoS is close to SCAN.

5.4 Test on Real-world I/O Workloads

In the present work, to test the proposed method, we use two actual I/O traces (block I/O traces) from enterprise servers at Microsoft Research Cambridge [22]. We choose a web server trace, which is latency-sensitive, and a research server trace, which has no strict deadline requirements. The two I/O traces are run simultaneously against different LUNs co-located on the shared storage server. Since they are run concurrently, both I/O workloads suffer performance degradation due to disk contention, and SDS-QoS tries to meet the deadline requirements of the latency-sensitive application.

Figure 8 Response time of the web server workload (response time with and without SDS-QoS, and the required deadline, over the I/O requests)

The deadline and response time results of the web server requests are shown in Fig. 8. In most cases the response time of the web server workload with SDS-QoS is lower than that without SDS-QoS; this is because SDS-QoS dynamically adjusts the priorities of the I/O requests, and the web server requests were assigned higher priority values to meet their required deadlines. The SDS-QoS scheduling scheme improved the achievement rate of required deadlines from 36.7% to 83.3%.

6. CONCLUSIONS

To achieve QoS differentiation in I/O performance in SDS, the present work proposes a scheduling algorithm, incorporated with the iSCSI storage controller, that dynamically predicts the latency of all concurrent I/O applications based on the system status. Based on these predictions, SDS-QoS assigns and adjusts the priorities of the applications. The block-level scheduler then takes the priority values into consideration while scheduling I/O requests to the underlying disk device. The scheme describes a latency-based, QoS-aware I/O scheduling framework above the operating system's disk scheduler for shared storage in SDS environments. It ensures that latency-sensitive applications do not suffer unpredictable delays in concurrent I/O situations. It also provides a significant improvement in deadline satisfaction and delivers a QoS guarantee. Experiments with IOMeter and real-world I/O traces verify that SDS-QoS can make latency-sensitive applications satisfy their required deadlines and guarantee their quality of service. The present work concentrates on quality of service in I/O scheduling, but this is not sufficient for applications in an SDS environment. Therefore, in the future we will study I/O scheduling in SDS further, such as how to improve scheduling efficiency with flash memory.

Acknowledgements

Supported by the Strategic Priority Research Program of the Chinese Academy of Sciences (Grant No. XDA06010302) and the Innovation Institute Foresight Program (Grant No. Y555021601).

REFERENCES

Qi W N, Wang J L. (2016). A framework of secure cloud storage in the age of big data. Journal of Network New Media, 5(2), 1-7.
Carlson M, Yoder A, Schoeb L, et al. (2014). Software defined storage. Storage Networking Industry Assoc. working draft, 20-24.

Shang Q L, Zhang W, Guo X Y, et al. (2015). An energy-saving scheduling scheme for streaming media storage systems. High Technology Letters, 3(1), 347-357.
Gulati A, Merchant A, Uysal M, et al. (2012). Workload dependent IO scheduling for fairness and efficiency in shared storage systems. 19th International Conference on High Performance Computing, Pune, India, 1-10.
Yang S, Harter T, Agrawal N, et al. (2015). Split-level I/O scheduling. Proceedings of the 25th ACM Symposium on Operating Systems Principles, Monterey, USA, 474-489.
Denning P J. (1967). Effects of scheduling on file memory operations. Proceedings of the April 18-20, 1967, Spring Joint Computer Conference, Atlantic City, USA, 9-21.
Liu C L, Layland J W. (1973). Scheduling algorithms for multiprogramming in a hard-real-time environment. Journal of the ACM, 20(1), 46-61.
Lumb C R, Merchant A, Alvarez G A. (2003). Façade: virtual storage devices with performance guarantees. Proceedings of the 2nd USENIX Conference on File and Storage Technologies, San Francisco, USA, 131-144.
Wang J, Cheng L. (2015). qSDS: a QoS-aware I/O scheduling framework towards software defined storage. Proceedings of the 11th ACM/IEEE Symposium on Architectures for Networking and Communications Systems, Oakland, USA, 195-196.
Malensek M, Pallickara S L, Pallickara S. (2015). Alleviation of disk I/O contention in virtualized settings for data-intensive computing. 2015 IEEE/ACM 2nd International Symposium on Big Data Computing (BDC), Limassol, Cyprus, 1-10.
Lu H, Saltaformaggio B, Kompella R, et al. (2015). vFair: latency-aware fair storage scheduling via per-IO cost-based differentiation. Proceedings of the 6th ACM Symposium on Cloud Computing, Kohala Coast, USA, 125-138.
Hsu C J, Panta R K, Ra M R, et al. (2016). Inside-out: reliable performance prediction for distributed storage systems in the cloud. 2016 IEEE 35th Symposium on Reliable Distributed Systems (SRDS), Budapest, Hungary, 127-136.
Elnably A, Wang H, Gulati A, et al. (2012). Efficient QoS for multi-tiered storage systems. Proceedings of the 4th USENIX Conference on Hot Topics in Storage and File Systems, Boston, USA, 6-6.
Billaud J P, Gulati A. (2013). hClock: hierarchical QoS for packet scheduling in a hypervisor. Proceedings of the 8th ACM European Conference on Computer Systems, Prague, Czech Republic, 309-322.
Wang H, Varman P. (2015). A resource allocation model for hybrid storage systems. 2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, Shenzhen, China, 91-100.
Thereska E, Ballani H, O'Shea G, et al. (2013). IOFlow: a software-defined storage architecture. Proceedings of the 24th ACM Symposium on Operating Systems Principles, Farmington, USA, 182-196.
Gulati A, Merchant A, Varman P J. (2010). mClock: handling throughput variability for hypervisor IO scheduling. Proceedings of the 9th USENIX Conference on Operating Systems Design and Implementation, Vancouver, Canada, 437-450.
Gulati A, Shanmuganathan G, Zhang X. (2012). Demand based hierarchical QoS using storage resource pools. Proceedings of the 2012 USENIX Annual Technical Conference, Boston, USA, 1-4.
Zhai Q W, Zhang W, Zeng X W. (2013). An I/O scheduling algorithm for soft real-time services oriented iSCSI storage system. Journal of Software, 8(7), 1785-1792.
Axboe J. (2007). CFQ IO Scheduler. Presentation at linux.conf.au.
Narayanan D, Donnelly A, Rowstron A. (2008). Write off-loading: practical power management for enterprise storage. ACM Transactions on Storage, 4(3), 1-23.
iSCSI Enterprise Target, http://iscsitarget.sourceforge.net.
iSCSI Protocol, https://www.ietf.org/rfc/rfc3720.txt.