
IEEE TRANSACTIONS ON CLOUD COMPUTING, VOL. 5, NO. 2, APRIL-JUNE 2017

On the Latency and Energy Efficiency of Distributed Storage Systems

Akshay Kumar, Student Member, IEEE, Ravi Tandon, Member, IEEE, and T. Charles Clancy, Senior Member, IEEE

Abstract: The increase in data storage and power consumption at data-centers has made it imperative to design energy efficient distributed storage systems (DSS). The energy efficiency of DSS is strongly influenced not only by the volume of data, frequency of data access and redundancy in data storage, but also by the heterogeneity exhibited by the DSS in these dimensions. To this end, we propose and analyze the energy efficiency of a heterogeneous distributed storage system in which $n$ storage servers (disks) store the data of $R$ distinct classes. Data of class $i$ is encoded using a $(n, k_i)$ erasure code and the (random) data retrieval requests can also vary across classes. We show that the energy efficiency of such systems is closely related to the average latency, which motivates us to study the energy efficiency via the lens of average latency. Through this connection, we show that erasure coding serves the dual purpose of reducing latency and increasing energy efficiency. We present a queuing theoretic analysis of the proposed model and establish upper and lower bounds on the average latency for each data class under various scheduling policies. Through extensive simulations, we present qualitative insights which reveal the impact of coding rate, number of servers, service distribution and number of redundant requests on the average latency and energy efficiency of the DSS.

Index Terms: Erasure codes, distributed storage, Fork-Join queues, latency, energy efficiency, multi-class queuing system

1 INTRODUCTION

Cloud-based storage systems are gaining significant prominence due to their highly virtualized infrastructure, which presents cost-effective and simple to use elastic network resources.
The backbone infrastructure of the cloud is comprised of distributed storage systems (DSS), in which the data is stored and accessed from commodity storage disks. Coding data across distributed disks provides fault tolerance against unexpected disk failures. There has been a recent paradigm shift from classical replication based codes to erasure codes because they provide higher fault tolerance at the same storage cost [2]. As a result, a number of commercial DSS such as Google Colossus, Windows Azure etc. are transitioning to the use of erasure codes [3], [4], [5]. Besides providing fault tolerance and minimizing storage cost, another important aspect which deserves equal, if not more, attention is the energy efficiency of DSS. Over the last decade, the dramatic usage of data has led to an enormous increase in the volume of stored (archival) data and the frequency of data access to a DSS [6]. This translates to more and more servers being added to the data-center, operating at higher server utilization levels. As a result, the energy consumption of the data-centers is increasing steeply and adds to their operational cost.

A. Kumar is with the Department of ECE, Virginia Tech, Blacksburg, VA. E-mail: akshay2@vt.edu.
R. Tandon is with the Discovery Analytics Center and the Department of CS, Virginia Tech, Blacksburg, VA. E-mail: tandon@vt.edu.
T. Charles Clancy is with the Hume Center and the Department of ECE, Virginia Tech, Blacksburg, VA. E-mail: tcc@vt.edu.
Manuscript received 30 Nov. 2014; revised 22 May 2015; accepted 25 June 2015. Date of publication 22 July 2015; date of current version 7 June 2017. Recommended for acceptance by C. Mastroianni, S.U. Khan, and R. Bianchini. Digital Object Identifier no. 10.1109/TCC.2015.2459711
According to [7], the energy consumed by data centers globally increased by 19 percent in 2012, and storage systems in a large data-center consume up to 40 percent of the total energy [8]. Hence, there is a need to devise energy efficient data storage schemes. The existing techniques for energy-efficient data storage are based on variants of schemes that involve powering off storage devices [9], [10], [11].

Energy efficiency is a system wide property: while some metrics focus on the energy efficiency of hardware or software components [12], others are based on the usage of physical resources (such as CPU, memory, storage etc.) by the running applications or servers. For the scope of this work, we focus on the data transfer throughput metric [13], which measures energy efficiency as the amount of data processed in the DSS per unit amount of energy expended across all distributed servers. Therefore, the energy efficiency of DSS is strongly influenced by the volume of data transferred (per request), the frequency of data storage/access requests, the service rate of each server and the degree of redundancy in data storage. The energy efficiency of a DSS is also closely related to its read/write latency (footnote 1).

Footnote 1: Here, latency refers to the time taken to process a data request, measured relative to the time at which it enters the DSS. For the scope of this work, we consider latency to be the sum of queuing delay and service time, and assume the other delays to be relatively negligible.

The data stored and accessed from the cloud is broadly classified into two categories [14], [15]:
- Hot-data: this refers to data which is frequently accessed (i.e., a higher job request rate). Furthermore, it is desirable to provide higher redundancy/fault tolerance when storing such data.

- Cold-data: this refers to data which is infrequently accessed, or archival data. Such data need not necessarily be coded and stored with higher fault tolerance, as it is seldom accessed by the users.

When the data is infrequently accessed, as in the case of Cold-data, the average latency is reduced, which also improves the energy efficiency of the DSS [16]. Another case in point is that increasing redundancy, as in Hot-data, improves fault-tolerance but generally results in increased latency [17]. The energy efficiency in this case decreases due to the increase in power consumption, as more servers are involved in processing the same data request. Thus the latency of a DSS is closely tied to its energy efficiency. Therefore, in this work, we study the energy efficiency of a DSS through the lens of its average latency.

As mentioned earlier, erasure coded DSS, due to their several merits over replication based codes, have gained significant prominence in recent times. Therefore, in this work we study the relationship between latency and energy efficiency for such systems. In an erasure coded DSS, the data of each user is stored across $n$ disks (or servers) using a $(n, k)$ optimal Maximum-Distance-Separable (MDS) code. By the property of MDS codes, accessing the data stored at any $k$ out of $n$ servers suffices to recover the entire data of a user (also referred to as successful completion of the job request (footnote 2) of that user). The processing of job requests in DSS is typically analyzed using Fork-Join (F-J) queues [19], [20]. A $(n, k)$ F-J queue consists of $n$ independently operating queues corresponding to each of the $n$ servers. Every job arriving in the system is split $n$ ways and enters the queues of all $n$ servers simultaneously. A queuing theoretic latency analysis of the $(n, k)$ F-J system has been done in [17] (also see [21], [22], [23]).
The key finding of these papers is that using erasure coding and sending redundant requests (requests to more than $k$ servers for a $(n, k)$ F-J system) can significantly reduce the latency of a DSS. However, most of the aforementioned literature considers a homogenous storage architecture, with no distinction (from the system's perspective) between any two job requests entering the system. That is hardly the case with real DSS [24], [25], wherein, as mentioned earlier (see Hot-data vs. Cold-data), the job requests can be classified into one of several classes based on the job arrival rate or fault-tolerance/storage requirements. For instance, Amazon S3 [24] allows its customers to choose from the following storage options: Standard Storage, Reduced Redundancy Storage, and Glacier Storage. Standard Storage is the most expensive but provides maximum reliability and availability. At the other extreme is the inexpensive Glacier Storage, which stores data with low redundancy and is designed for non-critical and infrequently accessed data. Motivated by this observation, we consider a $(n, k_1, k_2, \ldots, k_R)$ multi-tenant DSS for $R$ distinct data classes, a generalization of the homogenous $(n, k)$ DSS in [17]. Data of class $i$ ($\forall i \in \{1, 2, \ldots, R\}$) is stored across $n$ servers using a $(n, k_i)$ erasure (MDS) code. The arrivals (footnote 3) of job requests of class $i$ are assumed to follow a Poisson distribution with rate $\lambda_i$.

Footnote 2: We restrict our attention to read requests because in most practical DSS, such as HDFS [18], Windows Azure [5] etc., the user's data is written only once to the storage nodes but can be retrieved multiple times by the user.

The key contributions of this paper are:
- A multi-tenant DSS is proposed and analyzed through the Fork-Join framework to account for the heterogeneity in job arrival rates and fault-tolerance requirements of different data classes.
- A data throughput based energy efficiency metric is defined for the heterogeneous DSS operating under any given scheduling policy.
- For the special case of a single server and data class, we show that the average latency and energy efficiency of the DSS are closely related to each other. Therefore, using a queuing-theoretic approach, we provide lower and upper bounds on the average latency for jobs of class $i$ ($\forall i \in \{1, 2, \ldots, R\}$) in the proposed F-J framework under various scheduling policies such as First-Come-First-Serve (FCFS), preemptive and non-preemptive priority scheduling policies.
- We study the impact of varying the code-rate on the latency, energy efficiency and network bandwidth consumed by the DSS. Increasing code-rate reduces latency and increases energy efficiency. However, this comes at the cost of increased storage space and (write) bandwidth.
- We also obtain interesting insights from investigating the impact of varying the number of servers and of heavy-tail arrival/service distributions in the DSS.
- Lastly, we study the impact of varying the number of redundant requests (sending requests to more than $k$ servers for a $(n, k)$ MDS code) to the DSS. We observe that sending redundant requests reduces latency and increases energy efficiency. Thus, full redundancy results in minimum latency and maximum energy efficiency for each data-class.

2 RELATED WORK

A number of good MDS codes such as Linux RAID-6 and array codes (EVENODD codes, X-code, RDP codes) have been developed to encode the data stored on the cloud (see [26] and references therein). These codes have very low encoding/decoding complexity as they avoid Galois Field arithmetic (unlike the classical Reed-Solomon MDS codes) and involve only XOR operations. However, they are usually applicable only up to two or three disk failures. Also, in the event of disk failure(s), array codes and the recently introduced regenerating codes reduce disk and network I/O respectively. Recently, non-MDS codes such as Tornado, Raptor and LRC codes [27], [28] have been developed for erasure coded storage. Although their fault-tolerance is not as good as that of MDS codes, they achieve higher performance due to lower repair bandwidth and I/O costs.
The latency analysis of a (MDS) erasure coded $(n, k)$ homogenous DSS has been well investigated in [17], [21], [22], which provide queuing theoretic bounds on average latency. A related line of work [23], [29] independently showed that sending requests to multiple servers always reduces the (read) latency. Liang and Kozat [30] then extended the latency analysis to a $(n, k, L)$ DSS, in which $n$ of a total of $L$ independent servers are used to store the $(n, k)$ MDS code. It assumed a constant + exponential model for the service time of jobs. The authors in [31], [32] developed load-adaptive algorithms that dynamically vary job size, coding rate and number of parallel connections to improve the delay-throughput tradeoff of key-value storage systems. These solutions were extended to heterogeneous services with a mixture of job sizes and coding rates. Recently, Xiang et al. [33] provided a tight upper bound on average latency, assuming an arbitrary erasure code, multiple file types and a general service time distribution. This was then used to solve a joint latency and storage cost optimization problem by optimizing over the choice of erasure code, the placement of encoded chunks and the choice of scheduling policy.

Footnote 3: Job arrivals refer to the time instants at which job requests enter the queues of the servers in the DSS.

Fig. 1. System model.

Fig. 2. MDS codes for data storage in a two-class Fork-Join system.

Data-centers, while configured for peak-service demand, end up being highly underutilized. Furthermore, the hardware components in storage systems are not power proportional, with idle mode consuming roughly 60 percent of the busy-mode power [34]. This has resulted in significant research in designing/implementing power efficient schemes. Most of the current literature focuses on power management via performance scaling (such as DVFS [35], [36]) or low-power states [37]. Recently, Liu et al. [38] investigated the effect of varying system operations such as processing speed, system on/off decisions etc. on the power-delay performance from a queuing theoretic perspective.
This work was extended in [39], wherein a joint speed scaling and sleep state management approach was proposed that determines the best low-power state and frequency setting by examining the power consumption and average response time for each pair. However, the work in [38], [39] does not present the power-delay performance analysis for an erasure coded DSS. It also focuses on power consumption rather than the more relevant energy efficiency of the DSS. Therefore, in this work, we study the relationship between energy efficiency and average latency in a (MDS) erasure coded heterogeneous DSS for different scheduling policies.

3 SYSTEM MODEL

A heterogeneous multi-tenant $(n, k_1, k_2, \ldots, k_R)$ DSS (shown in Fig. 1) consists of $n$ servers that store the data of $R$ distinct classes. The $R$ classes differ from each other in the fault-tolerance, storage requirements and frequency of access of the stored data. The data of class $i$ (which is assumed to be of size $l_i$) is partitioned into $k_i$ equal size fragments and then stored across $n$ servers using a $(n, k_i)$ Maximum-Distance-Separable code. Thus each server stores a $1/k_i$ fraction of the original data. The arrival process for requests of class $i$ is assumed to be Poisson with rate $\lambda_i$. The service time at each server is assumed to follow an exponential distribution with service rate $\mu$ (per unit file size) [32]. The effective service rate at any server for jobs of class $i$ is $\mu_i = k_i \mu / l_i$, since each server stores a $1/k_i$ fraction of the data.

Example 1. We now present a representative example to illustrate the system model. Consider a $(n, k_1, k_2) = (3, 2, 1)$ two-class DSS. Data for the two classes A and B are encoded across $n = 3$ servers using $(3, 2)$ and $(3, 1)$ MDS codes respectively, as shown in Fig. 2. Let $A_1$ and $B_1$ denote two files of class A and B respectively that need to be coded and stored across the servers. Then for the $(3, 2)$ MDS code, $A_1$ is split into two sub-files, $A_{11}$ and $A_{12}$, of equal size, which are stored on any two servers (servers 1 and 2 in Fig. 2). The remaining server (i.e., server 3) stores $A_{11} \oplus A_{12}$.
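The $(3, 2)$ construction in Example 1 can be sketched in a few lines of Python. This is a minimal illustration, assuming the parity fragment on server 3 is the bytewise XOR of the two halves and that the file has an even number of bytes; the function names `encode_3_2` and `decode_3_2` are hypothetical and do not come from the paper.

```python
def xor_bytes(p: bytes, q: bytes) -> bytes:
    """Bytewise XOR of two equal-length fragments."""
    return bytes(x ^ y for x, y in zip(p, q))


def encode_3_2(data: bytes) -> list[bytes]:
    """Split data into two halves and add one XOR parity fragment."""
    if len(data) % 2 != 0:
        raise ValueError("this sketch assumes an even number of bytes")
    half = len(data) // 2
    a11, a12 = data[:half], data[half:]
    # Fragments for servers 1, 2 and 3 (indices 0, 1, 2).
    return [a11, a12, xor_bytes(a11, a12)]


def decode_3_2(fragments: dict[int, bytes]) -> bytes:
    """Recover the file from any two fragments, keyed by server index 0, 1, 2."""
    if len(fragments) < 2:
        raise ValueError("need fragments from at least k = 2 servers")
    if 0 in fragments and 1 in fragments:
        a11, a12 = fragments[0], fragments[1]
    elif 0 in fragments:
        # Servers 1 and 3 responded: rebuild the second half from the parity.
        a11 = fragments[0]
        a12 = xor_bytes(a11, fragments[2])
    else:
        # Servers 2 and 3 responded: rebuild the first half from the parity.
        a12 = fragments[1]
        a11 = xor_bytes(a12, fragments[2])
    return a11 + a12
```

Calling `decode_3_2` with the fragments of any two of the three servers returns the original bytes, which is the "any $k$ out of $n$" property the latency analysis relies on.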
Thus each server stores half the size of the original file, and the entire file can be recovered from any two servers. The $(3, 1)$ MDS code for file $B_1$ is a simple replication code in which each server stores a copy of the entire file of class B, which can thus be recovered by accessing the data from any one server.

The evolution of the system state in this example depends on the local scheduling policy at each server. Although various scheduling policies exist, in this work we consider First-Come-First-Serve, preemptive and non-preemptive priority queuing policies at each server. In FCFS scheduling, all data classes have equal priority. At each server, the job that enters the buffer first is served first. In a priority queuing policy, the data classes are assigned different priority levels. A job of a particular class is served only when there are no outstanding jobs of classes with a higher priority level. A priority queuing policy is further classified as preemptive or non-preemptive based on whether or not the job in service can be preempted by a job of a higher priority level.

Figs. 3a, 3b, and 3c illustrate the evolution of the system state under the FCFS policy. After server 2 finishes job $A_1$ in Fig. 3a, $B_1$ enters server 2 and is finished in the next state (Fig. 3b) while the other servers still process $A_1$. Since $k_B = 1$, the remaining two copies of $B_1$ immediately exit the system.

Finally, in Fig. 3c, server 1 finishes $A_1$ and since $k_A = 2$, $A_1$ exits at server 3.

Fig. 3. System state evolution: two-class FJ system with FCFS.

3.1 Latency and Energy Efficiency

Definition 1. For a $\{\lambda_i, l_i, \mu, (n, k_1, k_2, \ldots, k_R)\}$ DSS, the average latency of class $i$ under some scheduling policy $P$ is defined as

$$T^i_P = T^i_{s,P} + T^i_{q,P}, \tag{1}$$

where $T^i_{s,P}$ and $T^i_{q,P}$ are the average service time and waiting time (in queue) for a job of class $i$ respectively.

For a $\{\lambda_i, l_i, \mu, (n, k_1, k_2, \ldots, k_R)\}$ DSS operating under scheduling policy $P$, a relevant metric for measuring the energy efficiency, $\mathcal{E}_P$, of the DSS is the data transfer throughput metric [13]. It is defined as the limiting ratio of the amount of data processed, $D_P(t)$, by the DSS to the energy consumed, $E_P(t)$, by the DSS in an infinitely large time duration $t$. It has units of bits/joule. Now $D_P(t)$ is simply

$$D_P(t) = \sum_{i=1}^{R} l_i N_i(t), \tag{2}$$

where $N_i(t)$ is the number of jobs of class $i$ processed by the DSS in a time interval $t$. In order to determine $E_P(t)$, we model the power consumption of the DSS as follows. To reduce power consumption, the servers are equipped with the dynamic voltage/frequency scaling (DVFS) mechanism and low-power states [39]. The DVFS mechanism reduces the operating voltage and the CPU processing speed (or frequency) in step to reduce utilization and hence increase power savings. The power consumed by a server in any state is the sum of the power consumed by the CPU and by the platform, which comprises the chipset, RAM, HDD, fan etc. The power consumed by the CPU and platform in a given state is assumed to be the same across all $n$ servers. The power consumed by a server (CPU and platform) in the active and low-power states is denoted by $P_{on}$ and $P_{off}$ respectively. A server is in active mode during the busy periods (i.e., when there are outstanding jobs waiting for service).
In general, at the end of a busy period, a server remains active for a while and then enters a sequence of low-power states, staying in each for a predetermined amount of time. For ease of analysis, we lump them into a single low-power state with constant CPU power $C_l$ and constant platform power $P_l$. After the busy period is over, the server remains in active mode for $d_l$ and then enters the low-power state (footnote 4). When the busy period restarts, the server incurs a wake-up latency $w_l$ in which it consumes active mode power but is not capable of processing any job requests. Fig. 4 explains this using an example.

Fig. 4. Variation of total power consumption of DSS across multiple busy periods and idle periods. The switch to idle state happens only for idle periods with duration greater than $d_l$, but it then results in a wake-up latency of $w_l$.

The CPU power during active mode, $C_a$, is proportional to $V^2 f$, where $V$ is the supply voltage and $f$ is the CPU operating frequency (footnote 5) ($f \in [0, 1]$); both are set by the DVFS mechanism. Further, we assume that $V$ is proportional to $f$ [39]. So $C_a = C_0 f^3$, for some maximum power $C_0$. The power consumed by the platform during active mode, $P_a$, is constant. $t^{i,j,k}_{busy}$ denotes the duration of time for which the $k$th server is busy serving the $j$th job of the $i$th class. $t^{i,j,k}_{idle}$ denotes the duration of the idle period after the $k$th server finishes the $j$th job of the $i$th class.

Using the above notation, the active mode power per server is $P_{on} = C_a + P_a = C_0 f^3 + P_a$. Similarly, $P_{off} = C_l + P_l$. Consider any time duration $t$ of interest during the operation of the DSS. During this period, the total time for which the DSS is in active mode, $t_a$, is the sum total (across all servers) of all busy periods plus the active mode time before entering the low-power state. Mathematically, we have

$$t_a = \sum_{i=1}^{R} \sum_{j=1}^{N_i(t)} \sum_{k=1}^{n} \left[ t^{i,j,k}_{busy} + \max\left(0,\; t^{i,j,k}_{idle} - d_l\right) \right]. \tag{3}$$

The total time for which the DSS is in the low-power state, $t_l$, is

$$t_l = nt - t_a. \tag{4}$$

We now have the following definition of the energy efficiency of a DSS.

Definition 2.
For a $\{\lambda_i, l_i, \mu, (n, k_1, k_2, \ldots, k_R)\}$ DSS, the energy efficiency of the DSS under some scheduling policy $P$ is defined as

$$\mathcal{E}_P = \lim_{t \to \infty} \frac{D_P(t)}{E_P(t)} \tag{5}$$

$$= \lim_{t \to \infty} \frac{\sum_{i=1}^{R} l_i N_i(t)}{P_{on} t_a + P_{off} t_l}, \tag{6}$$

where (6) follows from (5) using (2). The expressions for $t_a$ and $t_l$ are given in (3) and (4) respectively.

Footnote 4: As a consequence, if the duration of an idle period (the time between the end of a busy period and the start of the next one) is smaller than $d_l$, then the server always remains active.

Footnote 5: Due to this, the effective service rate for class $i$ becomes $\mu_i = f k_i \mu / l_i$.

Next, in order to highlight the relationship between the average latency and energy efficiency of a DSS, we consider the special case of an M/M/1 system and a single data-class. For tractability of analysis, we assume here that $d_l$, $w_l$ and $P_{off}$ are all 0. Then from Definition 1 for average latency (footnote 6), we have

$$T = T_s + T_q \tag{7}$$

$$= \frac{1}{\mu'} + \frac{\lambda}{\mu'(\mu' - \lambda)} = \frac{1}{\mu' - \lambda}, \tag{8}$$

where (8) follows from (7) by noting that for an M/M/1 system, the mean service time is $T_s = \frac{1}{\mu'}$ and the mean waiting time is $T_q = \frac{\lambda}{\mu'(\mu' - \lambda)}$. Here, $\mu' = \mu f / l$ is the effective service rate. The energy efficiency is computed using (6) as

$$\mathcal{E} = \lim_{t \to \infty} \frac{l N(t)}{P_{on} t_a + P_{off} t_l} \tag{9}$$

$$= \lim_{t \to \infty} \frac{l N(t)}{P_{on} \sum_{i=1}^{N(t)} T_{s,i}} \tag{10}$$

$$= \frac{l}{P_{on} \lim_{t \to \infty} \frac{\sum_{i=1}^{N(t)} T_{s,i}}{N(t)}} \tag{11}$$

$$= \frac{l}{P_{on} T_s}, \tag{12}$$

where (10) follows from (9) by noting that $t_a$ is the sum of the service times of each of the $N(t)$ jobs (denoted by $T_{s,i}$ for the $i$th job) and by neglecting the power consumed when the server is idle, i.e., $P_{off} = 0$. Then (12) follows from (11) by the definition of average service time. Thus the energy efficiency is inversely related to the average service time of jobs. It is difficult to find a closed form expression for the energy efficiency of a heterogeneous DSS, but the general trend of inverse proportionality between latency and energy efficiency continues to hold, as verified through extensive simulations in Section 6. The average latency is also directly related to the average service time (footnote 7). Therefore, we conclude that the energy efficiency and average latency of a DSS are closely related to each other. Henceforth, we focus on the latency analysis of a heterogeneous DSS.

Footnote 6: In this special case, the scheduling policy $P$ and class index $i$ are not relevant and are hence dropped from Definition 1.

Footnote 7:
Queuing delay depends on the job arrival rate and service time. So the latency, which is the sum of queuing delay and service time, directly depends on the service time.

4 PRELIMINARIES

In this section, we first present the analysis of average service latency in a multi-class single server system with the FCFS scheduling policy. For the corresponding results in a priority (preemptive/non-preemptive) queuing system, we refer the reader to [40]. To improve the tractability of the latency analysis, the analytical results in this work ignore the impact of the wake-up latency $w_l$, similar to other works in the literature [17], [21], [22], [23], [29]. We then briefly review the existing results for upper and lower bounds on the average latency for a $(n, k)$ homogenous Fork-Join system [17].

4.1 Average Latency in Multi-Class Single Server System with FCFS Scheduling

Consider the system model described in Fig. 1 with $n = 1$ server and the FCFS scheduling policy. The FCFS system can be modeled as an M/G/1 queuing system with net arrival rate $\lambda = \sum_{r=1}^{R} \lambda_r$ and a general service distribution $S$. The average latency of jobs of class $i$ is the sum of their average waiting time (in queue) and their average service time. Let $S_i$ be a random variable representing the service time for a job of class $i$ in the FCFS system. Then the average service time of jobs of class $i$ is simply the expectation $E[S_i]$. In the FCFS system, the waiting time $W_{FCFS}$ is the same for jobs of all classes and is given by the Pollaczek-Khinchine (P-K) formula [41] (for an M/G/1 system) as

$$W_{FCFS} = \frac{\lambda E[S^2]}{2(1 - \lambda E[S])}. \tag{13}$$

Therefore, the average latency for jobs of class $i$ is

$$T^i_{FCFS} = E[S_i] + \frac{\lambda E[S^2]}{2(1 - \lambda E[S])} = E[S_i] + \frac{\lambda \left(V[S] + E[S]^2\right)}{2(1 - \lambda E[S])}, \tag{14}$$

where $V[\cdot]$ denotes the variance of the random variable. Now the fraction of jobs of class $i$, $p_i$, is

$$p_i = \frac{\lambda_i}{\sum_{r=1}^{R} \lambda_r} = \frac{\lambda_i}{\lambda}. \tag{15}$$

So the probability that $S$ takes on the value of $S_i$ is $p_i$, $\forall i = 1, 2, \ldots, R$.
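Therefore the probability distribution function (pdf) of $S$ is given by

$$f_S(s) = \sum_{r=1}^{R} p_r f_{S_r}(s). \tag{16}$$

Then the mean and the second moment of $S$ are simply

$$E[S] = \sum_{r=1}^{R} p_r E[S_r], \qquad E[S^2] = \sum_{r=1}^{R} p_r E[S_r^2]. \tag{17}$$

Using (15) and (17) in (14), we obtain

$$T^i_{FCFS} = E[S_i] + \frac{\sum_{r=1}^{R} \lambda_r \left[ V[S_r] + E[S_r]^2 \right]}{2\left(1 - \sum_{r=1}^{R} \lambda_r E[S_r]\right)}. \tag{18}$$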
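The closed form (18) is easy to evaluate numerically. The following Python sketch computes the per-class FCFS latency from per-class arrival rates and the first two moments of the service times; the function name `fcfs_latency` and its argument layout are illustrative assumptions, not part of the paper.

```python
def fcfs_latency(i, arrival_rates, mean_service, var_service):
    """Average latency of class i in a multi-class M/G/1 FCFS queue, as in (18).

    arrival_rates[r] is lambda_r, mean_service[r] is E[S_r] and
    var_service[r] is V[S_r] for class r (0-based indices).
    """
    # Total load: sum_r lambda_r * E[S_r]; the queue is unstable at load >= 1.
    load = sum(lam * m for lam, m in zip(arrival_rates, mean_service))
    if load >= 1.0:
        raise ValueError("unstable queue: total load must be below 1")
    # sum_r lambda_r * E[S_r^2], using E[S_r^2] = V[S_r] + E[S_r]^2.
    second_moment_term = sum(
        lam * (v + m * m)
        for lam, m, v in zip(arrival_rates, mean_service, var_service)
    )
    waiting = second_moment_term / (2.0 * (1.0 - load))  # P-K waiting time (13)
    return mean_service[i] + waiting
```

As a sanity check, a single class with exponential service at rate 1 (mean 1, variance 1) and arrival rate 0.5 gives a latency of 2, which agrees with the M/M/1 expression $1/(\mu' - \lambda)$ in (8).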

Fig. 5. Markov chain for a $(3, 2)$ Fork-Join system.

4.2 Latency Analysis of Homogenous DSS

An exact latency analysis of the $(n, k)$ DSS is prohibitively complex because the Markov chain has a state space with infinite states in at least $k$ dimensions. This is exemplified in Fig. 5, which shows the Markov chain evolution for a $(3, 2)$ DSS. Each state is characterized by the number of jobs in the system. The arrival and service rates of jobs are $\lambda$ and $\mu$ respectively. We note that as more jobs enter the system, the Markov chain starts growing in two dimensions and results in multiple states with the same number of jobs in the system, such as states 6 and 6'. Since an exact analysis of the F-J system is therefore very complex, we review existing upper and lower bounds for the average latency of a homogenous DSS.

4.2.1 Lower Bound on Average Latency

In a $(n, k)$ DSS, a job is considered finished when $k$ out of $n$ servers finish that job. This is equivalent to each job going through $k$ stages sequentially, where the transition from one stage to the next occurs when one of the remaining servers finishes a sub-task of the job [42]. We note that at any stage $s$, the maximum possible service rate for a job that is not finished yet is $(n - s + 1)\mu'$, where $\mu' = f k \mu / l$. This happens when all the remaining sub-tasks of a job are at the head of their queues. Thus, we can enhance the latency performance in each stage $s$ by approximating it with an M/M/1 system with service rate $(n - s + 1)\mu'$. Then, the average latency of the original system (denoted by $T$) can be lower bounded as

$$T \geq T_{LB} = \sum_{i=1}^{k} \frac{1}{(n - i + 1)\mu' - \lambda}, \tag{19}$$

where $T_{LB}$ denotes the lower bound on the average latency of the F-J system.

4.2.2 Upper Bound on Average Latency

To upper-bound the performance of the $(n, k)$ F-J system, we degrade its performance by approximating it with a $(n, k)$ Split-Merge (SM) system, proposed in [17].
In the $(n, k)$ SM system, after a server finishes a copy of a job, it is blocked and not allowed to accept new jobs until all $k$ copies of the current job are finished. When $k$ copies of a job are finished, the copies of that job at the remaining $n - k$ servers exit the system immediately. The SM system can thus be modeled as an M/G/1 system with arrival rate $\lambda$ and a service distribution that follows the $k$th order statistics [43], described here for reference. Let $X_1, X_2, \ldots, X_n$ be $n$ i.i.d. random variables (rv). If we order the rv's in ascending order to get $X_{1,n} < X_{2,n} < \cdots < X_{k,n} < \cdots < X_{n,n}$, then the distribution of the $k$th smallest value, $X_{k,n}$, is called the $k$th order statistics. The pdf of $X_{k,n}$ is given by (footnote 8)

$$f_{X_{k,n}}(x) = \binom{n}{k-1,\, 1,\, n-k} F_X(x)^{k-1} \left(1 - F_X(x)\right)^{n-k} f_X(x), \tag{20}$$

where $F_X(x)$ and $f_X(x)$ are the cumulative distribution function and pdf of $X_i$ respectively, for all $i$. The average latency of the F-J system is thus upper-bounded by the average latency of the SM system, $T_{SM}$, as

$$T \leq T_{SM} = \underbrace{E[X_{k,n}]}_{\text{service time}} + \underbrace{\frac{\lambda \left[ V[X_{k,n}] + E[X_{k,n}]^2 \right]}{2\left(1 - \lambda E[X_{k,n}]\right)}}_{\text{waiting time}}, \tag{21}$$

where the average service time is simply the expectation $E[X_{k,n}]$ and the average waiting time for an M/G/1 system is given by the P-K formula in (13). Now if $X_i$ is exponential with mean $1/\mu'$ (where $\mu' = f k \mu / l$), then the mean and variance of $X_{k,n}$ are given by

$$E[X_{k,n}] = \frac{H^1_{n-k,n}}{\mu'}, \qquad V[X_{k,n}] = \frac{H^2_{n-k,n}}{\mu'^2}, \tag{22}$$

where $H^z_{x,y}$ is a generalized harmonic number of order $z$ defined by

$$H^z_{x,y} = \sum_{j=x+1}^{y} \frac{1}{j^z}, \tag{23}$$

for some positive integers $x$, $y$ and $z$.

5 MAIN RESULTS

Section 4.2 presented bounds on the average latency for the $(n, k)$ F-J system. To extend the lower-bound result (19) to a heterogeneous FJ system, a naive approach would be to approximate it with a homogenous FJ system with jobs of class $i$ only while evaluating the lower bound on average latency of class $i$.
Thus a naive lower-bound on the average latency for jobs of class $i$ is

$$T^i_{naive} \geq \sum_{j=0}^{k_i - 1} \frac{1}{(n - j)\mu_i - \lambda_i}. \tag{24}$$

Footnote 8: The result in (20) can be understood as follows. First select groups of $k - 1$, $1$, and $n - k$ servers out of $n$ servers in $\binom{n}{k-1,\, 1,\, n-k}$ possible ways. Then the pdf of the service time for the singled-out server is simply $f_X(x)$. Now since the $X_i$ are i.i.d. random variables, the probability that the selected $k - 1$ servers finish their jobs before the singled-out server is $F_X(x)^{k-1}$. Similarly, the probability that $n - k$ servers finish their jobs after the singled-out server is $(1 - F_X(x))^{n-k}$.
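The homogeneous bounds (19) and (21)-(23) can be evaluated with a short Python sketch. It assumes exponential service with effective rate $\mu'$ passed in as `mu_eff`; the function names are hypothetical. The naive per-class bound (24) corresponds to calling `fj_lower_bound` with $k_i$, $\lambda_i$ and $\mu_i$.

```python
def harmonic(x, y, z=1):
    """Generalized harmonic number H^z_{x,y} = sum_{j=x+1}^{y} 1/j^z, as in (23)."""
    return sum(1.0 / j**z for j in range(x + 1, y + 1))


def fj_lower_bound(n, k, lam, mu_eff):
    """Lower bound (19): k sequential M/M/1 stages with rate (n - i + 1) * mu_eff."""
    total = 0.0
    for i in range(1, k + 1):
        rate = (n - i + 1) * mu_eff
        if rate <= lam:
            raise ValueError("stage is unstable: service rate must exceed lam")
        total += 1.0 / (rate - lam)
    return total


def sm_upper_bound(n, k, lam, mu_eff):
    """Split-Merge upper bound (21) with the exponential moments from (22)."""
    mean = harmonic(n - k, n, 1) / mu_eff        # E[X_{k,n}]
    var = harmonic(n - k, n, 2) / mu_eff**2      # V[X_{k,n}]
    load = lam * mean
    if load >= 1.0:
        raise ValueError("bound is only valid when lam * E[X_{k,n}] < 1")
    waiting = lam * (var + mean**2) / (2.0 * (1.0 - load))
    return mean + waiting
```

For the illustrative values $n = 3$, $k = 2$, $\lambda = 1$ and $\mu' = 2$ (chosen here only as an example), the two functions bracket the true latency between $8/15$ and $9/14$.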

This lower bound holds true irrespective of the scheduling policy used in the heterogeneous system. However, it is a loose bound, as it ignores the dependency of the response time for a job of class $i$ on the jobs of other classes in the system, which compete for service at the same server. Therefore, through a rigorous latency analysis of various scheduling policies, we next account for this interdependency in the average latency of different classes and present lower and upper bounds for the heterogeneous FJ system. To this end, we first define a set of variables for a compact presentation of the results. The operational meaning of these variables will become clear when we present the proofs of the results.
- $(n, k_i)$ is the MDS code used to store data of class $i$.
- $l_i$ is the file-size for class $i$.
- $\lambda_i$ is the arrival rate for jobs of class $i$.
- $\mu_i = k_i f \mu / l_i$ is the effective service rate for jobs of class $i$, where $\mu$ is the service rate per unit file size.
- $\rho_i = \lambda_i / \mu_i$ is the server utilization factor for class $i$.
- $S_i = \sum_{r=1}^{i} \rho_r H^1_{n-k_r,n}$.

5.1 Main Results

Lemma 1 gives the stability conditions of the heterogeneous DSS for various scheduling policies. The upper and lower bounds on the average latency for various scheduling policies are presented in Theorems 1 and 2 respectively.

Lemma 1. For a $(n, k_1, k_2, \ldots, k_R)$ Fork-Join system to be stable, the following condition must be satisfied at each node.

FCFS scheduling:

$$\left( \sum_{r=1}^{R} k_r \lambda_r \right) \left( \sum_{r=1}^{R} \frac{\lambda_r l_r}{k_r} \right) < n f \mu \sum_{r=1}^{R} \lambda_r. \tag{25}$$

Preemptive/Non-preemptive priority scheduling:

$$\sum_{r=1}^{R} \lambda_r l_r < n f \mu. \tag{26}$$

Next, to upper-bound the average latency, we extend the Split-Merge system (defined in Section 4.2.2) to $R$ data classes, keeping the scheduling policy the same as that of the original system. Then for a given scheduling policy, the upper bound on average latency is the average latency of the corresponding SM system.
This in turn is the sum of the average service time and waiting time, which can be determined by noting the equivalence between the SM system and an M/G/1 system as described in Section 4.2.2. We thus obtain the following upper bounds on the average latency for different scheduling policies.

Theorem 1. The average latency for job requests of class $i$ in a $(n, k_1, k_2, \ldots, k_R)$ Fork-Join system is upper-bounded as follows.

FCFS scheduling:

$$T^i_{FCFS} \leq \underbrace{\frac{H^1_{n-k_i,n}}{\mu_i}}_{\text{Service time}} + \underbrace{\frac{\sum_{r=1}^{R} \lambda_r \left[ H^2_{n-k_r,n} + \left(H^1_{n-k_r,n}\right)^2 \right] / \mu_r^2}{2\left(1 - S_R\right)}}_{\text{Waiting time}}. \tag{27}$$

The bound is valid only when $S_R < 1$.

Non-preemptive priority scheduling (footnote 9):

$$T^i_{N\text{-}PQ} \leq \frac{H^1_{n-k_i,n}}{\mu_i} + \frac{\sum_{r=1}^{R} \lambda_r \left[ H^2_{n-k_r,n} + \left(H^1_{n-k_r,n}\right)^2 \right] / \mu_r^2}{2\left(1 - S_{i-1}\right)\left(1 - S_i\right)}. \tag{28}$$

The bound is valid only when $S_i < 1$.

Preemptive priority scheduling (footnote 9):

$$T^i_{PQ} \leq \frac{H^1_{n-k_i,n}}{\mu_i \left(1 - S_{i-1}\right)} + \frac{\sum_{r=1}^{i} \lambda_r \left[ H^2_{n-k_r,n} + \left(H^1_{n-k_r,n}\right)^2 \right] / \mu_r^2}{2\left(1 - S_{i-1}\right)\left(1 - S_i\right)}. \tag{29}$$

The bound is valid only when $S_i < 1$.

We now define an additional set of variables for a compact presentation of the results in Theorem 2. Without loss of generality, assume the classes are relabeled such that $k_1 \leq k_2 \leq \cdots \leq k_R$. Then for class $i$, we define $c_s$ as

$$c_s = \begin{cases} 0, & 1 \leq s \leq k_1 \\ 1, & k_1 < s \leq k_2 \\ \;\vdots & \\ i - 1, & k_{i-1} < s \leq k_i. \end{cases} \tag{30}$$

- At a stage $s$, let $\mathcal{R}^i_s$ denote the set of classes with priority higher than class $i$ that have not been finished yet.
- $t_{s,i} = \frac{\lambda_i}{(n - s + 1)\mu_i}$ at stage $s$ and class $i$.
- $Z^i_s = 1 - \sum_{r \in \mathcal{R}^i_s} t_{s,r}$ at stage $s$ and class $i$.

To obtain a lower bound on the average latency, we enhance the performance of the original system, similar to the process described in Section 4.2.1. The processing of a job of class $i$ is modeled as completing $k_i$ sequential stages (or sub-tasks). Then we enhance the latency performance for a job of class $i$ in stage $s$ by assuming the maximum possible service rate for it, i.e., $(n - s + 1)\mu_i$.
However, at stage $s$, there may also be unfinished sub-tasks of jobs of other classes, which can be served with a maximum possible service rate of $(n-s+1)\mu_j$, where $j \ne i$. Due to this, we model the performance of each enhanced stage as an M/G/1 system. We thus obtain the following lower bounds on the average latency for different scheduling policies.

Theorem 2. The average latency for job requests of class $i$ in an $(n, k_1, k_2, \ldots, k_R)$ Fork-Join system is lower-bounded as follows:

FCFS scheduling:
$$T^i_{\mathrm{FCFS}} \ge \sum_{s=1}^{k_i} \left( \underbrace{\frac{t_{s,i}}{\lambda_i}}_{\text{service time}} + \underbrace{\frac{\sum_{r=c_s+1}^{R} t_{s,r}^2/\lambda_r}{1 - \sum_{r=c_s+1}^{R} t_{s,r}}}_{\text{waiting time}} \right). \tag{31}$$

Non-preemptive priority scheduling⁹:
$$T^i_{\mathrm{N\text{-}PQ}} \ge \sum_{s=1}^{k_i} \left( \frac{t_{s,i}}{\lambda_i} + \frac{\sum_{r=c_s+1}^{R} t_{s,r}^2/\lambda_r}{Z^i_s\left(Z^i_s - t_{s,i}\right)} \right). \tag{32}$$

Preemptive priority scheduling⁹:
$$T^i_{\mathrm{PQ}} \ge \sum_{s=1}^{k_i} \left( \frac{t_{s,i}}{\lambda_i Z^i_s} + \frac{\sum_{r \in \mathcal{R}^i_s \cup \{i\}} t_{s,r}^2/\lambda_r}{Z^i_s\left(Z^i_s - t_{s,i}\right)} \right). \tag{33}$$

9. Without loss of generality, we set the classes in the order of decreasing priority as $1 > 2 > \cdots > R$.

Fig. 6. Enhanced two-class FJ system with FCFS.

5.2 Proofs for FCFS Scheduling

We now present the proofs for the stability condition and the bounds on average latency for the FCFS scheduling policy. The proofs for the remaining results are given in [44].

5.2.1 Proof of Lemma 1-FCFS Scheduling

Consider any server in the $(n, k_1, k_2, \ldots, k_R)$ Fork-Join system. Jobs of class $r$ enter the queue with rate $\lambda_r$. Each new job of class $r$ exits the system when $k_r$ sub-tasks of that job are completed. The remaining $n - k_r$ sub-tasks are then cleared from the system. Thus, for each job of class $r$, a fraction $(n-k_r)/n$ of the sub-tasks is deleted, and hence the effective arrival rate of jobs of class $r$ at any server is $\lambda_r\left(1 - \frac{n-k_r}{n}\right) = \frac{k_r \lambda_r}{n}$. Thus the overall arrival rate at any server, $\lambda_{\mathrm{eff}}$, is
$$\lambda_{\mathrm{eff}} = \sum_{r=1}^{R} \frac{k_r \lambda_r}{n}. \tag{34}$$
Let $S$ denote the service distribution for a single-server FCFS system serving $R$ data classes. Then from (17), the mean service time at a server is
$$E[S] = \sum_{r=1}^{R} p_r E[S_r] = \sum_{r=1}^{R} \frac{\lambda_r}{\mu_r \sum_{j=1}^{R} \lambda_j}, \tag{35}$$
where (35) follows from (15) and the assumption that the service time for a job of class $i$ is exponential with rate $\mu_i$. To ensure stability, the net arrival rate should be less than the average service rate at each server. Thus, from (34) and (35), the stability condition of each queue is
$$\frac{1}{n}\sum_{r=1}^{R} k_r \lambda_r < \left(\sum_{r=1}^{R} \frac{\lambda_r}{\mu_r \sum_{j=1}^{R} \lambda_j}\right)^{-1}.$$
Since $\mu_r = f k_r \mu / l_r$ and the term $\sum_{r=1}^{R} \lambda_r$ is a constant, with simple algebraic manipulations we arrive at
$$\left(\sum_{r=1}^{R} k_r \lambda_r\right)\left(\sum_{r=1}^{R} \frac{\lambda_r l_r}{k_r}\right) < n f \mu \sum_{r=1}^{R} \lambda_r. \tag{36}$$
This completes the proof of the stability condition for FCFS scheduling.

Fig. 7. Latency of a data class increases with an increase in its code rate and decreases with an increase in service rate.

5.2.2 Proof of Theorem 1-FCFS Scheduling

The FCFS system can be modeled as an M/G/1 queuing system with arrival rate $\lambda = \sum_{r=1}^{R} \lambda_r$ and a general service time distribution $S$.
Then the average latency for a job of class $i$ in an FCFS scheduling system is given by (18) as
$$T^i_{\mathrm{FCFS}} = E[S_i] + \frac{\sum_{r=1}^{R} \lambda_r \left[V[S_r] + E[S_r]^2\right]}{2\left(1 - \sum_{r=1}^{R} \lambda_r E[S_r]\right)}.$$
To obtain an upper bound on the average latency, we degrade the FJ system in the following manner. For a job of class $i$, the servers that have finished processing a sub-task of that job are blocked and do not accept new jobs until $k_i$ sub-tasks of that job have been completed. Then the sub-tasks at the remaining $n - k_i$ servers exit the system immediately. Fig. 6 illustrates this process using Example 1. When $A_1$ is finished at server 2, it is blocked (see Fig. 7b) until another $k_A = 2$ copies are finished. This performance-degraded system can be modeled as an M/G/1 system where the distribution of the service process, $S_i$, follows the $k_i$-th order statistic as described in Section 4.2.2. For any class $i$, the service time at each of the $n$ servers is exponential with mean $1/\mu_i$. Hence from (22), the mean and variance of $S_i$ are
$$E[S_i] = \frac{H^1_{n-k_i,n}}{\mu_i}, \qquad V[S_i] = \frac{H^2_{n-k_i,n}}{\mu_i^2}. \tag{37}$$
Substituting (37) in (18), we get the following upper bound on average latency:
$$T^i_{\mathrm{FCFS}} \le \underbrace{\frac{H^1_{n-k_i,n}}{\mu_i}}_{\text{service time}} + \underbrace{\frac{\sum_{r=1}^{R} \lambda_r \left[H^2_{n-k_r,n} + \left(H^1_{n-k_r,n}\right)^2\right]/\mu_r^2}{2\left(1 - S_R\right)}}_{\text{waiting time}}, \tag{38}$$
where $S_R = \sum_{r=1}^{R} \rho_r H^1_{n-k_r,n}$ and $\rho_r = \lambda_r/\mu_r$. This concludes the proof of the upper bound on the average latency for FCFS scheduling.

5.2.3 Proof of Theorem 2-FCFS Scheduling

To obtain a lower bound on the average latency of class $i$, using insights from Section 4.2.1, we map the parallel processing in the proposed FJ system to a sequential process consisting of $k_i$ processing stages for the $k_i$ sub-tasks of a job of class $i$. The transition from one stage to the next occurs when one of the remaining servers finishes a sub-task of the job. Let $c_s$ denote the number of classes that are finished before the start of stage $s$, as defined in (30). The processing in each stage $s$ corresponds to a single-server FCFS system with jobs of all but classes $1, 2, \ldots, c_s$. Then, using (14) for the FCFS sub-system at stage $s$, the average latency for a sub-task of a job of class $i$ in stage $s$ is given by
$$T^i_{\mathrm{FCFS},s} = E[S^s_i] + \frac{\lambda E[(S^s)^2]}{2\left(1 - \lambda E[S^s]\right)}, \tag{39}$$
where $S^s$ is a r.v. denoting the service time for any sub-task in stage $s$ and $S^s_i$ denotes the service time for a sub-task of class $i$ in stage $s$. The moments of $S^s$ and $S^s_i$ are related to each other in the same way as the moments of $S$ and $S_i$ in (17). So we have
$$E[S^s] = \sum_{r=c_s+1}^{R} p_r E[S^s_r], \qquad E[(S^s)^2] = \sum_{r=c_s+1}^{R} p_r E[(S^s_r)^2]. \tag{40}$$
Substituting (40) in (39), we get
$$T^i_{\mathrm{FCFS},s,c_s} = E[S^s_i] + \frac{\sum_{r=c_s+1}^{R} \lambda_r E[(S^s_r)^2]}{2\left(1 - \sum_{r=c_s+1}^{R} \lambda_r E[S^s_r]\right)}. \tag{41}$$
Now we note that at any stage $s$, the maximum possible service rate for a job of class $j$ that is not finished yet is $(n-s+1)\mu_j$. This happens when all the remaining sub-tasks of the job of class $j$ are at the head of their buffers. Thus, we can enhance the latency performance in each stage $s$ by approximating it with an M/G/1 system with service rate $(n-s+1)\mu_j$ for jobs of class $j$.
Then, the average latency for a sub-task of a job of class $i$ in stage $s$ is lower bounded as
$$T^i_{\mathrm{FCFS},s,c_s} \ge \frac{1}{(n-s+1)\mu_i} + \frac{\sum_{r=c_s+1}^{R} \dfrac{\lambda_r}{\left((n-s+1)\mu_r\right)^2}}{1 - \sum_{r=c_s+1}^{R} \dfrac{\lambda_r}{(n-s+1)\mu_r}}. \tag{42}$$
Finally, the average latency for class $i$ in this enhanced system is simply $\sum_{s=1}^{k_i} T^i_{\mathrm{FCFS},s,c_s}$. This gives us
$$T^i_{\mathrm{FCFS}} \ge \underbrace{\sum_{s=1}^{k_i} \frac{t_{s,i}}{\lambda_i}}_{\text{service time}} + \underbrace{\sum_{s=1}^{k_i} \frac{\sum_{r=c_s+1}^{R} \dfrac{t_{s,r}}{(n-s+1)\mu_r}}{1 - \sum_{r=c_s+1}^{R} t_{s,r}}}_{\text{waiting time}},$$
where $t_{s,i} = \dfrac{\lambda_i}{(n-s+1)\mu_i}$. This concludes the proof of the lower bound on the average latency for FCFS scheduling.

6 QUANTITATIVE RESULTS AND DISCUSSION

In this section, we use Monte-Carlo simulations of a heterogeneous Fork-Join system to study the impact of varying various system parameters on the average latency of different classes and the energy efficiency of DSS. For simplicity, the number of data classes is set to 2. Data of class 1 is stored using an $(n, k_1) = (10, 5)$ MDS code. Data of class 2 is stored using a $(10, k_2)$ MDS code, where $k_2$ is varied from 1 to 10. Arrival rates for the two classes are set as $\lambda_1 = 0.15$ and $\lambda_2 = 0.5$. The job size for both classes is set to 1 kilobit. Job requests for both classes are served using full redundancy (i.e., $r_1 = r_2 = n$). We set the power consumption parameters using the data for the Intel Xeon family of CPUs¹⁰ and associated platform components [39]. Therefore, we set $C_0 = 203.13$ W, $P_a = 120$ W, $C_l = 15$ W, $w_l = 6$ s, and $P_l = 13.1$ W. The CPU frequency $f$ is set to 1 unless mentioned otherwise.

10. We use the power consumption parameters of the Deeper Sleep state for our low-power state.

6.1 Impact of Fault-Tolerance and Service Rate

The behavior of average latency with respect to a change in fault-tolerance $k$ is governed by two opposing factors.

• Increasing $k$ reduces the number of servers available for serving the next job in queue, thus resulting in an increase in latency.
• Increasing $k$ increases the effective service rate ($k\mu$) of each server, as each server stores a smaller fraction ($l/k$) of the job. This decreases the average latency.

Fig. 7 shows the average latency for jobs of class 2 versus $k_2$ for the FCFS system with $\mu = 1/6$ and 1.¹¹ The file sizes for both classes are equal to 1 kb. We note that the average latency increases on increasing $k_2$. This is because $\mu$ is large enough that the increment in latency due to the first factor dominates the decrease in latency due to the second factor. We also note that the bounds are somewhat loose at high values of $k_2$ and low values of $\mu$. In particular, the lower bound becomes loose because, at each processing stage of the serial Fork-Join system, the difference between the actual service rate and its bound at the $s$-th stage of processing (i.e., $(n-s+1)\mu_i$) for jobs of class $i$ increases with an increase in $k$ and a decrease in $\mu$. Similarly, the upper bound becomes worse because the service time lost due to blocking increases significantly at low $\mu$ and high $k$ values. This is because the remaining sub-tasks are served very slowly (low $\mu$) and the blocking continues until a large number of sub-tasks (high $k$) are finished. Finally, as expected, we note that the naive lower bound on the latency of class 2 is loose compared to the proposed lower bound for the FCFS system.

11. In our work, we set the file size to be multiples of 1 kilobit. So $\mu$ is defined per kilobit.

6.2 Impact of Coding on Energy Efficiency and Storage Space

Fig. 8 illustrates the impact of varying the code rate ($k/n$ for an $(n, k)$ MDS code) on the latency, energy efficiency and network bandwidth of the system. At one extreme is the $(n, 1)$ replication code with code rate $1/n$, which has minimum latency (see Fig. 7) and maximum energy efficiency. This is because we just wait for any one server to finish the job. For a fixed $n$, low latency translates to higher data throughput and hence higher energy efficiency. However, the total storage space per file for the $(n, 1)$ code is $nl$, where $l$ is the file size. Hence the (write) network bandwidth is maximum at $k = 1$. At the other extreme is the $(n, n)$ code with no fault-tolerance and no storage overhead (storage size is $l$). But it suffers from high latency and low energy efficiency. This is because we need to wait for all the servers to finish their sub-tasks of a job before the job is completed. Hence latency and throughput suffer, which in turn decreases energy efficiency.

Fig. 8. Tradeoff between energy efficiency of DSS and storage space per file with variation in code rate.

6.3 Impact of Number of Servers in DSS

Fig. 9 shows the impact of increasing the number of servers ($n$) on the latency and energy efficiency of DSS, while all other system parameters are kept constant. We observed that for low values of $n$, increasing $n$ increases the energy efficiency. This is because more servers are available to serve the job, which reduces average latency and thus increases the throughput. The increase in throughput due to lower latency outweighs the increase in energy consumption due to higher $n$. Hence the overall effect is that energy efficiency increases. However, at high values of $n$, increasing $n$ results in diminishing returns in latency and throughput. This is because latency improvement is limited by the effective service rate ($k\mu/l$) and not the number of servers. At very large $n$, the energy consumption becomes quite significant. Therefore, the energy efficiency begins to decrease at large $n$. We thus conclude that there is an optimum value of $n$ that maximizes energy efficiency and has near-minimal latency.

Fig. 10. A light-tailed general service distribution (Pareto distribution with $\alpha = 6$) results in monotonically decreasing latency as a function of code rate. Energy efficiency follows an inverse behavior.
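The analytical FCFS results that the simulations in this section are compared against can be evaluated directly. The following is a minimal sketch, not the authors' code: it encodes our reading of the stability condition (25), the upper bound (27) and the lower bound (31), and it assumes the generalized harmonic sum $H^z_{x,y} = \sum_{j=x+1}^{y} j^{-z}$. All function names are hypothetical, and classes are indexed from 0.

```python
def harmonic(z, x, y):
    """Assumed definition: H^z_{x,y} = sum_{j=x+1}^{y} 1 / j**z."""
    return sum(1.0 / j**z for j in range(x + 1, y + 1))


def effective_rates(k, l, mu, f=1.0):
    """Effective service rate mu_i = k_i * f * mu / l_i for every class."""
    return [ki * f * mu / li for ki, li in zip(k, l)]


def fcfs_stable(n, k, lam, l, mu, f=1.0):
    """FCFS stability condition of Lemma 1, as read from (25)."""
    lhs = sum(kr * ar for kr, ar in zip(k, lam)) * sum(
        ar * lr / kr for ar, lr, kr in zip(lam, l, k)
    )
    return lhs < n * f * mu * sum(lam)


def fcfs_upper_bound(i, n, k, lam, l, mu, f=1.0):
    """Split-Merge upper bound on class-i latency, as read from (27)."""
    rates = effective_rates(k, l, mu, f)
    h1 = [harmonic(1, n - kr, n) for kr in k]
    h2 = [harmonic(2, n - kr, n) for kr in k]
    s_r = sum(ar / mr * a for ar, mr, a in zip(lam, rates, h1))
    if s_r >= 1.0:
        raise ValueError("bound is valid only when S_R < 1")
    # sum_r lambda_r * E[S_r^2], with E[S_r^2] = (H2 + H1^2) / mu_r^2
    second = sum(
        ar * (b + a * a) / mr**2 for ar, mr, a, b in zip(lam, rates, h1, h2)
    )
    return h1[i] / rates[i] + second / (2.0 * (1.0 - s_r))


def fcfs_lower_bound(i, n, k, lam, l, mu, f=1.0):
    """Stage-wise lower bound on class-i latency, as read from (31)."""
    rates = effective_rates(k, l, mu, f)
    total = 0.0
    for s in range(1, k[i] + 1):
        m = n - s + 1  # servers that can still work on the job at stage s
        # classes not yet finished at stage s (r > c_s after sorting by k)
        active = [r for r in range(len(k)) if k[r] >= s]
        load = sum(lam[r] / (m * rates[r]) for r in active)
        if load >= 1.0:
            raise ValueError("stage load must be below 1")
        wait = sum(lam[r] / (m * rates[r]) ** 2 for r in active) / (1.0 - load)
        total += 1.0 / (m * rates[i]) + wait
    return total


if __name__ == "__main__":
    # Two-class setup of this section: n = 10, k_1 = 5, l_1 = l_2 = 1, f = 1.
    n, lam, l = 10, [0.15, 0.5], [1.0, 1.0]
    for k2 in range(1, 11):
        k = [5, k2]
        lo = fcfs_lower_bound(1, n, k, lam, l, mu=1.0)
        hi = fcfs_upper_bound(1, n, k, lam, l, mu=1.0)
        print(k2, fcfs_stable(n, k, lam, l, mu=1.0), round(lo, 4), round(hi, 4))
```

As a sanity check on this reading, for $n = k = 1$ both bounds collapse to the M/M/1 latency $1/(\mu - \lambda)$, and (25) reduces to $\lambda < \mu$.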
6.4 Impact of General Service Time

In most practical DSS, the service times are not exponentially distributed but rather have a heavy tail, which means that there is a significant probability of very large service times. The Pareto distribution has been found to be a good fit for the service time distribution in practical DSS [45], [46]. Its cumulative distribution function is given by
$$F_S(s) = \begin{cases} 0, & s < s_m, \\ 1 - \left(\dfrac{s_m}{s}\right)^{\alpha}, & s \ge s_m. \end{cases} \tag{43}$$
Here $\alpha$ is the shape parameter and $s_m$ is the scale parameter. As the value of $\alpha$ decreases, the service time becomes more heavy-tailed, and its mean becomes infinite for $\alpha \le 1$. Figs. 10 and 11 show the impact of the Pareto service distribution on the latency and energy efficiency of DSS for $\alpha = 6$ and $1.1$, respectively. At $\alpha = 6$, the service distribution is not very heavy-tailed. So increasing $k_2$ reduces the latency of jobs of class 2 due to the increase in their effective service rate ($k_2 \mu f / l_2$). However, at $\alpha = 1.1$, the service time distribution becomes very heavy-tailed, so as $k_2$ becomes large, the increase in service time due to waiting for more servers (large $k$) outweighs the decrease due to the higher effective service rate. In both cases, we note that energy efficiency behaves inversely to the change in latency. At $\alpha = 1.1$, as $k_2$ increases from 1 to 10, energy efficiency first increases, reaches a maximum and then decreases for large $k$. We conclude that for heavy-tailed service times, there exists an optimal code rate that yields maximum energy efficiency and minimum latency.

Fig. 9. Energy efficiency increases and attains a maximum as the number of servers is increased, while latency behaves in an inverse fashion.

Fig. 11. A heavy-tailed general service distribution (Pareto distribution with $\alpha = 1.1$) results in a minimal-latency and maximum-energy-efficiency point as the code rate is increased.

6.5 Impact of Heavy-Tailed Arrival Distribution

Fig. 12 illustrates the impact of a general (Pareto) arrival time distribution on the latency and energy efficiency of DSS. We observed that when the distribution becomes heavy-tailed, latency increases (and energy efficiency decreases) with an increase in code rate. The heavy-tailed arrival distribution results in occasional very large inter-arrival times; however, the arrival rate remains the same. Since it does not significantly influence the service dynamics, we observe that the latency increases with an increase in code rate, similar to the M/M/1 case (in Fig. 7). Since latency increases, energy efficiency decreases with an increase in code rate, similar to previous results.

Fig. 12. A heavy-tailed inter-arrival time distribution (Pareto distribution with $\alpha = 1.5$) results in monotonically increasing latency (and monotonically decreasing energy efficiency) as the code rate is increased.

6.6 Impact of Number of Redundant Requests

We now explore the impact of varying the number of redundant requests (i.e., sending job requests to more than $k$ servers) on the average latency and energy efficiency of DSS. The behavior of latency is governed by two opposing factors. On the one hand, increasing the number of redundant requests reduces the service time of each job because more servers simultaneously process the same job. It also increases the energy efficiency because the servers can process more requests per unit time. On the other hand, increasing the number of redundant requests reduces the number of servers available for serving the next job in queue, thus increasing the queue size at the servers. This results in a loss of throughput and hence a plausible decrease in energy efficiency.
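The first factor can be illustrated in isolation. The sketch below is an idealization rather than the simulated system: it assumes exponential task times with rate `mu`, assumes that all `r` tasks of a job start at the same instant, and ignores queueing entirely. Under these assumptions the job finishes at the $k$-th smallest of $r$ i.i.d. exponential times, whose mean is $\frac{1}{\mu}\sum_{j=r-k+1}^{r} \frac{1}{j}$. The function name is hypothetical.

```python
def mean_service_time(k, r, mu=1.0):
    """Mean of the k-th smallest of r i.i.d. Exp(mu) task times.

    Idealized: all r tasks start together and there is no queueing.
    """
    if not 1 <= k <= r:
        raise ValueError("need 1 <= k <= r")
    return sum(1.0 / j for j in range(r - k + 1, r + 1)) / mu


if __name__ == "__main__":
    k = 5
    # Service time shrinks as the request is sent to more servers.
    for r in range(k, 11):
        print(r, round(mean_service_time(k, r), 4))
```

The mean decreases monotonically in `r`, which is the service-time gain described above; the queueing penalty of the second factor is not captured by this sketch.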
As it turns out, the first factor is more dominant than the second one, so increasing the number of redundant requests results in an overall reduction in latency (and an increase in energy efficiency). This behavior can be observed in Fig. 13, which shows the average latency of class 1 and the energy efficiency of DSS for FCFS scheduling. In this figure, the redundancy for class 1, $r_1$, is varied from 4 to 10 and the redundancy of class 2 is set to $r_2 = 10$.

Fig. 13. Sending redundant requests reduces average latency and improves energy efficiency.

7 CONCLUSIONS

In this paper, we proposed a novel multi-tenant DSS model and analyzed the energy efficiency of the system through the lens of system latency. In the proposed heterogeneous DSS, each data class can have a different job arrival rate and job size, and its data can be stored with a different fault-tolerance requirement by coding it with an appropriate $(n, k)$ MDS code. In order to evaluate the impact of various parameters of DSS on its energy efficiency, we defined a data-throughput-based energy efficiency metric for any given scheduling policy. We analytically established that the energy efficiency of DSS is inversely related to the system latency for a special case. This motivated us to further investigate the impact of various parameters on the relationship between the latency and energy efficiency of the DSS. Therefore, using a queuing-theoretic approach, we obtained bounds on the average latency for FCFS, preemptive and non-preemptive priority queuing policies. We verified the accuracy of the bounds for different settings of system parameters. The bounds, in general, are tight at high values of the service rate $\mu$ and low values of $k$. We also noted that the proposed lower bounds are tighter than a naive lower bound that follows directly from the work in [17]. Using simulations, we investigated the relationship between the average latency of data classes and the energy efficiency of DSS under various settings of system parameters.
We found that increasing the coding rate reduces the network bandwidth but increases latency and decreases energy efficiency. We also found that there exists an optimal number of servers which maximizes energy efficiency and results in near-minimal latency. We observed that for heavy-tailed service distributions (which is the case for practical DSS), there exists an optimal code rate that yields maximum energy efficiency and minimum latency. Lastly, we studied the impact of sending redundant requests on the average latency of a data class and the energy efficiency of DSS. We noted that increasing redundancy for a data class helps to reduce its average latency and, as a consequence, the overall latency decreases and the energy efficiency of DSS increases.

ACKNOWLEDGMENTS

Parts of this work were presented at the Globecom 2014 conference [1].

REFERENCES

[1] A. Kumar, R. Tandon, and T. Clancy, "On the latency of heterogeneous MDS queue," in Proc. IEEE Global Commun. Conf., Dec. 2014, pp. 2375-2380.
[2] H. Weatherspoon and J. Kubiatowicz, "Erasure coding vs. replication: A quantitative comparison," in Proc. Int. Workshop Peer-to-Peer Syst., 2002, pp. 328-338.
[3] (2010, Jul.). Colossus, successor to Google file system [Online]. Available: http://goo.gl/cuxcsm
[4] Saving capacity with HDFS RAID. (2014, Jun.) [Online]. Available: http://goo.gl/p5usvs
[5] C. Huang, H. Simitci, Y. Xu, et al., "Erasure coding in Windows Azure storage," in Proc. USENIX Conf. Annu. Techn. Conf., 2012, p. 2.
[6] (2014, Feb.). Cisco visual networking index: Global mobile data traffic forecast update, 2013-2018 [Online]. Available: http://goo.gl/ulxroo
[7] Y. Sverdlik. (2011, Sep.) Global data center energy use to grow by 19% in 2012 [Online]. Available: http://goo.gl/ck1txb
[8] D. Harnik, D. Naor, and I. Segall, "Low power mode in cloud storage systems," in Proc. IEEE Int. Symp. Parallel Distrib. Process., May 2009, pp. 1-8.
[9] D. Colarelli and D. Grunwald, "Massive arrays of idle disks for storage archives," in Proc. ACM/IEEE Conf. Supercomput., 2002, pp. 1-11.
[10] A. Verma, R. Koller, L. Useche, and R. Rangaswami, "SRCMap: Energy proportional storage using dynamic consolidation," in Proc. 8th USENIX Conf. File Storage Technol., 2010, p. 20.
[11] H. Jo, Y. Kwon, H. Kim, E. Seo, J. Lee, and S. Maeng, "SSD-HDD-hybrid virtual disk in consolidated environments," in Proc. Int. Conf. Parallel Process., 2010, pp. 375-384.
[12] D. Chen, E. Henis, R. I. Kat, et al., "Usage centric green performance indicators," SIGMETRICS Perform. Eval. Rev., vol. 39, no. 3, pp. 92-96, Dec. 2011.
[13] G. Schulz, "Measurement, metrics, and management of IT resources," in The Green and Virtual Data Center. New York, NY, USA: CRC/Auerbach, 2009.
[14] J. Levandoski, P.-A. Larson, and R. Stoica, "Identifying hot and cold data in main-memory databases," in Proc. IEEE Int. Conf. Data Eng., Apr. 2013, pp. 26-37.
[15] D. Gibson. (2012). Is your data hot, warm, or cold? [Online]. Available: http://ibmdatamag.com/2012/06/is-your-big-data-hot-warm-or-cold/
[16] R. D. Strong, "Low-latency techniques for improving system energy efficiency," Ph.D. dissertation, Univ. California, San Diego, CA, USA, 2013.
[17] G. Joshi, Y. Liu, and E. Soljanin, "On the delay-storage trade-off in content download from coded distributed storage systems," IEEE J. Sel. Areas Commun., vol. 32, no. 5, pp. 989-997, May 2014.
[18] K. Shvachko, H. Kuang, S. Radia, and R. Chansler, "The Hadoop distributed file system," in Proc. IEEE Symp. Mass Storage Syst. Technol., 2010, pp. 1-10.
[19] M. Conway, "A multiprocessor system design," in Proc. AFIPS Fall Joint Comput. Conf., 1963, pp. 139-146.
[20] E. W. Dijkstra, "Cooperating sequential processes," in Programming Languages, 1968, pp. 43-112.
[21] L. Huang, S. Pawar, H. Zhang, and K. Ramchandran, "Codes can reduce queueing delay in data centers," in Proc. IEEE Int. Symp. Inf. Theory, 2012, pp. 2766-2770.
[22] N. B. Shah, K. Lee, and K. Ramchandran, "The MDS queue," arXiv, vol. abs/1211.5405, 2012.
[23] N. Shah, K. Lee, and K. Ramchandran, "When do redundant requests reduce latency?" in Proc. Annu. Allerton Conf. Commun., Control, Comput., Oct. 2013, pp. 731-738.
[24] (2014, Jun.). How AWS pricing works [Online]. Available: http://media.amazonwebservices.com/aws_pricing_overview.pdf
[25] Google cloud storage - pricing [Online]. Available: https://cloud.google.com/storage/docs/storage-classes, May 2010.
[26] J. Plank. (2013, Dec.). Erasure codes for storage systems [Online]. Available: https://www.usenix.org/system/files/login/articles/10_plank-online.pdf
[27] N. Cao, S. Yu, Z. Yang, W. Lou, and Y. Hou, "LT codes-based secure and reliable cloud storage service," in Proc. IEEE INFOCOM, Mar. 2012, pp. 693-701.
[28] S. Aly, Z. Kong, and E. Soljanin, "Raptor codes based distributed storage algorithms for wireless sensor networks," in Proc. IEEE Int. Symp. Inf. Theory, Jul. 2008, pp. 2051-2055.
[29] S. Chen, Y. Sun, U. Kozat, L. Huang, P. Sinha, G. Liang, X. Liu, and N. Shroff, "When queueing meets coding: Optimal-latency data retrieving scheme in storage clouds," in Proc. IEEE INFOCOM, Apr. 2014, pp. 1042-1050.
[30] G. Liang and U. Kozat, "Use of erasure code for low latency cloud storage," in Proc. 52nd Annu. Allerton Conf. Commun., Control, Comput., Sep. 2014, pp. 576-581.
[31] G. Liang and U. Kozat, "Fast cloud: Pushing the envelope on delay performance of cloud storage with coding," IEEE/ACM Trans. Netw., vol. 22, no. 6, pp. 2012-2025, Dec. 2014.
[32] G. Liang and U. Kozat, "Tofec: Achieving optimal throughput-delay trade-off of cloud storage using erasure codes," in Proc. IEEE INFOCOM, Apr. 2014, pp. 826-834.
[33] Y. Xiang, T. Lan, V. Aggarwal, and Y. F. R. Chen, "Joint latency and cost optimization for erasure-coded data center storage," SIGMETRICS Perform. Eval. Rev., vol. 42, no. 2, pp. 3-14, Sep. 2014.
[34] L. Barroso and U. Holzle, "The case for energy-proportional computing," Computer, vol. 40, no. 12, pp. 33-37, Dec. 2007.
[35] D. Snowdon, S. Ruocco, and G. Heiser, "Power management and dynamic voltage scaling: Myths and facts," in Proc. Workshop Power Aware Real-Time Comput., Sep. 2005.
[36] L. L. Andrew, M. Lin, and A. Wierman, "Optimality, fairness, and robustness in speed scaling designs," SIGMETRICS Perform. Eval. Rev., vol. 38, no. 1, pp. 37-48, Jun. 2010.
[37] D. Meisner, B. T. Gold, and T. F. Wenisch, "PowerNap: Eliminating server idle power," SIGARCH Comput. Archit. News, vol. 37, no. 1, pp. 205-216, Mar. 2009.
[38] Y. Liu, S. Draper, and N. S. Kim, "Queuing theoretic analysis of power-performance tradeoff in power-efficient computing," in Proc. Conf. Inf. Sci. Syst., Mar. 2013, pp. 1-6.
[39] Y. Liu, S. Draper, and N. S. Kim, "SleepScale: Runtime joint speed scaling and sleep states management for power efficient data centers," in Proc. IEEE Int. Symp. Comput. Archit., Jun. 2014, pp. 313-324.
[40] D. Bertsekas and R. Gallager, "Delay models in data networks," in Data Networks, 2nd ed. Upper Saddle River, NJ, USA: Prentice-Hall, 1992, pp. 203-206.
[41] H. C. Tijms, A First Course in Stochastic Models. New York, NY, USA: Wiley, 2003.
[42] E. Varki, A. Merchant, and H. Chen, "The M/M/1 fork-join queue with variable sub-tasks" [Online]. Available: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.100.3062, Mar. 2006.
[43] S. Ross, A First Course in Probability. Englewood Cliffs, NJ, USA: Prentice-Hall, 2002.
[44] A. Kumar, R. Tandon, and T. C. Clancy. (2015). On the latency and energy efficiency of erasure-coded cloud storage systems. arXiv, vol. abs/1405.2833v2 [Online]. Available: http://arxiv.org/abs/1405.2833v2
[45] M. Crovella and A. Bestavros, "Self-similarity in world wide web traffic: Evidence and possible causes," IEEE/ACM Trans. Netw., vol. 5, no. 6, pp. 835-846, Dec. 1997.
[46] M. Faloutsos, P. Faloutsos, and C. Faloutsos, "On power-law relationships of the internet topology," SIGCOMM Comput. Commun. Rev., vol. 29, no. 4, pp. 251-262, Aug. 1999.

Akshay Kumar (S'11) received the BTech degree in electrical engineering from the Indian Institute of Technology, Guwahati (IIT Guwahati) in May 2010, and the MS degree in electrical engineering from Virginia Tech in November 2012. He is currently working toward the PhD degree in the Bradley Department of Electrical and Computer Engineering, Virginia Tech. His research interests include modeling and analysis of distributed storage systems. He is a student member of the IEEE.

Ravi Tandon (S'03-M'09) received the BTech degree in electrical engineering from the Indian Institute of Technology, Kanpur (IIT Kanpur) in May 2004, and the PhD degree in electrical and computer engineering from the University of Maryland, College Park (UMCP) in June 2010. From July 2010 to July 2012, he was a postdoctoral research associate at Princeton University. Since July 2012, he has been with Virginia Tech, where he is currently a research assistant professor in the Discovery Analytics Center and the Department of Computer Science. His research interests are in the areas of network information theory for wireless networks, information theoretic security, machine learning and cloud storage systems. He received the Best Paper Award at the Communication Theory symposium at the 2011 IEEE Global Communications Conference. He is a member of the IEEE.

T. Charles Clancy (S'02-M'06-SM'10) received the BS degree in computer engineering from the Rose-Hulman Institute of Technology, the MS degree in electrical engineering from the University of Illinois, and the PhD degree in computer science from the University of Maryland. He is an associate professor of electrical and computer engineering at Virginia Tech and the director of the Hume Center for National Security and Technology. Prior to joining Virginia Tech in 2010, he served as a senior researcher at the Laboratory for Telecommunications Sciences, a defense research lab at the University of Maryland, where he led research programs in software-defined and cognitive radio. He has more than 100 peer-reviewed technical publications. His current research interests include cognitive communications and spectrum security. He is a senior member of the IEEE.