D2P: A Distributed Deadline Propagation Approach to Tolerate Long-Tail Latency in Datacenters


Rui Ren, Jiuyue Ma, Xiufeng Sui, Yungang Bao
Institute of Computing Technology, Chinese Academy of Sciences
{renrui, suixiufeng, baoyg}@ict.ac.cn, majiuyue@ncic.ac.cn

Abstract

We propose a Distributed Deadline Propagation (D2P) approach for datacenter applications to tolerate latency variability. The key idea of D2P is to allow local nodes to perceive global deadline information and to propagate that information among distributed nodes. Local nodes can leverage the information to schedule requests and adjust processing speed, reducing latency variability. Preliminary experimental results show that D2P has the potential to reduce long-tail latency in datacenters by leveraging propagated deadline information on local nodes.

1 Introduction

Time is money. For Internet companies, the response time of online services strongly affects user experience, which is a key factor for revenue. For instance, Amazon found that every 100ms increase in the load time of Amazon.com decreases sales by 1% [13]; Google's advertising revenue declines by 20% when response time increases from 0.4s to 0.9s [2]; and Bing's per-user revenue declines by 4.3% when response time increases from 50ms to 2000ms [20].

For the sake of users, datacenter operators usually overprovision resources to guarantee the QoS of these latency-critical applications, even if doing so lowers resource utilization. For instance, Google [5] reports that the CPU utilization of 20,000 servers in a typical online-services datacenter averaged about 30% from January to March 2013. In contrast, batch-workload datacenters averaged 75% utilization during the same period [5].

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Permissions@acm.org.
APSys '14, June 25-26, 2014, Beijing, China. Copyright 2014 ACM.

Resource sharing is an effective approach to improving datacenter resource utilization, but it also introduces unpredictable performance variability due to interference. Even worse, the variability within a server can be amplified by scale in datacenters [8, 9]; this long-tail phenomenon severely degrades the responsiveness of latency-sensitive services. Datacenter operators therefore usually have to make a tough tradeoff between application QoS and datacenter utilization: either disregard QoS to maximize utilization, or disallow the colocation of latency-critical online applications with other applications to guarantee QoS.

Since latency variability is inevitable, and is impractical to eliminate fully in shared environments, tolerating latency variability (i.e., tail-tolerant computing [9]) has become an important issue in datacenters. There are studies on latency analysis [13, 14, 25], on reducing latency variability for large fan-out search-engine systems through selective replication and backup-request techniques [8], and on reducing network latency [4, 23, 24, 26]. However, these techniques are unsuitable for sequential/dependent applications. In this paper, we use a stage-service model to formalize datacenter applications, and then propose a Distributed Deadline Propagation (D2P) approach to tolerate latency variability for latency-critical applications.
The idea of D2P is inspired by the traffic-light system in Manhattan, New York City, where one can enjoy a chain of green lights after stopping at a red light. In datacenters, latency variability at a node (an analogy to a red light) alters the real-time laxity of subsequent steps, which are, however, unaware of the changed real-time requirements. We implement D2P in Dubbo [1], an open-source distributed service framework. Our preliminary experimental results show that D2P is able to reduce latency variability.

Figure 1: Example of variability-range analysis. Left (from [6]): execution-time distribution of Equake (SPEC CPU2000, 10,000 runs) on a single machine; avg = 2.21, max = 2.3, min = 2.18 (x10^5 us), so the variability range is (2.3 - 2.18)/2.21 * 100% ≈ 0.5%. Right (from [13]): latency profile of a Google backend service; avg = 5ms, max > 300ms, min < 1ms, so the variability range is greater than (300 - 1)/5 * 100% = 600%.

Figure 2: Latency variability is significantly amplified as the depth of service stages increases (response-time distributions for stage-1 through stage-5).

2 Datacenter Applications and Variability

2.1 Datacenter Application Patterns

There are three basic patterns: (1) Partition/Aggregate pattern. This pattern is used by applications such as web search and by data-processing services like MapReduce and Dryad, which scale out by partitioning tasks into many sub-tasks and assigning them to worker machines (possibly at multiple layers) [26]. The fan-out can often be 100X or even 1000X. (2) Sequential/Dependent pattern. In this pattern, requests are processed sequentially, and subsequent requests depend on the results of previous ones. Online shopping services are typical examples. (3) Hybrid pattern combining Sequential/Dependent and Partition/Aggregate. In practice, the hybrid pattern is the most common in datacenters. For example, in Facebook [17], a single user request can result in hundreds of memcached requests that form a dependency graph. When a dependency graph is executed in a datacenter, requests without dependencies can be sent concurrently. Since there is existing work on the Partition/Aggregate pattern [8, 9], this work focuses on the latter two patterns.

2.2 Latency Variability and Long Tail

In both single-machine and datacenter environments, performance (latency) variability is inevitable [6, 13], due to resource sharing, queuing, background maintenance activities, etc.
In datacenters, however, the variability within a server can be amplified when applications are scaled out [8, 9]. For example, Figure 1 compares the performance variability of the execution-time distribution of Equake on a single machine [6] with that of Google backend services in a datacenter [13]. We define the variability range as ((Max - Min)/Average) * 100%. By this definition, the variability range on a single machine is less than 1%, but in a datacenter environment it can exceed 600%. The reason for this large variability is that a user request is divided into a number of sub-requests that are assigned to hundreds or even thousands of machines, so the response time of the request depends on the slowest machine. Assume that only 1% of sub-requests on each machine suffer a slow processing time, e.g., more than one second. If a user request must be processed on 100 such machines in parallel, then 63% of user requests will take longer than one second [9]. This is the long-tail phenomenon.

The long-tail phenomenon exists not only in the Partition/Aggregate pattern but also in the Sequential/Dependent and Hybrid patterns. For example, we implemented a simple five-stage service and measured the response time of each stage. Figure 2 shows that the variance of response time in this five-stage service is exacerbated by 51.2%; in particular, the 90th-percentile and 95th-percentile latencies increase by 2.6X and 2.4X, respectively, from stage-1 to stage-5.

3 Distributed Deadline Propagation (D2P)

In this work we focus on tolerating the long tail in the Sequential/Dependent and Hybrid patterns.

3.1 Stage-Service Model

We define the Stage-Service Model (SSM) to describe services containing these two patterns. In SSM, applications are composed of multiple service stages connected by request queues, as shown in Figure 3.
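The fan-out arithmetic behind the 63% figure in Section 2.2 can be checked with a short calculation (a sketch; the function name is ours, and the 1%-slow/100-machine numbers are from [9]):

```python
# Sketch: why rare per-machine slowness dominates at scale.
# Assumes each of `fanout` sub-requests is independently slow
# with probability `p_slow`; the request is slow if any one is.

def tail_probability(p_slow: float, fanout: int) -> float:
    """Probability that a fan-out request hits at least one slow sub-request."""
    return 1.0 - (1.0 - p_slow) ** fanout

# 1% slow sub-requests, 100-way fan-out: ~63% of user requests are slow.
print(round(tail_probability(0.01, 100), 2))  # -> 0.63
```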
The parameters of SSM are:

N: the depth of service stages.
S_i: the i-th service stage in the processing of the application, 1 <= i <= N.
L_i: the processing time of the i-th service stage. If there are fan-out requests at some service stage in the Hybrid pattern, L_i includes the processing time of the fan-out requests.
P_i: the total processing latency after the i-th service stage, P_i = P_{i-1} + L_i.
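As a minimal illustration of the SSM bookkeeping (the function name is ours, not from the paper's implementation), the cumulative latency P_i is simply a running sum of the per-stage times L_i:

```python
# Minimal sketch of SSM bookkeeping: cumulative latency P_i is the
# running sum of per-stage processing times L_i. Illustrative only.

def cumulative_latencies(stage_latencies):
    """Given [L_1, ..., L_N], return [P_1, ..., P_N] with P_i = P_{i-1} + L_i."""
    totals, running = [], 0.0
    for l_i in stage_latencies:
        running += l_i
        totals.append(running)
    return totals

print(cumulative_latencies([10.0, 25.0, 5.0]))  # -> [10.0, 35.0, 40.0]
```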

Figure 3: Stage-Service Model. A service has N stages (Request -> Stage #1 -> Stage #2 -> ... -> Stage #N); at the end of each stage i we can obtain the cumulative processing latency P_i.

3.2 D2P Based on the Stage-Service Model

The long-tail phenomenon mainly results from inappropriately overlaying the service latency of multiple stages and from missing global processing-time information. For example, in Figure 3, if the service latencies of both Stage 2 and Stage 3 for a request fall into tail regions, the final response time of the request is exacerbated. To address the long-tail problem, we propose a Distributed Deadline Propagation (D2P) approach that dynamically updates the global deadline information of requests and propagates it through the datacenter during the requests' whole lifecycle. Here, for a given user request, the deadline is measured as the difference between the expected response time and the elapsed time. Upon a user request arriving at a datacenter, its Init_deadline is initialized to the expected response time, and the deadline information is then dynamically updated according to formula (1):

    Init_deadline = expected_response_time
    deadline_si = Init_deadline - elapsed_time_si,  (1 <= i <= N, elapsed_time_si = P_i)    (1)

At the same time, we define percentage_elapsed_time according to formula (2), which can be used to describe a request's priority:

    percentage_elapsed_time_si = elapsed_time_si / Init_deadline,  (1 <= i <= N, elapsed_time_si = P_i)    (2)

The time information of each request (deadline, percentage_elapsed_time) is propagated between stages. Each service stage can use this information for scheduling, resource allocation, or other techniques to accelerate requests with urgent deadlines.

3.3 D2P-Enabled Distributed Framework

D2P is a design methodology and can be implemented in various distributed systems. Figure 4 illustrates a typical D2P-enabled multiple-phase distributed service framework that consists of three major steps: (1) assign deadline information to a request; (2) use the deadline information to schedule requests and/or adjust their processing speed; (3) update and propagate the deadline information.

Step 1: When a new request arrives at the front-end servers of the framework, a deadline field (deadline, percentage_elapsed_time) is appended to it and assigned an initial value. Once assigned, the deadline information is propagated along with the original request during its whole lifecycle. In our experiments, we predefine the expected response time, because latency-sensitive services usually have a cut-off latency (e.g., 200ms), and set the initial value of percentage_elapsed_time to zero. An alternative is to use a machine-learning approach to learn the expected response time automatically from profiling data.

Step 2: When a node receives a modified request carrying deadline information, it first extracts the information and computes the deadline laxity, which determines the scheduling and processing priority of the request. In particular, we provide APIs that allow programmers to use the deadline information to control the request. In our experiments, we implement the APIs by attaching Thread Local Storage (TLS) to each thread and extracting the deadline information into TLS for the programmer to reference. Various techniques can then control request processing, such as scheduling and acceleration; for example, the deadline laxity can be used by real-time scheduling algorithms such as Least Laxity First (LLF) [3].
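The laxity-based dispatch that Step 2 enables can be sketched with a simple priority queue (a toy model with illustrative names, not the scheduler used in our experiments):

```python
import heapq

# Toy sketch of deadline-aware dispatch at one stage: requests carrying a
# smaller deadline laxity (equivalently, a larger percentage_elapsed_time)
# are served first, as in Least Laxity First scheduling.

def llf_order(requests):
    """requests: list of (request_id, deadline_laxity_ms). Returns ids in
    the order a Least-Laxity-First scheduler would serve them."""
    heap = [(laxity, rid) for rid, laxity in requests]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[1] for _ in range(len(heap))]

# A request that lost time upstream (20ms laxity left) jumps ahead of one
# that is comfortably early (160ms laxity left).
print(llf_order([("a", 160.0), ("b", 20.0), ("c", 90.0)]))  # -> ['b', 'c', 'a']
```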
Step 3: When the request finishes at a service stage, the framework records the elapsed time and calculates new values of deadline and percentage_elapsed_time using formulas (1) and (2), which are written into the deadline-information field of the request. The new (deadline, percentage_elapsed_time) pair is then sent to the next service stage.
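The three steps above can be sketched end-to-end as generic RPC hooks. This is an illustrative model under assumed names; it is not the actual API of our implementation or of Dubbo:

```python
import threading

# End-to-end sketch of the three steps as generic RPC hooks. The function
# names and the dict-based request format are illustrative assumptions.

_tls = threading.local()  # stands in for the per-thread TLS slot

def assign_deadline(payload, expected_response_time_ms):
    """Step 1: attach the initial deadline field to a new request."""
    return {"data": payload,
            "deadline": expected_response_time_ms,
            "init_deadline": expected_response_time_ms,
            "elapsed": 0.0, "pct_elapsed": 0.0}

def handle_at_stage(request, handler, stage_latency_ms):
    """Steps 2 and 3: extract deadline info to TLS, run the stage handler,
    then update the field with formulas (1) and (2) before forwarding."""
    _tls.deadline_info = request                          # step 2: extract
    result = handler(request["data"])
    request["elapsed"] += stage_latency_ms                # step 3: update
    request["deadline"] = request["init_deadline"] - request["elapsed"]
    request["pct_elapsed"] = request["elapsed"] / request["init_deadline"]
    request["data"] = result
    return request                                        # propagate onward

req = assign_deadline("query", expected_response_time_ms=200.0)
req = handle_at_stage(req, lambda d: d.upper(), stage_latency_ms=150.0)
print(req["deadline"], req["pct_elapsed"])  # -> 50.0 0.75
```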

Figure 4: D2P-enabled framework. ❶ Assign deadline information to a request (a manually defined or profiled deadline constraint; the request carries (data, deadline, elapsed_time%)); ❷ call APIs to use the deadline information (scheduling by the deadline field; the deadline is saved into the TLS of a thread-pool thread); ❸ generate the new deadline field and send the request to the next service stage.

4 Leveraging D2P to Tolerate Variability

There are mainly three kinds of techniques to reduce contention for unmanaged shared resources and to enforce QoS requirements. (1) Scheduling. Application-level scheduling [10, 27] and distributed real-time scheduling can co-locate applications with different resource requirements and schedule urgent requests to be processed in time. (2) Resource-allocation adjustment. This is another effective approach, covering machine-level resources (VMs) allocated by hypervisors, OS-level resources (e.g., I/O bandwidth) allocated by OS containers, and architecture-level resources (e.g., shared cache) allocated by page coloring [15, 16]. (3) Trading precision for execution time [19]. For Internet online applications, the precision of an advertisement is iteratively refined; the more time an application spends, the more precise the ads a user gets. This tradeoff can be used to adjust performance per user request according to its deadline requirement, as shown in Figure 5(c). Since D2P allows local nodes to perceive a request's global time information, how best to leverage that information remains an open problem.

5 Evaluation and Discussion

5.1 Experimental Setup and Evaluation

We implement a D2P-enabled framework based on the Dubbo distributed service framework [1] developed by Alibaba. Dubbo is a key part of Alibaba's SOA solution and is deployed throughout alibaba.com, serving 2,000+ services with more than 3 billion invocations every day.
It is also used by dozens of well-known web service providers in China. In the Dubbo framework, an application is divided into clients and servers that communicate via RPC. We implement the D2P mechanism by hooking client and server invocations. As illustrated in Figure 4, ❶ after a client invocation, the framework packs the deadline information into the RPC request; ❷ before a server invocation, the framework extracts the deadline information and copies it into the TLS of the target service thread, as explained in Section 3.3; ❸ after the server finishes processing a request, the framework adds the processing latency to the deadline information stored in TLS, re-packs it into the RPC response, and propagates the updated deadline information back to the client through the RPC callback.

We then implement a priority-based scheduling strategy on the server side that uses the global dynamic deadline information (see Section 3.2) as the priority, in order to evaluate the effect of D2P-enabled deadline-aware scheduling. Specifically: (i) requests are issued from the client (front end) to the servers at service stage S_1 at a rate of 100 requests/second; we assume the request arrival rate and service time follow a normal distribution, simulate this process on the server, and assume there is a single bottleneck node whose queue buffer is almost full. (ii) We use the predefined-deadline method (Step 1 in Section 3.3) to assign a cut-off response time as Init_deadline, here set to 200ms. (iii) After requests have been processed, we record the request processing time as elapsed_time, compute the processing-latency distribution of service stage S_1, and update the deadline field of the requests using formulas (1) and (2).
(iv) The updated deadline information is then sent to the next service stage S_2, and percentage_elapsed_time_s1 is used as the request priority at stage S_2: the larger a request's percentage_elapsed_time_s1, the higher its priority, meaning the request is more urgent to schedule. (v) When requests complete at this service stage, the deadline information is updated and sent to the next service stage, and so on. Throughout this process, the request scheduler of each service stage S_i uses the value of percentage_elapsed_time_s(i-1) as the request priority.

Figure 5(a) shows that latency variability is amplified by 2X (from 50ms to 110ms) under the FIFO scheduling strategy. With the D2P-enabled deadline-aware scheduling strategy, this amplification effect is reduced by about 10%, and the standard deviation of response time is reduced by 22.5%. These results indicate that the D2P approach is useful in distributed systems for tolerating latency variability via real-time scheduling.

Figure 5(b) shows the effect of the performance-adjustment policy based on the tradeoff in Figure 5(c): for the response-time distribution of the five-stage service application, the standard deviation of response time is reduced from 9.28 to 5.77, an improvement of about 37.8%.

Figure 5: (a) Effect of D2P-enabled deadline-aware scheduling: compared with FIFO scheduling, D2P-enabled scheduling reduces response-time variability by 22.5% in terms of standard deviation. (b) Effect of D2P-enabled performance adjustment: on a simulated performance-adjustable server leveraging the tradeoff in Figure 5(c), D2P-enabled processing reduces response-time variability by 37.8% in terms of standard deviation. (c) Trade-off between execution time and ad precision [19].

5.2 Discussion

Apart from the Stage-Service Model in distributed systems discussed in this paper, there are many other situations where the D2P approach can be employed. All these scenarios can be abstracted as multi-phase resource sharing, such as a memory hierarchy or a network switch. In addition, reducing the latency variability of Partition/Aggregate applications is also an important problem, which needs further investigation and is our future work.

6 Related Work

Latency in the long tail: Datacenters run many latency-critical applications that are extremely important for the revenue of Internet companies, so reducing long-tail delay has become an important issue. Some work focuses on reducing network latency by removing network congestion and prioritizing flows [26, 4, 23, 24].
Other work analyzes or diagnoses bottlenecks in large-scale systems through techniques such as model and attribute analysis [13, 14] to understand latency variations, an improved BSP model [7], and co-scheduling methods like Bobtail [25] to avoid long tails. Dean and Barroso describe Google's use of techniques such as container-based isolation, priority management, and backup requests [8, 9], but these approaches are only suitable for the Partition/Aggregate pattern. In this work, D2P focuses on applications containing the Sequential/Dependent and Hybrid patterns, which can be represented by the Stage-Service Model. Besides datacenter applications, the mobile-app marketplace is also concerned with the latency of interactive applications. For example, Microsoft uses AppInsight [18] and Timecard [19] to monitor mobile-app performance and control user-perceived delays in server-based mobile applications. By contrast, D2P focuses on datacenter traffic and leverages the distributed framework Dubbo.

QoS guarantees: Hardware platforms that enforce QoS priorities have been proposed [5, 11], such as CQoS [11] and a QoS-enabled memory architecture for CMP platforms [12]. Software solutions such as compilation techniques can also improve performance and enforce QoS; for example, Tang et al. proposed QoS-Compile [21] and ReQoS [22]. These QoS-guarantee techniques improve only intra-node QoS, while D2P aims to tolerate latency variability across nodes in distributed environments.

7 Conclusions

In this paper, we propose a Distributed Deadline Propagation (D2P) approach to tolerate latency variability for applications containing the Sequential/Dependent pattern, and we design and implement the idea in an extensively used distributed framework. Experimental results show that D2P is able to tolerate variability. However, many open problems remain, such as additional scenarios, hardware acceleration, and architectural support.

References

[1] Dubbo distributed service framework. alibabatech.com/wiki/display/dubbo/home.
[2] Google's Marissa Mayer: Speed wins. com/blog/btl/googles-marissa-mayer-speed-wins/.
[3] Least laxity first. slack_time_scheduling.
[4] ALIZADEH, M., GREENBERG, A., MALTZ, D. A., PADHYE, J., PATEL, P., PRABHAKAR, B., SENGUPTA, S., AND SRIDHARAN, M. Data center TCP (DCTCP). In Proceedings of the ACM SIGCOMM 2010 Conference (New York, NY, USA, 2010), SIGCOMM '10, ACM.
[5] BARROSO, L. A., CLIDARAS, J., AND HOLZLE, U. The datacenter as a computer: An introduction to the design of warehouse-scale machines. Synthesis Lectures on Computer Architecture 8, 3 (2013).
[6] CHEN, T., CHEN, Y., GUO, Q., TEMAM, O., WU, Y., AND HU, W. Statistical performance comparisons of computers. In High Performance Computer Architecture (HPCA), 2012 IEEE 18th International Symposium on (2012), IEEE.
[7] CIPAR, J., HO, Q., KIM, J. K., LEE, S., GANGER, G. R., GIBSON, G., KEETON, K., AND XING, E. Solving the straggler problem with bounded staleness. In Proceedings of the 14th USENIX Conference on Hot Topics in Operating Systems (Berkeley, CA, USA, 2013), HotOS '13, USENIX Association.
[8] DEAN, J. Achieving rapid response times in large online services. In Berkeley AMPLab Cloud Seminar (2012).
[9] DEAN, J., AND BARROSO, L. A. The tail at scale. Communications of the ACM 56, 2 (2013).
[10] DELIMITROU, C., AND KOZYRAKIS, C. Paragon: QoS-aware scheduling for heterogeneous datacenters. In Proceedings of the Eighteenth International Conference on Architectural Support for Programming Languages and Operating Systems (New York, NY, USA, 2013), ASPLOS '13, ACM.
[11] IYER, R. CQoS: A framework for enabling QoS in shared caches of CMP platforms. In Proceedings of the 18th Annual International Conference on Supercomputing (2004), ACM.
[12] IYER, R., ZHAO, L., GUO, F., ILLIKKAL, R., MAKINENI, S., NEWELL, D., SOLIHIN, Y., HSU, L., AND REINHARDT, S. QoS policies and architecture for cache/memory in CMP platforms. In ACM SIGMETRICS Performance Evaluation Review (2007), vol. 35, ACM.
[13] KRUSHEVSKAJA, D., AND SANDLER, M. Understanding latency variations of black box services. In Proceedings of the 22nd International Conference on World Wide Web (Republic and Canton of Geneva, Switzerland, 2013), WWW '13, International World Wide Web Conferences Steering Committee.
[14] OSTROWSKI, K., MANN, G., AND SANDLER, M. Diagnosing latency in multi-tier black-box services. In 5th Workshop on Large Scale Distributed Systems and Middleware (LADIS 2011) (2011).
[15] LIN, J., LU, Q., DING, X., ZHANG, Z., ZHANG, X., AND SADAYAPPAN, P. Gaining insights into multicore cache partitioning: Bridging the gap between simulation and real systems. In High Performance Computer Architecture (HPCA), 2008 IEEE 14th International Symposium on (2008), IEEE.
[16] LIU, L., CUI, Z., XING, M., BAO, Y., CHEN, M., AND WU, C. A software memory partition approach for eliminating bank-level interference in multicore systems. In Proceedings of the 21st International Conference on Parallel Architectures and Compilation Techniques (2012), ACM.
[17] NISHTALA, R., FUGAL, H., GRIMM, S., KWIATKOWSKI, M., LEE, H., LI, H. C., MCELROY, R., PALECZNY, M., PEEK, D., SAAB, P., STAFFORD, D., TUNG, T., AND VENKATARAMANI, V. Scaling memcache at Facebook. In Proceedings of the 10th USENIX Conference on Networked Systems Design and Implementation (Berkeley, CA, USA, 2013), NSDI '13, USENIX Association.
[18] RAVINDRANATH, L., PADHYE, J., AGARWAL, S., MAHAJAN, R., OBERMILLER, I., AND SHAYANDEH, S. AppInsight: Mobile app performance monitoring in the wild. In Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation (Berkeley, CA, USA, 2012), OSDI '12, USENIX Association.
[19] RAVINDRANATH, L., PADHYE, J., MAHAJAN, R., AND BALAKRISHNAN, H. Timecard: Controlling user-perceived delays in server-based mobile applications. In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles (New York, NY, USA, 2013), SOSP '13, ACM.
[20] SCHURMAN, E., AND BRUTLAG, J. The user and business impact of server delays.
[21] TANG, L., MARS, J., AND SOFFA, M. L. Compiling for niceness: Mitigating contention for QoS in warehouse scale computers. In Proceedings of the Tenth International Symposium on Code Generation and Optimization (New York, NY, USA, 2012), CGO '12, ACM.
[22] TANG, L., MARS, J., WANG, W., DEY, T., AND SOFFA, M. L. ReQoS: Reactive static/dynamic compilation for QoS in warehouse scale computers. In Proceedings of the Eighteenth International Conference on Architectural Support for Programming Languages and Operating Systems (New York, NY, USA, 2013), ASPLOS '13, ACM.
[23] VAMANAN, B., HASAN, J., AND VIJAYKUMAR, T. Deadline-aware datacenter TCP (D2TCP). In Proceedings of the ACM SIGCOMM 2012 Conference (New York, NY, USA, 2012), SIGCOMM '12, ACM.
[24] WILSON, C., BALLANI, H., KARAGIANNIS, T., AND ROWTRON, A. Better never than late: Meeting deadlines in datacenter networks. In Proceedings of the ACM SIGCOMM 2011 Conference (New York, NY, USA, 2011), SIGCOMM '11, ACM.
[25] XU, Y., MUSGRAVE, Z., NOBLE, B., AND BAILEY, M. Bobtail: Avoiding long tails in the cloud. In Proceedings of the 10th USENIX Conference on Networked Systems Design and Implementation (Berkeley, CA, USA, 2013), NSDI '13, USENIX Association.
[26] ZATS, D., DAS, T., MOHAN, P., BORTHAKUR, D., AND KATZ, R. DeTail: Reducing the flow completion time tail in datacenter networks. In Proceedings of the ACM SIGCOMM 2012 Conference (New York, NY, USA, 2012), SIGCOMM '12, ACM.
[27] ZHURAVLEV, S., BLAGODUROV, S., AND FEDOROVA, A. Addressing shared resource contention in multicore processors via scheduling. In Proceedings of the Fifteenth Edition of ASPLOS on Architectural Support for Programming Languages and Operating Systems (New York, NY, USA, 2010), ASPLOS XV, ACM.


More information

This is a repository copy of M21TCP: Overcoming TCP Incast Congestion in Data Centres.

This is a repository copy of M21TCP: Overcoming TCP Incast Congestion in Data Centres. This is a repository copy of M21TCP: Overcoming TCP Incast Congestion in Data Centres. White Rose Research Online URL for this paper: http://eprints.whiterose.ac.uk/89460/ Version: Accepted Version Proceedings

More information

IX: A Protected Dataplane Operating System for High Throughput and Low Latency

IX: A Protected Dataplane Operating System for High Throughput and Low Latency IX: A Protected Dataplane Operating System for High Throughput and Low Latency Adam Belay et al. Proc. of the 11th USENIX Symp. on OSDI, pp. 49-65, 2014. Presented by Han Zhang & Zaina Hamid Challenges

More information

Tortoise vs. hare: a case for slow and steady retrieval of large files

Tortoise vs. hare: a case for slow and steady retrieval of large files Tortoise vs. hare: a case for slow and steady retrieval of large files Abstract Large file transfers impact system performance at all levels of a network along the data path from source to destination.

More information

Staged Memory Scheduling

Staged Memory Scheduling Staged Memory Scheduling Rachata Ausavarungnirun, Kevin Chang, Lavanya Subramanian, Gabriel H. Loh*, Onur Mutlu Carnegie Mellon University, *AMD Research June 12 th 2012 Executive Summary Observation:

More information

Per-Packet Load Balancing in Data Center Networks

Per-Packet Load Balancing in Data Center Networks Per-Packet Load Balancing in Data Center Networks Yagiz Kaymak and Roberto Rojas-Cessa Abstract In this paper, we evaluate the performance of perpacket load in data center networks (DCNs). Throughput and

More information

Migration Based Page Caching Algorithm for a Hybrid Main Memory of DRAM and PRAM

Migration Based Page Caching Algorithm for a Hybrid Main Memory of DRAM and PRAM Migration Based Page Caching Algorithm for a Hybrid Main Memory of DRAM and PRAM Hyunchul Seok Daejeon, Korea hcseok@core.kaist.ac.kr Youngwoo Park Daejeon, Korea ywpark@core.kaist.ac.kr Kyu Ho Park Deajeon,

More information

Data Centers and Cloud Computing. Slides courtesy of Tim Wood

Data Centers and Cloud Computing. Slides courtesy of Tim Wood Data Centers and Cloud Computing Slides courtesy of Tim Wood 1 Data Centers Large server and storage farms 1000s of servers Many TBs or PBs of data Used by Enterprises for server applications Internet

More information

Towards Makespan Minimization Task Allocation in Data Centers

Towards Makespan Minimization Task Allocation in Data Centers Towards Makespan Minimization Task Allocation in Data Centers Kangkang Li, Ziqi Wan, Jie Wu, and Adam Blaisse Department of Computer and Information Sciences Temple University Philadelphia, Pennsylvania,

More information

TCP Incast problem Existing proposals

TCP Incast problem Existing proposals TCP Incast problem & Existing proposals Outline The TCP Incast problem Existing proposals to TCP Incast deadline-agnostic Deadline-Aware Datacenter TCP deadline-aware Picasso Art is TLA 1. Deadline = 250ms

More information

QoS-Aware Admission Control in Heterogeneous Datacenters

QoS-Aware Admission Control in Heterogeneous Datacenters QoS-Aware Admission Control in Heterogeneous Datacenters Christina Delimitrou, Nick Bambos and Christos Kozyrakis Stanford University ICAC June 28 th 2013 Cloud DC Scheduling Workloads DC Scheduler S S

More information

CORAL: A Multi-Core Lock-Free Rate Limiting Framework

CORAL: A Multi-Core Lock-Free Rate Limiting Framework : A Multi-Core Lock-Free Rate Limiting Framework Zhe Fu,, Zhi Liu,, Jiaqi Gao,, Wenzhe Zhou, Wei Xu, and Jun Li, Department of Automation, Tsinghua University, China Research Institute of Information Technology,

More information

Data Centers and Cloud Computing. Data Centers

Data Centers and Cloud Computing. Data Centers Data Centers and Cloud Computing Slides courtesy of Tim Wood 1 Data Centers Large server and storage farms 1000s of servers Many TBs or PBs of data Used by Enterprises for server applications Internet

More information

15-744: Computer Networking. Data Center Networking II

15-744: Computer Networking. Data Center Networking II 15-744: Computer Networking Data Center Networking II Overview Data Center Topology Scheduling Data Center Packet Scheduling 2 Current solutions for increasing data center network bandwidth FatTree BCube

More information

Real-Time Internet of Things

Real-Time Internet of Things Real-Time Internet of Things Chenyang Lu Cyber-Physical Systems Laboratory h7p://www.cse.wustl.edu/~lu/ Internet of Things Ø Convergence of q Miniaturized devices: integrate processor, sensors and radios.

More information

TAPS: Software Defined Task-level Deadline-aware Preemptive Flow scheduling in Data Centers

TAPS: Software Defined Task-level Deadline-aware Preemptive Flow scheduling in Data Centers 25 44th International Conference on Parallel Processing TAPS: Software Defined Task-level Deadline-aware Preemptive Flow scheduling in Data Centers Lili Liu, Dan Li, Jianping Wu Tsinghua National Laboratory

More information

Improving Multipath TCP for Latency Sensitive Flows in the Cloud

Improving Multipath TCP for Latency Sensitive Flows in the Cloud 2016 5th IEEE International Conference on Cloud Networking Improving Multipath TCP for Latency Sensitive Flows in the Cloud Wei Wang,Liang Zhou,Yi Sun Institute of Computing Technology, CAS, University

More information

SEER: LEVERAGING BIG DATA TO NAVIGATE THE COMPLEXITY OF PERFORMANCE DEBUGGING IN CLOUD MICROSERVICES

SEER: LEVERAGING BIG DATA TO NAVIGATE THE COMPLEXITY OF PERFORMANCE DEBUGGING IN CLOUD MICROSERVICES SEER: LEVERAGING BIG DATA TO NAVIGATE THE COMPLEXITY OF PERFORMANCE DEBUGGING IN CLOUD MICROSERVICES Yu Gan, Yanqi Zhang, Kelvin Hu, Dailun Cheng, Yuan He, Meghna Pancholi, and Christina Delimitrou Cornell

More information

Quality-Assured Cloud Bandwidth Auto-Scaling for Video-on-Demand Applications

Quality-Assured Cloud Bandwidth Auto-Scaling for Video-on-Demand Applications Quality-Assured Cloud Bandwidth Auto-Scaling for Video-on-Demand Applications Di Niu, Hong Xu, Baochun Li University of Toronto Shuqiao Zhao UUSee, Inc., Beijing, China 1 Applications in the Cloud WWW

More information

Survey on MapReduce Scheduling Algorithms

Survey on MapReduce Scheduling Algorithms Survey on MapReduce Scheduling Algorithms Liya Thomas, Mtech Student, Department of CSE, SCTCE,TVM Syama R, Assistant Professor Department of CSE, SCTCE,TVM ABSTRACT MapReduce is a programming model used

More information

Dynamically Provisioning Distributed Systems to Meet Target Levels of Performance, Availability, and Data Quality

Dynamically Provisioning Distributed Systems to Meet Target Levels of Performance, Availability, and Data Quality Dynamically Provisioning Distributed Systems to Meet Target Levels of Performance, Availability, and Data Quality Amin Vahdat Department of Computer Science Duke University 1 Introduction Increasingly,

More information

Low Latency Datacenter Networking: A Short Survey

Low Latency Datacenter Networking: A Short Survey Low Latency Datacenter Networking: A Short Survey Shuhao Liu, Hong Xu, Zhiping Cai Department of Computer Science, City University of Hong Kong College of Computer, National University of Defence Technology

More information

A Framework for Providing Quality of Service in Chip Multi-Processors

A Framework for Providing Quality of Service in Chip Multi-Processors A Framework for Providing Quality of Service in Chip Multi-Processors Fei Guo 1, Yan Solihin 1, Li Zhao 2, Ravishankar Iyer 2 1 North Carolina State University 2 Intel Corporation The 40th Annual IEEE/ACM

More information

MicroFuge: A Middleware Approach to Providing Performance Isolation in Cloud Storage Systems

MicroFuge: A Middleware Approach to Providing Performance Isolation in Cloud Storage Systems 1 MicroFuge: A Middleware Approach to Providing Performance Isolation in Cloud Storage Systems Akshay Singh, Xu Cui, Benjamin Cassell, Bernard Wong and Khuzaima Daudjee July 3, 2014 2 Storage Resources

More information

A Hardware Evaluation of Cache Partitioning to Improve Utilization and Energy-Efficiency while Preserving Responsiveness

A Hardware Evaluation of Cache Partitioning to Improve Utilization and Energy-Efficiency while Preserving Responsiveness A Hardware Evaluation of Cache Partitioning to Improve Utilization and Energy-Efficiency while Preserving Responsiveness Henry Cook, Miquel Moreto, Sarah Bird, Kanh Dao, David Patterson, Krste Asanovic

More information

QUT Digital Repository:

QUT Digital Repository: QUT Digital Repository: http://eprints.qut.edu.au/ Gui, Li and Tian, Yu-Chu and Fidge, Colin J. (2007) Performance Evaluation of IEEE 802.11 Wireless Networks for Real-time Networked Control Systems. In

More information

Non-preemptive Coflow Scheduling and Routing

Non-preemptive Coflow Scheduling and Routing Non-preemptive Coflow Scheduling and Routing Ruozhou Yu, Guoliang Xue, Xiang Zhang, Jian Tang Abstract As more and more data-intensive applications have been moved to the cloud, the cloud network has become

More information

CSE 124: THE DATACENTER AS A COMPUTER. George Porter November 20 and 22, 2017

CSE 124: THE DATACENTER AS A COMPUTER. George Porter November 20 and 22, 2017 CSE 124: THE DATACENTER AS A COMPUTER George Porter November 20 and 22, 2017 ATTRIBUTION These slides are released under an Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0) Creative

More information

Data Center TCP(DCTCP)

Data Center TCP(DCTCP) Data Center TCP(DCTCP) Mohammad Alizadeh * +, Albert Greenberg *, David A. Maltz *, Jitendra Padhye *, Parveen Patel *, Balaji Prabhakar +, Sudipta Sengupta *, Murari Sridharan * * + Microsoft Research

More information

Performance Characterization, Prediction, and Optimization for Heterogeneous Systems with Multi-Level Memory Interference

Performance Characterization, Prediction, and Optimization for Heterogeneous Systems with Multi-Level Memory Interference The 2017 IEEE International Symposium on Workload Characterization Performance Characterization, Prediction, and Optimization for Heterogeneous Systems with Multi-Level Memory Interference Shin-Ying Lee

More information

Pocket: Elastic Ephemeral Storage for Serverless Analytics

Pocket: Elastic Ephemeral Storage for Serverless Analytics Pocket: Elastic Ephemeral Storage for Serverless Analytics Ana Klimovic*, Yawen Wang*, Patrick Stuedi +, Animesh Trivedi +, Jonas Pfefferle +, Christos Kozyrakis* *Stanford University, + IBM Research 1

More information

No Tradeoff Low Latency + High Efficiency

No Tradeoff Low Latency + High Efficiency No Tradeoff Low Latency + High Efficiency Christos Kozyrakis http://mast.stanford.edu Latency-critical Applications A growing class of online workloads Search, social networking, software-as-service (SaaS),

More information

HSM: A Hybrid Streaming Mechanism for Delay-tolerant Multimedia Applications Annanda Th. Rath 1 ), Saraswathi Krithivasan 2 ), Sridhar Iyer 3 )

HSM: A Hybrid Streaming Mechanism for Delay-tolerant Multimedia Applications Annanda Th. Rath 1 ), Saraswathi Krithivasan 2 ), Sridhar Iyer 3 ) HSM: A Hybrid Streaming Mechanism for Delay-tolerant Multimedia Applications Annanda Th. Rath 1 ), Saraswathi Krithivasan 2 ), Sridhar Iyer 3 ) Abstract Traditionally, Content Delivery Networks (CDNs)

More information

D E N A L I S T O R A G E I N T E R F A C E. Laura Caulfield Senior Software Engineer. Arie van der Hoeven Principal Program Manager

D E N A L I S T O R A G E I N T E R F A C E. Laura Caulfield Senior Software Engineer. Arie van der Hoeven Principal Program Manager 1 T HE D E N A L I N E X T - G E N E R A T I O N H I G H - D E N S I T Y S T O R A G E I N T E R F A C E Laura Caulfield Senior Software Engineer Arie van der Hoeven Principal Program Manager Outline Technology

More information

A priority based dynamic bandwidth scheduling in SDN networks 1

A priority based dynamic bandwidth scheduling in SDN networks 1 Acta Technica 62 No. 2A/2017, 445 454 c 2017 Institute of Thermomechanics CAS, v.v.i. A priority based dynamic bandwidth scheduling in SDN networks 1 Zun Wang 2 Abstract. In order to solve the problems

More information

Demand-Aware Flow Allocation in Data Center Networks

Demand-Aware Flow Allocation in Data Center Networks Demand-Aware Flow Allocation in Data Center Networks Dmitriy Kuptsov Aalto University/HIIT Espoo, Finland dmitriy.kuptsov@hiit.fi Boris Nechaev Aalto University/HIIT Espoo, Finland boris.nechaev@hiit.fi

More information

Application-Specific Configuration Selection in the Cloud: Impact of Provider Policy and Potential of Systematic Testing

Application-Specific Configuration Selection in the Cloud: Impact of Provider Policy and Potential of Systematic Testing Application-Specific Configuration Selection in the Cloud: Impact of Provider Policy and Potential of Systematic Testing Mohammad Hajjat +, Ruiqi Liu*, Yiyang Chang +, T.S. Eugene Ng*, Sanjay Rao + + Purdue

More information

VARIABILITY IN OPERATING SYSTEMS

VARIABILITY IN OPERATING SYSTEMS VARIABILITY IN OPERATING SYSTEMS Brian Kocoloski Assistant Professor in CSE Dept. October 8, 2018 1 CLOUD COMPUTING Current estimate is that 94% of all computation will be performed in the cloud by 2021

More information

Towards Makespan Minimization Task Allocation in Data Centers

Towards Makespan Minimization Task Allocation in Data Centers Towards Makespan Minimization Task Allocation in Data Centers Kangkang Li, Ziqi Wan, Jie Wu, and Adam Blaisse Department of Computer and Information Sciences Temple University Philadelphia, Pennsylvania,

More information

Information-Agnostic Flow Scheduling for Commodity Data Centers. Kai Chen SING Group, CSE Department, HKUST May 16, Stanford University

Information-Agnostic Flow Scheduling for Commodity Data Centers. Kai Chen SING Group, CSE Department, HKUST May 16, Stanford University Information-Agnostic Flow Scheduling for Commodity Data Centers Kai Chen SING Group, CSE Department, HKUST May 16, 2016 @ Stanford University 1 SING Testbed Cluster Electrical Packet Switch, 1G (x10) Electrical

More information

Video Diffusion: A Routing Failure Resilient, Multi-Path Mechanism to Improve Wireless Video Transport

Video Diffusion: A Routing Failure Resilient, Multi-Path Mechanism to Improve Wireless Video Transport Video Diffusion: A Routing Failure Resilient, Multi-Path Mechanism to Improve Wireless Video Transport Jinsuo Zhang Yahoo! Inc. 701 First Avenue Sunnyvale, CA 94089 azhang@yahoo-inc.com Sumi Helal Dept

More information

Department of Information Technology Sri Venkateshwara College of Engineering, Chennai, India. 1 2

Department of Information Technology Sri Venkateshwara College of Engineering, Chennai, India. 1 2 Energy-Aware Scheduling Using Workload Consolidation Techniques in Cloud Environment 1 Sridharshini V, 2 V.M.Sivagami 1 PG Scholar, 2 Associate Professor Department of Information Technology Sri Venkateshwara

More information

TCP Nicer: Support for Hierarchical Background Transfers

TCP Nicer: Support for Hierarchical Background Transfers TCP Nicer: Support for Hierarchical Background Transfers Neil Alldrin and Alvin AuYoung Department of Computer Science University of California at San Diego La Jolla, CA 9237 Email: nalldrin, alvina @cs.ucsd.edu

More information

Performance Gain with Variable Chunk Size in GFS-like File Systems

Performance Gain with Variable Chunk Size in GFS-like File Systems Journal of Computational Information Systems4:3(2008) 1077-1084 Available at http://www.jofci.org Performance Gain with Variable Chunk Size in GFS-like File Systems Zhifeng YANG, Qichen TU, Kai FAN, Lei

More information

Overview Computer Networking What is QoS? Queuing discipline and scheduling. Traffic Enforcement. Integrated services

Overview Computer Networking What is QoS? Queuing discipline and scheduling. Traffic Enforcement. Integrated services Overview 15-441 15-441 Computer Networking 15-641 Lecture 19 Queue Management and Quality of Service Peter Steenkiste Fall 2016 www.cs.cmu.edu/~prs/15-441-f16 What is QoS? Queuing discipline and scheduling

More information

Challenges in Service-Oriented Networking

Challenges in Service-Oriented Networking Challenges in Bob Callaway North Carolina State University Department of Electrical and Computer Engineering Ph.D Qualifying Examination April 14, 2006 Advisory Committee: Dr. Michael Devetsikiotis, Dr.

More information

PerfGuard: Binary-Centric Application Performance Monitoring in Production Environments

PerfGuard: Binary-Centric Application Performance Monitoring in Production Environments PerfGuard: Binary-Centric Application Performance Monitoring in Production Environments Chung Hwan Kim, Junghwan Rhee *, Kyu Hyung Lee +, Xiangyu Zhang, Dongyan Xu * + Performance Problems Performance

More information

Coflow. Recent Advances and What s Next? Mosharaf Chowdhury. University of Michigan

Coflow. Recent Advances and What s Next? Mosharaf Chowdhury. University of Michigan Coflow Recent Advances and What s Next? Mosharaf Chowdhury University of Michigan Rack-Scale Computing Datacenter-Scale Computing Geo-Distributed Computing Coflow Networking Open Source Apache Spark Open

More information

THE DATACENTER AS A COMPUTER AND COURSE REVIEW

THE DATACENTER AS A COMPUTER AND COURSE REVIEW THE DATACENTER A A COMPUTER AND COURE REVIEW George Porter June 8, 2018 ATTRIBUTION These slides are released under an Attribution-NonCommercial-hareAlike 3.0 Unported (CC BY-NC-A 3.0) Creative Commons

More information

Computer Architecture Lecture 24: Memory Scheduling

Computer Architecture Lecture 24: Memory Scheduling 18-447 Computer Architecture Lecture 24: Memory Scheduling Prof. Onur Mutlu Presented by Justin Meza Carnegie Mellon University Spring 2014, 3/31/2014 Last Two Lectures Main Memory Organization and DRAM

More information

RCD: Rapid Close to Deadline Scheduling for Datacenter Networks

RCD: Rapid Close to Deadline Scheduling for Datacenter Networks RCD: Rapid Close to Deadline Scheduling for Datacenter Networks Mohammad Noormohammadpour 1, Cauligi S. Raghavendra 1, Sriram Rao 2, Asad M. Madni 3 1 Ming Hsieh Department of Electrical Engineering, University

More information

Information-Agnostic Flow Scheduling for Commodity Data Centers

Information-Agnostic Flow Scheduling for Commodity Data Centers Information-Agnostic Flow Scheduling for Commodity Data Centers Wei Bai, Li Chen, Kai Chen, Dongsu Han (KAIST), Chen Tian (NJU), Hao Wang Sing Group @ Hong Kong University of Science and Technology USENIX

More information

qtlb: Looking inside the Look-aside buffer

qtlb: Looking inside the Look-aside buffer qtlb: Looking inside the Look-aside buffer Omesh Tickoo 1, Hari Kannan 2, Vineet Chadha 3, Ramesh Illikkal 1, Ravi Iyer 1, and Donald Newell 1 1 Intel Corporation, 2111 NE 25th Ave., Hillsboro OR, USA,

More information

Research and Design of Crypto Card Virtualization Framework Lei SUN, Ze-wu WANG and Rui-chen SUN

Research and Design of Crypto Card Virtualization Framework Lei SUN, Ze-wu WANG and Rui-chen SUN 2016 International Conference on Wireless Communication and Network Engineering (WCNE 2016) ISBN: 978-1-60595-403-5 Research and Design of Crypto Card Virtualization Framework Lei SUN, Ze-wu WANG and Rui-chen

More information

Deconstructing Datacenter Packet Transport

Deconstructing Datacenter Packet Transport Deconstructing Datacenter Packet Transport Mohammad Alizadeh, Shuang Yang, Sachin Katti, Nick McKeown, Balaji Prabhakar, and Scott Shenker Stanford University U.C. Berkeley / ICSI {alizade, shyang, skatti,

More information

Quest for High-Performance Bufferless NoCs with Single-Cycle Express Paths and Self-Learning Throttling

Quest for High-Performance Bufferless NoCs with Single-Cycle Express Paths and Self-Learning Throttling Quest for High-Performance Bufferless NoCs with Single-Cycle Express Paths and Self-Learning Throttling Bhavya K. Daya, Li-Shiuan Peh, Anantha P. Chandrakasan Dept. of Electrical Engineering and Computer

More information

Minimum-cost Cloud Storage Service Across Multiple Cloud Providers

Minimum-cost Cloud Storage Service Across Multiple Cloud Providers Minimum-cost Cloud Storage Service Across Multiple Cloud Providers Guoxin Liu and Haiying Shen Department of Electrical and Computer Engineering, Clemson University, Clemson, USA 1 Outline Introduction

More information

Limitations of Load Balancing Mechanisms for N-Tier Systems in the Presence of Millibottlenecks

Limitations of Load Balancing Mechanisms for N-Tier Systems in the Presence of Millibottlenecks Limitations of Load Balancing Mechanisms for N-Tier Systems in the Presence of Millibottlenecks Tao Zhu 1, Jack Li 1, Josh Kimball 1, Junhee Park 1, Chien-An Lai 1, Calton Pu 1 and Qingyang Wang 2 1 Computer

More information

Asynchronous Method Calls White Paper VERSION Copyright 2014 Jade Software Corporation Limited. All rights reserved.

Asynchronous Method Calls White Paper VERSION Copyright 2014 Jade Software Corporation Limited. All rights reserved. VERSION 7.0.10 Copyright 2014 Jade Software Corporation Limited. All rights reserved. Jade Software Corporation Limited cannot accept any financial or other responsibilities that may be the result of your

More information

Adaptive replica consistency policy for Kafka

Adaptive replica consistency policy for Kafka Adaptive replica consistency policy for Kafka Zonghuai Guo 1,2,*, Shiwang Ding 1,2 1 Chongqing University of Posts and Telecommunications, 400065, Nan'an District, Chongqing, P.R.China 2 Chongqing Mobile

More information

Fastpass A Centralized Zero-Queue Datacenter Network

Fastpass A Centralized Zero-Queue Datacenter Network Fastpass A Centralized Zero-Queue Datacenter Network Jonathan Perry Amy Ousterhout Hari Balakrishnan Devavrat Shah Hans Fugal Ideal datacenter network properties No current design satisfies all these properties

More information

Cache Management for TelcoCDNs. Daphné Tuncer Department of Electronic & Electrical Engineering University College London (UK)

Cache Management for TelcoCDNs. Daphné Tuncer Department of Electronic & Electrical Engineering University College London (UK) Cache Management for TelcoCDNs Daphné Tuncer Department of Electronic & Electrical Engineering University College London (UK) d.tuncer@ee.ucl.ac.uk 06/01/2017 Agenda 1. Internet traffic: trends and evolution

More information

Efficient On-Demand Operations in Distributed Infrastructures

Efficient On-Demand Operations in Distributed Infrastructures Efficient On-Demand Operations in Distributed Infrastructures Steve Ko and Indranil Gupta Distributed Protocols Research Group University of Illinois at Urbana-Champaign 2 One-Line Summary We need to design

More information

Network Function Virtualization. CSU CS557, Spring 2018 Instructor: Lorenzo De Carli

Network Function Virtualization. CSU CS557, Spring 2018 Instructor: Lorenzo De Carli Network Function Virtualization CSU CS557, Spring 2018 Instructor: Lorenzo De Carli Managing middleboxes Middlebox manifesto (ref. previous lecture) pointed out the need for automated middlebox management

More information

Coflow. Big Data. Data-Parallel Applications. Big Datacenters for Massive Parallelism. Recent Advances and What s Next?

Coflow. Big Data. Data-Parallel Applications. Big Datacenters for Massive Parallelism. Recent Advances and What s Next? Big Data Coflow The volume of data businesses want to make sense of is increasing Increasing variety of sources Recent Advances and What s Next? Web, mobile, wearables, vehicles, scientific, Cheaper disks,

More information

Data Center Performance

Data Center Performance Data Center Performance George Porter CSE 124 Feb 15, 2017 *Includes material taken from Barroso et al., 2013, UCSD 222a, and Cedric Lam and Hong Liu (Google) Part 1: Partitioning work across many servers

More information

G-NET: Effective GPU Sharing In NFV Systems

G-NET: Effective GPU Sharing In NFV Systems G-NET: Effective Sharing In NFV Systems Kai Zhang*, Bingsheng He^, Jiayu Hu #, Zeke Wang^, Bei Hua #, Jiayi Meng #, Lishan Yang # *Fudan University ^National University of Singapore #University of Science

More information

Treadmill: Attributing the Source of Tail Latency through Precise Load Testing and Statistical Inference

Treadmill: Attributing the Source of Tail Latency through Precise Load Testing and Statistical Inference Treadmill: Attributing the Source of Tail Latency through Precise Load Testing and Statistical Inference Yunqi Zhang, David Meisner, Jason Mars, Lingjia Tang Internet services User interactive applications

More information

MixApart: Decoupled Analytics for Shared Storage Systems

MixApart: Decoupled Analytics for Shared Storage Systems MixApart: Decoupled Analytics for Shared Storage Systems Madalin Mihailescu, Gokul Soundararajan, Cristiana Amza University of Toronto, NetApp Abstract Data analytics and enterprise applications have very

More information

Armon HASHICORP

Armon HASHICORP Nomad Armon Dadgar @armon Distributed Optimistically Concurrent Scheduler Nomad Distributed Optimistically Concurrent Scheduler Nomad Schedulers map a set of work to a set of resources Work (Input) Resources

More information

Hybrid Auto-scaling of Multi-tier Web Applications: A Case of Using Amazon Public Cloud

Hybrid Auto-scaling of Multi-tier Web Applications: A Case of Using Amazon Public Cloud Hybrid Auto-scaling of Multi-tier Web Applications: A Case of Using Amazon Public Cloud Abid Nisar, Waheed Iqbal, Fawaz S. Bokhari, and Faisal Bukhari Punjab University College of Information and Technology,Lahore

More information

ICON: Incast Congestion Control using Packet Pacing in Datacenter Networks

ICON: Incast Congestion Control using Packet Pacing in Datacenter Networks ICON: Incast Congestion Control using Packet Pacing in Datacenter Networks Hamed Rezaei, Hamidreza Almasi, Muhammad Usama Chaudhry, and Balajee Vamanan University of Illinois at Chicago Abstract Datacenters

More information

The Elasticity and Plasticity in Semi-Containerized Colocating Cloud Workload: a view from Alibaba Trace

The Elasticity and Plasticity in Semi-Containerized Colocating Cloud Workload: a view from Alibaba Trace The Elasticity and Plasticity in Semi-Containerized Colocating Cloud Workload: a view from Alibaba Trace Qixiao Liu* and Zhibin Yu Shenzhen Institute of Advanced Technology Chinese Academy of Science @SoCC

More information

NaaS Network-as-a-Service in the Cloud

NaaS Network-as-a-Service in the Cloud NaaS Network-as-a-Service in the Cloud joint work with Matteo Migliavacca, Peter Pietzuch, and Alexander L. Wolf costa@imperial.ac.uk Motivation Mismatch between app. abstractions & network How the programmers

More information

A QoS Load Balancing Scheduling Algorithm in Cloud Environment

A QoS Load Balancing Scheduling Algorithm in Cloud Environment A QoS Load Balancing Scheduling Algorithm in Cloud Environment Sana J. Shaikh *1, Prof. S.B.Rathod #2 * Master in Computer Engineering, Computer Department, SAE, Pune University, Pune, India # Master in

More information

Nowadays data-intensive applications play a

Nowadays data-intensive applications play a Journal of Advances in Computer Engineering and Technology, 3(2) 2017 Data Replication-Based Scheduling in Cloud Computing Environment Bahareh Rahmati 1, Amir Masoud Rahmani 2 Received (2016-02-02) Accepted

More information

A General Purpose Queue Architecture for an ATM Switch

A General Purpose Queue Architecture for an ATM Switch Mitsubishi Electric Research Laboratories Cambridge Research Center Technical Report 94-7 September 3, 994 A General Purpose Queue Architecture for an ATM Switch Hugh C. Lauer Abhijit Ghosh Chia Shen Abstract

More information

SAMBA-BUS: A HIGH PERFORMANCE BUS ARCHITECTURE FOR SYSTEM-ON-CHIPS Λ. Ruibing Lu and Cheng-Kok Koh

SAMBA-BUS: A HIGH PERFORMANCE BUS ARCHITECTURE FOR SYSTEM-ON-CHIPS Λ. Ruibing Lu and Cheng-Kok Koh BUS: A HIGH PERFORMANCE BUS ARCHITECTURE FOR SYSTEM-ON-CHIPS Λ Ruibing Lu and Cheng-Kok Koh School of Electrical and Computer Engineering Purdue University, West Lafayette, IN 797- flur,chengkokg@ecn.purdue.edu

More information

A Dynamic NOC Arbitration Technique using Combination of VCT and XY Routing

A Dynamic NOC Arbitration Technique using Combination of VCT and XY Routing 727 A Dynamic NOC Arbitration Technique using Combination of VCT and XY Routing 1 Bharati B. Sayankar, 2 Pankaj Agrawal 1 Electronics Department, Rashtrasant Tukdoji Maharaj Nagpur University, G.H. Raisoni

More information

Scaling Distributed Machine Learning

Scaling Distributed Machine Learning Scaling Distributed Machine Learning with System and Algorithm Co-design Mu Li Thesis Defense CSD, CMU Feb 2nd, 2017 nx min w f i (w) Distributed systems i=1 Large scale optimization methods Large-scale

More information

TM ALGORITHM TO IMPROVE PERFORMANCE OF OPTICAL BURST SWITCHING (OBS) NETWORKS

TM ALGORITHM TO IMPROVE PERFORMANCE OF OPTICAL BURST SWITCHING (OBS) NETWORKS INTERNATIONAL JOURNAL OF RESEARCH IN COMPUTER APPLICATIONS AND ROBOTICS ISSN 232-7345 TM ALGORITHM TO IMPROVE PERFORMANCE OF OPTICAL BURST SWITCHING (OBS) NETWORKS Reza Poorzare 1 Young Researchers Club,

More information

ISSN: (Online) Volume 3, Issue 6, June 2015 International Journal of Advance Research in Computer Science and Management Studies

ISSN: (Online) Volume 3, Issue 6, June 2015 International Journal of Advance Research in Computer Science and Management Studies ISSN: 2321-7782 (Online) Volume 3, Issue 6, June 2015 International Journal of Advance Research in Computer Science and Management Studies Research Article / Survey Paper / Case Study Available online

More information