RobinHood: Tail Latency-Aware Caching Dynamically Reallocating from Cache-Rich to Cache-Poor

1 RobinHood: Tail Latency-Aware Caching, Dynamically Reallocating from Cache-Rich to Cache-Poor
Daniel S. Berger (CMU)
Joint work with: Benjamin Berg (CMU), Timothy Zhu (PennState), Siddhartha Sen (Microsoft Research), Mor Harchol-Balter (CMU)
To appear at USENIX OSDI (October 2018). Stanford Platform Lab Seminar, 10/2/18.

2 Microsoft Web Architecture
 - A user request arrives at an aggregation server, which issues backend queries (Ads, Recom., Products).
 - Request latency: the request must wait for the last query!
 - Goal: minimize the 99th-percentile request latency (P99).
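The "request must wait for the last query" rule can be sketched as follows. This is a minimal illustration, not code from the talk: the function names and the latency values are hypothetical, and only the backend names come from the slide.

```python
def request_latency(query_latencies_ms):
    """A request finishes only when its slowest backend query returns."""
    return max(query_latencies_ms.values())


def blocking_backend(query_latencies_ms):
    """Backend whose query finished last, i.e. the one the request waited on."""
    return max(query_latencies_ms, key=query_latencies_ms.get)


# Hypothetical per-backend query latencies (ms) for one user request.
queries = {"Ads": 12.0, "Recom.": 85.0, "Products": 9.0}

print(request_latency(queries))   # latency of the slowest query
print(blocking_backend(queries))  # name of the slowest backend
```

The second function matters later in the deck: RobinHood records, per request, which backend was on the critical path.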

3 What Causes High P99 Request Latency?
 - Setting: user request, aggregation server, backend queries (Ads, Recom., Products).
 - [Figure: observations at xbox.com (3/2018)]
 - Better load balancing? Elastically scale backends? Already implemented!

4 What Else Can We Do?
 - Setting: user request, aggregation server, backend queries (Ads, Recom., Products).
 - [Figure: observations for xbox.com (3/2018)]
 - Aggregation cache: currently shared among queries to all backends.
 - Can we use the aggregation cache to reduce P99 request latency?

5 Can We Use Caching to Reduce the P99?
 - Belief: No.
 - Example: 90% of queries to backend B take 1 ms, 10% take 100 ms.
 - [Plot: P99 [ms] vs. Hit Ratio [%]; annotation: "Most hit ratios"]
 - "Caching layers do not directly address tail latency, aside from configurations where the entire working set can reside in a cache."
 - State-of-the-art caching systems focus on hit ratio and fairness, not the P99.

6 Can We Use Caching to Reduce the P99?
 - Belief: No. Example as before: 90% of queries to backend B take 1 ms, 10% take 100 ms.
 - But: latency is not a constant. Example: 90% at 50 ms, 10% at 100 ms.
 - [Plot: P99 [ms] vs. Hit Ratio [%]]
 - Caching can reduce P99 request latency!
 - Effectiveness in Microsoft's architecture?
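The argument on these two slides can be checked with a small calculation. The sketch below rests on one assumed reading of the slide numbers: cache hits cost 1 ms, and misses go to a backend that is either constant (100 ms) or variable (50 ms for 90% of misses, 100 ms for 10%). The helper names, the sample size, and the nearest-rank percentile definition are illustrative choices, not taken from the talk.

```python
def p99(latencies):
    """Nearest-rank 99th percentile, computed with integer arithmetic."""
    ordered = sorted(latencies)
    rank = (99 * len(ordered) + 99) // 100  # ceil(0.99 * n)
    return ordered[rank - 1]


def latency_sample(hit_pct, miss_profile, n=10_000, hit_ms=1):
    """Build n latencies: hit_pct percent are hits, the rest follow miss_profile.

    miss_profile is a list of (percent_of_misses, latency_ms) pairs.
    """
    hits = n * hit_pct // 100
    misses = n - hits
    sample = [hit_ms] * hits
    for pct, ms in miss_profile:
        sample += [ms] * (misses * pct // 100)
    return sample


constant = [(100, 100)]           # every miss costs 100 ms
variable = [(90, 50), (10, 100)]  # assumed reading of the second slide

for hit_pct in (80, 95):
    print(hit_pct,
          p99(latency_sample(hit_pct, constant)),
          p99(latency_sample(hit_pct, variable)))
```

With constant miss latency the P99 is the same at 80% and 95% hit ratio, which matches the "Belief: No" slide. With variable miss latency, raising the hit ratio shrinks the share of slow misses below 1% of all queries, so the P99 drops.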

7 Effectiveness of Caching at Microsoft
 - Setting: user request, aggregation server, backend queries (Ads, Recom., Products).
 - [Figure: observations for xbox.com (3/2018), during load spike]
 - Outsized impact of small reductions in backend load.
 - Balance load: steal from cache-rich, give to cache-poor.

8 Our Experimental Prototype: RobinHood Caching System
 - Partition Microsoft's aggregation cache by backend system
 - Minimize request P99 by dynamically adjusting partition sizes
 - Deployable on off-the-shelf software stack
 - Scalable in # backends, # aggregation servers

9 Challenges in Minimizing the Request P99
 - Reallocate from Cache-Rich to Cache-Poor. Definition of poor?
 - 1) Cache-Poor = High Query Rate?

10 Challenges in Minimizing the Request P99
 - Reallocate from Cache-Rich to Cache-Poor. Definition of poor?
 - 2) Cache-Poor = High Query P99?
 - [Diagram: a user request issues 100 queries in parallel; user requests split 99.5% / 0.5%, with the 0.5% seeing high latency]
 - Query latency is insufficient: need to find the cause of the request P99.

11 Basic RobinHood Algorithm
Find the backend causing high request P99
1. Sort all request latencies: P0 ... P99 ... P100
2. Determine who blocked the P99 request (= on critical path). [Diagram: Green blocked]
3. Allocate cache space to the blocking backend
Challenges:
 - Not a single cause
 - Slow to adapt
Consider a neighborhood of the P99.

12 Refined RobinHood Algorithm
Find the backend causing high request P99
1. Sort all request latencies: P0 ... P99 ... P100
2. S = { requests in P99 neighborhood }
3. Determine who blocked requests in S (= on critical path). [Diagram: Green blocked]
4. Allocate in proportion to request blocking count (RBC) in S
Challenges addressed by considering a neighborhood of the P99:
 - Not a single cause
 - Slow to adapt
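Steps 1 to 3 can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation: the function name, the record layout, and the neighborhood bounds (98.5th to 99.5th percentile) are assumptions, since the slide only says "a neighborhood of the P99".

```python
from collections import Counter


def request_blocking_counts(records, lo_pct=98.5, hi_pct=99.5):
    """Count, per backend, how often it blocked a request near the P99.

    records: list of (request_latency_ms, blocking_backend) pairs.
    lo_pct / hi_pct: assumed bounds of the P99 neighborhood.
    """
    ordered = sorted(records, key=lambda r: r[0])      # step 1: sort latencies
    n = len(ordered)
    lo = int(n * lo_pct / 100)
    hi = int(n * hi_pct / 100)
    neighborhood = ordered[lo:hi]                      # step 2: the set S
    return Counter(b for _, b in neighborhood)         # step 3: RBC per backend


# Hypothetical trace: 1000 requests with latencies 1..1000 ms; every fourth
# request was blocked by "Ads", all others by "Products".
records = [(i, "Ads" if i % 4 == 0 else "Products") for i in range(1, 1001)]
rbc = request_blocking_counts(records)
print(rbc)
```

Counting over a neighborhood instead of the single P99 request gives more data points per interval and lets several backends share the blame.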

13 Dynamic Reallocation with RobinHood
Timeline, repeated every Δ seconds:
 - Record request latencies. Per request: latency, blocking backend.
 - Calculate RBC (steps 1-3).
 - Take 1% cache space from every partition.
 - Reallocate in proportion to RBC (step 4).
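One reallocation round could look like the sketch below. It assumes that "1% cache space" means 1% of each partition's current size and that the pooled space is split exactly in proportion to RBC; the function name, the units, and the example numbers are hypothetical.

```python
def reallocate(partition_mb, rbc, tax=0.01):
    """Tax every partition, then hand the pooled space out by RBC share.

    partition_mb: dict backend -> current partition size in MB.
    rbc: dict backend -> request blocking count for the last interval.
    tax: fraction taken from each partition (assumed: of its current size).
    """
    total_rbc = sum(rbc.get(b, 0) for b in partition_mb)
    if total_rbc == 0:
        return dict(partition_mb)  # no blocking observed: keep sizes unchanged

    pool = sum(size * tax for size in partition_mb.values())
    return {
        b: size * (1 - tax) + pool * rbc.get(b, 0) / total_rbc
        for b, size in partition_mb.items()
    }


# Hypothetical partition sizes and blocking counts.
sizes = {"Ads": 1000.0, "Recom.": 1000.0, "Products": 2000.0}
new_sizes = reallocate(sizes, {"Ads": 2, "Products": 8})
print(new_sizes)
```

The total cache size is preserved: backends with a low RBC shrink slightly each round, and backends that often block requests near the P99 grow.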

14 RobinHood Architecture
[Diagram: aggregation server with aggregation cache and RH-control; backends]
Aggregation cache
 - needs support for dynamic resizing
 - e.g., memcached (off-the-shelf version 1.5)
RobinHood Controller
 - not on request path
 - lightweight Python
 - computes RBC
 - runs allocation algorithm
 - controls cache partitioning

15 RobinHood Architecture
[Diagram of the production system: Ag. servers, each with its own RH-control and local measurements; RH-stats; backends]
Distributed RobinHood:
 - Pooled measurements
   - Increase # tail data points
   - Stream to/pull from central buffer (RH-stats)
   - Just a buffer (15s state)
 - Local decisions
   - Based on allocation speed
   - Can differ across servers
Constraints on Δ (Δ = 5 seconds):
 - Sufficient # tail data points
 - Reallocation delay
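The "just a buffer (15s state)" idea can be pictured as a time-windowed queue of measurements. The class below is a hypothetical stand-in for illustration only; the talk does not describe how RH-stats is implemented. It assumes records arrive with non-decreasing timestamps.

```python
from collections import deque


class StatsBuffer:
    """Keeps only the measurements from the last window_s seconds."""

    def __init__(self, window_s=15.0):
        self.window_s = window_s
        self._records = deque()  # (timestamp_s, latency_ms, blocking_backend)

    def add(self, timestamp_s, latency_ms, backend):
        # Assumes timestamps are non-decreasing across calls.
        self._records.append((timestamp_s, latency_ms, backend))
        self._evict(timestamp_s)

    def snapshot(self, now_s):
        """Return (latency_ms, backend) pairs still inside the window."""
        self._evict(now_s)
        return [(lat, b) for _, lat, b in self._records]

    def _evict(self, now_s):
        # Drop records older than the window, oldest first.
        while self._records and self._records[0][0] < now_s - self.window_s:
            self._records.popleft()


buf = StatsBuffer(window_s=15.0)
buf.add(0.0, 120.0, "Ads")
buf.add(10.0, 95.0, "Products")
buf.add(20.0, 300.0, "Recom.")
print(buf.snapshot(20.0))
```

A controller running every Δ = 5 seconds could pull such a snapshot and compute RBC over it, so each decision sees more tail data points than a single 5-second interval on one server would provide.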

16 Experimental Setup
[Diagram: request generator; Ag. servers with RH-control; RH-stats; backends]
 - Request generator: replay production trace for 4 hours, 200k queries/second (peak: ~500k queries/second)
 - 32 GB cache size
 - 16 threads, 8 Gbit/s network
 - 20 backend clusters, up to 8 servers each
 - Backend types: MySQL (I/O bound), Matrix Multiply (CPU bound), K-V Store (CPU bound)
 - Emulate query latency spikes

17 Evaluation Results: P99 Request Latency
[Plot: Request P99 Latency [ms] for each system]
 - RobinHood [our proposal]
 - MS Production System [OneRF]
 - Maximize Overall Hit Ratio [Cliffhanger, NSDI '16]
 - Fairness Between Partitions [FairRide, NSDI '16]
 - Balance Query Latencies [Hyperbolic, ATC '17] (P99 variant)

18 Evaluation Results: RBC Balance
 - RBC = request blocking count
 - Intuition: balanced RBCs mean no bottleneck
 - [Plot: RBC per backend for RobinHood, MS Prod, Maximize Hit Ratio, Fairness, Balance Latencies; snapshots at T=50min, T=100min, T=150min]
 - RobinHood balances RBCs by trading off the performance of low-RBC backends

19 Conclusions
 - Is it possible to use caches to improve the request P99? Yes! SLO violations down to 0.3%, from 30%. Use caches as load balancers: RBC load metric.
 - Feasibility in production systems? Yes! Built using off-the-shelf software stack. Works orthogonally to existing load balancing and auto scaling techniques. In progress.
 - Is this the optimal solution? End of this project? No! There's a lot to do, e.g., other workloads and architectures.
 - Next steps: performance model, optimality, robustness (background in modeling, e.g., Performance '14, Sigmetrics '14, '18)