Jinho Hwang (IBM Research) Wei Zhang, Timothy Wood, H. Howie Huang (George Washington Univ.) K.K. Ramakrishnan (Rutgers University)

Background: Memory Caching

Two orders of magnitude more reads than writes. Solution: deploy memcached hosts to handle the read capacity.

Read path:
1. HTTP Request -> Web Server
2. Get(key) -> Memcache
3. Miss(key) on a cache miss
4. DB Lookup(key)
5. Data returned to the Web Server
6. (key, data) stored in Memcache; HTTP Response sent to the client

6/17/14
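The read path above is the classic cache-aside pattern, which can be sketched as follows. The dicts standing in for the memcached tier and the database, and the key "user:42", are illustrative only:

```python
# Stand-ins for the real tiers: in production, `cache` would be a
# memcached client and `database` a real datastore driver.
cache = {}
database = {"user:42": "Ada"}   # "user:42" is an illustrative key

def get(key):
    value = cache.get(key)
    if value is not None:
        return value             # 2. Get(key) hit: no DB work needed
    value = database[key]        # 3./4. cache miss, so DB Lookup(key)
    cache[key] = value           # 6. store (key, data) for later reads
    return value                 # data flows back in the HTTP response
```

After the first miss fills the cache, subsequent reads for the same key never touch the database, which is what lets memcached absorb the read-heavy load.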

Memcached at Scale

Databases are hard to scale; memcached is easy.
o Facebook has 10,000+ memcached servers
Partition data and divide the key space among all nodes.
o Simple data model. Stupid nodes.
The web application must track where each object is stored
o Or use a proxy (load balancer) like moxi

[Figure: clients -> load balancer -> web servers -> memcached nodes and DB]
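A minimal sketch of how clients might divide the key space without a proxy, with made-up node names. Each web server hashes the key the same way, so they all agree on placement with no coordination:

```python
import hashlib

NODES = ["mc0", "mc1", "mc2", "mc3"]   # hypothetical memcached node names

def node_for(key: str) -> str:
    # Stable hash: every web server maps a given key to the same node.
    # (md5 here is for placement only, not security.)
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return NODES[h % len(NODES)]
```

The catch with plain modulo placement is that adding or removing a node remaps almost every key, which is one reason the design in later slides uses consistent hashing instead.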

Scales easily, but loads are imbalanced
o Random placement
o Skewed popularity distributions

[Figure: load on Wikipedia's memcached servers]

Motivation

Cache clusters are typically provisioned to support peak load, in terms of both request-processing capability and cache storage size.
o This worst-case provisioning can be very expensive
o It does not take advantage of dynamic resource allocation and VM provisioning capabilities
There can be great diversity in both the workloads and the types of nodes.
o This makes cluster management difficult
Solution: large-scale self-managing caches

Contributions

A high-speed memcached load balancer
o Can forward millions of requests per second
A hot-spot detection algorithm
o Uses stream data mining techniques to efficiently determine the hottest keys
A two-level key mapping system
o Combines consistent hashing with a lookup table to flexibly control the placement and replication level of hot keys
An automated server management system
o Determines the number (and types) of servers in the caching cluster

Outline

Background and Motivation
Workload & Server Heterogeneity
Memcached Load Balancer Design
Hotspot Detection Algorithm
Conclusions

Workload Heterogeneity

Key popularity varies by application, as do read/write rates, churn rates, costs for cache misses, and quality-of-service demands.

[Figure: key popularity over the hash space (0 to 2^32) for 5 hours, based on a Wikipedia 2008 database dump and access trace; a popular region covers 35% of the hash space, an unpopular region the remaining 65%]

Memcached Cluster Grouping

Traditionally, many applications share memcached servers. Today, to handle different applications, memcached instances are clustered manually.

[Figure: applications (media streaming, static files, DB queries) served by web and application servers, each mapped to its own group of memcached servers]

Need a smarter way to manage cache clusters

Server Heterogeneity

Software and hardware can be diverse: several memory-cache software packages exist, and many research projects have prototyped memcached on different hardware architectures:
o GP-GPU [Hetherington:ISPASS12]
o RDMA (remote direct memory access) [Stuedi:ATC12]
o FPGA [Maysam:Computer_Architecture_Letter13]
o Intel DPDK [Lim:NSDI14]
Goal: dynamically adapt to both workload heterogeneity and server heterogeneity

Outline

Background and Motivation
Workload & Server Heterogeneity
Memcached Load Balancer Design
Hotspot Detection Algorithm
Conclusions

Self-Managing Cache Clustering System

The cache clustering system accounts for heterogeneous workloads and servers:
o Identifies server capabilities to adjust each server's share of the key space
o Finds hot items based on item frequency and handles them differently
Dynamically increases/decreases the number of memcached servers.
Uses a two-level key mapping system:
o Consistent hashing applies to warm/cold items (LRU)
o A forward table applies to hot items
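The two-level mapping could be sketched as below: check the forward table first, and fall back to a consistent hashing ring with virtual nodes. Class and method names here are invented for illustration, not the system's actual API:

```python
import bisect
import hashlib

def _h(s: str) -> int:
    # Stable 128-bit hash for ring placement (not for security)
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

class TwoLevelDirector:
    """Forward table for hot keys; consistent hashing ring for the rest."""

    def __init__(self, servers, vnodes=16):
        self.forward = {}  # hot key -> explicit list of replica servers
        # Each server gets `vnodes` points on the ring for smoother balance
        self.ring = sorted((_h(f"{s}#{i}"), s)
                           for s in servers for i in range(vnodes))
        self.points = [p for p, _ in self.ring]

    def promote(self, key, replicas):
        """Pin a hot key to an explicit replica set."""
        self.forward[key] = list(replicas)

    def demote(self, key):
        """Hot key cooled off: fall back to the hash ring."""
        self.forward.pop(key, None)

    def lookup(self, key):
        if key in self.forward:            # level 1: forward table
            return self.forward[key]
        i = bisect.bisect(self.points, _h(key)) % len(self.points)
        return [self.ring[i][1]]           # level 2: consistent hashing
```

Keeping the forward table small (hot keys only) is what makes this workable: the common case stays a cheap hash-ring lookup, while the few hottest keys get flexible placement and replication.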

Automated Load-Balancer Architecture

[Figure: clients -> automated load balancer (Server Manager, Hot Spot Detector, Key Replicator, and Key Director holding a forward table, e.g. k1 -> S1,S14; k2 -> S3,S6; k3 -> S2,S8,S9, plus a consistent hashing ring) -> heterogeneous cache servers]

Uses UDP to send response packets directly from the memcached servers to clients, lowering load on the load balancer.
o The Hot Spot Detector runs a streaming algorithm to find hot items
o The Server Manager manages the memcached servers
o The Key Replicator manages key replication
o The Key Director manages the forwarding table and the consistent hashing ring

Outline

Background and Motivation
Workload & Server Heterogeneity
Memcached Load Balancer Design
Hotspot Detection Algorithm
Conclusions

Lossy Counting based Hot Spot Detection

Lossy counting is a one-pass deterministic streaming algorithm with user-defined parameters:
o Support threshold s: given stream length N, every item with true frequency at least sN is returned (no false negatives)
o Error rate ε (ε << s): no item with true frequency below (s - ε)N is returned, and estimated counts undercount true counts by at most εN
o Memory use is bounded (O((1/ε) log(εN)) entries), not monotonically increasing in the number of distinct items

Pipeline: data stream of <key, val> pairs (length N) -> counting -> frequency estimates -> tracking of items above the sN threshold
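A compact implementation of the classic bucket-based lossy counting algorithm, which gives the guarantees listed above, might look like this (a sketch, not the paper's code):

```python
from math import ceil

class LossyCounter:
    """One-pass lossy counting over a stream of keys."""

    def __init__(self, epsilon):
        self.w = ceil(1 / epsilon)   # bucket width
        self.n = 0                   # stream length N so far
        self.counts = {}             # key -> (freq, max possible undercount)

    def add(self, key):
        self.n += 1
        bucket = ceil(self.n / self.w)
        # A key first seen in this bucket may have been dropped before,
        # so its count could be undercounted by up to bucket - 1.
        f, d = self.counts.get(key, (0, bucket - 1))
        self.counts[key] = (f + 1, d)
        if self.n % self.w == 0:     # prune at each bucket boundary
            self.counts = {k: (f, d) for k, (f, d) in self.counts.items()
                           if f + d > bucket}

    def hot_keys(self, s, epsilon):
        # Every key with true frequency >= s*N is guaranteed to appear here;
        # no key with true frequency < (s - epsilon)*N can appear.
        return {k for k, (f, _) in self.counts.items()
                if f >= (s - epsilon) * self.n}
```

The pruning step is what keeps memory small: long-tail keys are evicted as soon as their count plus maximum undercount falls behind the current bucket index, while genuinely hot keys accumulate counts faster than the pruning threshold grows.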

Find Hot Items and Groups of Hot Items

Estimated frequency (i.e., request rate) seen for the top keys.
The goal is two-fold:
o Find hot items
o Find groups of items that can balance the load across servers

[Figure: estimated frequency (x1000) vs. item number for the top ~120 items]

Adaptively Sizing the Number of Hot Items

Given the number of servers and the discovered hot-spot groups, the load balancer decides to increase or decrease the number of hot keys.
In each scheduling window, the support parameter s is adjusted to tune the number of hot items.

[Figure: number of hot items vs. stream length N (x1000), comparing autonomic lossy counting and generic lossy counting against a target level]
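One plausible shape for the per-window adjustment of s (an assumed control rule for illustration; the step size and bounds are not from the talk):

```python
def adjust_support(s, num_hot, target, step=0.1, s_min=1e-4, s_max=0.5):
    """After each scheduling window, nudge the support threshold s so the
    number of detected hot items tracks the target level.
    step, s_min, and s_max are assumed illustrative values."""
    if num_hot > target:      # too many hot items -> raise the bar
        s *= 1 + step
    elif num_hot < target:    # too few hot items -> lower the bar
        s *= 1 - step
    return min(max(s, s_min), s_max)
```

Raising s shrinks the set of keys that qualify as hot, and lowering it grows the set, so repeating this each window drives the hot-item count toward the target.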

Conclusion

Summary
o Self-managing cache clustering system that handles workload and server heterogeneity
o Adaptive hot item groups: the number of keys and groups adapts to the servers
o High-speed load balancer (~10 million requests per second on a single core), leveraging high-performance NICs and processors
Ongoing work
o Optimize the number of servers and replicas of hot items in a cloud environment