Be Fast, Cheap and in Control with SwitchKV

Be Fast, Cheap and in Control with SwitchKV
Xiaozhou Li, Raghav Sethi, Michael Kaminsky, David G. Andersen, Michael J. Freedman

Goal: fast and cost-effective key-value store
Target: cluster-level storage for modern cloud services
- Massive number of small key-value objects
- Highly skewed and dynamic workloads
- Aggressive latency and throughput performance goals
This talk: scale-out flash-based storage for this setting

Key challenge: dynamic load balancing
[Diagram: key popularity distribution shifting between time t and time t+x]
How to handle the highly skewed and dynamic workloads?
Today's solution: data migration / replication
- system overhead
- consistency challenge

Fast, small cache can ensure load balancing
- Need only cache O(n log n) items to provide good load balance, where n is the number of backend nodes [Fan, SOCC '11]
- E.g., 100 backends with hundreds of billions of items + cache with 10,000 entries
[Diagram: frontend cache node serves the hottest queries; less-hot queries reach the flash-based backend nodes with better-balanced loads]
- How to efficiently serve queries with cache and backend nodes?
- How to efficiently update the cache under dynamic workloads?
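
To make the O(n log n) claim concrete, here is a small, self-contained simulation (an illustration under assumed parameters, not an experiment from the talk): it generates a Zipf-skewed workload, randomly partitions keys across backends, and compares the per-backend load imbalance with and without a cache holding only the hottest few thousand keys. The key count, skew, and cache-size constant are assumptions chosen to keep the run short.

```python
import math
import random
from collections import Counter

rng = random.Random(42)
n_backends, n_keys, n_queries = 100, 1_000_000, 200_000

# Zipf-like popularity: the key of rank r is queried with weight 1/r^0.99.
weights = [1.0 / (r ** 0.99) for r in range(1, n_keys + 1)]
queries = rng.choices(range(n_keys), weights=weights, k=n_queries)

# Random (hash-style) partitioning of keys across backends.
owner = [rng.randrange(n_backends) for _ in range(n_keys)]

# Cache only the hottest ~O(n log n) keys (the constant factor is an assumption).
cache_size = int(8 * n_backends * math.log(n_backends))
hottest = {k for k, _ in Counter(queries).most_common(cache_size)}

for label, cached in (("no cache", set()), ("with cache", hottest)):
    loads = Counter({b: 0 for b in range(n_backends)})
    for key in queries:
        if key not in cached:          # cached keys are absorbed by the cache node
            loads[owner[key]] += 1
    mean = sum(loads.values()) / n_backends
    print(f"{label}: max/mean backend load = {max(loads.values()) / mean:.2f}")
```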

High overheads with traditional caching architectures
[Diagram: look-aside (clients check the cache, go to the backends on a miss) vs. look-through (cache acts as a load balancer in front of the backends)]
- Cache must process all queries and handle misses; in our case, the cache is small and the hit ratio could be low
- Throughput is bounded by the cache I/O
- High latency for queries for uncached keys
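
As a point of comparison, a minimal sketch of the query path these two architectures share (assumed interfaces, not SwitchKV code). The logic is the same; what differs is where it runs, and in both cases every query involves the cache node:

```python
# Toy cache/backend objects are assumed to expose get()/put().
def get(key, cache, backends, num_backends=128):
    value = cache.get(key)
    if value is None:
        # Miss: fetch from the backend that owns the key (toy hash partition).
        value = backends[hash(key) % num_backends].get(key)
        cache.put(key, value)
    return value

# Look-aside: the *client* runs get(); a miss costs an extra round trip to the
#   cache before reaching the backend.
# Look-through: the *cache node* runs get() as a front-end load balancer, so
#   every query and every miss flows through its NIC and CPU.
```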

SwitchKV: content-aware routing
[Diagram: clients, OpenFlow switches, cache, backends, and a controller]
- Switches route requests directly to the appropriate nodes
- Latency can be minimized for all queries
- Throughput can scale out with # of backends
- Availability would not be affected by cache node failures

Exploit SDN and switch hardware
- Clients encode key information in packet headers
  - Encode key hash in MAC for read queries
  - Encode destination backend ID in IP for all queries
- Switches maintain forwarding rules and route query packets
[Switch pipeline: Packet In → L2 table (exact-match rule per cached key); hit → Packet Out to the cache; miss → TCAM table (match rule per physical machine) → Packet Out to the backend]
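
A toy sketch of this mechanism (the header layout and addressing plan are assumptions, not the SwitchKV wire format): the client stamps a hash of the key into the destination MAC of read queries and the owning backend's ID into the destination IP, and the switch checks the L2 exact-match table first, falling back to the per-machine TCAM rule:

```python
import hashlib

def key_hash(key: bytes) -> bytes:
    return hashlib.sha1(key).digest()[:5]          # 5 bytes fit inside a MAC

def encode_read_headers(key: bytes, backend_id: int):
    # Locally administered MAC carrying the key hash; assumed 10.0.x.x plan for backends.
    mac = "02:" + ":".join(f"{b:02x}" for b in key_hash(key))
    ip = f"10.0.{backend_id // 256}.{backend_id % 256}"
    return mac, ip

class ToySwitch:
    def __init__(self):
        self.l2_exact = {}   # one exact-match rule per cached key hash -> cache port
        self.tcam = {}       # one rule per physical machine IP -> backend port

    def forward(self, dst_mac, dst_ip):
        if dst_mac in self.l2_exact:   # hit: the key is cached
            return self.l2_exact[dst_mac]
        return self.tcam[dst_ip]       # miss: route to the owning backend

# Example: install a rule for a hot key, then route one read query.
switch = ToySwitch()
hot_mac, _ = encode_read_headers(b"hot-key", backend_id=7)
switch.l2_exact[hot_mac] = "port-to-cache"
switch.tcam["10.0.0.7"] = "port-to-backend-7"
mac, ip = encode_read_headers(b"hot-key", backend_id=7)
print(switch.forward(mac, ip))         # -> port-to-cache
```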

Keep cache and switch rules updated
New challenges for cache updates:
- Only cache the hottest O(n log n) items
- Limited switch rule update rate
Goal: react quickly to workload changes with minimal updates
[Diagram: backends send a periodic top-k <key, load> list and instant reports of bursty hot <key, value> pairs; the cache fetches <key, value> from the backends; the controller installs switch rule updates]
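
A sketch of how these two update paths might fit together (message formats, the reporting period, and the burst threshold are assumptions): backends keep per-key counters, send a top-k list every period, and report a key immediately when it crosses a burst threshold; the cache fetches the value and asks the controller to install the matching switch rule:

```python
import heapq

BURSTY_THRESHOLD = 1000   # queries per period before a key is reported instantly (assumed)
REPORT_PERIOD = 1.0       # seconds between top-k reports; caller drives this cadence (assumed)

class Backend:
    def __init__(self, k, cache):
        self.k, self.cache = k, cache
        self.counters = {}                      # per-key query counters for this period

    def on_query(self, key, value):
        c = self.counters[key] = self.counters.get(key, 0) + 1
        if c == BURSTY_THRESHOLD:               # instant report for a bursty hot key
            self.cache.add(key, value)

    def periodic_report(self):
        top_k = heapq.nlargest(self.k, self.counters.items(), key=lambda kv: kv[1])
        self.counters.clear()
        return top_k                            # <key, load> list sent every REPORT_PERIOD

class Cache:
    def __init__(self, controller):
        self.store, self.controller = {}, controller

    def add(self, key, value):
        if key not in self.store:
            self.store[key] = value             # value fetched from the owning backend
            self.controller.install_rule(key)   # controller adds an exact-match switch rule

    def evict(self, key):
        self.store.pop(key, None)
        self.controller.remove_rule(key)
```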

Evaluation
- How well does a fast small cache improve the system load balance and throughput?
- Does SwitchKV improve system performance compared to traditional architectures?
- Can SwitchKV react quickly to workload changes?

Evaluation Platform: reference backend
- 1 Gb link
- Intel Atom C2750 processor
- Intel DC P3600 PCIe-based SSD
- RocksDB with 120 million 1 KB objects
- 99.4K queries per second

Evaluation Platform: testbed
[Diagram: Xeon server 1 (client), Xeon server 2 (cache), and Xeon servers 3 and 4 (backends) connected by 40 GbE links through a Pica8 P-3922 switch running OVS 2.3, with a Ryu controller]
Default settings in this talk:
- # of backends: 128
- backend throughput: 100 KQPS
- keyspace size: 10 billion
- key size: 16 bytes
- value size: 128 bytes
- query skewness: Zipf 0.99
- cache size: 10,000 entries
Use Intel DPDK to efficiently transfer packets and modify headers
The client adjusts its sending rate to keep the loss rate between 0.5% and 1%
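
For reference, a tiny sketch of the client-side rate control mentioned on this slide (step sizes and the control interval are assumptions): nudge the offered load so the measured loss rate stays inside the 0.5%-1% band:

```python
def adjust_rate(rate_qps, sent, lost, low=0.005, high=0.01, step=0.05):
    """Return the next sending rate given last interval's send and loss counts."""
    loss = lost / sent if sent else 0.0
    if loss > high:
        return rate_qps * (1 - step)     # too much loss: back off
    if loss < low:
        return rate_qps * (1 + step)     # headroom left: probe higher
    return rate_qps                      # inside the target band: hold steady

# Example: 2% loss at 5 MQPS -> back off to 4.75 MQPS
print(adjust_rate(5_000_000, sent=100_000, lost=2_000))
```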

Throughput with and without caching
[Figure: aggregate backend throughput without cache vs. with cache, and cache throughput (10,000 entries)]

Throughput vs. number of backends
[Figure: backend rate limit 50 KQPS, cache rate limit 5 MQPS]

End-to-end latency vs. throughput
[Figure]

Throughput with workload changes
[Figure: comparison of the traditional cache update method, periodic top-k updates only, and periodic top-k updates + instant bursty hot key updates]
Workload: 200 cold keys become the hottest keys every 10 seconds

Conclusion
SwitchKV: high-performance and cost-efficient KV store
- Fast, small cache guarantees backend load balancing
- Efficient content-aware OpenFlow switching
  - Low (tail) latency
  - Scalable throughput
  - High availability
- Keeps high performance under highly dynamic workloads