NetCache: Balancing Key-Value Stores with Fast In-Network Caching
|
|
- Lenard French
- 6 years ago
- Views:
Transcription
1 NetCache: Balancing Key-Value Stores with Fast In-Network Caching Xin Jin, Xiaozhou Li, Haoyu Zhang, Robert Soulé Jeongkeun Lee, Nate Foster, Changhoon Kim, Ion Stoica
2 NetCache is a rack-scale key-value store that leverages in-network data plane caching to achieve billions QPS throughput & ~10 μs latency highly-skewed even under & workloads. rapidly-changing New generation of systems enabled by programmable switches J
3 Goal: fast and cost-efficient rack-scale key-value storage q Store, retrieve, manage key-value objects Critical building block for large-scale cloud services Need to meet aggressive latency and throughput objectives efficiently q Target workloads Small objects Read intensive Highly skewed and dynamic key popularity
4 Key challenge: highly-skewed and rapidly-changing workloads low throughput & high tail latency Load Server Q: How to provide effective dynamic load balancing?
5 Opportunity: fast, small cache can ensure load balancing Cache absorbs hottest queries Balanced load
6 Opportunity: fast, small cache can ensure load balancing [B. Fan et al. SoCC 11, X. Li et al. NSDI 16] Cache O(N log N) hottest items E.g., 10,000 hot objects N: # of servers E.g., 100 backends with 100 billions items Requirement: cache throughput backend aggregate throughput
7 NetCache: towards billions QPS key-value storage rack Cache needs to provide the aggregate throughput of the storage layer flash/disk each: O(100) KQPS total: O(10) MQPS storage layer in-memory each: O(10) MQPS total: O(1) BQPS cache cache in-memory O(10) MQPS cache layer O(1) BQPS
8 NetCache: towards billions QPS key-value storage rack Cache needs to provide the aggregate throughput of the storage layer flash/disk each: O(100) KQPS total: O(10) MQPS storage layer in-memory each: O(10) MQPS total: O(1) BQPS cache cache in-memory O(10) MQPS cache layer in-network O(1) BQPS Small on-chip memory? Only cache O(N log N) small items
9 Key-value caching in network ASIC at line rate?! q How to identify application-level packet fields? q How to store and serve variable-length data? q How to efficiently keep the cache up-to-date?
10 PISA: Protocol Independent Switch Architecture q Programmable Parser Converts packet data into metadata q Programmable Mach-Action Pipeline Operate on metadata and update memory states Match + Action Memory ALU Programmable Parser Programmable Match-Action Pipeline
11 PISA: Protocol Independent Switch Architecture q Programmable Parser Parse custom key-value fields in the packet q Programmable Mach-Action Pipeline Read and update key-value data Provide query statistics for cache updates Match + Action Memory ALU Programmable Parser Programmable Match-Action Pipeline
12 PISA: Protocol Independent Switch Architecture Network Management PCIe Control plane (CPU) Run-time API Network Functions Data plane (ASIC) Match + Action Memory Programmable Match-Action Pipeline Programmable Parser ALU
13 NetCache rack-scale architecture PCIe Network Management Run-time API Network Functions Clients Cache Management Key-Value Cache Query Statistics Top of Rack Switch q Storage Servers Switch data plane Key-value store to serve queries for cached keys Query statistics to enable efficient cache updates q Switch control plane Insert hot items into the cache and evict less popular items Manage memory allocation for on-chip key-value store
14 Data plane query handling Read Query (cache hit) Client 1 Hit Cache Update Stats 2 Server Read Query (cache miss) Client 1 4 Miss Cache Update Stats 2 3 Server Client 1 Write Query Invalidate Cache Stats Server
15 Key-value caching in network ASIC at line rate q How to identify application-level packet fields? q How to store and serve variable-length data? q How to efficiently keep the cache up-to-date?
16 NetCache Packet Format Existing Protocols NetCache Protocol ETH IP TCP/UDP OP SEQ KEY VALUE L2/L3 Routing reserved port # read, write, delete, etc. q Application-layer protocol: compatible with existing L2-L4 layers q Only the top of rack switch needs to parse NetCache fields
17 Key-value caching in network ASIC at line rate q How to identify application-level packet fields? q How to store and serve variable-length data? q How to efficiently keep the cache up-to-date?
18 Key-value store using register array in network ASIC Match pkt.key == A pkt.key == B Action process_array(0) process_array(1) pkt.value: A B action process_array(idx): if pkt.op == read: pkt.value array[idx] elif pkt.op == cache_update: array[idx] pkt.value A B Register Array
19 Variable-length key-value store in network ASIC? Match pkt.key == A pkt.key == B Action process_array(0) process_array(1) pkt.value: A B A B Key Challenges: Register Array q No loop or string due to strict timing requirements q Need to minimize hardware resources consumption Number of table entries Size of action data from each entry Size of intermediate metadata across tables
20 Combine outputs from multiple arrays Lookup Table Match pkt.key == A Action bitmap = 111 index = 0 pkt.value: A0 A1 A2 Bitmap indicates arrays that store the key s value Index indicates slots in the arrays to get the value Minimal hardware resource overhead Value Table 0 Match bitmap[0] == 1 Action process_array_0 (index ) A0 Register Array 0 Value Table 1 Match bitmap[1] == 1 Action process_array_1 (index ) A1 Register Array 1 Value Table 2 Match bitmap[2] == 1 Action process_array_2 (index ) A2 Register Array 2
21 Combine outputs from multiple arrays Lookup Table Match pkt.key == A pkt.key == B pkt.key == C pkt.key == D Action bitmap = 111 index = 0 bitmap = 110 index = 1 bitmap = 010 index = 2 bitmap = 101 index = 2 pkt.value: A0 A1 A2 B0 B1 C0 D0 D1 Value Table 0 Match bitmap[0] == 1 Action process_array_0 (index ) A0 B0 D0 Register Array 0 Value Table 1 Match bitmap[1] == 1 Action process_array_1 (index ) A1 B1 C0 Register Array 1 Value Table 2 Match bitmap[2] == 1 Action process_array_2 (index ) A2 D1 Register Array 2
22 Key-value caching in network ASIC at line rate q How to identify application-level packet fields? q How to store and serve variable-length data? q How to efficiently keep the cache up-to-date?
23 Cache insertion and eviction q Challenge: cache the hottest O(N log N) items with limited insertion rate q Goal: react quickly and effectively to workload changes with minimal updates 1 Data plane reports hot keys Cache Management PCIe Key-Value Cache Tor Switch 3 2 Control plane compares loads of new hot and sampled cached keys 3 Query Statistics Control plane fetches values for keys to be inserted to the cache 4 Control plane inserts and evicts keys Storage Servers
24 Query statistics in the data plane report not cached hot pkt.key Cache Lookup cached Count-Min sketch Bloom filter Per-key counters for each cached item q Cached key: per-key counter array q Uncached key Count-Min sketch: report new hot keys Bloom filter: remove duplicated hot key reports
25 Evaluation q Can NetCache run on programmable switches at line rate? q Can NetCache provide significant overall performance improvements? q Can NetCache efficiently handle workload dynamics?
26 Prototype implementation and experimental setup q Switch P4 program (~2K LOC) Routing: basic L2/L3 routing Key-value cache: 64K items with 16-byte key and up to 128-byte value Evaluation platform: one 6.5Tbps Barefoot Tofino switch q Server 16-core Intel Xeon E5-2630, 128 GB memory, 40Gbps Intel XL710 NIC TommyDS for in-memory key-value store Throughput: 10 MQPS; Latency: 7 us
27 Single switch benchmark 2.5 TKrougKSut (B436) ThroughSut (B436) ue n d ge he m. ne We rpf The boring life of a NetCache switch alue 6ize (Byte) CacKe 6ize 64. Throughput value size.(b)(b) Throughput cache size. (a)(a) Throughput vs.vs. value size. Throughput vs.vs. cache size. Figure Switch microbenchmark (read and update). Figure 9: 9: Switch microbenchmark (read and update).
28 And its not so boring benefits 1 switch storage servers ThroughSuW (BQPS) 1oCDche 1eWCDche(servers) 1eWCDche(cDche) uqiforp zisf-0.9 zisf-0.95 zisf-0.99 WorNloDd DisWribuWioQ 3-10x throughput improvements
29 Impact of workload dynamics hot-in workload (radical change) random workload (moderate change) ThroughSut (0436) average throughsut Ser sec. average throughsut Ser 10 sec TiPe (s) ThroughSut (0436) average throughsut Ser sec. average throughsut Ser 10 sec TiPe (s) Quickly and effectively reacts to a wide range of workload dynamics. (2 physical servers to emulate 128 storage servers, performance scaled down by 64x)
30 NetCache is a rack-scale key-value store that leverages in-network data plane caching to achieve billions QPS throughput & ~10 μs latency highly-skewed even under & workloads. rapidly-changing
31 Conclusion: programmable switches beyond networking q Cloud datacenters are moving towards Rack-scale disaggregated architecture In-memory storage systems Task scheduling at microseconds granularity q Programmable switches can do more than packet forwarding Cross-layer co-design of compute, storage and network stacks Switches help on caching, coordination, scheduling, etc. q New generations of systems enabled by programmable switches J
NetCache: Balancing Key-Value Stores with Fast In-Network Caching
NetCache: Balancing Key-Value Stores with Fast In-Network Caching Xin Jin, Xiaozhou Li, Haoyu Zhang, Robert Soulé Jeongkeun Lee, Nate Foster, Changhoon Kim, Ion Stoica NetCache is a rack-scale key-value
More informationNetChain: Scale-Free Sub-RTT Coordination
NetChain: Scale-Free Sub-RTT Coordination Xin Jin Xiaozhou Li, Haoyu Zhang, Robert Soulé, Jeongkeun Lee, Nate Foster, Changhoon Kim, Ion Stoica Conventional wisdom: avoid coordination NetChain: lightning
More informationNetCache: Balancing Key-Value Stores with Fast In-Network Caching
NetCache: Balancing Key-Value Stores with Fast In-Network Caching Xin Jin 1, Xiaozhou Li 2, Haoyu Zhang 3, Robert Soulé 2,4, Jeongkeun Lee 2, Nate Foster 2,5, Changhoon Kim 2, Ion Stoica 6 1 Johns Hopkins
More informationBe Fast, Cheap and in Control with SwitchKV. Xiaozhou Li
Be Fast, Cheap and in Control with SwitchKV Xiaozhou Li Goal: fast and cost-efficient key-value store Store, retrieve, manage key-value objects Get(key)/Put(key,value)/Delete(key) Target: cluster-level
More informationBe Fast, Cheap and in Control with SwitchKV Xiaozhou Li
Be Fast, Cheap and in Control with SwitchKV Xiaozhou Li Raghav Sethi Michael Kaminsky David G. Andersen Michael J. Freedman Goal: fast and cost-effective key-value store Target: cluster-level storage for
More informationDistCache: Provable Load Balancing for Large- Scale Storage Systems with Distributed Caching
DistCache: Provable Load Balancing for Large- Scale Storage Systems with Distributed Caching Zaoxing Liu and Zhihao Bai, Johns Hopkins University; Zhenming Liu, College of William and Mary; Xiaozhou Li,
More informationarxiv: v1 [cs.dc] 24 Jan 2019
DistCache: Provable Load Balancing for Large-Scale Storage Systems with Distributed Caching Zaoxing Liu, Zhihao Bai, Zhenming Liu, Xiaozhou Li, Changhoon Kim, Vladimir Braverman, Xin Jin, Ion Stoica Johns
More informationNetChain: Scale-Free Sub-RTT Coordination
NetChain: Scale-Free Sub-RTT Coordination Xin Jin 1, Xiaozhou Li 2, Haoyu Zhang 3, Robert Soulé 2,4 Jeongkeun Lee 2, Nate Foster 2,5, Changhoon Kim 2, Ion Stoica 6 1 Johns Hopkins University, 2 Barefoot
More informationProgrammable Data Plane at Terabit Speeds
AUGUST 2018 Programmable Data Plane at Terabit Speeds Milad Sharif SOFTWARE ENGINEER PISA: Protocol Independent Switch Architecture PISA Block Diagram Match+Action Stage Memory ALU Programmable Parser
More informationA Comparison of Performance and Accuracy of Measurement Algorithms in Software
A Comparison of Performance and Accuracy of Measurement Algorithms in Software Omid Alipourfard, Masoud Moshref 1, Yang Zhou 2, Tong Yang 2, Minlan Yu 3 Yale University, Barefoot Networks 1, Peking University
More informationPVPP: A Programmable Vector Packet Processor. Sean Choi, Xiang Long, Muhammad Shahbaz, Skip Booth, Andy Keep, John Marshall, Changhoon Kim
PVPP: A Programmable Vector Packet Processor Sean Choi, Xiang Long, Muhammad Shahbaz, Skip Booth, Andy Keep, John Marshall, Changhoon Kim Fixed Set of Protocols Fixed-Function Switch Chip TCP IPv4 IPv6
More informationBackend for Software Data Planes
The Case for a Flexible Low-Level Backend for Software Data Planes Sean Choi 1, Xiang Long 2, Muhammad Shahbaz 3, Skip Booth 4, Andy Keep 4, John Marshall 4, Changhoon Kim 5 1 2 3 4 5 Why software data
More informationHigh-Throughput Publish/Subscribe in the Forwarding Plane
1 High-Throughput Publish/Subscribe in the Forwarding Plane Theo Jepsen, Masoud Moushref, Antonio Carzaniga, Nate Foster, Xiaozhou Li, Milad Sharif, Robert Soulé Università della Svizzera italiana (USI)
More informationTowards a Software Defined Data Plane for Datacenters
Towards a Software Defined Data Plane for Datacenters Arvind Krishnamurthy Joint work with: Antoine Kaufmann, Ming Liu, Naveen Sharma Tom Anderson, Kishore Atreya, Changhoon Kim, Jacob Nelson, Simon Peter
More informationHigh-resolution Measurement of Data Center Microbursts
High-resolution Measurement of Data Center Microbursts Qiao Zhang (University of Washington) Vincent Liu (University of Pennsylvania) Hongyi Zeng (Facebook) Arvind Krishnamurthy (University of Washington)
More informationCascade Mapping: Optimizing Memory Efficiency for Flash-based Key-value Caching
Cascade Mapping: Optimizing Memory Efficiency for Flash-based Key-value Caching Kefei Wang and Feng Chen Louisiana State University SoCC '18 Carlsbad, CA Key-value Systems in Internet Services Key-value
More informationCPU Project in Western Digital: From Embedded Cores for Flash Controllers to Vision of Datacenter Processors with Open Interfaces
CPU Project in Western Digital: From Embedded Cores for Flash Controllers to Vision of Datacenter Processors with Open Interfaces Zvonimir Z. Bandic, Sr. Director Robert Golla, Sr. Fellow Dejan Vucinic,
More informationArrakis: The Operating System is the Control Plane
Arrakis: The Operating System is the Control Plane Simon Peter, Jialin Li, Irene Zhang, Dan Ports, Doug Woos, Arvind Krishnamurthy, Tom Anderson University of Washington Timothy Roscoe ETH Zurich Building
More informationHigh-Performance Key-Value Store on OpenSHMEM
High-Performance Key-Value Store on OpenSHMEM Huansong Fu*, Manjunath Gorentla Venkata, Ahana Roy Choudhury*, Neena Imam, Weikuan Yu* *Florida State University Oak Ridge National Laboratory Outline Background
More informationPacket-Level Network Analytics without Compromises NANOG 73, June 26th 2018, Denver, CO. Oliver Michel
Packet-Level Network Analytics without Compromises NANOG 73, June 26th 2018, Denver, CO Oliver Michel Network monitoring is important Security issues Performance issues Equipment failure Analytics Platform
More informationCisco Ultra Packet Core High Performance AND Features. Aeneas Dodd-Noble, Principal Engineer Daniel Walton, Director of Engineering October 18, 2018
Cisco Ultra Packet Core High Performance AND Features Aeneas Dodd-Noble, Principal Engineer Daniel Walton, Director of Engineering October 18, 2018 The World s Top Networks Rely On Cisco Ultra 90+ 300M
More informationCorrelation based File Prefetching Approach for Hadoop
IEEE 2nd International Conference on Cloud Computing Technology and Science Correlation based File Prefetching Approach for Hadoop Bo Dong 1, Xiao Zhong 2, Qinghua Zheng 1, Lirong Jian 2, Jian Liu 1, Jie
More informationInfiniswap. Efficient Memory Disaggregation. Mosharaf Chowdhury. with Juncheng Gu, Youngmoon Lee, Yiwen Zhang, and Kang G. Shin
Infiniswap Efficient Memory Disaggregation Mosharaf Chowdhury with Juncheng Gu, Youngmoon Lee, Yiwen Zhang, and Kang G. Shin Rack-Scale Computing Datacenter-Scale Computing Geo-Distributed Computing Coflow
More informationScaling Hardware Accelerated Network Monitoring to Concurrent and Dynamic Queries with *Flow
Scaling Hardware Accelerated Network Monitoring to Concurrent and Dynamic Queries with *Flow John Sonchack, Oliver Michel, Adam J. Aviv, Eric Keller, Jonathan M. Smith Measuring High Speed Networks 00
More informationPebblesDB: Building Key-Value Stores using Fragmented Log Structured Merge Trees
PebblesDB: Building Key-Value Stores using Fragmented Log Structured Merge Trees Pandian Raju 1, Rohan Kadekodi 1, Vijay Chidambaram 1,2, Ittai Abraham 2 1 The University of Texas at Austin 2 VMware Research
More informationG-NET: Effective GPU Sharing In NFV Systems
G-NET: Effective Sharing In NFV Systems Kai Zhang*, Bingsheng He^, Jiayu Hu #, Zeke Wang^, Bei Hua #, Jiayi Meng #, Lishan Yang # *Fudan University ^National University of Singapore #University of Science
More informationAbhishek Pandey Aman Chadha Aditya Prakash
Abhishek Pandey Aman Chadha Aditya Prakash System: Building Blocks Motivation: Problem: Determining when to scale down the frequency at runtime is an intricate task. Proposed Solution: Use Machine learning
More informationIntel Solid State Drive Data Center Family for PCIe* in Baidu s Data Center Environment
Intel Solid State Drive Data Center Family for PCIe* in Baidu s Data Center Environment Case Study Order Number: 334534-002US Ordering Information Contact your local Intel sales representative for ordering
More informationIncBricks: Toward In-Network Computation with an In-Network Cache
IncBricks: Toward In-Network Computation with an In-Network Cache Ming Liu University of Washington mgliu@cs.washington.edu Luis Ceze University of Washington luisceze@cs.washington.edu Liang Luo University
More informationWORKLOAD CHARACTERIZATION OF INTERACTIVE CLOUD SERVICES BIG AND SMALL SERVER PLATFORMS
WORKLOAD CHARACTERIZATION OF INTERACTIVE CLOUD SERVICES ON BIG AND SMALL SERVER PLATFORMS Shuang Chen*, Shay Galon**, Christina Delimitrou*, Srilatha Manne**, and José Martínez* *Cornell University **Cavium
More informationMemC3: MemCache with CLOCK and Concurrent Cuckoo Hashing
MemC3: MemCache with CLOCK and Concurrent Cuckoo Hashing Bin Fan (CMU), Dave Andersen (CMU), Michael Kaminsky (Intel Labs) NSDI 2013 http://www.pdl.cmu.edu/ 1 Goal: Improve Memcached 1. Reduce space overhead
More informationE-Store: Fine-Grained Elastic Partitioning for Distributed Transaction Processing Systems
E-Store: Fine-Grained Elastic Partitioning for Distributed Transaction Processing Systems Rebecca Taft, Essam Mansour, Marco Serafini, Jennie Duggan, Aaron J. Elmore, Ashraf Aboulnaga, Andrew Pavlo, Michael
More informationHigh Performance Packet Processing with FlexNIC
High Performance Packet Processing with FlexNIC Antoine Kaufmann, Naveen Kr. Sharma Thomas Anderson, Arvind Krishnamurthy University of Washington Simon Peter The University of Texas at Austin Ethernet
More informationStrata: A Cross Media File System. Youngjin Kwon, Henrique Fingler, Tyler Hunt, Simon Peter, Emmett Witchel, Thomas Anderson
A Cross Media File System Youngjin Kwon, Henrique Fingler, Tyler Hunt, Simon Peter, Emmett Witchel, Thomas Anderson 1 Let s build a fast server NoSQL store, Database, File server, Mail server Requirements
More informationOn Smart Query Routing: For Distributed Graph Querying with Decoupled Storage
On Smart Query Routing: For Distributed Graph Querying with Decoupled Storage Arijit Khan Nanyang Technological University (NTU), Singapore Gustavo Segovia ETH Zurich, Switzerland Donald Kossmann Microsoft
More informationIX: A Protected Dataplane Operating System for High Throughput and Low Latency
IX: A Protected Dataplane Operating System for High Throughput and Low Latency Belay, A. et al. Proc. of the 11th USENIX Symp. on OSDI, pp. 49-65, 2014. Reviewed by Chun-Yu and Xinghao Li Summary In this
More informationZhang Tianfei. Rosen Xu
Zhang Tianfei Rosen Xu Agenda Part 1: FPGA and OPAE - Intel FPGAs and the Modern Datacenter - Platform Options and the Acceleration Stack - FPGA Hardware overview - Open Programmable Acceleration Engine
More informationBig data, little time. Scale-out data serving. Scale-out data serving. Highly skewed key popularity
/7/6 Big data, little time Goal is to keep (hot) data in memory Requires scale-out approach Each server responsible for one chunk Fast access to local data The Case for RackOut Scalable Data Serving Using
More informationCubro Packetmaster EX48600
Cubro Packetmaster EX48600 PRODUCT REVIEW Network Packet Broker (NPB) At a glance Definition A network packet broker (NPB) is a tool that receives data from number of network links; duplicates, aggregates
More informationMicrosoft SQL Server 2012 Fast Track Reference Configuration Using PowerEdge R720 and EqualLogic PS6110XV Arrays
Microsoft SQL Server 2012 Fast Track Reference Configuration Using PowerEdge R720 and EqualLogic PS6110XV Arrays This whitepaper describes Dell Microsoft SQL Server Fast Track reference architecture configurations
More informationJinho Hwang (IBM Research) Wei Zhang, Timothy Wood, H. Howie Huang (George Washington Univ.) K.K. Ramakrishnan (Rutgers University)
Jinho Hwang (IBM Research) Wei Zhang, Timothy Wood, H. Howie Huang (George Washington Univ.) K.K. Ramakrishnan (Rutgers University) Background: Memory Caching Two orders of magnitude more reads than writes
More informationP51: High Performance Networking
P51: High Performance Networking Lecture 6: Programmable network devices Dr Noa Zilberman noa.zilberman@cl.cam.ac.uk Lent 2017/18 High Throughput Interfaces Performance Limitations So far we discussed
More informationS. Narravula, P. Balaji, K. Vaidyanathan, H.-W. Jin and D. K. Panda. The Ohio State University
Architecture for Caching Responses with Multiple Dynamic Dependencies in Multi-Tier Data- Centers over InfiniBand S. Narravula, P. Balaji, K. Vaidyanathan, H.-W. Jin and D. K. Panda The Ohio State University
More informationMOHA: Many-Task Computing Framework on Hadoop
Apache: Big Data North America 2017 @ Miami MOHA: Many-Task Computing Framework on Hadoop Soonwook Hwang Korea Institute of Science and Technology Information May 18, 2017 Table of Contents Introduction
More informationFarewell to Servers: Hardware, Software, and Network Approaches towards Datacenter Resource Disaggregation
Farewell to Servers: Hardware, Software, and Network Approaches towards Datacenter Resource Disaggregation Yiying Zhang Datacenter 3 Monolithic Computer OS / Hypervisor 4 Can monolithic Application Hardware
More informationDistributed caching for cloud computing
Distributed caching for cloud computing Maxime Lorrillere, Julien Sopena, Sébastien Monnet et Pierre Sens February 11, 2013 Maxime Lorrillere (LIP6/UPMC/CNRS) February 11, 2013 1 / 16 Introduction Context
More informationDeploy a High-Performance Database Solution: Cisco UCS B420 M4 Blade Server with Fusion iomemory PX600 Using Oracle Database 12c
White Paper Deploy a High-Performance Database Solution: Cisco UCS B420 M4 Blade Server with Fusion iomemory PX600 Using Oracle Database 12c What You Will Learn This document demonstrates the benefits
More informationDIDO: Dynamic Pipelines for In-Memory Key-Value Stores on Coupled CPU-GPU Architectures
DIDO: Dynamic Pipelines for In-Memory Key-Value Stores on Coupled CPU-GPU rchitectures Kai Zhang, Jiayu Hu, Bingsheng He, Bei Hua School of Computing, National University of Singapore School of Computer
More informationKV-Direct: High-Performance In-Memory Key-Value Store with Programmable NIC
KV-Direct: High-Performance In-Memory Key-Value Store with Programmable NIC Bojie Li * Zhenyuan Ruan * Wencong Xiao Yuanwei Lu Yongqiang Xiong Andrew Putnam Enhong Chen Lintao Zhang Microsoft Research
More informationEfficient Memory Disaggregation with Infiniswap. Juncheng Gu, Youngmoon Lee, Yiwen Zhang, MosharafChowdhury, Kang G. Shin
Efficient Memory Disaggregation with Juncheng Gu, Youngmoon Lee, Yiwen Zhang, MosharafChowdhury, Kang G. Shin Agenda Motivation and related work Design and system overview Implementation and evaluation
More informationAlbis: High-Performance File Format for Big Data Systems
Albis: High-Performance File Format for Big Data Systems Animesh Trivedi, Patrick Stuedi, Jonas Pfefferle, Adrian Schuepbach, Bernard Metzler, IBM Research, Zurich 2018 USENIX Annual Technical Conference
More informationBespoKV: Application Tailored Scale-Out Key-Value Stores
BespoKV: Application Tailored Scale-Out Key-Value Stores Ali Anwar, Yue Cheng, Hai Huang, Jingoo Han, Hyogi Sim, Dongyoon Lee, Fred Douglis, and Ali R. Butt BespoKV Role of Distributed KV stores in HPC
More informationEfficient Memory Mapped File I/O for In-Memory File Systems. Jungsik Choi, Jiwon Kim, Hwansoo Han
Efficient Memory Mapped File I/O for In-Memory File Systems Jungsik Choi, Jiwon Kim, Hwansoo Han Operations Per Second Storage Latency Close to DRAM SATA/SAS Flash SSD (~00μs) PCIe Flash SSD (~60 μs) D-XPoint
More informationBest Practices for Setting BIOS Parameters for Performance
White Paper Best Practices for Setting BIOS Parameters for Performance Cisco UCS E5-based M3 Servers May 2013 2014 Cisco and/or its affiliates. All rights reserved. This document is Cisco Public. Page
More informationDatacenter application interference
1 Datacenter application interference CMPs (popular in datacenters) offer increased throughput and reduced power consumption They also increase resource sharing between applications, which can result in
More informationDynacache: Dynamic Cloud Caching
Dynacache: Dynamic Cloud Caching Asaf Cidon, Assaf Eisenman, Mohammad Alizadeh and Sachin Ka? Stanford University MIT 1 Driving Web Applica4on Performance Web- scale applica4ons heavily reliant on memory
More informationSurvey of ETSI NFV standardization documents BY ABHISHEK GUPTA FRIDAY GROUP MEETING FEBRUARY 26, 2016
Survey of ETSI NFV standardization documents BY ABHISHEK GUPTA FRIDAY GROUP MEETING FEBRUARY 26, 2016 VNFaaS (Virtual Network Function as a Service) In our present work, we consider the VNFaaS use-case
More informationScalable Enterprise Networks with Inexpensive Switches
Scalable Enterprise Networks with Inexpensive Switches Minlan Yu minlanyu@cs.princeton.edu Princeton University Joint work with Alex Fabrikant, Mike Freedman, Jennifer Rexford and Jia Wang 1 Enterprises
More informationTools for Social Networking Infrastructures
Tools for Social Networking Infrastructures 1 Cassandra - a decentralised structured storage system Problem : Facebook Inbox Search hundreds of millions of users distributed infrastructure inbox changes
More informationThe Power of Batching in the Click Modular Router
The Power of Batching in the Click Modular Router Joongi Kim, Seonggu Huh, Keon Jang, * KyoungSoo Park, Sue Moon Computer Science Dept., KAIST Microsoft Research Cambridge, UK * Electrical Engineering
More informationArista 7170 series: Q&A
Arista 7170 series: Q&A Product Overview What are the 7170 series? The Arista 7170 Series are purpose built multifunctional programmable 100GbE systems built for the highest performance environments and
More informationTreadmill: Attributing the Source of Tail Latency through Precise Load Testing and Statistical Inference
Treadmill: Attributing the Source of Tail Latency through Precise Load Testing and Statistical Inference Yunqi Zhang, David Meisner, Jason Mars, Lingjia Tang Internet services User interactive applications
More informationFastReact. In-Network Control and Caching for Industrial Control Networks using Programmable Data Planes
FastReact In-Network Control and Caching for Industrial Control Networks using Programmable Data Planes Authors: Jonathan Vestin Andreas Kassler Johan
More informationImproving Cloud Application Performance with Simulation-Guided CPU State Management
Improving Cloud Application Performance with Simulation-Guided CPU State Management Mathias Gottschlag, Frank Bellosa April 23, 2017 KARLSRUHE INSTITUTE OF TECHNOLOGY (KIT) - OPERATING SYSTEMS GROUP KIT
More informationDeduplication Storage System
Deduplication Storage System Kai Li Charles Fitzmorris Professor, Princeton University & Chief Scientist and Co-Founder, Data Domain, Inc. 03/11/09 The World Is Becoming Data-Centric CERN Tier 0 Business
More informationProgrammable Data Plane at Terabit Speeds Vladimir Gurevich May 16, 2017
Programmable Data Plane at Terabit Speeds Vladimir Gurevich May 16, 2017 Programmable switches are 10-100x slower than fixed-function switches. They cost more and consume more power. Conventional wisdom
More informationImplementing SQL Server 2016 with Microsoft Storage Spaces Direct on Dell EMC PowerEdge R730xd
Implementing SQL Server 2016 with Microsoft Storage Spaces Direct on Dell EMC PowerEdge R730xd Performance Study Dell EMC Engineering October 2017 A Dell EMC Performance Study Revisions Date October 2017
More informationBuilding Efficient and Reliable Software-Defined Networks. Naga Katta
FPO Talk Building Efficient and Reliable Software-Defined Networks Naga Katta Jennifer Rexford (Advisor) Readers: Mike Freedman, David Walker Examiners: Nick Feamster, Aarti Gupta 1 Traditional Networking
More informationTotal Cost of Ownership Analysis for a Wireless Access Gateway
white paper Communications Service Providers TCO Analysis Total Cost of Ownership Analysis for a Wireless Access Gateway An analysis of the total cost of ownership of a wireless access gateway running
More informationSwitchX Virtual Protocol Interconnect (VPI) Switch Architecture
SwitchX Virtual Protocol Interconnect (VPI) Switch Architecture 2012 MELLANOX TECHNOLOGIES 1 SwitchX - Virtual Protocol Interconnect Solutions Server / Compute Switch / Gateway Virtual Protocol Interconnect
More informationSmartNICs: Giving Rise To Smarter Offload at The Edge and In The Data Center
SmartNICs: Giving Rise To Smarter Offload at The Edge and In The Data Center Jeff Defilippi Senior Product Manager Arm #Arm Tech Symposia The Cloud to Edge Infrastructure Foundation for a World of 1T Intelligent
More informationInexpensive Coordination in Hardware
Consensus in a Box Inexpensive Coordination in Hardware Zsolt István, David Sidler, Gustavo Alonso, Marko Vukolic * Systems Group, Department of Computer Science, ETH Zurich * Consensus IBM in Research,
More informationLanguage-Directed Hardware Design for Network Performance Monitoring
Language-Directed Hardware Design for Network Performance Monitoring Srinivas Narayana, Anirudh Sivaraman, Vikram Nathan, Prateesh Goyal, Venkat Arun, Mohammad Alizadeh, Vimal Jeyakumar, and Changhoon
More informationCisco Nexus 9508 Switch Power and Performance
White Paper Cisco Nexus 9508 Switch Power and Performance The Cisco Nexus 9508 brings together data center switching power efficiency and forwarding performance in a high-density 40 Gigabit Ethernet form
More informationSDA: Software-Defined Accelerator for general-purpose big data analysis system
SDA: Software-Defined Accelerator for general-purpose big data analysis system Jian Ouyang(ouyangjian@baidu.com), Wei Qi, Yong Wang, Yichen Tu, Jing Wang, Bowen Jia Baidu is beyond a search engine Search
More informationFlashShare: Punching Through Server Storage Stack from Kernel to Firmware for Ultra-Low Latency SSDs
FlashShare: Punching Through Server Storage Stack from Kernel to Firmware for Ultra-Low Latency SSDs Jie Zhang, Miryeong Kwon, Donghyun Gouk, Sungjoon Koh, Changlim Lee, Mohammad Alian, Myoungjun Chun,
More informationBlueDBM: An Appliance for Big Data Analytics*
BlueDBM: An Appliance for Big Data Analytics* Arvind *[ISCA, 2015] Sang-Woo Jun, Ming Liu, Sungjin Lee, Shuotao Xu, Arvind (MIT) and Jamey Hicks, John Ankcorn, Myron King(Quanta) BigData@CSAIL Annual Meeting
More informationSSS: An Implementation of Key-value Store based MapReduce Framework. Hirotaka Ogawa (AIST, Japan) Hidemoto Nakada Ryousei Takano Tomohiro Kudoh
SSS: An Implementation of Key-value Store based MapReduce Framework Hirotaka Ogawa (AIST, Japan) Hidemoto Nakada Ryousei Takano Tomohiro Kudoh MapReduce A promising programming tool for implementing largescale
More informationConsolidating Microsoft SQL Server databases on PowerEdge R930 server
Consolidating Microsoft SQL Server databases on PowerEdge R930 server This white paper showcases PowerEdge R930 computing capabilities in consolidating SQL Server OLTP databases in a virtual environment.
More informationReducing CPU and network overhead for small I/O requests in network storage protocols over raw Ethernet
Reducing CPU and network overhead for small I/O requests in network storage protocols over raw Ethernet Pilar González-Férez and Angelos Bilas 31 th International Conference on Massive Storage Systems
More informationIntroducing the Cray XMT. Petr Konecny May 4 th 2007
Introducing the Cray XMT Petr Konecny May 4 th 2007 Agenda Origins of the Cray XMT Cray XMT system architecture Cray XT infrastructure Cray Threadstorm processor Shared memory programming model Benefits/drawbacks/solutions
More informationSpeeding up Linux TCP/IP with a Fast Packet I/O Framework
Speeding up Linux TCP/IP with a Fast Packet I/O Framework Michio Honda Advanced Technology Group, NetApp michio@netapp.com With acknowledge to Kenichi Yasukata, Douglas Santry and Lars Eggert 1 Motivation
More informationParallel Patterns for Window-based Stateful Operators on Data Streams: an Algorithmic Skeleton Approach
Parallel Patterns for Window-based Stateful Operators on Data Streams: an Algorithmic Skeleton Approach Tiziano De Matteis, Gabriele Mencagli University of Pisa Italy INTRODUCTION The recent years have
More informationAn FPGA-Based Optical IOH Architecture for Embedded System
An FPGA-Based Optical IOH Architecture for Embedded System Saravana.S Assistant Professor, Bharath University, Chennai 600073, India Abstract Data traffic has tremendously increased and is still increasing
More informationOptimizing Flash-based Key-value Cache Systems
Optimizing Flash-based Key-value Cache Systems Zhaoyan Shen, Feng Chen, Yichen Jia, Zili Shao Department of Computing, Hong Kong Polytechnic University Computer Science & Engineering, Louisiana State University
More informationFAWN. A Fast Array of Wimpy Nodes. David Andersen, Jason Franklin, Michael Kaminsky*, Amar Phanishayee, Lawrence Tan, Vijay Vasudevan
FAWN A Fast Array of Wimpy Nodes David Andersen, Jason Franklin, Michael Kaminsky*, Amar Phanishayee, Lawrence Tan, Vijay Vasudevan Carnegie Mellon University *Intel Labs Pittsburgh Energy in computing
More informationFacebook Tao Distributed Data Store for the Social Graph
L. Lancia, G. Salillari Cloud Computing Master Degree in Data Science Sapienza Università di Roma Facebook Tao Distributed Data Store for the Social Graph L. Lancia & G. Salillari 1 / 40 Table of Contents
More informationWorkload-driven Analysis of File Systems in Shared Multi-tier Data-Centers over InfiniBand
Workload-driven Analysis of File Systems in Shared Multi-tier Data-Centers over InfiniBand K. VAIDYANATHAN, P. BALAJI, H. -W. JIN AND D. K. PANDA Technical Report OSU-CISRC-12/4-TR65 Workload-driven Analysis
More informationHardware Evolution in Data Centers
Hardware Evolution in Data Centers 2004 2008 2011 2000 2013 2014 Trend towards customization Increase work done per dollar (CapEx + OpEx) Paolo Costa Rethinking the Network Stack for Rack-scale Computers
More informationJinho Hwang and Timothy Wood George Washington University
Jinho Hwang and Timothy Wood George Washington University Background: Memory Caching Two orders of magnitude more reads than writes Solution: Deploy memcached hosts to handle the read capacity 6. HTTP
More informationPacketShader: A GPU-Accelerated Software Router
PacketShader: A GPU-Accelerated Software Router Sangjin Han In collaboration with: Keon Jang, KyoungSoo Park, Sue Moon Advanced Networking Lab, CS, KAIST Networked and Distributed Computing Systems Lab,
More informationRack Disaggregation Using PCIe Networking
Ethernet-based Software Defined Network (SDN) Rack Disaggregation Using PCIe Networking Cloud Computing Research Center for Mobile Applications (CCMA) Industrial Technology Research Institute 雲端運算行動應用研究中心
More informationNon-Volatile Memory Through Customized Key-Value Stores
Non-Volatile Memory Through Customized Key-Value Stores Leonardo Mármol 1 Jorge Guerra 2 Marcos K. Aguilera 2 1 Florida International University 2 VMware L. Mármol, J. Guerra, M. K. Aguilera (FIU and VMware)
More informationGateware Defined Networking (GDN) for Ultra Low Latency Trading and Compliance
Gateware Defined Networking (GDN) for Ultra Low Latency Trading and Compliance STAC Summit: Panel: FPGA for trading today: December 2015 John W. Lockwood, PhD, CEO Algo-Logic Systems, Inc. JWLockwd@algo-logic.com
More informationA 400Gbps Multi-Core Network Processor
A 400Gbps Multi-Core Network Processor James Markevitch, Srinivasa Malladi Cisco Systems August 22, 2017 Legal THE INFORMATION HEREIN IS PROVIDED ON AN AS IS BASIS, WITHOUT ANY WARRANTIES OR REPRESENTATIONS,
More informationMemory-Based Cloud Architectures
Memory-Based Cloud Architectures ( Or: Technical Challenges for OnDemand Business Software) Jan Schaffner Enterprise Platform and Integration Concepts Group Example: Enterprise Benchmarking -) *%'+,#$)
More informationReconfigurable hardware for big data. Gustavo Alonso Systems Group Department of Computer Science ETH Zurich, Switzerland
Reconfigurable hardware for big data Gustavo Alonso Systems Group Department of Computer Science ETH Zurich, Switzerland www.systems.ethz.ch Systems Group 7 faculty ~40 PhD ~8 postdocs Researching all
More informationIX: A Protected Dataplane Operating System for High Throughput and Low Latency
IX: A Protected Dataplane Operating System for High Throughput and Low Latency Adam Belay et al. Proc. of the 11th USENIX Symp. on OSDI, pp. 49-65, 2014. Presented by Han Zhang & Zaina Hamid Challenges
More informationCSE 124: Networked Services Lecture-17
Fall 2010 CSE 124: Networked Services Lecture-17 Instructor: B. S. Manoj, Ph.D http://cseweb.ucsd.edu/classes/fa10/cse124 11/30/2010 CSE 124 Networked Services Fall 2010 1 Updates PlanetLab experiments
More informationCloud Programming. Programming Environment Oct 29, 2015 Osamu Tatebe
Cloud Programming Programming Environment Oct 29, 2015 Osamu Tatebe Cloud Computing Only required amount of CPU and storage can be used anytime from anywhere via network Availability, throughput, reliability
More information