MemC3: MemCache with CLOCK and Concurrent Cuckoo Hashing
|
|
- Mae Holland
- 6 years ago
- Views:
Transcription
1 MemC3: MemCache with CLOCK and Concurrent Cuckoo Hashing Bin Fan (CMU), Dave Andersen (CMU), Michael Kaminsky (Intel Labs) NSDI
2 Goal: Improve Memcached 1. Reduce space overhead (bytes/key) 2. Improve performance (queries/sec) 2
3 Overview Previous Work: Sharding Avoid inter-thread synchronization e.g., dedicated cores [Berezecki11] Hotspot? Memory Efficiency? Our Approach: Algorithm Engineering Apply concurrent / space-efficient data structures MemC3 vs. Memcached 3x throughput 30% more small key-value items 3
4 Memcached Overview A DRAM-based key-value store GET(key) SET(key, value) LRU eviction for high hit rate Webserver Webserver Webserver Typical use: Speed up webservers Alleviate db load get(x) set(y,"123") get(z) Memcached server on miss Memcached server on miss Database 4
5 Watch the next talk! Typical Workloads Often used for small objects (Facebook [Atikoglu12] ) 90% keys < 31 bytes Some apps only use 2-byte values Tens of millions of queries per second for large memcached clusters (Facebook [Nishtala13] ) Small Objects, High Rate 5
6 Memcached: Core Data Structures Key-Value Index: Chaining hash table Hash table w/ chaining K V K V K V K V K V 6 Bin Fan April 13!!
7 Memcached: Core Data Structures Key-Value Index: Chaining hash table Hash table w/ chaining K V LRU Eviction: Doubly-linked lists Doubly-linked-list (for each slab) LRU header K V K V K V K V 7
8 Problems We Solve Single-node scalability Accessing hash table and updating LRU are serialized Space overhead 56-byte header per object Including 3 pointers and 1 refcount For a 100B object, overhead > 50% 8
9 Solutions Optimistic cuckoo hashing Better memory efficiency: 95% table occupancy Higher concurrency: single-writer/multi-reader focus of this talk CLOCK-based LRU eviction Better space efficiency and concurrency Additional algo & tuning improvements described in paper 9
10 Solutions Optimistic cuckoo hashing Better memory efficiency: 95% table occupancy Higher concurrency: single-writer/multi-reader focus of this talk CLOCK-based LRU eviction Better space efficiency and concurrency Additional algo & tuning improvements 10
11 Memcached Default Hash Table Chaining items hashed in same bucket: lookup K V K V K V Good: simple (Data Structure 101) Bad: low cache locality: (dependent pointer dereference) Bad: pointer costs space 11
12 Linear Probing Probing consecutive buckets for vacancy Good: simple lookup Good: cache friendly Bad: poor memory efficiency: ( if occupancy > 50%, lookup needs to search a long chain) 12
13 Cuckoo Hashing [Pagh04] Each key has two candidate buckets Assigned by hash 1 (key), hash 2 (key) Stored in one of its candidate buckets Lookup: read 2 buckets in parallel hash 1 (x) lookup x hash 2 (x) 0! 1! 2! 3! 4! 5! 6! 7! 13
14 Cuckoo Hashing [Pagh04] Each key has two candidate buckets Assigned by hash 1 (key), hash 2 (key) Stored in one of its candidate buckets Lookup: read 2 buckets in parallel Insert: Perform key displacement recursively Still O(1) on average [Pagh04] hash 1 (x) hash 1 (b) 0! 1! 2! 3! 4! insert x 5! hash 2 (c) hash 6! 2 (x) 7! a c b 14
15 Increase Set-Associativity a b Each bucket still fits in 1 cacheline b e f g h x 6 7 a x c d 2 cacheline-sized reads per lookup 50% space utilized 2 cacheline-sized reads per lookup 95% space utilized! 15
16 Solutions Optimistic cuckoo hashing Better memory efficiency: 95% table occupancy Higher concurrency: single-writer/multi-reader focus of this talk CLOCK-based LRU eviction Better space efficiency and concurrency Additional algo & tuning improvements 16
17 During insertion: False Miss Problem always a floating item during insertion a reader may miss this floating item 0! 1! 2! 3! Floating Insert Item x 4! 5! b 6! a 7! 17
18 Our Solution: 2-Step Insert Step1: Find a cuckoo path to an empty slot without editing buckets 0! 1! 2! 3! Insert x 4! 5! 6! 7! a b 18
19 Our Solution: 2-Step Insert Step1: Find a cuckoo path to an empty slot without editing buckets 0! 1! Step2: Move hole backwards: Insert x 2! 3! 4! 5! 6! 7! a b 19
20 Our Solution: 2-Step Insert Step1: Find a cuckoo path to an empty slot without editing buckets 0! 1! Step2: Move hole backwards: Insert x 2! 3! 4! 5! 6! 7! b a 20
21 Our Solution: 2-Step Insert Step1: Find a cuckoo path to an empty slot without editing buckets 0! 1! Step2: Move hole backwards: 2! 3! 4! b a Insert x 5! 6! 7! 21
22 Our Solution: 2-Step Insert Step1: Find a cuckoo path to an empty slot without editing buckets 0! 1! Step2: Move hole backwards: 2! 3! 4! b a 5! Only need to ensure each move is atomic w.r.t. reader 6! 7! x 22
23 How to Ensure Atomic Move e.g., move key b from bucket 4 to bucket 2 A simple implementation: Lock bucket 2 and 4 Move key Unlock bucket 2 and 4 Our approach: Optimistic locking 0! 1! 2! 3! 4! Optimized for read-heavy workloads Each key mapped to a version counter 5! Reader detects version change 6! (described in paper) 7! b 23
24 How to Coordinate Writers Simple (current) solution: Serialize inserts Works fine with read-heavy workload Ongoing work: allow multiple writers 24
25 Solutions Optimistic cuckoo hashing Better memory efficiency: 95% table occupancy Higher concurrency: single-writer/multi-reader focus of this talk CLOCK-based LRU eviction Better space efficiency and concurrency 2ptr/key => 1bit/key, concurrent update Additional algo & tuning improvements 25
26 Solutions Optimistic cuckoo hashing Better memory efficiency: 95% table occupancy Higher concurrency: single-writer/multi-reader focus of this talk CLOCK-based LRU eviction Better space efficiency and concurrency 2ptr/key => 1bit/key, concurrent update Additional algo & tuning improvements Avoid unnecessary full-key comparisons on hash collision 26
27 Problems We Solve Single-node scalability Accessing hash table and updating LRU are serialized GET requires no mutex Single-writer/multiple-reader Space overhead 56-byte header per object 3 pointers + 1 refcount => 1 pointer + 1 refbit 27
28 Hash Table Microbenchmark Million Lookups/sec Server: Low Power Xeon CPU w/ 12 cores, 12 MB L3 cache 6 local threads reading hash tables of ~1GB 60% 50% 40% 30% 20% 10% 0% Lookups all hit Base 21.54% 12.79% 68% 14.28% 1.74% 1.9% Chaining w/ Bkt Lock Optimistic Cuckoo Base Lookups all miss Chaining w/ Bkt Lock 47.89% Optimistic Cuckoo 235% 28
29 Hash Table Microbenchmark Million Lookups/sec Server: Low Power Xeon CPU w/ 12 cores, 12 MB L3 cache 6 local threads reading hash tables of ~1GB 60% 50% 40% 30% 20% 10% 0% Lookups all hit Base 21.54% 12.79% 68% 14.28% 1.74% 1.9% Chaining w/ Bkt Lock Optimistic Cuckoo Base Lookups all miss Chaining w/ Bkt Lock 47.89% Optimistic Cuckoo 235% 29
30 End-to-end Performance 16-Byte key, 32-Byte Value, 95% GET, 5% SET, zipf distributed 50 remote clients generate workloads 5" max tput 4.3 MOPS MQPS% 4" 3" 2" 1" max tput 1.5 MOPS Memcached MemC3 Sharding 0" 1" 2" 4" 6" 8" 10" 12" 14" 16" Number%of%Server%Threads%% max tput 0.6 MOPS 30
31 Conclusion Optimistic cuckoo hashing High space efficiency Optimized for read-heavy workloads Source Code available: github.com/efficient/libcuckoo MemC3 improves Memcached 3x throughput, 30% more (small) objects Optimistic Cuckoo Hashing, CLOCK, other system tuning 31
32 References [Atikoglu12] Workload analysis of a large-scale key- value store. [Berezecki11] Many-core key-value store [Nishtala13] Scaling Memcache at Facebook [Pagh04] Cuckoo hashing 32
33 Q & A Thanks! 33
MemC3: Compact and Concurrent MemCache with Dumber Caching and Smarter Hashing
To appear in Proceedings of the 1th USENIX Symposium on Networked Systems Design and Implementation (NSDI 13), Lombard, IL, April 213 MemC3: Compact and Concurrent MemCache with Dumber Caching and Smarter
More informationMemC3: Compact and Concurrent MemCache with Dumber Caching and Smarter Hashing
MemC3: Compact and Concurrent MemCache with Dumber Caching and Smarter Hashing Bin Fan, David G. Andersen, Michael Kaminsky Carnegie Mellon University, Intel Labs CMU-PDL-12-116 November 2012 Parallel
More informationBe Fast, Cheap and in Control with SwitchKV Xiaozhou Li
Be Fast, Cheap and in Control with SwitchKV Xiaozhou Li Raghav Sethi Michael Kaminsky David G. Andersen Michael J. Freedman Goal: fast and cost-effective key-value store Target: cluster-level storage for
More informationHigh-Performance Key-Value Store on OpenSHMEM
High-Performance Key-Value Store on OpenSHMEM Huansong Fu*, Manjunath Gorentla Venkata, Ahana Roy Choudhury*, Neena Imam, Weikuan Yu* *Florida State University Oak Ridge National Laboratory Outline Background
More informationCuckoo Filter: Practically Better Than Bloom
Cuckoo Filter: Practically Better Than Bloom Bin Fan (CMU/Google) David Andersen (CMU) Michael Kaminsky (Intel Labs) Michael Mitzenmacher (Harvard) 1 What is Bloom Filter? A Compact Data Structure Storing
More informationCascade Mapping: Optimizing Memory Efficiency for Flash-based Key-value Caching
Cascade Mapping: Optimizing Memory Efficiency for Flash-based Key-value Caching Kefei Wang and Feng Chen Louisiana State University SoCC '18 Carlsbad, CA Key-value Systems in Internet Services Key-value
More informationBe Fast, Cheap and in Control with SwitchKV. Xiaozhou Li
Be Fast, Cheap and in Control with SwitchKV Xiaozhou Li Goal: fast and cost-efficient key-value store Store, retrieve, manage key-value objects Get(key)/Put(key,value)/Delete(key) Target: cluster-level
More informationGoals. Facebook s Scaling Problem. Scaling Strategy. Facebook Three Layer Architecture. Workload. Memcache as a Service.
Goals Memcache as a Service Tom Anderson Rapid application development - Speed of adding new features is paramount Scale Billions of users Every user on FB all the time Performance Low latency for every
More informationCuckoo Linear Algebra
Cuckoo Linear Algebra Li Zhou, CMU Dave Andersen, CMU and Mu Li, CMU and Alexander Smola, CMU and select advertisement p(click user, query) = logist (hw, (user, query)i) select advertisement find weight
More informationFAWN. A Fast Array of Wimpy Nodes. David Andersen, Jason Franklin, Michael Kaminsky*, Amar Phanishayee, Lawrence Tan, Vijay Vasudevan
FAWN A Fast Array of Wimpy Nodes David Andersen, Jason Franklin, Michael Kaminsky*, Amar Phanishayee, Lawrence Tan, Vijay Vasudevan Carnegie Mellon University *Intel Labs Pittsburgh Energy in computing
More informationMega KV: A Case for GPUs to Maximize the Throughput of In Memory Key Value Stores
Mega KV: A Case for GPUs to Maximize the Throughput of In Memory Key Value Stores Kai Zhang Kaibo Wang 2 Yuan Yuan 2 Lei Guo 2 Rubao Lee 2 Xiaodong Zhang 2 University of Science and Technology of China
More informationNear Memory Key/Value Lookup Acceleration MemSys 2017
Near Key/Value Lookup Acceleration MemSys 2017 October 3, 2017 Scott Lloyd, Maya Gokhale Center for Applied Scientific Computing This work was performed under the auspices of the U.S. Department of Energy
More informationA Distributed Hash Table for Shared Memory
A Distributed Hash Table for Shared Memory Wytse Oortwijn Formal Methods and Tools, University of Twente August 31, 2015 Wytse Oortwijn (Formal Methods and Tools, AUniversity Distributed of Twente) Hash
More information@ Massachusetts Institute of Technology All rights reserved.
..... CPHASH: A Cache-Partitioned Hash Table with LRU Eviction by Zviad Metreveli MASSACHUSETTS INSTITUTE OF TECHAC'LOGY JUN 2 1 2011 LIBRARIES Submitted to the Department of Electrical Engineering and
More informationMICA: A Holistic Approach to Fast In-Memory Key-Value Storage
MICA: A Holistic Approach to Fast In-Memory Key-Value Storage Hyeontaek Lim, Carnegie Mellon University; Dongsu Han, Korea Advanced Institute of Science and Technology (KAIST); David G. Andersen, Carnegie
More informationFrom architecture to algorithms: Lessons from the FAWN project
From architecture to algorithms: Lessons from the FAWN project David Andersen, Vijay Vasudevan, Michael Kaminsky*, Michael A. Kozuch*, Amar Phanishayee, Lawrence Tan, Jason Franklin, Iulian Moraru, Sang
More informationRocksDB Key-Value Store Optimized For Flash
RocksDB Key-Value Store Optimized For Flash Siying Dong Software Engineer, Database Engineering Team @ Facebook April 20, 2016 Agenda 1 What is RocksDB? 2 RocksDB Design 3 Other Features What is RocksDB?
More informationIntroducing the Cray XMT. Petr Konecny May 4 th 2007
Introducing the Cray XMT Petr Konecny May 4 th 2007 Agenda Origins of the Cray XMT Cray XMT system architecture Cray XT infrastructure Cray Threadstorm processor Shared memory programming model Benefits/drawbacks/solutions
More informationNbQ-CLOCK: A Non-blocking Queue-based CLOCK Algorithm for Web-Object Caching
NbQ-CLOCK: A Non-blocking Queue-based CLOCK Algorithm for Web-Object Caching Gage Eads 1, Juan A. Colmenares 2 1 EECS Department, University of California, Berkeley, CA, USA 2 Computer Science Laboratory,
More informationA distributed in-memory key-value store system on heterogeneous CPU GPU cluster
The VLDB Journal DOI 1.17/s778-17-479- REGULAR PAPER A distributed in-memory key-value store system on heterogeneous CPU GPU cluster Kai Zhang 1 Kaibo Wang 2 Yuan Yuan 3 Lei Guo 2 Rubao Li 3 Xiaodong Zhang
More informationHash Table Design and Optimization for Software Virtual Switches
Hash Table Design and Optimization for Software Virtual Switches P R E S E N T E R : R E N WA N G Y I P E N G WA N G, S A M E H G O B R I E L, R E N WA N G, C H A R L I E TA I, C R I S T I A N D U M I
More informationCPHash: A Cache-Partitioned Hash Table Zviad Metreveli, Nickolai Zeldovich, and M. Frans Kaashoek
Computer Science and Artificial Intelligence Laboratory Technical Report MIT-CSAIL-TR-211-51 November 26, 211 : A Cache-Partitioned Hash Table Zviad Metreveli, Nickolai Zeldovich, and M. Frans Kaashoek
More informationWrite-Optimized and High-Performance Hashing Index Scheme for Persistent Memory
Write-Optimized and High-Performance Hashing Index Scheme for Persistent Memory Pengfei Zuo, Yu Hua, and Jie Wu, Huazhong University of Science and Technology https://www.usenix.org/conference/osdi18/presentation/zuo
More informationMemshare: a Dynamic Multi-tenant Key-value Cache
Memshare: a Dynamic Multi-tenant Key-value Cache ASAF CIDON*, DANIEL RUSHTON, STEPHEN M. RUMBLE, RYAN STUTSMAN *STANFORD UNIVERSITY, UNIVERSITY OF UTAH, GOOGLE INC. 1 Cache is 100X Faster Than Database
More informationWrite-Optimized and High-Performance Hashing Index Scheme for Persistent Memory
Write-Optimized and High-Performance Hashing Index Scheme for Persistent Memory Pengfei Zuo, Yu Hua, Jie Wu Huazhong University of Science and Technology, China 3th USENIX Symposium on Operating Systems
More informationS. Narravula, P. Balaji, K. Vaidyanathan, H.-W. Jin and D. K. Panda. The Ohio State University
Architecture for Caching Responses with Multiple Dynamic Dependencies in Multi-Tier Data- Centers over InfiniBand S. Narravula, P. Balaji, K. Vaidyanathan, H.-W. Jin and D. K. Panda The Ohio State University
More informationDynacache: Dynamic Cloud Caching
Dynacache: Dynamic Cloud Caching Asaf Cidon, Assaf Eisenman, Mohammad Alizadeh and Sachin Ka? Stanford University MIT 1 Driving Web Applica4on Performance Web- scale applica4ons heavily reliant on memory
More informationMULTI-THREADED QUERIES
15-721 Project 3 Final Presentation MULTI-THREADED QUERIES Wendong Li (wendongl) Lu Zhang (lzhang3) Rui Wang (ruiw1) Project Objective Intra-operator parallelism Use multiple threads in a single executor
More informationSILT: A MEMORY-EFFICIENT, HIGH-PERFORMANCE KEY- VALUE STORE PRESENTED BY PRIYA SRIDHAR
SILT: A MEMORY-EFFICIENT, HIGH-PERFORMANCE KEY- VALUE STORE PRESENTED BY PRIYA SRIDHAR AGENDA INTRODUCTION Why SILT? MOTIVATION SILT KV STORAGE SYSTEM LOW READ AMPLIFICATION CONTROLLABLE WRITE AMPLIFICATION
More informationBig and Fast. Anti-Caching in OLTP Systems. Justin DeBrabant
Big and Fast Anti-Caching in OLTP Systems Justin DeBrabant Online Transaction Processing transaction-oriented small footprint write-intensive 2 A bit of history 3 OLTP Through the Years relational model
More informationAdvance Operating Systems (CS202) Locks Discussion
Advance Operating Systems (CS202) Locks Discussion Threads Locks Spin Locks Array-based Locks MCS Locks Sequential Locks Road Map Threads Global variables and static objects are shared Stored in the static
More informationFaRM: Fast Remote Memory
FaRM: Fast Remote Memory Aleksandar Dragojević, Dushyanth Narayanan, Orion Hodson, Miguel Castro Microsoft Research Abstract We describe the design and implementation of FaRM, a new main memory distributed
More informationChapter 5 Hashing. Introduction. Hashing. Hashing Functions. hashing performs basic operations, such as insertion,
Introduction Chapter 5 Hashing hashing performs basic operations, such as insertion, deletion, and finds in average time 2 Hashing a hash table is merely an of some fixed size hashing converts into locations
More informationFAWN as a Service. 1 Introduction. Jintian Liang CS244B December 13, 2017
Liang 1 Jintian Liang CS244B December 13, 2017 1 Introduction FAWN as a Service FAWN, an acronym for Fast Array of Wimpy Nodes, is a distributed cluster of inexpensive nodes designed to give users a view
More informationPebblesDB: Building Key-Value Stores using Fragmented Log Structured Merge Trees
PebblesDB: Building Key-Value Stores using Fragmented Log Structured Merge Trees Pandian Raju 1, Rohan Kadekodi 1, Vijay Chidambaram 1,2, Ittai Abraham 2 1 The University of Texas at Austin 2 VMware Research
More informationPart II: Software Infrastructure in Data Centers: Key-Value Data Management Systems
ECE 7650 Scalable and Secure Internet Services and Architecture ---- A Systems Perspective Part II: Software Infrastructure in Data Centers: Key-Value Data Management Systems 1 Key-Value Store Clients
More informationCSE 120 Principles of Operating Systems
CSE 120 Principles of Operating Systems Spring 2018 Lecture 15: Multicore Geoffrey M. Voelker Multicore Operating Systems We have generally discussed operating systems concepts independent of the number
More informationAn FPGA-based In-line Accelerator for Memcached
An FPGA-based In-line Accelerator for Memcached MAYSAM LAVASANI, HARI ANGEPAT, AND DEREK CHIOU THE UNIVERSITY OF TEXAS AT AUSTIN 1 Challenges for Server Processors Workload changes Social networking Cloud
More informationScalable Concurrent Hash Tables via Relativistic Programming
Scalable Concurrent Hash Tables via Relativistic Programming Josh Triplett September 24, 2009 Speed of data < Speed of light Speed of light: 3e8 meters/second Processor speed: 3 GHz, 3e9 cycles/second
More informationNbQ-CLOCK: A Non-blocking Queue-based CLOCK Algorithm for Web-Object Caching
NbQ-CLOCK: A Non-blocking Queue-based CLOCK Algorithm for Web-Object Caching Gage Eads Electrical Engineering and Computer Sciences University of California at Berkeley Technical Report No. UCB/EECS-2013-174
More informationIntroduction. hashing performs basic operations, such as insertion, better than other ADTs we ve seen so far
Chapter 5 Hashing 2 Introduction hashing performs basic operations, such as insertion, deletion, and finds in average time better than other ADTs we ve seen so far 3 Hashing a hash table is merely an hashing
More informationConcurrent Data Structures Concurrent Algorithms 2016
Concurrent Data Structures Concurrent Algorithms 2016 Tudor David (based on slides by Vasileios Trigonakis) Tudor David 11.2016 1 Data Structures (DSs) Constructs for efficiently storing and retrieving
More informationGo Deep: Fixing Architectural Overheads of the Go Scheduler
Go Deep: Fixing Architectural Overheads of the Go Scheduler Craig Hesling hesling@cmu.edu Sannan Tariq stariq@cs.cmu.edu May 11, 2018 1 Introduction Golang is a programming language developed to target
More informationHigh Performance Transactions in Deuteronomy
High Performance Transactions in Deuteronomy Justin Levandoski, David Lomet, Sudipta Sengupta, Ryan Stutsman, and Rui Wang Microsoft Research Overview Deuteronomy: componentized DB stack Separates transaction,
More informationFaSST: Fast, Scalable, and Simple Distributed Transactions with Two-Sided (RDMA) Datagram RPCs
FaSST: Fast, Scalable, and Simple Distributed Transactions with Two-Sided (RDMA) Datagram RPCs Anuj Kalia (CMU), Michael Kaminsky (Intel Labs), David Andersen (CMU) RDMA RDMA is a network feature that
More informationTrafficDB: HERE s High Performance Shared-Memory Data Store Ricardo Fernandes, Piotr Zaczkowski, Bernd Göttler, Conor Ettinoffe, and Anis Moussa
TrafficDB: HERE s High Performance Shared-Memory Data Store Ricardo Fernandes, Piotr Zaczkowski, Bernd Göttler, Conor Ettinoffe, and Anis Moussa EPL646: Advanced Topics in Databases Christos Hadjistyllis
More informationNo compromises: distributed transactions with consistency, availability, and performance
No compromises: distributed transactions with consistency, availability, and performance Aleksandar Dragojevi c, Dushyanth Narayanan, Edmund B. Nightingale, Matthew Renzelmann, Alex Shamis, Anirudh Badam,
More informationPanu Silvasti Page 1
Multicore support in databases Panu Silvasti Page 1 Outline Building blocks of a storage manager How do existing storage managers scale? Optimizing Shore database for multicore processors Page 2 Building
More informationSystem and Algorithmic Adaptation for Flash
System and Algorithmic Adaptation for Flash The FAWN Perspective David G. Andersen, Vijay Vasudevan, Michael Kaminsky* Amar Phanishayee, Jason Franklin, Iulian Moraru, Lawrence Tan Carnegie Mellon University
More informationGD-Wheel: A Cost-Aware Replacement Policy for Key-Value Stores
GD-Wheel: A Cost-Aware Replacement Policy for Key-Value Stores Conglong Li Carnegie Mellon University conglonl@cs.cmu.edu Alan L. Cox Rice University alc@rice.edu Abstract Memory-based key-value stores,
More informationffwd: delegation is (much) faster than you think Sepideh Roghanchi, Jakob Eriksson, Nilanjana Basu
ffwd: delegation is (much) faster than you think Sepideh Roghanchi, Jakob Eriksson, Nilanjana Basu int get_seqno() { } return ++seqno; // ~1 Billion ops/s // single-threaded int threadsafe_get_seqno()
More informationBuilding High Performance Apps using NoSQL. Swami Sivasubramanian General Manager, AWS NoSQL
Building High Performance Apps using NoSQL Swami Sivasubramanian General Manager, AWS NoSQL Building high performance apps There is a lot to building high performance apps Scalability Performance at high
More informationThe Art and Science of Memory Allocation
Logical Diagram The Art and Science of Memory Allocation Don Porter CSE 506 Binary Formats RCU Memory Management Memory Allocators CPU Scheduler User System Calls Kernel Today s Lecture File System Networking
More informationAn efficient Unbounded Lock-Free Queue for Multi-Core Systems
An efficient Unbounded Lock-Free Queue for Multi-Core Systems Authors: Marco Aldinucci 1, Marco Danelutto 2, Peter Kilpatrick 3, Massimiliano Meneghin 4 and Massimo Torquati 2 1 Computer Science Dept.
More informationEnhancing the Scalability of Memcached Alex Wiggins Jimmy Langston, Ph.D.
INTEL CORPORATION Enhancing the Scalability of Memcached Alex Wiggins wigginal@engr.orst.edu Jimmy Langston, Ph.D. jimmy.langston@intel.com 5/17/2012 Memcached is an open-source, multi-threaded, distributed,
More informationHigh-Performance Key-Value Store On OpenSHMEM
High-Performance Key-Value Store On OpenSHMEM Huansong Fu Manjunath Gorentla Venkata Ahana Roy Choudhury Neena Imam Weikuan Yu Florida State University Oak Ridge National Laboratory {fu,roychou,yuw}@cs.fsu.edu
More informationFAWN: A Fast Array of Wimpy Nodes
FAWN: A Fast Array of Wimpy Nodes David G. Andersen, Jason Franklin, Michael Kaminsky *, Amar Phanishayee, Lawrence Tan, Vijay Vasudevan Carnegie Mellon University, * Intel Labs SOSP 09 CAS ICT Storage
More informationUnderstanding Hardware Transactional Memory
Understanding Hardware Transactional Memory Gil Tene, CTO & co-founder, Azul Systems @giltene 2015 Azul Systems, Inc. Agenda Brief introduction What is Hardware Transactional Memory (HTM)? Cache coherence
More informationA Scalable Lock Manager for Multicores
A Scalable Lock Manager for Multicores Hyungsoo Jung Hyuck Han Alan Fekete NICTA Samsung Electronics University of Sydney Gernot Heiser NICTA Heon Y. Yeom Seoul National University @University of Sydney
More informationMain-Memory Databases 1 / 25
1 / 25 Motivation Hardware trends Huge main memory capacity with complex access characteristics (Caches, NUMA) Many-core CPUs SIMD support in CPUs New CPU features (HTM) Also: Graphic cards, FPGAs, low
More informationExploring mtcp based Single-Threaded and Multi-Threaded Web Server Design
Exploring mtcp based Single-Threaded and Multi-Threaded Web Server Design A Thesis Submitted in partial fulfillment of the requirements for the degree of Master of Technology by Pijush Chakraborty (153050015)
More informationPart II: Software Infrastructure in Data Centers: Key-Value Data Management Systems
CSE 6350 File and Storage System Infrastructure in Data centers Supporting Internet-wide Services Part II: Software Infrastructure in Data Centers: Key-Value Data Management Systems 1 Key-Value Store Clients
More informationFaRM: Fast Remote Memory
FaRM: Fast Remote Memory Problem Context DRAM prices have decreased significantly Cost effective to build commodity servers w/hundreds of GBs E.g. - cluster with 100 machines can hold tens of TBs of main
More informationAbstraction, Reality Checks, and RCU
Abstraction, Reality Checks, and RCU Paul E. McKenney IBM Beaverton University of Toronto Cider Seminar July 26, 2005 Copyright 2005 IBM Corporation 1 Overview Moore's Law and SMP Software Non-Blocking
More informationE-Store: Fine-Grained Elastic Partitioning for Distributed Transaction Processing Systems
E-Store: Fine-Grained Elastic Partitioning for Distributed Transaction Processing Systems Rebecca Taft, Essam Mansour, Marco Serafini, Jennie Duggan, Aaron J. Elmore, Ashraf Aboulnaga, Andrew Pavlo, Michael
More informationAddressing the Memory Wall
Lecture 26: Addressing the Memory Wall Parallel Computer Architecture and Programming CMU 15-418/15-618, Spring 2015 Tunes Cage the Elephant Back Against the Wall (Cage the Elephant) This song is for the
More informationArchitecture of a Real-Time Operational DBMS
Architecture of a Real-Time Operational DBMS Srini V. Srinivasan Founder, Chief Development Officer Aerospike CMG India Keynote Thane December 3, 2016 [ CMGI Keynote, Thane, India. 2016 Aerospike Inc.
More informationTales of the Tail Hardware, OS, and Application-level Sources of Tail Latency
Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K. Ports and Steven D. Gribble February 2, 2015 1 Introduction What is Tail Latency? What
More informationData Structure Engineering for High Performance Software Packet Processing
Data Structure Engineering for High Performance Software Packet Processing Dong Zhou dongz@cs.cmu.edu January 18, 2019 THESIS PROPOSAL Computer Science Department Carnegie Mellon University Pittsburgh,
More informationMnemosyne Lightweight Persistent Memory
Mnemosyne Lightweight Persistent Memory Haris Volos Andres Jaan Tack, Michael M. Swift University of Wisconsin Madison Executive Summary Storage-Class Memory (SCM) enables memory-like storage Persistent
More informationClosing the Performance Gap Between Volatile and Persistent K-V Stores
Closing the Performance Gap Between Volatile and Persistent K-V Stores Yihe Huang, Harvard University Matej Pavlovic, EPFL Virendra Marathe, Oracle Labs Margo Seltzer, Oracle Labs Tim Harris, Oracle Labs
More informationCapriccio : Scalable Threads for Internet Services
Capriccio : Scalable Threads for Internet Services - Ron von Behren &et al - University of California, Berkeley. Presented By: Rajesh Subbiah Background Each incoming request is dispatched to a separate
More informationNo Compromises. Distributed Transactions with Consistency, Availability, Performance
No Compromises Distributed Transactions with Consistency, Availability, Performance Aleksandar Dragojevic, Dushyanth Narayanan, Edmund B. Nightingale, Matthew Renzelmann, Alex Shamis, Anirudh Badam, Miguel
More informationAlbis: High-Performance File Format for Big Data Systems
Albis: High-Performance File Format for Big Data Systems Animesh Trivedi, Patrick Stuedi, Jonas Pfefferle, Adrian Schuepbach, Bernard Metzler, IBM Research, Zurich 2018 USENIX Annual Technical Conference
More informationOptimizing Flash-based Key-value Cache Systems
Optimizing Flash-based Key-value Cache Systems Zhaoyan Shen, Feng Chen, Yichen Jia, Zili Shao Department of Computing, Hong Kong Polytechnic University Computer Science & Engineering, Louisiana State University
More informationJinho Hwang (IBM Research) Wei Zhang, Timothy Wood, H. Howie Huang (George Washington Univ.) K.K. Ramakrishnan (Rutgers University)
Jinho Hwang (IBM Research) Wei Zhang, Timothy Wood, H. Howie Huang (George Washington Univ.) K.K. Ramakrishnan (Rutgers University) Background: Memory Caching Two orders of magnitude more reads than writes
More informationXtraDB 5.7: Key Performance Algorithms. Laurynas Biveinis Alexey Stroganov Percona
XtraDB 5.7: Key Performance Algorithms Laurynas Biveinis Alexey Stroganov Percona firstname.lastname@percona.com XtraDB 5.7 Key Performance Algorithms Focus on the buffer pool, flushing, the doublewrite
More informationImplementation and Evaluation of Moderate Parallelism in the BIND9 DNS Server
Implementation and Evaluation of Moderate Parallelism in the BIND9 DNS Server JINMEI, Tatuya / Toshiba Paul Vixie / Internet Systems Consortium [Supported by SCOPE of the Ministry of Internal Affairs and
More informationCS 261 Fall Caching. Mike Lam, Professor. (get it??)
CS 261 Fall 2017 Mike Lam, Professor Caching (get it??) Topics Caching Cache policies and implementations Performance impact General strategies Caching A cache is a small, fast memory that acts as a buffer
More informationPacket-Level Network Analytics without Compromises NANOG 73, June 26th 2018, Denver, CO. Oliver Michel
Packet-Level Network Analytics without Compromises NANOG 73, June 26th 2018, Denver, CO Oliver Michel Network monitoring is important Security issues Performance issues Equipment failure Analytics Platform
More informationCS3600 SYSTEMS AND NETWORKS
CS3600 SYSTEMS AND NETWORKS NORTHEASTERN UNIVERSITY Lecture 11: File System Implementation Prof. Alan Mislove (amislove@ccs.neu.edu) File-System Structure File structure Logical storage unit Collection
More informationInnoDB: Status, Architecture, and Latest Enhancements
InnoDB: Status, Architecture, and Latest Enhancements O'Reilly MySQL Conference, April 14, 2011 Inaam Rana, Oracle John Russell, Oracle Bios Inaam Rana (InnoDB / MySQL / Oracle) Crash recovery speedup
More informationUsing RDMA Efficiently for Key-Value Services
Using RDMA Efficiently for Key-Value Services Anuj Kalia, Michael Kaminsky, David G. Andersen Carnegie Mellon University, Intel Labs CMU-PDL-14-16 June 214 Parallel Data Laboratory Carnegie Mellon University
More informationLinked Lists: The Role of Locking. Erez Petrank Technion
Linked Lists: The Role of Locking Erez Petrank Technion Why Data Structures? Concurrent Data Structures are building blocks Used as libraries Construction principles apply broadly This Lecture Designing
More informationAnti-Caching: A New Approach to Database Management System Architecture. Guide: Helly Patel ( ) Dr. Sunnie Chung Kush Patel ( )
Anti-Caching: A New Approach to Database Management System Architecture Guide: Helly Patel (2655077) Dr. Sunnie Chung Kush Patel (2641883) Abstract Earlier DBMS blocks stored on disk, with a main memory
More informationPerformance and Optimization Issues in Multicore Computing
Performance and Optimization Issues in Multicore Computing Minsoo Ryu Department of Computer Science and Engineering 2 Multicore Computing Challenges It is not easy to develop an efficient multicore program
More informationNetCache: Balancing Key-Value Stores with Fast In-Network Caching
NetCache: Balancing Key-Value Stores with Fast In-Network Caching Xin Jin, Xiaozhou Li, Haoyu Zhang, Robert Soulé Jeongkeun Lee, Nate Foster, Changhoon Kim, Ion Stoica NetCache is a rack-scale key-value
More informationTools for Social Networking Infrastructures
Tools for Social Networking Infrastructures 1 Cassandra - a decentralised structured storage system Problem : Facebook Inbox Search hundreds of millions of users distributed infrastructure inbox changes
More informationNetCache: Balancing Key-Value Stores with Fast In-Network Caching
NetCache: Balancing Key-Value Stores with Fast In-Network Caching Xin Jin, Xiaozhou Li, Haoyu Zhang, Robert Soulé Jeongkeun Lee, Nate Foster, Changhoon Kim, Ion Stoica NetCache is a rack-scale key-value
More informationGPUfs: Integrating a File System with GPUs. Yishuai Li & Shreyas Skandan
GPUfs: Integrating a File System with GPUs Yishuai Li & Shreyas Skandan Von Neumann Architecture Mem CPU I/O Von Neumann Architecture Mem CPU I/O slow fast slower Direct Memory Access Mem CPU I/O slow
More informationDatabase Applications (15-415)
Database Applications (15-415) DBMS Internals- Part V Lecture 13, March 10, 2014 Mohammad Hammoud Today Welcome Back from Spring Break! Today Last Session: DBMS Internals- Part IV Tree-based (i.e., B+
More informationKV-Direct: High-Performance In-Memory Key-Value Store with Programmable NIC
KV-Direct: High-Performance In-Memory Key-Value Store with Programmable NIC Bojie Li * Zhenyuan Ruan * Wencong Xiao Yuanwei Lu Yongqiang Xiong Andrew Putnam Enhong Chen Lintao Zhang Microsoft Research
More informationarxiv: v1 [cs.ds] 17 May 2016
Fast Concurrent Cuckoo Kick-out Eviction Schemes for High-Density Tables William Kuszmaul Stanford University 50 Serra Mall Stanford, CA 9305 kuszmaul@stanford.edu arxiv:605.05236v [cs.ds] 7 May 206 ABSTRACT
More informationHashing and Natural Parallism. Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit
Hashing and Natural Parallism Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit Sequential Closed Hash Map 0 1 2 3 16 9 buckets 2 Items h(k) = k mod 4 Art of Multiprocessor
More informationCache Policies. Philipp Koehn. 6 April 2018
Cache Policies Philipp Koehn 6 April 2018 Memory Tradeoff 1 Fastest memory is on same chip as CPU... but it is not very big (say, 32 KB in L1 cache) Slowest memory is DRAM on different chips... but can
More informationPERFORMANCE ANALYSIS AND OPTIMIZATION OF SKIP LISTS FOR MODERN MULTI-CORE ARCHITECTURES
PERFORMANCE ANALYSIS AND OPTIMIZATION OF SKIP LISTS FOR MODERN MULTI-CORE ARCHITECTURES Anish Athalye and Patrick Long Mentors: Austin Clements and Stephen Tu 3 rd annual MIT PRIMES Conference Sequential
More informationChapter 8: Memory- Management Strategies. Operating System Concepts 9 th Edition
Chapter 8: Memory- Management Strategies Operating System Concepts 9 th Edition Silberschatz, Galvin and Gagne 2013 Chapter 8: Memory Management Strategies Background Swapping Contiguous Memory Allocation
More informationCS3350B Computer Architecture
CS335B Computer Architecture Winter 25 Lecture 32: Exploiting Memory Hierarchy: How? Marc Moreno Maza wwwcsduwoca/courses/cs335b [Adapted from lectures on Computer Organization and Design, Patterson &
More informationIncreasing Performance for PowerCenter Sessions that Use Partitions
Increasing Performance for PowerCenter Sessions that Use Partitions 1993-2015 Informatica LLC. No part of this document may be reproduced or transmitted in any form, by any means (electronic, photocopying,
More informationMemory: Page Table Structure. CSSE 332 Operating Systems Rose-Hulman Institute of Technology
Memory: Page Table Structure CSSE 332 Operating Systems Rose-Hulman Institute of Technology General address transla+on CPU virtual address data cache MMU Physical address Global memory Memory management
More information