Scalable Concurrent Hash Tables via Relativistic Programming

Josh Triplett, September 24, 2009

Speed of data < speed of light
- Speed of light: 3e8 meters/second
- Processor speed: 3 GHz, 3e9 cycles/second
- 0.1 meters/cycle (about 4 inches/cycle), ignoring propagation delay, ramp time, and signal speed
- One of the reasons CPUs stopped getting faster
- Physical limit on memory and CPU-to-CPU communication

Throughput vs. latency
- CPUs can do a lot of independent work in 1 cycle
- CPUs can work out of their own cache in 1 cycle
- CPUs can't communicate and agree in 1 cycle

How to scale?
- To improve scalability, work independently
- Agreement represents the bottleneck
- Scale by reducing the need to agree

Classic concurrent programming
- Every CPU agrees on the order of instructions
- No tolerance for conflicts
- Implicit communication and agreement required
- Does not scale
- Example: mutual exclusion

Relativistic programming
- By analogy with physics: no global reference frame
- Allow each thread to work with its observed, relative view of memory
- Minimal constraints on instruction ordering
- Tolerance for conflicts: allow concurrent threads to access shared data at the same time, even when doing modifications

Why relativistic programming?
- Wait-free readers
- Very low overhead
- Linear scalability

Concrete examples
- Per-CPU variables (see the sketch below)
- Deferred destruction
- Read-Copy Update (RCU)
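
To make the per-CPU example concrete, here is a minimal sketch using the Linux kernel's per-CPU helpers; the hits counter and the record_hit/total_hits functions are hypothetical names invented for illustration. Each CPU updates only its own slot, so the common case requires no agreement; the cost of agreement is paid only by the occasional reader that sums all slots.

    #include <linux/percpu.h>

    /* One counter per CPU: updates stay in the local CPU's cache. */
    static DEFINE_PER_CPU(unsigned long, hits);

    static void record_hit(void)
    {
        this_cpu_inc(hits);             /* no lock, no cross-CPU agreement */
    }

    /* Agreement (summing) is deferred until someone needs a total. */
    static unsigned long total_hits(void)
    {
        unsigned long sum = 0;
        int cpu;

        for_each_possible_cpu(cpu)
            sum += per_cpu(hits, cpu);
        return sum;
    }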

What does RCU provide?
- Delimited readers with near-zero overhead
- An operation to wait for all current readers to finish
- Primitives for conflict-tolerant operations: rcu_assign_pointer, rcu_dereference
- Working data structures you don't have to think hard about
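
A minimal sketch of how these primitives fit together, assuming a single pointer to a heap-allocated structure and an updater already serialized by some external lock; struct config, read_threshold, and update_config are hypothetical names, not part of the talk.

    #include <linux/rcupdate.h>
    #include <linux/slab.h>

    struct config {
        int threshold;
    };

    static struct config *cur_config;

    /* Delimited reader: near-zero overhead, never blocks the updater. */
    static int read_threshold(void)
    {
        struct config *c;
        int val = -1;

        rcu_read_lock();
        c = rcu_dereference(cur_config);    /* ordered pointer read */
        if (c)
            val = c->threshold;
        rcu_read_unlock();
        return val;
    }

    /* Updater: publish a new version, wait for current readers, then free. */
    static void update_config(struct config *newc)
    {
        struct config *old = cur_config;    /* updater side, lock held */

        rcu_assign_pointer(cur_config, newc);   /* ordered publish */
        synchronize_rcu();                      /* wait for current readers */
        kfree(old);                             /* deferred destruction */
    }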

RCU data structures
- Linked lists
- Radix trees
- Hash tables, sort of

Hash tables, sort of
- RCU linked lists for buckets
- Insertion and removal
- No other operations
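
For concreteness, here is a sketch of what RCU-list buckets supporting only lookup, insertion, and removal might look like with the kernel's hlist RCU helpers; struct rentry, struct rtable, RHASH_BITS, and the rhash_* functions are hypothetical names, not the rcuhashbash code.

    #include <linux/rculist.h>
    #include <linux/hash.h>
    #include <linux/rcupdate.h>

    #define RHASH_BITS 8

    struct rentry {
        struct hlist_node node;
        unsigned long key;
        /* payload ... */
    };

    struct rtable {
        struct hlist_head buckets[1 << RHASH_BITS];
    };

    /* Lookup: caller must hold rcu_read_lock(). */
    static struct rentry *rhash_lookup(struct rtable *t, unsigned long key)
    {
        struct hlist_head *head = &t->buckets[hash_long(key, RHASH_BITS)];
        struct rentry *e;

        hlist_for_each_entry_rcu(e, head, node)
            if (e->key == key)
                return e;
        return NULL;
    }

    /* The stock RCU list API gives us insertion and removal, nothing more. */
    static void rhash_insert(struct rtable *t, struct rentry *e)
    {
        hlist_add_head_rcu(&e->node,
                           &t->buckets[hash_long(e->key, RHASH_BITS)]);
    }

    static void rhash_remove(struct rentry *e)
    {
        hlist_del_rcu(&e->node);
        /* free only after a grace period, e.g. after synchronize_rcu() */
    }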

New RCU hash table operations
- Move element
- Resize table

Move operation
[Diagram: before the move, bucket a holds nodes n1, n2, n3 and bucket b holds n4, n5, with n3 carrying the old key; after the move, bucket a holds n1, n2 and bucket b holds n4, n5, n3, with n3 carrying the new key.]

Move operation semantics
- If a reader doesn't see the old item, subsequent lookups of the new item must succeed
- If a reader sees the new item, subsequent lookups of the old item must fail
- The move operation must not cause concurrent lookups for other items to fail
- Semantics based roughly on filesystems

Move operation challenge
- Trivial to implement with mutual exclusion: insert then remove, or remove then insert; intermediate states don't matter
- Hash table buckets use linked lists
- RCU linked list implementations provide insert and remove
- The move semantics are not achievable using just insert and remove (see the sketch below)
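
To see why, consider a naive move built from just those two operations, reusing the hypothetical rentry/rtable names from the lookup sketch above; this is illustrative only.

    /* Naive move: remove, change key, re-insert.  Between the removal and
     * the re-insertion, a concurrent reader can find the entry under
     * neither the old key nor the new one, so a lookup that should succeed
     * fails.  Inserting first instead leaves a window where the entry is
     * visible under both keys at once, which also violates the desired
     * semantics.  Re-linking the node without waiting for a grace period
     * additionally drags readers still standing on it into the wrong chain. */
    static void rhash_move_naive(struct rtable *t, struct rentry *e,
                                 unsigned long new_key)
    {
        hlist_del_rcu(&e->node);        /* entry vanishes from old bucket */
        e->key = new_key;
        /* <-- concurrent lookups of both keys fail here */
        hlist_add_head_rcu(&e->node,
                           &t->buckets[hash_long(new_key, RHASH_BITS)]);
    }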

Current approach in Linux
- Sequence lock
- Readers retry if they race with a rename
- Any rename forces the retry, not just one involving the item being looked up
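
A sketch of that sequence-lock pattern, again reusing the hypothetical names from the sketches above; the kernel's dcache takes a comparable approach with its rename_lock seqlock, but this is not the actual Linux code.

    #include <linux/seqlock.h>

    static DEFINE_SEQLOCK(rename_seq);      /* hypothetical stand-in */

    /* Reader: optimistic lookup, retried if any move ran concurrently.
     * Caller holds rcu_read_lock() across use of the result. */
    static struct rentry *seq_lookup(struct rtable *t, unsigned long key)
    {
        struct rentry *e;
        unsigned int seq;

        do {
            seq = read_seqbegin(&rename_seq);
            e = rhash_lookup(t, key);
        } while (read_seqretry(&rename_seq, seq));
        return e;
    }

    /* Writer: any move bumps the sequence, forcing racing readers to retry.
     * Relinking without a grace period is tolerated here only because a
     * reader dragged into the wrong chain fails the seqretry check. */
    static void seq_move(struct rtable *t, struct rentry *e,
                         unsigned long new_key)
    {
        write_seqlock(&rename_seq);
        hlist_del_rcu(&e->node);
        e->key = new_key;
        hlist_add_head_rcu(&e->node,
                           &t->buckets[hash_long(new_key, RHASH_BITS)]);
        write_sequnlock(&rename_seq);
    }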

Solution characteristics
- Principles: one semantically significant change at a time; intermediate states must not violate semantics
- Need a new move operation specific to relativistic hash tables, making moves a single semantically significant change with no broken intermediate state
- Must appear to simultaneously move the item to the new bucket and change its key

Key idea
[Diagram: bucket a holds n1, n2, n3 and bucket b holds n4, n5; the end of bucket b is cross-linked to the target node in bucket a while the key changes from old to new.]
- Cross-link the end of the new bucket to the target node in the old bucket
- While the target node appears in both buckets, change the key
- Need to resolve the cross-linking safely, even for readers looking at the target node
- First copy the target node to the end of its bucket, so readers can't miss later nodes
- Memory barriers order the updates for concurrent readers
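
Below is a much-simplified sketch of the cross-linking idea, not the published algorithm: it assumes a single serialized writer, plain NULL-terminated singly linked buckets, and a new key that hashes to a different bucket than the old one. struct mnode, struct mtable, NR_BUCKETS, tail_of, unlink_from, and rhash_move_crosslink are hypothetical names, and the real algorithm handles details (safe key visibility, same-bucket moves, error handling) that this sketch glosses over.

    #include <linux/rcupdate.h>
    #include <linux/slab.h>
    #include <linux/string.h>

    #define NR_BUCKETS 1024

    struct mnode {
        struct mnode *next;
        unsigned long key;
        /* payload ... */
    };

    struct mtable {
        struct mnode *bucket[NR_BUCKETS];
    };

    /* Address of the NULL pointer terminating a bucket's chain. */
    static struct mnode **tail_of(struct mtable *t, unsigned int b)
    {
        struct mnode **pp = &t->bucket[b];

        while (*pp)
            pp = &(*pp)->next;
        return pp;
    }

    /* Unlink @n from bucket @b without touching n->next, so readers already
     * standing on @n can still follow the rest of its chain. */
    static void unlink_from(struct mtable *t, unsigned int b, struct mnode *n)
    {
        struct mnode **pp = &t->bucket[b];

        while (*pp != n)
            pp = &(*pp)->next;
        rcu_assign_pointer(*pp, n->next);
    }

    static void rhash_move_crosslink(struct mtable *t, struct mnode *n,
                                     unsigned int old_b, unsigned long new_key)
    {
        unsigned int new_b = new_key % NR_BUCKETS;  /* assumed != old_b */
        struct mnode *copy;

        /* 1. Put a copy of the target at the tail of its old bucket, so no
         *    other node follows it and the later cross-link cannot hide any
         *    node from readers of either bucket. */
        copy = kmemdup(n, sizeof(*n), GFP_KERNEL);  /* error handling omitted */
        copy->next = NULL;
        rcu_assign_pointer(*tail_of(t, old_b), copy);
        unlink_from(t, old_b, n);
        synchronize_rcu();          /* no reader still holds the original */
        kfree(n);

        /* 2. Cross-link: the tail of the new bucket now points at the copy,
         *    which is simultaneously the tail of the old bucket. */
        rcu_assign_pointer(*tail_of(t, new_b), copy);

        /* 3. While the node is reachable from both buckets, switch the key.
         *    rcu_assign_pointer() supplies the publication barriers for the
         *    pointers; a real implementation must also make the key change
         *    itself safe for concurrent readers. */
        WRITE_ONCE(copy->key, new_key);

        /* 4. Drop the node from the old bucket, then wait so no reader can
         *    still reach it via the old chain before any further changes. */
        unlink_from(t, old_b, copy);
        synchronize_rcu();
    }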

Benchmarking with rcuhashbash
- Run one thread per CPU
- Continuous loop: randomly look up or move
- Configurable algorithm and lookup:move ratio
- Run for 30 seconds, count reads and writes
- Average of 10 runs
- Tested on 64 CPUs

Results, 999:1 lookup:move ratio, reads
[Plot: millions of hash lookups per second vs. number of CPUs (1 to 64) for the proposed algorithm, current Linux (RCU+seqlock), per-bucket spinlocks, and per-bucket reader-writer locks.]

Results, 1:1 lookup:move ratio, reads
[Plot: millions of hash lookups per second vs. number of CPUs (1 to 64) for the same four implementations.]

Resizing RCU-protected hash tables
- Disclaimer: work in progress
- Working on implementation and test framework in rcuhashbash
- No benchmark numbers yet
- Expect code and announcement soon

Resizing algorithm
- Keep a secondary table pointer, usually NULL
- Lookups use the secondary table if the primary table lookup fails (see the sketch after this list)
- Cross-link tails of chains to the secondary table in the appropriate bucket
- Wait for current readers to finish before removing cross-links from the primary table
- Repeat until the primary table is empty
- Make the secondary table primary
- Free the old primary table after a grace period
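
Since the resize is still work in progress, only the secondary-table indirection is sketched here; the cross-linking and draining steps are omitted. struct rhash and the functions below are hypothetical, reusing the rtable/rhash_lookup names from the earlier bucket sketch.

    /* Resizable wrapper: the extra pointer indirection lets lookups fall
     * back to a secondary table while a resize is in flight. */
    struct rhash {
        struct rtable *primary;
        struct rtable *secondary;   /* NULL except during a resize */
    };

    /* Caller holds rcu_read_lock(). */
    static struct rentry *rhash_lookup_resizable(struct rhash *h,
                                                 unsigned long key)
    {
        struct rtable *t;
        struct rentry *e;

        e = rhash_lookup(rcu_dereference(h->primary), key);
        if (e)
            return e;

        /* During a resize the entry may already live only in the new table. */
        t = rcu_dereference(h->secondary);
        return t ? rhash_lookup(t, key) : NULL;
    }

    /* Final step, once every entry has been moved out of the primary table. */
    static void rhash_finish_resize(struct rhash *h)
    {
        struct rtable *old = h->primary;

        rcu_assign_pointer(h->primary, h->secondary);
        synchronize_rcu();      /* readers that saw the old primary have
                                 * finished, and they could still use the
                                 * secondary pointer for their fallback */
        rcu_assign_pointer(h->secondary, NULL);
        kfree(old);             /* freed only after the grace period */
    }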

For more information
- Code: git://git.kernel.org/pub/scm/linux/kernel/git/josh/rcuhashbash (resize coming soon!)
- Relativistic programming: http://wiki.cs.pdx.edu/rp/
- Email: josh@joshtriplett.org