Efficient and Reliable Lock-Free Memory Reclamation Based on Reference Counting

Similar documents
Lock-Free and Practical Doubly Linked List-Based Deques using Single-Word Compare-And-Swap

Allocating memory in a lock-free manner

Cache-Aware Lock-Free Queues for Multiple Producers/Consumers and Weak Memory Consistency

Fast and Lock-Free Concurrent Priority Queues for Multi-Thread Systems

This article appeared in a journal published by Elsevier. The attached copy is furnished to the author for internal non-commercial research and

Non-blocking Array-based Algorithms for Stacks and Queues!

Non-blocking Array-based Algorithms for Stacks and Queues. Niloufar Shafiei

Progress Guarantees When Composing Lock-Free Objects

Brushing the Locks out of the Fur: A Lock-Free Work Stealing Library Based on Wool

Tom Hart, University of Toronto Paul E. McKenney, IBM Beaverton Angela Demke Brown, University of Toronto

Håkan Sundell University College of Borås Parallel Scalable Solutions AB

DESIGN CHALLENGES FOR SCALABLE CONCURRENT DATA STRUCTURES for Many-Core Processors

Efficient & Lock-Free Modified Skip List in Concurrent Environment

Lock-Free and Practical Deques and Doubly Linked Lists using Single-Word Compare-And-Swap 1

CS377P Programming for Performance Multicore Performance Synchronization

Fine-grained synchronization & lock-free programming

Proposed Wording for Concurrent Data Structures: Hazard Pointer and Read Copy Update (RCU)

Hazard Pointers. Safe Resource Reclamation for Optimistic Concurrency

Wait-Free Multi-Word Compare-And-Swap using Greedy Helping and Grabbing

CMSC421: Principles of Operating Systems

arxiv: v1 [cs.dc] 4 Dec 2017

Operating Systems: William Stallings. Starvation. Patricia Roy Manatee Community College, Venice, FL 2008, Prentice Hall

Parallel Programming in Distributed Systems Or Distributed Systems in Parallel Programming

Fine-grained synchronization & lock-free data structures

SSC - Concurrency and Multi-threading Java multithreading programming - Synchronisation (II)

Hazard Pointers. Safe Resource Reclamation for Optimistic Concurrency

Synchronising Threads

Drop the Anchor: Lightweight Memory Management for Non-Blocking Data Structures

Performance of memory reclamation for lockless synchronization

CSL373: Lecture 5 Deadlocks (no process runnable) + Scheduling (> 1 process runnable)

Process Synchronisation (contd.) Operating Systems. Autumn CS4023

Linked Lists: The Role of Locking. Erez Petrank Technion

A Skiplist-based Concurrent Priority Queue with Minimal Memory Contention

Viper: Communication-Layer Determinism and Scaling in Low-Latency Stream Processing

Semaphore. Originally called P() and V() wait (S) { while S <= 0 ; // no-op S--; } signal (S) { S++; }

A Comparison of Relativistic and Reader-Writer Locking Approaches to Shared Data Access

MultiJav: A Distributed Shared Memory System Based on Multiple Java Virtual Machines. MultiJav: Introduction

Exploring mtcp based Single-Threaded and Multi-Threaded Web Server Design

What is the Race Condition? And what is its solution? What is a critical section? And what is the critical section problem?

From Lock-Free to Wait-Free: Linked List. Edward Duong

Today: Synchronization. Recap: Synchronization

arxiv: v1 [cs.dc] 12 Feb 2013

Lecture 9: Virtual Memory

CPSC/ECE 3220 Summer 2018 Exam 2 No Electronics.

Timeslice Donation in Component-Based Systems

On the Design and Implementation of an Efficient Lock-Free Scheduler

Parallel access to linked data structures

Lock-Free Techniques for Concurrent Access to Shared Objects

On the Design and Implementation of an Efficient Lock-Free Scheduler

Concurrent Manipulation of Dynamic Data Structures in OpenCL

Deadlock. Concurrency: Deadlock and Starvation. Reusable Resources

Mutual Exclusion and Synchronization

arxiv:cs/ v1 [cs.dc] 5 Aug 2004

Concurrency, Mutual Exclusion and Synchronization C H A P T E R 5

An Evaluation of the Dynamic and Static Multiprocessor Priority Ceiling Protocol and the Multiprocessor Stack Resource Policy in an SMP System

A Non-Blocking Concurrent Queue Algorithm

CS 333 Introduction to Operating Systems. Class 3 Threads & Concurrency. Jonathan Walpole Computer Science Portland State University

Programming Languages

15 418/618 Project Final Report Concurrent Lock free BST

Non-Blocking Concurrent FIFO Queues With Single Word Synchronization Primitives

Characterizing the Performance and Energy Efficiency of Lock-Free Data Structures

Synchronization for Concurrent Tasks

CS 571 Operating Systems. Midterm Review. Angelos Stavrou, George Mason University

Split-Ordered Lists: Lock-Free Extensible Hash Tables. Pierre LaBorde

Chapter 5 Concurrency: Mutual Exclusion. and. Synchronization. Operating Systems: Internals. and. Design Principles

Concurrency, Thread. Dongkun Shin, SKKU

Concurrency: Deadlock and Starvation. Chapter 6

Stamp-it: A more Thread-efficient, Concurrent Memory Reclamation Scheme in the C++ Memory Model

IT 540 Operating Systems ECE519 Advanced Operating Systems

Remaining Contemplation Questions

CSE 451: Operating Systems Winter Lecture 7 Synchronization. Steve Gribble. Synchronization. Threads cooperate in multithreaded programs

Dealing with Issues for Interprocess Communication

Multi-core Architecture and Programming

AST: scalable synchronization Supervisors guide 2002

Distributed Operating Systems

Concurrency Race Conditions and Deadlocks

Lock-free Programming

A Consistency Framework for Iteration Operations in Concurrent Data Structures

Operating Systems. Lecture 4 - Concurrency and Synchronization. Master of Computer Science PUF - Hồ Chí Minh 2016/2017

Advanced Topic: Efficient Synchronization

Concurrent Preliminaries

Hazard Pointers. Number of threads unbounded time to check hazard pointers also unbounded! difficult dynamic bookkeeping! thread B - hp1 - hp2

LOCKLESS ALGORITHMS. Lockless Algorithms. CAS based algorithms stack order linked list

Operating Systems. Designed and Presented by Dr. Ayman Elshenawy Elsefy

Interprocess Communication By: Kaushik Vaghani

Bringing Practical Lock-Free Synchronization to 64-Bit Applications

Lecture #7: Implementing Mutual Exclusion

Distributed Systems Synchronization. Marcus Völp 2007

Synchronization and memory consistency on Intel Single-chip Cloud Computer. Ivan Walulya

Lecture Topics. Announcements. Today: Concurrency (Stallings, chapter , 5.7) Next: Exam #1. Self-Study Exercise #5. Project #3 (due 9/28)

Advanced Synchronization and Deadlock

Proposed Wording for Concurrent Data Structures: Hazard Pointer and Read Copy Update (RCU)

CS 333 Introduction to Operating Systems. Class 3 Threads & Concurrency. Jonathan Walpole Computer Science Portland State University

Java Monitors. Parallel and Distributed Computing. Department of Computer Science and Engineering (DEI) Instituto Superior Técnico.

Lecture Outine. Why Automatic Memory Management? Garbage Collection. Automatic Memory Management. Three Techniques. Lecture 18

The Google File System

Synchronization 1. Synchronization

CS420: Operating Systems. Process Synchronization

A Wait-free Multi-word Atomic (1,N) Register for Large-scale Data Sharing on Multi-core Machines

Marwan Burelle. Parallel and Concurrent Programming

Transcription:

Efficient and Reliable Lock-Free Memory Reclamation d on Reference ounting nders Gidenstam, Marina Papatriantafilou, Håkan Sundell and Philippas Tsigas Distributed omputing and Systems group, Department of omputer Science and Engineering, halmers University of Technology

Outline Introduction The Problem Lock-free synchronization Our solution Idea Properties Experiments onclusions 2

The Lock-Free Memory Reclamation Problem oncurrent shared data structure Dynamic use of shared memory oncurrent and overlapping operations by threads or processes 3

The Lock-Free Memory Reclamation Problem Local variables 4

The Lock-Free Memory Reclamation Problem X has de-referenced the link (pointer) to 5

The Lock-Free Memory Reclamation Problem nother thread, Y, finds and deletes (removes) from the active structure Thread Y 6

The Lock-Free Memory Reclamation Problem Property I: (de-)referenced node is not reclaimed Thread Y wants to reclaim(/free)? Thread Y 7

The Lock-Free Memory Reclamation Problem Property II: Links in a (de-)referenced node should always be de-referencable. The nodes and? are deleted from the active structure. D 8

The Lock-Free Memory Reclamation Problem Solutions? Garbage collection? Reference counting? Needs to be lock-free! 2 D 9

Lock-free synchronization lock-free shared data structure llows concurrent operations without enforcing mutual exclusion (i.e. no locks) Guarantees that at least one operation always makes progress voids: locking, deadlock and priority inversion Hardware synchronization primitives uilt into PU and memory system Typically: atomic read-modify-write instructions Examples Test-and-set, ompare-and-swap, Load-Linked / Store-onditional 0

Previous solutions Lock-free Reference ounting Valois + Michael & Scott 995 Detlefs et al. 200 Slow Herlihy et al. 2002 Remaining issues slow thread might prevent reclamation yclic garbage Implementation practicality issues Reference-count field MUST remain forever (Valois + Michael & Scott) Needs double word S (Detlefs et al.) Needs double width S (Herlihy, 2002) Large overhead

Our approach The basic idea ombine the best of Hazard pointers (Michael 2002) Tracks references from threads Fast de-reference Upper bound on the amount of unreclaimed deleted nodes ompatible with standard memory allocators Reference counting Tracks references from links in shared memory Manages links within dynamic nodes Safe to traverse links (also) in deleted nodes Practical Uses only single-word ompare-nd-swap 2

The basic idea PI DeRefLink ReleaseRef omparendswapref StoreRef NewNode DeleteNode Hazard pointers () Deletion list 3

The basic idea PI DeRefLink() ReleaseRef omparendswapref StoreRef NewNode DeleteNode Hazard pointers () Deletion list R 4

The basic idea PI DeRefLink ReleaseRef(R) omparendswapref StoreRef NewNode DeleteNode Hazard pointers () Deletion list R 5

The basic idea PI DeRefLink ReleaseRef omparendswapref StoreRef NewNode 0 DeleteNode D Hazard pointers () Deletion list new 6

The basic idea PI DeRefLink ReleaseRef omparendswapref StoreRef(new.next, R) NewNode 0 DeleteNode D Hazard pointers () Deletion list R new 2 7

The basic idea PI DeRefLink ReleaseRef Hazard pointers () omparendswapref(prev.next, old, new) StoreRef NewNode DeleteNode D Deletion list prev old new 0 2 8

The basic idea PI DeRefLink ReleaseRef omparendswapref StoreRef NewNode DeleteNode(old) D Hazard pointers () Deletion list _ prev old new 0 2 9

reaking chains of garbage lean-up deleted nodes Update links to point to live nodes Performed on nodes in Own deletion list ll deletion lists 0 Hazard pointers (Thread Y) Deletion list 2 D E 20

reaking chains of garbage lean-up deleted nodes Update links to point to live nodes Performed on nodes in Own deletion list ll deletion lists 0 Hazard pointers (Thread Y) 0 Deletion list 3 D E 2

ound on unreclaimed nodes deleted node can be reclaimed when The reference count is zero and No hazard pointer is pointing to it and There is no ongoing clean-up of this node With a rate relative to the number of threads of Scanning hazard pointers leaning up nodes as needed Then the maximum size of each deletion list depends on The number of hazard pointers The number of links per node The number of threads 22

Experimental evaluation Lock-free deque (Sundell and Tsigas 2004) (deque double-ended queue) The algorithm needs traversal of deleted nodes Time for 0000 random operations/thread Tested memory reclamation schemes Reference counting, Valois et al. The new algorithm Systems 4 processor Xeon P / Linux (UM) 8 processor SGI Origin 2000 / IRIX (NUM) 23

Experimental evaluation 24

Experimental evaluation 25

onclusions First lock-free memory reclamation scheme that Only uses atomic primitives available in contemporary architectures Guarantees safety of Local and Global references Has an upper bound on the amount of deleted but unreclaimed nodes llows arbitrary reuse of reclaimed memory 26

Questions? ontact Information: ddress: nders Gidenstam, omputer Science & Engineering, halmers University of Technology, SE-42 96 Göteborg, Sweden Email: andersg @ cs.chalmers.se Web: http://www.cs.chalmers.se/~dcs http://www.cs.chalmers.se/~andersg Implementation http://www.noble-library.org/ 27

onclusions First lock-free memory reclamation scheme that Only uses atomic primitives available in contemporary architectures Guarantees safety of Local and Global references Has an upper bound on the amount of deleted but unreclaimed nodes ( ound: N * N * (k + L_max + a + ) ) llows arbitrary reuse of reclaimed memory 28