Bw-Tree. Josef Schmeißer. January 9, 2018


1 Bw-Tree. Josef Schmeißer. January 9, 2018

2 Table of contents: 1 Fundamentals, 2 Tree Structure, 3 Evaluation, 4 Further Reading

3 Fundamentals

bool compare_and_swap(int * ptr, int & expected, int desired) {
    int oldvalue;
    atomic {
        oldvalue = *ptr;
        if (oldvalue == expected) {
            *ptr = desired;
            return true;
        }
    }
    expected = oldvalue;
    return false;
}

Figure: Semantics of the CAS instruction.
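
The figure describes the semantics in pseudocode (the `atomic { ... }` block is not valid C++). As a minimal sketch, the same behaviour can be obtained in standard C++ with `std::atomic<int>::compare_exchange_strong`, which also writes the observed value back into `expected` on failure. The names `cas_int` and `cas_demo` are hypothetical helpers introduced only for this illustration.

```cpp
#include <atomic>
#include <cassert>

// Returns true if `target` held `expected` and was replaced by `desired`.
// On failure, `expected` is overwritten with the value actually observed,
// mirroring the semantics shown in the figure.
bool cas_int(std::atomic<int>& target, int& expected, int desired) {
    return target.compare_exchange_strong(expected, desired);
}

// Hypothetical self-check: one successful CAS followed by one failing CAS.
bool cas_demo() {
    std::atomic<int> value{1};

    int expected = 1;
    bool first = cas_int(value, expected, 2);   // succeeds: 1 -> 2

    expected = 1;                               // stale expectation
    bool second = cas_int(value, expected, 3);  // fails: value is 2, not 1
    assert(expected == 2);                      // observed value was written back

    return first && !second && value.load() == 2;
}
```

Calling `cas_demo()` returns `true`: the first swap succeeds, the second fails and reports the current value through `expected`.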

4 Features. Main features: a lock-free data structure that maps page identifiers (PIDs) to pointers; entries can be atomically altered via CAS; B-link-tree-like side links (important for split and merge).

5 Architecture. Figure: mapping table with columns PID and Ptr; pages P1, P2, P4. Legend: PID reference, memory pointer.
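
The mapping table from the architecture figure can be sketched as an array of atomic pointers indexed by PID. This is only an illustration under assumptions: the class name `MappingTable`, the placeholder `Node` type, and the fixed capacity are hypothetical and not taken from the slides.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

struct Node { int payload; };   // hypothetical stand-in for a page or delta record

using PID = std::size_t;        // a page identifier is an index into the table

class MappingTable {
public:
    explicit MappingTable(std::size_t capacity) : slots_(capacity) {
        for (auto& slot : slots_) slot.store(nullptr);
    }

    // Translates a PID reference into the current memory pointer.
    Node* get(PID pid) const {
        assert(pid < slots_.size());
        return slots_[pid].load();
    }

    // Atomically swings the entry from `expected` to `desired`.
    // Returns false if the entry no longer holds `expected`.
    bool install(PID pid, Node* expected, Node* desired) {
        assert(pid < slots_.size());
        return slots_[pid].compare_exchange_strong(expected, desired);
    }

private:
    std::vector<std::atomic<Node*>> slots_;
};
```

Because other pages refer to a page only by its PID, a single successful `install` changes what every reference to that page resolves to.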

6-9 Delta Updates. Perform updates to logical pages through delta records. Delta records are chained in a singly linked list. Install updates atomically via CAS. Figure: the mapping table entry (PID, Ptr) of P4 points to the delta chain Delete 2, Insert 5, which ends at the immutable base page.
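
The delta update steps above can be sketched as follows: a new delta record points at the current head of the chain and is then installed in the mapping table slot with a single CAS. The `Record` layout and the function `try_prepend` are assumptions made for this example, not the representation used in the talk.

```cpp
#include <atomic>
#include <cassert>
#include <vector>

enum class Kind { Base, Insert, Delete };

// Hypothetical record layout: base pages use `keys`, deltas use `key`.
struct Record {
    Kind kind;
    int key;                  // key affected by an Insert or Delete delta
    std::vector<int> keys;    // sorted keys of a base page (empty for deltas)
    const Record* next;       // next record in the chain, nullptr at the base page
};

// Makes one attempt to prepend `delta` to the chain stored in `slot`.
// Returns false if another thread changed the slot between the load and the CAS.
bool try_prepend(std::atomic<const Record*>& slot, Record* delta) {
    assert(delta != nullptr);
    const Record* head = slot.load();
    delta->next = head;       // the existing chain and base page stay untouched
    return slot.compare_exchange_strong(head, delta);
}
```

Replaying the figure, prepending Insert 5 and then Delete 2 to a base page leaves the slot pointing at Delete 2, whose `next` is Insert 5, whose `next` is the base page.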

10-12 Search. Traverse the tree as usual. Inspect each record of the delta chain, and stop at the first occurrence of the search key. Perform a binary search on the base page if the search falls through the delta chain. Figure: mapping table entry of P4 with the delta chain Delete 2, Insert 5.
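
A lookup within one logical page, as described above, might look like the sketch below. The newest delta that mentions the key decides the result; only if no delta mentions it is the base page searched. The `Record` layout and the function `contains` are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

enum class Kind { Base, Insert, Delete };

// Hypothetical record layout: base pages use `keys`, deltas use `key`.
struct Record {
    Kind kind;
    int key;
    std::vector<int> keys;    // sorted keys of a base page
    const Record* next;       // nullptr at the base page
};

// Returns true if `key` is present in the logical page starting at `head`.
bool contains(const Record* head, int key) {
    for (const Record* r = head; r != nullptr; r = r->next) {
        if (r->kind == Kind::Base) {
            // No delta mentioned the key: binary search on the base page.
            assert(std::is_sorted(r->keys.begin(), r->keys.end()));
            return std::binary_search(r->keys.begin(), r->keys.end(), key);
        }
        if (r->key == key) {
            // First (newest) occurrence wins: Insert means present, Delete means absent.
            return r->kind == Kind::Insert;
        }
    }
    return false;             // empty chain
}
```

With a base page holding 1, 2, 3 and the chain Delete 2, Insert 5 on top, a lookup of 5 stops at the Insert delta, a lookup of 2 stops at the Delete delta, and a lookup of 1 reaches the base page.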

13-16 Conflicts. Multiple threads may try to install an update to the same page simultaneously. The atomic CAS ensures that only one thread succeeds. Slower threads may retry. Figure: two delta records, Delete 2 and Insert 5, compete for the mapping table entry of P1.
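
The conflict scenario can be replayed deterministically in a single thread: one writer remembers the head it observed, a second writer installs its delta first, and the first writer's CAS then fails and has to be retried. The types and the functions `try_install` and `install_with_retry` are hypothetical names used only in this sketch.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical minimal delta record.
struct Delta {
    int key;
    const Delta* next;
};

// Single attempt, based on a head pointer the caller observed earlier.
bool try_install(std::atomic<const Delta*>& slot, const Delta* observed, Delta* delta) {
    assert(delta != nullptr);
    delta->next = observed;
    return slot.compare_exchange_strong(observed, delta);
}

// Retry loop for the slower thread. Returns the number of attempts needed.
int install_with_retry(std::atomic<const Delta*>& slot, Delta* delta) {
    assert(delta != nullptr);
    int attempts = 0;
    const Delta* head = slot.load();
    do {
        ++attempts;
        delta->next = head;   // re-link against the most recently observed head
        // On failure, compare_exchange_strong refreshes `head` with the current value.
    } while (!slot.compare_exchange_strong(head, delta));
    return attempts;
}
```

If writer A observes an empty slot, writer B then installs Insert 5, and A afterwards calls `try_install` with its stale observation, the call returns `false`. A subsequent `install_with_retry` succeeds and links A's delta in front of B's.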

17-20 Consolidation. Constantly appending deltas leads to ever-expanding chains. Solution: 1. Consolidate the logical page by creating a new base page. 2. Install the new base page with an atomic CAS. 3. Reclaim the memory of the old logical page, once it is no longer used. Figure: the mapping table entry of P3 points to the delta chain Insert 1, Delete 2, Insert 5 before consolidation and to the new base page afterwards.
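
Steps 1 and 2 of the consolidation procedure can be sketched as below. Step 3 (safe memory reclamation, for example with an epoch scheme) is deliberately left out; in this sketch the caller owns the returned page. The `Record` layout and the functions `consolidate` and `install_consolidated` are assumptions introduced for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <set>
#include <vector>

enum class Kind { Base, Insert, Delete };

// Hypothetical record layout: base pages use `keys`, deltas use `key`.
struct Record {
    Kind kind;
    int key;
    std::vector<int> keys;    // sorted keys of a base page
    const Record* next;       // nullptr at the base page
};

// Step 1: fold the delta chain into a freshly allocated base page.
Record* consolidate(const Record* head) {
    assert(head != nullptr);
    std::set<int> live;       // keys present in the logical page
    std::set<int> decided;    // keys already settled by a newer delta
    for (const Record* r = head; r != nullptr; r = r->next) {
        if (r->kind == Kind::Base) {
            for (int k : r->keys)
                if (decided.count(k) == 0) live.insert(k);
        } else if (decided.insert(r->key).second && r->kind == Kind::Insert) {
            live.insert(r->key);   // newest delta for this key is an Insert
        }
    }
    return new Record{Kind::Base, 0, std::vector<int>(live.begin(), live.end()), nullptr};
}

// Step 2: install the new base page; fails if the chain grew in the meantime.
bool install_consolidated(std::atomic<const Record*>& slot,
                          const Record* old_head, Record* fresh) {
    return slot.compare_exchange_strong(old_head, fresh);
}
```

If the CAS in step 2 fails, another thread has prepended a delta since the chain was read, so the new page is stale and the caller would discard it and possibly try again later.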

21-25 Node Split. Figure sequence: mapping table (PID, Ptr) with entries O, P, Q; pages P, R and the new page Q; separator key k; a Split delta, followed by an Index entry delta.

26-29 Node Merge. Figure sequence: mapping table (PID, Ptr) with entries P, L, R, S; pages L, R, S; keys k1, k2; a Remove Node delta, followed by a Merge delta and a Delete entry delta.

30 Optimal Delta Chain Length. Figure: throughput (M operations per second) as a function of the delta chain limit.

31 Experiment Description. Synthetic workload: integer keys and payload, randomly distributed; index size: 5M. Test system (atkemper4): Intel Core i9-7900X, 10 cores, 20 threads, Restricted Transactional Memory.

32 Insert. Figure: throughput (M operations per second) by number of threads for the approaches Bw-Tree, LockCoupling, and nosync.

33 Lookup. Figure: throughput (M operations per second) by number of threads for the approaches Bw-Tree, LockCoupling, and nosync.

34 Insert + Lookup. Figure: throughput (M operations per second) by number of threads for the approaches Bw-Tree, LockCoupling, and nosync.

35 Alternative Approach. Optimistic Lock Coupling: versioned write locks. Writers acquire locks as usual. Readers traverse the tree optimistically without acquiring any locks. Validate the version after each page access. If validation fails, restart.
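
The versioned write lock behind optimistic lock coupling can be sketched as a single counter whose lowest bit marks "locked". Writers make the counter odd while they work and even again when done; readers remember the version, read, and then check that it is unchanged. This is a simplified single-page illustration using default (sequentially consistent) atomics; `VersionLock`, `Page`, `optimistic_read`, and `locked_write` are hypothetical names, not code from the talk.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

class VersionLock {
public:
    // Writer: spin until the lock bit could be set (version becomes odd).
    void lock() {
        for (;;) {
            std::uint64_t v = word_.load();
            if ((v & 1) == 0 && word_.compare_exchange_weak(v, v + 1)) return;
        }
    }
    // Writer: release the lock and advance to the next even version.
    void unlock() {
        assert(word_.load() & 1);
        word_.fetch_add(1);
    }
    // Reader: fetch the current version; fails while a writer holds the lock.
    bool read_begin(std::uint64_t& version) const {
        version = word_.load();
        return (version & 1) == 0;
    }
    // Reader: true if no writer intervened since read_begin.
    bool validate(std::uint64_t version) const { return word_.load() == version; }

private:
    std::atomic<std::uint64_t> word_{0};
};

struct Page {
    VersionLock lock;
    std::atomic<int> value{0};   // atomic so that optimistic reads are well-defined
};

// Reader: no lock is acquired; restart whenever validation fails.
int optimistic_read(const Page& page) {
    for (;;) {
        std::uint64_t version = 0;
        if (!page.lock.read_begin(version)) continue;
        int copy = page.value.load();
        if (page.lock.validate(version)) return copy;
    }
}

// Writer: acquires the lock as usual.
void locked_write(Page& page, int v) {
    page.lock.lock();
    page.value.store(v);
    page.lock.unlock();
}
```

In a tree, the same validation would be repeated after each page access during the descent, as the slide states; the sketch shows only one page.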

36 Insert. Figure: throughput (M operations per second) by number of threads for the approaches Bw-Tree, LockCoupling, and olcepoch.

37 Lookup. Figure: throughput (M operations per second) by number of threads for the approaches Bw-Tree, LockCoupling, and olcepoch.

38 Insert + Lookup. Figure: throughput (M operations per second) by number of threads for the approaches Bw-Tree, LockCoupling, and olcepoch.

39 Another Alternative. Modern Intel CPUs provide transactional memory support: Hardware Lock Elision (HLE) and Restricted Transactional Memory (RTM).

40 Insert. Figure: throughput (M operations per second) by number of threads for the approaches Bw-Tree, HLE, olcepoch, and RTM.

41 Lookup. Figure: throughput (M operations per second) by number of threads for the approaches Bw-Tree, HLE, olcepoch, and RTM.

42 Insert + Lookup. Figure: throughput (M operations per second) by number of threads for the approaches Bw-Tree, HLE, olcepoch, and RTM.

43 Further Reading

Justin J. Levandoski, David B. Lomet and Sudipta Sengupta. The Bw-Tree: A B-tree for New Hardware Platforms. IEEE 29th International Conference on Data Engineering (ICDE).

Philip L. Lehman and S. Bing Yao. Efficient Locking for Concurrent Operations on B-Trees. ACM Transactions on Database Systems, Vol. 6, No. 4, December 1981.

Viktor Leis, Florian Scheibner, Alfons Kemper and Thomas Neumann. The ART of Practical Synchronization. Twelfth International Workshop on Data Management on New Hardware.

Implementierungstechniken für Hauptspeicherdatenbanksysteme: The Bw-Tree

Implementierungstechniken für Hauptspeicherdatenbanksysteme: The Bw-Tree Implementierungstechniken für Hauptspeicherdatenbanksysteme: The Bw-Tree Josef Schmeißer January 9, 218 Abstract The Bw-Tree as presented by Levandoski et al. was designed to accommodate the emergence

More information

Hardware Transactional Memory on Haswell

Hardware Transactional Memory on Haswell Hardware Transactional Memory on Haswell Viktor Leis Technische Universität München 1 / 15 Introduction transactional memory is a very elegant programming model transaction { transaction { a = a 10; c

More information

High Performance Transactions in Deuteronomy

High Performance Transactions in Deuteronomy High Performance Transactions in Deuteronomy Justin Levandoski, David Lomet, Sudipta Sengupta, Ryan Stutsman, and Rui Wang Microsoft Research Overview Deuteronomy: componentized DB stack Separates transaction,

More information

Building a Bw-Tree Takes More Than Just Buzz Words

Building a Bw-Tree Takes More Than Just Buzz Words Building a Bw-Tree Takes More Than Just Buzz Words Ziqi Wang Carnegie Mellon University ziqiw@cs.cmu.edu Andrew Pavlo Carnegie Mellon University pavlo@cs.cmu.edu Hyeontaek Lim Carnegie Mellon University

More information

A Concurrent Skip List Implementation with RTM and HLE

A Concurrent Skip List Implementation with RTM and HLE A Concurrent Skip List Implementation with RTM and HLE Fan Gao May 14, 2014 1 Background Semester Performed: Spring, 2014 Instructor: Maurice Herlihy The main idea of my project is to implement a skip

More information

Introduction. New latch modes

Introduction. New latch modes A B link Tree method and latch protocol for synchronous node deletion in a high concurrency environment Karl Malbrain malbrain@cal.berkeley.edu Introduction A new B link Tree latching method and protocol

More information

To Lock, Swap, or Elide: On the Interplay of Hardware Transactional Memory and Lock-Free Indexing

To Lock, Swap, or Elide: On the Interplay of Hardware Transactional Memory and Lock-Free Indexing To Lock, Swap, or Elide: On the Interplay of Hardware Transactional Memory and Lock-Free Indexing Darko Makreshanski 1 Justin Levandoski 2 Ryan Stutsman 3 1 ETH Zurich, 2,3 Microsoft Research 1 darkoma@inf.ethz.ch,

More information

HTM in the wild. Konrad Lai June 2015

HTM in the wild. Konrad Lai June 2015 HTM in the wild Konrad Lai June 2015 Industrial Considerations for HTM Provide a clear benefit to customers Improve performance & scalability Ease programmability going forward Improve something common

More information

Synchronizing Data Structures

Synchronizing Data Structures Synchronizing Data Structures 1 / 56 Overview caches and atomics list-based set memory reclamation Adaptive Radix Tree B-tree Bw-tree split-ordered list hardware transactional memory 2 / 56 Caches Caches

More information

230 Million Tweets per day

230 Million Tweets per day Tweets per day Queries per day Indexing latency Avg. query response time Earlybird - Realtime Search @twitter Michael Busch @michibusch michael@twitter.com buschmi@apache.org Earlybird - Realtime Search

More information

Synchronizing Data Structures

Synchronizing Data Structures 1 / 78 Overview caches and atomics list-based set memory reclamation Adaptive Radix Tree B-tree Bw-tree split-ordered list hardware transactional memory 2 / 78 Caches Caches modern CPUs consist of multiple

More information

The Adaptive Radix Tree

The Adaptive Radix Tree Department of Informatics, University of Zürich MSc Basismodul The Adaptive Radix Tree Rafael Kallis Matrikelnummer: -708-887 Email: rk@rafaelkallis.com September 8, 08 supervised by Prof. Dr. Michael

More information

Scalable Concurrent Hash Tables via Relativistic Programming

Scalable Concurrent Hash Tables via Relativistic Programming Scalable Concurrent Hash Tables via Relativistic Programming Josh Triplett September 24, 2009 Speed of data < Speed of light Speed of light: 3e8 meters/second Processor speed: 3 GHz, 3e9 cycles/second

More information

Easy Lock-Free Indexing in Non-Volatile Memory

Easy Lock-Free Indexing in Non-Volatile Memory Easy Lock-Free Indexing in Non-Volatile Memory Tianzheng Wang University of Toronto tzwang@cs.toronto.edu Justin Levandoski Microsoft Research justin.levandoski@microsoft.com Per-Ake Larson University

More information

Hazard Pointers. Safe Resource Reclamation for Optimistic Concurrency

Hazard Pointers. Safe Resource Reclamation for Optimistic Concurrency Document Number: P0233R3 Date: 2017-02-06 Reply-to: maged.michael@acm.org, michael@codeplay.com Authors: Maged M. Michael, Michael Wong, Paul McKenney, Arthur O'Dwyer, David Hollman Project: Programming

More information

Invyswell: A HyTM for Haswell RTM. Irina Calciu, Justin Gottschlich, Tatiana Shpeisman, Gilles Pokam, Maurice Herlihy

Invyswell: A HyTM for Haswell RTM. Irina Calciu, Justin Gottschlich, Tatiana Shpeisman, Gilles Pokam, Maurice Herlihy Invyswell: A HyTM for Haswell RTM Irina Calciu, Justin Gottschlich, Tatiana Shpeisman, Gilles Pokam, Maurice Herlihy Multicore Performance Scaling u Problem: Locking u Solution: HTM? u IBM BG/Q, zec12,

More information

BzTree: A High-Performance Latch-free Range Index for Non-Volatile Memory

BzTree: A High-Performance Latch-free Range Index for Non-Volatile Memory BzTree: A High-Performance Latch-free Range Index for Non-Volatile Memory JOY ARULRAJ JUSTIN LEVANDOSKI UMAR FAROOQ MINHAS PER-AKE LARSON Microsoft Research NON-VOLATILE MEMORY [NVM] PERFORMANCE DRAM VOLATILE

More information

Performance Evaluation of Intel Transactional Synchronization Extensions for High-Performance Computing

Performance Evaluation of Intel Transactional Synchronization Extensions for High-Performance Computing Performance Evaluation of Intel Transactional Synchronization Extensions for High-Performance Computing Richard Yoo, Christopher Hughes: Intel Labs Konrad Lai, Ravi Rajwar: Intel Architecture Group Agenda

More information

15 418/618 Project Final Report Concurrent Lock free BST

15 418/618 Project Final Report Concurrent Lock free BST 15 418/618 Project Final Report Concurrent Lock free BST Names: Swapnil Pimpale, Romit Kudtarkar AndrewID: spimpale, rkudtark 1.0 SUMMARY We implemented two concurrent binary search trees (BSTs): a fine

More information

Building Efficient Concurrent Graph Object through Composition of List-based Set

Building Efficient Concurrent Graph Object through Composition of List-based Set Building Efficient Concurrent Graph Object through Composition of List-based Set Sathya Peri Muktikanta Sa Nandini Singhal Department of Computer Science & Engineering Indian Institute of Technology Hyderabad

More information

CS4021/4521 INTRODUCTION

CS4021/4521 INTRODUCTION CS4021/4521 Advanced Computer Architecture II Prof Jeremy Jones Rm 4.16 top floor South Leinster St (SLS) jones@scss.tcd.ie South Leinster St CS4021/4521 2018 jones@scss.tcd.ie School of Computer Science

More information

Advance Operating Systems (CS202) Locks Discussion

Advance Operating Systems (CS202) Locks Discussion Advance Operating Systems (CS202) Locks Discussion Threads Locks Spin Locks Array-based Locks MCS Locks Sequential Locks Road Map Threads Global variables and static objects are shared Stored in the static

More information

Cache-Conscious Concurrency Control of Main-Memory Indexes on Shared-Memory Multiprocessor Systems

Cache-Conscious Concurrency Control of Main-Memory Indexes on Shared-Memory Multiprocessor Systems Cache-Conscious Concurrency Control of Main-Memory Indexes on Shared-Memory Multiprocessor Systems Sang K. Cha, Sangyong Hwang, Kihong Kim, Keunjoo Kwon VLDB 2001 Presented by: Raluca Marcuta 1 / 31 1

More information

Realtime Search with Lucene. Michael

Realtime Search with Lucene. Michael Realtime Search with Lucene Michael Busch @michibusch michael@twitter.com buschmi@apache.org 1 Realtime Search with Lucene Agenda Introduction - Near-realtime Search (NRT) - Searching DocumentsWriter s

More information

Fast and Lock-Free Concurrent Priority Queues for Multi-Thread Systems

Fast and Lock-Free Concurrent Priority Queues for Multi-Thread Systems Fast and Lock-Free Concurrent Priority Queues for Multi-Thread Systems Håkan Sundell Philippas Tsigas Outline Synchronization Methods Priority Queues Concurrent Priority Queues Lock-Free Algorithm: Problems

More information

The Hekaton Memory-Optimized OLTP Engine

The Hekaton Memory-Optimized OLTP Engine The Hekaton Memory-Optimized OLTP Engine Per-Ake Larson palarson@microsoft.com Mike Zwilling mikezw@microsoft.com Kevin Farlee kfarlee@microsoft.com Abstract Hekaton is a new OLTP engine optimized for

More information

Easy Lock-Free Indexing in Non-Volatile Memory

Easy Lock-Free Indexing in Non-Volatile Memory Easy Lock-Free Indexing in Non-Volatile Memory Tianzheng Wang 1 * Justin Levandoski Per-Åke Larson 3 1 University of Toronto, Microsoft Research, 3 University of Waterloo 1 tzwang@cs.toronto.edu, justin.levandoski@microsoft.com,

More information

Process Synchronization. Mehdi Kargahi School of ECE University of Tehran Spring 2008

Process Synchronization. Mehdi Kargahi School of ECE University of Tehran Spring 2008 Process Synchronization Mehdi Kargahi School of ECE University of Tehran Spring 2008 Producer-Consumer (Bounded Buffer) Producer Consumer Race Condition Producer Consumer Critical Sections Structure of

More information

Atomicity via Source-to-Source Translation

Atomicity via Source-to-Source Translation Atomicity via Source-to-Source Translation Benjamin Hindman Dan Grossman University of Washington 22 October 2006 Atomic An easier-to-use and harder-to-implement primitive void deposit(int x){ synchronized(this){

More information

Lock-Free and Practical Doubly Linked List-Based Deques using Single-Word Compare-And-Swap

Lock-Free and Practical Doubly Linked List-Based Deques using Single-Word Compare-And-Swap Lock-Free and Practical Doubly Linked List-Based Deques using Single-Word Compare-And-Swap Håkan Sundell Philippas Tsigas OPODIS 2004: The 8th International Conference on Principles of Distributed Systems

More information

Synchronising Threads

Synchronising Threads Synchronising Threads David Chisnall March 1, 2011 First Rule for Maintainable Concurrent Code No data may be both mutable and aliased Harder Problems Data is shared and mutable Access to it must be protected

More information

Concurrent Data Structures Concurrent Algorithms 2016

Concurrent Data Structures Concurrent Algorithms 2016 Concurrent Data Structures Concurrent Algorithms 2016 Tudor David (based on slides by Vasileios Trigonakis) Tudor David 11.2016 1 Data Structures (DSs) Constructs for efficiently storing and retrieving

More information

The Bw-Tree: A B-tree for New Hardware Platforms

The Bw-Tree: A B-tree for New Hardware Platforms The Bw-Tree: A B-tree for New Hardware latforms Justin J. Levandoski 1, David B. Lomet 2, Sudipta Sengupta 3 Microsoft Research Redmond, WA 98052, USA 1 justin.levandoski@microsoft.com, 2 lomet@microsoft.com,

More information

CSE 120 Principles of Operating Systems

CSE 120 Principles of Operating Systems CSE 120 Principles of Operating Systems Spring 2018 Lecture 15: Multicore Geoffrey M. Voelker Multicore Operating Systems We have generally discussed operating systems concepts independent of the number

More information

FAWN as a Service. 1 Introduction. Jintian Liang CS244B December 13, 2017

FAWN as a Service. 1 Introduction. Jintian Liang CS244B December 13, 2017 Liang 1 Jintian Liang CS244B December 13, 2017 1 Introduction FAWN as a Service FAWN, an acronym for Fast Array of Wimpy Nodes, is a distributed cluster of inexpensive nodes designed to give users a view

More information

Improving STM Performance with Transactional Structs 1

Improving STM Performance with Transactional Structs 1 Improving STM Performance with Transactional Structs 1 Ryan Yates and Michael L. Scott University of Rochester IFL, 8-31-2016 1 This work was funded in part by the National Science Foundation under grants

More information

Other consistency models

Other consistency models Last time: Symmetric multiprocessing (SMP) Lecture 25: Synchronization primitives Computer Architecture and Systems Programming (252-0061-00) CPU 0 CPU 1 CPU 2 CPU 3 Timothy Roscoe Herbstsemester 2012

More information

Marwan Burelle. Parallel and Concurrent Programming

Marwan Burelle.   Parallel and Concurrent Programming marwan.burelle@lse.epita.fr http://wiki-prog.infoprepa.epita.fr Outline 1 2 Solutions Overview 1 Sharing Data First, always try to apply the following mantra: Don t share data! When non-scalar data are

More information

Identity, State and Values

Identity, State and Values Identity, State and Values Clojure s approach to concurrency Rich Hickey Agenda Functions and processes Identity, State, and Values Persistent Data Structures Clojure s Managed References Q&A Functions

More information

Concurrent programming: From theory to practice. Concurrent Algorithms 2015 Vasileios Trigonakis

Concurrent programming: From theory to practice. Concurrent Algorithms 2015 Vasileios Trigonakis oncurrent programming: From theory to practice oncurrent Algorithms 2015 Vasileios Trigonakis From theory to practice Theoretical (design) Practical (design) Practical (implementation) 2 From theory to

More information

Mutex Locking versus Hardware Transactional Memory: An Experimental Evaluation

Mutex Locking versus Hardware Transactional Memory: An Experimental Evaluation Mutex Locking versus Hardware Transactional Memory: An Experimental Evaluation Thesis Defense Master of Science Sean Moore Advisor: Binoy Ravindran Systems Software Research Group Virginia Tech Multiprocessing

More information

(12) Patent Application Publication (10) Pub. No.: US 2016/ A1

(12) Patent Application Publication (10) Pub. No.: US 2016/ A1 (19) United States US 2016037 1322A1 (12) Patent Application Publication (10) Pub. No.: US 2016/0371322 A1 GUNTI et al. (43) Pub. Date: Dec. 22, 2016 (54) EFFICIENT MANAGEMENT OF LARGE (52) U.S. Cl. NUMBER

More information

MemC3: MemCache with CLOCK and Concurrent Cuckoo Hashing

MemC3: MemCache with CLOCK and Concurrent Cuckoo Hashing MemC3: MemCache with CLOCK and Concurrent Cuckoo Hashing Bin Fan (CMU), Dave Andersen (CMU), Michael Kaminsky (Intel Labs) NSDI 2013 http://www.pdl.cmu.edu/ 1 Goal: Improve Memcached 1. Reduce space overhead

More information

Learning Data Systems Components

Learning Data Systems Components Work partially done at Learning Data Systems Components Tim Kraska [Disclaimer: I am NOT talking on behalf of Google] Comments on Social Media Joins Sorting HashMaps Tree Bloom Filter

More information

arxiv: v1 [cs.dc] 8 May 2017

arxiv: v1 [cs.dc] 8 May 2017 Towards Reduced Instruction Sets for Synchronization arxiv:1705.02808v1 [cs.dc] 8 May 2017 Rati Gelashvili MIT gelash@mit.edu Alexander Spiegelman Technion sashas@tx.technion.ac.il Idit Keidar Technion

More information

Chapter 6: Process Synchronization

Chapter 6: Process Synchronization Chapter 6: Process Synchronization Objectives Introduce Concept of Critical-Section Problem Hardware and Software Solutions of Critical-Section Problem Concept of Atomic Transaction Operating Systems CS

More information

Brushing the Locks out of the Fur: A Lock-Free Work Stealing Library Based on Wool

Brushing the Locks out of the Fur: A Lock-Free Work Stealing Library Based on Wool Brushing the Locks out of the Fur: A Lock-Free Work Stealing Library Based on Wool Håkan Sundell School of Business and Informatics University of Borås, 50 90 Borås E-mail: Hakan.Sundell@hb.se Philippas

More information

Concept of a process

Concept of a process Concept of a process In the context of this course a process is a program whose execution is in progress States of a process: running, ready, blocked Submit Ready Running Completion Blocked Concurrent

More information

Synchronization for Concurrent Tasks

Synchronization for Concurrent Tasks Synchronization for Concurrent Tasks Minsoo Ryu Department of Computer Science and Engineering 2 1 Race Condition and Critical Section Page X 2 Algorithmic Approaches Page X 3 Hardware Support Page X 4

More information

CS5460/6460: Operating Systems. Lecture 11: Locking. Anton Burtsev February, 2014

CS5460/6460: Operating Systems. Lecture 11: Locking. Anton Burtsev February, 2014 CS5460/6460: Operating Systems Lecture 11: Locking Anton Burtsev February, 2014 Race conditions Disk driver maintains a list of outstanding requests Each process can add requests to the list 1 struct list

More information

Consistency in Distributed Systems

Consistency in Distributed Systems Consistency in Distributed Systems Recall the fundamental DS properties DS may be large in scale and widely distributed 1. concurrent execution of components 2. independent failure modes 3. transmission

More information

GPUfs: Integrating a File System with GPUs. Yishuai Li & Shreyas Skandan

GPUfs: Integrating a File System with GPUs. Yishuai Li & Shreyas Skandan GPUfs: Integrating a File System with GPUs Yishuai Li & Shreyas Skandan Von Neumann Architecture Mem CPU I/O Von Neumann Architecture Mem CPU I/O slow fast slower Direct Memory Access Mem CPU I/O slow

More information

class 12 b-trees 2.0 prof. Stratos Idreos

class 12 b-trees 2.0 prof. Stratos Idreos class 12 b-trees 2.0 prof. Stratos Idreos HTTP://DASLAB.SEAS.HARVARD.EDU/CLASSES/CS165/ A B C A B C clustered/primary index on A Stratos Idreos /26 2 A B C A B C clustered/primary index on A pos C pos

More information

What is the Race Condition? And what is its solution? What is a critical section? And what is the critical section problem?

What is the Race Condition? And what is its solution? What is a critical section? And what is the critical section problem? What is the Race Condition? And what is its solution? Race Condition: Where several processes access and manipulate the same data concurrently and the outcome of the execution depends on the particular

More information

Fine-grained synchronization & lock-free programming

Fine-grained synchronization & lock-free programming Lecture 17: Fine-grained synchronization & lock-free programming Parallel Computer Architecture and Programming CMU 15-418/15-618, Spring 2016 Tunes Minnie the Moocher Robbie Williams (Swings Both Ways)

More information

Engineering Robust Server Software

Engineering Robust Server Software Engineering Robust Server Software Scalability Lock Free Data Structures Atomics operations work great when they do what you need E.g., increment an int What about more complicated things? E.g., No hardware

More information

CPSC/ECE 3220 Fall 2017 Exam Give the definition (note: not the roles) for an operating system as stated in the textbook. (2 pts.

CPSC/ECE 3220 Fall 2017 Exam Give the definition (note: not the roles) for an operating system as stated in the textbook. (2 pts. CPSC/ECE 3220 Fall 2017 Exam 1 Name: 1. Give the definition (note: not the roles) for an operating system as stated in the textbook. (2 pts.) Referee / Illusionist / Glue. Circle only one of R, I, or G.

More information

Concurrent Counting using Combining Tree

Concurrent Counting using Combining Tree Final Project Report by Shang Wang, Taolun Chai and Xiaoming Jia Concurrent Counting using Combining Tree 1. Introduction Counting is one of the very basic and natural activities that computers do. However,

More information

Stuart

Stuart Clojure Time Stuart Halloway stu@clojure.com @stuarthalloway Copyright 2007-2010 Relevance, Inc. This presentation is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United

More information

Lecture 10: Avoiding Locks

Lecture 10: Avoiding Locks Lecture 10: Avoiding Locks CSC 469H1F Fall 2006 Angela Demke Brown (with thanks to Paul McKenney) Locking: A necessary evil? Locks are an easy to understand solution to critical section problem Protect

More information

Main-Memory Databases 1 / 25

Main-Memory Databases 1 / 25 1 / 25 Motivation Hardware trends Huge main memory capacity with complex access characteristics (Caches, NUMA) Many-core CPUs SIMD support in CPUs New CPU features (HTM) Also: Graphic cards, FPGAs, low

More information

Atomicity CS 2110 Fall 2017

Atomicity CS 2110 Fall 2017 Atomicity CS 2110 Fall 2017 Parallel Programming Thus Far Parallel programs can be faster and more efficient Problem: race conditions Solution: synchronization Are there more efficient ways to ensure the

More information

CSE 451: Operating Systems Winter Lecture 7 Synchronization. Steve Gribble. Synchronization. Threads cooperate in multithreaded programs

CSE 451: Operating Systems Winter Lecture 7 Synchronization. Steve Gribble. Synchronization. Threads cooperate in multithreaded programs CSE 451: Operating Systems Winter 2005 Lecture 7 Synchronization Steve Gribble Synchronization Threads cooperate in multithreaded programs to share resources, access shared data structures e.g., threads

More information

) Intel)(TX)memory):) Transac'onal) Synchroniza'on) Extensions)(TSX))) Transac'ons)

) Intel)(TX)memory):) Transac'onal) Synchroniza'on) Extensions)(TSX))) Transac'ons) ) Intel)(TX)memory):) Transac'onal) Synchroniza'on) Extensions)(TSX))) Transac'ons) Goal A Distributed Transaction We want a transaction that involves multiple nodes Review of transactions and their properties

More information

CS 241 Honors Concurrent Data Structures

CS 241 Honors Concurrent Data Structures CS 241 Honors Concurrent Data Structures Bhuvan Venkatesh University of Illinois Urbana Champaign March 27, 2018 CS 241 Course Staff (UIUC) Lock Free Data Structures March 27, 2018 1 / 43 What to go over

More information

Persistent Data Structures and Managed References

Persistent Data Structures and Managed References Persistent Data Structures and Managed References Clojure s approach to Identity and State Rich Hickey Agenda Functions and processes Identity, State, and Values Persistent Data Structures Clojure s Managed

More information

A Comparison of Relativistic and Reader-Writer Locking Approaches to Shared Data Access

A Comparison of Relativistic and Reader-Writer Locking Approaches to Shared Data Access A Comparison of Relativistic and Reader-Writer Locking Approaches to Shared Data Access Philip W. Howard, Josh Triplett, and Jonathan Walpole Portland State University Abstract. This paper explores the

More information

Technical Report: Contention Adapting Search Trees

Technical Report: Contention Adapting Search Trees Technical Report: Contention Adapting Search Trees Konstantinos Sagonas and Kjell Winblad Department of Information Technology, Uppsala University, Uppsala, Sweden School of ECE, National Technical University

More information
