Enhancing Concurrency in Distributed Transactional Memory through Commutativity

Similar documents
Scheduling Transactions in Replicated Distributed Transactional Memory

Lock vs. Lock-free Memory Project proposal

Distributed Transactional Contention Management as the Traveling Salesman Problem

Scheduling Memory Transactions in Distributed Systems

Enhancing Concurrency in Distributed Transactional Memory through Commutativity

Scheduling Memory Transactions in Distributed Systems

Automated Data Partitioning for Highly Scalable and Strongly Consistent Transactions

Chí Cao Minh 28 May 2008

Software transactional memory

HydraVM: Mohamed M. Saad Mohamed Mohamedin, and Binoy Ravindran. Hot Topics in Parallelism (HotPar '12), Berkeley, CA

TRANSACTION MEMORY. Presented by Hussain Sattuwala Ramya Somuri

Supporting Software Transactional Memory in Distributed Systems: Protocols for Cache-Coherence, Conflict Resolution and Replication

Transactional Memory: Architectural Support for Lock-Free Data Structures Maurice Herlihy and J. Eliot B. Moss ISCA 93

Integrating Transactionally Boosted Data Structures with STM Frameworks: A Case Study on Set

1 Publishable Summary

On Improving Transactional Memory: Optimistic Transactional Boosting, Remote Execution, and Hybrid Transactions

HyflowCPP: A Distributed Transactional Memory framework for C++

An Automated Framework for Decomposing Memory Transactions to Exploit Partial Rollback

6.852: Distributed Algorithms Fall, Class 20

On Open Nesting in Distributed Transactional Memory

The Common Case Transactional Behavior of Multithreaded Programs

Transactional Memory

Using Software Transactional Memory In Interrupt-Driven Systems

On Developing Optimistic Transactional Lazy Set

Cost of Concurrency in Hybrid Transactional Memory. Trevor Brown (University of Toronto) Srivatsan Ravi (Purdue University)

On Open Nesting in Distributed Transactional Memory

Performance Comparison of Various STM Concurrency Control Protocols Using Synchrobench

6 Transactional Memory. Robert Mullins

Transactional Memory. How to do multiple things at once. Benjamin Engel Transactional Memory 1 / 28

Mohamed M. Saad & Binoy Ravindran

Low Overhead Concurrency Control for Partitioned Main Memory Databases. Evan P. C. Jones Daniel J. Abadi Samuel Madden"

Lecture 21: Transactional Memory. Topics: consistency model recap, introduction to transactional memory

Mutex Locking versus Hardware Transactional Memory: An Experimental Evaluation

INTRODUCTION. Hybrid Transactional Memory. Transactional Memory. Problems with Transactional Memory Problems

McRT-STM: A High Performance Software Transactional Memory System for a Multi- Core Runtime

Scheduling Transactions in Replicated Distributed Software Transactional Memory

Lecture 7: Transactional Memory Intro. Topics: introduction to transactional memory, lazy implementation

Linked Lists: The Role of Locking. Erez Petrank Technion

Reminder from last time

ByteSTM: Java Software Transactional Memory at the Virtual Machine Level

An Evaluation of Distributed Concurrency Control. Harding, Aken, Pavlo and Stonebraker Presented by: Thamir Qadah For CS590-BDS

Lecture 21: Transactional Memory. Topics: Hardware TM basics, different implementations

Hardware Transactional Memory on Haswell

Transactional Memory. Yaohua Li and Siming Chen. Yaohua Li and Siming Chen Transactional Memory 1 / 41

Towards a Software Transactional Memory for Graphics Processors

Transactional Memory. Prof. Hsien-Hsin S. Lee School of Electrical and Computer Engineering Georgia Tech

Transactional Memory. Concurrency unlocked Programming. Bingsheng Wang TM Operating Systems

Transac'onal Libraries Alexander Spiegelman *, Guy Golan-Gueta, and Idit Keidar * Technion Yahoo Research

On Improving Distributed Transactional Memory

EazyHTM: Eager-Lazy Hardware Transactional Memory

Transactional Memory. Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

Synchronization. CSCI 5103 Operating Systems. Semaphore Usage. Bounded-Buffer Problem With Semaphores. Monitors Other Approaches

Ensuring dependability and improving performance of transactional systems deployed on multi-core architectures

NON-BLOCKING DATA STRUCTURES AND TRANSACTIONAL MEMORY. Tim Harris, 28 November 2014

Transactifying Apache s Cache Module

Portland State University ECE 588/688. Transactional Memory

Relaxing Concurrency Control in Transactional Memory. Utku Aydonat

The Multicore Transformation

Scalable Software Transactional Memory for Chapel High-Productivity Language

Multi-Core Computing with Transactional Memory. Johannes Schneider and Prof. Roger Wattenhofer

Dependence-Aware Transactional Memory for Increased Concurrency. Hany E. Ramadan, Christopher J. Rossbach, Emmett Witchel University of Texas, Austin

Building Efficient Concurrent Graph Object through Composition of List-based Set

Disjoint- Access Parallelism: Impossibility, Possibility, and Cost of Transactional Memory Implementations

Programming with Transactional Memory

Accelerating Irregular Computations with Hardware Transactional Memory and Active Messages

Extracting Parallelism from Legacy Sequential Code Using Software Transactional Memory

Invyswell: A HyTM for Haswell RTM. Irina Calciu, Justin Gottschlich, Tatiana Shpeisman, Gilles Pokam, Maurice Herlihy

DBT Tool. DBT Framework

Atomic Transac1ons. Atomic Transactions. Q1: What if network fails before deposit? Q2: What if sequence is interrupted by another sequence?

Conflict Detection and Validation Strategies for Software Transactional Memory

Tradeoffs in Transactional Memory Virtualization

System Challenges and Opportunities for Transactional Memory

Massimiliano Ghilardi

bool Account::withdraw(int val) { atomic { if(balance > val) { balance = balance val; return true; } else return false; } }

Speculative Locks. Dept. of Computer Science

ABORTING CONFLICTING TRANSACTIONS IN AN STM

Enhancing efficiency of Hybrid Transactional Memory via Dynamic Data Partitioning Schemes

Optimistic Preventive Replication in a Database Cluster

Fall 2012 Parallel Computer Architecture Lecture 16: Speculation II. Prof. Onur Mutlu Carnegie Mellon University 10/12/2012

Early Results Using Hardware Transactional Memory for High-Performance Computing Applications

HiperTM: High Performance, Fault-Tolerant Transactional Memory

Commit Phase in Timestamp-based STM

Cache-Aware Lock-Free Queues for Multiple Producers/Consumers and Weak Memory Consistency

Transactional Memory, linking Theory and Practice

Managing Resource Limitation of Best-Effort HTM

Remote Transaction Commit: Centralizing Software Transactional Memory Commits

Exploiting Hardware Transactional Memory for Efficient In-Memory Transaction Processing. Hao Qian, Zhaoguo Wang, Haibing Guan, Binyu Zang, Haibo Chen

Lecture 12 Transactional Memory

On Improving Distributed Transactional Memory Through Nesting and Data Partitioning

Lecture: Consistency Models, TM. Topics: consistency models, TM intro (Section 5.6)

Implementing Rollback: Copy. 2PL: Rollback. Implementing Rollback: Undo. 8L for Part IB Handout 4. Concurrent Systems. Dr.

Lecture: Consistency Models, TM

Real-Time Software Transactional Memory: Contention Managers, Time Bounds, and Implementations

Parallelizing SPECjbb2000 with Transactional Memory

arxiv: v1 [cs.db] 12 Nov 2018

Agenda. Designing Transactional Memory Systems. Why not obstruction-free? Why lock-based?

Incrementally Parallelizing. Twofold Speedup on a Quad-Core. Thread-Level Speculation. A Case Study with BerkeleyDB. What Am I Working on Now?

Lecture: Transactional Memory. Topics: TM implementations

Low Overhead Concurrency Control for Partitioned Main Memory Databases

Outline. Database Tuning. Ideal Transaction. Concurrency Tuning Goals. Concurrency Tuning. Nikolaus Augsten. Lock Tuning. Unit 8 WS 2013/2014

Transcription:

Enhancing Concurrency in Distributed Transactional Memory through Commutativity Junwhan Kim, Roberto Palmieri, Binoy Ravindran Virginia Tech USA

Lock-based concurrency control has serious drawbacks Coarse grained locking Simple But no concurrency

Fine-grained locking is better, but Excellent performance Poor programmability Lock problems don t go away! Deadlocks, livelocks, lock-convoying, priority inversion,. Most significant difficulty composition

Transactional memory Like database transactions ACI properties (no D) Easier to program Composable First HTM, then STM, later HyTM M. Herlihy and J. B. Moss (1993). Transactional memory: Architectural support for lock-free data structures. ISCA. pp. 289 300. N. Shavit and D. Touitou (1995). Software Transactional Memory. PODC. pp. 204 213.

Optimistic execution yields performance gains at the simplicity of coarse-grain, but no silver bullet Time Coarse-grained locking STM Fine-grained locking High data dependencies Irrevocable operations Interaction between transactions and non-transactions Conditional waiting Threads E.g., C/C++ Intel Run-Time System STM (B. Saha et. al. (2006). McRT- STM: A High Performance Software Transactional Memory. ACM PPoPP)

Contention management. Which transaction to abort? T0! T1!!! x = x + y;!!!!! x = x / 25;! x = x / 25;! Contention manager Can cause too many aborts, e.g., when a long running transaction conflicts with shorter transactions An aborted transaction may wait too long Transactional scheduler s goal: minimize conflicts (e.g., avoid repeated aborts)

Distributed TM (or DTM) Extends TM to distributed systems Nodes interconnected using message passing links Execution and network models Execution models Ø Data flow DTM (DISC 05) p p Transactions are immobile Objects migrate to invoking transactions Ø Control flow DTM (USENIX 12) p p Objects are immobile Transactions move from node to node Herlihy s metric-space network model (DISC 05) Ø Communication delay between every pair of nodes Ø Delay depends upon node-to-node distance 1.499 ms 9.095 ms 16.613 ms 13.709 ms 15.016 ms 1st hop 2nd hop 3rd hop 4th hop 5th hop Distance

Paper s motivation How to boost transactions processing in DTM Read/Read No conflict Conflict! Write/Write Read/Write conflicts minimized thanks to the multi-versioning support

How? How can a concurrency control manager allow write conflicting transactions to commit concurrently without affecting isolation/consistency? Exploiting commutable operations

Commutable operations by examples (Data Structures) Thread1: Insert(2) Thread2: Insert(0) 6 3 8 1 4 7 9 NON COMMUTABLE

Commutable operations by examples (Data Structures) Thread1: Insert(2) Thread2: Insert(5) 6 3 8 1 4 7 9 COMMUTABLE

Commutable operations by examples (TPC-C) New Order transactions: Read: Ø Customer, District, Warehouse, Item, Stock Write: Ø District, Stock Payment: Read: Ø Warehouse, Customer, District Write: New Order Transactions: Write Ø district.d_next_o_id() Ø Payment Transactions: Write Ø district.d_ytd() Ø Ø Warehouse, Customer, District

Paper s contribution MV-TFA, a multi-versioned version of TFA (Transactional Forwarding Algorithm) ensuring Snapshot Isolation Read transactions don t abort CRF - Commutative Request First: a distributed transactional scheduler integrated with MV-TFA CRF assumes definition of commutable rules by programmer; Minimize abort rate Increase concurrency Increase performance Implementation and extensive experimental evaluation Comparisons with state-of-the art DTMs Experiments for the best tuning of the system

MV-TFA READ N0 - T1 N1- Owner O 1 N2 - Owner O 2 N3 - T3 Read (O 1 ) Object (O 10 ) Write(O 2 ) Object (O 21 ) Validate (O 2 ) OK(O 21 ) Commit (O 21 ) CH Ow (O 20, O 21 ) Read (O 2 ) Object(O 20 )

CRF WRITE without concurrent validations N1/Owner O 1 N0/T1 {[T1;X]} {[T2;X]} N4/T2 Write (O 1, [X]) Object (O 1 ) NON COMMUTABLE Write(O 1, [X]) Object (O 11 ) Validate (O 1 ) OK(O 11 ) Abort Commit (O 1 ) Restart

CRF WRITE with concurrent validations N1/Owner O 1 N0/T1 {[T1;X]} {[T2;X]} N4/T2 Write (O 1, [X]) COMMUTABLE Object (O 1 ) Validate (O 1 ) OK(O 1 ) Write(O 1, [Y]) Object (O 1 ) Validate (O 1 ) OK(O 1 ) Commit (O 1 ) Commit (O 1 )

CRF WRITE with concurrent validations N0/T1 N1/Owner O 1 {[T1;X]}{[T2;Y]}{[T3;Y]} Write (O 1, [X]) Object (O 1 ) Validate (O 1 ) OK(O 1 ) Write(O 1, [Y]) Object (O 1 ) Validate (O 1 ) OK(O 1 ) Commit (O 1 ) Commit (O 1 ) N4/T2 Write(O 1, [Y]) Object (O 1 ) Validate (O 1 ) Abort N5/T3 COMMUTABLE NON COMMUTABLE Restart

Depth of validation (MaxD) Transaction commit phase is stretched depending on the concurrent validating transactions Depth of validation (MaxD): the number of transactions involved in the validation N1/Owner O 1 N0/T1 {[T1;X]} {[T2;X]} N4/T2 Write (O 1, [X]) Object (O 1 ) Write(O 1, [Y]) Object (O 1 ) T 2$ validates$o 1.$ Epoch$of$Valida!on$!me$ Depth$$ of$valida!on$ Validate (O 1 ) Validate (O 1 ) OK(O 1 ) T 3$ validates$o 1.$ T 4$ validates$o 1.$ OK(O 1 ) Commit (O 1 ) Commit (O 1 )

Epochs CRF prioritizes commutable transactions for increasing concurrency. However adverse schedules can penalize non-commutable transactions CRF defines execution epochs: In each epoch, commutative transactions concurrently participate in validation. In the next epoch, the non-commutative transactions stored in the scheduling queue restart and validate Epoch shift is triggered when MaxD is reached or commitment process ends

Experiments Test-bed: Cluster of 10 nodes interconnected by a Gigabit connection Each node equipped with 12 cores 2 up to 120 concurrent threads in the system Competitors: DecentSTM, MV-TFA (without scheduling) Benchmarks: Micro Benchmarks: Ø Linked-List, Skip-list. Both implementation of Commutable Set Macro Benchmarks: Ø TPC-C. Field-based commutativity

Finding depth of validation Run experiments measuring the performance of the system varying the depth of validation parameter 3500 2500 1500 500 60 50 40 30 Threshold 20 10 0 1 2 3 7 6 5 4 900 800 700 600 500 400 300 200 100 60 50 40 30 Threshold 20 10 0 1 2 3 7 6 5 4 Linked List Max Depth = 10 TPC-C Max Depth = 5

Linked List 3500 2500 1500 Linkedlist(CRF-MV-TFA), 10% Read 3500 2500 1500 Linkedlist(MV-TFA), 10% Read 3500 2500 1500 Linkedlist(DcentSTM), 10% Read 1 thread 2 threads 4 threads 6 threads 8 threads 12 threads 500 (a) CRF-MV-TFA, 10% Read 500 (b) MV-TFA, 10% Read 500 (c) DecentSTM, 10% Read 4500 4000 3500 2500 1500 Linkedlist(CRF-MV-TFA), 90% Read 500 4500 4000 3500 2500 1500 Linkedlist(MV-TFA), 90% Read 500 4500 4000 3500 2500 1500 Linkedlist(DcentSTM), 90% Read 1 thread 2 threads 4 threads 6 threads 8 threads 12 threads 500 (d) CRF-MV-TFA, 90% Read (e) MV-TFA, 90% Read (f) DecentSTM, 90% Read

Skip List Skiplist(CRF-MV-TFA), 10% Read Skiplistlist(MV-TFA), 10% Read Skiplist(Decent), 10% Read 6000 5000 4000 6000 5000 4000 6000 5000 4000 1 thread 2 threads 4 threads 6 threads 8 threads 12 threads (a) CRF-MV-TFA, 10% Read (b) MV-TFA, 10% Read (c) DecentSTM, 10% Read Skiplist(CRF-MV-TFA), 90% Read Skiplist(MV-TFA), 90% Read Skiplist(Decent), 90% Read 8000 7000 6000 5000 4000 8000 7000 6000 5000 4000 8000 7000 6000 5000 4000 1 thread 2 threads 4 threads 6 threads 8 threads 12 threads (d) CRF-MV-TFA, 90% Read (e) MV-TFA, 90% Read (f) DecentSTM, 90% Read

TPC-C % of transaction profiles as in the original specification # warehouse = 4 to increase the conflict probability TPC-C(CRF-MV-TFA) TPC-C(MV-TFA) TPC-C(DecentSTM) 800 600 400 800 600 400 800 600 400 1 thread 2 threads 4 threads 6 threads 8 threads 12 threads 200 200 200 (a) CRF-MV-TFA (b) MV-TFA (c) DecentSTM

Thank you! Questions? http://www.hyflow.org/ http://www.ssrg.ece.vt.edu/