Enhancing Concurrency in Distributed Transactional Memory through Commutativity Junwhan Kim, Roberto Palmieri, Binoy Ravindran Virginia Tech USA
Lock-based concurrency control has serious drawbacks Coarse grained locking Simple But no concurrency
Fine-grained locking is better, but Excellent performance Poor programmability Lock problems don t go away! Deadlocks, livelocks, lock-convoying, priority inversion,. Most significant difficulty composition
Transactional memory Like database transactions ACI properties (no D) Easier to program Composable First HTM, then STM, later HyTM M. Herlihy and J. B. Moss (1993). Transactional memory: Architectural support for lock-free data structures. ISCA. pp. 289 300. N. Shavit and D. Touitou (1995). Software Transactional Memory. PODC. pp. 204 213.
Optimistic execution yields performance gains at the simplicity of coarse-grain, but no silver bullet Time Coarse-grained locking STM Fine-grained locking High data dependencies Irrevocable operations Interaction between transactions and non-transactions Conditional waiting Threads E.g., C/C++ Intel Run-Time System STM (B. Saha et. al. (2006). McRT- STM: A High Performance Software Transactional Memory. ACM PPoPP)
Contention management. Which transaction to abort? T0! T1!!! x = x + y;!!!!! x = x / 25;! x = x / 25;! Contention manager Can cause too many aborts, e.g., when a long running transaction conflicts with shorter transactions An aborted transaction may wait too long Transactional scheduler s goal: minimize conflicts (e.g., avoid repeated aborts)
Distributed TM (or DTM) Extends TM to distributed systems Nodes interconnected using message passing links Execution and network models Execution models Ø Data flow DTM (DISC 05) p p Transactions are immobile Objects migrate to invoking transactions Ø Control flow DTM (USENIX 12) p p Objects are immobile Transactions move from node to node Herlihy s metric-space network model (DISC 05) Ø Communication delay between every pair of nodes Ø Delay depends upon node-to-node distance 1.499 ms 9.095 ms 16.613 ms 13.709 ms 15.016 ms 1st hop 2nd hop 3rd hop 4th hop 5th hop Distance
Paper s motivation How to boost transactions processing in DTM Read/Read No conflict Conflict! Write/Write Read/Write conflicts minimized thanks to the multi-versioning support
How? How can a concurrency control manager allow write conflicting transactions to commit concurrently without affecting isolation/consistency? Exploiting commutable operations
Commutable operations by examples (Data Structures) Thread1: Insert(2) Thread2: Insert(0) 6 3 8 1 4 7 9 NON COMMUTABLE
Commutable operations by examples (Data Structures) Thread1: Insert(2) Thread2: Insert(5) 6 3 8 1 4 7 9 COMMUTABLE
Commutable operations by examples (TPC-C) New Order transactions: Read: Ø Customer, District, Warehouse, Item, Stock Write: Ø District, Stock Payment: Read: Ø Warehouse, Customer, District Write: New Order Transactions: Write Ø district.d_next_o_id() Ø Payment Transactions: Write Ø district.d_ytd() Ø Ø Warehouse, Customer, District
Paper s contribution MV-TFA, a multi-versioned version of TFA (Transactional Forwarding Algorithm) ensuring Snapshot Isolation Read transactions don t abort CRF - Commutative Request First: a distributed transactional scheduler integrated with MV-TFA CRF assumes definition of commutable rules by programmer; Minimize abort rate Increase concurrency Increase performance Implementation and extensive experimental evaluation Comparisons with state-of-the art DTMs Experiments for the best tuning of the system
MV-TFA READ N0 - T1 N1- Owner O 1 N2 - Owner O 2 N3 - T3 Read (O 1 ) Object (O 10 ) Write(O 2 ) Object (O 21 ) Validate (O 2 ) OK(O 21 ) Commit (O 21 ) CH Ow (O 20, O 21 ) Read (O 2 ) Object(O 20 )
CRF WRITE without concurrent validations N1/Owner O 1 N0/T1 {[T1;X]} {[T2;X]} N4/T2 Write (O 1, [X]) Object (O 1 ) NON COMMUTABLE Write(O 1, [X]) Object (O 11 ) Validate (O 1 ) OK(O 11 ) Abort Commit (O 1 ) Restart
CRF WRITE with concurrent validations N1/Owner O 1 N0/T1 {[T1;X]} {[T2;X]} N4/T2 Write (O 1, [X]) COMMUTABLE Object (O 1 ) Validate (O 1 ) OK(O 1 ) Write(O 1, [Y]) Object (O 1 ) Validate (O 1 ) OK(O 1 ) Commit (O 1 ) Commit (O 1 )
CRF WRITE with concurrent validations N0/T1 N1/Owner O 1 {[T1;X]}{[T2;Y]}{[T3;Y]} Write (O 1, [X]) Object (O 1 ) Validate (O 1 ) OK(O 1 ) Write(O 1, [Y]) Object (O 1 ) Validate (O 1 ) OK(O 1 ) Commit (O 1 ) Commit (O 1 ) N4/T2 Write(O 1, [Y]) Object (O 1 ) Validate (O 1 ) Abort N5/T3 COMMUTABLE NON COMMUTABLE Restart
Depth of validation (MaxD) Transaction commit phase is stretched depending on the concurrent validating transactions Depth of validation (MaxD): the number of transactions involved in the validation N1/Owner O 1 N0/T1 {[T1;X]} {[T2;X]} N4/T2 Write (O 1, [X]) Object (O 1 ) Write(O 1, [Y]) Object (O 1 ) T 2$ validates$o 1.$ Epoch$of$Valida!on$!me$ Depth$$ of$valida!on$ Validate (O 1 ) Validate (O 1 ) OK(O 1 ) T 3$ validates$o 1.$ T 4$ validates$o 1.$ OK(O 1 ) Commit (O 1 ) Commit (O 1 )
Epochs CRF prioritizes commutable transactions for increasing concurrency. However adverse schedules can penalize non-commutable transactions CRF defines execution epochs: In each epoch, commutative transactions concurrently participate in validation. In the next epoch, the non-commutative transactions stored in the scheduling queue restart and validate Epoch shift is triggered when MaxD is reached or commitment process ends
Experiments Test-bed: Cluster of 10 nodes interconnected by a Gigabit connection Each node equipped with 12 cores 2 up to 120 concurrent threads in the system Competitors: DecentSTM, MV-TFA (without scheduling) Benchmarks: Micro Benchmarks: Ø Linked-List, Skip-list. Both implementation of Commutable Set Macro Benchmarks: Ø TPC-C. Field-based commutativity
Finding depth of validation Run experiments measuring the performance of the system varying the depth of validation parameter 3500 2500 1500 500 60 50 40 30 Threshold 20 10 0 1 2 3 7 6 5 4 900 800 700 600 500 400 300 200 100 60 50 40 30 Threshold 20 10 0 1 2 3 7 6 5 4 Linked List Max Depth = 10 TPC-C Max Depth = 5
Linked List 3500 2500 1500 Linkedlist(CRF-MV-TFA), 10% Read 3500 2500 1500 Linkedlist(MV-TFA), 10% Read 3500 2500 1500 Linkedlist(DcentSTM), 10% Read 1 thread 2 threads 4 threads 6 threads 8 threads 12 threads 500 (a) CRF-MV-TFA, 10% Read 500 (b) MV-TFA, 10% Read 500 (c) DecentSTM, 10% Read 4500 4000 3500 2500 1500 Linkedlist(CRF-MV-TFA), 90% Read 500 4500 4000 3500 2500 1500 Linkedlist(MV-TFA), 90% Read 500 4500 4000 3500 2500 1500 Linkedlist(DcentSTM), 90% Read 1 thread 2 threads 4 threads 6 threads 8 threads 12 threads 500 (d) CRF-MV-TFA, 90% Read (e) MV-TFA, 90% Read (f) DecentSTM, 90% Read
Skip List Skiplist(CRF-MV-TFA), 10% Read Skiplistlist(MV-TFA), 10% Read Skiplist(Decent), 10% Read 6000 5000 4000 6000 5000 4000 6000 5000 4000 1 thread 2 threads 4 threads 6 threads 8 threads 12 threads (a) CRF-MV-TFA, 10% Read (b) MV-TFA, 10% Read (c) DecentSTM, 10% Read Skiplist(CRF-MV-TFA), 90% Read Skiplist(MV-TFA), 90% Read Skiplist(Decent), 90% Read 8000 7000 6000 5000 4000 8000 7000 6000 5000 4000 8000 7000 6000 5000 4000 1 thread 2 threads 4 threads 6 threads 8 threads 12 threads (d) CRF-MV-TFA, 90% Read (e) MV-TFA, 90% Read (f) DecentSTM, 90% Read
TPC-C % of transaction profiles as in the original specification # warehouse = 4 to increase the conflict probability TPC-C(CRF-MV-TFA) TPC-C(MV-TFA) TPC-C(DecentSTM) 800 600 400 800 600 400 800 600 400 1 thread 2 threads 4 threads 6 threads 8 threads 12 threads 200 200 200 (a) CRF-MV-TFA (b) MV-TFA (c) DecentSTM
Thank you! Questions? http://www.hyflow.org/ http://www.ssrg.ece.vt.edu/