Distributed Transactional Contention Management as the Traveling Salesman Problem


Distributed Transactional Contention Management as the Traveling Salesman Problem
Bo Zhang, Binoy Ravindran, Roberto Palmieri (Virginia Tech)
SIROCCO 2014, July 23-25, 2014, Hida Takayama, Japan

Lock-based concurrency control has serious drawbacks. Coarse-grained locking is simple, but offers no concurrency.

Fine-grained locking is better, but... It gives excellent performance with poor programmability, and the lock problems don't go away: deadlocks, livelocks, lock convoying, priority inversion, and so on. The most significant difficulty is composition.

Transactional memory is like database transactions: easier to program and composable. First HTM, then STM, now HyTM.

Optimistic execution yields performance gains with the simplicity of coarse-grained locking, but it is no silver bullet: high data dependencies, irrevocable operations, interaction between transactions and non-transactional code, and conditional waiting remain problematic. [Figure: execution time vs. number of threads for coarse-grained locking, STM, and fine-grained locking, e.g., the C/C++ Intel run-time system STM (B. Saha et al. (2006). McRT-STM: A High Performance Software Transactional Memory. ACM PPoPP).]

Contention management: which transaction to abort? When two transactions, e.g., T0 executing x = x + y and T1 executing x = x / 25, conflict on x, the contention manager decides which one aborts. A poor policy can cause too many aborts, e.g., when a long-running transaction conflicts with shorter transactions, and an aborted transaction may wait too long.
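The contention manager is just this decision policy. As a concrete illustration, here is a minimal sketch of one classic policy, a timestamp-based (Greedy-style) manager in which the older transaction wins; the Txn fields and function names are illustrative, not the paper's API.

# Minimal sketch of a timestamp-based contention manager (Greedy-style):
# on a conflict, the older transaction (smaller start timestamp) wins and
# the younger one aborts. Names and structure are illustrative only.
from dataclasses import dataclass

@dataclass
class Txn:
    tid: int
    start_ts: float   # acquired when the transaction (re)starts

def resolve_conflict(attacker: Txn, holder: Txn) -> Txn:
    """Return the transaction that must abort."""
    # The older transaction (earlier timestamp) is allowed to proceed.
    return attacker if attacker.start_ts > holder.start_ts else holder

# Example: T0 started before T1, so T1 aborts when they conflict on x.
t0, t1 = Txn(0, 10.0), Txn(1, 12.5)
assert resolve_conflict(attacker=t1, holder=t0).tid == 1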

From multiprocessors to distributed systems (from STM to DTM). Multiprocessor TM relies on built-in cache-coherence support (e.g., the MESI protocol on Intel SMPs); distributed TM runs over message-passing links, so a distributed cache-coherence protocol is needed. [Figure: bus-based multiprocessors with per-core caches and shared memory vs. nodes connected by message-passing links.]

The D-STM problem space:
- Cache-coherence protocol: locate and move objects in the network; guarantee consistency over multiple object copies.
- Conflict resolution: conservative or non-conservative approaches; the key property is guaranteeing progress.
- Fault tolerance: networks with node failures; a replication protocol manages object replicas.

Transaction execution models in DTM:
- Control flow: transactions move while objects are held locally; consistency requires a distributed commit protocol, inheriting database transactional synchronization.
- Data flow: objects move so that every transaction runs locally; synchronization is optimistic, conflicts are resolved by a conflict-resolution strategy, no distributed commit protocol is needed, and it is easier to exploit locality.

DTM, how it works. Processors (or nodes) are connected by message-passing links. A distributed cache-coherence protocol (CC) locates and moves objects in the network and maintains consistency among the multiple copies of an object; a conflict-resolution module (CR) resolves conflicts among transactions. Example: node A runs a transaction requesting object o while node B runs a transaction holding o. A's TM proxy misses in its local cache, invokes CC.locate(o) and CC.move(o), and when the request reaches B, where o is in use, the conflict is resolved by CR(A, B). How do we make the correct/optimal decision? [Figure: two nodes A and B, each with a TM proxy and a local cache, exchanging req(o) messages across the network.]
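To make the data-flow loop concrete, here is a rough sketch of how a node-local TM proxy might drive the CC and CR modules; every class and method name here (TMProxy, cc.locate, cc.move, cr.resolve, the cache interface) is hypothetical, not the paper's or HyFlow's API.

# Rough sketch of a node-local TM proxy in the data-flow model: check the local
# cache first, otherwise ask the cache-coherence protocol (CC) to locate and
# move the object; local conflicts are delegated to the conflict-resolution
# module (CR). All names here are hypothetical.
class TMProxy:
    def __init__(self, cc, cr, cache):
        self.cc, self.cr, self.cache = cc, cr, cache

    def open_object(self, txn, oid):
        if oid in self.cache:                          # object already held locally
            holder = self.cache.holder_of(oid)
            if holder is not None and holder is not txn:
                self.cr.resolve(txn, holder).abort()   # CR(A, B) picks a loser
            return self.cache.get(oid)
        owner = self.cc.locate(oid)                    # CC.locate(o)
        obj = self.cc.move(oid, src=owner, dst=self)   # CC.move(o) ships the object
        self.cache.put(oid, obj)
        return obj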

Design goal. Input: the distributed system, where nodes communicate via message-passing links; T, a set of n transactions accessing s shared objects in a metric-space network of m nodes; A, a conflict-resolution strategy; and C, a cache-coherence protocol. Output: makespan(A,C), the total time needed to complete the set of transactions under (A,C). Goal: maximize throughput by minimizing makespan(A,C) over all possible combinations (A,C).

Measures of quality. Compare with an optimal clairvoyant offline scheduling algorithm OPT, which knows all transactions in advance and schedules each transaction exactly once. The competitive ratio evaluates the optimality of makespan(A,C): CR(A,C) = max [ makespan(A,C) / makespan(OPT) ].

Problem statement. The transaction scheduling problem for multiprocessor STM can be seen as a special case of the transaction scheduling problem for DTM: the two problems are equivalent when the communication cost is negligible compared with the local execution time. We model contention management as a non-clairvoyant scheduling problem. If all transactions conflict with each other, a sequential schedule is the best solution.

Towards the optimal: the cost graph. Each node in the system is a vertex in the graph; each edge (v_i, v_j) represents the channel to move an object from node v_i to node v_j and is weighted with the cost d_ij of that move. [Figure: a four-node example with vertices V1..V4 and edge weights d_12, d_13, d_14, d_23, d_24, d_34.]
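A minimal sketch of the cost graph as a symmetric weighted adjacency matrix; the four nodes and the edge weights below are made up for illustration.

# Cost graph sketch: d[i][j] is the cost of moving an object from v_i to v_j.
# In a metric-space network the costs are taken to be symmetric.
INF = float("inf")

def make_cost_graph(num_nodes):
    return [[0 if i == j else INF for j in range(num_nodes)]
            for i in range(num_nodes)]

d = make_cost_graph(4)
edges = {(0, 1): 3, (0, 2): 5, (0, 3): 4, (1, 2): 2, (1, 3): 6, (2, 3): 1}
for (i, j), w in edges.items():
    d[i][j] = d[j][i] = w          # illustrative weights only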

Towards the optimal: the conflict graph. Each transaction is a node, and each edge is labeled with the object that causes the two transactions to conflict. We can construct a coloring of the conflict graph; since transactions with the same color are not connected, every color class C_i forms an independent set and can be executed in parallel without facing any conflicts.
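As an illustration, a simple greedy coloring already produces such color classes (independent sets), though not necessarily the optimal k-coloring discussed next; the helper below is a sketch, not the paper's construction.

# Sketch: build a conflict graph and color it greedily; each color class is an
# independent set of transactions that can run in parallel.
from collections import defaultdict

def greedy_coloring(conflicts, num_txns):
    """conflicts: iterable of (i, j) pairs of conflicting transactions."""
    adj = defaultdict(set)
    for i, j in conflicts:
        adj[i].add(j)
        adj[j].add(i)
    color = {}
    for t in range(num_txns):
        used = {color[u] for u in adj[t] if u in color}
        color[t] = next(c for c in range(num_txns) if c not in used)
    return color

# Example: T0-T1 conflict on one object, T1-T2 on another; T0 and T2 share a color.
print(greedy_coloring([(0, 1), (1, 2)], 3))   # e.g., {0: 0, 1: 1, 2: 0}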

Towards the optimal: the ordering conflict graph. An optimal offline schedule OPT determines a k-coloring of the conflict graph and an execution order (the ordering conflict graph) such that, for any two conflicting transactions T in C_i and T' in C_j with i < j, T' is postponed until T commits. There are k! ordering conflict graphs. The ordering conflict graph is weighted: a node's weight is the transaction's execution time, and an arc's weight is the cost of moving the object from the source node to the destination node.

...but the optimal is too complex. The commit time of a transaction T is determined by one of the weighted paths that ends at T, and the makespan is the weight of the longest weighted path in the ordering conflict graph; the optimum is reached by selecting the ordering conflict graph that minimizes the makespan. Finding an optimal contention manager is NP-hard: if each node issues only one transaction and the cost of moving objects is negligible, the problem is equivalent to finding the chromatic number of the conflict graph; if there is only one shared object, the problem is equivalent to finding the traveling salesman path (TSP) in the cost graph.
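Assuming a fixed ordering conflict graph given as a DAG, its makespan can be computed as the longest weighted path, with node weights as execution times and arc weights as object-moving costs; the sketch and the tiny example below are illustrative.

# Sketch: makespan of one ordering conflict graph, computed as the longest
# weighted path in a DAG (node weight = execution time, arc weight = move cost).
from functools import lru_cache

def makespan(exec_time, arcs):
    """exec_time: {txn: time}; arcs: {txn: [(successor, move_cost), ...]}."""
    @lru_cache(maxsize=None)
    def longest_from(t):
        succs = arcs.get(t, [])
        tail = max((cost + longest_from(s) for s, cost in succs), default=0)
        return exec_time[t] + tail
    return max(longest_from(t) for t in exec_time)

# T0 -> T1 (move cost 2) -> T2 (move cost 1): makespan = 1 + 2 + 1 + 1 + 1 = 6
print(makespan({0: 1, 1: 1, 2: 1}, {0: [(1, 2)], 1: [(2, 1)]}))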

Also: when each node generates a sequence of transactions, it is not always optimal to schedule transactions according to the ordering conflict graph. Since the conflict graph evolves over time, an optimal schedule based on a static conflict graph may lose potential parallelism in the future.

Local optimality is not global optimality (2-coloring). [Figure: an example with eight transactions sharing a set of objects; under a 2-coloring of the conflict graph the makespan is 4d, which is not optimal.]

Local optimality is not global optimality (4-coloring). [Figure: the same example under a 4-coloring; the makespan drops to 3d.]

Lower bounds. For STM, any online deterministic contention manager is Ω(s)-competitive, where s is the number of objects [Attiya et al. 06]. For DTM, any online deterministic work-conserving contention manager is Ω(max[s, s²/D])-competitive, where D is the normalized network diameter; when the normalized network diameter is bounded (D is a constant), this still gives an Ω(s²) lower bound on the competitive ratio. Can we find an approximately optimal solution in reasonable time?

CUTTING. Cutting is a randomized scheduling algorithm based on partitioning the cost graph. Assumptions: transaction T_i knows its required set of objects after it starts, and the moving cost is bounded by D. Input: a set of transactions with their execution times, the conflict graph, the cost graph, and an approximate TSP algorithm (ATSP). Output: a schedule for executing the transactions.

Why TSP, and why that assumption? Suppose transaction T, invoked by node N1, writes objects {o2, o3}, where N2 stores o2 and N3 stores o3. Without the assumption, T fetches the objects one at a time with two round trips, costing (2 × d_12) + (2 × d_13) ≈ 4d. With the assumption, T fetches o2 and o3 along a single path N1 → N2 → N3, costing d_12 + d_23 + d_13 ≈ 3d.
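The arithmetic behind the example, written out as a sketch (the distances are symbolic placeholders of roughly equal length d):

# Cost comparison from this slide: separate round trips vs. one TSP-like path
# over the nodes that hold the required objects.
def round_trip_cost(d12, d13):
    return 2 * d12 + 2 * d13          # without knowing the object set up front

def tsp_path_cost(d12, d23, d13):
    return d12 + d23 + d13            # with the assumption: one tour over the holders

d12 = d13 = d23 = 1.0                  # roughly equal distances, ~d each
print(round_trip_cost(d12, d13))       # ~4d
print(tsp_path_cost(d12, d23, d13))    # ~3d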

Cutting: how it works. The cost graph is partitioned into C partitions such that, for any pair of nodes (v_i, v_j) in the same partition, d_ij ≤ A_TSP/C, where A_TSP denotes the cost of the path produced by the approximate TSP algorithm. Within each partition, nodes are numbered with an integer from 1 to the size of the partition, and a binary tree is built following the nodes' numbers. Each transaction randomly selects an integer that is used to decide which transaction to abort after a conflict. [Figure: partitions P1, P2, ..., PC, each arranged as a binary tree over its nodes v1, v2, v3, ...]
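A rough sketch of the partitioning step under these stated assumptions; the greedy grouping below is illustrative and not necessarily the construction used by Cutting.

# Sketch: group nodes so that any two nodes in the same partition are within
# threshold = A_TSP / C of each other, then number them 1..|partition|. Within
# each partition, node k's children in the binary tree would be 2k and 2k+1
# (heap-style numbering). The greedy grouping is illustrative only.
def partition_nodes(nodes, dist, threshold):
    """nodes: list of ids; dist(u, v): moving cost; returns a list of partitions."""
    partitions = []
    for v in nodes:
        for part in partitions:
            if all(dist(v, u) <= threshold for u in part):
                part.append(v)
                break
        else:
            partitions.append([v])
    return partitions

parts = partition_nodes([0, 1, 2, 3], lambda u, v: abs(u - v), threshold=1)
print(parts)   # e.g., [[0, 1], [2, 3]]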

Cutting: how it works. Handling a conflict between two transactions:
- Phase 1: if both are in the same partition and one is an ancestor of the other in the partition's binary tree, the node that precedes the other on the ATSP path aborts the other; otherwise, the transaction with the lower partition number aborts the other.
- Phase 2: each transaction randomly selects an integer π when it starts or restarts; if neither transaction is an ancestor of the other, the transaction with the lower π proceeds and the other aborts.
Whenever a transaction is aborted by a remote transaction, the requested object is moved to the remote transaction immediately.
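An illustrative encoding of the two-phase rule above; the field names and the ancestor test are assumptions, not the paper's pseudocode.

# Sketch of the two-phase conflict rule: ATSP order for tree-related nodes in
# the same partition, then partition number, then the random priority pi.
def loser(t1, t2, same_partition, ancestor_related):
    """Each t is a dict with 'atsp_pos', 'partition', 'pi'. Returns the txn to abort."""
    if same_partition and ancestor_related:
        return t2 if t1["atsp_pos"] < t2["atsp_pos"] else t1   # earlier on ATSP path wins
    if t1["partition"] != t2["partition"]:
        return t2 if t1["partition"] < t2["partition"] else t1 # lower partition number wins
    return t2 if t1["pi"] < t2["pi"] else t1                   # lower random priority wins

a = {"atsp_pos": 2, "partition": 0, "pi": 7}
b = {"atsp_pos": 5, "partition": 0, "pi": 3}
print(loser(a, b, same_partition=True, ancestor_related=True))  # b aborts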

Cutting: analysis. The average-case competitive ratio of Cutting is O(s · A · log²m · log²n) for s objects shared by n transactions invoked by m nodes. This is close to the multiprocessor bound of O(s).

Analysis. Two efficiency measures of Cutting are studied: the average response time of a transaction (how long it takes to commit) and the average makespan. The argument bounds the probability that a conflicting transaction selects the same or a smaller random number, applies a Chernoff bound over the resulting independent trials, and proceeds by induction over the levels of each partition's binary tree. On average, a transaction needs O(C log m log n) trials from the moment it is invoked until it commits, i.e., E[# of trials until commit] = O(C log m log n). Bounding the duration of each trial (depending on whether the conflicting transaction is in the same or in another partition) then yields the average response time and the average-case makespan, from which the average-case competitive ratio O(s · A · log²m · log²n) follows.

Thanks! Questions? Research project's web-site: www.hyflow.org