Deconstructing Transactional Semantics: The Subtleties of Atomicity

Size: px
Start display at page:

Download "Deconstructing Transactional Semantics: The Subtleties of Atomicity"

Transcription

1 Abstract Deconstructing Transactional Semantics: The Subtleties of Atomicity Colin Blundell E Christopher Lewis Milo M. K. Martin Department of Computer and Information Science University of Pennsylvania Philadelphia, Pennsylvania USA {blundell,lewis,milom}@cis.upenn.edu Researchers have recently proposed software and hardware support for transactions as a replacement for the traditional lock-based synchronization most common in multithreaded programs. Transactions allow the programmer to specify a region of the program that should appear to execute atomically, while the hardware and runtime system optimistically execute the transactions concurrently to obtain high performance. The transactional abstraction is thus a promising approach for creating both faster and simpler multithreaded programs. Although transactions have great potential for simplifying multithreaded programming due to their strong atomicity guarantees, this work shows that these same guarantees can have unexpected and potentially serious negative effects on programs that were written assuming weaker synchronization primitives. We make three contributions: (1) we show that a direct translation (statically or dynamically) of lock-based critical sections into transactions can introduce deadlocks into otherwise correct programs, (2) we define an atomicity model for transactions, in which we introduce the terms strong and weak atomicity, and (3) we show that the decision to enforce strong atomicity as opposed to weak atomicity can also result in deadlock. These results invalidate the intuitive idea that transactions are strictly safer than lock-based critical sections and strong atomicity is strictly safer than weak atomicity. We assert that the research community must confront these subtle issues of transactional semantics by exploring the design space and deciding upon the most appropriate semantics for future transactional systems. 1 Introduction Synchronization has a first-order impact on the correctness and performance of multithreaded programs, and locks are currently the prevailing synchronization mechanism. Locks guard regions of code called critical sections, preventing concurrent access to shared data structures. Unfortunately, effective programming with lock-based synchronization is a delicate balancing act between achieving high performance and maintaining correctness. Coarse-grained locking (e.g., one lock for an entire binary tree data structure) gives rise to simple programming paradigms. However, the resulting lock contention can significantly limit scalability and performance. Fine-grained locking (e.g., a lock per node in a binary tree) can reduce lock contention but adds programming complexity, increasing the potential for deadlock and subtle data-race bugs. Furthermore, fine-grained locks may actually increase overhead due to more frequent acquire and release operations, which are slow on modern processors. To further complicate matters, programmers must often choose between various lock implementations (e.g., simple spin locks for uncontended locks, contentionrobust queue-based locks, per-thread reentrant locks, and reader/writer locks). In response to the performance and complexity challenges of locks, researchers have proposed hardware support for synchronization via transactions [2, 6, 7, 10, 12, 14, 15]: segments of code that are atomic with respect to each other. Like lock-based critical sections, transactions are a mechanism for mutual exclusion, but transactions are simpler (specifying atomicity without naming a lock) and more efficiently implemented (optimistically executing concurrently, rolling back on dynamically detected inter-transaction conflicts). This combination of intuitive interface and efficient implementation has the potential to solve many lock-related problems. Returning to our binary tree example, encapsulating an entire binary tree operation in a transaction has (1) the simplicity of an implementation that uses coarse-grained locking, (2) the scalable performance of an implementation that uses fine-grained locking, and (3) the efficiency of avoiding fine-grained locking overheads. In addition, efficient implementations of transactions have the potential to replace all the various types of locks (e.g., spin locks, queue-based locks, reader/write locks) with a single powerful primitive. Finally, transactions can provide a simpler approach for creating wait-free and non-blocking data structures [10]. We elaborate on the various transactional proposals in Section 2. Although transactions have great potential, this work un-

2 covers subtle issues and common misconceptions about their semantics. In particular, we investigate the implications of different forms of atomicity. We show that in several circumstances, correct programs created assuming one form of atomicity (e.g., that of lock-based critical sections) can deadlock when run on a system supporting a stronger form of atomicity (e.g., that of transactions). We make three main contributions: We show that a direct translation (statically or dynamically) of lock-based critical sections into transactions can introduce deadlock into an otherwise correct program. That is, it is unsafe to simply begin a transaction at lock acquisition and end it at lock release. At least one proposal advocates this direct conversion of lock-based programs to transaction-based programs. In Section 3 we use an example to show that such a transformation is not safe in general. We define two atomicity models for transactional systems. Researchers have implemented differing policies for handling interactions between transactional and non-transactional code; however, current definitions of transactional semantics do not always explicitly express these differences. We define strong atomicity as a transactional semantics that guarantees atomicity between transactions and non-transactional code, and we define weak atomicity as a transactional semantics that guarantees atomicity only among transactions. We assert that a transactional system should specify its atomicity model explicitly, much as a shared memory multiprocessor specifies a memory consistency model to define memory ordering [1]. In Section 4 we further discuss these atomicity models. We show that a program that is correct under the weak atomicity model may deadlock under the strong atomicity model. The intuitive view that a stronger atomicity model will correctly execute a superset of the code that is correct under a weaker atomicity model is false. In Section 5 we show by example that strong atomicity can negatively affect program correctness. This work invalidates the intuitive and commonly-held view that transactions are strictly safer than lock-based critical sections and that strong atomicity is strictly safer than weak atomicity. Stronger atomicity restricts the set of legal program interleavings, which would seem to only help the programmer by removing (potentially) buggy interleavings. However, programmers may intentionally or unintentionally exploit the non-atomicity of critical sections guarded by different locks or the non-atomicity between transactions and non-transactional statements, producing programs that require concurrent execution of these regions to avoid deadlock. 2 Background on Transactions Transactions have been used both as an efficient synchronization primitive and as an approach for optimistic execution of lock-based critical sections. 2.1 Synchronization Primitives Researchers have proposed both hardware and software support for transactions and transaction-like synchronization primitives. These proposals are distinguished by several aspects: their policy for mediating conflicts between transactions and non-transactional code, whether they have at-most-once or exactly-once execution semantics, the maximum size of a transaction, and whether the programmer must access memory differently within a transaction than outside one. Below, we survey hardware and software transactional systems, focusing on the above aspects. Supporting Transactions in Hardware. Herlihy and Moss [10] propose transactional memory as a means of supporting lock-free data structures. Inspired by database transactions, they define a transaction as a sequence of instructions that are atomic and serializable with respect to other transactions (i.e., transactions see each other s changes atomically and there is some serial commit order observed by all processors). In their system, transactions execute speculatively and roll back upon detecting a conflict (i.e., an inter-transaction data race). Speculative transactional state is buffered in caches, and conflict detection is implemented as an extension to standard multiprocessor cache coherence protocols. This general implementation strategy has been followed by most subsequent hardware proposals. Herlihy and Moss do not prescribe a semantics to conflicts between a transaction and non-transactional code. Another notable aspect of their programmatic interface is that transactions have at-most-once semantics: although the system causes transactions to abort on conflicts, the programmer is responsible for detecting the abort and retrying the transaction. The maximum size of a transaction is implementation specific, because fixed-size hardware caches buffer speculative memory updates until transactions are committed. A major limitation of Herlihy and Moss original proposal is that transactional state is buffered in caches, so programmers must be aware of cache size and application cache behavior when constructing transactions. Resent work has addressed this limitation. Ananian et al. s Unbounded Transactional Memory (UTM) [2] supports transactions that can be as large as virtual memory and persist through interrupts; this system, however, requires substantial changes to the processor and memory system. Therefore, they also propose Large Transactional Memory (LTM), which supports transactions that can be as large as physical memory but cannot survive interrupts;

3 LTM requires only minor changes to existing cache designs and coherence protocols. Interestingly, UTM allows transactions to interleave arbitrarily with non-transactional code, while LTM forces a transaction to abort on a conflict with non-transactional code. Virtual Transactional Memory (VTM) [15] and Thread-level Transactional Memory (TTM) [12] go a step further and locate memory resident speculative-state buffers in the application s virtual address space, thus tying transactions to threads/applications instead of processors. The hardware and operating system requirements for these systems are far from modest. In Stanford s Transactional Coherence & Consistency (TCC) [6, 7], the transaction is the basic unit of parallel work, communication, memory coherence, and memory reference consistency. Unlike the above systems, all code must reside in some transaction. Furthermore, rather than leverage the cache coherence protocol, TCC defines a transaction-grained coherence and synchronization protocol. TCC supports transactions of unbounded size not by spilling overflowed hardware buffers to memory, but instead an overflowing transaction acquires permission to commit before it has actually completed; once it has this permission, it no longer needs to buffer. The same technique allows I/O to appear in transactions. A unique aspect of TCC s programming model is that it allows the programmer to explicitly specify an ordering among transactions, supporting a form of thread-level-speculative-style parallelization. Supporting Transactions in Software. Researchers initially investigated transactional support in software for the purpose of building lock-free and non-blocking data structures. For this purpose, Shavit and Touitou [16] propose Software Transactional Memory (STM). STM supports only transactions whose data sets are statically known. Herlihy et al. [9] propose Dynamic Software Transactional Memory (DSTM), which supports transactions that access dynamic sets of memory locations. In DSTM, however, the programmer must access any object in a transaction by explicitly opening a transactional version of that object. In contrast to these proposals, Harris and Fraser [8] propose an STM to support a general-purpose atomic construct, similar in spirit to transactions. A programmer can make ordinary memory references within an atomic region, which has exactly-once semantics. Unfortunately, the performance overheads appear to be high when compared to hardware proposals. In a complementary line of research, Flanagan et al. [3, 4, 5] propose type systems that can statically verify atomicity of lock-based critical sections. These type systems could potentially reduce the work that a transactional memory system must do to dynamically guarantee atomicity. 2.2 Optimistic Execution of Critical Sections Transactions may also be used as a basis for improving the implementation of lock-based critical sections. Transactional Lock Removal (TLR) [14] is a system that increases performance by dynamically converting lock-based critical sections to transactions. TLR is an extension of Speculative Lock Elision [13], which dynamically elides lock acquires by executing critical sections speculatively (buffering the results in hardware) and acquiring locks on conflicts. The goal of TLR is to avoid lock acquisition even on conflicts. To this end, TLR uses timestamps to provide an ordering that is used to resolve conflicts without acquiring locks. TLR has no explicit policy for mediating conflicts between transactions and non-transactional code, but the designers suggest extensions that make transactions atomic with respect to non-transactional code. Although TLR tries to avoid acquiring the lock, it must revert to acquiring the lock when buffer resources are exhausted or a transaction exceeds a scheduling quantum. In a similar spirit, Welc et al. [17] define transactional monitors for Java. However, the name is somewhat of a misnomer, because transactional monitors have the same semantics as ordinary Java monitors, with the difference being that they execute speculatively (buffered in software) and roll back on conflicts between critical sections guarded by the same monitor. This per-monitor conflict detection behavior is an important difference between the two proposals, as TLR converts critical sections into transactions that are atomic with respect to all other transactions. 3 Critical Sections Transactions As transactions are a promising replacement for lock-based critical sections, some transactional proposals [2, 12, 14] have extended the benefits of transactional systems to legacy lock-based programs by directly converting lockbased critical sections to transactions (replacing lock acquires and releases with transaction begin and end operations, respectively). This conversion changes the program s semantics: a critical section that was previously atomic only with respect to other critical sections guarded by the same lock is now atomic with respect to all other critical sections. In this section, we show that this semantic change can cause some correct lock-based programs to deadlock. As a result, such direct conversations may be acceptable for architectural studies, but a system that indiscriminately applies this direct conversion will not be backward-compatible for all legacy programs. The assumption that lock-based critical section can be transparently translated to transactions is a natural one; the conversion simply disallows (previously legal) interleavings that contain perceivably concurrent execution of critical sections guarded by different locks, and it does not introduce any new interleavings. The disallowed interleavings would seem to be those most unintuitive to the programmer. For example, they may produce data races due to incorrect locking, in which case the conversion could actually remove

4 bool flaga = false, flagb = false; mutex m1, m2; bool flaga = false, flagb = false; acquire(m1); acquire(m2); while(!flaga){} flaga = true; flagb = true; while(!flagb){} release(m1); release(m2); } } (a) begin trans(); begin trans(); while(!flaga){} flaga = true; flagb = true; while(!flagb){} end trans(); end trans(); } } (b) Figure 1. A program with benign data races that executes correctly using locking (a) but deadlocks when directly converted to transactions (b). bugs. However, some correct program might require one of these disallowed interleavings to make progress. Such a program would deadlock after the direct conversion, indicating that this conversion is not always safe. Figure 1(a) presents a short (admittedly contrived) program that has this property. In this code, the programmer intends that neither proc1 nor proc2 can reach the lines marked by until the other can also reach this line (i.e., effecting a barrier); the locks m1 and m2 each guard a (different) shared variable that is accessed in this line by proc1 and proc2, respectively. Unprotected references to flaga and flagb give rise to benign data races, but one may choose to protect these variables with locks as in Figure 2. In either case, these programs operate as intended because proc1 and proc2 are protected by different locks, so their execution can be interleaved (assuming pre-emptive thread scheduling). Suppose we directly convert these critical sections to transactions, as shown in Figure 1(b). Now the transactions in proc1 and proc2 must execute atomically with respect to each other, meaning that one transaction must appear to execute before the other. This restriction allows either proc1 to observe proc2 s update of flaga or proc2 to observe proc1 s update of flagb, but not both. As a result, the program will deadlock because one or both of the transactions will be unable to make progress beyond the while loop. This example shows that the direct method of converting lock-based programs to use transactions may result in deadlock in legal lock-based programs by disallowing an interleaving that is necessary for progress. This observation impacts some (but not all) previous proposals. For example, Ananian et al. [2] propose a tool that simply replace[s] all locks with transactions, and use this tool to convert lock-based C programs for evaluation of LTM; our example shows that their tool is unsound in general. In contrast, TLR has a policy that the system reverts to acquiring the lock when a transaction exceeds its scheduling quantum. This fall-back case will result in correct execution on this example (and, we believe, in general), because the transactional deadlock will eventually cause the system to revert to lock-based execution. We emphasize that our intent is not to exhibit a real or even necessarily realistic program on which the direct conversion is unsafe, but rather to show that it is theoretically possible for this conversion to cause deadlock. In practice, such a conversion may almost always be safe. Nonetheless, any system that translates lock-based critical sections into transactions cannot assume that this translation is always safe; it must either determine for which lock-based critical regions such a conversion is safe or have a fallback method (such as that of TLR). Determining when this direct translation can be safely applied is now an open research issue. An interesting first question is whether the direct translation of a correct program preserves partial correctness, i.e., the translated program has the property that it will give a correct answer if it gives any answer. If this is true (as seems likely), then researchers can focus on detecting and preventing deadlock and livelock situations. 4 Strong versus Weak Atomicity Transactions should clearly be atomic with respect to each other, but their relationship to non-transactional code is less clear. This ambiguity would at first appear to be merely an implementation detail, because we would expect all references to shared data to be contained within transactions. However, legal programs may contain unprotected refer-

5 bool flaga = false, flagb = false; mutex m1, m2, ma, mb; acquire(m1); acquire(m2); while(true){ acquire(ma); acquire(ma); flaga = true; if (flaga){ release(ma); release(ma); while(true){ break; acquire(mb); } if (flagb){ release(ma); release(mb); } break; acquire(mb); } flagb = true; release(mb); release(mb); } release(m1); release(m2); } } Figure 2. A race-free equivalent of the code in Figure 1. ences to shared variables (i.e., outside transactions) without creating malignant data races, so both transactional and non-transactional code can refer to the same data. To account for these cases, we present two atomicity models. We define strong atomicity to be a transaction semantics in which transactions execute atomically with respect to both other transactions and non-transactional code, and we define weak atomicity to be a semantics in which transactions are atomic only with respect to other transactions (i.e., their execution may be interleaved with non-transactional code). An atomicity model for a transactional system is analogous to a memory consistency model for a traditional shared memory multiprocessor. A memory consistency model defines the observable orderings of memory operations between threads [1]. A strong memory consistency model, which limits the observable reordering of memory operations, is easiest to reason about for programmers, but it is difficult to implement efficiently [11]. In contrast, a weak (or relaxed) memory consistency model, which allows for counter-intuitive reordering of memory operations, is more complex for programmers to reason about because it requires them to explicitly insert memory barriers to enforce ordering. However, weak ordering models are easier to implement efficiently. Similarly, strong atomicity provides a simple and intuitive view of transactional atomicity, which may be more difficult to implement efficiently (especially in software-based transactional systems). In contrast, weak atomicity provides a less intuitive model (as transactions may not appear atomic when interleaved with non-transactional code), but it may be easier to implement efficiently. Interestingly, as transactional systems are also shared memory systems, such systems must define both a transactional atomicity model and a memory consistency model, as well as any previously unconsidered interactions between the two. Just as early work in shared memory multiprocessors did not explicitly address memory consistency issues, current work in transactional memory often does not explicitly consider the distinction between and implications of strong and weak atomicity. One model or the other is specified seemingly arbitrarily (e.g., based on the published description of UTM/LTM [2], we believe that UTM provides weak atomicity, while LTM provides strong atomicity; TLR provides strong atomicity although lock-based critical sections can interleave arbitrarily with code not under a lock [14]) or the form of atomicity is left unspecified (e.g., Herlihy and Moss original definition of transactional memory semantics [10] does not fully specify how it resolves interactions between transactional and non-transactional code). TCC avoids this issue altogether because all code is contained within some transaction. 5 Code Assuming Weak Atomicity Can Break under Strong Atomicity Another common and implicit assumption is that any program that executes correctly under weak atomicity will also execute correctly under strong atomicity. However, this assumption is not true; some programs that are correct under weak atomicity will deadlock under strong atomicity. A program executing under weak atomicity can interleave non-transactional code arbitrarily with transactional code, and such interleavings may be necessary for the program to make progress. If the system actually provides strong atomicity, these interleavings are not allowed and the program may deadlock as a result. For example, consider the two concurrently executing procedures in Figure 3. The programmer intends that the two threads proceed in a coordinated way through the use of the shared variables flaga and flagb, effecting a barrier. Under weak atomicity, the program will execute correctly: the two threads reads and writes can interleave arbitrarily, and the threads proceed as the programmer intended. However, consider what occurs if the program is executing under strong atomicity. The loop labeled ❶ in proc1 will terminate only after the transaction in proc2 propagates its update of flaga when the transaction commits; however, the transaction in proc2 can commit only after the update to flagb (labeled ❷) executes (because of the loop labeled ❸). The resulting circular dependency causes this program to deadlock under strong atomicity, despite correctly executing under weak atomicity.

6 bool flaga, flagb = false;... begin trans(); ❶ while(!flaga){} flaga = true; ❷ flagb = true; ❸ while(!flagb){} }... end trans(); } Figure 3. A program that executes correctly under weak atomicity but deadlocks under strong atomicity. The above example illustrates the need for transactional memory systems to specify whether they are strongly atomic or only weakly atomic and then implement that semantics precisely. For example, consider that this program will deadlock on LTM but execute correctly on UTM, although the designers claim that the systems have similar semantics. (TLR s fallback mechanism of reverting to lock acquires when a transaction exceeds its scheduling quantum will save it from deadlock on lock-based programs that are similar to this example.) If a programmer believes that a transactional system is strongly atomic and it is only weakly atomic, the programmer may write a buggy program due to race conditions between a transaction and non-transactional code (e.g., a program that intermingles locks and transactions). Conversely, if a program is written with the assumption that a transactional system is weakly atomic and it in fact implements strong atomicity, the program may deadlock because it relies on transactions being non-atomic with respect to non-transactional code. As such, neither model can serve as a safe least common denominator target for programmers. 6 Conclusions and Open Questions The main contribution of this paper is the counter-intuitive observation that programs that execute correctly under certain guarantees of atomicity can break when executing under stronger guarantees. Therefore, further work on transactions should consider this observation when proposing any transparent strengthening of atomicity policies. We have illustrated this situation by showing two ways in which this phenomenon can occur. First, transactions do not strictly subsume lock-guarded critical sections in the sense that any program that works correctly with locks will work correctly when directly converted to transactions. The stronger guarantees that transactions provide result in different requirements for correct execution: locks enforce atomicity only among segments of code that are guarded by the same lock, while transactions enforce atomicity among all concurrent transactions. Hence, a program that depends on non-atomicity between critical sections guarded by different locks may break when converted to transactions. Second, introducing atomicity between non-transactional and transactional code can break a program that correctly executes when non-transactional code can interleave with transactions. Therefore, a system must specify its policy on atomicity among transactions and non-transactional code as part of its transactional semantics. We have introduced the definitions of two transactional atomicity models. We define strong atomicity as a transactional semantics that gives the guarantee of atomicity between transactions and nontransactional code; weak atomicity is a transactional semantics that makes no such guarantee. We assert that in the future, designers of transactional memory systems should state explicitly whether they are implementing strong atomicity or weak atomicity. These subtleties of atomicity suggest two important implications. First, users of transactions programmers or automatic conversion tools must be aware of the exact semantics supported by the system they are using. Anything less can lead to incorrect programs. Second, that there does not yet exist a standard semantics for transactions threatens their utility as a synchronization mechanism. If different systems provide different transactional semantics, programs will not be portable and programmers will resort to more portable primitives (e.g., locks). This paper raises several questions. We have given theoretical program examples that give rise to the problems we describe, but how often (if ever) do these types of codes arise in practice? Is it possible to build tools that determine either statically or dynamically when it is safe to convert a lock-based critical section into a transaction? What are the benefits and drawbacks of strong atomicity and weak atomicity? Is a single transactional semantics appropriate for all applications and implementations? If not, how many different semantics are necessary? We hope that this work spurs researchers to investigate these questions. Although this work presents additional challenges for designers of transactional systems, we continue to believe that transactional systems are a promising approach for addressing many difficulties of programming tightly-coupled multiprocessor and multithreaded systems. We hope that this and other critical treatments [18] of practical aspects of transactions will contribute to their successful implementation, evaluation, and use, and not their abandonment.

7 Acknowledgments. The authors thank Mark Hill, Christos Kozyrakis, Ravi Rajwar, Craig Zilles, and the anonymous reviews for their helpful comments on drafts of this paper; in particular, Mark Hill raised the question of whether the direct conversion from locks to transactions maintains partial correctness. This work is funded in part by NSF Award and gifts from Intel Corporation. E Lewis is supported by NSF Career Award References [1] Sarita V. Adve and Kourosh Gharachorloo. Shared memory consistency models: A tutorial. IEEE Computer, 29(12):66 76, December [2] C. Scott Ananian, Krste Asanovic, Bradley C. Kuszmaul, Charles E. Leiserson, and Sean Lie. Unbounded transactional memory. In International Symposium on High-Performance Computer Architecture, pages , [3] Cormac Flanagan, Stephen N. Freund, and Marina Lifshin. Type inference for atomicity. In Types in Language Design and Implementation, pages 47 58, [4] Cormac Flanagan and Shaz Qadeer. A type and effect system for atomicity. In Programming Language Design and Implementation, pages , [5] Cormac Flanagan and Shaz Qadeer. Types for atomicity. In Types in Language Design and Implementation, pages 1 12, [6] Lance Hammond, Brian D. Carlstrom, Vicky Wong, Ben Hertzberg, Mike Chen, Christos Kozyrakis, and Kunle Olukotun. Programming with transactional coherence and consistency (TCC). In International Conference on Architectural Support for Programming Languages and Operating Systems, pages 1 13, [7] Lance Hammond, Vicky Wong, Mike Chen, Brian D. Carlstrom, John D. Davis, Ben Hertzberg, Manohar K. Prabhu, Honggo Wijaya, Christos Kozyrakis, and Kunle Olukotun. Transactional memory coherence and consistency. In International Symposium on Computer Architecture, pages , [8] Tim Harris and Keir Fraser. Language support for lightweight transactions. In Object-Oriented Programming, Systems, Languages, and Applications, pages , [9] Maurice Herlihy, Victor Luchangco, Mark Moir, and William N. Scherer III. Software transactional memory for dynamic-sized data structures. In Principles of Distributed Computing, pages , [10] Maurice Herlihy and J. Eliot B. Moss. Transactional memory: Architectural support for lock-free data structures. In International Symposium on Computer Architecture, pages , [11] Mark D. Hill. Multiprocessors should support simple memory consistency models. IEEE Computer, 31(8): 28 34, August [12] Kevin E. Moore, Mark D. Hill, and David A. Wood. Thread-level transactional memory. Technical Report 1524, Department of Computer Sciences, University of Wisconsin, March [13] Ravi Rajwar and James R. Goodman. Speculative lock elision: enabling highly concurrent multithreaded execution. In International Symposium on Microarchitecture, pages , [14] Ravi Rajwar and James R. Goodman. Transactional lock-free execution of lock-based programs. In International Conference on Architectural Support for Programming Languages and Operating Systems, pages 5 17, [15] Ravi Rajwar, Maurice Herlihy, and Konrad Lai. Virtualizing transactional memory. In International Symposium on Computer Architecture, [16] Nir Shavit and Dan Touitou. Software transactional memory. In Principles of Distributed Computing, pages , [17] Adam Welc, Suresh Jagannathan, and Antony L. Hosking. Transactional monitors for concurrent objects. In European Conference on Object-Oriented Programming, pages , [18] Craig Zilles and David H. Flint. Challenges to providing performance isolation in transactional memories. In Annual Workshop on Duplicating, Deconstructing, and Debunking, June 2005.

6 Transactional Memory. Robert Mullins

6 Transactional Memory. Robert Mullins 6 Transactional Memory ( MPhil Chip Multiprocessors (ACS Robert Mullins Overview Limitations of lock-based programming Transactional memory Programming with TM ( STM ) Software TM ( HTM ) Hardware TM 2

More information

Transactional Memory. Prof. Hsien-Hsin S. Lee School of Electrical and Computer Engineering Georgia Tech

Transactional Memory. Prof. Hsien-Hsin S. Lee School of Electrical and Computer Engineering Georgia Tech Transactional Memory Prof. Hsien-Hsin S. Lee School of Electrical and Computer Engineering Georgia Tech (Adapted from Stanford TCC group and MIT SuperTech Group) Motivation Uniprocessor Systems Frequency

More information

Lecture 7: Transactional Memory Intro. Topics: introduction to transactional memory, lazy implementation

Lecture 7: Transactional Memory Intro. Topics: introduction to transactional memory, lazy implementation Lecture 7: Transactional Memory Intro Topics: introduction to transactional memory, lazy implementation 1 Transactions New paradigm to simplify programming instead of lock-unlock, use transaction begin-end

More information

Hardware Transactional Memory. Daniel Schwartz-Narbonne

Hardware Transactional Memory. Daniel Schwartz-Narbonne Hardware Transactional Memory Daniel Schwartz-Narbonne Hardware Transactional Memories Hybrid Transactional Memories Case Study: Sun Rock Clever ways to use TM Recap: Parallel Programming 1. Find independent

More information

Hybrid Transactional Memory

Hybrid Transactional Memory Hybrid Transactional Memory Mark Moir Sun Microsystems Laboratories 1 Network Drive, UBUR02-311 Burlington, MA 01803 July 2005 Abstract Transactional memory (TM) promises to substantially reduce the difficulty

More information

Lecture 6: Lazy Transactional Memory. Topics: TM semantics and implementation details of lazy TM

Lecture 6: Lazy Transactional Memory. Topics: TM semantics and implementation details of lazy TM Lecture 6: Lazy Transactional Memory Topics: TM semantics and implementation details of lazy TM 1 Transactions Access to shared variables is encapsulated within transactions the system gives the illusion

More information

INTRODUCTION. Hybrid Transactional Memory. Transactional Memory. Problems with Transactional Memory Problems

INTRODUCTION. Hybrid Transactional Memory. Transactional Memory. Problems with Transactional Memory Problems Hybrid Transactional Memory Peter Damron Sun Microsystems peter.damron@sun.com Alexandra Fedorova Harvard University and Sun Microsystems Laboratories fedorova@eecs.harvard.edu Yossi Lev Brown University

More information

Evaluating Contention Management Using Discrete Event Simulation

Evaluating Contention Management Using Discrete Event Simulation Evaluating Contention Management Using Discrete Event Simulation Brian Demsky Alokika Dash Department of Electrical Engineering and Computer Science University of California, Irvine Irvine, CA 92697 {bdemsky,adash}@uci.edu

More information

Challenges to Providing Performance Isolation in Transactional Memories

Challenges to Providing Performance Isolation in Transactional Memories Challenges to Providing Performance Isolation in Transactional Memories Craig Zilles and David H. Flint Dept. of Computer Science, University of Illinois at Urbana- Champaign 201 N. Goodwin Ave. Urbana,

More information

Foundations of the C++ Concurrency Memory Model

Foundations of the C++ Concurrency Memory Model Foundations of the C++ Concurrency Memory Model John Mellor-Crummey and Karthik Murthy Department of Computer Science Rice University johnmc@rice.edu COMP 522 27 September 2016 Before C++ Memory Model

More information

Lock vs. Lock-free Memory Project proposal

Lock vs. Lock-free Memory Project proposal Lock vs. Lock-free Memory Project proposal Fahad Alduraibi Aws Ahmad Eman Elrifaei Electrical and Computer Engineering Southern Illinois University 1. Introduction The CPU performance development history

More information

Transactional Memory. Concurrency unlocked Programming. Bingsheng Wang TM Operating Systems

Transactional Memory. Concurrency unlocked Programming. Bingsheng Wang TM Operating Systems Concurrency unlocked Programming Bingsheng Wang TM Operating Systems 1 Outline Background Motivation Database Transaction Transactional Memory History Transactional Memory Example Mechanisms Software Transactional

More information

Speculative Lock Elision: Enabling Highly Concurrent Multithreaded Execution

Speculative Lock Elision: Enabling Highly Concurrent Multithreaded Execution Speculative Lock Elision: Enabling Highly Concurrent Multithreaded Execution Ravi Rajwar and Jim Goodman University of Wisconsin-Madison International Symposium on Microarchitecture, Dec. 2001 Funding

More information

Lecture 21: Transactional Memory. Topics: consistency model recap, introduction to transactional memory

Lecture 21: Transactional Memory. Topics: consistency model recap, introduction to transactional memory Lecture 21: Transactional Memory Topics: consistency model recap, introduction to transactional memory 1 Example Programs Initially, A = B = 0 P1 P2 A = 1 B = 1 if (B == 0) if (A == 0) critical section

More information

MULTIPROCESSORS AND THREAD LEVEL PARALLELISM

MULTIPROCESSORS AND THREAD LEVEL PARALLELISM UNIT III MULTIPROCESSORS AND THREAD LEVEL PARALLELISM 1. Symmetric Shared Memory Architectures: The Symmetric Shared Memory Architecture consists of several processors with a single physical memory shared

More information

Lecture 4: Directory Protocols and TM. Topics: corner cases in directory protocols, lazy TM

Lecture 4: Directory Protocols and TM. Topics: corner cases in directory protocols, lazy TM Lecture 4: Directory Protocols and TM Topics: corner cases in directory protocols, lazy TM 1 Handling Reads When the home receives a read request, it looks up memory (speculative read) and directory in

More information

Lecture: Consistency Models, TM. Topics: consistency models, TM intro (Section 5.6)

Lecture: Consistency Models, TM. Topics: consistency models, TM intro (Section 5.6) Lecture: Consistency Models, TM Topics: consistency models, TM intro (Section 5.6) 1 Coherence Vs. Consistency Recall that coherence guarantees (i) that a write will eventually be seen by other processors,

More information

Memory Consistency Models

Memory Consistency Models Memory Consistency Models Contents of Lecture 3 The need for memory consistency models The uniprocessor model Sequential consistency Relaxed memory models Weak ordering Release consistency Jonas Skeppstedt

More information

Transactional Memory Coherence and Consistency

Transactional Memory Coherence and Consistency Transactional emory Coherence and Consistency all transactions, all the time Lance Hammond, Vicky Wong, ike Chen, rian D. Carlstrom, ohn D. Davis, en Hertzberg, anohar K. Prabhu, Honggo Wijaya, Christos

More information

Overview: Memory Consistency

Overview: Memory Consistency Overview: Memory Consistency the ordering of memory operations basic definitions; sequential consistency comparison with cache coherency relaxing memory consistency write buffers the total store ordering

More information

Lecture: Consistency Models, TM

Lecture: Consistency Models, TM Lecture: Consistency Models, TM Topics: consistency models, TM intro (Section 5.6) No class on Monday (please watch TM videos) Wednesday: TM wrap-up, interconnection networks 1 Coherence Vs. Consistency

More information

Parallelizing SPECjbb2000 with Transactional Memory

Parallelizing SPECjbb2000 with Transactional Memory Parallelizing SPECjbb2000 with Transactional Memory JaeWoong Chung, Chi Cao Minh, Brian D. Carlstrom, Christos Kozyrakis Computer Systems Laboratory Stanford University {jwchung, caominh, bdc, kozyraki}@stanford.edu

More information

Implementing Sequential Consistency In Cache-Based Systems

Implementing Sequential Consistency In Cache-Based Systems To appear in the Proceedings of the 1990 International Conference on Parallel Processing Implementing Sequential Consistency In Cache-Based Systems Sarita V. Adve Mark D. Hill Computer Sciences Department

More information

6.852: Distributed Algorithms Fall, Class 20

6.852: Distributed Algorithms Fall, Class 20 6.852: Distributed Algorithms Fall, 2009 Class 20 Today s plan z z z Transactional Memory Reading: Herlihy-Shavit, Chapter 18 Guerraoui, Kapalka, Chapters 1-4 Next: z z z Asynchronous networks vs asynchronous

More information

Towards Transactional Memory Semantics for C++

Towards Transactional Memory Semantics for C++ Towards Transactional Memory Semantics for C++ Tatiana Shpeisman Intel Corporation Santa Clara, CA tatiana.shpeisman@intel.com Yang Ni Intel Corporation Santa Clara, CA yang.ni@intel.com Ali-Reza Adl-Tabatabai

More information

The University of Texas at Arlington

The University of Texas at Arlington The University of Texas at Arlington Lecture 6: Threading and Parallel Programming Constraints CSE 5343/4342 Embedded Systems II Based heavily on slides by Dr. Roger Walker More Task Decomposition: Dependence

More information

Pathological Interaction of Locks with Transactional Memory

Pathological Interaction of Locks with Transactional Memory Pathological Interaction of Locks with Transactional Memory Haris Volos, Neelam Goyal and Michael M. Swift University of Wisconsin Madison {hvolos,neelam,swift,@cs.wisc.edu Abstract Transactional memory

More information

Chí Cao Minh 28 May 2008

Chí Cao Minh 28 May 2008 Chí Cao Minh 28 May 2008 Uniprocessor systems hitting limits Design complexity overwhelming Power consumption increasing dramatically Instruction-level parallelism exhausted Solution is multiprocessor

More information

Concurrent & Distributed Systems Supervision Exercises

Concurrent & Distributed Systems Supervision Exercises Concurrent & Distributed Systems Supervision Exercises Stephen Kell Stephen.Kell@cl.cam.ac.uk November 9, 2009 These exercises are intended to cover all the main points of understanding in the lecture

More information

Memory model for multithreaded C++: August 2005 status update

Memory model for multithreaded C++: August 2005 status update Document Number: WG21/N1876=J16/05-0136 Date: 2005-08-26 Reply to: Hans Boehm Hans.Boehm@hp.com 1501 Page Mill Rd., MS 1138 Palo Alto CA 94304 USA Memory model for multithreaded C++: August 2005 status

More information

Commit Algorithms for Scalable Hardware Transactional Memory. Abstract

Commit Algorithms for Scalable Hardware Transactional Memory. Abstract Commit Algorithms for Scalable Hardware Transactional Memory Seth H. Pugsley, Rajeev Balasubramonian UUCS-07-016 School of Computing University of Utah Salt Lake City, UT 84112 USA August 9, 2007 Abstract

More information

Fall 2012 Parallel Computer Architecture Lecture 16: Speculation II. Prof. Onur Mutlu Carnegie Mellon University 10/12/2012

Fall 2012 Parallel Computer Architecture Lecture 16: Speculation II. Prof. Onur Mutlu Carnegie Mellon University 10/12/2012 18-742 Fall 2012 Parallel Computer Architecture Lecture 16: Speculation II Prof. Onur Mutlu Carnegie Mellon University 10/12/2012 Past Due: Review Assignments Was Due: Tuesday, October 9, 11:59pm. Sohi

More information

Mutex Locking versus Hardware Transactional Memory: An Experimental Evaluation

Mutex Locking versus Hardware Transactional Memory: An Experimental Evaluation Mutex Locking versus Hardware Transactional Memory: An Experimental Evaluation Thesis Defense Master of Science Sean Moore Advisor: Binoy Ravindran Systems Software Research Group Virginia Tech Multiprocessing

More information

Implicit Transactional Memory in Chip Multiprocessors

Implicit Transactional Memory in Chip Multiprocessors Implicit Transactional Memory in Chip Multiprocessors Marco Galluzzi 1, Enrique Vallejo 2, Adrián Cristal 1, Fernando Vallejo 2, Ramón Beivide 2, Per Stenström 3, James E. Smith 4 and Mateo Valero 1 1

More information

Unrestricted Transactional Memory: Supporting I/O and System Calls within Transactions

Unrestricted Transactional Memory: Supporting I/O and System Calls within Transactions Unrestricted Transactional Memory: Supporting I/O and System Calls within Transactions Technical Report TR-CIS-06-09 May 2006 Colin Blundell blundell@cis.upenn.edu E Christopher Lewis lewis@cis.upenn.edu

More information

The University of Texas at Arlington

The University of Texas at Arlington The University of Texas at Arlington Lecture 10: Threading and Parallel Programming Constraints CSE 5343/4342 Embedded d Systems II Objectives: Lab 3: Windows Threads (win32 threading API) Convert serial

More information

Summary: Open Questions:

Summary: Open Questions: Summary: The paper proposes an new parallelization technique, which provides dynamic runtime parallelization of loops from binary single-thread programs with minimal architectural change. The realization

More information

Concurrency Control Service 7

Concurrency Control Service 7 Concurrency Control Service 7 7.1 Service Description The purpose of the Concurrency Control Service is to mediate concurrent access to an object such that the consistency of the object is not compromised

More information

LOCK-FREE DINING PHILOSOPHER

LOCK-FREE DINING PHILOSOPHER LOCK-FREE DINING PHILOSOPHER VENKATAKASH RAJ RAOJILLELAMUDI 1, SOURAV MUKHERJEE 2, RYAN SAPTARSHI RAY 3, UTPAL KUMAR RAY 4 Department of Information Technology, Jadavpur University, Kolkata, India 1,2,

More information

Lecture 20: Transactional Memory. Parallel Computer Architecture and Programming CMU , Spring 2013

Lecture 20: Transactional Memory. Parallel Computer Architecture and Programming CMU , Spring 2013 Lecture 20: Transactional Memory Parallel Computer Architecture and Programming Slide credit Many of the slides in today s talk are borrowed from Professor Christos Kozyrakis (Stanford University) Raising

More information

Potential violations of Serializability: Example 1

Potential violations of Serializability: Example 1 CSCE 6610:Advanced Computer Architecture Review New Amdahl s law A possible idea for a term project Explore my idea about changing frequency based on serial fraction to maintain fixed energy or keep same

More information

G52CON: Concepts of Concurrency

G52CON: Concepts of Concurrency G52CON: Concepts of Concurrency Lecture 6: Algorithms for Mutual Natasha Alechina School of Computer Science nza@cs.nott.ac.uk Outline of this lecture mutual exclusion with standard instructions example:

More information

TRANSACTIONAL EXECUTION: TOWARD RELIABLE, HIGH- PERFORMANCE MULTITHREADING

TRANSACTIONAL EXECUTION: TOWARD RELIABLE, HIGH- PERFORMANCE MULTITHREADING TRANSACTIONAL EXECUTION: TOWARD RELIABLE, HIGH- PERFORMANCE MULTITHREADING ALTHOUGH LOCK-BASED CRITICAL SECTIONS ARE THE SYNCHRONIZATION METHOD OF CHOICE, THEY HAVE SIGNIFICANT PERFORMANCE LIMITATIONS

More information

Transactional Memory: Architectural Support for Lock-Free Data Structures Maurice Herlihy and J. Eliot B. Moss ISCA 93

Transactional Memory: Architectural Support for Lock-Free Data Structures Maurice Herlihy and J. Eliot B. Moss ISCA 93 Transactional Memory: Architectural Support for Lock-Free Data Structures Maurice Herlihy and J. Eliot B. Moss ISCA 93 What are lock-free data structures A shared data structure is lock-free if its operations

More information

740: Computer Architecture Memory Consistency. Prof. Onur Mutlu Carnegie Mellon University

740: Computer Architecture Memory Consistency. Prof. Onur Mutlu Carnegie Mellon University 740: Computer Architecture Memory Consistency Prof. Onur Mutlu Carnegie Mellon University Readings: Memory Consistency Required Lamport, How to Make a Multiprocessor Computer That Correctly Executes Multiprocess

More information

NOW Handout Page 1. Memory Consistency Model. Background for Debate on Memory Consistency Models. Multiprogrammed Uniprocessor Mem.

NOW Handout Page 1. Memory Consistency Model. Background for Debate on Memory Consistency Models. Multiprogrammed Uniprocessor Mem. Memory Consistency Model Background for Debate on Memory Consistency Models CS 258, Spring 99 David E. Culler Computer Science Division U.C. Berkeley for a SAS specifies constraints on the order in which

More information

) Intel)(TX)memory):) Transac'onal) Synchroniza'on) Extensions)(TSX))) Transac'ons)

) Intel)(TX)memory):) Transac'onal) Synchroniza'on) Extensions)(TSX))) Transac'ons) ) Intel)(TX)memory):) Transac'onal) Synchroniza'on) Extensions)(TSX))) Transac'ons) Transactions - Definition A transaction is a sequence of data operations with the following properties: * A Atomic All

More information

Lock-Free Readers/Writers

Lock-Free Readers/Writers www.ijcsi.org 180 Lock-Free Readers/Writers Anupriya Chakraborty 1, Sourav Saha 2, Ryan Saptarshi Ray 3 and Utpal Kumar Ray 4 1 Department of Information Technology, Jadavpur University Salt Lake Campus

More information

CS510 Advanced Topics in Concurrency. Jonathan Walpole

CS510 Advanced Topics in Concurrency. Jonathan Walpole CS510 Advanced Topics in Concurrency Jonathan Walpole Threads Cannot Be Implemented as a Library Reasoning About Programs What are the valid outcomes for this program? Is it valid for both r1 and r2 to

More information

AST: scalable synchronization Supervisors guide 2002

AST: scalable synchronization Supervisors guide 2002 AST: scalable synchronization Supervisors guide 00 tim.harris@cl.cam.ac.uk These are some notes about the topics that I intended the questions to draw on. Do let me know if you find the questions unclear

More information

LogTM: Log-Based Transactional Memory

LogTM: Log-Based Transactional Memory LogTM: Log-Based Transactional Memory Kevin E. Moore, Jayaram Bobba, Michelle J. Moravan, Mark D. Hill, & David A. Wood 12th International Symposium on High Performance Computer Architecture () 26 Mulitfacet

More information

TRANSACTION MEMORY. Presented by Hussain Sattuwala Ramya Somuri

TRANSACTION MEMORY. Presented by Hussain Sattuwala Ramya Somuri TRANSACTION MEMORY Presented by Hussain Sattuwala Ramya Somuri AGENDA Issues with Lock Free Synchronization Transaction Memory Hardware Transaction Memory Software Transaction Memory Conclusion 1 ISSUES

More information

Shared Memory Consistency Models: A Tutorial

Shared Memory Consistency Models: A Tutorial Shared Memory Consistency Models: A Tutorial By Sarita Adve & Kourosh Gharachorloo Slides by Jim Larson Outline Concurrent programming on a uniprocessor The effect of optimizations on a uniprocessor The

More information

Unbounded Transactional Memory

Unbounded Transactional Memory (1) Unbounded Transactional Memory!"#$%&''#()*)+*),#-./'0#(/*)&1+2, #3.*4506#!"#-7/89*75,#!:*.50/#;"#

More information

The Common Case Transactional Behavior of Multithreaded Programs

The Common Case Transactional Behavior of Multithreaded Programs The Common Case Transactional Behavior of Multithreaded Programs JaeWoong Chung Hassan Chafi,, Chi Cao Minh, Austen McDonald, Brian D. Carlstrom, Christos Kozyrakis, Kunle Olukotun Computer Systems Lab

More information

Transactional Lock-Free Execution of Lock-Based Programs

Transactional Lock-Free Execution of Lock-Based Programs Appears in the proceedings of the Tenth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), Oct. 6-Oct. 9, 2002, San Jose, California. Transactional

More information

Multiprocessors and Locking

Multiprocessors and Locking Types of Multiprocessors (MPs) Uniform memory-access (UMA) MP Access to all memory occurs at the same speed for all processors. Multiprocessors and Locking COMP9242 2008/S2 Week 12 Part 1 Non-uniform memory-access

More information

Concurrency. Glossary

Concurrency. Glossary Glossary atomic Executing as a single unit or block of computation. An atomic section of code is said to have transactional semantics. No intermediate state for the code unit is visible outside of the

More information

DB2 Lecture 10 Concurrency Control

DB2 Lecture 10 Concurrency Control DB2 Lecture 10 Control Jacob Aae Mikkelsen November 28, 2012 1 / 71 Jacob Aae Mikkelsen DB2 Lecture 10 Control ACID Properties Properly implemented transactions are commonly said to meet the ACID test,

More information

CSE 451: Operating Systems Winter Lecture 7 Synchronization. Steve Gribble. Synchronization. Threads cooperate in multithreaded programs

CSE 451: Operating Systems Winter Lecture 7 Synchronization. Steve Gribble. Synchronization. Threads cooperate in multithreaded programs CSE 451: Operating Systems Winter 2005 Lecture 7 Synchronization Steve Gribble Synchronization Threads cooperate in multithreaded programs to share resources, access shared data structures e.g., threads

More information

CS533 Concepts of Operating Systems. Jonathan Walpole

CS533 Concepts of Operating Systems. Jonathan Walpole CS533 Concepts of Operating Systems Jonathan Walpole Shared Memory Consistency Models: A Tutorial Outline Concurrent programming on a uniprocessor The effect of optimizations on a uniprocessor The effect

More information

Concurrency, Mutual Exclusion and Synchronization C H A P T E R 5

Concurrency, Mutual Exclusion and Synchronization C H A P T E R 5 Concurrency, Mutual Exclusion and Synchronization C H A P T E R 5 Multiple Processes OS design is concerned with the management of processes and threads: Multiprogramming Multiprocessing Distributed processing

More information

Cigarette-Smokers Problem with STM

Cigarette-Smokers Problem with STM Rup Kamal, Ryan Saptarshi Ray, Utpal Kumar Ray & Parama Bhaumik Department of Information Technology, Jadavpur University Kolkata, India Abstract - The past few years have marked the start of a historic

More information

A Case for System Support for Concurrency Exceptions

A Case for System Support for Concurrency Exceptions A Case for System Support for Concurrency Exceptions Luis Ceze, Joseph Devietti, Brandon Lucia and Shaz Qadeer University of Washington {luisceze, devietti, blucia0a}@cs.washington.edu Microsoft Research

More information

Speculative Synchronization

Speculative Synchronization Speculative Synchronization José F. Martínez Department of Computer Science University of Illinois at Urbana-Champaign http://iacoma.cs.uiuc.edu/martinez Problem 1: Conservative Parallelization No parallelization

More information

Transactional Locking II

Transactional Locking II Transactional Locking II Dave Dice 1, Ori Shalev 2,1, and Nir Shavit 1 1 Sun Microsystems Laboratories, 1 Network Drive, Burlington MA 01803-0903, {dice,shanir}@sun.com 2 Tel-Aviv University, Tel-Aviv

More information

Concurrent Objects and Linearizability

Concurrent Objects and Linearizability Chapter 3 Concurrent Objects and Linearizability 3.1 Specifying Objects An object in languages such as Java and C++ is a container for data. Each object provides a set of methods that are the only way

More information

) Intel)(TX)memory):) Transac'onal) Synchroniza'on) Extensions)(TSX))) Transac'ons)

) Intel)(TX)memory):) Transac'onal) Synchroniza'on) Extensions)(TSX))) Transac'ons) ) Intel)(TX)memory):) Transac'onal) Synchroniza'on) Extensions)(TSX))) Transac'ons) Goal A Distributed Transaction We want a transaction that involves multiple nodes Review of transactions and their properties

More information

Dealing with Issues for Interprocess Communication

Dealing with Issues for Interprocess Communication Dealing with Issues for Interprocess Communication Ref Section 2.3 Tanenbaum 7.1 Overview Processes frequently need to communicate with other processes. In a shell pipe the o/p of one process is passed

More information

Designing Memory Consistency Models for. Shared-Memory Multiprocessors. Sarita V. Adve

Designing Memory Consistency Models for. Shared-Memory Multiprocessors. Sarita V. Adve Designing Memory Consistency Models for Shared-Memory Multiprocessors Sarita V. Adve Computer Sciences Department University of Wisconsin-Madison The Big Picture Assumptions Parallel processing important

More information

Early Results Using Hardware Transactional Memory for High-Performance Computing Applications

Early Results Using Hardware Transactional Memory for High-Performance Computing Applications Early Results Using Hardware Transactional Memory for High-Performance Computing Applications Sverker Holmgren sverker.holmgren@it.uu.se Karl Ljungkvist kalj0193@student.uu.se Martin Karlsson martin.karlsson@it.uu.se

More information

Conflict Detection and Validation Strategies for Software Transactional Memory

Conflict Detection and Validation Strategies for Software Transactional Memory Conflict Detection and Validation Strategies for Software Transactional Memory Michael F. Spear, Virendra J. Marathe, William N. Scherer III, and Michael L. Scott University of Rochester www.cs.rochester.edu/research/synchronization/

More information

Duplicating and Verifying LogTM with OS Support in the M5 Simulator

Duplicating and Verifying LogTM with OS Support in the M5 Simulator Duplicating and Verifying LogTM with OS Support in the M5 Simulator Geoffrey Blake, Trevor Mudge {blakeg,tnm}@eecs.umich.edu Advanced Computer Architecture Lab Department of Electrical Engineering and

More information

Conventional processor designs run out of steam Complexity (verification) Power (thermal) Physics (CMOS scaling)

Conventional processor designs run out of steam Complexity (verification) Power (thermal) Physics (CMOS scaling) A Gentler, Kinder Guide to the Multi-core Galaxy Prof. Hsien-Hsin S. Lee School of Electrical and Computer Engineering Georgia Tech Guest lecture for ECE4100/6100 for Prof. Yalamanchili Reality Check Conventional

More information

Hybrid Transactional Memory

Hybrid Transactional Memory Hybrid Transactional Memory Peter Damron Sun Microsystems peter.damron@sun.com Alexandra Fedorova Harvard University and Sun Microsystems Laboratories fedorova@eecs.harvard.edu Yossi Lev Brown University

More information

Improving the Practicality of Transactional Memory

Improving the Practicality of Transactional Memory Improving the Practicality of Transactional Memory Woongki Baek Electrical Engineering Stanford University Programming Multiprocessors Multiprocessor systems are now everywhere From embedded to datacenter

More information

Atom-Aid: Detecting and Surviving Atomicity Violations

Atom-Aid: Detecting and Surviving Atomicity Violations Atom-Aid: Detecting and Surviving Atomicity Violations Abstract Writing shared-memory parallel programs is an error-prone process. Atomicity violations are especially challenging concurrency errors. They

More information

CSE 451: Operating Systems Winter Lecture 7 Synchronization. Hank Levy 412 Sieg Hall

CSE 451: Operating Systems Winter Lecture 7 Synchronization. Hank Levy 412 Sieg Hall CSE 451: Operating Systems Winter 2003 Lecture 7 Synchronization Hank Levy Levy@cs.washington.edu 412 Sieg Hall Synchronization Threads cooperate in multithreaded programs to share resources, access shared

More information

A Unified Formalization of Four Shared-Memory Models

A Unified Formalization of Four Shared-Memory Models Computer Sciences Technical Rert #1051, September 1991, Revised September 1992 A Unified Formalization of Four Shared-Memory Models Sarita V. Adve Mark D. Hill Department of Computer Sciences University

More information

Chapter 5. Multiprocessors and Thread-Level Parallelism

Chapter 5. Multiprocessors and Thread-Level Parallelism Computer Architecture A Quantitative Approach, Fifth Edition Chapter 5 Multiprocessors and Thread-Level Parallelism 1 Introduction Thread-Level parallelism Have multiple program counters Uses MIMD model

More information

Lock Elision and Transactional Memory Predictor in Hardware. William Galliher, Liang Zhang, Kai Zhao. University of Wisconsin Madison

Lock Elision and Transactional Memory Predictor in Hardware. William Galliher, Liang Zhang, Kai Zhao. University of Wisconsin Madison Lock Elision and Transactional Memory Predictor in Hardware William Galliher, Liang Zhang, Kai Zhao University of Wisconsin Madison Email: {galliher, lzhang432, kzhao32}@wisc.edu ABSTRACT Shared data structure

More information

Lecture 21: Transactional Memory. Topics: Hardware TM basics, different implementations

Lecture 21: Transactional Memory. Topics: Hardware TM basics, different implementations Lecture 21: Transactional Memory Topics: Hardware TM basics, different implementations 1 Transactions New paradigm to simplify programming instead of lock-unlock, use transaction begin-end locks are blocking,

More information

Concurrent Preliminaries

Concurrent Preliminaries Concurrent Preliminaries Sagi Katorza Tel Aviv University 09/12/2014 1 Outline Hardware infrastructure Hardware primitives Mutual exclusion Work sharing and termination detection Concurrent data structures

More information

Making the Fast Case Common and the Uncommon Case Simple in Unbounded Transactional Memory

Making the Fast Case Common and the Uncommon Case Simple in Unbounded Transactional Memory Making the Fast Case Common and the Uncommon Case Simple in Unbounded Transactional Memory Colin Blundell (University of Pennsylvania) Joe Devietti (University of Pennsylvania) E Christopher Lewis (VMware,

More information

Multiprocessor System. Multiprocessor Systems. Bus Based UMA. Types of Multiprocessors (MPs) Cache Consistency. Bus Based UMA. Chapter 8, 8.

Multiprocessor System. Multiprocessor Systems. Bus Based UMA. Types of Multiprocessors (MPs) Cache Consistency. Bus Based UMA. Chapter 8, 8. Multiprocessor System Multiprocessor Systems Chapter 8, 8.1 We will look at shared-memory multiprocessors More than one processor sharing the same memory A single CPU can only go so fast Use more than

More information

Mutual Exclusion: Classical Algorithms for Locks

Mutual Exclusion: Classical Algorithms for Locks Mutual Exclusion: Classical Algorithms for Locks John Mellor-Crummey Department of Computer Science Rice University johnmc@cs.rice.edu COMP 422 Lecture 18 21 March 2006 Motivation Ensure that a block of

More information

Transactional Memory. Lecture 19: Parallel Computer Architecture and Programming CMU /15-618, Spring 2015

Transactional Memory. Lecture 19: Parallel Computer Architecture and Programming CMU /15-618, Spring 2015 Lecture 19: Transactional Memory Parallel Computer Architecture and Programming CMU 15-418/15-618, Spring 2015 Credit: many of the slides in today s talk are borrowed from Professor Christos Kozyrakis

More information

Multiprocessor Systems. COMP s1

Multiprocessor Systems. COMP s1 Multiprocessor Systems 1 Multiprocessor System We will look at shared-memory multiprocessors More than one processor sharing the same memory A single CPU can only go so fast Use more than one CPU to improve

More information

Motivations. Shared Memory Consistency Models. Optimizations for Performance. Memory Consistency

Motivations. Shared Memory Consistency Models. Optimizations for Performance. Memory Consistency Shared Memory Consistency Models Authors : Sarita.V.Adve and Kourosh Gharachorloo Presented by Arrvindh Shriraman Motivations Programmer is required to reason about consistency to ensure data race conditions

More information

NON-BLOCKING DATA STRUCTURES AND TRANSACTIONAL MEMORY. Tim Harris, 28 November 2014

NON-BLOCKING DATA STRUCTURES AND TRANSACTIONAL MEMORY. Tim Harris, 28 November 2014 NON-BLOCKING DATA STRUCTURES AND TRANSACTIONAL MEMORY Tim Harris, 28 November 2014 Lecture 8 Problems with locks Atomic blocks and composition Hardware transactional memory Software transactional memory

More information

Chapter 18: Database System Architectures.! Centralized Systems! Client--Server Systems! Parallel Systems! Distributed Systems!

Chapter 18: Database System Architectures.! Centralized Systems! Client--Server Systems! Parallel Systems! Distributed Systems! Chapter 18: Database System Architectures! Centralized Systems! Client--Server Systems! Parallel Systems! Distributed Systems! Network Types 18.1 Centralized Systems! Run on a single computer system and

More information

Transactional Memory. How to do multiple things at once. Benjamin Engel Transactional Memory 1 / 28

Transactional Memory. How to do multiple things at once. Benjamin Engel Transactional Memory 1 / 28 Transactional Memory or How to do multiple things at once Benjamin Engel Transactional Memory 1 / 28 Transactional Memory: Architectural Support for Lock-Free Data Structures M. Herlihy, J. Eliot, and

More information

Computer Architecture

Computer Architecture 18-447 Computer Architecture CSCI-564 Advanced Computer Architecture Lecture 29: Consistency & Coherence Lecture 20: Consistency and Coherence Bo Wu Prof. Onur Mutlu Colorado Carnegie School Mellon University

More information

A Uniform Transactional Execution Environment for Java

A Uniform Transactional Execution Environment for Java A Uniform Transactional Execution Environment for Java Lukasz Ziarek 1, Adam Welc 2, Ali-Reza Adl-Tabatabai 2, Vijay Menon 2, Tatiana Shpeisman 2, and Suresh Jagannathan 1 1 Dept. of Computer Sciences

More information

DATABASE TRANSACTIONS. CS121: Relational Databases Fall 2017 Lecture 25

DATABASE TRANSACTIONS. CS121: Relational Databases Fall 2017 Lecture 25 DATABASE TRANSACTIONS CS121: Relational Databases Fall 2017 Lecture 25 Database Transactions 2 Many situations where a sequence of database operations must be treated as a single unit A combination of

More information

Cache Coherence in Distributed and Replicated Transactional Memory Systems. Technical Report RT/4/2009

Cache Coherence in Distributed and Replicated Transactional Memory Systems. Technical Report RT/4/2009 Technical Report RT/4/2009 Cache Coherence in Distributed and Replicated Transactional Memory Systems Maria Couceiro INESC-ID/IST maria.couceiro@ist.utl.pt Luis Rodrigues INESC-ID/IST ler@ist.utl.pt Jan

More information

Multiprocessor Cache Coherence. Chapter 5. Memory System is Coherent If... From ILP to TLP. Enforcing Cache Coherence. Multiprocessor Types

Multiprocessor Cache Coherence. Chapter 5. Memory System is Coherent If... From ILP to TLP. Enforcing Cache Coherence. Multiprocessor Types Chapter 5 Multiprocessor Cache Coherence Thread-Level Parallelism 1: read 2: read 3: write??? 1 4 From ILP to TLP Memory System is Coherent If... ILP became inefficient in terms of Power consumption Silicon

More information

A Concurrent Skip List Implementation with RTM and HLE

A Concurrent Skip List Implementation with RTM and HLE A Concurrent Skip List Implementation with RTM and HLE Fan Gao May 14, 2014 1 Background Semester Performed: Spring, 2014 Instructor: Maurice Herlihy The main idea of my project is to implement a skip

More information

Chapter 5 Asynchronous Concurrent Execution

Chapter 5 Asynchronous Concurrent Execution Chapter 5 Asynchronous Concurrent Execution Outline 5.1 Introduction 5.2 Mutual Exclusion 5.2.1 Java Multithreading Case Study 5.2.2 Critical Sections 5.2.3 Mutual Exclusion Primitives 5.3 Implementing

More information

CPSC/ECE 3220 Fall 2017 Exam Give the definition (note: not the roles) for an operating system as stated in the textbook. (2 pts.

CPSC/ECE 3220 Fall 2017 Exam Give the definition (note: not the roles) for an operating system as stated in the textbook. (2 pts. CPSC/ECE 3220 Fall 2017 Exam 1 Name: 1. Give the definition (note: not the roles) for an operating system as stated in the textbook. (2 pts.) Referee / Illusionist / Glue. Circle only one of R, I, or G.

More information