A Survey Paper on Transactional Memory


Elan Dubrofsky
CPSC 508 Course Project
Department of Computer Science, University of British Columbia
Vancouver, B.C., Canada, V6T 1Z4

Abstract

The need to write concurrent programs is increasing as systems grow more complex while processor speed increases are slowing down. The currently popular solution for parallel programming is to use locks, but they have many known drawbacks that make them a suboptimal solution. Transactional memory is a recent alternative to locks that is gaining a lot of attention in the research community. In this survey paper I explain the concept of transactional memory and identify its various benefits and limitations. Work on software, hardware and hybrid approaches to transactional memory is presented, as well as a way to combine transactional code with code that uses locks. I conclude with my thoughts on the future of this potentially groundbreaking mechanism for shared-variable synchronization.

1 Introduction

For a couple of decades now, developers have been able to rely on the fact that their computers would get faster. Processing speeds have increased consistently over the years according to Moore's law, and as such we have been able to develop systems of increasing complexity without requiring groundbreaking innovation on the software side. Unfortunately, it appears that the free lunch is over. According to Simon Peyton Jones [9], we can no longer assume that our programs will run faster just by purchasing the newest generation of processor.

While individual processor improvements may be declining, there is still hope that comes from parallel programming. Multi-core processors are becoming very prevalent, and it is up to software and operating system developers to find a way to use them to their full capacity. The hardest problem that needs to be overcome when writing parallel programs is that of synchronization. Multiple threads may need to access the same locations in memory, and if careful measures aren't taken the result can be disastrous; if two threads try to modify the same variable at the same time, the data can become corrupted.

Most of today's software uses locks to solve this problem. Locks ensure that a critical section, which is a block of code that contains variables that may be accessed by multiple threads, can only be executed by one thread at a time. When a thread tries to enter a critical section, it must first acquire that section's lock.
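As a concrete illustration of a lock-protected critical section, here is a minimal Python sketch. The names `counter`, `counter_lock`, `add_many` and `run` are hypothetical and chosen only for this example; the only library used is Python's standard `threading` module.

```python
import threading

counter = 0                      # shared variable touched by every thread
counter_lock = threading.Lock()  # the lock guarding the critical section


def add_many(times: int) -> None:
    """Increment the shared counter `times` times, one locked update at a time."""
    global counter
    for _ in range(times):
        with counter_lock:       # acquire the lock on entry, release it on exit
            counter += 1         # critical section: a read-modify-write


def run(num_threads: int = 4, times: int = 10_000) -> int:
    """Start the workers, wait for all of them, and return the final counter."""
    global counter
    counter = 0
    workers = [threading.Thread(target=add_many, args=(times,))
               for _ in range(num_threads)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return counter
```

Because every update happens while the lock is held, the final value equals `num_threads * times` regardless of how the threads interleave.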

If another thread is already holding the lock, the requesting thread must wait until the lock-holding thread releases the lock, which it does when it leaves the critical section.

While locks do solve the problem of multiple threads accessing the same data at the same time, they have several well-known drawbacks with respect to performance and ease of implementation. One big problem with locks is the potential for deadlock, which can cause a program to freeze. Deadlock occurs when one thread is waiting for a lock in order to proceed and that lock is being held by a second thread that cannot proceed because it is waiting for a lock that is being held by the first thread. According to Birrell [2], the most effective way to avoid deadlocks is to apply a partial order to the acquisition of locks in your program. While this solution works, it can be very tedious for programmers to ensure that their code adheres to this rule.

Another issue with locks is that in order to promote concurrency, fine-grained locking is required. This can lead to very complicated code with numerous locks being acquired and released all over the place. A popular example of this is the Linux kernel, where there are pages of comments just explaining what all of the locks are for. Simon Peyton Jones [9], as well as others, comments that the fundamental shortcoming of locks is that they do not support modular programming. This means that large programs cannot be built out of small programs without modifying the smaller programs.

This survey paper discusses transactional memory, which is an alternative to using locks to enforce proper synchronization. Transactions do not suffer from the problems associated with locks that were mentioned above. Simon Peyton Jones [9] and Aguilera et al. [1] point out that the important guarantees that transactions provide for the execution of critical sections are atomicity, consistency, isolation and durability (the ACID properties).
Atomicity means that a critical section will execute completely or not at all; in other words, no other thread will be able to see a state of memory where a critical section is only partially complete. Consistency means that data will never get corrupted, and isolation means that the execution of a critical section will never be affected by the actions of other threads. Durability simply means that any committed memory modifications must be reliable. Another big advantage of transactional memory is that it makes synchronization simple to implement, and code using transactions is very readable and understandable, which is definitely not the case with locks.

The next section of this paper explains what transactions are and how transactional memory can be used by an application developer to write parallel programs. It also elaborates on some of the benefits that transactions provide over the use of locks. In section 3 I discuss the problem of where transactions should be implemented, in hardware or in software. The pros and cons of each option are compared, and I present work being done on a hybrid approach which tries to take advantage of the best features of both options. Section 4 then discusses some drawbacks of transactional memory and work that has been done in evaluating its performance, and section 5 presents work that has been done to overcome some of the problems by having systems dynamically decide when to use transactions and when not to. I then conclude with my opinion on the future of transactional memory and what I think needs to be done to increase its chances of widespread adoption.

2 Transactional Memory Overview

Larus and Rajwar [10] point out in their book that database systems have successfully been exploiting concurrency for decades using transactions. This has led many people to try to turn the programming model used by databases into a more general parallel programming model. In 1993 Herlihy and Moss [7] proposed hardware-supported transactional memory as a mechanism for lock-free data synchronization, and since then transactional memory has been a very hot topic in the systems community.

The concept of using transactions is pretty straightforward. Any critical section of code that one wants made atomic by a transaction must be surrounded with, for example, xbegin and xend tags. When inside a transaction, any attempts to read or write to memory are not actually executed but are instead buffered to some sort of log (conceptually). When a transaction ends, the system checks whether the memory locations that were accessed inside the transaction were modified by another thread between the time that xbegin and xend were called. If no conflict is detected, the transaction is free to commit all of its memory modifications from the log and exit. In the case of a conflict, the transaction wipes the log clean and reverts back to the beginning of the transaction. This revert mechanism serves to make it appear as if the critical section had never been executed.

An implementation of transactional memory such as this is called optimistic execution. It is considered to be optimistic because when an xbegin tag is reached, the system enters the transaction with the hope that it will be able to commit all of its changes at the end. It is important to note that a transaction does not worry about obtaining any locks. It simply executes right away and records any memory reads or writes to the log. The verification step at the end checks that the log is valid before the changes are committed. To check that the log is valid, the system must go through every variable that was read or written to ensure that their values are consistent with what they were when the transaction began (thus ensuring isolation). It is up to the implementation to ensure that the verification step is done atomically.
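The log, verify and commit cycle described above can be sketched in a few lines of Python. This is only an illustrative toy under stated assumptions, not the design of any system surveyed here: the names `VersionedMemory`, `Transaction` and `atomically` are hypothetical, a per-location version counter stands in for comparing values, and a single internal lock is assumed as the way to make the verification step atomic.

```python
import threading


class VersionedMemory:
    """Shared memory in which each location holds a (value, version) pair."""

    def __init__(self, initial):
        self._cells = {name: (value, 0) for name, value in initial.items()}
        self._commit_lock = threading.Lock()  # assumption: makes verify+commit atomic

    def snapshot(self, name):
        return self._cells[name]

    def try_commit(self, read_log, write_log):
        """Verify the log, then publish the buffered writes; False on conflict."""
        with self._commit_lock:
            for name, seen_version in read_log.items():
                if self._cells[name][1] != seen_version:
                    return False                      # someone else committed first
            for name, value in write_log.items():
                _, version = self._cells[name]
                self._cells[name] = (value, version + 1)
            return True


class Transaction:
    """Buffers reads and writes in a private log instead of touching memory."""

    def __init__(self, memory):
        self.memory = memory
        self.read_log = {}    # location -> version observed on first access
        self.write_log = {}   # location -> tentative new value

    def read(self, name):
        if name in self.write_log:                    # read our own pending write
            return self.write_log[name]
        value, version = self.memory.snapshot(name)
        self.read_log.setdefault(name, version)
        return value

    def write(self, name, value):
        if name not in self.read_log:                 # track written locations too
            self.read_log[name] = self.memory.snapshot(name)[1]
        self.write_log[name] = value


def atomically(memory, body, max_attempts=10_000):
    """Run body(tx) between an implicit xbegin and xend, retrying on conflict."""
    for _ in range(max_attempts):
        tx = Transaction(memory)                      # a fresh, empty log
        result = body(tx)
        if memory.try_commit(tx.read_log, tx.write_log):
            return result
    raise RuntimeError("transaction kept conflicting")
```

A transfer between two locations would then be written as `atomically(mem, lambda tx: (tx.write("a", tx.read("a") - 1), tx.write("b", tx.read("b") + 1)))`. Note that a failed attempt simply discards its log, which is the wasted work that the performance discussion later in this paper refers to.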
Overall this is a very clean solution to parallel programming, as concurrency is dealt with simply by surrounding all critical sections with xbegin and xend tags. Unfortunately, transactional memory has some major limitations that have so far kept it from replacing locks. One limitation is that of performance. Whenever a transaction reverts, all of the work that it had done is essentially wasted. Cascaval et al. [3] say that transactional memory systems have yet to produce consistent results indicating that they can work without introducing unacceptable overheads that make the systems too slow. The overhead of using transactions is a major hurdle that must be overcome before transactions will be widely adopted. Another limitation is that I/O operations cannot be supported with transactions. For example, there is no way to revert printing something to the console for the user to see. These issues, as well as others, are discussed in sections 4 and 5 of this paper. First though, I will discuss the issue of whether to implement transactional memory in hardware or software.

3 Hardware vs. Software

An important topic in transactional memory research is whether transactions should be implemented in hardware or software. This section will discuss some implementations of both options and explore the benefits and drawbacks of each approach. Recent work on a hybrid implementation is also covered.

3.1 Hardware Transactional Memory

Herlihy and Moss 1993

The key to transactional memory is for the system to know when a transaction can be committed and when it must be aborted. Herlihy and Moss [7] proposed a very clever way to implement transactional memory in hardware. They do so by modifying standard multiprocessor cache coherence protocols to work for transactional memory. Multiprocessor cache coherence protocols ensure that different processors cannot hold inconsistent values in their caches for the same location in memory. Herlihy and Moss argued that any protocol capable of detecting accessibility conflicts can also detect transaction conflicts at no extra cost.

The implementation discussed in [7] provides three primitives: Load Transactional (LT), Load Transactional Exclusive (LTX) and Store Transactional (ST). LT reads the value of a shared memory location into a private register, LTX does the same thing but hints that the location will be updated, and ST writes from a private register to memory but does not make the change visible to other processors until the transaction commits. It also provides three instructions: COMMIT (attempt to make changes permanent), ABORT (discard changes) and VALIDATE (test current transaction status).

In standard multiprocessor cache coherence protocols, access may be non-exclusive (permitting reads) or exclusive (permitting writes). At any time, a memory location will either be not immediately accessible by any processor (in memory only), accessible non-exclusively by one or more processors, or accessible exclusively by only one processor. If a processor P has non-exclusive access to a location in memory and processor Q wants to store to that location, Q must obtain exclusive access and does so by revoking P's access. Herlihy and Moss' implementation of hardware transactional memory aborts any transaction that tries to revoke access to a transactional entry from another active transaction. By extending Goodman's snoopy protocol for a shared bus [6], Herlihy and Moss have each processor maintain a regular and a transactional cache.
The transactional cache holds all tentative writes and only propagates changes to other processors or main memory if the transaction commits. They augment each transactional cache line with one of four transactional states: EMPTY (no data), NORMAL (contains committed data), XCOMMIT (discard on commit) or XABORT (discard on abort). Transactional operations then put two entries in the cache, one with tag XCOMMIT and one with tag XABORT. All modifications are made to the XABORT entry, and upon COMMIT, entries marked XCOMMIT are set to EMPTY and entries marked XABORT are set to NORMAL. Upon an ABORT instruction, entries marked XABORT are set to EMPTY and those marked XCOMMIT are set to NORMAL. On a bus cycle, the cache acts like a regular cache except that it ignores entries that are not marked NORMAL. This ensures that all changes will be propagated back to main memory if the transaction was committed.

Of course this implementation also has a mechanism to discard changes if there is a transaction conflict. It does so by having each processor maintain two flags: TACTIVE (is a transaction active?) and TSTATUS (has the active transaction aborted?). TACTIVE is set to True whenever a transaction executes its first transactional operation. Upon an LT instruction, the transactional cache is probed for an XABORT entry (this will be an entry that was modified but not yet committed). If there is no XABORT entry for the memory location but there is a NORMAL one, the state is changed from NORMAL to XABORT and a second entry is created with the same data and assigned the XCOMMIT state. If there is no NORMAL entry either, the data is read from memory and XCOMMIT and XABORT entries are made as discussed above. If this memory read fails (because of a conflict), TSTATUS is set to False, all XABORT entries are dropped and all the XCOMMIT entries are set to NORMAL. LTX and ST instructions work in very similar ways.
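The tag transitions above are easier to follow when simulated. The following Python sketch models a single processor's transactional cache as a list of tagged entries. It is a simplification under stated assumptions: the class and method names are hypothetical, conflict detection and the TACTIVE/TSTATUS flags are left out, and the `store_transactional` method is only a guess at the "very similar" ST behaviour that the text does not spell out.

```python
from enum import Enum


class Tag(Enum):
    EMPTY = "EMPTY"      # no data
    NORMAL = "NORMAL"    # committed data
    XCOMMIT = "XCOMMIT"  # discard on commit (holds the old value)
    XABORT = "XABORT"    # discard on abort (holds the tentative value)


class TransactionalCache:
    def __init__(self, main_memory):
        self.main_memory = main_memory  # dict: address -> value
        self.lines = []                 # each line is [tag, address, value]

    def _find(self, address, tag):
        for line in self.lines:
            if line[0] is tag and line[1] == address:
                return line
        return None

    def load_transactional(self, address):
        """LT: probe for XABORT, then NORMAL, then fall back to memory."""
        tentative = self._find(address, Tag.XABORT)
        if tentative is not None:
            return tentative[2]
        normal = self._find(address, Tag.NORMAL)
        if normal is not None:
            normal[0] = Tag.XABORT                        # NORMAL becomes XABORT
            self.lines.append([Tag.XCOMMIT, address, normal[2]])
            return normal[2]
        value = self.main_memory[address]                 # miss: read from memory
        self.lines.append([Tag.XCOMMIT, address, value])
        self.lines.append([Tag.XABORT, address, value])
        return value

    def store_transactional(self, address, value):
        """ST (assumed): ensure the entry pair exists, then update the XABORT copy."""
        self.load_transactional(address)
        self._find(address, Tag.XABORT)[2] = value

    def commit(self):
        for line in self.lines:
            if line[0] is Tag.XCOMMIT:
                line[0] = Tag.EMPTY
            elif line[0] is Tag.XABORT:
                line[0] = Tag.NORMAL

    def abort(self):
        for line in self.lines:
            if line[0] is Tag.XABORT:
                line[0] = Tag.EMPTY
            elif line[0] is Tag.XCOMMIT:
                line[0] = Tag.NORMAL

    def visible(self, address):
        """What a bus cycle would see: only NORMAL entries count."""
        normal = self._find(address, Tag.NORMAL)
        return normal[2] if normal is not None else self.main_memory[address]
```

Keeping both an XCOMMIT and an XABORT copy is what makes commit and abort cheap in this scheme: either outcome is just a relabelling of entries that are already in the cache.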
When VALIDATE is called, if TSTATUS is False, TACTIVE is set to False and TSTATUS is set to True (this basically aborts the transaction). When ABORT is called, transactional cache entries are discarded, TACTIVE is set to False and TSTATUS is set to True. If COMMIT is called instead, TSTATUS is set to True, TACTIVE is set to False, all XCOMMIT entries are dropped and all XABORT entries are set to NORMAL (so they can be read on the next bus cycle).

This idea of using cache coherence protocols to implement hardware transactional memory was considered very novel in its time and sparked quite a lot of research in the area. Unfortunately there are some drawbacks to hardware transactional memory that are still considered major issues today.

Other Ways to do Hardware Transactional Memory

The Wikipedia entry for transactional memory currently says that Load-Link and Store-Conditional operations can be viewed as the most basic hardware transactional memory support. Load-Link returns the value of a memory location, and Store-Conditional stores a new value in memory only if no updates have occurred to that location since the last Load-Link call. This is an example of a simple update and commit operation [7]. Of course, these operations work on data the size of a native machine word and are therefore ineffective at providing the functionality of regular transactions.

Rajwar and Goodman [12] comment that Herlihy and Moss' implementation is not optimal because it requires special instructions, programmer support and coherence protocol extensions. Lev and Maessen [11] add that it is not robust and only works for transactions up to a fixed size. They also point out that it is architecture specific and not portable. Some of these problems are dealt with in Rajwar et al.'s work on Virtualizing Transactional Memory [13]. In [13] the authors virtualize transactional memory in much the same way that virtual memory virtualizes physical memory. That is, programmers write applications without concern for the hardware limitations.
Though their results are impressive, they still admit that there are some open challenges, such as the need for a mechanism to support interactions among processes from different virtual address spaces. They also mention the issue of I/O, which will be discussed in section 5. According to [11] and others, many of the limitations mentioned above are simply due to the fact that hardware transactional memory (alone) is not the best solution. In the next section I discuss an alternative, software transactional memory.

3.2 Software Transactional Memory

Shavit and Touitou [15] propose a software-based implementation of transactional memory. They call their approach Software Transactional Memory (STM) and describe it as a novel design that supports flexible transactional programming of synchronization operations in software. While they admit in the introduction that they cannot aim for the same performance as hardware-based implementations, they comment that STM has advantages in terms of applicability to today's machines, portability, and resiliency in the face of timing anomalies and processor failures. Their implementation supports static transactions, which are transactions that access a pre-determined set of memory locations. It is also non-blocking, which means that threads competing for a shared resource will never have their execution indefinitely postponed by mutual exclusion.

The implementation in [15] uses two data structures of size M (M being the number of memory locations): Memory and Ownerships. Memory is a vector which contains the data stored in the transactional memory, while Ownerships is a vector of records identifying which transaction owns a particular cell in memory. Each process i keeps a record (pointed to by Rec_i) that stores information about the current transaction in progress (this can of course be null). Rec_i contains a number of fields including: Add, which is a vector of the addresses in the transaction; Size, which stores the size of the data set; and OldValues, which is a vector that will contain the former values stored in the involved locations. Version is an integer field, initially zero, which is incremented every time a process terminates a transaction. This field is used to determine the instance number of a transaction.

A process initiates a transaction by calling the StartTransaction routine, which is given in the paper. This routine initializes the process record, executes the transaction with the Transaction routine and checks whether the transaction succeeded. If it did, it returns the vector of OldValues. The Transaction routine (also given in the paper) first tries to acquire ownership of the data set's locations. If it succeeds, the process writes the old values into the transaction's record, calculates the new values to be stored and writes them to memory. If it fails, it returns the location that caused the failure. Shavit and Touitou provide detailed correctness proofs for their algorithm as well as an empirical evaluation that compares the performance of STM to other software methods, and they conclude that STM is very much competitive.

Since Shavit and Touitou's work in 1997, there have been many papers written proposing various improvements to STM. In 2003 Herlihy et al. published work on software transactional memory for dynamic-sized data structures [8]. While prior STM designs require both the memory usage and the transactions to be statically defined in advance, this work allows transactions and transactional objects to be created dynamically.
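To make the static-transaction procedure of Shavit and Touitou described above more concrete, here is a heavily simplified Python sketch. It keeps the Memory and Ownerships vectors and the acquire, record, compute, write, release sequence, but it is not the algorithm from [15]: the cooperative helping that makes the original non-blocking and the Version field are omitted, a Python lock is assumed as a stand-in for an atomic compare-and-swap, and all identifiers (`StaticSTM`, `transaction`, `start_transaction`) are hypothetical.

```python
import threading


class StaticSTM:
    def __init__(self, size):
        self.memory = [0] * size          # the Memory vector
        self.ownerships = [None] * size   # the Ownerships vector
        self._cas_guard = threading.Lock()  # assumption: emulates atomic CAS

    def _acquire(self, location, record):
        with self._cas_guard:
            if self.ownerships[location] is None:
                self.ownerships[location] = record
                return True
            return False

    def _release(self, locations, record):
        with self._cas_guard:
            for location in locations:
                if self.ownerships[location] is record:
                    self.ownerships[location] = None

    def transaction(self, record, compute):
        """Return (True, None) on success or (False, failing_location)."""
        acquired = []
        for location in record["add"]:
            if not self._acquire(location, record):
                self._release(acquired, record)   # back out and report the culprit
                return False, location
            acquired.append(location)
        # All locations are owned: save old values, compute and write new ones.
        record["old_values"] = [self.memory[loc] for loc in record["add"]]
        new_values = compute(record["old_values"])
        for location, value in zip(record["add"], new_values):
            self.memory[location] = value
        self._release(acquired, record)
        return True, None

    def start_transaction(self, addresses, compute):
        """Initialise the record, run the transaction, return OldValues or None."""
        record = {"add": list(addresses), "size": len(addresses), "old_values": []}
        succeeded, _failed_at = self.transaction(record, compute)
        return record["old_values"] if succeeded else None
```

For example, `stm.start_transaction([0, 1], lambda old: [old[0] - 3, old[1] + 3])` moves three units from cell 0 to cell 1 and returns the previous contents of both cells; a caller that receives `None` would typically retry.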
Herlihy et al. contend that their Dynamic Software Transactional Memory (DSTM) system is much better suited than the previous work to the implementation of dynamic-sized data structures such as lists and trees.

Another interesting contribution to STM comes from Robert Ennals' 2005 paper called Efficient Software Transactional Memory [5]. Ennals points out that on the modern multi-processor machines on which STM implementations are designed to run, cache behavior has a significant effect on performance. His work aims to minimize cache contention and does so by making some deviations from previous STM designs, including storing object versioning information inline and not guaranteeing that a transaction will make progress while another transaction is descheduled by the operating system. He notes that the latter deviation is theoretically inelegant, but his testing shows that the algorithm significantly improves on the performance of previous STM algorithms.

3.3 Hybrid Approach

Damron et al. [4] propose a hardware/software hybrid approach to transactional memory. This is motivated by the fact that both hardware and software solutions present significant limitations. While Herlihy and Moss [7] showed that bounded-size atomic transactions could be supported using simple modifications to current cache mechanisms, they cannot guarantee that any implementation will be sufficient for all transactions; there can always be a small fraction of transactions that will not be supported. This requires programmers to account for architecture-related limitations of hardware transactional memory, which erodes the ease-of-use benefits of transactional memory. Software transactional memory allows transactions to be unbounded, but comes with a significant increase in overhead. Not only do both approaches have these deficiencies, but they are also unlikely to be widely accepted.
Hardware manufacturers are unlikely to be willing to produce chips with transactional support because there are no software implementations that would make use of it. At the same time, software developers are unlikely to write software that uses transactions since there is no hardware support available. This chicken-and-egg problem is one of the main issues that Damron et al.'s Hybrid Transactional Memory (HyTM) aims to resolve.

The main idea of HyTM is that the system attempts to execute a transaction in hardware if hardware support is available and the transaction does not exceed the hardware's limitations. If this fails, the system transparently executes the transaction in software. The programmer does not need to be concerned with hardware limitations and can just use transactions wherever they see fit. Since any transaction that cannot be handled by hardware will be handled in software instead, the prevalence of HyTM will allow hardware designers to focus on solutions that handle the majority of transactions without having to worry about extreme cases. In other words, HyTM will allow hardware designers to build best-effort hardware transactional memory instead of having to provide guarantees on bounds.

HyTM will also allow programmers to write programs using transactional memory even before hardware support is available. The performance of these programs will progressively improve as chips with better HTM support are released. This will motivate hardware designers to design chips with better HTM support, since there will be programs out there that can be improved. The results in [4] demonstrate that HyTM in software-only mode (no hardware support) provides much better scalability than simple coarse-grained locking, and even scalability comparable to fine-grained locking, which is very difficult to program. They also show that even meager hardware support improves performance further, which is what was expected.
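The try-hardware-then-fall-back control flow at the heart of HyTM can be illustrated with a small single-threaded Python sketch. Everything here is an assumption made for illustration: the "hardware" path is simulated by a write buffer with a fixed capacity, conflict detection is left out entirely, and the names `run_in_hardware`, `run_in_software`, `run_transaction` and `HardwareLimitExceeded` are hypothetical and do not come from [4].

```python
class HardwareLimitExceeded(Exception):
    """Raised when the simulated best-effort hardware cannot hold the transaction."""


def run_in_hardware(body, memory, capacity):
    """Simulated best-effort HTM: buffers writes, gives up past `capacity` locations."""
    buffer = {}

    def read(address):
        return buffer.get(address, memory.get(address, 0))

    def write(address, value):
        if address not in buffer and len(buffer) >= capacity:
            raise HardwareLimitExceeded(address)
        buffer[address] = value

    body(read, write)
    memory.update(buffer)          # commit only if the body ran to completion


def run_in_software(body, memory):
    """Simulated STM path: the same buffering, but with no size bound."""
    buffer = {}

    def read(address):
        return buffer.get(address, memory.get(address, 0))

    def write(address, value):
        buffer[address] = value

    body(read, write)
    memory.update(buffer)


def run_transaction(body, memory, hardware_capacity=None):
    """Try the hardware path when available; fall back to software transparently."""
    if hardware_capacity is not None:
        try:
            run_in_hardware(body, memory, hardware_capacity)
            return "hardware"
        except HardwareLimitExceeded:
            pass                   # nothing was committed, so a retry is safe
    run_in_software(body, memory)
    return "software"
```

The caller writes the transaction body once and never needs to know which path ran it, which is the property that lets hardware support improve underneath existing programs.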
The hope is that programmers will start writing transactional code using HyTM, and that this will motivate processor designers to support transactions in hardware and put more effort into researching ways to make best-effort HTM even faster.

4 Limitations of and Skepticism Towards Transactional Memory

With all of the benefits that have been associated with transactional memory, it may seem a bit surprising that this parallel programming paradigm has yet to take the multi-core world by storm. A very recent paper by Cascaval et al. [3] explores why transactional memory is still only a research toy. Some of the limitations of transactional memory they discuss have already been covered in this report. For example, they point out that STM leads to too much overhead with respect to performance. They also discuss how HTM capacity constraints lead to significant performance degradation when overflow occurs.

Cascaval et al. also point out some transactional semantics issues, independent of the hardware vs. software decision, that break the ideal transactional programming model. The first issue is the problem of transactional code interacting with non-transactional code. There will always be systems with legacy code, and thus this issue needs to be considered. It is unclear how to deal with shared data outside of a transaction (i.e. how to tolerate weak atomicity) and how to deal with locks being used inside transactions. Another issue is how to deal with exceptions. There needs to be an elegant mechanism to handle exceptions and propagate exception information from within a transactional context. Yet another issue is that of code that cannot be transactionalized, such as when I/O is required. They also note that the non-determinism introduced by aborting transactions makes debugging very complicated, as it may be difficult to reproduce bugs when they occur. They conclude that given all of these issues and the high transactional overheads, transactional memory has not yet matured to the point where it will be widely adopted.

Aside from discussing all of these drawbacks related to transactional memory, Cascaval et al. also perform a number of tests comparing STM implementations to purely sequential code using a number of popular benchmarks. Their results are not very promising, as in most cases the STM implementations perform no better than, or worse than, the sequential code.

There is certainly an uphill battle ahead for transactional memory, but there is still hope. I have already discussed HyTM, which may lead to improved performance if it is adopted by programmers and hardware developers. The next section of this survey paper describes work that has been done to deal with some of the semantic issues mentioned above, such as integrating transactions with locks and dealing with code that requires I/O.

5 Deciding When to use Transactions Dynamically

Rossbach et al. present a very interesting way to deal with some of the issues discussed above in their work titled TxLinux [14]. Their work is again motivated by the fact that programming with locks is very difficult. They thus decided to replace as many critical sections covered by locks in Linux as they could with hardware transactions. Unfortunately, as we have seen, not all critical sections can be replaced with transactions. For example, if the code contains I/O, a transaction cannot be used because the action cannot be undone. They also ran into instances of idiosyncratic locking that were just too complicated to replace with transactions. The task of converting Linux to use transactions took them a year with 6 developers working full time. This is mainly because they had to spend so much time figuring out which critical sections could be replaced (in the end 30% of lock calls were replaced).
This experience motivated them to invent an ingenious new parallel programming mechanism called Cooperative Transactional Spinlocks (Cxspinlocks). Cxspinlocks allow critical sections to dynamically decide wether to use locks or transactions. Most critical sections will attempt to use transactions. If the code attempts an I/O operation, the transaction will rollback and a lock will be used instead. Using Cxspinlocks the developers were able to convert Linux to TxLinux in 1 month with only one developer. The performance of TxLinux shows very small speedups over Linux according to the testing in [14]. This is especially impressive considering the goal of Cxspinlocks is more to make parallel programming easier than anything else. Also Linux is highly optimized for performance and thus any small improvement is an impressive feat. The Cxspinlock API contains three main operations. cx optimistic is an instruction used to optimistically attempt to execute a critical section using a transaction. If I/O is encountered, the transaction reverts and is called with cx exclusive which acquires a lock for the critical section. cx end signals the end of a critical section. A contention manager is used to decide which process should proceed if more than one process is trying to modify a shared variable using locks or transactions. This contention manager can be optimized to satisfy different goals of the system. Another contribution of [14] is that they suggest that the TxLinux contention manager should communicate with the OS scheduler in order to support OS goals such as avoiding priority inversion. They accomplish this by introducing the os-prio contention management policy. With os-prio the OS communicates priority to the transactional memory hardware and the contention manager always decides in favor of higher priority processes. os-prio defaults to other policies when necessary. According to the tests presented in the paper, os-prio eliminates 100% of priority

9 inversion and introduces a negligible performance cost. Rossbach et al. s work on TxLinux brings a lot of hope to the potential adoption of transaction memory. It allows locks and transactions to cooperate with negligible performance costs and thus resolves some of the semantic issues regarding transactional memory that were discussed in section 4 of this paper. 6 Conclusion Transactional memory has been shown in many ways to be a good alternative to using locks for writing parallel programs. While locks are messy and complicated, transactional memory primitives are elegant and allow code synchronization sections to be easily implemented and understood by developers. This survey paper has discussed both hardware and software transactional memory implementations and has identified benefits and drawbacks to each approach. It appears that the best solution is the HyTM hybrid approach which contains the performance benefits from HTM and the unboundedness of STM. I have also discussed semantic problems with transactional memory that are unrelated to the hardware vs. software discussion. A really good solution for many of these issues is the TxLinux work which uses Cxspinlocks to allow locks and threads to work together and to use transactions only when they are appropriate. In my opinion the future of transactional memory will be a combination of HyTM and Cxspinlocks. While it may still take a while to work out the various kinks, I feel that the necessity for better parallel programming solutions will drive the eventual adoption of transactional memory. As they predict in the HyTM paper, it appears that once the adoption of transactional memory begins it will have the potential to pick up momentum and make a very large impact on software development in the long run. References [1] Marcos K. Aguilera, Arif Merchant, Mehul Shah, Alistair Veitch, and Christos Karamanolis. Sinfonia: a new paradigm for building scalable distributed systems. 
In SOSP 07: Proceedings of twenty-first ACM SIGOPS symposium on Operating systems principles, pages , New York, NY, USA, ACM. [2] Andrew D. Birrell. An introduction to programming with threads. Technical report, Research Report 35, Digital Equipment Corporation Systems Research, [3] Calin Cascaval, Colin Blundell, Maged Michael, Harold W. Cain, Peng Wu, Stefanie Chiras, and Siddhartha Chatterjee. Software transactional memory: why is it only a research toy? Commun. ACM, 51(11):40 46, [4] Peter Damron, Alexandra Fedorova, Yossi Lev, Victor Luchangco, Mark Moir, and Daniel Nussbaum. Hybrid transactional memory. In ASPLOS-XII: Proceedings of the 12th international conference on Architectural support for programming languages and operating systems, pages , New York, NY, USA, ACM. [5] Robert Ennals. Efficient software transactional memory. Technical Report IRC-TR , Intel Research Cambridge Tech Report, Jan [6] James R. Goodman. Using cache memory to reduce processor-memory traffic. In ISCA 83: Proceedings of the 10th annual international symposium on Computer architecture, pages , Los Alamitos, CA, USA, IEEE Computer Society Press.

[7] Maurice Herlihy and J. Eliot B. Moss. Transactional memory: architectural support for lock-free data structures. In Proceedings of the 20th Annual International Symposium on Computer Architecture, pages ,
[8] Maurice Herlihy, Victor Luchangco, Mark Moir, and William N. Scherer III. Software transactional memory for dynamic-sized data structures. pages , Jul
[9] Simon Peyton Jones. Beautiful Code, chapter 24. O'Reilly,
[10] Jim Larus and Ravi Rajwar. Transactional Memory (Synthesis Lectures on Computer Architecture). Morgan & Claypool Publishers,
[11] Yossi Lev and Jan-Willem Maessen. Toward a safer interaction with transactional memory by tracking object visibility. In Proceedings, Workshop on Synchronization and Concurrency in Object-Oriented Languages. San Diego, CA, October
[12] Ravi Rajwar and James R. Goodman. Transactional lock-free execution of lock-based programs. In Proceedings of the Tenth Symposium on Architectural Support for Programming Languages and Operating Systems, pages Oct
[13] Ravi Rajwar, Maurice Herlihy, and Konrad Lai. Virtualizing transactional memory. In ISCA '05: Proceedings of the 32nd annual international symposium on Computer Architecture, pages , Washington, DC, USA, IEEE Computer Society.
[14] Christopher J. Rossbach, Owen S. Hofmann, Donald E. Porter, Hany E. Ramadan, Aditya Bhandari, and Emmett Witchel. TxLinux: using and managing hardware transactional memory in an operating system. In SOSP '07: Proceedings of twenty-first ACM SIGOPS symposium on Operating systems principles, pages , New York, NY, USA, ACM.
[15] N. Shavit and D. Touitou. Software transactional memory. Distributed Computing, Special Issue, 10:99–116, 1997.

Transactional Memory: Architectural Support for Lock-Free Data Structures Maurice Herlihy and J. Eliot B. Moss ISCA 93

Transactional Memory: Architectural Support for Lock-Free Data Structures Maurice Herlihy and J. Eliot B. Moss ISCA 93 Transactional Memory: Architectural Support for Lock-Free Data Structures Maurice Herlihy and J. Eliot B. Moss ISCA 93 What are lock-free data structures A shared data structure is lock-free if its operations

More information

Lock vs. Lock-free Memory Project proposal

Lock vs. Lock-free Memory Project proposal Lock vs. Lock-free Memory Project proposal Fahad Alduraibi Aws Ahmad Eman Elrifaei Electrical and Computer Engineering Southern Illinois University 1. Introduction The CPU performance development history

More information

Atomic Transac1ons. Atomic Transactions. Q1: What if network fails before deposit? Q2: What if sequence is interrupted by another sequence?

Atomic Transac1ons. Atomic Transactions. Q1: What if network fails before deposit? Q2: What if sequence is interrupted by another sequence? CPSC-4/6: Operang Systems Atomic Transactions The Transaction Model / Primitives Serializability Implementation Serialization Graphs 2-Phase Locking Optimistic Concurrency Control Transactional Memory

More information

INTRODUCTION. Hybrid Transactional Memory. Transactional Memory. Problems with Transactional Memory Problems

INTRODUCTION. Hybrid Transactional Memory. Transactional Memory. Problems with Transactional Memory Problems Hybrid Transactional Memory Peter Damron Sun Microsystems peter.damron@sun.com Alexandra Fedorova Harvard University and Sun Microsystems Laboratories fedorova@eecs.harvard.edu Yossi Lev Brown University

More information

6.852: Distributed Algorithms Fall, Class 20

6.852: Distributed Algorithms Fall, Class 20 6.852: Distributed Algorithms Fall, 2009 Class 20 Today s plan z z z Transactional Memory Reading: Herlihy-Shavit, Chapter 18 Guerraoui, Kapalka, Chapters 1-4 Next: z z z Asynchronous networks vs asynchronous

More information

Portland State University ECE 588/688. Transactional Memory

Portland State University ECE 588/688. Transactional Memory Portland State University ECE 588/688 Transactional Memory Copyright by Alaa Alameldeen 2018 Issues with Lock Synchronization Priority Inversion A lower-priority thread is preempted while holding a lock

More information

LOCK-FREE DINING PHILOSOPHER

LOCK-FREE DINING PHILOSOPHER LOCK-FREE DINING PHILOSOPHER VENKATAKASH RAJ RAOJILLELAMUDI 1, SOURAV MUKHERJEE 2, RYAN SAPTARSHI RAY 3, UTPAL KUMAR RAY 4 Department of Information Technology, Jadavpur University, Kolkata, India 1,2,

More information

Transactional Memory. Concurrency unlocked Programming. Bingsheng Wang TM Operating Systems

Transactional Memory. Concurrency unlocked Programming. Bingsheng Wang TM Operating Systems Concurrency unlocked Programming Bingsheng Wang TM Operating Systems 1 Outline Background Motivation Database Transaction Transactional Memory History Transactional Memory Example Mechanisms Software Transactional

More information

6 Transactional Memory. Robert Mullins

6 Transactional Memory. Robert Mullins 6 Transactional Memory ( MPhil Chip Multiprocessors (ACS Robert Mullins Overview Limitations of lock-based programming Transactional memory Programming with TM ( STM ) Software TM ( HTM ) Hardware TM 2

More information

Transactional Memory. How to do multiple things at once. Benjamin Engel Transactional Memory 1 / 28

Transactional Memory. How to do multiple things at once. Benjamin Engel Transactional Memory 1 / 28 Transactional Memory or How to do multiple things at once Benjamin Engel Transactional Memory 1 / 28 Transactional Memory: Architectural Support for Lock-Free Data Structures M. Herlihy, J. Eliot, and

More information

TRANSACTION MEMORY. Presented by Hussain Sattuwala Ramya Somuri

TRANSACTION MEMORY. Presented by Hussain Sattuwala Ramya Somuri TRANSACTION MEMORY Presented by Hussain Sattuwala Ramya Somuri AGENDA Issues with Lock Free Synchronization Transaction Memory Hardware Transaction Memory Software Transaction Memory Conclusion 1 ISSUES

More information

Cigarette-Smokers Problem with STM

Cigarette-Smokers Problem with STM Rup Kamal, Ryan Saptarshi Ray, Utpal Kumar Ray & Parama Bhaumik Department of Information Technology, Jadavpur University Kolkata, India Abstract - The past few years have marked the start of a historic

More information

Chris Rossbach, Owen Hofmann, Don Porter, Hany Ramadan, Aditya Bhandari, Emmett Witchel University of Texas at Austin

Chris Rossbach, Owen Hofmann, Don Porter, Hany Ramadan, Aditya Bhandari, Emmett Witchel University of Texas at Austin Chris Rossbach, Owen Hofmann, Don Porter, Hany Ramadan, Aditya Bhandari, Emmett Witchel University of Texas at Austin Hardware Transactional Memory is a reality Sun Rock supports HTM Solaris 10 takes advantage

More information

Using Software Transactional Memory In Interrupt-Driven Systems

Using Software Transactional Memory In Interrupt-Driven Systems Using Software Transactional Memory In Interrupt-Driven Systems Department of Mathematics, Statistics, and Computer Science Marquette University Thesis Defense Introduction Thesis Statement Software transactional

More information

Lock-Free Readers/Writers

Lock-Free Readers/Writers www.ijcsi.org 180 Lock-Free Readers/Writers Anupriya Chakraborty 1, Sourav Saha 2, Ryan Saptarshi Ray 3 and Utpal Kumar Ray 4 1 Department of Information Technology, Jadavpur University Salt Lake Campus

More information

The Multicore Transformation

The Multicore Transformation Ubiquity Symposium The Multicore Transformation The Future of Synchronization on Multicores by Maurice Herlihy Editor s Introduction Synchronization bugs such as data races and deadlocks make every programmer

More information

Hardware Transactional Memory. Daniel Schwartz-Narbonne

Hardware Transactional Memory. Daniel Schwartz-Narbonne Hardware Transactional Memory Daniel Schwartz-Narbonne Hardware Transactional Memories Hybrid Transactional Memories Case Study: Sun Rock Clever ways to use TM Recap: Parallel Programming 1. Find independent

More information

COMP3151/9151 Foundations of Concurrency Lecture 8

COMP3151/9151 Foundations of Concurrency Lecture 8 1 COMP3151/9151 Foundations of Concurrency Lecture 8 Transactional Memory Liam O Connor CSE, UNSW (and data61) 8 Sept 2017 2 The Problem with Locks Problem Write a procedure to transfer money from one

More information

Chí Cao Minh 28 May 2008

Chí Cao Minh 28 May 2008 Chí Cao Minh 28 May 2008 Uniprocessor systems hitting limits Design complexity overwhelming Power consumption increasing dramatically Instruction-level parallelism exhausted Solution is multiprocessor

More information

A Concurrent Skip List Implementation with RTM and HLE

A Concurrent Skip List Implementation with RTM and HLE A Concurrent Skip List Implementation with RTM and HLE Fan Gao May 14, 2014 1 Background Semester Performed: Spring, 2014 Instructor: Maurice Herlihy The main idea of my project is to implement a skip

More information

Lecture 7: Transactional Memory Intro. Topics: introduction to transactional memory, lazy implementation

Lecture 7: Transactional Memory Intro. Topics: introduction to transactional memory, lazy implementation Lecture 7: Transactional Memory Intro Topics: introduction to transactional memory, lazy implementation 1 Transactions New paradigm to simplify programming instead of lock-unlock, use transaction begin-end

More information

Summary: Issues / Open Questions:

Summary: Issues / Open Questions: Summary: The paper introduces Transitional Locking II (TL2), a Software Transactional Memory (STM) algorithm, which tries to overcomes most of the safety and performance issues of former STM implementations.

More information

TxLinux: Using and Managing Hardware Transactional Memory in an Operating System

TxLinux: Using and Managing Hardware Transactional Memory in an Operating System Appears in SOSP 2007 TxLinux: Using and Managing Hardware Transactional Memory in an Operating System Christopher J. Rossbach, Owen S. Hofmann, Donald E. Porter, Hany E. Ramadan, Aditya Bhandari, and Emmett

More information

Performance Comparison of Various STM Concurrency Control Protocols Using Synchrobench

Performance Comparison of Various STM Concurrency Control Protocols Using Synchrobench Performance Comparison of Various STM Concurrency Control Protocols Using Synchrobench Ajay Singh Dr. Sathya Peri Anila Kumari Monika G. February 24, 2017 STM vs Synchrobench IIT Hyderabad February 24,

More information

Cost of Concurrency in Hybrid Transactional Memory. Trevor Brown (University of Toronto) Srivatsan Ravi (Purdue University)

Cost of Concurrency in Hybrid Transactional Memory. Trevor Brown (University of Toronto) Srivatsan Ravi (Purdue University) Cost of Concurrency in Hybrid Transactional Memory Trevor Brown (University of Toronto) Srivatsan Ravi (Purdue University) 1 Transactional Memory: a history Hardware TM Software TM Hybrid TM 1993 1995-today

More information

METATM/TXLINUX: TRANSACTIONAL MEMORY FOR AN OPERATING SYSTEM

METATM/TXLINUX: TRANSACTIONAL MEMORY FOR AN OPERATING SYSTEM ... METATM/TXLINUX: TRANSACTIONAL MEMORY FOR AN OPERATING SYSTEM... HARDWARE TRANSACTIONAL MEMORY CAN REDUCE SYNCHRONIZATION COMPLEXITY WHILE RETAINING HIGH PERFORMANCE. METATM MODELS CHANGES TO THE X86

More information

sinfonia: a new paradigm for building scalable distributed systems

sinfonia: a new paradigm for building scalable distributed systems sinfonia: a new paradigm for building scalable distributed systems marcos k. aguilera arif merchant mehul shah alistair veitch christos karamanolis hp labs hp labs hp labs hp labs vmware motivation 2 corporate

More information

Phased Transactional Memory

Phased Transactional Memory Phased Transactional Memory Dan Nussbaum Scalable Synchronization Research Group Joint work with Yossi Lev and Mark Moir Sun Microsystems Labs August 16, 2007 1 Transactional Memory (TM) Replace locks

More information

Scheduling Transactions in Replicated Distributed Transactional Memory

Scheduling Transactions in Replicated Distributed Transactional Memory Scheduling Transactions in Replicated Distributed Transactional Memory Junwhan Kim and Binoy Ravindran Virginia Tech USA {junwhan,binoy}@vt.edu CCGrid 2013 Concurrency control on chip multiprocessors significantly

More information

Lecture 21: Transactional Memory. Topics: consistency model recap, introduction to transactional memory

Lecture 21: Transactional Memory. Topics: consistency model recap, introduction to transactional memory Lecture 21: Transactional Memory Topics: consistency model recap, introduction to transactional memory 1 Example Programs Initially, A = B = 0 P1 P2 A = 1 B = 1 if (B == 0) if (A == 0) critical section

More information

OPERATING SYSTEM TRANSACTIONS

OPERATING SYSTEM TRANSACTIONS OPERATING SYSTEM TRANSACTIONS Donald E. Porter, Owen S. Hofmann, Christopher J. Rossbach, Alexander Benn, and Emmett Witchel The University of Texas at Austin OS APIs don t handle concurrency 2 OS is weak

More information

TxLinux and MetaTM: Transactional Memory and the Operating System

TxLinux and MetaTM: Transactional Memory and the Operating System TxLinux and MetaTM: Transactional Memory and the Operating System Christopher J. Rossbach, Hany E. Ramadan, Owen S. Hofmann, Donald E. Porter, Aditya Bhandari, and Emmett Witchel Department of Computer

More information

doi: /

doi: / Tx and MetaTM: Transactional Memory and the Operating System By Christopher J. Rossbach, Hany E. Ramadan, Owen S. Hofmann, Donald E. Porter, Aditya Bhandari, and Emmett Witchel doi:10.1145/1378727.1378747

More information

Lecture: Consistency Models, TM. Topics: consistency models, TM intro (Section 5.6)

Lecture: Consistency Models, TM. Topics: consistency models, TM intro (Section 5.6) Lecture: Consistency Models, TM Topics: consistency models, TM intro (Section 5.6) 1 Coherence Vs. Consistency Recall that coherence guarantees (i) that a write will eventually be seen by other processors,

More information

Hybrid Transactional Memory

Hybrid Transactional Memory Hybrid Transactional Memory Peter Damron Sun Microsystems peter.damron@sun.com Alexandra Fedorova Harvard University and Sun Microsystems Laboratories fedorova@eecs.harvard.edu Yossi Lev Brown University

More information

Early Results Using Hardware Transactional Memory for High-Performance Computing Applications

Early Results Using Hardware Transactional Memory for High-Performance Computing Applications Early Results Using Hardware Transactional Memory for High-Performance Computing Applications Sverker Holmgren sverker.holmgren@it.uu.se Karl Ljungkvist kalj0193@student.uu.se Martin Karlsson martin.karlsson@it.uu.se

More information

Transactional Memory. Prof. Hsien-Hsin S. Lee School of Electrical and Computer Engineering Georgia Tech

Transactional Memory. Prof. Hsien-Hsin S. Lee School of Electrical and Computer Engineering Georgia Tech Transactional Memory Prof. Hsien-Hsin S. Lee School of Electrical and Computer Engineering Georgia Tech (Adapted from Stanford TCC group and MIT SuperTech Group) Motivation Uniprocessor Systems Frequency

More information

Mutex Locking versus Hardware Transactional Memory: An Experimental Evaluation

Mutex Locking versus Hardware Transactional Memory: An Experimental Evaluation Mutex Locking versus Hardware Transactional Memory: An Experimental Evaluation Thesis Defense Master of Science Sean Moore Advisor: Binoy Ravindran Systems Software Research Group Virginia Tech Multiprocessing

More information

Lecture 21: Transactional Memory. Topics: Hardware TM basics, different implementations

Lecture 21: Transactional Memory. Topics: Hardware TM basics, different implementations Lecture 21: Transactional Memory Topics: Hardware TM basics, different implementations 1 Transactions New paradigm to simplify programming instead of lock-unlock, use transaction begin-end locks are blocking,

More information

CPEG 852 Advanced Topics in Computing Systems Introduction to Transactional Memory

CPEG 852 Advanced Topics in Computing Systems Introduction to Transactional Memory CPEG 852 Advanced Topics in Computing Systems Introduction to Transactional Memory Stéphane Zuckerman Computer Architecture & Parallel Systems Laboratory Electrical & Computer Engineering Dept. University

More information

A Relativistic Enhancement to Software Transactional Memory

A Relativistic Enhancement to Software Transactional Memory A Relativistic Enhancement to Software Transactional Memory Philip W. Howard Portland State University Jonathan Walpole Portland State University Abstract Relativistic Programming is a technique that allows

More information

Concurrent & Distributed Systems Supervision Exercises

Concurrent & Distributed Systems Supervision Exercises Concurrent & Distributed Systems Supervision Exercises Stephen Kell Stephen.Kell@cl.cam.ac.uk November 9, 2009 These exercises are intended to cover all the main points of understanding in the lecture

More information

Multiprocessor Systems. Chapter 8, 8.1

Multiprocessor Systems. Chapter 8, 8.1 Multiprocessor Systems Chapter 8, 8.1 1 Learning Outcomes An understanding of the structure and limits of multiprocessor hardware. An appreciation of approaches to operating system support for multiprocessor

More information

Atomic Transactions in Cilk

Atomic Transactions in Cilk Atomic Transactions in Jim Sukha 12-13-03 Contents 1 Introduction 2 1.1 Determinacy Races in Multi-Threaded Programs......................... 2 1.2 Atomicity through Transactions...................................

More information

CS5412: TRANSACTIONS (I)

CS5412: TRANSACTIONS (I) 1 CS5412: TRANSACTIONS (I) Lecture XVII Ken Birman Transactions 2 A widely used reliability technology, despite the BASE methodology we use in the first tier Goal for this week: in-depth examination of

More information

Towards a Software Transactional Memory for Graphics Processors

Towards a Software Transactional Memory for Graphics Processors Eurographics Symposium on Parallel Graphics and Visualization (21) J. Ahrens, K. Debattista, and R. Pajarola (Editors) Towards a Software Transactional Memory for Graphics Processors Daniel Cederman, Philippas

More information

Hybrid Transactional Memory

Hybrid Transactional Memory Hybrid Transactional Memory Mark Moir Sun Microsystems Laboratories 1 Network Drive, UBUR02-311 Burlington, MA 01803 July 2005 Abstract Transactional memory (TM) promises to substantially reduce the difficulty

More information

Lecture: Consistency Models, TM

Lecture: Consistency Models, TM Lecture: Consistency Models, TM Topics: consistency models, TM intro (Section 5.6) No class on Monday (please watch TM videos) Wednesday: TM wrap-up, interconnection networks 1 Coherence Vs. Consistency

More information

Speculative Lock Elision: Enabling Highly Concurrent Multithreaded Execution

Speculative Lock Elision: Enabling Highly Concurrent Multithreaded Execution Speculative Lock Elision: Enabling Highly Concurrent Multithreaded Execution Ravi Rajwar and Jim Goodman University of Wisconsin-Madison International Symposium on Microarchitecture, Dec. 2001 Funding

More information

Fall 2012 Parallel Computer Architecture Lecture 16: Speculation II. Prof. Onur Mutlu Carnegie Mellon University 10/12/2012

Fall 2012 Parallel Computer Architecture Lecture 16: Speculation II. Prof. Onur Mutlu Carnegie Mellon University 10/12/2012 18-742 Fall 2012 Parallel Computer Architecture Lecture 16: Speculation II Prof. Onur Mutlu Carnegie Mellon University 10/12/2012 Past Due: Review Assignments Was Due: Tuesday, October 9, 11:59pm. Sohi

More information

Lecture 20: Transactional Memory. Parallel Computer Architecture and Programming CMU , Spring 2013

Lecture 20: Transactional Memory. Parallel Computer Architecture and Programming CMU , Spring 2013 Lecture 20: Transactional Memory Parallel Computer Architecture and Programming Slide credit Many of the slides in today s talk are borrowed from Professor Christos Kozyrakis (Stanford University) Raising

More information

Conflict Detection and Validation Strategies for Software Transactional Memory

Conflict Detection and Validation Strategies for Software Transactional Memory Conflict Detection and Validation Strategies for Software Transactional Memory Michael F. Spear, Virendra J. Marathe, William N. Scherer III, and Michael L. Scott University of Rochester www.cs.rochester.edu/research/synchronization/

More information

Reminder from last time

Reminder from last time Concurrent systems Lecture 7: Crash recovery, lock-free programming, and transactional memory DrRobert N. M. Watson 1 Reminder from last time History graphs; good (and bad) schedules Isolation vs. strict

More information

Birth of Optimistic Methods. On Optimistic Methods for Concurrency Control. Basic performance arg

Birth of Optimistic Methods. On Optimistic Methods for Concurrency Control. Basic performance arg On Optimistic Methods for Concurrency Control. Kung81: H.T. Kung, John Robinson. ACM Transactions on Database Systems (TODS), vol 6, no 2, June 1981. Birth of Optimistic Methods Lovely, complex, very concurrent

More information

Understanding Hardware Transactional Memory

Understanding Hardware Transactional Memory Understanding Hardware Transactional Memory Gil Tene, CTO & co-founder, Azul Systems @giltene 2015 Azul Systems, Inc. Agenda Brief introduction What is Hardware Transactional Memory (HTM)? Cache coherence

More information

Software transactional memory

Software transactional memory Transactional locking II (Dice et. al, DISC'06) Time-based STM (Felber et. al, TPDS'08) Mentor: Johannes Schneider March 16 th, 2011 Motivation Multiprocessor systems Speed up time-sharing applications

More information

ABORTING CONFLICTING TRANSACTIONS IN AN STM

ABORTING CONFLICTING TRANSACTIONS IN AN STM Committing ABORTING CONFLICTING TRANSACTIONS IN AN STM PPOPP 09 2/17/2009 Hany Ramadan, Indrajit Roy, Emmett Witchel University of Texas at Austin Maurice Herlihy Brown University TM AND ITS DISCONTENTS

More information

Interprocess Communication By: Kaushik Vaghani

Interprocess Communication By: Kaushik Vaghani Interprocess Communication By: Kaushik Vaghani Background Race Condition: A situation where several processes access and manipulate the same data concurrently and the outcome of execution depends on the

More information

Reduced Hardware Lock Elision

Reduced Hardware Lock Elision Reduced Hardware Lock Elision Yehuda Afek Tel-Aviv University afek@post.tau.ac.il Alexander Matveev MIT matveeva@post.tau.ac.il Nir Shavit MIT shanir@csail.mit.edu Abstract Hardware lock elision (HLE)

More information

Lecture 6: Lazy Transactional Memory. Topics: TM semantics and implementation details of lazy TM

Lecture 6: Lazy Transactional Memory. Topics: TM semantics and implementation details of lazy TM Lecture 6: Lazy Transactional Memory Topics: TM semantics and implementation details of lazy TM 1 Transactions Access to shared variables is encapsulated within transactions the system gives the illusion

More information

Hybrid Transactional Memory

Hybrid Transactional Memory Hybrid Transactional Memory Sanjeev Kumar Michael Chu Christopher J. Hughes Partha Kundu Anthony Nguyen Intel Labs, Santa Clara, CA University of Michigan, Ann Arbor {sanjeev.kumar, christopher.j.hughes,

More information

Unbounded Transactional Memory

Unbounded Transactional Memory (1) Unbounded Transactional Memory!"#$%&''#()*)+*),#-./'0#(/*)&1+2, #3.*4506#!"#-7/89*75,#!:*.50/#;"#

More information

MULTIPROCESSORS AND THREAD LEVEL PARALLELISM

MULTIPROCESSORS AND THREAD LEVEL PARALLELISM UNIT III MULTIPROCESSORS AND THREAD LEVEL PARALLELISM 1. Symmetric Shared Memory Architectures: The Symmetric Shared Memory Architecture consists of several processors with a single physical memory shared

More information

Lecture 4: Directory Protocols and TM. Topics: corner cases in directory protocols, lazy TM

Lecture 4: Directory Protocols and TM. Topics: corner cases in directory protocols, lazy TM Lecture 4: Directory Protocols and TM Topics: corner cases in directory protocols, lazy TM 1 Handling Reads When the home receives a read request, it looks up memory (speculative read) and directory in

More information

Software Transactional Memory

Software Transactional Memory Software Transactional Memory Michel Weimerskirch 22nd January 2008 Technical University of Kaiserslautern, 67653 Kaiserslautern, Germany michel@weimerskirch.net WWW home page: http://michel.weimerskirch.net/

More information

Advanced Topic: Efficient Synchronization

Advanced Topic: Efficient Synchronization Advanced Topic: Efficient Synchronization Multi-Object Programs What happens when we try to synchronize across multiple objects in a large program? Each object with its own lock, condition variables Is

More information

Cache-Aware Lock-Free Queues for Multiple Producers/Consumers and Weak Memory Consistency

Cache-Aware Lock-Free Queues for Multiple Producers/Consumers and Weak Memory Consistency Cache-Aware Lock-Free Queues for Multiple Producers/Consumers and Weak Memory Consistency Anders Gidenstam Håkan Sundell Philippas Tsigas School of business and informatics University of Borås Distributed

More information

A Comparison of Relativistic and Reader-Writer Locking Approaches to Shared Data Access

A Comparison of Relativistic and Reader-Writer Locking Approaches to Shared Data Access A Comparison of Relativistic and Reader-Writer Locking Approaches to Shared Data Access Philip W. Howard, Josh Triplett, and Jonathan Walpole Portland State University Abstract. This paper explores the

More information

Deconstructing Transactional Semantics: The Subtleties of Atomicity

Deconstructing Transactional Semantics: The Subtleties of Atomicity Abstract Deconstructing Transactional Semantics: The Subtleties of Atomicity Colin Blundell E Christopher Lewis Milo M. K. Martin Department of Computer and Information Science University of Pennsylvania

More information

Agenda. Lecture. Next discussion papers. Bottom-up motivation Shared memory primitives Shared memory synchronization Barriers and locks

Agenda. Lecture. Next discussion papers. Bottom-up motivation Shared memory primitives Shared memory synchronization Barriers and locks Agenda Lecture Bottom-up motivation Shared memory primitives Shared memory synchronization Barriers and locks Next discussion papers Selecting Locking Primitives for Parallel Programming Selecting Locking

More information

Data/Thread Level Speculation (TLS) in the Stanford Hydra Chip Multiprocessor (CMP)

Data/Thread Level Speculation (TLS) in the Stanford Hydra Chip Multiprocessor (CMP) Data/Thread Level Speculation (TLS) in the Stanford Hydra Chip Multiprocessor (CMP) A 4-core Chip Multiprocessor (CMP) based microarchitecture/compiler effort at Stanford that provides hardware/software

More information

Page 1. Challenges" Concurrency" CS162 Operating Systems and Systems Programming Lecture 4. Synchronization, Atomic operations, Locks"

Page 1. Challenges Concurrency CS162 Operating Systems and Systems Programming Lecture 4. Synchronization, Atomic operations, Locks CS162 Operating Systems and Systems Programming Lecture 4 Synchronization, Atomic operations, Locks" January 30, 2012 Anthony D Joseph and Ion Stoica http://insteecsberkeleyedu/~cs162 Space Shuttle Example"

More information

Embedded Systems Dr. Santanu Chaudhury Department of Electrical Engineering Indian Institute of Technology, Delhi

Embedded Systems Dr. Santanu Chaudhury Department of Electrical Engineering Indian Institute of Technology, Delhi Embedded Systems Dr. Santanu Chaudhury Department of Electrical Engineering Indian Institute of Technology, Delhi Lecture - 13 Virtual memory and memory management unit In the last class, we had discussed

More information

Chapter 5 Asynchronous Concurrent Execution

Chapter 5 Asynchronous Concurrent Execution Chapter 5 Asynchronous Concurrent Execution Outline 5.1 Introduction 5.2 Mutual Exclusion 5.2.1 Java Multithreading Case Study 5.2.2 Critical Sections 5.2.3 Mutual Exclusion Primitives 5.3 Implementing

More information

CS 31: Introduction to Computer Systems : Threads & Synchronization April 16-18, 2019

CS 31: Introduction to Computer Systems : Threads & Synchronization April 16-18, 2019 CS 31: Introduction to Computer Systems 22-23: Threads & Synchronization April 16-18, 2019 Making Programs Run Faster We all like how fast computers are In the old days (1980 s - 2005): Algorithm too slow?

More information

Problem. Context. Hash table

Problem. Context. Hash table Problem In many problems, it is natural to use Hash table as their data structures. How can the hash table be efficiently accessed among multiple units of execution (UEs)? Context Hash table is used when

More information

) Intel)(TX)memory):) Transac'onal) Synchroniza'on) Extensions)(TSX))) Transac'ons)

) Intel)(TX)memory):) Transac'onal) Synchroniza'on) Extensions)(TSX))) Transac'ons) ) Intel)(TX)memory):) Transac'onal) Synchroniza'on) Extensions)(TSX))) Transac'ons) Goal A Distributed Transaction We want a transaction that involves multiple nodes Review of transactions and their properties

More information

CS 5523 Operating Systems: Midterm II - reivew Instructor: Dr. Tongping Liu Department Computer Science The University of Texas at San Antonio

CS 5523 Operating Systems: Midterm II - reivew Instructor: Dr. Tongping Liu Department Computer Science The University of Texas at San Antonio CS 5523 Operating Systems: Midterm II - reivew Instructor: Dr. Tongping Liu Department Computer Science The University of Texas at San Antonio Fall 2017 1 Outline Inter-Process Communication (20) Threads

More information

Emmett Witchel The University of Texas At Austin

Emmett Witchel The University of Texas At Austin Emmett Witchel The University of Texas At Austin 1 Q: When is everything happening? A: Now A: Concurrently 2 CS is at forefront of understanding concurrency We operate near light speed Concurrent computer

More information

Linked Lists: Locking, Lock-Free, and Beyond. Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

Linked Lists: Locking, Lock-Free, and Beyond. Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit Linked Lists: Locking, Lock-Free, and Beyond Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit Concurrent Objects Adding threads should not lower throughput Contention

More information

Dependence-Aware Transactional Memory for Increased Concurrency. Hany E. Ramadan, Christopher J. Rossbach, Emmett Witchel University of Texas, Austin

Dependence-Aware Transactional Memory for Increased Concurrency. Hany E. Ramadan, Christopher J. Rossbach, Emmett Witchel University of Texas, Austin Dependence-Aware Transactional Memory for Increased Concurrency Hany E. Ramadan, Christopher J. Rossbach, Emmett Witchel University of Texas, Austin Concurrency Conundrum Challenge: CMP ubiquity Parallel

More information

Performance Improvement via Always-Abort HTM

Performance Improvement via Always-Abort HTM 1 Performance Improvement via Always-Abort HTM Joseph Izraelevitz* Lingxiang Xiang Michael L. Scott* *Department of Computer Science University of Rochester {jhi1,scott}@cs.rochester.edu Parallel Computing

More information

Refined Transactional Lock Elision

Refined Transactional Lock Elision Refined Transactional Elision Dave Dice Alex Kogan Yossi Lev Oracle Labs {dave.dice,alex.kogan,yossi.lev}@oracle.com Abstract Transactional lock elision () is a well-known technique that exploits hardware

More information

Building Efficient Concurrent Graph Object through Composition of List-based Set

Building Efficient Concurrent Graph Object through Composition of List-based Set Building Efficient Concurrent Graph Object through Composition of List-based Set Sathya Peri Muktikanta Sa Nandini Singhal Department of Computer Science & Engineering Indian Institute of Technology Hyderabad

More information

Multiprocessors II: CC-NUMA DSM. CC-NUMA for Large Systems

DSM cache coherence: the hardware stuff. Today's topics: what happens when we lose snooping; new issues: global vs. local cache line state; enter the directory; issues of increasing

Concurrency, Mutual Exclusion and Synchronization (Chapter 5)

Multiple Processes. OS design is concerned with the management of processes and threads: multiprogramming, multiprocessing, distributed processing

Multiprocessor System. Multiprocessor Systems. Bus Based UMA. Types of Multiprocessors (MPs) Cache Consistency. Bus Based UMA. Chapter 8, 8.

Multiprocessor Systems, Chapter 8, 8.1. We will look at shared-memory multiprocessors: more than one processor sharing the same memory. A single CPU can only go so fast. Use more than

DATABASE TRANSACTIONS. CS121: Relational Databases Fall 2017 Lecture 25

Many situations where a sequence of database operations must be treated as a single unit. A combination of

Data/Thread Level Speculation (TLS) in the Stanford Hydra Chip Multiprocessor (CMP)

Hydra is a 4-core Chip Multiprocessor (CMP) based microarchitecture/compiler effort at Stanford that provides hardware/software

Goldibear and the 3 Locks. Programming With Locks Is Tricky. More Lock Madness. And To Make It Worse. Transactional Memory: The Big Idea

Programming With Locks Is Tricky. Multicore processors are the way of the foreseeable future; thread-level parallelism anointed as parallelism model of choice. Just one problem: writing lock-based multi-threaded

Concurrent Preliminaries

Sagi Katorza, Tel Aviv University, 09/12/2014. Outline: hardware infrastructure, hardware primitives, mutual exclusion, work sharing and termination detection, concurrent data structures

CMSC 22200 Computer Architecture Lecture 12: Multi-Core. Prof. Yanjing Li, University of Chicago

Administrative Stuff. Lab 4 due: 11:49pm, Saturday; two late days with penalty. Exam I: grades out on

Leslie Lamport: The Specification Language TLA+

This is an addendum to a chapter by Stephan Merz in the book Logics of Specification Languages by Dines Bjørner and Martin C. Henson (Springer, 2008). It

Multiprocessor Systems. COMP s1

We will look at shared-memory multiprocessors: more than one processor sharing the same memory. A single CPU can only go so fast. Use more than one CPU to improve

Transaction Management (4/23/2018)

Airline reservation: 10 available seats vs. 15 travel agents. How do you design a robust and fair reservation system? Not enough resources. Fair policy for everybody

Transactional Memory

Michał Kapałka, EPFL, LPD. STiDC 08, 1.XII 2008. Introduction: How to Deal with Multi-Threading? Locks? Wait-free

Enhancing Concurrency in Distributed Transactional Memory through Commutativity

Junwhan Kim, Roberto Palmieri, Binoy Ravindran. Virginia Tech, USA. Lock-based concurrency control has serious drawbacks. Coarse

Relaxing Concurrency Control in Transactional Memory. Utku Aydonat

A thesis submitted in conformity with the requirements for the degree of Doctor of Philosophy. Graduate Department of The Edward S. Rogers

Transactional Memory. Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

Our Vision for the Future. In this course, we covered: best practices, new and clever ideas, and common-sense observations.