Hardware Transactional Memory Architecture and Emulation


1 Hardware Transactional Memory Architecture and Emulation
Dr. Peng Liu (刘鹏)
Media Processor Lab
Dept. of Information Science and Electronic Engineering
Zhejiang University
Hangzhou, P.R. China

2 Outline
- Motivation
- Introduce basic TM concepts and interfaces
- TM implementation tradeoffs
- Discuss opportunities beyond parallelism
- Related work

3 Motivation: The Parallel Programming Crisis
- Multi-core chips are an inflection point for SW development
  - Scalable performance now requires parallel programming
- Parallel programming up until now
  - Limited to people with access to large parallel systems
  - Used low-level concurrency features in languages, a thin veneer over the underlying hardware
  - Too cumbersome for mainstream software developers: difficult to write, debug, and maintain, and even to get some speedup
- We need better concurrency abstractions
  - Goal = easy to use + good performance
  - 90% of the speedup with 10% of the effort

4 Parallel Programming is Hard
- Thread-level parallelism is great until we want to share data
- Defining and implementing synchronization
  - Races, deadlock avoidance, memory model issues
- Fundamentally, it's hard to work on shared data at the same time, so we don't: mutual exclusion via locks
- Locks have problems
  - Performance/correctness: the fine-grain vs. coarse-grain tradeoff
  - Deadlocks and failure recovery

5 Transactional Memory (TM)
- Memory transaction [Knight 86, Herlihy & Moss 93]
  - Inspired by database transactions
  - Execute large, programmer-defined regions atomically and in isolation:
    atomic { x = x + y; }
- Atomicity (all or nothing)
  - At commit, all memory writes take effect at once
  - On abort, none of the writes appear to take effect
- Isolation
  - No other code can observe writes before commit
- Serializability
  - Transactions seem to commit in a single serial order
  - The exact order is not guaranteed, though

6 Advantages of TM
- Easy-to-use synchronization construct
  - As easy to use as coarse-grain locks
  - Programmer declares, system implements
- Performs as well as fine-grain locks
  - Automatic read-read and fine-grain concurrency
  - No tradeoff between performance and correctness
- Failure atomicity and recovery
  - No lost locks when a thread fails
  - Failure recovery = transaction abort + restart
- Composability
  - Safe and scalable composition of software modules
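As a minimal sketch of composability, assume `atomic` is implemented with a single global reentrant lock, the simplest correct (but fully serializing) implementation of atomic blocks. The names `atomic`, `withdraw`, `deposit`, and `transfer` are hypothetical and are not a TM API. Because the lock is reentrant, nested atomic regions flatten into the outer one, so two independently written module operations compose into one atomic transfer.

```python
import threading

# Hypothetical stand-in for an atomic block: one global reentrant lock.
# Reentrancy gives flat nesting, so atomic regions compose safely.
_global_tm_lock = threading.RLock()

def atomic():
    return _global_tm_lock

accounts = {"A": 100, "B": 0}

def withdraw(acct, amount):
    with atomic():
        if accounts[acct] < amount:
            raise ValueError("insufficient funds")
        accounts[acct] -= amount

def deposit(acct, amount):
    with atomic():
        accounts[acct] += amount

def transfer(src, dst, amount):
    # Composition: both module operations run inside one outer atomic region,
    # so no other atomic region can observe the money "in flight".
    with atomic():
        withdraw(src, amount)
        deposit(dst, amount)

threads = [threading.Thread(target=transfer, args=("A", "B", 1)) for _ in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(accounts)
```

With per-account fine-grain locks, the same composition would need a global lock-ordering convention to avoid deadlock. A real TM would additionally let transfers on disjoint accounts run concurrently; the global lock here only illustrates the semantics.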

7 Programming with TM
- Basic atomic blocks: atomic {}
- User-triggered abort: abort
- Conditional synchronization: retry
- Composing code sequences: orelse
- Integration with parallel models: ?
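The `retry`/`orelse` pair can be sketched with a toy, single-threaded lazy-versioning transaction. Everything here (`Tx`, `atomically`, `orelse`, `Retry`, `take`) is hypothetical: `retry` is modeled as an exception, and `orelse` discards the first alternative's buffered writes before running the second. In a real STM, a `retry` escaping the whole transaction would block until a location in its read-set changes; here it simply propagates.

```python
class Retry(Exception):
    """Raised by a transaction that cannot proceed yet (models `retry`)."""

class Tx:
    def __init__(self, memory):
        self.memory = memory
        self.writes = {}                 # lazy write buffer, private until commit

    def read(self, key):
        return self.writes.get(key, self.memory[key])

    def write(self, key, value):
        self.writes[key] = value

def orelse(tx, first, second):
    # Run `first`; if it retries, roll back its buffered writes and run `second`.
    saved = dict(tx.writes)
    try:
        return first(tx)
    except Retry:
        tx.writes = saved
        return second(tx)

def atomically(memory, body):
    tx = Tx(memory)
    result = body(tx)                    # a Retry escaping here is not handled in this sketch
    memory.update(tx.writes)             # commit: publish all buffered writes at once
    return result

def take(queue_name):
    def body(tx):
        items = tx.read(queue_name)
        if not items:
            raise Retry()                # nothing to take yet: conditional synchronization
        tx.write(queue_name, items[1:])
        return items[0]
    return body

memory = {"q1": [], "q2": ["job-7", "job-8"]}
got = atomically(memory, lambda tx: orelse(tx, take("q1"), take("q2")))
print(got, memory)
```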

8 TM Caveats and Open Issues
- TM vs. locks
- I/O and unrecoverable actions
- Interaction with non-transactional code

9 Atomic() ≠ Lock() + Unlock()
- The difference
  - Atomic: high-level declaration of atomicity
    - Does not specify implementation/blocking behavior
    - Does not provide a consistency model
  - Lock: low-level blocking primitive
    - Does not provide atomicity or isolation on its own
- Keep in mind
  - Locks can be used to implement atomic()
  - Locks can be used for purposes beyond atomicity
    - Cannot replace all lock regions with atomic regions
  - Atomic eliminates many data races
  - Atomic blocks can still suffer from atomicity violations, e.g., one atomic action in the algorithm split into two atomic blocks
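The last point can be made concrete with a deterministic, single-threaded sketch. Each small function stands for one atomic block (nothing interleaves inside it), and the hypothetical `interleave` hook stands for another thread scheduled between the two blocks. There is no data race, since every access sits inside an atomic block, yet the invariant `balance >= 0` breaks because the check and the debit were split.

```python
# Each function below models one atomic block: its body runs without interleaving.
balance = {"value": 100}

def atomic_check(amount):
    return balance["value"] >= amount          # atomic block 1

def atomic_debit(amount):
    balance["value"] -= amount                 # atomic block 2

def withdraw_split(amount, interleave=None):
    # BUG: check and debit form one logical action but are split into two
    # atomic blocks, so another thread may run in between.
    if atomic_check(amount):
        if interleave:
            interleave()                       # hypothetical hook: the other thread
        atomic_debit(amount)
        return True
    return False

def withdraw_fixed(amount):
    # Fix: the whole check-then-debit is one atomic block.
    if balance["value"] >= amount:
        balance["value"] -= amount
        return True
    return False

withdraw_split(80, interleave=lambda: withdraw_split(80))
split_result = balance["value"]                # -60: invariant violated

balance["value"] = 100
ok1 = withdraw_fixed(80)
ok2 = withdraw_fixed(80)                       # refused: only 20 left
print(split_result, ok1, ok2, balance["value"])
```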

10 I/O and Other Irrevocable Actions
- Challenge: difficult to undo output and redo input
  - I/O devices, I/O registers
- Alternative solutions (open problem)
  - Buffer output and log input
    - Finalize output and clear log at commit
    - Does not work if the atomic block does input after output
  - Guarantee that the transaction will not abort
    - Abort interfering transactions or sequentialize the system
    - Does not work with abort(), input-after-output
  - Transaction-based systems
    - Multiple transactional devices
    - A manager coordinates transactions across devices
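The "buffer output" option can be sketched as follows, assuming a hypothetical `BufferedIOTx` class and using an in-memory `io.StringIO` as a stand-in for the irrevocable device. Output is held privately and only reaches the device at commit, so an abort has nothing to undo.

```python
import io

class BufferedIOTx:
    """Hypothetical transaction that defers irrevocable output until commit."""

    def __init__(self, device):
        self.device = device      # stand-in for the real, irrevocable output sink
        self.pending = []         # buffered output, invisible before commit

    def write_output(self, text):
        self.pending.append(text)

    def commit(self):
        for text in self.pending:  # finalize output only at commit
            self.device.write(text)
        self.pending.clear()

    def abort(self):
        self.pending.clear()       # nothing reached the device, so nothing to undo

device = io.StringIO()
tx = BufferedIOTx(device)
tx.write_output("partial result\n")
tx.abort()                        # aborted: this output never appears

tx = BufferedIOTx(device)
tx.write_output("final result\n")
tx.commit()
print(repr(device.getvalue()))
```

The limitation from the slide is visible here: if the transaction had to read a device response to its own output (input after output), that output has not been sent yet, so the scheme cannot work.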

11 Interactions with Non-Transactional Code
- Two basic alternatives
  - Weak atomicity
    - Transactions are serializable only against other transactions
    - No guarantees about interactions with non-transactional code
  - Strong atomicity
    - Transactions are serializable against all memory accesses
    - Non-transactional loads/stores are 1-instruction transactions
- The tradeoff
  - Strong atomicity seems intuitive: predictable interactions for a wide range of coding patterns
  - But strong atomicity has high overheads for software TM
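A toy sketch of the difference, assuming eager versioning (writes go to memory in place). Under weak atomicity, a plain load reads memory directly and can see an uncommitted value. Under strong atomicity, the plain load acts as a 1-instruction transaction and conflicts with the active writer; this sketch resolves that conflict by aborting the writer, which is only one possible policy. `EagerTx` and `nontx_read` are hypothetical names.

```python
class EagerTx:
    """Toy eager-versioning transaction: writes go to memory, old values to an undo log."""

    def __init__(self, memory):
        self.memory = memory
        self.undo_log = {}
        self.active = True

    def write(self, addr, value):
        self.undo_log.setdefault(addr, self.memory[addr])  # keep the oldest value
        self.memory[addr] = value

    def abort(self):
        for addr, old in self.undo_log.items():
            self.memory[addr] = old
        self.undo_log.clear()
        self.active = False

def nontx_read(memory, addr, active_txs, strong):
    if strong:
        # Strong atomicity: the plain load conflicts with any active writer of addr.
        # Assumed policy for this sketch: abort the writer before reading.
        for tx in active_txs:
            if tx.active and addr in tx.undo_log:
                tx.abort()
    return memory[addr]

memory = {"x": 0}
tx = EagerTx(memory)
tx.write("x", 42)
weak_value = nontx_read(memory, "x", [tx], strong=False)   # uncommitted 42 leaks out
strong_value = nontx_read(memory, "x", [tx], strong=True)  # writer aborted, committed 0 read
print(weak_value, strong_value)
```

In software, this check would have to be instrumented into every non-transactional load, which is where the overhead comes from; hardware TM can get the same check from coherence lookups.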

12 Why TM?
- TM = declarative synchronization
  - User specifies the requirement (atomicity and isolation)
  - System implements it in the best possible way
- Motivation for TM
  - Difficult for the user to get explicit synchronization right: correctness vs. performance vs. complexity
  - Explicit synchronization is difficult to scale: a locking scheme for 4 CPUs is not the best for 64
  - Difficult to do explicit synchronization with composable SW: it needs a global locking strategy
  - Other advantages: fault atomicity, ...
- TM applicability
  - Apps with irregular or unstructured parallelism
    - Difficult to prove independence in advance
    - Difficult to partition data in advance
  - TM does not generate new parallelism; it just helps you tap into what is there
- TM target: 90% of the speedup with 10% of the effort

13 Implementation Requirements for TM
To build TM, you need:
- Data versioning, e.g., for atomic { x = x + y; }
  - Where do you put the new x until commit?
- Conflict detection, e.g., between T0: atomic { x = x + y; } and T1: atomic { x = x / 8; }
  - How do you detect that reads/writes to x need to be serialized?
- Conflict resolution, e.g., T0 runs x = x + y; then T1 runs x = x / 8;
  - How do you enforce serialization when required?
- Design space tradeoffs
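A small worked example shows why conflict detection and resolution are needed for the two transactions above. The starting values x = 16 and y = 8 are arbitrary assumptions. Both serial orders are legal outcomes (TM does not promise which one), but an unsynchronized interleaving in which both transactions read the old x loses an update and produces a result that matches neither serial order.

```python
def t0(x, y):
    return x + y          # T0: atomic { x = x + y; }

def t1(x):
    return x / 8          # T1: atomic { x = x / 8; }

x0, y = 16, 8             # assumed starting values

# Two legal serial orders: different results, both correct under serializability.
order_t0_t1 = t1(t0(x0, y))      # (16 + 8) / 8
order_t1_t0 = t0(t1(x0), y)      # 16 / 8 + 8

# Unsynchronized interleaving: both read x = 16 before either writes.
lost_update_a = t1(x0)           # T0 writes 24, then T1 overwrites it with 16 / 8
lost_update_b = t0(x0, y)        # T1 writes 2.0, then T0 overwrites it with 16 + 8

print(order_t0_t1, order_t1_t0, lost_update_a, lost_update_b)
```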

14 TM Implementation Basics
- TM systems must provide atomicity and isolation
  - Without sacrificing concurrency
- Basic implementation requirements
  - Data versioning
  - Conflict detection and resolution
- Implementation options
  - Hardware transactional memory (HTM)
  - Software transactional memory (STM)
  - Hybrid transactional memory: hardware-accelerated STMs and dual-mode systems

15 Data Versioning
- Manage uncommitted (new) and committed (old) versions of data for concurrent transactions
- Eager versioning (undo-log based)
  - Update the memory location directly
  - Maintain undo info in a log
  - Pros: faster commit, direct reads (SW)
  - Cons: slower aborts, fault tolerance issues
- Lazy versioning (write-buffer based)
  - Buffer data in a write-buffer until commit
  - Update the actual memory location on commit
  - Pros: faster abort, no fault tolerance issues
  - Cons: slower commits, indirect reads (SW)
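The two versioning schemes can be sketched side by side as toy, single-threaded Python classes (hypothetical names, no conflict detection). The comments mark where each scheme pays its cost: eager versioning makes commit trivial but must walk the undo log on abort, while lazy versioning makes abort trivial but must copy the write-buffer on commit and check it on every read.

```python
class EagerTx:
    """Eager versioning: update memory in place, keep old values in an undo log."""

    def __init__(self, memory):
        self.memory, self.undo_log = memory, {}

    def read(self, addr):
        return self.memory[addr]                       # direct read

    def write(self, addr, value):
        self.undo_log.setdefault(addr, self.memory[addr])
        self.memory[addr] = value

    def commit(self):
        self.undo_log.clear()                          # fast: just drop the log

    def abort(self):
        for addr, old in self.undo_log.items():        # slow: restore each old value
            self.memory[addr] = old
        self.undo_log.clear()


class LazyTx:
    """Lazy versioning: buffer new values privately until commit."""

    def __init__(self, memory):
        self.memory, self.write_buffer = memory, {}

    def read(self, addr):
        return self.write_buffer.get(addr, self.memory[addr])  # indirect read

    def write(self, addr, value):
        self.write_buffer[addr] = value

    def commit(self):
        self.memory.update(self.write_buffer)          # slow: copy buffer to memory
        self.write_buffer.clear()

    def abort(self):
        self.write_buffer.clear()                      # fast: just drop the buffer


def demo(cls):
    mem = {"x": 1, "y": 2}
    tx = cls(mem)
    tx.write("x", tx.read("x") + tx.read("y"))
    tx.abort()                                         # all or nothing: no trace left
    after_abort = dict(mem)
    tx = cls(mem)
    tx.write("x", tx.read("x") + tx.read("y"))
    tx.commit()                                        # all writes become visible
    return after_abort, dict(mem)

print(demo(EagerTx), demo(LazyTx))
```

While an eager transaction is running, memory already holds its uncommitted values, which is the source of the fault-tolerance concern listed above.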

16 Conflict Detection
- Detect and handle conflicts between transactions
  - Read-write and (often) write-write conflicts
  - Must track the transaction's read-set and write-set
    - Read-set: addresses read within the transaction
    - Write-set: addresses written within the transaction
- Pessimistic detection
  - Check for conflicts during loads or stores
    - SW: SW barriers using locks and/or version numbers
    - HW: check through coherence actions
  - Use a contention manager to decide whether to stall or abort
    - Various priority policies to handle the common case fast
- Optimistic detection
  - Detect conflicts when a transaction attempts to commit
    - SW: validate the write/read-set using locks or version numbers
    - HW: validate the write-set using coherence actions; get exclusive access for cache lines in the write-set
  - On a conflict, give priority to the committing transaction
    - Other transactions may abort later on
  - On conflicts between committing transactions, use a contention manager to decide priority
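Optimistic (commit-time) detection can be sketched as a set intersection performed when a transaction commits. In this toy version (hypothetical `Tx` and `commit` names, addresses as strings), the committing transaction always wins, and every other active transaction whose read-set or write-set overlaps the committer's write-set is marked aborted.

```python
class Tx:
    def __init__(self, name):
        self.name = name
        self.read_set, self.write_set = set(), set()
        self.aborted = False

def commit(committer, active):
    """Optimistic detection: the committing transaction gets priority.

    Any other active transaction that read or wrote an address in the
    committer's write-set has seen (or would overwrite) stale data, so it aborts.
    """
    victims = []
    for other in active:
        if other is committer or other.aborted:
            continue
        if committer.write_set & (other.read_set | other.write_set):
            other.aborted = True
            victims.append(other.name)
    active.remove(committer)
    return victims

t0, t1, t2 = Tx("T0"), Tx("T1"), Tx("T2")
t0.read_set.update({"x", "y"})
t0.write_set.add("x")
t1.read_set.add("x")                 # read-write conflict with T0
t2.read_set.add("z")                 # independent of T0
active = [t0, t1, t2]
victims = commit(t0, active)
print(victims)
```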

17 Conflict Detection Tradeoffs
- Pessimistic conflict detection (aka encounter or eager)
  - Pros: detects conflicts early; undoes less work and turns some aborts into stalls
  - Cons: no forward progress guarantees and more aborts in some cases; locking issues (SW), fine-grain communication (HW)
- Optimistic conflict detection (aka commit or lazy)
  - Pros: forward progress guarantees; potentially fewer conflicts, shorter locking (SW), bulk communication (HW)
  - Cons: detects conflicts late; still has fairness problems
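For contrast, pessimistic (encounter-time) detection checks every load and store as it happens and asks a contention manager what to do. The policy below, "older transaction wins", is an assumption chosen for illustration; the slide only says various priority policies exist. `Tx` and `on_access` are hypothetical names.

```python
from dataclasses import dataclass, field

@dataclass
class Tx:
    name: str
    start_time: int                         # smaller = older
    read_set: set = field(default_factory=set)
    write_set: set = field(default_factory=set)
    aborted: bool = False

def on_access(tx, addr, is_write, active):
    """Pessimistic detection, run on every transactional load/store.

    Assumed contention-manager policy: the older transaction wins. An older
    requester aborts the younger owner; a younger requester stalls.
    """
    for other in active:
        if other is tx or other.aborted:
            continue
        conflict = addr in other.write_set or (is_write and addr in other.read_set)
        if not conflict:
            continue
        if tx.start_time < other.start_time:
            other.aborted = True            # older requester: abort the younger owner
        else:
            return "stall"                  # younger requester: wait, then retry the access
    (tx.write_set if is_write else tx.read_set).add(addr)
    return "proceed"

old, young = Tx("T0", 1), Tx("T1", 2)
active = [old, young]
r1 = on_access(young, "x", True, active)    # no conflict yet
r2 = on_access(old, "x", False, active)     # conflicts with T1's write; T0 is older
younger = Tx("T2", 3)
active.append(younger)
r3 = on_access(younger, "x", True, active)  # conflicts with T0's read; T2 is younger
print(r1, r2, young.aborted, r3)
```

Because only younger transactions ever wait in this sketch, waits-for cycles (and hence deadlock among stalled transactions) cannot form.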

18 Conflict Detection Granularity
- Object granularity (SW/hybrid)
  - Pros: reduced overhead (time/space); close to the programmer's reasoning
  - Cons: false sharing on large objects (e.g. arrays)
- Word granularity
  - Pros: minimizes false sharing
  - Cons: increased overhead (time/space)
- Cache line granularity
  - Pros: a compromise between object and word; works for both HW and SW
- Mix and match -> best of both worlds
  - Word-level for arrays, object-level for other data, ...
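The false-sharing tradeoff shows up directly when the same two write-sets are checked at different granularities. The 64-byte line, 8-byte word, and the addresses below are assumptions for illustration: two transactions writing adjacent words conflict at cache-line granularity even though they touch no common data.

```python
LINE_BYTES = 64   # assumed cache-line size
WORD_BYTES = 8    # assumed word size

def conflicts(writes_a, writes_b, granule_bytes):
    """Report a conflict if two write-sets touch the same granule."""
    granules_a = {addr // granule_bytes for addr in writes_a}
    granules_b = {addr // granule_bytes for addr in writes_b}
    return bool(granules_a & granules_b)

# Two transactions update different words that share one cache line.
t0_writes = {0x1000}          # word 0 of the line
t1_writes = {0x1008}          # word 1 of the same line

word_level = conflicts(t0_writes, t1_writes, WORD_BYTES)   # no true sharing
line_level = conflicts(t0_writes, t1_writes, LINE_BYTES)   # false sharing
print(word_level, line_level)
```

The word-level check avoids the spurious conflict but needs one tracking entry per word touched instead of one per line, which is the space/time overhead listed above.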

19 TM Implementation Space
- Hardware TM systems
  - Lazy + optimistic: Stanford TCC
  - Lazy + pessimistic: MIT LTM, Intel VTM
  - Eager + pessimistic: Wisconsin LogTM
- Software TM systems
  - Lazy + optimistic (rd/wr): Sun TL2
  - Lazy + optimistic (rd)/pessimistic (wr): MS OSTM
  - Eager + optimistic (rd)/pessimistic (wr): Intel STM
  - Eager + pessimistic (rd/wr): Intel STM
- Optimal design is still an open question
  - May be different for HW, SW, and hybrid

20 Hardware or Software TM?
- TM can be implemented in HW or SW
- SW is slow
  - Bookkeeping is expensive: 2-8x slowdown
- SW has correctness pitfalls
  - Even for correctly synchronized code!
  - Lack of strong atomicity
- Let's use hardware for TM

21 Types of Hardware Support
- Hardware-accelerated STM systems (HASTM, SigTM, USTM, FlexTM, ...)
  - Start with an STM system and identify key bottlenecks
  - Provide (simple) HW primitives for acceleration
- Hardware-based TM systems (TCC, LTM, VTM, LogTM, ...)
  - Versioning and conflict detection directly in HW
- Hybrid TM systems (Sun Rock, ...)
  - Combine an HTM with an STM by switching modes when needed
  - Based on transaction characteristics, available resources, ...
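The mode-switching idea behind hybrid TM can be sketched as a retry policy. Everything here is assumed for illustration: the hardware tracks at most `HTM_CAPACITY` distinct lines, transient conflict aborts are retried in hardware up to `MAX_HW_ATTEMPTS` times, and a capacity abort switches to the software path immediately because retrying in hardware cannot help. A real hybrid must also detect conflicts between concurrent HTM and STM transactions, which this sketch does not model.

```python
class CapacityAbort(Exception):
    """Models an HTM abort because the transaction no longer fits in the cache."""

class ConflictAbort(Exception):
    """Models a transient HTM abort caused by another transaction."""

HTM_CAPACITY = 4          # assumed: max distinct lines the hardware can track
MAX_HW_ATTEMPTS = 3       # assumed retry budget before switching modes

def run_in_htm(writes, attempt, conflicting_attempts):
    if len(set(writes)) > HTM_CAPACITY:
        raise CapacityAbort()
    if attempt < conflicting_attempts:
        raise ConflictAbort()          # simulated interference on early attempts
    return "HTM"

def run_in_stm(writes):
    return "STM"                       # slower software path, no capacity limit

def run_hybrid(writes, conflicting_attempts=0):
    for attempt in range(MAX_HW_ATTEMPTS):
        try:
            return run_in_htm(writes, attempt, conflicting_attempts)
        except ConflictAbort:
            continue                   # transient: retry in hardware
        except CapacityAbort:
            break                      # persistent: retrying in HW cannot help
    return run_in_stm(writes)

print(run_hybrid(["a", "b"]), run_hybrid(list("abcde")))
```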

22 Hardware TM
- Data versioning in caches
  - Cache the write-buffer or the undo-log
  - Cache metadata to track the read-set and write-set
  - Can be done with private, shared, and multi-level caches
- Conflict detection through the cache coherence protocol
  - Coherence lookups detect conflicts between transactions
  - Works with both snooping and directory coherence
- Notes
  - A register checkpoint must be taken at transaction begin
  - Virtualization of hardware resources
  - HTM support is similar to that for TLS and speculative lock elision; some hardware can actually support all three models
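A toy model of the cache-based scheme: each line carries transactional R and W bits, and every incoming coherence request is checked against them. A remote write (or invalidation) conflicts with a local read or write of the line; a remote read conflicts only with a local speculative write. The class name and the 64-byte line size are assumptions, and versioning of the data itself is left out.

```python
class TxCache:
    """Toy private cache whose lines carry transactional R/W bits."""

    LINE_BYTES = 64                       # assumed line size

    def __init__(self):
        self.rw_bits = {}                 # line number -> {"R": bool, "W": bool}

    def _bits(self, addr):
        return self.rw_bits.setdefault(addr // self.LINE_BYTES, {"R": False, "W": False})

    def tx_load(self, addr):
        self._bits(addr)["R"] = True      # line joins the read-set

    def tx_store(self, addr):
        self._bits(addr)["W"] = True      # line joins the write-set

    def snoop(self, addr, remote_is_write):
        """Check an incoming coherence request against the local R/W bits."""
        bits = self.rw_bits.get(addr // self.LINE_BYTES)
        if bits is None:
            return False
        if remote_is_write:               # remote write vs. local read or write
            return bits["R"] or bits["W"]
        return bits["W"]                  # remote read vs. local speculative write

    def commit(self):
        self.rw_bits.clear()              # speculative lines become ordinary data

    def abort(self):
        dirty = [line for line, b in self.rw_bits.items() if b["W"]]
        self.rw_bits.clear()
        return dirty                      # lines whose speculative data must be discarded

cache = TxCache()
cache.tx_load(0x100)
cache.tx_store(0x200)
print(cache.snoop(0x100, False), cache.snoop(0x100, True),
      cache.snoop(0x200, False), cache.snoop(0x300, True))
```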

23 HTM Advantages
- Transparent
  - No need for SW barriers, function cloning, ...
- Fast common-case behavior
  - Zero-overhead tracking of the read-set and write-set
  - Zero-overhead versioning
  - Fast commit and abort without data movement
  - Continuous validation of the read-set
- Strong isolation
  - Conflicts are detected on non-transactional loads/stores as well
- Can simplify multi-core hardware
  - Replace existing coherence with transactional coherence

24 HTM Challenges and Opportunities
1. What's the best implementation in hardware? Many options are available.
2. What's the right HW/SW interface? HTM support for a flexible SW environment.
3. What happens when HW resources are exhausted?
- Virtualization of hardware resources
  - Time virtualization
    - Interrupts, paging, and context switches within a transaction
    - What happens to the transactional state in caches?
  - Space virtualization
    - Where is the write-buffer or log stored?
    - How are the R/W bits stored and checked?
- Most transactions are currently small
  - Small read-sets and write-sets
  - Short in terms of instructions

25 Project Aims
- A self-tuning transactional memory system that dynamically adapts its policies to best suit the application behavior.
- A configurable, parameterized application programming interface (API) to improve scalability and flexibility.
- Develop a closed-loop debugger for HTM based on our FPGA prototype platform.
- Validate the self-tuning memory hierarchy on the platform, which can support both software-managed memories and a cache-coherent or transactional memory system.

26 Processing Elements Concepts
- Memory wall
  - Processor frequency vs. DRAM memory latency
  - Latency introduced by multiple levels of memory
- Attack on the memory wall
  - 3-level memory model: main storage; tightly coupled memory (TCM) and HTM cache; register file
  - Streaming DMA architecture
  - RISC processor
  - RTOS support for the real-time world

27 Hardware Transactional Memory Architecture

28 TM Versioning, Conflicts, and Contention
Implementing an atomic and isolated transactional region requires:
- Versioning: eager or lazy
- Conflict detection: optimistic or pessimistic
- Contention management
To make a transactional code region appear atomic, all its modifications must be stored and kept isolated from other transactions until commit time. To ensure serializability between transactions, conflicts must be detected and resolved.

29 Designer-defined Interface
The TM instructions are defined in three models:
- Basic model
  - XSTART: transaction begin mark
  - XEND: transaction end mark
- Extension model
  - XSTART_OPEN: independent atomicity and isolation for nested transactions
  - XSTART_CLOSED: independent rollback and restart for nested transactions
  - XABR: abort a running transaction
  - XVLD: validate a running transaction
- User mode
  - UCLEAR: clear the read-set and write-set data
  - USTORE: store data to memory without changing the speculative cache state
  - ULOAD: load data from memory without changing the speculative cache state
Problems:
- How to make these instructions fit the instruction set architecture and the processor pipeline
- How to write application programs using these primitives
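To reason about how programs might use these primitives, the semantics can be prototyped in software before committing to an ISA encoding. The sketch below is a hypothetical Python model, not the actual instruction behavior of the TEA hardware: methods named after XSTART, XEND, XABR, ULOAD, and USTORE act on a stack of nesting levels, and XSTART_CLOSED is modeled as a nested `xstart`, so an inner abort rolls back only the inner level. Open nesting (XSTART_OPEN), XVLD, UCLEAR, and read-set tracking are not modeled.

```python
class Core:
    """Hypothetical model of XSTART/XEND/XABR/ULOAD/USTORE semantics (closed nesting only)."""

    def __init__(self, memory):
        self.memory = memory
        self.regs = {}
        self.levels = []                     # stack of (register checkpoint, write buffer)

    def xstart(self):                        # XSTART, or XSTART_CLOSED when already nested
        self.levels.append((dict(self.regs), {}))

    def load(self, addr):
        for _, buf in reversed(self.levels): # innermost speculative value wins
            if addr in buf:
                return buf[addr]
        return self.memory[addr]

    def store(self, addr, value):
        if self.levels:
            self.levels[-1][1][addr] = value # speculative: buffered in the current level
        else:
            self.memory[addr] = value

    def uload(self, addr):
        return self.memory[addr]             # bypasses speculative state

    def ustore(self, addr, value):
        self.memory[addr] = value            # bypasses speculative state

    def xend(self):
        _, buf = self.levels.pop()
        if self.levels:
            self.levels[-1][1].update(buf)   # closed nesting: merge into the parent
        else:
            self.memory.update(buf)          # outermost level: commit to memory

    def xabr(self):
        checkpoint, _ = self.levels.pop()    # discard this level's writes
        self.regs = checkpoint               # restore the register checkpoint

mem = {"a": 0, "b": 0}
core = Core(mem)
core.xstart()                  # outer transaction
core.store("a", 1)
core.xstart()                  # nested (closed) transaction
core.regs["r1"] = 99
core.store("b", 2)
core.xabr()                    # roll back the inner transaction only
inner_view = (core.load("a"), core.load("b"), core.regs.get("r1"))
before_commit = core.uload("a")
core.xend()                    # commit the outer transaction
print(inner_view, before_commit, mem)
```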

30 Emulation Platform Framework
- Architecture research relies on software simulators, which are too slow to facilitate interesting experiments.
- An alternative to simulation is to develop FPGA-based platforms for parallel computing.
- For the HTM project, we have developed the Transactional system Emulation Accelerator (TEA) platform to validate the HTM design and to support programming models and application development.
- We can also use FPGA-based technology to prototype modern CMP systems.

31 TEA Architecture
[Figure: TEA block diagram. Four user FPGAs (E, S, W, N), each with two RISC/DSP cores with I$, HTM, DMA, and TCM, connected through switches and a router to the DDR2 DRAM controller, token arbiter, I/O, and a Linux RISC core in FPGA M.]
- Each user FPGA (East, South, West, North) contains two RISC/DSP cores enhanced with an HTM and a DMA mechanism.
- FPGA M connects all the processors to the shared memory and I/O devices.
- The router interfaces with the token arbiter, the DDR controller, and the RISC32E core that runs the Linux OS/RTOS.

32 Breakdown of TEA's Bandwidth
FPGA-to-FPGA
i) LVCMOS link
- Control FPGA to user FPGA link: 100 MHz x 80 bit = 8.0 Gb/s
- User FPGA to user FPGA link: 100 MHz x 100 bit = 10.0 Gb/s
ii) GTP link
- Control FPGA to user FPGA link: 2 GTPs
- User FPGA to user FPGA link: 6 GTPs
Memory
- Capacity: 10 GB DDR2 per FPGA
- Bandwidth: 64 bit x 150 MHz = 9.6 Gb/s
I/O
- Control FPGA: 8 SFP; user FPGA: 2 SFP
- Supports both 10-Gigabit Ethernet and 10-Gigabit InfiniBand standards
- Bandwidth: 2.5 Gb/s
- In addition, one Gigabit Ethernet port per FPGA for supplementary use

33 TEA Platform Photo
[Photo: the TEA platform]
- Cache coherence and TM emulation
- On-chip interconnection network and protocol
- Verification of MPSoC

34 Contributions
- Evaluated hardware TM systems
  - The best system from an efficiency/complexity and application standpoint
- Replaced coherence and consistency with only transactions
  - Using only transactions for communication is advantageous and efficient
- Devised a hardware/software interface for TM
  - Simple primitives provide TM with flexible and needed semantics

35 Problems
- Software simulator: user-level or full-system?
- Hardware emulator?
- Is TM a panacea?
- How to attack the memory wall?

36 Related Work
- Cell processor and Roadrunner
- RAMP (Research Accelerator for Multiple Processors) project, an FPGA-based hardware emulator for computer architecture research
- Smart Memories (Stanford University): A. Firoozshahian, et al., "A memory system design framework: creating smart memories," ISCA
- Sun's Rock, a highly speculative multicore processor with an isolating hardware checkpointing feature: M. Tremblay and S. Chaudhry, "A third-generation 65nm 16-core 32-thread plus 32-scout-thread CMT SPARC processor," ISSCC
- TCC project
- LogTM: K. E. Moore, et al., "LogTM: log-based transactional memory," HPCA
- EazyHTM: S. Tomić, et al., "EazyHTM: eager-lazy hardware transactional memory," MICRO
- MetaTM: C. J. Rossbach, et al., "TxLinux and MetaTM: transactional memory and the operating system," Communications of the ACM
- FlexTM: A. Shriraman, et al., "Flexible decoupled transactional memory support," ISCA
- TM research community: TM bibliography:

37 Selected References
TM overview
- Larus & Rajwar. Transactional Memory. Morgan & Claypool Publishers, 2007, 2011.
- Larus & Kozyrakis. Transactional Memory. Communications of the ACM.
- Harris et al. Transactional Memory: An Overview. IEEE Micro.
Basics
- Herlihy & Moss. Transactional Memory: Architectural Support for Lock-Free Data Structures. ISCA.
- Hammond et al. Transactional Memory Coherence and Consistency. ISCA.
- Rajwar et al. Virtualizing Transactional Memory. ISCA.
- Moore et al. LogTM: Log-Based Transactional Memory. HPCA.
- Ceze et al. BulkSC: Bulk Enforcement of Sequential Consistency. ISCA.
- McDonald. Architectures for Transactional Memory. Dissertation, Stanford University.
- McDonald. Architectural Semantics for Practical Transactional Memory. ISCA.
- Moravan. Supporting Nested Transactional Memory in LogTM. ASPLOS.
- Wee et al. A Practical FPGA-based Framework for Novel CMP Research. FPGA.
- Njoroge et al. ATLAS: A Chip-Multiprocessor with Transactional Memory Support. DATE.
- Lupon et al. A Dynamically Adaptable Hardware Transactional Memory. Microarchitecture.
- Kozyrakis. Transactional Memory: Concepts, Implementations, & Opportunities.


Lecture 12: TM, Consistency Models. Topics: TM pathologies, sequential consistency, hw and hw/sw optimizations Lecture 12: TM, Consistency Models Topics: TM pathologies, sequential consistency, hw and hw/sw optimizations 1 Paper on TM Pathologies (ISCA 08) LL: lazy versioning, lazy conflict detection, committing

More information

Reminder from last time

Reminder from last time Concurrent systems Lecture 7: Crash recovery, lock-free programming, and transactional memory DrRobert N. M. Watson 1 Reminder from last time History graphs; good (and bad) schedules Isolation vs. strict

More information

740: Computer Architecture Memory Consistency. Prof. Onur Mutlu Carnegie Mellon University

740: Computer Architecture Memory Consistency. Prof. Onur Mutlu Carnegie Mellon University 740: Computer Architecture Memory Consistency Prof. Onur Mutlu Carnegie Mellon University Readings: Memory Consistency Required Lamport, How to Make a Multiprocessor Computer That Correctly Executes Multiprocess

More information

Conventional processor designs run out of steam Complexity (verification) Power (thermal) Physics (CMOS scaling)

Conventional processor designs run out of steam Complexity (verification) Power (thermal) Physics (CMOS scaling) A Gentler, Kinder Guide to the Multi-core Galaxy Prof. Hsien-Hsin S. Lee School of Electrical and Computer Engineering Georgia Tech Guest lecture for ECE4100/6100 for Prof. Yalamanchili Reality Check Conventional

More information

Data/Thread Level Speculation (TLS) in the Stanford Hydra Chip Multiprocessor (CMP)

Data/Thread Level Speculation (TLS) in the Stanford Hydra Chip Multiprocessor (CMP) Data/Thread Level Speculation (TLS) in the Stanford Hydra Chip Multiprocessor (CMP) Hydra ia a 4-core Chip Multiprocessor (CMP) based microarchitecture/compiler effort at Stanford that provides hardware/software

More information

Transactional Memory. Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

Transactional Memory. Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit Transactional Memory Companion slides for The by Maurice Herlihy & Nir Shavit Our Vision for the Future In this course, we covered. Best practices New and clever ideas And common-sense observations. 2

More information

Lecture: Consistency Models, TM. Topics: consistency models, TM intro (Section 5.6)

Lecture: Consistency Models, TM. Topics: consistency models, TM intro (Section 5.6) Lecture: Consistency Models, TM Topics: consistency models, TM intro (Section 5.6) 1 Coherence Vs. Consistency Recall that coherence guarantees (i) that a write will eventually be seen by other processors,

More information

Chapter 5. Multiprocessors and Thread-Level Parallelism

Chapter 5. Multiprocessors and Thread-Level Parallelism Computer Architecture A Quantitative Approach, Fifth Edition Chapter 5 Multiprocessors and Thread-Level Parallelism 1 Introduction Thread-Level parallelism Have multiple program counters Uses MIMD model

More information

Lecture: Consistency Models, TM

Lecture: Consistency Models, TM Lecture: Consistency Models, TM Topics: consistency models, TM intro (Section 5.6) No class on Monday (please watch TM videos) Wednesday: TM wrap-up, interconnection networks 1 Coherence Vs. Consistency

More information

MULTIPROCESSORS AND THREAD-LEVEL. B649 Parallel Architectures and Programming

MULTIPROCESSORS AND THREAD-LEVEL. B649 Parallel Architectures and Programming MULTIPROCESSORS AND THREAD-LEVEL PARALLELISM B649 Parallel Architectures and Programming Motivation behind Multiprocessors Limitations of ILP (as already discussed) Growing interest in servers and server-performance

More information

Data/Thread Level Speculation (TLS) in the Stanford Hydra Chip Multiprocessor (CMP)

Data/Thread Level Speculation (TLS) in the Stanford Hydra Chip Multiprocessor (CMP) Data/Thread Level Speculation (TLS) in the Stanford Hydra Chip Multiprocessor (CMP) A 4-core Chip Multiprocessor (CMP) based microarchitecture/compiler effort at Stanford that provides hardware/software

More information

Practical Near-Data Processing for In-Memory Analytics Frameworks

Practical Near-Data Processing for In-Memory Analytics Frameworks Practical Near-Data Processing for In-Memory Analytics Frameworks Mingyu Gao, Grant Ayers, Christos Kozyrakis Stanford University http://mast.stanford.edu PACT Oct 19, 2015 Motivating Trends End of Dennard

More information

Multiprocessor Cache Coherence. Chapter 5. Memory System is Coherent If... From ILP to TLP. Enforcing Cache Coherence. Multiprocessor Types

Multiprocessor Cache Coherence. Chapter 5. Memory System is Coherent If... From ILP to TLP. Enforcing Cache Coherence. Multiprocessor Types Chapter 5 Multiprocessor Cache Coherence Thread-Level Parallelism 1: read 2: read 3: write??? 1 4 From ILP to TLP Memory System is Coherent If... ILP became inefficient in terms of Power consumption Silicon

More information

MULTIPROCESSORS AND THREAD-LEVEL PARALLELISM. B649 Parallel Architectures and Programming

MULTIPROCESSORS AND THREAD-LEVEL PARALLELISM. B649 Parallel Architectures and Programming MULTIPROCESSORS AND THREAD-LEVEL PARALLELISM B649 Parallel Architectures and Programming Motivation behind Multiprocessors Limitations of ILP (as already discussed) Growing interest in servers and server-performance

More information

Lecture 21: Transactional Memory. Topics: consistency model recap, introduction to transactional memory

Lecture 21: Transactional Memory. Topics: consistency model recap, introduction to transactional memory Lecture 21: Transactional Memory Topics: consistency model recap, introduction to transactional memory 1 Example Programs Initially, A = B = 0 P1 P2 A = 1 B = 1 if (B == 0) if (A == 0) critical section

More information

FlexTM. Flexible Decoupled Transactional Memory Support. Arrvindh Shriraman Sandhya Dwarkadas Michael L. Scott Department of Computer Science

FlexTM. Flexible Decoupled Transactional Memory Support. Arrvindh Shriraman Sandhya Dwarkadas Michael L. Scott Department of Computer Science FlexTM Flexible Decoupled Transactional Memory Support Arrvindh Shriraman Sandhya Dwarkadas Michael L. Scott Department of Computer Science 1 Transactions: Our Goal Lazy Txs (i.e., optimistic conflict

More information

Conflict Detection and Validation Strategies for Software Transactional Memory

Conflict Detection and Validation Strategies for Software Transactional Memory Conflict Detection and Validation Strategies for Software Transactional Memory Michael F. Spear, Virendra J. Marathe, William N. Scherer III, and Michael L. Scott University of Rochester www.cs.rochester.edu/research/synchronization/

More information

Agenda. Designing Transactional Memory Systems. Why not obstruction-free? Why lock-based?

Agenda. Designing Transactional Memory Systems. Why not obstruction-free? Why lock-based? Agenda Designing Transactional Memory Systems Part III: Lock-based STMs Pascal Felber University of Neuchatel Pascal.Felber@unine.ch Part I: Introduction Part II: Obstruction-free STMs Part III: Lock-based

More information

SOFTWARE-DEFINED MEMORY HIERARCHIES: SCALABILITY AND QOS IN THOUSAND-CORE SYSTEMS

SOFTWARE-DEFINED MEMORY HIERARCHIES: SCALABILITY AND QOS IN THOUSAND-CORE SYSTEMS SOFTWARE-DEFINED MEMORY HIERARCHIES: SCALABILITY AND QOS IN THOUSAND-CORE SYSTEMS DANIEL SANCHEZ MIT CSAIL IAP MEETING MAY 21, 2013 Research Agenda Lack of technology progress Moore s Law still alive Power

More information

CMSC Computer Architecture Lecture 12: Multi-Core. Prof. Yanjing Li University of Chicago

CMSC Computer Architecture Lecture 12: Multi-Core. Prof. Yanjing Li University of Chicago CMSC 22200 Computer Architecture Lecture 12: Multi-Core Prof. Yanjing Li University of Chicago Administrative Stuff! Lab 4 " Due: 11:49pm, Saturday " Two late days with penalty! Exam I " Grades out on

More information

Transactional Memory Coherence and Consistency

Transactional Memory Coherence and Consistency Transactional emory Coherence and Consistency all transactions, all the time Lance Hammond, Vicky Wong, ike Chen, rian D. Carlstrom, ohn D. Davis, en Hertzberg, anohar K. Prabhu, Honggo Wijaya, Christos

More information

NON-BLOCKING DATA STRUCTURES AND TRANSACTIONAL MEMORY. Tim Harris, 28 November 2014

NON-BLOCKING DATA STRUCTURES AND TRANSACTIONAL MEMORY. Tim Harris, 28 November 2014 NON-BLOCKING DATA STRUCTURES AND TRANSACTIONAL MEMORY Tim Harris, 28 November 2014 Lecture 8 Problems with locks Atomic blocks and composition Hardware transactional memory Software transactional memory

More information

Atomic Transac1ons. Atomic Transactions. Q1: What if network fails before deposit? Q2: What if sequence is interrupted by another sequence?

Atomic Transac1ons. Atomic Transactions. Q1: What if network fails before deposit? Q2: What if sequence is interrupted by another sequence? CPSC-4/6: Operang Systems Atomic Transactions The Transaction Model / Primitives Serializability Implementation Serialization Graphs 2-Phase Locking Optimistic Concurrency Control Transactional Memory

More information

Comparing Memory Systems for Chip Multiprocessors

Comparing Memory Systems for Chip Multiprocessors Comparing Memory Systems for Chip Multiprocessors Jacob Leverich Hideho Arakida, Alex Solomatnikov, Amin Firoozshahian, Mark Horowitz, Christos Kozyrakis Computer Systems Laboratory Stanford University

More information

Multiprocessors & Thread Level Parallelism

Multiprocessors & Thread Level Parallelism Multiprocessors & Thread Level Parallelism COE 403 Computer Architecture Prof. Muhamed Mudawar Computer Engineering Department King Fahd University of Petroleum and Minerals Presentation Outline Introduction

More information

Concurrent Preliminaries

Concurrent Preliminaries Concurrent Preliminaries Sagi Katorza Tel Aviv University 09/12/2014 1 Outline Hardware infrastructure Hardware primitives Mutual exclusion Work sharing and termination detection Concurrent data structures

More information

Hardware Transactional Memory. Daniel Schwartz-Narbonne

Hardware Transactional Memory. Daniel Schwartz-Narbonne Hardware Transactional Memory Daniel Schwartz-Narbonne Hardware Transactional Memories Hybrid Transactional Memories Case Study: Sun Rock Clever ways to use TM Recap: Parallel Programming 1. Find independent

More information

Transactional Memory. review articles. Is TM the answer for improving parallel programming?

Transactional Memory. review articles. Is TM the answer for improving parallel programming? Is TM the answer for improving parallel programming? by james larus and christos Kozyrakis doi: 10.1145/1364782.1364800 Transactional Memory As computers evolve, programming changes as well. The past few

More information

SOFTWARE TRANSACTIONAL MEMORY FOR MULTICORE EMBEDDED SYSTEMS

SOFTWARE TRANSACTIONAL MEMORY FOR MULTICORE EMBEDDED SYSTEMS SOFTWARE TRANSACTIONAL MEMORY FOR MULTICORE EMBEDDED SYSTEMS A Thesis Presented by Jennifer Mankin to The Department of Electrical and Computer Engineering in partial fulfillment of the requirements for

More information

Overview of Transaction Management

Overview of Transaction Management Overview of Transaction Management Chapter 16 Comp 521 Files and Databases Fall 2010 1 Database Transactions A transaction is the DBMS s abstract view of a user program: a sequence of database commands;

More information

ARCHITECTURES FOR TRANSACTIONAL MEMORY

ARCHITECTURES FOR TRANSACTIONAL MEMORY ARCHITECTURES FOR TRANSACTIONAL MEMORY A DISSERTATION SUBMITTED TO THE DEPARTMENT OF COMPUTER SCIENCE AND THE COMMITTEE ON GRADUATE STUDIES OF STANFORD UNIVERSITY IN PARTIAL FULFILLMENT OF THE REQUIREMENTS

More information

High Performance Computing on GPUs using NVIDIA CUDA

High Performance Computing on GPUs using NVIDIA CUDA High Performance Computing on GPUs using NVIDIA CUDA Slides include some material from GPGPU tutorial at SIGGRAPH2007: http://www.gpgpu.org/s2007 1 Outline Motivation Stream programming Simplified HW and

More information

Chris Rossbach, Owen Hofmann, Don Porter, Hany Ramadan, Aditya Bhandari, Emmett Witchel University of Texas at Austin

Chris Rossbach, Owen Hofmann, Don Porter, Hany Ramadan, Aditya Bhandari, Emmett Witchel University of Texas at Austin Chris Rossbach, Owen Hofmann, Don Porter, Hany Ramadan, Aditya Bhandari, Emmett Witchel University of Texas at Austin Hardware Transactional Memory is a reality Sun Rock supports HTM Solaris 10 takes advantage

More information

Commit Algorithms for Scalable Hardware Transactional Memory. Abstract

Commit Algorithms for Scalable Hardware Transactional Memory. Abstract Commit Algorithms for Scalable Hardware Transactional Memory Seth H. Pugsley, Rajeev Balasubramonian UUCS-07-016 School of Computing University of Utah Salt Lake City, UT 84112 USA August 9, 2007 Abstract

More information

Handout 3 Multiprocessor and thread level parallelism

Handout 3 Multiprocessor and thread level parallelism Handout 3 Multiprocessor and thread level parallelism Outline Review MP Motivation SISD v SIMD (SIMT) v MIMD Centralized vs Distributed Memory MESI and Directory Cache Coherency Synchronization and Relaxed

More information

SYSTEM CHALLENGES AND OPPORTUNITIES FOR TRANSACTIONAL MEMORY

SYSTEM CHALLENGES AND OPPORTUNITIES FOR TRANSACTIONAL MEMORY SYSTEM CHALLENGES AND OPPORTUNITIES FOR TRANSACTIONAL MEMORY A DISSERTATION SUBMITTED TO THE DEPARTMENT OF ELECTRICAL ENGINEERING AND THE COMMITTEE ON GRADUATE STUDIES OF STANFORD UNIVERSITY IN PARTIAL

More information

Multiprocessors and Locking

Multiprocessors and Locking Types of Multiprocessors (MPs) Uniform memory-access (UMA) MP Access to all memory occurs at the same speed for all processors. Multiprocessors and Locking COMP9242 2008/S2 Week 12 Part 1 Non-uniform memory-access

More information

Introduction to Parallel Computing

Introduction to Parallel Computing Portland State University ECE 588/688 Introduction to Parallel Computing Reference: Lawrence Livermore National Lab Tutorial https://computing.llnl.gov/tutorials/parallel_comp/ Copyright by Alaa Alameldeen

More information

System Challenges and Opportunities for Transactional Memory

System Challenges and Opportunities for Transactional Memory System Challenges and Opportunities for Transactional Memory JaeWoong Chung Computer System Lab Stanford University My thesis is about Computer system design that help leveraging hardware parallelism Transactional

More information

Multiprocessor Synchronization

Multiprocessor Synchronization Multiprocessor Systems Memory Consistency In addition, read Doeppner, 5.1 and 5.2 (Much material in this section has been freely borrowed from Gernot Heiser at UNSW and from Kevin Elphinstone) MP Memory

More information

McRT-STM: A High Performance Software Transactional Memory System for a Multi- Core Runtime

McRT-STM: A High Performance Software Transactional Memory System for a Multi- Core Runtime McRT-STM: A High Performance Software Transactional Memory System for a Multi- Core Runtime B. Saha, A-R. Adl- Tabatabai, R. Hudson, C.C. Minh, B. Hertzberg PPoPP 2006 Introductory TM Sales Pitch Two legs

More information

ZSIM: FAST AND ACCURATE MICROARCHITECTURAL SIMULATION OF THOUSAND-CORE SYSTEMS

ZSIM: FAST AND ACCURATE MICROARCHITECTURAL SIMULATION OF THOUSAND-CORE SYSTEMS ZSIM: FAST AND ACCURATE MICROARCHITECTURAL SIMULATION OF THOUSAND-CORE SYSTEMS DANIEL SANCHEZ MIT CHRISTOS KOZYRAKIS STANFORD ISCA-40 JUNE 27, 2013 Introduction 2 Current detailed simulators are slow (~200

More information

Lock vs. Lock-free Memory Project proposal

Lock vs. Lock-free Memory Project proposal Lock vs. Lock-free Memory Project proposal Fahad Alduraibi Aws Ahmad Eman Elrifaei Electrical and Computer Engineering Southern Illinois University 1. Introduction The CPU performance development history

More information

ZSIM: FAST AND ACCURATE MICROARCHITECTURAL SIMULATION OF THOUSAND-CORE SYSTEMS

ZSIM: FAST AND ACCURATE MICROARCHITECTURAL SIMULATION OF THOUSAND-CORE SYSTEMS ZSIM: FAST AND ACCURATE MICROARCHITECTURAL SIMULATION OF THOUSAND-CORE SYSTEMS DANIEL SANCHEZ MIT CHRISTOS KOZYRAKIS STANFORD ISCA-40 JUNE 27, 2013 Introduction 2 Current detailed simulators are slow (~200

More information

TDT Coarse-Grained Multithreading. Review on ILP. Multi-threaded execution. Contents. Fine-Grained Multithreading

TDT Coarse-Grained Multithreading. Review on ILP. Multi-threaded execution. Contents. Fine-Grained Multithreading Review on ILP TDT 4260 Chap 5 TLP & Hierarchy What is ILP? Let the compiler find the ILP Advantages? Disadvantages? Let the HW find the ILP Advantages? Disadvantages? Contents Multi-threading Chap 3.5

More information

An Effective Hybrid Transactional Memory System with Strong Isolation Guarantees

An Effective Hybrid Transactional Memory System with Strong Isolation Guarantees An Effective Hybrid Transactional Memory System with Strong Isolation Guarantees Chi Cao Minh, Martin Trautmann, JaeWoong Chung, Austen McDonald, Nathan Bronson, Jared Casper, Christos Kozyrakis, Kunle

More information

Hardware Support For Serializable Transactions: A Study of Feasibility and Performance

Hardware Support For Serializable Transactions: A Study of Feasibility and Performance Hardware Support For Serializable Transactions: A Study of Feasibility and Performance Utku Aydonat Tarek S. Abdelrahman Edward S. Rogers Sr. Department of Electrical and Computer Engineering University

More information

An Introduction to Parallel Programming

An Introduction to Parallel Programming An Introduction to Parallel Programming Ing. Andrea Marongiu (a.marongiu@unibo.it) Includes slides from Multicore Programming Primer course at Massachusetts Institute of Technology (MIT) by Prof. SamanAmarasinghe

More information

Intro to Transactions

Intro to Transactions Reading Material CompSci 516 Database Systems Lecture 14 Intro to Transactions [RG] Chapter 16.1-16.3, 16.4.1 17.1-17.4 17.5.1, 17.5.3 Instructor: Sudeepa Roy Acknowledgement: The following slides have

More information

Lock Elision and Transactional Memory Predictor in Hardware. William Galliher, Liang Zhang, Kai Zhao. University of Wisconsin Madison

Lock Elision and Transactional Memory Predictor in Hardware. William Galliher, Liang Zhang, Kai Zhao. University of Wisconsin Madison Lock Elision and Transactional Memory Predictor in Hardware William Galliher, Liang Zhang, Kai Zhao University of Wisconsin Madison Email: {galliher, lzhang432, kzhao32}@wisc.edu ABSTRACT Shared data structure

More information

Goldibear and the 3 Locks. Programming With Locks Is Tricky. More Lock Madness. And To Make It Worse. Transactional Memory: The Big Idea

Goldibear and the 3 Locks. Programming With Locks Is Tricky. More Lock Madness. And To Make It Worse. Transactional Memory: The Big Idea Programming With Locks s Tricky Multicore processors are the way of the foreseeable future thread-level parallelism anointed as parallelism model of choice Just one problem Writing lock-based multi-threaded

More information

Exploiting Distributed Software Transactional Memory

Exploiting Distributed Software Transactional Memory Exploiting Distributed Software Transactional Memory Christos Kotselidis Research Fellow Advanced Processor Technologies Group The University of Manchester Outline Transactional Memory Distributed Transactional

More information

Lecture 17: Transactional Memories I

Lecture 17: Transactional Memories I Lecture 17: Transactional Memories I Papers: A Scalable Non-Blocking Approach to Transactional Memory, HPCA 07, Stanford The Common Case Transactional Behavior of Multi-threaded Programs, HPCA 06, Stanford

More information

Transactional Memory: Architectural Support for Lock-Free Data Structures Maurice Herlihy and J. Eliot B. Moss ISCA 93

Transactional Memory: Architectural Support for Lock-Free Data Structures Maurice Herlihy and J. Eliot B. Moss ISCA 93 Transactional Memory: Architectural Support for Lock-Free Data Structures Maurice Herlihy and J. Eliot B. Moss ISCA 93 What are lock-free data structures A shared data structure is lock-free if its operations

More information