INTRODUCTION. Hybrid Transactional Memory. Transactional Memory. Problems with Transactional Memory Problems

Similar documents
Hybrid Transactional Memory

Phased Transactional Memory

6.852: Distributed Algorithms Fall, Class 20

Hybrid Transactional Memory

Lock vs. Lock-free Memory Project proposal

PhTM: Phased Transactional Memory

Transactional Memory

ABORTING CONFLICTING TRANSACTIONS IN AN STM

Refined Transactional Lock Elision

Hardware Transactional Memory. Daniel Schwartz-Narbonne

Scheduling Transactions in Replicated Distributed Transactional Memory

Cost of Concurrency in Hybrid Transactional Memory. Trevor Brown (University of Toronto) Srivatsan Ravi (Purdue University)

Transactional Memory: Architectural Support for Lock-Free Data Structures Maurice Herlihy and J. Eliot B. Moss ISCA 93

Hybrid NOrec: A Case Study in the Effectiveness of Best Effort Hardware Transactional Memory

Early Results Using Hardware Transactional Memory for High-Performance Computing Applications

Transactional Memory. Yaohua Li and Siming Chen. Yaohua Li and Siming Chen Transactional Memory 1 / 41

A Qualitative Survey of Modern Software Transactional Memory Systems

Mutex Locking versus Hardware Transactional Memory: An Experimental Evaluation

JOURNALING FILE SYSTEMS. CS124 Operating Systems Winter , Lecture 26

unreadtvar: Extending Haskell Software Transactional Memory for Performance

Portland State University ECE 588/688. Transactional Memory

Architectural Support for Software Transactional Memory

Data/Thread Level Speculation (TLS) in the Stanford Hydra Chip Multiprocessor (CMP)

6 Transactional Memory. Robert Mullins

Cache Coherence and Atomic Operations in Hardware

Using Hardware Memory Protection to Build a High-Performance, Strongly-Atomic Hybrid Transactional. memory

SOFTWARE TRANSACTIONAL MEMORY FOR MULTICORE EMBEDDED SYSTEMS

TRANSACTION MEMORY. Presented by Hussain Sattuwala Ramya Somuri

Unbounded Transactional Memory

A Survey Paper on Transactional Memory

MULTIPROCESSORS AND THREAD LEVEL PARALLELISM

Last Class Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications

Atomic Transac1ons. Atomic Transactions. Q1: What if network fails before deposit? Q2: What if sequence is interrupted by another sequence?

Data/Thread Level Speculation (TLS) in the Stanford Hydra Chip Multiprocessor (CMP)

Log-Based Transactional Memory

Transactional Memory. Concurrency unlocked Programming. Bingsheng Wang TM Operating Systems

COMP3151/9151 Foundations of Concurrency Lecture 8

Reduced Hardware Lock Elision

) Intel)(TX)memory):) Transac'onal) Synchroniza'on) Extensions)(TSX))) Transac'ons)

Hybrid STM/HTM for nested transactions in Java

Chí Cao Minh 28 May 2008

LogTM: Log-Based Transactional Memory

Multi-Core Computing with Transactional Memory. Johannes Schneider and Prof. Roger Wattenhofer

Summary: Issues / Open Questions:

Towards a Software Transactional Memory for Graphics Processors

Enhancing Concurrency in Distributed Transactional Memory through Commutativity

) Intel)(TX)memory):) Transac'onal) Synchroniza'on) Extensions)(TSX))) Transac'ons)

Early Experience with a Commercial Hardware Transactional Memory Implementation

Consistency & TM. Consistency

Page 1. Consistency. Consistency & TM. Consider. Enter Consistency Models. For DSM systems. Sequential consistency

AST: scalable synchronization Supervisors guide 2002

Managing Resource Limitation of Best-Effort HTM

Data/Thread Level Speculation (TLS) in the Stanford Hydra Chip Multiprocessor (CMP)

Transactional Memory. How to do multiple things at once. Benjamin Engel Transactional Memory 1 / 28

Summary: Open Questions:

Mobile and Heterogeneous databases Distributed Database System Transaction Management. A.R. Hurson Computer Science Missouri Science & Technology

NUMA-Aware Reader-Writer Locks PPoPP 2013

Lecture 25: Synchronization. James C. Hoe Department of ECE Carnegie Mellon University

Lecture 12 Transactional Memory

Conflict Detection and Validation Strategies for Software Transactional Memory

Concurrency control CS 417. Distributed Systems CS 417

Atomic Transactions in Cilk

Reduced Hardware NOrec: A Safe and Scalable Hybrid Transactional Memory

FILE SYSTEMS, PART 2. CS124 Operating Systems Fall , Lecture 24

Fall 2012 Parallel Computer Architecture Lecture 16: Speculation II. Prof. Onur Mutlu Carnegie Mellon University 10/12/2012

Speculative Synchronization

) Intel)(TX)memory):) Transac'onal) Synchroniza'on) Extensions)(TSX))) Transac'ons)

A Simple Optimistic skip-list Algorithm

Reduced Hardware Transactions: A New Approach to Hybrid Transactional Memory

Dependence-Aware Transactional Memory for Increased Concurrency. Hany E. Ramadan, Christopher J. Rossbach, Emmett Witchel University of Texas, Austin

Commit Algorithms for Scalable Hardware Transactional Memory. Abstract

Distributed Databases

Tackling Concurrency With STM. Mark Volkmann 10/22/09

Tackling Concurrency With STM

Using Software Transactional Memory In Interrupt-Driven Systems

Optimizing Hybrid Transactional Memory: The Importance of Nonspeculative Operations

Relaxing Concurrency Control in Transactional Memory. Utku Aydonat

Lecture 21: Logging Schemes /645 Database Systems (Fall 2017) Carnegie Mellon University Prof. Andy Pavlo

Scalable Software Transactional Memory for Chapel High-Productivity Language

Cost of Concurrency in Hybrid Transactional Memory

arxiv: v1 [cs.dc] 22 May 2014

arxiv: v1 [cs.dc] 13 Aug 2013

ByteSTM: Java Software Transactional Memory at the Virtual Machine Level

Mohamed M. Saad & Binoy Ravindran

CMSC 411 Computer Systems Architecture Lecture 13 Instruction Level Parallelism 6 (Limits to ILP & Threading)

Tim Harris Microsoft Research Aleksandar Dragojević I&C, EPFL

SNZI: Scalable NonZero Indicators

Enhancing efficiency of Hybrid Transactional Memory via Dynamic Data Partitioning Schemes

McRT-STM: A High Performance Software Transactional Memory System for a Multi- Core Runtime

Nikos Anastopoulos, Konstantinos Nikas, Georgios Goumas and Nectarios Koziris

Overview. Introduction to Transaction Management ACID. Transactions

Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications. Last Class. Today s Class. Faloutsos/Pavlo CMU /615

Invyswell: A HyTM for Haswell RTM. Irina Calciu, Justin Gottschlich, Tatiana Shpeisman, Gilles Pokam, Maurice Herlihy

Course Outline. Processes CPU Scheduling Synchronization & Deadlock Memory Management File Systems & I/O Distributed Systems

Distributed KIDS Labs 1

On Improving Transactional Memory: Optimistic Transactional Boosting, Remote Execution, and Hybrid Transactions

10 Supporting Time-Based QoS Requirements in Software Transactional Memory

CS510 Advanced Topics in Concurrency. Jonathan Walpole

Other consistency models

Integrating Transactional Memory into C++

Transcription:

Hybrid Transactional Memory Peter Damron Sun Microsystems peter.damron@sun.com Alexandra Fedorova Harvard University and Sun Microsystems Laboratories fedorova@eecs.harvard.edu Yossi Lev Brown University and Sun Microsystems Laboratories levyossi@cs.brown.edu Victor Luchangco Mark Moir Daniel Nussbaum Sun Microsystems Laboratories {victor.luchangco,mark.moir,daniel.nussbaum}@sun.com INTRODUCTION Transactional Memory Transactional memory promises to substantially reduce difficulty in writing correct, efficient, and scalable concurrent programs Problems with Transactional Memory Problems Bounded* and Best-effort* HTM proposals impose unreasonable constraints on programmers More flexible STM implementations are too slow Too much overhead

What is Hybrid Transactional Memory? Hybrid Transactional Memory (HyTM) An approach to implementing transactions in software that uses best-effort HTM to boost performance but doesn t depend on HTM Goal: Programmers can design & test transactional programs in existing systems today, and still enjoy benefits of HTM support when it becomes available in the future Inspire processor designers to consider TM Hybrid Transactional Memory Hybrid Transactional Memory (HyTM) System Library Compiler 1. Transaction is attempted using best-effort HTM 2. Retried using software if it fails Background & Other Relevant Definitions

Review Transactional memory Supports code sections that are executed atomically Appear to be executed one at a time with no interleaving between their steps Allows programmers to express what SHOULD be executed atomically not HOW it should be done Reduces difficulty of writing correct concurrent programs Can significantly improve performance/scalability making them: Easier to write, understand and maintain Transactional Memory Review Transactional Memory Allows sections of code to be designated as transactional Transaction commits: Appears to have executed atomically Transaction aborts: Has no effect on shared state Transactional section is attempted/retried until transaction commits More on Bounded HTM Significant progress towards practical/sufficient STM Hardware support for TM is desirable/beneficial Herlihy & Moss Hardware Transactional Memory Bounded-size atomic transactions that are short enough to be completed without context switching Supported by simple additions to cache mechanisms of current processors Programmers must be aware of specific limitations of HTM Bounded & Best-effort HTM Herlihy & Moss (1993) Fixed-size, fully-associative transactional cache Exploits existing cache coherence protocols to enforce atomicity (up to the size of the transactional cache) Larger transactions fail Transactions that are interrupted by context switch fail

Bounded & Best-effort HTM Drawbacks of Bounded & Best-effort HTM Transactions can only succeed if it fits in cache Ability to commit depends on size and layout w.r.t. cache geometry Doesn t guarantee to handle every transaction Introduction Unbounded HTM Overcoming disadvantages of bounded HTM using unbounded HTM Allow transactions to commit even if they exceed on-chip resources or too long duration Too complex and risky to be used in near future Unbounded STM UTM Ananian et al. Supports transactions that can survive context switches and whose size is only limited by virtual memory Requires additional hardware support that seems too complicated for near future commercial processors Related Work Moore et al. Thread-Level TM (TTM) LogTM Stores tentative new values in place Maintain logs to undo changes made by transaction (abort) Transactions can t survive context switches

Related Work Hybrid Transactional Memory Kumar et al. Use HTM to optimize DSTM of Herlihy Specific HTM design Key Differences: Aims to optimize object-based DSTM HyTM uses low-level, word-based TM Requires new hardware support HyTM doesn t No Support for preserving transactions across context switches Introduction Hybrid Transactional Memory HyTM New approach to TM that works in existing systems but can boost performance and scalability using future hardware support 1. Exploits HTM support if available - For transactions that don t exceed HTM limitations 2. Executes other transactions in software (STM) Any transaction can be executed in software HTM doesn t need to be able to Build best-effort HTM Avoid risk/complexity of unbounded Introduction - HyTM Built prototype compiler & STM library Compiler Produces code for executing transactions in HTM/STM Key design challenge Ensure that HTM conflicts & STM conflicts are detected/resolved appropriately Augment HTM transactions with code to look up structures maintained by STM Hybrid Transactional Memory HyTM Prototype The Nitty Gritty

HyTM Design Overview Compiler & Library Compiler produces 2 code paths for each transaction One attempts to use HTM The other uses STM by invoking calls to the library HYTM_SECTION_BEGIN - where hybrid transactional code section begins HYTM_SECTION_END - where it ends Compiler translates code between to execute transactionally HTM Interface Transaction starts with txn_begin and ends with txn_end txn_begin specifies an address to branch to in case transaction aborts If transaction executes txn_end without aborting, continues past txn_end Appeared to have executed atomically Otherwise branches off to the address specified Transaction aborts with txn_abort HTM Interface First attempt each transaction in HTM Expect most transactions to be able to complete in hardware These are faster than software transactions If HTM fails, use library Call method in HyTM library to decide whether to retry with HTM or STM Method also implements contention control policies Transaction that fails repeatedly should be attempted in STM All transparent to the programmer Augmenting Hardware Transactions Augment hardware transactions to insure they interact correctly with transactions executed in software library Key observation: Location s logical value only differs from its physical value if a software transaction has modified it If no such transaction exists, can apply transaction to location using HTM HyTM augments HTM transactions to detect these conflicts with STM at orecs Looks up orec of location accessed to detect conflict with STM HTM aborts if orec changes before it commits

Augmenting Hardware Transactions Straightforward HTM Augmented HTM (produced by compiler) txn_begin handler-addr tmp= X; Y = tmp + 5; txn_end txn_begin handler-addr if (!canhardwareread(&x)) txn_abort; tmp= X; if(!canhardwarewrite(&y)) txn_abort; Y = tmp + 5; txn_end; Augmenting Hardware Transactions HyTM Software Library Bool canhardwareread(a) { } return (OREC_TABLE[h(a)].o_mode!= WRITE); Check orec to see if the mode is not set to WRITE Bool canhardwarewrite(a) { } return (OREC_TABLE[h(a)].o_mode == UNOWNED); Check orec to see if the mode is set to UNOWNED HyTM Data Structures Two key data structures: Transaction Descriptor Ownership Record (orec) HyTM Data Structures Transaction Descriptor Prototype maintains only one for each thread Ownership record Maintains table of orecs Each location in memory maps to an orec in table Multiple locations map to same orec Uses hashing to do this

How are Software Transactions Implemented? Transaction begins with empty read and write sets Status is set to ACTIVE Executes the user code (transactional section) Make calls to STM library for each memory access Writing in Software Transactions Writing Before writing a location: transaction acquires exclusive ownership (WRITE mode) of orec for that location Stores descriptor identifier (tdid) & version # in the orec Subsequent writes (by same transaction) finds entry in orec & overwrites value Creates an entry in write set to record new value Example follows...

Reading in Software Transactions Reading Before reading a location: Transaction acquires ownership of orec (READ mode) for that location Set mode field to READ and rdcnt to 1 If already owned in READ mode: Increment rdcnt field Records index of orec and snapshot of orec s contents into write set After every read: Transaction validates entire read set Ensure values read is consistent with values previously read Validation in Software Transactions Validation Determines that none of the locations its read have changed Iterating over read set Compare each orec owned in READ mode with snapshot Nothing should change except possibly rdcnt field Fast Read Validation Avoids iterating over a transaction s read set for validation Maintain a counter for # of times orec owned in READ mode is stolen by a transaction in WRITE mode If counter hasn t changed since last validation All snapshots are still valid If counter changes since last validation Must iterate over entire read set

Committing Software Transactions Completing a transaction: Transaction attempts to commit Validates read set If succeeds, atomically change status from ACTIVE to COMMITTED Copies values in its write set back to appropriate memory locations Releases ownership of those locations Maintains exclusive ownership of locations until values are copied Prevents others from viewing out-of-date values Implementing Software Transactions Read after write & Write after read Read after write T 0 has write ownership of orec T 0 needs to read from orec T 0 searches write set to see if value was already stored in location If not, get value from memory (virtual) Logical value would only change by another transaction Write after read T 0 has read ownership of orec T 0 needs to write to location that maps to orec Use snapshot to upgrade to WRITE mode Has to insure another transaction doesn t get WRITE ownership in between Entry in read set is discarded no longer in READ mode

Resolving Conflicts (Software Transactions) T 0 requires ownership of location T 1 already owns in WRITE mode Case 1: T 1 s status is ABORTED T 1 won t be able to commit T 0 can steal ownership from T 1 Case 2: T 1 s status is ACTIVE T 0 can abort T 1 (change its status fro ACTIVE to ABORTED) T 0 can wait and give T 1 a chance to complete Decision is left to the contention manager Examples follow... Implementing Software Transactions Resolving Conflicts cont d... Case 3: T 1 s status is COMMITTED Not safe to steal because T 1 may not have copied values T 0 waits for T 1 to release ownership of the location T 0 requires ownership of location (WRITE) T 1 already owns in READ mode T 1 can steal by acquiring in WRITE mode This may cause other read validations to fail T 1 can wait and allow reads to complete T 1 asks contention manager

Contention Management HyTM design allows interface for separable contention managers Library uses interface to inform contention manager of various events Asks advice for decisions in situations discussed earlier HyTM uses the following: Polka contention manager variant of Greedy manager Managers are not optimized Possible improvement of performance Conclusions Hybrid Transactional Memory allows us to use transactional programs today, while taking advantage of hardware support of the future Software-only mode is much more scalable than using locking techniques HyTM needs BETTER contention management Processor designers should consider supporting Transactional programming That s all Folks!