Lecture 8: Transactional Memory TCC. Topics: lazy implementation (TCC)


Other Issues
- Nesting: when one transaction calls another
  - flat nesting: collapse all nested transactions into one large transaction
  - closed nesting: the inner transaction's rd-wr set is included in the outer transaction's rd-wr set on inner commit; on an inner conflict, only the inner transaction is restarted
  - open nesting: on inner commit, its writes are committed and not merged with the outer transaction's commit set
- What if a transaction performs I/O? (buffering can help)
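A minimal sketch of how closed and open nesting differ at inner commit, assuming transactions are modeled only by their read and write address sets. The names `Txn`, `commit_closed`, and `commit_open` are hypothetical and used for illustration only.

```python
class Txn:
    """Hypothetical transaction record: only rd/wr address sets are tracked."""

    def __init__(self, parent=None):
        self.parent = parent
        self.rd = set()
        self.wr = set()


def commit_closed(inner):
    """Closed nesting: fold the inner rd-wr sets into the outer transaction."""
    inner.parent.rd |= inner.rd
    inner.parent.wr |= inner.wr


def commit_open(inner, committed):
    """Open nesting: inner writes are committed now and are NOT merged
    into the outer transaction's commit set."""
    committed |= inner.wr
```

With flat nesting there would be no separate inner `Txn` at all: every access would go directly into the outer sets, so an inner conflict restarts the whole transaction.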

Useful Rules of Thumb
- Transactions are often short: more than 95% of them will fit in cache
- Transactions often commit successfully: less than 10% are aborted
- 99.9% of transactions don't perform I/O
- Transaction nesting is not common
- Amdahl's Law again: optimize the common case!

Design Space
- Data Versioning
  - Eager: based on an undo log
  - Lazy: based on a write buffer
- Conflict Detection
  - Optimistic detection: check for conflicts at commit time (proceed optimistically through the transaction)
  - Pessimistic detection: every read/write checks for conflicts (so you can abort quickly)
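The two data-versioning choices can be sketched as follows, assuming memory is a plain Python dict. The class names `EagerTxn` and `LazyTxn` are hypothetical, and conflict detection is left out entirely.

```python
_MISSING = object()  # sentinel: the address held no value before the write


class EagerTxn:
    """Eager versioning: write in place, keep old values in an undo log."""

    def __init__(self, memory):
        self.memory = memory
        self.undo_log = []

    def write(self, addr, value):
        self.undo_log.append((addr, self.memory.get(addr, _MISSING)))
        self.memory[addr] = value

    def commit(self):
        self.undo_log.clear()  # new values are already in memory

    def abort(self):
        # Undo in reverse order so the oldest logged value wins.
        while self.undo_log:
            addr, old = self.undo_log.pop()
            if old is _MISSING:
                del self.memory[addr]
            else:
                self.memory[addr] = old


class LazyTxn:
    """Lazy versioning: buffer writes, update memory only at commit."""

    def __init__(self, memory):
        self.memory = memory
        self.write_buffer = {}

    def write(self, addr, value):
        self.write_buffer[addr] = value

    def read(self, addr):
        # A transaction sees its own buffered writes first.
        return self.write_buffer.get(addr, self.memory.get(addr))

    def commit(self):
        self.memory.update(self.write_buffer)
        self.write_buffer.clear()

    def abort(self):
        self.write_buffer.clear()  # memory was never touched
```

In this sketch the eager design makes commit trivial and abort expensive, while the lazy design does the opposite, which matches the trade-off the lazy implementation in this lecture accepts.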

Detecting Conflicts: Overview
- Writes can be cached (can't be written to memory); if the block needs to be evicted, flag an overflow (abort transaction for now); on an abort, invalidate the written cache lines
- Keep track of read-set and write-set (bits in the cache) for each transaction
- When another transaction commits, compare its write set with your own read set; a match causes an abort
- At transaction end, express intent to commit, broadcast write-set (transactions can commit in parallel if their write-sets do not intersect)
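The two checks described above reduce to set intersections. A minimal sketch, assuming read-sets and write-sets are Python sets of block addresses; the function names are hypothetical.

```python
def must_abort(committer_write_set, my_read_set):
    """A running transaction aborts if a committing transaction
    wrote any block that it has already read."""
    return not committer_write_set.isdisjoint(my_read_set)


def can_commit_in_parallel(write_set_a, write_set_b):
    """Two transactions may commit in parallel if their write-sets
    do not intersect (the condition stated on this slide)."""
    return write_set_a.isdisjoint(write_set_b)
```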

Lazy Implementation (Partially Based on TCC)
- An implementation for a small-scale multiprocessor with a snooping-based protocol
- Lazy versioning and lazy conflict detection
- Does not allow transactions to commit in parallel

Handling Reads/Writes
- When a transaction issues a read, fetch the block in read-only mode (if not already in cache) and set the rd-bit for that cache line
- When a transaction issues a write, fetch that block in read-only mode (if not already in cache), set the wr-bit for that cache line and make changes in cache
- If a line with wr-bit set is evicted, the transaction must be aborted (or must rely on some software mechanism to handle saving overflowed data) (or must acquire commit permissions)
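A minimal sketch of the rd-bit/wr-bit bookkeeping, assuming a tiny fully associative cache with FIFO replacement. The class `TxCache`, its capacity parameter, and the FIFO policy are assumptions made for illustration; the slide only specifies what happens to the bits and to evicted written lines.

```python
class TxCache:
    """Hypothetical per-core cache with rd/wr bits on every line."""

    def __init__(self, memory, capacity):
        self.memory = memory
        self.capacity = capacity
        self.lines = {}       # addr -> {"rd": bool, "wr": bool, "data": value}
        self.aborted = False

    def _fetch(self, addr):
        if addr not in self.lines:
            if len(self.lines) >= self.capacity:
                self._evict()
            # Block is fetched in read-only mode; memory is not modified.
            self.lines[addr] = {"rd": False, "wr": False,
                                "data": self.memory.get(addr, 0)}
        return self.lines[addr]

    def _evict(self):
        victim_addr = next(iter(self.lines))   # FIFO victim (an assumption)
        victim = self.lines.pop(victim_addr)
        if victim["wr"]:
            # Speculative data would be lost: flag overflow, abort for now.
            self.aborted = True

    def read(self, addr):
        line = self._fetch(addr)
        line["rd"] = True
        return line["data"]

    def write(self, addr, value):
        line = self._fetch(addr)
        line["wr"] = True
        line["data"] = value   # change is made in cache only
```

The sketch takes the simplest of the three options listed for an evicted written line (abort); a software overflow handler or early acquisition of commit permissions would replace the `self.aborted = True` step.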

Commit Process
- When a transaction reaches its end, it must now make its writes permanent
- A central arbiter is contacted (easy on a bus-based system); the winning transaction holds on to the bus until all written cache line addresses are broadcast (this is the commit) (need not do a writeback until the line is evicted or written again; must simply invalidate other readers of these lines)
- When another transaction (that has not yet begun to commit) sees an invalidation for a line in its rd-set, it realizes its lack of atomicity and aborts (clears its rd- and wr-bits and restarts)
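The commit steps above can be sketched as a toy bus model, assuming arbitration has already picked a winner. The classes `Core` and `Bus` are hypothetical. As a simplification, the sketch updates memory at commit, whereas the slide notes that a real design can defer the writeback.

```python
class Core:
    """Hypothetical core running one lazy transaction at a time."""

    def __init__(self):
        self.rd_set = set()
        self.wr_buf = {}
        self.restarts = 0

    def read(self, memory, addr):
        self.rd_set.add(addr)
        return self.wr_buf.get(addr, memory.get(addr, 0))

    def write(self, addr, value):
        self.wr_buf[addr] = value

    def snoop_invalidation(self, addrs):
        # An invalidation for a line in the rd-set breaks atomicity.
        if self.rd_set & addrs:
            self.rd_set.clear()
            self.wr_buf.clear()
            self.restarts += 1


class Bus:
    """Hypothetical snooping bus; commit() is called for the arbitration winner."""

    def __init__(self, memory, cores):
        self.memory = memory
        self.cores = cores

    def commit(self, winner):
        addrs = set(winner.wr_buf)
        for core in self.cores:            # broadcast written line addresses
            if core is not winner:
                core.snoop_invalidation(addrs)
        self.memory.update(winner.wr_buf)  # simplification: immediate writeback
        winner.rd_set.clear()
        winner.wr_buf.clear()
```

Because only one transaction holds the bus at a time, commits are serialized in this sketch, matching the restriction that this design does not allow transactions to commit in parallel.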

Summary of Properties
- Lazy versioning: changes are made locally; the master copy is updated only at the end of the transaction
- Lazy conflict detection: we are checking for conflicts only when one of the transactions reaches its end
- Aborts are quick (must just clear bits in cache, flush pipeline and reinstate a register checkpoint)
- Commit is slow (must check for conflicts, all the coherence operations for writes are deferred until transaction end)
- No fear of deadlock/livelock: the first transaction to acquire the bus will commit successfully
- Starvation is possible; need additional mechanisms

TCC Features
- All transactions all the time (the code only defines transaction boundaries): helps get rid of the baseline coherence protocol
- When committing, a transaction must acquire a central token; when I/O, syscall, or buffer overflow is encountered, the transaction acquires the token and starts commit
- Each cache line maintains a set of "renamed bits": this indicates the set of words written by this transaction; reading these words is not a violation and the read-bit is not set
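A minimal sketch of the renamed-bits rule, assuming eight words per cache line and a single read-bit per line. The class `TccLine` and the word count are assumptions made for illustration, not details taken from the TCC design.

```python
class TccLine:
    """Hypothetical cache line with per-word renamed bits."""

    WORDS = 8  # assumed line size in words

    def __init__(self):
        self.renamed = [False] * self.WORDS
        self.read_bit = False

    def write_word(self, index):
        self.renamed[index] = True

    def read_word(self, index):
        # Reading a word this transaction already wrote cannot observe
        # another transaction's data, so the read-bit stays clear.
        if not self.renamed[index]:
            self.read_bit = True
```

The effect is that a later commit touching this line only forces an abort if the transaction read a word it had not first written itself.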

TCC Features (continued)
- Lines evicted from the cache are stored in a write buffer; overflow of write buffer leads to acquiring the commit token
- Less tolerant of commit delay, but there is a high degree of coherence-level parallelism
- To hide the cost of commit delays, it is suggested that a core move on to the next transaction in the meantime; this requires double buffering to distinguish between data handled by each transaction
- An ordering can be imposed upon transactions; useful for speculative parallelization of a sequential program

Parallel Commits
- Writes cannot be rolled back; hence, before allowing two transactions to commit in parallel, we must ensure that they do not conflict with each other
- One possible implementation: the central arbiter can collect signatures from each committing transaction (a compressed representation of all touched addresses)
- Arbiter does not grant commit permissions if it detects a possible conflict with the rd-wr-sets of transactions that are in the process of committing
- The lazy design can also work with directory protocols
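A minimal sketch of signature-based arbitration, assuming a 64-bit signature with a single modulo hash over integer addresses. `SIG_BITS`, `signature`, and `Arbiter` are hypothetical names; real designs typically use several hash functions and larger signatures.

```python
SIG_BITS = 64  # assumed signature width


def signature(addrs):
    """Compress a set of integer addresses into a bit vector."""
    sig = 0
    for addr in addrs:
        sig |= 1 << (addr % SIG_BITS)
    return sig


class Arbiter:
    """Hypothetical central arbiter granting parallel commit permissions."""

    def __init__(self):
        self.in_flight = {}  # txn id -> signature of its touched addresses

    def request_commit(self, txn_id, touched_addrs):
        sig = signature(touched_addrs)
        # Any shared bit is a *possible* conflict, so permission is denied.
        if any(sig & other for other in self.in_flight.values()):
            return False
        self.in_flight[txn_id] = sig
        return True

    def finish_commit(self, txn_id):
        self.in_flight.pop(txn_id, None)
```

Signatures can report conflicts that do not exist (addresses 1 and 65 share a bit here), which only delays a commit; they never miss a real conflict, which is the property needed since writes cannot be rolled back.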
