Lecture 17: Transactional Memories I

Similar documents
Lecture 10: TM Implementations. Topics: wrap-up of eager implementation (LogTM), scalable lazy implementation

Lecture 8: Transactional Memory TCC. Topics: lazy implementation (TCC)

Lecture 4: Directory Protocols and TM. Topics: corner cases in directory protocols, lazy TM

Lecture 6: Lazy Transactional Memory. Topics: TM semantics and implementation details of lazy TM

Lecture: Consistency Models, TM. Topics: consistency models, TM intro (Section 5.6)

Lecture 21: Transactional Memory. Topics: Hardware TM basics, different implementations

Lecture: Transactional Memory. Topics: TM implementations

Lecture 18: Coherence Protocols. Topics: coherence protocols for symmetric and distributed shared-memory multiprocessors (Sections

Lecture: Consistency Models, TM

Lecture: Transactional Memory, Networks. Topics: TM implementations, on-chip networks

Lecture 25: Multiprocessors. Today s topics: Snooping-based cache coherence protocol Directory-based cache coherence protocol Synchronization

Lecture 12: TM, Consistency Models. Topics: TM pathologies, sequential consistency, hw and hw/sw optimizations

Lecture 7: Implementing Cache Coherence. Topics: implementation details

Commit Algorithms for Scalable Hardware Transactional Memory. Abstract

Lecture 25: Multiprocessors

Lecture 21: Transactional Memory. Topics: consistency model recap, introduction to transactional memory

Lecture 8: Directory-Based Cache Coherence. Topics: scalable multiprocessor organizations, directory protocol design issues

Lecture 18: Coherence and Synchronization. Topics: directory-based coherence protocols, synchronization primitives (Sections

Lecture 24: Virtual Memory, Multiprocessors

Module 9: Addendum to Module 6: Shared Memory Multiprocessors Lecture 17: Multiprocessor Organizations and Cache Coherence. The Lecture Contains:

Lecture 8: Snooping and Directory Protocols. Topics: split-transaction implementation details, directory implementations (memory- and cache-based)

Lecture 5: Directory Protocols. Topics: directory-based cache coherence implementations

Lecture 13: Consistency Models. Topics: sequential consistency, requirements to implement sequential consistency, relaxed consistency models

Lecture 12: Hardware/Software Trade-Offs. Topics: COMA, Software Virtual Memory

Lecture 6: TM Eager Implementations. Topics: Eager conflict detection (LogTM), TM pathologies

CSE502: Computer Architecture CSE 502: Computer Architecture

Lecture 3: Directory Protocol Implementations. Topics: coherence vs. msg-passing, corner cases in directory protocols

Lecture 8: Eager Transactional Memory. Topics: implementation details of eager TM, various TM pathologies

Module 9: "Introduction to Shared Memory Multiprocessors" Lecture 16: "Multiprocessor Organizations and Cache Coherence" Shared Memory Multiprocessors

EazyHTM: Eager-Lazy Hardware Transactional Memory

Incoherent each cache copy behaves as an individual copy, instead of as the same memory location.

Scalable and Reliable Communication for Hardware Transactional Memory

BulkCommit: Scalable and Fast Commit of Atomic Blocks in a Lazy Multiprocessor Environment

Cache Coherence. (Architectural Supports for Efficient Shared Memory) Mainak Chaudhuri

Cache Coherence. Silvina Hanono Wachman Computer Science & Artificial Intelligence Lab M.I.T.

Foundations of Computer Systems

Module 5: Performance Issues in Shared Memory and Introduction to Coherence Lecture 10: Introduction to Coherence. The Lecture Contains:

Potential violations of Serializability: Example 1

COEN-4730 Computer Architecture Lecture 08 Thread Level Parallelism and Coherence

Goldibear and the 3 Locks. Programming With Locks Is Tricky. More Lock Madness. And To Make It Worse. Transactional Memory: The Big Idea

Transactional Memory. Prof. Hsien-Hsin S. Lee School of Electrical and Computer Engineering Georgia Tech

PARALLEL MEMORY ARCHITECTURE

Computer Architecture. A Quantitative Approach, Fifth Edition. Chapter 5. Multiprocessors and Thread-Level Parallelism

Lecture 7: Transactional Memory Intro. Topics: introduction to transactional memory, lazy implementation

Chapter 5. Multiprocessors and Thread-Level Parallelism

Transactional Memory

A Scalable, Non-blocking Approach to Transactional Memory

Scalable Cache Coherence

Memory Hierarchy in a Multiprocessor

Multiprocessor Cache Coherency. What is Cache Coherence?

Lecture 26: Multiprocessors. Today s topics: Directory-based coherence Synchronization Consistency Shared memory vs message-passing

BulkCommit: Scalable and Fast Commit of Atomic Blocks in a Lazy Multiprocessor Environment

Lecture 1: Introduction

Lecture: Coherence, Synchronization. Topics: directory-based coherence, synchronization primitives (Sections )

Overview: Memory Consistency

Cache Coherence. Silvina Hanono Wachman Computer Science & Artificial Intelligence Lab M.I.T.

Lecture 3: Snooping Protocols. Topics: snooping-based cache coherence implementations

Lecture 16: Checkpointed Processors. Department of Electrical Engineering Stanford University

Lecture 7: PCM Wrap-Up, Cache coherence. Topics: handling PCM errors and writes, cache coherence intro

Multiprocessor Cache Coherence. Chapter 5. Memory System is Coherent If... From ILP to TLP. Enforcing Cache Coherence. Multiprocessor Types

Multiprocessors and Thread-Level Parallelism. Department of Electrical & Electronics Engineering, Amrita School of Engineering

Lecture: Coherence Protocols. Topics: wrap-up of memory systems, multi-thread programming models, snooping-based protocols

CMSC 411 Computer Systems Architecture Lecture 21 Multiprocessors 3

Introduction to Multiprocessors (Part II) Cristina Silvano Politecnico di Milano

Handout 3 Multiprocessor and thread level parallelism

The Problems with OOO

Lecture 7: PCM, Cache coherence. Topics: handling PCM errors and writes, cache coherence intro

Lecture 1: Parallel Architecture Intro

EECS 570 Final Exam - SOLUTIONS Winter 2015

Log-Based Transactional Memory

Multiprocessors & Thread Level Parallelism

Scalable Multiprocessors

Shared Symmetric Memory Systems

Lecture 13: Cache Hierarchies. Today: cache access basics and innovations (Sections )

Portland State University ECE 588/688. Transactional Memory

Midterm Exam 02/09/2009

Lecture: Coherence and Synchronization. Topics: synchronization primitives, consistency models intro (Sections )

Interconnect Routing

SIGNET: NETWORK-ON-CHIP FILTERING FOR COARSE VECTOR DIRECTORIES. Natalie Enright Jerger University of Toronto

CMSC Computer Architecture Lecture 15: Memory Consistency and Synchronization. Prof. Yanjing Li University of Chicago

Computer Architecture

Lecture 17. NUMA Architecture and Programming

Thread-level Parallelism for the Masses. Kunle Olukotun Computer Systems Lab Stanford University 2007

Three Tier Proximity Aware Cache Hierarchy for Multi-core Processors

CSC/ECE 506: Architecture of Parallel Computers Sample Final Examination with Answers

A Memory System Design Framework: Creating Smart Memories

Review on ichat: Inter Cache Hardware Assistant Data Transfer for Heterogeneous Chip Multiprocessors. By: Anvesh Polepalli Raj Muchhala

FlexTM. Flexible Decoupled Transactional Memory Support. Arrvindh Shriraman Sandhya Dwarkadas Michael L. Scott Department of Computer Science

COMP Parallel Computing. CC-NUMA (1) CC-NUMA implementation

Scalable Cache Coherent Systems

Lecture 11: Large Cache Design

Suggested Readings! What makes a memory system coherent?! Lecture 27" Cache Coherency! ! Readings! ! Program order!! Sequential writes!! Causality!

Lecture: Memory Technology Innovations

Computer Systems Architecture

Today s Outline: Shared Memory Review. Shared Memory & Concurrency. Concurrency v. Parallelism. Thread-Level Parallelism. CS758: Multicore Programming

TSO-CC: Consistency-directed Coherence for TSO. Vijay Nagarajan

Computer Systems Architecture

Lecture 11: Snooping Cache Coherence: Part II. CMU : Parallel Computer Architecture and Programming (Spring 2012)

Lecture: Memory, Coherence Protocols. Topics: wrap-up of memory systems, multi-thread programming models, snooping-based protocols

5 Chip Multiprocessors (II) Chip Multiprocessors (ACS MPhil) Robert Mullins

Transcription:

Lecture 17: Transactional Memories I Papers: A Scalable Non-Blocking Approach to Transactional Memory, HPCA 07, Stanford The Common Case Transactional Behavior of Multi-threaded Programs, HPCA 06, Stanford Characterization of TCC on Chip Multiprocessors, PACT 05, Stanford 1

TM Overview Recall the basic TM implementation: Every transaction maintains read-set and write-set Writes are not propagated until commit time At commit, acquire permission to commit from a central arbiter (no parallel commit for now) Send write-set to other nodes if the write-set intersects with the node s read-set, the node s transaction is aborted and re-started 2

Implementation Issues Design details: On attempting a write (the Tx is not yet ready to commit), must obtain the most recently committed version of the block in read-only state (the block is not yet part of the read-set, though) At the time of commit, either write-thru to memory (there is no M state in the coherence protocol), or move from S to M state (write-back policy) (a dirty line can t handle speculative writes, though) For parallel commits: is an ordering implied between these transactions? is a write-set/read-set conflict allowed? is a read-set/write-set conflict allowed? is a write-set/write-set conflict allowed? 3

Parallel Commits I Ordering is implied: a programmer believes that a transaction executes in isolation Write-set/Read-set conflict should cause an abort: hence, the second transaction must: confirm there is no conflict before propagating writes or propagate writes in a manner that does not affect correctness (can t employ write-thru or write-update, can t respond to others read requests / or must keep track of dependences) Read-set/Write-set conflict need not cause an abort: the ordering should indicate that the first transaction need not abort Write-set/Write-set conflict need not cause an abort: a mechanism is required to ensure that everyone sees writes in the same correct order * Reading your own write is truly not a problem, but since info is maintained at block granularity, the block is included in the read-set 4

Parallel Commits II Conflicting writes must be merged correctly (note that the writes may be to different words in the same line) If we re not checking early for Write-set/Read-set conflicts, the first transaction must inform the next transaction after it is done (received all acks) Consistency model: two parallel transactions are sending their write-sets to other nodes over unordered networks: Just as we saw for SC, a reader must not proceed unless everyone has seen the write (the entire transaction need not have committed) Writes to a location must be serialized 5

The Stanford Approach Transactions are ordered (by contacting a central agent) A transaction first engages in validation and proceeds with commit only if it is guaranteed to not have any conflicts Write-set/Read-set conflicts are not allowed (T i checks its read-set directories and proceeds only after those directories have seen previous writes) Read-set/Write-set conflicts are allowed (Ti will ignore write notices from transactions >i) Write-set/Write-set conflicts are allowed To maintain write serialization, T i confirms that a directory is done with all transactions <i 6

Algorithm Probe your write-set to see if it is your turn to write (helps serialize writes) Let others know that you don t plan to write (thereby allowing parallel commits to unrelated directories) Mark your write-set (helps hide latency) Probe your read-set to see if previous writes have completed Validation is now complete send the actual commit message to the write set 7

Example TID Vendor Rd X Wr X P1 P2 Rd Y Wr Z NS: 1 NS: 1 D: X Z D: Y 8

Example TID Vendor TID=2 Rd X Wr X P1 TID=1 P2 Rd Y Wr Z NS: 1 NS: 1 D: X Z D: Y 9

Example Rd X Wr X P1 TID=1 TID Vendor TID=2 P2 Rd Y Wr Z Probe to write-set to see if it can proceed No writes here. I m done with you. NS: 1 NS: 3 D: X Z D: Y P2 sends the same set of probes/notifications 10

Example Can go ahead with my wr Rd X Wr X P1 TID=1 TID Vendor TID=2 P2 Must wait my turn Rd Y Wr Z Mark X NS: 1 NS: 3 D: X Z D: Y Mark messages are hiding the latency for the subsequent commit 11

Example Rd X Wr X P1 TID=1 TID Vendor TID=2 P2 Keep probing and waiting Rd Y Wr Z Probe read set and make sure they re done NS: 1 NS: 3 D: X Z D: Y 12

Example Rd X Wr X P1 TID=1 TID Vendor TID=2 P2 Keep probing and waiting Rd Y Wr Z Commit NS: 2 Invalidate sharers; NS: 3 D: X Z May cause aborts D: Y 13

Example Rd X Wr X P1 TID=1 TID Vendor TID=2 P2 Rd Y Wr Z NS: 2 NS: 3 D: X Z D: Y P2: Probe finally successful. Can mark Z. Will then check read-set. Then proceed with commit 14

Algorithm Probe your write-set to see if it is your turn to write (helps serialize writes) Let others know that you don t plan to write (thereby allowing parallel commits to unrelated directories) Mark your write-set (helps hide latency) Probe your read-set to see if previous writes have completed Validation is now complete send the actual commit message to the write set 15

Issues The protocol requires many messages: enables latency hiding for the commit process, though There may be high directory locality for a NUMA machine, but possibly not for a single large-scale CMP with a directory-based cache coherence protocol To support a write-back cache policy, a new non-speculative cache is introduced (so that a transaction does not speculatively over-write a dirty line) 16

Evaluation 17

Results Applications with small transactions suffer more from commit latency 18

Results 19

Transaction Characteristics An evaluation of 35 multi-threaded programs Transaction length Sizes of read and write sets 20

Transaction Characteristics 21

Title Bullet 22