Lecture 18: Coherence Protocols. Topics: coherence protocols for symmetric and distributed shared-memory multiprocessors (Sections

Similar documents
Lecture 18: Shared-Memory Multiprocessors. Topics: coherence protocols for symmetric shared-memory multiprocessors (Sections

Lecture 18: Coherence and Synchronization. Topics: directory-based coherence protocols, synchronization primitives (Sections

Lecture 25: Multiprocessors. Today s topics: Snooping-based cache coherence protocol Directory-based cache coherence protocol Synchronization

Lecture 25: Multiprocessors

Lecture 24: Virtual Memory, Multiprocessors

Shared Memory Multiprocessors. Symmetric Shared Memory Architecture (SMP) Cache Coherence. Cache Coherence Mechanism. Interconnection Network

Lecture: Coherence Protocols. Topics: wrap-up of memory systems, multi-thread programming models, snooping-based protocols

Lecture: Coherence Protocols. Topics: wrap-up of memory systems, multi-thread programming models, snooping-based protocols

Lecture 17: Multiprocessors. Topics: multiprocessor intro and taxonomy, symmetric shared-memory multiprocessors (Sections )

Lecture 7: Implementing Cache Coherence. Topics: implementation details

Lecture: Coherence, Synchronization. Topics: directory-based coherence, synchronization primitives (Sections )

Lecture: Memory, Coherence Protocols. Topics: wrap-up of memory systems, multi-thread programming models, snooping-based protocols

Lecture 8: Transactional Memory TCC. Topics: lazy implementation (TCC)

Limitations of parallel processing

Lecture 3: Snooping Protocols. Topics: snooping-based cache coherence implementations

Multiprocessors & Thread Level Parallelism

Lecture 8: Snooping and Directory Protocols. Topics: split-transaction implementation details, directory implementations (memory- and cache-based)

Lecture 26: Multiprocessors. Today s topics: Directory-based coherence Synchronization Consistency Shared memory vs message-passing

ECSE 425 Lecture 30: Directory Coherence

Shared Memory SMP and Cache Coherence (cont) Adapted from UCB CS252 S01, Copyright 2001 USB

Lecture: Transactional Memory. Topics: TM implementations

Multiprocessor Cache Coherency. What is Cache Coherence?

Lecture 2: Snooping and Directory Protocols. Topics: Snooping wrap-up and directory implementations

CS252 Spring 2017 Graduate Computer Architecture. Lecture 12: Cache Coherence

Cache Coherence. Silvina Hanono Wachman Computer Science & Artificial Intelligence Lab M.I.T.

Cache Coherence. Silvina Hanono Wachman Computer Science & Artificial Intelligence Lab M.I.T.

EN2910A: Advanced Computer Architecture Topic 05: Coherency of Memory Hierarchy

Suggested Readings! What makes a memory system coherent?! Lecture 27" Cache Coherency! ! Readings! ! Program order!! Sequential writes!! Causality!

Introduction to Multiprocessors (Part II) Cristina Silvano Politecnico di Milano

Chapter 5. Multiprocessors and Thread-Level Parallelism

Lecture 1: Introduction

Lecture 20: Multi-Cache Designs. Spring 2018 Jason Tang

Multiprocessor Cache Coherence. Chapter 5. Memory System is Coherent If... From ILP to TLP. Enforcing Cache Coherence. Multiprocessor Types

ECE 485/585 Microprocessor System Design

EC 513 Computer Architecture

COEN-4730 Computer Architecture Lecture 08 Thread Level Parallelism and Coherence

Module 5: Performance Issues in Shared Memory and Introduction to Coherence Lecture 10: Introduction to Coherence. The Lecture Contains:

Lecture 10: Cache Coherence: Part I. Parallel Computer Architecture and Programming CMU , Spring 2013

Lecture 4: Directory Protocols and TM. Topics: corner cases in directory protocols, lazy TM

Multiprocessors and Thread-Level Parallelism. Department of Electrical & Electronics Engineering, Amrita School of Engineering

CMSC 411 Computer Systems Architecture Lecture 21 Multiprocessors 3

Lecture 14: Cache Innovations and DRAM. Today: cache access basics and innovations, DRAM (Sections )

Thread- Level Parallelism. ECE 154B Dmitri Strukov

Lecture 17: Transactional Memories I

Chapter 5. Multiprocessors and Thread-Level Parallelism

Lecture 7: PCM Wrap-Up, Cache coherence. Topics: handling PCM errors and writes, cache coherence intro

Caches. Parallel Systems. Caches - Finding blocks - Caches. Parallel Systems. Parallel Systems. Lecture 3 1. Lecture 3 2

Shared Symmetric Memory Systems

EN2910A: Advanced Computer Architecture Topic 05: Coherency of Memory Hierarchy Prof. Sherief Reda School of Engineering Brown University

CSC 631: High-Performance Computer Architecture

Page 1. SMP Review. Multiprocessors. Bus Based Coherence. Bus Based Coherence. Characteristics. Cache coherence. Cache coherence

Lecture 6: Lazy Transactional Memory. Topics: TM semantics and implementation details of lazy TM

Page 1. Cache Coherence

Lecture 21: Transactional Memory. Topics: Hardware TM basics, different implementations

CS 152 Computer Architecture and Engineering. Lecture 19: Directory-Based Cache Protocols

MULTIPROCESSORS AND THREAD-LEVEL. B649 Parallel Architectures and Programming

Incoherent each cache copy behaves as an individual copy, instead of as the same memory location.

MULTIPROCESSORS AND THREAD-LEVEL PARALLELISM. B649 Parallel Architectures and Programming

1. Memory technology & Hierarchy

Lecture 10: Cache Coherence: Part I. Parallel Computer Architecture and Programming CMU /15-618, Spring 2015

Lecture 13: Consistency Models. Topics: sequential consistency, requirements to implement sequential consistency, relaxed consistency models

CS 152 Computer Architecture and Engineering. Lecture 19: Directory-Based Cache Protocols

Directory Implementation. A High-end MP

MULTIPROCESSORS AND THREAD-LEVEL PARALLELISM (PART 1)

Parallel Computer Architecture Spring Distributed Shared Memory Architectures & Directory-Based Memory Coherence

Lecture: Consistency Models, TM. Topics: consistency models, TM intro (Section 5.6)

Computer Organization

Lecture 30: Multiprocessors Flynn Categories, Large vs. Small Scale, Cache Coherency Professor Randy H. Katz Computer Science 252 Spring 1996

Lecture: Consistency Models, TM

Lecture 5: Directory Protocols. Topics: directory-based cache coherence implementations

COMP Parallel Computing. CC-NUMA (1) CC-NUMA implementation

Module 9: Addendum to Module 6: Shared Memory Multiprocessors Lecture 17: Multiprocessor Organizations and Cache Coherence. The Lecture Contains:

Cache Coherence. CMU : Parallel Computer Architecture and Programming (Spring 2012)

Computer Architecture. A Quantitative Approach, Fifth Edition. Chapter 5. Multiprocessors and Thread-Level Parallelism

CS 152 Computer Architecture and Engineering

Overview: Shared Memory Hardware. Shared Address Space Systems. Shared Address Space and Shared Memory Computers. Shared Memory Hardware

Overview: Shared Memory Hardware

Aleksandar Milenkovich 1

Lecture 3: Directory Protocol Implementations. Topics: coherence vs. msg-passing, corner cases in directory protocols

Lecture 24: Board Notes: Cache Coherency

Cache Coherence Protocols: Implementation Issues on SMP s. Cache Coherence Issue in I/O

Lecture 13: Cache Hierarchies. Today: cache access basics and innovations (Sections )

Computer Systems Architecture

INSTITUTO SUPERIOR TÉCNICO. Architectures for Embedded Computing

Interconnect Routing

Scalable Cache Coherence

Data-Centric Consistency Models. The general organization of a logical data store, physically distributed and replicated across multiple processes.

CMSC 611: Advanced. Distributed & Shared Memory

COSC 6385 Computer Architecture - Multi Processor Systems

Snooping coherence protocols (cont.)

Computer Systems Architecture

10 Parallel Organizations: Multiprocessor / Multicore / Multicomputer Systems

3/13/2008 Csci 211 Lecture %/year. Manufacturer/Year. Processors/chip Threads/Processor. Threads/chip 3/13/2008 Csci 211 Lecture 8 4

EE382 Processor Design. Processor Issues for MP

EN164: Design of Computing Systems Lecture 34: Misc Multi-cores and Multi-processors

Computer Science 146. Computer Architecture

Cache Coherence. Introduction to High Performance Computing Systems (CS1645) Esteban Meneses. Spring, 2014

CMSC 611: Advanced Computer Architecture

Lecture: Coherence and Synchronization. Topics: synchronization primitives, consistency models intro (Sections )

Lecture: Transactional Memory, Networks. Topics: TM implementations, on-chip networks

Transcription:

Lecture 18: Coherence Protocols Topics: coherence protocols for symmetric and distributed shared-memory multiprocessors (Sections 4.2-4.4) 1

SMP/UMA/Centralized Memory Multiprocessor Main Memory I/O System 2

SMPs Centralized main memory and many caches many copies of the same data A system is cache coherent if a read returns the most recently written value for that word Time Event Value of X in Cache-A Cache-B Memory 0 - - 1 1 CPU-A reads X 1-1 2 CPU-B reads X 1 1 1 3 CPU-A stores 0 in X 0 1 0 3

Cache Coherence A memory system is coherent if: P writes to X; no other processor writes to X; P reads X and receives the value previously written by P P1 writes to X; no other processor writes to X; sufficient time elapses; P2 reads X and receives value written by P1 Two writes to the same location by two processors are seen in the same order by all processors write serialization The memory consistency model defines time elapsed before the effect of a processor is seen by others 4

Cache Coherence Protocols Directory-based: A single location (directory) keeps track of the sharing status of a block of memory Snooping: Every cache block is accompanied by the sharing status of that block all cache controllers monitor the shared bus so they can update the sharing status of the block, if necessary Write-invalidate: a processor gains exclusive access of a block before writing by invalidating all other copies Write-update: when a processor writes, it updates other shared copies of that block 5

Design Issues Invalidate Find data Writeback / writethrough Cache block states Contention for tags Enforcing write serialization Main Memory I/O System 6

SMP Example A B Main Memory C I/O System D A: Rd X B: Rd X C: Rd X A: Wr X A: Wr X C: Wr X B: Rd X A: Rd X A: Rd Y B: Wr X B: Rd Y B: Wr X B: Wr Y 7

SMP Example A B C A: Rd X B: Rd X C: Rd X A: Wr X A: Wr X C: Wr X B: Rd X A: Rd X A: Rd Y B: Wr X B: Rd Y B: Wr X B: Wr Y 8

SMP Example A B C A: Rd X S B: Rd X S S C: Rd X S S S A: Wr X E I I A: Wr X E I I C: Wr X I I E B: Rd X I S S A: Rd X S S S A: Rd Y S (Y) S (X) S (X) B: Wr X S (Y) E (X) I B: Rd Y S (Y) S (Y) I B: Wr X S (Y) E (X) I B: Wr Y I E (Y) I 9

Example Protocol Request Source Block state Action Read hit Proc Shared/excl Read data in cache Read miss Proc Invalid Place read miss on bus Read miss Proc Shared Conflict miss: place read miss on bus Read miss Proc Exclusive Conflict miss: write back block, place read miss on bus Write hit Proc Exclusive Write data in cache Write hit Proc Shared Place write miss on bus Write miss Proc Invalid Place write miss on bus Write miss Proc Shared Conflict miss: place write miss on bus Write miss Proc Exclusive Conflict miss: write back, place write miss on bus Read miss Bus Shared No action; allow memory to respond Read miss Bus Exclusive Place block on bus; change to shared Write miss Bus Shared Invalidate block Write miss Bus Exclusive Write back block; change to invalid 10

Coherence Protocols Two conditions for cache coherence: write propagation write serialization Cache coherence protocols: snooping directory-based write-update write-invalidate 11

Performance Improvements What determines performance on a multiprocessor: What fraction of the program is parallelizable? How does memory hierarchy performance change? New form of cache miss: coherence miss such a miss would not have happened if another processor did not write to the same cache line False coherence miss: the second processor writes to a different word in the same cache line this miss would not have happened if the line size equaled one word 12

How do Cache Misses Scale? Increasing cache capacity Compulsory Capacity Conflict Coherence True False Increasing processor count Increasing block size Increasing associativity 13

Simplifying Assumptions All transactions on a read or write are atomic on a write miss, the miss is sent on the bus, a block is fetched from memory/remote cache, and the block is marked exclusive Potential problem if the actions are non-atomic: P1 sends a write miss on the bus, P2 sends a write miss on the bus: since the block is still invalid in P1, P2 does not realize that it should write after receiving the block from P1 instead, it receives the block from memory Most problems are fixable by keeping track of more state: for example, don t acquire the bus unless all outstanding transactions for the block have completed 14

Coherence in Distributed Memory Multiprocs Distributed memory systems are typically larger bus-based snooping may not work well Option 1: software-based mechanisms message-passing systems or software-controlled cache coherence Option 2: hardware-based mechanisms directory-based cache coherence 15

Directory-Based Cache Coherence The physical memory is distributed among all processors The directory is also distributed along with the corresponding memory The physical address is enough to determine the location of memory The (many) processing nodes are connected with a scalable interconnect (not a bus) hence, messages are no longer broadcast, but routed from sender to receiver since the processing nodes can no longer snoop, the directory keeps track of sharing state 16

Distributed Memory Multiprocessors & & & & Memory I/O Memory I/O Memory I/O Memory I/O Directory Directory Directory Directory Interconnection network 17

Cache Block States What are the different states a block of memory can have within the directory? Note that we need information for each cache so that invalidate messages can be sent The block state is also stored in the cache for efficiency The directory now serves as the arbitrator: if multiple write attempts happen simultaneously, the directory determines the ordering 18

Title Bullet 19