Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology


Constructive Computer Architecture: Cache Coherence
Arvind, Computer Science & Artificial Intelligence Lab, Massachusetts Institute of Technology
November 17, 2014, http://www.csg.csail.mit.edu/6.175

Shared Memory Systems

[Figure: processors with private caches, shared L2 caches, and memory M connected by an interconnect.]

Modern systems often have hierarchical caches. Each cache has exactly one parent but can have zero or more children, and logically only a parent and its children can communicate directly. The inclusion property is maintained between a parent and its children, i.e., a ∈ L_i ⇒ a ∈ L_{i+1}, because usually |L_{i+1}| >> |L_i|.

Cache-coherence problem

[Figure: CPU-1 and CPU-2, each with a cache holding address A (value 100), connected over a CPU-memory bus to memory where A = 100.]

Suppose CPU-1 updates A to 200.
- Write-back: memory and cache-2 have stale values.
- Write-through: cache-2 has a stale value.

Do these stale values matter? What is the view of shared memory for programming?

Cache-Coherent Memory

[Figure: processors issuing requests and receiving responses from a monolithic memory.]

A monolithic or instantaneous memory processes one request at a time and responds to each request immediately. A memory with a hierarchy of caches is said to be coherent, or atomic, if functionally it behaves like the monolithic memory.

Maintaining Store Atomicity

Store atomicity requires all processors to see writes occur in the same order; multiple copies of an address in various caches can cause this to be violated. This property can be ensured if:
- only one cache at a time has the write permission for an address, and
- no cache can have a stale copy of the data after a write to the address has been performed.

Cache coherence protocols are used to implement store atomicity.

Cache Coherence Protocols

- Write request: the address is invalidated in all other caches before the write is performed.
- Read request: if a dirty copy is found in some cache, that value must be used, either by doing a write-back and then reading the memory, or by forwarding the dirty value directly to the reader.

Such protocols are called invalidation-based.

State needed to maintain Cache Coherence

Use the MSI encoding in caches, where:
- I means this cache does not contain the address;
- S means this cache has the address, but so may other caches; hence it can only be read;
- M means only this cache has the address; hence it can be read and written.

The states M, S, I can be thought of as an order M > S > I. A transition from a lower state to a higher state is called an upgrade; a transition from a higher state to a lower state is called a downgrade.
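The M > S > I ordering makes upgrade/downgrade checks simple comparisons. A minimal Python sketch (the `MSI` enum and helper names are illustrative, not from the lecture):

```python
from enum import IntEnum

class MSI(IntEnum):
    """MSI states, encoded so that M > S > I holds numerically."""
    I = 0  # invalid: this cache does not contain the address
    S = 1  # shared: read-only; other caches may also hold a copy
    M = 2  # modified: exclusive; the address can be read and written

def is_upgrade(old: MSI, new: MSI) -> bool:
    # transition from a lower state to a higher state
    return new > old

def is_downgrade(old: MSI, new: MSI) -> bool:
    # transition from a higher state to a lower state
    return new < old
```

With this encoding, a store miss from S is `is_upgrade(MSI.S, MSI.M)` and an invalidation is `is_downgrade(MSI.S, MSI.I)`.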

Sibling invariant and compatibility

Sibling invariant: a cache is in state M ⇒ its siblings are in state I. That is, the sibling states are compatible:
- IsCompatible(M, M) = False
- IsCompatible(M, S) = False
- IsCompatible(S, M) = False
- all other cases = True
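The compatibility table fits in one small function; a sketch (the name follows the slide's IsCompatible, spelled in Python style):

```python
def is_compatible(s1: str, s2: str) -> bool:
    """Can two sibling caches hold the same address in states s1, s2?

    M is incompatible with everything except I; all other pairs of
    MSI states are compatible.
    """
    if s1 == "M":
        return s2 == "I"
    if s2 == "M":
        return s1 == "I"
    return True
```

Note the table is symmetric: (M, I) and (I, M) are both allowed, which is exactly the sibling invariant restated pairwise.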

Cache State Transitions

[State diagram: I moves up to S on a load and to M on a store; S moves up to M on a store; S drops to I on an invalidate; M drops to S or I on a write-back/flush; some downgrade transitions are optimizations.]

This state diagram is helpful as long as one remembers that each transition involves the cooperation of other caches and the main memory to maintain the sibling invariant.

Cache Actions

On a read miss (i.e., cache state is I):
- if some other cache has the address in state M, write back the dirty data to memory;
- read the value from memory and set the state to S.

On a write miss (i.e., cache state is I or S):
- invalidate the address in all other caches, and if some cache has the address in state M, write back the dirty data;
- read the value from memory if necessary and set the state to M.

Misses cause cache upgrade actions, which in turn may cause further downgrades or upgrades in other caches.
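The miss actions can be mimicked with a toy single-address model. This is a hedged sketch rather than the lecture's protocol machinery: it pretends memory is instantaneous and ignores the message queues introduced later, and all class and method names are assumptions.

```python
class ToySystem:
    """Single-address MSI model with instantaneous memory (illustrative)."""

    def __init__(self, n_caches: int):
        self.state = ["I"] * n_caches   # per-cache MSI state
        self.data = [None] * n_caches   # per-cache copy of the value
        self.mem = 0                    # the memory's copy

    def read_miss(self, c: int):
        # If another cache holds the line in M, write back its dirty data
        # (here the owner is downgraded to S, keeping a readable copy).
        for o, s in enumerate(self.state):
            if o != c and s == "M":
                self.mem = self.data[o]
                self.state[o] = "S"
        self.data[c] = self.mem
        self.state[c] = "S"

    def write_miss(self, c: int, v):
        # Invalidate all other copies, writing back a dirty one first.
        for o, s in enumerate(self.state):
            if o != c:
                if s == "M":
                    self.mem = self.data[o]
                self.state[o] = "I"
        self.data[c] = v
        self.state[c] = "M"
```

Replaying the earlier CPU-1/CPU-2 example: after cache 0 performs a write miss with value 200 and cache 1 then read-misses, cache 1 observes 200, not a stale 100.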

MSI protocol: some issues

- It never makes sense to have two outstanding requests for the same address from the same processor/cache.
- It is possible to have multiple requests for the same address from different processors; hence there is a need to arbitrate requests.
- A cache needs to be able to evict an address in order to make room for a different address (a voluntary downgrade).
- The memory system (higher-level cache) should be able to force a lower-level cache to downgrade; caches need to keep track of the state of their children's caches.

Directory State Encoding

Two-level (cache, memory) system. [Figure: caches holding address a in states <S, I, I, I>, connected to memory over an interconnect.]

For each address in a cache, the directory keeps two types of info:
- c.state[a] (sibling info): do c's siblings have a copy of address a? M means no, S means maybe.
- m.child[c_k][a] (children info): the state of child c_k for address a. At most one child can be in state M.
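The two bookkeeping structures can be written down directly. An illustrative Python encoding (the field names follow the slide; the container types and the owner helper are assumptions):

```python
class CacheLine:
    def __init__(self):
        # c.state[a]: M ("siblings have no copy") / S ("maybe") / I
        self.state = "I"

class Directory:
    """Per-address record of each child's state (m.child[ck][a])."""

    def __init__(self, n_children: int):
        # one {address: state} map per child; an absent entry means I
        self.child = [dict() for _ in range(n_children)]

    def owner(self, a):
        """Return the unique child holding address a in state M, or None."""
        owners = [k for k, st in enumerate(self.child)
                  if st.get(a, "I") == "M"]
        assert len(owners) <= 1  # at most one child can be in state M
        return owners[0] if owners else None
```

The assertion in `owner` is the directory-side form of the sibling invariant: a second M owner for the same address would be a protocol bug.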

Directory state encoding: wait states

New states deal with waiting for responses:
- c.waitp[a]: denotes whether cache c is waiting for a response from its parent. No means not waiting; Yes(S) or Yes(M) means waiting for a response in order to transition to state S or M, respectively.
- m.waitc[c_k][a]: denotes whether memory m is waiting for a downgrade response from its child c_k: No, Yes(S), or Yes(I).

Cache state in cache c: <(M | S | I), (No | Yes(M) | Yes(S))>.
Directory state in home memory, for each child: <(M | S | I), (No | Yes(S) | Yes(I))>, i.e., the child's state and whether a downgrade response is awaited.

A Directory-based Protocol: an abstract view

[Figure: caches and memory m connected by an interconnect, with queue pairs p2m/m2p and c2m/m2c at each cache.]

Each cache has two pairs of queues:
- (c2m, m2c) to communicate with the memory;
- (p2m, m2p) to communicate with the processor.

Message format: <cmd, src→dst, a, s, data>, i.e., a Req/Resp command, source and destination, address, state, and data. Message passing is FIFO between each (src, dst) pair, except that a Req cannot block a Resp, and messages on one src→dst path cannot block messages on another path.

Processor Hit Rules

Load-hit rule:
  p2m.msg = (Load a) & (c.state[a] > I)
  ⇒ p2m.deq; m2p.enq(c.data[a]);

Store-hit rule:
  p2m.msg = (Store a v) & (c.state[a] = M)
  ⇒ p2m.deq; m2p.enq(Ack); c.data[a] := v;
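The two guarded rules can be mimicked in Python. The queue and field names (p2m, m2p, c.state, c.data) come from the slide; everything else, including the tuple message format, is an assumption of this sketch:

```python
from collections import deque

ORDER = {"I": 0, "S": 1, "M": 2}  # encodes M > S > I

class Cache:
    def __init__(self):
        self.state = {}      # address -> "M" | "S" | "I"
        self.data = {}       # address -> value
        self.p2m = deque()   # requests from the processor
        self.m2p = deque()   # responses back to the processor

def try_hit_rules(c: Cache) -> bool:
    """Fire the load-hit or store-hit rule if its guard holds."""
    if not c.p2m:
        return False
    msg = c.p2m[0]
    # Load-hit guard: p2m.msg = (Load a) & (c.state[a] > I)
    if msg[0] == "load" and ORDER[c.state.get(msg[1], "I")] > ORDER["I"]:
        c.p2m.popleft()
        c.m2p.append(c.data[msg[1]])
        return True
    # Store-hit guard: p2m.msg = (Store a v) & (c.state[a] = M)
    if msg[0] == "store" and c.state.get(msg[1], "I") == "M":
        c.p2m.popleft()
        c.data[msg[1]] = msg[2]
        c.m2p.append("ack")
        return True
    return False  # a miss: must go through the coherence protocol
```

A store to an address held only in S deliberately does not fire: the cache must first upgrade to M via the protocol, which is what the miss rules on the next slide handle.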

Processing misses: Requests and Responses

[Figure: two caches and memory exchanging the numbered messages below over the interconnect.]

1. Up-req send (cache)
2. Up-req process, up-resp send (memory)
3. Up-resp process (cache)
4. Dn-req send (memory)
5. Dn-req process, dn-resp send (cache)
6. Dn-resp process (memory)
7. Dn-req process, drop (cache)
8. Voluntary dn-resp (cache)

Invariants for a CC-protocol design

- Directory state is always a conservative estimate of a child's state. E.g., if the directory thinks that a child cache is in the S state, then the cache has to be in either the I or the S state.
- For every request there is a corresponding response, though sometimes a response may have been generated even before the request was processed.
- The communication system has to ensure that responses cannot be blocked by requests; a request cannot overtake a response for the same address.
- At every merger point for requests, we assume fair arbitration to avoid starvation.
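The first invariant is just an ordering statement under M > S > I, so it can be stated as a one-line check. A sketch (the function name is illustrative, not from the lecture):

```python
ORDER = {"I": 0, "S": 1, "M": 2}  # encodes M > S > I

def directory_is_conservative(dir_state: str, child_state: str) -> bool:
    """A conservative directory may overestimate a child's permission but
    never underestimate it: the child's true state is at most the state
    the directory has recorded for it."""
    return ORDER[child_state] <= ORDER[dir_state]
```

A protocol checker could assert this for every (child, address) pair after each rule fires; a violation means a child gained permission the directory never granted.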