EC 513 Computer Architecture

Similar documents
Page 1. Cache Coherence

CS 252 Graduate Computer Architecture. Lecture 11: Multiprocessors-II

EC 513 Computer Architecture

Dr. George Michelogiannakis. EECS, University of California at Berkeley CRD, Lawrence Berkeley National Laboratory

CS252 Spring 2017 Graduate Computer Architecture. Lecture 12: Cache Coherence

Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology

Approaches to Building Parallel Machines. Shared Memory Architectures. Example Cache Coherence Problem. Shared Cache Architectures

Memory Hierarchy in a Multiprocessor

Lecture 24: Board Notes: Cache Coherency

Cache Coherence. Silvina Hanono Wachman Computer Science & Artificial Intelligence Lab M.I.T.

Cache Coherence. Silvina Hanono Wachman Computer Science & Artificial Intelligence Lab M.I.T.

A Cache Coherence Protocol to Implement Sequential Consistency. Memory Consistency in SMPs

Contributors to the course material

Cache Coherence Protocols: Implementation Issues on SMP s. Cache Coherence Issue in I/O

Scalable Cache Coherence. Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University

Computer Architecture

Scalable Cache Coherence

Scalable Cache Coherence

CS 152 Computer Architecture and Engineering CS252 Graduate Computer Architecture. Lecture 18 Cache Coherence

Computer Architecture and Engineering CS152 Quiz #5 April 27th, 2016 Professor George Michelogiannakis Name: <ANSWER KEY>

CSC 631: High-Performance Computer Architecture

Cache Coherence in Scalable Machines

Introduction to Multiprocessors (Part II) Cristina Silvano Politecnico di Milano

EN2910A: Advanced Computer Architecture Topic 05: Coherency of Memory Hierarchy Prof. Sherief Reda School of Engineering Brown University

ESE 545 Computer Architecture Symmetric Multiprocessors and Snoopy Cache Coherence Protocols CA SMP and cache coherence

Overview: Shared Memory Hardware. Shared Address Space Systems. Shared Address Space and Shared Memory Computers. Shared Memory Hardware

Overview: Shared Memory Hardware

Chapter 5. Multiprocessors and Thread-Level Parallelism

Scalable Cache Coherent Systems Scalable distributed shared memory machines Assumptions:

Shared Memory Architectures. Approaches to Building Parallel Machines

Cache Coherence. Todd C. Mowry CS 740 November 10, Topics. The Cache Coherence Problem Snoopy Protocols Directory Protocols

Portland State University ECE 588/688. Cache Coherence Protocols

Lecture 1: Introduction

CS/ECE 757: Advanced Computer Architecture II (Parallel Computer Architecture) Symmetric Multiprocessors Part 1 (Chapter 5)

Scalable Cache Coherent Systems

Cache Coherence and Atomic Operations in Hardware

Lecture 10: Cache Coherence: Part I. Parallel Computer Architecture and Programming CMU , Spring 2013

Lecture 24: Multiprocessing Computer Architecture and Systems Programming ( )

Scalable Multiprocessors

Chapter 5 Thread-Level Parallelism. Abdullah Muzahid

10 Parallel Organizations: Multiprocessor / Multicore / Multicomputer Systems

4 Chip Multiprocessors (I) Chip Multiprocessors (ACS MPhil) Robert Mullins

Incoherent each cache copy behaves as an individual copy, instead of as the same memory location.

NOW Handout Page 1. Recap: Performance Trade-offs. Shared Memory Multiprocessors. Uniprocessor View. Recap (cont) What is a Multiprocessor?

EN2910A: Advanced Computer Architecture Topic 05: Coherency of Memory Hierarchy

Chapter 5. Multiprocessors and Thread-Level Parallelism

Homework 6. BTW, This is your last homework. Assigned today, Tuesday, April 10 Due time: 11:59PM on Monday, April 23. CSCI 402: Computer Architectures

Portland State University ECE 588/688. Directory-Based Cache Coherence Protocols

Computer Organization

Lecture 10: Cache Coherence: Part I. Parallel Computer Architecture and Programming CMU /15-618, Spring 2015

Consistency & Coherence. 4/14/2016 Sec5on 12 Colin Schmidt

Shared Memory SMP and Cache Coherence (cont) Adapted from UCB CS252 S01, Copyright 2001 USB

Multiprocessor Cache Coherence. Chapter 5. Memory System is Coherent If... From ILP to TLP. Enforcing Cache Coherence. Multiprocessor Types

Suggested Readings! What makes a memory system coherent?! Lecture 27" Cache Coherency! ! Readings! ! Program order!! Sequential writes!! Causality!

Cache Coherence. Bryan Mills, PhD. Slides provided by Rami Melhem

CPS104 Computer Organization and Programming Lecture 20: Superscalar processors, Multiprocessors. Robert Wagner

Shared Memory Multiprocessors. Symmetric Shared Memory Architecture (SMP) Cache Coherence. Cache Coherence Mechanism. Interconnection Network

Multiprocessor Cache Coherency. What is Cache Coherence?

Cache Coherence in Bus-Based Shared Memory Multiprocessors

Snooping coherence protocols (cont.)

CS 152 Computer Architecture and Engineering. Lecture 19: Directory-Based Cache Protocols

Computer Architecture and Parallel Computing 并行结构与计算. Lecture 6 Coherence Protocols

CS 152 Computer Architecture and Engineering. Lecture 19: Directory-Based Cache Protocols

Lecture 17: Parallel Architectures and Future Computer Architectures. Shared-Memory Multiprocessors

The Cache Write Problem

CMSC Computer Architecture Lecture 15: Memory Consistency and Synchronization. Prof. Yanjing Li University of Chicago

Computer Architecture. A Quantitative Approach, Fifth Edition. Chapter 5. Multiprocessors and Thread-Level Parallelism

Computer Systems Architecture

Cache Coherence in Scalable Machines

Chapter 5. Thread-Level Parallelism

Parallel Programming Platforms

ECE 485/585 Microprocessor System Design

Parallel Architecture. Sathish Vadhiyar

Limitations of parallel processing

Processor Architecture

CMSC 411 Computer Systems Architecture Lecture 21 Multiprocessors 3

CS 152 Computer Architecture and Engineering

Lecture 18: Coherence Protocols. Topics: coherence protocols for symmetric and distributed shared-memory multiprocessors (Sections

Parallel Computer Architecture Lecture 5: Cache Coherence. Chris Craik (TA) Carnegie Mellon University

Cache Coherence. (Architectural Supports for Efficient Shared Memory) Mainak Chaudhuri

Cache Coherence: Part II Scalable Approaches

Symmetric Multiprocessors

Special Topics. Module 14: "Directory-based Cache Coherence" Lecture 33: "SCI Protocol" Directory-based Cache Coherence: Sequent NUMA-Q.

Review. EECS 252 Graduate Computer Architecture. Lec 13 Snooping Cache and Directory Based Multiprocessors. Outline. Challenges of Parallel Processing

Module 9: Addendum to Module 6: Shared Memory Multiprocessors Lecture 17: Multiprocessor Organizations and Cache Coherence. The Lecture Contains:

COEN-4730 Computer Architecture Lecture 08 Thread Level Parallelism and Coherence

Shared Memory Multiprocessors

Recall: Sequential Consistency. CS 258 Parallel Computer Architecture Lecture 15. Sequential Consistency and Snoopy Protocols

Introduction. Coherency vs Consistency. Lec-11. Multi-Threading Concepts: Coherency, Consistency, and Synchronization

CS 152 Computer Architecture and Engineering. Lecture 19: Directory-Based Cache Protocols

Lecture 7: PCM Wrap-Up, Cache coherence. Topics: handling PCM errors and writes, cache coherence intro

Advanced OpenMP. Lecture 3: Cache Coherency

Foundations of Computer Systems

Recall: Sequential Consistency Example. Implications for Implementation. Issues for Directory Protocols

Lecture notes for CS Chapter 4 11/27/18

Lecture 25: Multiprocessors. Today s topics: Snooping-based cache coherence protocol Directory-based cache coherence protocol Synchronization

Cache Coherence. CMU : Parallel Computer Architecture and Programming (Spring 2012)

Module 9: "Introduction to Shared Memory Multiprocessors" Lecture 16: "Multiprocessor Organizations and Cache Coherence" Shared Memory Multiprocessors

CSC526: Parallel Processing Fall 2016

Shared Memory Multiprocessors

Transcription:

EC 513 Computer Architecture Cache Coherence - Snoopy Cache Coherence rof. Michel A. Kinsy

Consistency in SMs CU-1 CU-2 A 100 Cache-1 A 100 Cache-2 CU- bus A 100

Consistency in SMs CU-1 CU-2 A 200 Cache-1 A 100 Cache-2 CU- bus A 100 Suppose CU-1 updates A to 200 Write-back: memory and cache-2 have stale values Write-through: cache-2 has a stale value

Consistency in SMs CU-1 CU-2 A 200 Cache-1 A 100 Cache-2 CU- bus A 100 Suppose CU-1 updates A to 200 Do these stale values matter? What is the view of shared memory for programming?

Write-through Caches & SC T1 is executed rog T1 ST X, 1 ST Y,11 Cache-1 writes back Y Cache-1 X= 1 Y=11 X= 1 Y=11 Y =10 Y =11 Cache-2 Y = X = Y = X = rog T2 LD Y, R1 ST Y, R1 LD X, R2 ST X,R2 T2 executed X= 1 Y=11 Y =11 Y = 11 11 0 Cache-1 writes back X X= 1 Y=11 X = 1 Y =11 Y = 11 11 0

Write-through Caches & SC T1 is executed rog T1 ST X, 1 ST Y,11 Cache-1 X= 1 Y=11 Y =10 Cache-1 writes back Y Cache-2 Y = X = rog T2 LD Y, R1 ST Y, R1 LD X, R2 ST X,R2 T2 executed X= 1 Y=11 Y =11 Y = 11 11 0 Cache-1 writes back X X= 1 Y=11 X = 1 Y =11 Y = 11 11 0 Cache-2 writes back X & Y X= 1 Y=11 X = 1 Y =11 0 11 Y =11 11 0

Write-through Caches & SC rog T1 ST X, 1 ST Y,11 Cache-1 X=0 Y=10 Y =10 Cache-2 Y = rog T2 LD Y, R1 ST Y, R1 LD X, R2 ST X,R2 T1 is executed X= 1 Y=11 X = 1 Y =11 Y = X =0 T2 executed X= 1 Y=11 X =1 Y =11 0 11 Write-through caches don t preserve sequential consistency either Y = 11 11 0

Maintaining Sequential Consistency Motivation: We can do without locks -- SC is sufficient for writing producer-consumer and mutual exclusion codes (e.g., Dekker) roblem: Multiple copies of a location in various caches can cause SC to break down. Hardware support is required such that Only one processor at a time has write permission for a location No processor can load a stale copy of the location after a write Cache coherence protocols

A System with Multiple Caches L2 L2 Interconnect Modern systems often have hierarchical caches Each cache has exactly one parent but can have zero or more children Only a parent and its children can communicate directly M

A System with Multiple Caches L2 L2 Interconnect M Inclusion property is maintained between a parent and its children, i.e., a in L i implies that a in L i+1

Cache Coherence rotocols for SC Write request: The address is invalidated in all other caches before the write is performed, or The address is updated in all other caches after the write is performed Read request: If a dirty copy is found in some cache, a write-back is performed before the memory is read We will focus on Invalidation protocols as opposed to Update protocols

Warmup: arallel I/O Address (A) Bus hysical roc. Data (D) Cache R/W age transfers occur while the rocessor is running A Either Cache or DMA can be the Bus Master and effect transfers D R/W DMA DISK DMA stands for Direct Access

roblems with arallel I/O Cached portions of page Bus hysical roc. Cache DMA transfers DMA DISK à Disk: hysical memory may be stale if cache copy is dirty Disk à : Cache may have data corresponding to the memory

Snoopy Cache Goodman 1983 Idea: have the cache watch (or snoop upon) DMA transfers, and then do the right thing Snoopy cache tags are dual-ported Used to drive Bus when Cache is Bus Master roc. A R/W D Tags and State Data (lines) A R/W Snoopy read port attached to Bus Cache

Snoopy Cache Actions Observed Bus Cycle Read Cycle, i.e., à Disk Write Cycle, i.e., Disk à Cache State Address not cached Cached, unmodified Cached, modified Address not cached Cached, unmodified Cache Action No action No action Cache intervenes No action Cache purges its copy Cached, modified???

Shared Multiprocessor rocessor Cores Local Memories Bus 1 Snoopy Cache hysical 2 Snoopy Cache 3 Snoopy Cache DMA DISKS Use snoopy mechanism to keep all processors view of memory coherent

Cache State Transition Diagram The MSI protocol state bits Each cache line has a tag Address tag Other processor reads 1 writes back M: Modified S: Shared I: Invalid M 1 reads or writes Write miss Read miss Read by any processor S Other processor intends to write I Cache state in processor 1

Cache State Transition Diagram The MSI protocol state bits Each cache line has a tag Address tag M: Modified S: Shared I: Invalid M 1 reads or writes Write miss Read miss Read by any processor S Other processor intends to write I Other processor intends to write 1 writes back Cache state in processor 1

Cache State Transition Diagram The MSI protocol state bits Each cache line has a tag Address tag Other processor reads 1 writes back M: Modified S: Shared I: Invalid M 1 reads or writes Write miss Read miss Read by any processor S Other processor intends to write I Other processor intends to write 1 writes back Cache state in processor 1

2 rocessor Example 1 reads 1 writes 2 reads 2 writes 1 Read miss 2 reads, 1 writes back S 2 intends to write M I 1 reads or writes Write miss 2 intends to write 1 writes back 1 reads 1 writes 2 writes 1 writes 2 Read miss 1 reads, 2 writes back S 1 intends to write M I 2 reads or writes Write miss 1 intends to write 2 writes back

2 rocessor Example 1 reads 1 writes 2 reads 2 writes 1 Read miss 2 reads, 1 writes back S 2 intends to write M I 1 reads or writes Write miss 2 intends to write 1 writes back 1 reads 1 writes 2 writes 1 writes 2 Read miss 1 reads, 2 writes back S 1 intends to write M I 2 reads or writes Write miss 1 intends to write 2 writes back

Observation Other processor reads 1 writes back M 1 reads or writes Write miss Read miss Read by any processor S Other processor intent to write If a line is in the M state then no other cache can have a copy of the line! stays coherent, multiple differing copies cannot exist I Other processor intent to write 1 writes back

Next Class Cache Coherence - Directory Cache Coherence