PVCoherence. Zhang, Bingham, Erickson, & Sorin. Amlan Nayak & Jay Zhang

2 Overview: Motivation, Background, Parametric Verification, Design Guidelines, PV-MOESI vs OP-MOESI, Results, Conclusion

3 Issue with Coherence Protocols: Difficult to automatically verify for many-core systems. Better performance calls for complex protocols, and complex protocols are difficult to formally verify.

4 GOAL: Architect arbitrarily large flat protocols such that they can be verified using a mostly-automated methodology

6 Coherence Protocols. Two primary categories: snooping and directory-based. Example protocols: MSI, MESI, MOESI, MESIF.

7 [Figure: Core 0 to Core 3 connected through a network on chip to the L2 cache and directory controller]

8 Formally verifying a coherence protocol: state space exploration or theorem proving

9 State Space Exploration (with Murphi)

12 Model the protocol in Murphi, then check invariants

14 Murphi processor node state definition: input messages, cache line state, output transitions/messages

19 MOESI protocol, processor cache controller [Figure: state diagram with states M, OMA, OMAC, O, MIA, MA, OIA, E, MI3, MAD, SIA, S, MI2, IIA, ISS, ISD, ISI, EI, I, IEI, SEI]

21 Check invariants: 1. Permission invariant: single-writer, multiple-reader. 2. Data invariant: a read returns the value of the last write.

23 Single writer [Figure: among Core 0 to Core 3, one core stores (ST) to the cache line]

24 Multiple reader [Figure: among Core 0 to Core 3, three cores load (LD) the cache line as sharers]
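
A minimal sketch of the "model, then check invariants" idea, written in Python rather than Murphi. Everything here is an assumption made for illustration: the `explore` helper, the three-state toy protocol with atomic transitions, and the `swmr` predicate are hypothetical and much simpler than the presenters' MOESI model, which also has messages and transient states.

```python
from collections import deque


def explore(initial, successors, invariant):
    """Breadth-first search over all reachable states.

    Returns (True, None) if every reachable state satisfies `invariant`,
    otherwise (False, trace), where trace is a path from `initial` to the
    first violating state found.
    """
    parent = {initial: None}
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        if not invariant(state):
            trace = []
            while state is not None:
                trace.append(state)
                state = parent[state]
            return False, trace[::-1]
        for nxt in successors(state):
            if nxt not in parent:
                parent[nxt] = state
                queue.append(nxt)
    return True, None


def msi_successors(state):
    """Toy model: a state is a tuple holding "I", "S", or "M" per cache."""
    n = len(state)
    for i in range(n):
        # Cache i gains read permission; any current writer is downgraded to S.
        yield tuple("S" if (j == i or s == "M") else s
                    for j, s in enumerate(state))
        # Cache i gains write permission; every other cache is invalidated.
        yield tuple("M" if j == i else "I" for j in range(n))
        # Cache i evicts its copy.
        yield tuple("I" if j == i else s for j, s in enumerate(state))


def swmr(state):
    """Permission invariant: one writer and no readers, or no writer at all."""
    writers = state.count("M")
    readers = state.count("S")
    return writers == 0 or (writers == 1 and readers == 0)


ok, trace = explore(("I", "I", "I"), msi_successors, swmr)
```

The number of caches is fixed by the length of the initial tuple, which is the limitation that motivates treating the node count as a parameter.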

27 PARAMETRIC VERIFICATION (PV): Treat the number of nodes as a parameter (using the Abster tool). Prove properties of the design that hold regardless of the parameter's value. This process scales to many nodes and is almost fully automatic.

28 Simple-PV Process Flow: how to design a readily verifiable coherence protocol

29 Simple-PV Process Flow [Flowchart]: Create a model with a small number of nodes. 1. Make parametric model (Abster): on failure, NOT VERIFIABLE; on success, go to step 2. 2. Model check (using Murphi): on success, verification succeeds; on state space explosion, NOT VERIFIABLE; on failure, examine the counterexample. Bug in protocol? If yes, fix the bug; if no, go to step 3. 3. Refine model (manual): if refinable, model check again; if not, NOT VERIFIABLE.
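
The process flow can be read as a loop. The sketch below is one hedged reading of the flowchart, since the arrows are not recoverable from the transcription; in particular, returning to step 1 after a bug fix is an assumption. All six callables are hypothetical stand-ins for Abster, Murphi, and the manual steps, and none of them is a real tool API.

```python
def simple_pv(model, make_parametric, model_check, is_protocol_bug,
              refine, fix_bug):
    """Drive the Simple-PV loop and return a verdict string.

    model_check must return (status, trace) with status one of
    "success", "state_explosion", or "counterexample".
    make_parametric and refine return None when they fail.
    """
    while True:
        param_model = make_parametric(model)           # step 1 (Abster)
        if param_model is None:
            return "NOT VERIFIABLE: abstraction failed"
        while True:
            status, trace = model_check(param_model)   # step 2 (Murphi)
            if status == "success":
                return "VERIFIED"
            if status == "state_explosion":
                return "NOT VERIFIABLE: state space explosion"
            if is_protocol_bug(trace):
                # Real bug: fix the protocol and redo the abstraction.
                model = fix_bug(model, trace)
                break
            # Spurious counterexample: refine the abstract model (step 3).
            refined = refine(param_model, trace)
            if refined is None:
                return "NOT VERIFIABLE: not refinable"
            param_model = refined
```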

31 Automatically Create Parametric Model: Create a parametric model from a non-parametric model. N concrete nodes -> 2 concrete nodes + Other node (N-2). Abster is an automatic abstraction tool; it over-approximates the behavior of the N-2 nodes. If Abster fails, modify the protocol until it is compatible.

32 [Figure: Core 0 to Core 3 on a network on chip with the L2 cache, directory controller, and path to memory; N-2 of the cores are marked as parameterized nodes]

34 Automatically model check the model (Murphi)

36 Manually Refine the Model: Over-approximation leads to spurious invariant violations, so the behavior of the Other node must be modified. KEY: add constraints and check their validity. Each added invariant (lemma) must be true for the non-abstracted model; check it on the concrete nodes.

37 System Architecture for PVCoherence

38 [Figure: Core 0 to Core 2 connected by a network on chip to an inclusive L2 cache with an inclusive directory, backed by memory]

39 Requests issued by a core's cache: LD miss -> GetS/GetE; ST miss -> GetM; eviction -> PutM, PutO, or PutE

40 Network on Chip: three virtual channels (VC0, VC1, VC2) carrying requests, responses, and forwards

41 Objective: move from human effort to automation during protocol design

43 Guidelines for Simple-PV compliance

44 #1: Identical nodes. #2: No variable may depend on the number of nodes. #3: No ordering over a list/queue sized by the number of nodes. #4: Do not parameterize buffers/queues in more than one dimension.

45 Identical Nodes [Figure: Core 0 to Core 2 connected by a network on chip to an inclusive L2 cache with an inclusive directory, backed by memory]

46 Heterogeneous Nodes [Figure: the same system, with cores that differ from one another]

48 N-1 nodes to track (N not known) [Figure: Core 0's cache line with fields TAG, STATE, DATA, Sharer Count]

49 A sharer count cannot be compared with a parameterized value or used in math operations

50 Replace the sharer count with a sharer set (bit vector)
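
A small sketch of guideline #2, assuming a hypothetical `DirectoryEntry` class that is not taken from the presentation. It shows how every question a sharer count would answer can be asked of a per-node bit vector without arithmetic on, or comparison against, the number of nodes.

```python
class DirectoryEntry:
    """Directory state for one cache line, tracking sharers as a bit vector."""

    def __init__(self):
        self.sharers = 0  # bit i is set when core i holds a copy

    def add_sharer(self, core):
        # Idempotent: adding the same core twice leaves the set unchanged.
        self.sharers |= 1 << core

    def remove_sharer(self, core):
        self.sharers &= ~(1 << core)

    def is_sharer(self, core):
        return bool((self.sharers >> core) & 1)

    def has_sharers(self):
        # An emptiness test replaces "count == 0"; N never appears.
        return self.sharers != 0


entry = DirectoryEntry()
entry.add_sharer(0)
entry.add_sharer(2)
```

In the abstract model, membership of each concrete node is tracked by its own bit, which is why this form survives abstraction where a counter ranging over 0..N does not.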

52 An independent FIFO request queue is permitted by this method [Figure: Req0 to Req3 queued between the network on chip and the inclusive L2 / inclusive directory; Core 0 to Core 2]

53 Shared Request Queue: FIFO -> UNORDERED [Figure: Req0 to Req2 in a shared queue between the network on chip and the inclusive L2 / inclusive directory]

55 NO Buffer[N][N] (practical limitation) [Figure: cores sending Ack(s) to one another]

56 L2 (directory) collects all invalidation Acks [Figure: GetM arrives at the inclusive L2 / inclusive directory over the network on chip; Acks return to L2]

57 L2 sends aggregate Ack to Core 0 [Figure: GetM answered with Ack + data]
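
A hedged sketch of guideline #4 in action: instead of each requester buffering Acks from every other core (a Buffer[N][N] structure), the L2 collects invalidation Acks and forwards a single aggregate response. The `L2Directory` class, its message tuples, and the single in-flight GetM restriction are assumptions for illustration only.

```python
class L2Directory:
    """Collects invalidation Acks at L2 for one cache line."""

    def __init__(self):
        self.sharers = set()
        self.pending = None   # (requester, cores whose Ack is outstanding)
        self.outbox = []      # messages L2 sends, as (kind, destination)

    def on_getm(self, requester):
        if self.pending is not None:
            raise RuntimeError("sketch handles one in-flight GetM only")
        waiting = self.sharers - {requester}
        for core in sorted(waiting):
            self.outbox.append(("Inv", core))
        self.pending = (requester, waiting)
        self._maybe_finish()

    def on_inv_ack(self, core):
        if self.pending is None:
            return  # no GetM in flight; ignore stray Acks in this sketch
        self.pending[1].discard(core)
        self._maybe_finish()

    def _maybe_finish(self):
        requester, waiting = self.pending
        if not waiting:
            # One aggregate message replaces per-sharer Acks to the requester.
            self.outbox.append(("Ack+Data", requester))
            self.sharers = {requester}
            self.pending = None


l2 = L2Directory()
l2.sharers = {1, 2}
l2.on_getm(0)
```

Only L2 tracks outstanding Acks, and it does so with a set indexed by node, so no structure is sized by the number of nodes in two dimensions.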

59 Optimizations (OP-MOESI)

60 Optimization | Compatible with Simple-PV? | Impact
Adding Exclusive state | Y | No impact
Adding Owned state | Y | Add 2 lemmas
Adding Self-Upgrade | Y | Add lemma
Adding Silent Evictions | Y | Add lemma
Removing the completion messages for GetS when data response comes from L2$ | Y | Add lemma
Removing the completion messages for GetM | N | --

61 OP-MOESI to PV-MOESI: To ensure successful abstraction by Abster. For GetM: replace the response counter with a sharer set, and let L2 collect Acks and send an aggregated Ack to the requesting L1. Remove point-to-point ordering in all VCs and avoid races by adding extra messages or transient states, but without blocking L1.

62 OP-MOESI to PV-MOESI: To ensure successful verification by Murphi. Problem: multiple in-flight GetM requests without ordering. Solution: whenever L2 receives a GetM request, it blocks subsequent requests until it receives a Completion message from the requesting L1. Impact: performance decrease due to blocking at L2.
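
The blocking rule can be sketched as follows. `BlockingL2` and its method names are hypothetical; a refused request is assumed to stay in the network and be retried later, so the sketch deliberately keeps no ordered queue of stalled requests at L2.

```python
class BlockingL2:
    """L2 that serializes GetM requests for one cache line."""

    def __init__(self):
        self.busy_with = None   # core whose GetM is in flight, if any
        self.accepted = []      # requests L2 has started processing

    def try_accept(self, kind, core):
        """Return True if L2 takes the request, False if it must wait."""
        if self.busy_with is not None:
            return False
        self.accepted.append((kind, core))
        if kind == "GetM":
            # Block everything else until this requester sends Completion.
            self.busy_with = core
        return True

    def on_completion(self, core):
        if core != self.busy_with:
            raise ValueError("Completion from a core with no in-flight GetM")
        self.busy_with = None


l2 = BlockingL2()
first = l2.try_accept("GetM", 0)    # accepted, L2 now blocks
second = l2.try_accept("GetS", 1)   # refused until Core 0 completes
l2.on_completion(0)
third = l2.try_accept("GetS", 1)    # accepted after unblocking
```

The refused GetS in the usage lines is where the performance cost noted on the slide comes from: it waits for a round trip to the requesting L1 even though it targets a response L2 could otherwise start on.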

63 Evaluation: OP-MOESI vs. PV-MOESI

64 Runtime

65 Network Traffic Overhead

66 Performance Scalability

67 Storage Overhead [Figure: L2$ line layouts, one with TAG, STATE, DATA, SHARER SET, SHARER COUNTER and one with TAG, STATE, DATA, SHARER SET; label: NO CHANGE]. Overhead is generally negligible.

68 CONCLUSION: Designing parametrically verifiable coherence protocols is possible, provided the guidelines introduced here are followed. There are no significant performance drawbacks or storage overheads. Automation is key.

69 Debate: Is it necessary for a protocol to be parametrically verifiable? A few design corners are cut to make such PV-compliant protocols work. Is this worth it?

PVCoherence: Designing Flat Coherence Protocols for Scalable Verification. Appears in the 20th International Symposium on High Performance Computer Architecture (HPCA), Orlando, Florida, February 17-19, 2014.


More information

Atomic Coherence: Leveraging Nanophotonics to Build Race-Free Cache Coherence Protocols. Dana Vantrease, Mikko Lipasti, Nathan Binkert

Atomic Coherence: Leveraging Nanophotonics to Build Race-Free Cache Coherence Protocols. Dana Vantrease, Mikko Lipasti, Nathan Binkert Atomic Coherence: Leveraging Nanophotonics to Build Race-Free Cache Coherence Protocols Dana Vantrease, Mikko Lipasti, Nathan Binkert 1 Executive Summary Problem: Cache coherence races make protocols complicated

More information

Lecture 22: Fault Tolerance

Lecture 22: Fault Tolerance Lecture 22: Fault Tolerance Papers: Token Coherence: Decoupling Performance and Correctness, ISCA 03, Wisconsin A Low Overhead Fault Tolerant Coherence Protocol for CMP Architectures, HPCA 07, Spain Error

More information

Processor Architecture

Processor Architecture Processor Architecture Shared Memory Multiprocessors M. Schölzel The Coherence Problem s may contain local copies of the same memory address without proper coordination they work independently on their

More information

Advanced OpenMP. Lecture 3: Cache Coherency

Advanced OpenMP. Lecture 3: Cache Coherency Advanced OpenMP Lecture 3: Cache Coherency Cache coherency Main difficulty in building multiprocessor systems is the cache coherency problem. The shared memory programming model assumes that a shared variable

More information

Module 14: "Directory-based Cache Coherence" Lecture 31: "Managing Directory Overhead" Directory-based Cache Coherence: Replacement of S blocks

Module 14: Directory-based Cache Coherence Lecture 31: Managing Directory Overhead Directory-based Cache Coherence: Replacement of S blocks Directory-based Cache Coherence: Replacement of S blocks Serialization VN deadlock Starvation Overflow schemes Sparse directory Remote access cache COMA Latency tolerance Page migration Queue lock in hardware

More information

Improving the Practicality of Transactional Memory

Improving the Practicality of Transactional Memory Improving the Practicality of Transactional Memory Woongki Baek Electrical Engineering Stanford University Programming Multiprocessors Multiprocessor systems are now everywhere From embedded to datacenter

More information

CS 152 Computer Architecture and Engineering

CS 152 Computer Architecture and Engineering CS 152 Computer Architecture and Engineering Lecture 18: Directory-Based Cache Protocols John Wawrzynek EECS, University of California at Berkeley http://inst.eecs.berkeley.edu/~cs152 Administrivia 2 Recap:

More information

CCNoC: Specializing On-Chip Interconnects for Energy Efficiency in Cache-Coherent Servers

CCNoC: Specializing On-Chip Interconnects for Energy Efficiency in Cache-Coherent Servers CCNoC: Specializing On-Chip Interconnects for Energy Efficiency in Cache-Coherent Servers Stavros Volos, Ciprian Seiculescu, Boris Grot, Naser Khosro Pour, Babak Falsafi, and Giovanni De Micheli Toward

More information

Spandex: A Flexible Interface for Efficient Heterogeneous Coherence Johnathan Alsop 1, Matthew D. Sinclair 1,2, and Sarita V.

Spandex: A Flexible Interface for Efficient Heterogeneous Coherence Johnathan Alsop 1, Matthew D. Sinclair 1,2, and Sarita V. Appears in the Proceedings of ISCA 2018. Spandex: A Flexible Interface for Efficient Heterogeneous Coherence Johnathan Alsop 1, Matthew D. Sinclair 1,2, and Sarita V. Adve 1 1 University of Illinois at

More information

Performance study example ( 5.3) Performance study example

Performance study example ( 5.3) Performance study example erformance study example ( 5.3) Coherence misses: - True sharing misses - Write to a shared block - ead an invalid block - False sharing misses - ead an unmodified word in an invalidated block CI for commercial

More information

Lecture 8: Directory-Based Cache Coherence. Topics: scalable multiprocessor organizations, directory protocol design issues

Lecture 8: Directory-Based Cache Coherence. Topics: scalable multiprocessor organizations, directory protocol design issues Lecture 8: Directory-Based Cache Coherence Topics: scalable multiprocessor organizations, directory protocol design issues 1 Scalable Multiprocessors P1 P2 Pn C1 C2 Cn 1 CA1 2 CA2 n CAn Scalable interconnection

More information

Lecture 2: Snooping and Directory Protocols. Topics: Snooping wrap-up and directory implementations

Lecture 2: Snooping and Directory Protocols. Topics: Snooping wrap-up and directory implementations Lecture 2: Snooping and Directory Protocols Topics: Snooping wrap-up and directory implementations 1 Split Transaction Bus So far, we have assumed that a coherence operation (request, snoops, responses,

More information

Incoherent each cache copy behaves as an individual copy, instead of as the same memory location.

Incoherent each cache copy behaves as an individual copy, instead of as the same memory location. Cache Coherence This lesson discusses the problems and solutions for coherence. Different coherence protocols are discussed, including: MSI, MOSI, MOESI, and Directory. Each has advantages and disadvantages

More information

Scalable Cache Coherence. Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University

Scalable Cache Coherence. Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University Scalable Cache Coherence Jinkyu Jeong (jinkyu@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Hierarchical Cache Coherence Hierarchies in cache organization Multiple levels

More information

VERIFICATION OF HIERARCHICAL CACHE COHERENCE PROTOCOLS FOR FUTURISTIC PROCESSORS

VERIFICATION OF HIERARCHICAL CACHE COHERENCE PROTOCOLS FOR FUTURISTIC PROCESSORS VERIFICATION OF HIERARCHICAL CACHE COHERENCE PROTOCOLS FOR FUTURISTIC PROCESSORS by Xiaofang Chen A dissertation submitted to the faculty of The University of Utah in partial fulfillment of the requirements

More information

Productive Design of Extensible Cache Coherence Protocols!

Productive Design of Extensible Cache Coherence Protocols! P A R A L L E L C O M P U T I N G L A B O R A T O R Y Productive Design of Extensible Cache Coherence Protocols!! Henry Cook, Jonathan Bachrach,! Krste Asanovic! Par Lab Summer Retreat, Santa Cruz June

More information

Introduction. Coherency vs Consistency. Lec-11. Multi-Threading Concepts: Coherency, Consistency, and Synchronization

Introduction. Coherency vs Consistency. Lec-11. Multi-Threading Concepts: Coherency, Consistency, and Synchronization Lec-11 Multi-Threading Concepts: Coherency, Consistency, and Synchronization Coherency vs Consistency Memory coherency and consistency are major concerns in the design of shared-memory systems. Consistency

More information

Shared Memory Multiprocessors. Symmetric Shared Memory Architecture (SMP) Cache Coherence. Cache Coherence Mechanism. Interconnection Network

Shared Memory Multiprocessors. Symmetric Shared Memory Architecture (SMP) Cache Coherence. Cache Coherence Mechanism. Interconnection Network Shared Memory Multis Processor Processor Processor i Processor n Symmetric Shared Memory Architecture (SMP) cache cache cache cache Interconnection Network Main Memory I/O System Cache Coherence Cache

More information

5 Chip Multiprocessors (II) Chip Multiprocessors (ACS MPhil) Robert Mullins

5 Chip Multiprocessors (II) Chip Multiprocessors (ACS MPhil) Robert Mullins 5 Chip Multiprocessors (II) Chip Multiprocessors (ACS MPhil) Robert Mullins Overview Synchronization hardware primitives Cache Coherency Issues Coherence misses, false sharing Cache coherence and interconnects

More information

Portland State University ECE 588/688. Directory-Based Cache Coherence Protocols

Portland State University ECE 588/688. Directory-Based Cache Coherence Protocols Portland State University ECE 588/688 Directory-Based Cache Coherence Protocols Copyright by Alaa Alameldeen and Haitham Akkary 2018 Why Directory Protocols? Snooping-based protocols may not scale All

More information

Speculative Lock Elision: Enabling Highly Concurrent Multithreaded Execution

Speculative Lock Elision: Enabling Highly Concurrent Multithreaded Execution Speculative Lock Elision: Enabling Highly Concurrent Multithreaded Execution Ravi Rajwar and Jim Goodman University of Wisconsin-Madison International Symposium on Microarchitecture, Dec. 2001 Funding

More information

CS252 Spring 2017 Graduate Computer Architecture. Lecture 12: Cache Coherence

CS252 Spring 2017 Graduate Computer Architecture. Lecture 12: Cache Coherence CS252 Spring 2017 Graduate Computer Architecture Lecture 12: Cache Coherence Lisa Wu, Krste Asanovic http://inst.eecs.berkeley.edu/~cs252/sp17 WU UCB CS252 SP17 Last Time in Lecture 11 Memory Systems DRAM

More information

CS 152 Computer Architecture and Engineering. Lecture 19: Directory-Based Cache Protocols

CS 152 Computer Architecture and Engineering. Lecture 19: Directory-Based Cache Protocols CS 152 Computer Architecture and Engineering Lecture 19: Directory-Based Protocols Dr. George Michelogiannakis EECS, University of California at Berkeley CRD, Lawrence Berkeley National Laboratory http://inst.eecs.berkeley.edu/~cs152

More information

Adding Token Counting to Directory-Based Cache Coherence

Adding Token Counting to Directory-Based Cache Coherence Department of Computer & Information Science Technical Reports (CIS) University of Pennsylvania Year 2008 Adding Token Counting to -Based Cache Coherence Arun Raghavan Colin Blundell Milo M.K. Martin University

More information

ECE/CS 757: Homework 1

ECE/CS 757: Homework 1 ECE/CS 757: Homework 1 Cores and Multithreading 1. A CPU designer has to decide whether or not to add a new micoarchitecture enhancement to improve performance (ignoring power costs) of a block (coarse-grain)

More information

Robust Cache Coherence Protocol Verification with Inferno

Robust Cache Coherence Protocol Verification with Inferno Robust Coherence Protocol Verification with Inferno EECS 578 Project Report Team FLY-Bee Kong Bu Jiang Chenxi Lou Abstract - Fault tolerant architectures are emerging to guarantee the reliable functionality

More information

Cache Coherence Protocols for Chip Multiprocessors - I

Cache Coherence Protocols for Chip Multiprocessors - I Cache Coherence Protocols for Chip Multiprocessors - I John Mellor-Crummey Department of Computer Science Rice University johnmc@rice.edu COMP 522 Lecture 5 6 September 2016 Context Thus far chip multiprocessors

More information

SIGNET: NETWORK-ON-CHIP FILTERING FOR COARSE VECTOR DIRECTORIES. Natalie Enright Jerger University of Toronto

SIGNET: NETWORK-ON-CHIP FILTERING FOR COARSE VECTOR DIRECTORIES. Natalie Enright Jerger University of Toronto SIGNET: NETWORK-ON-CHIP FILTERING FOR COARSE VECTOR DIRECTORIES University of Toronto Interaction of Coherence and Network 2 Cache coherence protocol drives network-on-chip traffic Scalable coherence protocols

More information

CS 152 Computer Architecture and Engineering. Lecture 19: Directory- Based Cache Protocols. Recap: Snoopy Cache Protocols

CS 152 Computer Architecture and Engineering. Lecture 19: Directory- Based Cache Protocols. Recap: Snoopy Cache Protocols CS 152 Computer Architecture and Engineering Lecture 19: Directory- Based Protocols Krste Asanovic Electrical Engineering and Computer Sciences University of California, Berkeley hap://www.eecs.berkeley.edu/~krste

More information

TSOtool: A Program for Verifying Memory Systems Using the Memory Consistency Model

TSOtool: A Program for Verifying Memory Systems Using the Memory Consistency Model TSOtool: A Program for Verifying Memory Systems Using the Memory Consistency Model Sudheendra Hangal, Durgam Vahia, Chaiyasit Manovit, Joseph Lu and Sridhar Narayanan tsotool@sun.com ISCA-2004 Sun Microsystems

More information

Shared Symmetric Memory Systems

Shared Symmetric Memory Systems Shared Symmetric Memory Systems Computer Architecture J. Daniel García Sánchez (coordinator) David Expósito Singh Francisco Javier García Blas ARCOS Group Computer Science and Engineering Department University

More information

Module 9: Addendum to Module 6: Shared Memory Multiprocessors Lecture 17: Multiprocessor Organizations and Cache Coherence. The Lecture Contains:

Module 9: Addendum to Module 6: Shared Memory Multiprocessors Lecture 17: Multiprocessor Organizations and Cache Coherence. The Lecture Contains: The Lecture Contains: Shared Memory Multiprocessors Shared Cache Private Cache/Dancehall Distributed Shared Memory Shared vs. Private in CMPs Cache Coherence Cache Coherence: Example What Went Wrong? Implementations

More information

CSC 631: High-Performance Computer Architecture

CSC 631: High-Performance Computer Architecture CSC 631: High-Performance Computer Architecture Spring 2017 Lecture 10: Memory Part II CSC 631: High-Performance Computer Architecture 1 Two predictable properties of memory references: Temporal Locality:

More information

LC-SIM: A SIMULATION FRAMEWORK FOR EVALUATING LOCATION CONSISTENCY BASED CACHE PROTOCOLS. Pouya Fotouhi

LC-SIM: A SIMULATION FRAMEWORK FOR EVALUATING LOCATION CONSISTENCY BASED CACHE PROTOCOLS. Pouya Fotouhi LC-SIM: A SIMULATION FRAMEWORK FOR EVALUATING LOCATION CONSISTENCY BASED CACHE PROTOCOLS by Pouya Fotouhi A thesis submitted to the Faculty of the University of Delaware in partial fulfillment of the requirements

More information