Lecture-22 (Cache Coherence Protocols) CS422-Spring
|
|
- Fay Singleton
- 5 years ago
- Views:
Transcription
1 Lecture-22 (Cache Coherence Protocols) CS422-Spring 2018
2 Single Core Core 0 Private L1 Cache Bus (Packet Scheduling) Private L2 DRAM CS422: Spring 2018 Biswabandan Panda, CSE@IITK 2
3 Multicore Core 0 Core 1 Core k Private L1 Cache Private L1 Cache Private L1 Cache Private L2 Cache Private L2 Cache Private L2 Cache Interconnect (Packet Scheduling) I Shared Last Level Cache (L3) DRAM CS422: Spring 2018 Biswabandan Panda, CSE@IITK 3
4 Cache Coherence P1 Cache Bus LLC/DRAM sum=0 CS422: Spring 2018 Biswabandan Panda, 4
5 Cache Coherence rd &sum P1 sum =0 V DRAM sum=0 CS422: Spring 2018 Biswabandan Panda, CSE@IITK 5
6 Cache Coherence P1 rd &sum sum=0 V sum=0 V DRAM sum=0 CS422: Spring 2018 Biswabandan Panda, CSE@IITK 6
7 Cache Coherence wr sum (sum = 3) P1 sum=0 3 V D sum=0 V DRAM sum=0 CS422: Spring 2018 Biswabandan Panda, CSE@IITK 7
8 Cache Coherence P1 wr &sum (sum = 7) sum=3 D sum=0 7 V D DRAM sum=0 CS422: Spring 2018 Biswabandan Panda, CSE@IITK 8
9 Cache Coherence Err! Voila rd &sum P1 print sum=3 print sum=7 sum=3 D sum=7 D DRAM sum=0 CS422: Spring 2018 Biswabandan Panda, CSE@IITK 9
10 The Problem If multiple cores cache the same block, how do they ensure they all see a consistent state? Solution: 1. Write Propagation: All writes eventually become visible to other cores. 2. Write Serialization: All cores see write to a cache line in same order. Need a protocol to ensure (1) and (2) called cache coherence protocol CS422: Spring 2018 Biswabandan Panda, CSE@IITK 10
11 Non-solutions Keeping caches coherent is software s responsibility (+) Makes microarchitect s life (my) easier (-) Makes average programmer s life (yours) much harder CS422: Spring 2018 Biswabandan Panda, CSE@IITK 11
12 Non-solutions All caches are shared between all processors + No need for coherence -- Shared cache becomes the bandwidth bottleneck -- Scalability : 100 monkeys 1 banana CS422: Spring 2018 Biswabandan Panda, CSE@IITK 12
13 Who Is Responsible? (1) ISA - What if the ISA provided a cache flush/invalidate instruction? Such as - FLUSH-LOCAL, FLUSH-GLOBAL, FLUSH-CACHE (2) Hardware - Invalidate all other copies of block say A when a core writes to it Simplifies yours job CS422: Spring 2018 Biswabandan Panda, CSE@IITK 13
14 Invalidate vs Update Invalidate Write to a cache line, and simultaneously broadcast invalidation of address to all others Other cores clear/invalidate their cache lines Update Write to a cache line, and simultaneously broadcast written data to all others Other cores update their caches if data was present CS422: Spring 2018 Biswabandan Panda, CSE@IITK 14
15 MSI Coherence Protocol Extend single valid bit per line to three states: M(odified): only one cache, memory not updated. S(hared): one or more caches, and memory copy is up-to-date I(nvalid): not present Read - Read request on bus, transitions to S Write - ReadEx request, transitions to M state CS422: Spring 2018 Biswabandan Panda, CSE@IITK 15
16 State Transitions Core Side PrRd/ PrWr/ PrRd/ M PrWr/BusRdX S PrWr/BusRdX I PrRd/BusRd CS422: Spring 2018 Biswabandan Panda, CSE@IITK 16
17 Bus Side BusRd/ M BusRd/Flush S BusRdX/Flush I BusRd/ BusRdX/ BusRdX/ CS422: Spring 2018 Biswabandan Panda, CSE@IITK 17
18 MSI State Bits Tag Data 31 0 P1 Cache Bus Snooper Snooper Snooper DRAM X=1 CS422: Spring 2018 Biswabandan Panda, CSE@IITK 18
19 MSI P1 rd &X BusRd Snooper Snooper Snooper DRAM X=1 CS422: Spring 2018 Biswabandan Panda, 19
20 MSI P1 X=1 S Snooper Snooper Snooper DRAM X=1 CS422: Spring 2018 Biswabandan Panda, CSE@IITK 20
21 MSI P1 wr &X (X=2) X=1 2 S M Snooper Snooper Snooper DRAM X=1 CS422: Spring 2018 Biswabandan Panda, CSE@IITK 21
22 MSI P1 X=2 M Snooper Snooper Snooper rd &X BusRd DRAM X=1 CS422: Spring 2018 Biswabandan Panda, CSE@IITK 22
23 MSI P1 X=2 M S X=2 S Snooper Flush Snooper Snooper DRAM X=1 2 Cancel DRAM read CS422: Spring 2018 Biswabandan Panda, CSE@IITK 23
24 MSI P1 X=2 S I X=2 3 S Snooper Snooper Snooper wr &X X=3 M BusRdX LLC X=2 CS422: Spring 2018 Biswabandan Panda, CSE@IITK 24
25 MSI P1 rd &X BusRd X=2 3 I S X=3 M Snooper Snooper Snooper Flush S DRAM X=2 3 Update DRAM as well CS422: Spring 2018 Biswabandan Panda, CSE@IITK 25
26 MSI P1 X=3 S X=3 S rd &X Snooper Snooper Snooper DRAM X=3 CS422: Spring 2018 Biswabandan Panda, CSE@IITK 26
27 MSI Proc Action State P1 State State Bus Action Data From R BusRd Mem W BusRdX Mem R BusRd P1 cache W BusRdX Mem R BusRd cache R R BusRd Mem CS422: Spring 2018 Biswabandan Panda, CSE@IITK 27
28 MSI Proc Action State P1 State State Bus Action Data From R1 S - - BusRd Mem W1 M - - BusRdX Mem R3 S - S BusRd P1 cache W3 I - M BusRdX Mem R1 S - S BusRd cache R3 S - S - - R2 S S S BusRd Mem CS422: Spring 2018 Biswabandan Panda, CSE@IITK 28
29 1. Write Propagation: All writes eventually become visible to other cores. In MSI - (Through Invalidation and flush on subsequent BusRds) 2. Write Serialization: All cores see write to a cache line in same order. In MSI Writes (BusRdX) that go to the bus appear in bus order (and handled by snoopers in bus order!) CS422: Spring 2018 Biswabandan Panda, CSE@IITK 29
30 Issues Rd, Wr sequence incurs 2 transactions even when no one sharing (e.g., serial program!) BusRd (I->S) followed by BusRdX (S->M) In general, penalizing serial programs is unacceptable CS422: Spring 2018 Biswabandan Panda, CSE@IITK 30
31 Solution - MESI Add Exclusive state: Invalid modified (dirty) shared (two or more caches may have copies) Exclusive: (only this cache has clean copy, same value as in memory) How to decide I E or I S? Need to check whether someone else has copy a separate signal on bus: wired-or line asserted in response to BusRd. Call it C = copy exists CS422: Spring 2018 Biswabandan Panda, CSE@IITK 31
32 State Transitions Processor level M E PrWr/BusRdX PrRd/BusRd(!C) PrWr/BusUpgr S PrRd/BusRd(C) I PrRd/- PrWr/- PrRd/- PrWr/- PrRd/- CS422: Spring 2018 Biswabandan Panda, CSE@IITK 32
33 Bus Level M BusRdX/Flush E BusRd/Flush BusRd/FlushOpt BusRdX/FlushOpt BusRd/FlushOpt S BusRdX/FlushOpt BusUpgr/- I BusRd/- BusRdX/- BusUpgr/- CS422: Spring 2018 Biswabandan Panda, CSE@IITK 33
34 MESI: An Example P1 Cache Bus Snooper Snooper Snooper Mem Ctrl Main Memory X=1 CS422: Spring 2018 Biswabandan Panda, 34
35 P1 rd &X BusRd Snooper Snooper Snooper Mem Ctrl X=1 CS422: Spring 2018 Biswabandan Panda, 35
36 P1 X=1 E Snooper Snooper Snooper Mem Ctrl X=1 CS422: Spring 2018 Biswabandan Panda, CSE@IITK 36
37 P1 wr &X (X=2) X=1 2 E M Snooper Snooper Snooper Mem Ctrl X=1 CS422: Spring 2018 Biswabandan Panda, CSE@IITK 37
38 P1 rd &X X=2 M Snooper Snooper Snooper BusRd Mem Ctrl X=1 CS422: Spring 2018 Biswabandan Panda, CSE@IITK 38
39 P1 X=2 M S X=2 S Snooper Snooper Snooper Flush Mem Ctrl X=1 2 CS422: Spring 2018 Biswabandan Panda, CSE@IITK 39
40 P1 X=2 S I X=2 3 S M wr &X X=3 Snooper Snooper Snooper BusUpgr Mem Ctrl X=2 Note: BusUpgr instead of BusRdX. Mem Ctrl knows it can ignore the request CS422: Spring 2018 Biswabandan Panda, CSE@IITK 40
41 P1 rd &X X=3 X=3 S I X=3 S M Snooper Snooper Snooper BusRead Mem Ctrl X=3 CS422: Spring 2018 Biswabandan Panda, CSE@IITK 41
42 P1 rd &X X=3 S X=3 S Snooper Snooper Snooper Mem Ctrl X=3 CS422: Spring 2018 Biswabandan Panda, CSE@IITK 42
43 P1 rd &X X=3 S X=3 S X=3 S Snooper Snooper Snooper FlushOpt BusRd Mem Ctrl X=3 Referred to as Cache-to-cache transfer in Illinois MESI protocol CS422: Spring 2018 Biswabandan Panda, CSE@IITK 43
44 Proc Action State P1 State State Bus Action Data From R1 E - - BusRd Mem W1 M R3 S - S BusRd/Flush P1 s cache W3 I - M BusUpgr - R1 S - S BusRd/Flush s cache R3 S - S - - R2 S S S BusRd/Flush Opt P1/ s cache CS422: Spring 2018 Biswabandan Panda, CSE@IITK 44
45 Coherence misses: misses due to accesses to blocks that were invalidated due to coherence events True sharing: misses that occur when multiple threads share the same data item (byte/word) False sharing: misses that occur when multiple threads share different data items located in a single cache block CS422: Spring 2018 Biswabandan Panda, CSE@IITK 45
46 Parameters True Sharing Misses False Sharing Misses Larger cache size increased increased Larger block/line size Larger associativity decreased unclear increased unclear CS422: Spring 2018 Biswabandan Panda, 46
47 Directory Based P P P P P $ $ $ $ $ Interconnection Network LLC C(k) C(k+1) C(k+j) 1 modified bit for each cache block in LLC/ memory 1 presence bit for each core, each cache block in LLC/ memory CS422: Spring 2018 Biswabandan Panda, CSE@IITK 47
Processor Architecture
Processor Architecture Shared Memory Multiprocessors M. Schölzel The Coherence Problem s may contain local copies of the same memory address without proper coordination they work independently on their
More informationAdvanced OpenMP. Lecture 3: Cache Coherency
Advanced OpenMP Lecture 3: Cache Coherency Cache coherency Main difficulty in building multiprocessor systems is the cache coherency problem. The shared memory programming model assumes that a shared variable
More informationMulticore Workshop. Cache Coherency. Mark Bull David Henty. EPCC, University of Edinburgh
Multicore Workshop Cache Coherency Mark Bull David Henty EPCC, University of Edinburgh Symmetric MultiProcessing 2 Each processor in an SMP has equal access to all parts of memory same latency and bandwidth
More informationCache Coherence. (Architectural Supports for Efficient Shared Memory) Mainak Chaudhuri
Cache Coherence (Architectural Supports for Efficient Shared Memory) Mainak Chaudhuri mainakc@cse.iitk.ac.in 1 Setting Agenda Software: shared address space Hardware: shared memory multiprocessors Cache
More informationComputer Architecture
18-447 Computer Architecture CSCI-564 Advanced Computer Architecture Lecture 29: Consistency & Coherence Lecture 20: Consistency and Coherence Bo Wu Prof. Onur Mutlu Colorado Carnegie School Mellon University
More informationCray XE6 Performance Workshop
Cray XE6 Performance Workshop Cache Coherency Mark Bull David Henty EPCC, University of Edinburgh ymmetric MultiProcessing Each processor in an MP has equal access to all parts of memory same latency and
More informationEN2910A: Advanced Computer Architecture Topic 05: Coherency of Memory Hierarchy
EN2910A: Advanced Computer Architecture Topic 05: Coherency of Memory Hierarchy Prof. Sherief Reda School of Engineering Brown University Material from: Parallel Computer Organization and Design by Debois,
More informationCache Coherence. Silvina Hanono Wachman Computer Science & Artificial Intelligence Lab M.I.T.
Coherence Silvina Hanono Wachman Computer Science & Artificial Intelligence Lab M.I.T. L5- Coherence Avoids Stale Data Multicores have multiple private caches for performance Need to provide the illusion
More informationSwitch Gear to Memory Consistency
Outline Memory consistency equential consistency Invalidation vs. update coherence protocols MI protocol tate diagrams imulation Gehringer, based on slides by Yan olihin 1 witch Gear to Memory Consistency
More informationShared Memory Multiprocessors
Parallel Computing Shared Memory Multiprocessors Hwansoo Han Cache Coherence Problem P 0 P 1 P 2 cache load r1 (100) load r1 (100) r1 =? r1 =? 4 cache 5 cache store b (100) 3 100: a 100: a 1 Memory 2 I/O
More informationCache Coherence. Silvina Hanono Wachman Computer Science & Artificial Intelligence Lab M.I.T.
Coherence Silvina Hanono Wachman Computer Science & Artificial Intelligence Lab M.I.T. L25-1 Coherence Avoids Stale Data Multicores have multiple private caches for performance Need to provide the illusion
More informationParallel Computer Architecture Lecture 5: Cache Coherence. Chris Craik (TA) Carnegie Mellon University
18-742 Parallel Computer Architecture Lecture 5: Cache Coherence Chris Craik (TA) Carnegie Mellon University Readings: Coherence Required for Review Papamarcos and Patel, A low-overhead coherence solution
More informationCache Coherence in Bus-Based Shared Memory Multiprocessors
Cache Coherence in Bus-Based Shared Memory Multiprocessors Shared Memory Multiprocessors Variations Cache Coherence in Shared Memory Multiprocessors A Coherent Memory System: Intuition Formal Definition
More informationCS252 Spring 2017 Graduate Computer Architecture. Lecture 12: Cache Coherence
CS252 Spring 2017 Graduate Computer Architecture Lecture 12: Cache Coherence Lisa Wu, Krste Asanovic http://inst.eecs.berkeley.edu/~cs252/sp17 WU UCB CS252 SP17 Last Time in Lecture 11 Memory Systems DRAM
More informationLecture 10: Cache Coherence: Part I. Parallel Computer Architecture and Programming CMU , Spring 2013
Lecture 10: Cache Coherence: Part I Parallel Computer Architecture and Programming Cache design review Let s say your code executes int x = 1; (Assume for simplicity x corresponds to the address 0x12345604
More informationLecture 10: Cache Coherence: Part I. Parallel Computer Architecture and Programming CMU /15-618, Spring 2015
Lecture 10: Cache Coherence: Part I Parallel Computer Architecture and Programming CMU 15-418/15-618, Spring 2015 Tunes Marble House The Knife (Silent Shout) Before starting The Knife, we were working
More informationEN2910A: Advanced Computer Architecture Topic 05: Coherency of Memory Hierarchy Prof. Sherief Reda School of Engineering Brown University
EN2910A: Advanced Computer Architecture Topic 05: Coherency of Memory Hierarchy Prof. Sherief Reda School of Engineering Brown University Material from: Parallel Computer Organization and Design by Debois,
More informationLecture 3: Snooping Protocols. Topics: snooping-based cache coherence implementations
Lecture 3: Snooping Protocols Topics: snooping-based cache coherence implementations 1 Design Issues, Optimizations When does memory get updated? demotion from modified to shared? move from modified in
More informationLecture 11: Snooping Cache Coherence: Part II. CMU : Parallel Computer Architecture and Programming (Spring 2012)
Lecture 11: Snooping Cache Coherence: Part II CMU 15-418: Parallel Computer Architecture and Programming (Spring 2012) Announcements Assignment 2 due tonight 11:59 PM - Recall 3-late day policy Assignment
More informationMemory Hierarchy in a Multiprocessor
EEC 581 Computer Architecture Multiprocessor and Coherence Department of Electrical Engineering and Computer Science Cleveland State University Hierarchy in a Multiprocessor Shared cache Fully-connected
More information4 Chip Multiprocessors (I) Chip Multiprocessors (ACS MPhil) Robert Mullins
4 Chip Multiprocessors (I) Robert Mullins Overview Coherent memory systems Introduction to cache coherency protocols Advanced cache coherency protocols, memory systems and synchronization covered in the
More informationCache Coherence. CMU : Parallel Computer Architecture and Programming (Spring 2012)
Cache Coherence CMU 15-418: Parallel Computer Architecture and Programming (Spring 2012) Shared memory multi-processor Processors read and write to shared variables - More precisely: processors issues
More informationPage 1. SMP Review. Multiprocessors. Bus Based Coherence. Bus Based Coherence. Characteristics. Cache coherence. Cache coherence
SMP Review Multiprocessors Today s topics: SMP cache coherence general cache coherence issues snooping protocols Improved interaction lots of questions warning I m going to wait for answers granted it
More informationLecture 11: Cache Coherence: Part II. Parallel Computer Architecture and Programming CMU /15-618, Spring 2015
Lecture 11: Cache Coherence: Part II Parallel Computer Architecture and Programming CMU 15-418/15-618, Spring 2015 Tunes Bang Bang (My Baby Shot Me Down) Nancy Sinatra (Kill Bill Volume 1 Soundtrack) It
More informationSuggested Readings! What makes a memory system coherent?! Lecture 27" Cache Coherency! ! Readings! ! Program order!! Sequential writes!! Causality!
1! 2! Suggested Readings! Readings!! H&P: Chapter 5.8! Could also look at material on CD referenced on p. 538 of your text! Lecture 27" Cache Coherency! 3! Processor components! Multicore processors and
More informationFall 2012 EE 6633: Architecture of Parallel Computers Lecture 4: Shared Address Multiprocessors Acknowledgement: Dave Patterson, UC Berkeley
Fall 2012 EE 6633: Architecture of Parallel Computers Lecture 4: Shared Address Multiprocessors Acknowledgement: Dave Patterson, UC Berkeley Avinash Kodi Department of Electrical Engineering & Computer
More informationModule 10: "Design of Shared Memory Multiprocessors" Lecture 20: "Performance of Coherence Protocols" MOESI protocol.
MOESI protocol Dragon protocol State transition Dragon example Design issues General issues Evaluating protocols Protocol optimizations Cache size Cache line size Impact on bus traffic Large cache line
More informationCS/ECE 757: Advanced Computer Architecture II (Parallel Computer Architecture) Symmetric Multiprocessors Part 1 (Chapter 5)
CS/ECE 757: Advanced Computer Architecture II (Parallel Computer Architecture) Symmetric Multiprocessors Part 1 (Chapter 5) Copyright 2001 Mark D. Hill University of Wisconsin-Madison Slides are derived
More informationESE 545 Computer Architecture Symmetric Multiprocessors and Snoopy Cache Coherence Protocols CA SMP and cache coherence
Computer Architecture ESE 545 Computer Architecture Symmetric Multiprocessors and Snoopy Cache Coherence Protocols 1 Shared Memory Multiprocessor Memory Bus P 1 Snoopy Cache Physical Memory P 2 Snoopy
More informationLecture 20: Multi-Cache Designs. Spring 2018 Jason Tang
Lecture 20: Multi-Cache Designs Spring 2018 Jason Tang 1 Topics Split caches Multi-level caches Multiprocessor caches 2 3 Cs of Memory Behaviors Classify all cache misses as: Compulsory Miss (also cold-start
More informationLecture 24: Multiprocessing Computer Architecture and Systems Programming ( )
Systems Group Department of Computer Science ETH Zürich Lecture 24: Multiprocessing Computer Architecture and Systems Programming (252-0061-00) Timothy Roscoe Herbstsemester 2012 Most of the rest of this
More informationMultiprocessors & Thread Level Parallelism
Multiprocessors & Thread Level Parallelism COE 403 Computer Architecture Prof. Muhamed Mudawar Computer Engineering Department King Fahd University of Petroleum and Minerals Presentation Outline Introduction
More informationLecture 1: Introduction
Lecture 1: Introduction ourse organization: 4 lectures on cache coherence and consistency 2 lectures on transactional memory 2 lectures on interconnection networks 4 lectures on caches 4 lectures on memory
More informationSnooping coherence protocols (cont.)
Snooping coherence protocols (cont.) A four-state update protocol [ 5.3.3] When there is a high degree of sharing, invalidation-based protocols perform poorly. Blocks are often invalidated, and then have
More informationLecture 7: PCM Wrap-Up, Cache coherence. Topics: handling PCM errors and writes, cache coherence intro
Lecture 7: M Wrap-Up, ache coherence Topics: handling M errors and writes, cache coherence intro 1 Optimizations for Writes (Energy, Lifetime) Read a line before writing and only write the modified bits
More informationA Basic Snooping-Based Multi-Processor Implementation
Lecture 11: A Basic Snooping-Based Multi-Processor Implementation Parallel Computer Architecture and Programming Tsinghua has its own ice cream! Wow! CMU / 清华 大学, Summer 2017 Review: MSI state transition
More informationNOW Handout Page 1. Recap. Protocol Design Space of Snooping Cache Coherent Multiprocessors. Sequential Consistency.
Recap Protocol Design Space of Snooping Cache Coherent ultiprocessors CS 28, Spring 99 David E. Culler Computer Science Division U.C. Berkeley Snooping cache coherence solve difficult problem by applying
More informationCMSC Computer Architecture Lecture 15: Memory Consistency and Synchronization. Prof. Yanjing Li University of Chicago
CMSC 22200 Computer Architecture Lecture 15: Memory Consistency and Synchronization Prof. Yanjing Li University of Chicago Administrative Stuff! Lab 5 (multi-core) " Basic requirements: out later today
More informationAn#Aside# Course#thus#far:# We#have#focused#on#parallelism:#how#to#divide#up#work#into# parallel#tasks#so#that#they#can#be#done#faster.
An#Aside# Course#thus#far:# We#have#focused#on#parallelism:#how#to#divide#up#work#into# parallel#tasks#so#that#they#can#be#done#faster.# Rest#of#the#term:# We#will#focused#on#concurrency:#how#to#correctly#and#
More informationLecture 25: Multiprocessors
Lecture 25: Multiprocessors Today s topics: Virtual memory wrap-up Snooping-based cache coherence protocol Directory-based cache coherence protocol Synchronization 1 TLB and Cache Is the cache indexed
More informationL7 Shared Memory Multiprocessors. Shared Memory Multiprocessors
L7 Shared Memory Multiprocessors Logical design and software interactions 1 Shared Memory Multiprocessors Symmetric Multiprocessors (SMPs) Symmetric access to all of main memory from any processor Dominate
More informationCSC 252: Computer Organization Spring 2018: Lecture 26
CSC 252: Computer Organization Spring 2018: Lecture 26 Instructor: Yuhao Zhu Department of Computer Science University of Rochester Action Items: Programming Assignment 4 grades out Programming Assignment
More informationCS315A Midterm Solutions
K. Olukotun Spring 05/06 Handout #14 CS315a CS315A Midterm Solutions Open Book, Open Notes, Calculator okay NO computer. (Total time = 120 minutes) Name (please print): Solutions I agree to abide by the
More informationLecture 18: Coherence Protocols. Topics: coherence protocols for symmetric and distributed shared-memory multiprocessors (Sections
Lecture 18: Coherence Protocols Topics: coherence protocols for symmetric and distributed shared-memory multiprocessors (Sections 4.2-4.4) 1 SMP/UMA/Centralized Memory Multiprocessor Main Memory I/O System
More informationSnooping-Based Cache Coherence
Lecture 10: Snooping-Based Cache Coherence Parallel Computer Architecture and Programming CMU 15-418/15-618, Spring 2017 Tunes Elle King Ex s & Oh s (Love Stuff) Once word about my code profiling skills
More informationShared Memory Architectures. Approaches to Building Parallel Machines
Shared Memory Architectures Arvind Krishnamurthy Fall 2004 Approaches to Building Parallel Machines P 1 Switch/Bus P n Scale (Interleaved) First-level $ P 1 P n $ $ (Interleaved) Main memory Shared Cache
More informationCSC/ECE 506: Architecture of Parallel Computers Program 2: Bus-Based Cache Coherence Protocols Due: Wednesday, October 25, 2017
CSC/ECE 506: Architecture of Parallel Computers Program 2: Bus-Based Cache Coherence Protocols Due: Wednesday, October 25, 2017 1. Overall Problem Description In this project, you will add new features
More informationApproaches to Building Parallel Machines. Shared Memory Architectures. Example Cache Coherence Problem. Shared Cache Architectures
Approaches to Building arallel achines Switch/Bus n Scale Shared ory Architectures (nterleaved) First-level (nterleaved) ain memory n Arvind Krishnamurthy Fall 2004 (nterleaved) ain memory Shared Cache
More informationLecture 9 Outline. Lower-Level Protocol Choices. MESI (4-state) Invalidation Protocol. MESI: Processor-Initiated Transactions
Outline protocol Dragon updatebased protocol mpact of protocol optimizations LowerLevel Protocol Choices observed in state: what transition to make? Change to : assume ll read again soon good for mostly
More informationLecture 7: PCM, Cache coherence. Topics: handling PCM errors and writes, cache coherence intro
Lecture 7: M, ache coherence Topics: handling M errors and writes, cache coherence intro 1 hase hange Memory Emerging NVM technology that can replace Flash and DRAM Much higher density; much better scalability;
More informationAdvanced Parallel Programming I
Advanced Parallel Programming I Alexander Leutgeb, RISC Software GmbH RISC Software GmbH Johannes Kepler University Linz 2016 22.09.2016 1 Levels of Parallelism RISC Software GmbH Johannes Kepler University
More informationModern CPU Architectures
Modern CPU Architectures Alexander Leutgeb, RISC Software GmbH RISC Software GmbH Johannes Kepler University Linz 2014 16.04.2014 1 Motivation for Parallelism I CPU History RISC Software GmbH Johannes
More informationEC 513 Computer Architecture
EC 513 Computer Architecture Cache Coherence - Directory Cache Coherence Prof. Michel A. Kinsy Shared Memory Multiprocessor Processor Cores Local Memories Memory Bus P 1 Snoopy Cache Physical Memory P
More informationLecture 24: Board Notes: Cache Coherency
Lecture 24: Board Notes: Cache Coherency Part A: What makes a memory system coherent? Generally, 3 qualities that must be preserved (SUGGESTIONS?) (1) Preserve program order: - A read of A by P 1 will
More informationCache Coherence. Bryan Mills, PhD. Slides provided by Rami Melhem
Cache Coherence Bryan Mills, PhD Slides provided by Rami Melhem Cache coherence Programmers have no control over caches and when they get updated. x = 2; /* initially */ y0 eventually ends up = 2 y1 eventually
More informationComputer Architecture. A Quantitative Approach, Fifth Edition. Chapter 5. Multiprocessors and Thread-Level Parallelism
Computer Architecture A Quantitative Approach, Fifth Edition Chapter 5 Multiprocessors and Thread-Level Parallelism 1 Introduction Thread-Level parallelism Have multiple program counters Uses MIMD model
More informationLecture 25: Multiprocessors. Today s topics: Snooping-based cache coherence protocol Directory-based cache coherence protocol Synchronization
Lecture 25: Multiprocessors Today s topics: Snooping-based cache coherence protocol Directory-based cache coherence protocol Synchronization 1 Snooping-Based Protocols Three states for a block: invalid,
More informationSnooping coherence protocols (cont.)
Snooping coherence protocols (cont.) A four-state update protocol [ 5.3.3] When there is a high degree of sharing, invalidation-based protocols perform poorly. Blocks are often invalidated, and then have
More informationLecture 18: Coherence and Synchronization. Topics: directory-based coherence protocols, synchronization primitives (Sections
Lecture 18: Coherence and Synchronization Topics: directory-based coherence protocols, synchronization primitives (Sections 5.1-5.5) 1 Cache Coherence Protocols Directory-based: A single location (directory)
More informationIncoherent each cache copy behaves as an individual copy, instead of as the same memory location.
Cache Coherence This lesson discusses the problems and solutions for coherence. Different coherence protocols are discussed, including: MSI, MOSI, MOESI, and Directory. Each has advantages and disadvantages
More informationLecture 13: Memory Consistency. + a Course-So-Far Review. Parallel Computer Architecture and Programming CMU , Spring 2013
Lecture 13: Memory Consistency + a Course-So-Far Review Parallel Computer Architecture and Programming Today: what you should know Understand the motivation for relaxed consistency models Understand the
More informationCOEN-4730 Computer Architecture Lecture 08 Thread Level Parallelism and Coherence
1 COEN-4730 Computer Architecture Lecture 08 Thread Level Parallelism and Coherence Cristinel Ababei Dept. of Electrical and Computer Engineering Marquette University Credits: Slides adapted from presentations
More informationLecture 23: Thread Level Parallelism -- Introduction, SMP and Snooping Cache Coherence Protocol
Lecture 23: Thread Level Parallelism -- Introduction, SMP and Snooping Cache Coherence Protocol CSE 564 Computer Architecture Summer 2017 Department of Computer Science and Engineering Yonghong Yan yan@oakland.edu
More informationShared Memory Multiprocessors
Shared Memory Multiprocessors Jesús Labarta Index 1 Shared Memory architectures............... Memory Interconnect Cache Processor Concepts? Memory Time 2 Concepts? Memory Load/store (@) Containers Time
More informationLecture 2: Snooping and Directory Protocols. Topics: Snooping wrap-up and directory implementations
Lecture 2: Snooping and Directory Protocols Topics: Snooping wrap-up and directory implementations 1 Split Transaction Bus So far, we have assumed that a coherence operation (request, snoops, responses,
More informationComputer Architecture Lecture 19: Multiprocessors, Consistency, Coherence. Prof. Onur Mutlu ETH Zürich Fall November 2017
Computer Architecture Lecture 19: Multiprocessors, Consistency, Coherence Prof. Onur Mutlu ETH Zürich Fall 2017 29 November 2017 Summary of Last Week s Lectures Memory Latency Tolerance Runahead Execution
More informationSpecial Course on Computer Architecture
Special Course on Computer Architecture #9 Simulation of Multi-Processors Hiroki Matsutani and Hideharu Amano Outline: Simulation of Multi-Processors Background [10min] Recent multi-core and many-core
More informationLecture 8: Snooping and Directory Protocols. Topics: split-transaction implementation details, directory implementations (memory- and cache-based)
Lecture 8: Snooping and Directory Protocols Topics: split-transaction implementation details, directory implementations (memory- and cache-based) 1 Split Transaction Bus So far, we have assumed that a
More informationOverview: Shared Memory Hardware. Shared Address Space Systems. Shared Address Space and Shared Memory Computers. Shared Memory Hardware
Overview: Shared Memory Hardware Shared Address Space Systems overview of shared address space systems example: cache hierarchy of the Intel Core i7 cache coherency protocols: basic ideas, invalidate and
More informationOverview: Shared Memory Hardware
Overview: Shared Memory Hardware overview of shared address space systems example: cache hierarchy of the Intel Core i7 cache coherency protocols: basic ideas, invalidate and update protocols false sharing
More informationCache Coherence: Part 1
Cache Coherence: art 1 Todd C. Mowry CS 74 October 5, Topics The Cache Coherence roblem Snoopy rotocols The Cache Coherence roblem 1 3 u =? u =? $ 4 $ 5 $ u:5 u:5 1 I/O devices u:5 u = 7 3 Memory A Coherent
More information12 Cache-Organization 1
12 Cache-Organization 1 Caches Memory, 64M, 500 cycles L1 cache 64K, 1 cycles 1-5% misses L2 cache 4M, 10 cycles 10-20% misses L3 cache 16M, 20 cycles Memory, 256MB, 500 cycles 2 Improving Miss Penalty
More informationCS377P Programming for Performance Multicore Performance Cache Coherence
CS377P Programming for Performance Multicore Performance Cache Coherence Sreepathi Pai UTCS October 26, 2015 Outline 1 Cache Coherence 2 Cache Coherence Awareness 3 Scalable Lock Design 4 Transactional
More informationCaches. Parallel Systems. Caches - Finding blocks - Caches. Parallel Systems. Parallel Systems. Lecture 3 1. Lecture 3 2
Parallel ystems Parallel ystems Parallel ystems Outline for lecture 3 s (a quick review) hared memory multiprocessors hierarchies coherence nooping protocols» nvalidation protocols (, )» Update protocol
More informationRecall: Sequential Consistency. CS 258 Parallel Computer Architecture Lecture 15. Sequential Consistency and Snoopy Protocols
CS 258 Parallel Computer Architecture Lecture 15 Sequential Consistency and Snoopy Protocols arch 17, 2008 Prof John D. Kubiatowicz http://www.cs.berkeley.edu/~kubitron/cs258 ecall: Sequential Consistency
More informationMulticore MESI Based Cache Design HAS
Multicore MESI Based Cache Design HAS Ver. 2.5.2 1. Introduction The Design is targeted at developing a DUT comprising of L1 and L2 cache systems which can be utilized for undertaking functional verification.
More informationCS 152 Computer Architecture and Engineering. Lecture 19: Directory-Based Cache Protocols
CS 152 Computer Architecture and Engineering Lecture 19: Directory-Based Cache Protocols Krste Asanovic Electrical Engineering and Computer Sciences University of California, Berkeley http://www.eecs.berkeley.edu/~krste
More informationLecture 24: Virtual Memory, Multiprocessors
Lecture 24: Virtual Memory, Multiprocessors Today s topics: Virtual memory Multiprocessors, cache coherence 1 Virtual Memory Processes deal with virtual memory they have the illusion that a very large
More informationReview: Symmetric Multiprocesors (SMP)
CS/ECE 757: Advanced Computer Architecture II (Parallel Computer Architecture) Symmetric Multiprocessors Part 2 Copyright 2005 Mark D. Hill University of Wisconsin-Madison Slides are derived from work
More informationIntroduction to Multiprocessors (Part II) Cristina Silvano Politecnico di Milano
Introduction to Multiprocessors (Part II) Cristina Silvano Politecnico di Milano Outline The problem of cache coherence Snooping protocols Directory-based protocols Prof. Cristina Silvano, Politecnico
More informationLimitations of parallel processing
Your professor du jour: Steve Gribble gribble@cs.washington.edu 323B Sieg Hall all material in this lecture in Henessey and Patterson, Chapter 8 635-640 645, 646 654-665 11/8/00 CSE 471 Multiprocessors
More informationThe need for atomicity This code sequence illustrates the need for atomicity. Explain.
Lock Implementations [ 8.1] Recall the three kinds of synchronization from Lecture 6: Point-to-point Lock Performance metrics for lock implementations Uncontended latency Traffic o Time to acquire a lock
More informationScalable Cache Coherence
Scalable Cache Coherence [ 8.1] All of the cache-coherent systems we have talked about until now have had a bus. Not only does the bus guarantee serialization of transactions; it also serves as convenient
More informationChapter 5 Thread-Level Parallelism. Abdullah Muzahid
Chapter 5 Thread-Level Parallelism Abdullah Muzahid 1 Progress Towards Multiprocessors + Rate of speed growth in uniprocessors is saturating + Modern multiple issue processors are becoming very complex
More informationFall 2015 :: CSE 610 Parallel Computer Architectures. Cache Coherence. Nima Honarmand
Cache Coherence Nima Honarmand Cache Coherence: Problem (Review) Problem arises when There are multiple physical copies of one logical location Multiple copies of each cache block (In a shared-mem system)
More informationMultiprocessors 1. Outline
Multiprocessors 1 Outline Multiprocessing Coherence Write Consistency Snooping Building Blocks Snooping protocols and examples Coherence traffic and performance on MP Directory-based protocols and examples
More informationChapter 5. Multiprocessors and Thread-Level Parallelism
Computer Architecture A Quantitative Approach, Fifth Edition Chapter 5 Multiprocessors and Thread-Level Parallelism 1 Introduction Thread-Level parallelism Have multiple program counters Uses MIMD model
More informationShared Memory Architectures. Shared Memory Multiprocessors. Caches and Cache Coherence. Cache Memories. Cache Memories Write Operation
hared Architectures hared Multiprocessors ngo ander ngo@imit.kth.se hared Multiprocessor are often used pecial Class: ymmetric Multiprocessors (MP) o ymmetric access to all of main from any processor A
More informationShared Memory SMP and Cache Coherence (cont) Adapted from UCB CS252 S01, Copyright 2001 USB
Shared SMP and Cache Coherence (cont) Adapted from UCB CS252 S01, Copyright 2001 USB 1 Review: Snoopy Cache Protocol Write Invalidate Protocol: Multiple readers, single writer Write to shared data: an
More informationPARALLEL MEMORY ARCHITECTURE
PARALLEL MEMORY ARCHITECTURE Mahdi Nazm Bojnordi Assistant Professor School of Computing University of Utah CS/ECE 6810: Computer Architecture Overview Announcement Homework 6 is due tonight n The last
More informationLecture 30: Multiprocessors Flynn Categories, Large vs. Small Scale, Cache Coherency Professor Randy H. Katz Computer Science 252 Spring 1996
Lecture 30: Multiprocessors Flynn Categories, Large vs. Small Scale, Cache Coherency Professor Randy H. Katz Computer Science 252 Spring 1996 RHK.S96 1 Flynn Categories SISD (Single Instruction Single
More informationShared Memory Multiprocessors. Symmetric Shared Memory Architecture (SMP) Cache Coherence. Cache Coherence Mechanism. Interconnection Network
Shared Memory Multis Processor Processor Processor i Processor n Symmetric Shared Memory Architecture (SMP) cache cache cache cache Interconnection Network Main Memory I/O System Cache Coherence Cache
More informationOutline. EEL 5764 Graduate Computer Architecture. Chapter 4 - Multiprocessors and TLP. Déjà vu all over again?
Outline EEL 5764 Graduate Computer Architecture Chapter 4 - Multiprocessors and TLP Ann Gordon-Ross Electrical and Computer Engineering University of Florida MP Motivation ID v. IMD v. MIMD Centralized
More informationLOCK-BASED CACHE COHERENCE PROTOCOL FOR CHIP MULTIPROCESSORS
The American University in Cairo School of Sciences and Engineering LOCK-BASED CACHE COHERENCE PROTOCOL FOR CHIP MULTIPROCESSORS A Thesis Submitted to The Computer Science Department In partial fulfillment
More informationConcurrent Preliminaries
Concurrent Preliminaries Sagi Katorza Tel Aviv University 09/12/2014 1 Outline Hardware infrastructure Hardware primitives Mutual exclusion Work sharing and termination detection Concurrent data structures
More informationChapter 5. Multiprocessors and Thread-Level Parallelism
Computer Architecture A Quantitative Approach, Fifth Edition Chapter 5 Multiprocessors and Thread-Level Parallelism 1 Introduction Thread-Level parallelism Have multiple program counters Uses MIMD model
More informationShared Symmetric Memory Systems
Shared Symmetric Memory Systems Computer Architecture J. Daniel García Sánchez (coordinator) David Expósito Singh Francisco Javier García Blas ARCOS Group Computer Science and Engineering Department University
More informationComputer Architecture and Engineering CS152 Quiz #5 April 27th, 2016 Professor George Michelogiannakis Name: <ANSWER KEY>
Computer Architecture and Engineering CS152 Quiz #5 April 27th, 2016 Professor George Michelogiannakis Name: This is a closed book, closed notes exam. 80 Minutes 19 pages Notes: Not all questions
More informationModule 9: Addendum to Module 6: Shared Memory Multiprocessors Lecture 17: Multiprocessor Organizations and Cache Coherence. The Lecture Contains:
The Lecture Contains: Shared Memory Multiprocessors Shared Cache Private Cache/Dancehall Distributed Shared Memory Shared vs. Private in CMPs Cache Coherence Cache Coherence: Example What Went Wrong? Implementations
More informationParallel Arch. Review
Parallel Arch. Review Zeljko Zilic McConnell Engineering Building Room 536 Main Points Understanding of the design and engineering of modern parallel computers Technology forces Fundamental architectural
More information