Lecture 9 Outline. Lower-Level Protocol Choices. MESI (4-state) Invalidation Protocol. MESI: Processor-Initiated Transactions
- Cuthbert Dorsey
- 6 years ago
Transcription
Outline
- MESI protocol
- Dragon update-based protocol
- Impact of protocol optimizations

Lower-Level Protocol Choices
- BusRd observed in M state: what transition to make?
- Change to S: assume I'll read again soon
  - Good for mostly-read data
- What about "migratory" data?
  - I read and write, then you read and write, then X reads and writes...
  - Thus, change to I: assume another processor will write to it (Synapse)
- Sequent Symmetry and MIT Alewife use adaptive protocols

Gehringer, based on slides by Yan Solihin

MESI (4-state) Invalidation Protocol
- Problem with the MSI protocol: a Rd, Wr sequence incurs two bus transactions even when no one is sharing (e.g., a serial program!)
  - BusRd (I -> S) followed by BusRdX or BusUpgr (S -> M)
- In general, penalizing serial programs is unacceptable
- Add an exclusive state:
  - Invalid
  - Modified (dirty)
  - Shared (two or more caches may have copies)
  - Exclusive (only this cache has a clean copy, same value as in memory)
- How to decide I -> E or I -> S?
  - Need to check whether someone else has a copy
  - "Shared" signal on bus: wired-OR line asserted in response to BusRd

MESI: Processor-Initiated Transactions
[Figure: state transition diagram; edge labels include PrRd/-, PrWr/-, PrWr/BusRdX, PrRd/BusRd(S), PrRd/BusRd(~S)]

MESI: Bus-Initiated Transactions
[Figure: state transition diagram; edge labels include BusRd/Flush, BusRdX/Flush, BusRd/Flush1, BusRdX/Flush1]

MESI State Transition Diagram
[Figure: combined processor-initiated and bus-initiated transitions]
- (S) means the shared line is asserted on a BusRd transaction
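The MESI transitions described above can be sketched as a small state machine. This is a minimal illustration, not the lecture's own code: the function names (`processor_event`, `snoop`, `bus_transactions`) are hypothetical, a write hit in S is assumed to issue BusUpgr rather than BusRdX, and clean holders are assumed to answer with the optional Flush' when cache-to-cache transfer is in use.

```python
from enum import Enum


class State(Enum):
    M = "Modified"
    E = "Exclusive"
    S = "Shared"
    I = "Invalid"


def processor_event(state, event, shared_line=False):
    """Return (next_state, bus_transaction) for a local PrRd or PrWr.

    shared_line models the wired-OR Shared signal sampled on a BusRd.
    """
    if event == "PrRd":
        if state is State.I:
            # Read miss: go to S if another cache asserts Shared, else to E.
            return (State.S if shared_line else State.E), "BusRd"
        return state, None  # read hit in M, E or S needs no bus transaction
    if event == "PrWr":
        if state is State.I:
            return State.M, "BusRdX"   # write miss: fetch the block exclusively
        if state is State.S:
            return State.M, "BusUpgr"  # data already cached: only invalidate others
        return State.M, None           # E -> M is silent; M stays M
    raise ValueError(f"unknown processor event: {event}")


def snoop(state, bus_txn):
    """Return (next_state, action) when another cache's transaction is snooped."""
    if state is State.I:
        return State.I, None
    if bus_txn == "BusUpgr":
        return State.I, None  # only reachable from S: drop the copy, no data needed
    # A dirty block must be flushed; clean holders may supply it (Flush').
    action = "Flush" if state is State.M else "Flush'"
    if bus_txn == "BusRd":
        return State.S, action
    if bus_txn == "BusRdX":
        return State.I, action
    raise ValueError(f"unknown bus transaction: {bus_txn}")


def bus_transactions(events, has_exclusive_state=True):
    """List the bus transactions one cache issues when nobody else shares the block.

    With has_exclusive_state=False every read miss lands in S, which mimics
    MSI for the purpose of counting transactions.
    """
    state, issued = State.I, []
    for event in events:
        state, txn = processor_event(state, event,
                                     shared_line=not has_exclusive_state)
        if txn is not None:
            issued.append(txn)
    return issued


print(bus_transactions(["PrRd", "PrWr"], has_exclusive_state=False))  # MSI-like
print(bus_transactions(["PrRd", "PrWr"], has_exclusive_state=True))   # MESI
```

For a serial Rd, Wr sequence the MSI-like run issues BusRd and then BusUpgr, while the MESI run issues only BusRd, which is the saving the Exclusive state is meant to provide.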
Flush vs. Flush1 (Flush' in textbook)
- Flush: mandatory
- Flush' (Flush1): happens only when cache-to-cache sharing is used, and only one cache flushes the data

Visualization
[Figure sequence: caches, bus, and main memory, stepping through MESI accesses to X]
- wr &X: one less bus request due to the Exclusive state, esp. for serial programs
Visualization (continued)
[Figure sequence: MESI accesses to X, continued, including a Flush and a BusUpgr]
- Note: BusUpgr instead of BusRdX

Example (Cache-to-Cache Transfer)
[Table: Proc Action | State (one column per cache) | Bus Action | Data From]
- Supplying the block with Flush1 is referred to as cache-to-cache transfer in the Illinois protocol
- * Data comes from memory if there is no cache-to-cache transfer
Example (Cache-to-Cache Transfer + BusUpgr)
[Table: Proc Action | State (one column per cache) | Bus Action | Data From]
- * Data comes from memory if there is no cache-to-cache transfer

Lower-Level Protocol Choices
- Who supplies data on a miss when the block is not in M state: memory or cache?
- Original Illinois MESI: cache
  - Assumes cache is faster than memory (cache-to-cache transfer)
  - Not necessarily true
- Adds complexity
  - How does memory know it should supply data? (must wait for caches)
  - Selection algorithm needed if multiple caches have valid data
- Valuable for distributed memory
  - May be cheaper to obtain from a nearby cache than from distant memory
  - Especially when constructed out of SMP nodes (Stanford DASH)

Outline
- MESI protocol
- Dragon update-based protocol
- Impact of protocol optimizations

Dragon Write-back Update Protocol
- Four states:
  - Exclusive-clean (E): I and memory have it
  - Shared clean (Sc): I, others, and maybe memory, but I'm not the owner
  - Shared modified (Sm): I and others but not memory, and I'm the owner
    - Sm and Sc can coexist in different caches, with at most one Sm
  - Modified or dirty (M): I and no one else
- On replacement: Sc can silently drop, Sm has to flush
- No invalid state
  - If in cache, cannot be invalid
  - If not present in cache, can view as being in a not-present or invalid state
- New processor events: PrRdMiss, PrWrMiss
  - Introduced to specify actions when the block is not present in the cache
- New bus transaction: BusUpd
  - Broadcasts the single word written on the bus; updates other relevant caches

Dragon State Transition Diagram
[Figure: combined state transition diagram]

Dragon: Processor-Initiated Transactions
[Figure: state transition diagram; edge labels include PrRd/-, PrWr/-, PrRdMiss/BusRd(S), PrRdMiss/BusRd(~S), PrWr/BusUpd(S), PrWr/BusUpd(~S), PrWrMiss/(BusRd(S); BusUpd)]
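The Dragon states and events listed above can be sketched in the same style. This is an illustrative reading of the state descriptions, not code from the lecture: the function names are hypothetical, states are plain strings, and the bus-side behavior (the owner in Sm or M flushing on a BusRd, and a snooped BusUpd demoting Sm to Sc) follows the standard Dragon protocol rather than anything legible in the diagram residue.

```python
NOT_PRESENT = None  # Dragon has no Invalid state; an absent block is just not cached


def dragon_processor(state, event, shared_line=False):
    """Return (next_state, [bus transactions]) for a processor event.

    States are "E", "Sc", "Sm", "M", or NOT_PRESENT.
    shared_line models the Shared signal sampled during the bus transaction.
    """
    if event == "PrRdMiss":
        return ("Sc" if shared_line else "E"), ["BusRd"]
    if event == "PrWrMiss":
        if shared_line:
            # Fetch the block, then broadcast the written word to the sharers.
            return "Sm", ["BusRd", "BusUpd"]
        return "M", ["BusRd"]
    if event == "PrRd":
        return state, []  # read hits never use the bus
    if event == "PrWr":
        if state in ("E", "M"):
            return "M", []  # no other copies exist, so the write stays local
        # Sc or Sm: broadcast the word; stay or become owner (Sm) if sharers remain.
        return ("Sm" if shared_line else "M"), ["BusUpd"]
    raise ValueError(f"unknown processor event: {event}")


def dragon_snoop(state, bus_txn):
    """Return (next_state, action) when another cache's transaction is snooped."""
    if bus_txn == "BusRd":
        if state == "E":
            return "Sc", None
        if state in ("Sm", "M"):
            return "Sm", "Flush"  # the owner supplies the data
        return state, None        # Sc or not present: nothing to do
    if bus_txn == "BusUpd" and state in ("Sc", "Sm"):
        return "Sc", "Update"     # take the new word; the writer is now the owner
    raise ValueError(f"unexpected {bus_txn} in state {state}")


# A write hit to a block that other caches still hold updates them instead of
# invalidating them, and the writer becomes the owner.
print(dragon_processor("Sc", "PrWr", shared_line=True))
print(dragon_snoop("Sm", "BusUpd"))
```

Note that no transition ever removes a block from a sharer's cache; only replacement does, which is why a later read by a sharer hits in Dragon where it would miss under an invalidation protocol.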
Dragon: Bus-Initiated Transactions
[Figure: state transition diagram; edge labels include BusUpd/Update]

Visualization
[Figure sequence: caches, bus, and main memory, stepping through Dragon accesses to X]
- wr &X: one less bus request due to the Exclusive state, esp. for serial programs
Visualization (continued)
[Figure sequence: Dragon accesses to X, continued; copies of X are held in Sc and Sm states]
- wr &X with sharers present: BusUpd is issued
- Note: BusUpd instead of BusUpgr (no invalidation is performed)
- A later read by a sharer hits; this is a miss in the MSI and MESI protocols
- Note: only the cache holding the block in Sm is responsible for cache-to-cache transfer
- Final step: a cache replaces X
Dragon Example
[Table: Proc Action | State (one column per cache) | Bus Action | Data From]
- When a cache replaces X, the owner is responsible for writing back to memory
  - vs. MSI or MESI, where writeback occurs only when the line is in M state

Lower-Level Protocol Choices
- Can the shared-modified state be eliminated?
  - Yes, if memory is updated as well on BusUpd transactions (DEC Firefly)
  - The Dragon protocol doesn't (assumes DRAM memory is slow to update)
- Should replacement of an Sc block be broadcast?
  - Would allow the last copy to go to Exclusive state and not generate updates
  - Replacement bus transaction is not in the critical path, later update may be
- Shouldn't update the local copy on a write hit before the controller gets the bus
  - Can mess up serialization
- Coherence, consistency considerations much like the write-through case
- In general, many subtle race conditions in protocols
- But first, let's illustrate quantitative assessment at the logical level

Outline
- MESI protocol
- Dragon update-based protocol
- Impact of protocol optimizations

Assessing Protocol Tradeoffs
- Methodology:
  - Use a simulator; choose parameters per earlier methodology (default 1 MB, [..]-way cache, [..]-byte block, [..] processors; [..] KB cache for some)
  - Focus on frequencies, not end performance for now
    - Transcends architectural details, but not what we're really after
  - Use an idealized memory performance model to avoid changes of reference interleaving across processors with machine parameters
    - Cheap simulation: no need to model contention

Impact of Protocol Optimizations
MESI vs. MSI (w/ BusUpgr) vs. MSI (w/ BusRdX)
[Figure: bus traffic (MB/s) for each protocol variant on Barnes, LU, Ocean, Radiosity, Radix, Raytrace, and a multiprogrammed workload split into Appl-Code, Appl-Data, OS-Code, OS-Data]
- Upgrades instead of read-exclusive helps
- Same story when working sets don't fit, for Ocean, Radix, Raytrace
Impact of Cache-Block Size
- Multiprocessors add a new kind of miss to cold, capacity, conflict
  - Coherence misses: due to invalidations
    - True sharing: write to the same word
    - False sharing: write to different words
- Reducing misses architecturally in an invalidation protocol
  - Capacity: enlarge cache; increase block size (if spatial locality)
  - Conflict: increase associativity
  - Cold and coherence: only block size
- Increasing block size has advantages and disadvantages
  - Can reduce misses if spatial locality is good
  - Can hurt too
    - Increases misses due to false sharing if spatial locality is not good
    - Increases misses due to conflicts in a fixed-size cache
    - Increases traffic due to fetching unnecessary data and due to false sharing
    - Can increase miss penalty and perhaps hit cost

Impact of Block Size on Miss Rate
- For the default problem size: vary the block/line size (in bytes)
[Figure: miss rate (%) per application and block size for Barnes, LU, Radiosity, Ocean, Radix, Raytrace; each bar is broken into Upgrade, False sharing, True sharing, Capacity, Cold]
- Decreases with larger lines: cold, capacity (due to spatial locality), true sharing (due to spatial locality)
- Increases with larger lines: false sharing
- Working set doesn't fit: impact of capacity misses is large (Ocean, Radix)

Impact of Block Size on Traffic
- Traffic affects performance indirectly through contention
[Figure: traffic per application and block size; bytes/instruction for Barnes, Radiosity, Raytrace, and Radix; bytes/flop for LU and Ocean]
- Results differ from those for miss rate: traffic almost always increases
- When working sets fit, overall traffic is still small, except for Radix
- Fixed overhead is a significant component, so total traffic is often minimized at a [..]-byte block, not smaller
- Working set doesn't fit: even a 1[..]-byte block is good for Ocean due to capacity
- ... traffic behaves in the opposite way to the data bus traffic
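The distinction between true and false sharing, and why false sharing grows with block size, can be illustrated with a toy classifier. This is a simplified sketch under stated assumptions, not the simulator used for the results above: caches are infinite (so there are no capacity or conflict misses), cold misses are not counted, and a coherence miss counts as true sharing only if the word being accessed was written by another processor since the local copy was invalidated. The function name and trace format are hypothetical.

```python
def classify_coherence_misses(trace, block_words):
    """Count true- and false-sharing misses under a simple invalidation protocol.

    trace is a list of (processor, op, word_address) with op "R" or "W".
    block_words is the block size measured in words.
    """
    holders = {}  # block -> set of processors holding a valid copy
    stale = {}    # (proc, block) -> words others wrote since proc lost its copy
    counts = {"true_sharing": 0, "false_sharing": 0}

    for proc, op, word in trace:
        block = word // block_words
        have = holders.setdefault(block, set())

        if proc not in have:
            written = stale.pop((proc, block), None)
            if written is not None:
                # The copy was invalidated earlier, so this is a coherence miss.
                kind = "true_sharing" if word in written else "false_sharing"
                counts[kind] += 1
            # written is None means a cold miss, which is not counted here.
            have.add(proc)

        if op == "W":
            # Invalidate every other valid copy of the block.
            for other in have - {proc}:
                stale[(other, block)] = set()
            have.intersection_update({proc})
            # Record the written word for everyone whose copy is invalid.
            for (p, b), words in stale.items():
                if b == block and p != proc:
                    words.add(word)

    return counts


# P0 repeatedly writes word 0 while P1 repeatedly reads the neighboring word 1.
trace = [(1, "R", 1), (0, "W", 0), (1, "R", 1), (0, "W", 0), (1, "R", 1)]
for block_words in (1, 2):
    print(block_words, classify_coherence_misses(trace, block_words))
```

With one-word blocks the two processors never touch the same block, so there are no coherence misses; with two-word blocks each of P1's later reads misses even though P1 never uses the word P0 wrote, which is the false-sharing effect that grows with larger lines.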
More informationOutline. EEL 5764 Graduate Computer Architecture. Chapter 4 - Multiprocessors and TLP. Déjà vu all over again?
Outline EEL 5764 Graduate Computer Architecture Chapter 4 - Multiprocessors and TLP Ann Gordon-Ross Electrical and Computer Engineering University of Florida MP Motivation ID v. IMD v. MIMD Centralized
More informationScalable Cache Coherence. Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University
Scalable Cache Coherence Jinkyu Jeong (jinkyu@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Hierarchical Cache Coherence Hierarchies in cache organization Multiple levels
More informationPerformance metrics for caches
Performance metrics for caches Basic performance metric: hit ratio h h = Number of memory references that hit in the cache / total number of memory references Typically h = 0.90 to 0.97 Equivalent metric:
More informationCMSC 411 Computer Systems Architecture Lecture 21 Multiprocessors 3
MS 411 omputer Systems rchitecture Lecture 21 Multiprocessors 3 Outline Review oherence Write onsistency dministrivia Snooping Building Blocks Snooping protocols and examples oherence traffic and performance
More informationESE 545 Computer Architecture Symmetric Multiprocessors and Snoopy Cache Coherence Protocols CA SMP and cache coherence
Computer Architecture ESE 545 Computer Architecture Symmetric Multiprocessors and Snoopy Cache Coherence Protocols 1 Shared Memory Multiprocessor Memory Bus P 1 Snoopy Cache Physical Memory P 2 Snoopy
More informationIntroduction. Memory Hierarchy
Introduction Why memory subsystem design is important CPU speeds increase 25%-30% per year DRAM speeds increase 2%-11% per year 1 Memory Hierarchy Levels of memory with different sizes & speeds close to
More informationLecture 11 Cache. Peng Liu.
Lecture 11 Cache Peng Liu liupeng@zju.edu.cn 1 Associative Cache Example 2 Associative Cache Example 3 Associativity Example Compare 4-block caches Direct mapped, 2-way set associative, fully associative
More informationLecture 4: Directory Protocols and TM. Topics: corner cases in directory protocols, lazy TM
Lecture 4: Directory Protocols and TM Topics: corner cases in directory protocols, lazy TM 1 Handling Reads When the home receives a read request, it looks up memory (speculative read) and directory in
More informationSISTEMI EMBEDDED. Computer Organization Memory Hierarchy, Cache Memory. Federico Baronti Last version:
SISTEMI EMBEDDED Computer Organization Memory Hierarchy, Cache Memory Federico Baronti Last version: 20160524 Ideal memory is fast, large, and inexpensive Not feasible with current memory technology, so
More informationComputer Science 432/563 Operating Systems The College of Saint Rose Spring Topic Notes: Memory Hierarchy
Computer Science 432/563 Operating Systems The College of Saint Rose Spring 2016 Topic Notes: Memory Hierarchy We will revisit a topic now that cuts across systems classes: memory hierarchies. We often
More informationIntroduction to Multiprocessors (Part II) Cristina Silvano Politecnico di Milano
Introduction to Multiprocessors (Part II) Cristina Silvano Politecnico di Milano Outline The problem of cache coherence Snooping protocols Directory-based protocols Prof. Cristina Silvano, Politecnico
More informationLimitations of parallel processing
Your professor du jour: Steve Gribble gribble@cs.washington.edu 323B Sieg Hall all material in this lecture in Henessey and Patterson, Chapter 8 635-640 645, 646 654-665 11/8/00 CSE 471 Multiprocessors
More informationIntroduction. Coherency vs Consistency. Lec-11. Multi-Threading Concepts: Coherency, Consistency, and Synchronization
Lec-11 Multi-Threading Concepts: Coherency, Consistency, and Synchronization Coherency vs Consistency Memory coherency and consistency are major concerns in the design of shared-memory systems. Consistency
More informationPage 1. Multilevel Memories (Improving performance using a little cash )
Page 1 Multilevel Memories (Improving performance using a little cash ) 1 Page 2 CPU-Memory Bottleneck CPU Memory Performance of high-speed computers is usually limited by memory bandwidth & latency Latency
More informationSuggested Readings! What makes a memory system coherent?! Lecture 27" Cache Coherency! ! Readings! ! Program order!! Sequential writes!! Causality!
1! 2! Suggested Readings! Readings!! H&P: Chapter 5.8! Could also look at material on CD referenced on p. 538 of your text! Lecture 27" Cache Coherency! 3! Processor components! Multicore processors and
More informationCache Coherence (II) Instructor: Josep Torrellas CS533. Copyright Josep Torrellas
Cache Coherence (II) Instructor: Josep Torrellas CS533 Copyright Josep Torrellas 2003 1 Sparse Directories Since total # of cache blocks in machine is much less than total # of memory blocks, most directory
More informationLecture 12. Memory Design & Caches, part 2. Christos Kozyrakis Stanford University
Lecture 12 Memory Design & Caches, part 2 Christos Kozyrakis Stanford University http://eeclass.stanford.edu/ee108b 1 Announcements HW3 is due today PA2 is available on-line today Part 1 is due on 2/27
More informationChapter 2: Memory Hierarchy Design Part 2
Chapter 2: Memory Hierarchy Design Part 2 Introduction (Section 2.1, Appendix B) Caches Review of basics (Section 2.1, Appendix B) Advanced methods (Section 2.3) Main Memory Virtual Memory Fundamental
More informationCache Performance, System Performance, and Off-Chip Bandwidth... Pick any Two
Cache Performance, System Performance, and Off-Chip Bandwidth... Pick any Two Bushra Ahsan and Mohamed Zahran Dept. of Electrical Engineering City University of New York ahsan bushra@yahoo.com mzahran@ccny.cuny.edu
More informationCSC/ECE 506: Architecture of Parallel Computers Sample Final Examination with Answers
CSC/ECE 506: Architecture of Parallel Computers Sample Final Examination with Answers This was a 180-minute open-book test. You were to answer five of the six questions. Each question was worth 20 points.
More information