A Scalable SAS Machine
- Allyson Quinn
- 5 years ago
Transcription
Parallel Computer Organization and Design: Lecture 8
Per Stenström 2008, Sally A. McKee 2009 (2/18/2009)

Slide 1: Scalable Cache Coherence
Design principles of scalable cache protocols:
- Overview of design space (8.1)
- Basic operation of directory protocols (8.2)
- Performance issues (8.3)
- Correctness issues (8.4)
- Case studies to focus on detailed issues ( )

Slide 2: Scalable SAS Machine
[Figure: nodes connected by a scalable interconnection network]
Three important design decisions:
- Scalable interconnection network
- Distributed memory organization
- Scalable cache coherence protocol

Slide 3: Directory Protocols
Snooping protocols use broadcasting and do not scale.
[Figure: N nodes connected by a scalable interconnection network]
- A directory entry is associated with each memory block
- Bookkeeping tracks which nodes have copies, along with the state of memory
- All global requests for that block are sent to its directory

Slide 4: Common Approach
SMPs form building blocks in larger systems.
[Figure: two-level organizations: (a) Snooping-snooping, (b) Snooping-directory, (c) Directory-directory, (d) Directory-snooping]
Examples:
- Convex Exemplar (directory-directory)
- SGI Origin, Sequent NUMA-Q, HAL (snooping-directory)

Slide 5: Operation of a Simple Directory Protocol 1(2)
[Figure: Local, Home, and Remote nodes on an interconnection network]
- Local node: node initiating the request
- Home node: node with the directory entry for the block
- Remote node: other node(s) involved in the transaction

Slide 6: Operation of a Simple Directory Protocol 2(2)
(a) Read miss to a block in dirty state (requestor, directory node for block, node with dirty copy):
- 1. Read request
- 2. Reply with owner identity
- 3. Read req. to owner
- 4a. Data reply
- 4b. Revision message
(b) Write miss to a block with two sharers (requestor, directory node, two sharers):
- 1. RdEx request
- 2. Reply with sharers identity
- 3a, 3b. Inval. req. to sharer
- 4a, 4b. Inval. ack
Important performance issues: number, latency, and traffic of transactions.
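
The two transactions on slide 6 can be traced with a small model. This is only a sketch under stated assumptions: the names `DirectoryEntry`, `read_miss`, and `write_miss` are hypothetical, node identifiers are plain strings, the clean-block read path (home replies with data directly) is an assumption not shown on the slide, and `write_miss` covers only the shared (non-dirty) case.

```python
from dataclasses import dataclass, field


@dataclass
class DirectoryEntry:
    """Hypothetical per-block directory state kept at the home node."""
    dirty: bool = False
    sharers: set = field(default_factory=set)  # ids of nodes holding a copy


def read_miss(entry, requestor):
    """Return the (step, source, destination, message) list for a read miss."""
    msgs = [("1", requestor, "home", "read request")]
    if entry.dirty:
        (owner,) = entry.sharers  # a dirty block has exactly one holder
        msgs += [
            ("2", "home", requestor, "reply with owner identity"),
            ("3", requestor, owner, "read req. to owner"),
            ("4a", owner, requestor, "data reply"),
            ("4b", owner, "home", "revision message"),
        ]
        entry.dirty = False
    else:
        # Assumption: for a clean block the home node supplies the data itself.
        msgs.append(("2", "home", requestor, "data reply"))
    entry.sharers.add(requestor)
    return msgs


def write_miss(entry, requestor):
    """Return the message list for a write miss to a block in shared state."""
    msgs = [
        ("1", requestor, "home", "RdEx request"),
        ("2", "home", requestor, "reply with sharers identity"),
    ]
    others = sorted(entry.sharers - {requestor})
    suffixes = [chr(ord("a") + i) for i in range(len(others))]
    for tag, sharer in zip(suffixes, others):
        msgs.append((f"3{tag}", requestor, sharer, "inval. req."))
    for tag, sharer in zip(suffixes, others):
        msgs.append((f"4{tag}", sharer, requestor, "inval. ack"))
    entry.sharers = {requestor}
    entry.dirty = True
    return msgs


if __name__ == "__main__":
    for msg in read_miss(DirectoryEntry(dirty=True, sharers={"R"}), "L"):
        print(msg)
    for msg in write_miss(DirectoryEntry(sharers={"S1", "S2"}), "L"):
        print(msg)
```

Counting the tuples gives the "number of transactions" figure from the slide: five messages for the dirty read miss, and 2 + 2k messages for a write miss with k sharers.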

Slide 7: Implementation of a Simple Protocol
[Figure: caches and memories connected by an interconnection network; each memory has a directory whose entries hold presence bits and a dirty bit]
- Full vector directory: P + 1 bits/block
- Directory entries are distributed
Scalability considerations:
- Performance: how do latency and bandwidth scale?
- Cost: how does the directory grow in size with P?

Slide 8: Performance Insights
Inherent program characteristics:
- determine whether directories provide big advantages over broadcast
- provide insights into how to organize and store directory information
Characteristics that matter:
- frequency of write misses
- how many sharers on a write miss
- how these scale
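
A full-vector entry (one presence bit per node plus a dirty bit) can be sketched as a bit mask. The class name `FullVectorEntry` and its methods are hypothetical, chosen only for illustration.

```python
class FullVectorEntry:
    """Hypothetical full bit vector directory entry for one memory block."""

    def __init__(self, num_nodes):
        self.num_nodes = num_nodes
        self.presence = 0    # bit n is set when node n holds a copy
        self.dirty = False   # set when exactly one node holds a modified copy

    def add_sharer(self, node):
        self.presence |= 1 << node

    def remove_sharer(self, node):
        self.presence &= ~(1 << node)

    def sharers(self):
        """List the node ids whose presence bit is set."""
        return [n for n in range(self.num_nodes) if (self.presence >> n) & 1]

    def bits(self):
        """Storage cost per block: one presence bit per node plus the dirty bit."""
        return self.num_nodes + 1


if __name__ == "__main__":
    entry = FullVectorEntry(num_nodes=8)
    entry.add_sharer(1)
    entry.add_sharer(5)
    print(entry.sharers(), entry.bits())
```

The `bits()` method makes the cost question from the slide concrete: the entry grows linearly with the number of nodes.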

Slide 9: Cache Invalidation Patterns
[Figure: histograms of LU and Ocean invalidation patterns; x-axis: # of invalidations, y-axis: % of shared writes]

Slide 10: Sharing Patterns: Summary
Common case: only a few sharers at a write; scales slowly with P.
- Code and read-only objects: no problem, never or rarely written
- Migratory objects: only 1-2 invalidations per write
- Mostly read objects: large but infrequent invalidations
- Frequently read/written objects: small but frequent invalidations
- Synchronization objects: low contention -> small invalidations
Implications:
- directories are useful in containing traffic (as opposed to snooping)
- techniques to reduce storage overhead can be important

Slide 11: Directory Protocol Taxonomy
Directory schemes:
- Centralized or Distributed
- How to find the source of directory information: Flat or Hierarchical
- How to locate copies: Memory-based or Cache-based
All approaches have different tradeoffs with respect to scalability considerations.

Slide 12: Centralized Directory
[Figure: nodes 1..N and a single directory on a scalable interconnection network]
- All transactions to all blocks go to a centralized directory
- May become a bottleneck
- Has only been popular for a small number of nodes

Slide 13: Hierarchical Directories
[Figure: tree of directories (DIR)]
- Extension of the snooping concept
- Bandwidth: limited at the root
- Latency: multiple directory lookups on the way
- Cost: duplication of entries, but smaller entries
Therefore, not a popular approach.

Slide 14: Flat Memory-based Schemes
Example: the simple directory protocol with a full bit vector.
Scaling of performance characteristics:
- write traffic: proportional to number of sharers
- write latency: invalidations can issue in parallel
Scaling of storage for the directory, example (assuming 64-byte lines):
- 64 nodes: 12.5% overhead
- 256 nodes: 50% overhead
- 1024 nodes: 200% overhead
Storage grows as P * M.
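
The overhead figures for 64-byte lines follow from dividing the presence bits per entry by the data bits per line. A minimal sketch, assuming the dirty bit is ignored (as the quoted percentages appear to do); the function name and the `procs_per_node` parameter are hypothetical:

```python
def full_vector_overhead(processors, line_bytes, procs_per_node=1):
    """Directory bits per block divided by data bits per block.

    Assumption: one presence bit per node and no dirty bit, which matches
    the rounded figures quoted for 64-byte lines.
    """
    presence_bits = processors // procs_per_node
    return presence_bits / (line_bytes * 8)


if __name__ == "__main__":
    for p in (64, 256, 1024):
        print(p, f"{full_vector_overhead(p, 64):.1%}")
    # Grouping processors into nodes and widening the line shrinks the ratio.
    print(f"{full_vector_overhead(256, 128, procs_per_node=4):.2%}")
```

Because the entry width grows with P and the number of entries grows with M, total directory storage scales with their product.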

Slide 15: Reducing Storage Overhead
Optimizations for full bit vector schemes:
- increase cache block size
- use multiprocessor nodes
- 256 procs, 4/cluster, 128B line: 6.25% overhead
Provide pointers to a few nodes (address the P term):
- intuition: most blocks are cached by only a few nodes
- P=1024 => 10-bit pointers, can accommodate 100 pointers
- need an overflow strategy when there are more sharers
Reducing height (address the M term):
- intuition: # memory blocks >> # cache blocks
- organize the directory as a cache, rather than one entry/block

Slide 16: Flat, Cache-based Schemes
How they work:
- home has a single pointer that points to the head of the list
- each cache has a pointer to the next sharer
- on read, the cache is linked into the list
- on write, send invalidations down the list
[Figure: Main Memory (Home) pointing to the caches of Node 0, Node 1, and Node 2]
Example: Scalable Coherent Interface (SCI), IEEE Standard.
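
The cache-based scheme can be sketched as a linked list threaded through the caches. This is a simplified illustration, not SCI itself: the list here is singly linked, a new reader is assumed to be inserted at the head, and the class and method names are hypothetical.

```python
class SharingList:
    """Hypothetical sharing list for one block in a cache-based directory."""

    def __init__(self):
        self.head = None  # the single pointer stored at the home node
        self.next = {}    # per-cache forward pointer: node -> next sharer or None

    def read(self, node):
        """Link a reading cache into the list (assumed: at the head)."""
        if node in self.next:
            return  # already a sharer
        self.next[node] = self.head
        self.head = node

    def write(self, writer):
        """Walk the list, invalidating each sharer in turn.

        Returns the order in which sharers were invalidated. The walk is
        serial, so its length tracks the number of sharers.
        """
        invalidated = []
        current = self.head
        while current is not None:
            following = self.next.pop(current)
            if current != writer:
                invalidated.append(current)
            current = following
        self.next = {writer: None}
        self.head = writer
        return invalidated


if __name__ == "__main__":
    block = SharingList()
    for node in (0, 1, 2):
        block.read(node)
    print(block.write(3))
```

The home node stores only one pointer regardless of how many caches share the block, which is where the storage advantage comes from; the price is the serial walk on a write.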

Slide 17: Scaling Properties (Cache-based)
Performance:
- Traffic on write: proportional to number of sharers
- Latency on write: proportional to number of sharers
Storage overhead: quite good scaling along both the P and M axes.
Other properties:
- good: mature, IEEE Standard, fair
- bad: complex

Slide 18: Correctness Issues
Ensure basics of coherence at the state transition level:
- lines are updated/invalidated/fetched
- correct state transitions and actions happen
Ensure ordering and serialization constraints are met:
- coherence (single location), consistency (multiple locations)
Avoid deadlock, livelock, starvation.
Problems are amplified in comparison with bus-based machines:
- multiple copies AND multiple paths through the network
- large latency makes optimizations attractive

Slide 19: Coherence Enforcement
Revisit the simple directory protocol.
[Figure: (a) Read miss to a block in dirty state and (b) Write miss to a block with two sharers, as on slide 6]
Coherence is enforced because:
- writes are serialized through the home memory module
- invalidations are serialized if there is a single path between any two nodes

Slide 20: Sequential Consistency
P1: A=1;
P2: while (A==0); B=1;
P3: while (B==0); print A;
[Figure: three nodes, each with memory and cache, on an interconnection network; A goes 0->1 and B goes 0->1, but the A=1 message to the third node is delayed while B=1 arrives]
How do we guarantee write atomicity?
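
The hazard in the three-processor example can be replayed deterministically. The sketch below is a toy model, not a protocol implementation: each processor's visible values are kept in a dictionary, and the hypothetical flag `delay_a_to_p3` stands for the network delaying the effect of the write to A at the third processor.

```python
def run(delay_a_to_p3):
    """Replay P1: A=1 / P2: while(A==0); B=1 / P3: while(B==0); print A.

    Returns the value of A that P3 prints.
    """
    view = {p: {"A": 0, "B": 0} for p in ("P1", "P2", "P3")}

    # P1 writes A. The new value reaches P2 at once, P3 possibly later.
    view["P1"]["A"] = 1
    view["P2"]["A"] = 1
    if not delay_a_to_p3:
        view["P3"]["A"] = 1

    # P2 sees A == 1, leaves its spin loop, and writes B, visible everywhere.
    if view["P2"]["A"] == 1:
        for p in view:
            view[p]["B"] = 1

    # P3 sees B == 1, leaves its spin loop, and reads its current view of A.
    printed = view["P3"]["A"] if view["P3"]["B"] == 1 else None

    # The delayed message finally arrives, too late to affect the print.
    view["P3"]["A"] = 1
    return printed


if __name__ == "__main__":
    print("atomic write:", run(delay_a_to_p3=False))
    print("delayed write:", run(delay_a_to_p3=True))
```

With the delay, P3 observes the new B but the old A, an outcome sequential consistency forbids; this is the situation write atomicity is meant to rule out.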

Slide 21: Enforcing Write Atomicity with the Simple Protocol
[Figure: (a) Read miss to a block in dirty state and (b) Write miss to a block with two sharers, as on slide 6]
The requestor may not issue another global transaction until all invalidations have been acknowledged.

Slide 22: Deadlock, Livelock, Starvation
Request-response protocol. Similar issues to those discussed earlier:
- a node may receive too many messages
- flow control can cause deadlock
- separate request and reply networks with a request-reply protocol
New problem: protocols often are not strict request-reply:
- e.g. rd-excl generates inval requests (which generate ack replies)
- other cases to reduce latency and allow concurrency
Must address livelock and starvation.
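
The stall rule on slide 21 amounts to counting outstanding acknowledgements. A minimal sketch, with a hypothetical `Requestor` class that defers later transactions in a queue (the queueing is an illustrative choice, not something the slide specifies):

```python
from collections import deque


class Requestor:
    """Hypothetical node that stalls global transactions on pending acks."""

    def __init__(self):
        self.pending_acks = 0
        self.deferred = deque()  # transactions waiting for earlier acks
        self.issued = []         # transactions actually sent, in order

    def issue(self, name, invalidations=0):
        """Try to issue a global transaction; return True if it went out."""
        if self.pending_acks > 0:
            self.deferred.append((name, invalidations))
            return False
        self.issued.append(name)
        self.pending_acks = invalidations
        return True

    def receive_ack(self):
        """Account for one invalidation ack and release deferred work."""
        if self.pending_acks == 0:
            raise RuntimeError("unexpected invalidation ack")
        self.pending_acks -= 1
        while self.pending_acks == 0 and self.deferred:
            name, invalidations = self.deferred.popleft()
            self.issued.append(name)
            self.pending_acks = invalidations


if __name__ == "__main__":
    node = Requestor()
    node.issue("write A", invalidations=2)
    node.issue("write B")  # deferred: acks for A are still outstanding
    node.receive_ack()
    node.receive_ack()     # second ack releases "write B"
    print(node.issued)
```

The second write leaves the node only after both sharers have acknowledged, so no other processor can observe it ahead of the first.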

Slide 23: Protocol Enhancements for Latency
Forwarding messages: memory-based protocols (nodes: L = local, H = home, R = remote)
- (a) Strict request-reply: 1: req, 2: reply, 3: intervention, 4a: revise, 4b: response
- (b) Intervention forwarding: 1: req, 2: intervention, 3: response, 4: reply
- (c) Reply forwarding: 1: req, 2: intervention, 3a: revise, 3b: response

Slide 24: Protocol Enhancements for Latency
Forwarding messages: cache-based protocols (nodes: H = home, S1, S2, S3 = sharers)
- (a) 1: inval, 2: ack, 3: inval, 4: ack, 5: inval, 6: ack
- (b) 1: inval, 2a: inval, 2b: ack, 3a: inval, 3b: ack, 4b: ack
- (c) 1: inval, 2: inval, 3: inval, 4: ack
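
The message labels in the two figures are enough to compare the variants. The sketch below assumes that labels sharing a number (such as 4a and 4b) travel in parallel, so the count of distinct numbers approximates the serialized network hops; the dictionary and function names are hypothetical.

```python
import re

# Message labels as they appear in the figures on slides 23 and 24.
MEMORY_BASED = {
    "strict request-reply": ["1", "2", "3", "4a", "4b"],
    "intervention forwarding": ["1", "2", "3", "4"],
    "reply forwarding": ["1", "2", "3a", "3b"],
}
CACHE_BASED = {
    "(a)": ["1", "2", "3", "4", "5", "6"],
    "(b)": ["1", "2a", "2b", "3a", "3b", "4b"],
    "(c)": ["1", "2", "3", "4"],
}


def summarize(labels):
    """Return (total messages, serialized steps) for one variant.

    Assumption: messages whose labels share a leading number overlap in time.
    """
    steps = {int(re.match(r"\d+", label).group()) for label in labels}
    return len(labels), len(steps)


if __name__ == "__main__":
    for family in (MEMORY_BASED, CACHE_BASED):
        for name, labels in family.items():
            total, depth = summarize(labels)
            print(f"{name}: {total} messages, {depth} serialized steps")
```

Under this reading, reply forwarding cuts the serialized steps from four to three, and the cache-based variants (b) and (c) shorten the six-step round-trip chain of (a) to four steps.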
CISC 662 Graduate Computer Architecture Lectures 15 and 16 - Multiprocessors and Thread-Level Parallelism Michela Taufer http://www.cis.udel.edu/~taufer/teaching/cis662f07 Powerpoint Lecture Notes from
More informationComputer Architecture
18-447 Computer Architecture CSCI-564 Advanced Computer Architecture Lecture 29: Consistency & Coherence Lecture 20: Consistency and Coherence Bo Wu Prof. Onur Mutlu Colorado Carnegie School Mellon University
More informationSIGNET: NETWORK-ON-CHIP FILTERING FOR COARSE VECTOR DIRECTORIES. Natalie Enright Jerger University of Toronto
SIGNET: NETWORK-ON-CHIP FILTERING FOR COARSE VECTOR DIRECTORIES University of Toronto Interaction of Coherence and Network 2 Cache coherence protocol drives network-on-chip traffic Scalable coherence protocols
More informationCaches. Parallel Systems. Caches - Finding blocks - Caches. Parallel Systems. Parallel Systems. Lecture 3 1. Lecture 3 2
Parallel ystems Parallel ystems Parallel ystems Outline for lecture 3 s (a quick review) hared memory multiprocessors hierarchies coherence nooping protocols» nvalidation protocols (, )» Update protocol
More informationSwitch Gear to Memory Consistency
Outline Memory consistency equential consistency Invalidation vs. update coherence protocols MI protocol tate diagrams imulation Gehringer, based on slides by Yan olihin 1 witch Gear to Memory Consistency
More informationParallel Computer Architecture Spring Distributed Shared Memory Architectures & Directory-Based Memory Coherence
Parallel Computer Architecture Spring 2018 Distributed Shared Memory Architectures & Directory-Based Memory Coherence Nikos Bellas Computer and Communications Engineering Department University of Thessaly
More informationShared Memory SMP and Cache Coherence (cont) Adapted from UCB CS252 S01, Copyright 2001 USB
Shared SMP and Cache Coherence (cont) Adapted from UCB CS252 S01, Copyright 2001 USB 1 Review: Snoopy Cache Protocol Write Invalidate Protocol: Multiple readers, single writer Write to shared data: an
More informationSuggested Readings! What makes a memory system coherent?! Lecture 27" Cache Coherency! ! Readings! ! Program order!! Sequential writes!! Causality!
1! 2! Suggested Readings! Readings!! H&P: Chapter 5.8! Could also look at material on CD referenced on p. 538 of your text! Lecture 27" Cache Coherency! 3! Processor components! Multicore processors and
More informationChapter 9 Multiprocessors
ECE200 Computer Organization Chapter 9 Multiprocessors David H. lbonesi and the University of Rochester Henk Corporaal, TU Eindhoven, Netherlands Jari Nurmi, Tampere University of Technology, Finland University
More informationEC 513 Computer Architecture
EC 513 Computer Architecture Cache Coherence - Snoopy Cache Coherence rof. Michel A. Kinsy Consistency in SMs CU-1 CU-2 A 100 Cache-1 A 100 Cache-2 CU- bus A 100 Consistency in SMs CU-1 CU-2 A 200 Cache-1
More informationCache Coherence. CMU : Parallel Computer Architecture and Programming (Spring 2012)
Cache Coherence CMU 15-418: Parallel Computer Architecture and Programming (Spring 2012) Shared memory multi-processor Processors read and write to shared variables - More precisely: processors issues
More informationShared Memory Multiprocessors
Shared Memory Multiprocessors Jesús Labarta Index 1 Shared Memory architectures............... Memory Interconnect Cache Processor Concepts? Memory Time 2 Concepts? Memory Load/store (@) Containers Time
More informationBus-Based Coherent Multiprocessors
Bus-Based Coherent Multiprocessors Lecture 13 (Chapter 7) 1 Outline Bus-based coherence Memory consistency Sequential consistency Invalidation vs. update coherence protocols Several Configurations for
More informationAleksandar Milenkovich 1
Parallel Computers Lecture 8: Multiprocessors Aleksandar Milenkovic, milenka@ece.uah.edu Electrical and Computer Engineering University of Alabama in Huntsville Definition: A parallel computer is a collection
More informationESE 545 Computer Architecture Symmetric Multiprocessors and Snoopy Cache Coherence Protocols CA SMP and cache coherence
Computer Architecture ESE 545 Computer Architecture Symmetric Multiprocessors and Snoopy Cache Coherence Protocols 1 Shared Memory Multiprocessor Memory Bus P 1 Snoopy Cache Physical Memory P 2 Snoopy
More informationMultiprocessor Systems
White Paper: Virtex-II Series R WP162 (v1.1) April 10, 2003 Multiprocessor Systems By: Jeremy Kowalczyk With the availability of the Virtex-II Pro devices containing more than one Power PC processor and
More informationModule 14: "Directory-based Cache Coherence" Lecture 31: "Managing Directory Overhead" Directory-based Cache Coherence: Replacement of S blocks
Directory-based Cache Coherence: Replacement of S blocks Serialization VN deadlock Starvation Overflow schemes Sparse directory Remote access cache COMA Latency tolerance Page migration Queue lock in hardware
More informationAleksandar Milenkovic, Electrical and Computer Engineering University of Alabama in Huntsville
Lecture 18: Multiprocessors Aleksandar Milenkovic, milenka@ece.uah.edu Electrical and Computer Engineering University of Alabama in Huntsville Parallel Computers Definition: A parallel computer is a collection
More informationIntroduction. Coherency vs Consistency. Lec-11. Multi-Threading Concepts: Coherency, Consistency, and Synchronization
Lec-11 Multi-Threading Concepts: Coherency, Consistency, and Synchronization Coherency vs Consistency Memory coherency and consistency are major concerns in the design of shared-memory systems. Consistency
More informationInterconnect Routing
Interconnect Routing store-and-forward routing switch buffers entire message before passing it on latency = [(message length / bandwidth) + fixed overhead] * # hops wormhole routing pipeline message through
More informationPlatforms Design Challenges with many cores
latforms Design hallenges with many cores Raj Yavatkar, Intel Fellow Director, Systems Technology Lab orporate Technology Group 1 Environmental Trends: ell 2 *Other names and brands may be claimed as the
More informationModule 9: Addendum to Module 6: Shared Memory Multiprocessors Lecture 17: Multiprocessor Organizations and Cache Coherence. The Lecture Contains:
The Lecture Contains: Shared Memory Multiprocessors Shared Cache Private Cache/Dancehall Distributed Shared Memory Shared vs. Private in CMPs Cache Coherence Cache Coherence: Example What Went Wrong? Implementations
More informationLecture 26: Multiprocessors. Today s topics: Directory-based coherence Synchronization Consistency Shared memory vs message-passing
Lecture 26: Multiprocessors Today s topics: Directory-based coherence Synchronization Consistency Shared memory vs message-passing 1 Cache Coherence Protocols Directory-based: A single location (directory)
More information