CMSC Computer Architecture Lecture 15: Memory Consistency and Synchronization. Prof. Yanjing Li University of Chicago
1 CMSC Computer Architecture Lecture 15: Memory Consistency and Synchronization Prof. Yanjing Li University of Chicago
2 Administrative Stuff
- Lab 5 (multi-core)
  - Basic requirements: out later today
  - Extra credit: out later this week
  - Due: 11:59pm, Thursday, Dec. 1st
  - Two late days with penalty
- My office hours this week are canceled
3 Lecture Outline
- Cache coherence (continued)
- Memory consistency
- Synchronization
4 Parallel Computer Architecture
- Important for both computer architects and programmers
- Why do programmers need to know about parallel computer architecture?
  - They need to get parallel programs to be correct
  - They need to optimize performance in the presence of bottlenecks
5 Main Multi-Core Design Issues
- Cache coherence
  - Ensure correct operation in the presence of private caches
- Memory consistency: ordering of memory operations
  - What should the programmer expect the hardware to provide?
- Shared memory synchronization
  - Hardware support for synchronization primitives
- We will discuss the above issues
- Others
  - Shared resource management, interconnects, ...
6 Memory Coherence Discussions Continued
7 Review: Cache Coherence
- Intuition: reading the value at memory location A should return the last value written to A by any processor
- What is "last"?
- Single processor: easy; everything follows program order
- Multi-core
  - What if two processors write at the same time?
  - What if a read follows a write so closely in time that it is physically impossible to communicate the new value?
  - We need all processors to see the same write order within a single execution (the order can differ across executions)
8 Properties of Coherence
- I. Program order on any processor (von Neumann model)
- II. Write propagation: guarantee that updates will propagate
- III. Write serialization: provide a consistent global order seen by all processors (needs a global point of serialization for this store ordering)
- Check yourself: locks, barriers, etc. do not solve the coherence issue. Why?
- Aside: do uniprocessors have coherence issues?
9 Review: Snooping Cache Coherence
- Idea
  - Use a shared bus to provide a single point of serialization
  - All caches now have two ends, the processor and the bus, and they must observe and respond to both
  - All caches serve memory requests from their own processors
  - All caches also snoop the bus to see what everyone else is doing, and take actions accordingly to keep things coherent
- Protocols
  - VI, MSI, MESI, ...
- Tradeoffs
  - Simple vs. complex protocols, cache-to-cache transfer vs. memory access, update vs. invalidate protocols
10 Atomic Bus Assumption
- We assume that bus operations are atomic
  - i.e., one operation finishes before the next one can begin
  - Simple, but low throughput
- Atomic: Req 1 → delay → Response 1, then Req 2 → delay → Response 2
- Non-atomic: Req 1, Req 2, Resp 1, Resp 2
- Non-atomic → transient states
  - More complex!
11 Scalability
- Snooping cache protocols are easy to understand and implement
- Good for small scale
- But what if you would like to have a 1000-core CMP?
12 Directory Based Coherence
- Idea: a logically central directory keeps track of where the copies of each cache block reside. Caches consult this directory to ensure coherence.
- An example mechanism:
  - For each cache block in memory, store P+1 bits in the directory
    - One bit for each cache, indicating whether the block is in that cache
    - Exclusive bit: indicates that a cache has the only copy of the block and can update it without notifying others
  - On a read: set the cache's bit and arrange the supply of data
  - On a write: invalidate all caches that have the block and reset their bits
  - Have an exclusive bit associated with each block in each cache (so that the cache can update the exclusive block silently)
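The example mechanism above can be sketched in a few lines of C. This is a minimal illustration, not a complete protocol: `NUM_CACHES`, `DirEntry`, `dir_read`, and `dir_write` are hypothetical names, data transfer and write-back are left out, and the handling of a read to an exclusively held block (downgrade to shared) is an assumption the slide does not spell out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_CACHES 4   /* P: hypothetical number of caches (at most 8 here) */

/* One directory entry per memory block: P presence bits + 1 exclusive bit. */
typedef struct {
    uint8_t presence;   /* bit i set => cache i holds a copy of the block */
    bool    exclusive;  /* true => a single cache holds the only copy */
} DirEntry;

/* Read by cache c: set its presence bit. If another cache held the block
 * exclusively, the block becomes shared (assumed downgrade). */
void dir_read(DirEntry *e, int c) {
    uint8_t bit = (uint8_t)(1u << c);
    if (e->exclusive && e->presence != bit) {
        e->exclusive = false;
    }
    e->presence |= bit;
}

/* Write by cache c: invalidate every other copy and reset those bits.
 * Returns the number of invalidation messages that would be sent. */
int dir_write(DirEntry *e, int c) {
    uint8_t bit = (uint8_t)(1u << c);
    uint8_t others = (uint8_t)(e->presence & ~bit);
    int invalidations = 0;
    for (int i = 0; i < NUM_CACHES; i++) {
        if (others & (1u << i)) {
            invalidations++;
        }
    }
    e->presence = bit;      /* only the writer keeps a copy */
    e->exclusive = true;    /* writer may now update the block silently */
    return invalidations;
}
```

Note that the directory sends invalidations only to caches whose presence bit is set, which is what removes the need for a broadcast.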
13 Directory Based Coherence
14 Snooping vs. Directory Coherence
- Snooping
  + Simple
  + Miss latency (critical path) is short: request → bus transaction to mem.
  + Global serialization is easy: bus provides this already (arbitration)
  - Relies on broadcast messages being seen by all caches (in the same order) → single point of serialization (bus): not scalable
- Directory
  + Does not require broadcast to all caches
  + Much more scalable than bus
  - Adds indirection to miss latency (critical path): request → dir. → mem.
  - Requires extra storage space to track sharer sets
  - Protocols and race conditions are more complex (for high performance)
15 False Sharing
- Cache block/line: word0 | word1 | word2 | word3
- P1: ld word0, st word0, ld word0, st word0
- P2: ld word3, st word3, ld word3, st word3
16 Quick Tip to Avoid False Sharing
- DO
  - Map variables written by different processors to different cache blocks
  - Group variables written by the same processor into the same cache block
- DON'T
  - Group variables written by different processors into the same cache block
17 Which Is Better?
- Option 1: separate arrays
    int sum[NUM_PROCS];
    int product[NUM_PROCS];
    sum[mynum]++;
    product[mynum] *= 2;
- Option 2: array of structs
    typedef struct { int sum; int product; } Proc;
    Proc x[NUM_PROCS];
    x[mynum].sum++;
    x[mynum].product *= 2;
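One way to apply the false-sharing tips to the struct layout is to pad each processor's record to a full cache block, so that no two processors ever write to the same block. The sketch below assumes a 64-byte block (`CACHE_LINE_BYTES`, a hypothetical constant that should be checked against the target hardware) and uses the hypothetical names `PaddedProc` and `update`; it needs a C11 compiler for `_Alignas`.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_PROCS 4          /* hypothetical processor count */
#define CACHE_LINE_BYTES 64  /* assumed cache block size */

/* Aligning the first member to the block size rounds the struct size up to
 * one block, so x[i] and x[i+1] never share a cache block. */
typedef struct {
    _Alignas(CACHE_LINE_BYTES) int sum;
    int product;
} PaddedProc;

PaddedProc x[NUM_PROCS];

/* Each processor touches only its own block-sized record. */
void update(int mynum) {
    x[mynum].sum++;
    x[mynum].product *= 2;
}
```

The cost is memory: each record now occupies a full block even though it holds only two integers, so this layout is a trade of space for fewer coherence invalidations.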
18 Takeaway
- Cache coherence is critical for ensuring correctness
- Software-managed cache coherence is very difficult
- Hardware coherence protocols help programmers write correct and high-performance programs
  - Snooping cache protocols: VI, MSI, MESI
  - How do they work?
  - Various design decisions and tradeoffs
- Programmers, be aware of and avoid false sharing!
19 Main Multi-Core Design Issues
- Cache coherence
  - Ensure correct operation in the presence of private caches
- Memory consistency: ordering of memory operations
  - What should the programmer expect the hardware to provide?
- Shared memory synchronization
  - Hardware support for synchronization primitives
- We will discuss the above issues
- Others
  - Shared resource management, interconnects, ...
20 Memory Consistency
21 Motivational Example
- Dekker's algorithm for critical sections [Adve WRL Research Report 95]
- Can the two processors be in the critical section at the same time, given that they both obey the von Neumann model?
22 Motivational Example
- Intuition: assume P1 is in the critical section, which means Flag2 must be 0, which means P2 cannot have executed Flag2 = 1, which means P2 cannot be in the critical section. [Adve WRL Research Report 95]
23 Both Processors in Critical Section!
- Consider a store buffer (aka write buffer)
  - Remember this from OoO?
  - Can also be used with in-order execution
- [Figure: a store buffer between the processor and the cache; loads may bypass buffered stores]
24 Both Processors in Critical Section!
- Cycle 1 (A): value written into P1's store buffer; P1 thinks A is executed, but memory is not updated until cycle 51
- Cycle 1 (X): value written into P2's store buffer; P2 thinks X is executed, but memory is not updated until cycle 52
- Cycle 2 (B): P1 still sees 0 in Flag2, so it enters the critical section
- Cycle 2 (Y): P2 still sees 0 in Flag1, so it enters the critical section
- [Adve WRL Research Report 95]
25 Both Processors in Critical Section!
- What happened?
  P1's view of memory operations | P2's view of memory operations
  A (cycle 1)                    | X (cycle 1)
  B (cycle 2)                    | Y (cycle 2)
  X (cycle 52)                   | A (cycle 51)
  A appeared to happen before X  | X appeared to happen before A
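The store-buffer scenario can be replayed deterministically with a toy model. The sketch below is an illustration under stated assumptions, not a model of real hardware: each processor gets a hypothetical one-entry `StoreBuffer`, and the operations are assumed (from the slide text) to be A: Flag1 = 1 and B: test Flag2 == 0 on P1, and X: Flag2 = 1 and Y: test Flag1 == 0 on P2. The names `buffered_store`, `drain`, and `run_dekker` are made up for this example.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy one-entry store buffer: a store sits here until it is drained. */
typedef struct {
    bool pending;   /* a store is waiting to reach memory */
    int *addr;      /* destination in "memory" */
    int  value;     /* value to be written */
} StoreBuffer;

static void buffered_store(StoreBuffer *sb, int *addr, int value) {
    sb->pending = true;
    sb->addr = addr;
    sb->value = value;
}

/* Write the buffered store to memory, if there is one. */
static void drain(StoreBuffer *sb) {
    if (sb->pending) {
        *sb->addr = sb->value;
        sb->pending = false;
    }
}

/* Replays A, X, B, Y and returns how many processors enter the critical
 * section. With drain_before_load set, each store reaches memory before
 * the following load reads it (no load bypassing). */
int run_dekker(bool drain_before_load) {
    int flag1 = 0, flag2 = 0;                 /* shared "memory" */
    StoreBuffer sb1 = {false, 0, 0};
    StoreBuffer sb2 = {false, 0, 0};

    buffered_store(&sb1, &flag1, 1);          /* A: P1 sets Flag1 */
    buffered_store(&sb2, &flag2, 1);          /* X: P2 sets Flag2 */
    if (drain_before_load) {
        drain(&sb1);
        drain(&sb2);
    }

    bool p1_enters = (flag2 == 0);            /* B: P1 reads Flag2 from memory */
    bool p2_enters = (flag1 == 0);            /* Y: P2 reads Flag1 from memory */

    drain(&sb1);                              /* stores reach memory too late */
    drain(&sb2);
    return (int)p1_enters + (int)p2_enters;
}
```

Calling `run_dekker(false)` reproduces the failure (both processors enter), while `run_dekker(true)` shows that forcing each store to memory before the next load keeps at least one processor out.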
26 The Problem
- The two processors did NOT see the same order of operations to memory
- The "happened before" relationship between multiple updates to memory was inconsistent between the two processors' points of view
- As a result, each processor thought the other was not in the critical section
27 How Can We Solve The Problem?
- Idea: sequential consistency
- I. All processors see the same order of operations to memory
  - i.e., all memory operations happen in an order (called the global total order) that is consistent across all processors
- II. Within this global order, each processor's operations appear in sequential order with respect to its own operations
28 Sequentially Consistent Operation Orders
- Potential correct global orders (all are correct) [Adve WRL Research Report 95]:
  - A B X Y
  - A X B Y
  - A X Y B
  - X A B Y
  - X A Y B
  - X Y A B
- Which order (interleaving) is observed depends on implementation and dynamic latencies
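The claim that every one of these six orders is safe can be checked by brute force. The sketch below executes each interleaving against a single shared memory and counts how many let both processors into the critical section. It assumes the operations are A: Flag1 = 1, B: test Flag2 == 0, X: Flag2 = 1, Y: test Flag1 == 0, as reconstructed from the slide text; `both_enter` and `count_violations` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Runs one global order, given as a string over the letters A, B, X, Y,
 * and reports whether both processors end up in the critical section. */
bool both_enter(const char *order) {
    int flag1 = 0, flag2 = 0;
    bool p1 = false, p2 = false;
    for (const char *op = order; *op != '\0'; op++) {
        switch (*op) {
        case 'A': flag1 = 1;          break;  /* P1 store */
        case 'B': p1 = (flag2 == 0);  break;  /* P1 load and test */
        case 'X': flag2 = 1;          break;  /* P2 store */
        case 'Y': p2 = (flag1 == 0);  break;  /* P2 load and test */
        default:                      break;
        }
    }
    return p1 && p2;
}

/* Counts the sequentially consistent orders that violate mutual exclusion. */
int count_violations(void) {
    static const char *orders[] = {
        "ABXY", "AXBY", "AXYB", "XABY", "XAYB", "XYAB"
    };
    int violations = 0;
    for (size_t i = 0; i < sizeof orders / sizeof orders[0]; i++) {
        if (both_enter(orders[i])) {
            violations++;
        }
    }
    return violations;
}
```

For contrast, `both_enter("BYAX")` returns true: that order places each load before the same processor's store, which program order forbids under sequential consistency but which a store buffer with load bypassing effectively produces.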
29 The General Problem of Memory Ordering
- A contract between software and hardware specified by the ISA
  - The ISA specifies what programmers can assume about memory ordering, e.g., whether sequential consistency (or another memory consistency model) is provided
- Preserving an intuitive model (e.g., sequential consistency) simplifies the programmer's life
- But it makes the hardware designer's life difficult (limits the performance optimizations that can be used)
30 Memory Ordering in a Single Processor
- Specified by the von Neumann model
- Sequential consistency is trivially satisfied
  - Hardware executes the load and store operations in the order specified by the sequential program
  - Out-of-order execution does not change the semantics
31 Memory Ordering in a Multi-Core Design
- Each processor's memory operations are in sequential order with respect to the thread running on that processor (assume each processor obeys the von Neumann model)
- Multiple processors execute memory operations concurrently
  - Can we have incorrect execution if the order of memory operations is different from the point of view of different processors?
- How does memory ordering affect performance and ease of debugging?
32 Memory Consistency vs. Cache Coherence
- Consistency is about ordering of all memory operations from different processors (i.e., to different memory locations)
  - Global ordering of accesses to all memory locations
- Coherence is about ordering of operations from different processors to the same memory location
  - Local ordering of accesses to each cache block
33 Memory Consistency Models
34 Sequential Consistency (SC)
- Lamport, "How to Make a Multiprocessor Computer That Correctly Executes Multiprocess Programs," IEEE Transactions on Computers, 1979
- A multiprocessor system is sequentially consistent if:
  - the result of any execution is the same as if the operations of all the processors were executed in some sequential order, AND
  - the operations of each individual processor appear in this sequence in the order specified by its program
35 Another Way of Interpreting SC
- The whole system (all processors and memory) sees the same order for all four memory-operation combinations performed by any processor
  - Load → load
  - Load → store
  - Store → store
  - Store → load
36 Sequential Consistency Abstraction
- Memory is a switch that services one load or store at a time from any processor
- All processors see the currently serviced load or store at the same time
- Each processor's operations are serviced in program order
- [Figure: processors P1, P2, P3, ..., Pn connected to MEMORY]
37 Consequences of Sequential Consistency
- 1. Within the same execution, all processors see the same global order of operations to memory
  - → No correctness issue
  - → Satisfies the "happened before" intuition
- 2. Across different executions, different global orders can be observed (each of which is sequentially consistent)
  - → Debugging can still be difficult (as the order changes across runs)
38 Issues with Sequential Consistency (SC)?
- Nice abstraction for programming, intuitive
- Two issues
  - Ordering requirements are too conservative
  - Limits the aggressiveness of performance enhancement techniques
    - E.g., can't use a store buffer
39 Weaker Memory Consistency
- The ordering of operations is important when the order affects operations on shared data → i.e., when processors need to synchronize
- Relaxing sequential consistency
  - Idea: the programmer specifies regions in which memory operations do not need to be ordered
  - Memory fence instructions delineate those regions
    - All memory operations before a fence must complete before the fence is executed
    - All memory operations after the fence must wait for the fence to complete
    - Fences complete in program order
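As a rough illustration of how fences delineate regions, the sketch below passes one value from a producer to a consumer using C11 fences from `<stdatomic.h>`. It is a language-level analogy rather than an ISA fence instruction, and C11 release/acquire fences are weaker than the full fence described above; `payload`, `ready`, `produce`, and `consume` are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

int payload;                 /* ordinary shared data */
atomic_bool ready = false;   /* synchronization flag */

/* Producer: the fence keeps the payload write ordered before the flag write. */
void produce(int value) {
    payload = value;
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&ready, true, memory_order_relaxed);
}

/* Consumer: returns false if nothing has been published yet. Otherwise the
 * fence keeps the payload read ordered after the flag read. */
bool consume(int *out) {
    if (!atomic_load_explicit(&ready, memory_order_relaxed)) {
        return false;
    }
    atomic_thread_fence(memory_order_acquire);
    *out = payload;
    return true;
}
```

Between fences the compiler and hardware remain free to reorder ordinary accesses, which is the performance headroom that weaker models aim to preserve.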
40 Tradeoffs: Weaker Consistency
- Advantage
  - No need to guarantee a very strict order of memory operations
    - → Enables the hardware implementation of performance enhancement techniques to be simpler
    - → Can be higher performance than stricter ordering
- Disadvantage
  - More burden on the programmer or software (need to get the fences correct)
- Another example of the programmer-microarchitect tradeoff
41 Total Store Order (TSO)
- Remember, for sequential consistency, the whole system (all processors and memory) sees the same order for all four memory-operation combinations performed by any processor
  - Load → load, load → store, store → store, store → load
- TSO relaxes the store → load ordering requirement
  - Major benefit: a FIFO-based store buffer can be used
- Modern ISAs that use the TSO model
  - SPARC
  - Also similar to x86
42 Total Store Order (TSO) Example
- TSO allows both P1 and P2 to be in the critical section
- P2 is allowed to see B (load) before A (store)
- P1 is allowed to see Y (load) before X (store)
- How should a programmer fix Dekker's algorithm?
- [Adve WRL Research Report 95]
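One possible fix, sketched here with C11 atomics, is to place a full fence between each processor's store and its following load so that the store → load order is restored. This is an illustration under assumptions, not necessarily the answer intended in the lecture: the flag protocol is the simplified one reconstructed from the slides (it guarantees mutual exclusion but not progress), and `p1_try_enter`, `p2_try_enter`, `p1_leave`, and `p2_leave` are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

atomic_int flag1 = 0;   /* set by P1 when it wants to enter */
atomic_int flag2 = 0;   /* set by P2 when it wants to enter */

/* P1: A (store Flag1), fence, B (load Flag2). Returns true if P1 may enter. */
bool p1_try_enter(void) {
    atomic_store_explicit(&flag1, 1, memory_order_relaxed);           /* A */
    atomic_thread_fence(memory_order_seq_cst);  /* keep B from passing A */
    return atomic_load_explicit(&flag2, memory_order_relaxed) == 0;   /* B */
}

/* P2: X (store Flag2), fence, Y (load Flag1). Returns true if P2 may enter. */
bool p2_try_enter(void) {
    atomic_store_explicit(&flag2, 1, memory_order_relaxed);           /* X */
    atomic_thread_fence(memory_order_seq_cst);  /* keep Y from passing X */
    return atomic_load_explicit(&flag1, memory_order_relaxed) == 0;   /* Y */
}

/* Leaving (or giving up) clears the flag so the other side can proceed. */
void p1_leave(void) { atomic_store_explicit(&flag1, 0, memory_order_release); }
void p2_leave(void) { atomic_store_explicit(&flag2, 0, memory_order_release); }
```

With both fences in place, at least one of the two loads must observe the other processor's store, so the two calls cannot both return true in the same round.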
43 Takeaway
- To write correct parallel programs, it is crucial to understand memory consistency models
- To ensure correctness
  - DON'T rely on intuition
  - DON'T use only normal memory operations for synchronization
  - DO use special synchronization instructions provided by the ISA
    - E.g., memory fences, ACQUIRE/RELEASE pairs, etc.
- Different ISAs define different consistency models
  - Affects portability of programs
44 Main Multi-Core Design Issues
- Cache coherence
  - Ensure correct operation in the presence of private caches
- Memory consistency: ordering of memory operations
  - What should the programmer expect the hardware to provide?
- Shared memory synchronization
  - Hardware support for synchronization primitives
- We will discuss the above issues
- Others
  - Shared resource management, interconnects, ...
45 How NOT To Implement Locks
- Lock: while (lock_var == 1); lock_var = 1;
- Unlock: lock_var = 0;
- What's the problem?
  - Testing whether lock_var is 1 and setting it to 1 are not atomic
  - i.e., another processor can set lock_var to 1 in between → multiple processors acquire the lock!
46 Atomic Read & Write Instructions
- Aka read-modify-write
- Specify a memory location and a register
  - I. Value in location read into a register
  - II. Another value stored into location
  - Many variants based on what values are allowed in II
- Simple example: test&set
  - Read memory location into specified register
  - Store constant 1 into location
  - Successful if value loaded into register is 0
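A spin lock built on test&set can be sketched with the C11 `atomic_flag` type, whose `atomic_flag_test_and_set` operation atomically reads the old value and stores "set". This is a minimal sketch rather than a production lock (no backoff, no fairness), and `SpinLock`, `test_and_set`, `spin_lock`, and `spin_unlock` are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

typedef struct {
    atomic_flag held;   /* clear = free, set = taken */
} SpinLock;

/* Atomically sets the flag and returns its previous value.
 * A return value of false means the lock was free and is now ours. */
bool test_and_set(SpinLock *l) {
    return atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire);
}

/* Spin until test&set succeeds, i.e., until the old value was "clear". */
void spin_lock(SpinLock *l) {
    while (test_and_set(l)) {
        /* busy-wait: another processor holds the lock */
    }
}

/* Release the lock by clearing the flag. */
void spin_unlock(SpinLock *l) {
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```

A lock would be declared as `SpinLock l = { ATOMIC_FLAG_INIT };`. Because the read and the write happen as one indivisible operation, no other processor can slip in between the test and the set, which is exactly the gap in the `while (lock_var == 1); lock_var = 1;` version.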
More informationThe MESI State Transition Graph
Small-scale shared memory multiprocessors Semantics of the shared address space model (Ch. 5.3-5.5) Design of the M(O)ESI snoopy protocol Design of the Dragon snoopy protocol Performance issues Synchronization
More informationCoherence and Consistency
Coherence and Consistency 30 The Meaning of Programs An ISA is a programming language To be useful, programs written in it must have meaning or semantics Any sequence of instructions must have a meaning.
More informationCOEN-4730 Computer Architecture Lecture 08 Thread Level Parallelism and Coherence
1 COEN-4730 Computer Architecture Lecture 08 Thread Level Parallelism and Coherence Cristinel Ababei Dept. of Electrical and Computer Engineering Marquette University Credits: Slides adapted from presentations
More informationScalable Cache Coherence
Scalable Cache Coherence [ 8.1] All of the cache-coherent systems we have talked about until now have had a bus. Not only does the bus guarantee serialization of transactions; it also serves as convenient
More informationMultiprocessors 1. Outline
Multiprocessors 1 Outline Multiprocessing Coherence Write Consistency Snooping Building Blocks Snooping protocols and examples Coherence traffic and performance on MP Directory-based protocols and examples
More informationEECS 470. Lecture 17 Multiprocessors I. Fall 2018 Jon Beaumont
Lecture 17 Multiprocessors I Fall 2018 Jon Beaumont www.eecs.umich.edu/courses/eecs470 Slides developed in part by Profs. Falsafi, Hill, Hoe, Lipasti, Martin, Roth Shen, Smith, Sohi, and Vijaykumar of
More informationProblem Set 5 Solutions CS152 Fall 2016
Problem Set 5 Solutions CS152 Fall 2016 Problem P5.1: Sequential Consistency Problem P5.1.A Can X hold value of 4 after all three threads have completed? Please explain briefly. Yes / No C1, B1-B6, A1-A4,
More informationESE 545 Computer Architecture Symmetric Multiprocessors and Snoopy Cache Coherence Protocols CA SMP and cache coherence
Computer Architecture ESE 545 Computer Architecture Symmetric Multiprocessors and Snoopy Cache Coherence Protocols 1 Shared Memory Multiprocessor Memory Bus P 1 Snoopy Cache Physical Memory P 2 Snoopy
More informationCS3350B Computer Architecture
CS3350B Computer Architecture Winter 2015 Lecture 7.2: Multicore TLP (1) Marc Moreno Maza www.csd.uwo.ca/courses/cs3350b [Adapted from lectures on Computer Organization and Design, Patterson & Hennessy,
More informationThe Cache-Coherence Problem
The -Coherence Problem Lecture 12 (Chapter 6) 1 Outline Bus-based multiprocessors The cache-coherence problem Peterson s algorithm Coherence vs. consistency Shared vs. Distributed Memory What is the difference
More informationMultiprocessors & Thread Level Parallelism
Multiprocessors & Thread Level Parallelism COE 403 Computer Architecture Prof. Muhamed Mudawar Computer Engineering Department King Fahd University of Petroleum and Minerals Presentation Outline Introduction
More informationImportant Lessons. A Distributed Algorithm (2) Today's Lecture - Replication
Important Lessons Lamport & vector clocks both give a logical timestamps Total ordering vs. causal ordering Other issues in coordinating node activities Exclusive access to resources/data Choosing a single
More informationCS533 Concepts of Operating Systems. Jonathan Walpole
CS533 Concepts of Operating Systems Jonathan Walpole Shared Memory Consistency Models: A Tutorial Outline Concurrent programming on a uniprocessor The effect of optimizations on a uniprocessor The effect
More informationLecture 12: TM, Consistency Models. Topics: TM pathologies, sequential consistency, hw and hw/sw optimizations
Lecture 12: TM, Consistency Models Topics: TM pathologies, sequential consistency, hw and hw/sw optimizations 1 Paper on TM Pathologies (ISCA 08) LL: lazy versioning, lazy conflict detection, committing
More informationECE 669 Parallel Computer Architecture
ECE 669 Parallel Computer Architecture Lecture 18 Scalable Parallel Caches Overview ost cache protocols are more complicated than two state Snooping not effective for network-based systems Consider three
More informationMultiprocessor Cache Coherence. Chapter 5. Memory System is Coherent If... From ILP to TLP. Enforcing Cache Coherence. Multiprocessor Types
Chapter 5 Multiprocessor Cache Coherence Thread-Level Parallelism 1: read 2: read 3: write??? 1 4 From ILP to TLP Memory System is Coherent If... ILP became inefficient in terms of Power consumption Silicon
More informationLecture 7: Implementing Cache Coherence. Topics: implementation details
Lecture 7: Implementing Cache Coherence Topics: implementation details 1 Implementing Coherence Protocols Correctness and performance are not the only metrics Deadlock: a cycle of resource dependencies,
More informationCS 61C: Great Ideas in Computer Architecture. Amdahl s Law, Thread Level Parallelism
CS 61C: Great Ideas in Computer Architecture Amdahl s Law, Thread Level Parallelism Instructor: Alan Christopher 07/17/2014 Summer 2014 -- Lecture #15 1 Review of Last Lecture Flynn Taxonomy of Parallel
More informationSwitch Gear to Memory Consistency
Outline Memory consistency equential consistency Invalidation vs. update coherence protocols MI protocol tate diagrams imulation Gehringer, based on slides by Yan olihin 1 witch Gear to Memory Consistency
More informationCS/COE1541: Intro. to Computer Architecture
CS/COE1541: Intro. to Computer Architecture Multiprocessors Sangyeun Cho Computer Science Department Tilera TILE64 IBM BlueGene/L nvidia GPGPU Intel Core 2 Duo 2 Why multiprocessors? For improved latency
More informationProcessor Architecture
Processor Architecture Shared Memory Multiprocessors M. Schölzel The Coherence Problem s may contain local copies of the same memory address without proper coordination they work independently on their
More informationModule 9: "Introduction to Shared Memory Multiprocessors" Lecture 16: "Multiprocessor Organizations and Cache Coherence" Shared Memory Multiprocessors
Shared Memory Multiprocessors Shared memory multiprocessors Shared cache Private cache/dancehall Distributed shared memory Shared vs. private in CMPs Cache coherence Cache coherence: Example What went
More informationReplication. Consistency models. Replica placement Distribution protocols
Replication Motivation Consistency models Data/Client-centric consistency models Replica placement Distribution protocols Invalidate versus updates Push versus Pull Cooperation between replicas Client-centric
More informationParallel Computers. CPE 631 Session 20: Multiprocessors. Flynn s Tahonomy (1972) Why Multiprocessors?
Parallel Computers CPE 63 Session 20: Multiprocessors Department of Electrical and Computer Engineering University of Alabama in Huntsville Definition: A parallel computer is a collection of processing
More informationMultiprocessor Synchronization
Multiprocessor Systems Memory Consistency In addition, read Doeppner, 5.1 and 5.2 (Much material in this section has been freely borrowed from Gernot Heiser at UNSW and from Kevin Elphinstone) MP Memory
More informationRelaxed Memory Consistency
Relaxed Memory Consistency Jinkyu Jeong (jinkyu@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu SSE3054: Multicore Systems, Spring 2017, Jinkyu Jeong (jinkyu@skku.edu)
More informationShared memory. Caches, Cache coherence and Memory consistency models. Diego Fabregat-Traver and Prof. Paolo Bientinesi WS15/16
Shared memory Caches, Cache coherence and Memory consistency models Diego Fabregat-Traver and Prof. Paolo Bientinesi HPAC, RWTH Aachen fabregat@aices.rwth-aachen.de WS15/16 Shared memory Caches, Cache
More informationBeyond Sequential Consistency: Relaxed Memory Models
1 Beyond Sequential Consistency: Relaxed Memory Models Computer Science and Artificial Intelligence Lab M.I.T. Based on the material prepared by and Krste Asanovic 2 Beyond Sequential Consistency: Relaxed
More informationNOW Handout Page 1. Memory Consistency Model. Background for Debate on Memory Consistency Models. Multiprogrammed Uniprocessor Mem.
Memory Consistency Model Background for Debate on Memory Consistency Models CS 258, Spring 99 David E. Culler Computer Science Division U.C. Berkeley for a SAS specifies constraints on the order in which
More informationCache Coherence in Bus-Based Shared Memory Multiprocessors
Cache Coherence in Bus-Based Shared Memory Multiprocessors Shared Memory Multiprocessors Variations Cache Coherence in Shared Memory Multiprocessors A Coherent Memory System: Intuition Formal Definition
More informationIncoherent each cache copy behaves as an individual copy, instead of as the same memory location.
Cache Coherence This lesson discusses the problems and solutions for coherence. Different coherence protocols are discussed, including: MSI, MOSI, MOESI, and Directory. Each has advantages and disadvantages
More informationCS 152 Computer Architecture and Engineering. Lecture 19: Directory-Based Cache Protocols
CS 152 Computer Architecture and Engineering Lecture 19: Directory-Based Protocols Dr. George Michelogiannakis EECS, University of California at Berkeley CRD, Lawrence Berkeley National Laboratory http://inst.eecs.berkeley.edu/~cs152
More informationLecture 8: Directory-Based Cache Coherence. Topics: scalable multiprocessor organizations, directory protocol design issues
Lecture 8: Directory-Based Cache Coherence Topics: scalable multiprocessor organizations, directory protocol design issues 1 Scalable Multiprocessors P1 P2 Pn C1 C2 Cn 1 CA1 2 CA2 n CAn Scalable interconnection
More informationDistributed Shared Memory and Memory Consistency Models
Lectures on distributed systems Distributed Shared Memory and Memory Consistency Models Paul Krzyzanowski Introduction With conventional SMP systems, multiple processors execute instructions in a single
More informationCS 152 Computer Architecture and Engineering. Lecture 19: Directory-Based Cache Protocols
CS 152 Computer Architecture and Engineering Lecture 19: Directory-Based Cache Protocols Krste Asanovic Electrical Engineering and Computer Sciences University of California, Berkeley http://www.eecs.berkeley.edu/~krste
More information