CSC/ECE 506: Architecture of Parallel Computers Sample Final Examination with Answers
|
|
- Elvin Summers
- 6 years ago
- Views:
Transcription
1 CSC/ECE 506: Architecture of Parallel Computers Sample Final Examination with Answers This was a 180-minute open-book test. You were to answer five of the six questions. Each question was worth 20 points. If you answered all six questions, your five highest scores counted. Question 1. (a) (10 points) Suppose node 5 (i.e., node 01012) in a hypercube (or n cube) of dimension 4 (i.e., 16 nodes) is going to broadcast a message to all other nodes in the minimum possible time, given that a node can only send a message to one other node at a time. Show which nodes transmit the message to which other nodes at each step in the process. Use the notation i j to indicate that node number i transmits to node number j, and use decimal notation (not binary) for node numbers. For example, at some step you might write: 4 12,2 3,7 6. Note that 6 steps are not necessarily needed. Answer: Several answers are possible; the best idea is to broadcast in dimension order. By increasing dimension, this would be Step 1: 5 4. Step 2: 5 7; 4 6. Step 3: 5 1; 4 0; 7 3; 6 2. Step 4: 5 13; 4 12; 7 15; 6 14; 1 9; 0 8; 3 11; (b) (6 points) Suppose that one of the links used in step 1 of your answer to (a) failed. Could you still perform the broadcast in the same number of steps? If so, redo part (a), showing how the broadcast can be completed in the same number of steps despite the failed link. If not, explain fully why not. Answer: No. Since one node can send to only one other node per time step, in order to complete the broadcast in four time steps, the number of nodes that have the message must double at each time step. Consequently, each node that has the message must be able to send to another node in each time step. Therefore, the first node (5 in the example) must be able to send to four different nodes on the four time steps. That can t happen if the first node is connected to only three of its neighbors. (c) (4 points) What is the smallest number of links that would have to fail to prevent a broadcast from being performed at all (regardless of how long it would take)? Why? Answer: Four links. A broadcast can be performed unless one node is completely isolated. Since there are four links per node, four links would have to fail to isolate a node completely. Question 2. (2/3 point per blank) Consider a system that has 3 processors with private writeback caches and uses the SSCI protocol, where each line holds a single integer value (e.g., block size of 4 bytes). Memory contents Variable Value X 4 Y 5
2 indicates that we don t know what s in a particular cache line. Initially, the cache for each processor is empty. Fill in the blank spaces. Operation P1 P2 P3 Memory Value State Prev Next Value State Prev Next Value State Prev Next Value State Head P1: read X 4 E EM 1 P1: X = 6 6 M EM 1 P2: read X 6 S S S 2 P3: Y = 7 7 M EM 3 P2: read Y 7 S S S 2 P3: Y = 8 7 I M EM 3 P2: X = 9 6 I M EM 2 P3: X = 10 6 I I M EM 3 P3: X= 2 6 I I M EM 3 Question 3. (2 points for part (a); 3 points for all other parts) This question concerns the organization of a memory-based cache-directory scheme. Note: In doing the calculations below, be careful not to confuse bytes with bits! (1 byte = 2 3 bits) Suppose we have a multiprocessor with 1024 nodes, each of which has 4 GB (= 2 32 bytes) of main memory. Suppose that a cache line contains 32 bytes. (a) How many cache blocks are there per node? Answer: 2 31 /2 5 = 2 26, or 64 "megablocks." (b) For parts (b) to (e), assume that the directory is kept in main memory. If we could use the full bit-vector approach for organizing the cache directory, what fraction of main memory would have to be devoted to cache directories? Answer: For each block, we would need to use 1024 bits to record which of the 1024 processors contained it. This means 2 26 blocks 2 10 bits = 2 36 bits = 2 33 bytes. Woops cache directories would consume twice as much memory as we have! (c) Assume that the average block is cached in 2 nodes. If we used pointers instead of the full bit-vector approach, what fraction of memory would be devoted to the pointers? (Answer the fraction of memory that would be devoted to pointers alone, not including other data structures that would be needed to keep track of the sharers.) Answer: Each pointer is 10 bits long. Each of the 2 26 blocks requires two pointers, on average. Therefore, there will be = bits devoted to pointers, out of 2 32 bytes altogether. Thus, 5/4 1/8 = 5/32 of main memory will be devoted to pointers. (d) Is the assumption of part (c) realistic? Explain.
3 Answer: No, it is not realistic. If the average block were cached at 2 nodes, then there would have to be twice as much cache memory as main memory in the system! Therefore, the answer of part (c) is a gross overestimate of the amount of memory that would be devoted to pointers. (e) Suppose that multicore nodes are used to save memory in the full bit-vector approach. Under the assumptions of part (a), how many cores would be needed per node so that less than 1% of main memory is devoted to the directory? Answer: Easiest way is to start with the answer to part (c). With one core per node, 5/32 of main memory is required. Each time the number of cores per node doubles, the fraction of memory needed goes down by one half: 5/64 for dual-core, 5/128 for quad core, 5/256 for 8-core, and 5/512 for 16-core. So the answer is 16 cores per node. (f) Another approach is to use a cache-based scheme, like SSCI does. Suppose that the pointers point to individual processors that are caching a block (not to multicore nodes). In a processor system with 32-byte lines, what fraction of cache bits are devoted to pointers? (Again, consider only pointers and information bits, not state bits, etc.) Answer: Each cache line contains 2 10 = 20 bits of pointers and 32 8 = 256 information bits. So pointers occupy 20/ of the cache. And that s with 32-byte lines, which are about ¼ the size of the average cache line. (g) Which approach, multicore nodes or cache-based directories, is more scalable and why? Answer: The cache-based approach, because a constant fraction of the cache is used, regardless of the number of processors. With the multicore approach, as the number of processors rises, either more memory is devoted to bit-vectors, or the number of cores per chip needs to keep increasing. Question 4. (a) (3 points) In the MSI coherence prototocol, when a block transitions from state M to state S, a flush is induced. Suppose the block were not flushed to main memory at this time. Would the protocol still work? Would it still be as efficient? Answer: It wouldn t necessarily be incorrect. However, now blocks could be in state S without main memory being up to date. This means that if another processor read the block, it would have to be supplied by a cache-to-cache transfer (this typically implies an owner cache, something that the MSI protocol does not provide). And when blocks in state S were purged from the cache, they would have to be written back to main memory if dirty. A block, once dirtied, would have to be written back at least once, and possibly several times (if it ended up being shared by several caches). So there might be many more writes if the flush were not performed when it is. On the other hand, if a block transitions between states S and M several times, several flushes would be avoided. So it depends on the relative frequency of these two conditions (dirty blocks being shared vs. multiple M S M transitions). (b) (3 points) In the MESI protocol, when a block transitions from state E to state I, the diagram shows it being flushed. Where is it flushed to? Why is this necessary? Answer: If cache-to-cache sharing is in use, it is flushed out on the bus so that it can be picked up by another processor (e.g., if another processor is trying to write it). If cache-to-cache sharing is not in use, main memory can supply the data. (Note that this is indicated Flush in the diagram instead of Flush. (c) (3 points) In the Dragon protocol, a flush is used only when a bus read comes in for a block in M or Sm state. Where is data flushed to in this case and why?
4 Answer: It is flushed onto the bus so that another processor may pick it up. The processor picking it up will hold the block in state Sc. If it were not flushed, the requesting processor would have nowhere to get up-to-date data from, since main memory may be out of date. The rest of the question relates to this situation: Assume that in a four-process shared-memory multiprocessor with private, write-back caches that the following sequence of reads and writes take place. Assume that location 100 is not in any cache at the start of the sequence. 1. P0 writes location P1 reads location P2 reads location P3 writes location P3 writes location 100. Legend: Px means Processor x s cache. M means main memory. (d) (4 points) After line 2 is executed, what will be the state of memory location 100 in each of the caches? (e) (3 points) After line 4 is executed, what will be the state of memory location 100 in each of the caches? Dragon Directory-based (FBV) P0 P1 P2 P3 M P0 P1 P2 P3 Sm Sm S S S Sc Sc Sc Sm EM I I I M (f) (2 points) After line 4 is executed, will main memory have an up-to-date copy of location 100? No No (g) (2 points) How many total writes to memory are performed by each protocol on this sequence of 5 instructions? 1 1 Question 5. Suppose two processors execute the following code: P 1 P 2 1a A = 1 1b d = A 1c f = B 1d print e 2a B = 1 2b e = B 2c print d, f (a) (8 points) Which of the eight combinations of values for (a, b, c) can be printed under sequential consistency? Answer: (0, 1, 0), (1, 0, 0), (1, 0, 1), (1, 1, 0), and (1, 1, 1). The sequence 1a, 1b, 1c, 1d, 2a, 2b, 2c (1, 0, 0). The sequence 1a, 1b, 2a, 1c, 1d, 2b, 2c (1, 0, 1). The sequence 1a, 1b, 2a, 2b, 1c, 1d, 2c (1, 1, 1). The sequence 2a, 2b, 2c, 1a, 1b, 1c, 1d (0, 1, 0). The sequence 2a, 1a, 1b, 2b, 2c, 1c, 1d (1, 1, 0).
5 (b) (4 points) Choose two of the combinations that are impossible and explain why they are not possible under sequential consistency. Answer: (0, 0, 0) is not possible because one process will inevitably finish before the other, and will have set its variable(s) to 1. Therefore, the other process will print 1 as the value of these variables. (0, 0, 1) and (0, 1, 1) are not possible under SC because if c is 1, then a must have previously been set to 1. (c) (2 points) Under processor consistency, which if any additional combinations of values can be printed? How? Answer: (0, 0, 0) is possible if writes propagate very slowly, i.e., if a process sees its own writes considerably before it sees writes done by other processes. (0, 0, 1) and (0, 1, 1) are not possible for the same reason as under SC; that is, the write to a is seen everywhere before the write to c, as both writes come from the same processor. (e) (2 points) Under weak ordering, which if any additional combinations of values can be printed? How? Answer: (0, 0, 1) and (0, 1, 1) are now possible, because writes from P 1 aren t necessarily seen in order by P 2. (f) (4 points) Pick one of these models (PC or WO) that can print combinations not possible with SC. Tell where you would insert fence (SYNC) operations to make it conform to SC. Answer: Pick any one of those models, and insert SYNCs before both print statements. This will assure that all of the writes complete before any printing is done. Question 6. (3 points each, except 5 points for (e)) Given the code for shared-memory version of the Ocean simulation (Lecture 7), what could go wrong if: (a) we remove the BARRIER on line 25d? (b) we remove the BARRIER on line 25f? (c) we add a BARRIER right before lline 19? (d) we add a new LOCKDEC at line 2b and a matching LOCK/UNLOCK around the assignment to A[i,j] on line 20? (e) we remove the BARRIER at line 16a? (f) Can we safely remove the BARRIER on line 25d? Why or why not? Answers: (a) In this situation we will be modifying the done variable before all threads have finished executing. Since we are using that variable to determine if we should continue iterating over the grid, if this gets set to true because one thread finishes early (when 'diff' is still very small) then we stop iterating too soon. (b) If one thread were to reach this point (where we remove the barrier) while other threads were still executing, it would re-enter the loop and set diff = 0 which would mean that other threads still in the previous iteration would see diff as zero and would then set done = true which would cause the simulation to exit prematurely. (c) This would not affect the results of the program, but would cause significant performance
6 degradation due to the overhead of introducing a BARRIER on every iteration of the inner loop. (d) This would effectively render the parallelization useless as it would force the execution to a state resembling serial execution. The LOCK on the inner loop would mean that only one thread could update it's grid location at one time, which is really no different from a serial program. (e) This could result in the program having unpredictable results, as the diff variable, which is shared, could be arbitrarily reset to 0 at almost any point during execution until the last thread passes line 16. This would result in the program failing to converge upon the correct result. (f) The temptation would be to add an else {done=0;} to the following if statement. If done were a shared variable this could work. However in this case, this would be incorrect because the done variable is private to each thread, meaning that some threads would exit too soon.
Lecture 24: Board Notes: Cache Coherency
Lecture 24: Board Notes: Cache Coherency Part A: What makes a memory system coherent? Generally, 3 qualities that must be preserved (SUGGESTIONS?) (1) Preserve program order: - A read of A by P 1 will
More informationComputer Architecture
18-447 Computer Architecture CSCI-564 Advanced Computer Architecture Lecture 29: Consistency & Coherence Lecture 20: Consistency and Coherence Bo Wu Prof. Onur Mutlu Colorado Carnegie School Mellon University
More informationRelaxed Memory-Consistency Models
Relaxed Memory-Consistency Models Review. Why are relaxed memory-consistency models needed? How do relaxed MC models require programs to be changed? The safety net between operations whose order needs
More informationModule 9: Addendum to Module 6: Shared Memory Multiprocessors Lecture 17: Multiprocessor Organizations and Cache Coherence. The Lecture Contains:
The Lecture Contains: Shared Memory Multiprocessors Shared Cache Private Cache/Dancehall Distributed Shared Memory Shared vs. Private in CMPs Cache Coherence Cache Coherence: Example What Went Wrong? Implementations
More informationLecture 10: Cache Coherence: Part I. Parallel Computer Architecture and Programming CMU , Spring 2013
Lecture 10: Cache Coherence: Part I Parallel Computer Architecture and Programming Cache design review Let s say your code executes int x = 1; (Assume for simplicity x corresponds to the address 0x12345604
More informationComputer Architecture and Engineering CS152 Quiz #5 April 27th, 2016 Professor George Michelogiannakis Name: <ANSWER KEY>
Computer Architecture and Engineering CS152 Quiz #5 April 27th, 2016 Professor George Michelogiannakis Name: This is a closed book, closed notes exam. 80 Minutes 19 pages Notes: Not all questions
More informationLecture 13: Memory Consistency. + a Course-So-Far Review. Parallel Computer Architecture and Programming CMU , Spring 2013
Lecture 13: Memory Consistency + a Course-So-Far Review Parallel Computer Architecture and Programming Today: what you should know Understand the motivation for relaxed consistency models Understand the
More informationChapter 5. Multiprocessors and Thread-Level Parallelism
Computer Architecture A Quantitative Approach, Fifth Edition Chapter 5 Multiprocessors and Thread-Level Parallelism 1 Introduction Thread-Level parallelism Have multiple program counters Uses MIMD model
More informationLecture 11: Snooping Cache Coherence: Part II. CMU : Parallel Computer Architecture and Programming (Spring 2012)
Lecture 11: Snooping Cache Coherence: Part II CMU 15-418: Parallel Computer Architecture and Programming (Spring 2012) Announcements Assignment 2 due tonight 11:59 PM - Recall 3-late day policy Assignment
More informationCache Coherence. CMU : Parallel Computer Architecture and Programming (Spring 2012)
Cache Coherence CMU 15-418: Parallel Computer Architecture and Programming (Spring 2012) Shared memory multi-processor Processors read and write to shared variables - More precisely: processors issues
More informationCS 61C: Great Ideas in Computer Architecture. Amdahl s Law, Thread Level Parallelism
CS 61C: Great Ideas in Computer Architecture Amdahl s Law, Thread Level Parallelism Instructor: Alan Christopher 07/17/2014 Summer 2014 -- Lecture #15 1 Review of Last Lecture Flynn Taxonomy of Parallel
More informationCache Coherence. (Architectural Supports for Efficient Shared Memory) Mainak Chaudhuri
Cache Coherence (Architectural Supports for Efficient Shared Memory) Mainak Chaudhuri mainakc@cse.iitk.ac.in 1 Setting Agenda Software: shared address space Hardware: shared memory multiprocessors Cache
More informationChapter 5. Multiprocessors and Thread-Level Parallelism
Computer Architecture A Quantitative Approach, Fifth Edition Chapter 5 Multiprocessors and Thread-Level Parallelism 1 Introduction Thread-Level parallelism Have multiple program counters Uses MIMD model
More informationModule 9: "Introduction to Shared Memory Multiprocessors" Lecture 16: "Multiprocessor Organizations and Cache Coherence" Shared Memory Multiprocessors
Shared Memory Multiprocessors Shared memory multiprocessors Shared cache Private cache/dancehall Distributed shared memory Shared vs. private in CMPs Cache coherence Cache coherence: Example What went
More informationSnooping coherence protocols (cont.)
Snooping coherence protocols (cont.) A four-state update protocol [ 5.3.3] When there is a high degree of sharing, invalidation-based protocols perform poorly. Blocks are often invalidated, and then have
More informationLecture 10: Cache Coherence: Part I. Parallel Computer Architecture and Programming CMU /15-618, Spring 2015
Lecture 10: Cache Coherence: Part I Parallel Computer Architecture and Programming CMU 15-418/15-618, Spring 2015 Tunes Marble House The Knife (Silent Shout) Before starting The Knife, we were working
More informationMultiprocessor Cache Coherence. Chapter 5. Memory System is Coherent If... From ILP to TLP. Enforcing Cache Coherence. Multiprocessor Types
Chapter 5 Multiprocessor Cache Coherence Thread-Level Parallelism 1: read 2: read 3: write??? 1 4 From ILP to TLP Memory System is Coherent If... ILP became inefficient in terms of Power consumption Silicon
More informationScalable Cache Coherence
Scalable Cache Coherence [ 8.1] All of the cache-coherent systems we have talked about until now have had a bus. Not only does the bus guarantee serialization of transactions; it also serves as convenient
More informationRelaxed Memory-Consistency Models
Relaxed Memory-Consistency Models [ 9.1] In small multiprocessors, sequential consistency can be implemented relatively easily. However, this is not true for large multiprocessors. Why? This is not the
More informationCMSC Computer Architecture Lecture 15: Memory Consistency and Synchronization. Prof. Yanjing Li University of Chicago
CMSC 22200 Computer Architecture Lecture 15: Memory Consistency and Synchronization Prof. Yanjing Li University of Chicago Administrative Stuff! Lab 5 (multi-core) " Basic requirements: out later today
More informationThe Cache-Coherence Problem
The -Coherence Problem Lecture 12 (Chapter 6) 1 Outline Bus-based multiprocessors The cache-coherence problem Peterson s algorithm Coherence vs. consistency Shared vs. Distributed Memory What is the difference
More informationLecture 11: Cache Coherence: Part II. Parallel Computer Architecture and Programming CMU /15-618, Spring 2015
Lecture 11: Cache Coherence: Part II Parallel Computer Architecture and Programming CMU 15-418/15-618, Spring 2015 Tunes Bang Bang (My Baby Shot Me Down) Nancy Sinatra (Kill Bill Volume 1 Soundtrack) It
More informationComputer Architecture
Jens Teubner Computer Architecture Summer 2016 1 Computer Architecture Jens Teubner, TU Dortmund jens.teubner@cs.tu-dortmund.de Summer 2016 Jens Teubner Computer Architecture Summer 2016 83 Part III Multi-Core
More information4 Chip Multiprocessors (I) Chip Multiprocessors (ACS MPhil) Robert Mullins
4 Chip Multiprocessors (I) Robert Mullins Overview Coherent memory systems Introduction to cache coherency protocols Advanced cache coherency protocols, memory systems and synchronization covered in the
More informationShared Memory Architectures. Approaches to Building Parallel Machines
Shared Memory Architectures Arvind Krishnamurthy Fall 2004 Approaches to Building Parallel Machines P 1 Switch/Bus P n Scale (Interleaved) First-level $ P 1 P n $ $ (Interleaved) Main memory Shared Cache
More informationECE/CS 757: Homework 1
ECE/CS 757: Homework 1 Cores and Multithreading 1. A CPU designer has to decide whether or not to add a new micoarchitecture enhancement to improve performance (ignoring power costs) of a block (coarse-grain)
More informationLecture 24: Multiprocessing Computer Architecture and Systems Programming ( )
Systems Group Department of Computer Science ETH Zürich Lecture 24: Multiprocessing Computer Architecture and Systems Programming (252-0061-00) Timothy Roscoe Herbstsemester 2012 Most of the rest of this
More informationModule 10: "Design of Shared Memory Multiprocessors" Lecture 20: "Performance of Coherence Protocols" MOESI protocol.
MOESI protocol Dragon protocol State transition Dragon example Design issues General issues Evaluating protocols Protocol optimizations Cache size Cache line size Impact on bus traffic Large cache line
More informationShared Memory Multiprocessors
Parallel Computing Shared Memory Multiprocessors Hwansoo Han Cache Coherence Problem P 0 P 1 P 2 cache load r1 (100) load r1 (100) r1 =? r1 =? 4 cache 5 cache store b (100) 3 100: a 100: a 1 Memory 2 I/O
More informationPage 1. SMP Review. Multiprocessors. Bus Based Coherence. Bus Based Coherence. Characteristics. Cache coherence. Cache coherence
SMP Review Multiprocessors Today s topics: SMP cache coherence general cache coherence issues snooping protocols Improved interaction lots of questions warning I m going to wait for answers granted it
More informationCache Coherence Tutorial
Cache Coherence Tutorial The cache coherence protocol described in the book is not really all that difficult and yet a lot of people seem to have troubles when it comes to using it or answering an assignment
More informationScalable Multiprocessors
Scalable Multiprocessors [ 11.1] scalable system is one in which resources can be added to the system without reaching a hard limit. Of course, there may still be economic limits. s the size of the system
More informationLecture 3: Directory Protocol Implementations. Topics: coherence vs. msg-passing, corner cases in directory protocols
Lecture 3: Directory Protocol Implementations Topics: coherence vs. msg-passing, corner cases in directory protocols 1 Future Scalable Designs Intel s Single Cloud Computer (SCC): an example prototype
More informationMultiprocessors & Thread Level Parallelism
Multiprocessors & Thread Level Parallelism COE 403 Computer Architecture Prof. Muhamed Mudawar Computer Engineering Department King Fahd University of Petroleum and Minerals Presentation Outline Introduction
More informationLecture 2: Snooping and Directory Protocols. Topics: Snooping wrap-up and directory implementations
Lecture 2: Snooping and Directory Protocols Topics: Snooping wrap-up and directory implementations 1 Split Transaction Bus So far, we have assumed that a coherence operation (request, snoops, responses,
More informationBus-Based Coherent Multiprocessors
Bus-Based Coherent Multiprocessors Lecture 13 (Chapter 7) 1 Outline Bus-based coherence Memory consistency Sequential consistency Invalidation vs. update coherence protocols Several Configurations for
More informationMultiprocessor Systems
White Paper: Virtex-II Series R WP162 (v1.1) April 10, 2003 Multiprocessor Systems By: Jeremy Kowalczyk With the availability of the Virtex-II Pro devices containing more than one Power PC processor and
More informationEC 513 Computer Architecture
EC 513 Computer Architecture Cache Coherence - Directory Cache Coherence Prof. Michel A. Kinsy Shared Memory Multiprocessor Processor Cores Local Memories Memory Bus P 1 Snoopy Cache Physical Memory P
More informationCS 252 Graduate Computer Architecture. Lecture 11: Multiprocessors-II
CS 252 Graduate Computer Architecture Lecture 11: Multiprocessors-II Krste Asanovic Electrical Engineering and Computer Sciences University of California, Berkeley http://www.eecs.berkeley.edu/~krste http://inst.eecs.berkeley.edu/~cs252
More informationCSC/ECE 506: Computer Architecture and Multiprocessing Program 3: Simulating DSM Coherence Due: Tuesday, Nov 22, 2016
CSC/ECE 506: Computer Architecture and Multiprocessing Program 3: Simulating DSM Coherence Due: Tuesday, Nov 22, 2016 1. Overall Problem Description In this project, you will add new features to a trace-driven
More informationLecture 7: PCM Wrap-Up, Cache coherence. Topics: handling PCM errors and writes, cache coherence intro
Lecture 7: M Wrap-Up, ache coherence Topics: handling M errors and writes, cache coherence intro 1 Optimizations for Writes (Energy, Lifetime) Read a line before writing and only write the modified bits
More informationEC 513 Computer Architecture
EC 513 Computer Architecture Cache Coherence - Snoopy Cache Coherence rof. Michel A. Kinsy Consistency in SMs CU-1 CU-2 A 100 Cache-1 A 100 Cache-2 CU- bus A 100 Consistency in SMs CU-1 CU-2 A 200 Cache-1
More informationSuggested Readings! What makes a memory system coherent?! Lecture 27" Cache Coherency! ! Readings! ! Program order!! Sequential writes!! Causality!
1! 2! Suggested Readings! Readings!! H&P: Chapter 5.8! Could also look at material on CD referenced on p. 538 of your text! Lecture 27" Cache Coherency! 3! Processor components! Multicore processors and
More informationLecture 3: Snooping Protocols. Topics: snooping-based cache coherence implementations
Lecture 3: Snooping Protocols Topics: snooping-based cache coherence implementations 1 Design Issues, Optimizations When does memory get updated? demotion from modified to shared? move from modified in
More informationParallel Computer Architecture Lecture 5: Cache Coherence. Chris Craik (TA) Carnegie Mellon University
18-742 Parallel Computer Architecture Lecture 5: Cache Coherence Chris Craik (TA) Carnegie Mellon University Readings: Coherence Required for Review Papamarcos and Patel, A low-overhead coherence solution
More informationLecture 25: Multiprocessors
Lecture 25: Multiprocessors Today s topics: Virtual memory wrap-up Snooping-based cache coherence protocol Directory-based cache coherence protocol Synchronization 1 TLB and Cache Is the cache indexed
More informationCache Coherence. Silvina Hanono Wachman Computer Science & Artificial Intelligence Lab M.I.T.
Coherence Silvina Hanono Wachman Computer Science & Artificial Intelligence Lab M.I.T. L5- Coherence Avoids Stale Data Multicores have multiple private caches for performance Need to provide the illusion
More informationComputer Architecture. A Quantitative Approach, Fifth Edition. Chapter 5. Multiprocessors and Thread-Level Parallelism
Computer Architecture A Quantitative Approach, Fifth Edition Chapter 5 Multiprocessors and Thread-Level Parallelism 1 Introduction Thread-Level parallelism Have multiple program counters Uses MIMD model
More informationCS433 Final Exam. Prof Josep Torrellas. December 12, Time: 2 hours
CS433 Final Exam Prof Josep Torrellas December 12, 2006 Time: 2 hours Name: Instructions: 1. This is a closed-book, closed-notes examination. 2. The Exam has 6 Questions. Please budget your time. 3. Calculators
More informationMulticore Workshop. Cache Coherency. Mark Bull David Henty. EPCC, University of Edinburgh
Multicore Workshop Cache Coherency Mark Bull David Henty EPCC, University of Edinburgh Symmetric MultiProcessing 2 Each processor in an SMP has equal access to all parts of memory same latency and bandwidth
More informationLecture 4: Directory Protocols and TM. Topics: corner cases in directory protocols, lazy TM
Lecture 4: Directory Protocols and TM Topics: corner cases in directory protocols, lazy TM 1 Handling Reads When the home receives a read request, it looks up memory (speculative read) and directory in
More informationPortland State University ECE 588/688. Cache Coherence Protocols
Portland State University ECE 588/688 Cache Coherence Protocols Copyright by Alaa Alameldeen 2018 Conditions for Cache Coherence Program Order. A read by processor P to location A that follows a write
More informationLecture 8: Directory-Based Cache Coherence. Topics: scalable multiprocessor organizations, directory protocol design issues
Lecture 8: Directory-Based Cache Coherence Topics: scalable multiprocessor organizations, directory protocol design issues 1 Scalable Multiprocessors P1 P2 Pn C1 C2 Cn 1 CA1 2 CA2 n CAn Scalable interconnection
More informationCS433 Homework 6. Problem 1 [15 points] Assigned on 11/28/2017 Due in class on 12/12/2017
CS433 Homework 6 Assigned on 11/28/2017 Due in class on 12/12/2017 Instructions: 1. Please write your name and NetID clearly on the first page. 2. Refer to the course fact sheet for policies on collaboration.
More informationApproaches to Building Parallel Machines. Shared Memory Architectures. Example Cache Coherence Problem. Shared Cache Architectures
Approaches to Building arallel achines Switch/Bus n Scale Shared ory Architectures (nterleaved) First-level (nterleaved) ain memory n Arvind Krishnamurthy Fall 2004 (nterleaved) ain memory Shared Cache
More informationMultiprocessors II: CC-NUMA DSM. CC-NUMA for Large Systems
Multiprocessors II: CC-NUMA DSM DSM cache coherence the hardware stuff Today s topics: what happens when we lose snooping new issues: global vs. local cache line state enter the directory issues of increasing
More informationOverview: Shared Memory Hardware. Shared Address Space Systems. Shared Address Space and Shared Memory Computers. Shared Memory Hardware
Overview: Shared Memory Hardware Shared Address Space Systems overview of shared address space systems example: cache hierarchy of the Intel Core i7 cache coherency protocols: basic ideas, invalidate and
More informationOverview: Shared Memory Hardware
Overview: Shared Memory Hardware overview of shared address space systems example: cache hierarchy of the Intel Core i7 cache coherency protocols: basic ideas, invalidate and update protocols false sharing
More informationCS 167 Final Exam Solutions
CS 167 Final Exam Solutions Spring 2018 Do all questions. 1. [20%] This question concerns a system employing a single (single-core) processor running a Unix-like operating system, in which interrupts are
More informationMemory Hierarchy in a Multiprocessor
EEC 581 Computer Architecture Multiprocessor and Coherence Department of Electrical Engineering and Computer Science Cleveland State University Hierarchy in a Multiprocessor Shared cache Fully-connected
More informationData-Centric Consistency Models. The general organization of a logical data store, physically distributed and replicated across multiple processes.
Data-Centric Consistency Models The general organization of a logical data store, physically distributed and replicated across multiple processes. Consistency models The scenario we will be studying: Some
More informationCOEN-4730 Computer Architecture Lecture 08 Thread Level Parallelism and Coherence
1 COEN-4730 Computer Architecture Lecture 08 Thread Level Parallelism and Coherence Cristinel Ababei Dept. of Electrical and Computer Engineering Marquette University Credits: Slides adapted from presentations
More informationCS 475: Parallel Programming Introduction
CS 475: Parallel Programming Introduction Wim Bohm, Sanjay Rajopadhye Colorado State University Fall 2014 Course Organization n Let s make a tour of the course website. n Main pages Home, front page. Syllabus.
More informationIntroduction to Multiprocessors (Part I) Prof. Cristina Silvano Politecnico di Milano
Introduction to Multiprocessors (Part I) Prof. Cristina Silvano Politecnico di Milano Outline Key issues to design multiprocessors Interconnection network Centralized shared-memory architectures Distributed
More informationMULTIPROCESSORS AND THREAD-LEVEL PARALLELISM (PART 1)
1 MULTIPROCESSORS AND THREAD-LEVEL PARALLELISM (PART 1) Chapter 5 Appendix F Appendix I OUTLINE Introduction (5.1) Multiprocessor Architecture Challenges in Parallel Processing Centralized Shared Memory
More informationCache Coherence in Bus-Based Shared Memory Multiprocessors
Cache Coherence in Bus-Based Shared Memory Multiprocessors Shared Memory Multiprocessors Variations Cache Coherence in Shared Memory Multiprocessors A Coherent Memory System: Intuition Formal Definition
More informationCS433 Homework 6. Problem 1 [15 points] Assigned on 11/28/2017 Due in class on 12/12/2017
CS433 Homework 6 Assigned on 11/28/2017 Due in class on 12/12/2017 Instructions: 1. Please write your name and NetID clearly on the first page. 2. Refer to the course fact sheet for policies on collaboration.
More informationLecture 18: Coherence and Synchronization. Topics: directory-based coherence protocols, synchronization primitives (Sections
Lecture 18: Coherence and Synchronization Topics: directory-based coherence protocols, synchronization primitives (Sections 5.1-5.5) 1 Cache Coherence Protocols Directory-based: A single location (directory)
More informationCache Coherence. Silvina Hanono Wachman Computer Science & Artificial Intelligence Lab M.I.T.
Coherence Silvina Hanono Wachman Computer Science & Artificial Intelligence Lab M.I.T. L25-1 Coherence Avoids Stale Data Multicores have multiple private caches for performance Need to provide the illusion
More informationLecture 26: Multiprocessors. Today s topics: Directory-based coherence Synchronization Consistency Shared memory vs message-passing
Lecture 26: Multiprocessors Today s topics: Directory-based coherence Synchronization Consistency Shared memory vs message-passing 1 Cache Coherence Protocols Directory-based: A single location (directory)
More informationCoherence and Consistency
Coherence and Consistency 30 The Meaning of Programs An ISA is a programming language To be useful, programs written in it must have meaning or semantics Any sequence of instructions must have a meaning.
More informationSwitch Gear to Memory Consistency
Outline Memory consistency equential consistency Invalidation vs. update coherence protocols MI protocol tate diagrams imulation Gehringer, based on slides by Yan olihin 1 witch Gear to Memory Consistency
More informationComputer Architecture Memory hierarchies and caches
Computer Architecture Memory hierarchies and caches S Coudert and R Pacalet January 23, 2019 Outline Introduction Localities principles Direct-mapped caches Increasing block size Set-associative caches
More informationSnooping coherence protocols (cont.)
Snooping coherence protocols (cont.) A four-state update protocol [ 5.3.3] When there is a high degree of sharing, invalidation-based protocols perform poorly. Blocks are often invalidated, and then have
More informationCache Coherence. Bryan Mills, PhD. Slides provided by Rami Melhem
Cache Coherence Bryan Mills, PhD Slides provided by Rami Melhem Cache coherence Programmers have no control over caches and when they get updated. x = 2; /* initially */ y0 eventually ends up = 2 y1 eventually
More informationProcessor Architecture
Processor Architecture Shared Memory Multiprocessors M. Schölzel The Coherence Problem s may contain local copies of the same memory address without proper coordination they work independently on their
More informationLecture-22 (Cache Coherence Protocols) CS422-Spring
Lecture-22 (Cache Coherence Protocols) CS422-Spring 2018 Biswa@CSE-IITK Single Core Core 0 Private L1 Cache Bus (Packet Scheduling) Private L2 DRAM CS422: Spring 2018 Biswabandan Panda, CSE@IITK 2 Multicore
More informationMultiprocessor Cache Coherency. What is Cache Coherence?
Multiprocessor Cache Coherency CS448 1 What is Cache Coherence? Two processors can have two different values for the same memory location 2 1 Terminology Coherence Defines what values can be returned by
More informationCache Coherence. Introduction to High Performance Computing Systems (CS1645) Esteban Meneses. Spring, 2014
Cache Coherence Introduction to High Performance Computing Systems (CS1645) Esteban Meneses Spring, 2014 Supercomputer Galore Starting around 1983, the number of companies building supercomputers exploded:
More informationPotential violations of Serializability: Example 1
CSCE 6610:Advanced Computer Architecture Review New Amdahl s law A possible idea for a term project Explore my idea about changing frequency based on serial fraction to maintain fixed energy or keep same
More informationWrite only as much as necessary. Be brief!
1 CIS371 Computer Organization and Design Final Exam Prof. Martin Wednesday, May 2nd, 2012 This exam is an individual-work exam. Write your answers on these pages. Additional pages may be attached (with
More informationMultiprocessors and Thread Level Parallelism Chapter 4, Appendix H CS448. The Greed for Speed
Multiprocessors and Thread Level Parallelism Chapter 4, Appendix H CS448 1 The Greed for Speed Two general approaches to making computers faster Faster uniprocessor All the techniques we ve been looking
More informationLecture 18: Coherence Protocols. Topics: coherence protocols for symmetric and distributed shared-memory multiprocessors (Sections
Lecture 18: Coherence Protocols Topics: coherence protocols for symmetric and distributed shared-memory multiprocessors (Sections 4.2-4.4) 1 SMP/UMA/Centralized Memory Multiprocessor Main Memory I/O System
More informationIntroduction to Multiprocessors (Part II) Cristina Silvano Politecnico di Milano
Introduction to Multiprocessors (Part II) Cristina Silvano Politecnico di Milano Outline The problem of cache coherence Snooping protocols Directory-based protocols Prof. Cristina Silvano, Politecnico
More informationCache Coherence in Scalable Machines
ache oherence in Scalable Machines SE 661 arallel and Vector Architectures rof. Muhamed Mudawar omputer Engineering Department King Fahd University of etroleum and Minerals Generic Scalable Multiprocessor
More informationSimulating ocean currents
Simulating ocean currents We will study a parallel application that simulates ocean currents. Goal: Simulate the motion of water currents in the ocean. Important to climate modeling. Motion depends on
More informationLecture 25: Multiprocessors. Today s topics: Snooping-based cache coherence protocol Directory-based cache coherence protocol Synchronization
Lecture 25: Multiprocessors Today s topics: Snooping-based cache coherence protocol Directory-based cache coherence protocol Synchronization 1 Snooping-Based Protocols Three states for a block: invalid,
More informationScalable Cache Coherence
arallel Computing Scalable Cache Coherence Hwansoo Han Hierarchical Cache Coherence Hierarchies in cache organization Multiple levels of caches on a processor Large scale multiprocessors with hierarchy
More informationAdvanced OpenMP. Lecture 3: Cache Coherency
Advanced OpenMP Lecture 3: Cache Coherency Cache coherency Main difficulty in building multiprocessor systems is the cache coherency problem. The shared memory programming model assumes that a shared variable
More informationUsing Relaxed Consistency Models
Using Relaxed Consistency Models CS&G discuss relaxed consistency models from two standpoints. The system specification, which tells how a consistency model works and what guarantees of ordering it provides.
More informationLecture 8: Snooping and Directory Protocols. Topics: split-transaction implementation details, directory implementations (memory- and cache-based)
Lecture 8: Snooping and Directory Protocols Topics: split-transaction implementation details, directory implementations (memory- and cache-based) 1 Split Transaction Bus So far, we have assumed that a
More informationCS252 Spring 2017 Graduate Computer Architecture. Lecture 12: Cache Coherence
CS252 Spring 2017 Graduate Computer Architecture Lecture 12: Cache Coherence Lisa Wu, Krste Asanovic http://inst.eecs.berkeley.edu/~cs252/sp17 WU UCB CS252 SP17 Last Time in Lecture 11 Memory Systems DRAM
More informationEast Tennessee State University Department of Computer and Information Sciences CSCI 4717 Computer Architecture TEST 3 for Fall Semester, 2005
Points missed: Student's Name: Total score: /100 points East Tennessee State University Department of Computer and Information Sciences CSCI 4717 Computer Architecture TEST 3 for Fall Semester, 2005 Section
More informationIncoherent each cache copy behaves as an individual copy, instead of as the same memory location.
Cache Coherence This lesson discusses the problems and solutions for coherence. Different coherence protocols are discussed, including: MSI, MOSI, MOESI, and Directory. Each has advantages and disadvantages
More informationCSE502: Computer Architecture CSE 502: Computer Architecture
CSE 502: Computer Architecture Shared-Memory Multi-Processors Shared-Memory Multiprocessors Multiple threads use shared memory (address space) SysV Shared Memory or Threads in software Communication implicit
More informationIntroduction to parallel computers and parallel programming. Introduction to parallel computersand parallel programming p. 1
Introduction to parallel computers and parallel programming Introduction to parallel computersand parallel programming p. 1 Content A quick overview of morden parallel hardware Parallelism within a chip
More informationCache Coherence and Atomic Operations in Hardware
Cache Coherence and Atomic Operations in Hardware Previously, we introduced multi-core parallelism. Today we ll look at 2 things: 1. Cache coherence 2. Instruction support for synchronization. And some
More informationRelaxed Memory Consistency
Relaxed Memory Consistency Jinkyu Jeong (jinkyu@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu SSE3054: Multicore Systems, Spring 2017, Jinkyu Jeong (jinkyu@skku.edu)
More informationChapter 8. Multiprocessors. In-Cheol Park Dept. of EE, KAIST
Chapter 8. Multiprocessors In-Cheol Park Dept. of EE, KAIST Can the rapid rate of uniprocessor performance growth be sustained indefinitely? If the pace does slow down, multiprocessor architectures will
More informationShared Memory Multiprocessors. Symmetric Shared Memory Architecture (SMP) Cache Coherence. Cache Coherence Mechanism. Interconnection Network
Shared Memory Multis Processor Processor Processor i Processor n Symmetric Shared Memory Architecture (SMP) cache cache cache cache Interconnection Network Main Memory I/O System Cache Coherence Cache
More information