ECE4750/CS4420 Computer Architecture
L17: Memory Model
Edward Suh, Computer Systems Laboratory, suh@csl.cornell.edu

Announcements
HW4 / Lab4

Overview
Symmetric Multi-Processors (SMPs)
  MIMD processing cores
  Shared memory for communication
How can multiple processing cores co-operate?
  Synchronization
  Memory models
    Sequential consistency
    Relaxed memory model

Synchronization
The need for synchronization arises whenever there are parallel processes in a system (even in a uniprocessor system).

A Producer-Consumer Example
[Figure: Producer and Consumer share a queue with tail and head pointers; the Consumer holds registers R_head and R.]

Producer posting Item x:
        Load R_tail, (tail)
        Store (R_tail), x
        R_tail = R_tail + 1
        Store (tail), R_tail

Consumer:
        Load R_head, (head)
  spin: Load R_tail, (tail)
        if R_head == R_tail goto spin
        Load R, (R_head)
        R_head = R_head + 1
        Store (head), R_head
        process(R)

A Producer-Consumer Example (continued)

Producer posting Item x:
        Load R_tail, (tail)
   (1)  Store (R_tail), x
        R_tail = R_tail + 1
   (2)  Store (tail), R_tail

Consumer:
        Load R_head, (head)
  spin:
   (3)  Load R_tail, (tail)
        if R_head == R_tail goto spin
   (4)  Load R, (R_head)
        R_head = R_head + 1
        Store (head), R_head
        process(R)

Can the tail pointer get updated before the item x is stored?
Programmer assumes that if 3 happens after 2, then 4 happens after 1.
Problem sequences are:
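
One way to find the orderings that break the programmer's assumption is to enumerate them. The sketch below is illustrative only: it treats the four numbered operations as abstract events and uses hypothetical helper names (`violates`, `problem_sequences`). A sequence is a problem when 3 comes after 2 but 4 comes before 1. The flags select whether each process's own program order (1 before 2 for the producer, 3 before 4 for the consumer) is preserved.

```python
from itertools import permutations

# Events, numbered as on the slide:
#   1 = producer stores item x      2 = producer stores the new tail
#   3 = consumer loads tail         4 = consumer loads the item

def violates(seq):
    """True if 3 happens after 2 but 4 does not happen after 1."""
    pos = {event: i for i, event in enumerate(seq)}
    return pos[2] < pos[3] and pos[4] < pos[1]

def problem_sequences(keep_producer_order, keep_consumer_order):
    """List the violating orders, optionally enforcing each program order."""
    found = []
    for seq in permutations((1, 2, 3, 4)):
        pos = {event: i for i, event in enumerate(seq)}
        if keep_producer_order and not pos[1] < pos[2]:
            continue
        if keep_consumer_order and not pos[3] < pos[4]:
            continue
        if violates(seq):
            found.append(seq)
    return found

both = problem_sequences(True, True)               # both program orders kept
stores_reordered = problem_sequences(False, True)  # producer's stores may swap
loads_reordered = problem_sequences(True, False)   # consumer's loads may swap

print(both)              # []
print(stores_reordered)  # [(2, 3, 4, 1)]
print(loads_reordered)   # [(4, 1, 2, 3)]
```

When both program orders are kept, no violating sequence exists. A violation needs either the producer's two stores or the consumer's two loads to take effect out of program order.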

Sequential Consistency: A Memory Model
[Figure: several processors P connected to a single shared memory M.]

A system is sequentially consistent if the result of any execution is the same as if the operations of all the processors were executed in some sequential order, and the operations of each individual processor appear in the order specified by the program.
Leslie Lamport

Sequential Consistency = arbitrary order-preserving interleaving of memory references of sequential programs

Sequential Consistency
Sequential concurrent tasks: T1, T2
Shared variables: X, Y (initially X = 0, Y = 10)

T1:
        Store (X), 1        (X = 1)
        Store (Y), 11       (Y = 11)
T2:
        Load R1, (Y)
        Store (Y'), R1      (Y' = Y)
        Load R2, (X)
        Store (X'), R2      (X' = X)

What are the legitimate answers for X' and Y'?
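
The question can be checked by brute force. The following sketch enumerates every order-preserving interleaving of T1 and T2 and records the values that T2 copies out, written X' and Y' here. The tuple encoding of instructions and the helper names `interleavings` and `run` are assumptions made for this illustration. X' and Y' are given a placeholder initial value of 0, which T2 always overwrites.

```python
def interleavings(a, b):
    """Yield every merge of lists a and b that keeps each list's own order."""
    if not a or not b:
        yield a + b
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest

# ("st", address, source) stores a constant or a register to memory;
# ("ld", register, address) loads memory into a register.
T1 = [("st", "X", 1), ("st", "Y", 11)]
T2 = [("ld", "R1", "Y"), ("st", "Y'", "R1"),
      ("ld", "R2", "X"), ("st", "X'", "R2")]

def run(order):
    """Execute one interleaving and return the final (X', Y')."""
    mem = {"X": 0, "Y": 10, "X'": 0, "Y'": 0}
    regs = {}
    for op, dst, src in order:
        if op == "ld":
            regs[dst] = mem[src]
        else:
            mem[dst] = regs[src] if isinstance(src, str) else src
    return mem["X'"], mem["Y'"]

outcomes = {run(order) for order in interleavings(T1, T2)}
print(sorted(outcomes))   # [(0, 10), (1, 10), (1, 11)]
```

The pair (X', Y') = (0, 11) never appears. Seeing Y = 11 means T1's store to X has already happened, so the later load of X must return 1.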

Sequential Consistency
Sequential consistency imposes more memory ordering constraints than those imposed by uniprocessor program dependencies.
What are these in our example?

T1:
        Store (X), 1        (X = 1)
        Store (Y), 11       (Y = 11)
T2:
        Load R1, (Y)
        Store (Y'), R1      (Y' = Y)
        Load R2, (X)
        Store (X'), R2      (X' = X)

Issues in Implementing Sequential Consistency
[Figure: several processors P connected to a single shared memory M.]

Implementation of SC is complicated by two issues:

Out-of-order execution capability
  Sequence                Can be reordered?
  Load(a); Load(b)        yes
  Load(a); Store(b)       yes if a ≠ b
  Store(a); Load(b)       yes if a ≠ b
  Store(a); Store(b)      yes if a ≠ b

Caches
  Caches can prevent the effect of a store from being seen by other processors.
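
The out-of-order reordering rules listed above can be written as a small predicate. This is a sketch of the uniprocessor dependency check only; the function name `can_reorder` and the `(kind, address)` encoding are assumptions for illustration. It deliberately ignores other processors, and that omission is exactly why these reorderings can break sequential consistency.

```python
def can_reorder(first, second):
    """Uniprocessor rule: may two adjacent memory operations be swapped?

    Each operation is a (kind, address) pair, kind being "load" or "store".
    """
    (kind1, addr1), (kind2, addr2) = first, second
    if kind1 == "load" and kind2 == "load":
        return True          # two loads never conflict
    return addr1 != addr2    # any pair involving a store needs a != b

print(can_reorder(("load", "a"), ("load", "a")))    # True
print(can_reorder(("store", "a"), ("load", "a")))   # False
print(can_reorder(("store", "a"), ("load", "b")))   # True
```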

Committed Store Buffers
[Figure: two CPUs, each with its own cache, connected to main memory.]
CPU can continue execution while earlier committed stores are still propagating through the memory system
  Processor can commit other instructions (including loads and stores) while the first store is committing to memory
  Committed store buffer can be combined with the speculative store buffer in an out-of-order CPU
Local loads can bypass values from buffered stores to the same address

Example 1: Store Buffers
Initially, all memory locations contain zeros.

Process 1                    Process 2
Store (flag1), 1;            Store (flag2), 1;
Load r1, (flag2);            Load r2, (flag1);

Question: Is it possible that r1 = 0 and r2 = 0?

Total Store Order (TSO): IBM 370, Sparc's TSO memory model
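
To see how a committed store buffer can produce r1 = 0 and r2 = 0, the sketch below models each CPU as a FIFO buffer in front of a shared memory and walks through one particular schedule. The `Core` class and its methods are hypothetical names for this illustration, not a description of any real hardware interface.

```python
from collections import deque

class Core:
    """One CPU with a FIFO committed store buffer in front of shared memory."""

    def __init__(self, memory):
        self.memory = memory
        self.buffer = deque()

    def store(self, addr, value):
        # The store commits locally but is not yet visible to other cores.
        self.buffer.append((addr, value))

    def load(self, addr):
        # A local load bypasses from the youngest buffered store to addr.
        for buffered_addr, value in reversed(self.buffer):
            if buffered_addr == addr:
                return value
        return self.memory.get(addr, 0)   # memory is initially all zeros

    def drain(self):
        # Propagate buffered stores to memory in program order.
        while self.buffer:
            addr, value = self.buffer.popleft()
            self.memory[addr] = value

memory = {}
p1, p2 = Core(memory), Core(memory)

p1.store("flag1", 1)      # sits in P1's buffer
p2.store("flag2", 1)      # sits in P2's buffer
r1 = p1.load("flag2")     # P2's store is not in memory yet
r2 = p2.load("flag1")     # P1's store is not in memory yet
p1.drain()
p2.drain()

print(r1, r2)             # 0 0
```

Under sequential consistency this outcome is impossible, because one of the two stores would have to be visible before the other process's load.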

Example 2: Speculative Execution

Process 1                    Process 2
Store (a), 1;                L: Load r1, (flag);
Store (flag), 1;                if r1 == 0 goto L;
                                Load r2, (a);

Question: Is it possible that r1 = 1 but r2 = 0?

Weaker Memory Models & Memory Fence Instructions
Architectures with weaker memory models provide memory fence instructions to prevent otherwise permitted reorderings of loads and stores.

        Store (a1), r2;
        Fence_wr
        Load r1, (a2);

The Load and Store can be reordered if a1 ≠ a2. Insertion of Fence_wr will disallow this reordering.
Similarly: Fence_rr; Fence_rw; Fence_ww;

SUN's Sparc: MEMBAR; MEMBARRR; MEMBARRW; MEMBARWR; MEMBARWW
PowerPC: Sync; EIEIO
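
The effect of a store-to-load fence can be illustrated on the flag program of Example 1. The sketch below enumerates every schedule of a simple store-buffer machine. It assumes, for illustration only, that a `fence_wr` instruction cannot complete until the issuing process's store buffer is empty; real fence implementations differ. The instruction tuples and the names `outcomes` and `programs` are hypothetical.

```python
def outcomes(progs):
    """Return every reachable (r1, r2) on a machine with FIFO store buffers."""
    results = set()
    n = len(progs)

    def replace(tup, i, value):
        return tup[:i] + (value,) + tup[i + 1:]

    def step(pcs, bufs, mem, regs):
        done = True
        for p in range(n):
            if bufs[p]:  # drain the oldest buffered store of process p
                addr, val = bufs[p][0]
                step(pcs, replace(bufs, p, bufs[p][1:]), {**mem, addr: val}, regs)
                done = False
            if pcs[p] == len(progs[p]):
                continue
            instr = progs[p][pcs[p]]
            nxt = replace(pcs, p, pcs[p] + 1)
            if instr[0] == "store":      # ("store", address, value)
                entry = (instr[1], instr[2])
                step(nxt, replace(bufs, p, bufs[p] + (entry,)), mem, regs)
                done = False
            elif instr[0] == "load":     # ("load", register, address)
                own = [v for a, v in bufs[p] if a == instr[2]]
                val = own[-1] if own else mem.get(instr[2], 0)
                step(nxt, bufs, mem, {**regs, instr[1]: val})
                done = False
            elif instr[0] == "fence_wr" and not bufs[p]:
                step(nxt, bufs, mem, regs)   # fence passes once buffer is empty
                done = False
        if done:
            results.add((regs["r1"], regs["r2"]))

    step((0,) * n, ((),) * n, {}, {})
    return results

def programs(with_fence):
    fence = [("fence_wr",)] if with_fence else []
    p1 = [("store", "flag1", 1)] + fence + [("load", "r1", "flag2")]
    p2 = [("store", "flag2", 1)] + fence + [("load", "r2", "flag1")]
    return [p1, p2]

print(sorted(outcomes(programs(False))))  # [(0, 0), (0, 1), (1, 0), (1, 1)]
print(sorted(outcomes(programs(True))))   # [(0, 1), (1, 0), (1, 1)]
```

With the fence in place, each load waits until the preceding store is globally visible, so the r1 = 0, r2 = 0 outcome disappears.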

Multiple Consumer Example
[Figure: one Producer and two Consumers share a queue with tail and head pointers; each Consumer holds its own registers R_head and R.]

Producer posting Item x:
        Load R_tail, (tail)
        Store (R_tail), x
        R_tail = R_tail + 1
        Store (tail), R_tail

Consumer:
        Load R_head, (head)
  spin: Load R_tail, (tail)
        if R_head == R_tail goto spin
        Load R, (R_head)
        R_head = R_head + 1
        Store (head), R_head
        process(R)

What is wrong with this code?

Locks or Semaphores
E. W. Dijkstra, 1965

A semaphore is a non-negative integer, with the following operations:
  P(s): if s > 0, decrement s by 1, otherwise wait
  V(s): increment s by 1 and wake up one of the waiting processes
P's and V's must be executed atomically, i.e., without interruptions or interleaved accesses to s by other processors.

Process i
        P(s)
        <critical section>
        V(s)

The initial value of s determines the maximum number of processes in the critical section.
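
As a rough illustration of the P(s) / critical section / V(s) pattern, the sketch below guards the consumers' dequeue with Python's `threading.Semaphore`, where `acquire()` plays the role of P and `release()` the role of V. Several simplifications are assumptions of this example: the producer has already posted all items, a consumer exits instead of spinning when the queue is empty, and names such as `N_ITEMS` and `consumed` are hypothetical.

```python
import threading

N_ITEMS = 1000
queue = list(range(N_ITEMS))   # items already posted by the producer
head = 0                       # shared head index
s = threading.Semaphore(1)     # s = 1: at most one consumer in the critical section
consumed = [[], []]            # per-consumer record of processed items

def consumer(my_id):
    global head
    while True:
        s.acquire()                    # P(s)
        if head == len(queue):         # queue empty: leave the critical section
            s.release()                # V(s)
            return
        item = queue[head]             # read the item and advance head as one unit
        head += 1
        s.release()                    # V(s)
        consumed[my_id].append(item)   # process(R) outside the critical section

threads = [threading.Thread(target=consumer, args=(i,)) for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

all_items = sorted(consumed[0] + consumed[1])
print(all_items == list(range(N_ITEMS)))   # True
```

Because reading the item and updating `head` happen inside one critical section, no two consumers can take the same item and no item is skipped.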