Announcements. ECE4750/CS4420 Computer Architecture L17: Memory Model. Edward Suh Computer Systems Laboratory

Transcription:

ECE4750/CS4420 Computer Architecture
L17: Memory Model

Edward Suh
Computer Systems Laboratory
suh@csl.cornell.edu

Announcements
  HW4 / Lab4

Overview
  Symmetric Multi-Processors (SMPs)
    MIMD processing cores
    Shared memory for communication
  How can multiple processing cores co-operate?
    Synchronization
    Memory models
      Sequential consistency
      Relaxed memory model

Synchronization
  The need for synchronization arises whenever there are parallel processes in a system (even in a uniprocessor system).

A Producer-Consumer Example
  [Figure: Producer and Consumer share a queue in memory through the tail and head pointers. The producer works in register R_tail, the consumer in registers R_head and R.]

  Producer posting Item x:
          Load R_tail, (tail)
          Store (R_tail), x
          R_tail = R_tail + 1
          Store (tail), R_tail

  Consumer:
          Load R_head, (head)
    spin: Load R_tail, (tail)
          if R_head == R_tail goto spin
          Load R, (R_head)
          R_head = R_head + 1
          Store (head), R_head
          process(R)

A Producer-Consumer Example (continued)
  Producer posting Item x:
          Load R_tail, (tail)
      1   Store (R_tail), x
          R_tail = R_tail + 1
      2   Store (tail), R_tail

  Consumer:
          Load R_head, (head)
    spin: Load R_tail, (tail)            3
          if R_head == R_tail goto spin
          Load R, (R_head)               4
          R_head = R_head + 1
          Store (head), R_head
          process(R)

  Can the tail pointer get updated before the item x is stored?
  Programmer assumes that if 3 happens after 2, then 4 happens after 1.
  Problem sequences are:
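The question about operations 1 to 4 can be explored by brute force. The sketch below is not part of the lecture: it treats the four numbered operations as bare labels (no queue or memory is modelled), enumerates every global order, and keeps those in which 3 happens after 2 while 4 happens before 1. The helper names are made up for illustration.

```python
from itertools import permutations

# Labels from the slide (assumed meaning):
#   1: producer stores item x      2: producer publishes the new tail
#   3: consumer loads the tail     4: consumer loads the item
PRODUCER_ORDER = (1, 2)  # program order in the producer
CONSUMER_ORDER = (3, 4)  # program order in the consumer


def position(seq):
    """Map each operation label to its index in the global order."""
    return {op: i for i, op in enumerate(seq)}


def violates_assumption(seq):
    """True if 3 happens after 2 but 4 does not happen after 1."""
    pos = position(seq)
    return pos[2] < pos[3] and pos[4] < pos[1]


def in_program_order(seq, order):
    """True if the two operations of one thread appear in program order."""
    pos = position(seq)
    return pos[order[0]] < pos[order[1]]


bad = [s for s in permutations((1, 2, 3, 4)) if violates_assumption(s)]

# If both threads keep their program order, nothing can go wrong.
bad_under_sc = [s for s in bad
                if in_program_order(s, PRODUCER_ORDER)
                and in_program_order(s, CONSUMER_ORDER)]

# Orders in which exactly one of the two threads was reordered.
bad_one_side = sorted(s for s in bad
                      if in_program_order(s, PRODUCER_ORDER)
                      != in_program_order(s, CONSUMER_ORDER))

print(len(bad), bad_under_sc, bad_one_side)
```

In this model no violating order keeps both program orders, and the two orders that reorder only one side are 2, 3, 4, 1 (the producer's stores swapped) and 4, 1, 2, 3 (the consumer's loads swapped).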

Sequential Consistency: A Memory Model
  [Figure: six processors (P) sharing one memory (M).]

  "A system is sequentially consistent if the result of any execution is the same as if the operations of all the processors were executed in some sequential order, and the operations of each individual processor appear in the order specified by the program."
      Leslie Lamport

  Sequential Consistency = arbitrary order-preserving interleaving of memory references of sequential programs

Sequential Consistency
  Sequential concurrent tasks: T1, T2
  Shared variables: X, Y (initially X = 0, Y = 10)

  T1:                           T2:
  Store (X), 1     (X = 1)      Load R_1, (Y)
  Store (Y), 11    (Y = 11)     Store (Y'), R_1    (Y' = Y)
                                Load R_2, (X)
                                Store (X'), R_2    (X' = X)

  What are the legitimate answers for X' and Y'?
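The "arbitrary order-preserving interleaving" definition can be turned into a small enumerator. The sketch below is an illustration, not the lecture's code: it assumes the second task writes to separate locations X' and Y', represents each operation as a made-up (kind, destination, source) tuple, and runs every interleaving of T1 and T2 against a single shared memory.

```python
def interleavings(a, b):
    """Yield every merge of tuples a and b that keeps each one's own order."""
    if not a or not b:
        yield tuple(a) + tuple(b)
        return
    for rest in interleavings(a[1:], b):
        yield (a[0],) + rest
    for rest in interleavings(a, b[1:]):
        yield (b[0],) + rest


# Operation format (hypothetical): (kind, destination, source)
T1 = (("store_const", "X", 1),
      ("store_const", "Y", 11))
T2 = (("load", "R1", "Y"),
      ("store_reg", "Y'", "R1"),
      ("load", "R2", "X"),
      ("store_reg", "X'", "R2"))


def run(schedule):
    """Execute one global order on a single memory; return (X', Y')."""
    mem = {"X": 0, "Y": 10, "X'": None, "Y'": None}
    regs = {}
    for kind, dst, src in schedule:
        if kind == "store_const":
            mem[dst] = src
        elif kind == "load":
            regs[dst] = mem[src]
        else:  # store_reg: write a register value to memory
            mem[dst] = regs[src]
    return mem["X'"], mem["Y'"]


sc_outcomes = {run(s) for s in interleavings(T1, T2)}
print(sorted(sc_outcomes))
```

Under these assumptions the pair (X', Y') = (0, 11) never appears: T2 can only read Y = 11 after T1 has already stored X = 1.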

Sequential Consistency
  Sequential consistency imposes more memory ordering constraints than those imposed by uniprocessor program dependencies.
  What are these in our example?

  T1:                           T2:
  Store (X), 1     (X = 1)      Load R_1, (Y)
  Store (Y), 11    (Y = 11)     Store (Y'), R_1    (Y' = Y)
                                Load R_2, (X)
                                Store (X'), R_2    (X' = X)

Issues in Implementing Sequential Consistency
  [Figure: six processors (P) sharing one memory (M).]

  Implementation of SC is complicated by two issues:

  Out-of-order execution capability (may a single processor reorder the pair?)
      Load(a); Load(b)       yes
      Load(a); Store(b)      yes if a ≠ b
      Store(a); Load(b)      yes if a ≠ b
      Store(a); Store(b)     yes if a ≠ b

  Caches
      Caches can prevent the effect of a store from being seen by other processors
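The out-of-order table can be read as a predicate on a pair of memory operations. A minimal sketch, assuming operations are written as made-up (kind, address) pairs and ignoring register and control dependencies:

```python
def may_reorder(first, second):
    """Return True if one processor may swap two of its own memory operations
    without changing the result of its sequential program.

    Each operation is a (kind, address) pair, kind being "load" or "store".
    """
    kind1, addr1 = first
    kind2, addr2 = second
    if kind1 == "load" and kind2 == "load":
        return True            # two loads never conflict
    return addr1 != addr2      # any pair involving a store needs a != b


# T1 from the example: two stores to different addresses.
print(may_reorder(("store", "X"), ("store", "Y")))
```

The last line shows the gap the slide points at: a uniprocessor is free to swap T1's two stores, while sequential consistency requires them to stay in program order.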

Committed Store Buffers
  [Figure: two CPUs, each with its own cache, connected to Main Memory.]

  CPU can continue execution while earlier committed stores are still propagating through memory system
    Processor can commit other instructions (including loads and stores) while first store is committing to memory
    Committed store buffer can be combined with speculative store buffer in an out-of-order CPU
  Local loads can bypass values from buffered stores to same address

Example 1: Store Buffers
  Process 1                     Process 2
  Store (flag_1), 1;            Store (flag_2), 1;
  Load r_1, (flag_2);           Load r_2, (flag_1);

  Initially, all memory locations contain zeros.
  Question: Is it possible that r_1 = 0 and r_2 = 0?

  Total Store Order (TSO): IBM 370, Sparc's TSO memory model
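The question in Example 1 can be explored with a toy state-space search. The sketch below is an assumption-laden model, not a description of any real machine: each process gets a FIFO store buffer that drains to memory at arbitrary times, loads check the local buffer first, and the instruction encoding and function names are invented for illustration.

```python
def replace(tup, i, value):
    """Return a copy of tuple tup with element i replaced."""
    return tup[:i] + (value,) + tup[i + 1:]


def explore(programs, buffered):
    """Return every final register assignment reachable by some schedule.

    programs: one tuple of instructions per process, each instruction being
              ("store", addr, value) or ("load", reg, addr).
    buffered: True gives each process a FIFO store buffer (TSO-like);
              False sends stores straight to memory (SC).
    """
    n = len(programs)
    start = ((0,) * n, ((),) * n, frozenset(), frozenset())
    seen, stack, finals = {start}, [start], set()
    while stack:
        pcs, bufs, mem, regs = stack.pop()
        nxt = []
        for p in range(n):
            if bufs[p]:  # drain the oldest buffered store to memory
                addr, val = bufs[p][0]
                m = dict(mem)
                m[addr] = val
                nxt.append((pcs, replace(bufs, p, bufs[p][1:]),
                            frozenset(m.items()), regs))
            if pcs[p] < len(programs[p]):
                op, x, y = programs[p][pcs[p]]
                new_pcs = replace(pcs, p, pcs[p] + 1)
                if op == "store" and buffered:
                    new_bufs = replace(bufs, p, bufs[p] + ((x, y),))
                    nxt.append((new_pcs, new_bufs, mem, regs))
                elif op == "store":
                    m = dict(mem)
                    m[x] = y
                    nxt.append((new_pcs, bufs, frozenset(m.items()), regs))
                else:  # load register x from address y
                    value = dict(mem).get(y, 0)   # memory starts as zeros
                    for addr, val in bufs[p]:     # newest local store wins
                        if addr == y:
                            value = val
                    r = dict(regs)
                    r[x] = value
                    nxt.append((new_pcs, bufs, mem, frozenset(r.items())))
        if not nxt:  # all programs finished and all buffers empty
            finals.add(tuple(sorted(regs)))
        for state in nxt:
            if state not in seen:
                seen.add(state)
                stack.append(state)
    return finals


P1 = (("store", "flag1", 1), ("load", "r1", "flag2"))
P2 = (("store", "flag2", 1), ("load", "r2", "flag1"))
BOTH_ZERO = (("r1", 0), ("r2", 0))

sc_results = explore((P1, P2), buffered=False)
tso_results = explore((P1, P2), buffered=True)
print(BOTH_ZERO in sc_results, BOTH_ZERO in tso_results)
```

In this model r_1 = 0 and r_2 = 0 is unreachable without buffers and reachable with them, because both loads can execute while both stores are still sitting in their local buffers.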

Example 2: Speculative Execution
  Process 1                     Process 2
  Store (a), 1;                 L: Load r_1, (flag);
  Store (flag), 1;                 if r_1 == 0 goto L;
                                   Load r_2, (a);

  Question: Is it possible that r_1 = 1 but r_2 = 0?

Weaker Memory Models & Memory Fence Instructions
  Architectures with weaker memory models provide memory fence instructions to prevent otherwise permitted reorderings of loads and stores

      Store (a_1), r2;
      Fence_wr
      Load r1, (a_2);

  The Load and Store can be reordered if a_1 ≠ a_2. Insertion of Fence_wr will disallow this reordering.

  Similarly: Fence_rr; Fence_rw; Fence_ww;

  SUN's Sparc: MEMBAR; MEMBARRR; MEMBARRW; MEMBARWR; MEMBARWW
  PowerPC: Sync; EIEIO
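The effect of a fence can be illustrated with a deliberately weak toy model. The sketch below assumes a hypothetical machine in which a process may execute any two of its memory operations out of order unless they touch the same address or a matching fence sits between them. The tuple encoding, the two-letter fence names ("wr", "ww", "rr") and the replacement of Example 2's spin loop by a single load of flag are all simplifications made for illustration.

```python
from itertools import permutations


def kind(op):
    return "w" if op[0] == "store" else "r"


def address(op):
    return op[1] if op[0] == "store" else op[2]


def allowed_orders(thread):
    """All execution orders of a thread's loads/stores the toy model permits."""
    mem_ops = [i for i, op in enumerate(thread) if op[0] != "fence"]
    orders = []
    for perm in permutations(mem_ops):
        pos = {i: k for k, i in enumerate(perm)}
        ok = True
        for x, i in enumerate(mem_ops):
            for j in mem_ops[x + 1:]:
                same_addr = address(thread[i]) == address(thread[j])
                fenced = any(op[0] == "fence"
                             and op[1] == kind(thread[i]) + kind(thread[j])
                             for op in thread[i + 1:j])
                if (same_addr or fenced) and pos[i] > pos[j]:
                    ok = False  # this pair must stay in program order
        if ok:
            orders.append(tuple(thread[i] for i in perm))
    return orders


def interleavings(a, b):
    """Yield every merge of tuples a and b that keeps each one's own order."""
    if not a or not b:
        yield tuple(a) + tuple(b)
        return
    for rest in interleavings(a[1:], b):
        yield (a[0],) + rest
    for rest in interleavings(a, b[1:]):
        yield (b[0],) + rest


def outcomes(t1, t2):
    """Set of reachable (r1, r2) values; memory starts as zeros."""
    results = set()
    for o1 in allowed_orders(t1):
        for o2 in allowed_orders(t2):
            for schedule in interleavings(o1, o2):
                mem, regs = {}, {}
                for op in schedule:
                    if op[0] == "store":
                        mem[op[1]] = op[2]
                    else:
                        regs[op[1]] = mem.get(op[2], 0)
                results.add((regs["r1"], regs["r2"]))
    return results


# Example 1, without and with Fence_wr between the store and the load.
ex1_plain = outcomes((("store", "flag1", 1), ("load", "r1", "flag2")),
                     (("store", "flag2", 1), ("load", "r2", "flag1")))
ex1_fenced = outcomes(
    (("store", "flag1", 1), ("fence", "wr"), ("load", "r1", "flag2")),
    (("store", "flag2", 1), ("fence", "wr"), ("load", "r2", "flag1")))

# Example 2 (loop simplified), without and with Fence_ww / Fence_rr.
ex2_plain = outcomes((("store", "a", 1), ("store", "flag", 1)),
                     (("load", "r1", "flag"), ("load", "r2", "a")))
ex2_fenced = outcomes(
    (("store", "a", 1), ("fence", "ww"), ("store", "flag", 1)),
    (("load", "r1", "flag"), ("fence", "rr"), ("load", "r2", "a")))

print((0, 0) in ex1_plain, (0, 0) in ex1_fenced)
print((1, 0) in ex2_plain, (1, 0) in ex2_fenced)
```

In this toy model the surprising outcomes (r_1 = r_2 = 0 in Example 1, r_1 = 1 with r_2 = 0 in Example 2) are reachable without fences and disappear once the fences are inserted. Real architectures define their fences more precisely than this.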

Multiple Consumer Example
  [Figure: one Producer and two Consumers share a queue in memory through the tail and head pointers. Each consumer has its own registers R_head and R.]

  Producer posting Item x:
          Load R_tail, (tail)
          Store (R_tail), x
          R_tail = R_tail + 1
          Store (tail), R_tail

  Consumer:
          Load R_head, (head)
    spin: Load R_tail, (tail)
          if R_head == R_tail goto spin
          Load R, (R_head)
          R_head = R_head + 1
          Store (head), R_head
          process(R)

  What is wrong with this code?

Locks or Semaphores (E. W. Dijkstra, 1965)
  A semaphore is a non-negative integer, with the following operations:
    P(s): if s > 0, decrement s by 1, otherwise wait
    V(s): increment s by 1 and wake up one of the waiting processes
  P's and V's must be executed atomically, i.e., without interruptions or interleaved accesses to s by other processors

  Process i
      P(s)
      <critical section>
      V(s)

  The initial value of s determines the maximum number of processes in the critical section.
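The P(s) / critical section / V(s) pattern can be tried out with Python's standard `threading.Semaphore`, whose `acquire` and `release` play the roles of P and V. The sketch below is illustrative only: the function name, the worker count and the bookkeeping lock are assumptions. It records how many workers are ever inside the critical section at the same time.

```python
import threading


def run_workers(num_workers, initial_value, rounds=200):
    """Run workers that repeatedly enter a semaphore-guarded critical section.

    Returns the largest number of workers observed inside at once.
    """
    s = threading.Semaphore(initial_value)  # P(s) = acquire, V(s) = release
    stats_lock = threading.Lock()           # protects the bookkeeping only
    inside = 0
    max_inside = 0

    def worker():
        nonlocal inside, max_inside
        for _ in range(rounds):
            s.acquire()                     # P(s)
            with stats_lock:
                inside += 1
                max_inside = max(max_inside, inside)
            # <critical section> would go here
            with stats_lock:
                inside -= 1
            s.release()                     # V(s)

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return max_inside


print(run_workers(num_workers=8, initial_value=1))
```

With an initial value of 1 the semaphore behaves as a lock, so the observed maximum is 1. A larger initial value only bounds the maximum from above, since thread scheduling decides how many workers actually overlap.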