
Release Consistency

Draft material for the 3rd edition of Distributed Systems Concepts and Design
Department of Computer Science, Queen Mary & Westfield College, University of London
Version of June 26, 1996

1. Introduction

Chapter 17 of CDK (the 2nd edition of Distributed Systems Concepts and Design, by Coulouris, Dollimore and Kindberg) deliberately avoids the subtleties of release consistency, for lack of space. The paper Implementation and Performance of Munin [Carter et al 1991] gives an obscure definition, without discussing the background. These notes are intended to clarify matters. (All references are as given in CDK.)

2. Classifying memory accesses

Chapter 17 of CDK considers only DSM systems implemented on networks of workstations, where we implement synchronisation through message passing for reasons of efficiency. However, synchronisation can also be achieved through shared variables, whether using spinlocks set with the atomic testandset instruction, or using an algorithm that requires no special atomic instructions (see any operating systems book for such algorithms). Programmers normally implement synchronisation using shared variables on a dedicated shared-memory multiprocessor.

Pseudo-code procedures for accessing locks follow. Note that the function testandset() sets the lock to 1 and returns 0 if it finds the lock zero; otherwise it returns 1. It does this atomically.

    acquirelock(var int lock):
        while testandset(lock) = 1 skip;

    releaselock(var int lock):
        lock := 0;

Most papers in the literature on DSM assume that programs use shared memory for synchronisation. We gave the above algorithm in high-level pseudo-code, but the underlying accesses are LOADs, STOREs, or both at once (testandset). We shall sometimes refer to these as reads and writes instead.
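For concreteness, the same procedures can be sketched in C11 (my translation; the original notes give only the pseudo-code above). atomic_flag_test_and_set atomically sets the flag and returns its previous value, matching the semantics of testandset():

    #include <stdatomic.h>

    /* A minimal C11 sketch of acquirelock/releaselock, assuming an
       atomic_flag plays the role of the integer lock variable. */

    void acquirelock(atomic_flag *lock) {
        while (atomic_flag_test_and_set(lock))   /* spin while it reads 1 */
            ;
    }

    void releaselock(atomic_flag *lock) {
        atomic_flag_clear(lock);                 /* lock := 0 */
    }

atomic_flag is the one type that C11 guarantees to be lock-free, which makes it a close match for the hardware testandset instruction assumed here.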

In order to develop a memory model that takes synchronisation into account, we begin by classifying memory accesses. The Munin paper distinguishes between ordinary and synchronisation accesses, but this is not the whole story. In fact, the main distinction is between competing accesses and non-competing (ordinary) accesses. Two accesses are competing if:

- they may occur concurrently (there is no definite ordering between them), and
- at least one of them is a STORE.

So two LOADs can never be competing; and a LOAD and a STORE to the same location made by two processes that synchronise between the operations (and so order them) are non-competing.

We further divide competing accesses into synchronisation and non-synchronisation accesses:

- synchronisation accesses: LOADs/STOREs that contribute to synchronisation;
- non-synchronisation accesses: LOADs/STOREs that are concurrent but do not contribute to synchronisation.

For example, the STORE operation implied by lock := 0 in releaselock() above is a synchronisation access. We would expect synchronisation accesses to be competing, because potentially synchronising processes must be able to access synchronisation variables concurrently, and they must update them: LOADs alone could not achieve synchronisation. But not all competing accesses are synchronisation accesses: there are classes of parallel algorithms in which processes make competing accesses to shared variables just to update and read one another's results, and not to synchronise.

Synchronisation accesses are further divided into acquire accesses and release accesses, corresponding to their role in potentially blocking the process making the access, or in unblocking some other process. The full classification is:

    Shared accesses
        Competing
            Synchronisation
                Acquire
                Release
            Non-synchronisation
        Non-competing (ordinary)

3. Performing operations

We distinguish between the point at which a LOAD or STORE instruction is issued, when the processor first commences execution of the instruction, and the point at which the instruction is performed, or completed. Both LOADs and STOREs can be time-consuming operations. To execute a STORE, the memory system must engage in a protocol before other processors see the written value. Similarly, performing a LOAD may require communication, for example to fetch the data from a remote memory.

Several forms of asynchronous operation are available to increase the rate at which processors execute programs.

First, STORE instructions are often implemented asynchronously. The value is placed in a buffer before being written to memory, and the effects of the STORE are observed later by other processors. In principle, a second STORE by the same processor to a different memory location could be observed before the first.

Second, processors do not always wait for a previous instruction to complete before issuing the next instruction; this is known as dynamic instruction scheduling. Consider the following sequence of instructions:

    1. LOAD  R1, A1    // load word at address A1 into R1
    2. STORE R2, A2    // store word in R2 into memory at address A2
    3. MOVE  R1, R3    // move R1's data to R3

If this program executes in isolation (it does not share memory with another program), then the processor can issue (2) before a value has been fetched and put into R1 as a result of issuing (1). However, the processor must stall at instruction (3) until instruction (1) has completed. And it is not clear that the program would behave correctly in combination with another program accessing the same memory locations.

A third source of asynchronicity is that some processors can pre-fetch values in anticipation of LOADing them.
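To see why buffered STOREs concern the programmer, consider the following hypothetical C fragment (my illustration; it is not part of the original notes). If P1's two STOREs become visible out of program order, P2 can pass its test on flag and still read the old value of data:

    int data = 0;    /* result produced by P1 */
    int flag = 0;    /* set by P1 to signal that data is ready */

    void p1(void) {        /* runs on processor P1 */
        data = 42;         /* first STORE */
        flag = 1;          /* second STORE: with write buffering it may
                              be observed by P2 before the first */
    }

    int p2(void) {         /* runs on processor P2 */
        while (flag == 0)  /* spin until P1's STORE to flag is visible */
            ;
        return data;       /* may still yield 0 on a memory that
                              reorders P1's STOREs */
    }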

In summary, to gain performance there is in general a degree of asynchronicity between the processor and the memory. None of the mechanisms we have mentioned would cause a single program to execute incorrectly. But when two or more programs access shared variables in DSM, we must take care that the memory system produces the results that the programmer expects.

3.1 Instruction issue and performance in a DSM system

We shall assume that our DSM is at least coherent: this means that every processor agrees on the order of STOREs to the same location (but they do not necessarily agree on the ordering of STOREs to different locations). Given this assumption, we may speak unambiguously of the ordering of STOREs to a given location.

In a distributed shared memory system we can draw a time line for any memory operation o that processor P executes:

    [Figure: a real-time line showing, from left to right: P issues o;
     o performed with respect to P' at time t; o performed (complete).]

We say that a STORE has performed with respect to a processor P' at real time t if from that point either P' reads the value v as written by P, or it reads a value written subsequently. In other words, from time t we rule out P' reading first a different value and then v as written by o (another operation may STORE the same value!). Similarly, we say that a LOAD has performed with respect to processor P' by time t if thereafter no STORE issued by P' to the same location could possibly supply the value that P reads; before that point it could, because, for example, the processor may have pre-fetched the value it needs to LOAD. Finally, the operation o has performed if it has performed with respect to all processors.

4. Release consistency

The requirements we wish to meet are:

- to preserve the synchronisation semantics of locks and barriers;
- to gain performance: we allow a degree of asynchronicity between processor and memory operations;
- but to constrain the overlap between competing memory accesses, synchronisation accesses in particular, in order to ensure correct execution.

Release-consistent memory is designed to satisfy these requirements. The normal definition of release consistency [Gharachorloo et al 1990] stipulates:

(RC1) before an ordinary LOAD or STORE access is allowed to perform with respect to any other processor, all previous acquire accesses must be performed;

(RC2) before a release access is allowed to perform with respect to any other processor, all previous ordinary LOAD and STORE accesses must be performed;

(RC3) competing ("special") accesses are processor consistent with respect to one another.

(RC1 and RC2 agree with the Munin definition [Carter et al 1991], but RC3 appears to differ. Note also that the definition of processor consistency in [Gharachorloo et al 1990] differs from other definitions in the literature.)

Under processor consistency as defined in [Gharachorloo et al 1990], the memory is coherent and in addition:

(PC1) before a LOAD is allowed to perform with respect to any other processor, all previous LOAD accesses must be performed;

(PC2) before a STORE is allowed to perform with respect to any other processor, all previous accesses (LOADs and STOREs) must be performed.

The simplest way to think of processor consistency is that all processors agree on the ordering of any two (competing) STORE accesses made by the same processor; that is, they agree with its program order.

Let us denote an ordinary memory access by o, an acquire access by a and a release access by r. Consider the following program fragment representing operations before, within and after a critical section:

    o_beforeacq; ...; a; ...; o_between; ...; r; ...; o_afterrel

Applying the release consistency conditions to this example:

(1) o_between may not perform at any other processor until a has performed;

(2) r may not perform at any other processor until o_between has performed.

Note that this allows a to be issued before o_beforeacq has performed; similarly, o_afterrel may be issued before r has performed.

If (1) did not apply, then a processor P entering the critical section could either:

- STORE data which another processor may LOAD before P has acquired mutual exclusion; or
- render itself unable to LOAD a final value written by the processor that is leaving the critical section.

If (2) did not apply, then when processor P exits the critical section:

- another processor may enter it but LOAD stale data; or
- P may LOAD a value after another processor has entered the critical section, a value which P should have read while still inside it.

The programmer (or a compiler) is responsible for labelling LOAD and STORE instructions as release, acquire or non-synchronisation; other instructions are assumed to be ordinary. To label the program is to direct the DSM system to enforce the release consistency conditions. Gharachorloo et al [1990] describe the concept of a properly-labelled program. They prove that such a program cannot distinguish between a release-consistent (RC) DSM and a (less efficient) processor-consistent (PC) DSM. Moreover, if condition (RC3) is replaced to make competing accesses sequentially consistent (SC) instead of processor consistent, then a properly-labelled program cannot distinguish between an RC DSM and an SC DSM. Many programs execute equivalently under PC and SC DSM, and the weaker form of RC can be used with them.
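Conditions (1) and (2) correspond closely to the acquire and release orderings of modern shared-memory programming. As a hedged illustration (mine, in C11 rather than the DSM primitives discussed here), the fragment o_beforeacq; a; o_between; r; o_afterrel might be realised as:

    #include <stdatomic.h>

    atomic_int lock = 0;    /* 0 = free, 1 = held; the synchronisation variable */
    int shared = 0;         /* ordinary data guarded by the lock */

    void update(void) {
        /* a: the acquire access. As in condition (1), the acquire ordering
           keeps the ordinary accesses below from performing before it.
           atomic_exchange plays the role of testandset. */
        while (atomic_exchange_explicit(&lock, 1, memory_order_acquire) == 1)
            ;                                   /* spin: lock was held */

        shared++;   /* o_between: ordinary accesses inside the critical section */

        /* r: the release access. As in condition (2), the release ordering
           ensures the ordinary accesses above have performed before other
           processors can observe the lock as free. */
        atomic_store_explicit(&lock, 0, memory_order_release);
    }

The analogy is loose: release consistency constrains when DSM accesses perform with respect to other processors, while C11 orderings constrain compiler and hardware reordering. But the acquire and release roles correspond directly to the labels described above.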

5. An example

The following code implements a critical section using the logic of the acquirelock() and releaselock() procedures that we gave above.

    while testandset(lock) = 1        [LOAD: acquire; STORE: non-sync]
        skip;
    ... Critical Section ...
    lock := 0;                        [STORE: release]

In the example, the testandset instruction is treated as a LOAD followed by a STORE. The acquire and release labels ensure consistent memory accesses with respect to the critical section, as we described above. The non-sync label specifies that the STORE is a non-synchronisation access. The label is necessary to order the testandset with respect to other synchronisation operations.

For more information about release consistency, Gharachorloo et al [1990] give a rigorous account of their design and implementation for the DASH multiprocessor.

Tim Kindberg, QMW