Load-reserve / Store-conditional on POWER and ARM
1 Load-reserve / Store-conditional on POWER and ARM
Peter Sewell (slides from Susmit Sarkar), University of Cambridge, June 2012
3 Correct implementations of C/C++ on hardware
Can it be done? ...on highly relaxed hardware? e.g. Power
What is involved? Mapping new constructs to assembly
Optimizations: which ones legal?
Peter Sewell (Cambridge) Load-reserve / Store-conditional on POWER and ARM June / 10
8 Implementing C/C++11 on POWER: Pointwise Mapping
(From Paul McKenney and Raul Silvera)
  C/C++11 Operation     POWER Implementation
  Store (non-atomic)    st
  Load (non-atomic)     ld
  Store relaxed         st
  Store release         lwsync; st
  Store seq-c           lwsync; st
  Load relaxed          ld
  Load consume          ld (and preserve dependency)
  Load acquire          ld; cmp; bc; isync
  Load seq-c            sync; ld; cmp; bc; isync
  Fence acquire         lwsync
  Fence release         lwsync
  Fence seq-c           sync
  CAS relaxed           loop: lwarx; cmp; bc exit; stwcx.; bc loop; exit:
  CAS seq-c             sync; loop: lwarx; cmp; bc exit; stwcx.; bc loop; isync; exit:
Is that mapping correct?
9 Implementing C/C++11 on POWER: Pointwise Mapping
Is that mapping correct? Answer: No!
Store seq-c: lwsync; st is replaced by sync; st. The other rows of the table are unchanged.
10 Implementing C/C++11 on POWER: Pointwise Mapping
(From Paul McKenney and Raul Silvera)
  C/C++11 Operation     POWER Implementation
  Store (non-atomic)    st
  Load (non-atomic)     ld
  Store relaxed         st
  Store release         lwsync; st
  Store seq-c           sync; st
  Load relaxed          ld
  Load consume          ld (and preserve dependency)
  Load acquire          ld; cmp; bc; isync
  Load seq-c            sync; ld; cmp; bc; isync
  Fence acquire         lwsync
  Fence release         lwsync
  Fence seq-c           sync
  CAS relaxed           loop: lwarx; cmp; bc exit; stwcx.; bc loop; exit:
  CAS seq-c             sync; loop: lwarx; cmp; bc exit; stwcx.; bc loop; isync; exit:
Is that mapping correct? Answer: Yes!
11 Implementing C/C++11 on POWER: Pointwise Mapping
(Same table as on slide 10, with Store seq-c as sync; st.)
Is that the only correct mapping? Answer: No!
12 Implementing C/C++11 on POWER: Pointwise Mapping
(Same table as on slide 10, with an added column.)
Alternative: sync; ; sync; ; sync
All compilers must agree for separate compilation
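To make the table concrete, here is a minimal C++11 sketch of a message-passing idiom that uses three of its rows. The names `data`, `ready`, `producer`, `consumer`, `run_message_passing` and `self_check` are illustrative and not from the slides, and the comments only restate the table: a real compiler is free to choose any correct mapping and to optimise.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

std::atomic<int> data{0};
std::atomic<bool> ready{false};

// Writer: publish a value, then set the flag with release ordering.
void producer() {
    data.store(42, std::memory_order_relaxed);     // table row "Store relaxed": st
    ready.store(true, std::memory_order_release);  // table row "Store release": lwsync; st
}

// Reader: spin on the flag with acquire ordering, then read the value.
int consumer() {
    while (!ready.load(std::memory_order_acquire)) {
        // table row "Load acquire": ld; cmp; bc; isync
    }
    return data.load(std::memory_order_relaxed);   // table row "Load relaxed": ld
}

// Runs one producer and one consumer and returns what the consumer saw.
int run_message_passing() {
    data.store(0);
    ready.store(false);
    int seen = -1;
    std::thread t1(producer);
    std::thread t2([&] { seen = consumer(); });
    t1.join();
    t2.join();
    return seen;
}

// The release/acquire pair guarantees the consumer observes 42.
void self_check() {
    assert(run_message_passing() == 42);
}
```

Calling `self_check()` from a test driver exercises the idiom. The guarantee comes from the C/C++11 semantics, so it must hold whichever correct mapping the compiler uses.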
14 Machine Synchronisation Operations
x86: atomic synchronization operations, e.g. atomic add, CAS, ...
RISC-friendly alternative: Load-reserve/Store-conditional (aka LL/SC, larx/stcx and lwarx/stwcx, LDREX/STREX)
Can be used to implement CAS, atomic add, spinlocks, ...
Universal (like CAS) [Herlihy 93] (but no ABA problem)
Atomic Addition
  loop: lwarx r, d; add r,v,r; stwcx r, d; bne loop;
Informally, stwcx succeeds only if there has been no other write to the same address since the last lwarx, setting a flag iff it succeeds
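The atomic-addition loop has a close analogue in portable C++11, sketched below. `compare_exchange_weak` is allowed to fail spuriously, which is what lets it be implemented with a load-reserve/store-conditional pair on such machines. The function names `atomic_add`, `hammer` and `self_check` are illustrative assumptions, and in practice `fetch_add` would be the idiomatic choice.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Atomic addition as a retry loop, mirroring "lwarx; add; stwcx; bne loop".
// Returns the value that was replaced.
int atomic_add(std::atomic<int>& d, int v) {
    int r = d.load(std::memory_order_relaxed);
    // On failure (including spurious failure) r is refreshed with the
    // current value of d, and the loop retries.
    while (!d.compare_exchange_weak(r, r + v,
                                    std::memory_order_relaxed,
                                    std::memory_order_relaxed)) {
    }
    return r;
}

// Several threads increment one counter; no update may be lost.
int hammer(int threads, int iters) {
    std::atomic<int> counter{0};
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (int i = 0; i < iters; ++i) atomic_add(counter, 1);
        });
    }
    for (auto& th : pool) th.join();
    return counter.load();
}

// Single-threaded check: adding 3 to 2 replaces 2 and leaves 5.
void self_check() {
    std::atomic<int> x{2};
    assert(atomic_add(x, 3) == 2);
    assert(x.load() == 5);
}
```

The memory orders are all relaxed here, matching the "CAS relaxed" row of the mapping: the loop gives atomicity of the update but no ordering of surrounding accesses.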
16 What is "no write since..."?
In machine time? Neither necessary, nor sufficient
Microarchitecturally (simplified): if cache-line ownership not lost since last lwarx (but we don't want to model the microarchitecture...)
18 Modeling "not lost since"
Abstractly: ownership chain modeled by building up coherence order
Coherence: order relating stores to the same location (eventually linear)
A stwcx succeeds only if it is (or at least, if it can become) coherence-next-to the write read from by lwarx ...and no other write can later come in between
Isolate key concept: write reaching coherence point. Coherence is linear below this write, and no new edges will be added below
20 Coherence points and a successful stwcx
Atomic Addition
  loop: lwarx r, x; add r,3,r; stwcx r, x; bne loop;
Coherence order for x [diagram]: i:w x=0, j:w x=1, a:w x=2, b:w x=3, c:w x=4
Suppose lwarx reads from a:w x=2
stwcx can succeed if this becomes possible [diagram, with a marking for writes that have reached coherence point]: i:w x=0, j:w x=1, a:w x=2, d:w x=5, c:w x=4, b:w x=3
Warning: stwcx can fail spuriously
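The success condition can be illustrated with a deliberately tiny C++ model. This is a toy under strong assumptions and not the formal model from the talk. It tracks only the writes that have already reached their coherence point, keeps them as a linear list, always reads from the latest such write, and never fails spuriously. The types `Location` and `Reservation` and the functions `load_reserve` and `store_conditional` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model: values of the writes to one location that have reached their
// coherence point, in coherence order (linear by assumption).
struct Location {
    std::vector<int> committed;
};

// What a load-reserve remembers: which write it read from, and its value.
struct Reservation {
    std::size_t read_from;  // index into Location::committed
    int value;
};

// Simplification: always read from the coherence-latest committed write.
Reservation load_reserve(const Location& loc) {
    assert(!loc.committed.empty());
    return Reservation{loc.committed.size() - 1, loc.committed.back()};
}

// Succeeds only if the new write can become coherence-next-to the write
// read from, i.e. no other write has been committed in between.
bool store_conditional(Location& loc, const Reservation& r, int v) {
    if (r.read_from + 1 != loc.committed.size()) return false;
    loc.committed.push_back(v);
    return true;
}

// The slide's example: the reserve reads x=2 and the conditional store
// writes 2+3=5 directly after it in coherence order.
void self_check() {
    Location x{{0, 1, 2}};
    Reservation r = load_reserve(x);
    assert(r.value == 2);
    assert(store_conditional(x, r, r.value + 3));
    assert(x.committed.back() == 5);
}
```

If another write is committed between the two calls, for example by pushing 3 onto `committed`, `store_conditional` returns `false` and the caller would loop, as in the assembly.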
21 Load-reserve/store-conditional and ordering
Same-thread load-reserve/store-conditionals are ordered by program order
If all memory accesses are l-r/s-c sequences, then: only SC behaviour
But... normal loads/stores (to different addresses) are not ordered; the l-r/s-c do not act as a barrier
Confusion here led to a Linux bug... bad barrier placement in atomic-add-return
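In C++11 terms, the same point is that a relaxed read-modify-write orders nothing around it, so any required ordering has to be requested explicitly. The sketch below is an illustration under that reading and is not the Linux kernel code the slide refers to. The names `payload`, `counter`, `publisher`, `subscriber`, `run_publish` and `self_check` are assumptions.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

int payload = 0;              // ordinary, non-atomic data
std::atomic<int> counter{0};

void publisher() {
    payload = 42;
    // Without this fence the relaxed increment below would not order the
    // payload write before it; the RMW alone is not a barrier.
    std::atomic_thread_fence(std::memory_order_release);
    counter.fetch_add(1, std::memory_order_relaxed);
}

int subscriber() {
    while (counter.load(std::memory_order_relaxed) == 0) {
    }
    // Pairs with the release fence in publisher().
    std::atomic_thread_fence(std::memory_order_acquire);
    return payload;
}

// Runs both threads once and returns the payload the subscriber read.
int run_publish() {
    payload = 0;
    counter.store(0);
    int seen = -1;
    std::thread t1(publisher);
    std::thread t2([&] { seen = subscriber(); });
    t1.join();
    t2.join();
    return seen;
}

void self_check() {
    assert(run_publish() == 42);
}
```

Removing either fence makes the access to `payload` a data race, so the program would then have undefined behaviour rather than merely a weaker guarantee.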
22 Correctness of the Mapping
Theorem: For any sane, non-optimising compiler following the mapping:
[Diagram: a DRF C/C++ prog is taken by compilation to a POWER prog; C/C++11 semantics gives a C/C++11 execution and POWER semantics gives a POWER execution; the two executions are related by their observations]
23 Correctness of the Mapping
A sane, non-optimising compiler:
  Preserves memory accesses;
  Uses the mapping table;
  Respects the thread local semantics of C/C++, preserving dependencies
24 Correctness of the Mapping
From POWER trace, build key relations (happens-before, SC order)
Required properties from abstract machine properties
If trace looks like it produces data race, build the C/C++ data race
25 For details...
see Synchronising C/C++ and POWER, Sarkar et al., PLDI
In the paper:
  A formal model of load-reserve/store-conditional (in Lem)
  An executable model with exploration tool (ppcmem)
  Simplifications to the C/C++11 lock model
  Models tight against each other: relaxing the Power model would make C/C++11 unimplementable
More informationRepairing Sequential Consistency in C/C++11
Repairing Sequential Consistency in C/C++11 Ori Lahav MPI-SWS, Germany orilahav@mpi-sws.org Chung-Kil Hur Seoul National University, Korea gil.hur@sf.snu.ac.kr Viktor Vafeiadis MPI-SWS, Germany viktor@mpi-sws.org
More informationConsistency & Coherence. 4/14/2016 Sec5on 12 Colin Schmidt
Consistency & Coherence 4/14/2016 Sec5on 12 Colin Schmidt Agenda Brief mo5va5on Consistency vs Coherence Synchroniza5on Fences Mutexs, locks, semaphores Hardware Coherence Snoopy MSI, MESI Power, Frequency,
More informationLecture 18: Coherence and Synchronization. Topics: directory-based coherence protocols, synchronization primitives (Sections
Lecture 18: Coherence and Synchronization Topics: directory-based coherence protocols, synchronization primitives (Sections 5.1-5.5) 1 Cache Coherence Protocols Directory-based: A single location (directory)
More informationOrder Is A Lie. Are you sure you know how your code runs?
Order Is A Lie Are you sure you know how your code runs? Order in code is not respected by Compilers Processors (out-of-order execution) SMP Cache Management Understanding execution order in a multithreaded
More informationDr. George Michelogiannakis. EECS, University of California at Berkeley CRD, Lawrence Berkeley National Laboratory
CS 152 Computer Architecture and Engineering Lecture 18: Snoopy Caches Dr. George Michelogiannakis EECS, University of California at Berkeley CRD, Lawrence Berkeley National Laboratory http://inst.eecs.berkeley.edu/~cs152!
More informationVerification of Tree-Based Hierarchical Read-Copy Update in the Linux Kernel
Verification of Tree-Based Hierarchical Read-Copy Update in the Linux Kernel Paul E. McKenney, IBM Linux Technology Center Joint work with Lihao Liang*, Daniel Kroening, and Tom Melham, University of Oxford
More informationReview of last lecture. Peer Quiz. DPHPC Overview. Goals of this lecture. Lock-based queue
Review of last lecture Design of Parallel and High-Performance Computing Fall 2016 Lecture: Linearizability Motivational video: https://www.youtube.com/watch?v=qx2driqxnbs Instructor: Torsten Hoefler &
More informationRCU in the Linux Kernel: One Decade Later
RCU in the Linux Kernel: One Decade Later by: Paul E. Mckenney, Silas Boyd-Wickizer, Jonathan Walpole Slides by David Kennedy (and sources) RCU Usage in Linux During this same time period, the usage of
More informationWhat is uop Cracking?
Nehalem - Part 1 What is uop Cracking? uops are components of larger macro ops. uop cracking is taking CISC like instructions to RISC like instructions it would be good to crack CISC ops in parallel
More informationMemory Consistency Models. CSE 451 James Bornholt
Memory Consistency Models CSE 451 James Bornholt Memory consistency models The short version: Multiprocessors reorder memory operations in unintuitive, scary ways This behavior is necessary for performance
More information740: Computer Architecture Memory Consistency. Prof. Onur Mutlu Carnegie Mellon University
740: Computer Architecture Memory Consistency Prof. Onur Mutlu Carnegie Mellon University Readings: Memory Consistency Required Lamport, How to Make a Multiprocessor Computer That Correctly Executes Multiprocess
More informationBeyond Sequential Consistency: Relaxed Memory Models
1 Beyond Sequential Consistency: Relaxed Memory Models Computer Science and Artificial Intelligence Lab M.I.T. Based on the material prepared by and Krste Asanovic 2 Beyond Sequential Consistency: Relaxed
More informationModule 15: "Memory Consistency Models" Lecture 34: "Sequential Consistency and Relaxed Models" Memory Consistency Models. Memory consistency
Memory Consistency Models Memory consistency SC SC in MIPS R10000 Relaxed models Total store ordering PC and PSO TSO, PC, PSO Weak ordering (WO) [From Chapters 9 and 11 of Culler, Singh, Gupta] [Additional
More informationMemory Models for C/C++ Programmers
Memory Models for C/C++ Programmers arxiv:1803.04432v1 [cs.dc] 12 Mar 2018 Manuel Pöter Jesper Larsson Träff Research Group Parallel Computing Faculty of Informatics, Institute of Computer Engineering
More informationPredictable Timing Analysis of x86 Multicores using High-Level Parallel Patterns
Predictable Timing Analysis of x86 Multicores using High-Level Parallel Patterns Kevin Hammond, Susmit Sarkar and Chris Brown University of St Andrews, UK T: @paraphrase_fp7 E: kh@cs.st-andrews.ac.uk W:
More informationCSE 506: Opera.ng Systems Memory Consistency
Memory Consistency Don Porter 1 Logical Diagram Binary Formats Memory Allocators System Calls Threads User Kernel RCU File System Networking Sync Today s Lecture Memory Memory Consistency Device CPU Management
More informationRepairing Sequential Consistency in C/C++11
Technical Report MPI-SWS-2016-011, November 2016 Repairing Sequential Consistency in C/C++11 Ori Lahav MPI-SWS, Germany orilahav@mpi-sws.org Viktor Vafeiadis MPI-SWS, Germany viktor@mpi-sws.org Jeehoon
More informationChasing Away RAts: Semantics and Evaluation for Relaxed Atomics on Heterogeneous Systems
Chasing Away RAts: Semantics and Evaluation for Relaxed Atomics on Heterogeneous Systems Matthew D. Sinclair *, Johnathan Alsop^, Sarita V. Adve + * University of Wisconsin-Madison ^ AMD Research + University
More information