The Semantics of x86-cc Multiprocessor Machine Code
1 Susmit Sarkar, Computer Laboratory, University of Cambridge
Joint work with: Peter Sewell, Scott Owens, Tom Ridge, Magnus Myreen (U. Cambridge); Francesco Zappa Nardelli, Jade Alglave, Thomas Braibant (INRIA)
ARG lunch, November 2008
2 Shared Memory Multiprocessors are now everywhere Programmer model: many processors operating on (the illusion of) a single shared memory Also known as: sequential consistency Traditional concurrency semantics presupposes sequential consistency, for parallel languages or process calculi: (P 0 P 0 M 0 ) (P 1 P 1 M 1 ) (P 2 P 2 M 2 )...
3 Shared Memory
Programmer model: many processors operating on (the illusion of) a single shared memory.
But: for typical real shared-memory multiprocessors, the illusion of a single shared memory is not very good. For performance reasons, processors only have approximately consistent views of that memory (weak memory models, also called relaxed memory models). They are not sequentially consistent: different processors can observe actions in different orders.
We can't think about these systems in terms of global time.
4 Approximately Consistent Memory: One Intel/AMD Example
Initial shared memory values: x = 0, y = 0
Per-processor registers: r_A, r_B
  Processor A        Processor B
  store x := 1       store y := 1
  load r_A := y      load r_B := x
  Processor A        Processor B
  MOV [x] $1         MOV [y] $1
  MOV EAX [y]        MOV EBX [x]
Final register values: r_A = ? r_B = ?
5 Approximately Consistent Memory: One Intel/AMD Example
Initial shared memory values: x = 0, y = 0
Per-processor registers: r_A, r_B
  Processor A        Processor B
  store x := 1       store y := 1
  load r_A := y      load r_B := x
  Processor A        Processor B
  MOV [x] $1         MOV [y] $1
  MOV EAX [y]        MOV EBX [x]
Final register values: r_A = 0 and r_B = 0 is possible.
Each processor can do its own store action before the store of the other processor.
This makes it hard to understand what your programs are doing! It is already a real problem for OS, compiler, and library authors.
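To see why the outcome r_A = 0 and r_B = 0 rules out sequential consistency, the following minimal Python sketch enumerates every interleaving of the two-instruction threads above against a single shared memory. The encoding of instructions as tuples and the function names are illustrative assumptions, not part of the talk.

```python
from itertools import combinations

# Each step is (operation, location, argument); for a store the argument is
# the value written, for a load it is the destination register (assumed encoding).
PROC_A = [("store", "x", 1), ("load", "y", "rA")]
PROC_B = [("store", "y", 1), ("load", "x", "rB")]


def interleavings(a, b):
    """Yield every merge of a and b that keeps each thread's program order."""
    n = len(a) + len(b)
    for a_slots in combinations(range(n), len(a)):
        a_iter, b_iter = iter(a), iter(b)
        yield [next(a_iter) if i in a_slots else next(b_iter) for i in range(n)]


def run_sc(schedule):
    """Execute one interleaving against a single shared memory."""
    mem = {"x": 0, "y": 0}
    regs = {}
    for op, loc, arg in schedule:
        if op == "store":
            mem[loc] = arg
        else:
            regs[arg] = mem[loc]
    return regs["rA"], regs["rB"]


def sc_outcomes():
    """Set of (rA, rB) pairs reachable under sequential consistency."""
    return {run_sc(s) for s in interleavings(PROC_A, PROC_B)}


if __name__ == "__main__":
    print(sorted(sc_outcomes()))
```

None of the six interleavings yields (0, 0), so observing that result on real hardware shows the machine is not sequentially consistent.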
6 Problems
Most real multiprocessors (x86, PPC, SPARC, ARM, ...) provide non-sequentially-consistent (weak, or relaxed) memory.
To write efficient low-level concurrent code you have to understand exactly what guarantees are provided. But:
  ...the guarantees are subtle, and differ between architectures
  ...the processor documentation is typically very ambiguous, hard to understand, and sometimes incomplete and unsound
  ...(almost) none of the last 40 years of research on verifying concurrent algorithms deals with these real weak memory models
  ...(almost all) previous WMM work doesn't cover x86, and isn't integrated with instruction semantics
7 Plan
1. Find out what the architecture and processors say and do.
   Aim: the model should be sound w.r.t. the architecture (and hence w.r.t. current and future processors) and strong enough for reasoning about (racy) code, but may be looser than the behaviour of any particular processor.
2. Express it in nice, clear, unambiguous mathematics.
3. Test that the mathematics and hardware correspond.
4. Prove metatheory (e.g. that for well-synchronized programs you don't need to think about this stuff).
9 Sources
Intel 64 and IA-32 Architectures Software Developer's Manual, vols 1, 2A, 2B, 3A, 3B (Rev 28, July 2008)
  "In multiprocessor systems, maintenance of cache consistency may, in rare circumstances, require intervention by system software." [Vol 3A 10-5]
AMD64 Architecture Programmer's Manual, vols 1, 2, 3 (September 2007)
Personal communication with a couple of Intel experts.
You?
10 Timeline of memory model descriptions
  Pre-IWP      Nov 2006               Intel manuals, rev 22
  IWP/Rev 28   Aug 2007               Intel white paper v1.0
               Sep 2007               AMD manual, rev 3.14
               Jul 2008               Intel manuals, rev 28
  Rev 29       Nov 2008 (last week)   Intel manual, rev 29
11 Not all of x86
For now, only the basic user-code scenario:
  coherent write-back memory
  no misaligned accesses, exceptions, or non-temporal operations
  no self-modifying code
  no page-table changes
Sufficient for user-space code and most kernel code.
12 Semantics of Memory model
Two styles of semantics in WMM lit.:
  Operational: idealised machines, with buffers, etc.
  Non-operational or axiomatic: constraints on ordering relations.
Ideally: both, with a correspondence theorem.
First: axiomatic. A view order per processor, with constraints on how they relate to each other.
13 Instructions and Events
A program is made of instructions, but reordering is over read/write events:
  proc:0         proc:1
  INC [100]      INC [100]
inc-inc (event structure 4):
  eiid:1 (of INC [100])  iiid: proc:0;po:0  R [100]=0
  eiid:3 (of INC [100])  iiid: proc:0;po:0  W [100]=1
  eiid:5 (of INC [100])  iiid: proc:1;po:0  R [100]=0
  eiid:7 (of INC [100])  iiid: proc:1;po:0  W [100]=1
  iico edges: eiid:1 -> eiid:3, eiid:5 -> eiid:7
For program reasoning we need both instructions and events (unlike most lit.). Instructions are non-atomic. Record intra-instruction causality (iico).
14 Locked Instructions
  proc:0              proc:1
  LOCK; INC [100]     LOCK; INC [100]
Event Structures
  event_structure = [ procs : proc set;
                      events : event set;
                      intra_causality : event reln;
                      atomicity : event set set ]
15 View Orders
A collection of view orders vo gives, for each processor p, a linear order vo_p of the relevant events. The relevant events are:
  all the events of processor p, and
  all the memory write events of other processors
16 [Figure: view orders vo:0 and vo:1 for the program proc:0 INC [100], proc:1 INC [100], drawn over the events eiid:1 (proc:0, R [100]=0), eiid:3 (proc:0, W [100]=1), eiid:5 (proc:1, R [100]=0) and eiid:7 (proc:1, W [100]=1), with iico edges inside each instruction and one edge labelled P6.]
17 Preserved Program Order
5 of the 8 Intel WP principles are straightforward.
P1. LOADS ARE NOT REORDERED WITH OTHER LOADS.
P2. STORES ARE NOT REORDERED WITH OTHER STORES.
  iwp2.1/amd1   proc:0          proc:1
  po:0          MOV [100] $1    MOV EAX [200]
  po:1          MOV [200] $1    MOV EBX [100]
  Required: (1:EAX=1) ⇒ (1:EBX=1)
P3. STORES ARE NOT REORDERED WITH OLDER LOADS.
P4. LOADS MAY BE REORDERED WITH OLDER STORES TO DIFFERENT LOCATIONS BUT NOT WITH OLDER STORES TO THE SAME LOCATION.
P8. LOADS AND STORES ARE NOT REORDERED WITH LOCKED INSTRUCTIONS.
18 Preserved Program Order, Formalised
preserved_program_order E =
  {(e1, e2) | (e1, e2) ∈ (po_strict E) ∧
     ((∃p r. (loc e1 = loc e2) ∧ (loc e1 = SOME (LOCATION_REG p r))) ∨
      (mem_load e1 ∧ mem_load e2) ∨
      (mem_store e1 ∧ mem_store e2) ∨
      (mem_load e1 ∧ mem_store e2) ∨
      (mem_store e1 ∧ mem_load e2 ∧ (loc e1 = loc e2)) ∨
      ((mem_load e1 ∨ mem_store e1) ∧ locked E e2) ∨
      (locked E e1 ∧ (mem_load e2 ∨ mem_store e2)))}
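The definition above can be read as a filter over strict program order. The following Python sketch mirrors it clause by clause; the `Event` record, its field names, and the tuple encoding of locations are assumptions made for illustration and are not the HOL types used in the formal model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    eiid: int             # event identifier
    proc: int             # issuing processor
    po: int               # program-order index of the owning instruction
    kind: str             # "R" for a read, "W" for a write
    loc: tuple            # ("mem", address) or ("reg", proc, name), assumed encoding
    locked: bool = False  # True when the owning instruction is LOCK-prefixed


def mem_load(e):
    return e.kind == "R" and e.loc[0] == "mem"


def mem_store(e):
    return e.kind == "W" and e.loc[0] == "mem"


def po_strict(events):
    """Pairs of events on one processor, earlier instruction first."""
    return {(a, b) for a in events for b in events
            if a.proc == b.proc and a.po < b.po}


def preserved_program_order(events):
    ppo = set()
    for e1, e2 in po_strict(events):
        same_register = e1.loc[0] == "reg" and e1.loc == e2.loc
        if (same_register
                or (mem_load(e1) and mem_load(e2))                        # P1
                or (mem_store(e1) and mem_store(e2))                      # P2
                or (mem_load(e1) and mem_store(e2))                       # P3
                or (mem_store(e1) and mem_load(e2) and e1.loc == e2.loc)  # P4
                or ((mem_load(e1) or mem_store(e1)) and e2.locked)        # P8
                or (e1.locked and (mem_load(e2) or mem_store(e2)))):      # P8
            ppo.add((e1, e2))
    return ppo
```

The only memory-to-memory pair left out is a store followed by a load from a different location, which is exactly the relaxation that P4 permits.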
19 Total order on stores to each location
P6. IN A MULTIPROCESSOR SYSTEM, STORES TO THE SAME LOCATION HAVE A TOTAL ORDER.
write_serialization_candidates E = ...the set of all relations which are the union, for each location, of a linear order over all the store events to that location in E.
  iwp2.6   proc:0          proc:1          proc:2          proc:3
  po:0     MOV [100] $1    MOV [100] $2    MOV EAX [100]   MOV ECX [100]
  po:1                                     MOV EBX [100]   MOV EDX [100]
  Forbidden: 2:EAX=1 ∧ 2:EBX=2 ∧ 3:ECX=2 ∧ 3:EDX=1
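As a rough illustration of the candidate set described above, this Python sketch builds one relation per choice of linear order for each location. Representing a store as an `(event_id, location)` pair and a linear order as a set of strict "earlier, later" pairs are assumptions for the example.

```python
from itertools import permutations, product


def write_serialization_candidates(stores):
    """Return every candidate relation as a frozenset of (earlier, later) pairs.

    stores is an iterable of (event_id, location) pairs (assumed encoding).
    """
    by_location = {}
    for event_id, location in stores:
        by_location.setdefault(location, []).append(event_id)

    # One list of possible linear orders per location.
    per_location = [list(permutations(ids)) for ids in by_location.values()]

    candidates = []
    for choice in product(*per_location):
        relation = set()
        for order in choice:
            # Turn the sequence into the strict linear order it denotes.
            for i in range(len(order)):
                for j in range(i + 1, len(order)):
                    relation.add((order[i], order[j]))
        candidates.append(frozenset(relation))
    return candidates
```

For iwp2.6 there are two stores to [100], so there are two candidates: one in which the store of 1 comes first and one in which the store of 2 comes first. Either choice contradicts one of the two observers in the forbidden outcome.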
20 Total order on locked instructions
P7. IN A MULTIPROCESSOR SYSTEM, LOCKED INSTRUCTIONS HAVE A TOTAL ORDER.
lock_serialization_candidates E = ...similar, but on instructions
  iwp2.7/amd7   proc:0           proc:1           proc:2          proc:3
  po:0          XCHG [100] EAX   XCHG [200] EBX   MOV ECX [100]   MOV ESI [200]
  po:1                                            MOV EDX [200]   MOV EDI [100]
  Initial state: 0:EAX=1 ∧ 1:EBX=1 (elsewhere 0)
  Forbidden: 2:ECX=1 ∧ 2:EDX=0 ∧ 3:ESI=1 ∧ 3:EDI=0
21 Transitive visibility
Key question: how to capture condition P5.
  "Intel 64 memory ordering ensures transitive visibility of stores, i.e. stores that are causally related appear to execute in an order consistent with the causal relation."
Transitivity from reads-from to preserved-program-order:
  proc:0          proc:1          proc:2
  MOV [100] $1    MOV EAX [100]   MOV EBX [200]
                  MOV [200] $1    MOV ECX [100]
  Required: (1:EAX=1 ∧ 2:EBX=1) ⇒ (2:ECX=1)
22 Reads-from maps A reads-from map for an event structure is a set of pairs (ew, er) identifying, for some of its read events, a write event to the same location with the same value. Other read events are presumed to read from the initial state.
23 Causality
We believe visibility is transitive also through the write- and lock-serialization orders, and through intra-instruction causality. Interpret "causally" with:
happens_before E X =
  E.intra_causality ∪
  (preserved_program_order E) ∪
  X.write_serialization ∪
  X.lock_serialization ∪
  X.rfmap
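A small Python sketch may help make the union concrete. It builds happens-before for the transitive-visibility test from the earlier slide and checks whether adding the view-order edge needed by the forbidden outcome creates a cycle. The event names (`w100`, `r100_p1`, and so on) and the helper names are hypothetical.

```python
def happens_before(intra_causality, ppo, write_ser, lock_ser, rfmap):
    """Union of the five component relations, each a set of (a, b) pairs."""
    return (set(intra_causality) | set(ppo) | set(write_ser)
            | set(lock_ser) | set(rfmap))


def is_acyclic(relation):
    """Depth-first search for a cycle in a relation given as (a, b) pairs."""
    successors = {}
    for a, b in relation:
        successors.setdefault(a, set()).add(b)
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {}

    def visit(node):
        colour[node] = GREY
        for nxt in successors.get(node, ()):
            state = colour.get(nxt, WHITE)
            if state == GREY:                      # back edge: cycle found
                return False
            if state == WHITE and not visit(nxt):
                return False
        colour[node] = BLACK
        return True

    return all(colour.get(n, WHITE) != WHITE or visit(n)
               for n in list(successors))


# Transitive-visibility test: proc 1 reads proc 0's write, then writes [200];
# proc 2 reads that write, then reads [100].
RFMAP = {("w100", "r100_p1"), ("w200", "r200_p2")}
PPO = {("r100_p1", "w200"),       # P3: store not reordered with older load
       ("r200_p2", "r100_p2")}    # P1: loads not reordered with loads
HB = happens_before(set(), PPO, set(), set(), RFMAP)

# If proc 2 read 0 from [100], its view order would need r100_p2 before w100.
FORBIDDEN_VIEW_EDGE = {("r100_p2", "w100")}
```

Here `is_acyclic(HB)` holds, while `is_acyclic(HB | FORBIDDEN_VIEW_EDGE)` does not, which is one way to see why the model requires 2:ECX=1 in that test.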
24 In full, an execution witness X, for an event structure E, comprises:
  an initial state, initial_state,
  a family of view orders (one for each processor), vo,
  a per-location global order on memory writes, write_serialization,
  a global order on locked instructions, lock_serialization,
  a reads-from map, rfmap,
together satisfying the valid execution predicate below.
25 Valid Executions
(The final state is then determined by the initial state overridden by the last memory and register writes.)
For each processor p:
(a) p's view order is consistent with happens_before (strict(vo p) ∪ happens_before is acyclic).
(b) The reads-from map is satisfied by the view orders (for any write ew and read er in rfmap and in the relevant view order events for p, ew precedes er in vo p and there is no other intervening write to the same location).
(c) The initial state constraint is satisfied by the rfmap and view orders (for each read er that does not have a corresponding write in rfmap, the initial state contains the read value and there is no write ew to that location preceding er in the view order).
(d) The atomicity conditions are satisfied by each view order (for any two events in the same atomicity equivalence class, there is no third event e that occurs between them that isn't in that class).
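Conditions (b) and (d) are simple to check once a view order is written as a list. The Python sketch below does so for the two locked INC [100] instructions from the earlier slide; the event names, the `LOC` and `WRITES` tables, and the function names are illustrative assumptions.

```python
def rf_satisfied(view, rfmap, loc, writes):
    """Condition (b) for one view order, given as a list of event names.

    loc maps event name -> location; writes is the set of write events.
    """
    pos = {e: i for i, e in enumerate(view)}
    for ew, er in rfmap:
        if ew not in pos or er not in pos:
            continue                      # pair not relevant to this view
        if pos[ew] >= pos[er]:
            return False                  # the write must come first
        between = view[pos[ew] + 1:pos[er]]
        if any(e in writes and loc[e] == loc[ew] for e in between):
            return False                  # an intervening write to that location
    return True


def atomicity_satisfied(view, atomic_classes):
    """Condition (d): each class's events sit contiguously in the view."""
    pos = {e: i for i, e in enumerate(view)}
    for cls in atomic_classes:
        idx = sorted(pos[e] for e in cls if e in pos)
        if idx and idx[-1] - idx[0] + 1 != len(idx):
            return False
    return True


# Two LOCK; INC [100] instructions: r0/w0 on proc 0, r1/w1 on proc 1.
LOC = {"r0": 100, "w0": 100, "r1": 100, "w1": 100}
WRITES = {"w0", "w1"}
CLASSES = [{"r0", "w0"}, {"r1", "w1"}]
```

For processor 0, whose relevant events are its own plus the other processor's write, the view `["r0", "w0", "w1"]` satisfies atomicity, whereas `["r0", "w1", "w0"]` lets the foreign write split the locked instruction and is rejected.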
26 Example valid execution
  iwp2.4/amd9   proc:0          proc:1
  po:0          MOV [100] $1    MOV [200] $1
  po:1          MOV EAX [100]   MOV ECX [200]
  po:2          MOV EBX [200]   MOV EDX [100]
  Allowed: 0:EBX=0 ∧ 1:EDX=0
An execution in which processor 0 sees its write before that of processor 1 whereas processor 1 sees them in the opposite order.
iwp2.4/amd9: Litmus Test (event structure 6)
  eiid:0  (of MOV [100] $1)   iiid: proc:0;po:0  W [100]=1
  eiid:1  (of MOV EAX [100])  iiid: proc:0;po:1  R [100]=1
  eiid:3  (of MOV EAX [100])  iiid: proc:0;po:1  W 0:EAX=1
  eiid:6  (of MOV EBX [200])  iiid: proc:0;po:2  R [200]=0
  eiid:8  (of MOV EBX [200])  iiid: proc:0;po:2  W 0:EBX=0
  eiid:9  (of MOV [200] $1)   iiid: proc:1;po:0  W [200]=1
  eiid:10 (of MOV ECX [200])  iiid: proc:1;po:1  R [200]=1
  eiid:12 (of MOV ECX [200])  iiid: proc:1;po:1  W 1:ECX=1
  eiid:15 (of MOV EDX [100])  iiid: proc:1;po:2  R [100]=0
  eiid:17 (of MOV EDX [100])  iiid: proc:1;po:2  W 1:EDX=0
[Figure: the events above connected by vo:0, vo:1, rf and iico edges, with some edges labelled P1 and P4.]
27 Instruction Semantics
Decoding:
  " 8B /r     MOV r32, r/m32 ";
  " B8+rd id  MOV r32, imm32 ";
Microcode combinators:
  seqm : 'a M -> ('a -> 'b M) -> 'b M
  parm : 'a M -> 'b M -> ('a × 'b) M
  read_reg : iiid -> Xreg -> word32 M
  ...
x86_exec ii (XBINOP binop_name ds) len =
  parm_unit
    (seqm (read_eip ii) (λx. write_eip ii (x + len)))
    (seqm (parm (read_src_ea ii ds) (read_dest_ea ii ds))
          (λ((ea_src, val_src), (ea_dest, val_dest)).
             write_binop ii binop_name val_dest val_src ea_dest))
28 Validating the semantics
Too complex to work with by hand! (both combinatorially and twistily)
  Write executable version, in OCaml
  Formalise semantics, in HOL
  Test behaviour of real processors:
    the instruction semantics (directly against HOL)
    the memory model
  Prove metatheory
29 Testing the instruction semantics
Generate 6000 conjectures like this (for MOV EAX EBX) from a real processor:
  (XREAD_REG EBX s = 0x6F5BE65Bw) ⇒
  (XREAD_EIP s = 0x804848Bw) ⇒
  (XREAD_MEM 0x804848Bw s = SOME 0x89w) ⇒
  (XREAD_MEM 0x804848Cw s = SOME 0xD8w) ⇒
    (XREAD_REG EAX (THE (X86_NEXT s)) = 0x6F5BE65Bw) ∧
    (XREAD_REG EBX (THE (X86_NEXT s)) = 0x6F5BE65Bw) ∧
    (XREAD_EIP (THE (X86_NEXT s)) = 0x804848Dw)
Prove in HOL (automatically...)
Covered, all 32-bit: MOV, CMOVE, CMOVNE, XADD, XCHG, CMPXCHG; ADD, AND, CMP, OR, SUB, TEST, XOR; INC, DEC, NOT, NEG; POP, PUSH; JUMP, CALL, RET, LOOP.
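The conjecture above can be replayed by hand with a tiny interpreter. The Python sketch below decodes only the register-direct form of opcode `89 /r` (MOV r/m32, r32), which is the encoding the bytes 0x89 0xD8 in the conjecture use; the dictionary-based machine state and the function name are assumptions for illustration and have nothing to do with the HOL `X86_NEXT` function.

```python
# Standard x86 32-bit register numbering used by the ModRM byte.
REG32 = ["EAX", "ECX", "EDX", "EBX", "ESP", "EBP", "ESI", "EDI"]


def step_mov_rm32_r32(state):
    """Execute one register-to-register `89 /r` instruction at EIP.

    state is a dict with "EIP", register names, and "MEM" (address -> byte).
    """
    eip = state["EIP"]
    opcode, modrm = state["MEM"][eip], state["MEM"][eip + 1]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if opcode != 0x89 or mod != 0b11:
        raise ValueError("only register-direct MOV r/m32, r32 is handled")
    nxt = dict(state)
    nxt[REG32[rm]] = state[REG32[reg]]   # destination is r/m, source is reg
    nxt["EIP"] = eip + 2                 # opcode byte plus ModRM byte
    return nxt


# Pre-state taken from the conjecture on the slide (EAX's old value is arbitrary).
PRE = {
    "EIP": 0x804848B,
    "EAX": 0,
    "EBX": 0x6F5BE65B,
    "MEM": {0x804848B: 0x89, 0x804848C: 0xD8},
}
POST = step_mov_rm32_r32(PRE)
```

With ModRM byte 0xD8 the mod field is 3, the reg field selects EBX and the r/m field selects EAX, so the post-state matches the three conclusions of the conjecture.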
30 Testing the memory model
(* Test iwp2.4/amd9 : Intra-processor forwarding is allowed *)
{x = 0; y = 0};
exists (%r2 = 0 /\ %r4 = 0);
  P0               P1             ;
  mov [x], 1       mov [y], 1     ;
  mov %r1, [x]     mov %r3, [y]   ;
  mov %r2, [y]     mov %r4, [x]
We found a witness for the case: exists r2 = 0 /\ r4 = 0
Histogram of results (some counts were lost in transcription):
  (x,1) (y,1) (%r1,1) (%r2,0) (%r3,1) (%r4,0)  412
  (x,1) (y,1) (%r1,1) (%r2,1) (%r3,1) (%r4,0)
  (x,1) (y,1) (%r1,1) (%r2,0) (%r3,1) (%r4,1)
  (x,1) (y,1) (%r1,1) (%r2,1) (%r3,1) (%r4,1)  23
31 Metatheory 1: Nice executions
The model follows the statements in the manual... and so is (superficially) quite weak, e.g. accesses to different registers need not follow program order.
Theorem: All valid executions are equivalent to nice valid executions.
Nice: register and memory reads are in program order (but memory writes can be arbitrarily delayed).
Proved in HOL [Tom Ridge]
32 Metatheory 2: Data Race freedom
We would like to program as if memory is sequentially consistent (for well-behaved programs).
Theorem: ...for race-free event structures, all valid executions are equivalent to valid sequential executions.
Race free (intensional definition): there is no pair of events, one a memory read and the other a memory write, to the same location, that are unrelated by happens-before.
Proved in HOL [Scott Owens]
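Read literally, the race-freedom condition is a search for unordered read/write pairs. This Python sketch follows the wording on the slide, applied to the store-then-load example from the start of the talk; the event names and the `(kind, location)` encoding are assumptions, and happens-before is closed transitively before pairs are compared.

```python
def transitive_closure(relation):
    """Smallest transitive relation containing the given (a, b) pairs."""
    closure = set(relation)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


def races(events, happens_before):
    """Return read/write pairs on one location unrelated by happens-before.

    events maps an event name to (kind, location), kind being "R" or "W".
    """
    hb = transitive_closure(happens_before)
    names = sorted(events)
    found = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            (kind_a, loc_a), (kind_b, loc_b) = events[a], events[b]
            if (loc_a == loc_b and {kind_a, kind_b} == {"R", "W"}
                    and (a, b) not in hb and (b, a) not in hb):
                found.append((a, b))
    return found


# Processor A: store x, load y.  Processor B: store y, load x.
SB_EVENTS = {
    "a_w": ("W", "x"), "a_r": ("R", "y"),
    "b_w": ("W", "y"), "b_r": ("R", "x"),
}
```

With an empty happens-before relation both cross-processor pairs are reported, so this program is racy and the theorem gives no sequential-consistency guarantee for it.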
33 Metatheory 3: Operational model
The axiomatic model is good for proofs, but is not suited for calculation.
Theorem: ...the axiomatic model is equivalent to a deadlock-free operational semantics model [hand proof, without lock prefix].
Owing to the niceness property, the operational semantics need only delay the visibility of memory writes.
34 Summary of X86 models
Pre-IWP (pre-Aug 2007) (Intel/AMD)
  Extremely vague
IWP/Rev 28 (Intel/AMD, formalized by X86-CC)
  Moderately clear, except for causality (interpreted in X86-CC)
  Unsound with hardware
  Too weak for programmers (?) (IRIW, MFENCEs do not lead to sequential consistency)
35 Rev 28 of Intel manual: The Rev 28/X86-CC model is not sound
The Rev 28 Manual / X86-CC is in some cases stronger than hardware.
P4. READS MAY BE REORDERED WITH OLDER WRITES TO DIFFERENT LOCATIONS BUT NOT WITH OLDER WRITES TO THE SAME LOCATION.
P6. WRITES TO THE SAME LOCATION HAVE A TOTAL ORDER.
  n6      proc:0          proc:1
  poi:0   MOV [x] $1      MOV [y] $2
  poi:1   MOV EAX [x]     MOV [x] $2
  poi:2   MOV EBX [y]
  Forbidden: 0:EAX=1 ∧ 0:EBX=0 ∧ x=1
Observed (rarely, but reproducibly) on real hardware (Core 2), and allowed in Rev 29.
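One way to build intuition for how hardware can produce the n6 outcome is a TSO-style abstract machine in which each processor has a FIFO store buffer and reads its own buffered stores first. The Python sketch below is such a machine written for illustration only: it is an assumption of this example, not the X86-CC model and not a description of any particular processor. It explores all schedules exhaustively, with `buffered=False` giving plain sequential consistency for comparison.

```python
def explore(threads, init, buffered=True):
    """Return the set of final (memory, registers) states, as sorted tuples.

    threads: list of instruction lists; an instruction is ("st", loc, val)
    or ("ld", reg, loc).  Register names must be unique across threads.
    """
    def freeze(d):
        return tuple(sorted(d.items()))

    def put(tup, i, v):
        return tup[:i] + (v,) + tup[i + 1:]

    start = (tuple(0 for _ in threads), tuple(() for _ in threads),
             freeze(init), ())
    seen, stack, finals = {start}, [start], set()
    while stack:
        pcs, bufs, mem_t, regs_t = stack.pop()
        mem = dict(mem_t)
        successors = []
        for p, code in enumerate(threads):
            if bufs[p]:                           # flush the oldest buffered store
                loc, val = bufs[p][0]
                new_mem = dict(mem)
                new_mem[loc] = val
                successors.append((pcs, put(bufs, p, bufs[p][1:]),
                                   freeze(new_mem), regs_t))
            if pcs[p] < len(code):                # execute the next instruction
                ins, new_pcs = code[pcs[p]], put(pcs, p, pcs[p] + 1)
                if ins[0] == "st" and buffered:
                    new_buf = bufs[p] + ((ins[1], ins[2]),)
                    successors.append((new_pcs, put(bufs, p, new_buf),
                                       mem_t, regs_t))
                elif ins[0] == "st":
                    new_mem = dict(mem)
                    new_mem[ins[1]] = ins[2]
                    successors.append((new_pcs, bufs, freeze(new_mem), regs_t))
                else:
                    _, reg, loc = ins
                    pending = [v for (l, v) in bufs[p] if l == loc]
                    regs = dict(regs_t)
                    regs[reg] = pending[-1] if pending else mem[loc]
                    successors.append((new_pcs, bufs, mem_t, freeze(regs)))
        if not successors:                        # all done, all buffers empty
            finals.add((mem_t, regs_t))
        for s in successors:
            if s not in seen:
                seen.add(s)
                stack.append(s)
    return finals


def allows(finals, **want):
    """True if some final state matches every name=value pair in want."""
    return any(all(dict(mem + regs).get(k) == v for k, v in want.items())
               for mem, regs in finals)


N6 = [[("st", "x", 1), ("ld", "EAX", "x"), ("ld", "EBX", "y")],
      [("st", "y", 2), ("st", "x", 2)]]
INIT = {"x": 0, "y": 0}
```

On this machine `allows(explore(N6, INIT), EAX=1, EBX=0, x=1)` holds: processor 0 reads its own buffered store of 1, reads y before processor 1's stores reach memory, and flushes its store to x last. With `buffered=False` the same query fails.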
36 What do programmers on X86 use?
Generally assumed: somewhat like Total Store Order on SPARC.
  x86/iriw   proc:0        proc:1        proc:2          proc:3
  poi:0      MOV [x] $1    MOV [y] $1    MOV EAX [x]     MOV ECX [y]
  poi:1                                  MOV EBX [y]     MOV EDX [x]
  Forbidden: 2:EAX=1 ∧ 2:EBX=0 ∧ 3:ECX=1 ∧ 3:EDX=0
Allowed in X86-CC/Rev 28, explicitly allowed by AMD (X86-CC is in this respect weaker than TSO).
Forbidden in Rev 29 (this is a weakness, not an unsoundness).
37 Rev 29 of Intel manual: Revised model is (probably) too weak
P6. ANY TWO STORES ARE SEEN IN A CONSISTENT ORDER BY PROCESSORS OTHER THAN THOSE PERFORMING THE STORES
...and delete the old P6. WRITES TO THE SAME LOCATION HAVE A TOTAL ORDER
  x86/n5   proc:0          proc:1
  poi:0    MOV [x] $1      MOV [x] $2
  poi:1    MOV EAX [x]     MOV EBX [x]
  Forbidden: 0:EAX=2 ∧ 1:EBX=1
Forbidden in Rev 28/X86-CC. This would be allowed (as far as we can tell) in Rev 29, and would be very strange for programmers.
38 Comparison of X86 models
Pre-IWP (pre-Aug 2007) (Intel/AMD)
  Extremely vague
IWP/Rev 28 (Intel/AMD, formalized by X86-CC)
  Moderately clear, except for causality (interpreted in X86-CC)
  Unsound with hardware
  Too weak for programmers (?) (IRIW, MFENCEs do not lead to sequential consistency)
Rev 29 (Intel, AMD in progress)
  Moderately clear, except for causality (old interpretation does not work, not clear what does)
  Sound (as far as we know) with hardware
  Too weak for programmers (n5)
X86-TSO (Us, in progress)
  Clear
  Sound (as far as we know) with hardware
  Strong enough for programmers (?) (experience of TSO programmers)
More informationProcesses (Intro) Yannis Smaragdakis, U. Athens
Processes (Intro) Yannis Smaragdakis, U. Athens Process: CPU Virtualization Process = Program, instantiated has memory, code, current state What kind of memory do we have? registers + address space Let's
More informationRelaxed Memory-Consistency Models
Relaxed Memory-Consistency Models [ 9.1] In Lecture 13, we saw a number of relaxed memoryconsistency models. In this lecture, we will cover some of them in more detail. Why isn t sequential consistency
More informationRepairing Sequential Consistency in C/C++11
Repairing Sequential Consistency in C/C++11 Ori Lahav MPI-SWS, Germany orilahav@mpi-sws.org Chung-Kil Hur Seoul National University, Korea gil.hur@sf.snu.ac.kr Viktor Vafeiadis MPI-SWS, Germany viktor@mpi-sws.org
More informationadministrivia today start assembly probably won t finish all these slides Assignment 4 due tomorrow any questions?
administrivia today start assembly probably won t finish all these slides Assignment 4 due tomorrow any questions? exam on Wednesday today s material not on the exam 1 Assembly Assembly is programming
More informationPart 1. Shared memory: an elusive abstraction
Part 1. Shared memory: an elusive abstraction Francesco Zappa Nardelli INRIA Paris-Rocquencourt http://moscova.inria.fr/~zappa/projects/weakmemory Based on work done by or with Peter Sewell, Jaroslav Ševčík,
More informationMotivations. Shared Memory Consistency Models. Optimizations for Performance. Memory Consistency
Shared Memory Consistency Models Authors : Sarita.V.Adve and Kourosh Gharachorloo Presented by Arrvindh Shriraman Motivations Programmer is required to reason about consistency to ensure data race conditions
More informationExample: The Dekker Algorithm on SMP Systems. Memory Consistency The Dekker Algorithm 43 / 54
Example: The Dekker Algorithm on SMP Systems Memory Consistency The Dekker Algorithm 43 / 54 Using Memory Barriers: the Dekker Algorithm Mutual exclusion of two processes with busy waiting. //flag[] is
More informationarxiv: v1 [cs.pl] 29 Aug 2018
Memory Consistency Models using Constraints Özgür Akgün, Ruth Hoffmann, and Susmit Sarkar arxiv:1808.09870v1 [cs.pl] 29 Aug 2018 School of Computer Science, University of St Andrews, UK {ozgur.akgun, rh347,
More informationLoad-reserve / Store-conditional on POWER and ARM
Load-reserve / Store-conditional on POWER and ARM Peter Sewell (slides from Susmit Sarkar) 1 University of Cambridge June 2012 Correct implementations of C/C++ on hardware Can it be done?...on highly relaxed
More informationCOREMU: a Portable and Scalable Parallel Full-system Emulator
COREMU: a Portable and Scalable Parallel Full-system Emulator Haibo Chen Parallel Processing Institute Fudan University http://ppi.fudan.edu.cn/haibo_chen Full-System Emulator Useful tool for multicore
More informationProgramming Paradigms for Concurrency Lecture 3 Concurrent Objects
Programming Paradigms for Concurrency Lecture 3 Concurrent Objects Based on companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit Modified by Thomas Wies New York University
More informationMultiprocessor Solution
Mutual Exclusion Multiprocessor Solution P(sema S) begin while (TAS(S.flag)==1){}; { busy waiting } S.Count= S.Count-1 if (S.Count < 0){ insert_t(s.qwt) BLOCK(S) {inkl.s.flag=0)!!!} } else S.flag =0 end
More informationComputer Organization Chapter 4. Prof. Qi Tian Fall 2013
Computer Organization Chapter 4 Prof. Qi Tian Fall 2013 1 Topics Dec. 6 (Friday) Final Exam Review Record Check Dec. 4 (Wednesday) 5 variable Karnaugh Map Quiz 5 Dec. 2 (Monday) 3, 4 variables Karnaugh
More informationAn Experience Like No Other. Stack Discipline Aug. 30, 2006
15-410 An Experience Like No Other Discipline Aug. 30, 2006 Bruce Maggs Dave Eckhardt Slides originally stolen from 15-213 15-410, F 06 Synchronization Registration If you're here but not registered, please
More informationCNIT 127: Exploit Development. Ch 1: Before you begin. Updated
CNIT 127: Exploit Development Ch 1: Before you begin Updated 1-14-16 Basic Concepts Vulnerability A flaw in a system that allows an attacker to do something the designer did not intend, such as Denial
More informationReverse Engineering II: Basics. Gergely Erdélyi Senior Antivirus Researcher
Reverse Engineering II: Basics Gergely Erdélyi Senior Antivirus Researcher Agenda Very basics Intel x86 crash course Basics of C Binary Numbers Binary Numbers 1 Binary Numbers 1 0 1 1 Binary Numbers 1
More informationCS510 Advanced Topics in Concurrency. Jonathan Walpole
CS510 Advanced Topics in Concurrency Jonathan Walpole Threads Cannot Be Implemented as a Library Reasoning About Programs What are the valid outcomes for this program? Is it valid for both r1 and r2 to
More informationReasoning About The Implementations Of Concurrency Abstractions On x86-tso. By Scott Owens, University of Cambridge.
Reasoning About The Implementations Of Concurrency Abstractions On x86-tso By Scott Owens, University of Cambridge. Plan Intro Data Races And Triangular Races Examples 2 sequential consistency The result
More informationWeak Memory Models with Matching Axiomatic and Operational Definitions
Weak Memory Models with Matching Axiomatic and Operational Definitions Sizhuo Zhang 1 Muralidaran Vijayaraghavan 1 Dan Lustig 2 Arvind 1 1 {szzhang, vmurali, arvind}@csail.mit.edu 2 dlustig@nvidia.com
More informationData-Centric Consistency Models. The general organization of a logical data store, physically distributed and replicated across multiple processes.
Data-Centric Consistency Models The general organization of a logical data store, physically distributed and replicated across multiple processes. Consistency models The scenario we will be studying: Some
More informationCoherence and Consistency
Coherence and Consistency 30 The Meaning of Programs An ISA is a programming language To be useful, programs written in it must have meaning or semantics Any sequence of instructions must have a meaning.
More informationOptiCode: Machine Code Deobfuscation for Malware Analysis
OptiCode: Machine Code Deobfuscation for Malware Analysis NGUYEN Anh Quynh, COSEINC CONFidence, Krakow - Poland 2013, May 28th 1 / 47 Agenda 1 Obfuscation problem in malware analysis
More informationNOW Handout Page 1. Memory Consistency Model. Background for Debate on Memory Consistency Models. Multiprogrammed Uniprocessor Mem.
Memory Consistency Model Background for Debate on Memory Consistency Models CS 258, Spring 99 David E. Culler Computer Science Division U.C. Berkeley for a SAS specifies constraints on the order in which
More informationAssembly Language: Function Calls. Goals of this Lecture. Function Call Problems
Assembly Language: Function Calls 1 Goals of this Lecture Help you learn: Function call problems: Calling and urning Passing parameters Storing local variables Handling registers without interference Returning
More informationSystem calls and assembler
System calls and assembler Michal Sojka sojkam1@fel.cvut.cz ČVUT, FEL License: CC-BY-SA 4.0 System calls (repetition from lectures) A way for normal applications to invoke operating system (OS) kernel's
More informationReverse Engineering II: The Basics
Reverse Engineering II: The Basics Gergely Erdélyi Senior Manager, Anti-malware Research Protecting the irreplaceable f-secure.com Binary Numbers 1 0 1 1 - Nibble B 1 0 1 1 1 1 0 1 - Byte B D 1 0 1 1 1
More informationAssembly Language for Intel-Based Computers, 4 th Edition. Chapter 2: IA-32 Processor Architecture Included elements of the IA-64 bit
Assembly Language for Intel-Based Computers, 4 th Edition Kip R. Irvine Chapter 2: IA-32 Processor Architecture Included elements of the IA-64 bit Slides prepared by Kip R. Irvine Revision date: 09/25/2002
More informationAssembly Language. Lecture 2 - x86 Processor Architecture. Ahmed Sallam
Assembly Language Lecture 2 - x86 Processor Architecture Ahmed Sallam Introduction to the course Outcomes of Lecture 1 Always check the course website Don t forget the deadline rule!! Motivations for studying
More informationCS 31: Intro to Systems Functions and the Stack. Martin Gagne Swarthmore College February 23, 2016
CS 31: Intro to Systems Functions and the Stack Martin Gagne Swarthmore College February 23, 2016 Reminders Late policy: you do not have to send me an email to inform me of a late submission before the
More informationCS Bootcamp x86-64 Autumn 2015
The x86-64 instruction set architecture (ISA) is used by most laptop and desktop processors. We will be embedding assembly into some of our C++ code to explore programming in assembly language. Depending
More informationScott M. Lewandowski CS295-2: Advanced Topics in Debugging September 21, 1998
Scott M. Lewandowski CS295-2: Advanced Topics in Debugging September 21, 1998 Assembler Syntax Everything looks like this: label: instruction dest,src instruction label Comments: comment $ This is a comment
More informationAssembly III: Procedures. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University
Assembly III: Procedures Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu IA-32 (1) Characteristics Region of memory managed with stack discipline
More information