Featherweight Monitors with Bacon Bits

1 Featherweight Monitors with Bacon Bits David F. Bacon IBM T.J. Watson Research Center

2 Contributors
  Chet Murthy, Tamiya Onodera, Mauricio Serrano, Mark Wegman, Rob Strom, Kevin Stoodley

3 Introduction
  It's the same old sad story:
    » Java has threads and synchronized methods
    » But synchronization is dog-slow
    » So synchronization is optional
  Shoot the foot of your choice:
    » Get bad performance, or
    » Get bug-prone code

4 It's Worse Than That
  Libraries must be thread-safe
  All non-trivial methods are synchronized
  Library call to set a bit in a bit vector:
    » ~50 instructions to lock and unlock the object
    » ~10 instructions method call overhead
    » ~5 instructions to actually set the bit
  Locking overhead frequently above 25% even in single-threaded applications!
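
As a rough check on the instruction counts above (a back-of-the-envelope sketch that assumes every instruction costs the same, which the slide does not claim), locking dominates this one library call:

```latex
\[
\frac{50}{50 + 10 + 5} = \frac{50}{65} \approx 77\%
\]
```

The 25% figure on the slide is for whole applications, not for a single call like this one.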

5 Java Locking Overhead (HPJ, Alpha)
  [Bar chart: time in seconds per benchmark. Series: Base, NoSync, NoCheck. Benchmarks: trans, javac, jgl, jacorb, jobe, toba, javalex, jax, javacup.]

6 Java Synchronization Features
  Thread can lock an object repeatedly
    » locks nest
    » nesting count must be kept
  On exception, thread must release locks
    » call stack implicitly names all locked objects
    » therefore, list of locked objects not needed

7 Locking Scenarios by Frequency
  Object is unlocked
  We already locked the object a few times
  We already locked the object a lot of times
  Object is locked and we are the first to queue up
  Object is locked and other threads are waiting

8 Nested Locking Depth
  [Stacked bar chart, 0% to 100%, per benchmark. Series: First, Second, Third. Benchmarks: trans, javac, jgl, jacorb, jobe, toba, javalex, jax, javacup, parser, jolt, espresso, netrexx, null.]

9 Repeated Locking (by depth)
  [Stacked bar chart, 0% to 100%, per benchmark. Series: First-Same, First-Different, Second-Same, Second-Different, Third-Same, Third-Different. Benchmarks: trans, javac, jgl, jacorb, jobe, toba, javalex, jax, javacup, parser, jolt, espresso, netrexx, null.]

10 Assumptions
  Atomic compare-and-swap available
  Thread objects aligned on 8-byte boundaries

11 Why Fast (Un)locking is Hard
  Must atomically release lock and check queue
  [Diagram: an object (classptr, DATA) whose lock has an Owner field and a Queue field; Thread 1, Thread 2, and Thread 3 are shown.]

12 Solution: Bacon Bit Lock Structure
  Thread Pointer
  Bacon bit
    » 0: no one queued to lock
    » 1: threads are queued
  Short Count
    » 0-2: # of locks - 1
    » 3: count is >= 3 and is stored in FatLock
  [Diagram: Object 39 (classptr, lock word, DATA) and a Thread; values shown in the figure: 31, 1, 3.]
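
The lock word described above can be sketched as a packed machine word. This is a minimal sketch, not the authors' implementation: the exact bit positions, the `LockWord` alias, and the helper names are assumptions made for illustration. It relies only on the stated assumption that thread objects are 8-byte aligned, which leaves the low three bits of a thread pointer free for the Bacon bit and the 2-bit short count.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical layout (the bit positions are an assumption, not from the slides):
//   high bits : thread pointer (8-byte aligned, so its low 3 bits are zero)
//   bit 2     : Bacon bit (1 = threads are queued)
//   bits 0..1 : short count (number of locks - 1; 3 = count lives in a FatLock)
using LockWord = std::uintptr_t;

constexpr LockWord kCountMask  = 0x3;
constexpr LockWord kBaconBit   = 0x4;
constexpr LockWord kThreadMask = ~static_cast<LockWord>(0x7);

// Pack an owner pointer, the Bacon bit, and a short count into one word.
inline LockWord makeLockWord(const void* thread, bool queued, unsigned shortCount) {
    LockWord t = reinterpret_cast<LockWord>(thread);
    assert((t & ~kThreadMask) == 0 && "thread object must be 8-byte aligned");
    assert(shortCount <= 3);
    return t | (queued ? kBaconBit : 0) | shortCount;
}

inline const void* ownerOf(LockWord w) {
    return reinterpret_cast<const void*>(w & kThreadMask);
}
inline bool hasWaiters(LockWord w) { return (w & kBaconBit) != 0; }
inline unsigned shortCountOf(LockWord w) { return static_cast<unsigned>(w & kCountMask); }
inline bool countInFatLock(LockWord w) { return shortCountOf(w) == 3; }
```

With this packing, an unlocked object has a lock word of 0, and an object locked once with no waiters has a lock word equal to the bare thread pointer.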

13 Inline Lock Operation
  Must check:
    » no one owns the lock
    inline void Monitor::enter() {
      // fast path: lock word is 0 (unowned), so install our thread pointer
      if (!CompareAndSwap(lockWord, 0, thread))
        outoflineenter();
    }

14 How Locking Works
  [Diagram: Thread 1 and Object 8 (classptr, lock word, DATA), shown twice. LOCK leaves the lock word pointing at Thread 1 with Bacon bit 0 and count 0; UNLOCK is the reverse transition.]

15 Inline Unlock Operation
  Must check:
    » we own the lock
    » lock count is 1
    » no one is waiting for the lock
    inline void Monitor::exit() {
      // one swap checks all three: the word must equal our bare thread pointer
      if (!CompareAndSwap(lockWord, thread, 0))
        outoflineexit();
    }
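
The two inline operations can be tried out with standard C++ atomics. This is a sketch under assumptions: `std::atomic::compare_exchange_strong` stands in for the slides' `CompareAndSwap`, the thread is passed in as a plain word, and the out-of-line paths are placeholders that only count how often the fast path failed.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

using LockWord = std::uintptr_t;

// Minimal stand-in for the slides' Monitor (illustrative only).
struct Monitor {
    std::atomic<LockWord> lockWord{0};
    int slowEnters = 0;  // hypothetical instrumentation, not thread-safe
    int slowExits = 0;

    void enter(LockWord thread) {
        assert(thread != 0);
        LockWord expected = 0;  // succeeds only if nobody owns the lock
        if (!lockWord.compare_exchange_strong(expected, thread))
            outOfLineEnter();
    }

    void exit(LockWord thread) {
        // Succeeds only if the word is exactly `thread`: we own the lock,
        // it is locked once, and no waiter has set the Bacon bit.
        LockWord expected = thread;
        if (!lockWord.compare_exchange_strong(expected, 0))
            outOfLineExit();
    }

    void outOfLineEnter() { ++slowEnters; }  // placeholder slow path
    void outOfLineExit() { ++slowExits; }    // placeholder slow path
};
```

The single comparison in `exit` is what makes the fast path cheap: any nesting, any waiter, or a different owner changes the word and sends the thread to the slow path.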

16 Race Conditions and Lock Transfer
  Problem: simultaneous unlock and enqueue
  Solution is in the Bacon bit:
    » always set if a thread is enqueued on the object
    » never modified without acquiring globallock on locktable
    » all changes to the monitor lockword must be via CompareAndSwap(), unless object and globallock are locked

17 Enqueue Operation
    void Monitor::enqueueForLock() {
      locktable.lock();
      while (true) {
        unsigned temp = lockWord;
        // lock is held: set the Bacon bit, then queue up and sleep
        if (temp != 0 && CompareAndSwap(lockWord, temp, temp | BaconBit)) {
          mon = locktable.inflatemonitor(this);
          mon->addlocker(thread);
          locktable.unlock();
          thread->suspend();
          return;
        }
        // lock is free: take it directly
        if (CompareAndSwap(lockWord, 0, thread)) {
          locktable.unlock();
          return;
        }
      }
    }
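
The state transitions of the enqueue loop can be exercised on their own. The sketch below makes several assumptions: a `std::mutex` stands in for the locktable's global lock, the Bacon bit position is hypothetical, and inflating the monitor, adding the thread to the wait list, and suspending it are omitted and replaced by a return value.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <mutex>

using LockWord = std::uintptr_t;
constexpr LockWord kBaconBit = 0x4;  // assumed bit position

enum class EnqueueResult { Acquired, Queued };

std::mutex globalLock;  // stands in for the locktable's global lock

// Returns Acquired if the lock was free and we took it, or Queued if the
// Bacon bit is now set and the caller would be put on the wait list.
EnqueueResult enqueueForLock(std::atomic<LockWord>& lockWord, LockWord thread) {
    assert(thread != 0 && (thread & kBaconBit) == 0);
    std::lock_guard<std::mutex> guard(globalLock);
    while (true) {
        LockWord temp = lockWord.load();
        // Lock is held: record that someone is waiting, keeping owner and count.
        if (temp != 0 &&
            lockWord.compare_exchange_strong(temp, temp | kBaconBit))
            return EnqueueResult::Queued;
        // Lock looked free (or was just released): try to take it directly.
        LockWord expected = 0;
        if (lockWord.compare_exchange_strong(expected, thread))
            return EnqueueResult::Acquired;
        // Both swaps lost a race with the owner; re-read and retry.
    }
}
```

Once the Bacon bit is set, the owner's inline unlock swap no longer matches the bare thread pointer, so the owner is forced onto the out-of-line path where the lock can be handed over.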

18 Before Enqueue by Thread 2
  [Diagram: locktable (globallock, quickcells, hashtable) with FatLock 1 and FatLock 2 (each: monitor, count, locklist); Object 3 (classptr, lock word, DATA) with Bacon bit 0 and count 0, its thread pointer referring to Thread 1; Thread 2 is also shown.]

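19 After Enqueue by Thread 2
  [Diagram: locktable (globallock, quickcells, hashtable) with FatLock 1 and FatLock 2; FatLock 1 has count 0 and Thread 2 on its locklist; Object 3 (classptr, lock word, DATA) now has Bacon bit 1 and count 0, its thread pointer still referring to Thread 1.]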

20 Deeply Nested Locks
  When count reaches 3:
    » lock globallock
    » inflate monitor
    » set long count to 3
    » don't set Bacon bit
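
The overflow from the 2-bit short count into the FatLock can be sketched as follows. This is an illustration under assumptions: `OwnedLock` and `nestedEnter` are hypothetical names, the count is taken to mean "number of locks - 1" throughout (the slides leave this slightly ambiguous), and taking the global lock during inflation is omitted.

```cpp
#include <cassert>

constexpr unsigned kOverflow = 3;  // short count value meaning "see FatLock"

struct FatLock { unsigned long count = 0; };  // hypothetical inflated monitor

struct OwnedLock {
    unsigned shortCount = 0;  // 2-bit field in the lock word: locks - 1
    FatLock* fat = nullptr;   // set once the monitor has been inflated
};

// Re-lock an object that the calling thread already owns.
void nestedEnter(OwnedLock& l, FatLock& spare) {
    if (l.shortCount < kOverflow - 1) {
        ++l.shortCount;            // common case: bump the inline count
    } else if (l.shortCount == kOverflow - 1) {
        // Count reaches 3: inflate (the real scheme locks globallock here).
        l.fat = &spare;
        l.fat->count = kOverflow;  // long count starts at 3
        l.shortCount = kOverflow;  // the Bacon bit is left untouched
    } else {
        assert(l.fat != nullptr);
        ++l.fat->count;            // already inflated: count lives in FatLock
    }
}
```

Leaving the Bacon bit clear keeps "deeply nested" and "has waiters" as separate conditions, so a deeply nested lock with no contention does not look contended.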

21 Before Deep Nesting
  [Diagram: locktable (globallock, quickcells, hashtable) with FatLock 1 and FatLock 2 (each: monitor, count, locklist); Object 3 (classptr, lock word, DATA) with its thread pointer referring to Thread 1.]

22 After Deep Nesting
  [Diagram: locktable (globallock, quickcells, hashtable) with FatLock 1 and FatLock 2; FatLock 1 has count 3; Object 3 (classptr, lock word, DATA) has Bacon bit 0 and short count 3, its thread pointer referring to Thread 1; Thread 2 is also shown.]

23 Big Example
  [Diagram: locktable (globallock, quickcells, hashtable) with FatLock 1 (count 0), FatLock 2, FatLock 3, and FatLock 4; Object 15 (Bacon bit 0, count 1), Object 8 (Bacon bit 1, count 0), and Object 39 (Bacon bit 0, count 3); Thread 1, Thread 2, and Thread 3.]

24 Intel x86 Implementation (486+)
  7.5 cycles on Pentium
  Forward jump to stub is predicted not taken
    ; ebx is the "this" pointer
    LOCK:     mov     ecx, THREAD     ; ecx = thread
              xor     eax, eax        ; eax = 0
              cmpxchg [ebx], ecx      ; C&S(lockWord, eax, ecx)
              jnz     stub            ; swap failed; try slowly
    lockdone:
    stub:     call    outoflinelock   ; do it the slow way
              jmp     lockdone        ; and return

25 Problem: Deeply Nested Locks
  Each FatLock access locks the global lock
    » decreases performance of recursive methods
    » increases global lock contention
  Solution: cache deeply nested FatLock
    » test for cache hit before locking global lock
    » if it's a hit, we've already locked the object
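
The caching idea above can be sketched with a per-thread, single-entry cache that is checked before the global lock is taken. Everything here is an assumption made for illustration: the `ThreadCache` type, the `std::unordered_map` standing in for the locktable's hashtable, and the counter that records global lock acquisitions.

```cpp
#include <cassert>
#include <mutex>
#include <unordered_map>

struct FatLock { unsigned long count = 0; };  // hypothetical inflated monitor

std::mutex globalLock;                                // stands in for globallock
std::unordered_map<const void*, FatLock*> lockTable;  // guarded by globalLock
int globalLockAcquisitions = 0;  // example instrumentation, guarded by globalLock

struct ThreadCache {  // one per thread; hypothetical single-entry cache
    const void* object = nullptr;
    FatLock* fat = nullptr;
};

// Look up the FatLock for an object this thread has locked deeply.
FatLock* findFatLock(ThreadCache& cache, const void* object) {
    assert(object != nullptr);
    // A hit means this thread already owns the object, so no other thread
    // can be changing its count and the global lock can be skipped.
    if (cache.object == object) return cache.fat;
    std::lock_guard<std::mutex> guard(globalLock);
    ++globalLockAcquisitions;
    auto it = lockTable.find(object);
    if (it == lockTable.end()) return nullptr;
    cache.object = object;
    cache.fat = it->second;
    return cache.fat;
}
```

A recursive method that re-locks the same object then pays for the global lock once, on the first miss, rather than on every nested entry.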

26 Problem: synchronized() blocks
  Synchronized methods are lexically nested
  synchronized() blocks become bytecodes:
    » lexical nesting not checked by verifier
    » throwing exception might not release locks
  [Diagram: javac translates "synchronized(foo) { }" into a "monitorenter foo" / "monitorexit foo" bytecode pair.]

27 Solution for Non-nested Locking
  If possible, verify nesting at compile-time
  Else inflate monitor on non-nested lock
    » set Bacon bit in lockword
    » search locktable when thread is killed
    » unlock does not need modification

28 Architectural Adaptations
  Uniprocessor with synchronous scheduler
    » don't need C&S: MOV
  Uniprocessor with asynchronous scheduler
    » use atomic C&S in cache: CMPXCHG
  Strongly ordered multiprocessor (Pentium)
    » use atomic C&S in RAM: LOCK#CMPXCHG
  Weakly ordered multiprocessor (Pentium Pro)
    » atomic C&S and cache flush: LOCK#CMPXCHG, CPUID

29 Advantages of Bacon-bit Locks
  Absolutely minimal cost for common case
    » 4 instructions/7.5 cycles on Pentium
    » compare to 6 cycles for no-op CALL-RET
  Next most common case also very fast
    » 10 instructions/17 cycles on Pentium
  Space overhead only 1 word per object
  Global lock only used in rare cases
  Co-exists with locks not lexically-nested

30 Advantages of Bacon-bit Locks
  Scalable (can partition global lock)
  Almost no spin locking required
  Lock acquisition is fair
  Same implementation works and is fast on
    » synchronously scheduled uniprocessor
    » asynchronously scheduled uniprocessor
    » strongly ordered multiprocessor
    » weakly ordered multiprocessor

Thin Locks: Featherweight Synchronization for Java

Thin Locks: Featherweight Synchronization for Java Thin Locks: Featherweight Synchronization for Java David F. Bacon Ravi Konuru Chet Murthy Mauricio Serrano IBM T.J. Watson Research Center Introduction It s the same old sad story: Java has threads and

More information

Thin Locks: Featherweight Synchronization for Java

Thin Locks: Featherweight Synchronization for Java Thin Locks: Featherweight Synchronization for Java D. Bacon 1 R. Konuru 1 C. Murthy 1 M. Serrano 1 Presented by: Calvin Hubble 2 1 IBM T.J. Watson Research Center 2 Department of Computer Science 16th

More information

[537] Locks. Tyler Harter

[537] Locks. Tyler Harter [537] Locks Tyler Harter Review: Threads+Locks CPU 1 CPU 2 running thread 1 running thread 2 RAM PageDir A PageDir B CPU 1 CPU 2 running thread 1 running thread 2 RAM PageDir A PageDir B Virt Mem (PageDir

More information

Experiences with Multi-threading and Dynamic Class Loading in a Java Just-In-Time Compiler

Experiences with Multi-threading and Dynamic Class Loading in a Java Just-In-Time Compiler , Compilation Technology Experiences with Multi-threading and Dynamic Class Loading in a Java Just-In-Time Compiler Daryl Maier, Pramod Ramarao, Mark Stoodley, Vijay Sundaresan TestaRossa JIT compiler

More information

CS5460: Operating Systems

CS5460: Operating Systems CS5460: Operating Systems Lecture 9: Implementing Synchronization (Chapter 6) Multiprocessor Memory Models Uniprocessor memory is simple Every load from a location retrieves the last value stored to that

More information

Computer Science 61 Scribe Notes Tuesday, November 25, 2014 (aka the day before Thanksgiving Break)

Computer Science 61 Scribe Notes Tuesday, November 25, 2014 (aka the day before Thanksgiving Break) Computer Science 61 Scribe Notes Tuesday, November 25, 2014 (aka the day before Thanksgiving Break) Problem Set 6 Released! People have fun with it Make Games Snake Game Hack JavaScript Due Wed., last

More information

THREADS: (abstract CPUs)

THREADS: (abstract CPUs) CS 61 Scribe Notes (November 29, 2012) Mu, Nagler, Strominger TODAY: Threads, Synchronization - Pset 5! AT LONG LAST! Adversarial network pong handling dropped packets, server delays, overloads with connection

More information

CMSC 330: Organization of Programming Languages

CMSC 330: Organization of Programming Languages CMSC 330: Organization of Programming Languages Multithreading Multiprocessors Description Multiple processing units (multiprocessor) From single microprocessor to large compute clusters Can perform multiple

More information

Last Time. Response time analysis Blocking terms Priority inversion. Other extensions. And solutions

Last Time. Response time analysis Blocking terms Priority inversion. Other extensions. And solutions Last Time Response time analysis Blocking terms Priority inversion And solutions Release jitter Other extensions Today Timing analysis Answers a question we commonly ask: At most long can this code take

More information

CS412/CS413. Introduction to Compilers Tim Teitelbaum. Lecture 21: Generating Pentium Code 10 March 08

CS412/CS413. Introduction to Compilers Tim Teitelbaum. Lecture 21: Generating Pentium Code 10 March 08 CS412/CS413 Introduction to Compilers Tim Teitelbaum Lecture 21: Generating Pentium Code 10 March 08 CS 412/413 Spring 2008 Introduction to Compilers 1 Simple Code Generation Three-address code makes it

More information

Lecture 9: Multiprocessor OSs & Synchronization. CSC 469H1F Fall 2006 Angela Demke Brown

Lecture 9: Multiprocessor OSs & Synchronization. CSC 469H1F Fall 2006 Angela Demke Brown Lecture 9: Multiprocessor OSs & Synchronization CSC 469H1F Fall 2006 Angela Demke Brown The Problem Coordinated management of shared resources Resources may be accessed by multiple threads Need to control

More information

COREMU: a Portable and Scalable Parallel Full-system Emulator

COREMU: a Portable and Scalable Parallel Full-system Emulator COREMU: a Portable and Scalable Parallel Full-system Emulator Haibo Chen Parallel Processing Institute Fudan University http://ppi.fudan.edu.cn/haibo_chen Full-System Emulator Useful tool for multicore

More information

Multiprocessor Systems. Chapter 8, 8.1

Multiprocessor Systems. Chapter 8, 8.1 Multiprocessor Systems Chapter 8, 8.1 1 Learning Outcomes An understanding of the structure and limits of multiprocessor hardware. An appreciation of approaches to operating system support for multiprocessor

More information

Software Speculative Multithreading for Java

Software Speculative Multithreading for Java Software Speculative Multithreading for Java Christopher J.F. Pickett and Clark Verbrugge School of Computer Science, McGill University {cpicke,clump}@sable.mcgill.ca Allan Kielstra IBM Toronto Lab kielstra@ca.ibm.com

More information

Martin Kruliš, v

Martin Kruliš, v Martin Kruliš 1 Optimizations in General Code And Compilation Memory Considerations Parallelism Profiling And Optimization Examples 2 Premature optimization is the root of all evil. -- D. Knuth Our goal

More information

The New Java Technology Memory Model

The New Java Technology Memory Model The New Java Technology Memory Model java.sun.com/javaone/sf Jeremy Manson and William Pugh http://www.cs.umd.edu/~pugh 1 Audience Assume you are familiar with basics of Java technology-based threads (

More information

Synchronization. CS61, Lecture 18. Prof. Stephen Chong November 3, 2011

Synchronization. CS61, Lecture 18. Prof. Stephen Chong November 3, 2011 Synchronization CS61, Lecture 18 Prof. Stephen Chong November 3, 2011 Announcements Assignment 5 Tell us your group by Sunday Nov 6 Due Thursday Nov 17 Talks of interest in next two days Towards Predictable,

More information

Assembly Language. Lecture 2 - x86 Processor Architecture. Ahmed Sallam

Assembly Language. Lecture 2 - x86 Processor Architecture. Ahmed Sallam Assembly Language Lecture 2 - x86 Processor Architecture Ahmed Sallam Introduction to the course Outcomes of Lecture 1 Always check the course website Don t forget the deadline rule!! Motivations for studying

More information

Atomicity via Source-to-Source Translation

Atomicity via Source-to-Source Translation Atomicity via Source-to-Source Translation Benjamin Hindman Dan Grossman University of Washington 22 October 2006 Atomic An easier-to-use and harder-to-implement primitive void deposit(int x){ synchronized(this){

More information

Concurrency: Mutual Exclusion (Locks)

Concurrency: Mutual Exclusion (Locks) Concurrency: Mutual Exclusion (Locks) Questions Answered in this Lecture: What are locks and how do we implement them? How do we use hardware primitives (atomics) to support efficient locks? How do we

More information

Cache Coherence and Atomic Operations in Hardware

Cache Coherence and Atomic Operations in Hardware Cache Coherence and Atomic Operations in Hardware Previously, we introduced multi-core parallelism. Today we ll look at 2 things: 1. Cache coherence 2. Instruction support for synchronization. And some

More information

Other consistency models

Other consistency models Last time: Symmetric multiprocessing (SMP) Lecture 25: Synchronization primitives Computer Architecture and Systems Programming (252-0061-00) CPU 0 CPU 1 CPU 2 CPU 3 Timothy Roscoe Herbstsemester 2012

More information

Synchronising Threads

Synchronising Threads Synchronising Threads David Chisnall March 1, 2011 First Rule for Maintainable Concurrent Code No data may be both mutable and aliased Harder Problems Data is shared and mutable Access to it must be protected

More information

Memory Consistency Models

Memory Consistency Models Memory Consistency Models Contents of Lecture 3 The need for memory consistency models The uniprocessor model Sequential consistency Relaxed memory models Weak ordering Release consistency Jonas Skeppstedt

More information

Java Locks: Analysis and Acceleration. by Kiyokuni Kawachiya

Java Locks: Analysis and Acceleration. by Kiyokuni Kawachiya Java Locks: Analysis and Acceleration by Kiyokuni Kawachiya A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Graduate School of Media and

More information

Module 7: Synchronization Lecture 13: Introduction to Atomic Primitives. The Lecture Contains: Synchronization. Waiting Algorithms.

Module 7: Synchronization Lecture 13: Introduction to Atomic Primitives. The Lecture Contains: Synchronization. Waiting Algorithms. The Lecture Contains: Synchronization Waiting Algorithms Implementation Hardwired Locks Software Locks Hardware Support Atomic Exchange Test & Set Fetch & op Compare & Swap Traffic of Test & Set Backoff

More information

Operating Systems. Lecture 4 - Concurrency and Synchronization. Master of Computer Science PUF - Hồ Chí Minh 2016/2017

Operating Systems. Lecture 4 - Concurrency and Synchronization. Master of Computer Science PUF - Hồ Chí Minh 2016/2017 Operating Systems Lecture 4 - Concurrency and Synchronization Adrien Krähenbühl Master of Computer Science PUF - Hồ Chí Minh 2016/2017 Mutual exclusion Hardware solutions Semaphores IPC: Message passing

More information

Process Synchronisation (contd.) Operating Systems. Autumn CS4023

Process Synchronisation (contd.) Operating Systems. Autumn CS4023 Operating Systems Autumn 2017-2018 Outline Process Synchronisation (contd.) 1 Process Synchronisation (contd.) Synchronization Hardware 6.4 (SGG) Many systems provide hardware support for critical section

More information

Overview. Constructors and destructors Virtual functions Single inheritance Multiple inheritance RTTI Templates Exceptions Operator Overloading

Overview. Constructors and destructors Virtual functions Single inheritance Multiple inheritance RTTI Templates Exceptions Operator Overloading How C++ Works 1 Overview Constructors and destructors Virtual functions Single inheritance Multiple inheritance RTTI Templates Exceptions Operator Overloading Motivation There are lot of myths about C++

More information

Programming in Parallel COMP755

Programming in Parallel COMP755 Programming in Parallel COMP755 All games have morals; and the game of Snakes and Ladders captures, as no other activity can hope to do, the eternal truth that for every ladder you hope to climb, a snake

More information

EN2910A: Advanced Computer Architecture Topic 05: Coherency of Memory Hierarchy

EN2910A: Advanced Computer Architecture Topic 05: Coherency of Memory Hierarchy EN2910A: Advanced Computer Architecture Topic 05: Coherency of Memory Hierarchy Prof. Sherief Reda School of Engineering Brown University Material from: Parallel Computer Organization and Design by Debois,

More information

EN164: Design of Computing Systems Lecture 24: Processor / ILP 5

EN164: Design of Computing Systems Lecture 24: Processor / ILP 5 EN164: Design of Computing Systems Lecture 24: Processor / ILP 5 Professor Sherief Reda http://scale.engin.brown.edu Electrical Sciences and Computer Engineering School of Engineering Brown University

More information

Assembly Language. Lecture 2 x86 Processor Architecture

Assembly Language. Lecture 2 x86 Processor Architecture Assembly Language Lecture 2 x86 Processor Architecture Ahmed Sallam Slides based on original lecture slides by Dr. Mahmoud Elgayyar Introduction to the course Outcomes of Lecture 1 Always check the course

More information

Overview. CMSC 330: Organization of Programming Languages. Concurrency. Multiprocessors. Processes vs. Threads. Computation Abstractions

Overview. CMSC 330: Organization of Programming Languages. Concurrency. Multiprocessors. Processes vs. Threads. Computation Abstractions CMSC 330: Organization of Programming Languages Multithreaded Programming Patterns in Java CMSC 330 2 Multiprocessors Description Multiple processing units (multiprocessor) From single microprocessor to

More information

Multiprocessor Synchronization

Multiprocessor Synchronization Multiprocessor Synchronization Material in this lecture in Henessey and Patterson, Chapter 8 pgs. 694-708 Some material from David Patterson s slides for CS 252 at Berkeley 1 Multiprogramming and Multiprocessing

More information

Distributed Operating Systems

Distributed Operating Systems Distributed Operating Systems Synchronization in Parallel Systems Marcus Völp 203 Topics Synchronization Locks Performance Distributed Operating Systems 203 Marcus Völp 2 Overview Introduction Hardware

More information

Concurrency, Thread. Dongkun Shin, SKKU

Concurrency, Thread. Dongkun Shin, SKKU Concurrency, Thread 1 Thread Classic view a single point of execution within a program a single PC where instructions are being fetched from and executed), Multi-threaded program Has more than one point

More information

1) If a location is initialized to 0, what will the first invocation of TestAndSet on that location return?

1) If a location is initialized to 0, what will the first invocation of TestAndSet on that location return? Synchronization Part 1: Synchronization - Locks Dekker s Algorithm and the Bakery Algorithm provide software-only synchronization. Thanks to advancements in hardware, synchronization approaches have been

More information

Practical Malware Analysis

Practical Malware Analysis Practical Malware Analysis Ch 4: A Crash Course in x86 Disassembly Revised 1-16-7 Basic Techniques Basic static analysis Looks at malware from the outside Basic dynamic analysis Only shows you how the

More information

Advance Operating Systems (CS202) Locks Discussion

Advance Operating Systems (CS202) Locks Discussion Advance Operating Systems (CS202) Locks Discussion Threads Locks Spin Locks Array-based Locks MCS Locks Sequential Locks Road Map Threads Global variables and static objects are shared Stored in the static

More information

Multiprocessor System. Multiprocessor Systems. Bus Based UMA. Types of Multiprocessors (MPs) Cache Consistency. Bus Based UMA. Chapter 8, 8.

Multiprocessor System. Multiprocessor Systems. Bus Based UMA. Types of Multiprocessors (MPs) Cache Consistency. Bus Based UMA. Chapter 8, 8. Multiprocessor System Multiprocessor Systems Chapter 8, 8.1 We will look at shared-memory multiprocessors More than one processor sharing the same memory A single CPU can only go so fast Use more than

More information

Run-time Environments

Run-time Environments Run-time Environments Status We have so far covered the front-end phases Lexical analysis Parsing Semantic analysis Next come the back-end phases Code generation Optimization Register allocation Instruction

More information

IV. Process Synchronisation

IV. Process Synchronisation IV. Process Synchronisation Operating Systems Stefan Klinger Database & Information Systems Group University of Konstanz Summer Term 2009 Background Multiprogramming Multiple processes are executed asynchronously.

More information

EN164: Design of Computing Systems Topic 06.b: Superscalar Processor Design

EN164: Design of Computing Systems Topic 06.b: Superscalar Processor Design EN164: Design of Computing Systems Topic 06.b: Superscalar Processor Design Professor Sherief Reda http://scale.engin.brown.edu Electrical Sciences and Computer Engineering School of Engineering Brown

More information

Run-time Environments

Run-time Environments Run-time Environments Status We have so far covered the front-end phases Lexical analysis Parsing Semantic analysis Next come the back-end phases Code generation Optimization Register allocation Instruction

More information

System Software Assignment 1 Runtime Support for Procedures

System Software Assignment 1 Runtime Support for Procedures System Software Assignment 1 Runtime Support for Procedures Exercise 1: Nested procedures Some programming languages like Oberon and Pascal support nested procedures. 1. Find a run-time structure for such

More information

CS 31: Introduction to Computer Systems : Threads & Synchronization April 16-18, 2019

CS 31: Introduction to Computer Systems : Threads & Synchronization April 16-18, 2019 CS 31: Introduction to Computer Systems 22-23: Threads & Synchronization April 16-18, 2019 Making Programs Run Faster We all like how fast computers are In the old days (1980 s - 2005): Algorithm too slow?

More information

CS 333 Introduction to Operating Systems. Class 3 Threads & Concurrency. Jonathan Walpole Computer Science Portland State University

CS 333 Introduction to Operating Systems. Class 3 Threads & Concurrency. Jonathan Walpole Computer Science Portland State University CS 333 Introduction to Operating Systems Class 3 Threads & Concurrency Jonathan Walpole Computer Science Portland State University 1 Process creation in UNIX All processes have a unique process id getpid(),

More information

Multiprocessor Systems. COMP s1

Multiprocessor Systems. COMP s1 Multiprocessor Systems 1 Multiprocessor System We will look at shared-memory multiprocessors More than one processor sharing the same memory A single CPU can only go so fast Use more than one CPU to improve

More information

CPSC 313, 04w Term 2 Midterm Exam 2 Solutions

CPSC 313, 04w Term 2 Midterm Exam 2 Solutions 1. (10 marks) Short answers. CPSC 313, 04w Term 2 Midterm Exam 2 Solutions Date: March 11, 2005; Instructor: Mike Feeley 1a. Give an example of one important CISC feature that is normally not part of a

More information

Lecture 24: Multiprocessing Computer Architecture and Systems Programming ( )

Lecture 24: Multiprocessing Computer Architecture and Systems Programming ( ) Systems Group Department of Computer Science ETH Zürich Lecture 24: Multiprocessing Computer Architecture and Systems Programming (252-0061-00) Timothy Roscoe Herbstsemester 2012 Most of the rest of this

More information

Mutex Implementation

Mutex Implementation COS 318: Operating Systems Mutex Implementation Jaswinder Pal Singh Computer Science Department Princeton University (http://www.cs.princeton.edu/courses/cos318/) Revisit Mutual Exclusion (Mutex) u Critical

More information

THREADS & CONCURRENCY

THREADS & CONCURRENCY 27/04/2018 Sorry for the delay in getting slides for today 2 Another reason for the delay: Yesterday: 63 posts on the course Piazza yesterday. A7: If you received 100 for correctness (perhaps minus a late

More information

Administration CS 412/413. Why build a compiler? Compilers. Architectural independence. Source-to-source translator

Administration CS 412/413. Why build a compiler? Compilers. Architectural independence. Source-to-source translator CS 412/413 Introduction to Compilers and Translators Andrew Myers Cornell University Administration Design reports due Friday Current demo schedule on web page send mail with preferred times if you haven

More information

Multiprocessors II: CC-NUMA DSM. CC-NUMA for Large Systems

Multiprocessors II: CC-NUMA DSM. CC-NUMA for Large Systems Multiprocessors II: CC-NUMA DSM DSM cache coherence the hardware stuff Today s topics: what happens when we lose snooping new issues: global vs. local cache line state enter the directory issues of increasing

More information

History of the Intel 80x86

History of the Intel 80x86 Intel s IA-32 Architecture Cptr280 Dr Curtis Nelson History of the Intel 80x86 1971 - Intel invents the microprocessor, the 4004 1975-8080 introduced 8-bit microprocessor 1978-8086 introduced 16 bit microprocessor

More information

Symmetric Multiprocessors: Synchronization and Sequential Consistency

Symmetric Multiprocessors: Synchronization and Sequential Consistency Constructive Computer Architecture Symmetric Multiprocessors: Synchronization and Sequential Consistency Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology November

More information

Threads and Synchronization. Kevin Webb Swarthmore College February 15, 2018

Threads and Synchronization. Kevin Webb Swarthmore College February 15, 2018 Threads and Synchronization Kevin Webb Swarthmore College February 15, 2018 Today s Goals Extend processes to allow for multiple execution contexts (threads) Benefits and challenges of concurrency Race

More information

CSL373: Lecture 5 Deadlocks (no process runnable) + Scheduling (> 1 process runnable)

CSL373: Lecture 5 Deadlocks (no process runnable) + Scheduling (> 1 process runnable) CSL373: Lecture 5 Deadlocks (no process runnable) + Scheduling (> 1 process runnable) Past & Present Have looked at two constraints: Mutual exclusion constraint between two events is a requirement that

More information

For our next chapter, we will discuss the emulation process which is an integral part of virtual machines.

For our next chapter, we will discuss the emulation process which is an integral part of virtual machines. For our next chapter, we will discuss the emulation process which is an integral part of virtual machines. 1 2 For today s lecture, we ll start by defining what we mean by emulation. Specifically, in this

More information

CS 333 Introduction to Operating Systems. Class 3 Threads & Concurrency. Jonathan Walpole Computer Science Portland State University

CS 333 Introduction to Operating Systems. Class 3 Threads & Concurrency. Jonathan Walpole Computer Science Portland State University CS 333 Introduction to Operating Systems Class 3 Threads & Concurrency Jonathan Walpole Computer Science Portland State University 1 The Process Concept 2 The Process Concept Process a program in execution

More information

Preemptive Scheduling and Mutual Exclusion with Hardware Support

Preemptive Scheduling and Mutual Exclusion with Hardware Support Preemptive Scheduling and Mutual Exclusion with Hardware Support Thomas Plagemann With slides from Otto J. Anshus & Tore Larsen (University of Tromsø) and Kai Li (Princeton University) Preemptive Scheduling

More information

RAID 0 (non-redundant) RAID Types 4/25/2011

RAID 0 (non-redundant) RAID Types 4/25/2011 Exam 3 Review COMP375 Topics I/O controllers chapter 7 Disk performance section 6.3-6.4 RAID section 6.2 Pipelining section 12.4 Superscalar chapter 14 RISC chapter 13 Parallel Processors chapter 18 Security

More information

CS 111. Operating Systems Peter Reiher

CS 111. Operating Systems Peter Reiher Operating System Principles: Mutual Exclusion and Asynchronous Completion Operating Systems Peter Reiher Page 1 Outline Mutual Exclusion Asynchronous Completions Page 2 Mutual Exclusion Critical sections

More information

CS 537 Lecture 11 Locks

CS 537 Lecture 11 Locks CS 537 Lecture 11 Locks Michael Swift 10/17/17 2004-2007 Ed Lazowska, Hank Levy, Andrea and Remzi Arpaci-Dussea, Michael Swift 1 Concurrency: Locks Questions answered in this lecture: Review: Why threads

More information

Synchronization. Coherency protocols guarantee that a reading processor (thread) sees the most current update to shared data.

Synchronization. Coherency protocols guarantee that a reading processor (thread) sees the most current update to shared data. Synchronization Coherency protocols guarantee that a reading processor (thread) sees the most current update to shared data. Coherency protocols do not: make sure that only one thread accesses shared data

More information

Processes and Tasks What comprises the state of a running program (a process or task)?

Processes and Tasks What comprises the state of a running program (a process or task)? Processes and Tasks What comprises the state of a running program (a process or task)? Microprocessor Address bus Control DRAM OS code and data special caches code/data cache EAXEBP EIP DS EBXESP EFlags

More information

Dealing with Issues for Interprocess Communication

Dealing with Issues for Interprocess Communication Dealing with Issues for Interprocess Communication Ref Section 2.3 Tanenbaum 7.1 Overview Processes frequently need to communicate with other processes. In a shell pipe the o/p of one process is passed

More information

Virtual Machine Design

Virtual Machine Design Virtual Machine Design Lecture 4: Multithreading and Synchronization Antero Taivalsaari September 2003 Session #2026: J2MEPlatform, Connected Limited Device Configuration (CLDC) Lecture Goals Give an overview

More information

Lecture 10: Avoiding Locks

Lecture 10: Avoiding Locks Lecture 10: Avoiding Locks CSC 469H1F Fall 2006 Angela Demke Brown (with thanks to Paul McKenney) Locking: A necessary evil? Locks are an easy to understand solution to critical section problem Protect

More information

Field Analysis. Last time Exploit encapsulation to improve memory system performance

Field Analysis. Last time Exploit encapsulation to improve memory system performance Field Analysis Last time Exploit encapsulation to improve memory system performance This time Exploit encapsulation to simplify analysis Two uses of field analysis Escape analysis Object inlining April

More information

Process Coordination and Shared Data

Process Coordination and Shared Data Process Coordination and Shared Data Lecture 19 In These Notes... Sharing data safely When multiple threads/processes interact in a system, new species of bugs arise 1. Compiler tries to save time by not

More information
