Taking a Virtual Machine Towards Many-Cores. Rickard Green - Patrik Nyblom -

Similar documents
Chapter 5. Multiprocessors and Thread-Level Parallelism

RCU in the Linux Kernel: One Decade Later

Handling Memory Ordering in Multithreaded Applications with Oracle Solaris Studio 12 Update 2: Part 2, Memory Barriers and Memory Fences

Other consistency models

PERFORMANCE ANALYSIS AND OPTIMIZATION OF SKIP LISTS FOR MODERN MULTI-CORE ARCHITECTURES

Synchronization. Coherency protocols guarantee that a reading processor (thread) sees the most current update to shared data.

Concurrent programming: From theory to practice. Concurrent Algorithms 2015 Vasileios Trigonakis

Parallel Programming in Distributed Systems Or Distributed Systems in Parallel Programming

Multiprocessor Cache Coherence. Chapter 5. Memory System is Coherent If... From ILP to TLP. Enforcing Cache Coherence. Multiprocessor Types

RWLocks in Erlang/OTP

Computer Architecture

Foundations of the C++ Concurrency Memory Model

MULTIPROCESSORS AND THREAD LEVEL PARALLELISM

CPSC/ECE 3220 Fall 2017 Exam Give the definition (note: not the roles) for an operating system as stated in the textbook. (2 pts.

Module 15: "Memory Consistency Models" Lecture 34: "Sequential Consistency and Relaxed Models" Memory Consistency Models. Memory consistency

Netconf RCU and Breakage. Paul E. McKenney IBM Distinguished Engineer & CTO Linux Linux Technology Center IBM Corporation

Multiprocessor Synchronization

Advance Operating Systems (CS202) Locks Discussion

G52CON: Concepts of Concurrency

Chapter 5 Asynchronous Concurrent Execution

Portland State University ECE 588/688. Memory Consistency Models

Questions answered in this lecture: CS 537 Lecture 19 Threads and Cooperation. What s in a process? Organizing a Process

Advanced Topic: Efficient Synchronization

Lecture 21: Transactional Memory. Topics: Hardware TM basics, different implementations

MULTIPROCESSORS AND THREAD-LEVEL. B649 Parallel Architectures and Programming

MULTIPROCESSORS AND THREAD-LEVEL PARALLELISM. B649 Parallel Architectures and Programming

Introduction to Operating Systems Prof. Chester Rebeiro Department of Computer Science and Engineering Indian Institute of Technology, Madras

Relaxed Memory-Consistency Models

Intel Transactional Synchronization Extensions (Intel TSX) Linux update. Andi Kleen Intel OTC. Linux Plumbers Sep 2013

Parallel Algorithm Engineering

Oracle Database Mobile Server, Version 12.2

Jukka Julku Multicore programming: Low-level libraries. Outline. Processes and threads TBB MPI UPC. Examples

Sequential Consistency & TSO. Subtitle

Beyond Sequential Consistency: Relaxed Memory Models

Threads. Raju Pandey Department of Computer Sciences University of California, Davis Spring 2011

740: Computer Architecture Memory Consistency. Prof. Onur Mutlu Carnegie Mellon University

ERLANG EVOLVES FOR MULTI-CORE AND CLOUD ENVIRONMENTS

10/10/ Gribble, Lazowska, Levy, Zahorjan 2. 10/10/ Gribble, Lazowska, Levy, Zahorjan 4

What s in a process?

Using Relaxed Consistency Models

CSE 120 Principles of Operating Systems

Lecture 21: Transactional Memory. Topics: consistency model recap, introduction to transactional memory

To Everyone... iii To Educators... v To Students... vi Acknowledgments... vii Final Words... ix References... x. 1 ADialogueontheBook 1

What s in a traditional process? Concurrency/Parallelism. What s needed? CSE 451: Operating Systems Autumn 2012

A Lock-free Multithreaded Monte-Carlo Tree Search Algorithm

Multithreading Deep Dive. my twitter. a deep investigation on how.net multithreading primitives map to hardware and Windows Kernel.

Lecture 10: Multi-Object Synchronization

The Multikernel: A new OS architecture for scalable multicore systems Baumann et al. Presentation: Mark Smith

Overview: Memory Consistency

Agenda. Designing Transactional Memory Systems. Why not obstruction-free? Why lock-based?

Fractal: A Software Toolchain for Mapping Applications to Diverse, Heterogeneous Architecures

2 Threads vs. Processes

Threads. Computer Systems. 5/12/2009 cse threads Perkins, DW Johnson and University of Washington 1

What are Transactions? Transaction Management: Introduction (Chap. 16) Major Example: the web app. Concurrent Execution. Web app in execution (CS636)

Relaxed Memory Consistency

Operating Systems. Lecture 4 - Concurrency and Synchronization. Master of Computer Science PUF - Hồ Chí Minh 2016/2017

Processor Architecture

Dynamic Fine Grain Scheduling of Pipeline Parallelism. Presented by: Ram Manohar Oruganti and Michael TeWinkle

Transaction Management: Introduction (Chap. 16)

CS5460/6460: Operating Systems. Lecture 14: Scalability techniques. Anton Burtsev February, 2014

IT 540 Operating Systems ECE519 Advanced Operating Systems

CSL373: Lecture 5 Deadlocks (no process runnable) + Scheduling (> 1 process runnable)

MultiJav: A Distributed Shared Memory System Based on Multiple Java Virtual Machines. MultiJav: Introduction

Lecture: Transactional Memory. Topics: TM implementations

Oracle Risk Management Cloud

Shared Memory Programming with OpenMP. Lecture 8: Memory model, flush and atomics

COREMU: a Portable and Scalable Parallel Full-system Emulator

CS 333 Introduction to Operating Systems. Class 3 Threads & Concurrency. Jonathan Walpole Computer Science Portland State University

Kotlin/Native concurrency model. nikolay

Threads. CS3026 Operating Systems Lecture 06

Advanced Operating Systems (CS 202)

Chapter 4: Threads. Chapter 4: Threads. Overview Multicore Programming Multithreading Models Thread Libraries Implicit Threading Threading Issues

Executive Summary. It is important for a Java Programmer to understand the power and limitations of concurrent programming in Java using threads.

Håkan Sundell University College of Borås Parallel Scalable Solutions AB

Midterm Exam Solutions Amy Murphy 28 February 2001

Handout 3 Multiprocessor and thread level parallelism

Multiprocessors 2007/2008

Motivation. Threads. Multithreaded Server Architecture. Thread of execution. Chapter 4

CS370 Operating Systems

250P: Computer Systems Architecture. Lecture 14: Synchronization. Anton Burtsev March, 2019

Computer Architecture Area Fall 2009 PhD Qualifier Exam October 20 th 2008

Utilizing the IOMMU scalably

Distributed Operating Systems Memory Consistency

The Java Memory Model

1) If a location is initialized to 0, what will the first invocation of TestAndSet on that location return?

Effective Data-Race Detection for the Kernel

A Scalable Ordering Primitive for Multicore Machines. Sanidhya Kashyap Changwoo Min Kangnyeon Kim Taesoo Kim

Parallel and Distributed Computing

Using Weakly Ordered C++ Atomics Correctly. Hans-J. Boehm

Module 6: Process Synchronization. Operating System Concepts with Java 8 th Edition

Threading and Synchronization. Fahd Albinali

CS252 Spring 2017 Graduate Computer Architecture. Lecture 14: Multithreading Part 2 Synchronization 1

10 Parallel Organizations: Multiprocessor / Multicore / Multicomputer Systems

Towards Transparent and Efficient Software Distributed Shared Memory

Chapter 6 Concurrency: Deadlock and Starvation

Distributed Shared Memory and Memory Consistency Models

CSC501 Operating Systems Principles. OS Structure

Kernel Synchronization I. Changwoo Min

CMSC 330: Organization of Programming Languages

On the Scalability of the Erlang Term Storage

Transcription:

Taking a Virtual Machine Towards Many-Cores Rickard Green - rickard@erlang.org Patrik Nyblom - pan@erlang.org

What we all know by now Number of cores per processor AMD Opteron Intel Xeon Sparc Niagara 20 15 10 5 2005 2006 2007 2008 2009 2010 2011 0

What we all know by now Number of cores per processor AMD Opteron Intel Xeon Sparc Niagara 20 15 10 5 2005 2006 2007 2008 2009 2010 2011 Number of cores per multiprocessor system AMD Opteron Intel Xeon Sparc Niagara 80 0 60 40 20 2005 2006 2007 2008 2009 2010 2011 0

Nemesis of Scalability Resource Contention

Nemesis of Scalability Resource Contention High level algorithms Server...

Nemesis of Scalability Resource Contention High level algorithms Server... Software synchronization mechanisms ocks ock type ock implementation ock free data structures...

Nemesis of Scalability High level algorithms Server... Software synchronization mechanisms ocks ock type ock implementation ock free data structures... Hardware Processor communication Cache line Memory barrier... Resource Contention

Nemesis of Scalability High level algorithms Server... Software synchronization mechanisms ocks ock type ock implementation ock free data structures... Hardware Processor communication Cache line Memory barrier... Resource Contention Awareness

Shared Memory MultiProcessor System

Shared Memory MultiProcessor System Cache Cache Cache Cache

Cache

Cache Cache line

Shared Memory MultiProcessor System

Shared Memory MultiProcessor System

Shared Memory MultiProcessor System

Shared Memory MultiProcessor System

Shared Memory MultiProcessor System

Table Element ookup Read-lock table ookup element Increment element reference count Read-unlock table Operate on element Decrease element reference count Release if reference count reached zero Deletion of element Remove from table Decrease reference count and release if it reached zero

Table Element ookup Read-lock table ookup element Increment element reference count Read-unlock table Operate on element Decrease element reference count Release if reference count reached zero Deletion of element Remove from table Decrease reference count and release if it reached zero

Table Element ookup Read-lock table ookup element Increment element reference count Read-unlock table Operate on element Decrease element reference count Release if reference count reached zero Deletion of element Remove from table Decrease reference count and release if it reached zero

Table Element ookup Read-lock table ookup element Increment element reference count Read-unlock table Operate on element Decrease element reference count Release if reference count reached zero Deletion of element Remove from table Decrease reference count and release if it reached zero

Table Element ookup Read-lock table ookup element Increment element reference count Read-unlock table Operate on element Decrease element reference count Release if reference count reached zero Deletion of element Remove from table Decrease reference count and release if it reached zero Read-lock table ookup element Read-unlock table Operate on element Deletion of element Remove from table Add number of schedulers to element reference count Decrement reference count once for lost table reference Schedule confirm deletion jobs on all schedulers Each scheduler decrease reference count and release if it reached zero

Table Element ookup Read-lock table ookup element Increment element reference count Read-unlock table Operate on element Decrease element reference count Release if reference count reached zero Deletion of element Remove from table Decrease reference count and release if it reached zero Read-lock table ookup element Read-unlock table Operate on element Deletion of element Remove from table Add number of schedulers to element reference count Decrement reference count once for lost table reference Schedule confirm deletion jobs on all schedulers Each scheduler decrease reference count and release if it reached zero

Table Element ookup Read-lock table ookup element Increment element reference count Read-unlock table Operate on element Decrease element reference count Release if reference count reached zero Deletion of element Remove from table Decrease reference count and release if it reached zero ookup element Operate on element Deletion of element Remove from table Add number of schedulers to element reference count Decrement reference count once for lost table reference Schedule confirm deletion jobs on all schedulers Each scheduler decrease reference count and release if it reached zero

Order of Memory Operations

Order of Memory Operations c1 = c2 = 1 Thread 1 Thread 2 Store(c1, 0) Store(c2, 0) r1 = oad(c2) r2 = oad(c1) r1 r2

Order of Memory Operations c1 = c2 = 1 Thread 1 Thread 2 Store(c1, 0) Store(c2, 0) r1 = oad(c2) r2 = oad(c1) r1 0 0 r2

Order of Memory Operations c1 = c2 = 1 Thread 1 Thread 2 Store(c1, 0) Store(c2, 0) r1 = oad(c2) r2 = oad(c1) r1 r2 0 0 0 1

Order of Memory Operations c1 = c2 = 1 Thread 1 Thread 2 Store(c1, 0) Store(c2, 0) r1 = oad(c2) r2 = oad(c1) r1 r2 0 0 0 1 1 0

Order of Memory Operations c1 = c2 = 1 Thread 1 Thread 2 Store(c1, 0) Store(c2, 0) r1 = oad(c2) r2 = oad(c1) r1 r2 0 0 0 1 1 0 1 1

Order of Memory Operations c1 = c2 = 1 Thread 1 Thread 2 Store(c1, 0) Store(c2, 0) MemoryBarrier r1 = oad(c2) MemoryBarrier r2 = oad(c1) r1 r2 0 0 0 1 1 0

Hardware Memory Barriers Hardware architectures Very different ordering guarantees x86 - quite strict... alpha - very relaxed Hardware memory barriers Tools for enforcing ordering Expensive Architecture dependent Sometimes lightweight versions exist

Hardware Memory Barriers Hardware architectures Very different ordering guarantees x86 - quite strict... alpha - very relaxed Hardware memory barriers Tools for enforcing ordering Expensive Architecture dependent Sometimes lightweight versions exist S S S S

Hardware Memory Barriers Hardware architectures Very different ordering guarantees x86 - quite strict... alpha - very relaxed Hardware memory barriers Tools for enforcing ordering Expensive Architecture dependent Sometimes lightweight versions exist S S S S S S S S

Hardware Memory Barriers Hardware architectures Very different ordering guarantees x86 - quite strict... alpha - very relaxed Hardware memory barriers Tools for enforcing ordering Expensive Architecture dependent Sometimes lightweight versions exist S S S S S S S S

Hardware Memory Barriers Hardware architectures Very different ordering guarantees x86 - quite strict... alpha - very relaxed Hardware memory barriers Tools for enforcing ordering Expensive Architecture dependent Sometimes lightweight versions exist S S S S S S S S

Hardware Memory Barriers Hardware architectures Very different ordering guarantees x86 - quite strict... alpha - very relaxed Hardware memory barriers Tools for enforcing ordering Expensive Architecture dependent Sometimes lightweight versions exist S S S S S S S S

Old Atomic API ethr_atomic_init() ethr_atomic_set() ethr_atomic_read() ethr_atomic_inc() ethr_atomic_dec() ethr_atomic_add() ethr_atomic_inc_read_mb() ethr_atomic_dec_read_mb() ethr_atomic_add_read_mb() ethr_atomic_read_bor_mb() ethr_atomic_read_band_mb() ethr_atomic_xchg_mb() ethr_atomic_cmpxchg_mb()

Old Atomic API Word size ethr_atomic_init() ethr_atomic_set() ethr_atomic_read() ethr_atomic_inc() ethr_atomic_dec() ethr_atomic_add() ethr_atomic_inc_read_mb() ethr_atomic_dec_read_mb() ethr_atomic_add_read_mb() ethr_atomic_read_bor_mb() ethr_atomic_read_band_mb() ethr_atomic_xchg_mb() ethr_atomic_cmpxchg_mb()

Atomic API R15

Atomic API R15 32-bit Word size Double word size ethr_atomic32_init() ethr_atomic32_set() ethr_atomic32_read() ethr_atomic32_inc_read() ethr_atomic32_dec_read() ethr_atomic32_inc() ethr_atomic32_dec() ethr_atomic32_add_read() ethr_atomic32_add() ethr_atomic32_read_bor() ethr_atomic32_read_band() ethr_atomic32_xchg() ethr_atomic32_cmpxchg() ethr_atomic32_init_mb() ethr_atomic32_set_mb() ethr_atomic32_read_mb() ethr_atomic32_inc_read_mb() ethr_atomic32_dec_read_mb() ethr_atomic32_inc_mb() ethr_atomic32_dec_mb() ethr_atomic32_add_read_mb() ethr_atomic32_add_mb() ethr_atomic32_read_bor_mb() ethr_atomic32_read_band_mb() ethr_atomic32_xchg_mb() ethr_atomic32_cmpxchg_mb() ethr_atomic32_init_acqb() ethr_atomic32_set_acqb() ethr_atomic32_read_acqb() ethr_atomic32_inc_read_acqb() ethr_atomic32_dec_read_acqb() ethr_atomic32_inc_acqb() ethr_atomic32_dec_acqb() ethr_atomic32_add_read_acqb() ethr_atomic32_add_acqb() ethr_atomic32_read_bor_acqb() ethr_atomic32_read_band_acqb() ethr_atomic32_xchg_acqb() ethr_atomic32_cmpxchg_acqb() ethr_atomic32_init_relb() ethr_atomic32_set_relb() ethr_atomic32_read_relb() ethr_atomic32_inc_read_relb() ethr_atomic32_dec_read_relb() ethr_atomic32_inc_relb() ethr_atomic32_dec_relb() ethr_atomic32_add_read_relb() ethr_atomic32_add_relb() ethr_atomic32_read_bor_relb() ethr_atomic32_read_band_relb() ethr_atomic32_xchg_relb() ethr_atomic32_cmpxchg_relb() ethr_atomic32_init_ddrb() ethr_atomic32_set_ddrb() ethr_atomic32_read_ddrb() ethr_atomic32_inc_read_ddrb() ethr_atomic32_dec_read_ddrb() ethr_atomic32_inc_ddrb() ethr_atomic32_dec_ddrb() ethr_atomic32_add_read_ddrb() ethr_atomic32_add_ddrb() ethr_atomic32_read_bor_ddrb() ethr_atomic32_read_band_ddrb() ethr_atomic32_xchg_ddrb() ethr_atomic32_cmpxchg_ddrb() ethr_atomic32_init_rb() ethr_atomic32_set_rb() ethr_atomic32_read_rb() ethr_atomic32_inc_read_rb() ethr_atomic32_dec_read_rb() ethr_atomic32_inc_rb() ethr_atomic32_dec_rb() ethr_atomic32_add_read_rb() ethr_atomic32_add_rb() ethr_atomic32_read_bor_rb() ethr_atomic32_read_band_rb() ethr_atomic32_xchg_rb() ethr_atomic32_cmpxchg_rb() ethr_atomic32_init_wb() ethr_atomic32_set_wb() ethr_atomic32_read_wb() ethr_atomic32_inc_read_wb() ethr_atomic32_dec_read_wb() ethr_atomic32_inc_wb() ethr_atomic32_dec_wb() ethr_atomic32_add_read_wb() ethr_atomic32_add_wb() ethr_atomic32_read_bor_wb() ethr_atomic32_read_band_wb() ethr_atomic32_xchg_wb() ethr_atomic32_cmpxchg_wb() ethr_atomic_init() ethr_atomic_set() ethr_atomic_read() ethr_atomic_inc_read() ethr_atomic_dec_read() ethr_atomic_inc() ethr_atomic_dec() ethr_atomic_add_read() ethr_atomic_add() ethr_atomic_read_bor() ethr_atomic_read_band() ethr_atomic_xchg() ethr_atomic_cmpxchg() ethr_atomic_init_mb() ethr_atomic_set_mb() ethr_atomic_read_mb() ethr_atomic_inc_read_mb() ethr_atomic_dec_read_mb() ethr_atomic_inc_mb() ethr_atomic_dec_mb() ethr_atomic_add_read_mb() ethr_atomic_add_mb() ethr_atomic_read_bor_mb() ethr_atomic_read_band_mb() ethr_atomic_xchg_mb() ethr_atomic_cmpxchg_mb() ethr_atomic_init_acqb() ethr_atomic_set_acqb() ethr_atomic_read_acqb() ethr_atomic_inc_read_acqb() ethr_atomic_dec_read_acqb() ethr_atomic_inc_acqb() ethr_atomic_dec_acqb() ethr_atomic_add_read_acqb() ethr_atomic_add_acqb() ethr_atomic_read_bor_acqb() ethr_atomic_read_band_acqb() ethr_atomic_xchg_acqb() ethr_atomic_cmpxchg_acqb() ethr_atomic_init_relb() ethr_atomic_set_relb() ethr_atomic_read_relb() ethr_atomic_inc_read_relb() ethr_atomic_dec_read_relb() ethr_atomic_inc_relb() ethr_atomic_dec_relb() ethr_atomic_add_read_relb() ethr_atomic_add_relb() ethr_atomic_read_bor_relb() ethr_atomic_read_band_relb() ethr_atomic_xchg_relb() ethr_atomic_cmpxchg_relb() ethr_atomic_init_ddrb() ethr_atomic_set_ddrb() ethr_atomic_read_ddrb() ethr_atomic_inc_read_ddrb() ethr_atomic_dec_read_ddrb() ethr_atomic_inc_ddrb() ethr_atomic_dec_ddrb() ethr_atomic_add_read_ddrb() ethr_atomic_add_ddrb() ethr_atomic_read_bor_ddrb() ethr_atomic_read_band_ddrb() ethr_atomic_xchg_ddrb() ethr_atomic_cmpxchg_ddrb() ethr_atomic_init_rb() ethr_atomic_set_rb() ethr_atomic_read_rb() ethr_atomic_inc_read_rb() ethr_atomic_dec_read_rb() ethr_atomic_inc_rb() ethr_atomic_dec_rb() ethr_atomic_add_read_rb() ethr_atomic_add_rb() ethr_atomic_read_bor_rb() ethr_atomic_read_band_rb() ethr_atomic_xchg_rb() ethr_atomic_cmpxchg_rb() ethr_atomic_init_wb() ethr_atomic_set_wb() ethr_atomic_read_wb() ethr_atomic_inc_read_wb() ethr_atomic_dec_read_wb() ethr_atomic_inc_wb() ethr_atomic_dec_wb() ethr_atomic_add_read_wb() ethr_atomic_add_wb() ethr_atomic_read_bor_wb() ethr_atomic_read_band_wb() ethr_atomic_xchg_wb() ethr_atomic_cmpxchg_wb() ethr_dw_atomic_init() ethr_dw_atomic_set() ethr_dw_atomic_read() ethr_dw_atomic_cmpxchg() ethr_dw_atomic_init_mb() ethr_dw_atomic_set_mb() ethr_dw_atomic_read_mb() ethr_dw_atomic_cmpxchg_mb() ethr_dw_atomic_init_acqb() ethr_dw_atomic_set_acqb() ethr_dw_atomic_read_acqb() ethr_dw_atomic_cmpxchg_acqb() ethr_dw_atomic_init_relb() ethr_dw_atomic_set_relb() ethr_dw_atomic_read_relb() ethr_dw_atomic_cmpxchg_relb() ethr_dw_atomic_init_ddrb() ethr_dw_atomic_set_ddrb() ethr_dw_atomic_read_ddrb() ethr_dw_atomic_cmpxchg_ddrb() ethr_dw_atomic_init_rb() ethr_dw_atomic_set_rb() ethr_dw_atomic_read_rb() ethr_dw_atomic_cmpxchg_rb ethr_dw_atomic_init_wb() ethr_dw_atomic_set_wb() ethr_dw_atomic_read_wb() ethr_dw_atomic_cmpxchg_wb()

ocked Free Queue Queue First ast

ock Free Queue First ast

ock Free Queue First ast

ock Free Queue First ast

ock Free Queue First ast

Thread Progress We want a mechanism that can be used in order to determine when all interesting threads have at least once returned to the event loop code have at least once executed a full memory barrier will see all modifications I have made Interesting threads are managed : We know about all of them They do not hang They will always return to a certain point Typically - the schedulers and a few other threads The mechanism is called Thread Progress in the VM code Unmanaged threads need to be taken care of by other means First ast

Thread Progress We want a mechanism that can be used in order to determine when all interesting threads have at least once returned to the event loop code have at least once executed a full memory barrier will see all modifications I have made Interesting threads are managed : We know about all of them They do not hang They will always return to a certain point Typically - the schedulers and a few other threads The mechanism is called Thread Progress in the VM code Unmanaged threads need to be taken care of by other means First ast

Thread Progress Master thread coordinates Thread progress communicated via thread specific cache lines Progress Progress 1 Progress 2 Progress 3 Progress 4 Progress N

Thread Progress

Thread Progress

Thread Progress

Memory Allocators R11B Scheduler N Scheduler 2 Scheduler 1 Thread N Thread 1 sl_alloc eheap_alloc binary_alloc ets_alloc mseg_alloc

Memory Allocators R12B-1 Scheduler N Scheduler 2 Scheduler 1 Thread N Thread 1 sl_alloc sl_alloc sl_alloc eheap_alloc eheap_alloc eheap_alloc binary_alloc binary_alloc binary_alloc ets_alloc ets_alloc ets_alloc mseg_alloc

Memory Allocators R15B Scheduler N Scheduler 2 Scheduler 1 Thread N Thread 1 sl_alloc sl_alloc sl_alloc eheap_alloc eheap_alloc eheap_alloc binary_alloc binary_alloc binary_alloc ets_alloc ets_alloc ets_alloc mseg_alloc mseg_alloc mseg_alloc

Memory Allocators R15B Scheduler N Scheduler 2 Scheduler 1 Thread N Thread 1 sl_alloc sl_alloc sl_alloc sl_alloc eheap_alloc eheap_alloc eheap_alloc eheap_alloc binary_alloc binary_alloc binary_alloc binary_alloc ets_alloc ets_alloc ets_alloc ets_alloc mseg_alloc mseg_alloc mseg_alloc mseg_alloc

Code-loading r11-r15 Schedulers Export and module tables Code 00100101101 01000101010 10101011001 00100100100 11010101010 01010101001 01010101010 10010101001 01010001010 10101001010 10010100100 10101001001 01000101010 10000001110

Code-loading r11-r15 Schedulers Export and module tables Code 00100101101 01000101010 10101011001 00100100100 11010101010 01010101001 01010101010 10010101001 01010001010 10101001010 10010100100 10101001001 01000101010 10000001110

Code-loading r11-r15 Schedulers Export and module tables Code 00100101101 01000101010 10101011001 00100100100 11010101010 01010101001 01010101010 10010101001 01010001010 10101001010 10010100100 10101001001 01000101010 10000001110

Code-loading r11-r15 Schedulers Export and module tables Code 00100101101 01000101010 10101011001 00100100100 11010101010 01010101001 01010101010 10010101001 01010001010 10101001010 10010100100 10101001001 01000101010 10000001110

Code-loading r11-r15 Schedulers Export and module tables Code 00100101101 01000101010 10101011001 00100100100 11010101010 01010101001 01010101010 10010101001 01010001010 10101001010 10010100100 10101001001 01000101010 10000001110

Code-loading r11-r15 Schedulers Export and module tables Code 00100101101 01000101010 10101011001 00100100100 11010101010 01010101001 01010101010 10010101001 01010001010 10101001010 10010100100 10101001001 01000101010 10000001110

Code-loading r11-r15 Schedulers Export and module tables Code 00100101101 01000101010 10101011001 00100100100 11010101010 01010101001 01010101010 10010101001 01010001010 10101001010 10010100100 10101001001 01000101010 10000001110

Code-loading R16 Next ast Current 001001 011010 100010 101010 101011 001001 001001 001101

Code-loading R16 oad Code Next ast 00100 10110 10100 01010 10101 01011 00100 10010 Current 001001 011010 100010 101010 101011 001001 001001 001101

Code-loading R16 oad Code Next ast 00100 10110 10100 01010 10101 01011 00100 10010 Current 001001 011010 100010 101010 101011 001001 001001 001101

Code-loading R16 Next ast 00100 10110 10100 01010 10101 01011 00100 10010 Current 001001 011010 100010 101010 101011 001001 001001 001101

Code-loading R16 Next ast 00100 10110 10100 01010 10101 01011 00100 10010 Current 001001 011010 100010 101010 101011 001001 001001 001101

Code-loading R16 Next ast 00100 10110 10100 01010 10101 01011 00100 10010 Current 001001 011010 100010 101010 101011 001001 001001 001101

Code-loading R16 oad Code 0010 0101 1010 1000 1010 1010 1010 1100 Next ast 00100 10110 10100 01010 10101 01011 00100 10010 Current 001001 011010 100010 101010 101011 001001 001001 001101

Code-loading R16 oad Code 0010 0101 1010 1000 1010 1010 1010 1100 Next ast 00100 10110 10100 01010 10101 01011 00100 10010 Current 001001 011010 100010 101010 101011 001001 001001 001101

Code-loading R16 oad Code 0010 0101 1010 1000 1010 1010 1010 1100 Next ast 00100 10110 10100 01010 10101 01011 00100 10010 Current

Code-loading R16 Next ast 00100 10110 10100 01010 10101 01011 00100 10010 Current 00100 10110 10100 01010 10101 01011 00100 10010

Code-loading R16 Next ast 00100 10110 10100 01010 10101 01011 00100 10010 Current 00100 10110 10100 01010 10101 01011 00100 10010

recipes Divide or multiply resources Separate memory Separate cache-lines Asynchronously schedule work Delayed dealloc Only involve relevant parties Fine-grained synchronization ock-free queues Reader optimized RW-ocks Postpone until possible Thread progress ock-free queues Code loading Table cleanup

Questions?