RiSE: Relaxed Systems Engineering? Christoph Kirsch University of Salzburg

2-10 Application: >10k #threads, producer/consumer, blocking
Operations: allocate, access, share, deallocate
Metrics: throughput, scalability, latency, consumption
Hardware: CPUs, cores, MMUs, caches
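The workload on these slides can be sketched as a small program: producer threads allocate and access objects and share them with consumer threads through a blocking channel, and the consumers access and then deallocate them. This is a minimal illustration under stated assumptions, scaled down from >10k threads to a handful. Python's `queue.Queue` stands in for the blocking shared channel and `bytearray` objects stand in for heap objects. The names `producer`, `consumer`, and `run` are hypothetical. CPython reclaims each object through reference counting when the consumer drops it, so the allocator underneath behaves differently from the C allocators discussed here.

```python
import queue
import threading

SENTINEL = None  # tells a consumer to stop


def producer(q: queue.Queue, n_objects: int, size: int) -> None:
    """Allocate objects, write to them, and share them via a blocking queue."""
    for i in range(n_objects):
        obj = bytearray(size)   # allocate
        obj[0] = i % 256        # access
        q.put(obj)              # share (blocks while the bounded queue is full)


def consumer(q: queue.Queue, counter: list, lock: threading.Lock) -> None:
    """Receive shared objects, read them, and drop the last reference."""
    while True:
        obj = q.get()           # blocks until an object is available
        if obj is SENTINEL:
            break
        _ = obj[0]              # access
        del obj                 # deallocate: CPython frees it once no reference remains
        with lock:
            counter[0] += 1


def run(n_producers: int = 4, n_consumers: int = 4,
        n_objects: int = 1000, size: int = 64) -> int:
    """Run the pipeline and return how many objects the consumers processed."""
    q = queue.Queue(maxsize=128)  # bounded, so producers block if consumers lag
    counter, lock = [0], threading.Lock()
    producers = [threading.Thread(target=producer, args=(q, n_objects, size))
                 for _ in range(n_producers)]
    consumers = [threading.Thread(target=consumer, args=(q, counter, lock))
                 for _ in range(n_consumers)]
    for t in producers + consumers:
        t.start()
    for t in producers:
        t.join()
    for _ in consumers:
        q.put(SENTINEL)           # FIFO order: sentinels arrive after all objects
    for t in consumers:
        t.join()
    return counter[0]


print(run())  # 4000
```

Every allocated object is eventually shared and deallocated, so the returned count is `n_producers * n_objects`.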

12-20 Free lists: thread-local, core-local, CPU-local, global
Global: lock-based, lock-free
Is it a stack? Is it a queue?
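The thread-local vs. global question can be illustrated with a minimal sketch. A per-thread free list serves the fast path without synchronization, and a lock-based global free list is the fallback. Everything here is an illustrative assumption and not scalloc's design: the class `FreeListAllocator`, a single fixed block size (one size class), the `local_limit` threshold, and the policy of returning half of an overflowing local list to the global list. Both lists are used as stacks (LIFO), which hands back the most recently freed block first. A queue (FIFO) discipline would be the alternative. A lock-free global list would need an atomic compare-and-swap, which pure Python does not expose, so only the lock-based variant is shown.

```python
import threading


class FreeListAllocator:
    """Hypothetical two-level free list for one size class: a thread-local
    cache in front of a lock-based global free list."""

    def __init__(self, block_size: int = 64, local_limit: int = 8):
        self.block_size = block_size
        self.local_limit = local_limit  # max blocks cached per thread
        self._global = []               # global free list, used as a stack
        self._lock = threading.Lock()   # protects self._global
        self._tls = threading.local()   # holds each thread's own free list

    def _local(self) -> list:
        if not hasattr(self._tls, "free"):
            self._tls.free = []
        return self._tls.free

    def allocate(self) -> bytearray:
        local = self._local()
        if local:                       # fast path: no synchronization needed
            return local.pop()
        with self._lock:                # slow path: take a block from the global list
            if self._global:
                return self._global.pop()
        return bytearray(self.block_size)  # no free block anywhere: create one

    def deallocate(self, block: bytearray) -> None:
        local = self._local()
        local.append(block)             # LIFO: this block is handed out next
        if len(local) > self.local_limit:
            half = len(local) // 2      # move the oldest half to the global list
            with self._lock:
                self._global.extend(local[:half])
            del local[:half]
```

A block freed by one thread reaches another thread only through the global list, which is where the lock is paid; the local fast path never synchronizes.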

21 [Figure 5: State machine for thread limited to one size class, showing allocation transitions and deallocation transitions]

22-24 Relaxed Semantics [PaCT13] [CF13] [POPL13] vs. Operational Performance vs. [RACES12] Denotational Performance
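To make relaxed semantics concrete, the following sketches a sequential specification of a k-relaxed FIFO queue. A dequeue may return any of the k oldest elements, and k = 1 gives a strict FIFO queue. This illustrates bounded relaxation only. It is not one of the scalable concurrent implementations in Scal. The class name `KRelaxedQueue` and the random choice within the window of the k oldest elements are hypothetical choices made for illustration.

```python
import random
from collections import deque


class KRelaxedQueue:
    """Sequential sketch of a k-relaxed FIFO queue: dequeue may return any of
    the k oldest elements (k = 1 is strict FIFO). Not thread-safe."""

    def __init__(self, k: int, rng: random.Random = None):
        if k < 1:
            raise ValueError("k must be at least 1")
        self.k = k
        self._items = deque()  # oldest element on the left
        self._rng = rng if rng is not None else random.Random()

    def enqueue(self, item) -> None:
        self._items.append(item)

    def dequeue(self):
        """Remove and return one of the k oldest elements, or None if empty."""
        if not self._items:
            return None
        window = min(self.k, len(self._items))
        i = self._rng.randrange(window)  # choose among the `window` oldest
        item = self._items[i]
        del self._items[i]
        return item
```

For example, after enqueuing `"a"`, `"b"`, `"c"` into `KRelaxedQueue(k=2)`, the first dequeue returns `"a"` or `"b"`. The motivation for such relaxation is that a concurrent implementation may then spread contention over up to k positions instead of a single head.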

25 Allocators compared: jemalloc, llalloc, ptmalloc2, nedmalloc, tbb, tcmalloc, streamflow, hoard, compact, scalloc, scalloc-eager, scalloc-reuse
[Figure 7: ACDC for increasing object sizes. Panels: (a) Allocation time, (b) Deallocation time, (c) Memory consumption. x-axis: object size in bytes (logscale); y-axes: total allocation time in seconds, total deallocation time in seconds, average consumption in MB (all logscale, less is better)]
[Figure 8: ACDC for an increasing number of threads allocating thread-local objects from a large size range. Panels: (a) Allocation time, (b) Deallocation time, (c) Memory consumption. x-axis: number of threads; y-axes: per-thread total allocation time in seconds, per-thread total deallocation time in seconds (both logscale, less is better), per-thread average consumption in KB (less is better)]
[Figure 9: ACDC for an increasing number of threads allocating shared objects from a large size range. Panels and axes as in Figure 8]
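The three panels of these figures (allocation time, deallocation time, memory consumption) can be illustrated with a rough single-threaded measurement for one object size. This sketch is not ACDC. It uses Python's `time.perf_counter` for timing and `tracemalloc` for the peak of traced memory. The function name `measure` is hypothetical. tracemalloc's own bookkeeping slows allocation, so the timings are inflated and only comparable with each other.

```python
import time
import tracemalloc


def measure(size: int, count: int = 10_000) -> dict:
    """Time allocating and then freeing `count` objects of `size` bytes and
    record peak traced memory (a rough analogue of the three figure panels)."""
    tracemalloc.start()
    t0 = time.perf_counter()
    objs = [bytearray(size) for _ in range(count)]  # allocation
    t1 = time.perf_counter()
    _, peak = tracemalloc.get_traced_memory()       # (current, peak) in bytes
    del objs                                        # deallocation: last reference dropped
    t2 = time.perf_counter()
    tracemalloc.stop()
    return {"alloc_s": t1 - t0, "dealloc_s": t2 - t1, "peak_mb": peak / 2**20}


for size in (16, 256, 4096, 65536):
    print(size, measure(size, count=1000))
```

Sweeping `size` mirrors the x-axis of Figure 7; the thread-count sweeps of Figures 8 and 9 would additionally need many concurrent allocating threads.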

26 Scalable Concurrent Data Structures: scal.cs.uni-salzburg.at, github.com/cksystemsgroup/scal
Scalable Concurrent Memory Allocator: github.com/cksystemsgroup/scalloc
Allocator Benchmarking: acdc.cs.uni-salzburg.at, github.com/cksystemsgroup/acdc
