Memory Management Techniques for Large-Scale Persistent-Main-Memory Systems [VLDB 2017]

Ismail Oukid, Daniel Booss, Adrien Lespinasse, Wolfgang Lehner, Thomas Willhalm, Grégoire Gomes
Non-Volatile Memories Workshop, March 12, 2018

Motivation
- NVM can replace both main memory and storage, enabling a single-level database storage architecture without I/O
- Fail-safe persistent NVM memory management is a conditio sine qua non for enabling this novel architecture paradigm
- Existing persistent allocators are general-purpose and do not address the versatile needs of database systems
- We present PAllocator, a highly scalable, fail-safe persistent allocator

Outlook
- What is a persistent allocator?
- PAllocator's design decisions
- Experimental evaluation
- Conclusion

What Characterizes a Persistent Allocator?
A persistent allocator must:
1. Provide a recoverable addressing scheme
2. Avoid persistent memory leaks
[Figure: an application whose address space contains a transient allocator and a persistent allocator, layered over the virtual memory subsystem, DRAM, and NVM]

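1. Recoverable Addressing Scheme
- A persistent pointer (PPtr) is a pair {File ID, Offset} that refers to a location in an NVM file
- The NVM file is mapped into the virtual address space with mmap
- Volatile pointer = File start address + Offset
- The program root is stored at a known offset
[Figure: an NVM file mapped at its start address into the virtual address space]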
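The addressing scheme can be illustrated with a minimal Python sketch. It is not PAllocator's implementation: the `Pool` class, its method names, and the `<file ID>.pool` naming are hypothetical, and an ordinary temporary file stands in for an NVM file. Indexing the mapping at `offset` plays the role of "file start address + offset".

```python
import mmap
import os
import tempfile
from dataclasses import dataclass


@dataclass(frozen=True)
class PPtr:
    """Persistent pointer: stays valid across restarts, unlike a virtual address."""
    file_id: int
    offset: int


class Pool:
    """Hypothetical pool that maps file IDs to memory-mapped files."""

    def __init__(self, directory):
        self.directory = directory
        self.maps = {}  # file ID -> mapping, rebuilt on every start

    def open_file(self, file_id, size):
        path = os.path.join(self.directory, f"{file_id}.pool")
        fd = os.open(path, os.O_RDWR | os.O_CREAT)
        try:
            if os.fstat(fd).st_size < size:
                os.ftruncate(fd, size)  # new file: extend to the requested size
            self.maps[file_id] = mmap.mmap(fd, size)
        finally:
            os.close(fd)  # the mmap object keeps its own duplicate descriptor

    def write(self, pptr, data):
        mapping = self.maps[pptr.file_id]
        mapping[pptr.offset:pptr.offset + len(data)] = data
        mapping.flush()  # stands in for persisting the data

    def read(self, pptr, length):
        mapping = self.maps[pptr.file_id]
        return mapping[pptr.offset:pptr.offset + length]

    def close(self):
        for mapping in self.maps.values():
            mapping.close()
        self.maps.clear()


with tempfile.TemporaryDirectory() as directory:
    root = PPtr(file_id=1, offset=64)  # program root at a known offset
    pool = Pool(directory)
    pool.open_file(1, 4096)
    pool.write(root, b"hello")
    pool.close()  # simulated shutdown

    pool = Pool(directory)  # simulated restart: the file may be mapped elsewhere
    pool.open_file(1, 4096)
    recovered = pool.read(root, 5)
    pool.close()
```

The same `PPtr` resolves correctly after the restart because it never stores a virtual address, only the file ID and the offset.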

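2. Preventing Memory Leaks
- Traditional interface: pptr = allocate(size); persist(&pptr);
- The traditional interface has a blind spot: a crash after allocate returns but before pptr is persisted leaks the block
- Reference passing: allocate(pptr &pptr, size_t allocsize)
- pptr is owned by the data structure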
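The blind spot and the reference-passing fix can be simulated in a few lines of Python. This is a sketch under stated assumptions, not PAllocator's code: `SimulatedNVM` is a hypothetical stand-in for persistent state, and the one-entry log used for recovery is an illustrative choice.

```python
class SimulatedNVM:
    """Stands in for persistent memory: its fields survive a simulated crash."""

    def __init__(self):
        self.allocated = set()  # blocks marked as allocated
        self.slots = {}         # persistent pointers owned by data structures
        self.log = None         # (slot, block) while an allocation is in flight
        self.next_block = 0


def allocate_traditional(nvm):
    """Traditional interface: the caller must persist the returned pointer."""
    block = nvm.next_block
    nvm.next_block += 1
    nvm.allocated.add(block)
    return block


def allocate_by_reference(nvm, slot, crash_before_publish=False):
    """Reference passing: the allocator itself writes the persistent pointer."""
    block = nvm.next_block
    nvm.log = (slot, block)  # record the intent first
    nvm.next_block += 1
    nvm.allocated.add(block)
    if crash_before_publish:
        return  # simulated crash: the slot was never written
    nvm.slots[slot] = block
    nvm.log = None


def recover(nvm):
    """Roll back an allocation whose pointer never reached its owner."""
    if nvm.log is not None:
        slot, block = nvm.log
        if nvm.slots.get(slot) != block:
            nvm.allocated.discard(block)
        nvm.log = None


def leaked(nvm):
    """Blocks that are allocated but unreachable from any persistent pointer."""
    return nvm.allocated - set(nvm.slots.values())


traditional = SimulatedNVM()
block = allocate_traditional(traditional)  # crash here: 'block' was only volatile
recover(traditional)                       # recovery has nothing to go on

by_reference = SimulatedNVM()
allocate_by_reference(by_reference, "root", crash_before_publish=True)
recover(by_reference)                      # the log identifies the orphaned block
```

After recovery, `leaked(traditional)` still contains the block, while `leaked(by_reference)` is empty because the allocator knew where the pointer was supposed to go.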

PAllocator Design
We explore the following design dimensions:
1. Pool structure (single file vs. multiple files)
2. Allocation strategies
3. Concurrency handling
4. Persistent fragmentation
We do not consider garbage collection.
We assume hardware-managed wear-leveling.

1. Pool Structure: Single vs. Multiple Files
Pool as a single file
- Pros: 8-byte persistent pointers possible; easier to implement
- Cons: hard to shrink; huge block allocation is a problem
Pool as multiple files
- Pros: easier to grow and shrink; easy, fragmentation-free handling of huge allocations
- Cons: 16-byte persistent pointers
Multiple files are better suited for database systems.
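The pointer-size trade-off can be made concrete with a short sketch. The field layout is an assumption chosen for illustration (an 8-byte offset for a single-file pool, an 8-byte file ID plus an 8-byte offset for a multi-file pool); the slides state only the total sizes.

```python
import struct

# Assumed layouts, little-endian unsigned 64-bit fields.
SINGLE_FILE_PPTR = struct.Struct("<Q")   # offset only
MULTI_FILE_PPTR = struct.Struct("<QQ")   # file ID, offset


def encode_multi_file_pptr(file_id, offset):
    """Serialize a multi-file persistent pointer into its on-media form."""
    return MULTI_FILE_PPTR.pack(file_id, offset)


def decode_multi_file_pptr(raw):
    """Return (file_id, offset) from the on-media form."""
    return MULTI_FILE_PPTR.unpack(raw)


raw = encode_multi_file_pptr(file_id=3, offset=4096)
```

Under these assumptions every stored pointer doubles in size with multiple files, which is the cost paid for pools that can grow and shrink file by file.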

2. Allocation Strategies
Three allocation strategies:
- One file per allocation
- Segregated-fit for small blocks (e.g., < 4 KB)
- Best-fit for medium and large blocks (e.g., [4 KB, 16 MB))
One file per allocation is not realistic, except for huge blocks:
- Significant overhead and wasted memory for small blocks
- The filesystem might struggle to handle a huge number of files
- Fragmentation handling is pushed to the filesystem
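A size-based dispatch between the three strategies might look like the following sketch. The thresholds are the example values from the slide; the function name and the returned labels are hypothetical.

```python
KB = 1024
MB = 1024 * KB

SMALL_LIMIT = 4 * KB   # example threshold from the slide
HUGE_LIMIT = 16 * MB   # example threshold from the slide


def choose_strategy(size):
    """Pick an allocation strategy from the requested size in bytes."""
    if size <= 0:
        raise ValueError("allocation size must be positive")
    if size < SMALL_LIMIT:
        return "segregated-fit"
    if size < HUGE_LIMIT:
        return "best-fit"
    return "one file per allocation"
```

For example, `choose_strategy(64)` selects segregated-fit, while a 1 GB request falls through to one file per allocation.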

Segregated-Fit Allocation Strategy
- A fixed-size memory chunk (e.g., 8 KB) is divided into fixed-size blocks
- A bitmap marks each block as allocated or free: one allocation == one bit flip!
- Multiple class sizes
- Reduced fragmentation with a moderate number of class sizes
- Not suitable for larger block allocations
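A bitmap-managed chunk can be sketched as follows. The class name, the 64-byte block size, and the linear scan for a free bit are illustrative assumptions; the sketch covers one chunk of one size class and omits persistence.

```python
class Chunk:
    """One fixed-size chunk split into equal blocks, tracked by a bitmap."""

    def __init__(self, chunk_size=8192, block_size=64):
        if chunk_size % block_size != 0:
            raise ValueError("chunk size must be a multiple of the block size")
        self.block_size = block_size
        self.num_blocks = chunk_size // block_size
        self.bitmap = 0  # bit i set means block i is allocated

    def allocate(self):
        """Return the offset of a free block, or None if the chunk is full."""
        for i in range(self.num_blocks):
            if not (self.bitmap >> i) & 1:
                self.bitmap |= 1 << i  # the single bit flip
                return i * self.block_size
        return None

    def free(self, offset):
        """Mark the block at the given offset as free again."""
        i = offset // self.block_size
        if not (self.bitmap >> i) & 1:
            raise ValueError("block is not allocated")
        self.bitmap &= ~(1 << i)


chunk = Chunk()
first = chunk.allocate()
second = chunk.allocate()
chunk.free(first)
reused = chunk.allocate()  # the freed block is handed out again
```

Supporting multiple class sizes would mean keeping one set of such chunks per block size and rounding each request up to the nearest class.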

2. Allocation Strategies: Best-Fit Allocation Strategy
- Allocate a multiple of a predetermined size (e.g., the system page size) from a segment (e.g., 128 MB)
- Allocation: free-blocks index sorted by block size
- Coalescing: global block index sorted by block offset
- Indexes are implemented with the FPTree, a hybrid NVM-DRAM B+-Tree [SIGMOD '16]
- Suitable for large blocks
- Prone to fragmentation
[Figure: both indexes span DRAM and NVM, with their inner nodes in DRAM]
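The interplay of the two indexes can be sketched in Python. This is a simplified, volatile illustration: sorted lists searched with `bisect` stand in for the FPTree, the offset-sorted index here tracks free blocks only, and the 4 KB page size is an assumption.

```python
import bisect

PAGE = 4096  # assumed allocation granularity


class BestFitSegment:
    def __init__(self, size):
        self.by_size = [(size, 0)]    # free blocks sorted by (size, offset)
        self.by_offset = [(0, size)]  # free blocks sorted by (offset, size)
        self.used = {}                # offset -> size of allocated blocks

    def _insert_free(self, offset, size):
        bisect.insort(self.by_size, (size, offset))
        bisect.insort(self.by_offset, (offset, size))

    def _remove_free(self, offset, size):
        self.by_size.remove((size, offset))
        self.by_offset.remove((offset, size))

    def allocate(self, nbytes):
        """Take the smallest free block that fits and split off the remainder."""
        size = -(-nbytes // PAGE) * PAGE  # round up to a page multiple
        i = bisect.bisect_left(self.by_size, (size, 0))
        if i == len(self.by_size):
            return None  # no free block is large enough
        block_size, offset = self.by_size[i]
        self._remove_free(offset, block_size)
        if block_size > size:
            self._insert_free(offset + size, block_size - size)
        self.used[offset] = size
        return offset

    def free(self, offset):
        """Release a block and coalesce it with adjacent free neighbours."""
        size = self.used.pop(offset)
        i = bisect.bisect_left(self.by_offset, (offset, 0))
        if i < len(self.by_offset) and self.by_offset[i][0] == offset + size:
            right_offset, right_size = self.by_offset[i]
            self._remove_free(right_offset, right_size)
            size += right_size
        if i > 0:
            left_offset, left_size = self.by_offset[i - 1]
            if left_offset + left_size == offset:
                self._remove_free(left_offset, left_size)
                offset, size = left_offset, size + left_size
        self._insert_free(offset, size)


segment = BestFitSegment(16 * PAGE)
a = segment.allocate(1)         # rounded up to one page
b = segment.allocate(2 * PAGE)
c = segment.allocate(PAGE)
segment.free(b)                 # leaves a two-page gap between a and c
d = segment.allocate(PAGE)      # best fit picks the gap, not the large tail
```

The size-sorted index answers "which free block fits best", while the offset-sorted index finds the neighbours needed for coalescing.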

3. Concurrency Handling
Thread-local allocation
- One allocator object per thread
- The standard in general-purpose allocators
- Used for small block allocations
- Local allocators request chunks from the global pool
- Local allocators need to be merged with the global pool when their thread terminates
- Does not scale under high concurrency: frequent chunk requests to the global pool

3. Concurrency Handling
Core-local allocation
- One allocator object per physical core
- Local allocators request large files from the global pool
- Robust performance under high concurrency
- Stable local allocators
- Greedy
[Figure: two sockets connected via QPI, each with cores C1 and C2 and one allocator per core]
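The core-local idea can be sketched as follows. All class names are hypothetical, the caller passes the core ID explicitly (a real system would query the core the thread is running on), and each local allocator simply bumps a cursor inside the segment it obtained from the global pool. Requests are assumed to be no larger than a segment.

```python
import threading


class GlobalPool:
    """Shared pool: every request takes a global lock, so requests should be rare."""

    def __init__(self):
        self.lock = threading.Lock()
        self.next_address = 0
        self.requests = 0

    def request_segment(self, size):
        with self.lock:
            self.requests += 1
            start = self.next_address
            self.next_address += size
            return start


class LocalAllocator:
    """One per core; outlives the threads that happen to run on that core."""

    def __init__(self, pool, segment_size):
        self.pool = pool
        self.segment_size = segment_size
        self.lock = threading.Lock()  # threads may migrate between cores
        self.cursor = 0
        self.end = 0

    def allocate(self, size):
        with self.lock:
            if self.cursor + size > self.end:
                # Local segment exhausted: fetch a large one from the global pool.
                self.cursor = self.pool.request_segment(self.segment_size)
                self.end = self.cursor + self.segment_size
            address = self.cursor
            self.cursor += size
            return address


class CoreLocalAllocators:
    def __init__(self, num_cores, segment_size=1 << 20):
        self.pool = GlobalPool()
        self.local = [LocalAllocator(self.pool, segment_size) for _ in range(num_cores)]

    def allocate(self, core_id, size):
        return self.local[core_id].allocate(size)


allocators = CoreLocalAllocators(num_cores=4, segment_size=1024)
addresses = [allocators.allocate(core_id=i % 4, size=64) for i in range(100)]
```

In this run, 100 allocations trigger only 8 global requests, because each core serves 16 allocations from a segment before asking for another.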

4. Persistent Fragmentation
- A restart is a last-resort but valid way of defragmenting volatile memory; this does not apply to NVM
- File system solutions do not apply to NVM
  - File systems benefit from an additional indirection layer
  - NVM is directly accessed with load/store instructions
- New defragmentation mechanisms are needed

4. Persistent Fragmentation
- Most file systems support sparse files
- Defragmentation idea: punch holes in free blocks
  - Find the largest free block
  - Punch a hole using fallocate
  - Iterate until the target size is reached
- The file size must stay unchanged to maintain the validity of offsets
[Figure: a file of used and free blocks in which free blocks are successively replaced by holes]
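The defragmentation loop can be sketched as follows. The function names and the `(offset, size)` representation of free blocks are hypothetical. `punch_hole` calls the Linux `fallocate` system call through `ctypes` with `FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE`, assumes a 64-bit Linux system and a filesystem that supports hole punching, and is only defined here, not executed.

```python
import ctypes
import ctypes.util
import os

FALLOC_FL_KEEP_SIZE = 0x01   # keep the file size, so existing offsets stay valid
FALLOC_FL_PUNCH_HOLE = 0x02  # deallocate the storage behind the byte range


def punch_hole(fd, offset, length):
    """Release the storage behind [offset, offset + length) of an open file."""
    libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
    libc.fallocate.argtypes = [ctypes.c_int, ctypes.c_int,
                               ctypes.c_int64, ctypes.c_int64]
    libc.fallocate.restype = ctypes.c_int
    mode = FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE
    if libc.fallocate(fd, mode, offset, length) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


def defragment(free_blocks, target_bytes, punch):
    """Punch holes in the largest free blocks until target_bytes are released.

    free_blocks is a list of (offset, size) pairs; punch is called as
    punch(offset, size). Returns the number of bytes released.
    """
    released = 0
    for offset, size in sorted(free_blocks, key=lambda b: b[1], reverse=True):
        if released >= target_bytes:
            break
        punch(offset, size)
        released += size
    return released


# Dry run that records the holes instead of touching a file.
punched = []
released = defragment(
    free_blocks=[(0, 4096), (8192, 16384), (32768, 8192)],
    target_bytes=20000,
    punch=lambda offset, size: punched.append((offset, size)),
)
```

To operate on a real file, the callback would be `lambda offset, size: punch_hole(fd, offset, size)` for an open descriptor `fd`.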

PAllocator: Architecture Overview
- Allocator objects: Small Alloc 1 to Small Alloc n, Big Alloc 1 to Big Alloc n, Huge Alloc
- Persistent allocators: Small Block Allocator, Big Block Allocator, Huge Block Allocator
- Segment manager: shared list of free segments, failure-atomic segment provider, segment ownership map
- NVM-aware filesystem: file creation, deletion, memory mapping

PAllocator Performance Evaluation
Random-size allocation/deallocation (64 B - 128 KB)
[Figure: throughput (op/thread/sec) vs. number of threads (1, 2, 4, 8, 16) for PAllocator, NVML, and jemalloc; annotated speedups: 1.7x and 7.6x]
PAllocator scales nearly linearly

Allocator Performance Impact on the FPTree
[Figure: FPTree throughput (KOPS/s) with PAllocator vs. NVML. Panel "100% Insert": 1.4x. Panel "50% Find, 50% Insert": 1.2x]
Persistent allocators do impact database performance

Allocator Recovery Time
[Figure: recovery time (sec, logarithmic scale) vs. allocated data size (60 to 60000 MB) for PAllocator, NVML, Makalu, and nvm_malloc; annotated factors: 516x, 29.5x, 4.6x]
At 1 TB: PAllocator (0.75 s), NVML (3.5 s), Makalu (394.5 s), nvm_malloc (22.5 s)

Conclusion
- NVM has the potential to disrupt database storage architecture
- Memory management is a necessary building block
- We presented PAllocator:
  - Designed for large NVM systems
  - Highly scalable
  - Fast recovery
  - Defragmentation capability
[Figure: PAllocator architecture overview, repeated from the earlier slide]

State-of-the-Art

Allocator | Purpose | Pool structure | Allocation strategies | Concurrency handling | Garbage collection | Defragmentation | Source
Mnemosyne | General | Multiple files | Segregated-fit + best-fit | Thread-local for small blocks | Yes | No | ASPLOS '11
NV-Heaps | General | Single file | Undefined | Thread-local | Yes | No | ASPLOS '11
nvm_malloc | General | Single file | Segregated-fit + best-fit | Thread-local for small blocks | No | No | ADMS '15
NVML | General | Single file | Segregated-fit + best-fit | Thread-local for small blocks | No | No | http://pmem.io/nvml/
Makalu | General | Single file | Segregated-fit + best-fit | Thread-local for small blocks | Yes (offline) | No | OOPSLA '16
PAllocator | Large systems | Multiple files | Segregated-fit + best-fit + file | Core-local | No | Yes | VLDB '17

Salient differences in design decisions
For completeness: NVMalloc and Walloc focus on wear-leveling