Memory Management Techniques for Large-Scale Persistent-Main-Memory Systems [VLDB 2017]

Ismail Oukid, Daniel Booss, Adrien Lespinasse, Wolfgang Lehner, Thomas Willhalm, Grégoire Gomes
Non-Volatile Memories Workshop, March 12, 2018

Motivation
- NVM can replace both main memory and storage, enabling a single-level database storage architecture without I/O
- Fail-safe persistent NVM memory management is a conditio sine qua non for enabling this novel architecture paradigm
- Existing persistent allocators are general-purpose and do not address the versatile needs of database systems
- We present PAllocator, a highly scalable, fail-safe persistent allocator

Outlook
- What is a persistent allocator?
- PAllocator's design decisions
- Experimental evaluation
- Conclusion

What Characterizes a Persistent Allocator?
A persistent allocator must:
1. Provide a recoverable addressing scheme
2. Avoid persistent memory leaks
[Figure: application address space with a transient allocator and a persistent allocator on top of the virtual memory subsystem, backed by DRAM and NVM]

1. Recoverable Addressing Scheme
- Persistent pointer (PPtr): {File ID, Offset}
- Volatile pointer = file start address (from mmap) + offset
- Program root at a known offset
[Figure: an NVM file mapped into the virtual address space at its start address]
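The addressing scheme can be illustrated with a small sketch. It is not PAllocator code: the `PPtr` and `Pool` names, the file naming, and the use of an ordinary temporary file in place of NVM are all assumptions made for illustration. It shows only that a {file ID, offset} pair still resolves to the same data after the file is unmapped and mapped again, even though the mapping itself is new.

```python
import mmap
import os
import tempfile
from typing import NamedTuple


class PPtr(NamedTuple):
    """Persistent pointer; the field names are illustrative."""
    file_id: int
    offset: int


class Pool:
    """Hypothetical pool that maps file IDs to memory-mapped files."""

    def __init__(self, directory):
        self.directory = directory
        self.maps = {}  # file_id -> mmap object (rebuilt on every start)

    def open_file(self, file_id, size=4096):
        path = os.path.join(self.directory, f"pool_{file_id}.bin")
        if not os.path.exists(path):
            with open(path, "wb") as f:
                f.truncate(size)  # create a zero-filled file
        with open(path, "r+b") as f:
            # The mapping's start address may differ on every run.
            self.maps[file_id] = mmap.mmap(f.fileno(), 0)

    def write(self, pptr, data):
        m = self.maps[pptr.file_id]
        m[pptr.offset:pptr.offset + len(data)] = data
        m.flush()

    def read(self, pptr, length):
        m = self.maps[pptr.file_id]
        return bytes(m[pptr.offset:pptr.offset + length])

    def close(self):
        for m in self.maps.values():
            m.close()
        self.maps.clear()


def demo_restart():
    with tempfile.TemporaryDirectory() as directory:
        root = PPtr(file_id=7, offset=128)  # stands in for the program root
        pool = Pool(directory)
        pool.open_file(root.file_id)
        pool.write(root, b"hello")
        pool.close()  # simulate process exit

        pool = Pool(directory)  # simulate a restart with a fresh mapping
        pool.open_file(root.file_id)
        value = pool.read(root, 5)
        pool.close()
        return value


recovered = demo_restart()
```

Storing the offset rather than a raw address is what makes the pointer survive a restart; the cost is one translation step (start address plus offset) on every dereference.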

2. Preventing Memory Leaks
- The traditional interface has a blind spot: a crash after allocate returns but before the pointer is persisted leaks the block
    pptr = allocate(size);
    persist(&pptr);
- Reference passing: allocate(pptr &pptr, size_t allocsize)
- pptr is owned by the data structure
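The blind spot can be made concrete with a toy simulation. Everything here is an assumption for illustration: the `ToyHeap` class, the one-entry recovery log, and the crash flag are not taken from the slides, which only state the two interfaces. The sketch shows why handing the allocator a reference to the owner's persistent pointer lets recovery undo a half-finished allocation.

```python
class CrashError(Exception):
    """Simulated power failure."""


class ToyHeap:
    """Toy model: all attributes are treated as if they lived in NVM."""

    def __init__(self):
        self.allocated = set()  # allocator metadata: addresses in use
        self.slots = {}         # persistent pointers owned by data structures
        self.log = None         # one-entry recovery log (an assumption)
        self.next_addr = 0

    def allocate_returning(self, size, crash=False):
        """Traditional interface: the caller must persist the result."""
        addr = self.next_addr
        self.next_addr += size
        self.allocated.add(addr)
        if crash:
            raise CrashError  # the address is lost before the caller saw it
        return addr

    def allocate_into(self, slot, size, crash=False):
        """Reference-passing interface: the allocator sets the owner's pointer."""
        addr = self.next_addr
        self.log = (slot, addr)  # record the intent before changing state
        self.next_addr += size
        self.allocated.add(addr)
        if crash:
            raise CrashError
        self.slots[slot] = addr
        self.log = None

    def recover(self):
        """Roll back an allocation whose owner pointer was never set."""
        if self.log is not None:
            slot, addr = self.log
            if self.slots.get(slot) != addr:
                self.allocated.discard(addr)
            self.log = None

    def leaked(self):
        return self.allocated - set(self.slots.values())


traditional = ToyHeap()
try:
    traditional.allocate_returning(64, crash=True)
except CrashError:
    pass
traditional.recover()
leaked_traditional = traditional.leaked()

by_reference = ToyHeap()
try:
    by_reference.allocate_into("root", 64, crash=True)
except CrashError:
    pass
by_reference.recover()
leaked_by_reference = by_reference.leaked()
```

With the traditional interface the allocator has no record of who was meant to own the block, so recovery cannot tell a leaked block from a live one. With reference passing, the destination is known to the allocator and the allocation can be completed or rolled back.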

PAllocator Design
We explore the following design dimensions:
1. Pool structure (single file vs. multiple files)
2. Allocation strategies
3. Concurrency handling
4. Persistent fragmentation
We do not consider garbage collection.
We assume hardware-managed wear-leveling.

1. Pool Structure: Single vs. Multiple Files
Pool as a single file
- Pros: 8-byte persistent pointers possible; easier to implement
- Cons: hard to shrink; huge block allocation is a problem
Pool as multiple files
- Pros: easier to grow and shrink; easy, fragmentation-free handling of huge allocations
- Cons: 16-byte persistent pointers
Multiple files are better suited for database systems.

2. Allocation Strategies
Three allocation strategies:
- One file per allocation
- Segregated-fit for small blocks (e.g., < 4 KB)
- Best-fit for medium and large blocks (e.g., [4 KB, 16 MB))
One file per allocation is not realistic...
- Significant overhead and wasted memory for small blocks
- The filesystem might struggle to handle a huge number of files
...except for huge blocks!
- Fragmentation handling is pushed to the filesystem

Segregated-Fit Allocation Strategy
- A fixed-size memory chunk (e.g., 8 KB) is divided into fixed-size blocks
- A bitmap tracks allocated and free blocks: one allocation == one bit flip!
- Multiple size classes
- Reduced fragmentation with a moderate number of size classes
- Not suitable for larger block allocations
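A minimal sketch of the bitmap idea follows. The `Chunk` class, the `SIZE_CLASSES` list, and the 64-byte default block size are illustrative assumptions; only the 8 KB chunk size and the one-bit-per-block bitmap come from the slide. The sketch keeps the bitmap in an ordinary Python integer and does not model persistence.

```python
import bisect

# Hypothetical size classes in bytes; the slide does not list the real ones.
SIZE_CLASSES = [64, 128, 256, 512, 1024, 2048]


def size_class(request):
    """Return the smallest class that fits the request, or None if too large."""
    i = bisect.bisect_left(SIZE_CLASSES, request)
    return SIZE_CLASSES[i] if i < len(SIZE_CLASSES) else None


class Chunk:
    """One fixed-size chunk divided into equal blocks, tracked by a bitmap."""

    def __init__(self, chunk_size=8192, block_size=64):
        self.block_size = block_size
        self.num_blocks = chunk_size // block_size
        self.bitmap = 0  # bit i set means block i is allocated

    def allocate(self):
        """Flip the lowest clear bit and return the block's offset in the chunk."""
        full = (1 << self.num_blocks) - 1
        if self.bitmap == full:
            return None  # chunk exhausted
        free = ~self.bitmap & full
        index = (free & -free).bit_length() - 1  # position of lowest free bit
        self.bitmap |= 1 << index
        return index * self.block_size

    def free(self, offset):
        """Clear the bit that belongs to the block at the given offset."""
        index = offset // self.block_size
        if not (self.bitmap >> index) & 1:
            raise ValueError("block is not allocated")
        self.bitmap &= ~(1 << index)


chunk = Chunk()
first = chunk.allocate()
second = chunk.allocate()
chunk.free(first)
reused = chunk.allocate()
```

Because every block in a chunk has the same size, a freed block can always be reused by the next request of that class, which is why this scheme limits fragmentation for small blocks.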

2. Allocation Strategies: Best-Fit Allocation Strategy
- Allocates a multiple of a predetermined size (e.g., system page size)
- Allocation uses a free-block index sorted by block size
- Coalescing uses a global block index sorted by block offset
- Indexes are implemented with the FPTree, a hybrid NVM-DRAM B+-Tree [SIGMOD '16]
- Suitable for large blocks
- Prone to fragmentation
[Figure: a segment (e.g., 128 MB) with its two indexes; labels: DRAM, NVM, inner nodes]
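The interplay of the two indexes can be sketched as follows. This is a simplified, volatile model under stated assumptions: sorted Python lists stand in for the FPTree indexes, and the `BestFit` class, its method names, and the 4096-byte page size are illustrative. One index is sorted by size and serves allocation; the other is sorted by offset and finds neighbours to coalesce on free.

```python
import bisect

PAGE = 4096  # assumed allocation granularity


class BestFit:
    """Best-fit allocation within one segment, with coalescing on free."""

    def __init__(self, segment_size):
        self.by_size = [(segment_size, 0)]       # free blocks as (size, offset)
        self.offsets = [0]                       # all block offsets, sorted
        self.blocks = {0: [segment_size, True]}  # offset -> [size, is_free]

    def _add_free(self, offset, size):
        self.blocks[offset] = [size, True]
        bisect.insort(self.offsets, offset)
        bisect.insort(self.by_size, (size, offset))

    def allocate(self, size):
        size = -(-size // PAGE) * PAGE  # round up to a page multiple
        i = bisect.bisect_left(self.by_size, (size, 0))
        if i == len(self.by_size):
            return None  # no free block is large enough
        block_size, offset = self.by_size.pop(i)  # smallest block that fits
        self.blocks[offset] = [size, False]
        if block_size > size:  # split off the unused remainder
            self._add_free(offset + size, block_size - size)
        return offset

    def free(self, offset):
        size, is_free = self.blocks[offset]
        if is_free:
            raise ValueError("block is not allocated")
        nxt = offset + size  # merge with the following block if it is free
        if nxt in self.blocks and self.blocks[nxt][1]:
            next_size = self.blocks.pop(nxt)[0]
            self.by_size.remove((next_size, nxt))
            self.offsets.remove(nxt)
            size += next_size
        i = bisect.bisect_left(self.offsets, offset)
        if i > 0:  # merge with the preceding block if it is free
            prev = self.offsets[i - 1]
            prev_size, prev_free = self.blocks[prev]
            if prev_free:
                self.by_size.remove((prev_size, prev))
                del self.blocks[offset]
                self.offsets.pop(i)
                offset, size = prev, prev_size + size
        self.blocks[offset] = [size, True]
        bisect.insort(self.by_size, (size, offset))


segment = BestFit(16 * PAGE)
x = segment.allocate(PAGE)
y = segment.allocate(2 * PAGE)
z = segment.allocate(PAGE)
segment.free(x)
w = segment.allocate(100)  # best fit reuses the one-page hole left by x
for block in (w, z, y):
    segment.free(block)
```

The list removals here are linear-time and only keep the sketch short; a tree-based index, as the slide describes, would make both lookups logarithmic.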

3. Concurrency Handling: Thread-Local Allocation
- One allocator object per thread
- The standard in general-purpose allocators
- Used for small block allocations
- The local allocator requests chunks from the global pool
- Local allocators need to be merged with the global pool when a thread terminates
- Does not scale under high concurrency: frequent chunk requests to the global pool

3. Concurrency Handling: Core-Local Allocation
- One allocator object per physical core
- Local allocators request large files from the global pool
- Robust performance under high concurrency
- Stable local allocators
- Greedy
[Figure: two sockets connected via QPI, each with cores C1 and C2 and one allocator per core]
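The difference between the two schemes can be illustrated by counting requests to the global pool. The model below is a deliberately crude, single-threaded assumption: short-lived tasks stand in for threads, frees are ignored, and the class names, the 128-block chunk, and the round-robin core assignment are all hypothetical. It shows only why allocators that outlive individual threads contact the global pool less often.

```python
class GlobalPool:
    """Counts how often local allocators have to come back for more memory."""

    def __init__(self):
        self.requests = 0

    def get_chunk(self, blocks_per_chunk):
        self.requests += 1
        return blocks_per_chunk


class LocalAllocator:
    """Serves allocations from its current chunk and refills from the pool."""

    def __init__(self, pool, blocks_per_chunk=128):
        self.pool = pool
        self.blocks_per_chunk = blocks_per_chunk
        self.free_blocks = 0

    def allocate(self):
        if self.free_blocks == 0:
            self.free_blocks = self.pool.get_chunk(self.blocks_per_chunk)
        self.free_blocks -= 1


def global_requests(num_tasks, allocs_per_task, num_cores, core_local):
    """Run short-lived tasks and return the number of global pool requests."""
    pool = GlobalPool()
    per_core = [LocalAllocator(pool) for _ in range(num_cores)]
    for task in range(num_tasks):
        if core_local:
            allocator = per_core[task % num_cores]  # allocator outlives the task
        else:
            allocator = LocalAllocator(pool)        # fresh allocator per thread
        for _ in range(allocs_per_task):
            allocator.allocate()
    return pool.requests


thread_local_requests = global_requests(1000, 10, 4, core_local=False)
core_local_requests = global_requests(1000, 10, 4, core_local=True)
```

In this model every short-lived thread pays for at least one chunk request, while each core-local allocator amortizes its chunks over all tasks that run on its core.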

4. Persistent Fragmentation
- A restart is a last-resort but valid way of defragmenting volatile memory; it does not apply to NVM
- File system solutions do not apply to NVM:
  - File systems benefit from an additional indirection layer
  - NVM is directly accessed with load/store instructions
- New defragmentation mechanisms are needed

4. Persistent Fragmentation
- Most file systems support sparse files
- Defragmentation idea: punch holes in free blocks
  1. Find the largest free block
  2. Punch a hole using fallocate
  3. Iterate until the target size is reached
- The file size must stay unchanged to keep offsets valid
[Figure: a file's used and free blocks, with free blocks progressively turned into holes]
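The selection loop can be sketched without touching a real file. The `plan_holes` function and its parameters are hypothetical, and the sketch only plans which ranges to punch. On Linux the punch itself would be an fallocate call with FALLOC_FL_PUNCH_HOLE combined with FALLOC_FL_KEEP_SIZE, which leaves the file size, and therefore every stored offset, unchanged. Ranges are trimmed to page boundaries on the assumption that only whole filesystem blocks are actually released.

```python
def plan_holes(free_blocks, bytes_to_reclaim, page=4096):
    """Pick free blocks, largest first, until enough bytes would be released.

    free_blocks is a list of (offset, size) pairs within one file.
    Returns the page-aligned (offset, length) ranges to punch and their total.
    """
    holes = []
    reclaimed = 0
    for offset, size in sorted(free_blocks, key=lambda b: b[1], reverse=True):
        if reclaimed >= bytes_to_reclaim:
            break
        start = -(-offset // page) * page     # round the start up to a page
        end = (offset + size) // page * page  # round the end down to a page
        if end > start:                       # skip blocks without a full page
            holes.append((start, end - start))
            reclaimed += end - start
    return holes, reclaimed


free_blocks = [(4096, 8192), (100, 50), (65536, 1048576), (20000, 4096)]
holes, reclaimed = plan_holes(free_blocks, bytes_to_reclaim=1_050_000)
```

Going largest-first keeps the number of punch operations small for a given target, at the price of consuming the blocks that would otherwise serve the largest future allocations.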

PAllocator: Architecture Overview
- Allocator objects: Small Alloc 1 to n, Big Alloc 1 to n, Huge Alloc
- Persistent allocators: Small Block Allocator, Big Block Allocator, Huge Block Allocator
- Segment manager: shared list of free segments, failure-atomic segment provider, segment ownership map
- NVM-aware filesystem: file creation, deletion, memory mapping

PAllocator Performance Evaluation
Random-size allocation/deallocation (64 B - 128 KB)
[Figure: throughput (op/thread/sec) vs. number of threads (1 to 16) for PAllocator, NVML, and jemalloc; annotated speedups: 1.7x and 7.6x]
PAllocator scales nearly linearly

Allocator Performance Impact on the FPTree
[Figure: FPTree throughput (KOPS/s) with PAllocator vs. NVML; 100% Insert: 1.4x; 50% Find, 50% Insert: 1.2x]
Persistent allocators do impact database performance

Allocator Recovery Time
[Figure: recovery time (sec, log scale) vs. allocated data size (MB) for PAllocator, NVML, Makalu, and nvm_malloc; annotated ratios: 516x, 29.5x, 4.6x]
1 TB: PAllocator (0.75 s), NVML (3.5 s), Makalu (394.5 s), nvm_malloc (22.5 s)

Conclusion
- NVM has the potential to disrupt database storage architecture
- Memory management is a necessary building block
- We presented PAllocator:
  - Designed for large NVM systems
  - Highly scalable
  - Fast recovery
  - Defragmentation capability

State-of-the-Art
Allocator | Purpose | Pool structure | Allocation strategies | Concurrency handling | Garbage collection | Defragmentation | Source
Mnemosyne | General | Multiple files | Segregated-fit + best-fit | Thread-local for small blocks | Yes | No | ASPLOS '11
NV-Heaps | General | Single file | Undefined | Thread-local | Yes | No | ASPLOS '11
nvm_malloc | General | Single file | Segregated-fit + best-fit | Thread-local for small blocks | No | No | ADMS '15
NVML | General | Single file | Segregated-fit + best-fit | Thread-local for small blocks | No | No | http://pmem.io/nvml/
Makalu | General | Single file | Segregated-fit + best-fit | Thread-local for small blocks | Yes (offline) | No | OOPSLA '16
PAllocator | Large systems | Multiple files | Segregated-fit + best-fit + file | Core-local | No | Yes | VLDB '17
Salient differences in design decisions
For completeness: NVMalloc and Walloc focus on wear-leveling