NVthreads: Practical Persistence for Multi-threaded Applications

1 NVthreads: Practical Persistence for Multi-threaded Applications
Terry Hsu*, Purdue University
Helge Brügner*, TU München
Indrajit Roy*, Google Inc.
Kimberly Keeton, Hewlett Packard Labs
Patrick Eugster, TU Darmstadt and Purdue University
* Work was done at Hewlett Packard Labs.
NVMW 2018. NVthreads was published in EuroSys 2017.
This work was supported by Hewlett Packard Labs, NSF TC , NSF TWC , and ERC FP

2 What is non-volatile memory (NVM)?
Key features: persistence, good performance, byte addressability
- Persistence: retains data without power
- Good performance: outperforms the traditional filesystem interface
- Byte addressability: allows pure memory operations

3 Programming interfaces for NVM
NVM-aware filesystems: BPFS, PMFS, PMEM
- Pro: provide good performance
- Con: require applications to use file-system interfaces and may need hardware modifications
Durable transactions and heaps: NV-Heaps, Mnemosyne
- Pro: allow fine-grained NVM access
- Con: force programs to use transactions and require non-trivial effort to retrofit transactions into lock-based programs
Problem: Can we provide a simpler programming interface?

4 NVM-aware apps programming
Challenges: 1. data consistency (programmability, volatile caches, and performance are listed but not yet highlighted)
    # Add element to the tail of list
    pthread_lock(&m);
    malloc(&e, sizeof(*e));
    e->value = 5;
    e->next = NULL;
    head->next = e; //crash
    tail = e;
    pthread_unlock(&m);
[Diagram: list in NVM with head (value 1), the new element e (value 5, next NULL), and the tail pointer]

5 NVM-aware apps programming
Challenges: 1. data consistency, 2. programmability (volatile caches and performance are listed but not yet highlighted)
    # Add element to the tail of list
    pthread_lock(&m);
    malloc(&e, sizeof(*e));
    <save old value of e->value>
    e->value = 5;
    <save old value of e->next>
    e->next = NULL;
    <save old value of head->next>
    head->next = e;
    <save old value of tail>
    tail = e;
    pthread_unlock(&m);
[Diagram: list in NVM with head (value 1), the new element e (value 5, next NULL), and the tail pointer]

6 NVM-aware apps programming
Challenges: 1. data consistency, 2. programmability, 3. volatile caches (performance is listed but not yet highlighted)
    # Add element to the tail of list
    pthread_lock(&m);
    malloc(&e, sizeof(*e));
    <save old value of e->value>
    <flush log entry to NVM>
    e->value = 5;
    <save old value of e->next>
    <flush log entry to NVM>
    e->next = NULL;
    <save old value of head->next>
    <flush log entry to NVM>
    head->next = e;
    <save old value of tail>
    <flush log entry to NVM>
    tail = e;
    pthread_unlock(&m);
[Diagram: cache flushing into NVM; the list in NVM has head (value 1), the new element e (value 5, next NULL), and the tail pointer]

7 NVM-aware apps programming
Challenges: 1. data consistency, 2. programmability, 3. volatile caches, 4. performance
    # Add element to the tail of list
    pthread_lock(&m);
    malloc(&e, sizeof(*e));
    <save old value of e->value>
    <flush log entry to NVM>
    e->value = 5;
    <save old value of e->next>
    <flush log entry to NVM>
    e->next = NULL;
    <save old value of head->next>
    <flush log entry to NVM>
    head->next = e;
    <save old value of tail>
    <flush log entry to NVM>
    tail = e;
    pthread_unlock(&m);
[Diagram: cache flushing into NVM; the list in NVM has head (value 1), the new element e (value 5, next NULL), and the tail pointer]
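The save-then-flush pattern on this slide can be sketched in plain C. This is a minimal illustration under stated assumptions, not the authors' code: `undo_log`, `logged_store`, `undo_rollback`, and `flush_to_nvm` are hypothetical names, and the flush is only counted rather than issued (on real hardware it would be a cache-line flush plus a fence). The sketch appends through `tail->next`, which is the same location as `head->next` when the list holds only the head node.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_ENTRIES 16
#define MAX_WIDTH   16   /* largest value we log, in bytes */

/* One undo record: where the write went and what was there before. */
typedef struct {
    void         *addr;
    size_t        len;
    unsigned char old[MAX_WIDTH];
} undo_entry;

typedef struct {
    undo_entry entries[MAX_ENTRIES];
    size_t     count;
    size_t     flushes;   /* number of simulated flushes */
} undo_log;

/* Stand-in for flushing a log entry from the cache to NVM (assumption). */
static void flush_to_nvm(undo_log *log) { log->flushes++; }

/* Save the old value, flush the log entry, then perform the store. */
static void logged_store(undo_log *log, void *addr, const void *val, size_t len) {
    assert(log->count < MAX_ENTRIES && len <= MAX_WIDTH);
    undo_entry *e = &log->entries[log->count++];
    e->addr = addr;
    e->len  = len;
    memcpy(e->old, addr, len);
    flush_to_nvm(log);
    memcpy(addr, val, len);
}

/* After a crash: restore old values, newest entry first. */
static void undo_rollback(undo_log *log) {
    while (log->count > 0) {
        undo_entry *e = &log->entries[--log->count];
        memcpy(e->addr, e->old, e->len);
    }
}

typedef struct node { long value; struct node *next; } node;
typedef struct { node *head; node *tail; } list;

/* The four stores from the slide, each preceded by a log entry and a flush. */
static void list_append_logged(undo_log *log, list *l, node *e, long value) {
    node *none = NULL;
    logged_store(log, &e->value, &value, sizeof value);
    logged_store(log, &e->next, &none, sizeof none);
    logged_store(log, &l->tail->next, &e, sizeof e);
    logged_store(log, &l->tail, &e, sizeof e);
}
```

Even in this toy form, one append costs four log entries and four flushes, which is the programmability and performance burden the slide points at.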

8 Challenges of using NVM
- Data consistency: ensure data consistency even after a crash
- Volatile caches: manage data movement from volatile caches to NVM
- Programmability: avoid extensive program modifications
- Performance: minimize runtime overhead
Proposal: NVthreads, a programming model and runtime that adds persistence to multi-threaded C/C++ programs

9 Goals of NVthreads
Make existing lock-based C/C++ applications crash tolerant
Minimize porting effort
- Drop-in replacement for the pthreads library
- No need for transactions
Advantages of NVthreads
- Good performance
- Easier to develop NVM-aware applications

10 Key ideas
Use synchronization points to infer consistent regions (cf. Atlas [OOPSLA 14])
- Does not require applications to use transactions
Execute the multithreaded program as a multi-process program (cf. DThreads [SOSP 11])
- Process memory buffers uncommitted writes
Track data modifications at page granularity
- Amortizes logging overhead compared with fine-grained tracking
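The amortization argument in the last point can be illustrated with a small dirty-page tracker. This is a sketch under assumptions, not the NVthreads implementation: `page_tracker`, `tracker_init`, and `tracker_note_write` are hypothetical names, and the caller reports each write explicitly, whereas a real runtime would have to detect writes itself.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SHIFT 12                 /* 4 KB pages */
#define NUM_PAGES  1024

typedef struct {
    unsigned char dirty[NUM_PAGES];   /* one flag per page */
    size_t        dirty_pages;        /* pages that must be logged */
    size_t        writes;             /* individual writes observed */
} page_tracker;

static void tracker_init(page_tracker *t) { memset(t, 0, sizeof *t); }

/* Record a write at byte offset `offset` into the tracked region. */
static void tracker_note_write(page_tracker *t, size_t offset) {
    size_t page = offset >> PAGE_SHIFT;
    assert(page < NUM_PAGES);
    t->writes++;
    if (!t->dirty[page]) {            /* only the first write to a page costs a log entry */
        t->dirty[page] = 1;
        t->dirty_pages++;
    }
}
```

With this scheme, many writes that land on the same page produce a single page to log, while word-level tracking would create one log entry per write.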

11 Using NVthreads
Ease of use:
    bash$ gcc foo.c -o foo.out -rdynamic libnvthread.so -ldl
Modifications:
- Add recovery code, specify persistent allocations
  - Allocate data in NVM: nvmalloc()
  - Recover data in NVM: nvrecover()
- Link to NVthreads library
Software stack:
- User space: unmodified C/C++ application; NVthreads library (multi-process, intercepting synchronization, tracking data, maintaining log)
- Kernel space: operating system (memory allocation and file system interface for both DRAM and NVM)
- Hardware: DRAM (volatile main memory, e.g., stacks); NVM (persistent regions, e.g., linked list on heap)

12 NVthreads: programming model
    void main(){
      if( crashed() ){
        int *c = (int*) nvmalloc(sizeof(int), "c");
        *c = nvrecover(c, sizeof(int), "c");
      }
      else{ // normal execution
        int *c = (int*) nvmalloc(sizeof(int), "c");
        ... // thread creation
        m.lock()
        *c = *c+1;
        m.unlock()
      }
    }
Locks mark the boundary of a durable code section.

13 NVthreads: programming model
    void main(){
      if( crashed() ){
        int *c = (int*) nvmalloc(sizeof(int), "c");
        *c = nvrecover(c, sizeof(int), "c");
      }
      else{ // normal execution
        int *c = (int*) nvmalloc(sizeof(int), "c");
        ... // thread creation
        m.lock()
        *c = *c+1;
        m.unlock()
      }
    }
The crashed() branch is application-specific recovery code that the programmer needs to add.

14 Example: linked list
NVthreads guarantees that the linked list is atomically appended w.r.t. failures
    # L is a persistent list
    Start threads {T1, T2, T3}

    # Add element to the tail of list
    pthread_lock(&m);
    nvmalloc(&e, sizeof(*e));
    e->val = localval;
    tail->next = e;
    e->prev = tail; // crash!
    tail = e;
    pthread_unlock(&m)
[Timeline: T1 critical section (add e1), T2 critical section (add e2), T3 critical section (add e3), and a recovery phase (execute redo ops). State of the list data structure L in NVM: L={}, L={e1}, L={e1, e2}]

15 Implementing atomic durability
Convert threads to processes (cf. DThreads [SOSP 11])
- Each process works on private memory, no undo log
- [Diagram: shared address space becomes disjoint address spaces]
At synchronization points, propagate private updates, execute processes sequentially
Track dirty pages and log them to NVM for recovery
- Apply redo log in the event of a crash
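A redo-only scheme of this kind can be sketched as follows. This is a simplified illustration, not the NVthreads log format: the names `redo_log`, `redo_log_page`, `redo_commit`, and `redo_recover` are hypothetical, the single `committed` flag is an assumed way of marking a complete log, the page size is shrunk to keep the example small, and NVM is simulated with ordinary memory.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SIZE   64   /* small for illustration; the slides use 4 KB pages */
#define MAX_RECORDS 8

/* One redo record: a full copy of a dirty page and where it belongs. */
typedef struct {
    size_t        page_no;
    unsigned char data[PAGE_SIZE];
} redo_record;

typedef struct {
    redo_record rec[MAX_RECORDS];
    size_t      count;
    int         committed;   /* set once every dirty page has been logged */
} redo_log;

static void redo_begin(redo_log *log) { log->count = 0; log->committed = 0; }

/* Called at a synchronization point for each page the process dirtied. */
static void redo_log_page(redo_log *log, size_t page_no, const unsigned char *page) {
    assert(log->count < MAX_RECORDS);
    redo_record *r = &log->rec[log->count++];
    r->page_no = page_no;
    memcpy(r->data, page, PAGE_SIZE);
}

static void redo_commit(redo_log *log) { log->committed = 1; }

/* After a crash: replay a complete log, ignore an incomplete one.
 * Returns the number of pages replayed. */
static size_t redo_recover(const redo_log *log, unsigned char *heap) {
    if (!log->committed) return 0;
    for (size_t i = 0; i < log->count; i++)
        memcpy(heap + log->rec[i].page_no * PAGE_SIZE, log->rec[i].data, PAGE_SIZE);
    return log->count;
}
```

No undo information is needed in this sketch because uncommitted writes never reach the persistent heap: they stay in process-private memory until the log is complete.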

16 From threads to processes
[Timeline diagram with phases: parallel phase, critical section, parallel phase. Legend: execute, wait.
T1 labels: track dirty pages, start, NVM log write, stop, merge shared state, wait, pass token.
T2 labels: wait, track dirty pages, start, NVM log write, stop, merge shared state.]

17 Redo logging
[Diagram: T1 moves from the parallel phase into a critical section; its pages are shown as clean or dirtied relative to the shared state. Steps shown: log dirty pages to the redo log, merge updated bytes, write back to NVM, sync()]
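The "merge updated bytes" step can be sketched as a byte-wise diff. This sketch assumes a DThreads-style "twin" page, meaning a copy of the page taken before the process first wrote to it; whether NVthreads keeps twins in exactly this form is an assumption, and `merge_page` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE 4096

/* Copy into the shared page only the bytes this process changed,
 * i.e. bytes where the private copy differs from the twin.
 * Returns the number of bytes merged. */
static size_t merge_page(unsigned char *shared,
                         const unsigned char *priv,
                         const unsigned char *twin) {
    assert(shared != NULL && priv != NULL && twin != NULL);
    size_t changed = 0;
    for (size_t i = 0; i < PAGE_SIZE; i++) {
        if (priv[i] != twin[i]) {
            shared[i] = priv[i];
            changed++;
        }
    }
    return changed;
}
```

Merging at byte granularity lets two processes that dirtied different bytes of the same page both commit without overwriting each other's updates.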

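18 Tracking data dependencies
NVthreads maintains metadata for memory pages per lockset to track data dependencies.
[Diagram: initially X=Y=0. T1 executes X=1 and cond_wait(); T2 executes Y=X and cond_signal(); an arrow marks the dependence between them. NVM holds Log1, Log2, Log3. Other labels: A, B]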

19 Evaluation
Environment
- Ubuntu (Linux )
- Two Intel Xeon X5650 processors (12 cores @ 2.67 GHz)
- 198 GB RAM and 600 GB SSD
Applications
- PARSEC benchmarks, Phoenix benchmarks, PageRank, K-means
NVM emulator
- Linux tmpfs on DRAM emulating nvmfs (provided by Hewlett Packard Labs)
- Injected a 1000 ns delay into each 4 KB page write via the RDTSCP instruction

20 Performance vs pthreads
Phoenix and PARSEC benchmarks, no recovery protocol
[Bar chart: slowdown (x) for Pthreads, Dthreads, NVthreads (nvmfs 1000ns), and Atlas. Benchmarks: histogram, kmeans, linear regression, matrix multiply, pca, reverse index, string match, word count, blackscholes, canneal, dedup, ferret, streamcluster, swaptions]

21 Performance vs pthreads
- 9 out of 14 applications: NVthreads incurs less than 20% overhead vs pthreads
- Remaining 5 applications: 4x to 7x slowdown vs pthreads
[Bar chart: slowdown (x) for Pthreads, Dthreads, NVthreads (nvmfs 1000ns), and Atlas on the same 14 Phoenix and PARSEC benchmarks as the previous slide]

22 Performance vs Atlas [OOPSLA 14]
- 10 out of 12 applications: NVthreads is 7% to 100x faster vs Atlas
[Bar chart: slowdown (x) for Pthreads, Dthreads, NVthreads (nvmfs 1000ns), and Atlas. Benchmarks: histogram, kmeans, linear regression, matrix multiply, pca, reverse index, string match, word count, blackscholes, canneal, dedup, ferret, streamcluster, swaptions]

23 Performance vs Atlas [OOPSLA 14]
- 10 out of 12 applications: NVthreads is 7% to 100x faster vs Atlas
- Remaining 2 applications: 7% to 2x slower vs Atlas
[Bar chart: slowdown (x) for Pthreads, Dthreads, NVthreads (nvmfs 1000ns), and Atlas. Benchmarks: histogram, kmeans, linear regression, matrix multiply, pca, reverse index, string match, word count, blackscholes, canneal, dedup, ferret, streamcluster, swaptions]

24 Is coarse-grained tracking a good fit?
- 9 out of 14 applications touch more than 55% of each page
- It is worthwhile to track data at page granularity in these apps
[Bar chart: % of each page modified, per application, with the number shown in parentheses next to each name: linear regression (25), string match (37), histogram (44), blackscholes (89), swaptions (483), matrix multiply (4K), kmeans (10K), pca (11K), word count (12K), ferret (150K), streamcluster (180K), dedup (2.3M), reverse index (2.7M), canneal (7.4M)]

25 NVthreads is faster than fine-grained tracking
- Microbenchmark: 4 threads randomly modify parts of 1000 memory pages
- Mnemosyne [ASPLOS 11] and Atlas [OOPSLA 14] use word-level tracking
- NVthreads is 3x to 30x faster than fine-grained tracking
[Chart: slowdown over pthreads (x) against percentage of page modified (ticks include 10%, 25%, 50%, 75%, 100%). Series: NVthreads (nvm-1000ns), Atlas (no-clflush), Mnemosyne, Atlas]

26 Benefits of recovery (K-means)
- We made K-means crash at synthetic program points, recover, and continue until convergence at about the 160th iteration
- NVthreads K-means provides up to 1.9x speedup vs pthreads
- NVthreads requires only 4 SLOC changes to make K-means crash tolerant
[Bar chart: speedup over pthreads (x) for Pthreads and NVthreads (nvm=1000ns), grouped by input size (1M, 10M, 20M, 30M) and by the iteration at which the crash occurred]

27 Summary
- NVthreads allows programmers to easily leverage NVM with just a few lines of source code changes
- Recovery requires only a redo log because multi-process execution buffers private updates
- Coarse-grained page-level tracking amortizes logging overheads
- NVthreads prototype is publicly available at:

System Software for Persistent Memory

System Software for Persistent Memory System Software for Persistent Memory Subramanya R Dulloor, Sanjay Kumar, Anil Keshavamurthy, Philip Lantz, Dheeraj Reddy, Rajesh Sankaran and Jeff Jackson 72131715 Neo Kim phoenixise@gmail.com Contents

More information

Strata: A Cross Media File System. Youngjin Kwon, Henrique Fingler, Tyler Hunt, Simon Peter, Emmett Witchel, Thomas Anderson

Strata: A Cross Media File System. Youngjin Kwon, Henrique Fingler, Tyler Hunt, Simon Peter, Emmett Witchel, Thomas Anderson A Cross Media File System Youngjin Kwon, Henrique Fingler, Tyler Hunt, Simon Peter, Emmett Witchel, Thomas Anderson 1 Let s build a fast server NoSQL store, Database, File server, Mail server Requirements

More information

arxiv: v1 [cs.dc] 13 Dec 2017

arxiv: v1 [cs.dc] 13 Dec 2017 Persistent Memory Programming Abstractions in Context of Concurrent Applications Ajay Singh cs15mtech01001@iith.ac.in IIT Hyderabad Marc Shapiro marc.shapiro@acm.org INRIA & LIP6 Gael Thomas gael.thomas@telecom-sudparis.eu

More information

An Analysis of Persistent Memory Use with WHISPER

An Analysis of Persistent Memory Use with WHISPER An Analysis of Persistent Memory Use with WHISPER Sanketh Nalli, Swapnil Haria, Michael M. Swift, Mark D. Hill, Haris Volos*, Kimberly Keeton* University of Wisconsin- Madison & *Hewlett- Packard Labs

More information

COS 318: Operating Systems. NSF, Snapshot, Dedup and Review

COS 318: Operating Systems. NSF, Snapshot, Dedup and Review COS 318: Operating Systems NSF, Snapshot, Dedup and Review Topics! NFS! Case Study: NetApp File System! Deduplication storage system! Course review 2 Network File System! Sun introduced NFS v2 in early

More information

An Analysis of Persistent Memory Use with WHISPER

An Analysis of Persistent Memory Use with WHISPER An Analysis of Persistent Memory Use with WHISPER Sanketh Nalli, Swapnil Haria, Michael M. Swift, Mark D. Hill, Haris Volos*, Kimberly Keeton* University of Wisconsin- Madison & *Hewlett- Packard Labs

More information

Write-Optimized and High-Performance Hashing Index Scheme for Persistent Memory

Write-Optimized and High-Performance Hashing Index Scheme for Persistent Memory Write-Optimized and High-Performance Hashing Index Scheme for Persistent Memory Pengfei Zuo, Yu Hua, Jie Wu Huazhong University of Science and Technology, China 3th USENIX Symposium on Operating Systems

More information

SOLVING THE DRAM SCALING CHALLENGE: RETHINKING THE INTERFACE BETWEEN CIRCUITS, ARCHITECTURE, AND SYSTEMS

SOLVING THE DRAM SCALING CHALLENGE: RETHINKING THE INTERFACE BETWEEN CIRCUITS, ARCHITECTURE, AND SYSTEMS SOLVING THE DRAM SCALING CHALLENGE: RETHINKING THE INTERFACE BETWEEN CIRCUITS, ARCHITECTURE, AND SYSTEMS Samira Khan MEMORY IN TODAY S SYSTEM Processor DRAM Memory Storage DRAM is critical for performance

More information

Soft Updates Made Simple and Fast on Non-volatile Memory

Soft Updates Made Simple and Fast on Non-volatile Memory Soft Updates Made Simple and Fast on Non-volatile Memory Mingkai Dong, Haibo Chen Institute of Parallel and Distributed Systems, Shanghai Jiao Tong University @ NVMW 18 Non-volatile Memory (NVM) ü Non-volatile

More information

Distributed Shared Persistent Memory

Distributed Shared Persistent Memory Distributed Shared Persistent Memory (SoCC 17) Yizhou Shan, Yiying Zhang Persistent Memory (PM/NVM) Byte Addressable Persistent CPU Cache Low Latency Capacity Cost effective PM DRAM 2 Many PM Work, but

More information

Load-Sto-Meter: Generating Workloads for Persistent Memory Damini Chopra, Doug Voigt Hewlett Packard (Enterprise)

Load-Sto-Meter: Generating Workloads for Persistent Memory Damini Chopra, Doug Voigt Hewlett Packard (Enterprise) Load-Sto-Meter: Generating Workloads for Persistent Memory Damini Chopra, Doug Voigt Hewlett Packard (Enterprise) Application vs. Pure Workloads Benchmarks that reproduce application workloads Assist in

More information

WORT: Write Optimal Radix Tree for Persistent Memory Storage Systems

WORT: Write Optimal Radix Tree for Persistent Memory Storage Systems WORT: Write Optimal Radix Tree for Persistent Memory Storage Systems Se Kwon Lee K. Hyun Lim 1, Hyunsub Song, Beomseok Nam, Sam H. Noh UNIST 1 Hongik University Persistent Memory (PM) Persistent memory

More information

Blurred Persistence in Transactional Persistent Memory

Blurred Persistence in Transactional Persistent Memory Blurred Persistence in Transactional Persistent Memory Youyou Lu, Jiwu Shu, Long Sun Tsinghua University Overview Problem: high performance overhead in ensuring storage consistency of persistent memory

More information

SAY-Go: Towards Transparent and Seamless Storage-As-You-Go with Persistent Memory

SAY-Go: Towards Transparent and Seamless Storage-As-You-Go with Persistent Memory SAY-Go: Towards Transparent and Seamless Storage-As-You-Go with Persistent Memory Hyeonho Song, Sam H. Noh UNIST HotStorage 2018 Contents Persistent Memory Motivation SAY-Go Design Implementation Evaluation

More information

Advanced file systems: LFS and Soft Updates. Ken Birman (based on slides by Ben Atkin)

Advanced file systems: LFS and Soft Updates. Ken Birman (based on slides by Ben Atkin) : LFS and Soft Updates Ken Birman (based on slides by Ben Atkin) Overview of talk Unix Fast File System Log-Structured System Soft Updates Conclusions 2 The Unix Fast File System Berkeley Unix (4.2BSD)

More information

DTHREADS: Efficient Deterministic

DTHREADS: Efficient Deterministic DTHREADS: Efficient Deterministic Multithreading Tongping Liu, Charlie Curtsinger, and Emery D. Berger Department of Computer Science University of Massachusetts, Amherst Amherst, MA 01003 {tonyliu,charlie,emery}@cs.umass.edu

More information

Memory Management Techniques for Large-Scale Persistent-Main-Memory Systems [VLDB 2017]

Memory Management Techniques for Large-Scale Persistent-Main-Memory Systems [VLDB 2017] Memory Management Techniques for Large-Scale Persistent-Main-Memory Systems [VLDB 2017] Ismail Oukid, Daniel Booss, Adrien Lespinasse, Wolfgang Lehner, Thomas Willhalm, Grégoire Gomes PUBLIC Non-Volatile

More information

Deukyeon Hwang UNIST. Wook-Hee Kim UNIST. Beomseok Nam UNIST. Hanyang Univ.

Deukyeon Hwang UNIST. Wook-Hee Kim UNIST. Beomseok Nam UNIST. Hanyang Univ. Deukyeon Hwang UNIST Wook-Hee Kim UNIST Youjip Won Hanyang Univ. Beomseok Nam UNIST Fast but Asymmetric Access Latency Non-Volatility Byte-Addressability Large Capacity CPU Caches (Volatile) Persistent

More information

Caching and reliability

Caching and reliability Caching and reliability Block cache Vs. Latency ~10 ns 1~ ms Access unit Byte (word) Sector Capacity Gigabytes Terabytes Price Expensive Cheap Caching disk contents in RAM Hit ratio h : probability of

More information

File Systems: Consistency Issues

File Systems: Consistency Issues File Systems: Consistency Issues File systems maintain many data structures Free list/bit vector Directories File headers and inode structures res Data blocks File Systems: Consistency Issues All data

More information

BzTree: A High-Performance Latch-free Range Index for Non-Volatile Memory

BzTree: A High-Performance Latch-free Range Index for Non-Volatile Memory BzTree: A High-Performance Latch-free Range Index for Non-Volatile Memory JOY ARULRAJ JUSTIN LEVANDOSKI UMAR FAROOQ MINHAS PER-AKE LARSON Microsoft Research NON-VOLATILE MEMORY [NVM] PERFORMANCE DRAM VOLATILE

More information

Redrawing the Boundary Between So3ware and Storage for Fast Non- Vola;le Memories

Redrawing the Boundary Between So3ware and Storage for Fast Non- Vola;le Memories Redrawing the Boundary Between So3ware and Storage for Fast Non- Vola;le Memories Steven Swanson Director, Non- Vola;le System Laboratory Computer Science and Engineering University of California, San

More information

COS 318: Operating Systems. Journaling, NFS and WAFL

COS 318: Operating Systems. Journaling, NFS and WAFL COS 318: Operating Systems Journaling, NFS and WAFL Jaswinder Pal Singh Computer Science Department Princeton University (http://www.cs.princeton.edu/courses/cos318/) Topics Journaling and LFS Network

More information

Lazy Persistency: a High-Performing and Write-Efficient Software Persistency Technique

Lazy Persistency: a High-Performing and Write-Efficient Software Persistency Technique Lazy Persistency: a High-Performing and Write-Efficient Software Persistency Technique Mohammad Alshboul, James Tuck, and Yan Solihin Email: maalshbo@ncsu.edu ARPERS Research Group Introduction Future

More information

Demand-Driven Software Race Detection using Hardware

Demand-Driven Software Race Detection using Hardware Demand-Driven Software Race Detection using Hardware Performance Counters Joseph L. Greathouse, Zhiqiang Ma, Matthew I. Frank Ramesh Peri, Todd Austin University of Michigan Intel Corporation CSCADS Aug

More information

STORAGE LATENCY x. RAMAC 350 (600 ms) NAND SSD (60 us)

STORAGE LATENCY x. RAMAC 350 (600 ms) NAND SSD (60 us) 1 STORAGE LATENCY 2 RAMAC 350 (600 ms) 1956 10 5 x NAND SSD (60 us) 2016 COMPUTE LATENCY 3 RAMAC 305 (100 Hz) 1956 10 8 x 1000x CORE I7 (1 GHZ) 2016 NON-VOLATILE MEMORY 1000x faster than NAND 3D XPOINT

More information

Non-Volatile Memory Through Customized Key-Value Stores

Non-Volatile Memory Through Customized Key-Value Stores Non-Volatile Memory Through Customized Key-Value Stores Leonardo Mármol 1 Jorge Guerra 2 Marcos K. Aguilera 2 1 Florida International University 2 VMware L. Mármol, J. Guerra, M. K. Aguilera (FIU and VMware)

More information

PASTE: A Networking API for Non-Volatile Main Memory

PASTE: A Networking API for Non-Volatile Main Memory PASTE: A Networking API for Non-Volatile Main Memory Michio Honda (NEC Laboratories Europe) Lars Eggert (NetApp) Douglas Santry (NetApp) TSVAREA@IETF 99, Prague May 22th 2017 More details at our HotNets

More information

High-Performance Transaction Processing in Journaling File Systems Y. Son, S. Kim, H. Y. Yeom, and H. Han

High-Performance Transaction Processing in Journaling File Systems Y. Son, S. Kim, H. Y. Yeom, and H. Han High-Performance Transaction Processing in Journaling File Systems Y. Son, S. Kim, H. Y. Yeom, and H. Han Seoul National University, Korea Dongduk Women s University, Korea Contents Motivation and Background

More information

Hardware Undo+Redo Logging. Matheus Ogleari Ethan Miller Jishen Zhao CRSS Retreat 2018 May 16, 2018

Hardware Undo+Redo Logging. Matheus Ogleari Ethan Miller Jishen Zhao   CRSS Retreat 2018 May 16, 2018 Hardware Undo+Redo Logging Matheus Ogleari Ethan Miller Jishen Zhao https://users.soe.ucsc.edu/~mogleari/ CRSS Retreat 2018 May 16, 2018 Typical Memory and Storage Hierarchy: Memory Fast access to working

More information

Rethink the Sync 황인중, 강윤지, 곽현호. Embedded Software Lab. Embedded Software Lab.

Rethink the Sync 황인중, 강윤지, 곽현호. Embedded Software Lab. Embedded Software Lab. 1 Rethink the Sync 황인중, 강윤지, 곽현호 Authors 2 USENIX Symposium on Operating System Design and Implementation (OSDI 06) System Structure Overview 3 User Level Application Layer Kernel Level Virtual File System

More information

Dalí: A Periodically Persistent Hash Map

Dalí: A Periodically Persistent Hash Map Dalí: A Periodically Persistent Hash Map Faisal Nawab* 1, Joseph Izraelevitz* 2, Terence Kelly*, Charles B. Morrey III*, Dhruva R. Chakrabarti*, and Michael L. Scott 2 1 Department of Computer Science

More information

ThyNVM. Enabling So1ware- Transparent Crash Consistency In Persistent Memory Systems

ThyNVM. Enabling So1ware- Transparent Crash Consistency In Persistent Memory Systems ThyNVM Enabling So1ware- Transparent Crash Consistency In Persistent Memory Systems Jinglei Ren, Jishen Zhao, Samira Khan, Jongmoo Choi, Yongwei Wu, and Onur Mutlu TWO- LEVEL STORAGE MODEL MEMORY CPU STORAGE

More information

Falcon: Scaling IO Performance in Multi-SSD Volumes. The George Washington University

Falcon: Scaling IO Performance in Multi-SSD Volumes. The George Washington University Falcon: Scaling IO Performance in Multi-SSD Volumes Pradeep Kumar H Howie Huang The George Washington University SSDs in Big Data Applications Recent trends advocate using many SSDs for higher throughput

More information

Mnemosyne Lightweight Persistent Memory

Mnemosyne Lightweight Persistent Memory Mnemosyne Lightweight Persistent Memory Haris Volos Andres Jaan Tack, Michael M. Swift University of Wisconsin Madison Executive Summary Storage-Class Memory (SCM) enables memory-like storage Persistent

More information

Write-Optimized and High-Performance Hashing Index Scheme for Persistent Memory

Write-Optimized and High-Performance Hashing Index Scheme for Persistent Memory Write-Optimized and High-Performance Hashing Index Scheme for Persistent Memory Pengfei Zuo, Yu Hua, and Jie Wu, Huazhong University of Science and Technology https://www.usenix.org/conference/osdi18/presentation/zuo

More information

Characterizing Multi-threaded Applications based on Shared-Resource Contention

Characterizing Multi-threaded Applications based on Shared-Resource Contention Characterizing Multi-threaded Applications based on Shared-Resource Contention Tanima Dey Wei Wang Jack W. Davidson Mary Lou Soffa Department of Computer Science University of Virginia Charlottesville,

More information

Energy Aware Persistence: Reducing Energy Overheads of Memory-based Persistence in NVMs

Energy Aware Persistence: Reducing Energy Overheads of Memory-based Persistence in NVMs Energy Aware Persistence: Reducing Energy Overheads of Memory-based Persistence in NVMs Sudarsun Kannan College of Computing, Georgia Tech sudarsun@gatech.edu Moinuddin Qureshi School of ECE, Georgia Tech

More information

TxFS: Leveraging File-System Crash Consistency to Provide ACID Transactions

TxFS: Leveraging File-System Crash Consistency to Provide ACID Transactions TxFS: Leveraging File-System Crash Consistency to Provide ACID Transactions Yige Hu, Zhiting Zhu, Ian Neal, Youngjin Kwon, Tianyu Chen, Vijay Chidambaram, Emmett Witchel The University of Texas at Austin

More information

Architectural Support for Atomic Durability in Non-Volatile Memory

Architectural Support for Atomic Durability in Non-Volatile Memory Architectural Support for Atomic Durability in Non-Volatile Memory Arpit Joshi, Vijay Nagarajan, Stratis Viglas, Marcelo Cintra NVMW 2018 Summary Non-Volatile Memory (NVM) - on the memory bus enables in-memory

More information

Hardware Support for NVM Programming

Hardware Support for NVM Programming Hardware Support for NVM Programming 1 Outline Ordering Transactions Write endurance 2 Volatile Memory Ordering Write-back caching Improves performance Reorders writes to DRAM STORE A STORE B CPU CPU B

More information

SoftWrAP: A Lightweight Framework for Transactional Support of Storage Class Memory

SoftWrAP: A Lightweight Framework for Transactional Support of Storage Class Memory SoftWrAP: A Lightweight Framework for Transactional Support of Storage Class Memory Ellis Giles Rice University Houston, Texas erg@rice.edu Kshitij Doshi Intel Corp. Portland, OR kshitij.a.doshi@intel.com

More information

The SNIA NVM Programming Model. #OFADevWorkshop

The SNIA NVM Programming Model. #OFADevWorkshop The SNIA NVM Programming Model #OFADevWorkshop Opportunities with Next Generation NVM NVMe & STA SNIA 2 NVM Express/SCSI Express: Optimized storage interconnect & driver SNIA NVM Programming TWG: Optimized

More information

RECOVERY CHAPTER 21,23 (6/E) CHAPTER 17,19 (5/E)

RECOVERY CHAPTER 21,23 (6/E) CHAPTER 17,19 (5/E) RECOVERY CHAPTER 21,23 (6/E) CHAPTER 17,19 (5/E) 2 LECTURE OUTLINE Failures Recoverable schedules Transaction logs Recovery procedure 3 PURPOSE OF DATABASE RECOVERY To bring the database into the most

More information

Agenda. Designing Transactional Memory Systems. Why not obstruction-free? Why lock-based?

Agenda. Designing Transactional Memory Systems. Why not obstruction-free? Why lock-based? Agenda Designing Transactional Memory Systems Part III: Lock-based STMs Pascal Felber University of Neuchatel Pascal.Felber@unine.ch Part I: Introduction Part II: Obstruction-free STMs Part III: Lock-based

More information

Accelerated Machine Learning Algorithms in Python

Accelerated Machine Learning Algorithms in Python Accelerated Machine Learning Algorithms in Python Patrick Reilly, Leiming Yu, David Kaeli reilly.pa@husky.neu.edu Northeastern University Computer Architecture Research Lab Outline Motivation and Goals

More information

UNIT 9 Crash Recovery. Based on: Text: Chapter 18 Skip: Section 18.7 and second half of 18.8

UNIT 9 Crash Recovery. Based on: Text: Chapter 18 Skip: Section 18.7 and second half of 18.8 UNIT 9 Crash Recovery Based on: Text: Chapter 18 Skip: Section 18.7 and second half of 18.8 Learning Goals Describe the steal and force buffer policies and explain how they affect a transaction s properties

More information

RAMP-White / FAST-MP

RAMP-White / FAST-MP RAMP-White / FAST-MP Hari Angepat and Derek Chiou Electrical and Computer Engineering University of Texas at Austin Supported in part by DOE, NSF, SRC,Bluespec, Intel, Xilinx, IBM, and Freescale RAMP-White

More information

Farewell to Servers: Hardware, Software, and Network Approaches towards Datacenter Resource Disaggregation

Farewell to Servers: Hardware, Software, and Network Approaches towards Datacenter Resource Disaggregation Farewell to Servers: Hardware, Software, and Network Approaches towards Datacenter Resource Disaggregation Yiying Zhang Datacenter 3 Monolithic Computer OS / Hypervisor 4 Can monolithic Application Hardware

More information

Closing the Performance Gap Between Volatile and Persistent K-V Stores

Closing the Performance Gap Between Volatile and Persistent K-V Stores Closing the Performance Gap Between Volatile and Persistent K-V Stores Yihe Huang, Harvard University Matej Pavlovic, EPFL Virendra Marathe, Oracle Labs Margo Seltzer, Oracle Labs Tim Harris, Oracle Labs

More information

SFS: Random Write Considered Harmful in Solid State Drives

SFS: Random Write Considered Harmful in Solid State Drives SFS: Random Write Considered Harmful in Solid State Drives Changwoo Min 1, 2, Kangnyeon Kim 1, Hyunjin Cho 2, Sang-Won Lee 1, Young Ik Eom 1 1 Sungkyunkwan University, Korea 2 Samsung Electronics, Korea

More information

An Efficient Memory-Mapped Key-Value Store for Flash Storage

An Efficient Memory-Mapped Key-Value Store for Flash Storage An Efficient Memory-Mapped Key-Value Store for Flash Storage Anastasios Papagiannis, Giorgos Saloustros, Pilar González-Férez, and Angelos Bilas Institute of Computer Science (ICS) Foundation for Research

More information

New Abstractions for Fast Non-Volatile Storage

New Abstractions for Fast Non-Volatile Storage New Abstractions for Fast Non-Volatile Storage Joel Coburn, Adrian Caulfield, Laura Grupp, Ameen Akel, Steven Swanson Non-volatile Systems Laboratory Department of Computer Science and Engineering University

More information

Topics. " Start using a write-ahead log on disk " Log all updates Commit

Topics.  Start using a write-ahead log on disk  Log all updates Commit Topics COS 318: Operating Systems Journaling and LFS Copy on Write and Write Anywhere (NetApp WAFL) File Systems Reliability and Performance (Contd.) Jaswinder Pal Singh Computer Science epartment Princeton

More information

Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications. Last Class. Today s Class. Faloutsos/Pavlo CMU /615
