WORT: Write Optimal Radix Tree for Persistent Memory Storage Systems
|
|
- Tabitha Marsh
- 6 years ago
- Views:
Transcription
1 WORT: Write Optimal Radix Tree for Persistent Memory Storage Systems Se Kwon Lee K. Hyun Lim 1, Hyunsub Song, Beomseok Nam, Sam H. Noh UNIST 1 Hongik University
2 Persistent Memory (PM) Persistent memory is expected to replace both DRAM & NAND NAND STT-MRAM PCM DRAM Non-volatility o o o x Read (ns) 2.5 X Write (ns) 2 X Byte-addressable x o o o Density Gbit/cm Gbit/cm Gbit/cm Gbit/cm 2 K. Suzuki and S. Swanson. A Survey of Trends in Non-Volatile Memory Technologies: , IMW 2015 Non-volatile High performance Persistent Memory 2
3 Indexing Structure for PM Storage Systems B+Tree
4 Consistency Issue of B+tree in PM B+tree is a block-based index Key sorting à Block granularity write Rebalancing à Multi-blocks granularity write Persistent memory Byte-addressable à Byte granularity write Write reordering Can result in consistency problem 4
5 Consistency Issue of B+tree in PM Traditional case Volatile CPU Caches Write reordering Not persistent data 3 DRAM Non-volatile Block based storage Block granularity update
6 Consistency Issue of B+tree in PM PM case Volatile CPU Caches Byte granularity update Write reordering Non-volatile Crash Persistent Memory Persistent data Garbage data persistently stored 6
7 Primitives for Data Consistency in PM Durability CLFLUSH (Flush cache line) Can be reordered Ordering MFENCE (Load and Store fence) Order CPU cache line flush instructions Volatile Non-volatile CPU Caches Persistent Memory 7
8 Primitives for Data Consistency in PM Durability CLFLUSH (Flush cache line) Can be reordered Ordering MFENCE (Load and Store fence) Order CPU cache line flush instructions Volatile Non-volatile CPU Serialization of CLFLUSH and MFENCE is known to cause large overhead CPU Caches Persistent Memory 8
9 Primitives for Data Consistency in PM Atomicity 8-byte failure atomicity Need only CLFLUSH Logging or CoW based atomicity Non-volatile (more than 8 bytes) Log area Data area Requires duplicate copies
10 Primitives for Data Consistency in PM Atomicity 8-byte failure atomicity Need only CLFLUSH Logging or CoW based atomicity (more than 8 bytes) Non-volatile Logging increases cache line Log areaflush overhead Data area Requires duplicate copies
11 B+tree Variants for Persistent Memory How can we ensure consistency using failure-atomic writes without logging? Unsorted keys à Append-only with metadata Failure-atomic update of metadata wb+tree (VLDB 15) NVTree (FAST 15) FPTree (SIGMOD 16) Slot array bmp K1 Kn P1 Pn Entry Cnt. Flag (+/-) K1 P1 Flag (+/-) K2 P2 Flag (+/-) K3 P3 bmp Fingerprints K1 P1 Kn Pn Pnext Unsorted key à Decreases search performance 11
12 B+tree Variants for Persistent Memory Logging still necessary Multi-block granularity updates due to node splits and merges Cannot update atomically 35 New key Overflow Split Logging-based solution wb+tree, FPTree Tree reconstruction based solution NVTree large overhead 12
13 B+tree Variants for Persistent Memory Key sorting Fundamental characteristics of B+tree cause problems Rebalancing Overflow New key Split
14 B+tree Variants for Persistent Memory Key sorting Why use B+ trees in the first place? Fundamental characteristics of B+tree cause problems Perhaps Rebalancing there is a better tree data structure more suited for PM? Overflow New key Split
15 Our Contributions Show Radix Tree is a suitable data structure for PM Propose optimal radix tree variants WORT and WOART WORT: Write Optimal Radix Tree WOART: Write Optimal redesigned Adaptive Radix Tree (ART) Optimal: maintain consistency only with single failure-atomic write without any duplicate copies 15
16 Radix Tree Deterministic structure A C C A A C Z C ACA ACC ACZ CAC 16
17 Radix Tree Deterministic structure No key comparison A C C A A C Z C ACA ACC ACZ CAC 17
18 Radix Tree Deterministic structure No key comparison Only 8-byte pointer entries Implicitly stored keys 8-byte pointer A C C A A C Z C ACA ACC ACZ CAC 18
19 Radix Tree Deterministic structure No key comparison Only 8-byte pointer entries Implicitly stored keys No problem caused by key sorting A C N A A C Z C ACA ACC ACZ CAC 19
20 Radix Tree Deterministic structure No key comparison Only 8-byte pointer entries Implicitly stored keys No problem caused by key sorting A C N A No modification of other keys Single 8-byte pointer write per node Easy to use failure-atomic write A C Z C ACA ACC ACZ CAC 20
21 Problem of Deterministic Structure For sparse key distribution Waste excessive memory space à Optimized through path compression High utilization Low utilization key key key key key key key key 21
22 Path Compression in Radix Tree Path compression Search paths that do not need to be distinguished can be removed Unnecessary search path A C C A A C Z ACA ACC ACZ C CAC 22
23 Path Compression in Radix Tree Path compression Common search path is compressed in header Improve memory utilization & indexing performance A Compression header C A C Z ACA ACC ACZ 23
24 Node Split with Path Compression Path compression split Prefix keys are not equal AZ!= AC AZA to be inserted A C A C Z ACA ACC ACZ 24
25 Node Split with Path Compression Path compression split Split A C 1 New parent allocation C Z A C A C Z AZA ACA ACC ACZ 25
26 Node Split with Path Compression Path compression split 2 Decompression of old common prefix AC A A C Z C Z AZA ACA ACC ACZ 26
27 Node Split with Path Compression Path compression split A C However, this split process causes consistency Z 2 Old common prefix update problem in PM. AC A C Z ACA ACC ACZ AZA 27
28 Path compression Problem in PM 28
29 Consistency Issue of Path Compression Path compression split cause updates of multiple nodes have to employ expensive logging methods Consistent state A C Z AZA A C Z Inconsistent state A C Crash ACA ACC ACZ 29
30 Path compression Solution 30
31 WORT (Write-Optimal Radix Tree) for PM Failure-atomic path compression Add node depth field to compression header Compression header (8 bytes) struct Header { unsigned char depth; unsigned char PrefixArr[7]; } 0 A C A C Z ACA ACC ACZ 31
32 WORT (Write-Optimal Radix Tree) for PM Failure-atomic path compression Add node depth field to compression header AZA to be inserted Compression header (8 bytes) 0 A C A C Z ACA ACC ACZ 32
33 WORT (Write-Optimal Radix Tree) for PM Failure-atomic path compression Add node depth field to compression header Compression header (8 bytes) 0 A Consistent state 2 2 Decompression of old common prefix Inconsistent state Crash 0 A C C A C Z ACA ACC ACZ Z AZA 33
34 WORT (Write-Optimal Radix Tree) for PM Failure-atomic path compression Failure detection in WORT Depth in a header Counted depth à Crashed header Compression header (8 bytes) 0 A C Z Inconsistent state 0 A C AZA A C Z Not equal to expected tree depth (2) ACA ACC ACZ 34
35 WORT (Write-Optimal Radix Tree) for PM Failure-atomic path compression Failure recovery in WORT Compression header can be reconstructed à Atomically overwrite Compression header (8 bytes) ACA ACC Consistent state 2 0 A C Z Inconsistent state 0 A C AZA A C Z ACA ACC ACZ 35
36 Write Optimal Data Structure for PM Our proposed radix tree variant is optimal for PM Consistency is always guaranteed with a single 8-byte failure-atomic write without any additional copies for logging or CoW WORT (Write Optimal Radix Tree) WOART (Write Optimal Adaptive Radix Tree) 1. Failure-atomic path compression 2. Redesigned adaptive node 36
37 Evaluation Experimental environment System configuration Description CPU Intel Xeon E5-2620V3 X 2 OS PM Linux CentOS 6.6 (64bit) kernel v4.7.0 Emulated with 256GB DRAM Write latency: Injecting additional stall cycles 37
38 Evaluation Experimental environment Comparison group Radix tree variants B+tree variants WORT wb+tree (VLDB 15) NVTree (FAST 15) FPTree (SIGMOD 16) PM PM DRAM DRAM 38
39 Evaluation Experimental environment Synthetic Workload Characteristics Dense [1 N] Sparse [ ) N Clustered [ ) 39
40 Evaluation Insertion performance WORT outperform the B+tree variants in general 40
41 Evaluation Insertion performance WORT outperform the B+tree variants in general DRAM-based internal node à more favorable performance for FPTree 41
42 Evaluation Insertion performance WORT vs wb+tree Performance differences increase in proportion to write latency 42
43 Evaluation CLFLUSH count per operation B-tree variants incur more cache flush instructions 43
44 Evaluation Search performance WORT always perform better than B+Tree variants 44
45 Evaluation Range query performance Performance gap for range query decreases for PM indexes compared with it between WORT and original B+Tree B+Tree variants do not keep the keys sorted à Rearrangement overhead 45
46 Evaluation MC-benchmark performance on Memcached WORT outperform B+Tree variants in both SET and GET Additional indirection & flush overhead in B-tree variants 46
47 Conclusion Showed suitability of radix tree as PM indexing structure Proposed optimal radix tree variants WORT and WOART Optimal: maintain consistency only with single failure-atomic write without any duplicate copies 47
48 Thank you Any question? 48
Deukyeon Hwang UNIST. Wook-Hee Kim UNIST. Beomseok Nam UNIST. Hanyang Univ.
Deukyeon Hwang UNIST Wook-Hee Kim UNIST Youjip Won Hanyang Univ. Beomseok Nam UNIST Fast but Asymmetric Access Latency Non-Volatility Byte-Addressability Large Capacity CPU Caches (Volatile) Persistent
More informationEndurable Transient Inconsistency in Byte- Addressable Persistent B+-Tree
Endurable Transient Inconsistency in Byte- Addressable Persistent B+-Tree Deukyeon Hwang and Wook-Hee Kim, UNIST; Youjip Won, Hanyang University; Beomseok Nam, UNIST https://www.usenix.org/conference/fast18/presentation/hwang
More informationSLM-DB: Single-Level Key-Value Store with Persistent Memory
SLM-DB: Single-Level Key-Value Store with Persistent Memory Olzhas Kaiyrakhmet and Songyi Lee, UNIST; Beomseok Nam, Sungkyunkwan University; Sam H. Noh and Young-ri Choi, UNIST https://www.usenix.org/conference/fast19/presentation/kaiyrakhmet
More informationWrite-Optimized and High-Performance Hashing Index Scheme for Persistent Memory
Write-Optimized and High-Performance Hashing Index Scheme for Persistent Memory Pengfei Zuo, Yu Hua, Jie Wu Huazhong University of Science and Technology, China 3th USENIX Symposium on Operating Systems
More informationNV-Tree Reducing Consistency Cost for NVM-based Single Level Systems
NV-Tree Reducing Consistency Cost for NVM-based Single Level Systems Jun Yang 1, Qingsong Wei 1, Cheng Chen 1, Chundong Wang 1, Khai Leong Yong 1 and Bingsheng He 2 1 Data Storage Institute, A-STAR, Singapore
More informationBzTree: A High-Performance Latch-free Range Index for Non-Volatile Memory
BzTree: A High-Performance Latch-free Range Index for Non-Volatile Memory JOY ARULRAJ JUSTIN LEVANDOSKI UMAR FAROOQ MINHAS PER-AKE LARSON Microsoft Research NON-VOLATILE MEMORY [NVM] PERFORMANCE DRAM VOLATILE
More informationSAY-Go: Towards Transparent and Seamless Storage-As-You-Go with Persistent Memory
SAY-Go: Towards Transparent and Seamless Storage-As-You-Go with Persistent Memory Hyeonho Song, Sam H. Noh UNIST HotStorage 2018 Contents Persistent Memory Motivation SAY-Go Design Implementation Evaluation
More informationSystem Software for Persistent Memory
System Software for Persistent Memory Subramanya R Dulloor, Sanjay Kumar, Anil Keshavamurthy, Philip Lantz, Dheeraj Reddy, Rajesh Sankaran and Jeff Jackson 72131715 Neo Kim phoenixise@gmail.com Contents
More informationJOURNALING techniques have been widely used in modern
IEEE TRANSACTIONS ON COMPUTERS, VOL. XX, NO. X, XXXX 2018 1 Optimizing File Systems with a Write-efficient Journaling Scheme on Non-volatile Memory Xiaoyi Zhang, Dan Feng, Member, IEEE, Yu Hua, Senior
More informationWrite-Optimized Dynamic Hashing for Persistent Memory
Write-Optimized Dynamic Hashing for Persistent Memory Moohyeon Nam, UNIST (Ulsan National Institute of Science and Technology); Hokeun Cha, Sungkyunkwan University; Young-ri Choi and Sam H. Noh, UNIST
More informationWrite-Optimized and High-Performance Hashing Index Scheme for Persistent Memory
Write-Optimized and High-Performance Hashing Index Scheme for Persistent Memory Pengfei Zuo, Yu Hua, and Jie Wu, Huazhong University of Science and Technology https://www.usenix.org/conference/osdi18/presentation/zuo
More informationDalí: A Periodically Persistent Hash Map
Dalí: A Periodically Persistent Hash Map Faisal Nawab* 1, Joseph Izraelevitz* 2, Terence Kelly*, Charles B. Morrey III*, Dhruva R. Chakrabarti*, and Michael L. Scott 2 1 Department of Computer Science
More informationHardware Support for NVM Programming
Hardware Support for NVM Programming 1 Outline Ordering Transactions Write endurance 2 Volatile Memory Ordering Write-back caching Improves performance Reorders writes to DRAM STORE A STORE B CPU CPU B
More informationPhase Change Memory An Architecture and Systems Perspective
Phase Change Memory An Architecture and Systems Perspective Benjamin C. Lee Stanford University bcclee@stanford.edu Fall 2010, Assistant Professor @ Duke University Benjamin C. Lee 1 Memory Scaling density,
More informationarxiv: v1 [cs.dc] 3 Jan 2019
A Secure and Persistent Memory System for Non-volatile Memory Pengfei Zuo *, Yu Hua *, Yuan Xie * Huazhong University of Science and Technology University of California, Santa Barbara arxiv:1901.00620v1
More informationBlurred Persistence in Transactional Persistent Memory
Blurred Persistence in Transactional Persistent Memory Youyou Lu, Jiwu Shu, Long Sun Tsinghua University Overview Problem: high performance overhead in ensuring storage consistency of persistent memory
More informationSTORAGE LATENCY x. RAMAC 350 (600 ms) NAND SSD (60 us)
1 STORAGE LATENCY 2 RAMAC 350 (600 ms) 1956 10 5 x NAND SSD (60 us) 2016 COMPUTE LATENCY 3 RAMAC 305 (100 Hz) 1956 10 8 x 1000x CORE I7 (1 GHZ) 2016 NON-VOLATILE MEMORY 1000x faster than NAND 3D XPOINT
More informationAerie: Flexible File-System Interfaces to Storage-Class Memory [Eurosys 2014] Operating System Design Yongju Song
Aerie: Flexible File-System Interfaces to Storage-Class Memory [Eurosys 2014] Operating System Design Yongju Song Outline 1. Storage-Class Memory (SCM) 2. Motivation 3. Design of Aerie 4. File System Features
More informationNon-Volatile Memory Through Customized Key-Value Stores
Non-Volatile Memory Through Customized Key-Value Stores Leonardo Mármol 1 Jorge Guerra 2 Marcos K. Aguilera 2 1 Florida International University 2 VMware L. Mármol, J. Guerra, M. K. Aguilera (FIU and VMware)
More informationPhase Change Memory An Architecture and Systems Perspective
Phase Change Memory An Architecture and Systems Perspective Benjamin Lee Electrical Engineering Stanford University Stanford EE382 2 December 2009 Benjamin Lee 1 :: PCM :: 2 Dec 09 Memory Scaling density,
More informationNVthreads: Practical Persistence for Multi-threaded Applications
NVthreads: Practical Persistence for Multi-threaded Applications Terry Hsu*, Purdue University Helge Brügner*, TU München Indrajit Roy*, Google Inc. Kimberly Keeton, Hewlett Packard Labs Patrick Eugster,
More informationMnemosyne Lightweight Persistent Memory
Mnemosyne Lightweight Persistent Memory Haris Volos Andres Jaan Tack, Michael M. Swift University of Wisconsin Madison Executive Summary Storage-Class Memory (SCM) enables memory-like storage Persistent
More informationStrata: A Cross Media File System. Youngjin Kwon, Henrique Fingler, Tyler Hunt, Simon Peter, Emmett Witchel, Thomas Anderson
A Cross Media File System Youngjin Kwon, Henrique Fingler, Tyler Hunt, Simon Peter, Emmett Witchel, Thomas Anderson 1 Let s build a fast server NoSQL store, Database, File server, Mail server Requirements
More informationFailure-Atomic Slotted Paging for Persistent Memory
Failure-Atomic Slotted Paging for Persistent Memory Jihye Seo Wook-ee Kim Woongki Baek Beomseok am Sam. oh UIS (Ulsan ational Institute of Science and echnology) {jihye,okie90,wbaek,bsnam,samhnoh}@unist.ac.kr
More informationLoose-Ordering Consistency for Persistent Memory
Loose-Ordering Consistency for Persistent Memory Youyou Lu 1, Jiwu Shu 1, Long Sun 1, Onur Mutlu 2 1 Tsinghua University 2 Carnegie Mellon University Summary Problem: Strict write ordering required for
More informationFine-grained Metadata Journaling on NVM
32nd International Conference on Massive Storage Systems and Technology (MSST 2016) May 2-6, 2016 Fine-grained Metadata Journaling on NVM Cheng Chen, Jun Yang, Qingsong Wei, Chundong Wang, and Mingdi Xue
More informationAccessing NVM Locally and over RDMA Challenges and Opportunities
Accessing NVM Locally and over RDMA Challenges and Opportunities Wendy Elsasser Megan Grodowitz William Wang MSST - May 2018 Emerging NVM A wide variety of technologies with varied characteristics Address
More informationNVMFS: A New File System Designed Specifically to Take Advantage of Nonvolatile Memory
NVMFS: A New File System Designed Specifically to Take Advantage of Nonvolatile Memory Dhananjoy Das, Sr. Systems Architect SanDisk Corp. 1 Agenda: Applications are KING! Storage landscape (Flash / NVM)
More informationBig and Fast. Anti-Caching in OLTP Systems. Justin DeBrabant
Big and Fast Anti-Caching in OLTP Systems Justin DeBrabant Online Transaction Processing transaction-oriented small footprint write-intensive 2 A bit of history 3 OLTP Through the Years relational model
More informationSoft Updates Made Simple and Fast on Non-volatile Memory
Soft Updates Made Simple and Fast on Non-volatile Memory Mingkai Dong, Haibo Chen Institute of Parallel and Distributed Systems, Shanghai Jiao Tong University @ NVMW 18 Non-volatile Memory (NVM) ü Non-volatile
More informationEECS 482 Introduction to Operating Systems
EECS 482 Introduction to Operating Systems Winter 2018 Baris Kasikci Slides by: Harsha V. Madhyastha OS Abstractions Applications Threads File system Virtual memory Operating System Next few lectures:
More informationAddressing Scalability and Consistency Issues in Hybrid File System for BPRAM and NAND Flash
7th IEEE International Workshop on Storage Network Architecture and Parallel I/O SNAPI 2011 Denver, Colorado May 25, 2011 Addressing Scalability and Consistency Issues in Hybrid File System for BPRAM and
More informationUsing Transparent Compression to Improve SSD-based I/O Caches
Using Transparent Compression to Improve SSD-based I/O Caches Thanos Makatos, Yannis Klonatos, Manolis Marazakis, Michail D. Flouris, and Angelos Bilas {mcatos,klonatos,maraz,flouris,bilas}@ics.forth.gr
More informationSoftWrAP: A Lightweight Framework for Transactional Support of Storage Class Memory
SoftWrAP: A Lightweight Framework for Transactional Support of Storage Class Memory Ellis Giles Rice University Houston, Texas erg@rice.edu Kshitij Doshi Intel Corp. Portland, OR kshitij.a.doshi@intel.com
More informationWhat You can Do with NVDIMMs. Rob Peglar President, Advanced Computation and Storage LLC
What You can Do with NVDIMMs Rob Peglar President, Advanced Computation and Storage LLC A Fundamental Change Requires An Ecosystem Windows Server 2016 Windows 10 Pro for Workstations Linux Kernel 4.2 and
More informationEMSOFT 09 Yangwook Kang Ethan L. Miller Hongik Univ UC Santa Cruz 2009/11/09 Yongseok Oh
RCFFS : Adding Aggressive Error Correction to a high-performance Compressing Flash File System EMSOFT 09 Yangwook Kang Ethan L. Miller Hongik Univ UC Santa Cruz 2009/11/09 Yongseok Oh ysoh@uos.ac.kr 1
More informationLoose-Ordering Consistency for Persistent Memory
Loose-Ordering Consistency for Persistent Memory Youyou Lu, Jiwu Shu, Long Sun and Onur Mutlu Department of Computer Science and Technology, Tsinghua University, Beijing, China State Key Laboratory of
More informationLazy Persistency: a High-Performing and Write-Efficient Software Persistency Technique
Lazy Persistency: a High-Performing and Write-Efficient Software Persistency Technique Mohammad Alshboul, James Tuck, and Yan Solihin Email: maalshbo@ncsu.edu ARPERS Research Group Introduction Future
More informationNear Memory Key/Value Lookup Acceleration MemSys 2017
Near Key/Value Lookup Acceleration MemSys 2017 October 3, 2017 Scott Lloyd, Maya Gokhale Center for Applied Scientific Computing This work was performed under the auspices of the U.S. Department of Energy
More informationCMSC424: Database Design. Instructor: Amol Deshpande
CMSC424: Database Design Instructor: Amol Deshpande amol@cs.umd.edu Databases Data Models Conceptual representa1on of the data Data Retrieval How to ask ques1ons of the database How to answer those ques1ons
More informationRedrawing the Boundary Between So3ware and Storage for Fast Non- Vola;le Memories
Redrawing the Boundary Between So3ware and Storage for Fast Non- Vola;le Memories Steven Swanson Director, Non- Vola;le System Laboratory Computer Science and Engineering University of California, San
More informationRelaxing Persistent Memory Constraints with Hardware-Driven Undo+Redo Logging
Relaxing Persistent Memory Constraints with Hardware-Driven Undo+Redo Logging Matheus A. Ogleari mogleari@ucsc.edu Ethan L. Miller elm@ucsc.edu University of California, Santa Cruz November 19, 2016 Jishen
More informationThe SNIA NVM Programming Model. #OFADevWorkshop
The SNIA NVM Programming Model #OFADevWorkshop Opportunities with Next Generation NVM NVMe & STA SNIA 2 NVM Express/SCSI Express: Optimized storage interconnect & driver SNIA NVM Programming TWG: Optimized
More informationThe Adaptive Radix Tree
Department of Informatics, University of Zürich MSc Basismodul The Adaptive Radix Tree Rafael Kallis Matrikelnummer: -708-887 Email: rk@rafaelkallis.com September 8, 08 supervised by Prof. Dr. Michael
More informationI/O and file systems. Dealing with device heterogeneity
I/O and file systems Abstractions provided by operating system for storage devices Heterogeneous -> uniform One/few storage objects (disks) -> many storage objects (files) Simple naming -> rich naming
More informationOperating System Supports for SCM as Main Memory Systems (Focusing on ibuddy)
2011 NVRAMOS Operating System Supports for SCM as Main Memory Systems (Focusing on ibuddy) 2011. 4. 19 Jongmoo Choi http://embedded.dankook.ac.kr/~choijm Contents Overview Motivation Observations Proposal:
More information* Contributed while interning at SAP. September 1 st, 2017 PUBLIC
Adaptive Recovery for SCM-Enabled Databases Ismail Oukid (TU Dresden & SAP), Daniel Bossle* (SAP), Anisoara Nica (SAP), Peter Bumbulis (SAP), Wolfgang Lehner (TU Dresden), Thomas Willhalm (Intel) * Contributed
More informationCMSC424: Database Design. Instructor: Amol Deshpande
CMSC424: Database Design Instructor: Amol Deshpande amol@cs.umd.edu Databases Data Models Conceptual representa1on of the data Data Retrieval How to ask ques1ons of the database How to answer those ques1ons
More informationNew Abstractions for Fast Non-Volatile Storage
New Abstractions for Fast Non-Volatile Storage Joel Coburn, Adrian Caulfield, Laura Grupp, Ameen Akel, Steven Swanson Non-volatile Systems Laboratory Department of Computer Science and Engineering University
More informationAdvanced file systems: LFS and Soft Updates. Ken Birman (based on slides by Ben Atkin)
: LFS and Soft Updates Ken Birman (based on slides by Ben Atkin) Overview of talk Unix Fast File System Log-Structured System Soft Updates Conclusions 2 The Unix Fast File System Berkeley Unix (4.2BSD)
More informationLog-Free Concurrent Data Structures
Log-Free Concurrent Data Structures Abstract Tudor David IBM Research Zurich udo@zurich.ibm.com Rachid Guerraoui EPFL rachid.guerraoui@epfl.ch Non-volatile RAM (NVRAM) makes it possible for data structures
More informationMoneta: A High-Performance Storage Architecture for Next-generation, Non-volatile Memories
Moneta: A High-Performance Storage Architecture for Next-generation, Non-volatile Memories Adrian M. Caulfield Arup De, Joel Coburn, Todor I. Mollov, Rajesh K. Gupta, Steven Swanson Non-Volatile Systems
More informationSoftware Design for Persistent Memory Systems. Howard Chu CTO, Symas Corp
Software Design for Persistent Memory Systems Howard Chu CTO, Symas Corp. hyc@symas.com 2018-03-07 Personal Intro Howard Chu Founder and CTO Symas Corp. Developing Free/Open Source software since 1980s
More informationCBM: A Cooperative Buffer Management for SSD
3 th International Conference on Massive Storage Systems and Technology (MSST 4) : A Cooperative Buffer Management for SSD Qingsong Wei, Cheng Chen, Jun Yang Data Storage Institute, A-STAR, Singapore June
More informationThe Google File System
October 13, 2010 Based on: S. Ghemawat, H. Gobioff, and S.-T. Leung: The Google file system, in Proceedings ACM SOSP 2003, Lake George, NY, USA, October 2003. 1 Assumptions Interface Architecture Single
More informationBIBIM: A Prototype Multi-Partition Aware Heterogeneous New Memory
HotStorage 18 BIBIM: A Prototype Multi-Partition Aware Heterogeneous New Memory Gyuyoung Park 1, Miryeong Kwon 1, Pratyush Mahapatra 2, Michael Swift 2, and Myoungsoo Jung 1 Yonsei University Computer
More informationA Cost-efficient NVM-based Journaling Scheme for File Systems
2017 IEEE 35th International Conference on Computer Design A Cost-efficient NVM-based Journaling Scheme for File Systems Xiaoyi Zhang, Dan Feng, Yu Hua and Jianxi Chen Wuhan National Lab for Optoelectronics,
More informationEnergy Aware Persistence: Reducing Energy Overheads of Memory-based Persistence in NVMs
Energy Aware Persistence: Reducing Energy Overheads of Memory-based Persistence in NVMs Sudarsun Kannan College of Computing, Georgia Tech sudarsun@gatech.edu Moinuddin Qureshi School of ECE, Georgia Tech
More informationModularizing B+ Trees: Three Level B+ Trees Work Fine
Modularizing B+ Trees: Three Level B+ Trees Work Fine Shigero Sasaki s sasaki@di.jp.nec.com Cloud System Research Laboratories, NEC 7 Shimonumabe, Kawasaki, Kanagawa, Japan Takuya Araki t araki@dc.jp.nec.com
More informationAccelerating Microsoft SQL Server Performance With NVDIMM-N on Dell EMC PowerEdge R740
Accelerating Microsoft SQL Server Performance With NVDIMM-N on Dell EMC PowerEdge R740 A performance study with NVDIMM-N Dell EMC Engineering September 2017 A Dell EMC document category Revisions Date
More informationPCR*-Tree: PCM-aware R*-Tree
JOURNAL OF INFORMATION SCIENCE AND ENGINEERING 32, XXXX-XXXX (2016) PCR*-Tree: PCM-aware R*-Tree ELKHAN JABAROV AND MYONG-SOON PARK 1 Department of Computer and Radio Communications Engineering Korea University,
More informationOperating Systems. File Systems. Thomas Ropars.
1 Operating Systems File Systems Thomas Ropars thomas.ropars@univ-grenoble-alpes.fr 2017 2 References The content of these lectures is inspired by: The lecture notes of Prof. David Mazières. Operating
More informationA New Key-value Data Store For Heterogeneous Storage Architecture Intel APAC R&D Ltd.
A New Key-value Data Store For Heterogeneous Storage Architecture Intel APAC R&D Ltd. 1 Agenda Introduction Background and Motivation Hybrid Key-Value Data Store Architecture Overview Design details Performance
More informationPASTE: A Network Programming Interface for Non-Volatile Main Memory
PASTE: A Network Programming Interface for Non-Volatile Main Memory Michio Honda (NEC Laboratories Europe) Giuseppe Lettieri (Università di Pisa) Lars Eggert and Douglas Santry (NetApp) USENIX NSDI 2018
More informationFine-grained Metadata Journaling on NVM
Fine-grained Metadata Journaling on NVM Cheng Chen, Jun Yang, Qingsong Wei, Chundong Wang, and Mingdi Xue Email:{CHEN Cheng, yangju, WEI Qingsong, wangc, XUE Mingdi}@dsi.a-star.edu.sg Data Storage Institute,
More informationEnabling Persistent Memory Use in Java. Steve Dohrmann Sr. Staff Software Engineer, Intel
Enabling Persistent Memory Use in Java Steve Dohrmann Sr. Staff Software Engineer, Intel Motivation Java is a very popular language on servers, especially for databases, data grids, etc., e.g. Apache projects:
More informationChapter 11: Implementing File Systems
Silberschatz 1 Chapter 11: Implementing File Systems Thursday, November 08, 2007 9:55 PM File system = a system stores files on secondary storage. A disk may have more than one file system. Disk are divided
More informationManaging Array of SSDs When the Storage Device is No Longer the Performance Bottleneck
Managing Array of Ds When the torage Device is No Longer the Performance Bottleneck Byung. Kim, Jaeho Kim, am H. Noh UNIT (Ulsan National Institute of cience & Technology) Outline Motivation & Observation
More informationTrafficDB: HERE s High Performance Shared-Memory Data Store Ricardo Fernandes, Piotr Zaczkowski, Bernd Göttler, Conor Ettinoffe, and Anis Moussa
TrafficDB: HERE s High Performance Shared-Memory Data Store Ricardo Fernandes, Piotr Zaczkowski, Bernd Göttler, Conor Ettinoffe, and Anis Moussa EPL646: Advanced Topics in Databases Christos Hadjistyllis
More informationHashKV: Enabling Efficient Updates in KV Storage via Hashing
HashKV: Enabling Efficient Updates in KV Storage via Hashing Helen H. W. Chan, Yongkun Li, Patrick P. C. Lee, Yinlong Xu The Chinese University of Hong Kong University of Science and Technology of China
More informationData Organization and Processing
Data Organization and Processing Indexing Techniques for Solid State Drives (NDBI007) David Hoksza http://siret.ms.mff.cuni.cz/hoksza Outline SSD technology overview Motivation for standard algorithms
More informationClosing the Performance Gap Between Volatile and Persistent K-V Stores
Closing the Performance Gap Between Volatile and Persistent K-V Stores Yihe Huang, Harvard University Matej Pavlovic, EPFL Virendra Marathe, Oracle Labs Margo Seltzer, Oracle Labs Tim Harris, Oracle Labs
More informationCOS 318: Operating Systems. Journaling, NFS and WAFL
COS 318: Operating Systems Journaling, NFS and WAFL Jaswinder Pal Singh Computer Science Department Princeton University (http://www.cs.princeton.edu/courses/cos318/) Topics Journaling and LFS Network
More informationPhysical Disk Structure. Physical Data Organization and Indexing. Pages and Blocks. Access Path. I/O Time to Access a Page. Disks.
Physical Disk Structure Physical Data Organization and Indexing Chapter 11 1 4 Access Path Refers to the algorithm + data structure (e.g., an index) used for retrieving and storing data in a table The
More informationAn Efficient Memory-Mapped Key-Value Store for Flash Storage
An Efficient Memory-Mapped Key-Value Store for Flash Storage Anastasios Papagiannis, Giorgos Saloustros, Pilar González-Férez, and Angelos Bilas Institute of Computer Science (ICS) Foundation for Research
More informationInstant Recovery for Main-Memory Databases
Instant Recovery for Main-Memory Databases Ismail Oukid*, Wolfgang Lehner*, Thomas Kissinger*, Peter Bumbulis, and Thomas Willhalm + *TU Dresden SAP SE + Intel GmbH CIDR 2015, Asilomar, California, USA,
More informationP2FS: supporting atomic writes for reliable file system design in PCM storage
LETTER IEICE Electronics Express, Vol.11, No.13, 1 6 P2FS: supporting atomic writes for reliable file system design in PCM storage Eunji Lee 1, Kern Koh 2, and Hyokyung Bahn 2a) 1 Department of Software,
More informationMemory Management Techniques for Large-Scale Persistent-Main-Memory Systems [VLDB 2017]
Memory Management Techniques for Large-Scale Persistent-Main-Memory Systems [VLDB 2017] Ismail Oukid, Daniel Booss, Adrien Lespinasse, Wolfgang Lehner, Thomas Willhalm, Grégoire Gomes PUBLIC Non-Volatile
More informationInstructor: Amol Deshpande
Instructor: Amol Deshpande amol@cs.umd.edu } Storage and Query Processing Storage and memory hierarchy Query plans and how to interpret EXPLAIN output } Other things Midterm grades: later today Look out
More informationCSE 120: Principles of Operating Systems. Lecture 10. File Systems. November 6, Prof. Joe Pasquale
CSE 120: Principles of Operating Systems Lecture 10 File Systems November 6, 2003 Prof. Joe Pasquale Department of Computer Science and Engineering University of California, San Diego 2003 by Joseph Pasquale
More informationFlash Memory Summit Persistent Memory - NVDIMMs
Flash Memory Summit 2018 Persistent Memory - NVDIMMs Contents Persistent Memory Overview NVDIMM Conclusions 2 Persistent Memory Memory & Storage Convergence Today Volatile and non-volatile technologies
More informationECE 7650 Scalable and Secure Internet Services and Architecture ---- A Systems Perspective
ECE 7650 Scalable and Secure Internet Services and Architecture ---- A Systems Perspective Part II: Data Center Software Architecture: Topic 3: Programming Models RCFile: A Fast and Space-efficient Data
More informationNOVA-Fortis: A Fault-Tolerant Non- Volatile Main Memory File System
NOVA-Fortis: A Fault-Tolerant Non- Volatile Main Memory File System Jian Andiry Xu, Lu Zhang, Amirsaman Memaripour, Akshatha Gangadharaiah, Amit Borase, Tamires Brito Da Silva, Andy Rudoff (Intel), Steven
More informationCOS 318: Operating Systems. NSF, Snapshot, Dedup and Review
COS 318: Operating Systems NSF, Snapshot, Dedup and Review Topics! NFS! Case Study: NetApp File System! Deduplication storage system! Course review 2 Network File System! Sun introduced NFS v2 in early
More informationBenchmark: In-Memory Database System (IMDS) Deployed on NVDIMM
Benchmark: In-Memory Database System (IMDS) Deployed on NVDIMM Presented by Steve Graves, McObject and Jeff Chang, AgigA Tech Santa Clara, CA 1 The Problem: Memory Latency NON-VOLATILE MEMORY HIERARCHY
More informationStorage. Holger Pirk
Storage Holger Pirk Purpose of this lecture We are going down the rabbit hole now We are crossing the threshold into the system (No more rules without exceptions :-) Understand storage alternatives in-memory
More informationOvercoming System Memory Challenges with Persistent Memory and NVDIMM-P
Overcoming System Memory Challenges with Persistent Memory and NVDIMM-P JEDEC Server Forum 2017 Bill Gervasi, Discobolus Designs Copyright 2017 Jonathan Hinkle, Lenovo Datacenter Research and Technology
More informationMain-Memory Databases 1 / 25
1 / 25 Motivation Hardware trends Huge main memory capacity with complex access characteristics (Caches, NUMA) Many-core CPUs SIMD support in CPUs New CPU features (HTM) Also: Graphic cards, FPGAs, low
More informationCS5460: Operating Systems Lecture 20: File System Reliability
CS5460: Operating Systems Lecture 20: File System Reliability File System Optimizations Modern Historic Technique Disk buffer cache Aggregated disk I/O Prefetching Disk head scheduling Disk interleaving
More informationCLOUD-SCALE FILE SYSTEMS
Data Management in the Cloud CLOUD-SCALE FILE SYSTEMS 92 Google File System (GFS) Designing a file system for the Cloud design assumptions design choices Architecture GFS Master GFS Chunkservers GFS Clients
More information2368 IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 26, NO. 10, OCTOBER 2014
2368 IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 26, NO. 10, OCTOBER 2014 B p -Tree: A Predictive B + -Tree for Reducing Writes on Phase Change Memory Weiwei Hu, Guoliang Li, Member, IEEE,
More informationFile Systems: Consistency Issues
File Systems: Consistency Issues File systems maintain many data structures Free list/bit vector Directories File headers and inode structures res Data blocks File Systems: Consistency Issues All data
More informationExt3/4 file systems. Don Porter CSE 506
Ext3/4 file systems Don Porter CSE 506 Logical Diagram Binary Formats Memory Allocators System Calls Threads User Today s Lecture Kernel RCU File System Networking Sync Memory Management Device Drivers
More informationTopics. File Buffer Cache for Performance. What to Cache? COS 318: Operating Systems. File Performance and Reliability
Topics COS 318: Operating Systems File Performance and Reliability File buffer cache Disk failure and recovery tools Consistent updates Transactions and logging 2 File Buffer Cache for Performance What
More informationDatenbanksysteme II: Modern Hardware. Stefan Sprenger November 23, 2016
Datenbanksysteme II: Modern Hardware Stefan Sprenger November 23, 2016 Content of this Lecture Introduction to Modern Hardware CPUs, Cache Hierarchy Branch Prediction SIMD NUMA Cache-Sensitive Skip List
More informationManaging Non-Volatile Memory in Database Systems
Managing Non-Volatile Memory in Database Systems Alexander van Renen Technische Universität München renen@in.tum.de Thomas Neumann Technische Universität München neumann@in.tum.de Yoshiyasu Doi Fujitsu
More informationBenchmarking Persistent Memory in Computers
Benchmarking Persistent Memory in Computers Testing with MongoDB Presenter: Adam McPadden Co-Authors: Moshik Hershcovitch and Revital Eres August 2017 1 Overview Objective Background System Configuration
More informationEasyChair Preprint. LESS: Logging Exploiting SnapShot
EasyChair Preprint 692 LESS: Logging Exploiting SnapShot Hanseung Sung, Minhwa Jin, Mincheol Shin, Hongchan Roh, Wongi Choi and Sanghyun Park EasyChair preprints are intended for rapid dissemination of
More informationCrescando: Predictable Performance for Unpredictable Workloads
Crescando: Predictable Performance for Unpredictable Workloads G. Alonso, D. Fauser, G. Giannikis, D. Kossmann, J. Meyer, P. Unterbrunner Amadeus S.A. ETH Zurich, Systems Group (Funded by Enterprise Computing
More informationDesign of Flash-Based DBMS: An In-Page Logging Approach
SIGMOD 07 Design of Flash-Based DBMS: An In-Page Logging Approach Sang-Won Lee School of Info & Comm Eng Sungkyunkwan University Suwon,, Korea 440-746 wonlee@ece.skku.ac.kr Bongki Moon Department of Computer
More information