Soft Updates Made Simple and Fast on Non-volatile Memory Mingkai Dong, Haibo Chen Institute of Parallel and Distributed Systems, Shanghai Jiao Tong University @ NVMW 18
Non-volatile Memory (NVM) ü Non-volatile ü Byte-addressable ü High throughput and low latency 2
NVM File Systems (NVMFS) Existing NVMFS use journaling or copy-on-write for crash consistency Synchronous cache flushes are necessary Cache flushes are expensive! Other options for crash consistency? A A File System Metadata Journal Area B C C D E E 3
NVM File Systems (NVMFS) Existing NVMFS use journaling or copy-on-write for crash consistency Synchronous cache flushes are necessary Cache flushes are expensive! Other options for crash consistency? A A File System Metadata Journal Area B C C D E E 4
Soft Updates Latest metadata in DRAM Updated in DRAM with dependency tracked ü DRAM performance ü No synchronous disk writes Consistent metadata in disks Persisted to disks with dependency enforced ü Always consistent ü Immediately usable after crash DRAM (Page cache) DISK Traditional Soft Updates 5
Soft Updates Update dependencies E.g., allocating a new data block 1. Allocate in bitmap 2. Fill data in the block 3. Update pointer to the block block bitmap new data block 6
Soft Updates Is Complicated Delayed disk writes Auxiliary structures for each update More complex dependencies block bitmap new data block Figures from Soft Updates: A Technique for Eliminating Most Synchronous Writes in the Fast Filesystem, ATC 99 7
Soft Updates Is Complicated Delayed disk writes Auxiliary structures for each operation More complex dependencies Cyclic dependencies Rolling back/forward Block #4 #5 Inode #6 Inode #7 Block <A, #4> <--, #0> <E, #7> block (in page cache) #4 Rollback #6 #5 #6 #7 block #4 #5 #6 #7 Flush block to disks block #4 #5 #6 #7 Rollforward #6 block #4 #5 #6 #7 8
Soft Updates Is Complicated Delayed disk writes Auxiliary structures for each operation More complex dependencies Cyclic dependencies Rolling back/forward The mismatch between per-pointer-based dependency tracking and block-based interface of traditional disks 9
Soft Updates Meets NVM Soft Updates ü No synchronous cache flushes ü Immediately usable after crash NVM: byte-addressable and fast ü ect write to NVM without delays ü No false sharing => no rolling back/forward ü Simple dependency tracking/enforcement 10
SoupFS A simple and fast NVMFS derived from soft updates Hashtable-based directories No false sharing Pointer-based dual views No synchronous cache flushes Semantic-aware dependency tracking/enforcement Simple dependency tracking/enforcement Get the best of both Soft Updates and NVM 11
Overview Background Design & Implementation Hashtable-based directories Pointer-based dual views Semantic-aware dependency tracking/enforcement Evaluation Conclusion 12
Overview Background Design & Implementation Hashtable-based directories Pointer-based dual views Semantic-aware dependency tracking/enforcement Evaluation Conclusion 13
Block-based ectories Block-based file systems usually use block-based directories False sharing Cyclic dependency Rolling back/forward Slow access Linear scan indirect block 1.TxT 32.TxT 38 fs-long-lon g.exe 512 2 l+f.dir 12 14
Hashtable-based ectories Optimized for cache lines ü No false sharing Buckets 0 1 2 3 4 ü No cyclic dependency Efficient access ü No linear scan Filename Pointer Pointer Latest Consistent Hash Len Filename 15
Overview Background Design & Implementation ü Hashtable-based directories Pointer-based dual views Semantic-aware dependency tracking/enforcement Evaluation Conclusion 16
Dual Views Latest view in page cache Consistent view in disks DRAM (Page cache) Dual views Eliminate synchronous writes Provide usability after crash DISK Traditional Soft Updates 17
Dual Views Latest view in page cache Consistent view in disks NVM Latest view? DRAM (Page cache) Another copy of metadata in DRAM Double writes Double storage overhead Unnecessary synchronizations DISK NVM Challenge: How to present latest view efficiently? Soft Updates on NVM 18
Pointer-based Dual Views Reuse data structures in both views Distinguish views by different pointers/structures DRAM NVM Soft Updates on NVM 19
Pointer-based Dual Views Reuse data structures in both views Distinguish views by different pointers/structures Data Structures In Consistent View In Latest View SoupFS VFS dentry consistent next pointer latest next pointer hash table bucket latest bucket if exists B-tree root/height in SoupFS root/height in VFS 20
Pointer-based Dual Views Buckets 0 1 2 3 4 Volatile in DRAM Updates to NVM w/o persistence guarantee Persisted in NVM VFS D Filename Latest Consistent C B A Latest View Consistent View 21
Pointer-based Dual Views Buckets 0 1 2 3 4 Volatile in DRAM Updates to NVM w/o persistence guarantee Persisted in NVM VFS D Filename Latest Consistent C B A Latest View Ø create E Consistent View File E 22
Pointer-based Dual Views Buckets 0 1 2 3 4 Volatile in DRAM Updates to NVM w/o persistence guarantee Persisted in NVM VFS D Filename Latest Consistent C B A Latest View Ø create E Latest Buckets File E Consistent View 23
Pointer-based Dual Views Buckets 0 1 2 3 4 Volatile in DRAM Updates to NVM w/o persistence guarantee Persisted in NVM VFS D Filename Latest Consistent C B A E Latest View Ø create E Latest Buckets File E Consistent View 24
Pointer-based Dual Views Buckets 0 1 2 3 4 Volatile in DRAM Updates to NVM w/o persistence guarantee Persisted in NVM VFS D Filename Latest Consistent C B A E Latest View Ø create E Latest Buckets 3 Consistent View File E 25
Pointer-based Dual Views Buckets 0 1 2 3 4 Volatile in DRAM Updates to NVM w/o persistence guarantee Persisted in NVM VFS D Filename Latest Consistent C B A E Latest View create E Latest Buckets 3 Consistent View File E 26
Pointer-based Dual Views Buckets 0 1 2 3 4 Volatile in DRAM Updates to NVM w/o persistence guarantee Persisted in NVM VFS D Filename Latest Consistent C B A E Latest View create E Ø unlink B Latest Buckets 3 Consistent View File E 27
Pointer-based Dual Views Buckets 0 1 2 3 4 Volatile in DRAM Updates to NVM w/o persistence guarantee Persisted in NVM VFS D Filename Latest Consistent C B A E Latest View create E Ø unlink B Latest Buckets 3 Consistent View File E 28
Pointer-based Dual Views Buckets 0 1 2 3 4 Volatile in DRAM Updates to NVM w/o persistence guarantee Persisted in NVM VFS D Filename Latest Consistent C B A E Latest View create E unlink B Latest Buckets 3 Consistent View File E 29
Pointer-based Dual Views Buckets 0 1 2 3 4 Volatile in DRAM Updates to NVM w/o persistence guarantee Persisted in NVM VFS D Filename Latest Consistent C B A E Latest View create E unlink B Latest Buckets 3 Consistent View File E 30
Pointer-based Dual Views Buckets 0 1 2 3 4 Volatile in DRAM Updates to NVM w/o persistence guarantee Persisted in NVM VFS D Filename Latest Consistent C B A E Latest View create E unlink B Latest Buckets 3 Consistent View File E 31
Pointer-based Dual Views Buckets 0 1 2 3 4 Volatile in DRAM Updates to NVM w/o persistence guarantee Persisted in NVM VFS D Filename Latest Consistent C B A E Latest View create E unlink B Latest Buckets 3 Consistent View File E 32
Pointer-based Dual Views Buckets 0 1 2 3 4 Volatile in DRAM Updates to NVM w/o persistence guarantee Persisted in NVM VFS D Filename Latest Consistent C B A E Latest View create E unlink B Latest Buckets 3 Consistent View File E 33
Pointer-based Dual Views Buckets 0 1 2 3 4 Volatile in DRAM Updates to NVM w/o persistence guarantee Persisted in NVM VFS D Filename Latest Consistent C B A E Latest View create E unlink B Latest Buckets 3 Consistent View File E File E 34
Pointer-based Dual Views Buckets 0 1 2 3 4 Volatile in DRAM Updates to NVM w/o persistence guarantee Persisted in NVM VFS D Filename Latest Consistent C B A E Latest View create E unlink B Latest Buckets 3 Consistent View File E File E 35
Pointer-based Dual Views Buckets 0 1 2 3 4 Volatile in DRAM Updates to NVM w/o persistence guarantee Persisted in NVM VFS D Filename Latest Consistent C B A E Latest View create E unlink B Latest Buckets 3 Consistent View File E File E 36
Pointer-based Dual Views Buckets 0 1 2 3 4 Volatile in DRAM Updates to NVM w/o persistence guarantee Persisted in NVM VFS D Filename Latest Consistent C A E Latest View create E unlink B Latest Buckets 3 File A File C File D Consistent View File E File A File C File D File E 37
Pointer-based Dual Views Reuse data structures in both views Distinguish views by different pointers/structures DRAM ü Eliminate synchronous writes ü Provide usability after crash NVM ü No double write ü Little space overhead Soft Updates on NVM 38
Overview Background Design & Implementation ü Hashtable-based directories ü Pointer-based dual views Semantic-aware dependency tracking/enforcement Evaluation Conclusion 39
Dependency Tracking Auxiliary structures for each updates 40
Dependency Tracking Auxiliary structures for each updates The semantic gap between the page cache (where enforcement happens) and the file system (where tracking happens) After removing page cache, SoupFS involves semantics in dependency tracking/enforcement 41
Semantic-aware Dependency Tracking Track semantic operations with complementary information Enough for dependency enforcement Operation Type diradd dirrem sizechg attrchg Complementary Information (pointers/integers) added dentry, source directory, overwritten removed dentry, destination directory the old and new file size nothing Information tagged with is for rename operation. 42
Semantic-aware Dependency Tracking Track semantic operations with complementary information Enough for dependency enforcement Operations are stored in operation list of each VFS dirty list VFS VFS VFS list next operation list list next operation list list next operation list list next operation type Complimentary information 43
Semantic-aware Dependency Enforcement Persister daemons traverse the dirty list in background persist each operation from the latest view to the consistent view with respect to update dependencies dirty list VFS VFS VFS list next operation list list next operation list list next operation list list next operation type Complimentary information 44
Overview Background Design & Implementation ü Hashtable-based directories ü Pointer-based dual views ü Semantic-aware dependency tracking/enforcement Evaluation Conclusion 45
Evaluation Setup Platform Intel Xeon E5 server with two 8-core processors 48 GB DRAM and 64 GB NVDIMM File Systems SoupFS, PMFS, NOVA, Ext4-DAX, Ext4 NVM Write Delay Simulation ndelay() after clflush Benchmarks Micro-benchmarks: 100 iterations of 10 4 create/unlink/mkdir/rmdir Filebench and Postmark 46
Micro-benchmark Latency Latency (us/op) 20 15 10 5 Inefficient Organization 57.5 57.7 Ext4 Ext4-DAX PMFS NOVA SoupFS CDF 1 0.8 0.6 0.4 0.2 EXT4 Ext4-DAX PMFS NOVA SoupFS 0 create unlink mkdir rmdir 0 0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 Latency (us) Lowest Latency 47
Sensitivity to NVM Write Delay Latency(us) 70 65 60 55 15 PMFS Create Latency (us) 20 16 12 PMFS NOVA SoupFS Unlink ~250% 10 5 NOVA SoupFS ~200% 8 4 0 0 200 400 600 800 Delay (ns) 0 No effect 0 200 400 600 800 Delay (ns) 48
Postmark & Filebench Throughput (MB/s) 350 300 250 200 150 100 50 Ext4 Ext4-DAX PMFS NOVA SoupFS Postmark ~50% Throughput (x1000 ops/s) 1200 1000 800 600 400 200 Fileserver-1K Ext4 Ext4-DAX PMFS NOVA SoupFS 0 Read Write 0 1 2 3 4 5 6 7 8 9 10 1112 Threads 19 20 16 1718 13 1415 49
Overview Background Design & Implementation ü Hashtable-based directories ü Pointer-based dual views ü Semantic-aware dependency tracking/enforcement Evaluation Conclusion 50
Conclusion Soft updates is complicated due to the mismatch between per-pointer-based dependency tracking and block-based interface of traditional disks We design and implement SoupFS ü Hashtable-based directories ü Pointer-based dual views ü Semantic-aware dependency tracking/enforcement Soft updates can be made simple and fast on NVM Thanks & Questions? ;-) 51
52