Non-Blocking Writes to Files Daniel Campello, Hector Lopez, Luis Useche 1, Ricardo Koller 2, and Raju Rangaswami 1 Google, Inc. 2 IBM TJ Watson
Memory Memory Synchrony vs Asynchrony Applications have different alternatives to persist data Synchronous Writes Process Asynchronous Writes Process Kernel Thread Page Cache Page Cache Backend Storage Backend Storage 2
Introduction Memory access granularity is smaller than disk s Partial writes to pages not present in the cache require page fetch. 5. Return Page Cache Process 1. Write (x) 2. Miss OS 0 1 0 1 1 1 0 0 0 Why wait for data that the application doesn t need? 3. Issue Fetch 4. Complete Backing Store 3
% of Write Operations Motivation: Breakdown of write operations 100% 90% 80% 70% 60% 50% 40% 30% 20% 10% 0% Types of Page Writes (%) No overwrite; write is to a new page Overwrite; multiple of 4 KB aligned Overwrite; >= 4 KB unaligned Overwrite; < 4 KB On an average, 63.12% of the writes involved partial page overwrites. Depending of page cache size, these overwrites could result in varying degrees of page fetches. 4
Partial Over-Writes to Out-of-core Page (%) Motivation: LRU Cache Simulation 100.00% 90.00% 80.00% 70.00% 60.00% 50.00% 40.00% 30.00% 20.00% 10.00% ug-filesrv gsf-filesrv moodle backup usr1 usr2 facebook twitter 0.00% 0.001 0.1 10 1000 100000 Page cache size (MB) 5
Non-blocking Writes: Asynchronous Fetch Patch 7. Merge Page Cache Process 4. Buffer 1. Write (x) 2. Miss OS 0 1 0 1 1 1 0 0 0 5. Return 3. Issue Fetch 6. Complete Backing Store Benefits: 1. Application execution time reduction 2. Increased backing store bandwidth usage 6
Non-blocking Writes: Asynchronous Fetch Patch 7. Merge Page Cache Process 4. Buffer 1. Write (x) 2. Miss OS 0 1 0 1 1 1 0 0 0 5. Return 3. Issue Fetch 6. Complete Backing Store Why issue a page fetch for data that the application doesn t need? 7
Non-blocking Writes: Lazy Fetch 5. Read (x) Patch 9. Merge Page Cache Process 1. Write (x) OS 3. Buffer 2. Miss 0 1 0 1 1 1 0 0 0 4. Return 10. Return 7. Issue page fetch 6. Miss 8. Complete Backing Store Benefits and Drawbacks: 1. Allocating page memory and fetching only if necessary 2. Resource utilization is unpredictable and can be bursty 8
Managing Page Fetches Page Fetch Policies Asynchronous Lazy Page Fetch Mechanisms Foreground Background 9
Memory Memory Foreground vs Background Mechanisms Metadata misses generate extra blocking fetches Foreground Process Background Process NBW Thread Page Cache Page Cache Backend Storage Backend Storage 10
Evaluation Specifics Implementation We found this problem in other OSes (FreeBSD, Xen) We implemented non-blocking writes for files in the Linux kernel by modifying the generic virtual file system (VFS) layer Workloads Filebench micro-benchmark SPECsfs2008 benchmark Mobibench system call trace-replay 11
Operations / second Operations / second Filebench evaluation: Writes and Read-Writes 9000 8000 7000 6000 5000 4000 3000 200 2000 1000 0 Random Writes 128 256 512 1024 2048 4096 Size of operation (bytes) 8000 6000 4000 2000 0 128 256 512 1024 2048 4096 BW Size of operation (bytes) NBW-Async-FG NBW-Async-BG NBW-Lazy Random Read-Write NBW-Async-FG: 50-60% performance improvement for Random Writes 12
Operations / second Operations / second Filebench evaluation: Writes and Read-Writes 9000 8000 7000 6000 5000 4000 3000 2000 1000 0 Random Writes 128 256 512 1024 2048 4096 Size of operation (bytes) 8000 Random Read-Write 6000 4000 2000 0 128 256 512 1024 2048 4096 BW Size of operation (bytes) NBW-Async-FG NBW-Async-BG NBW-Lazy NBW-Async-FG: 50-60% performance improvement for Random Writes NBW-Async-BG: 6.7x-30x (Random Writes) and 3.4x-20x (Random RW) NBW-Lazy: Up to 45x performance improvement 13
Operations / second Filebench evaluation: Reads 160 140 120 Random Reads 100 80 60 40 20 0 128 256 512 1024 2048 4096 Size of operation (bytes) BW NBW-Async-FG NBW-Async-BG NBW-Lazy Non-blocking writes do not affect the performance of read-only workloads 14
Normalized Latency SPECsfs2008 evaluation SPECsfs2008 100 80 60 40 20 BW NBW-Async-FG NBW-Async-BG NBW-Lazy 0 Write (10%) Read (18%) Others (72%) Overall (100%) Around 70% write latency decrease NBW-Lazy read latency slightly affected by delayed fetching Overall latency is in direct relation to write and read latency 15
Normalized Latency Normalized Latency Mobibench System-call Trace Replay Mobibench HDD Mobibench SSD 120 120 100 100 80 80 60 60 40 40 20 20 0 Facebook Twitter BW 0 Facebook Twitter NBW-Async-FG NBW-Async-BG NBW-Lazy Average operation latency decreased by 20-40% 16
Related Work Huge body of work on optimizing writes performance O_NONBLOCK flag for OPEN system call If no data can be read or written, returns EAGAIN (or EWOULDBLOCK) Requires application modification Asynchronous I/O Library (POSIX AIO) Multiple threads to perform I/O operations (expensive and scales poorly) Requires application modification Non-blocking writes do not require application modification 17
Conclusions Operating systems block processes on asynchronous writes in many cases! With non-blocking writes, we demonstrate that such blocking is unnecessary and detrimental to performance General solution for any kind of modern OS Does not require application modification Evaluation demonstrated how non-blocking writes effectively reduces write latency across a wide range of workloads 18
Thank You! The traces used in this paper are available at http://sylab-srv.cs.fiu.edu/dokuwiki/ doku.php?id=projects:nbw:start
Page Fetch Asynchrony Write P Read P Blocking write Non-blocking write Write P Read P Time Waiting I/O: Thinking: 20
Page Fetch Parallelism Blocking write Write P Write Q Read P Write P Read P Non-blocking write Time Write Q Blocking I/O: Background I/O: 21
Motivation: Production workloads Production system trace descriptions Workload ug-filserv gsf-filesrv moodle backup usr1 usr2 facebook twitter Description Undergrad NFS/CIFS fileserver Grad/Staff/Faculty NFS/CIFS fileserver Web & DB server for department CMS Nightly backups of department servers Researcher 1 desktop Researcher 2 desktop Mobibench facebook trace Mobibench twitter trace 22
Highlights On Correctness Ordering of page updates Handling of disk errors Journaling File Systems OS-initiated page accesses Page persistence and syncs Handling read-write dependencies Multi-core and kernel preemption 23