vNFS: Maximizing NFS Performance with Compounds and Vectorized I/O
Ming Chen (Stony Brook University), Dean Hildebrand (IBM Research Almaden),
Henry Nelson (Ward Melville High School), Jasmit Saluja, Ashok Sankar Harihara
Subramony, and Erez Zadok (Stony Brook University)
https://github.com/sbu-fsl/txn-compound
2017, Stony Brook University

Network File System (NFS)
- An IETF-standardized storage protocol
- Provides transparent remote file access
- Needs to overcome high network latency

NFSv4 Compound Procedures
- Without compounds, each operation is a separate RPC and bandwidth is wasted;
  a compound batches many operations into one RPC so bandwidth is utilized.
- Compound Request: READ ~/.bashrc; READ ~/.bash_profile; READ ~/.bash_login
- Compound Reply: ~/.bashrc content; ~/.bash_profile content; ~/.bash_login content

Compounds Are Underused!
- Limited by the synchronous POSIX file-system API
  - Example: reading /home/bob/.bashrc
    stat(2):  (1) PUTROOTFH; LOOKUP "home"; GETFH; GETATTR.
              -> FH (fh1) and attributes of /home.
    stat(2):  (2) PUTFH fh1; LOOKUP "bob"; GETFH; GETATTR.
              -> FH (fh2) and attributes of /home/bob.
    open(2):  (3) PUTFH fh2; OPEN ".bashrc"; GETFH; GETATTR.
              -> FH (fh3) and attributes of ~/.bashrc.
    read(2):  (4) PUTFH fh3; READ 0 4096.
              -> ~/.bashrc file content.
    close(2): (5) PUTFH fh3; CLOSE; GETATTR.
              -> Attributes of ~/.bashrc.
- Five synchronous calls mean five network round trips just to read one small file.

Need a Batching File-System API
- Target: open, read, and close multiple files in one RPC
Vectorized High-level File-system API
- One compound opens, reads, and closes all three files:
  PUTROOTFH; LOOKUP "home"; LOOKUP "bob"; GETFH; GETATTR; SAVEFH;
  OPEN ".bashrc"; READ 0 4096; CLOSE; GETFH; GETATTR; RESTOREFH;
  OPEN ".bash_profile"; READ 0 4096; CLOSE; GETFH; GETATTR; RESTOREFH;
  OPEN ".bash_login"; READ 0 4096; CLOSE; GETFH; GETATTR.
  -> File handles, attributes, and file contents of .bashrc, .bash_profile,
     and .bash_login, all in a single round trip.
vread/vwrite
- Read/write many files
  - Unlike readv/writev(2): many (non-contiguous) offsets of many files
  - Append to a file when the write offset is UINT64_MAX
- Automatic file opening/closing
  - Pass state using the current filehandle and current stateid

struct vio {
    struct vfile vfile;   // [IN]: a file identified by path or descriptor
    uint64_t offset;      // [IN]: offset of read/write; UINT64_MAX means append
    uint64_t length;      // [IN]: bytes to read/write; [OUT]: bytes read/written
    const char *data;     // [IN]: data to write; [OUT]: buffer for read
    uint32_t flags;       // [IN] flags: is_creation, is_write_stable
};                        // [OUT] flags: is_eof, is_write_stable

struct vres vread(struct vio *ios, int n);
struct vres vwrite(struct vio *ios, int n);
vgetattrs/vsetattrs
- vgetattrs: get attributes of many files
- vsetattrs: set attributes of many files
  - Useful for copying file attributes, e.g., in tar and rsync
  - Combines chmod, chown, utimes, and truncate
  - The Linux kernel uses inode_operations->setattr
  - NFSv4 uses the SETATTR operation
vcopy/vsscopy
- Copy many files, partly or entirely
  - Create destination files if necessary
- vcopy:
  READ src1; READ src2  ->  data read from src1 and src2
  WRITE dst1; WRITE dst2  ->  #bytes written to dst1 and dst2
- vsscopy (Server Side Copy in NFSv4.2):
  COPY src1 to dst1; COPY src2 to dst2  ->  #bytes copied from src1 to dst1
  and from src2 to dst2
Other Operations
- vopen/vclose
  - For large files
  - Maintain close-to-open consistency
- vsymlink/vreadlink/vhardlink
  - Example: create a symlink tree, as in cp -sr
- vmkdir/vlistdir
  - Example: create /a, /a/b, and /a/b/c in one call
  - vlistdir: list multiple directories (recursively)
- vremove
- vrename
Architecture
[Figure: vNFS architecture. Applications link against the user-space vNFS
library; NFSv4 files go through the vNFS API to the vNFS client, while
non-NFSv4 files fall back to the POSIX API through the kernel VFS. Both paths
reach the NFS server over sockets and TCP/IP networking.]

Implementation
- NFS-Ganesha
  - An open-source user-space NFS server
  - Has a File-System Abstraction Layer (FSAL) similar to the VFS
- Client side
  - vNFS client based on the NFS-Ganesha Proxy FSAL (NFSv4.1)
  - vNFS library
  - No client-side cache yet
- Server side
  - NFS-Ganesha VFS FSAL
  - Server Side Copy & atomic file appending
- Code
  - C/C++: added 10,632 lines to NFS-Ganesha; deleted 1,407
  - https://github.com/sbu-fsl/txn-compound
Evaluation
- Experimental setup
  - Two six-core machines with 64GB RAM and a 10GbE NIC
  - Running CentOS 7 with a 3.14 kernel
  - Intel S3700 200GB SSD
  - Network latency of 0.2ms
  - Used netem to emulate different networks
  - Baseline: Linux in-kernel NFSv4.1 client
- Benchmarks & application porting
  - Micro-benchmarks
  - GNU Coreutils (cp, ls, rm)
  - Tar/Untar
  - Filebench
  - HTTP/2 server (nghttp2)
GNU Coreutils (cp)
[Figure: time to copy the Linux source tree with cp -r, vNFS vs. the baseline
NFS client; annotations from the original figure: 4 and 2.]
GNU Coreutils (ls, cp, rm)
[Figure: ls, cp, and rm results with vNFS vs. the baseline; annotations from
the original figure: 16, 259, 7, 106, 2.5, 12.]
- Porting effort: cp: +170/-16 lines of C; ls: +392/-203; rm: +21/-1

Compounding Degree
[Figure: performance vs. compounding degree at 0.2ms latency; annotations from
the original figure: 2, 47, 2.1, 16.7, 0.86, 7.6, 1, 2.6. vNFS is comparable
or slightly better at low compounding degrees, and much better at high ones.]
Filebench Workloads
[Figure: Filebench results with vNFS vs. the baseline; annotations from the
original figure: 1.8, 14, 4.4, 5.2, 0.87, 4.8.]
- Porting effort: Filebench: +759/-37 lines of C
HTTP/2
- A vNFS-backed Web server batches reads for HTTP/2 GET and PUSH:
  READ 1.jpg; READ 2.jpg  ->  data of 1.jpg and 2.jpg from the NFS server
- The HTTP/2 server reads and pushes a typical Web page that contains 96
  objects (html, jpg, js, css) totaling around 2MB.
[Figure: HTTP/2 results; annotations from the original figure: 3.5, 9.9.]
- Porting effort: nghttp2: +543/-108 lines of C++

Conclusions
- A set of vectorized file-system APIs that take advantage of NFSv4 compound
  procedures without changing NFS protocols or servers
- Implemented vNFS in user space
- Porting applications was generally easy
- Improved performance by up to 200x
- vNFS made NFS more usable in high-latency networks
Limitations and Future Work
- Client-side caching (appending)
- Transactional compounds
- Parallel processing of operations
Q&A
vNFS: Maximizing NFS Performance with Compounds and Vectorized I/O
Ming Chen, Dean Hildebrand, Henry Nelson, Jasmit Saluja, Ashok Sankar Harihara
Subramony, and Erez Zadok
https://github.com/sbu-fsl/txn-compound
March 2, 2017