An efficient method to avoid path lookup in file access auditing in IO path to improve file system IO performance

Similar documents
FILE SYSTEMS, PART 2. CS124 Operating Systems Fall , Lecture 24

SNAPSHOTS FOR IBRIX A HIGHLY DISTRIBUTED SEGMENTED PARALLEL FS

CSE506: Operating Systems CSE 506: Operating Systems

The ability to reliably determine file timestamps on modern filesystems using copy-on-write

W4118 Operating Systems. Instructor: Junfeng Yang

FILE SYSTEMS, PART 2. CS124 Operating Systems Winter , Lecture 24

LANGUAGE RUNTIME NON-VOLATILE RAM AWARE SWAPPING

ABSTRACT 2. BACKGROUND

File System Implementation

System that permanently stores data Usually layered on top of a lower-level physical storage medium Divided into logical units called files

File System Case Studies. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

Status of the Linux NFS client

Idea of Metadata Writeback Cache for Lustre

Ext3/4 file systems. Don Porter CSE 506

Operating Systems Design Exam 2 Review: Spring 2011

CS 416: Opera-ng Systems Design March 23, 2012

Chapter 11: File-System Interface

File System Code Walkthrough

What is a file system

ò Very reliable, best-of-breed traditional file system design ò Much like the JOS file system you are building now

Fall 2017 :: CSE 306. File Systems Basics. Nima Honarmand

Keeping Up With The Linux Kernel. Marc Dionne AFS and Kerberos Workshop Pittsburgh

Operating Systems. Week 13 Recitation: Exam 3 Preview Review of Exam 3, Spring Paul Krzyzanowski. Rutgers University.

CS 416: Operating Systems Design April 22, 2015

ECE 598 Advanced Operating Systems Lecture 19

CS444 3/08/05. Linux Virtual File System

CS 4284 Systems Capstone

Virtual File System. Don Porter CSE 306

Filesystem. Disclaimer: some slides are adopted from book authors slides with permission

OPERATING SYSTEM. Chapter 12: File System Implementation

File System Internals. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

The Tux3 File System

File System Internals. Jo, Heeseung

Chapter 10: File System. Operating System Concepts 9 th Edition

Chapter 11: Implementing File

Chapter 11: Implementing File Systems. Operating System Concepts 9 9h Edition

Chapter 11: File-System Interface

Synchronizing Media Content In A Shared Virtual Reality Environment

CSE506: Operating Systems CSE 506: Operating Systems

ECE 598 Advanced Operating Systems Lecture 18

File Systems. What do we need to know?

NTFS Recoverability. CS 537 Lecture 17 NTFS internals. NTFS On-Disk Structure

Chapter 11: File-System Interface. Operating System Concepts 9 th Edition

CS-537: Final Exam (Fall 2013) The Worst Case

3/26/2014. Contents. Concepts (1) Disk: Device that stores information (files) Many files x many users: OS management

CSE325 Principles of Operating Systems. File Systems. David P. Duggan. March 21, 2013

SMD149 - Operating Systems - File systems

ECE 598 Advanced Operating Systems Lecture 14

File-System Interface. File Structure. File Concept. File Concept Access Methods Directory Structure File-System Mounting File Sharing Protection

Operating Systems. Operating Systems Professor Sina Meraji U of T

VFS, Continued. Don Porter CSE 506

RCU. ò Walk through two system calls in some detail. ò Open and read. ò Too much code to cover all FS system calls. ò 3 Cases for a dentry:

EI 338: Computer Systems Engineering (Operating Systems & Computer Architecture)

Logical disks. Bach 2.2.1

SPIN Operating System

File Systems. ECE 650 Systems Programming & Engineering Duke University, Spring 2018

Chapter 7: File-System

Chapter 12: File System Implementation

Computer Systems Laboratory Sungkyunkwan University

File System Implementation. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

OCFS2: Evolution from OCFS. Mark Fasheh Senior Software Developer Oracle Corporation

File Systems: Allocation Issues, Naming, and Performance CS 111. Operating Systems Peter Reiher

Directory. File. Chunk. Disk

Provenance Tracking System (PTS)

STORAGE AND SELECTION OF CELL MARKERS

CS3600 SYSTEMS AND NETWORKS

Distributed File Systems. Distributed Systems IT332

Notification Features For Digital Content In A Mobile-Optimized Format

<Insert Picture Here> Btrfs Filesystem

File System Performance (and Abstractions) Kevin Webb Swarthmore College April 5, 2018

CS370 Operating Systems

Basic filesystem concepts. Tuesday, November 22, 2011

Navigation of blogs related by a tag

Tux3 linux filesystem project

Soft Updates Made Simple and Fast on Non-volatile Memory

Filesystem. Disclaimer: some slides are adopted from book authors slides with permission 1

Files and the Filesystems. Linux Files

Chapter 11: Implementing File Systems

SCAM Portfolio Scalability

F 4. Both the directory structure and the files reside on disk Backups of these two structures are kept on tapes

FILE SYSTEMS. CS124 Operating Systems Winter , Lecture 23

ECE 650 Systems Programming & Engineering. Spring 2018

OCFS2 Mark Fasheh Oracle

File System Implementation

File System Case Studies. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

NFS Version 4 Open Source Project. William A.(Andy) Adamson Center for Information Technology Integration University of Michigan

File System (FS) Highlights

Chapter 10: File System Implementation

Chapter 12: File System Implementation

Chapter 11: Implementing File Systems

EI 338: Computer Systems Engineering (Operating Systems & Computer Architecture)

Lecture 19: File System Implementation. Mythili Vutukuru IIT Bombay

File System: Interface and Implmentation

Implementation of a Tag-Based File System

Chapter 11: File System Implementation

BTREE FILE SYSTEM (BTRFS)

Advanced Intrusion Detection

Various Versioning & Snapshot FileSystems

VIRTUAL FILE SYSTEM AND FILE SYSTEM CONCEPTS Operating Systems Design Euiseong Seo

Operating System Concepts Ch. 11: File System Implementation

Transcription:

Technical Disclosure Commons Defensive Publications Series March 21, 2017 An efficient method to avoid path lookup in file access auditing in IO path to improve file system IO performance Arun Vishnu P K Hewlett Packard Enterprise Ramesh Kannan K Hewlett Packard Enterprise Rajkumar Kannan Hewlett Packard Enterprise Follow this and additional works at: http://www.tdcommons.org/dpubs_series Recommended Citation P K, Arun Vishnu; K, Ramesh Kannan; and Kannan, Rajkumar, "An efficient method to avoid path lookup in file access auditing in IO path to improve file system IO performance", Technical Disclosure Commons, (March 21, 2017) This work is licensed under a Creative Commons Attribution 4.0 License. This Article is brought to you for free and open access by Technical Disclosure Commons. It has been accepted for inclusion in Defensive Publications Series by an authorized administrator of Technical Disclosure Commons.

P K et al.: An efficient method to avoid path lookup in file access auditing An efficient method to avoid path lookup in file access auditing in IO path to improve file system IO performance Abstract: One of the biggest challenges in metadata management schemes that sit outside the filesystem layer is their ability to index meaningful path information of files that are being referenced in an external system like a database or in a metadata journal file. Path to a file is a critical requirement that allows both meaningful interpretation of the locality of the file and its metadata and also secondly allows for more efficient user mode services that can transform the file or its metadata. Additionally path information is very essential in compliance systems where audit logs need to tell what happened to a file and where it is located. However when the data path is being audited from layers such as protocols, it becomes harder to reconstruct the entire path information for all the files given that the protocol layers do not directly integrate with the underlying Filesystem. The protocol layers would then need to rely on system cache to get the path data and sometimes this may not be possible making it required for the protocol to actually do an expensive reverse path walk, reconstructing the path. This actually heavily degrades the performance of the system. In this paper we discuss a mechanism that allows us to record enough information about the file using the unique ID of itself and its parent in the protocol layer such that if and when required the path information can be reconstructed based on a reliable reverse lookup in a database or a file based journal system. The idea is to have enough information to reconstruct the path at a later time and outside the system where the information was initially originated from. The paper also talks of keeping this system consistent under all conditions. DESCRIPTION OF INVENTION: Problems Solved: File access auditing (FAA) is an essential requirement for high secure data centers and also for organizations which conform to various regulatory compliance requirements. Since FAA happens in the IO path for every IO request, it is a critical that the auditing framework takes less resources to log audit information. In general an audit log contains information like client details, operation details, operation status and the file object details. One of the crucial information in audit record is the absolute path of the file object in order to uniquely identify the file object on which the IO operation is performed. Finding the absolute path needs traversing the file path from the location of file being audited till the root of the filesystem/mount point. In the typical Linux kernel, an object name dentry carries the name information and each dentry maintains a pointer to its parent dentry. A module which needs to form the path has to lock the dentry and then traverse to the parent dentry and do complex string operation to form a complete file path. The invention proposes an efficient approach where the audit framework will never do a path lookup for logging audit record but it logs enough information sufficient for a post-processing tool to build the path Published by Technical Disclosure Commons, 2017 2

Defensive Publications Series, Art. 431 [2017] when needed. The audit information can be plumped to a database to make the post-processing efficient. Prior Solutions: Description: The below block diagram explains the different components involved in file access auditing and the audit event flow. The diagram covers both NFS and SMB protocols. Third Party Audit client HP SMB Service Messaging Service Server Audit Library Audit Daemon Internal Audit client User Mode Kernel mode ADE FS Get POID Get Audit status Set Audit Status NFSD (Kernel Mode) NetLink Protocol Kernel API for Auditing Further in the disclosure we selected NFS protocol auditing on the ADE file system to explain the proposed method. We recommend to turn on auditing for the desired shares before any operation. But enabling auditing in on a shares which is currently under usage is handled. Along all mandatory information like, client, user, operation and start, we record three more items. 1) Object name 2) Object ID 3) Parent ID 3

P K et al.: An efficient method to avoid path lookup in file access auditing Object name is the name of the object being audited. It can be name of a file or a directory. Object ID is a unique ID which represents the object. The parent ID is the unique ID of the parent directory where the current object located. The Object ID will be provided by the ADE fs which is known as the permanent object ID (POID). The POID will be a unique identifier of the file/directory throughout its life and will not change even during rename operation. POID is 128 bit long and will not be reused in the life span of the filesystem. In the context of auditing a Linux kernel based NFS protocol, a dentry is associated with every operation. The dentry carries name, inode and parent inode as standard and makes our approach much more efficient. All create operations are recoded by default to keep the audit logs self-sufficient for any postprocessing. Below an example where a share fs/vfs/fstore/nfs1 is being audited. When the share get mounted on a client we log the share-root information nfs1 10001 10000 Created directory dir1 Dir1 10002 10001 Created file foo.txt inside dir1 Foo.txt 10003 10002 Created file foot.txt inside root of the share Foo.txt 10004 10001 Here we have two objects named foo.txt is available but with unique POID and parent POID. Published by Technical Disclosure Commons, 2017 4

Defensive Publications Series, Art. 431 [2017] Path Formation: The audit events generated from the protocol server will be transferred to audit client, where the post process happens. The audit client will form the file path on reception of an audit event. IN general audit clients are dedicated servers which process and maintain the audit logs for future references. Referring the above example, when audit client receives event Foo.txt 10003 10002 It starts forming the path. First it takes the Object name from current event, which is Foo.txt. Now the client will do a look up on the past logs to resolve the name for the parent POID, which is 10002. The past log Dir1 10002 10001 Will provide you the name for 10002, which is Dir1 and path become Dir/Foo.txt. But the lookup will not stop as this event has a valid parent POID which is 10001. The process is repeated to figure the name for 10001 and path become nfs1/dir1/foo.txt, which is a complete path from share root. Handle rename: Rename is another challenge with a file path management. As per the above explained mechanism, the POID which represents a file will remain same even when the file is renamed. In order to make the audit information self-sufficient for future auditing, we will log old and new name on a successful rename operation. But directory rename is much complicated as it affects the entire files and directory inside the directory being renamed. The FAA modules does not do any extra logging on a directory rename operation. But the audit client need to do address by a runtime name look up to locate all past logs affected by the rename and update the path against them. The look will be based on the POID of directory got renamed. Enabling auditing on share with data: The best practice is always to enable the auditing before starting any operation on the share. But this is not guaranteed. When auditing get enabled on share which already has a data, simply logging POID and parent POID will not be sufficient to build the complete path as we might not logged all directories involved in forming the path. In order to handle this we maintain a flag on the inode. This flag be initialized as zero on newly allocated inode and will be set to one when we log the object. Also we traverse the parent s parent and so on in case the flag is not set. Below flow chart explain the control flow: 5

P K et al.: An efficient method to avoid path lookup in file access auditing Start auditing current operation Auditing Enabled Yes NO Audit operation with object name and POID Is Parent dirctory audited earlier Yes End of operation audit NO This loop will continue until we reach root of the share or until we hit a directory which is already audited Audit the parent directory name and POID Mark the parent directory as audited Advantages: 1) Audit logs generated are self-sufficient to create the complete path. Audit clients need not apply any ID to path conversion logic to resolve the path. Also audit clients will not contact ADE fs to resolve a POID to path conversion. 2) Delete and rename operations are handled without any special workflow. 3) Performance impact on protocol IO is minimal. 4) Since the path is not logged in audit log, constructing the path in every IO is completely avoided thus the impact on IO due to audit logging is reduced to a great extent. Published by Technical Disclosure Commons, 2017 6

Defensive Publications Series, Art. 431 [2017] 5) Not logging the complete path also avoids additional IO (to write the audit log record) and reduces the audit log size. Disclosed by P K Arun Vishnu, Ramesh Kannan K and Kannan Rajkumar, Hewlett Packard Enterprise 7