File Management Marc's first try. Please don't sue me.
Introduction Files Long-term existence Can be temporally decoupled from applications Sharable between processes Can be structured to the task Can be viewed in various logical manners Can have permissions for individuals or groups Can be manipulated in a variety of ways
File Manipulation Operations Create Delete Open Close Read (all or a portion) Write (append or update)
Internal File Structure Byte (most UNIX) Field Record File Database
Internal File Structure (cont) Field: Basic logical element of data Characterized by length and data type ASCII string, decimal, integer, etc. Fixed or variable length With variable length, may have subfields Length may be indicated by demarcation
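The slide names two ways to mark where a variable-length field ends: an explicit length, or demarcation by a delimiter. A minimal sketch of both follows. The 2-byte length prefix, the "|" delimiter, and all function names are assumptions made for illustration, not part of the slides.

```python
import struct

def encode_length_prefixed(fields):
    """Store each field as a 2-byte big-endian length followed by its bytes."""
    out = bytearray()
    for field in fields:
        data = field.encode("utf-8")
        out += struct.pack(">H", len(data)) + data
    return bytes(out)

def decode_length_prefixed(buf):
    """Walk the buffer, reading a length and then that many bytes."""
    fields, pos = [], 0
    while pos < len(buf):
        (length,) = struct.unpack_from(">H", buf, pos)
        pos += 2
        fields.append(buf[pos:pos + length].decode("utf-8"))
        pos += length
    return fields

def encode_delimited(fields, delimiter="|"):
    # Assumes the delimiter never occurs inside a field value.
    return delimiter.join(fields).encode("utf-8")

def decode_delimited(buf, delimiter="|"):
    return buf.decode("utf-8").split(delimiter)
```

The length prefix costs two bytes per field but allows any byte in the value. The delimiter is cheaper but needs an escaping rule as soon as values may contain it.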
Internal File Structure (cont) Record: A collection of related fields Can be treated as a unit by app or user Can be fixed or variable length If # of fields is variable, each has a name Entire record usually has a length
Internal File Structure (cont) File: A collection of similar records Treated as a single entity Can be referenced by name Access control restrictions implemented Sometimes enforced at the record or field level
Internal File Structure (cont) Database: Collection of related data (many files) Various explicit relationships between data Usually managed by a DBMS Not usually built-in to an OS
File Access Operations Operating primarily on records, but abstraction can be applied to just bytes: Retrieve_All Read all records into memory in sequence Retrieve_One Usually associated with interactive, transaction-oriented applications
File Access Operations (cont) Retrieve_Next/Previous Retrieve next record in some predefined logical sequence. Often associated with search Insert_One May involve random access, or appending Delete_One Certain linkages or other data structures may require updating to preserve sequencing
File Access Operations (cont) Update_One One-two punch: Retrieve a record, update one or more fields, then rewrite the updated record back into the file. With variable-length fields/records, may require much more data structure manipulation. Retrieve_Few Get some specified number of records Usually used in databases when selecting on certain criteria
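For fixed-length records, Update_One reduces to a seek, a read, and a rewrite in place. The sketch below assumes a hypothetical 20-byte record layout (4-byte key, 16-byte name) and uses an in-memory buffer in place of a real file. The function names are illustrative only.

```python
import io
import struct

# Hypothetical fixed-length layout: 4-byte key, 16-byte name (null padded).
RECORD = struct.Struct("<I16s")

def write_record(f, index, key, name):
    f.seek(index * RECORD.size)
    f.write(RECORD.pack(key, name.encode("ascii")))

def read_record(f, index):
    f.seek(index * RECORD.size)
    key, raw = RECORD.unpack(f.read(RECORD.size))
    return key, raw.rstrip(b"\x00").decode("ascii")

def update_one(f, index, new_name):
    """Retrieve the record, change one field, rewrite it at the same offset."""
    key, _ = read_record(f, index)
    write_record(f, index, key, new_name)

f = io.BytesIO()  # stands in for a file opened in binary read/write mode
for i, (key, name) in enumerate([(1, "alice"), (2, "bob"), (3, "carol")]):
    write_record(f, i, key, name)
update_one(f, 1, "robert")
```

Because every record has the same size, the update never moves its neighbours. With variable-length records, a longer replacement would not fit in the old slot, which is the extra data structure work the slide warns about.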
File Management Systems Meet data management requirements of user Guarantee, whenever possible, that file data are valid Optimize performance (both throughput and response time) Provide I/O support for various storage devices Minimize or eliminate the potential for lost or destroyed data Provide a standardized set of I/O interface routines to user processes Provide I/O support for multiple users
File System Architecture Device drivers Responsible for starting and completing I/O requests to various peripheral devices Basic file system (physical I/O level in OS) Deals with interchange of blocks of data Does not understand content Basic I/O supervisor (part of OS) Maintains control structures for device I/O, scheduling, and file status. Logical I/O General-purpose facility for accessing records Maintains basic data about files (indices, etc)
File Organization and Access Several, sometimes conflicting criteria for organization of files: Short access time Ease of update Economy of storage Simple maintenance Reliability Conflict: economy of storage vs. redundancy Redundancy increases access speed and reliability, but also increases storage requirements
Common File Organizations Pile Data are collected in the order in which they arrive Each record consists of one burst of data Records may have a wildly varying assortment of fields and field lengths Each field must be self-describing Record access is by exhaustive search. When you don't know what you'll get, this uses space well and is easy to update
Common File Organizations (cont) Sequential File Fixed format used for records Length and position of each field known, requiring that only values of fields must be stored First field of every record is key field, records then stored in key sequence (can have variations) NOT good for interactive applications with individual record queries or updates Inserting records is also inefficient, requiring periodic batch merges Can be implemented by organizing file physically as linked list
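The periodic batch merge mentioned above can be sketched as a merge of two key-ordered streams: the existing master file and a sorted batch of pending insertions. The sample records and the name batch_merge are hypothetical; the key is assumed to be the first field of each record.

```python
import heapq

def batch_merge(master, pending):
    """Merge a key-ordered master file with a batch of pending insertions."""
    pending_sorted = sorted(pending, key=lambda rec: rec[0])
    # heapq.merge reads both inputs sequentially, as a tape or disk pass would.
    return list(heapq.merge(master, pending_sorted, key=lambda rec: rec[0]))

master = [(10, "Ann"), (30, "Cy"), (50, "Eve")]
pending = [(40, "Dee"), (20, "Bob")]   # arrived out of key order
new_master = batch_merge(master, pending)
```

Each pass rewrites the whole file once, which is why insertions are collected and applied in batches rather than one at a time.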
Common File Organizations (cont) Indexed Sequential File Uses an index to support random access Requires an overflow file to handle additions Index uses same key as main file and has a pointer into the file, which greatly improves search time Can have multilevel indices to get blazing fast speed
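A rough sketch of an indexed sequential lookup: a small index holds every Nth key with a pointer into the sorted main file, so a search is a binary search of the index followed by a short sequential scan. The sample data, the stride of 4, and the name lookup are assumptions for illustration. The overflow file is left out.

```python
import bisect

# Main file: records stored in key sequence (hypothetical sample data).
records = [(k, f"rec{k}") for k in range(10, 200, 10)]   # keys 10, 20, ..., 190

STRIDE = 4  # assumed: one index entry for every 4 records
index_pos = list(range(0, len(records), STRIDE))          # pointers into the file
index_keys = [records[pos][0] for pos in index_pos]       # same key as main file

def lookup(key):
    """Return the record with the given key, or None if it is absent."""
    slot = bisect.bisect_right(index_keys, key) - 1
    if slot < 0:
        return None                       # smaller than every key in the file
    start = index_pos[slot]
    for k, value in records[start:start + STRIDE]:   # short sequential scan
        if k == key:
            return (k, value)
        if k > key:
            break
    return None
```

With 19 records and a stride of 4, a lookup touches at most 5 index entries and 4 records instead of scanning all 19. A multilevel index applies the same idea to the index itself.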
Common File Organizations (cont) Indexed File Uses an index to support random access Maintains multiple indices for each type of field that may be the subject of a search Records are accessed only by their indices, never by traversal Variable-length fields can be used Exhaustive index and partial index may be used
Common File Organizations (cont) Hashed File Hashes on the key value to go directly to the record on disk. Primarily efficient for fixed-length records and Retrieve_One operations
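One way to picture a hashed file is an array of fixed-length slots where the key, reduced modulo the slot count, gives the slot number directly. The sketch below assumes a 17-byte slot layout and resolves collisions with linear probing. Both choices, and all names, are assumptions; the slides do not say how collisions are handled.

```python
import struct

SLOT = struct.Struct("<BI12s")   # assumed layout: used flag, key, 12-byte payload
NUM_SLOTS = 8
disk = bytearray(SLOT.size * NUM_SLOTS)   # stands in for the file on disk

def _read_slot(i):
    return SLOT.unpack_from(disk, i * SLOT.size)

def insert_one(key, payload):
    """Store the record in its home slot, or the next free slot after it."""
    for probe in range(NUM_SLOTS):
        i = (key + probe) % NUM_SLOTS
        used, k, _ = _read_slot(i)
        if not used or k == key:
            SLOT.pack_into(disk, i * SLOT.size, 1, key, payload.encode("ascii"))
            return i
    raise RuntimeError("file is full")

def retrieve_one(key):
    """Go straight to the home slot; probe onward only after a collision."""
    for probe in range(NUM_SLOTS):
        i = (key + probe) % NUM_SLOTS
        used, k, raw = _read_slot(i)
        if not used:
            return None
        if k == key:
            return raw.rstrip(b"\x00").decode("ascii")
    return None
```

Fixed-length slots are what make the offset computable from the slot number alone. Deletion is omitted here because, with probing, it needs tombstones or rehashing.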
File Directories Is almost always a file itself Contains info for each file like: File name, type, organization Volume, starting address, size used/allocated Owner, access info, permitted actions Creation date, creator, last accessed, last accessor, last modified, last modifier, last backup, current usage
File Directory Operations Search Locate directory entry corresponding to specified file Create file Add new directory entry Delete file Remove directory entry List Show directory contents, with possible filters Update Change properties of the directory or some file attributes only stored in the directory
Directory Structure Could have a simple, single directory Many files make it unwieldy for users Hierarchical approach is widely used Master directory with a number of files and other directories contained within Recursive substructure allows virtually unlimited (in modern systems) number of levels Usually uses a hashed structure to store entries
Directory Structure (cont) Naming Directory trees prevent the need for unique file or directory names on different levels Pathname (in UNIX) specifies the level from the top (root or master directory) /User_B/Draw/ABC Too complicated to specify full path every time, so we have concept of working directory, both for applications and users: If in User_B directory: access ./Draw/ABC
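The relationship between a relative pathname and the working directory can be sketched with Python's posixpath module, which applies UNIX path rules on any platform. The helper name resolve is an assumption for illustration.

```python
import posixpath

def resolve(path, working_dir):
    """Turn a pathname into an absolute one, UNIX style."""
    if not posixpath.isabs(path):
        # Relative pathnames are interpreted from the working directory.
        path = posixpath.join(working_dir, path)
    # normpath collapses "." and ".." components.
    return posixpath.normpath(path)
```

With /User_B as the working directory, resolve("./Draw/ABC", "/User_B") yields /User_B/Draw/ABC, the same file the full pathname names.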
Access Rights Individuals or groups of users are granted certain rights to files or directories, in the following hierarchy: None Can't even know about existence of file or directory Knowledge User can determine that file exists and its owner Execution User can load & execute program but cannot copy Read User can read file for any purpose Append User can add data to the file but cannot modify or delete Update User can modify, delete, and add to the file's data (possibly graded) Change protection User can change the access rights granted to other users Deletion User can delete the file from the file system and do anything else.
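If the list is read as a strict hierarchy, where each right implies all the rights listed before it, a permission check is a single comparison. The sketch below makes that assumption explicit; real systems often grant rights independently, and the names Access and permits are illustrative.

```python
from enum import IntEnum

class Access(IntEnum):
    # Ordered as on the slide; assumed: a higher level implies every lower one.
    NONE = 0
    KNOWLEDGE = 1
    EXECUTION = 2
    READ = 3
    APPEND = 4
    UPDATE = 5
    CHANGE_PROTECTION = 6
    DELETION = 7

def permits(granted, requested):
    """True if a user holding `granted` may perform `requested`."""
    return granted >= requested
```

For example, permits(Access.UPDATE, Access.READ) is true, while permits(Access.READ, Access.APPEND) is false.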
Simultaneous Access When access is granted to append or update a file to more than one user, the OS or file management system must enforce discipline. A brute-force approach is to allow a user to lock the entire file when it is to be updated. A finer grain of control is to lock individual records during update. This is the readers/writers problem, and the classic issues of mutual exclusion and deadlock must be addressed.
Record Blocking Blocks are the unit of I/O for secondary storage Records are logical unit of access, and must be organized in blocks to perform I/O Three methods: Fixed blocking Fixed-length records are used, with integral number of records stored in a block. Internal fragmentation Variable-length spanned blocking Variable-length records are used, packed into blocks with no unused space. Pointers used to span blocks Variable-length unspanned blocking Same as above without spanning, with wasted space in most blocks, because of inability to use remainders
Record Blocking (cont) Fixed blocking common for sequential files with fixed-length records Variable-length spanned blocking is efficient of storage and does not limit record size, but more complicated to implement and sometimes inefficient. Files are more difficult to update Variable-length unspanned blocking results in wasted space and limits record size to the size of the block Record-blocking technique may interact with VM. Page may be implemented as integral number of blocks, or vice versa
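The space tradeoff between the three blocking methods can be worked through with a little arithmetic. This is a rough sketch with assumed function names: fixed blocking requires the record to fit in a block, and the spanned case ignores the small overhead of the spanning pointers.

```python
import math

def fixed_blocking(block_size, record_size, n_records):
    """Return (records per block, blocks needed, bytes wasted per full block)."""
    per_block = block_size // record_size      # assumes record_size <= block_size
    blocks = math.ceil(n_records / per_block)
    wasted = block_size - per_block * record_size   # internal fragmentation
    return per_block, blocks, wasted

def unspanned_blocks(block_size, record_sizes):
    """Records are stored in order; one that does not fit starts a new block."""
    blocks, free = 0, 0
    for size in record_sizes:
        if size > block_size:
            raise ValueError("record larger than a block cannot be stored")
        if size > free:
            blocks += 1
            free = block_size
        free -= size
    return blocks

def spanned_blocks(block_size, record_sizes):
    """Records are packed end to end, crossing block boundaries as needed."""
    return math.ceil(sum(record_sizes) / block_size)
```

With 512-byte blocks, records of 300, 300, 200, and 100 bytes need 3 blocks unspanned but only 2 spanned, and 100-byte fixed records waste 12 bytes in every block.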
File Allocation Preallocation vs Dynamic Allocation Preallocation Max file size is declared at time of creation Almost impossible to estimate reliably for most applications Potentially very wasteful Dynamic: Allocate space to a file in portions as necessary Sound familiar?
File Allocation (cont) Portion Size Choosing a size is a tradeoff. Consider: Contiguity of space increases performance, especially for Retrieve_Next Having a large number of small portions increases the size of tables needed to manage the allocation info Having fixed-size portions (blocks) simplifies the reallocation of space Having variable-size or small fixed-size portions minimizes waste of unused storage due to overallocation Leads to 2 alternatives: Variable, large contiguous portions Better performance, but space hard to reuse Blocks Provide greater flexibility, but may require complex FA structures
File Allocation (cont) Methods Contiguous allocation Requires preallocation File Allocation Table (FAT) needs one entry per file, showing start block and length External fragmentation occurs fairly quickly Defragmentation is required to maintain performance Chained allocation On individual block basis Each block contains a pointer to next block Any free block can be added to a chain No external fragmentation Unfortunately, cannot capitalize on principle of locality
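Chained allocation can be sketched with a toy disk in which each block stores its data plus the number of the next block. The block numbers, contents, and function names below are made up for illustration.

```python
# Hypothetical disk: block number -> (data, next block number or None).
disk = {
    7: (b"AAA", 2),
    2: (b"BBB", 9),
    9: (b"CC", None),    # last block of the file
    4: (b"", None),      # a free block
}

def read_chain(start):
    """Read a whole file by following the pointer stored in each block."""
    data, block = b"", start
    while block is not None:
        chunk, block = disk[block]
        data += chunk
    return data

def append_block(start, free_block, chunk):
    """Link any free block onto the end of the chain."""
    block = start
    while disk[block][1] is not None:     # walk to the last block
        block = disk[block][1]
    disk[block] = (disk[block][0], free_block)
    disk[free_block] = (chunk, None)
```

The file's blocks (7, 2, 9) are scattered, so a sequential read jumps around the disk, and reaching the Nth block means walking N pointers. That is the locality cost noted on the slide.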
File Allocation (cont) Indexed allocation FAT contains a separate one-level index per file File index kept in its own block Allocation can be in either fixed-size blocks or variable-size portions By blocks eliminates external fragmentation By portions improves locality File consolidation on a regular basis will improve performance Supports both sequential and direct access
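With indexed allocation by fixed-size blocks, direct access is an index lookup plus an offset within the block. The sketch below uses a deliberately tiny 4-byte block and made-up block numbers; read_at is an illustrative name.

```python
BLOCK_SIZE = 4   # bytes per block, tiny on purpose

# Hypothetical disk contents and the file's index block (logical -> physical).
disk = {3: b"file", 8: b" all", 1: b"ocat", 6: b"ion"}
index_block = [3, 8, 1, 6]

def read_at(offset, length):
    """Read `length` bytes starting at byte `offset` of the file."""
    out = b""
    while length > 0:
        logical = offset // BLOCK_SIZE
        if logical >= len(index_block):
            break                                   # past the end of the file
        physical = index_block[logical]             # one lookup, no chain walk
        start = offset % BLOCK_SIZE
        chunk = disk[physical][start:start + length]
        if not chunk:
            break                                   # short final block
        out += chunk
        offset += len(chunk)
        length -= len(chunk)
    return out
```

Any byte is reachable through a single index lookup, which is why this method supports direct as well as sequential access.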
File Allocation (cont) Free Space Management In addition to FAT we need disk allocation table (DAT) to manage free space Bit Tables A vector containing one bit for each block on the disk Can be very fast in main memory, tradeoff is space Chained Free Portions Free portions are chained together by using a pointer and length value in each free portion Lends itself to high amounts of fragmentation, and even deletion of highly fragmented files becomes a chore Indexing (only for variable-size portions) Treats free space like a file and uses an index table. One entry for every free portion, quite efficient Free Block List Each block assigned a number sequentially and list of the numbers of all free blocks is maintained in a reserved portion of the storage. Efficiency can be achieved by maintaining a small portion of the list in memory at any given time
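A bit table can be sketched as a byte array with one bit per block. The class and method names are assumptions, and the first-free search policy is just the simplest choice.

```python
class BitTable:
    """One bit per disk block: 0 = free, 1 = in use."""

    def __init__(self, n_blocks):
        self.n_blocks = n_blocks
        self.bits = bytearray((n_blocks + 7) // 8)   # round up to whole bytes

    def is_free(self, block):
        return ((self.bits[block // 8] >> (block % 8)) & 1) == 0

    def allocate(self):
        """Mark the lowest-numbered free block as used and return its number."""
        for block in range(self.n_blocks):
            if self.is_free(block):
                self.bits[block // 8] |= 1 << (block % 8)
                return block
        raise RuntimeError("disk full")

    def free(self, block):
        self.bits[block // 8] &= ~(1 << (block % 8)) & 0xFF
```

The space cost is easy to estimate: a disk of 2^25 blocks needs 2^25 bits, which is 4 MiB of table.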
Reliability Consider this scenario: User A requests a file allocation to add to an existing file The request is granted and the disk and file allocation tables are updated in main memory but not yet on disk The system crashes and subsequently restarts User B requests a file allocation and is allocated space on disk that overlaps the last allocation to user A User A accesses the overlapped portion via a reference that is stored inside A's file
Reliability (cont) Solution: Lock the disk allocation table on disk, preventing another user from altering the table until the current allocation is completed Search the DAT (in memory) for available space Allocate space, update DAT, and update disk (write DAT back to disk, and possibly update pointers for chained allocation). Update the FAT on disk Unlock the DAT
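The locking sequence above can be sketched as a small simulation. Everything here is an assumption for illustration: the Volume class, the attribute names, and the use of an in-process threading.Lock to stand in for a lock held on the disk itself. The point is only the ordering, in which the tables on disk are updated before the lock is released.

```python
import threading

class Volume:
    def __init__(self, n_blocks):
        self.dat_on_disk = [False] * n_blocks   # True = block allocated
        self.fat_on_disk = {}                   # file name -> list of blocks
        self.dat_lock = threading.Lock()        # stands in for the on-disk lock

    def allocate(self, name, count):
        with self.dat_lock:                                  # 1. lock the DAT
            dat = list(self.dat_on_disk)                     #    in-memory copy
            free = [b for b, used in enumerate(dat) if not used]   # 2. search
            if len(free) < count:
                raise RuntimeError("not enough free space")
            blocks = free[:count]
            for b in blocks:                                 # 3. allocate
                dat[b] = True
            self.dat_on_disk = dat                           #    write DAT back
            self.fat_on_disk.setdefault(name, []).extend(blocks)   # 4. update FAT
            return blocks                                    # 5. unlock on exit
```

Because no second allocation can start until the first one's tables are written, two users can never be handed overlapping space, at the cost of serializing every allocation.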