File. File System Implementation. Operations. Permissions and Data Layout. Storing and Accessing File Data. Opening a File

File System Implementation
Operating Systems, Hebrew University, Spring 2007

File
A file is a sequence of bytes, with no structure as far as the operating system is concerned. The only operations are to read and write bytes; interpretation of the data is left to the application using it.
File descriptor: a file-system internal structure that lets the file system locate the file on disk.

Permissions and Data Layout
Owner: the user who owns this file.
Permissions: who is allowed to access this file.
Modification time: when this file was last modified.
Size: how many bytes of data there are.
Data location: where on the disk the file's data is stored.

Operations
Open: gain access to a file.
Close: relinquish access to a file.
Read: read data from a file, usually from the current position.
Write: write data to a file, usually at the current position.
Append: add data at the end of a file.
Seek: move to a specific position in a file.
Rewind: return to the beginning of the file.
Set attributes: e.g. change the access permissions.

Storing and Accessing File Data
Storing data in a file involves:
Allocation of disk blocks to the file.
Read and write operations.
Optimization: avoiding disk accesses by caching data in memory.

Opening a File
fd = open("myfile", R) leads to the following sequence of actions:
1. The file system reads the current directory and finds that myfile is represented internally by entry 13 in the list of files maintained on the disk.
2. Entry 13 of that data structure is read from the disk and copied into the kernel's open-files table. This includes various file attributes, such as the file's owner and its access permissions, and a list of the file's blocks.
3. The user's access rights are checked against the file's permissions to ascertain that reading (R) is allowed.
4. The user's variable fd (for "file descriptor") is made to point to the allocated entry in the open-files table.
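To make the sequence concrete, here is a minimal user-level sketch in C (my own illustration, not part of the original slides); it assumes a POSIX system and a file named myfile, and only shows the caller's side of steps 1-4, which all happen inside the kernel:

```c
/* Minimal sketch (assumed example): how the open sequence above looks
 * from a user program, using the POSIX open() call. */
#include <stdio.h>
#include <fcntl.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>

int main(void) {
    /* Steps 1-4 happen inside the kernel; the program only supplies
     * the name and the requested access mode. */
    int fd = open("myfile", O_RDONLY);
    if (fd < 0) {
        /* EACCES corresponds to step 3 failing: the user's access
         * rights do not match the file's permissions. */
        fprintf(stderr, "open failed: %s\n", strerror(errno));
        return 1;
    }
    printf("fd = %d refers to the allocated open-files table entry\n", fd);
    close(fd);
    return 0;
}
```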

Opening a File Cont.
[Figures illustrating the sequence step by step: the directory lookup that finds entry 13, the copy of entry 13 from disk into the kernel's open-files table, the permission check, and the assignment of fd.]

Opening a File Cont.
Why do we need the open-files table?
Temporal locality: a file that has just been opened will typically be accessed again soon, so keeping its entry in memory saves disk accesses.
All operations on the file are performed through the file descriptor.

Reading from a File
read(fd, buf, 100) leads to the following sequence of actions:
1. The argument fd identifies the open file by pointing into the kernel's open-files table.
2. The system gains access to the list of blocks that contain the file's data.
3. The file system reads the disk block holding the requested data (block number 5 in this example) into its buffer cache (why? the spatial-locality principle).
4. The desired range of 100 bytes is copied into the user's memory at the address indicated by buf.

Reading from a File Cont.
[Figures illustrating the same sequence: fd points into the open-files table, the block list is consulted, block 5 is read into the buffer cache, and the 100 bytes are copied to buf.]
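A matching user-level sketch in C (again my own illustration, with the same example file name): the kernel side of the call is exactly the four steps listed above.

```c
/* Minimal sketch (assumed example): reading 100 bytes through the
 * file descriptor returned by open(). */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>

int main(void) {
    char buf[100];
    int fd = open("myfile", O_RDONLY);   /* "myfile" as in the example above */
    if (fd < 0) return 1;

    /* The kernel uses fd to find the open-files table entry, follows the
     * block list, brings the needed block into the buffer cache, and
     * copies the requested bytes into buf. */
    ssize_t n = read(fd, buf, sizeof(buf));
    printf("read returned %zd bytes\n", n);

    close(fd);
    return 0;
}
```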

Writing to a File
We make two assumptions:
1. We want to write 100 bytes, starting at byte 2000 in the file.
2. Each disk block is 1024 bytes.
Therefore, the data we want to write spans from the end of the file's second block into the beginning of its third block: 48 bytes go at the end of the second block (disk block number 8 in this example), and the remaining 52 bytes go into the third block.

Writing to a File Cont.
The second block must first be read into the buffer cache; then the part being written is modified by overwriting it with the new data.
The third block must be allocated from the pool of free blocks. Let's assume that disk block number 2 was free, and that this block was allocated.

Writing to a File Cont.
Q. Is there a need to read the new block from the disk before modifying it?
No. Since this is a new block, we can just allocate a block in the buffer cache, prescribe that it now represents disk block number 2, and copy the requested data (52 bytes) to it.
Finally, the modified blocks are written back to the disk.

The Location in The File
The read system call provides a buffer address for placing the data in the user's memory, but does not indicate the offset in the file from which the data should be taken.
The operating system maintains the current offset into the file, and updates it at the end of each operation.
If random access is required, the process can set the file pointer to any desired value by using the seek system call.

Mapping File Blocks
How do we find the blocks that together constitute the file?
How do we find the right one if we want to access the file at a particular offset?

The Unix i-node
In Unix, files are represented internally by a structure known as an inode, which includes an index of disk blocks. The index is arranged in a hierarchical manner:
A few (e.g. 10) direct pointers, which list the first blocks of the file.
One single indirect pointer, which points to a whole block of additional direct pointers.
One double indirect pointer, which points to a block of single indirect pointers.
One triple indirect pointer, which points to a block of double indirect pointers.
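The following C sketch (my own illustration, not actual kernel code) shows how a byte offset can be translated into a disk block number by walking such an index. The block size, pointer size, and the helper read_block() are assumptions chosen to match the example on the next slide; the triple indirect level is omitted.

```c
/* Minimal sketch: translating a byte offset in a file into a disk block
 * number via the inode's hierarchical index.  Assumes 1024-byte blocks
 * and 4-byte block pointers, so each indirect block holds 256 pointers. */
#include <stdint.h>

#define BLOCK_SIZE   1024u
#define NDIRECT      10u
#define PTRS_PER_BLK (BLOCK_SIZE / sizeof(uint32_t))   /* 256 */

struct inode {
    uint32_t direct[NDIRECT];  /* first 10 blocks of the file     */
    uint32_t single_indirect;  /* block of 256 direct pointers    */
    uint32_t double_indirect;  /* block of 256 indirect pointers  */
    uint32_t triple_indirect;  /* not handled in this sketch      */
};

/* read_block() stands for "fetch this disk block (via the buffer cache)
 * and return a pointer to its contents" -- a hypothetical helper. */
extern const uint32_t *read_block(uint32_t block_no);

uint32_t offset_to_block(const struct inode *ino, uint32_t offset)
{
    uint32_t idx = offset / BLOCK_SIZE;          /* logical block index */

    if (idx < NDIRECT)                           /* direct pointers     */
        return ino->direct[idx];

    idx -= NDIRECT;
    if (idx < PTRS_PER_BLK)                      /* single indirect     */
        return read_block(ino->single_indirect)[idx];

    idx -= PTRS_PER_BLK;                         /* double indirect     */
    {
        const uint32_t *ind = read_block(ino->double_indirect);
        const uint32_t *dir = read_block(ind[idx / PTRS_PER_BLK]);
        return dir[idx % PTRS_PER_BLK];
    }
}
```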

i-node Example
Blocks are 1024 bytes, and each pointer is 4 bytes.
The 10 direct pointers then provide access to at most 10 KB.
The single indirect block contains 256 additional pointers, for a total of 266 blocks (266 KB).
The double indirect block has 256 pointers to indirect blocks, so it represents a total of 65536 blocks.
File sizes can therefore grow to a bit over 64 MB (ignoring the triple indirect block).

Super Block
How does the kernel assign inodes and disk blocks? To make these assignments the kernel holds a super block, which contains:
The size of the file system.
A list of free blocks available in the file system.
A list of free inodes in the file system.
Etc.

i-node Assignment to a New File
The file system contains a list of inodes; when a process needs a new inode, the kernel can search this list for a free one.
To improve performance, the super block holds a list of free inode numbers.
If the super block's list of free inodes is empty, the kernel searches the disk and places as many free inode numbers as possible into the super block.
Freeing an inode: the kernel places the inode number in the super block's free-inode list.

Allocation of Disk Blocks
When a process writes data to a file, the kernel must allocate disk blocks from the file system, both for the data and for indirect blocks.
The super block contains a list of the free disk blocks in the file system.
When the kernel wants to allocate a block, it takes the next available block from the super block's list.
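As an illustration of this allocation path (assumed structures and helper names, not taken from any real kernel source), block allocation from the super block's cached free list might look roughly like this in C:

```c
/* Minimal sketch: allocating a disk block from a free-block list cached
 * in the super block. */
#include <stdint.h>

#define FREE_CACHE 100   /* how many free block numbers the super block caches */

struct super_block {
    uint32_t fs_size;            /* size of the file system (in blocks) */
    uint32_t nfree;              /* number of entries currently in free[] */
    uint32_t free[FREE_CACHE];   /* cached list of free block numbers     */
};

/* refill_free_list() stands for "scan the disk / free-list chain and
 * refill the cache" -- a hypothetical helper. */
extern void refill_free_list(struct super_block *sb);

/* Returns a free block number, or 0 if the file system is full. */
uint32_t alloc_block(struct super_block *sb)
{
    if (sb->nfree == 0)
        refill_free_list(sb);        /* cache empty: go back to the disk  */
    if (sb->nfree == 0)
        return 0;                    /* still empty: no space left        */
    return sb->free[--sb->nfree];    /* hand out the next available block */
}
```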

Direct Memory Access
When a process needs a block from the disk, the CPU has to copy the requested block from the disk to main memory. This is a waste of CPU time: if we could exempt the CPU from this job, it would be available to run other ready processes. This is why the system uses DMA.

Direct Memory Access Cont.
Sequence of actions:
1. The OS programs the DMA controller and passes it the needed parameters (address on disk, address in main memory, amount of data to copy).
2. The running process is moved to the blocked queue, and a new process from the ready queue is selected to run.
3. The DMA controller transfers the requested data from the disk to main memory.
4. The DMA controller sends an interrupt to the CPU, indicating that the I/O operation has finished.
5. The waiting process is moved back to the ready queue.

Sharing a File
Assume there is a log file that needs to be written by more than one process.
If each process has its own file pointer, there is a danger that one process will overwrite log entries written by another process.
If the file pointer is shared, each new log entry will indeed be added to the end of the file.
Unix uses a set of three tables to control access to files.

Sharing a File - Cont.
The inode table: each file may appear at most once in this table.
The open-files table: an entry in this table is allocated every time a file is opened. Each entry contains a pointer to the inode table and a position within the file. There can be multiple open-file entries pointing to the same inode.
The file-descriptor table: separate for each process. Each entry points to an entry in the open-files table; the index of the slot is the fd returned by open.

I/O Management and Disk Scheduling

Disk Structure
Disk drives are addressed as large one-dimensional arrays of logical blocks, where the logical block is the smallest unit of transfer.
The one-dimensional array of logical blocks is mapped onto the sectors of the disk sequentially: sector 0 is the first sector of the first track on the outermost cylinder. The mapping proceeds in order through that track, then through the rest of the tracks in that cylinder, and then through the rest of the cylinders from outermost to innermost.
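The mapping just described can be made concrete with a small C sketch (my own, with an invented geometry; real drives hide this translation behind their controllers):

```c
/* Minimal sketch (assumed geometry): mapping a logical block number to
 * (cylinder, track, sector) in the order described above -- through one
 * track, then the other tracks of the same cylinder, then inward. */
#include <stdio.h>

#define SECTORS_PER_TRACK     63
#define TRACKS_PER_CYLINDER    8   /* i.e. number of recording surfaces */

struct chs { unsigned cylinder, track, sector; };

struct chs block_to_chs(unsigned logical_block)
{
    struct chs pos;
    pos.sector   = logical_block % SECTORS_PER_TRACK;
    pos.track    = (logical_block / SECTORS_PER_TRACK) % TRACKS_PER_CYLINDER;
    pos.cylinder = logical_block / (SECTORS_PER_TRACK * TRACKS_PER_CYLINDER);
    return pos;
}

int main(void) {
    struct chs p = block_to_chs(1000);
    printf("block 1000 -> cylinder %u, track %u, sector %u\n",
           p.cylinder, p.track, p.sector);
    return 0;
}
```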

Moving-Head Disk Mechanism
[Figure: a moving-head disk, showing the platters, the tracks/cylinders, and the arm carrying the read-write heads.]

Disk Scheduling
The operating system is responsible for using the hardware efficiently: achieving fast access time and high disk bandwidth.
Access time has two major components:
Seek time: the time for the disk to move the heads to the cylinder containing the desired sector.
Rotational latency: the additional time spent waiting for the disk to rotate the desired sector under the head.
We want to minimize seek time; seek time is roughly proportional to seek distance.
Disk bandwidth is the total number of bytes transferred, divided by the total time between the first request for service and the completion of the last transfer.

Disk Scheduling - Cont.
Whenever a process needs I/O from the disk, it issues a system call to the OS with the following information:
Whether the operation is input or output.
What the disk address for the transfer is.
What the memory address for the transfer is.
What the number of bytes to be transferred is.
Several algorithms exist to schedule the servicing of disk I/O requests. We illustrate them with a request queue of cylinders (in the range 0-199): 98, 183, 37, 122, 14, 124, 65, 67, with the head starting at cylinder 53.

FCFS
Requests are serviced in first-come, first-served order.
Advantage: fair. Disadvantage: slow.
For the example queue, the total head movement is 640 cylinders.

Shortest-Seek-Time-First (SSTF)
Selects the request with the minimum seek time from the current head position.
SSTF scheduling is a form of SJF scheduling, and may cause starvation of some requests.
For the example queue, the total head movement is 236 cylinders.
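A short C sketch (my own illustration) that replays the example queue above and reproduces the two totals quoted on the slides:

```c
/* Minimal sketch: total head movement for FCFS and SSTF on the example
 * request queue, with the head starting at cylinder 53. */
#include <stdio.h>
#include <stdlib.h>

#define N 8
static const int queue[N] = {98, 183, 37, 122, 14, 124, 65, 67};

/* FCFS: service the requests in arrival order. */
static int fcfs(int head) {
    int total = 0;
    for (int i = 0; i < N; i++) {
        total += abs(queue[i] - head);
        head = queue[i];
    }
    return total;
}

/* SSTF: always pick the pending request closest to the current head. */
static int sstf(int head) {
    int pending[N], total = 0;
    for (int i = 0; i < N; i++) pending[i] = queue[i];
    for (int left = N; left > 0; left--) {
        int best = 0;
        for (int i = 1; i < left; i++)
            if (abs(pending[i] - head) < abs(pending[best] - head))
                best = i;
        total += abs(pending[best] - head);
        head = pending[best];
        pending[best] = pending[left - 1];   /* remove the serviced request */
    }
    return total;
}

int main(void) {
    printf("FCFS: %d cylinders\n", fcfs(53));   /* prints 640 */
    printf("SSTF: %d cylinders\n", sstf(53));   /* prints 236 */
    return 0;
}
```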

SCAN (Elevator Algorithm)
The disk arm starts at one end of the disk and moves toward the other end, servicing requests along the way, until it gets to the other end of the disk; there the head movement is reversed and servicing continues.
Sometimes called the elevator algorithm.
This algorithm does not give uniform wait times.
For the example queue, the total head movement is 236 cylinders (or 208 if the head does not go all the way to the edge, i.e. the LOOK variant).

C-SCAN
Provides a more uniform wait time than SCAN.
The head moves from one end of the disk to the other, servicing requests as it goes. When it reaches the other end, however, it immediately returns to the beginning of the disk, without servicing any requests on the return trip.
Treats the cylinders as a circular list that wraps around from the last cylinder to the first one.

C-LOOK
A version of C-SCAN: the arm only goes as far as the last request in each direction, then reverses direction immediately, without first going all the way to the end of the disk.
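The same example can be replayed for SCAN and LOOK (my own sketch; it assumes the head at 53 first sweeps toward cylinder 0, as in the slides' illustration):

```c
/* Minimal sketch: total head movement for SCAN and LOOK on the example
 * queue, with the head at 53 first moving toward cylinder 0. */
#include <stdio.h>

int main(void) {
    int q[] = {98, 183, 37, 122, 14, 124, 65, 67};
    int n = sizeof(q) / sizeof(q[0]);
    int head = 53;

    int lowest = q[0], highest = q[0];
    for (int i = 1; i < n; i++) {
        if (q[i] < lowest)  lowest  = q[i];
        if (q[i] > highest) highest = q[i];
    }

    /* SCAN: sweep down to cylinder 0, reverse, sweep up to the highest request. */
    int scan = head + highest;
    /* LOOK: only sweep down to the lowest pending request before reversing. */
    int look = (head - lowest) + (highest - lowest);

    printf("SCAN: %d cylinders\n", scan);   /* prints 236 */
    printf("LOOK: %d cylinders\n", look);   /* prints 208 */
    return 0;
}
```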

Selecting a Disk-Scheduling Algorithm
SSTF is common and has a natural advantage over FCFS.
SCAN and C-SCAN perform better for systems that place a heavy load on the disk (there is little chance of starvation).
Performance depends on the number and types of requests; requests for disk service can also be influenced by the file-allocation method.
Either SSTF or LOOK is a reasonable choice for the default algorithm.

Layout on Disk
The conventional Unix file system is composed of three parts: the superblock, the inodes, and the data blocks.
The conventional layout: the superblock is the first block on the disk, next come all the inodes, and all the rest are data blocks.
The problem with this layout: much seeking. Consider the example of opening a file named "/a/b/c":
Access the root inode to find the blocks used to implement the root directory.
Read these blocks to find which inode has been allocated to directory a.
Read the inode for a to get to its blocks.
Read these blocks to find b, and so on.
If all the inodes are concentrated at one end of the disk while the data blocks are dispersed throughout the disk, this means repeated seeking back and forth.

Layout on Disk Cont.
A possible solution is to try to put inodes and related blocks next to each other, rather than concentrating all the inodes in one place. This was done in the Unix file system (the Berkeley fast file system).

Cylinder Group
A cylinder group is a set of adjacent cylinders; a file system consists of a set of cylinder groups.
Each cylinder group has a redundant copy of the super block, space for inodes, and a bitmap describing the available blocks in the cylinder group.
Basic idea: put related information together in the same cylinder group, and unrelated information apart in different cylinder groups, using a bunch of heuristics:
Try to put all inodes for a given directory in the same cylinder group.
Try to put the blocks of one file adjacent to each other in the cylinder group.
For long files, redirect blocks to a new cylinder group every megabyte.
Increased block size: the minimum block size is now 4096 bytes, which helps read and write bandwidth for big files.

Indirect Blocks
For large files, most of the file data is reached through indirect blocks. The file system therefore switches to a new cylinder group whenever a new indirect block is allocated.
Assuming 12 direct blocks of size 8 KB each, the first indirect block is allocated when the file size reaches 96 KB. Having to perform a seek at this relatively small size leads to a substantial reduction in the achievable bandwidth.
The solution is to make the first indirect block a special case that stays in the same cylinder group as the inode.

Block Fragment
Each disk block can be chopped up into 2, 4, or 8 fragments.
Each file contains at most one fragment, which holds the last part of the data in the file. If we have 8 small files, together they occupy only one disk block.
Larger fragments can be allocated if the end of the file is larger than one eighth of a disk block.
When the size of the file increases, the last fragment may need to be copied out if it gets too big.
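To see the space saving, here is a small C sketch (my own illustration, with an arbitrary example file size) of how a file's tail is stored in fragments rather than in a whole block:

```c
/* Minimal sketch: with 4096-byte blocks split into 8 fragments of 512
 * bytes, how much space the tail of a file occupies. */
#include <stdio.h>

#define BLOCK_SIZE    4096u
#define FRAG_PER_BLK     8u
#define FRAG_SIZE     (BLOCK_SIZE / FRAG_PER_BLK)   /* 512 bytes */

int main(void) {
    unsigned file_size   = 10000;                   /* example file size in bytes */
    unsigned full_blocks = file_size / BLOCK_SIZE;  /* whole blocks: 2            */
    unsigned tail        = file_size % BLOCK_SIZE;  /* leftover bytes: 1808       */
    /* The tail goes into fragments instead of a whole 4096-byte block. */
    unsigned frags = (tail + FRAG_SIZE - 1) / FRAG_SIZE;   /* 4 fragments */

    printf("%u full blocks + %u fragments (%u bytes of fragment space)\n",
           full_blocks, frags, frags * FRAG_SIZE);
    return 0;
}
```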

RAID Structure
Problems with disks: the data-transfer rate is limited by serial access, and reliability is limited as well.
A solution to both problems: redundant arrays of inexpensive disks (RAID).
In the past, RAID (a combination of cheap disks) was an alternative to large and expensive disks. Today, RAID is used for its higher reliability and higher data-transfer rate, so the I in RAID stands for "independent" instead of "inexpensive": Redundant Arrays of Independent Disks.

RAID Approaches
RAID 1: mirroring. There are two copies of each block, on distinct disks. This allows for fast reading (you can access the less loaded copy), but wastes disk space and delays writing.

RAID Approaches - Cont.
RAID 3: parity disk. Data blocks are distributed among the disks in a round-robin manner. For each set of blocks, a parity block is computed and stored on a separate disk. If one of the original data blocks is lost due to a disk failure, its data can be reconstructed from the other blocks and the parity.

RAID Approaches - Cont.
RAID 5: distributed parity. In RAID 3, the parity disk participates in every write operation (because every write updates some data block and the parity), and becomes a bottleneck. The solution is to spread the parity blocks over the different disks.
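To illustrate the parity idea (a sketch of my own, not from the slides; real arrays work on whole sectors or stripes rather than 8-byte blocks), the following C program computes an XOR parity block and uses it to rebuild a lost block:

```c
/* Minimal sketch: how XOR parity lets a RAID 3/5 array rebuild the block
 * of a failed disk.  Four "data disks", one parity block, 8-byte blocks. */
#include <stdio.h>
#include <string.h>
#include <stdint.h>

#define NDISKS 4
#define BSIZE  8

/* Parity = XOR of the corresponding bytes of all data blocks. */
static void compute_parity(uint8_t data[NDISKS][BSIZE], uint8_t parity[BSIZE]) {
    memset(parity, 0, BSIZE);
    for (int d = 0; d < NDISKS; d++)
        for (int i = 0; i < BSIZE; i++)
            parity[i] ^= data[d][i];
}

/* Rebuild the failed disk's block from the surviving blocks and the parity. */
static void rebuild(uint8_t data[NDISKS][BSIZE], const uint8_t parity[BSIZE],
                    int failed) {
    memcpy(data[failed], parity, BSIZE);
    for (int d = 0; d < NDISKS; d++)
        if (d != failed)
            for (int i = 0; i < BSIZE; i++)
                data[failed][i] ^= data[d][i];
}

int main(void) {
    uint8_t data[NDISKS][BSIZE] = {"disk0..", "disk1..", "disk2..", "disk3.."};
    uint8_t parity[BSIZE];

    compute_parity(data, parity);
    memset(data[2], 0, BSIZE);               /* simulate losing disk 2 */
    rebuild(data, parity, 2);
    printf("recovered: %.8s\n", (char *)data[2]);   /* prints "disk2.." */
    return 0;
}
```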