File Systems. Operating System Design MOSIG 1. Instructor: Arnaud Legrand Class Assistants: Benjamin Negrevergne, Sascha Hunold.


1 File Systems Operating System Design MOSIG 1 Instructor: Arnaud Legrand Class Assistants: Benjamin Negrevergne, Sascha Hunold November 24, 2010 A. Legrand File Systems 1 / 95

2 Outline I/O Systems Organization Communicating With a Device Hard drives Flash Memory File System File Inode Organization Directory Organization Speeding Up: FFS Recovering from failures Ordered Updates Journaling A. Legrand File Systems I/O Systems 2 / 95

3 Outline I/O Systems Organization Communicating With a Device Hard drives Flash Memory File System File Inode Organization Directory Organization Speeding Up: FFS Recovering from failures Ordered Updates Journaling A. Legrand File Systems I/O Systems Organization 3 / 95

4 Memory and I/O buses (diagram: CPU and crossbar; memory bus at 1880 Mbps, I/O bus at 1056 Mbps) CPU accesses physical memory over a bus Devices access memory over I/O bus with DMA Devices can appear to be a region of memory A. Legrand File Systems I/O Systems Organization 4 / 95

5 Realistic PC architecture (diagram: CPUs on a frontside bus to the North Bridge; the North Bridge connects main memory, the AGP bus, and the PCI bus; the South Bridge connects USB, the ISA bus, and PCI IRQs, with the I/O APIC on the Advanced Programmable Interrupt Controller bus) A. Legrand File Systems I/O Systems Organization 5 / 95

6 What is memory? SRAM Static RAM Like two NOT gates circularly wired input-to-output 4-6 transistors per bit, actively holds its value Very fast, used to cache slower memory DRAM Dynamic RAM A capacitor + gate, holds charge to indicate bit value 1 transistor per bit extremely dense storage Charge leaks, so need slow comparator to decide if bit is 1 or 0 Must re-write charge after reading, and periodically refresh VRAM Video RAM Dual ported, can write while someone else reads A. Legrand File Systems I/O Systems Organization 6 / 95

7 What is I/O bus? E.g., PCI A. Legrand File Systems I/O Systems Organization 7 / 95

8 Outline I/O Systems Organization Communicating With a Device Hard drives Flash Memory File System File Inode Organization Directory Organization Speeding Up: FFS Recovering from failures Ordered Updates Journaling A. Legrand File Systems I/O Systems Communicating With a Device 8 / 95

9 Communicating with a device Memory-mapped device registers Certain physical addresses correspond to device registers Load/store gets status/sends instructions not real memory Device memory device may have memory OS can write to directly on other side of I/O bus Special I/O instructions Some CPUs (e.g., x86) have special I/O instructions Like load & store, but asserts special I/O pin on CPU OS can allow user-mode access to I/O ports with finer granularity than a page DMA place instructions to card in main memory Typically then need to poke card by writing to register Overlaps unrelated computation with moving data over the (typically slower than memory) I/O bus A. Legrand File Systems I/O Systems Communicating With a Device 9 / 95

10 DMA buffers (diagram: buffer descriptor list pointing to memory buffers) Include list of buffer locations in main memory Card reads list then accesses buffers (w. DMA) Descriptors sometimes allow for scatter/gather I/O A. Legrand File Systems I/O Systems Communicating With a Device 10 / 95

11 Example: Network Interface Card Host I/O bus Bus interface Link interface Network link Adaptor Link interface talks to wire/fiber/antenna Typically does framing, link-layer CRC FIFOs on card provide small amount of buffering Bus interface logic uses DMA to move packets to and from buffers in main memory A. Legrand File Systems I/O Systems Communicating With a Device 11 / 95

12 Example: IDE disk read w. DMA A. Legrand File Systems I/O Systems Communicating With a Device 12 / 95

13 Driver architecture Device driver provides several entry points to kernel Reset, ioctl, output, interrupt, read, write, strategy... How should driver synchronize with card? E.g., Need to know when transmit buffers free or packets arrive Need to know when disk request complete One approach: Polling Sent a packet? Loop asking card when buffer is free Waiting to receive? Keep asking card if it has packet Disk I/O? Keep looping until disk ready bit set Disadvantages of polling? A. Legrand File Systems I/O Systems Communicating With a Device 13 / 95

14 Driver architecture Device driver provides several entry points to kernel Reset, ioctl, output, interrupt, read, write, strategy... How should driver synchronize with card? E.g., Need to know when transmit buffers free or packets arrive Need to know when disk request complete One approach: Polling Sent a packet? Loop asking card when buffer is free Waiting to receive? Keep asking card if it has packet Disk I/O? Keep looping until disk ready bit set Disadvantages of polling? Can't use CPU for anything else while polling Or schedule poll in future and do something else, but then high latency to receive packet or process disk block A. Legrand File Systems I/O Systems Communicating With a Device 13 / 95

15 Interrupt driven devices Instead, ask card to interrupt CPU on events Interrupt handler runs at high priority Asks card what happened (xmit buffer free, new packet) This is what most general-purpose OSes do Bad under high network packet arrival rate Packets can arrive faster than OS can process them Interrupts are very expensive (context switch) Interrupt handlers have high priority In worst case, can spend 100% of time in interrupt handler and never make any progress receive livelock Best: Adaptive switching between interrupts and polling Very good for disk requests Rest of today: Disks (network devices in 1.5 weeks) A. Legrand File Systems I/O Systems Communicating With a Device 14 / 95

16 Outline I/O Systems Organization Communicating With a Device Hard drives Flash Memory File System File Inode Organization Directory Organization Speeding Up: FFS Recovering from failures Ordered Updates Journaling A. Legrand File Systems I/O Systems Hard drives 15 / 95

17 Anatomy of a disk Stack of magnetic platters Rotate together on a central spindle at a nominal RPM Drive speed drifts slowly over time Can't predict rotational position after many revolutions Disk arm assembly Arms rotate around pivot, all move together Pivot offers some resistance to linear shocks Arms contain disk heads one for each recording surface Heads read and write data to platters A. Legrand File Systems I/O Systems Hard drives 16 / 95

18 Disk A. Legrand File Systems I/O Systems Hard drives 17 / 95

21 Storage on a magnetic platter Platters divided into concentric tracks A stack of tracks of fixed radius is a cylinder Heads record and sense data along cylinders Significant fractions of encoded stream for error correction Generally only one head active at a time Disks usually have one set of read-write circuitry Must worry about cross-talk between channels Hard to keep multiple heads exactly aligned A. Legrand File Systems I/O Systems Hard drives 18 / 95

22 Cylinders, tracks, & sectors A. Legrand File Systems I/O Systems Hard drives 19 / 95

23 Disk positioning system Move head to specific track and keep it there Resist physical shocks, imperfect tracks, etc. Seek time depends on: Inertial power of the arm actuator motor Distance between outer-disk recording radius and inner-disk recording radius (data-band) Depends on platter-size A seek consists of up to four phases: speedup accelerate arm to max speed or halfway point coast at max speed (for long seeks) slowdown stops arm near destination settle adjusts head to actual desired track Very short seeks dominated by settle time ( 1 ms) Short ( cyl.) seeks dominated by speedup Accelerations of 40g A. Legrand File Systems I/O Systems Hard drives 20 / 95

24 Seek details Head switches comparable to short seeks May also require head adjustment Settles take longer for writes than for reads Why? A. Legrand File Systems I/O Systems Hard drives 21 / 95

25 Seek details Head switches comparable to short seeks May also require head adjustment Settles take longer for writes than for reads If read strays from track, catch error with checksum, retry If write strays, you've just clobbered some other track Disk keeps table of pivot motor power Maps seek distance to power and time Disk interpolates over entries in table Table set by periodic thermal recalibration But, e.g., 500 ms recalibration every 25 min bad for AV Average seek time quoted can be many things Time to seek 1/3 disk, 1/3 time to seek whole disk A. Legrand File Systems I/O Systems Hard drives 21 / 95

26 Sectors Bits are grouped into sectors: generally 512 bytes + overhead information Error Correcting Codes Servo fields to properly position the head Disk interface presents linear array of sectors Also 512 bytes, written atomically (even if power failure) Disk maps logical sector #s to physical sectors Zoning puts more sectors on longer tracks Track and Cylinder skewing sector 0 pos. varies by track (why?) Sparing flawed sectors remapped elsewhere OS doesn't know logical to physical sector mapping Larger logical sector # difference means larger seek Highly non-linear relationship (and depends on zone) OS has no info on rotational positions Can empirically build table to estimate times A. Legrand File Systems I/O Systems Hard drives 22 / 95

27 Sectors Bits are grouped into sectors: generally 512 bytes + overhead information Error Correcting Codes Servo fields to properly position the head Disk interface presents linear array of sectors Also 512 bytes, written atomically (even if power failure) Disk maps logical sector #s to physical sectors Zoning puts more sectors on longer tracks Track and Cylinder skewing sector 0 pos. varies by track (known head and cylinder switch time sequential access speed optimization) Sparing flawed sectors remapped elsewhere OS doesn't know logical to physical sector mapping Larger logical sector # difference means larger seek Highly non-linear relationship (and depends on zone) OS has no info on rotational positions Can empirically build table to estimate times A. Legrand File Systems I/O Systems Hard drives 22 / 95

28 Disk review Disk reads/writes in terms of sectors, not bytes Read/write single sector or adjacent groups (cluster) How to write a single byte? Read-modify-write Read in sector containing the byte Modify that byte Write entire sector back to disk Key: if cached, don't need to read in Sector = unit of atomicity. Sector write done completely, even if crash in middle (disk saves up enough momentum to complete) Larger atomic units have to be synthesized by OS A. Legrand File Systems I/O Systems Hard drives 23 / 95
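The read-modify-write sequence above can be sketched in Python (a toy model: a bytearray stands in for the raw device, and `SECTOR_SIZE`/`write_byte` are illustrative names, not a real driver API):

```python
SECTOR_SIZE = 512

def write_byte(disk, byte_offset, value):
    """Write a single byte by read-modify-write of the containing sector."""
    sector = byte_offset // SECTOR_SIZE
    start = sector * SECTOR_SIZE
    buf = bytearray(disk[start:start + SECTOR_SIZE])  # read whole sector
    buf[byte_offset - start] = value                  # modify the one byte
    disk[start:start + SECTOR_SIZE] = buf             # write sector back

disk = bytearray(4 * SECTOR_SIZE)   # a tiny 4-sector "disk"
write_byte(disk, 1000, 0xAB)
```

If the sector is already in the buffer cache, the initial read is skipped, which is exactly the "Key" point on the slide.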

29 Disk interface Controls hardware, mediates access Computer, disk often connected by bus (e.g., SCSI) Multiple devices may contend for bus Possible disk/interface features: Disconnect from bus during requests Command queuing: Give disk multiple requests Disk can schedule them using rotational information Disk cache used for read-ahead Otherwise, sequential reads would incur whole revolution Cross track boundaries? Can't stop a head-switch Some disks support write caching But data not stable not suitable for all requests A. Legrand File Systems I/O Systems Hard drives 24 / 95

30 Disk performance Placement & ordering of requests a huge issue Sequential I/O much, much faster than random Long seeks much slower than short ones Power might fail any time, leaving inconsistent state Must be careful about order for crashes More on this in next lecture Try to achieve contiguous accesses where possible E.g., make big chunks of individual files contiguous Try to order requests to minimize seek times OS can only do this if it has multiple requests to order Requires disk I/O concurrency High-performance apps try to maximize I/O concurrency Next: How to schedule concurrent requests A. Legrand File Systems I/O Systems Hard drives 25 / 95

31 Scheduling: FCFS First Come First Served Process disk requests in the order they are received Advantages Disadvantages A. Legrand File Systems I/O Systems Hard drives 26 / 95

32 Scheduling: FCFS First Come First Served Process disk requests in the order they are received Advantages Easy to implement Good fairness Disadvantages Cannot exploit request locality Increases average latency, decreasing throughput A. Legrand File Systems I/O Systems Hard drives 26 / 95
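A minimal Python sketch of FCFS, measuring total head movement in cylinders (the starting head position and request queue are made-up example values):

```python
def fcfs_total_movement(head, requests):
    """Total head movement (in cylinders) when requests are
    served strictly in arrival order."""
    total = 0
    for cyl in requests:
        total += abs(cyl - head)   # seek distance to next request
        head = cyl
    return total

# Head at cylinder 53, requests served in arrival order.
fcfs_total_movement(53, [98, 183, 37, 122, 14, 124, 65, 67])  # → 640
```

The back-and-forth sweeps (183 → 37 → 122 → 14 ...) are why FCFS cannot exploit request locality.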

33 FCFS example A. Legrand File Systems I/O Systems Hard drives 27 / 95

34 Shortest positioning time first (SPTF) Shortest positioning time first (SPTF) Always pick request with shortest seek time Advantages Disadvantages Improvement A. Legrand File Systems I/O Systems Hard drives 28 / 95

35 Shortest positioning time first (SPTF) Shortest positioning time first (SPTF) Always pick request with shortest seek time Advantages Exploits locality of disk requests Higher throughput Disadvantages Starvation Don't always know what request will be fastest Improvement: Aged SPTF Give older requests higher priority Adjust effective seek time with weighting factor: T_eff = T_pos − W · T_wait Also called Shortest Seek Time First (SSTF) A. Legrand File Systems I/O Systems Hard drives 28 / 95
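The aging rule can be sketched as follows (assumptions of this sketch: positioning time is approximated by seek distance alone, and requests are (cylinder, arrival_time) pairs; with W = 0 it degenerates to plain SPTF/SSTF):

```python
def aged_sptf_pick(head, requests, now, weight):
    """Pick the next request minimizing T_eff = T_pos - W * T_wait."""
    def t_eff(req):
        cyl, arrival = req
        t_pos = abs(cyl - head)   # crude stand-in for positioning time
        t_wait = now - arrival    # how long the request has been queued
        return t_pos - weight * t_wait
    return min(requests, key=t_eff)

reqs = [(60, 0), (52, 10)]   # an old, far request vs. a fresh, near one
aged_sptf_pick(50, reqs, now=10, weight=0)  # plain SPTF picks (52, 10)
aged_sptf_pick(50, reqs, now=10, weight=1)  # aging promotes (60, 0)
```

Raising `weight` bounds starvation: a request that has waited long enough eventually wins over any closer newcomer.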

36 SPTF example A. Legrand File Systems I/O Systems Hard drives 29 / 95

37 Elevator scheduling (SCAN) Sweep across disk, servicing all requests passed Like SPTF, but next seek must be in same direction Switch directions only if no further requests Advantages Disadvantages A. Legrand File Systems I/O Systems Hard drives 30 / 95

38 Elevator scheduling (SCAN) Sweep across disk, servicing all requests passed Like SPTF, but next seek must be in same direction Switch directions only if no further requests Advantages Takes advantage of locality Bounded waiting Disadvantages Cylinders in the middle get better service Might miss locality SPTF could exploit CSCAN: Only sweep in one direction Very commonly used algorithm in Unix Also called LOOK/CLOOK in textbook (Textbook uses [C]SCAN to mean scan entire disk uselessly) A. Legrand File Systems I/O Systems Hard drives 30 / 95
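The one-directional sweep (the CSCAN/C-LOOK flavour mentioned above) can be sketched in Python; the request cylinders are example values:

```python
def elevator_order(head, requests):
    """One-directional sweep: service requests at or above the head
    moving upward, then wrap to the lowest pending cylinder and
    sweep upward again."""
    up = sorted(c for c in requests if c >= head)       # ahead of head
    wrapped = sorted(c for c in requests if c < head)   # served after wrap
    return up + wrapped

elevator_order(53, [98, 183, 37, 122, 14, 124, 65, 67])
# → [65, 67, 98, 122, 124, 183, 14, 37]
```

Because every sweep runs the same direction, middle cylinders no longer get better service than the edges, at the cost of one long wrap seek per pass.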

39 Outline I/O Systems Organization Communicating With a Device Hard drives Flash Memory File System File Inode Organization Directory Organization Speeding Up: FFS Recovering from failures Ordered Updates Journaling A. Legrand File Systems I/O Systems Flash Memory 31 / 95

40 Flash memory Today, people increasingly using flash memory Completely solid state (no moving parts) Remembers data by storing charge Lower power consumption and heat No mechanical seek times to worry about Limited # overwrites possible Blocks wear out after 10,000 (MLC) to 100,000 (SLC) erases Requires flash translation layer (FTL) to provide wear leveling, so repeated writes to logical block don't wear out physical block FTL can seriously impact performance In particular, random writes very expensive [Birrell] Limited durability Charge wears out over time Turn off device for a year, you can easily lose data A. Legrand File Systems I/O Systems Flash Memory 32 / 95

41 Disk vs. Memory
                     Disk           MLC NAND Flash   DRAM
  Smallest write     sector         sector           byte
  Atomic write       sector         sector           byte/word
  Random read        8 ms           75 µs            50 ns
  Random write       8 ms           300 µs*          50 ns
  Sequential read    100 MB/s       250 MB/s         > 1 GB/s
  Sequential write   100 MB/s       170 MB/s*        > 1 GB/s
  Cost               $.08-1/GB      $3/GB            $10-25/GB
  Persistence        Non-volatile   Non-volatile     Volatile
*Flash write performance degrades over time A. Legrand File Systems I/O Systems Flash Memory 33 / 95

42 Outline I/O Systems Organization Communicating With a Device Hard drives Flash Memory File System File Inode Organization Directory Organization Speeding Up: FFS Recovering from failures Ordered Updates Journaling A. Legrand File Systems File System 34 / 95

43 File system fun File systems = the hardest part of OS More papers on FSes than any other single topic Main tasks of file system: Don't go away (ever) Associate bytes with name (files) Associate names with each other (directories) Can implement file systems on disk, over network, in memory, in non-volatile ram (NVRAM), on tape, w/ paper. We'll focus on disk and generalize later Today: files, directories, and a bit of performance A. Legrand File Systems File System 35 / 95

44 The medium is the message Disk = First thing we've seen that doesn't go away So: Where all important state ultimately resides Slow (ms access vs ns for memory) Huge (100-1,000x bigger than memory) How to organize large collection of ad hoc information? Taxonomies! (Basically FS = general way to make these) A. Legrand File Systems File System 36 / 95

45 Outline I/O Systems Organization Communicating With a Device Hard drives Flash Memory File System File Inode Organization Directory Organization Speeding Up: FFS Recovering from failures Ordered Updates Journaling A. Legrand File Systems File System File 37 / 95

46 Files: named bytes on disk File abstraction: User's view: named sequence of bytes FS's view: collection of disk blocks File system's job: translate name & offset to disk blocks: {file, offset} FS disk address File operations: Create a file, delete a file Read from file, write to file Repositioning within a file Truncating a file, append, rename,... File meta-information (size, owner, access rights, timestamps,... ) Want: operations to have as few disk accesses as possible & have minimal space overhead A. Legrand File Systems File System File 38 / 95

47 What's hard about grouping blocks? Like page tables, file system meta data are simply data structures used to construct mappings Page table: map virtual page # to physical page # (diagram: 23 → page table → 33) File meta data: map byte offset to disk block address (diagram: offset 418 → Unix inode → block) Directory: map name to disk address or file # (diagram: foo.c → directory → 44) Inode stores meta-information (not name!) and file bytes location. A. Legrand File Systems File System File 39 / 95

48 FS vs. VM In both settings, want location transparency In some ways, FS has easier job than VM: CPU time to do FS mappings not a big deal (= no TLB) Page tables deal with sparse address spaces and random access, files often denser (0... filesize−1) & sequentially accessed In some ways FS's problem is harder: Each layer of translation = potential disk access Space a huge premium! (But disk is huge?!?!) Reason? Cache space never enough; amount of data you can get in one fetch never enough Range very extreme: Many files <10 KB, some files many GB A. Legrand File Systems File System File 40 / 95

49 Outline I/O Systems Organization Communicating With a Device Hard drives Flash Memory File System File Inode Organization Directory Organization Speeding Up: FFS Recovering from failures Ordered Updates Journaling A. Legrand File Systems File System Inode Organization 41 / 95

50 Some working intuitions FS performance dominated by # of disk accesses Each access costs 10 milliseconds Touch the disk 100 extra times = 1 second Can easily do 100s of millions of ALU ops in same time Access cost dominated by movement, not transfer: seek time + rotational delay + # bytes/disk-bw Can get 50x the data for only 3% more overhead 1 sector: 10 ms + 8 ms + 10 µs (= 512 B/(50 MB/s)) ≈ 18 ms 50 sectors: 10 ms + 8 ms + .5 ms = 18.5 ms Observations that might be helpful: All blocks in file tend to be used together, sequentially All files in a directory tend to be used together All names in a directory tend to be used together A. Legrand File Systems File System Inode Organization 42 / 95
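The arithmetic behind the "50x the data for only 3% more overhead" claim can be checked directly (disk parameters are the slide's example numbers):

```python
def access_time_ms(seek_ms, rot_ms, nbytes, bw_bytes_per_s):
    """Access time = seek + rotational delay + transfer time."""
    return seek_ms + rot_ms + nbytes / bw_bytes_per_s * 1000

one_sector = access_time_ms(10, 8, 512, 50e6)      # ≈ 18.01 ms
fifty = access_time_ms(10, 8, 50 * 512, 50e6)      # ≈ 18.51 ms
overhead = fifty / one_sector - 1                  # under 3% extra time for 50x data
```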

51 Common addressing patterns Sequential: File data processed in sequential order By far the most common mode Example: editor writes out new file, compiler reads in file, etc Random access: Address any block in file directly without passing through predecessors Examples: data set for demand paging, databases Keyed access Search for block with particular values Examples: associative data base, index Usually not provided by OS A. Legrand File Systems File System Inode Organization 43 / 95

52 Problem: how to track file's data Disk management: Need to keep track of where file contents are on disk Must be able to use this to map byte offset to disk block Structure tracking a file's sectors is called an index node or inode File descriptors must be stored on disk, too Things to keep in mind while designing file structure: Most files are small Much of the disk is allocated to large files Many of the I/O operations are made to large files Want good sequential and good random access (what do these require?) A. Legrand File Systems File System Inode Organization 44 / 95

53 Problem: how to track file's data Disk management: Need to keep track of where file contents are on disk Must be able to use this to map byte offset to disk block Structure tracking a file's sectors is called an index node or inode File descriptors must be stored on disk, too Things to keep in mind while designing file structure: Most files are small Much of the disk is allocated to large files Many of the I/O operations are made to large files Want good sequential and good random access (what do these require?) Just like VM: good data structures Arrays, linked list, trees (of arrays), hash tables. A. Legrand File Systems File System Inode Organization 44 / 95

54 Straw man: contiguous allocation Extent-based : allocate files like segmented memory When creating a file, make the user pre-specify its length and allocate all space at once Inode contents: location and size Example: IBM OS/360 Pros? Cons? (What VM scheme does this correspond to?) A. Legrand File Systems File System Inode Organization 45 / 95

55 Straw man: contiguous allocation Extent-based : allocate files like segmented memory When creating a file, make the user pre-specify its length and allocate all space at once Inode contents: location and size Example: IBM OS/360 Pros? Simple, fast access, both sequential and random Cons? (What VM scheme does this correspond to?) External fragmentation A. Legrand File Systems File System Inode Organization 45 / 95

56 Linked files Basically a linked list on disk. Keep a linked list of all free blocks Inode contents: a pointer to file's first block In each block, keep a pointer to the next one Examples (sort-of): Alto, TOPS-10, DOS FAT Pros? Cons? A. Legrand File Systems File System Inode Organization 46 / 95

57 Linked files Basically a linked list on disk. Keep a linked list of all free blocks Inode contents: a pointer to file's first block In each block, keep a pointer to the next one Examples (sort-of): Alto, TOPS-10, DOS FAT Pros? Easy dynamic growth & sequential access, no fragmentation Cons? Linked lists on disk a bad idea because of access times Pointers take up room in block, skewing alignment If one pointer is ever damaged, the rest of the file is lost. A. Legrand File Systems File System Inode Organization 46 / 95

58 Example: DOS FS (simplified) Uses linked files. Cute: links reside in fixed-sized file allocation table (FAT) rather than in the blocks. Still do pointer chasing, but can cache entire FAT so can be cheap compared to disk access A. Legrand File Systems File System Inode Organization 47 / 95

59 FAT discussion Entry size = 16 bits What's the maximum size of the FAT? Given a 512 byte block, what's the maximum size of FS? One attack: go to bigger blocks. Pros? Cons? Space overhead of FAT is trivial: 2 bytes / 512 byte block = 0.4% (Compare to Unix) Reliability: how to protect against errors? Create duplicate copies of FAT on disk. State duplication a very common theme in reliability Bootstrapping: where is root directory? Fixed location on disk: A. Legrand File Systems File System Inode Organization 48 / 95

60 FAT discussion Entry size = 16 bits What's the maximum size of the FAT? 65,536 entries Given a 512 byte block, what's the maximum size of FS? 32 MB One attack: go to bigger blocks. Pros? Cons? Space overhead of FAT is trivial: 2 bytes / 512 byte block = 0.4% (Compare to Unix) Reliability: how to protect against errors? Create duplicate copies of FAT on disk. State duplication a very common theme in reliability Bootstrapping: where is root directory? Fixed location on disk: A. Legrand File Systems File System Inode Organization 48 / 95
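The answers above can be checked in Python: 2^16 16-bit entries times 512-byte blocks gives a 32 MB maximum file system, and reading a file means chasing links through the table (the example FAT contents and the 0xFFFF end-of-chain marker are assumptions of this sketch, not the exact on-disk FAT encoding):

```python
FAT_ENTRIES = 2 ** 16   # 16-bit entries
BLOCK_SIZE = 512

# Maximum file-system size addressable through the FAT:
max_fs_bytes = FAT_ENTRIES * BLOCK_SIZE   # 32 MB

def fat_chain(fat, first_block, eof=0xFFFF):
    """Follow a file's block chain: fat[i] names the block after block i."""
    chain = []
    block = first_block
    while block != eof:
        chain.append(block)
        block = fat[block]
    return chain

fat = {4: 7, 7: 2, 2: 0xFFFF}   # a 3-block file scattered on disk
fat_chain(fat, 4)               # → [4, 7, 2]
```

Because the whole FAT fits in memory, the pointer chasing costs no extra disk accesses, unlike pointers stored in the blocks themselves.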

61 Indexed files Each file has an array holding all of its block pointers Just like a page table, so will have similar issues Max file size fixed by array's size (static or dynamic?) Allocate array to hold file's block pointers on file creation Allocate actual blocks on demand using free list Pros? Cons? A. Legrand File Systems File System Inode Organization 49 / 95

62 Indexed files Each file has an array holding all of its block pointers Just like a page table, so will have similar issues Max file size fixed by array's size (static or dynamic?) Allocate array to hold file's block pointers on file creation Allocate actual blocks on demand using free list Pros? Both sequential and random access easy Cons? Mapping table requires large chunk of contiguous space... Same problem we were trying to solve initially A. Legrand File Systems File System Inode Organization 49 / 95

63 Indexed files Issues same as in page tables Large possible file size = lots of unused entries Large actual size? table needs large contiguous disk chunk Solve identically: small regions with index array, this array with another array,... Downside? A. Legrand File Systems File System Inode Organization 50 / 95

64 Multi-level indexed files (old BSD FS) inode = 14 block pointers + stuff (meta-information) A. Legrand File Systems File System Inode Organization 51 / 95

65 Old BSD FS discussion Pros: Simple, easy to build, fast access to small files Maximum file length fixed, but large. Cons: What is the worst case # of accesses? What is the worst-case space overhead? (e.g., 13 block file) An empirical problem: Because you allocate blocks by taking them off unordered freelist, meta data and data get strewn across disk A. Legrand File Systems File System Inode Organization 52 / 95
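Worst-case accesses follow from how a block number maps to a pointer path through the inode. A sketch (the 12-direct/1-single-indirect/1-double-indirect split of the 14 pointers and the 4 KB block size are assumptions for illustration; real layouts vary):

```python
BLOCK_SIZE = 4096
PTRS = BLOCK_SIZE // 4    # 4-byte block pointers per indirect block
NDIRECT = 12              # assumed: 12 direct + 1 single + 1 double = 14

def block_path(n):
    """Map file block number n to how it is reached from the inode:
    ('direct', i), ('single', i, j), or ('double', i, j, k)."""
    if n < NDIRECT:
        return ('direct', n)
    n -= NDIRECT
    if n < PTRS:
        return ('single', NDIRECT, n)          # one extra block read
    n -= PTRS
    return ('double', NDIRECT + 1, n // PTRS, n % PTRS)  # two extra reads

block_path(5)           # → ('direct', 5)
block_path(12)          # → ('single', 12, 0)
block_path(12 + 1024)   # → ('double', 13, 0, 0)
```

Small files stay cheap (direct pointers only), while each indirection level adds one disk access in the worst case.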

66 More about inodes Inodes are stored in a fixed-size array Size of array fixed when disk is initialized; can't be changed Lives in known location, originally at one side of disk: Now is smeared across it (why?) The index of an inode in the inode array is called an i-number Internally, the OS refers to files by inumber When file is opened, inode brought in memory Written back when modified and file closed or time elapses A. Legrand File Systems File System Inode Organization 53 / 95

67 Outline I/O Systems Organization Communicating With a Device Hard drives Flash Memory File System File Inode Organization Directory Organization Speeding Up: FFS Recovering from failures Ordered Updates Journaling A. Legrand File Systems File System Directory Organization 54 / 95

68 Directories Problem: Spend all day generating data, come back the next morning, want to use it. F. Corbato, on why files/dirs invented. Approach 0: Have users remember where on disk their files are (E.g., like remembering your social security or bank account #) Yuck. People want human digestible names We use directories to map names to file blocks Next: What is in a directory and why? A. Legrand File Systems File System Directory Organization 55 / 95

69 A short history of directories Approach 1: Single directory for entire system Put directory at known location on disk Directory contains name, inumber pairs If one user uses a name, no one else can Many ancient personal computers work this way Approach 2: Single directory for each user Still clumsy, and ls on 10,000 files is a real pain Approach 3: Hierarchical name spaces Allow directory to map names to files or other dirs File system forms a tree (or graph, if links allowed) Large name spaces tend to be hierarchical (ip addresses, domain names, scoping in programming languages, etc.) A. Legrand File Systems File System Directory Organization 56 / 95

70 Hierarchical Unix Used since CTSS (1960s) Unix picked up and used really nicely Directories stored on disk just like regular files (diagram, name,inode# entries: afs,1021; tmp,1020; bin,1022; cdrom,4123; dev,1001; sbin,1011) Inode contains special flag bit set if dir Users can read just like any other file Only special programs can write (why?) Inodes at fixed disk location File pointed to by the index may be another directory Makes FS into hierarchical tree (what needed to make a DAG?) Simple, plus speeding up file ops speeds up dir ops! A. Legrand File Systems File System Directory Organization 57 / 95

71 Naming magic Bootstrapping: Where do you start looking? Root directory always inode #2 (0 and 1 historically reserved) Special names: Root directory: / Current directory: . Parent directory: .. Special names not implemented in FS: User's home directory: ~ Globbing: foo.* expands to all files starting foo. Using the given names, only need two operations to navigate the entire name space: cd name: move into (change context to) directory name ls: enumerate all names in current directory (context) A. Legrand File Systems File System Directory Organization 58 / 95

72 Unix example: /a/b/c.c A. Legrand File Systems File System Directory Organization 59 / 95
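Resolving /a/b/c.c amounts to walking from the root inode through successive directories. A toy model (here `dirs` maps inumbers to name → inumber entries, and the inumbers are made up; a real kernel reads each directory's blocks from disk):

```python
ROOT_INO = 2   # root directory is inode #2

def namei(path, dirs):
    """Resolve an absolute path to an inumber by walking directories."""
    ino = ROOT_INO
    for name in path.split('/'):
        if name:                  # skip empty components around '/'
            ino = dirs[ino][name]
    return ino

dirs = {2: {'a': 10}, 10: {'b': 11}, 11: {'c.c': 12}}
namei('/a/b/c.c', dirs)   # → 12
```

Each component potentially costs disk accesses (directory inode + directory data), which is why directory layout matters for performance.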

73 Default context: working directory Cumbersome to constantly specify full path names In Unix, each process associated with a current working directory File names that do not begin with / are assumed to be relative to the working directory, otherwise translation happens as before Shells track a default list of active contexts A search path for programs you run Given a search path A : B : C, a shell will check in A, then check in B, then check in C Can escape using explicit paths: ./foo A. Legrand File Systems File System Directory Organization 60 / 95
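The search-path lookup (A : B : C checked in order, with explicit paths escaping the search) can be sketched as (the `available` set stands in for the file system, and all names are invented):

```python
def resolve_command(cmd, search_path, available):
    """Return the first search-path hit for cmd, or None.
    A name containing '/' is an explicit path and skips the search."""
    if '/' in cmd:
        return cmd if cmd in available else None
    for d in search_path:
        candidate = d + '/' + cmd
        if candidate in available:
            return candidate      # first match along the path wins
    return None

fs = {'/bin/ls', '/usr/bin/ls', './foo'}
resolve_command('ls', ['/sbin', '/bin', '/usr/bin'], fs)  # → '/bin/ls'
resolve_command('./foo', ['/bin'], fs)                    # → './foo'
```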

74 Hard and soft links (synonyms) More than one dir entry can refer to a given file (diagram: foo and bar both point to inode #31279, refcount = 2) Unix stores count of pointers ( hard links ) to inode To make: ln foo bar creates a synonym (bar) for file foo Soft links = synonyms for names Point to a file (or dir) name, but object can be deleted from underneath it (or never even exist). Unix implements like directories: inode has special sym link bit set and contains pointed-to name (diagram: foo → /bar, refcount = 1) To make: ln -sf bar baz When the file system encounters a symbolic link it automatically translates it (if possible). A. Legrand File Systems File System Directory Organization 61 / 95

75 Outline I/O Systems Organization Communicating With a Device Hard drives Flash Memory File System File Inode Organization Directory Organization Speeding Up: FFS Recovering from failures Ordered Updates Journaling A. Legrand File Systems File System Speeding Up: FFS 62 / 95

76 Case study: speeding up FS Original Unix FS: Simple and elegant: Components: Data blocks Inodes (directories represented as files) Hard links Superblock. (specifies number of blks in FS, counts of max # of files, pointer to head of free list) Problem: slow Only gets 20Kb/sec (2% of disk maximum) even for sequential disk transfers! A. Legrand File Systems File System Speeding Up: FFS 63 / 95

77 A plethora of performance costs Blocks too small (512 bytes) File index too large Too many layers of mapping indirection Transfer rate low (get one block at time) Sucky clustering of related objects: Consecutive file blocks not close together Inodes far from data blocks Inodes for directory not close together Poor enumeration performance: e.g., ls, grep foo *.c Next: how FFS fixes these problems (to a degree) A. Legrand File Systems File System Speeding Up: FFS 64 / 95

78 Problem: Internal fragmentation Block size was too small in Unix FS Why not just make bigger?
  Block size   Space wasted   File bandwidth
      …             …%              2.6%
      …             …%              3.3%
      …             …%              6.4%
      …             …%             12.0%
     1MB          99.0%            97.2%
Bigger block increases bandwidth, but how to deal with wastage ( internal fragmentation )? Use idea from malloc: split unused portion. A. Legrand File Systems File System Speeding Up: FFS 65 / 95

79 Solution: fragments BSD FFS: Has large block size (4096 or 8192) Allow large blocks to be chopped into small ones ( fragments ) Used for little files and pieces at the ends of files Best way to eliminate internal fragmentation? Variable sized splits of course Why does FFS use fixed-sized fragments (1024, 2048)? A. Legrand File Systems File System Speeding Up: FFS 66 / 95
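A sketch of how a file maps onto blocks plus fragments under this scheme (the 8192/1024 sizes are one of the FFS configurations mentioned above; the function name is hypothetical):

```python
# FFS-style allocation: full blocks for the body of the file, fragments
# only for the tail, so small files and file ends waste little space.
BLOCK, FRAG = 8192, 1024

def layout(size):
    """Return (full_blocks, fragments) used to hold `size` bytes."""
    full = size // BLOCK
    tail = size - full * BLOCK
    frags = -(-tail // FRAG)  # ceil division
    return full, frags

assert layout(20000) == (2, 4)   # 2*8192 = 16384, tail 3616 -> 4 frags
assert layout(500) == (0, 1)     # tiny file: one fragment, not a block
assert layout(8192) == (1, 0)    # exact multiple: no fragments needed
```

Fixed-size fragments keep the allocator simple: the bitmap can track fragment-sized units uniformly, whereas variable-sized splits would reintroduce malloc-style external fragmentation.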

80 Clustering related objects in FFS Group 1 or more consecutive cylinders into a cylinder group Key: can access any block in a cylinder without performing a seek. Next fastest place is adjacent cylinder. Tries to put everything related in same cylinder group Tries to put everything not related in different group (?!) A. Legrand File Systems File System Speeding Up: FFS 67 / 95

81 Clustering in FFS Tries to put sequential blocks in adjacent sectors (Access one block, probably access next) Tries to keep inode in same cylinder as file data: (If you look at inode, most likely will look at data too) Tries to keep all inodes in a dir in same cylinder group Access one name, frequently access many, e.g., ls -l A. Legrand File Systems File System Speeding Up: FFS 68 / 95

82 What does a cyl. group look like? Basically a mini-Unix file system. How to ensure there's space for related stuff? Place different directories in different cylinder groups; keep a free-space reserve so you can allocate near existing things; when a file grows too big (1MB), send its remainder to a different cylinder group. A. Legrand File Systems File System Speeding Up: FFS 69 / 95

83 Finding space for related objs Old Unix (& dos): Linked list of free blocks Just take a block off of the head. Easy. Bad: free list gets jumbled over time. Finding adjacent blocks hard and slow FFS: switch to bit-map of free blocks Easier to find contiguous blocks. Small, so usually keep entire thing in memory Key: keep a reserve of free blocks. Makes finding a close block easier A. Legrand File Systems File System Speeding Up: FFS 70 / 95

84 Using a bitmap
Usually keep the entire bitmap in memory: 4GB disk / 4KB blocks = 2^20 blocks, so the map is 2^20 bits = 128KB.
Allocate a block close to block x? Check for blocks near bmap[x/32]. If the disk is almost empty, will likely find one near; as the disk becomes full, the search becomes more expensive and less effective.
Trade space for time (search time, file access time): keep a reserve (e.g., 10%) of the disk always free, ideally scattered across the disk. Don't tell users (df may report 110% full). With 10% free, can almost always find a free block nearby.
A. Legrand File Systems File System Speeding Up: FFS 71 / 95
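A toy version of allocate-near-x with a bitmap (a Python list of bools stands in for the on-disk bit array; the search strategy and names are illustrative, not FFS's exact code):

```python
# Search outward from x for the nearest free block; contrast with a
# free list, where the head block could be anywhere on the disk.
def alloc_near(bitmap, x):
    n = len(bitmap)
    for dist in range(n):
        for b in (x - dist, x + dist):
            if 0 <= b < n and bitmap[b]:
                bitmap[b] = False   # mark allocated
                return b
    return None                     # disk full

bm = [False] * 16
bm[3] = bm[9] = True
assert alloc_near(bm, 8) == 9       # nearest free block to 8
assert alloc_near(bm, 8) == 3       # 9 now taken; next closest is 3
assert alloc_near(bm, 8) is None    # nothing left
```

The example also shows why the reserve matters: with only two free blocks in sixteen, the second allocation already lands five blocks away from x.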

85 So what did we gain?
Performance improvements: able to get 20-40% of disk bandwidth for large files, 10-20x the original Unix file system! Better small-file performance (why?).
Is this the best we can do? No. Block-based rather than extent-based: could name contiguous blocks with a single pointer and length (Linux ext2fs). Writes of metadata done synchronously: really hurts small-file performance; make asynchronous with write-ordering ("soft updates") or logging (the Episode file system, LFS). Play with semantics (/tmp file systems).
A. Legrand File Systems File System Speeding Up: FFS 72 / 95

87 Other hacks
Obvious: big file cache.
Fact: no rotation delay if you get the whole track. How to use?
Fact: transfer cost negligible. Recall: can get 50x the data for only 3% more overhead. 1 sector: 10ms seek + 8ms rotation + 10µs transfer (= 512B / (50MB/s)) ≈ 18ms; 50 sectors: 10ms + 8ms + 0.5ms = 18.5ms. How to use?
Fact: if the transfer is huge, seek + rotation are negligible. How to use?
Answer: use read-ahead + clustered read/write (hoard data, write out a MB at a time).
A. Legrand File Systems File System Speeding Up: FFS 73 / 95
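Checking the slide's arithmetic (the 10ms seek, 8ms rotation, 50MB/s transfer rate, and 512-byte sectors are the slide's assumed figures):

```python
# Per-request time = fixed positioning cost + size-proportional transfer.
SEEK, ROT, RATE, SECTOR = 10e-3, 8e-3, 50e6, 512

def io_time(nsectors):
    return SEEK + ROT + nsectors * SECTOR / RATE

t1, t50 = io_time(1), io_time(50)
assert abs(t1 - 0.018) < 2e-5       # ~18 ms for one sector
assert abs(t50 - 0.0185) < 2e-5     # ~18.5 ms for 50 sectors
assert t50 / t1 < 1.03              # 50x the data, under 3% more time
```

The positioning cost (18ms) dwarfs the transfer cost, which is exactly why hoarding data and reading/writing in large clusters pays off.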

88 Outline I/O Systems Organization Communicating With a Device Hard drives Flash Memory File System File Inode Organization Directory Organization Speeding Up: FFS Recovering from failures Ordered Updates Journaling A. Legrand File Systems Recovering from failures 74 / 95

89 Fixing corruption fsck
Must run FS check (fsck) program after a crash.
Summary info usually bad after crash: scan to check free block map, block/inode counts.
System may have corrupt inodes (not simple crash): bad block numbers, cross-allocation, etc. Do sanity check, clear inodes with garbage.
Fields in inodes may be wrong: count number of directory entries to verify link count; if no entries but count ≠ 0, move to lost+found. Make sure size and used-data counts match blocks.
Directories may be bad: holes illegal; "." and ".." must be valid; all directories must be reachable.
A. Legrand File Systems Recovering from failures 75 / 95
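A toy version of the link-count pass (all the data structures here are invented stand-ins for the real on-disk ones): recount directory entries per inode and flag mismatches.

```python
def check_link_counts(inodes, directories):
    """inodes: {ino: recorded_nlink}; directories: [{name: ino}, ...].
    Returns (fixes, orphans): counts to repair, inodes for lost+found."""
    seen = {ino: 0 for ino in inodes}
    for d in directories:
        for ino in d.values():
            seen[ino] += 1
    # Referenced but with the wrong recorded count: fix the count.
    fixes = {i: n for i, n in seen.items() if n and n != inodes[i]}
    # Nonzero count but no directory entries: candidate for lost+found.
    orphans = [i for i, n in seen.items() if n == 0 and inodes[i] != 0]
    return fixes, orphans

inodes = {4: 2, 5: 1, 7: 1}
dirs = [{"a": 4, "b": 5}, {"c": 4, "d": 4}]   # inode 4 referenced 3x
fixes, orphans = check_link_counts(inodes, dirs)
assert fixes == {4: 3}    # recorded 2, actual 3
assert orphans == [7]     # nlink != 0 but no entries: lost+found
```

This is why fsck is slow: the real pass must walk every directory entry and every inode on the disk before the counts can be trusted.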

90 Crash recovery permeates FS code Have to ensure fsck can recover file system Example: Suppose all data written asynchronously Delete/truncate a file, append to other file, crash New file may reuse block from old Old inode may not be updated Cross-allocation! Often inode with older mtime wrong, but can t be sure Append to file, allocate indirect block, crash Inode points to indirect block But indirect block may contain garbage A. Legrand File Systems Recovering from failures 76 / 95

91 Ordering of updates Must be careful about order of updates Write new inode to disk before directory entry Remove directory name before deallocating inode Write cleared inode to disk before updating CG free map Solution: Many metadata updates synchronous Doing one write at a time ensures ordering Of course, this hurts performance E.g., untar much slower than disk bandwidth Note: Cannot update buffers on the disk queue E.g., say you make two updates to same directory block But crash recovery requires first to be synchronous Must wait for first write to complete before doing second A. Legrand File Systems Recovering from failures 77 / 95

92 Performance vs. consistency
FFS crash recoverability comes at huge cost: makes tasks such as untar easily many times slower, all because you might lose power or reboot at any time. Even while slowing ordinary usage, recovery is slow: if fsck takes one minute now, what happens when disks get 10x bigger...
One solution: battery-backed RAM. Expensive (requires specialized hardware); often don't learn the battery has died until too late; a pain if the computer dies (can't just move the disk); if an OS bug causes the crash, RAM might be garbage.
Better solution: advanced file system techniques. Topic of the rest of the lecture.
A. Legrand File Systems Recovering from failures 78 / 95

93 Outline I/O Systems Organization Communicating With a Device Hard drives Flash Memory File System File Inode Organization Directory Organization Speeding Up: FFS Recovering from failures Ordered Updates Journaling A. Legrand File Systems Recovering from failures Ordered Updates 79 / 95

94 First attempt: Ordered updates Must follow three rules in ordering updates: 1. Never write pointer before initializing the structure it points to 2. Never reuse a resource before nullifying all pointers to it 3. Never clear last pointer to live resource before setting new one If you do this, file system will be recoverable Moreover, can recover quickly Might leak free disk space, but otherwise correct So start running after reboot, scavenge for space in background How to achieve? Keep a partial order on buffered blocks A. Legrand File Systems Recovering from failures Ordered Updates 80 / 95

95 Ordered updates (continued) Example: Create file A Block X contains an inode Block Y contains a directory block Create file A in inode block X, dir block Y We say Y X, pronounced Y depends on X Means Y cannot be written before X is written X is called the dependee, Y the depender Can delay both writes, so long as order preserved Say you create a second file B in blocks X and Y Only have to write each out once for both creates A. Legrand File Systems Recovering from failures Ordered Updates 81 / 95
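The partial order on buffered blocks can be sketched as a topological write schedule (block names and the dict encoding are illustrative):

```python
# "Y depends on X" means X must reach disk before Y; schedule writes
# so every block's dependees are already on disk.
def write_order(blocks, deps):
    """deps: {depender: {dependees}}. Return a safe write order."""
    order, done = [], set()
    while len(done) < len(blocks):
        progress = False
        for b in blocks:
            if b not in done and deps.get(b, set()) <= done:
                order.append(b)
                done.add(b)
                progress = True
        if not progress:
            raise RuntimeError("cyclic dependency: need rollback")
    return order

# Create file A: dir block Y depends on inode block X.
assert write_order(["Y", "X"], {"Y": {"X"}}) == ["X", "Y"]

# Create A and unlink B in the same two blocks: each block now depends
# on the other, so no legal order exists (the next slide's problem).
try:
    write_order(["Y", "X"], {"Y": {"X"}, "X": {"Y"}})
    assert False
except RuntimeError:
    pass
```

The failing case previews the cyclic-dependency problem: once two blocks each hold one half of two different operations, no whole-block write order is safe, which is what rollback will fix.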

96 Problem: Cyclic dependencies Suppose you create file A, unlink file B Both files in same directory block & inode block Can t write directory until inode A initialized Otherwise, after crash directory will point to bogus inode Worse yet, same inode # might be re-allocated So could end up with file name A being an unrelated file Can t write inode block until dir entry B cleared Otherwise, B could end up with too small a link count File could be deleted while links to it still exist Otherwise, fsck has to be very slow Check every directory entry and inode link count A. Legrand File Systems Recovering from failures Ordered Updates 82 / 95

97 Cyclic dependencies illustrated
(Figure: an inode block holding inodes #4-#7, and a directory block with three entries; shading marks inodes in use vs. free.)
(a) Original organization: directory holds <,#0>, <B,#5>, <C,#7>; inodes #5 and #7 in use.
(b) Create file A: directory gains <A,#4>; inode #4 now in use.
(c) Remove file B: entry <B,#5> cleared to <,#0>; inode #5 freed.
A. Legrand File Systems Recovering from failures Ordered Updates 83 / 95

98 More problems Crash might occur between ordered but related writes E.g., summary information wrong after block freed Block aging Block that always has dependency will never get written back Solution: Soft updates [Ganger] Write blocks in any order But keep track of dependencies When writing a block, temporarily roll back any changes you can t yet commit to disk A. Legrand File Systems Recovering from failures Ordered Updates 84 / 95

99 Breaking dependencies w. rollback
(Figure: main-memory vs. on-disk copies of the inode block and directory block.)
(a) After metadata updates: in memory the directory holds <A,#4>, <,#0> (B removed), <C,#7>; on disk it still holds <,#0>, <B,#5>, <C,#7>.
Now say we decide to write the directory block... Can't write file name A to disk: it has a dependee.
A. Legrand File Systems Recovering from failures Ordered Updates 85 / 95

100 Breaking dependencies w. rollback
(b) Safe version of directory block written: undo file A before writing the dir block to disk, so the disk gets <,#0>, <,#0>, <C,#7> while memory keeps <A,#4>.
Even though we just wrote it, the directory block is still dirty. But now the inode block has no dependees: can safely write the inode block to disk as-is...
A. Legrand File Systems Recovering from failures Ordered Updates 85 / 95

101 Breaking dependencies w. rollback
(c) Inode block written: now the inode block is clean (same in memory as on disk). But we have to write the directory block a second time...
A. Legrand File Systems Recovering from failures Ordered Updates 85 / 95

102 Breaking dependencies w. rollback
(d) Directory block written: memory and disk now both hold <A,#4>, <,#0>, <C,#7>. All data stably on disk; a crash at any point would have been safe.
A. Legrand File Systems Recovering from failures Ordered Updates 85 / 95

103 Soft updates Structure for each updated field or pointer, contains: old value new value list of updates on which this update depends (dependees) Can write blocks in any order But must temporarily undo updates with pending dependencies Must lock rolled-back version so applications don t see it Choose ordering based on disk arm scheduling Some dependencies better handled by postponing in-memory updates E.g., when freeing block (e.g., because file truncated), just mark block free in bitmap after block pointer cleared on disk A. Legrand File Systems Recovering from failures Ordered Updates 86 / 95
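The per-update structure and the flush-time rollback can be sketched as follows (the classes and field names are hypothetical, not Ganger's actual implementation):

```python
# Each buffered update records old value, new value, and its dependees;
# when a block is flushed, any update whose dependees are not yet on
# disk is temporarily rolled back to its old value.
class Update:
    def __init__(self, field, old, new, dependees=()):
        self.field, self.old, self.new = field, old, new
        self.dependees = set(dependees)   # blocks that must hit disk first

def flush_image(block_updates, on_disk):
    """Return the safe image to write for this block right now."""
    image = {}
    for u in block_updates:
        safe = u.dependees <= on_disk
        image[u.field] = u.new if safe else u.old   # roll back if unsafe
    return image

# Directory block: the entry for A depends on the inode block.
dir_updates = [Update("entry0", ("", 0), ("A", 4), dependees={"inode_blk"})]
assert flush_image(dir_updates, on_disk=set()) == {"entry0": ("", 0)}
assert flush_image(dir_updates, on_disk={"inode_blk"}) == {"entry0": ("A", 4)}
```

The two asserts replay the figure above: the first write goes out with A rolled back; once the inode block is on disk, a second flush of the same directory block carries the new value.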

104 Simple example
Say you create a zero-length file A.
Depender: the directory entry for A. Can't be written until its dependees are on disk. Old value: empty directory entry; new value: filename A, inode #.
Dependees: the inode must be initialized before the dir entry is written; the bitmap must mark the inode allocated before the dir entry is written.
Can write the directory block to disk any time, but must substitute the old value until the inode & bitmap are updated on disk. Once the dir block on disk contains A, the file is fully created. Crash before A is on disk: worst case might leak the inode.
A. Legrand File Systems Recovering from failures Ordered Updates 87 / 95

105 Operations requiring soft updates (1) 1. Block allocation Must write the disk block, the free map, & a pointer Disk block & free map must be written before pointer Use Undo/redo on pointer (& possibly file size) 2. Block deallocation Must write the cleared pointer & free map Just update free map after pointer written to disk Or just immediately update free map if pointer not on disk Say you quickly append block to file then truncate You will know pointer to block not written because of the allocated dependency structure So both operations together require no disk I/O! A. Legrand File Systems Recovering from failures Ordered Updates 88 / 95

106 Operations requiring soft updates (2) 3. Link addition (see simple example) Must write the directory entry, inode, & free map (if new inode) Inode and free map must be written before dir entry Use undo/redo on i# in dir entry (ignore entries w. i# 0) 4. Link removal Must write directory entry, inode & free map (if nlinks==0) Must decrement nlinks only after pointer cleared Clear directory entry immediately Decrement in-memory nlinks once pointer written If directory entry was never written, decrement immediately (again will know by presence of dependency structure) Note: Quick create/delete requires no disk I/O A. Legrand File Systems Recovering from failures Ordered Updates 89 / 95

107 Soft update issues
fsync: syscall to flush file changes to disk must also flush directory entries, parent directories, etc.
unmount: flushing all changes to disk on shutdown may require flushing some buffers multiple times before they come clean.
Deleting large directory trees frighteningly fast: unlink syscall returns even if the inode/indirect block is not cached! Dependencies allocated faster than blocks written; cap # of dependencies allocated to avoid exhausting memory.
Useless write-backs: syncer flushes dirty buffers to disk every 30 seconds; writing all at once means many dependencies unsatisfied. Fix syncer to write blocks one at a time; fix LRU buffer eviction to know about dependencies.
A. Legrand File Systems Recovering from failures Ordered Updates 90 / 95

108 Soft updates fsck
Split into foreground and background parts.
Foreground must be done before remounting FS: need to make sure per-cylinder summary info makes sense; recompute free block/inode counts from bitmaps (very fast). Will leave FS consistent, but might leak disk space.
Background does traditional fsck operations: do after mounting to recuperate free space; can be using the file system while this is happening. Must be done in foreground after a media failure.
Difference from traditional FFS fsck: may have many, many inodes with non-zero link counts; don't stick them all in lost+found (unless media failure).
A. Legrand File Systems Recovering from failures Ordered Updates 91 / 95

109 Outline I/O Systems Organization Communicating With a Device Hard drives Flash Memory File System File Inode Organization Directory Organization Speeding Up: FFS Recovering from failures Ordered Updates Journaling A. Legrand File Systems Recovering from failures Journaling 92 / 95

110 An alternative: Journaling Reserve a portion of disk for write-ahead log Write any metadata operation first to log, then to disk After crash/reboot, re-play the log (efficient) May re-do already committed change, but won t miss anything Performance advantage: Log is consecutive portion of disk Multiple log writes very fast (at disk b/w) Consider updates committed when written to log Example: delete directory tree Record all freed blocks, changed directory entries in log Return control to user Write out changed directories, bitmaps, etc. in background (sort for good disk arm scheduling) A. Legrand File Systems Recovering from failures Journaling 93 / 95

111 Journaling details Must find oldest relevant log entry Otherwise, redundant and slow to replay whole log Use checkpoints Once all records up to log entry N have been processed and affected blocks stably committed to disk... Record N to disk either in reserved checkpoint location, or in checkpoint log record Never need to go back before most recent checkpointed N Must also find end of log Typically circular buffer; don t play old records out of order Can include begin transaction/end transaction records Also typically have checksum in case some sectors bad A. Legrand File Systems Recovering from failures Journaling 94 / 95
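Replay with a checkpoint can be sketched as below (the record format and structures are invented for illustration; real journals log block images or physical/logical records with checksums):

```python
# Each record is (seqno, block, value). Replay is idempotent: re-doing
# an already-committed write just stores the same value again, and a
# checkpoint N lets recovery skip records already stable on disk.
def replay(log, disk, checkpoint=0):
    for seq, block, value in log:
        if seq > checkpoint:      # records <= checkpoint already on disk
            disk[block] = value
    return disk

log = [(1, "bitmap", "blk7 free"),
       (2, "dir", "entry B cleared"),
       (3, "inode", "nlink=0")]

# Crash after only record 2's target reached disk: replay restores the
# rest, and re-applying record 2 is harmless.
disk = {"dir": "entry B cleared"}
replay(log, disk, checkpoint=0)
assert disk == {"bitmap": "blk7 free",
                "dir": "entry B cleared",
                "inode": "nlink=0"}

# With a checkpoint at 2, recovery only replays record 3.
disk2 = {}
replay(log, disk2, checkpoint=2)
assert disk2 == {"inode": "nlink=0"}
```

Idempotence is what makes the checkpoint optional for correctness: it only bounds how far back replay must start, never whether the result is right.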

112 Journaling vs. soft updates
Both much better than FFS alone.
Some limitations of soft updates: very specific to FFS data structures (e.g., couldn't easily handle complex data structures like the B-trees in XFS; even directory rename not quite right); metadata updates may proceed out of order (e.g., create A, create B, crash: maybe only B exists after reboot); still need slow background fsck to reclaim space.
Some limitations of journaling: a disk write required for every metadata operation (whereas create-then-delete might require no I/O w. soft updates); possible contention for the end of the log on a multiprocessor; fsync must sync other operations' metadata to the log, too.
A. Legrand File Systems Recovering from failures Journaling 95 / 95


File Layout and Directories

File Layout and Directories COS 318: Operating Systems File Layout and Directories Jaswinder Pal Singh Computer Science Department Princeton University (http://www.cs.princeton.edu/courses/cos318/) Topics File system structure Disk

More information

EECS 482 Introduction to Operating Systems

EECS 482 Introduction to Operating Systems EECS 482 Introduction to Operating Systems Winter 2018 Baris Kasikci Slides by: Harsha V. Madhyastha OS Abstractions Applications Threads File system Virtual memory Operating System Next few lectures:

More information

22 File Structure, Disk Scheduling

22 File Structure, Disk Scheduling Operating Systems 102 22 File Structure, Disk Scheduling Readings for this topic: Silberschatz et al., Chapters 11-13; Anderson/Dahlin, Chapter 13. File: a named sequence of bytes stored on disk. From

More information

FILE SYSTEMS, PART 2. CS124 Operating Systems Fall , Lecture 24

FILE SYSTEMS, PART 2. CS124 Operating Systems Fall , Lecture 24 FILE SYSTEMS, PART 2 CS124 Operating Systems Fall 2017-2018, Lecture 24 2 Last Time: File Systems Introduced the concept of file systems Explored several ways of managing the contents of files Contiguous

More information

Ref: Chap 12. Secondary Storage and I/O Systems. Applied Operating System Concepts 12.1

Ref: Chap 12. Secondary Storage and I/O Systems. Applied Operating System Concepts 12.1 Ref: Chap 12 Secondary Storage and I/O Systems Applied Operating System Concepts 12.1 Part 1 - Secondary Storage Secondary storage typically: is anything that is outside of primary memory does not permit

More information

File Systems. Chapter 11, 13 OSPP

File Systems. Chapter 11, 13 OSPP File Systems Chapter 11, 13 OSPP What is a File? What is a Directory? Goals of File System Performance Controlled Sharing Convenience: naming Reliability File System Workload File sizes Are most files

More information

File Systems. CS 4410 Operating Systems. [R. Agarwal, L. Alvisi, A. Bracy, M. George, E. Sirer, R. Van Renesse]

File Systems. CS 4410 Operating Systems. [R. Agarwal, L. Alvisi, A. Bracy, M. George, E. Sirer, R. Van Renesse] File Systems CS 4410 Operating Systems [R. Agarwal, L. Alvisi, A. Bracy, M. George, E. Sirer, R. Van Renesse] The abstraction stack I/O systems are accessed through a series of layered abstractions Application

More information

[537] Fast File System. Tyler Harter

[537] Fast File System. Tyler Harter [537] Fast File System Tyler Harter File-System Case Studies Local - FFS: Fast File System - LFS: Log-Structured File System Network - NFS: Network File System - AFS: Andrew File System File-System Case

More information

File System Performance (and Abstractions) Kevin Webb Swarthmore College April 5, 2018

File System Performance (and Abstractions) Kevin Webb Swarthmore College April 5, 2018 File System Performance (and Abstractions) Kevin Webb Swarthmore College April 5, 2018 Today s Goals Supporting multiple file systems in one name space. Schedulers not just for CPUs, but disks too! Caching

More information

COS 318: Operating Systems. Journaling, NFS and WAFL

COS 318: Operating Systems. Journaling, NFS and WAFL COS 318: Operating Systems Journaling, NFS and WAFL Jaswinder Pal Singh Computer Science Department Princeton University (http://www.cs.princeton.edu/courses/cos318/) Topics Journaling and LFS Network

More information

19 File Structure, Disk Scheduling

19 File Structure, Disk Scheduling 88 19 File Structure, Disk Scheduling Readings for this topic: Silberschatz et al., Chapters 10 11. File: a named collection of bytes stored on disk. From the OS standpoint, the file consists of a bunch

More information

COS 318: Operating Systems. Storage Devices. Kai Li Computer Science Department Princeton University

COS 318: Operating Systems. Storage Devices. Kai Li Computer Science Department Princeton University COS 318: Operating Systems Storage Devices Kai Li Computer Science Department Princeton University http://www.cs.princeton.edu/courses/archive/fall11/cos318/ Today s Topics Magnetic disks Magnetic disk

More information

FILE SYSTEMS. CS124 Operating Systems Winter , Lecture 23

FILE SYSTEMS. CS124 Operating Systems Winter , Lecture 23 FILE SYSTEMS CS124 Operating Systems Winter 2015-2016, Lecture 23 2 Persistent Storage All programs require some form of persistent storage that lasts beyond the lifetime of an individual process Most

More information

CSE506: Operating Systems CSE 506: Operating Systems

CSE506: Operating Systems CSE 506: Operating Systems CSE 506: Operating Systems File Systems Traditional File Systems FS, UFS/FFS, Ext2, Several simple on disk structures Superblock magic value to identify filesystem type Places to find metadata on disk

More information

Lecture 11: Linux ext3 crash recovery

Lecture 11: Linux ext3 crash recovery 6.828 2011 Lecture 11: Linux ext3 crash recovery topic crash recovery crash may interrupt a multi-disk-write operation leave file system in an unuseable state most common solution: logging last lecture:

More information

Introduction Disks RAID Tertiary storage. Mass Storage. CMSC 420, York College. November 21, 2006

Introduction Disks RAID Tertiary storage. Mass Storage. CMSC 420, York College. November 21, 2006 November 21, 2006 The memory hierarchy Red = Level Access time Capacity Features Registers nanoseconds 100s of bytes fixed Cache nanoseconds 1-2 MB fixed RAM nanoseconds MBs to GBs expandable Disk milliseconds

More information

CS 111. Operating Systems Peter Reiher

CS 111. Operating Systems Peter Reiher Operating System Principles: File Systems Operating Systems Peter Reiher Page 1 Outline File systems: Why do we need them? Why are they challenging? Basic elements of file system design Designing file

More information

CS510 Operating System Foundations. Jonathan Walpole

CS510 Operating System Foundations. Jonathan Walpole CS510 Operating System Foundations Jonathan Walpole File System Performance File System Performance Memory mapped files - Avoid system call overhead Buffer cache - Avoid disk I/O overhead Careful data

More information

Disks and RAID. CS 4410 Operating Systems. [R. Agarwal, L. Alvisi, A. Bracy, E. Sirer, R. Van Renesse]

Disks and RAID. CS 4410 Operating Systems. [R. Agarwal, L. Alvisi, A. Bracy, E. Sirer, R. Van Renesse] Disks and RAID CS 4410 Operating Systems [R. Agarwal, L. Alvisi, A. Bracy, E. Sirer, R. Van Renesse] Storage Devices Magnetic disks Storage that rarely becomes corrupted Large capacity at low cost Block

More information

[537] Journaling. Tyler Harter

[537] Journaling. Tyler Harter [537] Journaling Tyler Harter FFS Review Problem 1 What structs must be updated in addition to the data block itself? [worksheet] Problem 1 What structs must be updated in addition to the data block itself?

More information

Lecture 21: Reliable, High Performance Storage. CSC 469H1F Fall 2006 Angela Demke Brown

Lecture 21: Reliable, High Performance Storage. CSC 469H1F Fall 2006 Angela Demke Brown Lecture 21: Reliable, High Performance Storage CSC 469H1F Fall 2006 Angela Demke Brown 1 Review We ve looked at fault tolerance via server replication Continue operating with up to f failures Recovery

More information

Advanced file systems: LFS and Soft Updates. Ken Birman (based on slides by Ben Atkin)

Advanced file systems: LFS and Soft Updates. Ken Birman (based on slides by Ben Atkin) : LFS and Soft Updates Ken Birman (based on slides by Ben Atkin) Overview of talk Unix Fast File System Log-Structured System Soft Updates Conclusions 2 The Unix Fast File System Berkeley Unix (4.2BSD)

More information

Arvind Krishnamurthy Spring Implementing file system abstraction on top of raw disks

Arvind Krishnamurthy Spring Implementing file system abstraction on top of raw disks File Systems Arvind Krishnamurthy Spring 2004 File Systems Implementing file system abstraction on top of raw disks Issues: How to find the blocks of data corresponding to a given file? How to organize

More information

File System Management

File System Management Lecture 8: Storage Management File System Management Contents Non volatile memory Tape, HDD, SSD Files & File System Interface Directories & their Organization File System Implementation Disk Space Allocation

More information

FS Consistency & Journaling

FS Consistency & Journaling FS Consistency & Journaling Nima Honarmand (Based on slides by Prof. Andrea Arpaci-Dusseau) Why Is Consistency Challenging? File system may perform several disk writes to serve a single request Caching

More information

File Systems. CS170 Fall 2018

File Systems. CS170 Fall 2018 File Systems CS170 Fall 2018 Table of Content File interface review File-System Structure File-System Implementation Directory Implementation Allocation Methods of Disk Space Free-Space Management Contiguous

More information

CS370: Operating Systems [Fall 2018] Dept. Of Computer Science, Colorado State University

CS370: Operating Systems [Fall 2018] Dept. Of Computer Science, Colorado State University CS 370: OPERATING SYSTEMS [DISK SCHEDULING ALGORITHMS] Shrideep Pallickara Computer Science Colorado State University L30.1 Frequently asked questions from the previous class survey ECCs: How does it impact

More information

CS370: Operating Systems [Spring 2017] Dept. Of Computer Science, Colorado State University

CS370: Operating Systems [Spring 2017] Dept. Of Computer Science, Colorado State University CS 370: OPERATING SYSTEMS [DISK SCHEDULING ALGORITHMS] Shrideep Pallickara Computer Science Colorado State University Frequently asked questions from the previous class survey Can a UNIX file span over

More information

CS370: Operating Systems [Fall 2018] Dept. Of Computer Science, Colorado State University

CS370: Operating Systems [Fall 2018] Dept. Of Computer Science, Colorado State University Frequently asked questions from the previous class survey CS 370: OPERATING SYSTEMS [DISK SCHEDULING ALGORITHMS] Shrideep Pallickara Computer Science Colorado State University ECCs: How does it impact

More information

Case study: ext2 FS 1

Case study: ext2 FS 1 Case study: ext2 FS 1 The ext2 file system Second Extended Filesystem The main Linux FS before ext3 Evolved from Minix filesystem (via Extended Filesystem ) Features Block size (1024, 2048, and 4096) configured

More information

Block Device Scheduling. Don Porter CSE 506

Block Device Scheduling. Don Porter CSE 506 Block Device Scheduling Don Porter CSE 506 Quick Recap CPU Scheduling Balance competing concerns with heuristics What were some goals? No perfect solution Today: Block device scheduling How different from

More information

2. PICTURE: Cut and paste from paper

2. PICTURE: Cut and paste from paper File System Layout 1. QUESTION: What were technology trends enabling this? a. CPU speeds getting faster relative to disk i. QUESTION: What is implication? Can do more work per disk block to make good decisions

More information

COS 318: Operating Systems. Storage Devices. Jaswinder Pal Singh Computer Science Department Princeton University

COS 318: Operating Systems. Storage Devices. Jaswinder Pal Singh Computer Science Department Princeton University COS 318: Operating Systems Storage Devices Jaswinder Pal Singh Computer Science Department Princeton University http://www.cs.princeton.edu/courses/archive/fall13/cos318/ Today s Topics Magnetic disks

More information

File System Consistency. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

File System Consistency. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University File System Consistency Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Crash Consistency File system may perform several disk writes to complete

More information

I/O Handling. ECE 650 Systems Programming & Engineering Duke University, Spring Based on Operating Systems Concepts, Silberschatz Chapter 13

I/O Handling. ECE 650 Systems Programming & Engineering Duke University, Spring Based on Operating Systems Concepts, Silberschatz Chapter 13 I/O Handling ECE 650 Systems Programming & Engineering Duke University, Spring 2018 Based on Operating Systems Concepts, Silberschatz Chapter 13 Input/Output (I/O) Typical application flow consists of

More information

CS 162 Operating Systems and Systems Programming Professor: Anthony D. Joseph Spring Lecture 18: Naming, Directories, and File Caching

CS 162 Operating Systems and Systems Programming Professor: Anthony D. Joseph Spring Lecture 18: Naming, Directories, and File Caching CS 162 Operating Systems and Systems Programming Professor: Anthony D. Joseph Spring 2004 Lecture 18: Naming, Directories, and File Caching 18.0 Main Points How do users name files? What is a name? Lookup:

More information

EI 338: Computer Systems Engineering (Operating Systems & Computer Architecture)

EI 338: Computer Systems Engineering (Operating Systems & Computer Architecture) EI 338: Computer Systems Engineering (Operating Systems & Computer Architecture) Dept. of Computer Science & Engineering Chentao Wu wuct@cs.sjtu.edu.cn Download lectures ftp://public.sjtu.edu.cn User:

More information

V. File System. SGG9: chapter 11. Files, directories, sharing FS layers, partitions, allocations, free space. TDIU11: Operating Systems

V. File System. SGG9: chapter 11. Files, directories, sharing FS layers, partitions, allocations, free space. TDIU11: Operating Systems V. File System SGG9: chapter 11 Files, directories, sharing FS layers, partitions, allocations, free space TDIU11: Operating Systems Ahmed Rezine, Linköping University Copyright Notice: The lecture notes

More information

Case study: ext2 FS 1

Case study: ext2 FS 1 Case study: ext2 FS 1 The ext2 file system Second Extended Filesystem The main Linux FS before ext3 Evolved from Minix filesystem (via Extended Filesystem ) Features Block size (1024, 2048, and 4096) configured

More information

CS 162 Operating Systems and Systems Programming Professor: Anthony D. Joseph Spring Lecture 18: Naming, Directories, and File Caching

CS 162 Operating Systems and Systems Programming Professor: Anthony D. Joseph Spring Lecture 18: Naming, Directories, and File Caching CS 162 Operating Systems and Systems Programming Professor: Anthony D. Joseph Spring 2002 Lecture 18: Naming, Directories, and File Caching 18.0 Main Points How do users name files? What is a name? Lookup:

More information

File System Internals. Jo, Heeseung

File System Internals. Jo, Heeseung File System Internals Jo, Heeseung Today's Topics File system implementation File descriptor table, File table Virtual file system File system design issues Directory implementation: filename -> metadata

More information

Chapter 6: File Systems

Chapter 6: File Systems Chapter 6: File Systems File systems Files Directories & naming File system implementation Example file systems Chapter 6 2 Long-term information storage Must store large amounts of data Gigabytes -> terabytes

More information

File. File System Implementation. Operations. Permissions and Data Layout. Storing and Accessing File Data. Opening a File

File. File System Implementation. Operations. Permissions and Data Layout. Storing and Accessing File Data. Opening a File File File System Implementation Operating Systems Hebrew University Spring 2007 Sequence of bytes, with no structure as far as the operating system is concerned. The only operations are to read and write

More information

CPSC 421 Database Management Systems. Lecture 11: Storage and File Organization

CPSC 421 Database Management Systems. Lecture 11: Storage and File Organization CPSC 421 Database Management Systems Lecture 11: Storage and File Organization * Some material adapted from R. Ramakrishnan, L. Delcambre, and B. Ludaescher Today s Agenda Start on Database Internals:

More information

CSE 153 Design of Operating Systems Fall 2018

CSE 153 Design of Operating Systems Fall 2018 CSE 153 Design of Operating Systems Fall 2018 Lecture 12: File Systems (1) Disk drives OS Abstractions Applications Process File system Virtual memory Operating System CPU Hardware Disk RAM CSE 153 Lecture

More information

Hard Disk Drives. Nima Honarmand (Based on slides by Prof. Andrea Arpaci-Dusseau)

Hard Disk Drives. Nima Honarmand (Based on slides by Prof. Andrea Arpaci-Dusseau) Hard Disk Drives Nima Honarmand (Based on slides by Prof. Andrea Arpaci-Dusseau) Storage Stack in the OS Application Virtual file system Concrete file system Generic block layer Driver Disk drive Build

More information

Computer Systems Laboratory Sungkyunkwan University

Computer Systems Laboratory Sungkyunkwan University File System Internals Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Today s Topics File system implementation File descriptor table, File table

More information

Journaling. CS 161: Lecture 14 4/4/17

Journaling. CS 161: Lecture 14 4/4/17 Journaling CS 161: Lecture 14 4/4/17 In The Last Episode... FFS uses fsck to ensure that the file system is usable after a crash fsck makes a series of passes through the file system to ensure that metadata

More information

File Systems: Consistency Issues

File Systems: Consistency Issues File Systems: Consistency Issues File systems maintain many data structures Free list/bit vector Directories File headers and inode structures res Data blocks File Systems: Consistency Issues All data

More information

CSE 153 Design of Operating Systems

CSE 153 Design of Operating Systems CSE 153 Design of Operating Systems Winter 2018 Lecture 22: File system optimizations and advanced topics There s more to filesystems J Standard Performance improvement techniques Alternative important

More information

File System Consistency

File System Consistency File System Consistency Jinkyu Jeong (jinkyu@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu EEE3052: Introduction to Operating Systems, Fall 2017, Jinkyu Jeong (jinkyu@skku.edu)

More information

File Systems II. COMS W4118 Prof. Kaustubh R. Joshi hdp://

File Systems II. COMS W4118 Prof. Kaustubh R. Joshi hdp:// File Systems II COMS W4118 Prof. Kaustubh R. Joshi krj@cs.columbia.edu hdp://www.cs.columbia.edu/~krj/os References: OperaXng Systems Concepts (9e), Linux Kernel Development, previous W4118s Copyright

More information

Filesystem. Disclaimer: some slides are adopted from book authors slides with permission 1

Filesystem. Disclaimer: some slides are adopted from book authors slides with permission 1 Filesystem Disclaimer: some slides are adopted from book authors slides with permission 1 Storage Subsystem in Linux OS Inode cache User Applications System call Interface Virtual File System (VFS) Filesystem

More information

Mass-Storage Structure

Mass-Storage Structure Operating Systems (Fall/Winter 2018) Mass-Storage Structure Yajin Zhou (http://yajin.org) Zhejiang University Acknowledgement: some pages are based on the slides from Zhi Wang(fsu). Review On-disk structure

More information

Implementation should be efficient. Provide an abstraction to the user. Abstraction should be useful. Ownership and permissions.

Implementation should be efficient. Provide an abstraction to the user. Abstraction should be useful. Ownership and permissions. File Systems Ch 4. File Systems Manage and organize disk space. Create and manage files. Create and manage directories. Manage free space. Recover from errors. File Systems Complex data structure. Provide

More information