
FFS
CS380L: Mike Dahlin
October 9, 2007

"Treat disks like tape." (J. Ousterhout)

1 Preliminaries

1.1 Review

1.2 Outline
- Overview of file systems: trends, future
- Disk drive modeling
- FFS

1.3 Preview
(see below)

2 Overview of file systems

Phase 1: Basic file systems (e.g., Unix pre-FFS)
- Naming: name -> fileid -> blocks
- Allocation: free list, bitmap, ...

Phase 2: Performance/efficiency/locality (e.g., FFS)
- Technical motivation: seek times and rotation times vs. transfer times; capacity increases faster than performance

- Workload motivation: locality (sequential access within files, locality within directories)
- Approach: locality within files and directories; waste space for performance
- More today: "A Fast File System for Unix"

Phase 3: Optimize writes, reliably and efficiently (e.g., JFS, LFS)
- Workload motivation: don't lose data
- Technical motivation: (1) big memories mean writes dominate reads (?); (2) ordering writes via careful ordering of updates means lots of sequentially-constrained seeks
- Approach: transactions + logging; almost all modern file systems are journaling: metadata updates are done via transaction, data updates done in place
- More next week: "The Transaction Concept", LFS
- Guest lecture: IBM Linux Technology Center, "JFS for Linux and the Linux Volume Manager", W Oct 23, 5-6:30PM, TAY 2.106

Phase 4: Scalable, distributed (e.g., RAID, XFS, xFS, NASD, ...)
- Technical motivation: (1) networks can concentrate more users (e.g., from across an enterprise) on a file server; (2) networks can distribute a file system across many disks/servers; (3) cheap silicon can put more smarts into the disk
- Approach 1 (widely adopted): RAID, scalable FS (XFS)
- Approach 2 (research systems): smart disks, xFS, NASD, Frangipani, ...

3 Disk drive modeling

- Disk surface: a circular platter coated with magnetic material
- Tracks: concentric rings on the disk surface; bits laid out serially along each track
- Sectors: each track is split into sectors, arcs of the track; a sector is also the minimal unit of transfer
- Disks spin continuously
- A disk is organized as a set of platters in a stack
- The disk is read/written via a "comb": two read/write heads at the end of each arm
- Cylinder: the set of corresponding tracks, one on each surface

Disk operation is in terms of radial coordinates (not x, y, z): move the arm to the correct track, wait for the disk to rotate under the head, select the head, then transfer as the data goes by.

High-end disk today: Seagate Cheetah ST373405LC (March 2002)
  Capacity                      73 GB
  # surfaces per pack           8
  # cylinders                   29,549
  total # tracks per system     236,394
  # sectors per track           776 (avg)
  # bytes per sector            512
  # revolutions per minute      10,000
  buffer size                   4 MB
  xfer rate                     50-85 MB/s (internal); 38-64 MB/s (internal, formatted); 160 MB/s (external, max)
  seek time                     avg: 5.1 ms (r), 5.5 ms (w); 1 track: 0.4 ms (r), 0.6 ms (w); full disk: 9.4 ms (r), 9.8 ms (w)
  idle power                    10 W
  MTBF                          1,200,000 hours
  nonrecoverable read errors    1 per 10^15 bits read

Note: nonrecoverable read errors may be becoming significant: 1 GB = 10^10 bits; 128 GB = 10^12 bits. Based on worst-case guarantees by manufacturers, there is a nontrivial (1/1000?) probability that you will fail to read an entire disk successfully. And nonrecoverable read error rates are improving only slowly: Pete Chen's 1994 RAID survey cites a figure of 1 per 10^14; since 1994, disks have increased in size by about 3 orders of magnitude while bit error rates have improved by only 1 order of magnitude. This implies the probability of such an error has increased by 2 orders of magnitude over the last decade.

Possible project: examine this trend in more detail. Do we need to redesign single-disk file systems to use more redundancy? Or should we count on disk manufacturers to back off on density and/or increase the robustness of coding to keep this issue acceptable? Will everyone who cares about reliability use RAID? (What about laptops and other portable devices?)

3.1 What you need to know about disks
1. tech trends
2. failures

3. performance

3.2 Tech trends
Disk density (dollars/byte) is improving much faster than the mechanical limitations:
- cost per byte improves by 100% per year
- bandwidth improves by about 20-30% per year (10x per decade)
- seek and rotation get better by 5-10% per year (2-3x per decade)

Other key tech trend: form factor (the innovator's dilemma): 8 inch, 5.25 inch, 3.5 inch, ...
- Today we are on the 3.5 inch vs. 2.5 inch cusp (incumbents likely to survive)
- Solid state drives are threatening (incumbents at risk)

3.3 Reliability
Disks are supposed to store data permanently. Problems:
- Block writes vs. higher-level writes: transactions (next lecture)
- Disk crash
  - MTTF (advertised): O(1M hours), i.e., about 100 years
  - MTTF is valid only during the disk's lifetime (about 5 years); bathtub curve
  - In practice, expect to lose a few percent of your disks each year
- Block corruption
  - Nonrecoverable read errors (advertised): 1 per 10^15 bits read, i.e., about 1 bad block per ~100 TB read ... not ignorable any more!
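The two reliability figures above can be sanity-checked with a few lines of arithmetic. This is a sketch, not a serious reliability model: it plugs in the advertised Cheetah numbers from these notes and assumes a constant failure rate and independent per-bit errors, both of which are simplifications; `naive_afr` and `p_full_read_error` are illustrative helper names.

```python
import math

HOURS_PER_YEAR = 24 * 365

def naive_afr(mttf_hours):
    """Annualized failure rate implied by an advertised MTTF under a
    constant-failure-rate (exponential) model."""
    return HOURS_PER_YEAR / mttf_hours

def p_full_read_error(capacity_bytes, bit_error_rate):
    """Probability of >= 1 nonrecoverable bit error while reading a whole
    disk, assuming independent per-bit errors (a crude worst-case model).
    Computed stably via expm1/log1p since the per-bit rate is tiny."""
    bits = capacity_bytes * 8
    return -math.expm1(bits * math.log1p(-bit_error_rate))

# ~0.73% per year: well below the "few percent" observed in the field,
# consistent with the bathtub-curve caveat above.
print(f"implied AFR at 1.2M-hour MTBF: {naive_afr(1_200_000):.2%}")

# ~6e-4, i.e., roughly the "1/1000?" full-disk-read figure in the notes.
print(f"P(error reading all 73 GB):    {p_full_read_error(73e9, 1e-15):.1e}")
```

The gap between the implied 0.73% and observed field rates is exactly the point the notes make: the advertised MTTF only holds during the disk's service life.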

3.4 Disk performance
To read or write a disk block:

0. Controller time: Wilkes: about 1 ms

1. Seek: position the heads over the cylinder (+ select head)
   - Seek time depends on how fast you can move the arm
   - Typically 10-20 ms to move all the way across the disk
   - Typically 0.5-1 ms to move 1 track or to select a different head on the same cylinder
   - Wilkes: small-seek time is K0 + K1 * sqrt(ntracks); large-seek time is K2 + K3 * ntracks
   - HP 97560: 3.24 + 0.400 * sqrt(d) for d < 383 tracks; 8.00 + 0.008 * d for d >= 383

2. Rotational delay: wait for the sector to rotate underneath the head
   - 10,000 RPM = 166 revolutions per second = 6 ms per rotation

3. Transfer bytes
   - 0.5 KB/sector * (100-1000) sectors/revolution * 166 rev/s = 8-80 MB/s (e.g., a 0.5 KB sector at ~100 MB/s takes about 0.005 ms)

Overall time to do a disk I/O: controller time + seek + rotational delay + transfer

Question: are seek and rotational times overhead, latency, or bandwidth?
- Seek and rotational delay are latency to the CPU, overhead to the disk
- Transfer rate is bandwidth
- Controller time is latency to the CPU, overhead to the controller

Random access
- avg seek 4 ms, 1/2 rotation 3 ms, transfer 0.02 ms

- about 7 ms to fetch/put data, mostly seek/rotation
- 73 KB/s for random sector reads
- Note: random access is the pessimistic worst case; it assumes no locality at all (a seek across 1/3 of the disk)

Sequential access
- What if the next sector is on the same track? Then no seek and no rotational delay: 10+ MB/s
- The key to using the disk effectively (and therefore to everything in file systems) is to minimize seek and rotational delay
- Ousterhout: "treat disks like tape"

Performance bottom line
- Standard model: seek + rotation + transfer
  - seek: 4-10 ms average
  - rotation: 6-10 ms full rotation, 3-5 ms average
  - transfer: 40-80 MB/s, 0.005 ms per sector
- Gotchas
  - "average rotation" is misnamed
  - rotation > seek >> transfer; break-even at about a 1 MB transfer (ignoring limits on track size!)
  - the standard model ignores the track buffer
  - there is no such thing as a cylinder
  - disks are stateful: you need to model the current head position, rotational position, track sparing, etc. to model a disk accurately (see Ruemmler and Wilkes)

4 Admin
Exam: Tuesday. In class, open book. Be ready for all papers.
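The standard model can be written down in a few lines. This is a sketch using illustrative parameters from these notes (the HP 97560 seek curve, a 10,000 RPM spindle, 512-byte sectors); `access_ms` and `effective_mb_s` are hypothetical helper names, not from any real driver.

```python
import math

def seek_ms(d_tracks):
    """HP 97560 seek curve (Ruemmler & Wilkes): sqrt for short seeks,
    linear for long ones. Times in milliseconds."""
    if d_tracks <= 0:
        return 0.0
    if d_tracks < 383:
        return 3.24 + 0.400 * math.sqrt(d_tracks)
    return 8.00 + 0.008 * d_tracks

def access_ms(d_tracks, sectors, rpm=10_000, controller_ms=1.0,
              sector_bytes=512, media_mb_s=50.0):
    """controller + seek + average rotational delay + transfer."""
    rot_delay_ms = (60_000 / rpm) / 2                 # expected half rotation
    xfer_ms = sectors * sector_bytes / (media_mb_s * 1000)  # 1 MB/s = 1000 B/ms
    return controller_ms + seek_ms(d_tracks) + rot_delay_ms + xfer_ms

def effective_mb_s(xfer_bytes, seek_ms_=8.0, rot_ms=3.0, media_mb_s=80.0):
    """Delivered bandwidth once positioning cost is amortized over one
    transfer of the given size."""
    total_ms = seek_ms_ + rot_ms + xfer_bytes / (media_mb_s * 1000)
    return xfer_bytes / (total_ms * 1000)

# A random 1-sector read is dominated by positioning, not transfer
# (about 20 ms here, matching the old-FS estimate later in these notes):
print(f"random 1-sector read: {access_ms(1000, 1):.2f} ms")

# Effective bandwidth vs. request size shows the ~1 MB break-even:
for size in (512, 64 * 1024, 1024 * 1024, 16 * 1024 * 1024):
    print(f"{size:>9} B -> {effective_mb_s(size):6.2f} MB/s")
```

At 1 MB the transfer time (~13 ms at 80 MB/s) roughly equals the positioning time (~11 ms), so delivered bandwidth is about half the media rate: that is the break-even the gotchas list refers to.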

Projects:
- Checkpoint Oct 28 (3 weeks)
- Think through the rest of the semester

5 FFS

5.1 Old Unix FS
- Small blocks (0.5-1 KB)
- Files not laid out contiguously
- Simple free-list implementation, so fragmentation builds up over time (need to study file system layout over time, not just in the lab)
- Inodes far from data
- Inodes in a directory not stored together

5.2 FFS
Solution part 1: big blocks
- 4-8 KB blocks
- But most files are small, so internal fragmentation
- Fix: fragments within blocks (0.5-1 KB)
  - File size < block size: a number of contiguous fragments
  - File size > block size: a number of blocks plus a number of contiguous fragments
- Solved? Pretty good bandwidth for large files; pretty good space utilization for small files
- Question: why don't we have a 1 MB block size and a 1 KB fragment size?

Solution part 2: inter-block locality
- Organize the free list into cylinder groups; try to allocate within a group. Search order for a new block:

  1. rotationally closest block in the current cylinder
  2. current cylinder group
  3. another cylinder group
- Challenge: can't put everything next to everything
  - Each new directory goes into a new cylinder group (localizes inodes)
  - Start allocating blocks for a file in its directory's cylinder group
  - Move big files to a different cylinder group before they fill it up
- Performance: 10x improvement; CPU and memory-to-memory copies are the limiting factors; optimal rotational positioning is hard

Criticisms
- Heuristics and magic numbers: hack or science?
- The paper skips interesting science: the 90% heuristic; the cylinder-group concept; bigger blocks; ...
- How will technology trends change these heuristics? Do cylinder groups make sense any more? (Do cylinders?)
- How will other workloads interact with the heuristics? FFS assumes the directory is the unit of locality; what if patterns change?

6 Performance model

Questions
- Suppose you blanked out the experimental section. What experiments should they run? (What are they/should they be trying to show?)
- Suppose you blanked out the graphs themselves. Are the system and methodology described well enough that you can develop a performance model to predict system performance?
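FFS's "solution part 1" above (big blocks plus a tail of fragments) is easy to sketch. `ffs_layout` is an illustrative helper, not the actual BSD allocator, and it ignores indirect blocks; the 4 KB / 1 KB pair matches the sizes discussed above.

```python
def ffs_layout(file_bytes, block=4096, frag=1024):
    """Return (full_blocks, tail_fragments, wasted_bytes) for storing a
    file as full blocks plus contiguous fragments for the tail."""
    full_blocks, tail = divmod(file_bytes, block)
    tail_frags = -(-tail // frag) if tail else 0   # ceiling division
    wasted = tail_frags * frag - tail              # internal fragmentation
    return full_blocks, tail_frags, wasted

# A 1-byte file wastes most of one 1 KB fragment, not most of a 4 KB block:
print(ffs_layout(1))           # (0, 1, 1023)
# A 10 KB file: two full blocks plus two fragments, no waste:
print(ffs_layout(10 * 1024))   # (2, 2, 0)
```

This also suggests an answer to the 1 MB-block question above: with a 1 MB block and 1 KB fragments, any file that grows past a few fragments would have to be repeatedly copied into ever-larger fragment runs and finally into a huge block, making the common small-file write path expensive.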

Disk parameters: they don't provide all of the details about their disks' performance. From looking online, here is the info I was able to find on the AMPEX Capricorn 330 MB Winchester drives used in this study:
  32 sectors/track
  16 tracks
  1024 cylinders
  3600 RPM
(Note that these numbers are not quite consistent with the paper: they come out to a 262 MB disk rather than 330 MB. Hopefully they are close enough to make sense of the results.)

6.1 Performance model
- What are the goals/claims/metrics? Design an experiment to test each.
- Based on performance models of the old FS, an ideal FS, and FFS, predict performance for the three cases
  - First pass: predict the shape of the lines, and which line lies above which
  - Second pass: predict actual values

What is the performance model for the old FS?
- 1024-byte blocks
- Random layout of blocks (due to free-space fragmentation plus the linked free list): each read request requires more-or-less a random seek + random rotation (about 20 ms?)
- An even more detailed model might account for the fact that inodes are clustered together near the outer edge of the disk: reading lots of inodes may be a little faster (ls), while reading the first block of a file may incur a slightly longer seek
- How many reads for the first block of a random file?
- How many reads to read a k-block file once the first block is read?

- How many writes to create a file?
- How many writes to write a k-block file?
- How long to do an ls of a directory containing k files?

What is the performance model for FFS? (what factors are important?)
- 4 KB blocks
- Sequential blocks of a file laid out (mostly) sequentially (hopefully)
- A more detailed model accounts for skip-sector positioning, head switch time, cylinder switch time, the optimization (p. 190) of limiting the amount of a file per cylinder group (first 48 KB in the first group; a max of 1 MB in each subsequent, randomly chosen group), ...
- Inodes in cylinder groups
- Files in the same directory in the same cylinder group; different directories in different groups

The paper claims: "The biggest factor in this speedup is because of the larger block size." Do you agree?

Specific questions/experiments/predictions:
- How many reads for the first block of a random file? How long does it take?
- How many reads to read a k-block file once the first block is read? How long?
- How many writes to create a file? How long?
- How many writes to write a k-block file? How long?
- How long to do an ls of a directory containing k files?
- ...

Other notes
- In the paper, other factors (controller, CPU) limit performance; it is not just the disk model
- Claim: having a good disk model would help you analyze the system/write this paper by making you realize that those other factors were important
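A first-pass model of the kind asked for above can be sketched in a few lines. All numbers are illustrative: the ~20 ms random I/O is the old-FS estimate from earlier, the 0.1 ms sequential 4 KB transfer assumes roughly 40 MB/s, the geometry check uses the AMPEX figures quoted above, and all three function names are hypothetical.

```python
def disk_capacity_mb(sectors_per_track, tracks, cylinders, sector_bytes=512):
    """Capacity implied by a disk geometry, in decimal MB."""
    return sectors_per_track * sector_bytes * tracks * cylinders / 1e6

def old_fs_read_ms(file_kb, block_kb=1, rand_io_ms=20.0):
    """Old FS: every 1 KB block costs roughly a random seek + rotation."""
    blocks = -(-file_kb // block_kb)               # ceiling division
    return blocks * rand_io_ms

def ffs_read_ms(file_kb, block_kb=4, rand_io_ms=20.0, seq_block_ms=0.1):
    """FFS: one random access to reach the file, then (mostly) sequential
    4 KB transfers within the cylinder group."""
    blocks = -(-file_kb // block_kb)
    return rand_io_ms + max(0, blocks - 1) * seq_block_ms

# The AMPEX geometry really does come out well short of 330 MB:
print(f"AMPEX geometry: {disk_capacity_mb(32, 16, 1024):.0f} MB")

# First-pass prediction: both curves start near one random I/O for small
# files, then diverge sharply as file size grows.
for kb in (4, 64, 1024):
    print(f"{kb:>5} KB file: old FS {old_fs_read_ms(kb):8.1f} ms, "
          f"FFS {ffs_read_ms(kb):6.1f} ms")
```

Even this crude model predicts the shape of the paper's results: for a 1 MB file the old FS pays ~20 seconds of random I/O while FFS pays well under 100 ms, and the per-request cost is dominated by positioning, not block size per se, which bears on the "biggest factor" claim above.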

7 ZFS

7.1 Key features
- 128-bit (tech trend: 2x per year ... 1 bit per year)
- Volume pool
  - Performance/correctness: understands transactions that span disks (vs. a low-level block interface, which must conservatively flush everything in order)
  - Easier management (wrapper scripts can do some but not all of this, e.g., grow a partition)
  - Dynamic load balancing, grow the pool, ...
- Checksum data: unrecoverable read errors can no longer be ignored. Also proactively scrub.
- Copy-on-write: defer to the LFS discussion

7.2 Performance
- vs. ext3 journaled (slightly unfair to ZFS?)
- vs. ext3 ordered/writeback (very unfair to ZFS)
- ... probably won't make too much sense without the LFS paper
- Assignment: after reading the LFS paper, see if the graphs here make sense
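The "checksum data" idea can be illustrated in a few lines: keep a checksum alongside each block reference and verify it on every read, so a silently corrupted block is detected (and a mirror copy or a scrub pass can then repair it). This is a toy sketch using SHA-256, not ZFS's actual on-disk format; `write_block` and `read_block` are hypothetical names.

```python
import hashlib

def write_block(data: bytes):
    """Return (data, checksum) as they would be stored: the checksum lives
    with the block pointer, not with the data it covers."""
    return data, hashlib.sha256(data).digest()

def read_block(data: bytes, checksum: bytes) -> bytes:
    """Verify on every read; a mismatch means corruption, not bad input."""
    if hashlib.sha256(data).digest() != checksum:
        raise IOError("checksum mismatch: block corrupt, try mirror/scrub")
    return data

block, ck = write_block(b"important data")
assert read_block(block, ck) == b"important data"

# A single flipped byte (simulating silent media corruption) is caught:
try:
    read_block(b"importZnt data", ck)
except IOError:
    print("corruption detected")
```

Storing the checksum with the pointer rather than with the block itself is what lets this scheme catch misdirected and phantom writes as well as bit rot, which is exactly why the notes say unrecoverable read errors "can no longer be ignored."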