CS 416: Operating Systems Design March 30, 2015


Operating Systems
15. File System Implementation: Log-Structured File Systems
Paul Krzyzanowski
Rutgers University
Spring 2015

NAND flash memory
- Memory is arranged in pages, similar to disk blocks
  - A page is the unit of allocation and programming
  - Individual bytes cannot be written
- You cannot just write to a block of flash memory
  - It has to be erased first
  - Read-erase-write may be 50 to 100x slower than writing to an already-erased block!
- Limited erase-write cycles
  - ~100,000 to 1,000,000 cycles
  - Employ wear leveling to distribute writes among all (most) blocks
  - Bad block retirement

Problems with conventional file systems
- They modify the same blocks over and over
  - At odds with NAND flash performance
  - Have to rely on an FTL or a smart controller
- Optimizations to minimize seek time
  - Spatial locality is meaningless with flash memory

Wear leveling
- Dynamic wear leveling
  - Monitors the erase count of blocks
  - Maps logical block addresses to flash memory block addresses
  - When a block of data is written to flash memory:
    - Write to a free block with the lowest erase count
    - Update the logical -> physical block map
  - Blocks that are never modified will not get reused
- Static wear leveling
  - Copy static data with low erase counts to another block so the original block can be reused
  - Usually triggered when the (maximum - minimum) erase-cycle difference reaches a threshold

Our options with NAND flash memory
1. NAND flash with a flash memory controller
   - Handles block mapping (logical -> physical): the Block Lookup Table
   - Employs wear leveling: usually static and dynamic
   - Asynchronous garbage collection and erasing
   - Can use conventional file systems: transparent to software
2. Flash Translation Layer (FTL)
   - Software layer between flash hardware & a block device
   - Microsoft's term: Flash Abstraction Layer (FAL); sits on top of the Flash Media Driver
   - Rarely used now: moved to firmware (option 1)
3. OS file system software optimized for raw flash storage
   - Write new blocks instead of erasing & overwriting an old one
   - Erase the old blocks later
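The dynamic wear-leveling policy above (write to the free block with the lowest erase count, then update the logical -> physical map) can be sketched in a few lines. This is a toy model, not how any particular controller works: the class name, its fields, and the choice to erase the stale block immediately are all assumptions made for illustration.

```python
class DynamicWearLeveler:
    """Toy model of dynamic wear leveling (all names are hypothetical)."""

    def __init__(self, num_physical_blocks):
        self.erase_count = [0] * num_physical_blocks  # erases per physical block
        self.free = set(range(num_physical_blocks))   # erased, ready-to-write blocks
        self.l2p = {}                                  # logical -> physical block map
        self.data = {}                                 # physical block -> contents

    def write(self, logical_block, payload):
        if not self.free:
            raise RuntimeError("no free blocks")
        # Pick the free block that has been erased the fewest times
        # (ties broken by block number so the result is deterministic).
        target = min(self.free, key=lambda b: (self.erase_count[b], b))
        self.free.remove(target)
        self.data[target] = payload
        old = self.l2p.get(logical_block)
        self.l2p[logical_block] = target   # update the logical -> physical map
        if old is not None:
            self._erase(old)               # the old copy is stale; reclaim it
        return target

    def _erase(self, physical_block):
        # Assumption: erase right away; a real controller would defer this.
        self.data.pop(physical_block, None)
        self.erase_count[physical_block] += 1
        self.free.add(physical_block)

    def read(self, logical_block):
        return self.data[self.l2p[logical_block]]


if __name__ == "__main__":
    flash = DynamicWearLeveler(4)
    flash.write(1, "static data")          # written once, never modified
    for i in range(30):
        flash.write(0, f"version {i}")     # the same logical block, over and over
    print(flash.erase_count)
```

Rewriting logical block 0 spreads erases over the free blocks, but the block holding the never-modified data keeps an erase count of zero. That is the gap static wear leveling is meant to close.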

Log-structured file systems
- Designed for wear leveling
- The entire file system is essentially a log of operations
  - Some operations update older operations
  - Blocks containing the older operations can be reclaimed

File systems designed for wear leveling
- UBIFS, YAFFS2, LogFS, JFFS2, and others
- JFFS2 is favored for smaller disks
  - Used in low-capacity embedded systems
- YAFFS2 is favored for disks > 64 MB
  - Android used YAFFS2 for /system and /data [through v2.3] and VFAT for /sdcard
- UBIFS (Unsorted Block Image File System)
  - Successor to JFFS2; designed to shorten mounting time & memory needs
  - Supports static wear leveling
- LogFS
  - Short mounting time, as in UBIFS; competes with UBIFS
  - Supports compression
- Wear-leveling support differs among these file systems: some support static wear leveling, others dynamic wear leveling

YAFFS
- Stores objects
  - Files, directories, hard links, symbolic links, devices
  - Each object has a unique integer object ID
  - inodes & directory entries (dentries)
- Unit of allocation = chunk
- Several (32 to 128+) chunks = 1 block
  - A block is the unit of erasure
- Log structure: all updates are written sequentially
  - Each log entry is 1 chunk in size: a data chunk or an object header (describes a directory, file, link, etc.)
  - Sequence numbers are used to organize the log chronologically
- Each chunk contains:
  - Object ID: the object the chunk belongs to
  - Chunk ID: where the chunk belongs in the file
  - Byte count: # of bytes of valid data in the chunk

YAFFS example: create a file, then write some data to the file
[Figure: Block 1 of the log after the writes]
  First chunk of data     (object 500, chunk ID 1)
  Second chunk of data    (object 500, chunk ID 2)
  Third chunk of data     (object 500, chunk ID 3)
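The chunk tags described above (object ID, chunk ID, byte count) together with sequence numbers are enough to recover the current contents of a file from an append-only log. The sketch below is a simplified model, not YAFFS code: the `Chunk` and `ChunkLog` names are hypothetical, every chunk gets its own sequence number, and chunk ID 0 is reserved for the object header, all of which are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    seq: int         # sequence number: position in the log (higher = newer)
    object_id: int   # object the chunk belongs to
    chunk_id: int    # position within the file; 0 is the object header here
    byte_count: int  # number of valid data bytes in the chunk
    payload: bytes


class ChunkLog:
    """Append-only log of chunks; nothing is ever overwritten in place."""

    def __init__(self):
        self.entries = []

    def append(self, object_id, chunk_id, payload=b""):
        chunk = Chunk(len(self.entries), object_id, chunk_id, len(payload), payload)
        self.entries.append(chunk)
        return chunk

    def live_chunks(self, object_id):
        """For each chunk ID, keep only the entry with the highest sequence number."""
        live = {}
        for chunk in self.entries:          # oldest to newest, so later entries win
            if chunk.object_id == object_id:
                live[chunk.chunk_id] = chunk
        return live

    def read_file(self, object_id):
        live = self.live_chunks(object_id)
        data_ids = [cid for cid in sorted(live) if cid > 0]   # skip the header
        return b"".join(live[cid].payload[:live[cid].byte_count] for cid in data_ids)


if __name__ == "__main__":
    log = ChunkLog()
    log.append(500, 0)                      # object header for the new file
    log.append(500, 1, b"first ")
    log.append(500, 2, b"second ")
    log.append(500, 3, b"third")
    log.append(500, 1, b"FIRST ")           # an update is just a newer log entry
    print(log.read_file(500))
```

The superseded first chunk stays in the log until its block is garbage collected; only the newest entry for each (object ID, chunk ID) pair counts as live.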

YAFFS example: close the file (write a new header)
[Figure: Blocks 1 and 2 of the log]
  First chunk of data                  (object 500, chunk ID 1)
  Second chunk of data                 (object 500, chunk ID 2)
  Third chunk of data                  (object 500, chunk ID 3)
  Object header for file (length=n)    (object 500, chunk ID 0)

YAFFS example: open the file; modify the first chunk; close the file
[Figure: Blocks 1 and 2 of the log]
  First chunk of data                      (object 500, chunk ID 1)
  Second chunk of data                     (object 500, chunk ID 2)
  Third chunk of data                      (object 500, chunk ID 3)
  Object header for file (length=n)        (object 500, chunk ID 0)
  New first chunk of data                  (object 500, chunk ID 1)
  New object header for file (length=n)    (object 500, chunk ID 0)

Garbage collection
- If all chunks in a block are deleted
  - The block can be erased & reused
- If blocks have some free chunks
  - We need to do garbage collection
  - Copy active chunks onto other blocks so we can free a block
    - Passive collection: pick blocks with few used chunks
    - Aggressive collection: try harder to consolidate chunks

YAFFS in-memory structures
- Construct the file system state in memory
  - Map of in-use chunks
  - In-memory object state for each object
  - File tree/directory structure to locate objects
- Scan the log backwards chronologically: highest -> lowest sequence numbers
- Checkpointing: save the state of these structures at unmount time to speed up the next mount

YAFFS error detection/correction
- ECC is used for error recovery
  - Correct 1 bad bit per 256 bytes
  - Detect 2 bad bits per 256 bytes
- Bad blocks: if a read or write fails, ask the driver to mark the block as bad

UBIFS vs. YAFFS
- The entire file system state does not have to be stored in memory
- Challenge
  - The index has to be updated out-of-place
  - Parts that refer to updated areas also have to be updated
- UBIFS wandering tree (B+ tree)
  - Only leaves contain file information
  - Internal nodes = index nodes
- Update to the file system
  - Create a leaf; add/replace it in the wandering tree
  - Update parent index nodes up to the root
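The out-of-place update of the wandering tree can be illustrated with path copying: replacing a leaf produces a new leaf plus new copies of every index node on the path up to the root, while untouched subtrees are shared. This is only a sketch of the idea. It uses a binary index instead of a real B+ tree, handles replacement of an existing key only, and the `Leaf`, `Index`, `lookup`, and `update` names are hypothetical.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Leaf:
    key: int
    value: str          # stands in for file information


@dataclass(frozen=True)
class Index:
    split: int          # keys < split are under `left`, the rest under `right`
    left: "Node"
    right: "Node"


Node = Union[Leaf, Index]


def lookup(node, key):
    """Walk index nodes down to a leaf and return its value (or None)."""
    while isinstance(node, Index):
        node = node.left if key < node.split else node.right
    return node.value if node.key == key else None


def update(node, key, value):
    """Return a NEW root. Nodes on the path to `key` are rewritten
    out-of-place; every other subtree is reused as-is."""
    if isinstance(node, Leaf):
        if node.key != key:
            raise KeyError(key)
        return Leaf(key, value)                       # new leaf
    if key < node.split:
        return Index(node.split, update(node.left, key, value), node.right)
    return Index(node.split, node.left, update(node.right, key, value))


if __name__ == "__main__":
    root = Index(3,
                 Index(2, Leaf(1, "a"), Leaf(2, "b")),
                 Index(4, Leaf(3, "c"), Leaf(4, "d")))
    new_root = update(root, 3, "c2")
    print(lookup(root, 3), lookup(new_root, 3))
    print(new_root.left is root.left)     # untouched subtree is shared
```

Because the old nodes are never modified, the old root still describes a consistent earlier version of the tree, and the nodes it alone references become garbage once the new root is committed.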

Special file systems

Pseudo devices
- Device drivers can also provide custom functions
  - Even if there is no underlying device

Simple special-function device files
- /dev/null: null device
  - Throw away anything written to it; return EOF on reads
- /dev/zero: zero device
  - Return zeros for each read operation
- /dev/random, /dev/urandom: random numbers
  - urandom is non-blocking
  - \Device\KsecDD on Windows NT

Loop pseudo device
- Provides a block device interface to a file
- Register a file as a block device
  - Let the buffer cache know:
    - the request (strategy) procedure for read/write
    - the block size
- The file can then be formatted with a file system and mounted
- See the losetup command in Linux
- Common uses
  - Installation software
  - CD/DVD images
  - Encrypted file systems

Loop device
- Example: access an ISO 9660 CD image in a file sitting in an ext4 file system
[Figure: layered I/O path: System calls -> VFS -> iso9660 fs driver and ext4 fs driver -> Buffer cache -> Loop device driver and SATA driver]

Example: (1) Create a loop device
- Create a 10 MB file named file.img

    # dd if=/dev/zero of=file.img bs=1k count=10000
    10000+0 records in
    10000+0 records out

- Associate loop device /dev/loop0 with the file file.img

    # losetup /dev/loop0 file.img

- This makes /dev/loop0 a block device whose contents are file.img

    # ls -l /dev/loop0
    brw-rw---- 1 root disk 7, 0 Mar 30 10:55 /dev/loop0
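The dd command in step (1) does nothing more than copy 10,000 blocks of 1 KiB from /dev/zero into a regular file. A rough Python equivalent, also showing the /dev/null behavior described earlier, might look like this. It assumes a Unix-like system where /dev/zero and /dev/null exist; the function names are made up for this sketch, and it does not replace losetup, which still has to be run as root.

```python
import os
import tempfile


def make_zero_image(path, block_size=1024, count=10000):
    """Copy `count` blocks of `block_size` bytes from /dev/zero into `path`,
    like `dd if=/dev/zero of=path bs=1k count=10000`. Returns the file size."""
    with open("/dev/zero", "rb") as src, open(path, "wb") as dst:
        for _ in range(count):
            dst.write(src.read(block_size))   # every read returns zero bytes
    return os.path.getsize(path)


def null_device_demo():
    """Write to /dev/null (data is discarded), then read from it (immediate EOF)."""
    with open("/dev/null", "wb") as sink:
        sink.write(b"this is thrown away")
    with open("/dev/null", "rb") as src:
        return src.read(16)                   # EOF is reported as empty bytes


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        image = os.path.join(workdir, "file.img")
        print(make_zero_image(image), "bytes written")
    print(repr(null_device_demo()))
```

With the default arguments the image is 10,000 x 1,024 = 10,240,000 bytes, which is the "10 MB" file that the loop device is then attached to.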

Example: (2) Put a file system on the file
- Create a file system on /dev/loop0

    # mke2fs -c /dev/loop0 10000
    mke2fs 1.42.9 (4-Feb-2014)
    Discarding device blocks: done
    Filesystem label=
    OS type: Linux
    Block size=1024 (log=0)
    Fragment size=1024 (log=0)
    Stride=0 blocks, Stripe width=0 blocks
    2512 inodes, 10000 blocks
    ...

Example: (3) Mount it
- Create a directory that will be the mount point

    # mkdir /mnt/here

- Mount the file system

    # mount -t ext2 /dev/loop0 /mnt/here

- Test it out!

    # ls -l /mnt/here
    total 12
    drwx------ 2 root root 12288 Mar 30 10:56 lost+found
    # echo hello >/mnt/here/hello.txt
    # ls -l /mnt/here
    total 13
    -rw-r--r-- 1 root root 6 Mar 30 14:31 hello.txt
    drwx------ 2 root root 12288 Mar 30 10:56 lost+found
    # cat /mnt/here/hello.txt
    hello

[Figure: /mnt/here is backed by file.img]

Example: Do it recursively!
- Create a 1000 KB file called another.img within the file.img file system

    # dd if=/dev/zero of=/mnt/here/another.img bs=1k count=1000
    1000+0 records in
    1000+0 records out

- Make /dev/loop1 a loop device that points to another.img

    # losetup /dev/loop1 /mnt/here/another.img

- Create a file system

    # mke2fs -c /dev/loop1 1000
    mke2fs 1.42.9 (4-Feb-2014)
    Filesystem label=
    OS type: Linux
    Block size=1024 (log=0)
    Fragment size=1024 (log=0)
    Stride=0 blocks, Stripe width=0 blocks
    128 inodes, 1000 blocks
    ...

- another.img is a file containing a file system. It exists within file.img, which is also a file containing a file system.

Example: Do it recursively! (continued)
- Create a directory (/mnt/there) that will be the mount point

    # mkdir /mnt/there

- Mount the file system

    # mount -t ext2 /dev/loop1 /mnt/there

- Test it!

    # echo hey! >/mnt/there/test
    # ls -l /mnt/there
    total 13
    drwx------ 2 root root 12288 Mar 30 14:35 lost+found
    -rw-r--r-- 1 root root 5 Mar 30 14:36 test

- It works! another.img is a file system within file.img, which is a file system on the disk

    # ls -l /mnt/here
    total 1018
    -rw-r--r-- 1 root root 1024000 Mar 30 14:35 another.img
    -rw-r--r-- 1 root root 6 Mar 30 14:31 hello.txt
    drwx------ 2 root root 12288 Mar 30 10:56 lost+found

- /mnt/there/test: a file in a file system (/mnt/there) that is a file (another.img) within a file system (/mnt/here) that is a file (file.img) within a file system (top-level)

Generic interfaces via VFS
- VFS gives us a generic interface to file operations
- We don't need to have persistent storage underneath, or even storage!

procfs: process file system
- /proc
  - View & control processes & kernel structures as files
- Origins: Plan 9 from Bell Labs
  - Look into and control processes
- procfs is a file system driver
  - Registers itself with VFS
  - When VFS calls to request inodes as files & directories are accessed, /proc creates them from info within kernel structures
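Because procfs presents kernel data as ordinary files, a program needs nothing beyond plain file reads to use it. As a small illustration, the sketch below parses the "Name: value" lines that Linux exposes in /proc/meminfo. The function names are hypothetical, the parser keeps only the leading number on each line (most values are in kB), and `read_meminfo` returns an empty dict on systems without that file.

```python
import os


def parse_meminfo(text):
    """Parse /proc/meminfo-style text ("MemTotal:  16314504 kB") into a dict
    mapping each field name to its integer value."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue                      # not a "Name: value" line
        fields = rest.split()
        if fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info


def read_meminfo(path="/proc/meminfo"):
    """Read the live file if this system has one (Linux); otherwise return {}."""
    if not os.path.exists(path):
        return {}
    with open(path) as f:
        return parse_meminfo(f.read())


if __name__ == "__main__":
    for name, value in list(read_meminfo().items())[:5]:
        print(name, value)
```

No dedicated system call is involved: the kernel generates the text when the file is read, which is also why the same data is easy to use from shell scripts.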

procfs: process file system
- Remove the need for system calls to get info, read config parameters, and inspect processes
- Simplify scripting
- Just a few items:
  - /proc/cpuinfo: info about the CPU
  - /proc/devices: list of all character & block devices
  - /proc/diskstats: info on logical disks
  - /proc/meminfo: info on system memory
  - /proc/net: directory containing info on the network stack
  - /proc/swaps: list of swap partitions
  - /proc/uptime: time the system has been up
  - /proc/version: kernel version
- Plan 9 allowed remote access to /proc

procfs: process info

    $ ls /proc/27325
    attr             cwd      loginuid    oom_adj      smaps
    auxv             environ  maps        oom_score    stack
    cgroup           exe      mem         pagemap      stat
    clear_refs       fd       mountinfo   personality  statm
    cmdline          fdinfo   mounts      root         status
    comm             io       mountstats  sched        syscall
    coredump_filter  latency  net         schedstat    task
    cpuset           limits   numa_maps   sessionid    wchan

Device names in Windows
- Windows Object Manager
  - Owns the system namespace
  - Manages Windows resources: devices, files, registry entries, processes, memory, ...
  - Programs can look up, share, protect, and access resources
  - Resource access is delegated to the appropriate subsystem
    - The I/O Manager gets requests to parse & access file names

Naming devices
- When a device driver is loaded by the kernel
  - The driver init routine registers a device name with the Object Manager
    - \Device\CDRom0, \Device\Serial0
- The Win32 API requires MS-DOS device names
  - These names also live in the Object Manager
  - Created as symbolic links in the \?? directory

Devices in Linux & OS X
- In the past:
  - Devices were static; explicitly created via mknod
- Now:
  - Devices come & go
- devfs: special file system mounted on /dev
  - Presents device files
  - A device driver registers with devfs upon initialization via devfs_register
  - Avoids having to create device special files in /dev
  - Obsolete since Linux 2.6; still used in OS X and others
- udev device manager
  - User-level process; listens to uevents from the kernel via a netlink socket
  - Detects new device initialization or removal
  - Persistent device naming, guided by files in /etc/udev/rules.d

FUSE: Filesystem in Userspace
- A file system can run as a normal user process
- FUSE module
  - Conduit to pass data between VFS and the user process
  - Communication via a special file descriptor obtained by opening /dev/fuse

Thoughts on naming: Plan 9
- Plan 9 from Bell Labs
  - Research OS built as a successor to UNIX
  - Take the good ideas from UNIX; get rid of the bad ones
- The hierarchical name space was a good thing, and so were devices as files
  - User-friendly: easy to inspect & understand
  - Great for scripting
- Conventions work well
  - Binaries in /bin, libraries in /lib, include files in /include, ...
  - Global conventions make life easier: no PATH
- Customization is good too
  - But we need an alternative to PATH, LD_LIBRARY_PATH, and other paths

Thoughts on naming: Plan 9
- No file system, just a protocol for accessing data
- Devices are drivers that interpret a file access protocol
  - Console: /dev/cons
  - Clock: /dev/time
  - Disk: /dev/disk/1
  - Process 1's memory map: /proc/1/mem
- Build up a name space by mounting various components
- The name space is not system-wide but per process group
  - Inherited across fork/exec

Thoughts on naming: Plan 9
- Mounting directories & union mounts
  - Multiple directories mounted on one place
  - Behave like one directory comprising the union of the contents
  - Order matters: acts like PATH
  - E.g., /bin is built up of shell scripts, architecture-specific binaries, your scripts, and your other stuff
  - A shell profile starts off by building up your workspace
- Window system devices, per process group
  - /dev/cons: standard input, output
  - /dev/mouse
  - /dev/bitblt: bitmap operations
  - /dev/screen: read-only image of the screen
  - /dev/window: read-only image of the current window
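The union-mount behavior described above (several directories stacked on one mount point, searched in order, so that the stack acts like PATH) can be imitated in user space. The sketch below is not Plan 9's mechanism, which lives in the kernel's per-process name space; `UnionDir` and its methods are hypothetical names, and the class only emulates the lookup rule on top of ordinary directories.

```python
import os
import tempfile


class UnionDir:
    """User-space imitation of a union directory: earlier directories win."""

    def __init__(self, *dirs):
        self.dirs = list(dirs)            # search order, like entries in PATH

    def resolve(self, name):
        """Return the path of the first directory entry called `name`, or None."""
        for d in self.dirs:
            candidate = os.path.join(d, name)
            if os.path.exists(candidate):
                return candidate
        return None

    def listdir(self):
        """Union of all entries; a name hidden by an earlier directory is
        reported only once."""
        seen = []
        for d in self.dirs:
            for name in sorted(os.listdir(d)):
                if name not in seen:
                    seen.append(name)
        return seen


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        mine = os.path.join(root, "mybin")
        system = os.path.join(root, "sysbin")
        os.mkdir(mine)
        os.mkdir(system)
        for directory, name in [(mine, "ls"), (system, "ls"), (system, "cc")]:
            open(os.path.join(directory, name), "w").close()
        bin_union = UnionDir(mine, system)
        print(bin_union.listdir())
        print(bin_union.resolve("ls"))    # the copy in mybin shadows sysbin's
```

Swapping the order of the two directories changes which `ls` is found, which is the sense in which a union directory replaces a search path.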