CSE 506: Operating Systems Block Cache

Address Space Abstraction
- Given a file, which physical pages store its data?
- Each file inode has an address space (0 to file size)
- So do block devices that cache data in RAM (0 to device size)
- So does the virtual memory of a process (0 to 16 EB on 64-bit)
- All page mappings are (object, offset) tuples

Types of Address Spaces
- Anonymous memory: no file backing it
  - e.g., the stack of a process
  - Not shared between processes
  - Will discuss sharing and swapping later
  - How do we figure out the virtual-to-physical mapping?
    - Need a data structure to map virtual to physical addresses
    - Some OSes (e.g., Linux) just walk the page tables
- File contents: backed by a storage device
  - The kernel can map file pages into an application

[Diagram: logical view. Processes A, B, and C each reference foo.txt's inode; the data ("Hello!") lives on disk. The connections between them are the open question.]

[Diagram: logical view, refined. Each process (A, B, C) has its own file descriptor table; FDs are process-specific and point to foo.txt's inode, whose data ("Hello!") is backed by disk.]

Tracking Inode Pages
- What data structure should an inode use?
  - There are no page tables for files
- Ex: which page stores the first 4 KB of file foo?
- What data structure to use?
  - Hint: files can be small or very, very large

The Radix Tree: a space-optimized trie
- A trie without the key stored in each node
  - Traversal of the parent(s) builds the key as a prefix
- A tree with branching factor k > 2 is fast
  - Faster lookup for large files (especially with tricks)
- Assume an upper bound on file size when building the tree
  - Can rebuild later if we guessed wrong
- Ex: max size is 256 KB, branching factor k = 64
  - 256 KB / 4 KB pages = 64 pages
  - Need a radix tree of height 1 to represent these pages

Tree of height 1
- The root has 64 slots; each is either NULL or a pointer to a page
- Lookup of address X (sketched in code below):
  - Shift off the low 12 bits (the offset within the page)
  - Use the next 6 bits as an index into the slots (2^6 = 64)
  - If the pointer is non-NULL, go to the child node (the page)
  - If NULL, the page doesn't exist
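To make this concrete, here is a minimal C sketch of a height-1 lookup, assuming 4 KB pages and 64-slot nodes; the names (radix_node, lookup_height1) are illustrative, not Linux's actual radix-tree API.

    #include <stddef.h>
    #include <stdint.h>

    #define PAGE_SHIFT 12               /* low 12 bits: offset within a 4 KB page */
    #define SLOT_BITS  6                /* next 6 bits: slot index (2^6 = 64) */
    #define NSLOTS     (1u << SLOT_BITS)

    struct radix_node {
        void *slots[NSLOTS];            /* pointer to a page, or NULL */
    };

    /* Return the cached page holding file offset 'off', or NULL on a miss. */
    static void *lookup_height1(struct radix_node *root, uint64_t off)
    {
        uint64_t slot = (off >> PAGE_SHIFT) & (NSLOTS - 1);
        return root->slots[slot];
    }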

Tree of height n
- Shift off the low 12 bits, as before
- At each child, shift off 6 bits from the middle of the index
  - Starting at bit 6 * (distance to the bottom - 1)
  - These bits pick which of the 64 potential children to go to
- The height is fixed, so we know where to stop (the remaining bits are the page offset)
- Observations (see the generalized lookup below):
  - The key at each node is implicit, based on its position in the tree
  - Lookup time depends only on the height of the tree
  - In a general-purpose radix tree, we might have to check all k children, at higher lookup cost
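Continuing the same sketch, a height-n lookup peels off 6 bits per level, highest-order bits first; again, these are hypothetical names, not the kernel's real code.

    /* Look up file offset 'off' in a tree of the given height. At each
     * level, the 6 bits starting at SLOT_BITS * (level - 1) of the page
     * index select one of the 64 children. Returns the page, or NULL. */
    static void *lookup(struct radix_node *root, int height, uint64_t off)
    {
        uint64_t index = off >> PAGE_SHIFT;    /* page index within the file */
        void *entry = root;

        for (int level = height; level > 0 && entry != NULL; level--) {
            unsigned slot = (index >> (SLOT_BITS * (level - 1))) & (NSLOTS - 1);
            entry = ((struct radix_node *)entry)->slots[slot];
        }
        return entry;                          /* at the bottom, the page itself */
    }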

Fixed heights
- If the file grows beyond the current maximum size, grow the tree
  - Add another root; the previous tree becomes its first child (see the sketch below)
- Scaling in height (maximum file size):
  - Height 1: 2^(6*1+12) = 256 KB
  - Height 2: 2^(6*2+12) = 16 MB
  - Height 3: 2^(6*3+12) = 1 GB
  - Height 4: 2^(6*4+12) = 64 GB
  - Height 5: 2^(6*5+12) = 4 TB
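Growing is cheap under this scheme: allocate a fresh root and hang the old tree off slot 0, which leaves every existing index valid (their high bits are zero, so they still route through slot 0). A sketch, continuing the structures above; alloc_node() is a hypothetical zero-filling allocator.

    #include <stdlib.h>

    struct radix_root {
        int height;
        struct radix_node *rnode;
    };

    static struct radix_node *alloc_node(void)
    {
        return calloc(1, sizeof(struct radix_node));   /* all slots NULL */
    }

    /* Add a new root above the old one; the old tree becomes child 0. */
    static void grow_tree(struct radix_root *r)
    {
        struct radix_node *new_root = alloc_node();
        new_root->slots[0] = r->rnode;
        r->rnode = new_root;
        r->height++;
    }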

[Figure from Understanding the Linux Kernel: (a) a radix tree of height 1, whose root (rnode, height = 1, count = 2) has slots[0] and slots[4] pointing to pages at indices 0 and 4; (b) a radix tree of height 2, whose root (height = 2, count = 2) points through interior nodes at slots[0] and slots[2] to pages at indices 0, 4, and 131.]

Block Cache (File Address Spaces)
- Each cached inode (file) has a radix tree
  - It points to the file's physical pages
  - The tree is sparse: pages not in memory are simply absent
- The radix tree also supports tags
  - A tree node is tagged if at least one child has the tag
  - Example: tagging a file's page dirty (sketched below)
    - Must also tag each parent in the radix tree as dirty
  - When finished writing a page back:
    - Check all of its siblings
    - If none are dirty, clear the parent's dirty tag
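The tag bookkeeping above can be sketched with a per-node bitmap and parent pointers; the kernel's actual layout differs, and these names are illustrative.

    #include <stddef.h>
    #include <stdint.h>

    struct tag_node {
        struct tag_node *parent;    /* NULL at the root */
        unsigned offset;            /* this node's slot within its parent */
        uint64_t dirty;             /* one dirty bit per child slot */
    };

    /* Dirtying a page tags its leaf slot and every ancestor on the path up. */
    static void tag_dirty(struct tag_node *node, unsigned slot)
    {
        while (node != NULL) {
            node->dirty |= 1ull << slot;
            slot = node->offset;
            node = node->parent;
        }
    }

    /* After write-back, clear the slot's tag; propagate the clear upward
     * only while no sibling subtree is still dirty. */
    static void clear_dirty(struct tag_node *node, unsigned slot)
    {
        while (node != NULL) {
            node->dirty &= ~(1ull << slot);
            if (node->dirty != 0)   /* a sibling still holds the tag */
                return;
            slot = node->offset;
            node = node->parent;
        }
    }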

[Diagram: updated logical view. foo.txt's inode now holds a radix tree mapping its pages; processes A, B, and C reference those pages, and the data ("Hello!") is backed by disk.]

Reading the Block Cache
- VFS does a read():
  - Looks the page up in the inode's radix tree
  - If it is found in the block cache, the data can be returned immediately
- If the data is not in the block cache:
  - Calls the FS-specific read() operation
  - Schedules getting the data from disk
  - Puts the process to sleep until the disk interrupt arrives
- When the disk read is done:
  - Populate the radix tree with a pointer to the page
  - Wake up the process
  - Repeat the VFS read attempt (see the sketch below)
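The retry loop can be sketched as follows; radix_lookup(), fs_start_read(), and sleep_on() are hypothetical stand-ins for the cache, FS, and scheduler hooks, not real VFS interfaces.

    #include <stdint.h>

    struct cached_inode;                        /* opaque here */

    extern void *radix_lookup(struct cached_inode *ci, uint64_t index);
    extern void  fs_start_read(struct cached_inode *ci, uint64_t index);
    extern void  sleep_on(struct cached_inode *ci, uint64_t index);

    /* Return the page caching page 'index' of the file, reading it from
     * disk on a miss. */
    static void *read_cached_page(struct cached_inode *ci, uint64_t index)
    {
        for (;;) {
            void *page = radix_lookup(ci, index);  /* check the block cache */
            if (page != NULL)
                return page;                       /* hit: return immediately */
            fs_start_read(ci, index);              /* schedule the disk read */
            sleep_on(ci, index);                   /* woken by the disk interrupt,
                                                      which fills the radix tree */
            /* loop: repeat the lookup, as the VFS does */
        }
    }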

Dirty Pages
- OSes do not write file updates to disk immediately
  - The pages might be written again soon
  - The writes can be done later, when convenient
- The OS instead tracks dirty pages
  - It ensures write-back isn't delayed too long, lest data be lost in a crash
- An application can force immediate write-back
  - The sync() family of system calls (and some open()/mmap() options)

Sync System Calls
- sync(): flush all dirty buffers to disk
- fsync(fd): flush fd's dirty buffers to disk, including its inode
- fdatasync(fd): flush fd's dirty buffers to disk, but don't bother with the inode
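For reference, a minimal user-space example that forces write-back with fsync(2); substituting fdatasync(fd) would skip flushing inode metadata such as the modification time.

    #include <fcntl.h>
    #include <string.h>
    #include <unistd.h>

    /* Write 'msg' to 'path' and force it to stable storage before returning. */
    int write_durably(const char *path, const char *msg)
    {
        int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0)
            return -1;
        if (write(fd, msg, strlen(msg)) < 0 || fsync(fd) < 0) {
            close(fd);
            return -1;
        }
        return close(fd);
    }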

How to implement sync?
- Goal: keep the overhead of finding dirty blocks low
  - A naive scan of all pages would work, but it is expensive
    - Lots of clean pages to skip over
- Idea: keep track of dirty data as it is created, to minimize overhead
  - At the cost of a bit of extra work on the write path, of course

How to implement sync?
- Background:
  - Each file system has a super block
  - All super blocks are kept in a list
  - Each in-memory super block keeps a list of its dirty inodes
  - Inodes and super blocks are both marked dirty upon use
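That bookkeeping might look like the following sketch (field names are illustrative, not Linux's):

    #include <stdbool.h>

    struct radix_root;                    /* per-file page tree, as above */

    struct mem_inode {
        bool dirty;
        struct radix_root *pages;         /* this file's cached pages */
        struct mem_inode *next_dirty;     /* links the super block's dirty list */
    };

    struct super_block {
        bool dirty;
        struct mem_inode *dirty_inodes;   /* dirty inodes of this FS */
        struct super_block *next;         /* global list of super blocks */
    };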

[Diagram: FS organization for sync. A list of super blocks, one per FS (/, /floppy, /d1), each with its own dirty list of inodes. Inodes and radix-tree nodes/pages are marked dirty separately.]

Simple Dirty Traversal

    for each s in superblock list:
        if (s->dirty): writeback s
        for each i in s's dirty inode list:
            if (i->dirty): writeback i
            if (i->radix_root->dirty):
                // Recursively traverse the tree, writing back
                // dirty pages and clearing their dirty flags

Asynchronous Flushing
- Kernel thread(s): pdflush
  - A task that runs in the kernel's address space
  - 2-8 threads, depending on how busy/idle the threads are
- The kernel maintains a count of the total number of dirty pages
  - Heuristics (or the administrator) configure a target dirty ratio, say 10%
- When pdflush wakes up:
  - Figures out how many dirty pages are above the target ratio
  - Determines a target number of pages to write back
  - Writes back pages until it meets its goal or can't write any more back
    (some pages may be locked; it just skips those; see the sketch below)
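The write-back loop can be sketched as below; writeback_one_page() is a hypothetical helper that returns false once only locked (unflushable) pages remain.

    #include <stdbool.h>

    extern bool writeback_one_page(void);

    /* Write pages back until the dirty count drops below the target
     * ratio, or no further progress can be made. */
    static void flush_to_target(unsigned long total_pages,
                                unsigned long *dirty_pages,
                                unsigned target_pct)
    {
        unsigned long target = total_pages * target_pct / 100;

        while (*dirty_pages > target) {
            if (!writeback_one_page())
                break;                    /* nothing flushable left */
            (*dirty_pages)--;
        }
    }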

Writeback to Stable Storage
- We can now find the dirty pages in physical memory
- But how does the kernel know where on disk to write them?
  - And to which disk, for that matter?
- The super block tracks the device
- The inode tracks the mapping from file offset to LBA (logical block address)
  - Note: this is the FS's inode, not the VFS's inode
  - It probably uses something like a radix tree
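A sketch of that last step, with all names hypothetical: the super block supplies the device, and the FS's inode translates the file offset to an on-disk block address.

    #include <stddef.h>
    #include <stdint.h>

    #define PAGE_SIZE 4096

    struct fs_inode;                      /* the FS's own inode, opaque here */
    struct block_dev;                     /* device recorded in the super block */

    extern uint64_t fs_offset_to_lba(struct fs_inode *fi, uint64_t file_off);
    extern void     disk_write(struct block_dev *dev, uint64_t lba,
                               const void *buf, size_t len);

    /* Write one dirty page back to its home location on disk. */
    static void writeback_page(struct block_dev *dev, struct fs_inode *fi,
                               uint64_t file_off, const void *page)
    {
        uint64_t lba = fs_offset_to_lba(fi, file_off);
        disk_write(dev, lba, page, PAGE_SIZE);
    }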