File Systems, Course Wrap Up

File Systems, Course Wrap Up
CSE 410 Winter 2017
Instructor: Justin Hsia
Slides adapted from CSE451 material by Gribble, Lazowska, Levy, and Zahorjan
Teaching Assistants: Kathryn Chan, Kevin Bi, Ryan Wong, Waylon Huang, Xinyu Sui

Administrivia
  Course evaluation: https://uw.iasystem.org/survey/172382
  Final Exam: Tue, Mar. 14 @ 2:30pm in MGH 241
    Review Session: Sun, Mar. 12 @ 1:30pm in SAV 264
    Cumulative (midterm clobber policy applies)
    TWO double-sided handwritten 8.5 x 11 cheat sheets
    Recommended that you reuse/remake your midterm cheat sheet

Topics
  Secondary Storage
    Disks
  File Systems
    Files, directories, and disk blocks

Interface Layers
  [Figure: stack of layers, top to bottom: App. Code, Std. Runtime Library, OS, Disk. Interface labels in the figure: Procedure Calls, Syscalls, Device-type Dependent Commands, Whatever]

Primary Roles of the OS (File System)
  1) Hide hardware-specific interface
  2) Allocate disk blocks
  3) Check permissions
  4) Understand directory file structure
  5) Maintain metadata
  6) Performance
  7) Flexibility
  [Figure: the OS sits between the Disk and a directory tree (/, root, etc)]

File System Concept
  The implementation of the abstraction for secondary storage
    Specific chunks of memory put together form a file
  Logical organization of files into directories and the directory hierarchy
  Sharing of data between processes, people and machines
  Access control, consistency, ...

Files
  A file is a collection of data with some properties
    Attributes: size, owner, last read/write time, protection, ...
    On Linux, use ls -l to view
  Files may also have types
    1) Understood by file system, e.g. device, directory, symbolic link
    2) Understood by other parts of the OS or by runtime libraries, e.g. executable, dll, source code, object code, text file, ...
  Type can be encoded in the file's name or contents
    Windows encodes in file extension (e.g. .exe, .dll, .jpg, .mp3)
    UNIX encodes within file (e.g. magic numbers or initial characters)

Basic Operations
  Unix
    create(name)
    open(name, mode)
    read(fd, buf, len)
    write(fd, buf, len)
    sync(fd)
    seek(fd, pos)
    close(fd)
    unlink(name)
    rename(old, new)
  Windows
    CreateFile(name, CREATE)
    CreateFile(name, OPEN)
    ReadFile(handle, ...)
    WriteFile(handle, ...)
    FlushFileBuffers(handle, ...)
    SetFilePointer(handle, ...)
    CloseHandle(handle, ...)
    DeleteFile(name)
    CopyFile(name)
    MoveFile(name)

Directories
  Directories provide:
    A way for users to organize their files
    A convenient file name space for both users and file systems
  Most file systems support multi-level directories
    Naming hierarchies (/, /usr, /usr/local, /usr/local/bin, ...)
  Most file systems support the notion of current directory
    Absolute: fully qualified starting from root of file system
      bash$ cd /usr/local
    Relative: specified with respect to current directory
      bash$ cd /usr/local    (absolute)
      bash$ cd bin           (relative, equivalent to cd /usr/local/bin)

Directory Internals
  A directory is typically just a file that happens to contain special metadata
    Directory = list of (name of file, file attributes)
    Attributes include such things as: size, protection, location on disk, creation time, access time, ...
    The directory list is usually unordered (effectively random)
      In UNIX, the ls command sorts the results for you

Path Name Translation
  Let's say you want to open /one/two/three
    fd = open("/one/two/three", O_RDWR);
  What goes on inside the file system?
    1) Open directory / (well known, can always find)
    2) Search the directory for one and get location
    3) Open directory one, search for two and get location
    4) Open directory two, search for three and get location
    5) Open file three
  Permissions are checked at each step
  OS caches prefix lookups to enhance performance

File Protection
  File system must implement some kind of protection
    Control who can access a file (user)
    Control how they can access it (e.g. read, write, or exec)
  More generally:
    Generalize files to objects (the "what")
    Generalize users to principals (the "who", user or program)
    Generalize read/write to actions (the "how", or operations)
  A protection system dictates whether a given action performed by a given principal on a given object should be allowed

UNIX: inodes
  Unique representation of each file
  Format:
    User number
    Group number
    Protection bits
    Times (file last read, file last written, inode last written)
    File code: specifies if the inode represents a directory, an ordinary user file, or a special file (typically an I/O device)
    Size: length of file in bytes
    Block list: locates contents of file (in the file contents area)
    Link count: number of directories referencing this inode

UNIX: The Tree File System
  A directory is a flat file of fixed-size entries
  Each entry consists of an inode number and a file name

  inode number   File name
  152            .
  18             ..
  216            my_file
  4              another_file
  93             oh_my_god
  144            a_directory

UNIX 7: Block List of the inode
  Points to blocks in the file contents area
  Must be able to represent very small and very large files
  How? Each inode contains up to 13 block pointers
    First 10 are direct pointers (pointers to 512B blocks of file data)
    Then, single, double, and triple indirect pointers
  [Figure: inode block pointer array, entries 0 through 12]

Putting It All Together
  The file system is just a huge data structure
  [Figure: on-disk layout containing the superblock; the inode free list; the file block free list; the inode for / and directory / (table of entries); the inode for usr/ and directory usr/ (table of entries); the inode for var/ and directory var/ (table of entries); the inode for bigfile.bin with its indirection blocks and data blocks]

File System Consistency
  Both inodes and file blocks are cached in memory
  The sync command forces memory-resident disk information to be written to disk
    System does a sync every few seconds
  A crash or power failure between syncs can leave an inconsistent disk
  You could reduce the frequency of problems by reducing caching, but performance would suffer big-time

Course Wrap Up
  End-to-end Review
    What happens after you write your source code?
      How code becomes a program (Lecture 28)
      How your computer executes your code
  Victory lap and high-level concepts ( points)
    More useful for 5 years from now than next week's final
  Question time

C: The Low-Level High-Level Language
  C is a hands-off language that exposes more of hardware (especially memory)
    Weakly typed language that stresses data as bits
      Anything can be represented with a number!
    Unconstrained pointers can hold address of anything
      And no bounds checking: buffer overflow possible!
  Efficient by leaving everything up to the programmer
    "C is good for two things: being beautiful and creating catastrophic 0days in memory management." (https://medium.com/message/everything-is-broken-81e5f33a24e1)

C Data Types
  C Primitive types
    Fixed sizes and alignments
    Characters (char), Integers (short, int, long), Floating Point (float, double)
  C Data Structures
    Arrays: contiguous chunks of memory
      Multidimensional arrays = still one contiguous chunk, but row-major
      Multi-level arrays = array of pointers to other arrays
    Structs: structured group of variables
      Struct fields are ordered according to declaration order
      Internal fragmentation: space between members to satisfy member alignment requirements (aligned for each primitive element)
      External fragmentation: space after last member to satisfy overall struct alignment requirement (largest primitive member)

C and Memory
  Using C allowed us to examine how we store and access data in memory
    Endianness (only applies to memory)
      Is the first byte (lowest address) the least significant (little endian) or most significant (big endian) of your data?
    Array indices and struct fields result in calculating proper addresses to access
  Consequences of your code:
    Affects performance (locality)
    Affects security
  But to understand these effects better, we had to dive deeper

How Code Becomes a Program
  C source code (text)
    -> Compiler (gcc -Og -S)
  Assembly files (text)
    -> Assembler (gcc -c or as)
  Object files (binary), together with static libraries
    -> Linker (gcc or ld)
  Executable program (binary)
    -> Loader (the OS)
  Hardware

Instruction Set Architecture
  Source code (different applications or algorithms)
    C Language: Program A, Program B, Your program
  Compiler (perform optimizations, generate instructions)
    GCC, Clang
  Architecture (instruction set)
    x86-64 (CISC)
    ARMv8 (AArch64/A64) (RISC)
  Hardware (different implementations)
    x86-64: Intel Pentium 4, Intel Core 2, Intel Core i7, AMD Opteron, AMD Athlon
    ARMv8: ARM Cortex-A53, Apple A7

Assembly Programmer's View
  [Figure: the CPU (PC, Registers, Condition Codes) sends Addresses to Memory and exchanges Data and Instructions with it; Memory holds Stack, Heap, Data, and Code]
  Programmer-visible state
    PC: the Program Counter (%rip in x86-64)
      Address of next instruction
    Named registers
      Together in register file
      Heavily used program data
    Condition codes
      Store status information about most recent arithmetic operation
      Used for conditional branching
  Memory
    Byte-addressable array
    Huge virtual address space
    Private, all to yourself

Program's View
  CPU: %rip, Registers, Condition Codes
  Memory (high addresses, 2^N - 1, at the top; low addresses, 0, at the bottom):
    Stack: local variables; procedure context
    Dynamic Data (Heap): variables allocated with new or malloc
    Static Data: static variables (global variables in C)
    Literals: large constants (e.g., "example")
    Instructions

Program's View: Instructions
  Data movement
    mov, movs, movz
    push, pop
  Arithmetic
    add, sub, imul
  Control flow
    cmp, test
    jmp, je, jg, ...
    call, ret
  Operand types
    Literal: $8
    Register: %rdi, %al
    Memory: D(Rb,Ri,S) = D+Rb+Ri*S
      lea: not a memory access!
  Memory (high to low addresses): Stack, Dynamic Data (Heap), Static Data, Literals, Instructions

Program's View: Procedures
  Essential abstraction
    Recursion
  Stack discipline
    Stack frame per call
    Local variables
  Calling convention
    How to pass arguments
      "Diane's Silk Dress Costs $89" (mnemonic for %rdi, %rsi, %rdx, %rcx, %r8, %r9)
    How to return data
    Return address
    Caller-saved / callee-saved registers
  Memory (high to low addresses): Stack, Dynamic Data (Heap), Static Data, Literals, Instructions

But remember it's all an illusion!
  Context switches
    Don't really have CPU to yourself
  Virtual Memory
    Don't really have 2^64 bytes of memory all to yourself
    Allows for indirection (remap physical pages, sharing, ...)
  [Figure: CPU (%rip, Registers, Condition Codes) and Memory from high addresses (2^N - 1) to low addresses (0): Stack, Dynamic Data (Heap), Static Data, Literals, Instructions]

But remember it's all an illusion!
  [Figure: Process 1, Process 2, and Process 3 each have their own CPU state (%rip, Registers, Condition Codes) and their own memory (Stack, Dynamic Data (Heap), Static Data, Literals, Instructions; addresses 0 to 2^N - 1), all running on the same Hardware]
  fork: creates copy of the process
  execve: replace with new program
  wait: wait for child to die (to reap it and prevent zombies)

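Virtual Memory
  [Figure: TLB hit. On the CPU chip: 1) the CPU sends the VA to the MMU; 2) the MMU sends the VPN to the TLB; 3) the TLB returns the PTE to the MMU; 4) the MMU sends the PA to cache/memory; 5) cache/memory returns the data to the CPU]
  Address Translation
    Every memory access must first be converted from virtual to physical
    Indirection: just change the address mapping when switching processes
    Luckily, TLB (and page size) makes it pretty fast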

But Memory is Also a Lie!
  [Figure: CPU (%rip, Registers, Condition Codes), then L1 Cache, L2 Cache, L3 Cache, then Main Memory (DRAM)]
  Illusion of one flat array of bytes
    But caches invisibly make accesses to physical addresses faster!
  Caches
    Associativity tradeoff with miss rate and access time
    Block size tradeoff with spatial and temporal locality
    Cache size tradeoff with miss rate and cost

Memory Hierarchy
  Toward the top: smaller, faster, costlier per byte
  Toward the bottom: larger, slower, cheaper per byte

  Level                                                                   Access time             Scaled to human time
  registers                                                               <1 ns
  on-chip L1 cache (SRAM)                                                 1 ns                    5-10 s
  off-chip L2 cache (SRAM)                                                5-10 ns                 1-2 min
  main memory (DRAM)                                                      100 ns                  15-30 min
  SSD (local secondary storage)                                           150,000 ns              31 days
  Disk (local disks)                                                      10,000,000 ns (10 ms)   66 months = 1.3 years
  remote secondary storage (distributed file systems, web servers)        1-150 ms                1-15 years

Operating Systems
  [Figure: Applications on top of the OS on top of Hardware]
  The OS is everything you don't need to write in order to run your application
  OS structure determined by privilege levels of modules
    Hybrid between monolithic and microkernel structures
  A thread is a sequential execution stream within a process
    The unit of scheduling; different/less state information than a process
  The file system provides an abstraction of related pieces of data and an interface to data on secondary storage

Victory Lap
  A victory lap is an extra trip around the track
    By the exhausted victors (that's us)
  Review course goals
    The following slides are copied directly from Lecture 1
    They should make much more sense now!

Little Theme 1: Representation
  All digital systems represent everything as 0s and 1s
    The 0 and 1 are really two different voltage ranges in the wires
    Or magnetic positions on a disc, or hole depths on a DVD, or even DNA
  "Everything" includes:
    Numbers: integers and floating point
    Characters: the building blocks of strings
    Instructions: the directives to the CPU that make up a program
    Pointers: addresses of data objects stored away in memory
  Encodings are stored throughout a computer system
    In registers, caches, memories, disks, etc.
  They all need addresses (a way to locate)
    Find a new place to put a new item
    Reclaim the place in memory when data no longer needed

Little Theme 2: Translation
  There is a big gap between how we think about programs and data and the 0s and 1s of computers
    Need languages to describe what we mean
    These languages need to be translated one level at a time
  We know Java as a programming language
    Have to work our way down to the 0s and 1s of computers
    Try not to lose anything in translation!
    We'll encounter C language, assembly language, and machine code (for the x86 family of CPU architectures)

Little Theme 3: Control Flow
  How do computers orchestrate everything they are doing?
  Within one program:
    How do we implement if/else, loops, switches?
    What do we have to keep track of when we call a procedure, and then another, and then another, and so on?
    How do we know what to do upon return?
  Across programs and operating systems:
    Multiple user programs
    Operating system has to orchestrate them all
      Each gets a share of computing cycles
      They may need to share system resources (memory, I/O, disks)
    Yielding and taking control of the processor
      Voluntary or by force?

Course Perspective
  CSE410 will make you a better programmer
    Purpose is to show how software really works
    Understanding the underlying system makes you more effective
      Better debugging
      Better basis for evaluating performance
      How multiple activities work in concert (e.g. OS and user programs)
    Not just a course for hardware enthusiasts!
      What every programmer needs to know (plus many more details)
      Stuff everybody learns and uses and forgets not knowing
  CSE410 presents a world view that will empower you
    The intellectual and software tools to understand the trillions+ of 1s and 0s that are flying around when your program runs

Can You Now Explain These to a Friend?
  Which of the following did you actually find the most interesting to learn about? (http://pollev.com/justinh)
    a) What is a GFLOP and why is it used in computer benchmarks?
    b) How and why does running many programs for a long time eat into your memory (RAM)?
    c) What is stack overflow and how does it happen?
    d) Why does your computer slow down when you run out of disk space?
    e) What was the flaw behind the original Internet worm and the Heartbleed bug?
    f) What is the meaning behind the different CPU specifications? (e.g. # of cores, # and size of cache, supported memory types)

The First Comic http://xkcd.com/676/

Thanks for a great quarter!
  Huge thanks to your awesome TAs!
    Kathryn, Kevin, Ryan, Xinyu, Waylon
  Thanks to course content creators:
    Randy Bryant, David O'Hallaron, Hal Perkins
  Best of luck in the future!
    Always possible to keep learning outside of courses

Ask Me Anything