CSE 486/586 Distributed Systems: Distributed File Systems


Recap
- Optimistic quorum
- Distributed transactions with replication
- One-copy serializability
- Primary copy replication
- Read-one/write-all replication
- Active copies replication
(Steve Ko, Computer Sciences and Engineering, University at Buffalo)

Local File Systems
A file system provides file management:
- A name space
- An API for file operations (create, delete, open, close, read, write, append, truncate, etc.)
- Physical storage management and allocation (e.g., block storage)
- Security and protection (access control)
The name space is usually hierarchical: files and directories. File systems are mounted, so different file systems can appear in the same name space.

Traditional Distributed File Systems
Goal: emulate local file system behavior.
- Files are not replicated.
- No hard performance guarantee.
But:
- Files are located remotely on servers.
- Multiple clients access the servers.
Why? Users with multiple machines, data sharing among multiple users, and consolidated data management (e.g., in an enterprise).

Requirements
- Transparency: a distributed file system should appear as if it were a local file system.
  - Access transparency: it should support the same set of operations; a program that works with a local file system should work with a DFS (see the sketch after this list).
  - (File) location transparency: all clients should see the same name space.
  - Migration transparency: if files move to another server, it should not be visible to users.
  - Performance transparency: it should provide reasonably consistent performance.
  - Scaling transparency: it should be able to scale incrementally by adding more servers.
- Concurrent updates should be supported.
- Fault tolerance: servers may crash, messages can be lost, etc.
- Consistency needs to be maintained.
- Security: access control for files and authentication of users.
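To make access transparency concrete, here is a minimal sketch of the local POSIX file API that a DFS must preserve; the calls are standard POSIX, but the file path and the data written are just placeholders for illustration, and the same program should behave the same if the path were on an NFS mount.

```c
/* access_transparency.c -- the local file API (open/read/write/close/
 * truncate/delete) that a DFS is expected to emulate transparently. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void) {
    const char *path = "/tmp/example.txt";   /* could equally be an NFS path */
    int fd = open(path, O_CREAT | O_RDWR | O_TRUNC, 0644);   /* create/open */
    if (fd < 0) { perror("open"); return 1; }

    const char *msg = "hello, distributed file systems\n";
    if (write(fd, msg, strlen(msg)) < 0) perror("write");     /* write */

    char buf[64];
    lseek(fd, 0, SEEK_SET);                                    /* seek back */
    ssize_t n = read(fd, buf, sizeof(buf) - 1);                /* read */
    if (n >= 0) { buf[n] = '\0'; printf("%s", buf); }

    ftruncate(fd, 5);                                          /* truncate */
    close(fd);                                                 /* close */
    unlink(path);                                              /* delete */
    return 0;
}
```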

File Server Architecture
Client computer: application programs plus a client module. Server computer: a directory service and a flat file service.
Components:
- Directory service: metadata management. It creates and updates directories (hierarchical file structures) and provides mappings between the user-visible names of files and the unique file IDs in the flat file structure.
- Flat file service: actual data management. It implements the file operations (create, delete, read, write, access control, etc.).
These components can be independently distributed, e.g., a centralized directory service with a distributed flat file service.

Sun NFS
On the client computer, application programs go through the Virtual File System (VFS) layer in the UNIX kernel, which dispatches to the local UNIX file system, other local file systems, or the NFS client; the NFS client talks to the NFS server on the server computer via the NFS protocol, and the server in turn goes through its own VFS to its UNIX file system.
VFS: a translation layer that makes file systems pluggable and lets them coexist, e.g., NFS, EXT2, EXT3, ZFS, etc.
- Keeps track of the file systems that are available locally and remotely.
- Passes requests to the appropriate local or remote file system.
- Distinguishes between local and remote files.

NFS Mount Service
(Figure: example directory trees on Server 1, Server 2, and a client whose name space includes remotely mounted subtrees.)
- Each server keeps a record of the local files available for remote mounting.
- Clients use a mount command for remote mounting, providing name mappings.

NFS Basic Operations
- Client: transfers blocks of files to and from the server via RPC (see the sketch below).
- Server: provides a conventional RPC interface at a well-known port on each host, and stores the files and directories.
- Problems? Performance and failures.
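As an illustration of this RPC-style interface, here is a minimal sketch of what a read request and reply might look like. This is not the actual NFS wire format; the struct layout, field names, and the rpc_read stub are assumptions made for illustration. Note that the request names the file by an opaque handle and carries the offset and count itself, which foreshadows the stateless design discussed later.

```c
/* nfs_read_sketch.c -- an illustrative (non-NFS) RPC read interface:
 * the client asks for a byte range of a file identified by a handle. */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define FHANDLE_SIZE 32

struct read_args {                  /* sent from client to server */
    uint8_t  fhandle[FHANDLE_SIZE]; /* opaque file handle obtained earlier */
    uint64_t offset;                /* where to start reading */
    uint32_t count;                 /* how many bytes to read */
};

struct read_reply {                 /* returned by the server */
    int32_t  status;                /* 0 on success */
    uint32_t nread;                 /* bytes actually read */
    uint8_t  data[8192];            /* the requested block */
};

/* Stand-in for the RPC stub: a real client would marshal the arguments and
 * send them to the server's well-known port over the network. */
static struct read_reply rpc_read(const struct read_args *args) {
    struct read_reply r = { .status = 0, .nread = 0 };
    printf("READ handle=%02x.. offset=%llu count=%u\n",
           args->fhandle[0],
           (unsigned long long)args->offset, args->count);
    /* the server-side file access would happen here */
    return r;
}

int main(void) {
    struct read_args args = { .offset = 4096, .count = 4096 };
    memset(args.fhandle, 0xab, sizeof(args.fhandle));  /* pretend handle */
    struct read_reply reply = rpc_read(&args);
    return reply.status;
}
```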

Improving Performance
Let's cache!
- Server-side: typically done by the OS and the disks anyway. A disk usually has a built-in cache, and the OS caches file pages, directories, and file attributes that have been read from the disk in a main-memory buffer cache.
- Client-side: on accessing data, cache it locally.
What is the typical problem with caching? Consistency: cached data can become stale.

(General) Caching Strategies
- Read-ahead (prefetch): anticipates read accesses and fetches the pages following those that have most recently been read.
- Delayed-write: new writes are stored locally; periodically, or when another client accesses the file, the updates are sent back to the server.
- Write-through: writes go all the way to the server's disk.
This is not an exhaustive list!

NFS Client-Side Caching
- Writes: write-through, but only at close(), not on every single write. This helps performance. Other clients periodically check whether there is any new write (see Validation below).
- Multiple writers: no guarantee; the result could be any combination of the writes, which leads to inconsistency.
- Validation: a client checks with the server about its cached blocks. Each block has a timestamp; if the remote block is newer, the client invalidates the local cached block. A client also always revalidates after some period of time: 3 seconds for files, 30 seconds for directories (a sketch of this check appears below). Written blocks are marked as dirty.

Failures
Two design choices: stateful and stateless.
- Stateful: the server maintains all client information (which file, which block of the file, the offset within the block, file locks, etc.). This is good for the client-side process (just send requests!), and the system behaves almost like a local file system (e.g., locking is easy to implement). Problem? A server crash loses the client state, so handling failures becomes complicated.
- Stateless: clients maintain their own information (which file, which block of the file, the offset within the block, etc.). The server does not know anything about what a client is doing; each request contains complete information (file name, offset, etc.). This makes server crashes easier to deal with (there is nothing to lose!). This is NFS's choice. Problem? Locking becomes difficult.
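The validation rule above can be sketched as a small freshness check. This is an illustrative approximation rather than NFS code; the struct, the fetch_server_mtime stub, and the function names are assumptions, but the 3-second and 30-second windows follow the values given above.

```c
/* cache_validate.c -- sketch of an NFS-style freshness check: trust a cached
 * block within its window (3 s for files, 30 s for directories); after that,
 * revalidate against the server's modification time. */
#include <stdbool.h>
#include <time.h>

struct cached_block {
    time_t last_validated;   /* when we last checked with the server */
    time_t server_mtime;     /* server's modification time at that check */
    bool   is_directory;
    bool   dirty;            /* locally written, not yet flushed at close() */
};

/* Stand-in for a GETATTR-style RPC that returns the server's current mtime. */
static time_t fetch_server_mtime(void) { return time(NULL); }

/* Returns true if the cached copy may be used, false if it must be refetched. */
bool cache_block_fresh(struct cached_block *b) {
    time_t now = time(NULL);
    time_t window = b->is_directory ? 30 : 3;   /* 30 s dirs, 3 s files */

    if (now - b->last_validated < window)
        return true;                             /* within freshness window */

    time_t mtime = fetch_server_mtime();         /* ask the server */
    b->last_validated = now;
    if (mtime != b->server_mtime) {              /* remote copy is newer */
        b->server_mtime = mtime;
        return false;                            /* invalidate local copy */
    }
    return true;
}
```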

NFS Summary
- Client-side caching for improved performance: write-through at close(); consistency issues.
- Stateless server: easier to deal with failures; locking is not supported (later versions of NFS do support locking, though).
- Simple design: led to a simple implementation, acceptable performance, easier maintenance, etc., and ultimately to its popularity.

CSE 486/586 Administrivia
- Anonymous feedback form still available.
- Please come talk to me!

New Trends in Distributed Storage
- Geo-replication: replication across multiple data centers.
  - Latency: serving nearby clients.
  - Fault tolerance: disaster recovery.
- Power efficiency: power-efficient storage. Going green!

Power Consumption
Data centers consume lots of power:
- eBay: 16K servers, ~0.6 * 10^5 MWh, ~$3.7M
- Akamai: 40K servers, ~1.7 * 10^5 MWh, ~$10M
- Rackspace: 50K servers, ~2 * 10^5 MWh, ~$12M
- Microsoft: > 200K servers, > 6 * 10^5 MWh, > $36M
- Google: > 500K servers, > 6.3 * 10^5 MWh, > $38M
- USA (2006): 10.9M servers, 610 * 10^5 MWh, $4.5B
Year to year, this is 1.7%~2.2% of total electricity use in the US (http://ccr.sigcomm.org/online/files/p123.pdf).
Question: can we reduce the energy footprint of distributed storage while preserving performance?

One Extreme Design Point: FAWN
FAWN: Fast Array of Wimpy Nodes, Andersen et al. (CMU & Intel Labs).
- Couples low-power, efficient embedded CPUs with flash storage.
  - Embedded CPUs are more power efficient.
  - Flash is faster than disks, cheaper than memory, and consumes less power than either.
- Performance target: not just queries (requests) per second, but queries per second per Watt, i.e., queries per Joule (a worked example follows below).

Embedded CPUs
Observation: many modern server storage workloads do not need fast CPUs.
- Not much computation is necessary; the work is mostly small I/O, i.e., mostly I/O bound, not CPU bound.
- E.g., 1 KB values for thumbnail images, 100s of bytes for wall posts, Twitter messages, etc.
(Rough) comparison:
- Server-class CPUs (superscalar, quad-core): 100M instructions/Joule.
- Embedded CPUs (low-frequency, single-core): 1B instructions/Joule.
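To see why queries per Joule is the interesting metric, here is a tiny worked example; the throughput and power figures are hypothetical and are not taken from the FAWN paper.

```c
/* queries_per_joule.c -- illustrating the FAWN metric with made-up numbers:
 * queries/Joule = (queries/second) / Watts, since 1 Watt = 1 Joule/second. */
#include <stdio.h>

int main(void) {
    double server_qps = 1000.0, server_watts = 100.0;  /* conventional node */
    double wimpy_qps  = 300.0,  wimpy_watts  = 6.0;    /* FAWN-style node  */

    printf("conventional: %.1f queries/Joule\n", server_qps / server_watts);
    printf("wimpy:        %.1f queries/Joule\n", wimpy_qps / wimpy_watts);
    /* prints 10.0 vs 50.0: the wimpy node does less work per second but far
     * more work per unit of energy, which is the point of the metric. */
    return 0;
}
```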

Flash Storage
Unlike magnetic disks, flash has no mechanical parts (disks have motors that rotate the platters and arms that move and read).
- Efficient I/O: less than 1 Watt of power consumption, versus over 10 Watts for magnetic disks.
- Fast random reads: << 1 ms, up to 175 times faster than random reads on magnetic disks.
Writes are different:
- The smallest unit of operation (read/write) is a page, typically 4 KB. Initially all bits are 1; a write involves setting some bits to 0.
- A write is fundamentally constrained: individual bits cannot be reset to 1. Resetting requires an erasure operation that sets all bits back to 1, and this erasure is done over a large block (e.g., 128 KB), i.e., over multiple pages together. Typical latency: 1.5 ms.
- Blocks wear out with each erasure: 100K or 10K cycles, depending on the technology.
Early design limitations:
- Slow writes: a write to a random 4 KB page requires the entire 128 KB erase block to be erased and rewritten, so write performance suffers.
- Uneven wear: imbalanced writes result in uneven wear across the device.
Any idea how to solve this? Recent designs are log-based:
- The disk exposes a logical structure of pages and blocks (the Flash Translation Layer) and internally maintains a remapping of blocks.
- To rewrite a random 4 KB page: read the entire surrounding 128 KB erase block into the disk's internal buffer, update the 4 KB page in the buffer, then write the entire block to a new or previously erased physical block. Additionally, this new physical block is chosen carefully to minimize uneven wear. (A toy sketch of this remapping appears below.)
- E.g., a sequential write up to block 2, then a rewrite of a page in block 1: 1) read the old block into the buffer, 2) update the page, 3) write it to a different block location, 4) garbage-collect the old block. (Figure: logical structure vs. physical structure, with block 1 remapped to a free physical block.)

FAWN Design
Wimpy nodes based on the PC Engines Alix 3c2, a board commonly used for thin clients, network firewalls, wireless routers, etc.:
- Single-core 500 MHz AMD Geode LX
- 256 MB RAM at 400 MHz
- 100 Mbit/s Ethernet
- 4 GB SanDisk CompactFlash
- Power consumption: 3 W when idle, 6 W under heavy load
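The log-based remapping described above can be sketched as a toy Flash Translation Layer. The block and page sizes are scaled down, all names are illustrative, and a real FTL also tracks free blocks and wear; this only shows the read, update, write-to-a-new-block, and remap flow.

```c
/* ftl_sketch.c -- toy sketch of log-based FTL remapping: a rewrite of one
 * logical page is absorbed by copying its erase block into a buffer,
 * updating the page, writing the block to a fresh physical block, and
 * updating the logical-to-physical map. */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define PAGES_PER_BLOCK 4     /* stands in for 128 KB / 4 KB = 32 pages */
#define PAGE_SIZE       16    /* stands in for 4 KB */
#define NUM_LOGICAL     4     /* logical blocks exposed to the host */
#define NUM_PHYSICAL    8     /* physical blocks on the medium */

static uint8_t flash[NUM_PHYSICAL][PAGES_PER_BLOCK][PAGE_SIZE]; /* the medium */
static int     block_map[NUM_LOGICAL];  /* logical block -> physical block */
static int     next_free = NUM_LOGICAL; /* naive allocator; no wear leveling */

static void erase(int pblock) { memset(flash[pblock], 0xff, sizeof(flash[0])); }

/* Rewrite one logical page without erasing its block in place. */
static void ftl_write_page(int lblock, int page, const uint8_t *data) {
    int old_p = block_map[lblock];
    int new_p = next_free++ % NUM_PHYSICAL;        /* pick a fresh block */

    uint8_t buf[PAGES_PER_BLOCK][PAGE_SIZE];
    memcpy(buf, flash[old_p], sizeof(buf));        /* 1) read old block     */
    memcpy(buf[page], data, PAGE_SIZE);            /* 2) update the page    */
    memcpy(flash[new_p], buf, sizeof(buf));        /* 3) write to new block */
    block_map[lblock] = new_p;                     /*    remap the block    */
    erase(old_p);                                  /* 4) reclaim old block  */
}

int main(void) {
    for (int i = 0; i < NUM_PHYSICAL; i++) erase(i);
    for (int i = 0; i < NUM_LOGICAL; i++) block_map[i] = i;

    uint8_t page[PAGE_SIZE];
    memset(page, 0xaa, sizeof(page));
    ftl_write_page(1, 2, page);                    /* random rewrite in block 1 */
    printf("logical block 1 now maps to physical block %d\n", block_map[1]);
    return 0;
}
```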

Summary
- NFS: caching with a write-through policy at close(); stateless server.
- One power-efficient design: FAWN, combining embedded CPUs and flash storage.

Acknowledgements
These slides contain material developed and copyrighted by Indranil Gupta (UIUC).