Today s Objec2ves. AWS/MR Review Final Projects Distributed File Systems. Nov 3, 2017 Sprenkle - CSCI325

Similar documents
Introduction. Distributed file system. File system modules. UNIX file system operations. File attribute record structure

Lecture 7: Distributed File Systems

Distributed File Systems. File Systems

Main Points. File systems. Storage hardware characteris7cs. File system usage Useful abstrac7ons on top of physical devices

Chapter 8: Distributed File Systems. Introduction File Service Architecture Sun Network File System The Andrew File System Recent advances Summary

Module 7 File Systems & Replication CS755! 7-1!

Introduction. Chapter 8: Distributed File Systems

Today s Objec4ves. Data Center. Virtualiza4on Cloud Compu4ng Amazon Web Services. What did you think? 10/23/17. Oct 23, 2017 Sprenkle - CSCI325

AN OVERVIEW OF DISTRIBUTED FILE SYSTEM Aditi Khazanchi, Akshay Kanwar, Lovenish Saluja

Chapter 17: Distributed-File Systems. Operating System Concepts 8 th Edition,

Distributed File Systems

CS 425 / ECE 428 Distributed Systems Fall Indranil Gupta (Indy) Nov 28, 2017 Lecture 25: Distributed File Systems All slides IG

Chap 12: File System Implementa4on

Distributed File Systems. CS432: Distributed Systems Spring 2017

Distributed File Systems. CS 537 Lecture 15. Distributed File Systems. Transfer Model. Naming transparency 3/27/09

CS 470 Spring Distributed Web and File Systems. Mike Lam, Professor. Content taken from the following:

CS 470 Spring Distributed Web and File Systems. Mike Lam, Professor. Content taken from the following:

Distributed file systems

Distributed File Systems. Distributed Computing Systems. Outline. Concepts of Distributed File System. Concurrent Updates. Transparency 2/12/2016

DFS Case Studies, Part 2. The Andrew File System (from CMU)

Today: Coda, xfs! Brief overview of other file systems. Distributed File System Requirements!

Chapter 8 Distributed File Systems

Today: Coda, xfs. Case Study: Coda File System. Brief overview of other file systems. xfs Log structured file systems HDFS Object Storage Systems

Lecture 14: Distributed File Systems. Contents. Basic File Service Architecture. CDK: Chapter 8 TVS: Chapter 11

Finding a Needle in a Haystack. Facebook s Photo Storage Jack Hartner

Chapter 12 Distributed File Systems. Copyright 2015 Prof. Amr El-Kadi

Cloud Computing CS

C 1. Recap. CSE 486/586 Distributed Systems Distributed File Systems. Traditional Distributed File Systems. Local File Systems.

Alterna(ve Architectures

hashfs Applying Hashing to Op2mize File Systems for Small File Reads

Distributed File Systems

CSE 486/586: Distributed Systems

Distributed File Systems

Distributed File Systems. Case Studies: Sprite Coda

Today s Objec2ves. Kerberos. Kerberos Peer To Peer Overlay Networks Final Projects

HPC File Systems and Storage. Irena Johnson University of Notre Dame Center for Research Computing

Today: Distributed File Systems

NFS: Naming indirection, abstraction. Abstraction, abstraction, abstraction! Network File Systems: Naming, cache control, consistency

Opera&ng Systems: Principles and Prac&ce. Tom Anderson

CS 4284 Systems Capstone

Virtualization. Introduction. Why we interested? 11/28/15. Virtualiza5on provide an abstract environment to run applica5ons.

CSE 451: Operating Systems. Sec$on 8 Project 2b wrap- up, ext2, and Project 3

The LOCUS Distributed Operating System

Introduction to Cloud Computing

Filesystem. Disclaimer: some slides are adopted from book authors slides with permission

416 Distributed Systems. Distributed File Systems 1: NFS Sep 18, 2018

Architectural Support. Processes. OS Structure. Threads. Scheduling. CSE 451: Operating Systems Spring Module 28 Course Review

Today: Distributed File Systems. Naming and Transparency

Today: Distributed File Systems!

DFS Case Studies, Part 1

Distributed Systems INF Michael Welzl

CSE Opera*ng System Principles

Distributed File Systems. Directory Hierarchy. Transfer Model

Engineering Data Intensive Scalable Systems

Introduction to Distributed Computing

File System. Computadors Grau en Ciència i Enginyeria de Dades. Xavier Verdú, Xavier Martorell

Today CSCI Coda. Naming: Volumes. Coda GFS PAST. Instructor: Abhishek Chandra. Main Goals: Volume is a subtree in the naming space

The Kernel Abstrac/on

CA485 Ray Walshe Google File System

Main Points. Address Transla+on Concept. Flexible Address Transla+on. Efficient Address Transla+on

DISTRIBUTED SYSTEMS [COMP9243] Lecture 9b: Distributed File Systems INTRODUCTION. Transparency: Flexibility: Slide 1. Slide 3.

Lecture 2: Processes. CSE 120: Principles of Opera9ng Systems. UC San Diego: Summer Session I, 2009 Frank Uyeda

W1005 Intro to CS and Programming in MATLAB. Brief History of Compu?ng. Fall 2014 Instructor: Ilia Vovsha. hip://

MapReduce, Apache Hadoop

MapReduce, Apache Hadoop

Network File Systems

Toward An Integrated Cluster File System

Extreme computing Infrastructure


CSE 153 Design of Operating Systems

Operating Systems 2010/2011

Storage and File Hierarchy

COS 318: Operating Systems

Cloud Computing CS

Distributed Systems. Hajussüsteemid MTAT Distributed File Systems. (slides: adopted from Meelis Roos DS12 course) 1/25

EI 338: Computer Systems Engineering (Operating Systems & Computer Architecture)

Opera&ng Systems ECE344

Informa)on Retrieval and Map- Reduce Implementa)ons. Mohammad Amir Sharif PhD Student Center for Advanced Computer Studies

h7ps://bit.ly/citustutorial

Fifty Years of Operating Systems. SIGOPS SOSP History Day October 4, 2015

OS History and OS Structures

CSE 120: Principles of Operating Systems. Lecture 10. File Systems. November 6, Prof. Joe Pasquale

Background: I/O Concurrency

Networks and Opera/ng Systems Chapter 19: I/O Subsystems

CS 162 Operating Systems and Systems Programming Professor: Anthony D. Joseph Spring Lecture 18: Naming, Directories, and File Caching

System that permanently stores data Usually layered on top of a lower-level physical storage medium Divided into logical units called files

CS 162 Operating Systems and Systems Programming Professor: Anthony D. Joseph Spring Lecture 18: Naming, Directories, and File Caching

Main Points. File systems. Storage hardware characteris7cs. File system usage Useful abstrac7ons on top of physical devices

Distributed Systems CS6421

File systems CS 241. May 2, University of Illinois

we are here Page 1 Recall: How do we Hide I/O Latency? I/O & Storage Layers Recall: C Low level I/O

COS 318: Operating Systems. NSF, Snapshot, Dedup and Review

SQS, SWF, and SNS 7/24/17. References. Amazon Simple Queue Service(SQS)

Support for resource sharing Openness Concurrency Scalability Fault tolerance (reliability) Transparence. CS550: Advanced Operating Systems 2

ECS 165B: Database System Implementa6on Lecture 2

Google Cluster Computing Faculty Training Workshop

Performance Measurement

Operating systems. Lecture 7 File Systems. Narcis ILISEI, oct 2014

CS 202 Advanced OS. WAFL: Write Anywhere File System

Architecture of So-ware Systems Massively Distributed Architectures Reliability, Failover and failures. Mar>n Rehák

Transcription:

Today s Objec2ves AWS/MR Review Final Projects Distributed File Systems Nov 3, 2017 Sprenkle - CSCI325 1 Inverted Index final input files have been posted Another email out to AWS Google cloud Nov 3, 2017 Sprenkle - CSCI325 2 1

FINAL PROJECT Nov 3, 2017 Sprenkle - CSCI325 3 Final Project: Schedule Project proposal: November?? Dec 4 8: Team Presenta2ons Dec 11 15 (finals week): Write up due Nov 3, 2017 Sprenkle - CSCI325 4 2

Final Project: Logis2cs Work in teams of 2-4 Choose your own project Ø See course final project page (to be posted) for some ideas Ø Using Amazon Cloud Ø Data analysis using Hadoop (or other cloud-based services) Nov 3, 2017 Sprenkle - CSCI325 5 Review What is RAID? Ø What is its mo2va2on? Ø What are some varia2ons? Ø What are the tradeoffs? When would you use one vs the other? Nov 3, 2017 Sprenkle - CSCI325 6 3

FILE SYSTEMS Nov 3, 2017 Sprenkle - CSCI325 7 File System Characteris2cs Higher level of abstrac2on Ø Prevents users and processes from dealing with disk blocks and memory blocks File systems are responsible for: Ø Organiza2on Ø Storage Ø Retrieval Ø Naming Ø Sharing Ø Protec2on Nov 3, 2017 Sprenkle - CSCI325 8 4

File System Characteris2cs Designed to store and manage large numbers of files Ø Create, name, delete Naming is supported through use of directories Ø UNIX uses pathnames to represent hierarchical naming scheme for files metadata refers to extra informa2on stored by file system for managing files Ø File afributes, directories, etc Nov 3, 2017 Sprenkle - CSCI325 9 Hierarchical Naming / root directory usr home etc csdept courses faculty cs325 students Your home directories www Your web pages Symbolic link public_html cs325 handouts turnin Nov 3, 2017 Sprenkle - CSCI325 10 5

Links Directories are lists of files and directories Each directory entry links to a file on the disk mydir hello file2 subdir cpy Hello World! Two directory entries can link to the same file Ø In same directory or across different directories Moving a file does not actually move any data around Ø Creates link in new loca2on Ø Deletes link in old loca2on ln command: ln <target> <dest> Nov 3, 2017 Sprenkle - CSCI325 11 Files Contain data and afributes Ø Afributes: file length, 2mestamps, owner, accesscontrol lists Typical File Header Block 0 Block 1 Block N-1 File contents Timestamps: crea2on, read, write, header Ownership Access Control List: who can access this file and in what mode Reference Count: Number of directories containing this file May be > 1 (hard linking of files) When 0, can delete file Nov 3, 2017 Sprenkle - CSCI325 12 6

File System Opera2ons File system opera2ons are ojen performed via system calls User programs are not allowed to access system resources directly Ø must ask OS to do that on their behalf System calls: set of func2ons for user programs to request for OS services Ø Run in kernel mode Ø Invoked by special instruc2on causing the kernel to switch from user mode Ø When the system call finishes, processor returns to the user program and runs in user mode. Nov 3, 2017 Sprenkle - CSCI325 13 How system calls work syscall C Run2me Library Opera2ng System Kernel Applica2on Program Device Driver User-space Kernel-space Nov 3, 2017 Sprenkle - CSCI325 14 7

Unix File System Uses no2on of file descriptors Ø Handle for a process to access a file Each process needs to open a file before reading/wri2ng file Ø OS creates an internal data structure for a file descriptor, returns handle Nov 3, 2017 Sprenkle - CSCI325 15 15 Unix File System Opera2ons open(name, mode) Return file descriptor creat(name, mode) close(filedescriptor) read(filedescriptor, buffer, n) write(filedescriptor, buffer, n) stat(name, buffer) link(name1, name2) and unlink(name) Nov 3, 2017 Sprenkle - CSCI325 16 8

DISTRIBUTED FILE SYSTEMS Nov 3, 2017 Sprenkle - CSCI325 17 Distributed File System Files are stored on a server Ø Clients communicate with server to perform opera2ons on file How does the network complicate things? What can we do about it? What challenges are introduced by a distributed file system (in addition to scalable storage)? Nov 3, 2017 Sprenkle - CSCI325 18 9

Distributed File System Requirements Transparency Ø Access, loca2on, mobility, performance, scaling Concurrent file updates File replica2on Hardware and OS heterogeniety Fault tolerance Consistency Security Efficiency Nov 3, 2017 Sprenkle - CSCI325 19 Distributed File Systems and RAID Distributed file systems require higher reliability and capacity at higher loads than a local file system RAID was created to address these limita2ons Most work in distributed file systems ignores the disklevel details (like which RAID is used) All work assumes that the disks themselves are reliable, scalable, and have high performance which can be accomplished using RAID Distributed File System RAID Storage Nov 3, 2017 Sprenkle - CSCI325 20 10

Distributed File Systems: Goal & Challenges Goal: Transparent access to remote files Ø Enable programs to store and access remote files as though they were local Ø Access at any 2me from any computer Ø Comparable performance and reliability to local disks Why would you want this? What are some of the hard issues? Know any examples of DFS? Nov 3, 2017 Sprenkle - CSCI325 21 DFS: Architecture Goal: Transparent access to remote files Client Client Client Client Local Area Network Server Server Server Nov 3, 2017 Sprenkle - CSCI325 22 11

Mo2va2on Key goal for distributed systems is resource sharing We have seen some ways to share resources Ø Web servers What about sharing within local networks? Nov 3, 2017 Sprenkle - CSCI325 23 Examples of Distributed File Systems NFS: Sun s Network File System AFS: Andrew File System ZFS: combined file system and logical volume manager designed by Sun Microsystems Coda: CMU research project for mobile clients xfs: Berkeley research project stressing serverless design GFS: Google File System FarSite: Microsoj Research project leveraging desktop drives Nov 3, 2017 Sprenkle - CSCI325 24 12

Distributed File System Requirements Transparency Ø Access - Clients are unaware of distribu2on of files; access local and remote files in same way Ø Loca2on - Clients see uniform name space; user programs see same name space wherever they are executed Ø Mobility - Client programs do not need to change when files move on disk Ø Performance - Load on service does not affect performance Ø Scaling - Service can be expanded to deal with varying load and network sizes Nov 3, 2017 Sprenkle - CSCI325 25 Distributed File System Requirements Concurrent file updates Ø Changes to file by one client should not interfere with changes to same file by another client Ø Concurrency control issues! (Think about locking ) File replica2on Ø A single file may be represented by several copies of its contents in different loca2ons Hardware and OS heterogeneity Ø Service must support different clients with different OSes and different types of hardware Nov 3, 2017 Sprenkle - CSCI325 26 13

Distributed File System Requirements Fault tolerance Ø Must operate in face of client and server failures Ø Need to support idempotent opera2ons Consistency Ø Concurrent access to files that are replicated should be consistent Security Ø Enforce permissions and access-control lists Efficiency Ø Need to achieve comparable performance, power, and generality as a non-distributed file system Nov 3, 2017 Sprenkle - CSCI325 27 Looking Ahead Monday: Priya Mahadevan, Network Architect at Google Ø Start thinking about ques2ons! Nov 3, 2017 Sprenkle - CSCI325 28 14