CS 470 Spring Distributed Web and File Systems. Mike Lam, Professor. Content taken from the following:

Similar documents
CS 470 Spring Distributed Web and File Systems. Mike Lam, Professor. Content taken from the following:

Chapter 11 DISTRIBUTED FILE SYSTEMS

CPE731 Distributed System Models

Operating Systems. Week 13 Recitation: Exam 3 Preview Review of Exam 3, Spring Paul Krzyzanowski. Rutgers University.

Cloud Computing CS

CS 416: Operating Systems Design April 22, 2015

CS 470 Spring Security. Mike Lam, Professor. a.k.a. Why on earth do Alice and Bob need to share so many secrets?!?

Operating Systems Design Exam 3 Review: Spring Paul Krzyzanowski

Distributed Systems Principles and Paradigms. Chapter 12: Distributed Web-Based Systems

Advanced Operating Systems

System that permanently stores data Usually layered on top of a lower-level physical storage medium Divided into logical units called files

Google Cluster Computing Faculty Training Workshop

WWW, REST, and Web Services

DISTRIBUTED SYSTEMS. Second Edition. Andrew S. Tanenbaum Maarten Van Steen. Vrije Universiteit Amsterdam, 7'he Netherlands PEARSON.

Distributed File Systems. CS432: Distributed Systems Spring 2017

CS 470 Spring Security. Mike Lam, Professor. a.k.a. Why on earth do Alice and Bob need to talk so much?!? Content taken from the following:

EI 338: Computer Systems Engineering (Operating Systems & Computer Architecture)

Uniform Resource Locators (URL)

Send me up to 5 good questions in your opinion, I ll use top ones Via direct message at slack. Can be a group effort. Try to add some explanation.

Web as a Distributed System

Distributed Systems. Hajussüsteemid MTAT Distributed File Systems. (slides: adopted from Meelis Roos DS12 course) 1/25

Distributed File Systems II

Global Servers. The new masters

Lecture 2 Distributed Filesystems

Distributed File Systems. CS 537 Lecture 15. Distributed File Systems. Transfer Model. Naming transparency 3/27/09

CNIT 129S: Securing Web Applications. Ch 3: Web Application Technologies

Distributed Systems. Fall 2017 Exam 3 Review. Paul Krzyzanowski. Rutgers University. Fall 2017

Novell Access Manager

CS555: Distributed Systems [Fall 2017] Dept. Of Computer Science, Colorado State University

GridNFS: Scaling to Petabyte Grid File Systems. Andy Adamson Center For Information Technology Integration University of Michigan

Chapter 10 DISTRIBUTED OBJECT-BASED SYSTEMS

Today CSCI Coda. Naming: Volumes. Coda GFS PAST. Instructor: Abhishek Chandra. Main Goals: Volume is a subtree in the naming space

Distributed File Systems

Cloud Computing CS

CA485 Ray Walshe Google File System

Operating Systems Design Exam 3 Review: Spring 2011

Lecture 08: Networking services: there s no place like

Chapter 2. Application Layer

HPC File Systems and Storage. Irena Johnson University of Notre Dame Center for Research Computing

Data Management. Parallel Filesystems. Dr David Henty HPC Training and Support

Chapter 4 The Internet

DISTRIBUTED SYSTEMS [COMP9243] Lecture 9b: Distributed File Systems INTRODUCTION. Transparency: Flexibility: Slide 1. Slide 3.

HP Instant Support Enterprise Edition (ISEE) Security overview

Distributed File Systems. Distributed Systems IT332

Introduction to Cloud Computing

Chapter 10: Application Layer CCENT Routing and Switching Introduction to Networks v6.0

NFS Version 4.1. Spencer Shepler, Storspeed Mike Eisler, NetApp Dave Noveck, NetApp

Feedback on BeeGFS. A Parallel File System for High Performance Computing

EEC-682/782 Computer Networks I

The Parallel NFS Bugaboo. Andy Adamson Center For Information Technology Integration University of Michigan

Application Layer Introduction; HTTP; FTP

CS6601 DISTRIBUTED SYSTEM / 2 MARK

Novell Access Manager

BIRLA INSTITUTE OF TECHNOLOGY AND SCIENCE, Pilani Pilani Campus Instruction Division

Large Systems: Design + Implementation: Communication Coordination Replication. Image (c) Facebook

AN OVERVIEW OF DISTRIBUTED FILE SYSTEM Aditi Khazanchi, Akshay Kanwar, Lovenish Saluja

HTTP and Web Content Delivery

Content Overlays. Nick Feamster CS 7260 March 12, 2007

Crossing the Chasm: Sneaking a parallel file system into Hadoop

CS /29/17. Paul Krzyzanowski 1. Fall 2016: Question 2. Distributed Systems. Fall 2016: Question 2 (cont.) Fall 2016: Question 3

CS November 2017

FILE EXCHANGE PROTOCOLS AND ZERO CONFIGURATION NETWORKING

DISTRIBUTED SYSTEMS Principles and Paradigms Second Edition ANDREW S. TANENBAUM MAARTEN VAN STEEN. Chapter 5 Naming

CMPE 151: Network Administration. Servers

Crossing the Chasm: Sneaking a parallel file system into Hadoop

Objectives. Connecting with Computer Science 2

Using the Cisco ACE Application Control Engine Application Switches with the Cisco ACE XML Gateway

Introduction to Distributed Computing Systems

2. Introduction to Internet Applications

ReST 2000 Roy Fielding W3C

Int ernet w orking. Internet Security. Literature: Forouzan: TCP/IP Protocol Suite : Ch 28

Computer Networks. Wenzhong Li. Nanjing University

Chapter 18 Distributed Systems and Web Services

Lecture 7: Distributed File Systems

Massively Scalable File Storage. Philippe Nicolas, KerStor

Chapter 4: Networking and the Internet

Foundations of Python

Distributed File Systems. Security: Anonymous access. Peer-to-Peer Security. Server Authenticated Sessions

Introduction to TCP/IP

Distributed Systems. 21. Content Delivery Networks (CDN) Paul Krzyzanowski. Rutgers University. Fall 2018

CS November 2018

4. Note: This example has NFS version 3, but other settings such as NFS version 4 may also work better in some environments.

Chapter 4: Networking and the Internet. Network Classifications. Network topologies. Network topologies (continued) Connecting Networks.

Issues. Separation of. Distributed system security. Security services. Security policies. Security mechanism

Operating Systems Design 16. Networking: Remote File Systems

COSC 6374 Parallel Computation. Parallel I/O (I) I/O basics. Concept of a clusters

DISTRIBUTED FILE SYSTEMS & NFS

Architectural Overview INSIGHT Remote Monitoring Platform

EECS 122: Introduction to Computer Networks DNS and WWW. Internet Names & Addresses

CS454/654 Midterm Exam Fall 2004

Storage and File Hierarchy

Lustre A Platform for Intelligent Scale-Out Storage

Introduction to Computer Science. William Hsu Department of Computer Science and Engineering National Taiwan Ocean University

COS 318: Operating Systems

Chapter 4 Communication

Operating system security models

HTTP. Web. Web Web web

2 SCANNING, PROBING, AND MAPPING VULNERABILITIES

CSE 486/586: Distributed Systems

Distributed Systems 16. Distributed File Systems II

Transcription:

CS 470 Spring 2018 Mike Lam, Professor Distributed Web and File Systems Content taken from the following: "Distributed Systems: Principles and Paradigms" by Andrew S. Tanenbaum and Maarten Van Steen (Chapters 11 and 12) Various online sources

The "Internet" World Wide Web (WWW) System for sharing information via hyperlinked documents Started as a CERN project; now a massive distributed system built on a worldwide network (the "Internet") Issues: Naming Security Consistency Replication

Naming IPv4 and IPv6 addresses for hosts Uniform Resource Locator (URL) Unique worldwide name of a document Hierarchical domain + path Domain Name Service (DNS) Distributed, hierarchical IP address lookup protocol

Security Secure Socket Layer (SSL) and Transport Layer Security (TLS) Public-key authentication, symmetric end-to-end encryption Certificate Authority (CA) provides centralized key checking Examples: InCommon/Comodo, Symantec, and Let s Encrypt HyperText Transfer Protocol (HTTP) Protocol for browser-server communication Request and response model w/ headers and status codes HTTPS is HTTP over SSL/TLS Common Gateway Interface (CGI) Standardized program execution protocol Somewhat similar to remote procedure calls (RPCs) Denial-of-Service (DoS) or Distributed DoS (DDoS) attacks Often executed by botnets of virus-infected personal computers

Interchange formats The internet is a very heterogeneous distributed system HyperText Markup Language (HTML) Encoding formats for email messages Simple Object Access Protocol (SOAP) Generalized HTML; generic data-interchange format Multipurpose Internet Mail Exchange (MIME) Document format for WWW extensible Markup Language (XML) Exchanging information can be a challenge Web services data format JavaScript Object Notation (JSON) Lightweight data-interchange format

Consistency Content Delivery Networks (e.g. Akamai) Globally-distributed network of proxy servers Goal: improve data locality Peer-to-peer and private CDNs Firefox / Chrome / Safari / IE / Edge Graphical interface for HTTP connections Often caches website components locally

Replication Apache - Open-source extensible web server LAMP: Linux, Apache, MySQL/MariaDB/MongoDB, PHP/Perl/Python Other web servers: Nginx & Microsoft IIS Load balancing Large websites require multiple servers w/ replicated data to provide availability to a massive number of users Load balancers ensure that the traffic is distributed evenly Apache

File systems File system: manages storage of structured data in files Export: a file system that is made available to another host Mount: link to a remote file system in the local file system File systems table (fstab) configuration Static vs. automatic mounting /etc/fstab on cluster /dev/mapper/rhel_login01-root /dev/mapper/rhel_login01-swap / swap xfs swap defaults defaults nfs.cluster.cs.jmu.edu:/nfs/home nfs.cluster.cs.jmu.edu:/nfs/scratch nfs.cluster.cs.jmu.edu:/nfs/shared /nfs/home /scratch /shared nfs nfs nfs rw rw rw,acl

Distributed file systems Networked file system: centralized storage with export/mount sharing Distributed file system: distributed storage with communication protocol Centralized vs. decentralized Asymmetric vs. symmetric Issues: Naming Security Consistency Replication

Naming Hierarchical file names Filesystem Hierarchy Standard File descriptors / handles Abstract identifier for an open file In POSIX, positive integers: Standard input: 0 Standard output: 1 Standard error: 2 In distributed file systems: Data-centric names Location-centric names Name servers (lookup) vs. file servers (access)

Security Authentication UIDs, LDAP, Kerberos, Active Directory Access control (authorization) Unix file permissions Access control lists (e.g., POSIX) Client vs. server permissions Encryption: security vs. performance tradeoff Alice: read,write; Bob: read; Unix file permissions Access control list

Consistency Remote access vs. upload/download model Closely related to replication issue Remote access Upload / download

Replication Client-side replication (caching) Provides continued functionality while offline Causes synchronization / consistency problems Callback system for updating other clients Server-side replication (mirroring) Provides fault tolerance Striping: splitting a file's blocks across multiple servers Can be counterproductive if writes are frequent A B C D E A B A B A B A B C D E C D C D C E A B C E E no striping E w/ striping

Network File System (NFS) Basic file sharing protocol for local networks Based on remote procedure calls (RPCs) Provides shared storage and reliability in presence of failures from Tanenbaum and Van Steen (Ch.11)

Network File System (NFS) Developed by Sun in 1984 NFSv2 released in 1989 UDP-based stateless protocol No built-in locking NFSv3 released in 1995 Originally an in-house solution (v.1), now an open standard 64-bit and TCP support NFSv4 released in 2000 Adds stateful protocol Better access control New security features (including encrypted traffic) pnfs: scalable access to files distributed on multiple servers

Andrew File System (AFS) Developed at CMU (Named after Andrew Carnegie and Andrew Mellon) Improved on NFS in terms of scalability and security Weak consistency model Each file is locked when opened Modifications are performed and buffered locally Updates are only sent to the server when a file is closed Server uses callbacks to update other clients Kerberos-based access control lists Lookup / insert / delete / administer Read / write / lock Heavily influenced development of NFSv4

NFSv4 Access control lists Type: A = allow, D = deny, U = audit, L = alarm Flags: g = group, d = directory-inherit, f = file-inherit Permissions: r = read, w = write, a = append, x = execute, d = delete Permissions (cont.): c = read-acl, C = write-acl, o = write-owner Policy of "default-deny" A::OWNER@:rwatTnNcCy A::alice@nfsdomain.org:rxtncy A::bob@nfsdomain.org:rwadtTnNcCy A:g:GROUP@:rtncy D:g:GROUP@:waxTC A::EVERYONE@:rtncy D::EVERYONE@:waxTC

Google File System (GFS) Reliable asymmetric distributed file system on commodity hardware Each file is split into chunks with unique chunk IDs (chunks can be replicated) Master stores metadata tracking each file and its chunks (and where they are) Basis for BigTable, backing store for the original MapReduce from Tanenbaum and Van Steen (Ch.11)

Lustre High-performance parallel file system Initially a research project; later owned by Sun/Oracle Now open source, maintained by a collection of organizations Multiple lower-level interconnects: Ethernet, Infiniband Used by many supercomputer installations E.g., Sequoia and Titan Three functional units Metadata server (MDS) names, layout, permissions Object storage server (OSS) stores file data Clients connect to servers

GPFS The General Parallel File System (GPFS) was developed by IBM and released in 1998 Re-branded as IBM Spectrum Scale in 2015 Stripes file data across multiple servers Reads and writes happen in parallel Distributed metadata; no single point of failure Includes availability and fault tolerance mechanisms Provides full POSIX compatibility

Peer-to-peer file systems Characterized by direct communication between clients Centralized (e.g., Napster) vs. decentralized (e.g., Bittorrent) Anonymized (e.g., Freenet) via large-scale distributed caching with encryption and hash-based keys to locate data Raises many social and ethical issues Censorship, activism, and free speech Privacy and security Illegal activity and law enforcement