CS 470 Spring Distributed Web and File Systems. Mike Lam, Professor. Content taken from the following:

Similar documents
CS 470 Spring Distributed Web and File Systems. Mike Lam, Professor. Content taken from the following:

Chapter 11 DISTRIBUTED FILE SYSTEMS

CPE731 Distributed System Models

Operating Systems. Week 13 Recitation: Exam 3 Preview Review of Exam 3, Spring Paul Krzyzanowski. Rutgers University.

CS 416: Operating Systems Design April 22, 2015

CS 470 Spring Security. Mike Lam, Professor. a.k.a. Why on earth do Alice and Bob need to share so many secrets?!?

Operating Systems Design Exam 3 Review: Spring Paul Krzyzanowski

Distributed Systems Principles and Paradigms. Chapter 12: Distributed Web-Based Systems

Advanced Operating Systems

Cloud Computing CS

Google Cluster Computing Faculty Training Workshop

WWW, REST, and Web Services

CS 470 Spring Security. Mike Lam, Professor. a.k.a. Why on earth do Alice and Bob need to talk so much?!? Content taken from the following:

EI 338: Computer Systems Engineering (Operating Systems & Computer Architecture)

Uniform Resource Locators (URL)

Distributed File Systems. CS432: Distributed Systems Spring 2017

DISTRIBUTED SYSTEMS. Second Edition. Andrew S. Tanenbaum Maarten Van Steen. Vrije Universiteit Amsterdam, 7'he Netherlands PEARSON.

Send me up to 5 good questions in your opinion, I ll use top ones Via direct message at slack. Can be a group effort. Try to add some explanation.

Web as a Distributed System

Global Servers. The new masters

Distributed File Systems II

Distributed File Systems. CS 537 Lecture 15. Distributed File Systems. Transfer Model. Naming transparency 3/27/09

CNIT 129S: Securing Web Applications. Ch 3: Web Application Technologies

System that permanently stores data Usually layered on top of a lower-level physical storage medium Divided into logical units called files

Distributed Systems. Fall 2017 Exam 3 Review. Paul Krzyzanowski. Rutgers University. Fall 2017

Novell Access Manager

CS555: Distributed Systems [Fall 2017] Dept. Of Computer Science, Colorado State University

Distributed Systems. Hajussüsteemid MTAT Distributed File Systems. (slides: adopted from Meelis Roos DS12 course) 1/25

GridNFS: Scaling to Petabyte Grid File Systems. Andy Adamson Center For Information Technology Integration University of Michigan

Lecture 2 Distributed Filesystems

Chapter 10 DISTRIBUTED OBJECT-BASED SYSTEMS

Distributed File Systems

Today CSCI Coda. Naming: Volumes. Coda GFS PAST. Instructor: Abhishek Chandra. Main Goals: Volume is a subtree in the naming space

Cloud Computing CS

Operating Systems Design Exam 3 Review: Spring 2011

Lecture 08: Networking services: there s no place like

CA485 Ray Walshe Google File System

Chapter 2. Application Layer

HPC File Systems and Storage. Irena Johnson University of Notre Dame Center for Research Computing

Chapter 4 The Internet

DISTRIBUTED SYSTEMS [COMP9243] Lecture 9b: Distributed File Systems INTRODUCTION. Transparency: Flexibility: Slide 1. Slide 3.

Distributed File Systems. Distributed Systems IT332

HP Instant Support Enterprise Edition (ISEE) Security overview

Introduction to Cloud Computing

Chapter 10: Application Layer CCENT Routing and Switching Introduction to Networks v6.0

NFS Version 4.1. Spencer Shepler, Storspeed Mike Eisler, NetApp Dave Noveck, NetApp

EEC-682/782 Computer Networks I

Application Layer Introduction; HTTP; FTP

CS6601 DISTRIBUTED SYSTEM / 2 MARK

Large Systems: Design + Implementation: Communication Coordination Replication. Image (c) Facebook

Novell Access Manager

BIRLA INSTITUTE OF TECHNOLOGY AND SCIENCE, Pilani Pilani Campus Instruction Division

AN OVERVIEW OF DISTRIBUTED FILE SYSTEM Aditi Khazanchi, Akshay Kanwar, Lovenish Saluja

HTTP and Web Content Delivery

Content Overlays. Nick Feamster CS 7260 March 12, 2007

CS /29/17. Paul Krzyzanowski 1. Fall 2016: Question 2. Distributed Systems. Fall 2016: Question 2 (cont.) Fall 2016: Question 3

CS November 2017

DISTRIBUTED SYSTEMS Principles and Paradigms Second Edition ANDREW S. TANENBAUM MAARTEN VAN STEEN. Chapter 5 Naming

CMPE 151: Network Administration. Servers

Objectives. Connecting with Computer Science 2

Using the Cisco ACE Application Control Engine Application Switches with the Cisco ACE XML Gateway

Introduction to Distributed Computing Systems

2. Introduction to Internet Applications

Int ernet w orking. Internet Security. Literature: Forouzan: TCP/IP Protocol Suite : Ch 28

ReST 2000 Roy Fielding W3C

Computer Networks. Wenzhong Li. Nanjing University

Chapter 18 Distributed Systems and Web Services

Lecture 7: Distributed File Systems

Massively Scalable File Storage. Philippe Nicolas, KerStor

Chapter 4: Networking and the Internet

Foundations of Python

FILE EXCHANGE PROTOCOLS AND ZERO CONFIGURATION NETWORKING

Distributed File Systems. Security: Anonymous access. Peer-to-Peer Security. Server Authenticated Sessions

Introduction to TCP/IP

Distributed Systems. 21. Content Delivery Networks (CDN) Paul Krzyzanowski. Rutgers University. Fall 2018

4. Note: This example has NFS version 3, but other settings such as NFS version 4 may also work better in some environments.

CS November 2018

Chapter 4: Networking and the Internet. Network Classifications. Network topologies. Network topologies (continued) Connecting Networks.

Issues. Separation of. Distributed system security. Security services. Security policies. Security mechanism

DISTRIBUTED FILE SYSTEMS & NFS

Feedback on BeeGFS. A Parallel File System for High Performance Computing

CS454/654 Midterm Exam Fall 2004

Introduction to Computer Science. William Hsu Department of Computer Science and Engineering National Taiwan Ocean University

HTTP. Web. Web Web web

2 SCANNING, PROBING, AND MAPPING VULNERABILITIES

Operating system security models

Data Management. Parallel Filesystems. Dr David Henty HPC Training and Support

CSE 486/586: Distributed Systems

Information Security CS 526

Distributed File Systems. Security: Anonymous access. Peer-to-Peer Security. Server Authenticated Sessions

Distributed Systems 16. Distributed File Systems II

NFS around the world Tigran Mkrtchyan for dcache Team dcache User Workshop, Umeå, Sweden

StreamSets Control Hub Installation Guide

VMware Identity Manager Connector Installation and Configuration (Legacy Mode)

Verteilte Systeme (Distributed Systems)

Operating Systems Design 16. Networking: Remote File Systems

Jeff Offutt SWE 642 Software Engineering for the World Wide Web

Etanova Enterprise Solutions

The Parallel NFS Bugaboo. Andy Adamson Center For Information Technology Integration University of Michigan

Distributed File Systems

Transcription:

CS 470 Spring 2017 Mike Lam, Professor Distributed Web and File Systems Content taken from the following: "Distributed Systems: Principles and Paradigms" by Andrew S. Tanenbaum and Maarten Van Steen (Chapters 11 and 12) Various online sources

The "Internet" World Wide Web (WWW) System for sharing information via hyperlinked documents Started as a CERN project; now a massive distributed system built on a worldwide network (the "Internet") Issues: Naming Security Consistency Replication

Naming IPv4 and IPv6 addresses for hosts Uniform Resource Locator (URL) Unique worldwide name of a document Hierarchical domain + path Domain Name Service (DNS) Distributed, hierarchical IP address lookup protocol

Security Secure Socket Layer (SSL) and Transport Layer Security (TLS) Public-key authentication, symmetric end-to-end encryption Certification Authority (CA) provides centralized key checking Examples: InCommon/Comodo, Symantec, and Let s Encrypt HyperText Transfer Protocol (HTTP) Protocol for browser-server communication Request and response model w/ headers and status codes HTTPS is HTTP over SSL/TLS Common Gateway Interface (CGI) Standardized program execution protocol Somewhat similar to remote procedure calls (RPCs) Denial-of-Service (DoS) or Distributed DoS (DDoS) attacks Often executed by botnets of virus-infected personal computers

Consistency Content Delivery Networks (e.g. Akamai) Globally-distributed network of proxy servers Goal: improve data locality Peer-to-peer and private CDNs Firefox / Chrome / Safari / IE / Edge Graphical interface for HTTP connections Often caches website components locally

Replication Apache - Open-source extensible web server LAMP: Linux, Apache, MySQL/MariaDB/MongoDB, PHP/Perl/Python Other web servers: Nginx & Microsoft IIS Load balancing Large websites require multiple servers w/ replicated data to provide availability to a massive number of users Load balancers ensure that the traffic is distributed evenly Apache

Interchange formats The internet is a very heterogeneous distributed system HyperText Markup Language (HTML) Encoding formats for email messages Simple Object Access Protocol (SOAP) Generalized HTML; generic data-interchange format Multipurpose Internet Mail Exchange (MIME) Document format for WWW extensible Markup Language (XML) Exchanging information can be a challenge Web services data format JavaScript Object Notation (JSON) Lightweight data-interchange format

File systems File system: manages storage of structured data in files Export: a file system that is made available to another host Mount: link to a remote file system in the local file system File systems table (fstab) configuration Static vs. automatic mounting /etc/fstab on cluster /dev/mapper/rhel_login01-root /dev/mapper/rhel_login01-swap / swap xfs swap defaults defaults nfs.cluster.cs.jmu.edu:/nfs/home nfs.cluster.cs.jmu.edu:/nfs/scratch nfs.cluster.cs.jmu.edu:/nfs/shared /nfs/home /scratch /shared nfs nfs nfs rw rw rw,acl

Distributed file systems Networked file system: centralized storage with export/mount sharing Distributed file system: distributed storage with communication protocol Centralized vs. decentralized Asymmetric vs. symmetric Issues: Naming Security Consistency Replication

Naming Hierarchical file names Filesystem Hierarchy Standard File descriptors / handles Abstract identifier for an open file In POSIX, positive integers: Standard input: 0 Standard output: 1 Standard error: 2 In distributed file systems: Data-centric names Location-centric names Name servers (lookup) vs. file servers (access)

Security Authentication UIDs, LDAP, Kerberos, Active Directory Access control (authorization) Unix file permissions Access control lists (e.g., POSIX) Client vs. server permissions Encryption: security vs. performance tradeoff Alice: read,write; Bob: read; Unix file permissions Access control list

Consistency Remote access vs. upload/download model Closely related to replication issue Remote access Upload / download

Replication Client-side replication (caching) Provides continued functionality while offline Causes synchronization / consistency problems Callback system for updating other clients Server-side replication (mirroring) Provides fault tolerance Striping: splitting a file's blocks across multiple servers Can be counterproductive if writes are frequent A B C D E A B A B A B A B C D E C D C D C E A B C E E no striping E w/ striping

Network File System (NFS) Basic file sharing protocol for local networks Based on remote procedure calls (RPCs) Provides shared storage and reliability in presence of failures from Tanenbaum and Van Steen (Ch.11)

Network File System (NFS) Developed by Sun in 1984 NFSv2 released in 1989 UDP-based stateless protocol No built-in locking NFSv3 released in 1995 Originally an in-house solution (v.1), now an open standard 64-bit and TCP support NFSv4 released in 2000 Adds stateful protocol Better access control New security features (including encrypted traffic) pnfs: scalable access to files distributed on multiple servers

Andrew File System (AFS) Developed at CMU (Named after Andrew Carnegie and Andrew Mellon) Improved on NFS in terms of scalability and security Weak consistency model Each file is locked when opened Modifications are performed and buffered locally Updates are only sent to the server when a file is closed Server uses callbacks to update other clients Kerberos-based access control lists Lookup / insert / delete / administer Read / write / lock Heavily influenced development of NFSv4

NFSv4 Access control lists Type: A = allow, D = deny, U = audit, L = alarm Flags: g = group, d = directory-inherit, f = file-inherit Permissions: r = read, w = write, a = append, x = execute, d = delete Permissions (cont.): c = read-acl, C = write-acl, o = write-owner Policy of "default-deny" A::OWNER@:rwatTnNcCy A::alice@nfsdomain.org:rxtncy A::bob@nfsdomain.org:rwadtTnNcCy A:g:GROUP@:rtncy D:g:GROUP@:waxTC A::EVERYONE@:rtncy D::EVERYONE@:waxTC

Google File System (GFS) Reliable asymmetric distributed file system on commodity hardware Each file is split into chunks with unique chunk IDs (chunks can be replicated) Master stores metadata tracking each file and its chunks (and where they are) Basis for BigTable, backing store for the original MapReduce from Tanenbaum and Van Steen (Ch.11)

Lustre High-performance parallel file system Initially a research project; later owned by Sun/Oracle Now open source, maintained by a collection of organizations Multiple lower-level interconnects: Ethernet, Infiniband Used by many supercomputer installations E.g., Sequoia and Titan Three functional units Metadata server (MDS) names, layout, permissions Object storage server (OSS) stores file data Clients connect to servers

Peer-to-peer file systems Characterized by direct communication between clients Centralized (e.g., Napster) vs. decentralized (e.g., Bittorrent) Anonymized (e.g., Freenet) via large-scale distributed caching with encryption and hash-based keys to locate data Raises many social and ethical issues Censorship, activism, and free speech Privacy and security Illegal activity and law enforcement