CS603: Distributed Systems

Similar documents
CIS 5373 Systems Security

MODELS OF DISTRIBUTED SYSTEMS

MODULE: NETWORKS MODULE CODE: CAN1102C. Duration: 2 Hours 15 Mins. Instructions to Candidates:

Configuring attack detection and prevention 1

CS101 Lecture 7: Internetworking:

MODELS OF DISTRIBUTED SYSTEMS

Network Layer: Control/data plane, addressing, routers

Today: Fault Tolerance

Networking interview questions

Configuring attack detection and prevention 1

CS 421: COMPUTER NETWORKS SPRING FINAL May 16, minutes

TCP/IP Protocol Suite and IP Addressing

CS505: Distributed Systems

TSIN02 - Internetworking

CS4700/5700: Network fundamentals

UDP, TCP, IP multicast

Network.... communication system for connecting end- systems. End-systems a.k.a. hosts PCs, workstations dedicated computers network components

CS-461 Internetworking. Dr. Mohamed Aboutabl

Distributed Systems (5DV147)

CS 421: COMPUTER NETWORKS SPRING FINAL May 24, minutes. Name: Student No: TOT

Network Architecture. TOC Architecture

II. Principles of Computer Communications Network and Transport Layer

Lecture 8 Network Layer: Logical addressing

EC441 Fall 2018 Introduction to Computer Networking Chapter4: Network Layer Data Plane

Networking and Internetworking 1

LOGICAL ADDRESSING. Faisal Karim Shaikh.

Different Layers Lecture 20

Layering and Addressing CS551. Bill Cheng. Layer Encapsulation. OSI Model: 7 Protocol Layers.

Vorlesung Kommunikationsnetze

Router Architecture Overview

CS101 Lecture 6: Internetworking: Internet Protocol, IP Addresses, Routing, DNS. John Magee 8 July Some images courtesy Wikimedia Commons

System Models for Distributed Systems

Consensus a classic problem. Consensus, impossibility results and Paxos. Distributed Consensus. Asynchronous networks.

Today: Fault Tolerance. Fault Tolerance

Next Steps Spring 2011 Lecture #18. Multi-hop Networks. Network Reliability. Have: digital point-to-point. Want: many interconnected points

TSIN02 - Internetworking

Network Protocols - Revision

Introduction to Open System Interconnection Reference Model

EITF25 Internet Techniques and Applications L7: Internet. Stefan Höst

Networking Technologies and Applications

Applied Networks & Security

Lecture 17: Network Layer Addressing, Control Plane, and Routing

TCP /IP Fundamentals Mr. Cantu

CS 268: Internet Architecture & E2E Arguments. Today s Agenda. Scott Shenker and Ion Stoica (Fall, 2010) Design goals.

Network Architecture

Today: Fault Tolerance. Replica Management

06/02/ Local & Metropolitan Area Networks 0. INTRODUCTION. 1. History and Future of TCP/IP ACOE322

Introduction to Protocols

ECPE / COMP 177 Fall Some slides from Kurose and Ross, Computer Networking, 5 th Edition

EEC-684/584 Computer Networks

CHAPTER-2 IP CONCEPTS

20-CS Cyber Defense Overview Fall, Network Basics

Computer Communication & Networks / Data Communication & Computer Networks Week # 03

===================================================================== Exercises =====================================================================

Network Protocols. Sarah Diesburg Operating Systems CS 3430

W H I T E P A P E R : O P E N. V P N C L O U D. Implementing A Secure OpenVPN Cloud

OSI Layer OSI Name Units Implementation Description 7 Application Data PCs Network services such as file, print,

416 Distributed Systems. Networks review; Day 1 of 2 Jan 5 + 8, 2018

Planning for Information Network

Question 7: What are Asynchronous links?

UNIT IV -- TRANSPORT LAYER

CS-435 spring semester Network Technology & Programming Laboratory. Stefanos Papadakis & Manolis Spanakis

Toward Intrusion Tolerant Clouds

Lecture 17 Overview. Last Lecture. Wide Area Networking (2) This Lecture. Internet Protocol (1) Source: chapters 2.2, 2.3,18.4, 19.1, 9.

CIS 551 / TCOM 401 Computer and Network Security. Spring 2007 Lecture 8

Chapter 4: Network Layer

Unit 2.

Chapter 7 Internet Protocol Version 4 (IPv4) Kyung Hee University

Lecture 7. Reminder: Homework 2, Programming Project 1 due today. Homework 3, Programming Project 2 out, due Thursday next week. Questions?

Position of IP and other network-layer protocols in TCP/IP protocol suite

CS5412: CONSENSUS AND THE FLP IMPOSSIBILITY RESULT

Distributed Systems COMP 212. Lecture 19 Othon Michail

Examination 2D1392 Protocols and Principles of the Internet 2G1305 Internetworking 2G1507 Kommunikationssystem, fk SOLUTIONS

Computer Science 425 Distributed Systems CS 425 / ECE 428. Fall 2013

Computer Communication EDA344, EDA343, DIT 420

The Transmission Control Protocol (TCP)

9th Slide Set Computer Networks

Fundamental Questions to Answer About Computer Networking, Jan 2009 Prof. Ying-Dar Lin,

Computer Networks Prof. S. Ghosh Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Lecture 28 IP Version 4

Communications Software. CSE 123b. CSE 123b. Spring Lecture 10: Mobile Networking. Stefan Savage

Quick announcement. CSE 123b Communications Software. Last class. Today s issues. The Mobility Problem. Problems. Spring 2003

TSIN02 - Internetworking

NAT, IPv6, & UDP CS640, Announcements Assignment #3 released

Fault Tolerance. Basic Concepts

IPv4 addressing, NAT. Computer Networking: A Top Down Approach 6 th edition Jim Kurose, Keith Ross Addison-Wesley.

IP Address Assignment

King Fahd University of Petroleum and Minerals College of Computer Sciences and Engineering Department of Computer Engineering

What is the difference between unicast and multicast? (P# 114)

CS603: Distributed Systems

Private and Public addresses. Real IPs. Lecture (09) Internetwork Layer (3) Agenda. By: Dr. Ahmed ElShafee

ECPE / COMP 177 Fall Some slides from Kurose and Ross, Computer Networking, 5 th Edition

User Datagram Protocol (UDP):

CS670: Network security

Guide to Networking Essentials, 6 th Edition. Chapter 7: Network Hardware in Depth

Guide to Networking Essentials, 6 th Edition. Chapter 5: Network Protocols

CSCI-1680 Network Layer:

Dr Markus Hagenbuchner CSCI319 SIM. Distributed Systems Chapter 4 - Communication

exam. Number: Passing Score: 800 Time Limit: 120 min CISCO Interconnecting Cisco Networking Devices Part 1 (ICND)

Table of Contents. Cisco How NAT Works

Network Layer/IP Protocols

Transcription:

CS603: Distributed Systems Lecture 1: Basic Communication Services Cristina Nita-Rotaru Lecture 1/ Spring 2006 1

Reference Material Textbooks Ken Birman: Reliable Distributed Systems Recommended reading Research papers that will be specified for each lecture Cristina Nita-Rotaru Lecture 1/ Spring 2006 7

What is a Distributed System? A distributed computing system is a set of computer programs executing on one ore more computers and coordinating actions by exchanging messages. Cristina Nita-Rotaru Lecture 1/ Spring 2006 9

Examples of Distributed Systems Air Traffic Control Space Shuttle Banking Systems Grid Power Systems Modern Data Centers Cristina Nita-Rotaru Lecture 1/ Spring 2006 10

Distributed Systems Requirements Reliability: provide continuous service Availability: ready to use Safety: systems do what they are supposed to do, avoiding catastrophic consequences Security: withstands passive/active attacks from outsiders or insiders Cristina Nita-Rotaru Lecture 1/ Spring 2006 11

not easy to achieve because Computers and networks fail in many (often unpredictable) ways Computers get compromised Real-time constraints Performance requirements Complexity Cristina Nita-Rotaru Lecture 1/ Spring 2006 12

Why Do Computer Systems Fail? 1985, Fault-tolerant system (Tandem) System administration (operator actions, system configuration and maintenance) Software faults, environmental failures Hardware failures (disks and communication controllers) Power outages 2004, Where are we now?! The Internet Age Operator error (particularly configuration errors) is the leading cause of failures Failures in custom-written front-end software Not enough on-line testing Why do Internet services fail, and what can be done about it? D. Oppenheimer, A.Ganapathi and D. A. Patterson, 2003. Why Do Computers Stop and What can be done about it? Jim Gray, 1985 Cristina Nita-Rotaru Lecture 1/ Spring 2006 13

Why Do Computers Get Compromised? Software bugs Administration errors Lack of diversity, same vulnerability is exploited The explosion of the Internet facilitates the spread of malware Cristina Nita-Rotaru Lecture 1/ Spring 2006 14

..how do computer system fail Halting failures: no way to detect except by using timeout Fail-stop failures: accurately detectable halting failures Send-omission failures Receive-omission failures Network failures Network partitioning failures Timing failures: temporal property of the system is violated Byzantine failures: arbitrary failures, include both benign and malicious failures Cristina Nita-Rotaru Lecture 1/ Spring 2006 15

Air Traffic Control: A Case Scenario Prepared with slides courtesy of Prof. Ken Birman and used in a similar course at Cornell University Cristina Nita-Rotaru Lecture 1/ Spring 2006 16

ATC and Its Role Assists planes in taking-off, landing and en route (during flying) Assigns trajectories making sure that planes fly at a safe distance Each ATC has a certain space assigned to it As planes move they enter the space controlled by different ATCs Planes are also equipped with a collision avoidance system TCAS Cristina Nita-Rotaru Lecture 1/ Spring 2006 17

More Details on ATC Air space divided in sectors Each sector has a control center Centers may have few or many (50) controllers In USA, controller works alone In France, a controller is a team of 3-5 people Data comes from a radar system that broadcasts updates every 10 seconds Database keeps other flight data Controllers owns smaller sub-sectors Controllers make very quick decision(s) based on available data Cristina Nita-Rotaru Lecture 1/ Spring 2006 18

ATC Architecture NETWORK INFRASTRUCTURE DATABASE THE SYSTEM MUST BE AVAILABLE ALL TIME and MAINTAIN CONSISTENCY OF THE INFORMATION Cristina Nita-Rotaru Lecture 1/ Spring 2006 19

What Can Go Wrong? Overloaded computers can often crash Systems may get slow as volume of air traffic rises Inconsistent displaying: phantom planes missing planes stale information Scheduled maintenance going wrong Some major outages recently (and some nearmiss stories associated with them), some very unfortunate events as recent as 2003. Cristina Nita-Rotaru Lecture 1/ Spring 2006 20

Concept of IBM s 1994 System Replace video terminals with workstations Build a highly available real-time system guaranteeing no more than 3 seconds downtime per year Offer much better user interface to ATC controllers, with intelligent course recommendations and warnings about future course changes that will be needed IBM approach was based on lock-step replication Replace every major component of the system with a faulttolerant component set Replicate entire programs ( state machine approach) Cristina Nita-Rotaru Lecture 1/ Spring 2006 21

IBM ATC System Architecture Independent consoles backed by ultra-reliable components Console Radar processing system is redundant ATC database ATC database ATC database is really a high-availability cluster Cristina Nita-Rotaru Lecture 1/ Spring 2006 22

French ATC Project Concept French project used replication selectively. Some specific and critical data was replicated, for example list of planes currently in sector A.17 E.g. controller interface programs could maintain replicas of certain data structures or variables with system-wide value Programs did computing on their own helped by databases Program hosts a data replica but isn t itself replicated Cristina Nita-Rotaru Lecture 1/ Spring 2006 23

French ATC System Architecture Multiple consoles but in some ways they function like one Console A Console B Console C Radar updates sent with hardware broadcasts ATC database ATC database only sees one connection Cristina Nita-Rotaru Lecture 1/ Spring 2006 24

Other technologies used Both used standard off-the-shelf workstations (easier to maintain, upgrade, manage) IBM proposed their own software for fault-tolerance and consistent system implementation French used Isis software developed at Cornell Both developed fancy graphical user interface much like the Web, pop-up menus for control decisions, etc. Cristina Nita-Rotaru Lecture 1/ Spring 2006 25

IBM Project Was a Fiasco!! IBM was unable to implement their faulttolerant software architecture! Problem was much harder than they expected. Even a non-distributed interface turned out to be very hard, major delays, scaled back goals And performance of the replication scheme turned out to be terrible for reasons they didn t anticipate The French project was a success and never even missed a deadline In use today. Cristina Nita-Rotaru Lecture 1/ Spring 2006 26

Where did IBM go wrong? Their software worked correctly The replication mechanism wasn t flawed, although it was much slower than expected But somehow it didn t fit into a comfortable development methodology Developers need to find a good match between their goals and the tools they use IBM never reached this point The French approach matched a more standard way of developing applications Cristina Nita-Rotaru Lecture 1/ Spring 2006 27

Basic Communication Services Cristina Nita-Rotaru Lecture 1/ Spring 2006 32

OSI/ISO Model Application Presentation Session Transport Network Data Link Physical Layer Application Presentation Session Transport Network Data Link Physical Layer Cristina Nita-Rotaru Lecture 1/ Spring 2006 33

Internet Protocol - IP IP is the current delivery protocol on the Internet, between hosts. IP provides best effort, unreliable delivery of packets. There are two versions: IPv4 is the current routing protocol on the Internet IPv6, a newer version, still not totally embraced by the community Cristina Nita-Rotaru Lecture 1/ Spring 2006 34

Transport Protocols Provides communication between processes running on hosts The most common transport protocols are UDP and TCP. OS provides support for developing applications on top of UDP and TCP. Cristina Nita-Rotaru Lecture 1/ Spring 2006 35

User Datagram Protocol - UDP Connectionless protocol for a user process: No connection established Unreliable transmission: no guarantee that the packets reach their destination. Error detection. Runs on top of IP. Cristina Nita-Rotaru Lecture 1/ Spring 2006 36

Transmission Control Protocol - TCP Connection oriented protocol for a user process: Reliable, full-duplex channel: acknowledgements, retransmissions, timeouts, flow-control The packets are delivered in the same order in which they were sent. Flow Control: Max allowed window size Congestion control: Slow-start phase exponential increase (until the slowstart threshold is hit) Congestion Avoidance phase additive increase Multiplicative Decrease on timeout. Cristina Nita-Rotaru Lecture 1/ Spring 2006 37

Hardware Addresses Hosts access the physical medium via network cards. Each network card is uniquely identified by a 48 bit (6 bytes) number, called hardware address, or Ethernet address. Ethernet addresses are hardwired into the electronics of the network device. b 1 b 2 b 3 b 4 b 5 b 6 unique to the assigned by the manufacturer of the manufacturer. card. ARP/RARP protocols map IP addresses to hardware addresses and vice versa. Cristina Nita-Rotaru Lecture 1/ Spring 2006 38

IP Addresses Hosts are identified in the network by IP addresses Two different network addresses: IPv4 addresses: 32 bits addresses, most used IPv6 addresses: 128 bits addresses. Each decimal number represents eight bits of binary data (value between 0 and 255). Divided in classes. Network addresses with first byte between 1 and 126 are class A Network addresses with first byte between 128 and 191 are class B Network addresses with first byte between 192 and 223 are class C All other networks are class D, used for special functions or class E which is reserved. Cristina Nita-Rotaru Lecture 1/ Spring 2006 39

Naming services: DNS People prefer names for hosts (hostnames): Name: ugrad1 Fully qualified name: ugrad1.cs.jhu.edu DNS (Domain Name System) maps hostnames to IP addresses. Example: ugrad1.cs.jhu.edu has the IP 128.220.224.76 Cristina Nita-Rotaru Lecture 1/ Spring 2006 40

NATs and their implications There are not enough IP addresses Solutions: IPv6 or.network Address Translation (NAT) NAT allows a single device, to act as an agent between the Internet (or "public network") and a local (or "private") network: only a single, unique IP address is required to represent an entire group of computers Computers can not communicate directly, STUN client-server protocol allows computers to discover each other behind a NAT (learn their public addresses), but requires presence of STUN server Cristina Nita-Rotaru Lecture 1/ Spring 2006 41

Problems with NATs Break end-to-end control Hosts depend on same trusted point (the STUN server) Add complexity Prevent IP security deployment Cristina Nita-Rotaru Lecture 1/ Spring 2006 42

IP Multicast Provides support for group communication: send to multiple parties Groups are specified by reserved IP multicast addresses 224.0.0.0 to 239.255.255.255. Unreliable communication IGMP is used to dynamically register individual hosts in a multicast group on a particular LAN. Network cards recognize IP multicast addresses: hosts that did not subscribe to a particular group will not process those packets (unlike broadcast that is processed by all hosts in a network segment) Issues with IP multicast: can be used to cause DOS, many ISP and enterprise network block IP multicast communication Cristina Nita-Rotaru Lecture 1/ Spring 2006 43

Byte Order Different systems store multibyte values (for example int) in different ways. HP, Motorola 68000, and SUN systems store multibyte values in Big Endian order: stores the high-order byte at the starting address Intel 80x86 systems store them in Little Endian order: stores the low-order byte at the starting address. Why is this a problem for network applications? Data is interpreted differently on hosts with different architectures. Cristina Nita-Rotaru Lecture 1/ Spring 2006 44

Buffering and Fragmentation Buffering: OS maintains a set of buffers used to temporary store incoming and outgoing messages THE BUFFERING SPACE is LIMITED Fragmentation: IP datagrams are fragmented, they can travel on different paths When processes send very fast, packets can be dropped by the OS without any notification On sending: no OS memory can be obtained for one or several fragments On receiving: one or several fragments did not make it to the destination, entire datagram is dropped Cristina Nita-Rotaru Lecture 1/ Spring 2006 45

Why these protocols do not provide better support for distributed applications? Cristina Nita-Rotaru Lecture 1/ Spring 2006 46

The End-to-End Argument End to end arguments in System Design. Saltzer, Reed, Clark TOCS 1990. Cristina Nita-Rotaru Lecture 1/ Spring 2006 47

What is all about? Analyzes what services should be provided at low levels and what should be provided by the application Commonly cited as a justification for not addressing reliability at low levels and let application handle it Example: how to transfer a file: hop-by-hop or endto-end Low-level mechanisms should focus on speed, not reliability The application should worry about properties it needs Cristina Nita-Rotaru Lecture 1/ Spring 2006 48

References Chapter 1 and 2 from Reliable Distributed Systems Why do Internet services fail, and what can be done about it? D. Oppenheimer, A.Ganapathi and D. A. Patterson, 2003. Why Do Computers Stop and What can be done about it? Jim Gray, 1985. End to end arguments in System Design. Saltzer, Reed, Clark TOCS 1990. Cristina Nita-Rotaru Lecture 1/ Spring 2006 49