What is a distributed system?



CS 378 Intro to Distributed Computing
Lorenzo Alvisi, Harish Rajamani

What is a distributed system?

"A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable." (Leslie Lamport)

What is a distributed system?

A distributed system is software through which a collection of independent computers appears to its users as a single, coherent system.

[Figure: Machines A, B, and C, each with its own local OS, connected by a network; a middleware layer spans the machines and supports the distributed applications.]

Goals (and auto-goals) of a Distributed System
  Connecting Resources and Users
  Transparency
  Openness
  Scalability

Transparency

  Transparency   Description
  Access         Hides differences in data representation and invocation mechanisms
  Location       Hides where an object resides
  Migration      Hides from an object that object's location
  Relocation     Hides from a client the change of location of an object to which the client is bound
  Replication    Hides that an object may be replicated, with replicas at different locations
  Concurrency    Hides coordination of activities between objects
  Failure        Hides the failure and recovery of an object
  Persistence    Hides whether a resource is in memory or on disk

Openness
  Conform to well-defined interfaces
  Easily interact with other open systems
  Achieve independence from heterogeneity with respect to hardware, platform, and languages
  Support different app/user-specific policies: ideally, provide only mechanisms!

Scalability
  Size scalability: number of users and processes
  Geographical scalability: maximum distance between nodes
  Administrative scalability: number of administrative domains

Scaling
  Distribute: partition data and computation across multiple machines
    Examples: Java applets, DNS, WWW
  Replicate: make copies of data available at different machines
    Examples: mirrored web sites, replicated fs, replicated db
  Cache: allow client processes to access local copies
    Examples: Web caches, file caching
  Consistency: the issue raised once copies exist

A first course in Distributed Computing...
Two basic approaches:
  Cover many interesting systems, and distill from them fundamental principles
  Focus on a deep understanding of the fundamental principles, and see them instantiated in a few systems

A few intriguing questions
  How do we talk about a distributed execution?
  Can we draw global conclusions from local information?
  Can we coordinate operations without relying on synchrony?
  For the problems we know how to solve, how do we characterize the goodness of our solution?
  Are there problems that simply cannot be solved?
  What are useful notions of consistency, and how do we maintain them?
  What if part of the system is down? Can we still do useful work?
  What if instead part of the system becomes possessed and starts behaving arbitrarily: are all bets off?

Global Predicate Detection and Event Ordering

Our Problem
  To compute predicates over the state of a distributed application

Model
  Message passing
  No failures
  Two possible timing assumptions:
    1. Synchronous System
    2. Asynchronous System
       No upper bound on message delivery
       No bound on relative process speeds
       No centralized clock

Clock Synchronization
  External Clock Synchronization: keeps each processor clock within some maximum deviation from an external time source.
    Can exchange information about the timing of events on different systems
    Can take actions at real-time deadlines
    Synchronization within 0.1 ms
  Internal Clock Synchronization: keeps processor clocks within some maximum deviation from each other.
    Can measure the duration of distributed activities that start on one process and terminate on another
    Can totally order events that occur on a distributed system

Clock Synchronization: Take 1
  Assume an upper bound max and a lower bound min on message delivery time.
  Guarantee that processes stay synchronized within max - min.

Problem:
[Figure: distribution of message delivery times (% of messages vs. time in ms) for a 5000-message run at IBM Almaden; values marked on the time axis: 4.22, 4.48, and 93.17 ms.]

Clock Synchronization: Take 2
  No upper bound on message delivery...
  ...but a lower bound min on message delivery.
  Use a timeout maxp to detect process failures.
  Slaves send messages to the master.
  The master averages the slaves' values; it computes a fault-tolerant average.
  Precision: 4 maxp - min

Probabilistic Clock Synchronization (Cristian)
  Master-slave architecture
  The master is connected to an external source.
  Slaves read the master's clock and adjust their own.
  How accurately can a slave read the master's clock?

The Idea
  Clock accuracy depends on the message round trip: if the round trip is small, master and slave cannot have drifted by much!
  Since there is no upper bound on message delivery, there is no certainty of an accurate enough reading...
  ...but a very accurate reading can be achieved by repeated attempts.

Asynchronous systems
  Weakest possible assumptions (cf. the finite progress axiom)
  Weak assumptions ⇒ fewer vulnerabilities
  Asynchronous ≠ slow
  Interesting model w.r.t. failures (ah ah ah!)
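The probabilistic clock-reading idea (retry until the round trip is small) can be sketched as follows. This is a minimal illustration, not Cristian's full protocol: the names `read_master_clock`, `SimulatedLink`, `max_roundtrip`, and `max_attempts` are hypothetical, clock drift and the lower bound min are ignored, and the estimate simply assumes the master's reading was taken halfway through the round trip.

```python
def read_master_clock(request, local_clock, max_roundtrip, max_attempts=10):
    """Try to read the master's clock; accept only a short round trip.

    request():      hypothetical call that returns the master's clock value.
    local_clock():  returns the slave's own clock value.
    Returns (estimate, error_bound), or None if every attempt was too slow.
    """
    for _ in range(max_attempts):
        t_send = local_clock()
        master_time = request()
        t_recv = local_clock()
        roundtrip = t_recv - t_send
        if roundtrip <= max_roundtrip:
            # The reading was taken at some point during the round trip.
            # Assuming the midpoint, the error is at most half the round trip
            # (drift and the minimum delay are ignored in this sketch).
            return master_time + roundtrip / 2, roundtrip / 2
    return None


class SimulatedLink:
    """Toy stand-in for a network: scripted (outbound, inbound) delays."""

    def __init__(self, delays, master_offset):
        self.now = 0.0                      # slave's local time
        self.delays = iter(delays)
        self.master_offset = master_offset  # master clock = slave clock + offset

    def local_clock(self):
        return self.now

    def request(self):
        outbound, inbound = next(self.delays)
        self.now += outbound
        reading = self.now + self.master_offset
        self.now += inbound
        return reading


link = SimulatedLink(delays=[(5.0, 40.0), (1.0, 2.0)], master_offset=100.0)
result = read_master_clock(link.request, link.local_clock, max_roundtrip=4.0)
```

In this toy run the first attempt is rejected because its round trip is too long; the second is accepted. A smaller `max_roundtrip` gives a tighter error bound but makes rejected attempts more likely, which is the trade-off the slide describes.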

Client-Server
  Processes exchange messages using Remote Procedure Call (RPC).
  A client requests a service by sending the server a message. The client blocks while waiting for a response.
  The server computes the response (possibly asking other servers) and returns it to the client.

[Figure: client c sends a request to server s and waits for the reply.]

Deadlock!

Goal
  Design a protocol by which a processor can determine whether a global predicate (say, deadlock) holds.

Wait-For Graphs
  Draw an arrow from $p_i$ to $p_j$ if $p_j$ has received a request from $p_i$ but has not responded yet.
  Cycle in WFG ⇒ deadlock
  Deadlock ⇒ cycle in WFG
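Checking a wait-for graph for a cycle can be sketched with a depth-first search. The representation is an assumption made for illustration: `wait_for` is a hypothetical dictionary mapping each process name to the set of processes it is waiting on, and `has_cycle` is not a name from the lecture.

```python
def has_cycle(wait_for):
    """Return True if the wait-for graph contains a cycle.

    wait_for: dict mapping a process to the set of processes it waits for
              (an arrow p_i -> p_j means p_j has not yet answered p_i).
    """
    UNVISITED, IN_PROGRESS, DONE = 0, 1, 2
    state = {}

    def visit(p):
        state[p] = IN_PROGRESS
        for q in wait_for.get(p, ()):
            s = state.get(q, UNVISITED)
            if s == IN_PROGRESS:
                return True              # back edge: q is on the current path
            if s == UNVISITED and visit(q):
                return True
        state[p] = DONE
        return False

    return any(state.get(p, UNVISITED) == UNVISITED and visit(p)
               for p in wait_for)


deadlocked = {"p1": {"p2"}, "p2": {"p3"}, "p3": {"p1"}}
chain = {"p1": {"p2"}, "p2": {"p3"}}
```

Here `has_cycle(deadlocked)` is true and `has_cycle(chain)` is false. The check is only as good as the graph it is given: it assumes the wait-for information describes a single moment in time.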

The protocol
  $p_0$ sends a message to each $p_i$.
  On receipt of $p_0$'s message, $p_i$ replies with its state and wait-for info.

An execution
[Figure: diagram of an execution of the protocol.]

Ghost Deadlock!

Houston, we have a problem...
  Asynchronous system ⇒ no centralized clock, etc. etc.
  Synchrony is useful to
    coordinate actions
    order events
  Mmmmhhh...

Events and Histories
  Processes execute sequences of events.
  Events can be of 3 types: local, send, and receive.
  $e_p^i$ is the $i$-th event of process $p$.
  The local history $h_p$ of process $p$ is the sequence of events executed by process $p$.
    $h_p^k$: prefix that contains the first $k$ events
    $h_p^0$: initial, empty sequence
  The history $H$ is the set $h_{p_0} \cup h_{p_1} \cup \ldots \cup h_{p_{n-1}}$.
  NOTE: In $H$, local histories are interpreted as sets, rather than sequences, of events.

Ordering events
  Observation 1: Events in a local history are totally ordered.
  Observation 2: For every message $m$, $\mathit{send}(m)$ precedes $\mathit{receive}(m)$.

Happened-before (Lamport [1978])
A binary relation $\to$ defined over events:
  1. if $e_i^k, e_i^l \in h_i$ and $k < l$, then $e_i^k \to e_i^l$
  2. if $e_i = \mathit{send}(m)$ and $e_j = \mathit{receive}(m)$, then $e_i \to e_j$
  3. if $e \to e'$ and $e' \to e''$, then $e \to e''$
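The three rules of happened-before can be turned into a small computation. The sketch below is illustrative only and rests on assumed representations: an event is a hypothetical pair `(process, k)` meaning the $k$-th event of that process (counting from 1), `history_lengths` gives the number of events per process, and `messages` lists `(send_event, receive_event)` pairs.

```python
from collections import defaultdict


def happened_before(history_lengths, messages):
    """Return the set of pairs (e, f) such that e happened before f."""
    successors = defaultdict(set)

    # Rule 1: consecutive events of one process are ordered
    # (the closure below extends this to every k < l).
    for p, n in history_lengths.items():
        for k in range(1, n):
            successors[(p, k)].add((p, k + 1))

    # Rule 2: send(m) precedes receive(m).
    for send_event, receive_event in messages:
        successors[send_event].add(receive_event)

    # Rule 3: transitivity, computed by a graph search from every event.
    events = [(p, k) for p, n in history_lengths.items()
              for k in range(1, n + 1)]
    relation = set()
    for e in events:
        stack, seen = list(successors[e]), set()
        while stack:
            f = stack.pop()
            if f not in seen:
                seen.add(f)
                relation.add((e, f))
                stack.extend(successors[f])
    return relation


# Two processes with two events each; p1's first event sends a message
# that p2 receives as its second event.
hb = happened_before({"p1": 2, "p2": 2}, [(("p1", 1), ("p2", 2))])
```

In this example `("p1", 2)` and `("p2", 1)` are unrelated in either direction: the relation is a partial order, not a total one.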

Space-Time diagrams
A graphic representation of a distributed execution
[Figure: space-time diagram with one timeline per process.]

$H$ and $\to$ impose a partial order


Runs and Consistent Runs
  A run is a total ordering of the events in $H$ that is consistent with the local histories of the processors.
    Ex: $h_1, h_2, \ldots, h_n$ is a run
  A run is consistent if the total order imposed in the run is an extension of the partial order induced by $\to$.
  A single distributed computation may correspond to several consistent runs!
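The difference between a run and a consistent run can be checked mechanically. This is a sketch under assumed representations: events are hypothetical `(process, k)` pairs with `k` counted from 1, a run is a list of such events, and `before` is a set of `(e, f)` pairs giving the happened-before relation. The function names are made up for illustration, and `is_run` does not verify that every event of $H$ appears.

```python
def is_run(run):
    """True if each process's events appear in local-history order."""
    last_index = {}
    for process, k in run:
        if k != last_index.get(process, 0) + 1:
            return False
        last_index[process] = k
    return True


def is_consistent_run(run, before):
    """True if the run's total order extends the partial order `before`."""
    position = {event: i for i, event in enumerate(run)}
    return is_run(run) and all(position[e] < position[f] for e, f in before)


# Happened-before for two processes where ("p1", 1) sends a message
# that is received as ("p2", 2).
before = {
    (("p1", 1), ("p1", 2)),
    (("p2", 1), ("p2", 2)),
    (("p1", 1), ("p2", 2)),
}

interleaved = [("p1", 1), ("p2", 1), ("p1", 2), ("p2", 2)]
one_after_another = [("p2", 1), ("p2", 2), ("p1", 1), ("p1", 2)]
```

Both lists are runs, but only `interleaved` is consistent: `one_after_another` places the receive event before the matching send. Listing whole local histories one after another always gives a run, yet not necessarily a consistent one.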

Cuts
  A cut $C$ is a subset of the global history $H$:
    $C = h_1^{c_1} \cup h_2^{c_2} \cup \ldots \cup h_n^{c_n}$
  The frontier of $C$ is the set of events $e_1^{c_1}, e_2^{c_2}, \ldots, e_n^{c_n}$.

Global states and cuts
  The global state of a distributed computation is an $n$-tuple of local states $\Sigma = (\sigma_1, \ldots, \sigma_n)$.
  To each cut $(c_1, \ldots, c_n)$ corresponds a global state $(\sigma_1^{c_1}, \ldots, \sigma_n^{c_n})$.
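The correspondence between cuts, frontiers, and global states can be sketched in a few lines. The representation is an assumption: a cut is a tuple $(c_1, \ldots, c_n)$ of prefix lengths, and `local_states[i][k]` is a hypothetical record of the state of the $(i+1)$-th process after its first $k$ events, with index 0 holding the initial state.

```python
def frontier(cut):
    """Last event of each process included in the cut, as (process, index).

    Processes are numbered from 1; a process with c_i = 0 contributes
    no event because its prefix is empty.
    """
    return [(i, c) for i, c in enumerate(cut, start=1) if c > 0]


def global_state(cut, local_states):
    """The n-tuple of local states selected by the cut."""
    return tuple(local_states[i][c] for i, c in enumerate(cut))


# Two processes: the first has executed up to 2 events, the second up to 1.
local_states = [
    ["a0", "a1", "a2"],   # states of process 1 after 0, 1, 2 events
    ["b0", "b1"],         # states of process 2 after 0, 1 events
]
cut = (2, 0)
state = global_state(cut, local_states)
```

For the cut `(2, 0)` the global state pairs the first process's state after two events with the second process's initial state, and the frontier holds only the second event of the first process.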