DISTRIBUTED SYSTEMS Principles and Paradigms Second Edition ANDREW S. TANENBAUM MAARTEN VAN STEEN. Chapter 1. Introduction

DISTRIBUTED SYSTEMS Principles and Paradigms Second Edition ANDREW S. TANENBAUM MAARTEN VAN STEEN Chapter 1 Introduction Modified by: Dr. Ramzi Saifan

Definition of a Distributed System (1) A distributed system is a collection of autonomous computing elements that appears to its users as a single coherent system. Autonomous computing elements, also referred to as nodes, can be hardware devices or software processes. Single coherent system: users or applications perceive a single system, so nodes need to collaborate.

Collection of autonomous nodes Independent behavior: each node is autonomous and will thus have its own notion of time; there is no global clock. This leads to fundamental synchronization and coordination problems. Collection of nodes: How to manage group membership? How to know that you are indeed communicating with an authorized (non)member?

Overlay network Organization: Each node in the collection communicates only with other nodes in the system, its neighbors. The set of neighbors may be dynamic, or may even be known only implicitly (i.e., requires a lookup). Well-known example of overlay networks: peer-to-peer systems. Overlay types: Structured: each node has a well-defined set of neighbors with whom it can communicate (tree, ring). Unstructured: each node has references to randomly selected other nodes from the system.
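The two overlay types can be contrasted in a few lines. This is only a minimal sketch: the function names `ring_overlay` and `random_overlay`, the node labels, and the choice of two random neighbors per node are assumptions made for illustration, not part of the slides.

```python
import random

def ring_overlay(nodes):
    """Structured overlay: each node's neighbors are its ring predecessor and successor."""
    n = len(nodes)
    return {
        node: [nodes[(i - 1) % n], nodes[(i + 1) % n]]
        for i, node in enumerate(nodes)
    }

def random_overlay(nodes, k, seed=0):
    """Unstructured overlay: each node holds references to k randomly selected other nodes."""
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    return {
        node: rng.sample([m for m in nodes if m != node], k)
        for node in nodes
    }

nodes = ["n0", "n1", "n2", "n3", "n4"]   # hypothetical node names
ring = ring_overlay(nodes)
rand = random_overlay(nodes, k=2)

print(ring["n0"])  # ['n4', 'n1']
print(rand["n0"])  # two randomly chosen nodes other than n0
```

In the ring, a node's neighbor set follows from its position alone; in the random overlay, nothing about a node's identity tells you who its neighbors are.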

Coherent system Essence: The collection of nodes as a whole operates the same, no matter where, when, and how interaction between a user and the system takes place. Examples: An end user cannot tell where a computation is taking place. Where data is exactly stored should be irrelevant to an application. Whether or not data has been replicated is completely hidden. The keyword is distribution transparency. The snag: partial failures. It is inevitable that at any time only a part of the distributed system fails. Hiding partial failures and their recovery is often very difficult, and in general impossible.
Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 3e, (c) 2007 Prentice-Hall, Inc. All rights reserved. 0-13-239227-5

Middleware: OS of Distributed Systems The middleware layer extends over multiple machines, and offers each application the same interface. What does it contain? Commonly used components and functions that need not be implemented by applications separately.

What do we want to achieve? 1. Sharing resources 2. Transparency 3. Openness 4. Scalability

Transparency in a Distributed System Figure 1-2. Different forms of transparency in a distributed system (ISO, 1995).

Degree of transparency Aiming at full distribution transparency may be too much: There are communication latencies that cannot be hidden. Completely hiding failures of networks and nodes is (theoretically and practically) impossible: you cannot distinguish a slow computer from a failing one, and you can never be sure that a server actually performed an operation before a crash. Full transparency will cost performance, exposing distribution of the system: keeping replicas exactly up-to-date with the master takes time, and so does immediately flushing write operations to disk for fault tolerance. Conclusion: distribution transparency is a nice goal, but achieving it is a different story, and it should often not even be aimed at.
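The point that a slow computer cannot be told apart from a failing one can be illustrated with a toy model. This is a sketch under stated assumptions: the function `observe`, the timeout value, and the use of `None` to stand for a crashed server are all hypothetical.

```python
def observe(reply_delay, timeout):
    """Return what the client can observe within `timeout` seconds.

    reply_delay is the time the server would need to answer, or None if the
    server has crashed and will never answer (an assumption of this sketch).
    """
    if reply_delay is not None and reply_delay <= timeout:
        return "reply"
    return "timeout"

TIMEOUT = 1.0  # hypothetical client timeout in seconds

slow = observe(reply_delay=5.0, timeout=TIMEOUT)      # server is alive but slow
crashed = observe(reply_delay=None, timeout=TIMEOUT)  # server has crashed

print(slow, crashed)  # timeout timeout
```

Both cases give the client the same observation, so any decision it makes after the timeout (retry, fail over, report an error) may be wrong for one of them.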

Openness of distributed systems Open distributed system: Be able to interact with services from other open systems, irrespective of the underlying environment: systems should conform to well-defined interfaces; systems should support portability of applications; systems should easily interoperate; systems should be extensible. Achieving openness: At least make the distributed system independent from heterogeneity of the underlying environment: hardware, platforms, languages.

Degrees of scalability 1. Size 2. Geographic 3. Administrative

Scalability Problems Figure 1-3. Examples of scalability limitations.

Characteristics of decentralized algorithms: No machine has complete information about the system state. Machines make decisions based only on local information. Failure of one machine does not ruin the algorithm. There is no implicit assumption that a global clock exists.

Problems with geographical scalability Cannot simply go from LAN to WAN: many distributed systems assume synchronous client-server interactions, in which the client sends a request and waits for an answer. Latency may easily prohibit this scheme. WAN links are often inherently unreliable: simply moving streaming video from LAN to WAN is bound to fail. Multipoint communication is lacking, so a simple search broadcast cannot be deployed. The solution is to develop separate naming and directory services (which have their own scalability problems).

Problems with administrative scalability Conflicting policies concerning usage (and thus payment), management, and security. We do not trust others. When multiple domains are involved, security becomes an issue: how do we trust the other system, how does it manage my data, and who gets access to it? One example is multiple cloud organizations interacting to offer multiple services to a client. Another example is sensor networks that collect data and send it to a central point.

Scaling techniques Hide communication latencies: make use of asynchronous communication (interrupt); have a separate handler for incoming responses (multithreading). Problem: not every application fits this model (interactive). Partition data and computations across multiple machines: move computations to clients (Java applets); decentralized naming services (DNS); decentralized information systems (WWW). Replication and caching: make copies of data available at different machines: replicated file servers and databases; Web caches (in browsers and proxies); file caching (at server and client).
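Hiding communication latency with asynchronous communication can be sketched with Python's `asyncio`. The coroutine `fetch`, the request names, and the 0.2 s delay are assumptions standing in for real remote calls; `asyncio.sleep` simulates the network latency.

```python
import asyncio
import time

async def fetch(name, latency):
    """Stand-in for a remote request; the sleep simulates network latency."""
    await asyncio.sleep(latency)
    return f"{name}: done"

async def main():
    start = time.perf_counter()
    # Issue all three requests at once instead of waiting for each in turn.
    results = await asyncio.gather(
        fetch("a", 0.2), fetch("b", 0.2), fetch("c", 0.2)
    )
    elapsed = time.perf_counter() - start
    return results, elapsed

results, elapsed = asyncio.run(main())
print(results)
print(f"elapsed: {elapsed:.2f}s")  # close to 0.2s rather than 0.6s
```

The three waits overlap, so the total time is roughly that of one request. This only helps when the caller has other useful work to do or several requests to issue, which is the limitation the slide notes for interactive applications.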

Scaling Techniques (1) Figure 1-4. The difference between letting (a) a server or (b) a client check forms as they are being filled.

Scaling Techniques (2) Figure 1-5. An example of dividing the DNS name space into zones.

Scaling Techniques (3) Replication: replicated file servers and databases, mirrored Web sites. Benefits: performance enhancement, fault tolerance, load balancing, geographic scalability. Caching is an example of replication. Problems of replication: consistency and the cost tradeoff. Strong consistency needs global synchronization, and global synchronization may hinder replication in large systems.
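The consistency problem of replication can be made concrete with a toy primary and a single lagging replica. This is a minimal sketch: the classes `Primary` and `Replica` and the explicit `propagate` step are assumptions made for illustration, not a real replication protocol.

```python
class Primary:
    def __init__(self):
        self.data = {}
        self.log = []  # writes not yet propagated to the replica

    def write(self, key, value):
        self.data[key] = value
        self.log.append((key, value))

class Replica:
    def __init__(self):
        self.data = {}

    def read(self, key):
        return self.data.get(key)

def propagate(primary, replica):
    """Apply the primary's pending writes to the replica, in order."""
    for key, value in primary.log:
        replica.data[key] = value
    primary.log.clear()

primary, replica = Primary(), Replica()
primary.write("x", 1)
propagate(primary, replica)

primary.write("x", 2)
stale = replica.read("x")   # update not propagated yet: replica still has 1
propagate(primary, replica)
fresh = replica.read("x")   # after propagation: 2

print(stale, fresh)  # 1 2
```

Strong consistency would require every write to wait until `propagate` has completed on all replicas, which is the global synchronization cost mentioned above.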

Pitfalls when Developing Distributed Systems False assumptions made by first-time developers: The network is reliable. The network is secure. The network is homogeneous. The topology does not change. Latency is zero. Bandwidth is infinite. Transport cost is zero. There is one administrator.

Types of Distributed Systems 1. High performance distributed computing systems a) Cluster systems b) Grid systems c) Cloud Computing 2. Distributed information systems a) Transactions processing b) Integrated enterprise applications 3. Distributed pervasive systems a) Electronic health care systems b) Sensor networks

Parallel computing High-performance distributed computing started with parallel computing. Multiprocessor: shared memory. Multicomputer. Distributed shared memory.

Cluster Computing Systems Figure 1-6. An example of a cluster computing system.

Grid Computing Systems Figure 1-7. A layered architecture for grid computing systems.

Cloud Computing

Is cloud computing cost-effective? Outsource the entire infrastructure: hardware and software. More than just providing high-performance computing. Is outsourcing also cheaper?

Is cloud computing cost-effective? There is an objective function that must account for both costs and benefits, subject to a set of constraints.

Transaction Processing Systems (1) Figure 1-8. Example primitives for transactions.

Transaction Processing Systems (2) Characteristic properties of transactions (ACID): Atomic: To the outside world, the transaction happens indivisibly. Consistent: The transaction does not violate system invariants. Isolated: Concurrent transactions do not interfere with each other. Durable: Once a transaction commits, the changes are permanent.
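Atomicity and consistency can be observed with Python's built-in `sqlite3` module. This is a sketch, not the book's example: the `account` table, the names, the balances, and the `transfer` helper are assumptions, and the `CHECK` constraint plays the role of a system invariant.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"  # invariant: no negative balances
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one transaction; return True on commit."""
    try:
        with conn:  # commits if the block succeeds, rolls back on an exception
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False

ok = transfer(conn, "alice", "bob", 30)       # commits
failed = transfer(conn, "alice", "bob", 500)  # would overdraw alice: rolled back
balances = dict(conn.execute("SELECT name, balance FROM account"))

print(ok, failed, balances)  # True False {'alice': 70, 'bob': 80}
```

In the failed transfer the credit to bob has already been executed when the debit violates the invariant, yet the rollback undoes it: to the outside world, either both updates happen or neither does.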

Transaction Processing Systems (3) Figure 1-9. A nested transaction.

Transaction Processing Systems (4) Figure 1-10. The role of a TP monitor in distributed systems.

Enterprise Application Integration Middleware offers communication facilities for integration. Remote Procedure Call (RPC): A request is sent through a local procedure call, packaged as a message, processed, and answered with a message; the result is returned as the return value of the call. Message Oriented Middleware (MOM): Messages are sent to a logical contact point (published), and forwarded to subscribed applications.
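The publish/subscribe idea behind message-oriented middleware can be sketched in-process. The `Broker` class, the topic name `orders`, and the two subscriber lists are hypothetical; a real MOM product would add persistence, network transport, and delivery guarantees.

```python
from collections import defaultdict

class Broker:
    """Toy logical contact point: forwards each published message to subscribers."""

    def __init__(self):
        self.subscribers = defaultdict(list)  # topic -> list of callbacks

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, message):
        for callback in self.subscribers[topic]:
            callback(message)

broker = Broker()
billing, shipping = [], []                  # stand-ins for two applications
broker.subscribe("orders", billing.append)
broker.subscribe("orders", shipping.append)

broker.publish("orders", {"id": 1, "item": "book"})

print(billing)   # [{'id': 1, 'item': 'book'}]
print(shipping)  # [{'id': 1, 'item': 'book'}]
```

The publisher names only the topic, never the receiving applications, so subscribers can be added or removed without changing the sender.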

How to integrate applications File transfer: Technically simple, but not flexible: figure out file format and layout; figure out file management; handle update propagation and update notifications. Shared database: Much more flexible, but still requires a common data schema and risks becoming a bottleneck. Remote procedure call: Effective when execution of a series of actions is needed. Messaging: RPCs require caller and callee to be up and running at the same time. Messaging allows decoupling in time and space.

Distributed Pervasive Systems Requirements for pervasive systems: Embrace contextual changes: changes in the network are always expected. Encourage ad hoc composition. Recognize sharing as the default. Three (overlapping) subtypes: Ubiquitous computing systems: pervasive and continuously present, i.e., there is a continuous interaction between system and user. Mobile computing systems: pervasive, but emphasis is on the fact that devices are inherently mobile. Sensor (and actuator) networks: pervasive, with emphasis on the actual (collaborative) sensing and actuation of the environment.

Sensor Networks (1) Questions concerning sensor networks: How do we (dynamically) set up an efficient tree in a sensor network? How does aggregation of results take place? Can it be controlled? What happens when network links fail?
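One possible answer to the aggregation question is in-network aggregation along a tree: each node combines its own reading with the results of its children and passes a single value upward. The sketch below assumes a fixed tree, made-up temperature readings, and `max` as the aggregate; none of these come from the slides.

```python
def aggregate_max(tree, readings, node):
    """Return the maximum reading in the subtree rooted at `node`."""
    best = readings[node]
    for child in tree.get(node, []):
        # Each child reports one aggregated value, not all of its raw data.
        best = max(best, aggregate_max(tree, readings, child))
    return best

# Hypothetical routing tree: parent -> children.
tree = {"root": ["a", "b"], "a": ["c", "d"]}
readings = {"root": 18.0, "a": 21.5, "b": 19.0, "c": 23.0, "d": 20.5}

print(aggregate_max(tree, readings, "root"))  # 23.0
```

Only one value travels over each link, which matters on an energy budget. The sketch does not address the harder questions on the slide, such as building the tree dynamically or coping with failed links.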

Sensor Networks (2) Figure 1-13. Organizing a sensor network database, while storing and processing data (a) only at the operator's site or

Sensor Networks (3) Figure 1-13. Organizing a sensor network database, while storing and processing data or (b) only at the sensors.

Duty-cycled networks Many sensor networks need to operate on a strict energy budget: introduce duty cycles. A node is active during $T_{\text{active}}$ time units, and then suspended for $T_{\text{suspended}}$ units, to become active again. Duty cycle $= T_{\text{active}} / (T_{\text{active}} + T_{\text{suspended}})$. Typical duty cycles are 10-30%, but can also be lower than 1%.
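As a quick worked example of the duty-cycle formula, the following sketch computes it for two made-up schedules; the function name and the chosen time values are assumptions for illustration.

```python
def duty_cycle(t_active, t_suspended):
    """Fraction of time a node is active: T_active / (T_active + T_suspended)."""
    return t_active / (t_active + t_suspended)

# Active for 1 time unit, suspended for 9: a 10% duty cycle.
print(duty_cycle(1, 9))    # 0.1

# Active for 1 time unit, suspended for 199: a 0.5% duty cycle.
print(duty_cycle(1, 199))  # 0.005
```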

Keeping duty-cycled networks in sync If duty cycles are low, sensor nodes may not wake up at the same time anymore and become permanently disconnected: they are active during different, non-overlapping time slots. Solution: Each node A adopts a cluster ID $C_A$, being a number. Let a node send a join message during its suspended period. When A receives a join message from B and $C_A < C_B$, it sends a join message to its neighbors (in cluster $C_A$) before joining B. When $C_A > C_B$, it sends a join message to B during B's active period. Note: Once a join message reaches a whole cluster, merging two clusters is very fast. Merging means: re-adjust clocks.
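The disconnection problem itself can be checked with a little arithmetic. The sketch below assumes that two nodes share the same period and the same active time and differ only in when their active window starts; the function `windows_overlap` and the numbers are hypothetical, and the join protocol is not modeled.

```python
def windows_overlap(offset_a, offset_b, t_active, period):
    """True if two periodic active windows of length t_active ever overlap.

    Both nodes repeat with the same `period`; a node's window starts at its
    offset within the period. Assumes 0 < t_active <= period.
    """
    d = (offset_b - offset_a) % period  # phase difference within one period
    return min(d, period - d) < t_active

PERIOD = 100
T_ACTIVE = 2  # a 2% duty cycle

print(windows_overlap(0, 1, T_ACTIVE, PERIOD))   # True: windows [0,2) and [1,3)
print(windows_overlap(0, 50, T_ACTIVE, PERIOD))  # False: never awake together
```

With a low duty cycle, most phase differences leave the two nodes without any common active time, which is why clocks need to be re-adjusted when clusters merge.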