Distributed Systems. Chapter 1: Introduction

Similar documents
Distributed Systems. Edited by. Ghada Ahmed, PhD. Fall (3rd Edition) Maarten van Steen and Tanenbaum

DISTRIBUTED SYSTEMS Principles and Paradigms Second Edition ANDREW S. TANENBAUM MAARTEN VAN STEEN. Chapter 1. Introduction

Distributed Systems Principles and Paradigms. Chapter 01: Introduction. Contents. Distributed System: Definition.

Distributed Systems Principles and Paradigms. Chapter 01: Introduction

CA464 Distributed Programming

Distributed Systems Principles and Paradigms

Distributed Systems. Overview. Distributed Systems September A distributed system is a piece of software that ensures that:

Chapter 1: Introduction 1/29

Introduction. Distributed Systems IT332

DISTRIBUTED SYSTEMS Principles and Paradigms Second Edition ANDREW S. TANENBAUM MAARTEN VAN STEEN. Chapter 1. Introduction

Advanced Distributed Systems

CSE 5306 Distributed Systems. Course Introduction

Outline. Definition of a Distributed System Goals of a Distributed System Types of Distributed Systems

Introduction Distributed Systems

Introduction to Distributed Systems. INF5040/9040 Autumn 2018 Lecturer: Eli Gjørven (ifi/uio)

Distributed Information Processing

Chapter 1: Distributed Systems: What is a distributed system? Fall 2013

Client Server & Distributed System. A Basic Introduction

Concepts of Distributed Systems 2006/2007

Distributed Systems. Lecture 4 Othon Michail COMP 212 1/27

Distributed Operating Systems Fall Prashant Shenoy UMass Computer Science. CS677: Distributed OS

Introduction to Distributed Systems (DS)

Announcements. me your survey: See the Announcements page. Today. Reading. Take a break around 10:15am. Ack: Some figures are from Coulouris

TDP3471 Distributed and Parallel Computing

Distributed Operating Systems Spring Prashant Shenoy UMass Computer Science.

02 - Distributed Systems

Introduction to Distributed Systems

Introduction to Distributed Systems

Introduction to Distributed Systems

Distributed and Operating Systems Spring Prashant Shenoy UMass Computer Science.

Introduction to Distributed Systems (DS)

02 - Distributed Systems

Chapter 1 Introduction

Distributed Systems. Prof. Dr. Schahram Dustdar Distributed Systems Group Vienna University of Technology. dsg.tuwien.ac.

Distributed Systems LEEC (2006/07 2º Sem.)

Distributed Systems. Characteristics of Distributed Systems. Lecture Notes 1 Basic Concepts. Operating Systems. Anand Tripathi

Distributed Systems. Characteristics of Distributed Systems. Characteristics of Distributed Systems. Goals in Distributed System Designs

Distributed System: Definition

Chapter 18: Database System Architectures.! Centralized Systems! Client--Server Systems! Parallel Systems! Distributed Systems!

Distributed Systems. Prof. Dr. Schahram Dustdar Distributed Systems Group Vienna University of Technology. dsg.tuwien.ac.

Chapter 20: Database System Architectures

Lecture 23 Database System Architectures

Designing Issues For Distributed Computing System: An Empirical View

Lecture 1: January 22

Chapter 17: Distributed Systems (DS)

Consistency & Replication

Distributed Systems Question Bank UNIT 1 Chapter 1 1. Define distributed systems. What are the significant issues of the distributed systems?

Outline. Distributed Computing Systems. The Rise of Distributed Systems. Depiction of a Distributed System 4/15/2014

CS4513 Distributed Computer Systems

Lecture 1: January 23

Assignment 5. Georgia Koloniari

Introduction. Chapter 1

What is a distributed system?

CS /15/16. Paul Krzyzanowski 1. Question 1. Distributed Systems 2016 Exam 2 Review. Question 3. Question 2. Question 5.

Distributed Computing. Santa Clara University 2016

Synchronization. Chapter 5

Telecommunication Services Engineering Lab. Roch H. Glitho

Datacenter replication solution with quasardb

DISTRIBUTED SYSTEMS. Second Edition. Andrew S. Tanenbaum Maarten Van Steen. Vrije Universiteit Amsterdam, 7'he Netherlands PEARSON.

System Models for Distributed Systems

Synchronization. Clock Synchronization

Support for resource sharing Openness Concurrency Scalability Fault tolerance (reliability) Transparence. CS550: Advanced Operating Systems 2

System Models. 2.1 Introduction 2.2 Architectural Models 2.3 Fundamental Models. Nicola Dragoni Embedded Systems Engineering DTU Informatics

Distributed Algorithms Models

RELIABILITY & AVAILABILITY IN THE CLOUD

Chapter 18 Distributed Systems and Web Services

Multiprocessors 2007/2008

Distributed Systems COMP 212. Revision 2 Othon Michail

Distributed OS and Algorithms

Background. 20: Distributed File Systems. DFS Structure. Naming and Transparency. Naming Structures. Naming Schemes Three Main Approaches

Chapter Outline. Chapter 2 Distributed Information Systems Architecture. Distributed transactions (quick refresh) Layers of an information system

2.1 What are distributed systems? What are systems? Different kind of systems How to distribute systems? 2.2 Communication concepts

Vortex Whitepaper. Simplifying Real-time Information Integration in Industrial Internet of Things (IIoT) Control Systems

Chapter 2 System Models

Chapter 1: Distributed Information Systems

Distributed Systems. Thoai Nam Faculty of Computer Science and Engineering HCMC University of Technology

System models for distributed systems

Multimedia Database Architecture!

Distributed Systems Principles and Paradigms

Introduction to Peer-to-Peer Systems

Introduction CHAPTER. Practice Exercises. 1.1 What are the three main purposes of an operating system? Answer: The three main puropses are:

Chapter 17: Distributed-File Systems. Operating System Concepts 8 th Edition,

Module 17: Distributed-File Systems

Introduction to Distributed Systems

CSE 5306 Distributed Systems. Consistency and Replication

06-Dec-17. Credits:4. Notes by Pritee Parwekar,ANITS 06-Dec-17 1

Chapter 4: Distributed Systems: Replication and Consistency. Fall 2013 Jussi Kangasharju

CSE 5306 Distributed Systems

Crossbar switch. Chapter 2: Concepts and Architectures. Traditional Computer Architecture. Computer System Architectures. Flynn Architectures (2)

Verteilte Systeme (Distributed Systems)

DISTRIBUTED COMPUTER SYSTEMS ARCHITECTURES

Introduction to Distributed Systems

Chapter 18: Parallel Databases

Replication and Consistency. Fall 2010 Jussi Kangasharju

Chapter 3: Naming Page 38. Clients in most cases find the Jini lookup services in their scope by IP

Research Faculty Summit Systems Fueling future disruptions

Distributed Computing Environment (DCE)

Parallel and Distributed Systems. Programming Models. Why Parallel or Distributed Computing? What is a parallel computer?

CS655: Advanced Topics in Distributed Systems [Fall 2013] Dept. Of Computer Science, Colorado State University

Gustavo Alonso, ETH Zürich. Web services: Concepts, Architectures and Applications - Chapter 1 2

Transcription:

Distributed Systems (3rd Edition) Chapter 1: Introduction Version: February 25, 2017

2/56 Introduction: What is a distributed system? Distributed System Definition A distributed system is a collection of autonomous computing elements that appears to its users as a single coherent system. Characteristic features 1. Autonomous computing elements, also referred to as nodes, be they hardware devices or software processes. 2. Single coherent system: users or applications perceive a single system nodes need to collaborate.

3/56 Introduction: What is a distributed system? Distributed System Characteristic 1: Collection of autonomous computing elements Title Each node is autonomous and will thus have its own notion of time: there is no global clock. Leads to fundamental synchronization and coordination problems. Collection of nodes 1. How to manage group membership? 2. How to know that you are indeed communicating with an authorized (non)member?

4/56 Introduction: What is a distributed system? Organization Characteristic 1: Collection of autonomous computing elements Overlay network Each node in the collection communicates only with other nodes in the system, its neighbors. The set of neighbors may be dynamic, or may even be known only implicitly (i.e., requires a look up). Overlay network Well-known example of overlay networks: peer-to-peer systems. Structured: each node has a well-defined set of neighbors with whom it can communicate (tree, ring). Unstructured: each node has references to randomly selected other nodes from the system.

5/56 Introduction: What is a distributed system? Coherent system Characteristic 2: Single coherent system Essence The collection of nodes as a whole operates the same, no matter where, when, and how interaction between a user and the system takes place. Examples 1. An end user cannot tell where a computation is taking place 2. Where data is exactly stored should be irrelevant to an application 3. If or not data has been replicated is completely hidden Keyword is distribution transparency The snag: partial failures It is inevitable that at any time only a part of the distributed system fails. Hiding partial failures and their recovery is often very difficult and in general impossible to hide.

6/56 Introduction: What is a distributed system? Middleware: the OS of distributed systems Middleware and distributed systems What does it contain? Commonly used components and functions that need not be implemented by applications separately.

7/56 Introduction: Design goals What do we want to achieve? 1. Support sharing of resources 2. Distribution transparency 3. Openness 4. Scalability

8/56 Introduction: Design goals Collection of autonomous nodes Supporting resource sharing Canonical examples 1. Cloud-based shared storage and files 2. Peer-to-peer assisted multimedia streaming 3. Shared mail services (think of outsourced mail systems) 4. Shared Web hosting (think of content distribution networks) Observation The network is the computer (quote from John Gage, then at Sun Microsystems)

Types of distribution transparency 9/56 Introduction: Design goals Distribution transparency Making distribution transparent Types Transparency Access Location Relocation Migration Replication Concurrency Failure Description Hide differences in data representation and how an object is accessed Hide where an object is located Hide that an object may be moved to another location while in use Hide that an object may move to another location Hide that an object is replicated Hide that an object may be shared by several independent users Hide the failure and recovery of an object

Degree of distribution transparency 10/56 Introduction: Design goals Degree of transparency Making distribution transparent Observation Aiming at full distribution transparency may be too much: 1. There are communication latencies that cannot be hidden\ 2. Completely hiding failures of networks and nodes is (theoretically and practically)impossible a) You cannot distinguish a slow computer from a failing one b) You can never be sure that a server actually performed an operation before a crash 3. Full transparency will cost performance, exposing distribution of the system a) Keeping replicas exactly up-to-date with the master takes time b) Immediately flushing write operations to disk for fault tolerance

Interoperability, composability, and extensibility 12/56 Introduction: Design goals Degree of transparency Making distribution transparent Exposing distribution may be good 1. Making use of location-based services (finding your nearby friends) 2. When dealing with users in different time zones 3. When it makes it easier for a user to understand what s going on (when e.g., a server does not respond for a long time, report it as failing). Conclusion Distribution transparency is a nice a goal, but achieving it is a different story, and it should often not even be aimed at.

Interoperability, composability, and extensibility 12/56 Introduction: Design goals Openness of distributed systems Being open What are we talking about? Be able to interact with services from other open systems, irrespective of the underlying environment: 1. Systems should conform to well-defined interfaces 2. Systems should easily interoperate 3. Systems should support portability of applications 4. Systems should be easily extensible

Separating policy from mechanism 13/56 Introduction: Design goals Policies versus mechanisms Being open Observation 1. What level of consistency do we require for client-cached data? 2. Which operations do we allow downloaded code to perform? 3. Which QoS requirements do we adjust in the face of varying bandwidth? 4. What level of secrecy do we require for communication? Implementing openness: mechanisms 1. Allow (dynamic) setting of caching policies 2. Support different levels of trust for mobile code 3. Provide adjustable QoS parameters per data stream 4. Offer different encryption algorithms

Separating policy from mechanism 14/56 Introduction: Design goals On strict separation Being open Observation The stricter the separation between policy and mechanism, the more we need to make ensure proper mechanisms, potentially leading to many configuration parameters and complex management. Finding a balance Hard coding policies often simplifies management and reduces complexity at the price of less flexibility. There is no obvious solution.

Scalability dimensions 15/56 Introduction: Design goals Scale in distributed systems Being scalable Observation Many developers of modern distributed systems easily use the adjective scalable without making clear why their system actually scales. At least three components 1. Number of users and/or processes (size scalability) 2. Maximum distance between nodes (geographical scalability) 3. Number of administrative domains (administrative scalability) Observation Most systems account only, to a certain extent, for size scalability. Often a solution: multiple powerful servers operating independently in parallel. Today, the challenge still lies in geographical and administrative scalability.

Scalability dimensions 16/56 Introduction: Design goals Size scalability Being scalable Root causes for scalability problems with centralized solutions 1. The computational capacity, limited by the CPUs 2. The storage capacity, including the transfer rate between CPUs and disks 3. The network between the user and the centralized service

Scalability dimensions 20/56 Introduction: Design goals Problems with geographical scalability Being scalable 1. Cannot simply go from LAN to WAN: many distributed systems assume synchronous client-server interactions: client sends request and waits for an answer. Latency may easily prohibit this scheme. 2. WAN links are often inherently unreliable: simply moving streaming video from LAN to WAN is bound to fail. 3. Lack of multipoint communication, so that a simple search broadcast cannot be deployed. Solution is to develop separate naming and directory services(having their own scalability problems).

Scalability dimensions 21/56 Introduction: Design goals Scale in distributed systems Being scalable Essence Conflicting policies concerning usage (and thus payment), management, and security Examples 1. Computational grids: share expensive resources between different domains. 2. Shared equipment: how to control, manage, and use a shared radio telescope constructed as large-scale shared sensor network? Exception: several peer-to-peer networks 1. File-sharing systems (based, e.g., on BitTorrent) 2. Peer-to-peer telephony (Skype) 3. Peer-assisted audio streaming (Spotify) Note: end users collaborate and not administrative entities.

Scaling techniques 22/56 Introduction: Design goals Techniques for scaling Being scalable Hide communication latencies 1. Make use of asynchronous communication 2. Have separate handler for incoming response 3. Problem: not every application fits this model

Scaling techniques 23/56 Introduction: Design goals Techniques for scaling Being scalable Facilitate solution by moving computations to client

Scaling techniques 24/56 Introduction: Design goals Techniques for scaling Being scalable Partition data and computations across multiple machines 1. Move computations to clients (Java applets) 2. Decentralized naming services (DNS) 3. Decentralized information systems (WWW)

Scaling techniques 25/56 Introduction: Design goals Techniques for scaling Being scalable Replication and caching: Make copies of data available at different machines 1. Replicated file servers and databases 2. Mirrored Web sites 3. Web caches (in browsers and proxies) 4. File caching (at server and client)

Scaling techniques 26/56 Introduction: Design goals Scaling: The problem with replication Being scalable Applying replication is easy, except for one thing 1. Having multiple copies (cached or replicated), leads to in consistencies: modifying one copy makes that copy different from the rest. 2. Always keeping copies consistent and in a general way requires global synchronization on each modification. 3. Global synchronization precludes large-scale solutions.

Scaling techniques 26/56 Introduction: Design goals Scaling: The problem with replication Being scalable Applying replication is easy, except for one thing 1. Having multiple copies (cached or replicated), leads to in consistencies: modifying one copy makes that copy different from the rest. 2. Always keeping copies consistent and in a general way requires global synchronization on each modification. 3. Global synchronization precludes large-scale solutions. Observation If we can tolerate inconsistencies, we may reduce the need for global synchronization, but tolerating inconsistencies is application dependent.

27/56 Introduction: Design goals Developing distributed systems: Pitfalls Pitfalls Observation Many distributed systems are needlessly complex caused by mistakes that required patching later on. Many false assumptions are often made. False (and often hidden) assumptions 1. The network is reliable 2. The network is secure 3. The network is homogeneous 4. The topology does not change 5. Latency is zero 6. Bandwidth is infinite 7. Transport cost is zero 8. There is one administrator

28/56 Introduction: Types of distributed systems Three types of distributed systems 1. High performance distributed computing systems 2. Distributed information systems 3. Distributed systems for pervasive computing

29/56 Introduction: Types of distributed systems Parallel computing High performance distributed computing Observation High-performance distributed computing started with parallel computing Multiprocessor and multicore versus multicomputer

30/56 Introduction: Types of distributed systems Distributed shared memory systems High performance distributed computing Observation Multi processors are relatively easy to program in comparison to multi computers, yet have problems when increasing the number of processors (or cores).solution: Try to implement a shared-memory model on top of a multi computer. Example through virtual-memory techniques Map all main-memory pages (from different processors) into one single virtual address space. If process at processor A addresses a page P located at processor B, the OS at A traps and fetches P from B, just as it would if P had been located on local disk. Problem Performance of distributed shared memory could never compete with that of multiprocessors, and failed to meet the expectations of programmers. It has been widely abandoned by now.

Cluster computing 31/56 Introduction: Types of distributed systems Cluster computing High performance distributed computing Essentially a group of high-end systems connected through a LAN 1. Homogeneous: same OS, near-identical hardware 2. Single managing node

Grid computing 32/56 Introduction: Types of distributed systems Grid computing High performance distributed computing The next step: lots of nodes from everywhere 1. Heterogeneous 2. Dispersed across several organizations 3. Can easily span a wide-area network Note To allow for collaborations, grids generally use virtual organizations. In essence, this is a grouping of users (or better: their IDs) that will allow for authorization on resource allocation.

Grid computing 33/56 Introduction: Types of distributed systems Architecture for grid computing High performance distributed computing The layers Fabric: Provides interfaces to local resources (for querying state and capabilities, locking, etc.) Connectivity: Communication/transaction protocols, e.g., for moving data between resources. Also various authentication protocols. Resource: Manages a single resource, such as creating processes or reading data. Collective: Handles access to multiple resources: discovery, scheduling, replication. Application: Contains actual grid applications in a single organization.

Cloud computing 34/56 Introduction: Types of distributed systems Cloud computing High performance distributed computing

Cloud computing 35/56 Introduction: Types of distributed systems Cloud computing High performance distributed computing Make a distinction between four layers 1. Hardware: Processors, routers, power and cooling systems. Customers normally never get to see these. 2. Infrastructure: Deploys virtualization techniques. Evolves around allocating and managing virtual storage devices and virtual servers. 3. Platform: Provides higher-level abstractions for storage and such. Example: Amazon S3 storage system offers an API for (locally created) files to be organized and stored in so-calledbuckets. 4. Application: Actual applications, such as office suites (text processors, spreadsheet applications, presentation applications). Comparable to the suite of apps shipped with OSes.

42/56 Introduction: Types of distributed systems Integrating applications Distributed information systems Situation Organizations confronted with many networked applications, but achieving interoperability was painful. Basic approach A networked application is one that runs on a server making its services available to remote clients. Simple integration: clients combine requests for (different) applications; send that off; collect responses, and present a coherent result to the user. Next step Allow direct application-to-application communication, leading to Enterprise Application Integration.

Introduction: Types of distributed systems Example EAI: (nested) transactions Distributed information systems Transaction Primitive BEGIN TRANSACTION END TRANSACTION ABORT TRANSACTION READ WRITE Description Mark the start of a transaction Terminate the transaction and try to commit Kill the transaction and restore the old values Read data from a file, a table, or otherwise Write data to a file, a table, or otherwise Issue: all-or-nothing 1. Atomic: happens indivisibly (seemingly) 2. Consistent: does not violate system invariants 3. Isolated: not mutual interference 4. Durable: commit means changes are permanent Distributed transaction processing 43/56

Introduction: Types of distributed systems TPM: Transaction Processing Monitor Distributed information systems Observation In many cases, the data involved in a transaction is distributed across several servers. A TP Monitoris responsible for coordinating the execution of a transaction. Distributed transaction processing 44/56

Introduction: Types of distributed systems Middleware and EAI Distributed information systems Middleware offers communication facilities for integration Remote Procedure Call (RPC):Requests are sent through local procedure call, packaged as message, processed, responded through message, and result returned as return from call. Message Oriented Middleware (MOM):Messages are sent to logical contact point (published), and forwarded to subscribed applications. Enterprise application integration 45/56

Introduction: Types of distributed systems Middleware and EAI Distributed information systems File transfer: Technically simple, but not flexible: 1. Figure out file format and layout 2. Figure out file management 3. Update propagation, and update notifications. Shared database: Much more flexible, but still requires common data scheme next to risk of bottleneck Remote procedure call: Effective when execution of a series of actions is needed. Messaging: RPCs require caller and callee to be up and running at the same time. Messaging allows decoupling in time and space. Enterprise application integration 46/56

47/56 Introduction: Types of distributed systems Distributed pervasive systems Pervasive systems Middleware offers communication facilities for integration Emerging next-generation of distributed systems in which nodes are small, mobile, and often embedded in a larger system, characterized by the fact that the system naturally blends into the user s environment. Three (overlapping) subtypes 1. Ubiquitous computing systems: pervasive and continuously present, i.e., there is a continuous interaction between system and user. 2. Mobile computing systems: pervasive, but emphasis is on the fact that devices are inherently mobile. 3. Sensor (and actuator) networks: pervasive, with emphasis on the actual (collaborative) sensing and actuation of the environment.

Ubiquitous computing systems 48/56 Introduction: Types of distributed systems Ubiquitous systems Pervasive systems Middleware offers communication facilities for integration 1. (Distribution) Devices are networked, distributed, and accessible in a transparent manner 2. (Interaction) Interaction between users and devices is highly unobtrusive 3. (Context awareness) The system is aware of a user s context in order to optimize interaction 4. (Autonomy) Devices operate autonomously without human intervention, and are thus highly self-managed 5. (Intelligence) The system as a whole can handle a wide range of dynamic actions and interactions

Introduction: Types of distributed systems Mobile computing Pervasive systems Distinctive features 1. A myriad of different mobile devices (smartphones, tablets, GPS devices, remote controls, active badges. 2. Mobile implies that a device s location is expected to change over time change of local services, reachability, etc. Keyword: discovery. 3. Communication may become more difficult: no stable route, but also perhaps no guaranteed connectivity disruption-tolerant networking. Mobile computing systems 49/56

Introduction: Types of distributed systems Mobility patterns Pervasive systems Issue What is the relationship between information dissemination and human mobility? Basic idea: an encounter allows for the exchange of information A successful strategy 1. Alice s world consists of friends and strangers. 2. If Alice wants to get a message to Bob: hand it out to all her friends 3. Friend passes message to Bob at first encounter Issue This strategy works because (apparently) there are relatively closed communities of friends. Mobile computing systems 50/56

Introduction: Types of distributed systems How mobile are people? Pervasive systems Experimental results Tracing 100,000 cell-phone users during six months leads to: Moreover: people tend to return to the same place after 24, 48, or 72 hours we re not that mobile. Mobile computing systems 52/56

Introduction: Types of distributed systems Sensor networks Pervasive systems Characteristics The nodes to which sensors are attached are: 1. Many (10s-1000s) 2. Simple (small memory/compute/communication capacity) 3. Often battery-powered (or even battery-less) Sensor networks 53/56

Sensor networks 54/56 Introduction: Types of distributed systems Sensor networks as distributed databases Pervasive systems Two extremes