Dr. Jack Lange Computer Science Department University of Pittsburgh. Fall Distributed System Definition. A distributed system is

Similar documents
DISTRIBUTED COMPUTER SYSTEMS

DISTRIBUTED COMPUTER SYSTEMS ARCHITECTURES

DISTRIBUTED COMPUTER SYSTEMS

DISTRIBUTED COMPUTER SYSTEMS

Communication. Distributed Systems Santa Clara University 2016

Interprocess Communication Tanenbaum, van Steen: Ch2 (Ch3) CoDoKi: Ch2, Ch3, Ch5

Last Class: RPCs and RMI. Today: Communication Issues

Distributed Information Processing

Chapter 4 Communication

殷亚凤. Processes. Distributed Systems [3]

Chapter 4 Communication

Communication. Overview

DISTRIBUTED SYSTEMS Principles and Paradigms Second Edition ANDREW S. TANENBAUM MAARTEN VAN STEEN. Chapter 1. Introduction

Verteilte Systeme (Distributed Systems)

Overview. Communication types and role of Middleware Remote Procedure Call (RPC) Message Oriented Communication Multicasting 2/36

CHAPTER - 4 REMOTE COMMUNICATION

Distributed Systems Principles and Paradigms

Threads Chapter 5 1 Chapter 5

Chapter 4 Communication

CS454/654 Midterm Exam Fall 2004

Advanced Distributed Systems

Motivation. Threads. Multithreaded Server Architecture. Thread of execution. Chapter 4

Parallelism. Master 1 International. Andrea G. B. Tettamanzi. Université de Nice Sophia Antipolis Département Informatique

Communication in Distributed Systems

Distributed Systems Inter-Process Communication (IPC) in distributed systems

Verteilte Systeme (Distributed Systems)

Operating System. Chapter 4. Threads. Lynn Choi School of Electrical Engineering

02 - Distributed Systems

02 - Distributed Systems

Advanced Topics in Distributed Systems. Dr. Ayman A. Abdel-Hamid. Computer Science Department Virginia Tech

DISTRIBUTED SYSTEMS Principles and Paradigms Second Edition ANDREW S. TANENBAUM MAARTEN VAN STEEN. Chapter 1. Introduction

Middleware. Adapted from Alonso, Casati, Kuno, Machiraju Web Services Springer 2004

Communication. Layered Protocols. Topics to be covered. Layered Protocols. Introduction

Distributed Systems COMP 212. Lecture 15 Othon Michail

MTAT Enterprise System Integration. Lecture 2: Middleware & Web Services

Threads. CS3026 Operating Systems Lecture 06

Introduction. CS3026 Operating Systems Lecture 01

Distributed Systems Theory 4. Remote Procedure Call. October 17, 2008

CHAPTER 2. Introduction to Middleware Technologies

Remote Invocation. Today. Next time. l Overlay networks and P2P. l Request-reply, RPC, RMI

SAI/ST course Distributed Systems

For use by students enrolled in #71251 CSE430 Fall 2012 at Arizona State University. Do not use if not enrolled.

Operating Systems: Internals and Design Principles. Chapter 4 Threads Seventh Edition By William Stallings

IPC. Communication. Layered Protocols. Layered Protocols (1) Data Link Layer. Layered Protocols (2)

Distributed Systems Principles and Paradigms. Chapter 04: Communication

Communication. Distributed Systems IT332

Processes. Process Management Chapter 3. When does a process gets created? When does a process gets terminated?

Distributed Systems. Edited by. Ghada Ahmed, PhD. Fall (3rd Edition) Maarten van Steen and Tanenbaum

Communication. Outline

Distributed Systems Principles and Paradigms. Chapter 04: Communication

Today CSCI Remote Method Invocation (RMI) Distributed Objects

Dr Markus Hagenbuchner CSCI319 SIM. Distributed Systems Chapter 4 - Communication

CSci Introduction to Distributed Systems. Communication: RPC

Software Architectures

Architectural Styles I

Threads SPL/2010 SPL/20 1

Windows 7 Overview. Windows 7. Objectives. The History of Windows. CS140M Fall Lake 1

Two Phase Commit Protocol. Distributed Systems. Remote Procedure Calls (RPC) Network & Distributed Operating Systems. Network OS.

CS 162 Operating Systems and Systems Programming Professor: Anthony D. Joseph Spring Lecture 22: Remote Procedure Call (RPC)

Architecture of Software Intensive Systems

Q.1 Explain Computer s Basic Elements

CS Operating Systems: Threads (SGG 4)

Part V. Process Management. Sadeghi, Cubaleska RUB Course Operating System Security Memory Management and Protection

Processes and Threads. Processes and Threads. Processes (2) Processes (1)

Distributed Systems. Chapter 1: Introduction

CS 5523 Operating Systems: Thread and Implementation

Today. Architectural Styles

Chapter 3 Process Description and Control

Distributed Systems Exam 1 Review. Paul Krzyzanowski. Rutgers University. Fall 2016

Software Architectures

DS 2009: middleware. David Evans

Chapter 1: Distributed Information Systems

Midterm 1. Administrivia. Virtualizing Resources. CS162 Operating Systems and Systems Programming Lecture 12. Address Translation

Distributed File Systems. CS432: Distributed Systems Spring 2017

Gustavo Alonso, ETH Zürich. Web services: Concepts, Architectures and Applications - Chapter 1 2

CHAPTER 3 - PROCESS CONCEPT

Software Architecture

7/20/2008. What Operating Systems Do Computer-System Organization

CAS 703 Software Design

Computer System Overview

DISTRIBUTED FILE SYSTEMS & NFS

Chapter 1: Introduction

Outline. Definition of a Distributed System Goals of a Distributed System Types of Distributed Systems

Architectures for distributed systems (Chapter 2)

Today CSCI Communication. Communication in Distributed Systems. Communication in Distributed Systems. Remote Procedure Calls (RPC)

PROCESSES AND THREADS THREADING MODELS. CS124 Operating Systems Winter , Lecture 8

Process Characteristics

ICT 6544 Distributed Systems Lecture 2: ARCHITECTURES

Applications, services. Middleware. OS2 Processes, threads, Processes, threads, communication,... communication,... Platform

Today. Architectural Styles

Distributed Systems: Architectural Issues

Chapter 4: Threads. Operating System Concepts 9 th Edition

Verteilte Systeme (Distributed Systems)

An Introduction to Software Architecture

Software Architecture

Chapter 4: Threads. Operating System Concepts 9 th Edition

Indirect Communication

9. Queued Transaction Processing

Remote Procedure Call (RPC) and Transparency

CHAPTER 4: INTERPROCESS COMMUNICATION AND COORDINATION

Transcription:

DISTRIBUTED SYSTEMS Midterm Review Dr. Jack Lange Computer Science Department University of Pittsburgh Fall 2015 Distributed System Definition A distributed system is A collection of independent computers that appears to its users as a single coherent system 1

Definition Distributed System A distributed system organized as middleware. The middleware layer extends over multiple machines, and offers each application the same interface. Distributed System Transparency Different Forms Of Transparency In A Distributed System (ISO, 1995). 2

INTRODUCTION TYPES OF DISTRIBUTED SYSTEMS Transaction Processing Systems Transaction Primitives 3

TPS Transaction Characteristic Properties Atomic To the outside world, the transaction happens indivisibly. Consistent The transaction does not violate system invariants. Isolated Concurrent transactions do not interfere with each other. Durable Once a transaction commits, the changes are permanent. Architectural Styles An architectural style describes a particular way to configure a collection of components and connectors. Component - a module with well-defined interfaces; reusable, replaceable Connector communication link between modules Architectures suitable for distributed systems: Layered architectures Object-based architectures Data-centered architectures Event-based architectures 4

Architectural Styles Layered and Object-Based Layered Object based is less structured Component = object Connector = RPC Or RMI Object-Based Data-Centered Architectures Main purpose is data access and update Processes interact by reading and modifying data in some shared repository Repository can be active or passive Traditional data base Passive repository responds to requests Blackboard system Active repository, where clients solve problems collaboratively and system updates clients when information changes. Web-based distributed systems are largely data centric Processes communicate through shared Web-based data services 5

Architectural Styles Event- Based Communication is via event propagation Often associated with Publish/ Subscribe systems register interest in market information and receive email updates Referential Decoupling Loosely couples sender and receiver space decoupling Event-based arch. supports several communication styles: Publish-subscribe Broadcast Point-to-point Centralized vs Decentralized Architectures Vertical Distribution Traditional client-server architectures exhibit vertical distribution. Each level serves a different purpose in the system. Logically different components reside on different nodes Horizontal distribution e.g., P2P architectures Each node has roughly the same processing capabilities and stores and manages part of the total system data. Better load balancing, more resistant to denial-of-service attacks, but harder to manage than C/S Communication and control is not hierarchical, all nodes are peers with equal functionalities 6

Idempotency Retransmission, after timeout, is the typical response to lost request in connectionless communication Consider effect of re-sending a message such as Increment X by 1000 If first message was acted on, now the operation has been performed twice Idempotent operations can be performed multiple times without harm Return current value of X and Check on availability of a product are idempotent Increment X, Order Product Y are nonidempotent Two-tiered Client/Server Architectures Thin-Client Architecture Server provides processing and data management and client provides simple graphical display Perceived performance loss at client Easier to manage, more reliable, client machines don t need to be so large and powerful Fat-Client Architecture At the other extreme, all application processing and some data resides at the client Pro Reduces work load at server; more scalable Con Harder to manage, and potentially less secure 7

Multitiered Architectures Thin Client Alternative Client-server Organizations Fat Client System Architecture DECENTRALIZED ARCHITECTURE 8

Peer-to-Peer Nodes act as both client and server; interaction is symmetric Each node acts as a server for part of the total system data Overlay networks connect nodes in the P2P system Nodes in the overlay use their own addressing system for storing and retrieving data in the system Nodes can route requests to locations that may not be known by the requester. Overlay Networks ONs are logical or virtual networks, built on top of a physical network A link between two nodes in the overlay may consist of several physical links. Messages in the overlay are sent to logical addresses, not physical (IP) addresses Various approaches used to resolve logical addresses to physical. 9

Overlay Networks Each node in a P2P system knows how to contact several other nodes. The overlay network may be: Structured Nodes and content are connected according to some design that simplifies later lookups, or Unstructured Content is assigned to nodes without regard to the network topology Structured Peer-to-Peer Architectures The connections between nodes are logical connections, not necessarily physical connections Mapping of Data items onto nodes in Chord for m = 4 10

Collaborative Distributed Systems BitTorrent Clients contact a global directory (Web server) to locate a.torrent file with the information needed to locate a tracker; A tracker is a server that can supply a list of active nodes that have chunks of the desired file. Using information from the tracker, clients can download the file in chunks from multiple sites in the network. Clients must also provide file chunks to other users. PROCESSES AND THREADS 11

Max Process Address Space Stack Heap Data Name Code Segment (Text) Data Segment 1.Initialized Data 2. Zero-Initialized or Block Started by Symbol (BSS) data Heap Interpretation Program executable statements 1.Static or global data initialized with non-zero values 2.Uninitialized static or global data Categories occupy a single contiguous area. Dynamic memory allocation 0 Text Stack Segment Local variables of the scope. Function parameters Process control block (PCB) PCB state memory files accounting kernel user text data PSW IR PC CPU priority heap SP user CPU registers storage stack general purpose registers 12

Switching Between Processes An important task of the OS is to manage CPU allocation among concurrent processes When the OS takes away the CPU from a running process, the OS must perform a switch between the processes This referred to as context switch What is a context? It is also referred to as a state save and state restore What operations must take place to achieve this switch? How can the OS guarantee that the interrupted process can resume correctly? CPU Switch From Process to Process 13

Threads: Processes Sharing Memory Process == Address Space Thread == Program Counter / Stream of Instructions Two examples Three processes, each with one thread One process with three threads Process 1" Process 2" Process 3" Process 1" User space" System space" Threads" Kernel Threads" Kernel Why Use threads? Allow a single application to do many things at once Simpler programming model Less waiting Threads are faster to create or destroy No separate address space Overlap computation and I/O Could be done without threads, but it s harder Example: word processor Thread to read from keyboard Thread to format document Thread to write to disk When in the Course of human events, it becomes necessary for one people to dissolve the political bands which have connected them with another, and to assume among the powers of the earth, the separate and equal station to which the Laws of Nature and of Nature's God entitle them, a decent respect to the opinions of mankind requires that they should declare the causes which impel them to the separation. We hold these truths to be self-evident, that all men are created equal, that they are endowed by their Creator with certain unalienable Rights, that among these are Life, Liberty and the pursuit of Happiness.--That to secure these rights, Governments are instituted among Men, deriving their just powers from the consent of the governed, --That whenever any Form of Government becomes Kernel destructive of these ends, it is the Right of the People to alter or to abolish it, and to institute new Government, laying its foundation on such principles and organizing its powers in such form, as to them shall seem most likely to effect their Safety and Happiness. Prudence, indeed, will dictate that Governments long established should not be changed for light and transient causes; and accordingly all 14

Implementing threads Process" Thread" Kernel Kernel Run-time system" Thread table" Process table" Process table" Thread table" ULTs Pros and Cons Advantages Thread switching does not involve the kernel no need for mode switching Fast context switch time, Threads semantics are defined by application Scheduling can be application specific Best algorithm can be selected ULTs are highly portable Only a thread library is needed Disadvantages Most system calls are blocking for processes All threads within a process will be implicitly blocked Waste of resource and decreased performance The kernel can only assign processors to processes. Two threads within the same process cannot run simultaneously on two processors 15

KLT Pros and Cons Advantages The kernel can schedule multiple threads of the same process on multiple processors Blocking at thread level, not process level If a thread blocks, the CPU can be assigned to another thread in the same process Even the kernel routines can be multithreaded Disadvantages Thread switching always involves the kernel This means two mode switches per thread switch are required KTLs switching is slower compared ULTs Still faster than a full process switch Solaris Threads bound thread Task 2 is equivalent to a pure ULT approach Traditional Unix process structure " Tasks 1 and 3 map one or more ULT s onto a fixed number of LWP s, which in turn map onto KLT s" Note how task 3 maps a single ULT to a single LWP bound to a CPU" 16

Role of Scheduler Activations Abstraction Implementation... User-level Threads... Thread Library Kernel P1 P2... Pn SA SA... SA Virtual Multiprocessor Invariant: There is one running scheduler activation (SA) for each processor assigned to the user process. Communication via Upcalls The kernel-level scheduler activation mechanism communicates with the user-level thread library by a set of upcalls Add this processor (processor #) Processor has been preempted (preempted activation #, machine state) Scheduler activation has blocked (blocked activation #) Scheduler activation has unblocked (unblocked activation #, machine state) The thread library must maintain the association between a thread s identity and thread s scheduler activation number. 17

Multithreaded Servers (1) A multithreaded server organized in a dispatcher/worker model. Servers and state Stateless servers: Never keep accurate information about the status of a client after having handled a request: Don t record whether a file has been opened (simply close it again after access) Don t promise to invalidate a client s cache Don t keep track of your clients Consequences: Clients and servers are completely independent State inconsistencies due to client or server crashes are reduced Possible loss of performance because, e.g., a server cannot anticipate client behavior (think of prefetching file blocks) 18

Servers and state Stateful servers: Keeps track of the status of its clients: Record that a file has been opened, so that pre-fetching can be done Knows which data a client has cached, and allows clients to keep local copies of shared data Observation: The performance of stateful servers can be extremely high, provided clients are allowed to keep local copies. As it turns out, reliability is not a major problem. Remote Procedure Call Explicit message passing has been typically used in early distributed systems The paradigm does not achieve access transparency send() and receive() primitives do not conceal communication from the communicating entities Alternative method to message passing Allow programs to call procedures located in other machines Simple and elegant idea, but subtle problems may exist 19

Parameter Passing Call-by-Value To the called procedure a value is just an initialized local variable Call-by-Reference The reference parameter is a pointer to a variable, rather than the value of the variable The called procedure modifies the variable in the calling procedure Call-by-Copy/Restore The variable is copied onto the stack by the caller and copied back after the call, overwriting the caller s original value Not often used in computer languages Which Mechanism to Use in RPC? RPC Information Flow Client Server Client (Caller) Server (Callee) Call Return Return Call Pack Parameters Client Stub Unpack Results Pack Results Server Stub Unpack Parameters Send Receive Send Receive Packet Handler Network Network Packet Handler Client Mbox Server Mbox 20

10/15/15 RPC COMPILATION Interface Definition IDL Compiler Client Code Client Stub Header File Server Stub Language Compiler Language Compiler Server Application Server Application Server Code Writing a Client and a Server n The steps in writing a client and a server in DCE RPC 21

10/15/15 Binding a Client to a Server n Client-to-server binding in DCE Overview of the Sockets Interface Server" Client" Socket() Socket() Bind() open_listenfd open_clientfd Connection" Request" Client" Server" Session" Listen() connect Accept() Write() Read() read() Close() write() EOF" Await connection" request from" next client" Read() Close() 22

Group and Context MPI assumes communication takes place with a group of processes (groupid, procid) Use in lieu of a transport-level address MPI ties the concepts of process group and communication context into a communicator " Group" " " Communicator Context" MPI Abstractions A process group is high-level abstraction, visible to users A context is a system-defined object that uniquely identifies a communicator Mechanism to isolate messages in distinct libraries and the user programs from one another A message sent in one context can't be received in other contexts. The communication context is low-level abstraction, not visible to users A communicator is a data object that specializes the scope of a communication. MPI_COMM_WORLD is an initial communicator, which is predefined and consists of all the processes running when program execution begins 23

Message-Queuing Model MQM, aka Message Oriented Middleware, provides support for persistent asynchronous communication MOM offers intermediate-term message storage capacity, without requiring either the sender of receiver to be active Sender Running Sender Running Sender Passive Sender Passive Receiver Running Receiver Passive Receiver Running Receiver Passive MOM Basic Semantics and Interface MOM supports loosely-coupled in time semantics Senders are given only guarantees that message will eventually be inserted in the receiver s queue No guarantee about if or when will messages be read Recipient-behavior dependent Basic interface to a queue in a message-queuing system. Primitive Put Get Poll Notify Meaning Append a message to a specified queue Block until the specified queue is nonempty, and remove the first message Check a specified queue for messages, and remove the first. Never block. Install a handler to be called when a message is put into the specified queue. 24

MQM Brokers Functionalities Message brokers act as an application-level gateway Their main purpose is to convert incoming messages to a format that can be understood by the recipient Properly match end-of-record delimiters between database applications More advanced broker are designed to handle conversations between two different database applications Difficult task, as information and semantic mapping between two different applications may not always be possible Brokers are commonly used in Enterprise Application Integration for mediation Publish/Subscribe is the typical communication model Brokers matches messages to applications interests QoS Specification and Services A flow specification model and services Characteristics of the Input Maximum data unit size (bytes) Token bucket rate (bytes/sec) Token bucket size (bytes) Maximum transmission rate (bytes/sec) Service Required Loss sensitivity (bytes) Loss interval (µsec) Burst loss sensitivity (data units) Minimum delay noticed (µsec) Maximum delay variation (µsec) Quality of guarantee 25

Specifying QoS Token Bucket The principle of a token bucket algorithm. 26