Distributed Operating Systems


System Models, Processor Allocation, Distributed Scheduling, and Fault Tolerance
Steve Goddard
goddard@cse.unl.edu
http://www.cse.unl.edu/~goddard/courses/csce855

System Models

Workstation model
» each workstation has a processor and an owner
» disks are optional
  diskless: no disk; low cost, easy system updates, but the server becomes a bottleneck
  paging files: small, cheap disk for paging; reduces network traffic
  paging files & binaries: application binaries are local; further reduces network traffic, but updates become more difficult
  paging files & binaries & caching: also cache file pages; less network traffic and load on the servers, but cache consistency problems
  full local file system: each workstation has a file system; no need for file servers, but transparency can suffer (a la NFS, etc.)
» local processes take precedence over remote ones

System Models

Processor pool model
» a pool of idle processors is available for everyone
» workstations may not even have a processor

Processor Allocation

Two basic allocation models:
» non-migratory: once a process is placed, it cannot be moved
» migratory: a process can move in the middle of execution; its state must be restored at the new CPU; better load balancing, but a more complex design

Processor Allocation

Distributed processor management
» if the local processor is idle or underutilized, use it; otherwise execute the process remotely
» the resource manager must: keep track of idle processors, find one when a request is received, send the process to a remote computer, and receive results from the remote process
» first problem: find a CPU

Finding Idle Workstations

Server-driven
» idle processors announce their availability
» a processor puts its info in a global (replicated) registry, or
» broadcasts an "available" message, but then all processors need to maintain the list
» race conditions can occur: more than one client sends work to the same idle processor

Client-driven
» the client broadcasts its need for a processor; heterogeneous environments need info on the processing needs
» idle processors respond with a message to the client; servers can delay the message in proportion to their load
» the client chooses one
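A minimal sketch of the server-driven scheme, assuming a single coordinator process holds the (in reality replicated) registry. The names Registry, announce_idle, and claim_idle are illustrative, not part of any particular system; claiming a host under a lock avoids the race in which two clients pick the same idle processor.

```python
import threading

class Registry:
    def __init__(self):
        self._idle = set()            # workstations that announced themselves idle
        self._lock = threading.Lock()

    def announce_idle(self, host):
        """An idle processor announces its availability."""
        with self._lock:
            self._idle.add(host)

    def claim_idle(self):
        """A client asks for an idle processor; removing it atomically avoids the race."""
        with self._lock:
            return self._idle.pop() if self._idle else None

registry = Registry()
registry.announce_idle("ws-07")
registry.announce_idle("ws-12")
print(registry.claim_idle())   # one of the announced workstations, or None
```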

Graph Theory Approach

Assign processes to CPUs
» to minimize network traffic (interprocess communication)
» processes and messages form a weighted graph: nodes are processes, edges are communication paths weighted by message traffic
» total network traffic is the sum of the edge weights cut by the partition (see the sketch below)

Assign 9 processes to 3 CPUs
» two different example partitions of the same weighted graph (figures omitted); the better partition cuts less total edge weight
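A minimal sketch of the partition cost just described, assuming the communication graph is given as a dict of weighted edges. The helper name cut_traffic and the example data are illustrative only.

```python
def cut_traffic(edges, assignment):
    """Sum the weights of edges whose endpoints land on different CPUs.

    edges:      {(process_a, process_b): messages_per_second, ...}
    assignment: {process: cpu, ...}
    """
    return sum(w for (a, b), w in edges.items()
               if assignment[a] != assignment[b])

edges = {("A", "B"): 3, ("B", "C"): 2, ("A", "D"): 2, ("C", "D"): 6, ("D", "E"): 5}
partition_1 = {"A": 0, "B": 0, "C": 1, "D": 1, "E": 2}
partition_2 = {"A": 0, "B": 0, "C": 0, "D": 1, "E": 1}
print(cut_traffic(edges, partition_1))  # 9: edges B-C, A-D, and D-E cross CPUs
print(cut_traffic(edges, partition_2))  # 8: only A-D and C-D cross CPUs
```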

Graph Theory Approach

Essentially looks for tightly clustered processes
» place interacting processes on the same machine

Problems with the graph approach
» assumes advance knowledge of the message traffic
» lots of information to process
» computationally difficult to achieve

Centralized Load Sharing: Up-Down Algorithm

Objective: fairly divide CPU cycles among users

When a CPU becomes idle:
» if users have processes in their waiting queues...
» decide which should run next
» decision is based on penalty points

Centralized Load Sharing: Up-Down Algorithm

Central coordinator maintains a usage table
» for each event, a message is sent to the coordinator

Table entries for each user process track penalty points (per time unit)
» process executing remotely: add points
» pending processes (i.e. processes on the ready queue): subtract points
» processor idle: move toward zero

Allocate an idle CPU based on penalty points
» the process whose owner has the fewest points wins
» shares computing power equally
» rewards the "light" user: a user (process) occupying no processor but with a pending request

Can visualize the algorithm's execution as a scale (figure omitted)

Problems with the up-down algorithm
» centralized control
» lots of events and messages
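A minimal sketch of the penalty-point bookkeeping just described, as it might look at the central coordinator. The class and method names (UsageTable, tick, allocate_idle_cpu) and the unit point values are illustrative assumptions, not the exact constants of the original algorithm.

```python
class UsageTable:
    def __init__(self):
        self.points = {}     # user -> penalty points
        self.running = {}    # user -> processes executing remotely
        self.pending = {}    # user -> processes waiting for a CPU

    def tick(self, user):
        """Apply one time unit's worth of penalty-point updates for a user."""
        pts = self.points.get(user, 0)
        pts += self.running.get(user, 0)      # executing remotely: add points
        pts -= self.pending.get(user, 0)      # pending on the ready queue: subtract
        if not self.running.get(user, 0) and not self.pending.get(user, 0):
            pts -= (pts > 0) - (pts < 0)      # idle: move the score toward zero
        self.points[user] = pts

    def allocate_idle_cpu(self):
        """Give an idle CPU to the waiting user with the fewest penalty points."""
        waiting = [u for u, n in self.pending.items() if n > 0]
        return min(waiting, key=lambda u: self.points.get(u, 0)) if waiting else None

table = UsageTable()
table.running["alice"] = 1          # alice has a process running remotely
table.pending["bob"] = 1            # bob is waiting for a CPU
for _ in range(3):
    table.tick("alice"); table.tick("bob")
print(table.allocate_idle_cpu())    # 'bob': fewest penalty points among the waiters
```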

Hierarchical Algorithm

Organization is a hierarchy
» processors are workers
» intermediate nodes are managers
» the top level is replaced by a committee
» shown to work (MICROS)

Communication paths
» minimized by each level talking only one level up or down the hierarchy
» when a processor is busy, it sends a message to its manager; the manager can then propagate the message, and can also summarize messages

Manager nodes
» manage k processor ("worker") nodes or j manager nodes
» keep track of idle processors; no attempt is made to track down hosts, therefore the count is an upper bound

Hierarchical Algorithm

Load sharing
» jobs can be created at any level of the hierarchy
» suppose a worker spawns a job needing n processes: we need to find and allocate n processors
» the immediate manager is notified of the request; the manager knows of w available workers
  if w ≥ n, the manager reserves n processors
  otherwise it sends the request to the next higher manager

Failure of intermediate managers
» the superior node detects the failure
» it elects a subordinate of the failed intermediate manager to replace it; the replacement must get updated information from its subordinates
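A minimal sketch of how an allocation request can propagate up the hierarchy, assuming each manager only keeps an (upper-bound) count of idle workers beneath it. The Manager class and its fields are illustrative; a real implementation would also push the reservation back down to the chosen workers.

```python
class Manager:
    def __init__(self, parent=None, idle_workers=0):
        self.parent = parent
        self.idle_workers = idle_workers   # upper bound: down hosts are not tracked

    def request(self, n):
        """Try to reserve n processors; escalate to the parent if we can't."""
        if self.idle_workers >= n:
            self.idle_workers -= n         # w >= n: reserve locally
            return True
        if self.parent is not None:
            return self.parent.request(n)  # send the request one level up
        return False                       # even the top level cannot satisfy it

top = Manager(idle_workers=10)
mid = Manager(parent=top, idle_workers=2)
print(mid.request(5))   # True: escalates to 'top', which reserves 5 of its 10 workers
```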

Hierarchical Algorithm

Failure of a top-level manager
» the top level is organized as a committee
» if one top-level manager fails, the others choose a subordinate to replace it
» two different methods:
  top-level managers do not share information; the committee is used only to choose replacements for failed managers
  top-level managers pass summary information among themselves and keep track of each member's available capacity; the usual problems with replicated information apply

Distributed Algorithms

Two methods:
» sender-initiated: machines try to offload processes
» receiver-initiated: idle processors try to find work

General sender-initiated method
» a process is created on a workstation
» if the current machine is heavily loaded, find a machine that is not

Distributed Algorithms

Three variants of the general sender-initiated method (Eager et al.)
» random migration: pick a machine at random and send the process to it; repeat the procedure (i.e. if that machine is loaded, it sends the process to another randomly chosen machine)
» random probes: pick a machine at random and send probes until a suitable machine is found; try a maximum of N probes
» probing k machines: probe k machines to get their exact loads, then send the process to the least loaded machine in the set (sketched below)

Analysis of the three algorithms
» the third algorithm (choose the best of k) performs best in isolation, i.e. load balancing, fewest process migrations, etc.
» but it is not the best overall: once the overhead of the probes is factored in, the gain from the algorithm is too small to offset the additional k probes
» moral of the story: simple algorithms are preferred; the overhead of complex algorithms often erases their gains
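A minimal sketch of the third sender-initiated variant. The probe_load callback, the threshold, and the function name place_process are illustrative assumptions; in a real system probe_load would be a network message.

```python
import random

def place_process(candidates, probe_load, k=3, threshold=0.8, local_load=1.0):
    """Probe k random machines and offload to the least loaded one found.

    Returns the chosen host, or None to keep the new process local.
    """
    probes = random.sample(candidates, min(k, len(candidates)))
    loads = {host: probe_load(host) for host in probes}   # k probe messages
    best = min(loads, key=loads.get)
    if loads[best] < threshold and loads[best] < local_load:
        return best      # migrate to the least loaded of the probed machines
    return None          # no sufficiently light machine found

loads = {"ws1": 0.9, "ws2": 0.3, "ws3": 0.7}
print(place_process(list(loads), loads.get, k=2))   # least loaded of the 2 probed hosts
```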

Distributed Algorithms

Problems with sender-initiated distributed algorithms
» incomplete information: A sends to B, thinking B has a light load; B sends to A, because in reality A's load is lighter; A sends to B...
» heavily loaded system (all machines overloaded): all machines will try to offload processes, probing won't accomplish anything (there is no lightly loaded machine to find), yet additional overhead is incurred, exactly when the system can least afford it

Receiver-initiated distributed algorithm
» when a process terminates, check the load; if the load is light, look for work: send a probe to a machine, or to k machines, and stop if no work is found after N probes
» doesn't create traffic when the system is overloaded; it generates traffic when machines are lightly loaded, but what else do those machines have to do anyway?
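A minimal sketch of the receiver-initiated variant, assuming a probe_for_work(host) call that either hands back a transferable process or returns None. All names and the probe limit are illustrative.

```python
import random

def look_for_work(candidates, probe_for_work, my_load,
                  light_threshold=0.2, max_probes=4):
    """Run when a local process terminates and the machine becomes lightly loaded."""
    if my_load > light_threshold:
        return None                        # still busy enough: don't probe at all
    for host in random.sample(candidates, min(max_probes, len(candidates))):
        work = probe_for_work(host)        # ask a remote machine to hand over a process
        if work is not None:
            return work
    return None                            # stop after max_probes unsuccessful probes
```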

Distributed Algorithms

Combining approaches
» look for work when lightly loaded, offload work when heavily loaded; it is unclear what the dynamics would be
» keep histories of chronically under-loaded or overloaded machines

Bidding Algorithms

Simulate contract bidding
» processes buy CPU time
» processors give cycles to the highest bidder

A three-step process:
» a new process needing CPU time is created
» a bid is constructed for the process, detailing the computational environment needed: CPU loading, queue & stack sizes, special I/O needs, floating-point hardware, etc.
» processors receive bids and each chooses the highest set of bidders it can comfortably accommodate; it must multicast the contract to all bidders (why?)
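A minimal sketch of the processor side of one bidding round: collect bids and accept the highest-priced set that fits in the spare capacity. The bid fields, prices, and the greedy acceptance rule are illustrative assumptions.

```python
def award_contracts(bids, spare_capacity):
    """bids: list of dicts like {"process": "p1", "price": 7, "cpu_need": 0.3}."""
    accepted = []
    # consider the highest bidders first (greedy by offered price)
    for bid in sorted(bids, key=lambda b: b["price"], reverse=True):
        if bid["cpu_need"] <= spare_capacity:
            accepted.append(bid["process"])
            spare_capacity -= bid["cpu_need"]
    # the result would then be multicast to all bidders so losers can bid elsewhere
    return accepted

bids = [{"process": "p1", "price": 7, "cpu_need": 0.5},
        {"process": "p2", "price": 9, "cpu_need": 0.4},
        {"process": "p3", "price": 3, "cpu_need": 0.2}]
print(award_contracts(bids, spare_capacity=0.6))   # ['p2', 'p3']
```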

Bidding Algorithms

One node can simultaneously be a contractor and a bidder
Contractors can sub-contract a process

Distributed Scheduling

Independent scheduling
» each processor has its own separate scheduler
» problem: processes may communicate frequently; if A and B aren't both in the running state at the same time, they will waste time waiting for each other to get the CPU

Co-scheduling
» break the schedule on every processor into time slices
» schedule the slices on all processors simultaneously, using round-robin scheduling; broadcast messages can be used to synchronize the slices
» put communicating groups of processes into the same time slice
» schedule all other processes into the empty time slices
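A minimal sketch of building a co-scheduling (gang-scheduling) matrix: each communicating group gets its own time slice across processors, and remaining processes fill the holes. The layout and names are illustrative.

```python
def build_schedule(groups, singles, num_cpus):
    """Return a list of time slices; each slice maps cpu index -> process."""
    slices = []
    for group in groups:                       # one slice per communicating group
        slices.append({cpu: p for cpu, p in enumerate(group[:num_cpus])})
    for p in singles:                          # fill remaining processes into holes
        for sl in slices:
            if len(sl) < num_cpus:
                sl[len(sl)] = p
                break
        else:
            slices.append({0: p})              # no hole found: start a new slice
    return slices

# group (A, B, C) communicates heavily, so all three run in the same slice
print(build_schedule(groups=[("A", "B", "C")], singles=["D", "E"], num_cpus=3))
```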

Fault Tolerance

Overall message
» although fault tolerance is one of the reasons cited in favor of distributed systems...
» it's really, really hard to achieve
» and not much research has been done!!

Types of faults
» fail-silent (or fail-stop): the processor just stops or the machine crashes; easy to detect
» Byzantine: the processor/process continues to run but gives incorrect data; much harder to analyze and correct

Fault tolerance in distributed systems
» transaction processing: already discussed aborting transactions and two-phase commit
» replication techniques:
  TMR (Triple Modular Redundancy): a triplicated voter follows each stage in the circuit; majority rules (see the voter sketch below)
  active replication (the state machine approach) is an extension of TMR
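A minimal sketch of a TMR-style voter: each stage is run on three replicas and the majority output is passed on, masking one faulty replica. The function names and the fault model are illustrative.

```python
from collections import Counter

def vote(outputs):
    """Return the majority value among replica outputs, or None if there is no majority."""
    value, count = Counter(outputs).most_common(1)[0]
    return value if count > len(outputs) // 2 else None

def tmr_stage(replicas, stage_input):
    """Run one circuit stage on three replicas and vote on the result."""
    return vote([replica(stage_input) for replica in replicas])

# one faulty replica is masked as long as the other two agree
replicas = [lambda x: x + 1, lambda x: x + 1, lambda x: x + 99]   # third is faulty
print(tmr_stage(replicas, 5))   # 6: the majority (2 of 3) wins
```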

Replication Techniques

Active replication
» provide n processors and vote on the output
» for example: if 2 out of 3 results are the same, use that result
» problem: it takes lots of processors to achieve this, and the voters must also be treated as suspect
» k fault tolerant: can survive k faulty components
  for fail-silent (fail-stop) failures: k + 1 processors, just use another processor
  for Byzantine failures: 2k + 1 processors
» another problem: all requests must be serviced in the same order (in voters and nodes), the atomic broadcast problem; note that only write operations in a file server need to be ordered

Primary backup (sketched below)
» the primary does the work... and sends it to the backup for synchronization
» general structure: if the primary fails, use the backup; no ordering is needed, and it requires fewer processors
» but it is difficult to agree on when the backup should take over in the presence of Byzantine failures
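A minimal sketch of primary-backup replication: the primary applies each update, forwards it to the backup, and only then replies to the client, so the backup can take over if the primary crashes. The classes and method names are illustrative.

```python
class Replica:
    def __init__(self):
        self.state = {}

    def apply(self, key, value):
        self.state[key] = value

class Primary(Replica):
    def __init__(self, backup):
        super().__init__()
        self.backup = backup

    def handle_write(self, key, value):
        self.apply(key, value)            # 1. do the work locally
        self.backup.apply(key, value)     # 2. synchronize the backup
        return "ok"                       # 3. ack the client only after both copies agree

backup = Replica()
primary = Primary(backup)
primary.handle_write("x", 42)
print(backup.state)   # {'x': 42}: the backup holds the state the client was acked on
```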

Two-Army Problem

General problem
» two units of an army (or two file servers) need to coordinate a strike on the enemy
» they communicate by messenger, but the messenger may be captured (message lost)
» no matter how many acknowledgments are exchanged, they can never be sure that the last message was received

Byzantine Generals Problem

General problem
» n generals (or file servers) need to coordinate a strike on the enemy
» they communicate by telephone on perfect lines (reliable communication)
» m generals are traitors (faulty processors) who give incorrect and contradictory information
» the generals must exchange troop strengths: each general ends up with a vector of length n containing the troop strength of each army; if general i is loyal, element i is his troop strength

Byzantine Generals Problem

One (limited) solution
» each general sends a reliable message to all other generals, giving his troop strength
  1) each general sends his troop strength
  2) each general collects the numbers in a vector
  3) each general sends his vector to all other generals

Byzantine Generals Example

If any value has a majority, that is the true result; otherwise the value is unknown

In the face of m faulty processors
» agreement can be achieved only if 2m + 1 correctly functioning processors are present
» meaning you need 3m + 1 processors total
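A minimal sketch of the limited exchange above: generals broadcast their troop strengths, forward the vectors they collected, and each loyal general takes a per-element majority. The traitor model (a traitor reports a different lie to every peer and relays garbage) is a simplified, illustrative assumption.

```python
from collections import Counter

def run_round(strengths, traitors):
    n = len(strengths)
    # steps 1-2: general j learns general i's strength (traitors lie inconsistently)
    def report(i, j):
        return strengths[i] if i not in traitors else 1000 + i * n + j
    vectors = {j: [report(i, j) for i in range(n)] for j in range(n)}

    # step 3: every general forwards its vector; a traitorous relay sends garbage
    def relay(sender, i):
        return vectors[sender][i] if sender not in traitors else -1

    result = {}
    for g in range(n):
        if g in traitors:
            continue
        decided = []
        for i in range(n):
            values = [relay(s, i) for s in range(n)]
            value, count = Counter(values).most_common(1)[0]
            decided.append(value if count > n // 2 else None)   # None = unknown
        result[g] = decided
    return result

# 4 generals, 1 traitor (general 3): n = 3m + 1, so the loyal generals
# agree on elements 0..2 and mark the traitor's element as unknown
print(run_round(strengths=[4, 8, 6, 7], traitors={3}))
```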