Distributed Algorithms Models


Alberto Montresor, University of Trento, Italy
2016/04/26
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Contents
1. Taxonomy
   - Client-server
   - Multi-tier
   - Cluster computing
   - Cloud computing
   - Peer-to-peer systems
2. Modeling Distributed Systems
   - Computation
   - Interaction
   - Failures
   - Time

Taxonomy of Distributed Systems
Architectures:
- Client-server
- Multi-tier
- Clusters
- Cloud computing
- Peer-to-Peer
- Sensor networks (see companion course)
Alberto Montresor (UniTN) DS - Models 2016/04/26 1 / 35

Client-server
The simplest form of distributed system:
- Resources are centralized on servers
- A large number of clients access them through request-reply interactions

Client-server: problem examples
- Reliable message delivery
  - TCP/IP: guarantees the delivery of messages in FIFO order
- Resource lease
  - DHCP: lends limited resources for a predefined period of time
- Remote procedure call
  - Allows invocation of procedures/methods/functions on remote objects
  - RPC ('60)
  - CORBA ('90)
  - Java RMI, .NET WCF
  - JSON-RPC, XML-RPC
  - Google Protocol Buffers, Apache Thrift, Apache Avro, Twitter Finagle

Multi-tier
(Figure: multi-tier architecture)

Multi-tier: problem examples
Total order broadcast
- Processes may need to agree not only on which actions they should execute...
- ...but also on the order in which they are executed
Example
- Initial state: Process A: c = 1, Process B: c = 1
- Process A: [c ← c × 3] [c ← c + 1]
- Process B: [c ← c + 1] [c ← c × 3]
- Inconsistency!
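The inconsistency in the example can be checked with a few lines of Python. This is only a sketch: it assumes the two garbled updates in the transcription read c ← c × 3 and c ← c + 1, and the helper name `apply_in_order` is made up for illustration.

```python
def apply_in_order(initial, operations):
    """Apply each update to the counter c in the given order."""
    c = initial
    for op in operations:
        c = op(c)
    return c

triple = lambda c: c * 3      # assumed reading of the garbled update "c <- c x 3"
increment = lambda c: c + 1   # c <- c + 1

result_a = apply_in_order(1, [triple, increment])   # order used by Process A
result_b = apply_in_order(1, [increment, triple])   # order used by Process B
print(result_a, result_b)  # 4 6
```

Both processes execute the same set of actions, yet they end in different states. Total order broadcast rules this out by making every process apply the updates in the same order.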

Cluster computing
- A group of high-end systems connected through a fast LAN
- Homogeneous: same OS, near-identical hardware
- Single managing node
- Example: Mosix/OpenMosix

Cluster computing: problem examples
Load balancing
- Different nodes may be subject to different computational loads
- Possible techniques for load balancing:
  - Assign new tasks to underloaded nodes
  - Migrate tasks from overloaded nodes to underloaded nodes
Message passing / synchronization
- PVM, the Parallel Virtual Machine: provides a run-time environment for message passing, task and resource management, and fault notification
- MPI, the Message Passing Interface: a standardized and portable message-passing system designed by a group of researchers from academia and industry to function on a wide variety of parallel computers
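The first load-balancing technique (assign new tasks to underloaded nodes) could be sketched as follows. The node names, the load values and the function `assign_task` are hypothetical, and load is reduced to a single number per node; real schedulers use richer metrics.

```python
def assign_task(loads, cost):
    """Assign a new task of the given cost to the currently least-loaded node."""
    node = min(loads, key=loads.get)   # node with the smallest load
    loads[node] += cost
    return node

loads = {"n1": 5, "n2": 2, "n3": 7}   # hypothetical current load per node
first = assign_task(loads, 4)    # n2 has the lowest load (2), so it gets the task
second = assign_task(loads, 1)   # now n1 is lowest (5 vs 6 vs 7)
```

Task migration, the second technique, would additionally move work that is already running, which this sketch does not attempt.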

Cloud computing
Informal definition
Cloud computing is a general term for a class of network-based computing that takes place over the Internet (utility computing):
- A collection of integrated and networked hardware, software and Internet infrastructure (called a platform)
- The platform uses the Internet for communication and transport, and provides hardware, software and networking services to clients
- These platforms hide the complexity and details of the underlying infrastructure from users and applications by providing graphical interfaces or APIs

Different cloud computing layers
- Software as a service (SaaS)
- Platform as a service (PaaS)
- Infrastructure as a service (IaaS)

An example: Amazon
- Compute: Elastic Compute Cloud (EC2), Elastic MapReduce, Auto Scaling
- Content Delivery: CloudFront
- Database: DynamoDB, Relational DB Service (RDS)
- E-Commerce: Fulfillment Web Service (FWS)
- Messaging: Simple Queue Service (SQS), Simple Notification Service (SNS)
- Monitoring: CloudWatch
- Networking: Virtual Private Cloud (VPC), Elastic Load Balancing
- Payments & Billing: Flexible Payments Service (FPS), DevPay
- Storage: Simple Storage Service (S3), Elastic Block Storage (EBS), AWS Import/Export

Peer-to-peer
Definition
- A peer-to-peer system is a collection of peer nodes
- Each peer is both a server and a client ("servent")
  - Provides resources to other peers
  - Consumes resources from other peers
Characteristics:
- Put together resources at the edge of the Internet
- Share resources by direct exchange between nodes
- Perform critical functions in a decentralized manner

Overlay networks
(Figure: an overlay built on top of the TCP/IP network)

Peer-to-peer systems: problem examples
P2P key-value stores
A peer-to-peer service that offers an associative Map interface:
- put(Key k, Value v): associates a value v with the key k
- Value get(Key k): returns the value associated with key k
(Distributed) Hash Tables:
- Hash tables map keys to memory locations
- Distributed hash tables map keys to nodes
Organization:
- Each node is responsible for a portion of the key space
- Messages are routed between nodes to reach responsible nodes
- Replication is used to tolerate failures
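A minimal sketch of the key-to-node mapping might look like this. It is not any particular DHT protocol: the class `ToyDHT`, the 16-bit identifier space and the "first node clockwise on a ring" rule are assumptions chosen for illustration, and routing and replication are left out (every lookup sees the full node list).

```python
import hashlib
from bisect import bisect_left

RING_SIZE = 2 ** 16  # hypothetical identifier space

def ring_id(name):
    """Hash a node name or a key onto the identifier ring."""
    digest = hashlib.sha1(str(name).encode()).digest()
    return int.from_bytes(digest, "big") % RING_SIZE

class ToyDHT:
    def __init__(self, node_names):
        self.ring = sorted((ring_id(n), n) for n in node_names)
        self.ids = [i for i, _ in self.ring]
        self.storage = {n: {} for n in node_names}   # per-node local hash table

    def responsible_node(self, key):
        # First node whose identifier is >= the key's identifier, wrapping around.
        idx = bisect_left(self.ids, ring_id(key)) % len(self.ring)
        return self.ring[idx][1]

    def put(self, key, value):
        self.storage[self.responsible_node(key)][key] = value

    def get(self, key):
        return self.storage[self.responsible_node(key)].get(key)

dht = ToyDHT(["node-a", "node-b", "node-c", "node-d"])
dht.put(9, "x")
value = dht.get(9)
```

Each node stores only the keys that fall in its portion of the identifier space, so `put(9, "x")` and a later `get(9)` reach the same node.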

Routing in DHTs
(Figure: put(9, x) and get(9) routed to the node responsible for key 9)

Modeling distributed systems
- Computation: processes, deterministic vs probabilistic behavior
- Interaction: processes interact through messages, which result in:
  - Communication, i.e. information flow
  - Coordination, i.e. synchronization and ordering of activities
- Failures: which kinds of failures can occur?
  - Benign vs malicious (Byzantine)
  - Process vs communication
- Time: can we make any assumption about time bounds on communication and computation speeds?

Computation
- Process: the unit of computation in a distributed system. Sometimes we may call it node, host, etc.
- Process set: denoted by Π, it is composed of a collection of n uniquely identified processes p_1, p_2, ..., p_n.
Typical assumptions:
- The set is static (n is well-defined)
- Processes know each other
- All processes run a copy of the same algorithm; the sum of all these copies constitutes the distributed algorithm
But in extreme distributed systems:
- Dynamic set
- Too many, too dynamic to know them all
- Multiple algorithms

Deterministic vs probabilistic
- Deterministic process: the local computation and the messages sent by a process are determined by the current state and the messages previously received.
- Probabilistic process: processes may make use of random oracles to choose the local computation to be performed or the next message to be sent.

Interaction
Processes communicate through messages:
- send(m, p): sends a message m to p
- receive(m): receives a message m
In some cases, messages may be uniquely identified by:
- The sender of the message
- A sequence number local to the sender
General assumption: every pair of processes is connected by a bi-directional communication channel
- Through routing
- Not true for P2P systems

Notes:
- In the receive operation, we do not specify the original sender; can be ...
- A fully connected topology may be obtained through routing. For example, consider the following architectures:
  - Fully connected mesh
  - Broadcast medium (Ethernet, wireless)
  - Ring
  - Internet with routers
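The send/receive interface and the (sender, sequence number) identification can be sketched as below. This is an in-memory toy, not a network implementation: the `Process` and `Message` classes and the shared `network` dictionary are hypothetical, and `receive` returns the next message instead of taking it as a parameter.

```python
from collections import deque
from dataclasses import dataclass
from itertools import count

@dataclass(frozen=True)
class Message:
    sender: str    # identifier of the sending process
    seq: int       # sequence number local to the sender
    payload: str

class Process:
    def __init__(self, name, network):
        self.name = name
        self.network = network      # maps process name -> Process (assumed fully connected)
        self.inbox = deque()
        self._seq = count(1)
        network[name] = self

    def send(self, payload, dest):
        """send(m, p): tag the message with (sender, seq) and hand it to dest."""
        m = Message(self.name, next(self._seq), payload)
        self.network[dest].inbox.append(m)
        return m

    def receive(self):
        """receive(m): return the next pending message, or None if there is none."""
        return self.inbox.popleft() if self.inbox else None

network = {}
p, q = Process("p", network), Process("q", network)
p.send("hello", "q")
p.send("hello", "q")          # same payload, but a different sequence number
m1, m2 = q.receive(), q.receive()
```

The two messages carry the same payload but remain distinguishable, because the pair (sender, seq) is unique.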

Process failures
In a distributed system, both processes and communication channels may fail, i.e. depart from what is considered their correct behavior. Hadzilacos and Toueg provide a taxonomy.
Benign process failures
- Fail-stop: a process stops executing events, and other processes may detect this fact.
- Crash: a process stops executing events.
Malicious process failures
- Arbitrary failure, or Byzantine: any type of error may occur. This may be caused by:
  - A software bug
  - Malicious behavior driven by an intelligent adversary

Process failures
- A process that never fails is correct
- A process that eventually fails is faulty
- Several protocols are designed to work correctly if the number of failures f is bounded (for example, f < n/3).
In some models, processes may perform a recovery action:
- After some time, a process may resume functioning
- It suffers amnesia: the local state maintained in volatile memory is lost
- To limit the effects of amnesia, a log can be maintained

Notes:
- To avoid the problem of amnesia completely, every read/write would have to pass through permanent memory, which is too expensive.
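As a small worked example of the bound f < n/3 mentioned above, the largest number of failures tolerated by a protocol with that requirement follows from integer arithmetic. The function name `max_faults` is made up for illustration.

```python
def max_faults(n):
    """Largest integer f such that f < n/3, i.e. 3*f < n."""
    return (n - 1) // 3

for n in (3, 4, 7, 10):
    print(n, max_faults(n))
```

With n = 4 processes such a protocol tolerates one failure, while n = 3 tolerates none.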

Communication failures
Benign communication failures
1. Process p performs send of a message m to process q
2. Message m is inserted in a local outgoing buffer of p (Send-omission)
3. Message m is transmitted from p to q (Omission)
4. Message m is inserted in a local incoming buffer of q (Receive-omission)
5. Process q performs receive of m
Malign communication failures
- Messages created out of nothing, duplicated messages, etc.
- These problems can be addressed through cryptographic techniques.

Communication failures
Possible causes of message failures:
- Buffer overflow in the operating system
- Congestion, routing errors in routers
Partitioning:
- Processes are subdivided into disjoint sets called partitions
- Communication inside a partition is possible
- Communication between partitions is not possible
- When a partition disappears, we say that partitions merge

Modeling (faulty) communication channels
The idea: the channels cannot systematically drop a specific message. This is the minimum abstraction needed to create reliable channels.
Fair-Loss Channels
- Validity (Fair Loss): if a message m is sent infinitely often by a process p to a process q and neither p nor q crashes, then q will receive m infinitely often
- Integrity (Finite Duplication): if a message m is sent a finite number of times by a process p to a process q, then m cannot be received by q an infinite number of times
- Integrity (No Creation): if a message m is delivered by some process p, then m was previously sent by some process q to p

Modeling (correct) communication channels
The idea: channels are reliable, messages are never lost. It can be implemented, but there is a price to be paid: asynchrony.
Perfect Channels
- Validity (Reliable Delivery): if p sends a message m to q, and neither p nor q crashes, then q will eventually receive m
- Integrity (No Duplication): no message is delivered to a process more than once
- Integrity (No Creation): if a message m is delivered by some process p, then m was previously sent by some process q to p

An Example Algorithm: Fair-loss Channel → Perfect Channel

    upon init do
        Set sent ← ∅
        Set delivered ← ∅
        starttimer(timeout)

    upon timeout do
        foreach (m, q) ∈ sent do
            fairlosssend(m, q)
        starttimer(timeout)

    upon perfectsend(m, q) do
        fairlosssend(m, q)
        sent ← sent ∪ {(m, q)}

    upon fairlossreceive(m, q) do
        if m ∉ delivered then
            delivered ← delivered ∪ {m}
            perfectreceive(m, q)
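The retransmit-and-deduplicate idea behind this algorithm can be simulated in Python. This is a sketch under several assumptions that are not in the slides: the fair-loss layer is modelled as an in-memory network that drops each message with a fixed probability below 1, the timer is replaced by explicit calls to `on_timeout`, message payloads are assumed to be unique, and all class and method names are hypothetical.

```python
import random

class FairLossNetwork:
    """Drops each message independently with probability `loss`; never creates messages."""
    def __init__(self, loss=0.7, seed=1):
        self.loss = loss
        self.rng = random.Random(seed)
        self.processes = {}

    def fairloss_send(self, m, src, dst):
        if self.rng.random() >= self.loss:          # message survives this time
            self.processes[dst].on_fairloss_receive(m, src)

class PerfectChannel:
    def __init__(self, name, network):
        self.name, self.network = name, network
        self.sent = set()        # (message, destination) pairs to keep retransmitting
        self.delivered = set()   # messages already handed to the application
        self.inbox = []          # stands in for perfectreceive(m, p)
        network.processes[name] = self

    def perfect_send(self, m, q):
        self.network.fairloss_send(m, self.name, q)
        self.sent.add((m, q))

    def on_timeout(self):
        for m, q in self.sent:                      # retransmit everything sent so far
            self.network.fairloss_send(m, self.name, q)

    def on_fairloss_receive(self, m, p):
        if m not in self.delivered:                 # filter out duplicates
            self.delivered.add(m)
            self.inbox.append((m, p))

net = FairLossNetwork(loss=0.7, seed=1)
p, q = PerfectChannel("p", net), PerfectChannel("q", net)
for m in ("a", "b", "c"):
    p.perfect_send(m, "q")
for _ in range(200):                                # 200 timer expirations
    p.on_timeout()
```

Retransmission provides reliable delivery, and the `delivered` set provides no duplication. The sender never learns when to stop retransmitting, which is one way to see the "price" of asynchrony mentioned for perfect channels.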

Safety and liveness
Safety
- Something bad will never happen
- In other words, a distributed program should never enter an unacceptable state.
- Example: no message is delivered to a process more than once.
Liveness
- Something good eventually does happen
- In other words, a distributed program eventually enters a desirable state.
- Example: if p sends a message m to q, and neither p nor q crashes, then eventually q will receive m.

Time
Global clock
- For simplicity of presentation, it may be convenient to assume the presence of a global real-time clock, outside the control of processes.
- This can be used to provide a global ordering of steps in a distributed system.
In reality:
- Each process is associated with a local clock
- Local clocks may not report the perfect time
- Clock drift rate: the relative amount by which a computer clock differs from a perfect reference clock.
Synchronization is possible, but expensive:
- Atomic clocks
- GPS
- See: Google TrueTime API

Notes:
- GPS does not work inside buildings
- Atomic clocks: the cost is not justified
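To get a feeling for what a drift rate means in practice, the following sketch turns it into an accumulated error. The drift rate of 10^-5 (10 parts per million) is a hypothetical value, and the function names are made up; the factor 2 covers the worst case in which two clocks drift in opposite directions.

```python
def max_offset(drift_rate, elapsed_seconds):
    """Worst-case difference between one local clock and a perfect reference clock."""
    return drift_rate * elapsed_seconds

def max_skew_between_two(drift_rate, elapsed_seconds):
    """Worst-case difference between two local clocks with the same drift bound."""
    return 2 * drift_rate * elapsed_seconds

DAY = 24 * 60 * 60                          # seconds in a day
offset_per_day = max_offset(1e-5, DAY)      # roughly 0.864 seconds
skew_per_day = max_skew_between_two(1e-5, DAY)
```

Without resynchronization the error grows linearly with time, which is why synchronization has to be repeated periodically.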

Time measures associated with communication
- Latency: the delay between the start of message sending from one process and the beginning of its receipt by another. Possible causes:
  - the actual time for bit transmission (e.g., satellite link)
  - the delay for accessing the network, especially in case of congestion
  - the time taken by the operating system to handle the message, both at sender and receiver
- Bandwidth: total amount of information that can be transmitted over a communication channel in a given time.
- Jitter: variation in the time taken to deliver a series of messages. Mostly relevant for multimedia data.
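These three measures can be combined in a rough, first-order estimate. The model below (delivery time = latency + size / bandwidth, jitter as the standard deviation of observed delays) is a common simplification and an assumption of this sketch, not something stated in the slides; the numbers are hypothetical.

```python
from statistics import pstdev

def transfer_time(size_bits, latency_s, bandwidth_bps):
    """First-order estimate of the time needed to deliver a message."""
    return latency_s + size_bits / bandwidth_bps

def jitter(delays_s):
    """Jitter measured as the population standard deviation of the delays."""
    return pstdev(delays_s)

# 1 MB (8,000,000 bits) over a 100 Mbit/s channel with 50 ms latency
t = transfer_time(8_000_000, 0.05, 100_000_000)   # about 0.13 s
steady = jitter([0.010, 0.010, 0.010])            # identical delays: no jitter
bursty = jitter([0.010, 0.050, 0.012, 0.090])     # varying delays: positive jitter
```

For small messages the latency term dominates, while for large transfers the bandwidth term does.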

Asynchronous vs synchronous: Distributed Systems vs Time
- Distributed systems make it difficult to reason about time, and not only because of the lack of clock synchronization.
- It is also difficult to place time bounds on events and communication.
We may think about several different models:
- Asynchronous distributed systems
- Synchronous distributed systems
- Partially synchronous distributed systems

Notes:
- Asynchronous distributed systems: no assumptions can be made. Most of the problems cannot be solved.
- Synchronous distributed systems: precise assumptions are possible on computation, communication time and clocks. Not really realistic / difficult to implement.
- Partially synchronous distributed systems: some assumptions can be made, others not, OR assumptions can be made statistically, OR assumptions hold for arbitrarily long periods of time.

Asynchronous distributed system
- There are no bounds on the relative speed of process execution.
- There are no bounds on message transmission delays.
- There are no bounds on clock drift. OR, since we cannot count on their precision at all, there are no clocks.

Asynchronous distributed system: comments
- These are not assumptions! These are a lack of assumptions!
- The worst possible model: services as simple as the following are not possible:
  - failure detection
  - time-based coordination
- Advantages:
  - simple semantics
  - easier to port to more powerful models
  - more realistic: several sources of asynchrony are present in a large-scale network (like the Internet)

Synchronous Distributed Systems
- Synchronous computation: there is a known upper bound on the relative speed of process execution.
- Synchronous communication: there is a known upper bound on message transmission delays.
- Synchronous clocks: processes are equipped with local clocks. There is a known upper bound on the drift rates of local clocks with respect to a global real-time clock.

Synchronous Distributed Systems: comments
- The best possible model.
- Can be built, but not with standard hardware/software:
  - Synchronous Ethernet vs CSMA/CD Ethernet
  - Real-time OS vs normal OS
- Many interesting properties:
  - Timed failure detection (e.g., ping)
  - Coordination based on time (e.g., lease)
  - Worst-case performance analysis
  - Synchronized clocks
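Timed failure detection can be sketched as a simple deadline check. The function `must_have_crashed` and its parameters are hypothetical; the sketch assumes the synchronous model above, with a known bound on message delay in each direction and a known bound on the time the remote process needs to answer a ping.

```python
def must_have_crashed(ping_sent_at, reply_received_at, now, max_delay, max_processing):
    """True if no reply arrived by the latest time a correct process could have answered."""
    deadline = ping_sent_at + 2 * max_delay + max_processing   # request + processing + reply
    return reply_received_at is None and now > deadline

late_no_reply = must_have_crashed(0.0, None, 2.5, max_delay=1.0, max_processing=0.2)
still_waiting = must_have_crashed(0.0, None, 2.0, max_delay=1.0, max_processing=0.2)
answered = must_have_crashed(0.0, 1.5, 3.0, max_delay=1.0, max_processing=0.2)
```

Because the bounds are known and always respected, a missing reply after the deadline can only mean a crash. In an asynchronous system no such deadline exists, so the same check could suspect a process that is merely slow.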

Partial synchrony
- For most systems we know of, it is relatively easy to define physical time bounds that are respected most of the time.
- There are, however, periods where the timing assumptions do not hold.
Delays on processes:
- Machines may run out of memory, slowing down processes
- A typical case of no bound on the relative speeds of processes
Delays on messages:
- The network may be congested, and messages may be dropped.
- Re-transmission protocols can ensure reliability, but at the price of asynchrony
- Messages may be re-transmitted an arbitrary number of times.

Notes:
- In this sense, practical systems are partially synchronous.

How to express partial synchrony?
A possibility is the following: timing assumptions only hold eventually.
Theoretically, it means:
- There is a time after which the system is synchronous forever
- The system is initially asynchronous and only after a long time becomes synchronous
How to read it:
- The system is not always synchronous
- There is no known bound on the period in which it is asynchronous
- We expect that there are periods during which the system is synchronous
- Some of these periods are long enough to terminate protocol execution
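One common way to exploit "timing assumptions only hold eventually" is an adaptive timeout: whenever a reply arrives later than the current timeout, the suspicion was wrong and the timeout is doubled. This technique is not described in the slides; the sketch below, with its hypothetical function `run_detector` and made-up delays, only illustrates the idea for a process that is correct but sometimes slow.

```python
def run_detector(round_trip_delays, initial_timeout=1.0):
    """Return the final timeout and the rounds in which a correct process was wrongly suspected."""
    timeout = initial_timeout
    false_suspicions = []
    for i, delay in enumerate(round_trip_delays):
        if delay > timeout:          # reply came after the timeout: wrong suspicion
            false_suspicions.append(i)
            timeout *= 2             # be more patient from now on
    return timeout, false_suspicions

# An unstable start, then a long period in which delays never exceed 3.0.
# The detector does not know that bound in advance.
delays = [0.5, 9.0, 3.0, 7.5, 2.0] + [3.0] * 20
final_timeout, mistakes = run_detector(delays)
print(final_timeout, mistakes)  # 8.0 [1, 2, 3]
```

Once the system behaves synchronously, the timeout grows past the unknown bound after finitely many mistakes, and the detector makes no further wrong suspicions during that period.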

Reading Material
V. Hadzilacos and S. Toueg. A modular approach to fault-tolerant broadcasts and related problems. In S. Mullender, editor, Distributed Systems (2nd ed.). Addison-Wesley,


More information

Fundamental Interaction Model

Fundamental Interaction Model Fundamental Interaction Model Synchronous distributed system 8 time to execute each step of computation within a process has known lower and upper bounds 8 message delivery times are bounded to a known

More information

At Course Completion Prepares you as per certification requirements for AWS Developer Associate.

At Course Completion Prepares you as per certification requirements for AWS Developer Associate. [AWS-DAW]: AWS Cloud Developer Associate Workshop Length Delivery Method : 4 days : Instructor-led (Classroom) At Course Completion Prepares you as per certification requirements for AWS Developer Associate.

More information

LINUX, WINDOWS(MCSE),

LINUX, WINDOWS(MCSE), Virtualization Foundation Evolution of Virtualization Virtualization Basics Virtualization Types (Type1 & Type2) Virtualization Demo (VMware ESXi, Citrix Xenserver, Hyper-V, KVM) Cloud Computing Foundation

More information

Enroll Now to Take online Course Contact: Demo video By Chandra sir

Enroll Now to Take online Course   Contact: Demo video By Chandra sir Enroll Now to Take online Course www.vlrtraining.in/register-for-aws Contact:9059868766 9985269518 Demo video By Chandra sir www.youtube.com/watch?v=8pu1who2j_k Chandra sir Class 01 https://www.youtube.com/watch?v=fccgwstm-cc

More information

Chapter 8 Fault Tolerance

Chapter 8 Fault Tolerance DISTRIBUTED SYSTEMS Principles and Paradigms Second Edition ANDREW S. TANENBAUM MAARTEN VAN STEEN Chapter 8 Fault Tolerance 1 Fault Tolerance Basic Concepts Being fault tolerant is strongly related to

More information

Architecture of distributed systems

Architecture of distributed systems Prof. Dr. Claudia Müller-Birn Institute for Computer Science, Networked Information Systems Architecture of distributed systems Oct 25, 2011 Netzprogrammierung (Algorithmen und Programmierung V) Our topics

More information

Architecture of distributed systems

Architecture of distributed systems Prof. Dr. Claudia Müller-Birn Institute for Computer Science, Networked Information Systems Architecture of distributed systems Oct 25, 2011 Netzprogrammierung (Algorithmen und Programmierung V) Our topics

More information

CA464 Distributed Programming

CA464 Distributed Programming 1 / 25 CA464 Distributed Programming Lecturer: Martin Crane Office: L2.51 Phone: 8974 Email: martin.crane@computing.dcu.ie WWW: http://www.computing.dcu.ie/ mcrane Course Page: "/CA464NewUpdate Textbook

More information

Introduction to Distributed Systems

Introduction to Distributed Systems Introduction to Distributed Systems Other matters: review of the Bakery Algorithm: why can t we simply keep track of the last ticket taken and the next ticvket to be called? Ref: [Coulouris&al Ch 1, 2]

More information

System Models 2. Lecture - System Models 2 1. Areas for Discussion. Introduction. Introduction. System Models. The Modelling Process - General

System Models 2. Lecture - System Models 2 1. Areas for Discussion. Introduction. Introduction. System Models. The Modelling Process - General Areas for Discussion System Models 2 Joseph Spring School of Computer Science MCOM0083 - Distributed Systems and Security Lecture - System Models 2 1 Architectural Models Software Layers System Architecture

More information

TDP3471 Distributed and Parallel Computing

TDP3471 Distributed and Parallel Computing TDP3471 Distributed and Parallel Computing Lecture 1 Dr. Ian Chai ianchai@mmu.edu.my FIT Building: Room BR1024 Office : 03-8312-5379 Schedule for Dr. Ian (including consultation hours) available at http://pesona.mmu.edu.my/~ianchai/schedule.pdf

More information

Fault Tolerance. Basic Concepts

Fault Tolerance. Basic Concepts COP 6611 Advanced Operating System Fault Tolerance Chi Zhang czhang@cs.fiu.edu Dependability Includes Availability Run time / total time Basic Concepts Reliability The length of uninterrupted run time

More information

Distributed Systems Principles and Paradigms

Distributed Systems Principles and Paradigms Distributed Systems Principles and Paradigms Chapter 01 (version September 5, 2007) Maarten van Steen Vrije Universiteit Amsterdam, Faculty of Science Dept. Mathematics and Computer Science Room R4.20.

More information

Distributed Algorithms Benoît Garbinato

Distributed Algorithms Benoît Garbinato Distributed Algorithms Benoît Garbinato 1 Distributed systems networks distributed As long as there were no machines, programming was no problem networks distributed at all; when we had a few weak computers,

More information

ARCHITECTING WEB APPLICATIONS FOR THE CLOUD: DESIGN PRINCIPLES AND PRACTICAL GUIDANCE FOR AWS

ARCHITECTING WEB APPLICATIONS FOR THE CLOUD: DESIGN PRINCIPLES AND PRACTICAL GUIDANCE FOR AWS ARCHITECTING WEB APPLICATIONS FOR THE CLOUD: DESIGN PRINCIPLES AND PRACTICAL GUIDANCE FOR AWS Dr Adnene Guabtni, Senior Research Scientist, NICTA/Data61, CSIRO Adnene.Guabtni@csiro.au EC2 S3 ELB RDS AMI

More information

Cloud Computing 4/17/2016. Outline. Cloud Computing. Centralized versus Distributed Computing Some people argue that Cloud Computing. Cloud Computing.

Cloud Computing 4/17/2016. Outline. Cloud Computing. Centralized versus Distributed Computing Some people argue that Cloud Computing. Cloud Computing. Cloud Computing By: Muhammad Naseem Assistant Professor Department of Computer Engineering, Sir Syed University of Engineering & Technology, Web: http://sites.google.com/site/muhammadnaseem105 Email: mnaseem105@yahoo.com

More information

Fault Tolerance. Distributed Systems IT332

Fault Tolerance. Distributed Systems IT332 Fault Tolerance Distributed Systems IT332 2 Outline Introduction to fault tolerance Reliable Client Server Communication Distributed commit Failure recovery 3 Failures, Due to What? A system is said to

More information

Fault Tolerance. Distributed Systems. September 2002

Fault Tolerance. Distributed Systems. September 2002 Fault Tolerance Distributed Systems September 2002 Basics A component provides services to clients. To provide services, the component may require the services from other components a component may depend

More information

Coordination and Agreement

Coordination and Agreement Coordination and Agreement 12.1 Introduction 12.2 Distributed Mutual Exclusion 12.4 Multicast Communication 12.3 Elections 12.5 Consensus and Related Problems AIM: Coordination and/or Agreement Collection

More information

ActiveNET. #202, Manjeera Plaza, Opp: Aditya Park Inn, Ameerpetet HYD

ActiveNET. #202, Manjeera Plaza, Opp: Aditya Park Inn, Ameerpetet HYD ActiveNET #202, Manjeera Plaza, Opp: Aditya Park Inn, Ameerpetet HYD-500018 9848111288 activesurya@ @gmail.com wwww.activenetinformatics.com y Suryanaray yana By AWS Course Content 1. Introduction to Cloud

More information

Today: Fault Tolerance

Today: Fault Tolerance Today: Fault Tolerance Agreement in presence of faults Two army problem Byzantine generals problem Reliable communication Distributed commit Two phase commit Three phase commit Paxos Failure recovery Checkpointing

More information

DISTRIBUTED SYSTEMS Principles and Paradigms Second Edition ANDREW S. TANENBAUM MAARTEN VAN STEEN. Chapter 1. Introduction

DISTRIBUTED SYSTEMS Principles and Paradigms Second Edition ANDREW S. TANENBAUM MAARTEN VAN STEEN. Chapter 1. Introduction DISTRIBUTED SYSTEMS Principles and Paradigms Second Edition ANDREW S. TANENBAUM MAARTEN VAN STEEN Chapter 1 Introduction Definition of a Distributed System (1) A distributed system is: A collection of

More information

Certificate of Registration

Certificate of Registration Certificate of Registration THIS IS TO CERTIFY THAT 2001 8th Ave, Seattle, WA 98121 USA operates AWS using IaaS model (Amazon CloudFront, Amazon Elastic Block Store (EBS), Amazon Elastic Compute Cloud

More information

Designing Fault-Tolerant Applications

Designing Fault-Tolerant Applications Designing Fault-Tolerant Applications Miles Ward Enterprise Solutions Architect Building Fault-Tolerant Applications on AWS White paper published last year Sharing best practices We d like to hear your

More information

What is a distributed system?

What is a distributed system? CS 378 Intro to Distributed Computing Lorenzo Alvisi Harish Rajamani What is a distributed system? A distributed system is one in which the failure of a computer you didn t even know existed can render

More information

Cloud Computing /AWS Course Content

Cloud Computing /AWS Course Content Cloud Computing /AWS Course Content 1. Amazon VPC What is Amazon VPC? How to Get Started with Amazon VPC Create New VPC Launch an instance (Server) to use this VPC Security in Your VPC Networking in Your

More information

C 1. Today s Question. CSE 486/586 Distributed Systems Failure Detectors. Two Different System Models. Failure Model. Why, What, and How

C 1. Today s Question. CSE 486/586 Distributed Systems Failure Detectors. Two Different System Models. Failure Model. Why, What, and How CSE 486/586 Distributed Systems Failure Detectors Today s Question I have a feeling that something went wrong Steve Ko Computer Sciences and Engineering University at Buffalo zzz You ll learn new terminologies,

More information

Dependable Computer Systems

Dependable Computer Systems Dependable Computer Systems Part 6b: System Aspects Contents Synchronous vs. Asynchronous Systems Consensus Fault-tolerance by self-stabilization Examples Time-Triggered Ethernet (FT Clock Synchronization)

More information

Distributed Systems. 09. State Machine Replication & Virtual Synchrony. Paul Krzyzanowski. Rutgers University. Fall Paul Krzyzanowski

Distributed Systems. 09. State Machine Replication & Virtual Synchrony. Paul Krzyzanowski. Rutgers University. Fall Paul Krzyzanowski Distributed Systems 09. State Machine Replication & Virtual Synchrony Paul Krzyzanowski Rutgers University Fall 2016 1 State machine replication 2 State machine replication We want high scalability and

More information

Chapter 1: Introduction 1/29

Chapter 1: Introduction 1/29 Chapter 1: Introduction 1/29 What is a Distributed System? A distributed system is a collection of independent computers that appears to its users as a single coherent system. 2/29 Characteristics of a

More information

Cloud Computing. DB Special Topics Lecture (10/5/2012) Kyle Hale Maciej Swiech

Cloud Computing. DB Special Topics Lecture (10/5/2012) Kyle Hale Maciej Swiech Cloud Computing DB Special Topics Lecture (10/5/2012) Kyle Hale Maciej Swiech Managing servers isn t for everyone What are some prohibitive issues? (we touched on these last time) Cost (initial/operational)

More information

Amazon Web Services (AWS) Training Course Content

Amazon Web Services (AWS) Training Course Content Amazon Web Services (AWS) Training Course Content SECTION 1: CLOUD COMPUTING INTRODUCTION History of Cloud Computing Concept of Client Server Computing Distributed Computing and it s Challenges What is

More information

Coordination and Agreement

Coordination and Agreement Coordination and Agreement 1 Introduction 2 Distributed Mutual Exclusion 3 Multicast Communication 4 Elections 5 Consensus and Related Problems AIM: Coordination and/or Agreement Collection of algorithms

More information

Distributed Systems Fault Tolerance

Distributed Systems Fault Tolerance Distributed Systems Fault Tolerance [] Fault Tolerance. Basic concepts - terminology. Process resilience groups and failure masking 3. Reliable communication reliable client-server communication reliable

More information

Distributed Systems. Characteristics of Distributed Systems. Lecture Notes 1 Basic Concepts. Operating Systems. Anand Tripathi

Distributed Systems. Characteristics of Distributed Systems. Lecture Notes 1 Basic Concepts. Operating Systems. Anand Tripathi 1 Lecture Notes 1 Basic Concepts Anand Tripathi CSci 8980 Operating Systems Anand Tripathi CSci 8980 1 Distributed Systems A set of computers (hosts or nodes) connected through a communication network.

More information

Distributed Systems. Characteristics of Distributed Systems. Characteristics of Distributed Systems. Goals in Distributed System Designs

Distributed Systems. Characteristics of Distributed Systems. Characteristics of Distributed Systems. Goals in Distributed System Designs 1 Anand Tripathi CSci 8980 Operating Systems Lecture Notes 1 Basic Concepts Distributed Systems A set of computers (hosts or nodes) connected through a communication network. Nodes may have different speeds

More information

Failure Tolerance. Distributed Systems Santa Clara University

Failure Tolerance. Distributed Systems Santa Clara University Failure Tolerance Distributed Systems Santa Clara University Distributed Checkpointing Distributed Checkpointing Capture the global state of a distributed system Chandy and Lamport: Distributed snapshot

More information

Today: Fault Tolerance. Failure Masking by Redundancy

Today: Fault Tolerance. Failure Masking by Redundancy Today: Fault Tolerance Agreement in presence of faults Two army problem Byzantine generals problem Reliable communication Distributed commit Two phase commit Three phase commit Failure recovery Checkpointing

More information

Distributed Systems COMP 212. Lecture 19 Othon Michail

Distributed Systems COMP 212. Lecture 19 Othon Michail Distributed Systems COMP 212 Lecture 19 Othon Michail Fault Tolerance 2/31 What is a Distributed System? 3/31 Distributed vs Single-machine Systems A key difference: partial failures One component fails

More information

PrepAwayExam. High-efficient Exam Materials are the best high pass-rate Exam Dumps

PrepAwayExam.   High-efficient Exam Materials are the best high pass-rate Exam Dumps PrepAwayExam http://www.prepawayexam.com/ High-efficient Exam Materials are the best high pass-rate Exam Dumps Exam : SAA-C01 Title : AWS Certified Solutions Architect - Associate (Released February 2018)

More information

Providing Real-Time and Fault Tolerance for CORBA Applications

Providing Real-Time and Fault Tolerance for CORBA Applications Providing Real-Time and Tolerance for CORBA Applications Priya Narasimhan Assistant Professor of ECE and CS University Pittsburgh, PA 15213-3890 Sponsored in part by the CMU-NASA High Dependability Computing

More information

Amazon AWS-Solution-Architect-Associate Exam

Amazon AWS-Solution-Architect-Associate Exam Volume: 858 Questions Question: 1 You are trying to launch an EC2 instance, however the instance seems to go into a terminated status immediately. What would probably not be a reason that this is happening?

More information

Chapter 1: Distributed Information Systems

Chapter 1: Distributed Information Systems Chapter 1: Distributed Information Systems Contents - Chapter 1 Design of an information system Layers and tiers Bottom up design Top down design Architecture of an information system One tier Two tier

More information

416 Distributed Systems. Networks review; Day 1 of 2 Jan 5 + 8, 2018

416 Distributed Systems. Networks review; Day 1 of 2 Jan 5 + 8, 2018 416 Distributed Systems Networks review; Day 1 of 2 Jan 5 + 8, 2018 1 Distributed Systems vs. Networks Low level (c/go) Run forever Support others Adversarial environment Distributed & concurrent Resources

More information

AWS Administration. Suggested Pre-requisites Basic IT Knowledge

AWS Administration. Suggested Pre-requisites Basic IT Knowledge Course Description Amazon Web Services Administration (AWS Administration) course starts your Cloud Journey. If you are planning to learn Cloud Computing and Amazon Web Services in particular, then this

More information

Introduction to Distributed Systems. INF5040/9040 Autumn 2018 Lecturer: Eli Gjørven (ifi/uio)

Introduction to Distributed Systems. INF5040/9040 Autumn 2018 Lecturer: Eli Gjørven (ifi/uio) Introduction to Distributed Systems INF5040/9040 Autumn 2018 Lecturer: Eli Gjørven (ifi/uio) August 28, 2018 Outline Definition of a distributed system Goals of a distributed system Implications of distributed

More information

Distributed Systems: Models and Design

Distributed Systems: Models and Design Distributed Systems: Models and Design Nicola Dragoni Embedded Systems Engineering DTU Informatics 1. Architectural Models 2. Interaction Model 3. Design Challenges 4. Case Study: Design of a Client-Server

More information

Today: Fault Tolerance. Fault Tolerance

Today: Fault Tolerance. Fault Tolerance Today: Fault Tolerance Agreement in presence of faults Two army problem Byzantine generals problem Reliable communication Distributed commit Two phase commit Three phase commit Paxos Failure recovery Checkpointing

More information

CSE 5306 Distributed Systems. Course Introduction

CSE 5306 Distributed Systems. Course Introduction CSE 5306 Distributed Systems Course Introduction 1 Instructor and TA Dr. Donggang Liu @ CSE Web: http://ranger.uta.edu/~dliu Email: dliu@uta.edu Phone: 817-2720741 Office: ERB 555 Office hours: Tus/Ths

More information

Failure Models. Fault Tolerance. Failure Masking by Redundancy. Agreement in Faulty Systems

Failure Models. Fault Tolerance. Failure Masking by Redundancy. Agreement in Faulty Systems Fault Tolerance Fault cause of an error that might lead to failure; could be transient, intermittent, or permanent Fault tolerance a system can provide its services even in the presence of faults Requirements

More information

Failures, Elections, and Raft

Failures, Elections, and Raft Failures, Elections, and Raft CS 8 XI Copyright 06 Thomas W. Doeppner, Rodrigo Fonseca. All rights reserved. Distributed Banking SFO add interest based on current balance PVD deposit $000 CS 8 XI Copyright

More information

Gustavo Alonso, ETH Zürich. Web services: Concepts, Architectures and Applications - Chapter 1 2

Gustavo Alonso, ETH Zürich. Web services: Concepts, Architectures and Applications - Chapter 1 2 Chapter 1: Distributed Information Systems Gustavo Alonso Computer Science Department Swiss Federal Institute of Technology (ETHZ) alonso@inf.ethz.ch http://www.iks.inf.ethz.ch/ Contents - Chapter 1 Design

More information

Distributed Systems Question Bank UNIT 1 Chapter 1 1. Define distributed systems. What are the significant issues of the distributed systems?

Distributed Systems Question Bank UNIT 1 Chapter 1 1. Define distributed systems. What are the significant issues of the distributed systems? UNIT 1 Chapter 1 1. Define distributed systems. What are the significant issues of the distributed systems? 2. What are different application domains of distributed systems? Explain. 3. Discuss the different

More information

Introduction to Distributed Systems Seif Haridi

Introduction to Distributed Systems Seif Haridi Introduction to Distributed Systems Seif Haridi haridi@kth.se What is a distributed system? A set of nodes, connected by a network, which appear to its users as a single coherent system p1 p2. pn send

More information

Training on Amazon AWS Cloud Computing. Course Content

Training on Amazon AWS Cloud Computing. Course Content Training on Amazon AWS Cloud Computing Course Content 15 Amazon Web Services (AWS) Cloud Computing 1) Introduction to cloud computing Introduction to Cloud Computing Why Cloud Computing? Benefits of Cloud

More information

CprE Fault Tolerance. Dr. Yong Guan. Department of Electrical and Computer Engineering & Information Assurance Center Iowa State University

CprE Fault Tolerance. Dr. Yong Guan. Department of Electrical and Computer Engineering & Information Assurance Center Iowa State University Fault Tolerance Dr. Yong Guan Department of Electrical and Computer Engineering & Information Assurance Center Iowa State University Outline for Today s Talk Basic Concepts Process Resilience Reliable

More information

Priya Narasimhan. Assistant Professor of ECE and CS Carnegie Mellon University Pittsburgh, PA

Priya Narasimhan. Assistant Professor of ECE and CS Carnegie Mellon University Pittsburgh, PA OMG Real-Time and Distributed Object Computing Workshop, July 2002, Arlington, VA Providing Real-Time and Fault Tolerance for CORBA Applications Priya Narasimhan Assistant Professor of ECE and CS Carnegie

More information

Ruminations on Domain-Based Reliable Broadcast

Ruminations on Domain-Based Reliable Broadcast Ruminations on Domain-Based Reliable Broadcast Svend Frølund Fernando Pedone Hewlett-Packard Laboratories Palo Alto, CA 94304, USA Abstract A distributed system is no longer confined to a single administrative

More information

Distributed Systems Exam 1 Review Paul Krzyzanowski. Rutgers University. Fall 2016

Distributed Systems Exam 1 Review Paul Krzyzanowski. Rutgers University. Fall 2016 Distributed Systems 2015 Exam 1 Review Paul Krzyzanowski Rutgers University Fall 2016 1 Question 1 Why did the use of reference counting for remote objects prove to be impractical? Explain. It s not fault

More information

Distributed Architectures & Microservices. CS 475, Spring 2018 Concurrent & Distributed Systems

Distributed Architectures & Microservices. CS 475, Spring 2018 Concurrent & Distributed Systems Distributed Architectures & Microservices CS 475, Spring 2018 Concurrent & Distributed Systems GFS Architecture GFS Summary Limitations: Master is a huge bottleneck Recovery of master is slow Lots of success

More information

Initial Assumptions. Modern Distributed Computing. Network Topology. Initial Input

Initial Assumptions. Modern Distributed Computing. Network Topology. Initial Input Initial Assumptions Modern Distributed Computing Theory and Applications Ioannis Chatzigiannakis Sapienza University of Rome Lecture 4 Tuesday, March 6, 03 Exercises correspond to problems studied during

More information

Intuitive distributed algorithms. with F#

Intuitive distributed algorithms. with F# Intuitive distributed algorithms with F# Natallia Dzenisenka Alena Hall @nata_dzen @lenadroid A tour of a variety of intuitivedistributed algorithms used in practical distributed systems. and how to prototype

More information

Introduction to Distributed Systems

Introduction to Distributed Systems Introduction to Distributed Systems Minsoo Ryu Department of Computer Science and Engineering 2 Definition A distributed system is a collection of independent computers that appears to its users as a single

More information