Distributed Systems COMP 212. Revision 2 Othon Michail


Synchronisation

How would Lamport's algorithm synchronise the clocks in the following scenario?
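The scenario figure is not reproduced here, so the following is only a minimal sketch of the standard Lamport logical-clock update rule. The class name `LamportClock` and the clock values 60 and 56 are hypothetical and chosen purely for illustration.

```python
class LamportClock:
    """Logical clock following Lamport's rules (illustrative sketch)."""

    def __init__(self, start=0):
        self.time = start

    def send(self):
        # A send is an event: advance the clock and stamp the message with it.
        self.time += 1
        return self.time

    def receive(self, message_timestamp):
        # The receive must be ordered after the send, so jump past the
        # message timestamp if the local clock is behind it.
        self.time = max(self.time, message_timestamp) + 1
        return self.time


# Hypothetical values: the sender's clock is ahead of the receiver's.
sender = LamportClock(start=60)
receiver = LamportClock(start=56)
stamp = sender.send()                  # message carries timestamp 61
after_receive = receiver.receive(stamp)  # receiver moves forward to 62
```

The receiver's clock only ever moves forward, which is what keeps "happens-before" consistent with the timestamps.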

Imagine that each machine in a Distributed System has its own extremely accurate internal clock and all clocks are identical. To achieve clock synchronisation, we synchronise all clocks once, at the start. Is this a sufficient solution, and why?

Answer: No. Even if the clocks on all computers in a DS are set to the same time, every physical clock drifts at a slightly different rate, so the resulting clock skew grows over time and the clocks will eventually differ significantly unless corrections are applied. This holds for all types of clocks.

Imagine that we are using Cristian's algorithm to synchronise clocks in a Distributed System.
1. Describe Cristian's algorithm.
2. If the time server B responds to a client A with a time TB less than the current time on A's clock, is it OK for A to set its clock immediately to TB, and why?

Answer:
1. See the description of Cristian's algorithm that follows.
2. No. Time should never go backwards, as this could lead to serious local inconsistencies (e.g. in the file system, new versions of files would get smaller timestamps than old versions). Instead, the change should be introduced gradually, by slowing down A's local clock until B's clock catches up with it.

Clock Sync. Algorithm: Cristian's
1. Every computer periodically asks the time server for the current time
2. The server responds ASAP with the current time C_UTC
3. The client sets its clock to C_UTC

Problems
Major problem: if the time from the time server is less than the client's time, setting the clock directly would make time run backwards on the client, which must never happen
  Introduce changes gradually
Minor problem: the network request/response introduces a delay (latency)
  Best estimate of the one-way delay: (T1 - T0)/2, where T0 is when the client sent the request and T1 is when it received the reply
  If the interrupt handling time, I, is known: (T1 - T0 - I)/2
  Use a series of measurements
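The latency correction above can be sketched as follows. This is a minimal illustration, not a full implementation: the function names are hypothetical, times are plain numbers in milliseconds, and choosing the sample with the shortest round trip is one common way to "use a series of measurements" (an assumption, since the slides do not say how the series is combined).

```python
def cristian_estimate(t0, t1, server_time, interrupt_time=0.0):
    """Estimate the current time on the client when the reply arrives.

    t0: client clock when the request was sent
    t1: client clock when the reply was received
    server_time: C_UTC as reported by the time server
    interrupt_time: server-side handling time I, if known
    """
    one_way_delay = (t1 - t0 - interrupt_time) / 2
    return server_time + one_way_delay


def estimate_from_series(samples):
    """samples: list of (t0, t1, server_time) tuples.

    Assumption: trust the sample with the smallest round-trip time,
    since it leaves the least room for delay asymmetry.
    """
    t0, t1, server_time = min(samples, key=lambda s: s[1] - s[0])
    return cristian_estimate(t0, t1, server_time)


# Hypothetical measurements in milliseconds.
single = cristian_estimate(1000, 1020, 5000)          # delay 10
with_interrupt = cristian_estimate(1000, 1020, 5000, interrupt_time=4)  # delay 8
series = estimate_from_series([(0, 40, 7000), (100, 110, 7100)])        # uses 2nd sample
```

If the estimate is behind the client's current clock, the client should slow its clock gradually rather than assign the estimate directly.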

Fill in all the missing messages transmitted by the Berkeley clock synchronisation algorithm in this setting, and give the new values of the 3 clocks after synchronisation.

Berkeley Algorithm
An algorithm for internal synchronisation of a group of computers
A master polls to collect clock values from the others (slaves)
The master uses round-trip times to estimate the slaves' clock values
It takes an average
It sends the required adjustment to each slave (better than sending the time, which depends on the round-trip time)
If the master fails, a new master can be elected to take over

The Berkeley Clock Sync. Algorithm
Clocks that are running fast are slowed down
Clocks that are running slow jump forward
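The averaging step can be sketched as below. This is a simplified illustration that assumes the master already holds round-trip-corrected clock readings; the function name and the example readings (given as minutes past midnight) are hypothetical.

```python
def berkeley_adjustments(master_time, slave_times):
    """Return (average, adjustments) for the master followed by each slave.

    Each adjustment is the amount a machine should add to its own clock.
    A negative adjustment means the clock is fast and should be slowed
    down gradually rather than set back.
    """
    clocks = [master_time] + list(slave_times)
    average = sum(clocks) / len(clocks)
    adjustments = [average - clock for clock in clocks]
    return average, adjustments


# Hypothetical readings: master at 3:00, slaves at 3:25 and 2:50.
average, adjustments = berkeley_adjustments(180, [205, 170])
# average is 185 (3:05); adjustments are +5, -20 and +15 minutes
```

Sending adjustments instead of the absolute time means the message delay does not distort the value each slave applies.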

Transactions

What are the 2 main functionalities that transactions offer?

Transactions
1. Protect a shared resource against simultaneous access by concurrent processes
   This can also be achieved by mutual exclusion algorithms
2. Allow a process to access and modify multiple data items in a single atomic operation
   Benefit: when half-success is not acceptable, everything can be restored as if the transaction had never occurred

Explain the ACID (standing for Atomic, Consistent, Isolated, and Durable) characteristics that must be satisfied by a transaction

ACID
The four key transaction characteristics. Transactions are:
Atomic: the transaction is considered to be one indivisible thing, even though it may be made up of many different parts; either all of it happens or none of it does
Consistent: invariants that held before the transaction must also hold after its successful execution
Isolated: if multiple transactions run at the same time, they must not interfere with each other. To the system, it should look as if the two (or more) transactions were executed sequentially (i.e., they are serializable)
Durable: once a transaction commits, its changes are permanent

Explain what we mean when we say that a transaction is nested. Mention a possible disadvantage of this type of transaction.

Nested transactions: a main (parent) transaction spawns child subtransactions to do the real work
Disadvantage: problems arise when a subtransaction commits and the parent then aborts the main transaction. Things get messy but are still manageable. Which characteristic of transactions is violated in this case?

Explain what a private workspace and a write-ahead log are and why they are useful for transactions.

Private workspace: until the transaction either commits or aborts, all of its reads and writes go to the private workspace. The original data remain available to other processes during the transaction.
Write-ahead log: files are modified in place, but a record is written to a log before each modification. The file is changed only after the log record has been written successfully. If the transaction aborts, the log can be used to roll back to the original state.
Both are useful techniques for undoing changes in case of an abort
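The write-ahead log idea can be illustrated with a minimal in-memory sketch. The class `WriteAheadLog` and its methods are hypothetical names; a real log would be forced to stable storage before the data is touched, which this sketch does not attempt.

```python
class WriteAheadLog:
    """Modify data in place, but log the old value first (sketch only)."""

    def __init__(self, data):
        self.data = data   # the "file", modified in place
        self.log = []      # records of (key, old_value, key_existed)

    def write(self, key, value):
        # Step 1: record how to undo the change.
        self.log.append((key, self.data.get(key), key in self.data))
        # Step 2: only then change the data itself.
        self.data[key] = value

    def commit(self):
        # Changes become permanent; undo records are no longer needed.
        self.log.clear()

    def abort(self):
        # Undo in reverse order to restore the original state.
        while self.log:
            key, old_value, existed = self.log.pop()
            if existed:
                self.data[key] = old_value
            else:
                self.data.pop(key, None)


account = {"x": 1}
txn = WriteAheadLog(account)
txn.write("x", 2)
txn.write("y", 5)
txn.abort()   # account is back to {"x": 1}
```

A private workspace would instead keep the new values in a separate copy and install them only on commit, so an abort simply discards the copy.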

Mutual Exclusion

Using an example, demonstrate how a deadlock can arise in transaction processing

A transaction T1 acquires a lock on an object X, while a different transaction T2 acquires a lock on a different object Y. T1 then waits for T2 to release the lock on Y, while T2 waits for T1 to release the lock on X. Neither can proceed, so the result is a deadlock.
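One common way to spot such a situation is to look for a cycle in a wait-for graph. The slides do not describe this technique, so the sketch below is an illustrative assumption; the function name and the graph encoding are hypothetical.

```python
def has_deadlock(wait_for):
    """Return True if the wait-for graph contains a cycle.

    wait_for maps each transaction to the transactions it is waiting on.
    """
    visiting, done = set(), set()

    def visit(txn):
        if txn in done:
            return False
        if txn in visiting:
            return True            # reached a transaction already on the path
        visiting.add(txn)
        if any(visit(other) for other in wait_for.get(txn, ())):
            return True
        visiting.discard(txn)
        done.add(txn)
        return False

    return any(visit(txn) for txn in list(wait_for))


# T1 holds X and waits for Y (held by T2); T2 holds Y and waits for X.
deadlocked = has_deadlock({"T1": ["T2"], "T2": ["T1"]})
# T1 waits for T2, but T2 is not waiting for anyone.
not_deadlocked = has_deadlock({"T1": ["T2"], "T2": []})
```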

Explain the difference between centralised and distributed mutual exclusion. Give an example execution of the centralised mutual exclusion algorithm.

DS Mutual Exclusion: Techniques
Two major approaches:
Centralised: a single coordinator controls whether a process can enter a critical region
Distributed: the group confers to determine whether or not it is safe for a process to enter a critical region

Centralised Algorithm
a) Process 1 asks the coordinator for permission to enter a critical region. Permission is granted.
b) Process 2 asks for permission to enter the same region. No reply.
c) When Process 1 quits the critical region, it tells the coordinator, which then replies to Process 2
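The three steps of the centralised algorithm can be traced with a small single-process sketch. The `Coordinator` class is hypothetical, and messages are modelled as plain method calls rather than network traffic.

```python
from collections import deque


class Coordinator:
    """Grants access to one critical region at a time (sketch only)."""

    def __init__(self):
        self.holder = None       # process currently in the critical region
        self.waiting = deque()   # processes that have not been answered yet

    def request(self, pid):
        if self.holder is None:
            self.holder = pid
            return True          # permission granted immediately
        self.waiting.append(pid)
        return False             # no reply yet: the requester stays blocked

    def release(self, pid):
        if pid != self.holder:
            raise ValueError("only the current holder can release")
        # Grant the region to the longest-waiting process, if any.
        self.holder = self.waiting.popleft() if self.waiting else None
        return self.holder


coordinator = Coordinator()
step_a = coordinator.request(1)   # a) granted
step_b = coordinator.request(2)   # b) queued, no reply
step_c = coordinator.release(1)   # c) coordinator now replies to process 2
```

The queue makes the scheme fair, but the coordinator is a single point of failure, which is the usual argument for the distributed approach.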

Explain all the terms that appear in the following figure. Which of these sections are handled by a mutual exclusion algorithm?

General Structure of Solutions
Programs are partitioned into the following sections:
Entry (trying): the code executed in preparation for entering the critical section
Critical: the code to be protected from concurrent execution
Exit: the code executed on leaving the critical section
Remainder: the rest of the code
A mutual exclusion algorithm consists of the code for the entry and exit sections
It should work no matter what the other two sections implement

Replication

Why is it important to replicate data in a Distributed System?

Why Replicate Data?
Enhance reliability
  As long as at least one server has not crashed, the service can be supplied
  Protection against corrupted data (the majority of the copies are expected to be correct)
Improve performance
  Increasing the number of clients would overload a single server
  e.g., several web servers can have the same DNS name, and the servers are selected in turn to share the load
  Placing copies of data close to the processes that use them

More on Replication
Replicas allow remote sites to continue working in the event of local failures
It is possible to protect against data corruption
Replicas allow data to reside close to where it is used
This directly supports the distributed systems goal of enhanced scalability
Even a large number of replicated local systems can improve performance: think of clusters

Give an example of inconsistency of replicated data that can be severe

What Can Go Wrong
Updating a replicated database: Update 1 adds 100 to an account, Update 2 calculates and adds 1% interest to the same account
Due to network delays, the updates may arrive in a different order at different replicas!
Inconsistent state: the same account has two different balances!
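A short worked example makes the divergence concrete. The starting balance of 1000 is a hypothetical value, and integer arithmetic is used only to keep the numbers exact.

```python
def add_deposit(balance):
    # Update 1: add 100 to the account.
    return balance + 100


def add_interest(balance):
    # Update 2: add 1% interest (integer division keeps the example exact).
    return balance + balance // 100


start = 1000  # hypothetical starting balance

# Replica A applies the deposit first, then the interest.
replica_a = add_interest(add_deposit(start))   # 1100 -> 1111

# Replica B receives the same updates in the opposite order.
replica_b = add_deposit(add_interest(start))   # 1010 -> 1110
```

Both replicas applied the same two updates, yet they disagree by 1 because the two operations do not commute.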

Explain what we mean by sequential consistency

Example: Sequential Consistency
All processes see the same interleaving of operations, regardless of what that interleaving is
a) A sequentially consistent data store: the first write appears to have occurred after the second on all replicas
b) A data store that is not sequentially consistent: different processes see the writes in different orders, and this is NOT allowed

Describe the push-based and pull-based approaches to update propagation in distributed replicas, and mention an example of a hybrid approach

Push vs. Pull Protocols
1. Push-based/Server-based approach: updates are sent automatically by the server; the client does not request them
   Useful when a high degree of consistency is needed
   Often used between permanent and server-initiated replicas
2. Pull-based/Client-based approach: used by client caches (e.g., browsers); updates are requested by the client from the server
   No request, no update!
A hybrid approach: leases

Fault Tolerance

Name three different types of faults (in terms of a fault's frequency) and, for each one of them, mention at least one practical example

Main Types of Faults
Transient fault: occurs once and then disappears
  A bird flying through the beam of a microwave transmitter
  Some bits might get lost, but a retransmission will probably work
Intermittent fault: may reappear again and again
  A loose contact on a connector
Permanent fault: continues to exist until the faulty component is replaced
  Burnt-out chips, software bugs, disk head crashes

What is a crash failure and what is a Byzantine failure? Which one of the two is considered harder to deal with?

Crash failure: a server halts, but works correctly until it halts
Byzantine failure: a server may produce arbitrary (even malicious) responses at arbitrary times
Byzantine failures are in general harder to deal with because of their unpredictable behaviour

Give the three main types of redundancy and explain each one of them

Failure Masking by Redundancy
Strategy: if we cannot avoid failures, then it is better to hide them from other processes and/or users by using redundancy
Three main types:
1. Information Redundancy
   Add extra bits to allow for error detection/recovery
   e.g., parity bits, Hamming codes
2. Time Redundancy
   Perform an operation and, if required, perform it again
   Think about how transactions work (BEGIN/END/COMMIT/ABORT)
   Well suited for transient and intermittent faults
3. Physical Redundancy
   Add extra (duplicate) hardware and/or software components to the system
   Think of replication
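Information redundancy in its simplest form is a single even-parity bit, sketched below. The function names and the data word are hypothetical; a single parity bit can detect an odd number of flipped bits but cannot correct them, which is what codes such as Hamming codes add.

```python
def add_parity(bits):
    """Append an even-parity bit so the total number of 1s is even."""
    return bits + [sum(bits) % 2]


def parity_ok(word):
    """Return True if the word still contains an even number of 1s."""
    return sum(word) % 2 == 0


sent = add_parity([1, 0, 1, 1])     # three 1s, so the parity bit is 1
intact = parity_ok(sent)            # no error detected

corrupted = list(sent)
corrupted[2] ^= 1                   # a transient fault flips one bit
detected = not parity_ok(corrupted)  # the receiver notices the error
```

Once the error is detected, time redundancy takes over: the receiver can ask for a retransmission.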

Explain the difference between the forward and backward recovery strategies from failures and mention some of their disadvantages

1. Backward Recovery: return the system to some previous correct state (using checkpoints), then continue executing
   Checkpointing can be very expensive, especially when errors are very rare
   No guarantee that we won't meet the same error again
   Some operations cannot be rolled back
2. Forward Recovery: bring the system into a new correct state, from which it can then continue to execute
   All potential errors need to be accounted for up-front, so that the system knows how to fix them

Security

What is the main difference between symmetric and asymmetric cryptosystems? Which one of the two is also called public-key, and why?

In a symmetric cryptosystem, the sender and the receiver use the same key for encryption and decryption, while in an asymmetric cryptosystem they use different keys
The asymmetric one, because one of the two keys can be made public

Assume that a polynomial-time (i.e., efficient) algorithm was found for computing the prime factors of integers. Which encryption algorithm would no longer be safe to use in this case?

The RSA algorithm, because it constructs its keys from two large prime numbers whose product is made public, relying on the fact that no efficient method is known for finding the prime factors of large numbers. Anyone who could factor the public modulus could recompute the private key.
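A toy example shows why factoring breaks RSA. The primes 61 and 53, the exponent 17 and the message 65 are small illustrative values only; real keys use primes hundreds of digits long, for which the trial division below is hopeless. The sketch assumes Python 3.8 or later for the modular inverse via `pow`.

```python
def factor(n):
    """Find two factors of n by trial division (feasible only for tiny n)."""
    p = 2
    while p * p <= n:
        if n % p == 0:
            return p, n // p
        p += 1
    raise ValueError("n has no non-trivial factors")


def private_exponent(e, p, q):
    """Recompute the RSA private exponent once the primes are known."""
    return pow(e, -1, (p - 1) * (q - 1))


# Public key (n, e) published by the key owner.
n, e = 61 * 53, 17
ciphertext = pow(65, e, n)            # anyone can encrypt the message 65

# An attacker who can factor n recovers the private key and the message.
p, q = factor(n)
d = private_exponent(e, p, q)
recovered = pow(ciphertext, d, n)
```

The attacker used nothing but the public key and the ciphertext, so the security of the scheme rests entirely on factoring being hard.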

Final Exam Structure
Same as the class test, with more subquestions
2 Sections, A and B
Section A: answer ALL questions
  Questions A1 and A2 (30% each)
  7 subquestions each
Section B: answer ONE of the TWO questions
  Questions B1 and B2 (40% each)
  2 subquestions/problems each