SYNCHRONIZATION AND PROCESSES


UNIT III
Synchronization: Clock synchronization - Mutual exclusion - Election algorithms - Atomic transactions - Deadlocks
Processes: Threads - System models - Processor allocation - Scheduling - Fault tolerance - Real-time distributed systems

CLOCK SYNCHRONIZATION

Contents
o Need for clock synchronization
o Lamport's algorithm (logical clocks)
o Physical clock synchronization algorithms
o Cristian's algorithm (passive)
o Berkeley algorithm (active)

Need for clock synchronization
When each machine has its own clock, an event that occurred after another event may nevertheless be assigned an earlier time.
No common clock or other precise global time source exists in a distributed system.
This can lead to unexpected behavior and various system failures.
o Example: building output with the make tool, which relies on the timestamps of source files and object files.

Logical clocks (Lamport)
Internal consistency of the clocks matters, not their closeness to real time.
For many DS algorithms, associating an event with an absolute real time is not essential; we only need an unambiguous order of events.

Physical clocks (Cristian - passive, Berkeley - active)
The clocks must agree and must not deviate from real time by more than a certain amount.

Lamport's Algorithm
Synchronizes logical clocks using the happens-before relation:
o a -> b, read "a happens before b", means all processes agree that event a happens before event b
o transitive relation: if a -> b and b -> c, then a -> c
Each message carries the sending time as per the sender's clock.
If the receiver's clock shows a value prior to the sent time, it fast-forwards its clock to one more than the sending time.

MTech CSE (PT, ) SRM, Ramapuram 1 hcr:innovationcse@gg
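The two clock rules above can be sketched as a small Python class (the class and method names are illustrative, not from the original notes):

```python
class LamportClock:
    """Minimal sketch of Lamport's logical clock rules."""
    def __init__(self):
        self.time = 0

    def tick(self):
        # Between every two events, the clock must tick at least once.
        self.time += 1
        return self.time

    def send(self):
        # Each message carries the sending time as per the sender's clock.
        return self.tick()

    def receive(self, sent_time):
        # If the receiver's clock is behind the sent time, fast-forward it
        # to one more than the sending time.
        self.time = max(self.time, sent_time) + 1
        return self.time

a, b = LamportClock(), LamportClock()
t = a.send()          # a's clock becomes 1
print(b.receive(t))   # b's clock jumps to 2
```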

Between every two events, the clock must tick at least once.
o Example: 3 processes, each with its own clock ticking 6, 8, and 10 times per interval
o 56(1) changed to 61(1), 69(1), 77(1), 85(1)
o 54(0) changed to 70(0), 76(0)

Physical Clock Synchronization Algorithms
WWV -> call sign of the radio time broadcast from the National Institute of Standards and Technology (NIST).
Each machine is assumed to have a timer that causes an interrupt H times a second.
When the timer goes off, the interrupt handler adds one to a software clock.
UTC time is t; ideally dC/dt should be 1 for a perfect clock.

Cristian's Algorithm
One machine acts as a WWV receiver (time server), and the goal is to have all the other machines keep synchronized with it.
Periodically, each machine sends a message to the time server asking for the current time.
The time server responds as fast as it can with a message containing its current time.
The time server is passive.
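One round of Cristian's algorithm can be sketched as follows. The standard refinement of estimating the one-way delay as half the measured round trip is an addition not spelled out in the notes, and `ask_server` is a placeholder for the actual request to the time server:

```python
import time

def cristian_sync(ask_server):
    """One round of Cristian's algorithm: ask the passive time server
    for its clock and compensate for the message delay."""
    t0 = time.monotonic()
    server_time = ask_server()   # request/reply to the time server
    t1 = time.monotonic()
    # Assume the reply travelled for roughly half the round-trip time.
    return server_time + (t1 - t0) / 2.0

# Simulated server whose clock reads 100.0 at the moment it replies:
print(round(cristian_sync(lambda: 100.0), 1))
```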

Berkeley Algorithm
The time server is active, polling every machine periodically and asking for its time.
Based on the answers, it computes an average time and tells all the other machines to advance their clocks to the new time, or to slow their clocks down until some specified reduction has been achieved.
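One polling round can be sketched as below; the simple mean and the function name are illustrative. The numbers follow the classic textbook example (server at 3:00, clients at 2:50 and 3:25, all in minutes):

```python
def berkeley_round(server_time, client_times):
    """One round of the Berkeley algorithm: the active time server polls
    every machine, averages the answers (including its own clock), and
    tells each machine the adjustment to apply to its clock."""
    all_times = [server_time] + client_times
    average = sum(all_times) / len(all_times)
    # Each machine is told how far to advance (or gradually slow) its clock.
    return [average - t for t in all_times]

# Server at 3:00 (180), clients at 2:50 (170) and 3:25 (205):
print(berkeley_round(180, [170, 205]))  # adjustments: [5.0, 15.0, -20.0]
```

The average here is 185 (3:05), so the server advances by 5 minutes, the slow client by 15, and the fast client is told to slow down by 20.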

DISTRIBUTED MUTUAL EXCLUSION AND ALGORITHMS
Mutual exclusion concerns the state in which one process enters a critical region and reads or updates certain shared data structures.
It ensures that no other process will use the shared data structures at the same time.

Centralized Algorithm
One process is elected as the coordinator (e.g., the one running on the machine with the highest network address).
Whenever a process wants to enter a critical region, it sends a request message to the coordinator, stating which critical region it wants to enter and asking for permission.
If no other process is currently in that critical region, the coordinator sends back a reply granting permission.
When the reply arrives, the requesting process enters the critical region.
If some other process is already in the critical region, the coordinator cannot grant permission.
The actual method of denying permission is system dependent: no reply, a "permission denied" message, etc.
When the process exits the critical region, it sends a message to the coordinator releasing its exclusive access.

Distributed Algorithm
When a process wants to enter a critical region, it builds a message containing the name of the critical region, its process number, and the current time.
It sends the message to all other processes, including itself.
If the receiver is not in the critical region and does not want to enter it, it sends back an OK message to the sender.
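The coordinator's side of the centralized algorithm can be sketched as below (class and method names are illustrative; here "no reply" is modelled by returning `None` and queuing the request):

```python
from collections import deque

class Coordinator:
    """Sketch of the centralized mutual exclusion coordinator: grant
    entry to the critical region, queue requests that cannot be granted."""
    def __init__(self):
        self.holder = None      # process currently in the critical region
        self.queue = deque()    # pending requests, in arrival order

    def request(self, pid):
        if self.holder is None:
            self.holder = pid
            return "OK"          # grant permission immediately
        self.queue.append(pid)   # deny by not replying; remember request
        return None

    def release(self, pid):
        assert pid == self.holder
        self.holder = self.queue.popleft() if self.queue else None
        return self.holder       # next process granted, if any

c = Coordinator()
print(c.request(1))  # OK
print(c.request(2))  # None (queued)
print(c.release(1))  # 2 (now granted)
```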

If the receiver is already in the critical region, it does not reply; it queues the request.
If the receiver wants to enter the critical region, it compares the timestamp of the incoming message with the one in the message it has sent. The lowest one wins.
If the incoming message has the lower timestamp, the receiver sends back an OK message.
If its own message has the lower timestamp, the receiver queues the incoming request and sends nothing.
After sending out its requests, the process sits back and waits until everyone else has given permission.
After it exits the critical region, it sends an OK message to all processes on its queue and deletes them all.

Token Ring Algorithm
When the ring is initialized, process 0 is given a token.
The token circulates around the ring. It is passed from process k to process k + 1 (modulo the ring size) in point-to-point messages.
When a process acquires the token from its neighbor, it checks to see if it is attempting to enter a critical region.
If so, the process enters the region, does all the work it needs to, and leaves the region. After it has exited, it passes the token along the ring.
It is not permitted to enter a second critical region using the same token.
If a process is handed the token by its neighbor and is not interested in entering a critical region, it just passes the token along.
As a consequence, when no processes want to enter any critical regions, the token just circulates at high speed around the ring.

Comparison of the three algorithms

Algorithm    Messages per entry/exit   Delay before entry (in message times)   Problems
Centralized  3                         2                                       Coordinator crash
Distributed  2 (n - 1)                 2 (n - 1)                               Crash of any process
Token ring   1 to infinity             0 to n - 1                              Lost token, process crash
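The receiver's decision in the distributed algorithm above can be sketched as a single function. The state names are illustrative, and the tie-break on process number is an assumption added so that "the lowest one wins" is unambiguous when two timestamps are equal:

```python
def on_request(my_state, my_timestamp, my_pid, req_timestamp, req_pid):
    """Receiver's rule in the distributed mutual exclusion algorithm."""
    if my_state == "RELEASED":
        return "OK"       # not in the region and not wanting to enter it
    if my_state == "HELD":
        return "QUEUE"    # already in the region: queue, send nothing
    # my_state == "WANTED": compare timestamps; the lowest one wins.
    if (req_timestamp, req_pid) < (my_timestamp, my_pid):
        return "OK"       # incoming request is older: let it go first
    return "QUEUE"        # our own request is older: queue the incoming one

print(on_request("WANTED", 8, 1, 5, 2))   # OK: incoming request is older
print(on_request("WANTED", 5, 1, 8, 2))   # QUEUE: own request is older
```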

ELECTION ALGORITHMS

The Bully Algorithm
When a process notices that the coordinator is no longer responding to requests, it initiates an election as follows:
P sends an ELECTION message to all processes with higher numbers.
If no one responds, P wins the election and becomes coordinator.
If one of the higher-ups answers, it takes over as the new coordinator.

A Ring Algorithm
When any process notices that the coordinator is not functioning, it builds an ELECTION message containing its own process number and sends the message to its successor.
If the successor is down, the sender skips over the successor and goes to the next member along the ring, or the one after that, until a running process is located.
At each step, the sender adds its own process number to the list in the message.
Eventually, the message gets back to the process that started it all. That process recognizes this event when it receives an incoming message containing its own process number.
At that point, the message type is changed to COORDINATOR and circulated once again, this time to inform everyone else who the coordinator is (the list member with the highest number) and who the members of the new ring are.
When this message has circulated once, it is removed and everyone goes back to work.
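The outcome of the bully algorithm can be sketched as below. Messaging is abstracted away: `alive_pids` stands in for the set of processes that would answer an ELECTION message, and the function name is illustrative:

```python
def bully_election(initiator, alive_pids):
    """Sketch of the bully algorithm from the initiator's point of view."""
    higher = [p for p in alive_pids if p > initiator]
    if not higher:
        return initiator   # nobody higher responds: the initiator wins
    # A higher-up answers and takes over; repeating the argument, the
    # highest-numbered live process wins its own election and becomes
    # the coordinator (hence the name "bully").
    return max(higher)

print(bully_election(4, {1, 2, 4, 5, 6}))  # 6 becomes coordinator
print(bully_election(6, {1, 2, 6}))        # 6 wins: no higher process alive
```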

ATOMIC TRANSACTIONS

The Transaction Model
o Stable storage
o Transaction primitives
o ACID properties
o Nested transactions
Implementation
o Private workspace
o Writeahead log
o Two-phase commit protocol
Concurrency Control
o Locking
o Two-phase locking
o Optimistic concurrency control
o Timestamps

A transaction is a higher-level abstraction that hides technical issues (such as mutual exclusion, critical region management, deadlock prevention, and crash recovery) and allows the programmer to concentrate on the algorithms and on how the processes work together in parallel.
Example banking application:
o withdraw(amount, account1)
o deposit(amount, account2)
If the telephone connection is broken after the first operation but before the second one, the first account will have been debited but the second one will not have been credited. The money vanishes into thin air.
The key is rolling back to the initial state if the transaction fails to complete.

The Transaction Model

Stable Storage
Storage comes in three categories.
RAM memory

o wiped out when the power fails or a machine crashes
Disk storage
o survives CPU failures, but can be lost in disk head crashes
Stable storage
o designed to survive anything except major calamities such as floods and earthquakes
o can be implemented with a pair of ordinary disks

Transaction Primitives
1. BEGIN_TRANSACTION: Mark the start of a transaction.
2. END_TRANSACTION: Terminate the transaction and try to commit.
3. ABORT_TRANSACTION: Kill the transaction; restore the old values.
4. READ: Read data from a file (or other object).
5. WRITE: Write data to a file (or other object).

ACID Properties of Transactions
1. Atomic: To the outside world, the transaction happens indivisibly.
2. Consistent: The transaction does not violate system invariants.
3. Isolated: Concurrent transactions do not interfere with each other.
4. Durable: Once a transaction commits, the changes are permanent.

Nested Transactions
Transactions may contain subtransactions, often called nested transactions.
The top-level transaction may fork off children that run in parallel with one another, on different processors, to gain performance or simplify programming.

Implementation

Private Workspace
When a process starts a transaction, it is given a private workspace containing all the files (and other objects) to which it has access.
Until the transaction either commits or aborts, all of its reads and writes go to the private workspace rather than the "real" one, by which we mean the normal file system.
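The five primitives and the private-workspace idea can be sketched together over an in-memory "file system" (a dict); the class, the dict-copy workspace, and the banking figures are illustrative:

```python
class Transaction:
    """Sketch of the transaction primitives using a private workspace:
    writes go to a copy of the store until END_TRANSACTION commits them."""
    def __init__(self, store):
        self.store = store
        self.workspace = None

    def begin(self):                   # BEGIN_TRANSACTION
        self.workspace = dict(self.store)

    def read(self, name):              # READ
        return self.workspace[name]

    def write(self, name, data):       # WRITE
        self.workspace[name] = data

    def end(self):                     # END_TRANSACTION: try to commit
        self.store.clear()
        self.store.update(self.workspace)

    def abort(self):                   # ABORT_TRANSACTION: restore old values
        self.workspace = None          # just discard the private workspace

files = {"account1": 100, "account2": 50}
t = Transaction(files)
t.begin()
t.write("account1", t.read("account1") - 25)   # withdraw
t.write("account2", t.read("account2") + 25)   # deposit
t.end()
print(files)  # {'account1': 75, 'account2': 75}
```

If the transaction aborted between the withdraw and the deposit, the real store would be untouched: no money vanishes.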

Writeahead Log
Sometimes called an intentions list.
Files are actually modified in place, but before any block is changed, a record is written to the writeahead log on stable storage, telling
o which transaction is making the change
o which file and block is being changed
o what the old and new values are
Only after the log has been written successfully is the change made to the file.
If the transaction succeeds and is committed, a commit record is written to the log.
If the transaction aborts, the log can be used to back up to the original state (rollback).

Two-Phase Commit Protocol
One of the processes involved functions as the coordinator.
Usually, this is the one executing the transaction.

Concurrency Control
When multiple transactions are executing simultaneously in different processes (on different processors), some mechanism is needed to keep them out of each other's way. That mechanism is called a concurrency control algorithm.

Locking
When a process needs to read or write a file (or other object) as part of a transaction, it first locks the file.
The lock manager maintains a list of locked files and rejects all attempts to lock files that are already locked by another process.
The issue of how large an item to lock is called the granularity of locking.
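The log-record layout and rollback described above can be sketched as follows; the dict-based records and names are illustrative:

```python
def rollback(store, log):
    """Undo an aborted transaction from its writeahead log by restoring
    the old values in reverse order."""
    for record in reversed(log):
        store[record["item"]] = record["old"]
    return store

store = {"x": 1}
log = []
# Before each change, a record (transaction, item, old and new values)
# goes to the log on stable storage; only then is the change made:
log.append({"txn": "T1", "item": "x", "old": store["x"], "new": 2}); store["x"] = 2
log.append({"txn": "T1", "item": "x", "old": store["x"], "new": 3}); store["x"] = 3
# T1 aborts: the log backs the store up to its original state.
print(rollback(store, log))  # {'x': 1}
```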

The finer the granularity, the more precise the lock can be, and the more parallelism can be achieved.

Two-phase locking
Strict two-phase locking
o the shrinking phase does not take place until the transaction has finished running and has either committed or aborted
Cascaded aborts
o having to undo a committed transaction because it saw a file it should not have seen

Optimistic Concurrency Control
Just go ahead and do whatever you want to, without paying attention to what anybody else is doing. If there is a problem, worry about it later.
Keep track of which files have been read and written.
At the point of committing, the transaction checks all other transactions to see if any of its files have been changed since the transaction started.
o If so, the transaction is aborted.
o If not, it is committed.

Timestamps
Assign each transaction a timestamp at the moment it does BEGIN_TRANSACTION.
Every file in the system has a read timestamp and a write timestamp associated with it.
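The commit-time check of optimistic concurrency control can be sketched as a set intersection; the file-name sets are illustrative:

```python
def validate(used_files, changed_since_start):
    """Optimistic concurrency control at commit time: abort if any file
    this transaction used was changed by others since it started."""
    return "ABORT" if used_files & changed_since_start else "COMMIT"

print(validate({"a", "b"}, {"c"}))        # COMMIT: no overlap
print(validate({"a", "b"}, {"b", "c"}))   # ABORT: b changed underneath us
```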

DISTRIBUTED DEADLOCK DETECTION

Centralized Deadlock Detection
Each machine maintains the graph for its own processes and resources.
The graph has to be sent to the coordinator explicitly, in any of the following ways:
1. Whenever an arc is added to or deleted from the resource graph, a message can be sent to the coordinator providing the update.
2. Periodically, every process can send a list of arcs added or deleted since the previous update.
3. The coordinator can ask for information when it needs it.
None of these methods work well.
Consider a system with processes A and B running on machine 0, and process C running on machine 1. Three resources exist: R, S, and T.
A holds S but wants R, which it cannot have because B is using it.
C has T and wants S.
As soon as B finishes, A can get R and finish, releasing S for C.
This configuration is safe.
After a while, B releases R and asks for T, a perfectly legal and safe swap.
Machine 0 sends a message to the coordinator announcing the release of R.
Machine 1 sends a message to the coordinator announcing the fact that B is now waiting for its resource, T.
Assume that the message from machine 1 arrives first, leading the coordinator to incorrectly conclude that a deadlock exists and kill some process.
Such a situation is called a false deadlock.
One fix is to use Lamport's algorithm to provide global time.
When the coordinator gets the message from machine 1 that leads it to suspect deadlock, it can send a message to every machine in the system.
When every machine has replied, positively or negatively, the coordinator will see that the arc from R to B has vanished, so the system is still safe.
Although this method eliminates the false deadlock, it requires global time and is expensive.
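The coordinator's actual test on the combined graph can be sketched as cycle detection over wait-for edges (the dict-of-lists representation is illustrative):

```python
def has_cycle(wait_for):
    """Deadlock exists iff the wait-for edges contain a cycle.
    `wait_for` maps each process to the processes it is waiting for."""
    def visit(node, path):
        if node in path:
            return True   # revisited a process on the current chain: cycle
        return any(visit(n, path | {node}) for n in wait_for.get(node, ()))
    return any(visit(p, set()) for p in wait_for)

# A waits for B (via R); C waits for A (via S): the safe configuration.
print(has_cycle({"A": ["B"], "C": ["A"]}))              # False
# If B were also waiting for C: A -> B -> C -> A, a genuine deadlock.
print(has_cycle({"A": ["B"], "B": ["C"], "C": ["A"]}))  # True
```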

Distributed Deadlock Detection: the Chandy-Misra-Haas Algorithm
Processes are allowed to request multiple resources (e.g., locks) at once, instead of one at a time.
By allowing multiple requests simultaneously, the growing phase of a transaction can be speeded up considerably.
The consequence of this change to the model is that a process may now wait on two or more resources simultaneously.
The Chandy-Misra-Haas algorithm is invoked when a process has to wait for some resource, for example, process 0 blocking on process 1.
At that point a special probe message is generated and sent to the process (or processes) holding the needed resources. The message consists of three numbers:
1. the process that just blocked,
2. the process sending the message, and
3. the process to whom it is being sent.
The initial message from 0 to 1 contains the triple (0, 0, 1).
When the message arrives, the recipient checks to see if it itself is waiting for any processes.
If so, the message is updated, keeping the first field but replacing the second field by its own process number and the third one by the number of the process it is waiting for.
The message is then sent to the process on which it is blocked.
If it is blocked on multiple processes, all of them are sent (different) messages.
If a message goes all the way around and comes back to the original sender, that is, the process listed in the first field, a cycle exists and the system is deadlocked.

Ways to break the deadlock
One way is to have the process that initiated the probe commit suicide.
o This method has problems if several processes invoke the algorithm simultaneously: overkill.
An alternative is to have each process add its identity to the end of the probe message, so that when it returns to the initial sender the complete cycle is listed.
If multiple processes discover the same cycle at the same time, they will all choose the same victim.
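The probe propagation can be sketched centrally as below. The `seen` set (to avoid re-forwarding an identical probe forever) is an implementation detail added for the sketch, and `waits_for` abstracts the "blocked on" relation:

```python
def chandy_misra_haas(initiator, waits_for):
    """Sketch of Chandy-Misra-Haas probe propagation. A probe is the
    triple (blocked process, sender, receiver); deadlock is declared
    when a probe comes back to the process in its first field."""
    probes = [(initiator, initiator, t) for t in waits_for.get(initiator, ())]
    seen = set()
    while probes:
        probe = probes.pop()
        if probe in seen:
            continue                    # don't re-forward the same probe
        seen.add(probe)
        blocked, _sender, receiver = probe
        if receiver == blocked:
            return True                 # probe came back: deadlock
        for nxt in waits_for.get(receiver, ()):
            # Keep the first field; update the sender and receiver fields.
            probes.append((blocked, receiver, nxt))
    return False

print(chandy_misra_haas(0, {0: [1], 1: [2], 2: [0]}))  # True: 0 -> 1 -> 2 -> 0
print(chandy_misra_haas(0, {0: [1], 1: [2]}))          # False: chain ends at 2
```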

DISTRIBUTED DEADLOCK PREVENTION

Wait-die algorithm
When one process is about to block waiting for a resource that another process is using, a check is made to see which has the larger timestamp (i.e., is younger).
We then allow the wait only if the waiting process has a lower timestamp (is older) than the process waited for; a younger waiter is killed ("dies") instead.
In this manner, following any chain of waiting processes, the timestamps always increase, so cycles are impossible.

Wound-wait algorithm
One transaction is supposedly wounded (it is actually killed) and the other waits.
If an old process wants a resource held by a young one, the old process preempts the young one, whose transaction is then killed.
The young one probably starts up again immediately and tries to acquire the resource, forcing it to wait.
Compare this with the wait-die algorithm:
o under wait-die, if the young one wants a resource held by the old one, the young one is killed
o it will undoubtedly start up again and be killed again
o this cycle may go on many times before the old one releases the resource
Wound-wait does not have this property.
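The two policies reduce to a single timestamp comparison each; a sketch (function names and return labels are illustrative, and a lower timestamp means older):

```python
def wait_die(requester_ts, holder_ts):
    """Wait-die: an older (lower-timestamp) process may wait for a
    younger one; a younger requester is killed ("dies") and retries."""
    return "WAIT" if requester_ts < holder_ts else "DIE"

def wound_wait(requester_ts, holder_ts):
    """Wound-wait: an older requester preempts ("wounds") the younger
    holder; a younger requester simply waits for the older one."""
    return "PREEMPT" if requester_ts < holder_ts else "WAIT"

# Old process (timestamp 5) wants a resource held by a young one (timestamp 9):
print(wait_die(5, 9), wound_wait(5, 9))   # WAIT PREEMPT
# Young process (9) wants a resource held by an old one (5):
print(wait_die(9, 5), wound_wait(9, 5))   # DIE WAIT
```

Either way, waiting is only ever permitted in one direction of the age ordering, so a cycle of waiting processes cannot form.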

THREADS

Per-thread items      Per-process items
Program counter       Address space
Stack                 Global variables
Register set          Open files
Child threads         Child processes
State                 Timers
                      Signals
                      Semaphores
                      Accounting information

Thread Usage
Threads allow parallelism to be combined with sequential execution and blocking system calls.
Three organizations of threads in a process:
a) Dispatcher/worker model
b) Team model
c) Pipeline model

Three ways to construct a server

Model                  Characteristics
Threads                Parallelism, blocking system calls
Single-thread process  No parallelism, blocking system calls
Finite-state machine   Parallelism, nonblocking system calls

Design Issues for Threads Packages
A set of primitives (e.g., library calls) available to the user relating to threads is called a threads package.

Thread management
Two alternatives are possible here.
Static threads
o the choice of how many threads there will be is made when the program is written or when it is compiled
o each thread is allocated a fixed stack
o this approach is simple, but inflexible
Dynamic threads
o allow threads to be created and destroyed on-the-fly during execution
o the thread creation call usually specifies the thread's main program (as a pointer to a procedure) and a stack size, and may specify other parameters as well, for example, a scheduling priority
o the call usually returns a thread identifier to be used in subsequent calls involving the thread
o in this model, a process starts out with one (implicit) thread, but can create one or more threads as needed, and these can exit when finished
Threads can be terminated in one of two ways:
o a thread can exit voluntarily when it finishes its job
o it can be killed from outside

Shared Data
Data that are shared among multiple threads, such as the buffers in a producer-consumer system.
Access to shared data is usually programmed using critical regions, to prevent multiple threads from trying to access the same data at the same time.
Critical regions are most easily implemented using semaphores, monitors, and similar constructions.
Mutex
o one of two states: unlocked or locked
o operations: LOCK, UNLOCK, TRYLOCK
o mutexes are used for short-term locking, mostly for guarding the entry to critical regions
Condition variable
o used for long-term waiting until a resource becomes available

Using the resource:
lock mutex;
check data structures;
while (resource busy)
    wait(condition variable);
mark resource as busy;
unlock mutex;

Releasing the resource:
lock mutex;
mark resource as free;
unlock mutex;
wakeup(condition variable);

Global Variables
Variables that are global to a thread but not global to the entire program do cause trouble.
Solutions:
o prohibit global variables altogether
o assign each thread its own private global variables

Scheduling
Threads can be scheduled using various scheduling algorithms, including priority, round robin, and others.
Threads packages often provide calls to give the user the ability to specify the scheduling algorithm and to set the priorities, if any.

Implementing a Threads Package

Implementing Threads in User Space
Put the threads package entirely in user space; the kernel knows nothing about the threads.
As far as the kernel is concerned, it is managing ordinary, single-threaded processes.
The threads run on top of a runtime system, which is a collection of procedures that manage threads.
An entire thread switch can be done in a handful of instructions: faster than trapping to the kernel.
User-space packages also allow each process to have its own customized scheduling algorithm.
Letting a thread actually make a blocking system call is unacceptable, since this will stop all the threads.
The code placed around a system call to check whether it will block is called a jacket.
Related mechanism: spin locks (busy waiting).
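The mutex/condition-variable pattern above maps directly onto Python's threading primitives; a minimal runnable sketch (the resource is just a boolean flag, and the function names are illustrative):

```python
import threading

lock = threading.Lock()                 # the mutex: short-term locking
available = threading.Condition(lock)   # condition variable: long-term wait
busy = False                            # the shared "resource busy" flag

def acquire_resource():
    with lock:                   # lock mutex
        global busy
        while busy:              # check data structures
            available.wait()     # wait on the condition variable
        busy = True              # mark resource as busy
                                 # unlock mutex (on leaving the with-block)

def release_resource():
    with lock:                   # lock mutex
        global busy
        busy = False             # mark resource as free
        available.notify()       # wakeup (condition variable)

acquire_resource()
release_resource()
print("resource cycled cleanly")
```

Note that `Condition.wait` atomically releases the mutex while waiting and re-acquires it before returning, which is exactly what the pseudocode's `wait(condition variable)` requires.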

Implementing Threads in the Kernel
When a thread wants to create a new thread or destroy an existing thread, it makes a kernel call, which then does the creation or destruction.

Threads and RPC
When a server thread, S, starts up, it exports its interface by telling the kernel about it. The interface defines which procedures are callable, what their parameters are, and so on.
When a client thread C starts up, it imports the interface from the kernel and is given a special identifier to use for the call.
The kernel now knows that C is going to call S later, and creates special data structures to prepare for the call.

SYSTEM MODELS
1. Workstation model
2. Processor pool model
3. Hybrid form

Workstation Model
The system consists of workstations (high-end personal computers) scattered throughout a building or campus and connected by a high-speed LAN.
1. Diskless workstations
a. Do not have local disks
b. The file system must be implemented by one or more remote file servers
2. Diskful (disky) workstations
a. Have local disks
When the workstations have private disks, these disks can be used in one of at least four ways:
1. Paging and temporary files
2. Paging, temporary files, and system binaries
3. Paging, temporary files, system binaries, and file caching
4. Complete local file system

Disk usage on workstations

Disk usage                                      Advantages                                                  Disadvantages
(Diskless)                                      Low cost; easy hardware and software maintenance;           Heavy network usage; file servers may become bottlenecks
                                                symmetry and flexibility
Paging, scratch files                           Reduces network load over diskless case                     Higher cost due to large number of disks needed
Paging, scratch files, binaries                 Reduces network load even more                              Higher cost; additional complexity of updating the binaries
Paging, scratch files, binaries, file caching   Still lower network load; reduces load on file servers too  Higher cost; cache consistency problems
Full local file system                          Hardly any network load; eliminates need for file servers   Loss of transparency

Using Idle Workstations
A workstation can be said to be idle if no one has touched the keyboard or mouse for several minutes and no user-initiated processes are running.
The algorithms used to locate idle workstations can be divided into two categories:
1. server driven
2. client driven

Steps in using an idle workstation:
1. Machine registers when it goes idle
2. Request idle workstation, get reply
3. Claim machine
4. Deregister
5. Set up environment
6. Start process
7. Process runs
8. Process exits
9. Notify originator

The Processor Pool Model
Construct a processor pool: a rack full of CPUs in the machine room, which can be dynamically allocated to users on demand.
This converts all the computing power into "idle workstations" that can be accessed dynamically.
Users can be assigned as many CPUs as they need for short periods.

A Hybrid Model
Provide each user with a personal workstation and have a processor pool in addition.

PROCESSOR ALLOCATION

Design considerations
Migratory or nonmigratory nature of processes
CPU utilization
o maximize this number
o make sure that every CPU has something to do
Response time
o minimize mean response time
Response ratio
o the amount of time it takes to run a process on some machine, divided by how long it would take on some unloaded benchmark processor

Design Issues for Processor Allocation Algorithms
Deterministic versus heuristic algorithms
o deterministic algorithms are appropriate when everything about process behavior is known in advance
o systems where the load is completely unpredictable use ad hoc techniques called heuristics
Centralized versus distributed algorithms

o collecting all the information in one place allows a better decision to be made, but is less robust and can put a heavy load on the central machine
Optimal versus suboptimal algorithms
Local versus global algorithms
o transfer policy
Sender-initiated versus receiver-initiated algorithms
o location policy

Implementation Issues for Processor Allocation Algorithms
Measuring the CPU utilization
How overhead is dealt with
Complexity
Stability

Processor Allocation Algorithms
1. A graph-theoretic deterministic algorithm
2. A centralized algorithm
3. A hierarchical algorithm
4. A sender-initiated distributed heuristic algorithm
5. A receiver-initiated distributed heuristic algorithm
6. A bidding algorithm

A Graph-Theoretic Deterministic Algorithm
The system can be represented as a weighted graph, with each node being a process and each arc representing the flow of messages between two processes.
The total network traffic is the sum of the arcs intersected by the dotted cut lines.
The goal is then to find the partitioning that minimizes the network traffic.

A Centralized Algorithm
A heuristic algorithm that does not require any advance information, called up-down.
A coordinator maintains a usage table with one entry per personal workstation.
The algorithm is concerned with giving each workstation owner a fair share of the computing power.

A Hierarchical Algorithm
Organize the machines in a logical hierarchy independent of the physical structure of the network.
Some of the machines are workers and others are managers.
For each group of k workers, one manager machine (the "department head") is assigned the task of keeping track of who is busy and who is idle.
If the manager receiving a request thinks that it has too few processors available, it passes the request upward in the tree to its boss.

A Sender-Initiated Distributed Heuristic Algorithm
When a process is created, the machine on which it originates sends probe messages to a randomly chosen machine, asking if its load is below some threshold value.
If so, the process is sent there. If not, another machine is chosen for probing.
Probing does not go on forever. If no suitable host is found within N probes, the algorithm terminates and the process runs on the originating machine.
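The sender-initiated heuristic can be sketched as below; the load map, the threshold value, and the function name are all illustrative:

```python
import random

def place_process(origin, loads, threshold=3, max_probes=4, rng=random):
    """Sender-initiated heuristic: probe randomly chosen machines and
    ship the new process to the first one whose load is below the
    threshold; after max_probes failed probes, run it locally."""
    candidates = [m for m in loads if m != origin]
    for _ in range(max_probes):
        target = rng.choice(candidates)
        if loads[target] < threshold:
            return target     # underloaded machine found: send it there
    return origin             # no suitable host within N probes

rng = random.Random(42)       # fixed seed so the sketch is repeatable
loads = {"m0": 5, "m1": 6, "m2": 1, "m3": 7}
print(place_process("m0", loads, rng=rng))
```

With the loads shown, only m2 is below the threshold, so the process ends up either on m2 (if a probe hits it) or back on the originating machine m0.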

A Receiver-Initiated Distributed Heuristic Algorithm
This algorithm is initiated by an underloaded receiver.
Whenever a process finishes, the system checks to see if it has enough work. If not, it picks some machine at random and asks it for work.
If that machine has nothing to offer, a second, and then a third machine is asked.
If no work is found within N probes, the receiver temporarily stops asking, does any work it has queued up, and tries again when the next process finishes.
If no work is available, the machine goes idle. After some fixed time interval, it begins probing again.

A Bidding Algorithm
There are processes, which must buy CPU time to get their work done, and processors, which auction their cycles off to the highest bidder.
Each processor advertises its approximate price by putting it in a publicly readable file.
Different processors may have different prices, depending on their
o speed
o memory size
o presence of floating-point hardware
o other features
o indication of the service provided, such as expected response time
When a process wants to start up a child process, it goes around and checks out who is currently offering the service that it needs.
It then determines the set of processors whose services it can afford.
From this set, it computes the best candidate, where "best" may mean cheapest, fastest, or best price/performance, depending on the application.
It then generates a bid and sends the bid to its first choice. The bid may be higher or lower than the advertised price.
Processors collect all the bids sent to them and make a choice by picking the highest one.
The winners and losers are informed, and the winning process is executed.
The published price of the server is then updated to reflect the new going rate.

Scheduling in a Distributed System
Normally, each processor does its own local scheduling.
When a group of related, heavily interacting processes are all running on different processors, independent scheduling is not always the most efficient way.
Figure: (a) Two jobs running out of phase with each other. (b) Scheduling matrix for eight processors, each with six time slots; the Xs indicate allocated slots.

Co-scheduling takes interprocess communication patterns into account while scheduling, to ensure that all members of a group run at the same time.
Have each processor use a round-robin scheduling algorithm, with all processors first running the process in slot 0 for a fixed period, then all processors running the process in slot 1 for a fixed period, and so on.
A broadcast message can be used to tell each processor when to do process switching, to keep the time slices synchronized.
A variant breaks the matrix into rows and concatenates the rows to form one long row. With k processors, any k consecutive slots belong to different processors.

FAULT TOLERANCE
o Component faults
o System failures
o Synchronous versus asynchronous systems

Component Faults
A fault is a malfunction, possibly caused by
o a design error
o a manufacturing error
o a programming error
o physical damage
o deterioration in the course of time
o harsh environmental conditions
o unexpected inputs
o operator error
o rodents eating part of it
o and many other causes
Faults are classified as:
Transient
o occur once and then disappear
o example: a bird flying through the beam of a microwave transmitter
Intermittent
o occurs, then vanishes of its own accord, then reappears, and so on
o example: a loose contact on a connector
Permanent
o continues to exist until the faulty component is repaired
o examples: burnt-out chips, software bugs, and disk head crashes

System Failures
Two types of processor faults can be distinguished.
Fail-silent faults
o a faulty processor just stops and does not respond to subsequent input or produce further output, except perhaps to announce that it is no longer functioning
o also called fail-stop faults
Byzantine faults

o a faulty processor continues to run, issuing wrong answers to questions, and possibly working together maliciously with other faulty processors

Synchronous versus Asynchronous Systems
Synchronous
o a system that has the property of always responding to a message within a known finite bound if it is working
Asynchronous
o a system not having this property

Use of Redundancy
Three kinds are possible.
Information redundancy
o extra bits are added to allow recovery from garbled bits
o example: Hamming code
Time redundancy
o an action is performed, and then, if need be, it is performed again
Physical redundancy
o extra equipment is added to make it possible for the system as a whole to tolerate the loss or malfunctioning of some components
o two ways to organize the extra processors: active replication and primary backup

Fault Tolerance Using Active Replication
Active replication is a well-known technique for providing fault tolerance using physical redundancy.
It is also referred to as the state machine approach.

Fault Tolerance Using Primary Backup
The essential idea of the primary-backup method is that at any one instant, one server is the primary and does all the work. If the primary fails, the backup takes over.
Ideally, the cutover should take place in a clean way and be noticed only by the client operating system, not by the application programs.

Agreement in Faulty Systems
In many distributed systems there is a need to have processes agree on something
o Examples: electing a coordinator, deciding whether to commit a transaction
In a system with m faulty processors, agreement can be achieved only if 2m + 1 correctly functioning processors are present, for a total of 3m + 1
o That is, agreement is possible only if more than two-thirds of the processors are working properly
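The 3m + 1 bound stated above can be captured in a small helper (a hypothetical function name, for illustration only):

```python
# Illustration of the Byzantine agreement bound: with m faulty
# processors, agreement needs at least 2m + 1 correct ones,
# i.e. at least 3m + 1 processors in total.

def agreement_possible(total: int, faulty: int) -> bool:
    """True if 'total' processors can reach agreement despite
    'faulty' Byzantine processors (requires total >= 3*faulty + 1)."""
    return total >= 3 * faulty + 1

assert agreement_possible(4, 1)        # 3m + 1 = 4: just enough
assert not agreement_possible(3, 1)    # 3 cannot mask 1 Byzantine fault
assert agreement_possible(7, 2)        # 7 >= 3*2 + 1
assert not agreement_possible(6, 2)    # two-thirds correct is not enough
```

Note that 2m + 1 correct out of 3m + 1 total is strictly more than two-thirds, matching the statement above.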

REAL-TIME DISTRIBUTED SYSTEMS
Real-time programs (and systems) interact with the external world in a way that involves time
When a stimulus appears, the system must respond to it in a certain way and before a certain deadline
If it delivers the correct answer, but after the deadline, the system is regarded as having failed
o When the answer is produced is as important as which answer is produced
Types of stimuli
o periodic: occurring regularly
o aperiodic: recurrent, but not regular
o sporadic: unexpected
Types of RTDS
o Soft real-time: missing an occasional deadline is all right
o Hard real-time: a single missed deadline is unacceptable
Myths
o Myth 1: Real-time systems are about writing device drivers in assembly code
o Myth 2: Real-time computing is fast computing
o Myth 3: Fast computers will make real-time systems obsolete
DESIGN ISSUES
Clock Synchronization
Same as discussed earlier in this unit
Event-Triggered versus Time-Triggered Systems
Event-triggered
o when a significant event in the outside world happens, it is detected by some sensor, which causes the attached CPU to get an interrupt
o event-triggered systems are thus interrupt driven
o event shower: a massive burst of interrupts
o gives faster response at low load, but more overhead and chance of failure at high load
Time-triggered
o a clock interrupt occurs every T milliseconds
o at each clock tick, (selected) sensors are sampled and (certain) actuators are driven
o no interrupts occur other than clock ticks
o suitable in a relatively static environment in which a great deal is known about system behavior in advance
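A time-triggered system can be sketched as a fixed-period loop: nothing happens except at clock ticks, and at every tick the sensors are sampled and the actuators driven. The sensor and actuator functions below are placeholders, and the tick period T_MS is an assumed value.

```python
# Sketch of a time-triggered control loop: every T_MS milliseconds
# the selected sensors are sampled and the actuators driven.
# read_sensors/drive_actuators are hypothetical stand-ins for real I/O.

import time

T_MS = 25  # assumed clock tick period in milliseconds

def read_sensors():
    return {"temp": 20.0}          # placeholder sample

def drive_actuators(sample):
    return f"set heater for {sample['temp']:.1f} C"

def time_triggered_loop(ticks):
    log = []
    next_tick = time.monotonic()
    for _ in range(ticks):
        # work done at this tick: sample, then actuate
        log.append(drive_actuators(read_sensors()))
        # idle until the next clock tick; no other interrupts occur
        next_tick += T_MS / 1000.0
        delay = next_tick - time.monotonic()
        if delay > 0:
            time.sleep(delay)
    return log

print(time_triggered_loop(3))
```

Contrast this with an event-triggered design, where the loop would instead block waiting for an interrupt from any sensor; the time-triggered version trades response latency for predictable, bounded load.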

Predictability
It should be clear at design time that the system can meet all of its deadlines, even at peak load
It is often known what the worst-case behavior of the processes is
Fault Tolerance
Many real-time systems control safety-critical devices
Active replication is sometimes used
Primary-backup schemes are less popular because deadlines may be missed during the cutover to the backup
Fault-tolerant real-time systems must be able to cope with the maximum number of faults and the maximum load at the same time
Some real-time systems have the property that they can be stopped cold when a serious failure occurs; a system that can halt operation like this without danger is said to be fail-safe
o Ex: a railroad signaling system that unexpectedly blacks out
Language Support
Specialized real-time languages can potentially be of great assistance
It should be easy to express the work as a collection of short tasks (e.g., lightweight processes or threads) that can be scheduled independently, subject to user-defined precedence and mutual exclusion constraints
The maximum execution time of every task should be computable at compile time; recursion therefore cannot be tolerated
The language needs a way to deal with time itself
o a special variable, clock, should be available, containing the current time in ticks
o range of a 32-bit clock before overflowing, for various resolutions: 1 ns => 4 secs; 1 us => 72 mins; 1 ms => 50 days; 1 sec => 136 years
o a way to express minimum and maximum delays
o a way to express what to do if an expected event does not occur within a certain interval
o it is useful to have a statement of the form every (25 msec) { ... } that causes the statements within the curly brackets to be executed every 25 msec
Real-Time Communication
Time Division Multiple Access (TDMA)
The Time-Triggered Protocol (TTP)
o the TTP protocol consists of a single layer that handles end-to-end data transport, clock synchronization, and membership management
o the control field contains a bit used to initialize the system
o periodically, a packet with the initialization bit is broadcast

o this packet also contains the current global state
Properties of TTP
o detection of lost packets by the receivers, not the senders
o the automatic membership protocol
o the CRC over the packet plus the global state
o the way that clock synchronization is done
Real-Time Scheduling
Real-time scheduling algorithms are characterized by
1. Hard real time versus soft real time
2. Preemptive versus nonpreemptive scheduling
3. Dynamic versus static
4. Centralized versus decentralized
A set of tasks whose deadlines can all be met is said to be schedulable
Dynamic Scheduling Algorithms
Algorithms that decide during program execution which task to run next
Rate monotonic algorithm
o designed for preemptively scheduling periodic tasks with no ordering or mutual exclusion constraints on a single processor
o in advance, each task is assigned a priority equal to its execution frequency; for example, a task run every 20 msec is assigned priority 50 and a task run every 100 msec is assigned priority 10
o at run time, the scheduler always selects the highest-priority task to run, preempting the current task if need be
Earliest deadline first
o whenever an event is detected, the scheduler adds its task to the list of waiting tasks
o this list is always kept sorted by deadline, closest deadline first
o for a periodic task, the deadline is its next occurrence
o the scheduler then just chooses the first task on the list, the one closest to its deadline
Least laxity algorithm
o first computes for each task the amount of time it has to spare, called the laxity (slack)

o Ex: for a task that must finish in 200 msec but needs another 150 msec of run time, the laxity is 50 msec
o the algorithm chooses the task with the least laxity, that is, the one with the least breathing room
Static Scheduling
Static scheduling is done before the system starts operating
The input consists of a list of all the tasks and the times that each must run
The goal is to find an assignment of tasks to processors and, for each processor, a static schedule giving the order in which the tasks are to be run
Comments & Feedback
Thanks to my family members who supported me while I spent hours and hours to prepare this.
Your feedback is welcome at GHCRajan@gmail.com


More information

(Pessimistic) Timestamp Ordering. Rules for read and write Operations. Read Operations and Timestamps. Write Operations and Timestamps

(Pessimistic) Timestamp Ordering. Rules for read and write Operations. Read Operations and Timestamps. Write Operations and Timestamps (Pessimistic) stamp Ordering Another approach to concurrency control: Assign a timestamp ts(t) to transaction T at the moment it starts Using Lamport's timestamps: total order is given. In distributed

More information

Atomic Transac1ons. Atomic Transactions. Q1: What if network fails before deposit? Q2: What if sequence is interrupted by another sequence?

Atomic Transac1ons. Atomic Transactions. Q1: What if network fails before deposit? Q2: What if sequence is interrupted by another sequence? CPSC-4/6: Operang Systems Atomic Transactions The Transaction Model / Primitives Serializability Implementation Serialization Graphs 2-Phase Locking Optimistic Concurrency Control Transactional Memory

More information

Multi-User-Synchronization

Multi-User-Synchronization Chapter 10 Multi-User-Synchronization Database Systems p. 415/569 Why Run TAs Concurrently? We could run all TAs serially (one after the other) This would prevent all unwanted side effects However, this

More information

OPERATING SYSTEMS. After A.S.Tanenbaum, Modern Operating Systems 3rd edition Uses content with permission from Assoc. Prof. Florin Fortis, PhD

OPERATING SYSTEMS. After A.S.Tanenbaum, Modern Operating Systems 3rd edition Uses content with permission from Assoc. Prof. Florin Fortis, PhD OPERATING SYSTEMS #8 After A.S.Tanenbaum, Modern Operating Systems 3rd edition Uses content with permission from Assoc. Prof. Florin Fortis, PhD MEMORY MANAGEMENT MEMORY MANAGEMENT The memory is one of

More information

Chapter 13 : Concurrency Control

Chapter 13 : Concurrency Control Chapter 13 : Concurrency Control Chapter 13: Concurrency Control Lock-Based Protocols Timestamp-Based Protocols Validation-Based Protocols Multiple Granularity Multiversion Schemes Insert and Delete Operations

More information

CS 153 Design of Operating Systems Winter 2016

CS 153 Design of Operating Systems Winter 2016 CS 153 Design of Operating Systems Winter 2016 Lecture 12: Scheduling & Deadlock Priority Scheduling Priority Scheduling Choose next job based on priority» Airline checkin for first class passengers Can

More information

Distributed Operating Systems. Distributed Synchronization

Distributed Operating Systems. Distributed Synchronization 2 Distributed Operating Systems Distributed Synchronization Steve Goddard goddard@cse.unl.edu http://www.cse.unl.edu/~goddard/courses/csce855 1 Synchronization Coordinating processes to achieve common

More information

PRIMARY-BACKUP REPLICATION

PRIMARY-BACKUP REPLICATION PRIMARY-BACKUP REPLICATION Primary Backup George Porter Nov 14, 2018 ATTRIBUTION These slides are released under an Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0) Creative Commons

More information

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING UNIT-1

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING UNIT-1 DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING Year & Semester Section Subject Code Subject Name Degree & Branch : I & II : M.E : CP7204 : Advanced Operating Systems : M.E C.S.E. 1. Define Process? UNIT-1

More information

Distributed Systems Theory 4. Remote Procedure Call. October 17, 2008

Distributed Systems Theory 4. Remote Procedure Call. October 17, 2008 Distributed Systems Theory 4. Remote Procedure Call October 17, 2008 Client-server model vs. RPC Client-server: building everything around I/O all communication built in send/receive distributed computing

More information

UNIT 02 DISTRIBUTED DEADLOCKS UNIT-02/LECTURE-01

UNIT 02 DISTRIBUTED DEADLOCKS UNIT-02/LECTURE-01 UNIT 02 DISTRIBUTED DEADLOCKS UNIT-02/LECTURE-01 Deadlock ([RGPV/ Dec 2011 (5)] In an operating system, a deadlock is a situation which occurs when a process or thread enters a waiting state because a

More information

Synchronization Part 1. REK s adaptation of Claypool s adaptation of Tanenbaum s Distributed Systems Chapter 5

Synchronization Part 1. REK s adaptation of Claypool s adaptation of Tanenbaum s Distributed Systems Chapter 5 Synchronization Part 1 REK s adaptation of Claypool s adaptation of Tanenbaum s Distributed Systems Chapter 5 1 Outline! Clock Synchronization! Clock Synchronization Algorithms! Logical Clocks! Election

More information

CS516 Distributed Operating Systems. Introduction to Amoeba - Object and Capabilities - memory management - Communication - Amoeba Servers.

CS516 Distributed Operating Systems. Introduction to Amoeba - Object and Capabilities - memory management - Communication - Amoeba Servers. UNIT V CASE STUDY - AMOEBA Introduction to Amoeba - Object and Capabilities - memory management - Communication - Amoeba Servers. THE AMOEBA SYSTEM ARCHITECTURE Designed with two assumptions about the

More information

Chapter 9: Concurrency Control

Chapter 9: Concurrency Control Chapter 9: Concurrency Control Concurrency, Conflicts, and Schedules Locking Based Algorithms Timestamp Ordering Algorithms Deadlock Management Acknowledgements: I am indebted to Arturas Mazeika for providing

More information

Experience with Processes and Monitors in Mesa. Arvind Krishnamurthy

Experience with Processes and Monitors in Mesa. Arvind Krishnamurthy Experience with Processes and Monitors in Mesa Arvind Krishnamurthy Background Focus of this paper: light-weight processes (threads) and how they synchronize with each other History: Second system; followed

More information

General Objectives: To understand the process management in operating system. Specific Objectives: At the end of the unit you should be able to:

General Objectives: To understand the process management in operating system. Specific Objectives: At the end of the unit you should be able to: F2007/Unit5/1 UNIT 5 OBJECTIVES General Objectives: To understand the process management in operating system Specific Objectives: At the end of the unit you should be able to: define program, process and

More information

Processes Prof. James L. Frankel Harvard University. Version of 6:16 PM 10-Feb-2017 Copyright 2017, 2015 James L. Frankel. All rights reserved.

Processes Prof. James L. Frankel Harvard University. Version of 6:16 PM 10-Feb-2017 Copyright 2017, 2015 James L. Frankel. All rights reserved. Processes Prof. James L. Frankel Harvard University Version of 6:16 PM 10-Feb-2017 Copyright 2017, 2015 James L. Frankel. All rights reserved. Process Model Each process consists of a sequential program

More information

Selected Questions. Exam 2 Fall 2006

Selected Questions. Exam 2 Fall 2006 Selected Questions Exam 2 Fall 2006 Page 1 Question 5 The clock in the clock tower in the town of Chronos broke. It was repaired but now the clock needs to be set. A train leaves for the nearest town,

More information

* What are the different states for a task in an OS?

* What are the different states for a task in an OS? * Kernel, Services, Libraries, Application: define the 4 terms, and their roles. The kernel is a computer program that manages input/output requests from software, and translates them into data processing

More information

Consensus and related problems

Consensus and related problems Consensus and related problems Today l Consensus l Google s Chubby l Paxos for Chubby Consensus and failures How to make process agree on a value after one or more have proposed what the value should be?

More information

Interprocess Communication By: Kaushik Vaghani

Interprocess Communication By: Kaushik Vaghani Interprocess Communication By: Kaushik Vaghani Background Race Condition: A situation where several processes access and manipulate the same data concurrently and the outcome of execution depends on the

More information