A Two-Layer Hybrid Algorithm for Achieving Mutual Exclusion in Distributed Systems

QUAZI EHSANUL KABIR MAMUN *, MORTUZA ALI *, SALAHUDDIN MOHAMMAD MASUM, MOHAMMAD ABDUR RAHIM MUSTAFA
* Dept. of CSE, Bangladesh University of Engineering and Technology, Dhaka 1000, BANGLADESH
Dept. of CSE & CIS, Faculty of Sc. & IT, Daffodil International University, Dhaka 1207, BANGLADESH
Dept. of CSE, University of Dhaka, Dhaka 1000, BANGLADESH

Abstract: - The mutual exclusion problem is the problem of ensuring that certain portions of program code are executed within critical regions, where no two programs are permitted to be in critical regions at the same time. This problem arises in both centralized and distributed systems. Unfortunately, each of the three basic approaches proposed for achieving mutual exclusion in distributed systems (centralized, distributed, and token ring) is suitable for distributed systems only in an abstract sense. In this paper, we present a two-layer hybrid algorithm for achieving mutual exclusion in distributed systems. The algorithm requires fewer messages and places a reduced load on coordinators. The algorithm is also fault tolerant in the sense that the crash of a coordinator can easily be detected and recovered from.

Key-Words: - Critical section (CS), Distributed mutual exclusion, Deadlock, Starvation, Bandwidth, Client delay, Throughput.

1 Introduction

Fully distributed synchronization has been a goal for several years, but numerous practical obstacles have hindered the effort. In practice, implementing a fully distributed system for synchronization is one extreme. The other extreme is a fully centralized system, which suffers from a performance bottleneck and a single point of failure.
The centralized algorithm, the simplest and most straightforward one, makes the decisions needed for mutual exclusion at a single coordinator [1, 5, 6, 9]. Although it requires only three messages per entry to the critical section, it suffers from a single point of failure and becomes a performance bottleneck in large systems. Ricart and Agrawala [1981] developed an algorithm that makes these decisions in a distributed way across the entire system [4, 16]. It is more expensive in terms of the number of messages passed [1]. Moreover, in a system of N processes, the algorithm has N points of failure, because if any one of the processes fails, the entire scheme collapses [6].

In this paper, we present a two-layer hybrid algorithm for achieving mutual exclusion that is free from a single point of failure and requires fewer messages than the distributed algorithm. Moreover, our algorithm does not require the coordinators to synchronize using clocks. In addition, the algorithm is fault tolerant and ensures fairness.

2 Definitions and Basic Concepts

The mutual exclusion problem involves the allocation of a single, indivisible, non-shareable resource among several users. The users can be thought of as application programs. The resource could be, for example, a printer or other output device that requires exclusive access in order to ensure that the output is sensible. Or it could be a database or other data structure that requires exclusive access in order to avoid interference among the operations of different users [11].

For any algorithm executing mutual exclusion in distributed systems, we consider a system of N processes p_i, i = 1, 2, ..., N, that do not share variables. The processes access common resources, but they do so in a critical section.
We assume that the system is asynchronous, that the processes do not fail, and that message delivery is reliable, so that any message sent is eventually delivered intact, exactly once. An algorithm for implementing mutual exclusion must satisfy the following three essential requirements [6]:

Safety: At most one process may execute in the critical section (CS) at a time.

Liveness: Requests to enter and exit the critical section eventually succeed. This implies freedom from both deadlock and starvation. The absence of starvation is a fairness condition. Another fairness issue is the order in which processes enter the critical section.

Happened-before ordering: If one request to enter the critical section happened-before [Lamport] another, then entry to the critical section is granted in that order. Under happened-before ordering, it is not possible for a process to enter the critical section more than once while another waits to enter. This ordering also allows processes to coordinate their accesses to the critical section [1].

3 Two-Layer Hybrid Algorithm

In our approach, the entire distributed system is divided into several groups. For each group, one process is elected as coordinator. The algorithm works in two layers. In the lower layer, the local coordinator handles requests from local processes; in the higher layer, the coordinators synchronize the requests among themselves.

In the lower layer, every process that requires a resource sends a request to its local coordinator and enters the CS only when it gets a reply from that coordinator, just as in the centralized approach. In the higher layer, a coordinator that receives a request for a resource from its group communicates with all other coordinators to achieve mutual exclusion. The higher layer conforms to the distributed approach.

The algorithm works as follows. To access a shared resource, a process sends a Request message to the local coordinator. The coordinator immediately sends an Acknowledgement message and queues the request. If the request is the first element of the queue, the coordinator calls the REQUEST_FOR_CS( ) procedure, in which it sends a REQUEST message to all other coordinators and waits for ACKNOWLEDGEMENT messages. Once it has received OK messages from all other coordinators, it grants the local process entry to the CS.
When a coordinator receives a REQUEST from another coordinator, the action it takes after sending an ACKNOWLEDGEMENT falls into one of the following cases:

1. If no process of the group is in the critical section and no local process wants to enter that critical section, the coordinator of the group sends an OK to the sender and changes its priority to the next level.
2. If any process in the group is already in that CS, the coordinator queues the REQUEST.
3. If any process of the group wants to enter that CS but has not yet done so, the coordinator compares the priority of the incoming message with that of the first local Request in the queue. The one with the higher priority wins. If the priority of the incoming message is higher, the receiver sends an OK to the requesting coordinator. If its own message has the higher priority, it appends the REQUEST to the queue.

When a coordinator has received an OK from all other coordinators, it grants the CS to the local process by sending a Reply message and changes its priority to the next level. On receiving the Reply from its coordinator, the process enters the CS, and when it exits the CS, it sends a Release message to the coordinator. The coordinator then deletes the first element of the queue, sends an OK for every REQUEST in the queue, and deletes the corresponding REQUESTs. If the first element of the queue is then a local process, the coordinator calls REQUEST_FOR_CS( ).
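The coordinator-side rules described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the class name Coordinator, its fields, the outbox list that stands in for the network, storing a priority with each queue entry, and the convention that a larger number means a higher priority are all hypothetical choices made for the example. Timeouts and coordinator election are omitted.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Coordinator:
    """Hypothetical coordinator state; all names are illustrative."""
    coord_id: str
    n: int                      # number of coordinators (groups)
    priority: int               # assumption: larger value = higher priority
    local_in_cs: bool = False   # True while a local process holds the CS
    ok_count: int = 0
    queue: deque = field(default_factory=deque)  # (kind, owner, priority)
    outbox: list = field(default_factory=list)   # (message, destination)

    def bump_priority(self):
        # "Changes its priority to the next level", taken modulo n.
        self.priority = (self.priority + 1) % self.n

    def on_local_request(self, process_id):
        self.outbox.append(("Acknowledgement", process_id))
        self.queue.append(("local", process_id, self.priority))
        if len(self.queue) == 1:  # request is at the head of the queue
            self.ok_count = 0
            self.outbox.append(("REQUEST", "all-other-coordinators"))

    def on_remote_request(self, sender, sender_priority):
        self.outbox.append(("ACKNOWLEDGEMENT", sender))
        head_wins = bool(self.queue) and self.queue[0][2] > sender_priority
        if self.local_in_cs or head_wins:
            # Cases 2 and 3 (own request wins): defer the REQUEST.
            self.queue.append(("remote", sender, sender_priority))
        else:
            # Case 1, or case 3 when the incoming REQUEST wins.
            self.outbox.append(("OK", sender))
            self.bump_priority()

    def on_ok(self):
        self.ok_count += 1
        if self.queue and self.ok_count == self.n - 1:
            _, process_id, _ = self.queue[0]
            self.local_in_cs = True
            self.outbox.append(("Reply", process_id))
            self.bump_priority()

    def on_release(self):
        self.local_in_cs = False
        self.queue.popleft()
        remaining = deque()
        for kind, owner, prio in self.queue:
            if kind == "remote":
                self.outbox.append(("OK", owner))  # answer deferred REQUEST
            else:
                remaining.append((kind, owner, prio))
        self.queue = remaining
        if self.queue:  # the new head is a local process
            self.ok_count = 0
            self.outbox.append(("REQUEST", "all-other-coordinators"))
```

In this sketch, a coordinator with a pending local request defers a lower-priority REQUEST from another coordinator and answers it with an OK only after the local process sends Release.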
3.1 Messages:

Let n = number of coordinators or groups. The messages used in the algorithm are as follows:

For lower level:
    Request(MsgId, ProcessId, CS)
    Acknowledgement(MsgId, ProcessId, CS)
    Reply(MsgId, ProcessId, CS)
    Release(MsgId, ProcessId, CS)

For higher level:
    REQUEST(MsgId, ProcessId, CoordId, CS, Priority)
    ACKNOWLEDGEMENT(MsgId, CoordId, CS)
    OK(MsgId, CoordId, CS)

3.2 Algorithm:

Local processes to enter CS:
    Term = 1;
    Send Request to Coordinator;
    Wait for Acknowledgement for timeout_period;
    If (Not_Found_Acknowledgement)
        Send Request to Coordinator;
        Term = Term + 1;
    If (Term > MAXTERM)
        call Election_Algorithm( );

Local processes exiting CS:
    Send Release to Coordinator;

Coordinator receives a request from local processes:
    Send Acknowledgement to sender;
    Append the process in the queue;
    If the queued process is Q0
        for all other coordinators
            call REQUEST_FOR_CS(CoordId);

Coordinator receives a release from local processes:
    Delete(Q0);
    for all REQUEST messages in Queue
        Send OK;
        Delete(REQUEST);
    If Q0 is a local process
        for all other coordinators
            call REQUEST_FOR_CS(CoordId);

Coordinator receives a REQUEST from other coordinators:
    Send ACKNOWLEDGEMENT to the sender;
    If ((Local_process_in_CS) or ((Queue_Not_Empty) and (Priority(Q0) > Priority(REQUEST))))
        Append the REQUEST in the Queue;
    Else
        Send OK to sender Coordinator;
        Priority = Priority + 1 (mod n);

Coordinator receives an OK from other coordinators:
    OK_Count++;
    If (OK_Count = n-1)
        Send Reply to local process of Q0;
        Priority = Priority + 1 (mod n);

REQUEST_FOR_CS(CoordId):
    Term = 1;
    While (Term <= MAXTERM)
        Send REQUEST to CoordId;
        Wait for ACKNOWLEDGEMENT for TimeoutPeriod;
        If (Not_Found_ACKNOWLEDGEMENT)
            Term = Term + 1;
        Else
            break;
    If (Term > MAXTERM)
        Send OK to itself having (MsgId, CoordId, CS);

3.3 Operation:

Suppose that we have three groups. The coordinator of each group is shown shaded. Each coordinator maintains a queue. The head of the queue is marked by a bold line. The priority of each coordinator is given in a small square as P1, P2, and P3. We illustrate the example with a sequence of figures that show the successive steps of the algorithm.

Fig.1: Process H and J request local coordinator L, process A and C request D.

Fig.2: L sends Acknowledgement to H and J, D sends Acknowledgement to A and C.

Fig.3: L and D send REQUEST to all other coordinators.

Fig.4: Coordinator G sends OK to L and D and increases its priority by 2 (modulo 3). Coordinator L sends OK to D since the priority of D is higher than that of L. As D gets OK from all other coordinators, it increases its priority by 1 (modulo 3).

Fig.5: Coordinator D grants A access to the CS by sending Reply to process A.

Fig.6: After using the CS, process A sends Release to coordinator D. Coordinator D deletes A from its queue.

Fig.7: Coordinator D sends OK to L and deletes it from the queue. At the same time D increases its priority by 1 (modulo 3). Coordinator L now has all required OK messages for local process H, so it grants the CS to H and increases its priority by 1.

Fig.8: Coordinator D has local process C at the head of the queue, so it sends REQUEST to all other coordinators to obtain the CS for process C.

All other requests are served in a similar fashion according to our algorithm.

4 Empirical results and analyses

We ran our experiment on 102 nodes, split into 7 groups. The number of nodes in each group was as follows: A=10, B=15, C=12, D=17, E=14, F=21, G=13. Several cases were considered, each with some number of candidates for a CS from each group. They are shown in Table 1. The experimental results of the hybrid approach showed a considerable improvement.

Table 2 compares the centralized, distributed, and hybrid approaches. Compared with the distributed approach, the hybrid approach achieves a large reduction in the number of messages. Although the hybrid approach requires more messages than the centralized approach, the centralized approach is never a good choice once the load on the coordinator is considered, because the coordinator is a performance bottleneck and a single point of failure. Our approach outperforms both the centralized and the distributed approaches in this respect. Table 3 gives the load on coordinators.

Table 1: Case Study
Index     No. of candidates for CS from each group [A, B, C, D, E, F, G]
Case 1    [ 1,  1,  1,  1,  1,  1,  1]
Case 2    [ 3,  8,  7,  3,  1,  2,  3]
Case 3    [ 4,  9,  5,  3,  2,  9,  9]
Case 4    [ 3,  5,  2,  4,  2,  8,  0]
Case 5    [ 6,  4,  6,  1,  9,  6,  7]
Case 6    [10, 15, 12, 17, 14, 21, 13]

Table 2: Number of messages in three approaches
Index     Centralized   Distributed   Hybrid
Case 1    21            1414          154
Case 2    81            5454          594
Case 3    123           8282          902
Case 4    72            4848          528
Case 5    117           7878          858
Case 6    306           20604         2244

Table 3: Load on coordinator (no. of messages per coordinator)
Index     Centralized   Distributed   Hybrid
Case 1    21            16            16
Case 2    81            55            25
Case 3    123           84            31
Case 4    72            49            24
Case 5    117           80            30
Case 6    300           203           56

Number of messages per entry to CS in hybrid approach:

Suppose we have a total of N processes divided into n groups. For each entry to the CS, the algorithm requires one Request, one Acknowledgement, one Reply, one Release, (n − 1) REQUEST, (n − 1) ACKNOWLEDGEMENT, and (n − 1) OK messages. So the total number of messages required is

    M = 4 + 3(n − 1) = 3n + 1        (1)

From this calculation, the fewer the groups, the fewer the messages required. Note that when n = 1 the number of messages is the lowest (the algorithm then reduces to the centralized approach).
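As a cross-check, the totals in Table 2 can be recomputed from the candidate counts in Table 1. The short Python sketch below does so under three per-entry costs: 3 messages for the centralized approach (as stated in the introduction), 3n + 1 for the hybrid approach (equation (1)), and 2(N − 1) for the distributed approach. The last figure is the standard message count of the Ricart and Agrawala algorithm and is an assumption here, since this paper does not state it explicitly. The names CASES, TOTAL_NODES, GROUPS, and messages are hypothetical.

```python
# Candidates for the CS per group [A..G], copied from Table 1.
CASES = {
    "Case 1": [1, 1, 1, 1, 1, 1, 1],
    "Case 2": [3, 8, 7, 3, 1, 2, 3],
    "Case 3": [4, 9, 5, 3, 2, 9, 9],
    "Case 4": [3, 5, 2, 4, 2, 8, 0],
    "Case 5": [6, 4, 6, 1, 9, 6, 7],
    "Case 6": [10, 15, 12, 17, 14, 21, 13],
}

TOTAL_NODES = 102  # N, total processes in the experiment
GROUPS = 7         # n, number of groups (coordinators)


def messages(entries, total_nodes=TOTAL_NODES, groups=GROUPS):
    """Return (centralized, distributed, hybrid) totals for `entries` CS entries."""
    centralized = 3 * entries                      # request, grant, release
    distributed = 2 * (total_nodes - 1) * entries  # assumed Ricart-Agrawala cost
    hybrid = (3 * groups + 1) * entries            # equation (1)
    return centralized, distributed, hybrid


# One row per case, in the column order of Table 2.
table2 = {name: messages(sum(counts)) for name, counts in CASES.items()}

for name, row in table2.items():
    print(name, *row)
```

Under these assumptions every computed row agrees with the corresponding row of Table 2; for example, Case 1 has 7 entries and yields 21, 1414, and 154 messages.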
However, when n = 1 the load on the coordinator is the highest, which makes the coordinator a performance bottleneck in large systems. If we assume that every group has c processes on average, then N = n * c. In our hybrid approach a coordinator can receive messages from its local group members and from other coordinators. At any instant one coordinator may face (n − 1 + c) messages, whereas in a centralized system the coordinator may face (N − 1) messages and in a distributed system one process may face (N − 1) messages. Note that, for an even distribution of load on coordinators in the hybrid approach, each group should have an equal number of processes. So the load on a coordinator in the hybrid approach is

    L = (n − 1) + c = n − 1 + N / n        (2)

Equation (2) contains two opposing terms: the load from other coordinators, (n − 1), grows with the number of groups, while the load from local processes, N / n, shrinks. Solving (1) and (2) for optimal results, we find that the number of groups should be 0.7 × √N.

5 Conclusion

Compared with the traditional algorithms for achieving mutual exclusion in distributed systems, our algorithm is more fault tolerant, because it can tolerate the crash failure of the coordinator process of any group. It still does not tolerate the loss of messages if the channels are unreliable. Future work could adapt the algorithm to tolerate lost messages, on the assumption that a reliable failure detector is available. Even with a reliable failure detector, care is required to allow for failures at any point (including failures during a recovery procedure) and to reconstruct the state of the processes after a failure has been detected.

References:
[1] Coulouris, G., Dollimore, J., and Kindberg, T., Distributed Systems Concepts and Design, Pearson Education, pp. 423-431 (2003).
[2] Garg, V. K., Principles of Distributed Systems, Kluwer Academic, Norwell, MA (1996).

[3] Goscinski, A., Distributed Operating Systems, The Logical Design, Addison-Wesley, Reading, MA (1991).
[4] Ricart, G., and Agrawala, A. K., An Optimal Algorithm for Mutual Exclusion in Computer Networks, Communications of the ACM, Vol. 24, No. 1, pp. 9-17 (1981).
[5] Silberschatz, A., Galvin, P. B., and Gagne, G., Operating System Concepts, John Wiley & Sons Inc., pp. 598-601 (2002).
[6] Sinha, P. K., Distributed Operating Systems Concepts and Design, Prentice-Hall of India Private Limited, pp. 297-305 (March 2002).
[7] Tanenbaum, A. S., Modern Operating Systems, Pearson Education Asia, 2nd Edn. (2001).
[8] Tanenbaum, A. S., Distributed Operating Systems, Pearson Education (Singapore) Pte. Ltd. (2002).
[9] Tanenbaum, A. S., and van Steen, M., Distributed Systems Principles and Paradigms, Prentice-Hall of India Private Limited, pp. 265-271 (July 2003).
[10] Tanenbaum, A. S., and van Renesse, R., Distributed Operating Systems, Computing Surveys, ACM, Vol. 17, No. 4, pp. 419-470 (1985).
[11] Lynch, N., Distributed Algorithms, Morgan Kaufmann (1996).
[12] Agrawal, D., and El Abbadi, A., An Efficient and Fault-Tolerant Solution of Distributed Mutual Exclusion, ACM Transactions on Computer Systems, Vol. 9, Association for Computing Machinery, New York, pp. 1-20 (1991).
[13] Bulgannawar, S., and Vaidya, N. H., Distributed K-Mutual Exclusion, In: Proceedings of the 15th International Conference on Distributed Computing Systems, IEEE, New York (May-June 1995).
[14] Carvalho, O. S. F., and Roucairol, G., On Mutual Exclusion in Computer Networks, Communications of the ACM, Vol. 26, No. 2, Association for Computing Machinery, New York, pp. 146-147 (1983).
[15] Raynal, M., A Simple Taxonomy for Distributed Mutual Exclusion Algorithms, ACM Operating Systems Review, Vol. 25, pp. 47-50 (1991).
[16] Ricart, G., and Agrawala, A. K., An Optimal Algorithm for Mutual Exclusion in Computer Networks, Communications of the ACM, Vol. 24, No. 1, pp. 9-17 (1981).
[17] Sanders, B. A., The Information Structure of Distributed Mutual Exclusion, ACM Transactions on Computer Systems, Vol. 5, pp. 284-299 (1987).
[18] Suzuki, I., and Kasami, T., A Distributed Mutual Exclusion Algorithm, ACM Transactions on Computer Systems, Vol. 3, No. 4, pp. 344-349 (1985).
[19] Mamun, Q. E. K., Masum, S. M., and Mustafa, M. A. R., Modified Bully Algorithm for Electing Coordinator in Distributed Systems, In: Proceedings of the 3rd WSEAS International Conference on Software Engineering, Parallel and Distributed Systems, Salzburg, Austria (February 2004).
[20] Tel, G., Introduction to Distributed Algorithms, Cambridge University Press, New York, NY (1994).
[21] Maekawa, M., A √N Algorithm for Mutual Exclusion in Decentralized Systems, ACM Transactions on Computer Systems, Vol. 3, No. 2, pp. 145-159 (1985).
[22] Maekawa, M., Oldehoeft, A. E., and Oldehoeft, R. R., Operating Systems: Advanced Concepts, Benjamin/Cummings, White Plains, NY (1987).
[23] Stallings, W., Operating Systems: Internals and Design Principles, 4th Edn., Pearson Education (Singapore) Pte. Ltd. (2003).