
Chapter 1

Fault Tolerance and Recovery in Mobile Computing Systems

Elisa Bertino, Elena Pagani, and Gian Paolo Rossi

1.1 INTRODUCTION

Through wireless networks, mobile personal machines can access data and services located on both mobile and wired servers. Unlike wired hosts, mobile hosts can be temporarily unreachable as a consequence of their moving across different cells, their energy limitations, or unavailable wireless channels. Mobility forces mobile hosts to alternate between connected and disconnected work. When connected, they perform personal communications and access shared data and services; when disconnected, they can process locally cached data objects. As for wired networks, data replication is the key element to ensure high data availability and to increase performance. However, disconnected work and the uncertainties of the underlying wireless network introduce new challenging issues that have been recently discussed in the literature.

There are three main aspects that we wish to discuss in this chapter:

1. how to provide a fault-tolerant architecture that addresses data access and management despite mobility and disconnected work;
2. how to manage data replication to ensure data consistency, integrity, and durability according to the application requirements;
3. the extent to which the general requirement of network independence can be met, or otherwise the application awareness of mobility can be effectively exploited, to provide the level of quality of service needed.

In this chapter, we investigate how the problem of managing a distributed database and guaranteeing data consistency is affected by the characteristics of the mobile setting. We discuss the impact of mobility and disconnections on fault tolerance and recovery. We investigate how fault tolerance can be ensured by analyzing some of the algorithms proposed for database management in mobile systems.

This chapter is organized as follows. In Section 1.2, we describe the mobile environment and introduce data management issues in such an environment. In Section 1.3, we present the system architecture to which we refer in the remainder of the discussion, and in Section 1.4, we characterize mobile applications. In Section 1.5, we investigate how fault tolerance is affected in mobile environments and how the ACID properties can be redefined to guarantee data correctness in a mobile setting. In Sections 1.6 and 1.7, we discuss some of the approaches proposed for managing distributed databases. Finally, in Section 1.8, we report some performance evaluation results concerning some of the described algorithms.

1.2 DISTRIBUTED SYSTEMS WITH MOBILE HOSTS

Unlike a fixed host (FH), a mobile host (MH) can retain its network connection even while moving. This is possible because of the use of different network technologies, such as radio links, satellite networks, and infrared links [35, 36], that do not impose any physical constraint on the hosts; that is, they are wireless. Wireless networks may be classified as single-hop [3, 13, 23, 24, 27, 28] or multihop [11, 12]; in the latter case, all the machines in the system are mobile, whereas in the former, both mobile and fixed stations are involved. In the sequel, we restrict our attention almost exclusively to single-hop systems, as they are the most widely considered in the literature.

Single-hop networks are organized as shown in Figure 1.1. Some of the fixed hosts, denoted as Mobile Support Stations (MSSs) [1, 2, 4, 24], are equipped with a wireless interface; they support communication between the MHs that reside in a cell and the MHs in different cells. The cell is the area in which the signal generated by the MSS can be received by the MHs. The messages generated by a given MSS are broadcast within the cell. The MHs filter the messages according to their destination address; conversely, a MH can communicate with another MH of the same cell only by sending its message to the cell MSS, which executes the broadcast. FHs and MSSs are connected through a wired network, whose topology is static and which is used to support the communication between cells. Because of movements, the topology of the wireless network may change over time.

The diameter of the cells may vary according to the wireless technology; for example, it spans from a few meters for infrared technology to 1 to 2 miles for radio or single-hop satellite networks [36]. Moreover, the technology also affects the available bandwidth: LANs that use infrared technology have transfer rates on
the order of 1 to 2 Mbps (up to 100 Mbps in recent experiments [36]); by contrast, WANs have poorer performance, as they usually provide bandwidths in the range of 14.4 to 64 Kbps. Wireless networks that offer around 100-Kbps services are under development [36]. Finally, wireless networks are expected to be less reliable than wired ones: it has been estimated that the failure rate will increase by at least one order of magnitude with respect to current wired networks.

[Figure 1.1: Example of a single-hop network. The figure shows MHs (one of them disconnected), MSSs, and FHs; its legend distinguishes wired links, wireless links, and cell boundaries.]

Cells may overlap. Hence, a MH may be in more than one cell at the same time, although it refers to only one MSS at a time. In most systems, MHs choose their current MSS according to the strongest signal they receive [1, 13]. When the MH m moves from MSS1 to MSS2, a handoff (or handover) procedure [4, 24] is activated between MSS1 and MSS2 to transfer the state information about MH m to MSS2. A mobility assumption [1, 2] is required to ensure system liveness: MH m resides in a cell at least for the time sufficient to complete the handoff procedure and to allow the MSS to deliver to MH m at least one of the messages that were still pending at MSS1; in this way, we guarantee that messages do not starve.

In their wandering, MHs could move to places that are outside cell coverage, that is, become disconnected. This depends on the system's capability to cover a given geographical area. In the United States [36], the wireless service with the broadest coverage is Ardis, which reaches more than 80% of the US population and spans around 90% of the metropolitan areas and 30 to 40% of the rest of the country. In return, Ardis has a very low bandwidth: it offers 4.8-Kbps connections. Other wireless networks exist, but because of the current lack of standards, it is not possible to exploit the services of different wireless networks to gain greater coverage. Disconnections can also occur because of several other events; for example, the MH
may exhaust its battery charge, it can be lost, or it can crash.

MHs can be classified as either dumb terminals or walkstations [24]. The former are diskless hosts (such as palmtops) with reduced memory and computing capabilities. They can receive from the wireless network, but they are not able to send messages. Walkstations are comparable to classical workstations and can both receive and send messages on the wireless network. We will focus our considerations on this latter type of MH.

Despite their computing resources, MHs are mainly constrained by the short lifetime of batteries [8], which is heavily affected by communications over wireless channels. To reduce energy waste, MHs enter a doze mode when they are not involved in sending or receiving packets. A dozing MH has only the network interface active, which is able to filter the messages broadcast in the cell on the basis of the destination address. If a message addressed to the MH is observed, the system is awakened and reverts to the normal operation mode.

The described system behaviors affect the design of distributed applications. For our purposes, the most relevant consequences are that the high failure rate requires addressing both the fault-tolerance and the recovery issues, and that the energy-saving argument generates new constraints that must be considered while designing the services to support distributed applications.

1.3 SYSTEM ARCHITECTURE

[Figure 1.2: Reference architecture of a mobile host. Layers, from top to bottom: mobile applications; data/service/resource mobility; mobile transport protocol (Mobile TCP) and multicast transport protocol; Mobile IP/handoff procedure; hardware wireless interface.]

Figure 1.2 helps us to identify the main functional modules that compose a MH architecture [29]. The hardware interface provides the physical access to the network and also filters the messages broadcast in the cell according to their destination address. On top of it, the Network layer, for example, through the Mobile-IP [31] protocol, provides transparent addressing of MHs and executes the handoff procedure [6, 38]. Communication over the wireless network is unreliable, that is, packets may be lost, corrupted, or duplicated, and the transmission delay is highly variable because of different wireless technologies and load conditions. Hence, a wireless network may be
considered as an asynchronous system; this implies that the duration of transaction processing is unpredictable. The transport layer masks network uncertainties from the upper layers and provides some sort of reliable point-to-point or multicast [1, 2] channels among MHs. Certain multicast transport channels (e.g., [1, 2]) can ensure FIFO order in message delivery.

The higher layers provide the value-added services that directly support the application communication requirements. These services mainly address the management of data objects and files in the presence of mobility. They are also responsible for negotiating with MSSs the quality of service according to both user requirements and the services actually supplied by the wireless network [7, 13, 15, 22, 28, 29, 32].

The problem of locating MHs, i.e., of knowing their current position to allow the routing of messages, has received great attention [5, 25] and is emphasized by the trend of reducing the cell size to improve the communication bandwidth. The location service is architecturally located within the network layer, although some interesting evolutions of the basic location service aim to allow its direct use by mobile applications (see, e.g., [41]).

1.4 MOBILE APPLICATIONS

Applications that run on mobile hosts are likely to have different requirements from those designed for traditional environments. Most users will use MHs for personal communications (around 25%) and for mobile office activities (around 45%) [36]. The latter possibility implies the ability to port existing applications to MHs and to allow them to access and share remote data objects.

In [23, 24], a first attempt to classify mobile applications was introduced, based on the locality of the data the application accesses. In vertical applications, the users access the data within a specific cell, and access is denied to users that are outside that cell; examples are data concerning the availability of parking places in that cell, the position of the nearest doctor, or the personal identities of the other users in the cell. By contrast, horizontal applications handle data that span users distributed over the whole system; typically, they are applications whose users cooperate toward a common task, in spite of their movements, or multimedia applications such as conferencing.

The nature of the applications affects the pattern of read and write access to the data; in particular, in [23, 24] the following classes of data have been identified:

Private Data: They are maintained, accessed, and managed by a single user, the owner; no other users may access the data.

Public Data: One user may update them, and all the users of the system can read them. Consider, for instance, applications such as weather forecasts, news bulletins, or the broadcast of financial data. Another important kind of
information in this category is location data [41], that is, data concerning the identity of the cell in which a MH currently resides. In [41], data have been further classified into three categories according to their semantics, which reflects the frequency of their updating: (1) terminal mobility data, which concern the location of the host; (2) personal mobility data, which concern the user's identity and are used for user authentication; and (3) service mobility data, which describe the users' profiles, regarding, for instance, the customization of the applications they use or the subscribed services.

Shared Data: They are accessed for both reading and writing by a group of users cooperating on a common task (e.g., a cooperative workgroup) or managing multiple copies of the data to achieve availability and reliability.

Whereas public data are mainly managed by vertical applications, the use of shared data in the framework of horizontal applications introduces a general and complete range of fault-tolerance and recovery issues, which are the main topics of this chapter. In this work, we will mainly consider this setting. To ensure service availability and to improve performance, shared data can be replicated. Copies may be located on both fixed and mobile stations. Mobility introduces new challenging issues in the design of the mechanisms that guarantee data consistency and integrity. The scalability of these mechanisms over a possibly large number of MHs is also an important issue.

1.5 FAULT TOLERANCE IN MOBILE DATABASE PLATFORMS

As the mobile setting differs greatly from the fixed setting, it is necessary to redefine what a failure is and what "fault tolerance" means in this new context. In general, a system is fault-tolerant if it is guaranteed to behave correctly with respect to its service specification despite malfunctions; in the case of database systems, correctness is usually defined in terms of the ACID properties. In this section, we explore the approaches to fault tolerance in database management and show some examples of normal MH behaviours that may be misinterpreted as failures. We discuss the impact of these behaviours on the correct operation of the system and show how fault tolerance may be redefined according to these considerations, and how to achieve it.

In the following, we do not consider failures of the wireless network because their detection and recovery are the responsibility of the transport protocol. A reliable transport service is observed at the interface with the transport protocol (see Section 1.2). The services we consider throughout this chapter are built on top of such a reliable transport protocol.

1.5.1 Transaction Execution in Mobile Database Systems

The characteristics of MHs introduce new fault-tolerance issues in transaction management. Among these issues, the capability of tolerating the disappearance of
MHs from the cells is of primary concern because of mobility and disconnections. Whether the MHs store the entire database or part of it, and whether they actively participate in the management of the database, is a design choice that affects the consequences that failures may have.

[Figure 1.3: Management of mobile databases, shown as the states of MH m and MSS l at times t0 and t1. (a) The MH m submits a transaction T according to the transaction proxy approach. (b) The MH m submits a query on the data D according to the read-only transaction approach. (c) The MH m submits a transaction T according to the weak transaction approach.]

Most approaches assume that copies of data on MHs are secondary copies, whereas primary copies are maintained at FHs and MSSs. As we will see in Section 1.6, different approaches may be used that give more or less autonomy to MHs in operating on the database. Almost all these approaches, however, force MHs to perform periodic checkpoints and to maintain their backups on FHs. The adopted approaches may be classified as follows:

Transaction Proxy: The MHs do not execute any computation, but instead ask the MSSs to execute transactions on their behalf [13, 27, 44]. Therefore, the MSSs always hold the consistent database, and MHs do not need to execute
any update action on the data objects nor to keep any data object in their caches; see Figure 1.3(a).

Read-Only Transactions: MHs only cache data objects for queries, and updates are performed as in the preceding case; see Figure 1.3(b).

Weak Transactions: Besides performing queries on cached data, MHs may update data objects in their caches even while disconnected [7, 28, 30, 40]; see Figure 1.3(c). In this case, they must stabilize their updates as soon as they reconnect, that is, they have to globally commit the updates in order to re-establish consistency and to guarantee durability. For the purpose of stabilizing the disconnected transactions, or undoing them in the event of an abort, a log is maintained in secondary storage at the MH, recording the actions executed by uncommitted transactions [28, 30]. The log of each transaction is sent to the MSS on reconnection, so that the MSS can reexecute the transaction on its primary copy to verify whether it can safely commit.

It should be evident that in the weak-transaction approach, applications have to deal with more difficult fault-tolerance problems than in the transaction proxy approach, because of the maintenance of data objects stored by the MHs. In the weak-transaction approach, recovery mechanisms must be designed taking into account the scarce storage resources of walkstations.

1.5.2 Impact of Mobility on Transaction Correctness

In Section 1.2, we have shown that a mobile environment is characterized by hosts that can be temporarily unreachable, because of entering the doze mode, disconnecting, or moving to uncovered zones, and by the intrinsic asynchrony of the underlying network environment. In this section, we discuss how these features may lead to the violation of the ACID properties or may jeopardize the liveness of the system even in the absence of failures.

Figure 1.4 shows two cases. In the first case [Figure 1.4(a)], MH m caches a set D of data objects while being in cell l, and then it disconnects and continues to process its transactions while disconnected. When it later reconnects to the same or to a different MSS, its copy of the data may be inconsistent with respect to the one held at the MSSs. The same problem also arises when the MH m caches data for read-only transactions, because of the updates carried out by other FHs or MHs while MH m is unreachable. To solve this problem, MH m can lock the data objects D at the MSS l site while executing local transactions, thus preventing concurrent execution of other transactions originating from the MSS l or other MHs [Figure 1.4(b)]. If MH m disconnects or cannot be reached for a long time, it can be suspected of having failed. If MSS l maintains the lock, problems arise in the event of actual crashes or
long-lived disconnections. If the MSS l releases the lock after a timeout, the problem of inconsistent copies arises [Figure 1.4(a)].

[Figure 1.4: Example of possibly incorrect behaviours in the case of movement of an MH. Panel (a): MH m, data D, MSS l, MSS k. Panel (b): MH m, lock on D, D locked at MSS l, MSS k.]

The preceding problems arise because of the difficulty in distinguishing a temporary unreachability from a crash. It is possible, however, to distinguish between planned and sudden disconnections [4]. The former are predictable. When the MH enters doze mode, or disconnects either to recharge the batteries or to save power, some safety actions can be performed to tolerate such a temporary disconnection. For instance, the pending transactions can be moved to the destination MSS, where an agent can execute them on behalf of the mobile application so that the MH can obtain the results on reconnection, as proposed in [13]. Another approach allows MHs to prefetch into the cache the data they require while disconnected. Appropriate algorithms are designed to reestablish consistency among the existing copies of the data when the MH reconnects.

1.5.3 Transaction Correctness in Mobile Computing Systems

The ability to work in a mobile context must coexist with the possibility that even normal system conditions may lead to the violation of database correctness. As a consequence, the efforts to achieve fault tolerance have been directed at redesigning the notion of correctness rather than at redefining the notion of failure. A number of alternative definitions of the ACID properties have been proposed [15, 21, 30, 33] that weaken one or more of the properties. In general, their goal is to guarantee the MHs a certain degree of autonomy in transaction processing during disconnections, and to preserve the (modified) system correctness by allowing bounded inconsistencies among the data copies. In the following, we describe how each property has been redefined in some proposals in the scientific literature. Usually, only one property is considered at a time. The weakening of a given property, however, may affect the other ACID properties.

Atomicity property. The first step toward fault tolerance in mobile systems is allowing MHs to submit "pieces" of transactions in different cells according to
their movements. Several alternative methods are described in [21]. This approach weakens the classical formulation of atomicity and requires the ability to break a transaction so that subtransactions can be concurrently executed and interleaved with subtransactions of other transactions while guaranteeing the other ACID properties. These mobile transaction models are based on extended transaction models developed for long-duration transactions, such as Open Nested Transactions (ONT) [16] and Saga Transactions [18]. For instance, in the ONT model, the abort of one or more subtransactions does not necessarily imply the abort of the entire transaction. Hence, when a transaction T commits, only some of its operations may have been actually executed.

The decomposition of a transaction into subtransactions can be performed according to different principles and at different levels of granularity. Different approaches to decomposition have been proposed for each of the three transaction models discussed in Section 1.5.1.

In the case of transaction proxies and read-only transactions, transactions may be either submitted as a whole at a single MSS [13, 44], or they may be split during processing (Kangaroo model [21]) following the movements of the MH that submitted the transaction. In the latter case, communication costs are reduced by relocating computations as near to the MH as possible.

In the case of weak transactions, a transaction T is decomposed into mutually independent subtransactions. This decomposition ensures that the subtransactions of a transaction T can be concurrently processed at the different MSSs to which they have been submitted, and that their execution order does not affect the successful commit of T. Independence may be guaranteed, for instance, according to Bernstein's conditions [9]. Each subtransaction $S_i$ has a write set $W_i$ and a read set $R_i$. Every two distinct subtransactions $S_i$ and $S_j$ of a transaction T satisfy the following conditions, which guarantee their independence (Reporting and Co-transaction model [21]):

$$W_i \cap W_j = \emptyset, \qquad R_i \cap W_j = \emptyset, \qquad W_i \cap R_j = \emptyset$$

This approach is based on the Split Transaction Model [34]. A run-time support must exist that computes the decomposition by determining an appropriate partition of the read and write sets of the transaction. Another approach is based on fragmenting a transaction T so that each of its subtransactions $S_i$ executes operations that are commutative with those of the other subtransactions of T [21].

Both in the Kangaroo model and in the Reporting and Co-transaction model, the handoff procedure must be extended to involve the transfer of information concerning pending transactions generated by the MH. These models imply a redefinition of the other ACID properties. Both isolation and durability are restricted to subtransactions instead of global transactions. In the case of weak transactions, the consistency property is also affected, and a mechanism is required to merge copies and reestablish consistency on reconnection.
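
As an illustration of the independence test based on Bernstein's conditions, the following minimal Python sketch checks whether a given set of subtransactions is mutually independent. The names Subtransaction, independent, and mutually_independent are hypothetical and chosen only for this example; they do not come from [9], [21], or [34]. The sketch only verifies a decomposition that is already given; it does not compute one, which is the task of the run-time support.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Subtransaction:
    """Hypothetical record for a subtransaction S_i with its read set R_i and write set W_i."""
    name: str
    read_set: frozenset
    write_set: frozenset


def independent(a: Subtransaction, b: Subtransaction) -> bool:
    """Return True if a and b satisfy the three conditions:
    W_a and W_b disjoint, R_a and W_b disjoint, W_a and R_b disjoint."""
    return (
        a.write_set.isdisjoint(b.write_set)
        and a.read_set.isdisjoint(b.write_set)
        and a.write_set.isdisjoint(b.read_set)
    )


def mutually_independent(subtransactions) -> bool:
    """Return True if every pair of distinct subtransactions is independent."""
    return all(independent(a, b) for a, b in combinations(subtransactions, 2))


if __name__ == "__main__":
    # Illustrative data objects x, y, z, w (assumed for the example).
    s1 = Subtransaction("S1", frozenset({"x"}), frozenset({"y"}))
    s2 = Subtransaction("S2", frozenset({"x"}), frozenset({"z"}))
    s3 = Subtransaction("S3", frozenset({"y"}), frozenset({"w"}))
    print(mutually_independent([s1, s2]))      # S1 and S2 only share a read of x
    print(mutually_independent([s1, s2, s3]))  # S3 reads y, which S1 writes
```

Two subtransactions that only share reads pass the test, whereas a subtransaction that reads an object written by another one fails it, so such a pair could not be processed in an arbitrary order at different MSSs.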

Consistency property. An alternative to weakening the atomicity property is to redefine the consistency property. Under this approach, the database is considered partitioned into clusters, either according to semantics-based criteria (e.g., data objects related by integrity constraints belong to the same cluster) or according to location proximity (clustering model [21, 33]). Data in the same cluster must be strictly consistent, whereas a bounded degree of inconsistency is tolerated among clusters, according to some definition of consistency. Hence, clustering can, for instance, support multiversion databases or tolerate divergences between the secondary copy of the data maintained at a MH (which constitutes a cluster) and the primary copies on FHs. MHs are therefore allowed to process transactions while disconnected. According to this approach, two classes of primitives are used to access data:

1. weak-write and weak-read, which operate on data only in the local cluster, thus possibly causing inconsistencies with respect to other clusters;
2. strict-write and strict-read, which operate on data in the global database, thus maintaining consistency.

These primitives are executed so that operations that work on the same cluster do not conflict. Conflicts are prevented by locking mechanisms. In assigning locks, the usual lock compatibility modes are applied. Moreover, to guarantee that weak operations do not observe intermediate results produced by strict operations, strict-write locks and weak-read locks conflict. By contrast, strict-read and weak-write operations are not conflicting operations. The implementation must only guarantee that a strict-read operation reads the value written by the last strict-write operation.

A consequence of the redefinition of the consistency property is a more complex notion of serializability. Strict transactions must serialize with respect to each other according to one-copy serializability [17]. Moreover, let the projection of a strict transaction T on a MH m be the subtransaction of T that operates on the data objects held at the MH m. Weak transactions processed at MH m must serialize, again according to one-copy serializability, with respect to each other and with respect to the projections on MH m of the strict transactions.

The degree of inconsistency can be defined, for instance, in terms of the maximum number of versions of the same data object that can exist at the same time, or the maximum number of weak operations that can be executed on a copy of a data object without being propagated to the other copies [32]. A timestamp can be associated with each datum so that locks on it are automatically released after the expiration of a timeout (time-based consistency model [21]). This way, MHs can operate disconnected for a limited interval of time.

Isolation property. Some transaction models have been devised for mobile environments in which the isolation property is not guaranteed, that is, intermediate results of a transaction T can be observed by other transactions. This is usually a side effect of the relaxation of other ACID properties.
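
The conflict rules among weak and strict locks given in the discussion of the consistency property can be sketched as a small compatibility check. The Python code below is only an illustration under stated assumptions: the text fixes just two entries (strict-write conflicts with weak-read, and strict-read does not conflict with weak-write), and every other entry is an assumption of this sketch, filled in with the usual rule that two operations conflict when at least one of them is a write. The names Op and conflicts are hypothetical.

```python
from enum import Enum


class Op(Enum):
    WEAK_READ = "weak-read"
    WEAK_WRITE = "weak-write"
    STRICT_READ = "strict-read"
    STRICT_WRITE = "strict-write"


_WRITES = {Op.WEAK_WRITE, Op.STRICT_WRITE}


def conflicts(held: Op, requested: Op) -> bool:
    """Return True if a lock for `requested` conflicts with a lock for `held`
    on the same data object. The relation is symmetric."""
    pair = frozenset({held, requested})
    # Stated in the text: weak operations must not observe intermediate
    # results of strict operations.
    if pair == frozenset({Op.STRICT_WRITE, Op.WEAK_READ}):
        return True
    # Stated in the text: strict-read and weak-write do not conflict.
    if pair == frozenset({Op.STRICT_READ, Op.WEAK_WRITE}):
        return False
    # Assumption of this sketch: the usual compatibility rule applies to all
    # remaining pairs, i.e. they conflict if at least one is a write.
    return held in _WRITES or requested in _WRITES


if __name__ == "__main__":
    # Print the full matrix implied by the two stated rules plus the assumption.
    for held in Op:
        row = ["x" if conflicts(held, req) else "." for req in Op]
        print(f"{held.value:13s}", " ".join(row))
```

Under these assumptions, a strict-read never blocks on weak activity in a cluster, which matches the requirement that it only needs to see the value written by the last strict-write.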

We have observed in the previous paragraph that isolation can be enforced by properly modifying the conflict rules among locks, in the case of operations that work on data having different degrees of consistency. By contrast, if a transaction model such as those mentioned in the discussion on atomicity is adopted, the isolation property only holds for subtransactions. If a subtransaction $S_j$ of a transaction $T_j$ is processed at a MSS after a subtransaction $S_i$ of a transaction $T_i$ has been processed at the same MSS, then $S_j$ can observe the result of $S_i$, that is, an intermediate result of $T_i$. Helal et al. observe in [21] that the sharing of partial results among transactions, for example, by means of Reporting Transactions, may be desirable for some applications. On the other hand, enforcing the isolation property is probably expensive in that it may severely restrict concurrency among transactions.

Durability property. Durability of committed transactions is mainly affected by the possibility that MHs have of autonomously operating on data. In some transaction models, MHs can execute transactions on locally stored data even while disconnected (Reporting and Co-transaction model, Clustering model [21]). If a MH m operating in disconnected mode fails before stabilizing the results of its committed transactions on the primary copies of the database, these results might never be recovered (for example, in the case of a media failure). To ensure fault tolerance, in the Coda file system [28, 30], the durability property is relaxed by providing two types of transactions and two degrees of commitment. First-class transactions are those executed by either connected MHs or users on FHs, and second-class transactions are those processed by disconnected MHs. A disconnected MH can commit a transaction locally only if this transaction does not conflict with other transactions executed on the same host while the host is disconnected. On reconnection, the transactions are globally committed, unless they conflict with already committed first-class or second-class transactions executed on different MHs. Hence, first-class transactions have one level of commitment: a first-class transaction can commit if it is serializable with respect to all the transactions previously committed. By contrast, second-class transactions are subject to two levels of commitment:

local commitment: the transaction can commit if it is serializable with respect to all the previously committed second-class transactions executed on the same host;

global commitment: the transaction can commit if it is serializable with respect to all the committed transactions in the system.

Two levels of commitment have also been adopted for the weak transactions described in the consistency paragraph [33]. The global commitment makes it possible to detect
possible inconsistencies caused by weak transactions on reconnection. A locally committed transaction, however, can globally abort. The durability of locally committed transactions is not guaranteed until these transactions globally commit.

1.5.4 Recovery in Mobile Databases

Transaction recovery deals with the capability of ensuring failure atomicity [14]. It concerns the durability and the atomicity properties. Recovery mechanisms guarantee that these properties are satisfied in spite of failures. As in most proposals found in the current literature, we assume that the fixed network is reliable. We therefore focus on the problem of recovering MHs. As in the case of fault tolerance, we must understand which situations require recovery. Moreover, we investigate how recovery can be achieved given the limited computing and storage resources of the MHs. According to [20], we may classify failures into three categories:

Transaction Failures: A transaction may abort because the MH was disconnected, as in the read-only transaction and weak-transaction approaches. On reconnection, either the invalidation of the MH's cache is communicated to the transaction, or a conflict is raised between the updates of the transaction and the updates of other, possibly already committed, transactions.

Site Failures: The MH crashes, but the content of its permanent storage is not lost.

Media Failures: Part or all of the secondary storage holding the database is lost.

Recovery in general makes use of a log file recording information on the operations executed by both committed and still uncommitted transactions, the last safe state, and whatever else is needed to rebuild a consistent database in case of failures. This information is used during recovery to undo the partially executed transactions (atomicity property) and to redo the committed transactions (durability property) based on the last safe state.

As we have seen in the previous section, atomicity in mobile environments is affected by failures when a MH m either (1) processes transactions while disconnected, or (2) submits subtransactions. In the latter case, a subtransaction S may have to be undone whose results have been observed by other subtransactions, possibly belonging to an already committed transaction. In [21], this situation is dealt with by executing compensating transactions that semantically undo the effects of S (Saga model [21]). This solution, however, is not always viable because some operations are inherently non-compensatable. If the MH m carries some data and autonomously operates on them, as in case (1), other MHs could concurrently update the same data. Hence, if MH m recovers
14 14 Fault Tolerance and Recovery in Mobile Computing Systems Ch 22 from a failure, the undo of pending transactions produces an obsolete version of the database For this reason, under the most widely adopted solution, the recovered MHs refetch the updated version of the data objects to report their database view to a consistent state The durability property is ensured only on xed stations The problem is to determine when the data maintained at MHs need to be recovered and the most appropriate recovery technique according to the adopted transaction management policy With respect to the classication presented in Section 151, the recovery mechanisms are as follows: Transaction Proxy: As MHs do not maintain any data object, none of the above failure modes needs a MH recovery Read-Only Transactions: The MH m cache may become out of date during disconnections, movements, or crashes Usually, data have a version number [28] or timestamp [21], so that invalid caches can be detected on reconnection When MH m reconnects, it is sent an invalidation message from the MSS (server callback) Such a message could result in query aborts It is up to the MH m to refetch the invalidated data and to reexecute the aborted queries A dierent approach is taken in data-broadcasting algorithms [10, 26], where MSSs broadcast either the whole database or the more frequently accessed data objects and MHs autonomously keep up to date We further discuss these algorithms in Section 16 Weak Transactions: If the disconnected transactions cannot globally commit, according to the denitions we gave in Section 153, a transaction failure occurs The problem here is how to make durable the locally committed (but globally aborted) transactions The recovery procedures that can be adopted range from the automated refetch of the updated data and reexecution of the globally aborted transaction, to the user notication, to the execution of an application-dependent algorithm [30] Such an algorithm has the purpose 
of understanding whether the globally aborted transaction can be dropped, or only a part of it can be reexecuted, or one of the two previous solutions can be adopted, according to the application semantics [30] The redo is performed according to the recorded log le After the redo is completed, a global commit is tried again Site failures do not aect either the redo of globally aborted transactions or the global commit By contrast, media failures that cause the loss of the log le are unrecoverable Transactions not yet globally committed are lost, and the failure has to be reported to the application In Section 16, we describe some of the algorithms that have been proposed and the recovery mechanisms they implement An important question is that in mobile systems, because of mobility, disconnections, and higher failure rates than in static environments, recovery procedures are likely to be executed more frequently
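The use of the log to undo partially executed transactions and to redo committed ones, as described earlier in this section, can be illustrated with a minimal sketch. The record layout, the function name recover, and the dictionary standing in for permanent storage are assumptions made for illustration; they are not taken from any of the cited algorithms.

```python
def recover(storage, log):
    """Rebuild a consistent state after a site failure.

    storage: dict standing in for the MH's permanent storage, as found
             after the crash (it may contain uncommitted updates).
    log:     list of records in execution order, either
             ("write", txn, key, old_value, new_value) or ("commit", txn).
    Both layouts are hypothetical.
    """
    committed = {txn for kind, txn, *_ in log if kind == "commit"}
    state = dict(storage)

    # Undo pass (atomicity): scan backwards and restore the old value of
    # every write issued by a transaction that never committed.
    for kind, txn, *rest in reversed(log):
        if kind == "write" and txn not in committed:
            key, old, _new = rest
            if old is None:
                state.pop(key, None)   # the key did not exist before
            else:
                state[key] = old

    # Redo pass (durability): scan forwards and reapply the writes of
    # committed transactions.
    for kind, txn, *rest in log:
        if kind == "write" and txn in committed:
            key, _old, new = rest
            state[key] = new
    return state


log = [
    ("write", "t1", "a", 1, 10),
    ("commit", "t1"),
    ("write", "t2", "b", 2, 99),   # t2 was still running at the crash
]
print(recover({"a": 1, "b": 99}, log))
```

In the example, the update of t1 is reapplied even though it had not reached storage, and the update of t2 is rolled back. A media failure corresponds to losing log itself, in which case no such reconstruction is possible.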

Hence, besides being lightweight in terms of the required MH resources, recovery procedures should support a fast transaction restart.

1.6 CLASSES OF SOLUTIONS

We discussed in the previous sections how fault-tolerance and recovery concepts are adapted to a mobile environment. In this section, we describe algorithms for managing replicated data in mobile environments. The aim of this description is to highlight the capability of a given approach to satisfy the previously mentioned fault-tolerance requirements.

1.6.1 The "Data-Broadcasting" Approach

The data-broadcasting approach is a special case of the read-only transactions approach in which MHs can only query data. The database resides on one or more FHs; an MSS holding a copy of the database periodically broadcasts this copy to the MHs in the cell (Figure 1.5). The broadcast database version corresponds to a checkpoint at a given time; updates are performed between two successive broadcasts. ACID properties are guaranteed on FHs, and mobile queries cannot cause inconsistencies.

[Figure 1.5: Layout of the data-broadcasting approach.]

This approach requires that an MH stay active by listening to all the incoming messages until it receives the desired data. Some improvements to this approach have been proposed that allow MHs to tune in only when the desired data are transmitted, thus saving energy. This is achieved by broadcasting only the hot-spot data, that is, the most frequently accessed data, and by periodically broadcasting an index, or directory, of the database. Indexes are interleaved with blocks of data and allow MHs to determine which data will be sent next. The broadcast of frequently accessed data is based on the principle of data access skew, that is, on the hypothesis that data objects are not accessed with the same frequency (80:20 rule) [19]. Less frequently accessed data can be retrieved on demand. Explicit requests for data are recorded by MSSs and used to adapt the hot-spot composition accordingly. The reader can find in [10, 26] details on how to interleave index information with data in order to optimize the amount of time an MH has to be connected before receiving the data.

By adopting the data-broadcasting approach, MHs cannot suffer transaction failures. Simple fault-tolerant mechanisms are required to ensure consistency of the data that MHs are likely to cache. Including proper information about the modified data in the index yields an efficient policy to selectively update cached data while an MH is connected [26]. In the event of a disconnection or a crash, consistency is achieved by refetching the data copies in the MH cache. The periodic broadcast of index and data guarantees that an MH eventually receives the required information despite network packet loss and corruption.

The main advantages of this approach are simplicity and low complexity at the MHs. These advantages are, however, obtained at the price of low performance and high bandwidth consumption. The approach is suitable for applications that involve simple MHs, for example, dumb terminals, accessing public data on FHs.

1.6.2 The "Proxy" Approach

Under the transaction proxy approach, MHs can generate both queries and transactions. Data are maintained at FHs. Two methods may be distinguished for managing transactions: (1) MHs submit the transactions or subtransactions [13, 44]; (2) MHs submit the requests for read or write locks on the required data objects [27]. Under the former method, MHs do not maintain data. MHs submit transactions or subtransactions to the MSSs they visit while moving. Fixed stations are in charge of enforcing correctness properties on data by adopting proper fault-tolerant mechanisms. If an MH fails while submitting the query, the failure does not affect the database. By contrast, under the latter method, MHs can maintain local data. The locks on the data, however, are recorded and managed at the MSSs.

In [13], the database is assumed to be fully replicated at the MSSs, and the MHs can submit transactions to their current MSS. The ISIS system provides the required fault tolerance within the group of MSSs. The ISIS ABCAST primitive is used to ensure a total order in the delivery of the multicast messages that transport transactions. MH status information is partially replicated over a cluster of MSSs centered around the current location of the MH. Both status and location data are considered and managed as database objects.
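To see why a total order in the delivery of transaction messages keeps the MSS replicas identical, consider the following minimal sketch. It is not the ISIS ABCAST protocol: a single fixed sequencer is swapped in as a simple stand-in for the ordering mechanism, and the class names Sequencer and Replica are hypothetical.

```python
import heapq


class Sequencer:
    """Assigns consecutive sequence numbers to submitted transactions.

    Stand-in for the ordering service; an assumption of this sketch.
    """

    def __init__(self):
        self._seq = 0

    def stamp(self, txn):
        seq = self._seq
        self._seq += 1
        return seq, txn


class Replica:
    """An MSS copy that applies transactions strictly in sequence order."""

    def __init__(self):
        self.applied = []     # transactions applied so far, in order
        self._next = 0        # next sequence number that may be applied
        self._pending = []    # min-heap of messages that arrived early

    def deliver(self, seq, txn):
        heapq.heappush(self._pending, (seq, txn))
        # Apply every buffered message whose turn has come.
        while self._pending and self._pending[0][0] == self._next:
            _, ready = heapq.heappop(self._pending)
            self.applied.append(ready)
            self._next += 1


sequencer = Sequencer()
messages = [sequencer.stamp(t) for t in ("T1", "T2", "T3")]

mss_a, mss_b = Replica(), Replica()
for m in messages:             # the network delivers in sending order
    mss_a.deliver(*m)
for m in reversed(messages):   # the network delivers in reverse order
    mss_b.deliver(*m)
print(mss_a.applied == mss_b.applied)
```

Whatever the arrival order, both replicas apply the transactions in the same sequence, which is the property the text relies on. A single sequencer is itself a point of failure, which is one reason a fault-tolerant group communication system is used in [13].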

In [44], MHs can submit subtransactions and are free to distinguish between a global database and a local database, which is locally maintained and accessed. The global database is replicated over the group of MSSs. Data correctness is guaranteed among the copies of the global database. Suppose that a global transaction or subtransaction T_g is executed that precedes a local transaction T_l that satisfies some integrity constraints between global and local databases. If T_l causally precedes another global transaction T_g1, the group of MSS database servers must process T_g1 after T_g. To enforce this causality in the transaction processing order, global transactions or subtransactions carry a ticket, that is, a timestamp.

The method described in [27] is quite similar to the one in [44]. Unlike [44], however, it assumes that MHs can request read and write locks to "handle" the data according to a revised version of optimistic two-phase locking. Before accessing a data item, an MH m must request the appropriate lock from the current MSS (MSS_1); read locks are granted immediately, whereas write locks are deferred until commit time.

[Figure 1.6: Optimistic 2-phase-locking for mobile systems, panels (a) and (b).]

Because of mobility, locks may be requested from different MSSs. Lock information could be transmitted together with the MH state during the handoff procedure. To improve performance, however, read locks and unlocks are maintained at the MSSs from which they were requested, up to commit time; see Figure 1.6(a). At this point, before granting write locks, the current MSS checks for the existence of conflicting locks on the other MSSs. The procedure is shown in Figure 1.6(b), where MSS_c is the current MSS. It sends a status request to the other copies and waits for all the replies, which report the locks and unlocks recorded at those sites. MSS_c tries to match each read lock with a corresponding unlock. If it succeeds and no other write lock already exists, it releases all the existing read locks on the other copies and, after receiving their acknowledgments, it grants the write lock to the MH. A copy that sends the acknowledgment records, at the same time, that it is write-locked.

An MH failure or disconnection can leave some data locked. This problem can be handled by associating a timeout with each lock. On expiration of the timeout, the lock is unilaterally released. The timeout may be specified by the user. Once the timeout has expired, the MSS that drops the lock must inform the other copy holders that the transaction has aborted. The MH is notified on reconnection, to make its state consistent and to preserve the atomicity property. Serializability is guaranteed by associating version numbers with the data. Moreover, each copy must record a write-intent lock when it receives a status-request message from MSS_c.

The described methods are suitable for managing both public and shared data and can support both vertical and horizontal applications. The algorithm described in [44] also considers the possibility of combining private and global data that have some integrity relationship.

1.6.3 The "Disconnected" Approach

The disconnected approach was introduced with the Coda file system [28, 30] and has been adopted in other algorithms to manage either file systems [39, 40] or databases [32]. It uses the weak-transaction approach and enforces correctness by relaxing either the isolation property [28, 30] or the consistency property [32, 39, 40], as we described in Section 1.5.3.

The mentioned algorithms assume that the database is fully replicated on the MSSs. MSS copies are considered first-class replicas, whose consistency is always guaranteed. The MHs can host second-class replicas of the database or of a part of it. MHs and MSSs can execute transactions on the database according to an optimistic concurrency control strategy.

Transactions initiated by connected MHs are executed so that the usual definition of the ACID properties is satisfied. MHs work on the data they have cached up to commit time. In [32], these strict transactions are processed guaranteeing one-copy serializability. When a strict transaction commits, all the first- and second-class copies are updated accordingly, thus automatically guaranteeing cache consistency. In Coda, only fixed hosts apply the changes to their copies; hence, data maintained on MHs can become inconsistent. Cache coherence is enforced among connected MHs with a protocol based on callback primitives. An MH whose cache has been invalidated, or that experiences a cache miss during transaction processing, can obtain the updated data on demand from its MSS.

Disconnected MHs can rely only on the contents of their caches. The problem of cache management in mobile environments has been widely discussed in [42, 43]. To this purpose, Coda includes the special-purpose module Venus. The Venus module is located at the MH and operates to maintain in cache the most recently used data. While preparing for a disconnection (hoarding phase), it can also use the user's hints to fetch into the cache the data that are likely to be needed once disconnected; see Figure 1.7(a). Data are tagged with version numbers that are also stored in the cache.

[Figure 1.7: Weak transactions in the Coda file system: (a) hoarding; (b) disconnected operation; (c) reconnection.]

A disconnected (weak) transaction T is processed so that it locally serializes with the other weak transactions executed on the same host (Section 1.5.3). In the approach proposed in [32], weak transactions can also be executed by connected MHs. Therefore, communication on the wireless network is reduced, but there is the additional requirement that weak transactions cannot observe partial results of strict transactions. Operations executed on the database by disconnected transactions are recorded in a log file together with the data version numbers; see Figure 1.7(b). The log file is stored in permanent storage together with the cached data. This allows the MHs to survive long disconnections in spite of the reduced size of the volatile storage.

The effects of weak transactions are not permanent until they globally commit. The global commitment is executed when MH m reconnects; see Figure 1.7(c). The log file is transferred to the current MSS_c, which checks for conflicts by comparing the version numbers recorded in the log with those currently associated with its copy of the data. MSS_c detects conflicts by building and analyzing a precedence graph among the weak transactions and the previously committed strict transactions. If no conflicts are detected, MSS_c locks all the data on which disconnected transactions operated, and redoes these transactions according to the trace in the log. A commit message is sent to MH m and the results are propagated to all the connected hosts holding a database copy. Locks on data are then released. Otherwise, an abort message is sent to MH m; cascading abort only affects pending disconnected transactions executed on MH m. In Coda, a reply file is sent together with the abort message, containing the results of the MSS_c attempt and the current state of the database at MSS_c.

If a transaction aborts during the global commitment phase, three solutions are proposed in [28, 30]: (1) the abort is notified to the application; (2) MH m refetches the updated data and reexecutes the aborted transaction; (3) application-dependent algorithms are executed. These algorithms examine the reply file to decide whether one of solutions (1) and (2) is appropriate or whether the transaction can be only partially redone.

Transaction failures can also be experienced by MHs on local commitment. A disconnected transaction aborts when a cache miss occurs or the cache overflows as a consequence of either the increase in the log size or the creation of new data. In these cases, the MH has to suspend transaction processing and wait for reconnection.
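The global commitment step performed on reconnection can be illustrated with a minimal sketch. It is a simplification under stated assumptions: conflicts are detected only by comparing version numbers (no precedence graph is built), locking and propagation to other hosts are omitted, and the names WeakTxn, Mss, and global_commit are hypothetical rather than part of Coda or of the algorithm in [32].

```python
from dataclasses import dataclass, field


@dataclass
class WeakTxn:
    """One log record: a transaction executed while disconnected."""
    txn_id: str
    read_versions: dict   # key -> version number held in the MH cache
    writes: dict          # key -> new value produced by the transaction


@dataclass
class Mss:
    """First-class replica: key -> (value, version number)."""
    data: dict = field(default_factory=dict)

    def global_commit(self, log):
        """Validate and redo a reconnecting MH's log, in log order."""
        outcome = {}
        local = set()     # keys rewritten by earlier weak txns that committed
        tainted = set()   # keys written by earlier weak txns that aborted
        for txn in log:
            conflict = False
            for key, seen in txn.read_versions.items():
                current = self.data.get(key, (None, 0))[1]
                if key in tainted:
                    conflict = True    # cascading abort, confined to this MH
                elif key not in local and current != seen:
                    conflict = True    # updated elsewhere while disconnected
            if conflict:
                tainted.update(txn.writes)
                outcome[txn.txn_id] = "abort"
                continue
            for key, value in txn.writes.items():   # redo from the log
                version = self.data.get(key, (None, 0))[1]
                self.data[key] = (value, version + 1)
                local.add(key)
            outcome[txn.txn_id] = "commit"
        return outcome


mss = Mss({"x": (1, 3), "y": (5, 7)})   # y reached version 7 meanwhile
log = [
    WeakTxn("t1", {"x": 3}, {"x": 2}),
    WeakTxn("t2", {"y": 6}, {"z": 9}),   # read a stale version of y
    WeakTxn("t3", {"z": 0}, {"w": 1}),   # depends on the output of t2
]
print(mss.global_commit(log))
```

In the example, t1 commits, t2 aborts because its cached version of y is stale, and t3 aborts in cascade because it read data produced by t2. The abort does not touch the first-class copy, so other hosts are unaffected.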

Since these algorithms are able to tolerate long disconnections of the MHs without either blocking the system or jeopardizing the correctness of the primary copies of the database, the same mechanisms allow them to tolerate both site and media failures of the MHs. From the point of view of MH recovery, site failures are not harmful. The transport protocol ensures that the log transfer is performed as an atomic action. Hence, the MH fails either before the transmission has been successfully completed or after it. In the former case, the global commitment of pending transactions is executed when the MH recovers from the crash. In the latter case, the MSS autonomously proceeds with its processing, and the message containing the outcome of the global commitment is delivered to the MH when it recovers. By contrast, media failures causing the loss of the log and the cached data cannot be recovered, and the failure has to be notified to the application.

These algorithms can be executed by MHs having nonnegligible computing and storage capabilities, such as walkstations. They can be used to manage both public and shared data, and to support both horizontal and vertical applications. In the latter case, however, the database cannot always be replicated at all the MSSs, and an MH m must reconnect in a cell in which access to the database is supported in order to stabilize its weak transactions. Finally, since processing is executed locally, both long and interactive transactions are supported for MHs.

1.6.4 The "Distributed" Approach

Like the previous approach, this one adopts the weak-transaction model. It differs from the disconnected approach in that it supports only strict transactions that require cooperation among the hosts holding a copy of the database. The correctness notion used in this approach is based on the usual definition of ACID properties. In the literature, only [7] describes an algorithm that follows this approach; hence, we refer to it in the following description.

Copies of the database are maintained at both FHs and MHs. Copies are classified as core (primary) ones and cached (secondary) ones, according to their consistency degree. This classification, however, is independent of the type of host on which a copy resides. Rather, it depends on whether or not the host actively participates in the data management. Cached copies are maintained at the hosts running applications that can tolerate inconsistent data. Only queries can be executed on cached copies. By contrast, hosts requiring consistent data or wishing to generate transactions must belong to the group of core sites. Core nodes periodically generate multicast messages containing the current version of the data to bring the cached copies up to date. The delivery of these messages must serialize with the queries processed at the cache sites.

Transactions generated by core sites are processed according to existing algorithms used in wired networks; in particular, in [7], the following approaches to transaction processing are considered:

BRANCH:IT FINAL YEAR SEVENTH SEM SUBJECT: MOBILE COMPUTING UNIT-IV: MOBILE DATA MANAGEMENT

BRANCH:IT FINAL YEAR SEVENTH SEM SUBJECT: MOBILE COMPUTING UNIT-IV: MOBILE DATA MANAGEMENT - 1 Mobile Data Management: Mobile Transactions - Reporting and Co Transactions Kangaroo Transaction Model - Clustering Model Isolation only transaction 2 Tier Transaction Model Semantic based nomadic

More information

Transaction Processing in Mobile Database Systems

Transaction Processing in Mobile Database Systems Ashish Jain* 1 http://dx.doi.org/10.18090/samriddhi.v7i2.8631 ABSTRACT In a mobile computing environment, a potentially large number of mobile and fixed users may simultaneously access shared data; therefore,

More information

Chapter 8 Fault Tolerance

Chapter 8 Fault Tolerance DISTRIBUTED SYSTEMS Principles and Paradigms Second Edition ANDREW S. TANENBAUM MAARTEN VAN STEEN Chapter 8 Fault Tolerance 1 Fault Tolerance Basic Concepts Being fault tolerant is strongly related to

More information

Mobile Computing Models What is the best way to partition a computation as well as the functionality of a system or application between stationary and

Mobile Computing Models What is the best way to partition a computation as well as the functionality of a system or application between stationary and Mobile Computig: Conclusions Evaggelia Pitoura Computer Science Department, University of Ioannina, Ioannina, Greece http://www.cs.uoi.gr/~ pitoura Summer School, Jyvaskyla, August 1998 Mobile Computing

More information

Global Transactions Global Transaction Global Manager Transaction Interface Global Global Scheduler Recovery Manager Global Log Server Log

Global Transactions Global Transaction Global Manager Transaction Interface Global Global Scheduler Recovery Manager Global Log Server Log Recovery in Multidatabase Systems Angelo Brayner Federal University of Ceara brayner@lia.ufc.br Theo Harder University of Kaiserslautern haerder@informatik.uni-kl.de Abstract A multidatabase consists of

More information

A can be implemented as a separate process to which transactions send lock and unlock requests The lock manager replies to a lock request by sending a lock grant messages (or a message asking the transaction

More information

Mobile and Heterogeneous databases Distributed Database System Transaction Management. A.R. Hurson Computer Science Missouri Science & Technology

Mobile and Heterogeneous databases Distributed Database System Transaction Management. A.R. Hurson Computer Science Missouri Science & Technology Mobile and Heterogeneous databases Distributed Database System Transaction Management A.R. Hurson Computer Science Missouri Science & Technology 1 Distributed Database System Note, this unit will be covered

More information

Chapter 17: Recovery System

Chapter 17: Recovery System Chapter 17: Recovery System Database System Concepts See www.db-book.com for conditions on re-use Chapter 17: Recovery System Failure Classification Storage Structure Recovery and Atomicity Log-Based Recovery

More information

A Transaction Model to Improve Data Availability in Mobile Computing

A Transaction Model to Improve Data Availability in Mobile Computing Distributed and Parallel Databases, 10, 127 160, 2001 c 2001 Kluwer Academic Publishers. Manufactured in The Netherlands. A Transaction Model to Improve Data Availability in Mobile Computing SANJAY KUMAR

More information

Distributed Transaction Management

Distributed Transaction Management Distributed Transaction Management Material from: Principles of Distributed Database Systems Özsu, M. Tamer, Valduriez, Patrick, 3rd ed. 2011 + Presented by C. Roncancio Distributed DBMS M. T. Özsu & P.

More information

Transaction Management. Pearson Education Limited 1995, 2005

Transaction Management. Pearson Education Limited 1995, 2005 Chapter 20 Transaction Management 1 Chapter 20 - Objectives Function and importance of transactions. Properties of transactions. Concurrency Control Deadlock and how it can be resolved. Granularity of

More information

Security Mechanisms I. Key Slide. Key Slide. Security Mechanisms III. Security Mechanisms II

Security Mechanisms I. Key Slide. Key Slide. Security Mechanisms III. Security Mechanisms II Database Facilities One of the main benefits from centralising the implementation data model of a DBMS is that a number of critical facilities can be programmed once against this model and thus be available

More information

Chapter 17: Recovery System

Chapter 17: Recovery System Chapter 17: Recovery System! Failure Classification! Storage Structure! Recovery and Atomicity! Log-Based Recovery! Shadow Paging! Recovery With Concurrent Transactions! Buffer Management! Failure with

More information

Failure Classification. Chapter 17: Recovery System. Recovery Algorithms. Storage Structure

Failure Classification. Chapter 17: Recovery System. Recovery Algorithms. Storage Structure Chapter 17: Recovery System Failure Classification! Failure Classification! Storage Structure! Recovery and Atomicity! Log-Based Recovery! Shadow Paging! Recovery With Concurrent Transactions! Buffer Management!

More information

Integrity in Distributed Databases

Integrity in Distributed Databases Integrity in Distributed Databases Andreas Farella Free University of Bozen-Bolzano Table of Contents 1 Introduction................................................... 3 2 Different aspects of integrity.....................................

More information

Distributed Systems COMP 212. Revision 2 Othon Michail

Distributed Systems COMP 212. Revision 2 Othon Michail Distributed Systems COMP 212 Revision 2 Othon Michail Synchronisation 2/55 How would Lamport s algorithm synchronise the clocks in the following scenario? 3/55 How would Lamport s algorithm synchronise

More information

Distributed Transaction Management. Distributed Database System

Distributed Transaction Management. Distributed Database System Distributed Transaction Management Advanced Topics in Database Management (INFSCI 2711) Some materials are from Database Management Systems, Ramakrishnan and Gehrke and Database System Concepts, Siberschatz,

More information

Control. CS432: Distributed Systems Spring 2017

Control. CS432: Distributed Systems Spring 2017 Transactions and Concurrency Control Reading Chapter 16, 17 (17.2,17.4,17.5 ) [Coulouris 11] Chapter 12 [Ozsu 10] 2 Objectives Learn about the following: Transactions in distributed systems Techniques

More information

Chapter 25: Advanced Transaction Processing

Chapter 25: Advanced Transaction Processing Chapter 25: Advanced Transaction Processing Transaction-Processing Monitors Transactional Workflows High-Performance Transaction Systems Main memory databases Real-Time Transaction Systems Long-Duration

More information

CS /15/16. Paul Krzyzanowski 1. Question 1. Distributed Systems 2016 Exam 2 Review. Question 3. Question 2. Question 5.

CS /15/16. Paul Krzyzanowski 1. Question 1. Distributed Systems 2016 Exam 2 Review. Question 3. Question 2. Question 5. Question 1 What makes a message unstable? How does an unstable message become stable? Distributed Systems 2016 Exam 2 Review Paul Krzyzanowski Rutgers University Fall 2016 In virtual sychrony, a message

More information

Distributed Systems. Characteristics of Distributed Systems. Lecture Notes 1 Basic Concepts. Operating Systems. Anand Tripathi

Distributed Systems. Characteristics of Distributed Systems. Lecture Notes 1 Basic Concepts. Operating Systems. Anand Tripathi 1 Lecture Notes 1 Basic Concepts Anand Tripathi CSci 8980 Operating Systems Anand Tripathi CSci 8980 1 Distributed Systems A set of computers (hosts or nodes) connected through a communication network.

More information

Distributed Systems. Characteristics of Distributed Systems. Characteristics of Distributed Systems. Goals in Distributed System Designs

Distributed Systems. Characteristics of Distributed Systems. Characteristics of Distributed Systems. Goals in Distributed System Designs 1 Anand Tripathi CSci 8980 Operating Systems Lecture Notes 1 Basic Concepts Distributed Systems A set of computers (hosts or nodes) connected through a communication network. Nodes may have different speeds

More information

Abstract One way of avoiding unpredictable delays, in a distributed real-time database, is to allow transactions to commit locally. In a system suppor

Abstract One way of avoiding unpredictable delays, in a distributed real-time database, is to allow transactions to commit locally. In a system suppor A CONFLICT DETECTION AND RESOLUTION MECHANISM FOR BOUNDED-DELAY REPLICATION Johan Lundstrom Submitted by Johan Lundstrom to the University of Skovde as a dissertation towards the degree of M.Sc. by examination

More information

Recovery System These slides are a modified version of the slides of the book Database System Concepts (Chapter 17), 5th Ed McGraw-Hill by

Recovery System These slides are a modified version of the slides of the book Database System Concepts (Chapter 17), 5th Ed McGraw-Hill by Recovery System These slides are a modified version of the slides of the book Database System Concepts (Chapter 17), 5th Ed., McGraw-Hill, by Silberschatz, Korth and Sudarshan. Original slides are available

More information

CHAPTER 3 RECOVERY & CONCURRENCY ADVANCED DATABASE SYSTEMS. Assist. Prof. Dr. Volkan TUNALI

CHAPTER 3 RECOVERY & CONCURRENCY ADVANCED DATABASE SYSTEMS. Assist. Prof. Dr. Volkan TUNALI CHAPTER 3 RECOVERY & CONCURRENCY ADVANCED DATABASE SYSTEMS Assist. Prof. Dr. Volkan TUNALI PART 1 2 RECOVERY Topics 3 Introduction Transactions Transaction Log System Recovery Media Recovery Introduction

More information

Transaction Processing in a Mobile Computing Environment with Alternating Client Hosts *

Transaction Processing in a Mobile Computing Environment with Alternating Client Hosts * Transaction Processing in a Mobile Computing Environment with Alternating Client Hosts * Sven Buchholz, Thomas Ziegert and Alexander Schill Department of Computer Science Dresden University of Technology

More information

MODELS OF DISTRIBUTED SYSTEMS

MODELS OF DISTRIBUTED SYSTEMS Distributed Systems Fö 2/3-1 Distributed Systems Fö 2/3-2 MODELS OF DISTRIBUTED SYSTEMS Basic Elements 1. Architectural Models 2. Interaction Models Resources in a distributed system are shared between

More information

MODELS OF DISTRIBUTED SYSTEMS

MODELS OF DISTRIBUTED SYSTEMS Distributed Systems Fö 2/3-1 Distributed Systems Fö 2/3-2 MODELS OF DISTRIBUTED SYSTEMS Basic Elements 1. Architectural Models 2. Interaction Models Resources in a distributed system are shared between

More information
