
Chapter 1

Fault Tolerance and Recovery in Mobile Computing Systems

Elisa Bertino, Elena Pagani, and Gian Paolo Rossi

1.1 INTRODUCTION

Through wireless networks, mobile personal machines can access data and services located on both mobile and wired servers. Unlike wired hosts, mobile hosts can be temporarily unreachable as a consequence of moving across different cells, of their energy limitations, or of unavailable wireless channels. Mobility forces mobile hosts to alternate between connected and disconnected work. When connected, they perform personal communications and access shared data and services; when disconnected, they can process locally cached data objects. As in wired networks, data replication is the key element to ensure high data availability and to increase performance. However, disconnected work and the uncertainties of the underlying wireless network introduce new challenging issues that have recently been discussed in the literature.

There are three main aspects that we wish to discuss in this chapter:

1. how to provide a fault-tolerant architecture that addresses data access and management despite mobility and disconnected work;
2. how to manage data replication to ensure data consistency, integrity, and durability according to the application requirements;
3. the extent to which the general requirement of network independence can be met, or otherwise the application awareness of mobility can be effectively exploited to provide the level of quality of service needed.

In this chapter, we investigate how the problem of managing a distributed database and guaranteeing data consistency is affected by the characteristics of the mobile setting. We discuss the impact of mobility and disconnections on fault tolerance and recovery. We investigate how fault tolerance can be ensured by analyzing some of the algorithms proposed for database management in mobile systems.

This chapter is organized as follows. In Section 1.2, we describe the mobile environment and introduce data management issues in such an environment. In Section 1.3, we present the system architecture to which we refer in the remainder of the discussion, and in Section 1.4, we characterize mobile applications. In Section 1.5, we investigate how fault tolerance is affected in mobile environments and how the ACID properties can be redefined to guarantee data correctness in a mobile setting. In Sections 1.6 and 1.7, we discuss some of the approaches proposed for managing distributed databases. Finally, in Section 1.8, we report some performance evaluation results concerning some of the described algorithms.

1.2 DISTRIBUTED SYSTEMS WITH MOBILE HOSTS

Unlike the fixed hosts (FHs) of conventional computer networks, a mobile host (MH) can retain its network connection even while moving. This is possible because of the use of different network technologies, such as radio links, satellite networks, and infrared links [35, 36], that do not impose any physical constraint on the hosts, that is, they are wireless. Wireless networks may be classified as single-hop [3, 13, 23, 24, 27, 28] or multihop [11, 12]; in the latter case, all the machines in the system are mobile, whereas in the former, both mobile and fixed stations are involved. In the sequel, we restrict our attention almost exclusively to single-hop systems, as they are the most widely considered in the literature.

Single-hop networks are organized as shown in Figure 1.1. Some of the fixed hosts, denoted as Mobile Support Stations (MSSs) [1, 2, 4, 24], are equipped with a wireless interface; they support communication between the MHs that reside in a cell and the MHs in different cells. The cell is the area in which the signal generated by the MSS can be received by the MHs. The messages generated by a given MSS are broadcast within the cell. The MHs filter the messages according to their destination address; conversely, a MH can communicate with another MH of the same cell only by sending its message to the cell MSS, which executes the broadcast. FHs and MSSs are connected through a wired network, whose topology is static and which is used to support the communication between cells. Because of movements, the topology of the wireless network may change over time. The diameter of the cells may vary according to the wireless technology; for example, it spans from a few meters for infrared technology to 1 to 2 miles for radio or single-hop satellite networks [36]. Moreover, the technology also affects the available bandwidth: LANs that use infrared technology have transfer rates on
the order of 1 to 2 Mbps (up to 100 Mbps in recent experiments [36]); by contrast, WANs have poorer performance, as they usually provide bandwidths in the range of 14.4 to 64 Kbps. Wireless networks that offer around 100-Kbps services are under development [36]. Finally, wireless networks are expected to be less reliable than wired ones: it has been estimated that the failure rate will increase by at least one order of magnitude with respect to current wired networks.

[Figure 1.1. Example of a single-hop network: MHs (one of them disconnected), MSSs, and FHs, with wired links, wireless links, and cell boundaries.]

Cells may overlap. Hence, a MH may be in more than one cell at the same time, although it refers to only one MSS at a time. In most systems, MHs choose their current MSS according to the strongest signal they receive [1, 13]. When the MH m moves from MSS 1 to MSS 2, a handoff (or handover) procedure [4, 24] is activated between MSS 1 and MSS 2 to transfer the state information about MH m to MSS 2. A mobility assumption [1, 2] is required to ensure system liveness: MH m resides in a cell at least for the time sufficient to complete the handoff procedure and to allow the MSS to deliver to MH m at least one of the messages that were still pending at MSS 1; in this way, we guarantee that messages do not starve.

In their wandering, MHs could move to places that are out of cell coverage, that is, become disconnected. This depends on the system's capability to cover a given geographical area. In the United States [36], the wireless service with the broadest coverage is Ardis, which reaches more than 80% of the US population and spans around 90% of the metropolitan areas and 30 to 40% of the rest of the country. On the other hand, Ardis has a very low bandwidth; it offers 4.8-Kbps connections. Other wireless networks exist, but because of the current lack of standards, it is not possible to exploit the services of different wireless networks to gain greater coverage. Disconnections can also occur because of several other events; for example, the MH
may exhaust its battery charge, it can be lost, or it can crash.

MHs can be classified as either dumb terminals or walkstations [24]. The former are diskless hosts (for instance, palmtops) with reduced memory and computing capabilities. They can receive from the wireless network, but they are not able to send messages. Walkstations are comparable to classical workstations and can both receive and send messages on the wireless network. We will focus our considerations on this latter type of MH.

Despite their computing resources, MHs are mainly constrained by the short lifetime of batteries [8], which is heavily affected by communication over wireless channels. To reduce energy waste, MHs enter a doze mode when they are not involved in sending or receiving packets. A dozing MH only has the network interface active, which is able to filter the messages broadcast in the cell on the basis of the destination address. If a message addressed to the MH is observed, the system is awakened and reverts to the normal operation mode.

The described system behaviors impact the design of distributed applications. For our purposes, the most relevant are the following: the high failure rate requires addressing both the fault-tolerance and the recovery issues, and the energy-saving argument generates new constraints that must be considered while designing the services that support distributed applications.

1.3 SYSTEM ARCHITECTURE

Figure 1.2 helps us to identify the main functional modules that compose a MH architecture [29]. The hardware interface provides the physical access to the network and also filters the messages broadcast in the cell according to their destination address.

[Figure 1.2. Reference architecture of a mobile host. Layers, top to bottom: Mobile applications; Data/Service/Resource mobility; Mobile transport protocol (Mobile TCP) and Multicast transport protocol; Mobile IP/Handoff procedure; Hardware wireless interface.]

On top of it, the network layer, for example, through the Mobile-IP [31] protocol, provides transparent addressing of MHs and executes the handoff procedure [6, 38]. Communication over the wireless network is unreliable, that is, packets are lost, corrupted, or duplicated, and the transmission delay is highly variable due to different wireless technologies and load conditions. Hence, a wireless network may be
considered as an asynchronous system; this implies that the duration of transaction processing is unpredictable. The transport layer masks network uncertainties from the upper layers and provides some sort of reliable point-to-point or multicast [1, 2] channels among MHs. Certain multicast transport channels (e.g., [1, 2]) can ensure FIFO order in message delivery. The higher layers provide the value-added services that directly support the application communication requirements. These services mainly address the management of data objects and files in the presence of mobility. They are also responsible for negotiating with MSSs the quality of service according to both user requirements and the services actually supplied by the wireless network [7, 13, 15, 22, 28, 29, 32].

The problem of locating MHs, i.e., of knowing their current position to allow the routing of messages, has received great attention [5, 25] and is emphasized by the trend of reducing the cell size to improve the communication bandwidth. The location service is architecturally located within the network layer, although some interesting evolutions of the basic location service are oriented to allow its direct use by mobile applications (see, e.g., [41]).

1.4 MOBILE APPLICATIONS

Applications that run on mobile hosts are likely to have different requirements from those designed for traditional environments. Most users will use MHs for personal communications (e.g., , around 25%) and for mobile office activities (around 45%) [36]. The latter possibility implies the ability to port existing applications to MHs and to allow them to access and share remote data objects.

In [23, 24], a first attempt to classify mobile applications has been introduced, based on the locality of the data the application accesses. In vertical applications, users access data within a specific cell, and access is denied to users that are outside that cell; examples are data concerning the availability of parking places in that cell, the position of the nearest doctor, or the personal identities of the other users in the cell. By contrast, horizontal applications handle data that span users distributed over the whole system; typically, they are applications whose users cooperate toward a common task in spite of their movements, or multimedia applications such as conferencing.

The nature of the applications impacts the pattern of read and write access to the data; in particular, in [23, 24] the following classes of data have been identified:

Private Data: They are maintained, accessed, and managed by a single user, the owner; no other users may access the data.

Public Data: One user may update them, and all the users of the system can read them. Consider, for instance, applications such as weather forecasts, news bulletins, or the broadcast of financial data. Another important kind of
information in this category is location data [41], that is, data concerning the identity of the cell in which a MH currently resides. In [41], data have been further classified into three categories according to their semantics, which reflects the frequency of their updating: (1) terminal mobility data, which concern the location of the host; (2) personal mobility data, which concern the user's identity and are used for user authentication; and (3) service mobility data, which describe the users' profiles, regarding, for instance, the customization of the applications they use or the subscribed services.

Shared Data: They are accessed both in read and write mode by a group of users cooperating on a common task (e.g., a cooperative workgroup) or managing multiple copies of the data to achieve availability and reliability.

Whereas public data are mainly managed by vertical applications, the use of shared data in the framework of horizontal applications introduces a general and complete range of fault-tolerance and recovery issues, which are the main topic of this chapter. In this work, we will mainly consider this setting. To ensure service availability and to improve performance, shared data can be replicated. Copies may be located on both fixed and mobile stations. Mobility introduces new challenging issues in the design of the mechanisms that guarantee data consistency and integrity. The scalability of these mechanisms over a possibly large number of MHs is also an important issue.

1.5 FAULT TOLERANCE IN MOBILE DATABASE PLATFORMS

As the mobile setting differs greatly from the fixed setting, it is necessary to redefine what a failure is and what "fault tolerance" means in this new context. In general, a system is fault-tolerant if it is guaranteed to behave correctly with respect to its service specification despite malfunctions; in the case of database systems, correctness is usually defined in terms of the ACID properties. In this section, we explore the approaches to fault tolerance in database management and show some examples of normal MH behaviors that may be misinterpreted as failures. We discuss the impact of these behaviors on the correct operation of the system and show how fault tolerance may be redefined according to these considerations, and how to achieve it.

In the following, we do not consider failures of the wireless network because their detection and recovery are the responsibility of the transport protocol. A reliable transport service is observed at the interface with the transport protocol (see Section 1.2). The services we consider throughout this chapter are built on top of such a reliable transport protocol.

1.5.1 Transaction Execution in Mobile Database Systems

The characteristics of MHs introduce new fault-tolerance issues in transaction management. Among these issues, the capability of tolerating the disappearance of
MHs from the cells is of primary concern because of mobility and disconnections. Whether the MHs store the entire database or part of it, and whether they actively participate in the management of the database, is a design choice that influences the effects that failures may have.

[Figure 1.3. Management of mobile databases. (a) The MH m submits a transaction T according to the transaction proxy approach. (b) The MH m submits a query on the data D according to the read-only transaction approach. (c) The MH m submits a transaction T according to the weak transaction approach.]

Most approaches assume that copies of data on MHs are secondary copies, whereas primary copies are maintained at FHs and MSSs. As we will see in Section 1.6, different approaches may be used that give more or less autonomy to MHs in operating on the database. Almost all these approaches, however, force MHs to perform periodic checkpoints and to maintain their backups on FHs. The adopted approaches may be classified as follows:

Transaction Proxy: The MHs do not execute any computation, but instead ask the MSSs to execute transactions on their behalf [13, 27, 44]. Therefore, the MSSs always hold the consistent database, and MHs do not need to execute
any update action on the data objects nor to keep any data object in their caches; see Figure 1.3(a).

Read-Only Transactions: MHs only cache data objects for queries, and updates are performed as in the preceding case; see Figure 1.3(b).

Weak Transactions: Besides performing queries on cached data, MHs may update data objects in their caches even while disconnected [7, 28, 30, 40]; see Figure 1.3(c). In this case, they must stabilize their updates as soon as they reconnect, that is, they have to globally commit the updates in order to re-establish consistency and to guarantee durability. For the purpose of stabilizing the disconnected transactions or undoing them in the event of an abort, a log is maintained in secondary storage at the MH, recording the actions executed by uncommitted transactions [28, 30]. The log of each transaction is sent to the MSS on reconnection, so that the MSS can reexecute the transaction on its primary copy to verify whether it can safely commit.

It should be evident that in the weak-transaction approach, applications have to deal with more difficult fault-tolerance problems than in the transaction proxy approach, because of the maintenance of data objects stored by the MHs. In the weak-transaction approach, recovery mechanisms must be designed by properly taking into account the scarce storage resources of walkstations.

1.5.2 Impact of Mobility on Transaction Correctness

In Section 1.2, we have shown that a mobile environment is characterized by hosts that can be temporarily unreachable, because of entering the doze mode, disconnecting, or moving to uncovered zones, and by the intrinsic asynchrony of the underlying network environment. In this section, we discuss how these features may lead to the violation of ACID properties or may jeopardize the liveness of the system even in the absence of failures.

Figure 1.4 shows two cases. In the first case [Figure 1.4(a)], MH m caches a set D of data objects while being in cell l, and then it disconnects and continues to process its transactions while disconnected. When it later reconnects to the same or to a different MSS, its copy of the data is inconsistent with respect to the one held at the MSSs. The same problem also arises when the MH m caches data for read-only transactions, because of the updates carried out by other FHs or MHs while MH m is unreachable. To solve this problem, MH m can lock the data objects D at the MSS l site while executing local transactions, thus preventing concurrent execution of other transactions originating from the MSS l or other MHs [Figure 1.4(b)]. If MH m disconnects or cannot be reached for a long time, it can be suspected of having failed. If MSS l maintains the lock, problems arise in the event of actual crashes or
long-lived disconnections. If the MSS l releases the lock after a timeout, the problem of inconsistent copies arises [Figure 1.4(a)].

[Figure 1.4. Example of possibly incorrect behaviors in the case of movement of an MH: (a) MH m caches data D; (b) MH m locks D at MSS l.]

The preceding problems arise because of the difficulty of distinguishing temporary unreachability from a crash. It is possible, however, to distinguish between planned and sudden disconnections [4]. The former are predictable. When the MH dozes, or disconnects either to recharge the batteries or to save power, some safety actions can be performed to tolerate such a temporary disconnection. For instance, the pending transactions can be moved to the destination MSS, where an agent can execute them on behalf of the mobile application to allow the MH to obtain the results on reconnection, as proposed in [13]. Another approach allows MHs to prefetch into the cache the data they require while disconnected. Appropriate algorithms are designed to reestablish consistency among the existing copies of the data when the MH reconnects.

1.5.3 Transaction Correctness in Mobile Computing Systems

The ability to work in a mobile context must coexist with the possibility that even normal system conditions may lead to the violation of database correctness. As a consequence, the efforts to achieve fault tolerance have been directed at redesigning the notion of correctness rather than at redefining the notion of failure. A number of alternative definitions of the ACID properties have been proposed [15, 21, 30, 33] that weaken one or more of the properties. In general, their goal is to guarantee the MHs a certain degree of autonomy in transaction processing during disconnections, and to preserve the (modified) system correctness by allowing bounded inconsistencies among the data copies. In the following, we describe how each property has been redefined in some proposals in the scientific literature. Usually, only one property is considered at a time. The weakening of a given property, however, may impact the other ACID properties.

Atomicity property. The first step toward fault tolerance in mobile systems is allowing MHs to submit "pieces" of transactions in different cells according to
their movements. Several alternative methods are described in [21]. This approach weakens the classical formulation of atomicity and requires the ability to break a transaction up so that subtransactions can be concurrently executed and interleaved with subtransactions of other transactions while guaranteeing the other ACID properties. These mobile transaction models are based on extended transaction models developed for long-duration transactions, such as Open Nested Transactions (ONT) [16] and Saga Transactions [18]. For instance, in the ONT model, the abort of one or more subtransactions does not necessarily imply the abort of the entire transaction. Hence, when a transaction T commits, only some of its operations may have been actually executed.

The decomposition of a transaction into subtransactions can be performed according to different principles and at different levels of granularity. Different approaches to decomposition have been proposed for each of the three transaction models discussed in Section 1.5.1. In the case of transaction proxies and read-only transactions, transactions may be either submitted as a whole at a unique MSS [13, 44], or they may be split during processing (Kangaroo model [21]) following the movements of the MH that submitted the transaction. In the latter case, communication costs are reduced by relocating computations as near to the MH as possible.

In the case of weak transactions, a transaction T is decomposed into mutually independent subtransactions. This decomposition ensures that the subtransactions of a transaction T can be concurrently processed at the different MSSs to which they have been submitted, and that their execution order does not affect the successful commit of T. Independence may be guaranteed, for instance, according to Bernstein's conditions [9]. Each subtransaction $S_i$ has a write set $W_i$ and a read set $R_i$. Every two subtransactions $S_i$ and $S_j$ of a transaction T satisfy the following conditions:

$$W_i \cap W_j = \emptyset, \qquad R_i \cap W_j = \emptyset, \qquad W_i \cap R_j = \emptyset,$$

which guarantee their independence (Reporting and Co-transaction model [21]). This approach is based on the Split Transaction Model [34]. A run-time support must exist that computes the decomposition by determining an appropriate partition of the read and write sets of the transaction. Another approach is based on fragmenting a transaction T so that each of its subtransactions $S_i$ executes operations that are commutative with those of the other subtransactions of T [21].

Both in the Kangaroo model and in the Reporting and Co-transaction model, the handoff procedure must be extended to involve the transfer of information concerning pending transactions generated by the MH. These models imply a redefinition of the other ACID properties. Both isolation and durability are restricted to subtransactions instead of global transactions. In the case of weak transactions, the consistency property is also affected, and a mechanism is required to merge copies and reestablish consistency on reconnection.
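
The independence test based on Bernstein's conditions, used above for the decomposition of weak transactions, can be illustrated with a minimal Python sketch. The `Subtransaction` record and the function names are hypothetical and introduced only for illustration; the sketch assumes the read and write sets of each subtransaction are already known, and it does not attempt to compute the decomposition itself.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Subtransaction:
    """Hypothetical record for a subtransaction S_i with read set R_i and write set W_i."""
    name: str
    read_set: frozenset
    write_set: frozenset


def independent(a: Subtransaction, b: Subtransaction) -> bool:
    """Return True if a and b satisfy the three Bernstein conditions."""
    return (
        a.write_set.isdisjoint(b.write_set)      # W_i and W_j share no item
        and a.read_set.isdisjoint(b.write_set)   # R_i and W_j share no item
        and a.write_set.isdisjoint(b.read_set)   # W_i and R_j share no item
    )


def mutually_independent(subtransactions) -> bool:
    """Return True if every pair of subtransactions is independent."""
    return all(independent(a, b) for a, b in combinations(subtransactions, 2))


# Illustrative data items only: S1 and S2 both read x but write different items,
# whereas S3 reads the item y that S1 writes.
s1 = Subtransaction("S1", frozenset({"x"}), frozenset({"y"}))
s2 = Subtransaction("S2", frozenset({"x"}), frozenset({"z"}))
s3 = Subtransaction("S3", frozenset({"y"}), frozenset({"w"}))

print(mutually_independent([s1, s2]))      # S1 and S2 may run in either order
print(mutually_independent([s1, s2, s3]))  # S3 depends on the write of S1
```

In this example, S1 and S2 could be processed concurrently at different MSSs, while adding S3 breaks independence because its read set intersects the write set of S1.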

Consistency property. An alternative to weakening the atomicity property consists in redefining the consistency property. Under this approach, the database is considered partitioned into clusters, either according to semantics-based criteria (e.g., data objects related by integrity constraints belong to the same cluster) or according to location proximity (clustering model [21, 33]). Data in the same cluster must be strictly consistent, whereas a bounded degree of inconsistency is tolerated among clusters, according to some definition of consistency. Hence, clustering can, for instance, support multiversion databases or tolerate divergences between the secondary copy of the data maintained at a MH (which constitutes a cluster) and the primary copies on FHs. MHs are therefore allowed to process transactions while disconnected. According to this approach, two classes of primitives are used to access data:

1. weak-write and weak-read, which operate on data only in the local cluster, thus possibly causing inconsistencies with respect to other clusters;
2. strict-write and strict-read, which operate on data in the global database, thus maintaining consistency.

These primitives are executed so that operations that work on the same cluster do not conflict. Conflicts are prevented by locking mechanisms. In assigning locks, the usual lock compatibility modes are applied. Moreover, to guarantee that weak operations do not observe intermediate results produced by strict operations, strict-write locks and weak-read locks conflict. By contrast, strict-read and weak-write operations do not conflict. The implementation must only guarantee that a strict-read operation reads the value written by the last strict-write operation.

A consequence of the redefinition of the consistency property is a more complex notion of serializability. Strict transactions must serialize with respect to each other according to one-copy serializability [17]. Moreover, let the projection of a strict transaction T on a MH m be the subtransaction of T that operates on the data objects held at the MH m. Weak transactions processed at MH m must serialize, again according to one-copy serializability, with respect to each other and with respect to the projections on MH m of the strict transactions.

The degree of inconsistency can be defined, for instance, in terms of the maximum number of versions of the same data object that can exist at the same time, or the maximum number of weak operations that can be executed on a copy of a data item without being propagated to the other copies [32]. A timestamp can be associated with each datum so that locks on that datum are automatically released after the expiration of a timeout (time-based consistency model [21]). This way, MHs can operate disconnected for a limited interval of time.

Isolation property. Some transaction models have been devised for mobile environments in which the isolation property is not guaranteed, that is, intermediate results of a transaction T can be observed by other transactions. This is usually a side effect of the relaxation of other ACID properties.
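
The lock rules described for the weak and strict primitives under the consistency property can be sketched as a small conflict test. This is only an illustration under stated assumptions: the text gives two explicit rules (strict-write conflicts with weak-read; strict-read does not conflict with weak-write) and otherwise refers to "the usual lock compatibility modes", which the sketch assumes to mean that two locks conflict whenever at least one of them is a write. How weak-write and strict-write locks interact is therefore an assumption of the sketch, not a statement from the cited models, and all names are hypothetical.

```python
from itertools import product

# Lock modes named after the four primitives discussed in the text.
WEAK_READ, WEAK_WRITE, STRICT_READ, STRICT_WRITE = "WR", "WW", "SR", "SW"
MODES = (WEAK_READ, WEAK_WRITE, STRICT_READ, STRICT_WRITE)


def is_write(mode: str) -> bool:
    """Return True for the two write modes."""
    return mode in (WEAK_WRITE, STRICT_WRITE)


def conflicts(held: str, requested: str) -> bool:
    """Return True if a requested lock conflicts with a lock already held."""
    pair = frozenset((held, requested))
    # Stated in the text: strict-write and weak-read locks conflict.
    if pair == {STRICT_WRITE, WEAK_READ}:
        return True
    # Stated in the text: strict-read and weak-write do not conflict.
    if pair == {STRICT_READ, WEAK_WRITE}:
        return False
    # ASSUMPTION: the "usual" rule, i.e. a conflict arises when either lock is a write.
    return is_write(held) or is_write(requested)


# Compatibility table derived from the rules above: True means both locks can coexist.
COMPATIBLE = {(h, r): not conflicts(h, r) for h, r in product(MODES, repeat=2)}

for held in MODES:
    print(held, [COMPATIBLE[(held, requested)] for requested in MODES])
```

The only entry that departs from the classical read/write matrix is the strict-read and weak-write pair, which reflects the requirement that a strict read only needs to see the value written by the last strict write.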

We have observed in the previous paragraph that isolation can be enforced by properly modifying the conflict rules among locks, in the case of operations that work on data having different degrees of consistency. By contrast, if a transaction model such as those mentioned in the discussion on atomicity is adopted, the isolation property only holds for subtransactions. If a subtransaction $S_j$ of a transaction $T_j$ is processed at a MSS after a subtransaction $S_i$ of a transaction $T_i$ has been processed at the same MSS, then $S_j$ can observe the result of $S_i$, that is, an intermediate result of $T_i$. Helal et al. observe in [21] that the sharing of partial results among transactions, for example, by means of Reporting Transactions, may be desirable for some applications. On the other hand, enforcing the isolation property is probably expensive in that it may severely restrict concurrency among transactions.

Durability property. Durability of committed transactions is mainly affected by the possibility MHs have of autonomously operating on data. In some transaction models, MHs can execute transactions on locally stored data even while disconnected (Reporting and Co-transaction model, Clustering model [21]). If a MH m operating in disconnected mode fails before stabilizing the results of its committed transactions on the primary copies of the database, these results might never be recovered (for example, in the case of a media failure). To ensure fault tolerance, in the Coda file system [28, 30], the durability property is relaxed by providing two types of transactions and two degrees of commitment. First-class transactions are those executed by either connected MHs or users on FHs, and second-class transactions are those processed by disconnected MHs. A disconnected MH can commit a transaction locally only if this transaction does not conflict with other transactions executed on the same host while the host is disconnected. On reconnection, the transactions are globally committed, unless they conflict with already committed first-class transactions or second-class transactions executed on different MHs. Hence, first-class transactions have one level of commitment: a first-class transaction can commit if it is serializable with respect to all the transactions previously committed. By contrast, second-class transactions are subject to two levels of commitment:

local commitment: the transaction can commit if it is serializable with respect to all the previously committed second-class transactions executed on the same host;

global commitment: the transaction can commit if it is serializable with respect to all the committed transactions in the system.

Two levels of commitment have also been adopted for the weak transactions described in the consistency paragraph [33]. The global commitment makes it possible to detect
possible inconsistencies caused by weak transactions on reconnection. A locally committed transaction, however, can globally abort. The durability of locally committed transactions is not guaranteed until these transactions globally commit.

1.5.4 Recovery in Mobile Databases

Transaction recovery deals with the capability of ensuring failure atomicity [14]. It concerns the durability and the atomicity properties. Recovery mechanisms guarantee that these properties are satisfied in spite of failures. As in most proposals found in the current literature, we assume that the fixed network is reliable. We therefore focus on the problem of recovering MHs. As in the case of fault tolerance, we must understand which situations require recovery. Moreover, we investigate how recovery could be achieved given the limited computing and storage resources of the MHs.

According to [20], we may classify failures into three categories:

Transaction Failures: A transaction may abort because the MH was disconnected, as in the read-only transaction and weak-transaction approaches. On reconnection, the invalidation of its cache is communicated to the transactions, or a conflict is raised between the updates of the transaction and the updates of other, possibly already committed, transactions.

Site Failures: The MH crashes, but the content of its permanent storage is not lost.

Media Failures: Part or all of the secondary storage holding the database is lost.

Recovery in general makes use of a log file recording information on the operations executed by both committed and still uncommitted transactions, the last safe state, and whatever else is needed to rebuild a consistent database in case of failures. This information is used during recovery to undo the partially executed transactions (atomicity property) and to redo the committed transactions (durability property) based on the last safe state.

As we have seen in the previous section, atomicity in mobile environments is affected by failures when a MH m either (1) processes transactions while disconnected, or (2) submits subtransactions. In the latter case, a subtransaction S whose results have been observed by other subtransactions, possibly belonging to an already committed transaction, may have to be undone. In [21], this situation is dealt with by executing compensating transactions that semantically undo the effects of S (Saga model [21]). This solution, however, is not always viable because some operations are inherently non-compensatable.

If the MH m carries some data and autonomously operates on them, as in case (1), other MHs could concurrently update the same data. Hence, if MH m recovers
14 14 Fault Tolerance and Recovery in Mobile Computing Systems Ch 22 from a failure, the undo of pending transactions produces an obsolete version of the database For this reason, under the most widely adopted solution, the recovered MHs refetch the updated version of the data objects to report their database view to a consistent state The durability property is ensured only on xed stations The problem is to determine when the data maintained at MHs need to be recovered and the most appropriate recovery technique according to the adopted transaction management policy With respect to the classication presented in Section 151, the recovery mechanisms are as follows: Transaction Proxy: As MHs do not maintain any data object, none of the above failure modes needs a MH recovery Read-Only Transactions: The MH m cache may become out of date during disconnections, movements, or crashes Usually, data have a version number [28] or timestamp [21], so that invalid caches can be detected on reconnection When MH m reconnects, it is sent an invalidation message from the MSS (server callback) Such a message could result in query aborts It is up to the MH m to refetch the invalidated data and to reexecute the aborted queries A dierent approach is taken in data-broadcasting algorithms [10, 26], where MSSs broadcast either the whole database or the more frequently accessed data objects and MHs autonomously keep up to date We further discuss these algorithms in Section 16 Weak Transactions: If the disconnected transactions cannot globally commit, according to the denitions we gave in Section 153, a transaction failure occurs The problem here is how to make durable the locally committed (but globally aborted) transactions The recovery procedures that can be adopted range from the automated refetch of the updated data and reexecution of the globally aborted transaction, to the user notication, to the execution of an application-dependent algorithm [30] Such an algorithm has the purpose 
of understanding whether the globally aborted transaction can be dropped, or only a part of it can be reexecuted, or one of the two previous solutions can be adopted, according to the application semantics [30] The redo is performed according to the recorded log le After the redo is completed, a global commit is tried again Site failures do not aect either the redo of globally aborted transactions or the global commit By contrast, media failures that cause the loss of the log le are unrecoverable Transactions not yet globally committed are lost, and the failure has to be reported to the application In Section 16, we describe some of the algorithms that have been proposed and the recovery mechanisms they implement An important question is that in mobile systems, because of mobility, disconnections, and higher failure rates than in static environments, recovery procedures are likely to be executed more frequently

Hence, besides being lightweight in terms of the required MH resources, recovery procedures should support a fast transaction restart.

1.6 CLASSES OF SOLUTIONS

We discussed in the previous sections how fault-tolerance and recovery concepts are adapted to a mobile environment. In this section, we describe algorithms for managing replicated data in mobile environments. The aim of this description is to highlight the capability of a given approach to satisfy the previously mentioned fault-tolerance requirements.

1.6.1 The "Data-Broadcasting" Approach

The data-broadcasting approach is a special case of the read-only transactions approach in which MHs can only query data. The database resides on one or more FHs; an MSS holding a copy of the database periodically broadcasts this copy to the MHs in the cell (Figure 1.5). The broadcast database version corresponds to a checkpoint at a given time; updates are performed between two successive broadcasts. ACID properties are guaranteed on FHs, and mobile queries cannot cause inconsistencies.

Figure 1.5. Layout of the data-broadcasting approach.

This approach requires that an MH stay active by listening to all the incoming messages until it receives the desired data. Some improvements to this approach have been proposed that allow MHs to tune in only when data are transmitted, to save energy. This is achieved by broadcasting only the hot-spot data, that is, the most frequently accessed data, and by periodically broadcasting an index, or directory, of the database. Indexes are interleaved with blocks of data and allow MHs to determine which data will be sent next. The broadcast of frequently accessed data is based on the principle of data access skew, that is, on the hypothesis that data objects are not accessed with the same frequency (80:20 rule) [19]. Less accessed data can be retrieved on demand. Explicit requests for data are recorded by MSSs and used to adapt the hot-spot composition accordingly. The reader can find in [10, 26] details on how to interleave index information with data in order to optimize the amount of time an MH has to be connected before receiving the data.

By adopting the data-broadcasting approach, MHs cannot suffer transaction failures. Simple fault-tolerant mechanisms are required to ensure consistency of the data that MHs are likely to cache. The introduction of proper information about the modified data in the index results in an efficient policy to selectively update cached data while an MH is connected [26]. In the event of a disconnection or a crash, consistency is achieved by refetching the data copies in the MH cache. The periodic broadcast of index and data guarantees that an MH eventually receives the required information despite network packet loss and corruption.

The main advantages of this approach are simplicity and low complexity at the MHs. These advantages are, however, obtained at the price of low performance and high bandwidth consumption. The approach is suitable for applications that involve simple MHs, for example, dumb terminals, accessing public data on FHs.

1.6.2 The "Proxy" Approach

Under the transaction proxy approach, MHs can generate both queries and transactions. Data are maintained at FHs. Two methods may be distinguished for managing transactions: (1) MHs submit the transactions or subtransactions [13, 44]; (2) MHs submit the requests for read or write locks on the required data objects [27]. Under the former method, MHs do not maintain data. MHs submit transactions or subtransactions to the MSSs they visit while moving. Fixed stations are in charge of enforcing correctness properties on data by adopting proper fault-tolerant mechanisms. If an MH fails while submitting the query, the failure does not affect the database. By contrast, under the latter method, MHs can maintain local data. The locks on the data, however, are recorded and managed at the MSSs.

In [13], the database is assumed to be fully replicated at the MSSs and the MHs can submit transactions to their current MSS. The ISIS system provides the required fault tolerance within the group of MSSs. The ISIS ABCAST primitive is used to ensure a total order in the delivery of the multicast messages that transport transactions. MH status information is partially replicated over a cluster of MSSs centered around the current location of the MH. Both status and location data are considered and managed as database objects.
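
To see why totally ordered delivery matters for fully replicated MSSs, consider the following minimal sketch. It is not the ISIS ABCAST protocol: a single fixed sequencer stands in for it as a simpler, well-known way of obtaining a total order, and the names `Sequencer`, `Replica`, and the `(key, operation, argument)` transaction format are hypothetical. The point it illustrates is only that replicas applying the same transactions in the same order end up in the same state, even when the network delivers messages in different orders.

```python
from typing import Dict, Tuple

Txn = Tuple[str, str, int]  # (key, operation, argument); hypothetical format


class Sequencer:
    """Stand-in for a total-order primitive: stamps each transaction once."""

    def __init__(self) -> None:
        self._next_seq = 0

    def stamp(self, txn: Txn) -> Tuple[int, Txn]:
        seq = self._next_seq
        self._next_seq += 1
        return seq, txn


class Replica:
    """An MSS copy that applies transactions strictly in sequence order."""

    def __init__(self) -> None:
        self.state: Dict[str, int] = {}
        self._pending: Dict[int, Txn] = {}  # messages that arrived too early
        self._expected = 0                  # next sequence number to apply

    def receive(self, seq: int, txn: Txn) -> None:
        # Buffer the message, then apply every transaction that is now in order.
        self._pending[seq] = txn
        while self._expected in self._pending:
            self._apply(self._pending.pop(self._expected))
            self._expected += 1

    def _apply(self, txn: Txn) -> None:
        key, op, arg = txn
        if op == "set":
            self.state[key] = arg
        elif op == "add":
            self.state[key] = self.state.get(key, 0) + arg
        else:
            raise ValueError(f"unknown operation: {op}")


sequencer = Sequencer()
stamped = [
    sequencer.stamp(t)
    for t in [("x", "set", 5), ("x", "add", 3), ("x", "set", 2), ("x", "add", 4)]
]

replica_a, replica_b = Replica(), Replica()
for seq, txn in stamped:       # replica A happens to receive messages in order
    replica_a.receive(seq, txn)
for i in (3, 1, 0, 2):         # replica B receives them shuffled by the network
    replica_b.receive(*stamped[i])

print(replica_a.state, replica_b.state)
```

Because the operations do not commute, applying them in arrival order would leave the two copies different; buffering until the expected sequence number arrives is what keeps them identical.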

In [44], MHs can submit subtransactions and are free to distinguish between a global database and a local database, which is locally maintained and accessed. The global database is replicated over the group of MSSs. Data correctness is guaranteed among the copies of the global database. Suppose that a global transaction or subtransaction T_g is executed that precedes a local transaction T_l that satisfies some integrity constraints between global and local databases. If T_l causally precedes another global transaction T_g1, the group of MSS database servers must process T_g1 after T_g. To enforce this causality in the transaction processing order, global transactions or subtransactions carry a ticket, that is, a timestamp.

The method described in [27] is quite similar to the one in [44]. Unlike the latter, however, it assumes that MHs can request read and write locks to "handle" the data according to a revised version of optimistic two-phase locking. Before accessing a data item, an MH m must request the appropriate lock from the current MSS (MSS_1); read locks are immediately granted, whereas write locks are postponed until commit time.

Figure 1.6. Optimistic 2-phase-locking for mobile systems, panels (a) and (b).

Because of mobility, locks may be requested from different MSSs. Lock information could be transmitted together with the MH state during the handoff procedure. To improve performance, however, read locks and unlocks are maintained at the MSSs from which they were requested, up to the commit time; see Figure 1.6(a). At this point, before granting write locks, the current MSS checks for the existence of conflicting locks on the other MSSs. The procedure is shown in Figure 1.6(b), where MSS_c is the current MSS. It sends a status request to the other copies and waits for all the replies, reporting the locks and unlocks recorded at those sites. MSS_c tries to match each read lock with a corresponding unlock. If it succeeds and no other write lock already exists, it releases all the existing read locks on the other copies and, after receiving their acknowledgments, it grants the write lock to the MH. A copy sending the acknowledgment records at the same time that it is write-locked.

An MH failure or disconnection can leave some data locked. This problem can be handled by associating a timeout with each lock. On expiration of the timeout, the lock is unilaterally released. The timeout may be specified by the user. Once the timeout has expired, the MSS that drops the lock must inform the other copy holders that the transaction has aborted. The MH is notified on reconnection, to make its state consistent and to preserve the atomicity property. Serializability is guaranteed by associating version numbers with the data. Moreover, each copy must record a write-intent lock when it receives a status-request message from MSS_c.

The described methods are suitable to manage both public and shared data and can support both vertical and horizontal applications. The algorithm described in [44] also considers the possibility of combining private and global data that have some integrity relationship.

1.6.3 The "Disconnected" Approach

The disconnected approach was introduced with the Coda file system [28, 30] and has been adopted in other algorithms to manage either file systems [39, 40] or databases [32]. It uses the weak transaction approach and enforces correctness by relaxing either the isolation property [28, 30] or the consistency property [32, 39, 40], as we described in Section 1.5.3. The mentioned algorithms assume that the database is fully replicated on the MSSs. MSS copies are considered first-class replicas, whose consistency is always guaranteed. The MHs can host second-class replicas of the database or of a part of it. MHs and MSSs can execute transactions on the database according to an optimistic concurrency control strategy.

Transactions initiated by connected MHs are executed so that the usual definition of the ACID properties is satisfied. MHs work on the data they have cached up to commit time. In [32], these strict transactions are processed guaranteeing one-copy serializability. When a strict transaction commits, all the first- and second-class copies are updated accordingly, thus automatically guaranteeing cache consistency. In Coda, only fixed hosts apply the changes to their copies; hence, data maintained on MHs can become inconsistent. Cache coherence is enforced among connected MHs with a protocol based on callback primitives. The MH whose cache has been invalidated or that experiences a cache miss during transaction processing can obtain the updated data on demand from its MSS.

Disconnected MHs can rely only on the contents of their caches. The problem of cache management in mobile environments has been widely discussed in [42, 43]. To this purpose, Coda includes the special-purpose module Venus. The Venus module is located at the MH and operates to maintain in cache the most recently used data. While preparing for a disconnection (hoarding phase), it can also use the user's hints to fetch into the cache the data that are likely to be needed once disconnected; see Figure 1.7(a). Data are tagged with version numbers that are also stored in the cache.

Figure 1.7. Weak transactions in the Coda file system: (a) hoarding, (b) disconnected operation, (c) reconnection.

A disconnected (weak) transaction T is processed so that it locally serializes with the other weak transactions executed on the same host (Section 1.5.3). In the approach proposed in [32], weak transactions can also be executed by connected MHs. Therefore, communication on the wireless network is reduced, but there is the additional requirement that weak transactions cannot observe partial results of strict transactions. Operations executed on the database by disconnected transactions are recorded in a log file together with the data version numbers; see Figure 1.7(b). The log file is stored in permanent storage together with the cached data. This allows the MHs to survive long disconnections in spite of the reduced size of the volatile storage. The effects of weak transactions are not permanent until they globally commit.

The global commitment is executed when MH m reconnects; see Figure 1.7(c). The log file is transferred to the current MSS_c, which checks for conflicts by comparing the version numbers recorded in the log with those currently associated with its copy of the data. MSS_c detects conflicts by building and analyzing a precedence graph among the weak transactions and the previously committed strict transactions. If no conflicts are detected, MSS_c locks all the data on which disconnected transactions operated, and redoes these transactions according to the trace in the log. A commit message is sent to MH m and the results are propagated to all the connected hosts holding a database copy. Locks on data are then released. Otherwise, an abort message is sent to MH m; cascading abort only affects pending disconnected transactions executed on MH m. In Coda, a reply file is sent together with the abort message, containing the results of the MSS_c attempt and the current state of the database at MSS_c.

If a transaction aborts during the global commitment phase, three solutions are proposed in [28, 30]: (1) the abort is notified to the application; (2) MH m refetches the updated data and reexecutes the aborted transaction; (3) application-dependent algorithms are executed. These algorithms examine the reply file to decide whether one of solutions (1) and (2) is appropriate or whether the transaction can be only partially redone.

Transaction failures can also be experienced by MHs on local commitment. A disconnected transaction aborts when a cache miss occurs or the cache overflows as a consequence of either the increase in the log size or the creation of new data. In these cases, the MH has to suspend transaction processing waiting for reconnection.
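
The global commitment performed at reconnection can be sketched as follows. This is a simplified illustration under stated assumptions, not the implementation used by Coda or by the algorithm in [32]: the names `WeakTxn`, `ServerCopy`, and `global_commit` are hypothetical, conflict detection is reduced to a per-object comparison of version numbers (the precedence-graph analysis is omitted), the whole log is aborted as soon as any entry conflicts, and locking, the reply file, and propagation to other hosts are left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class WeakTxn:
    """One log entry written by a disconnected MH (hypothetical format)."""
    name: str
    base_versions: Dict[str, int]  # object id -> version held in the MH cache
    writes: Dict[str, int]         # object id -> value written locally


@dataclass
class ServerCopy:
    """Stand-in for the MSS copy: a value and a version number per object."""
    values: Dict[str, int] = field(default_factory=dict)
    versions: Dict[str, int] = field(default_factory=dict)


def global_commit(server: ServerCopy, log: List[WeakTxn]) -> Tuple[str, List[str]]:
    """Validate the transferred log, then redo it or abort it as a whole."""
    # A weak transaction conflicts if any object it used has since changed
    # version at the server, i.e. someone else committed an update to it.
    conflicting = [
        txn.name
        for txn in log
        if any(server.versions.get(obj, 0) != seen
               for obj, seen in txn.base_versions.items())
    ]
    if conflicting:
        return "abort", conflicting

    # No conflict: redo the logged writes in log order and bump the versions.
    for txn in log:
        for obj, value in txn.writes.items():
            server.values[obj] = value
            server.versions[obj] = server.versions.get(obj, 0) + 1
    return "commit", []


server = ServerCopy(values={"x": 1, "y": 2}, versions={"x": 1, "y": 1})

# MH m hoarded x and y at version 1 and ran two weak transactions offline.
log_m = [
    WeakTxn("T1", {"x": 1}, {"x": 10}),
    WeakTxn("T2", {"x": 1, "y": 1}, {"y": 20}),
]
first = global_commit(server, log_m)

# Another MH reconnects later with a log based on the now stale version of x.
log_n = [WeakTxn("T3", {"x": 1}, {"x": 99})]
second = global_commit(server, log_n)

print(first, second, server.values)
```

In this sketch the second host would then fall back on one of the three options listed above, for instance refetching `x` at its new version and reexecuting `T3`.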

Since these algorithms are able to tolerate long disconnections of the MHs without either blocking the system or jeopardizing the correctness of the primary copies of the database, the same mechanisms allow them to tolerate both MH site and media failures. From the point of view of MH recovery, site failures are not harmful. The transport protocol ensures that the log transfer is performed as an atomic action. Hence, the MH fails either before the transmission is successfully completed or after it. In the former case, the global commitment of pending transactions is executed when the MH recovers from the crash. In the latter case, the MSS autonomously proceeds in its processing, and the message containing the outcome of the global commitment is delivered to the MH when it recovers. On the contrary, media failures causing the loss of the log and the cached data cannot be recovered, and the failure has to be notified to the application.

These algorithms can be executed by MHs having nonnegligible computing and storage capabilities, such as walkstations. They can be used to manage both public and shared data, and to support both horizontal and vertical applications. In the last case, however, the database cannot always be replicated at all the MSSs, and an MH m must reconnect in a cell in which access to the database is supported to stabilize its weak transactions. Finally, since processing is executed locally, both long and interactive transactions are supported for MHs.

1.6.4 The "Distributed" Approach

Like the previous one, this approach adopts the weak-transaction model. It differs from the disconnected approach in that it supports only strict transactions that require cooperation among the hosts holding a copy of the database. The correctness notion used in this approach is based on the usual definition of ACID properties. In the literature, only [7] describes an algorithm that follows this approach; hence, we refer to it in the following description.

Copies of the database are maintained at both FHs and MHs. Copies are classified as core (primary) ones and cached (secondary) ones, according to their consistency degree. This classification, however, is independent of the type of the host on which a copy resides. It rather depends on whether the host actively participates in the data management or not. Cached copies are maintained at the hosts running applications that can tolerate inconsistent data. Only queries can be executed on cached copies. On the contrary, hosts requiring consistent data or wishing to generate transactions must belong to the group of core sites. Core nodes periodically generate multicast messages containing the current version of the data to bring the cached copies up to date. The delivery of these messages must serialize with the queries processed at the cache sites.

Transactions generated by core sites are processed according to existing algorithms used in wired networks; in particular, in [7], the following approaches to transaction processing are considered:
