Information Sciences


Information Sciences 179 (2009) 3659-3672

Necessary and sufficient conditions for transaction-consistent global checkpoints in a distributed database system

Jiang Wu a, D. Manivannan a,*, Bhavani Thuraisingham b

a Department of Computer Science, University of Kentucky, Lexington, KY 40506, United States
b Department of Computer Science, University of Texas at Dallas, United States

Article history: Received November 2007; received in revised form June 2009; accepted June 2009

Keywords: Checkpointing; Recovery; Distributed databases

Abstract

Checkpointing and rollback recovery are well-known techniques for handling failures in distributed systems. The issues related to the design and implementation of efficient checkpointing and recovery techniques for distributed systems have been thoroughly understood. For example, the necessary and sufficient conditions for a set of checkpoints to be part of a consistent global checkpoint have been established for distributed computations. In this paper, we address the analogous question for distributed database systems. In distributed database systems, transaction-consistent global checkpoints are useful not only for recovery from failure but also for audit purposes. If each data item of a distributed database is checkpointed independently by a separate transaction, none of the checkpoints taken may be part of any transaction-consistent global checkpoint. However, allowing individual data items to be checkpointed independently results in non-intrusive checkpointing. In this paper, we establish the necessary and sufficient conditions for the checkpoints of a set of data items to be part of a transaction-consistent global checkpoint of the distributed database. Such conditions can also help in the design and implementation of non-intrusive checkpointing algorithms for distributed database systems.

© 2009 Elsevier Inc. All rights reserved.

1. Introduction

It is common practice to take a checkpoint of a database from time to time and to restore the database to the most recent checkpoint when a failure occurs. It is desirable that a global checkpoint of a database record a state of the database which reflects the effect of a set of completed transactions and not the results of any partially executed transactions. Such a checkpoint of the database is called a transaction-consistent global checkpoint [3].

A straightforward way to take a transaction-consistent global checkpoint of a distributed database is to block all newly submitted transactions, wait until all the currently executing transactions finish, and then take the checkpoint. Such a checkpoint is guaranteed to be transaction-consistent, but this approach is not practical, since blocking newly submitted transactions increases transaction response time, which may not be acceptable to the users of the database. Another approach would be to run a read-only transaction which reads the entire database and saves it to stable storage; the underlying concurrency control algorithm will ensure that the saved state is transaction-consistent. This would be inefficient, especially in the presence of long-lived transactions. A more efficient way would be to save (checkpoint) the state of each data item independently and periodically without blocking any transaction. However, if each data item is checkpointed independently and periodically, some checkpoints of some data items may not be part of any transaction-consistent global checkpoint of the database and hence are useless.

* Corresponding author. Tel.: + 89 7 93; fax: + 89 33 37. E-mail addresses: wu6@cs.uky.edu (J. Wu), man@cs.uky.edu, manvann@cs.uky.edu (D. Manivannan), bhavan.thurasngham@utdallas.edu (B. Thuraisingham). doi:.6/.ns.9.6.6

In this paper, we address this issue and establish the necessary and sufficient conditions for a checkpoint of a data item (or the checkpoints of a set of data items) to be part of a transaction-consistent global checkpoint of the database. This result is useful for constructing a transaction-consistent global checkpoint incrementally from the checkpoints of the individual data items. By applying this condition, we can start from a useful checkpoint of any data item and then incrementally add checkpoints of other data items until we get a transaction-consistent checkpoint of the database.

1.1. Motivation and objectives

In a distributed system, to minimize the computation lost due to failures, the states of the processes involved in a distributed computation are periodically saved (checkpointed). When one or more processes involved in the distributed computation fail, the processes are restarted from a previously saved consistent global checkpoint. When processes are checkpointed independently, the checkpoints taken may not be part of any consistent global checkpoint and hence are useless []. Netzer and Xu [] introduced the notion of zigzag paths between checkpoints of processes involved in a distributed computation and established the necessary and sufficient conditions for a given checkpoint of a process to be part of a consistent global checkpoint (i.e., useful). They proved that a checkpoint of a process is useful if and only if there is no zigzag path from that checkpoint to itself. Several checkpointing algorithms have been proposed for distributed systems [,8,9,8,9,3,7].

Checkpointing is also an established technique for handling failures in database systems. Many of the checkpointing schemes proposed in the literature for distributed database systems are intrusive to different extents. Some of these are discussed in Section 3. Non-intrusive checkpointing algorithms, under which transactions do not have to be blocked when checkpoints are taken, are desirable [3]. If each data item in a distributed database is checkpointed periodically by an independent transaction, it is quite possible that none of the checkpoints taken is part of any transaction-consistent global checkpoint of the database. Motivated by the work of Netzer and Xu for distributed computations [], in this paper we establish the necessary and sufficient conditions for a given checkpoint of a data item (or checkpoints of a set of data items) to be part of a transaction-consistent global checkpoint.

1.2. Organization of the paper

The remainder of this paper is organized as follows. In Section 2 we introduce the background required for understanding the paper. Section 3 discusses related work. In Section 4 we present the necessary and sufficient conditions for a set of checkpoints of a set of data items to be part of a transaction-consistent global checkpoint and prove their correctness; we also discuss the applications of our work. Section 5 concludes the paper.

2. Background

2.1. System model

We consider a model of a distributed database system similar to the model in [3]. In this model, a distributed database system consists of a set of data items residing at various sites. Sites can exchange information via messages transmitted on a communication network, which is assumed to be reliable. The data items of the database are accessed by transactions, and the transactions are controlled by transaction managers (TMs) that reside at these sites. The TM is responsible for the proper scheduling of transactions using appropriate concurrency control algorithms in such a way that the integrity of the database is maintained. In addition, the data items at each site are controlled by a data manager (DM). Each DM is responsible for controlling access to data items at its site.

Each data item is checkpointed periodically by a local transaction. Before a transaction takes a checkpoint of a data item, it obtains an exclusive lock on the data item so that no other transaction can access that data item while it is being checkpointed. The state of a data item changes when a transaction accesses that data item for a write operation.
In order to guarantee the integrity and efficiency of transaction processing, the following four properties, referred to as ACID [7], must be maintained.

Atomicity: Each transaction is executed in its entirety, or not executed at all.

Consistency preservation: Execution of a transaction in isolation (that is, with no other transaction executing concurrently) preserves the consistency of the database.

Isolation: Even though multiple transactions may execute concurrently, the system guarantees that for every pair of transactions $T_i$ and $T_j$, it appears to $T_i$ that either $T_j$ finished execution before $T_i$ started, or $T_j$ started execution after $T_i$ finished. Thus, each transaction is unaware of other transactions executing concurrently in the system.

Durability: After a transaction completes successfully, the changes it has made to the database persist, even if there are system failures.

In order to maintain the ACID requirements and achieve maximum performance, a proper schedule of transactions needs to be arranged in which the operations of various transactions are interleaved as much as possible. Given a schedule, a directed graph, referred to as a precedence graph [8] or serialization graph [7], can be constructed to illustrate the execution of all the transactions running in the database system. The serialization graph serves as an important tool for analyzing transaction processing in distributed database systems.

Each checkpoint of a data item is assigned a unique sequence number. We assume that the database consists of a set of $n$ data items $X = \{x_i \mid 1 \le i \le n\}$. In addition, we denote by $C_i^{k_i}$ the checkpoint of $x_i$ with sequence number $k_i$. The set of all checkpoints of data item $x_i$ is denoted by $C_i = \{C_i^{k_i} \mid k_i \ge 0\}$. The initial state of data item $x_i$ is represented by checkpoint $C_i^{0}$, and a virtual checkpoint $C_i^{virtual}$ represents the last state obtained after termination of all transactions accessing data item $x_i$. To minimize checkpointing overhead, a data item is checkpointed only after the state of the data item changes. That is, after a data item is checkpointed, it is not checkpointed again until at least one other transaction has accessed and changed the data item.

Let $T = \{T_i \mid 1 \le i \le m\}$ be a set of transactions that access the database system. To keep the analysis of the relationship between checkpoints of various data items simple, we assume that each checkpoint of a data item $x_i$ is taken by a special transaction called a checkpointing transaction. We denote by $T_{C_i^{k_i}}$ the checkpointing transaction that takes checkpoint $C_i^{k_i}$ of data item $x_i$. In order to maintain atomicity of transactions, $T_{C_i^{k_i}}$ is a local transaction that must be scheduled to access the data item only when no other transaction is accessing it; this is enforced by issuing an exclusive lock. The set of checkpointing transactions that produce the checkpoints $C_i$ is denoted by $T_{C_i}$, and the set of all checkpointing transactions in the system is denoted by $T_C$.

A global checkpoint of the database is a set $S = \{C_i^{k_i} \mid 1 \le i \le n\}$ of local checkpoints consisting of one checkpoint for each data item. The set of checkpointing transactions that produce the global checkpoint $S$ is denoted by $S_T = \{T_{C_i^{k_i}} \mid 1 \le i \le n\}$. We use $C_i^{k_i}$ and $T_{C_i^{k_i}}$ interchangeably: sometimes, when we say a checkpoint of a data item, we mean the checkpointing transaction which takes that checkpoint.
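The checkpointing rules of the system model (exclusive access while checkpointing, unique sequence numbers, and no new checkpoint until the state has changed) can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the class `DataItem`, its methods, and the in-memory list standing in for stable storage are hypothetical names introduced here, not part of the paper.

```python
import threading


class DataItem:
    """Hypothetical data item, checkpointed only after its state changes."""

    def __init__(self, value):
        self._lock = threading.Lock()   # stands in for the exclusive lock
        self.value = value
        self.checkpoints = [value]      # index = sequence number; 0 is the initial state
        self._changed = False

    def write(self, value):
        """A regular transaction's write operation changes the state."""
        with self._lock:
            self.value = value
            self._changed = True

    def checkpoint(self):
        """Checkpointing operation: return the new sequence number,
        or None if the state has not changed since the last checkpoint."""
        with self._lock:
            if not self._changed:
                return None
            self.checkpoints.append(self.value)
            self._changed = False
            return len(self.checkpoints) - 1


item = DataItem(10)
first = item.checkpoint()    # nothing has changed yet, so no checkpoint is taken
item.write(11)
second = item.checkpoint()   # state changed, so checkpoint 1 is taken
```

A real system would write checkpoints to stable storage and obtain the lock through the data manager; the list and the `threading.Lock` are simplifications.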
Each regular transaction is a partially ordered set of read and/or write operations (operations are partially ordered because two adjacent read operations in a transaction are not comparable). A checkpointing transaction consists of only one operation (namely the checkpointing operation), an operation that requires mutually exclusive access to the data item. Let $R_i(x_j)$ (respectively, $W_i(x_j)$) denote the read (write) operation of $T_i$ on data item $x_j \in X$ and $O_{C_j^{k_j}}(x_j)$ denote the checkpointing operation of $T_{C_j^{k_j}}$ on data item $x_j$. A schedule $e$ over $T \cup T_C$ is a family of disjoint sets of partially ordered operations of transactions in $T \cup T_C$ on the data items (one set for each data item) [3]. Let $e(x_i)$ consist of all read, write and checkpointing operations on $x_i$ of transactions in $T \cup T_C$. We denote by $<_{x_i}$ the partial order induced by all read, write, and checkpointing operations on $x_i$ by the schedule $e$ over $T \cup T_C$.

Given a schedule $e$ over $T \cup T_C$, we define the relation $<_T$ between transactions in $T \cup T_C$ with respect to the schedule $e$ as follows:

(1) $T_i <_T T_j \iff (i \ne j) \wedge (\exists x_k \in X : (R_i(x_k) <_{x_k} W_j(x_k)) \vee (W_i(x_k) <_{x_k} W_j(x_k)) \vee (W_i(x_k) <_{x_k} R_j(x_k)))$.

(2) $T_i <_T T_{C_j^{k_j}} \iff (W_i(x_j) <_{x_j} O_{C_j^{k_j}}(x_j)) \vee (R_i(x_j) <_{x_j} O_{C_j^{k_j}}(x_j))$.

(3) $T_{C_j^{k_j}} <_T T_i \iff (O_{C_j^{k_j}}(x_j) <_{x_j} W_i(x_j)) \vee (O_{C_j^{k_j}}(x_j) <_{x_j} R_i(x_j))$.

A schedule is serial if the operations belonging to each transaction appear together in the schedule [7]. A schedule $e$ is serializable if it has an effect equivalent to that of a schedule produced when the transactions are run serially in some order. The concurrency control algorithm ensures that a schedule of transactions running in the distributed database system is serializable. One important kind of serializability, called conflict serializability (CSR) [7,], is considered in this paper. An execution $e \in CSR$ iff the relation $<_T$ is acyclic []. A serialization order of a set of transactions with respect to a schedule $e$ over $T$ is defined as a linear ordering of all the transactions such that if $T_i <_T T_j$ (either $T_i$ or $T_j$ could be a checkpointing transaction), then $T_i$ must appear before $T_j$ in the ordering. If $e \in CSR$, there must exist a serialization order for $e$ over $T$ that is compatible with $<_T$.

The formal definition of a transaction-consistent global checkpoint follows [3]:

Definition 1. A global checkpoint of a distributed database is said to be transaction-consistent (tr-consistent or simply consistent, for short) with respect to the execution of a set of transactions $T$ if there exists a serialization order $\sigma_1 \sigma_2$ (which is an ordering of the transactions in $T$) for an execution $e \in CSR$ of $T$ such that the data item states represented by the global checkpoint are the same as those read by a read-only transaction $T_P$ after all transactions in $\sigma_1$ have finished execution and before any transaction in $\sigma_2$ has started execution.

If the concurrency control algorithm guarantees an execution $e \in CSR$, then the relation $<_T$ induces a directed acyclic graph (DAG) on $T \cup T_C$, and conversely []. We call this graph the global serialization graph with respect to the schedule $e$ of $T \cup T_C$. For each data item, the transactions accessing that data item induce a component of the global serialization graph. The local serialization graph induced by the transactions in $T \cup T_C$ accessing data item $x_i$ is denoted by $G_{x_i}(V_{x_i}, E_{x_i})$: the vertex set is $V_{x_i} = \{T_j \cup T_{C_i^{k}} \mid T_j \in T$ has accessed data item $x_i$; $C_i^{k}$ is the $k$th checkpoint of $x_i$ taken by local checkpointing transaction $T_{C_i^{k}}\}$ and the edge set is $E_{x_i} = \{E^{TT}_{x_i} \cup E^{TC}_{x_i} \cup E^{CT}_{x_i}\}$, where

1. $E^{TT}_{x_i} = \{(T_j, T_l) \mid T_j, T_l \in V_{x_i},\ T_j <_T T_l\}$.

2. $E^{TC}_{x_i} = \{(T_j, T_{C_i^{k}}) \mid T_j, T_{C_i^{k}} \in V_{x_i},\ T_j <_T T_{C_i^{k}}\}$.

3. $E^{CT}_{x_i} = \{(T_{C_i^{k}}, T_j) \mid T_j, T_{C_i^{k}} \in V_{x_i},\ T_{C_i^{k}} <_T T_j\}$.
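The definitions above translate fairly directly into code. The following Python sketch builds the edges of the serialization graph from per-data-item operation sequences and then looks for a serialization order with a topological sort. It rests on assumptions that are not in the paper: each data item's operations are given as a totally ordered list (the paper allows a partial order), operations are encoded as `"R"`, `"W"` and `"C"` (checkpointing), and the function names, transaction labels and example schedule are hypothetical.

```python
from collections import defaultdict, deque


def serialization_graph(schedule):
    """schedule maps a data item to its ordered list of (transaction, op) pairs.
    Two operations of different transactions conflict unless both are reads;
    a checkpointing operation "C" conflicts with reads and writes alike."""
    nodes, edges = set(), defaultdict(set)
    for ops in schedule.values():
        for pos, (first, op_first) in enumerate(ops):
            nodes.add(first)
            for second, op_second in ops[pos + 1:]:
                if first != second and (op_first != "R" or op_second != "R"):
                    edges[first].add(second)   # first <_T second
    return nodes, edges


def serialization_order(nodes, edges):
    """Return one order compatible with <_T, or None if the graph has a cycle
    (that is, the schedule is not conflict serializable)."""
    indegree = {node: 0 for node in nodes}
    for src in edges:
        for dst in edges[src]:
            indegree[dst] += 1
    ready = deque(sorted(node for node in nodes if indegree[node] == 0))
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for dst in sorted(edges.get(node, ())):
            indegree[dst] -= 1
            if indegree[dst] == 0:
                ready.append(dst)
    return order if len(order) == len(nodes) else None


# Hypothetical schedule: C1 and C2 are checkpointing transactions of x1 and x2.
schedule = {
    "x1": [("C1", "C"), ("T1", "W"), ("T2", "R")],
    "x2": [("T1", "W"), ("C2", "C"), ("T2", "W")],
}
nodes, edges = serialization_graph(schedule)
order = serialization_order(nodes, edges)

cyclic = {"x1": [("T1", "W"), ("T2", "W")], "x2": [("T2", "W"), ("T1", "W")]}
no_order = serialization_order(*serialization_graph(cyclic))
```

The order returned is only one of possibly several valid serialization orders; ties are broken alphabetically here purely to keep the output deterministic.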

By merging the local serialization graphs $G_{x_i}(V_{x_i}, E_{x_i})$, we can construct the global serialization graph $G(V, E)$, where $V = \bigcup_{x_i \in X} V_{x_i}$ and $E = \bigcup_{x_i \in X} E_{x_i}$.

Next we illustrate the construction of the global serialization graph with an example. Suppose we have the following nine transactions $T = \{T_1, \ldots, T_9\}$ accessing a database containing five data items $X = \{x_1, \ldots, x_5\}$.

1. T1: R1(x ), W1(x ).
2. T2: W2(x ), W2(x ).
3. T3: R3(x ), W3(x ), W3(x ), W3(x ).
4. T4: W4(x3), W4(x ), W4(x ), R4(x ).
5. T5: R5(x3), R5(x ).
6. T6: W6(x3), W6(x ).
7. T7: W7(x ), R7(x ).
8. T8: R8(x ), W8(x ).
9. T9: W9(x ), W9(x ).

Consider the schedule $e \in CSR$ over $(T \cup T_C)$ where

e = {O R3(x ); O (x ); W (x3); W (x ); O (x ); O (x ); W3(x ); O (x ); W (x ); W9(x ); O (x ); W7(x ); R (x ); W (x ); R8(x ); R7(x ); W8(x ); R (x )}: (x ); O3(x ); W (x ); O (x3); O (x ); O (x ); W (x ); W9(x ); (x ); W3(x ); R (x3); R (x ); W6(x3); W6(x ); W3(x ); O

This schedule induces the following partial order on the operations performed by the transactions on each data item:

1. e(x1): O 2. e(x2): O 3. e(x3): O 3 4. e(x4): O 5. e(x5): O (x ) <x W9(x ) <x R3(x ) <x W3(x ) <x O (x ) <x W (x ) <x O (x3) <x3 W (x3) <x3 R (x3) <x3 W6(x3). (x ) <x W (x ) <x O (x ) <x W9(x ) <x O (x ) <x W3(x ) <x O (x ) <x W3(x ) <x W (x ) <x O (x ) <x R (x ) <x W (x ) <x R (x ). (x ) <x W (x ). (x ) <x W7(x ) <x R8(x ) <x R7(x ) <x W8(x ). (x ) <x R (x ) <x W6(x ).

This schedule induces the following relations among the transactions. These relations in fact correspond to edges in the global serialization graph constructed from the local serialization graphs.

T T <T T9; T9 <T T3; T3 <T T <T T3; T3 <T T T <T T6 <T T ; T <T T T <T T ; T <T T6 <T T ; T <T T7; T7 <T T8 3 <T T9; T9 <T T <T T3; T3 <T T <T T ; T <T T ; <T T ; T <T T ; <T T ; <T T ; T <T T :

The local serialization graphs induced by the partial orders $e(x_i)$ on the data items are shown in Figs. 1-3. The global serialization graph constructed from the local graphs is shown in Fig. 4.

Fig. 1. Example of local serialization graph on $x_1$, on which a set of transactions {T9; T3; T } have finished execution.

Fig. 2. Example of local serialization graph on $x_2$, which has been accessed by the set of transactions {T ; T3; T7; T8}.

Fig. 3. Local serialization graphs for the rest of the data items: $x_3$, $x_4$, and $x_5$.

Fig. 4. Global serialization graph constructed from local serialization graphs on $x_1$, $x_2$, $x_3$, $x_4$, and $x_5$.

The global serialization graph $G = (V, E)$ is obtained by merging the local serialization graphs, where $V$ consists of $T_1, T_2, T_3, T_4, T_5, T_6, T_7, T_8, T_9$ together with the checkpointing transactions of the five data items, and

E = {(T ; T9); (T (T7; T8); (T ; T ); (T 3 ); (T ; T ); (T ; T3); (T3; T ); (T ); (T ; T9); (T9; T3); (T3; T ); (T ; T ); (T ); (T ; T9); (T9; T ); (T ); (T ); (T ; T3); (T3; T ; T ); (T ; T ); (T ; T6)}: ); (T ; T7);

The graph in Fig. 4 is acyclic and hence the schedule is CSR. Since $e \in CSR$, we have the following serialization order that is compatible with $<_T$. This ordering may not be unique, because some transactions in it can be reordered without violating $<_T$. For example, T and T9 can be interchanged in the order.

T 3; T ; T9; T3; T ; T ; T ; T ; T ; T7; T8; T6:

We use the following notation throughout the paper:

$T_i \to^{+} T_j$ iff there is a path from transaction $T_i$ to $T_j$ in the serialization graph ($T_i$ and/or $T_j$ could be a checkpointing transaction).

$T_i \to T_j$ iff there is an edge from $T_i$ to $T_j$ ($T_i$ or $T_j$ could be a checkpointing transaction).

Let $\sigma_1 \subseteq T$ and $\sigma_2 \subseteq T$ be such that $\sigma_1 \cap \sigma_2 = \emptyset$. Then, by $\sigma_1 S_T \sigma_2$ with respect to the serialization order induced by a conflict-serializable execution $e$ over $T$, we mean that each checkpointing transaction in $S_T$ starts executing only after every transaction in $\sigma_1$ has been executed and before any transaction in $\sigma_2$ has started execution. In particular, if $\sigma_1 \cup \sigma_2 = T$, then the set of checkpoints $S$ taken by $S_T$ is tr-consistent iff $\sigma_1 S_T \sigma_2$.

Next, we make the following observations:

Observation 1. For any checkpointing transaction $T_{C_i^{k_i}}$, since it accesses the data item $x_i$ exclusively, $T_{C_i^{k_i}}$ must have a path in the local serialization graph either to or from any transaction $T_j$ that has accessed $x_i$.

Observation 2. For any checkpointing transaction $T_{C_i^{k_i}}$, since it accesses the data item $x_i$ exclusively, if there exist two transactions $T_j$ and $T_l$ that access $x_i$ with $T_j \to^{+} T_{C_i^{k_i}}$ and $T_{C_i^{k_i}} \to^{+} T_l$, then in the local serialization graph induced by the operations in $T \cup T_C$ on the data item $x_i$, any path from $T_j$ to $T_l$ must pass through $T_{C_i^{k_i}}$.

Observation 3. In the local serialization graph induced by the operations in $T \cup T_C$ on the data item $x_i$, for any checkpointing transaction $T_{C_i^{k_i}}$ and two other transactions $T_j$ and $T_l$ that have accessed $x_i$, the following holds:

1. If $T_{C_i^{k_i}} \to^{+} T_j$ and there exists a path $T_l \to^{+} T_j$ without any checkpoint along the path in the local serialization graph, then $T_{C_i^{k_i}} \to^{+} T_l$.

2. Similarly, if $T_j \to^{+} T_{C_i^{k_i}}$ and there exists a path $T_j \to^{+} T_l$ from $T_j$ to $T_l$ without any checkpoint along the path in the local serialization graph, then $T_l \to^{+} T_{C_i^{k_i}}$.

Observations 1 and 2 are trivial. Observation 3 holds for the following reason. In case 1, suppose $T_{C_i^{k_i}} \to^{+} T_l$ is not true; then $T_l \to^{+} T_{C_i^{k_i}}$ by Observation 1. Since $T_{C_i^{k_i}} \to^{+} T_j$, by Observation 2 every path in the local serialization graph from $T_l$ to $T_j$ must pass through $T_{C_i^{k_i}}$, which contradicts our assumption that there exists a path $T_l \to^{+} T_j$ without any checkpointing transaction along it. A similar argument proves case 2. We make use of these observations in the proofs of the two important theorems in Section 4. Notice that the transactions $T_j$, $T_l$ in the previous observations could be checkpointing transactions themselves.

3. Related work

The checkpointing algorithms for distributed database systems can be classified as log-oriented and dump-oriented [6]. In the log-oriented approach, a dump of the database is taken periodically and a marker is saved at appropriate places in the log. When a failure occurs, the latest dump is restored and the operations recorded in the log after the dump was taken are applied to the dump until the marker is reached, restoring the database to a consistent state. In this approach, proper positioning of the marker in the log results in restoring the database to a tr-consistent global checkpoint. Algorithms belonging to this class include [6] and [].

In the dump-oriented approach, checkpointing refers to the process of saving the state of all data items in the database (or taking a dump of the database) in such a way that it represents a tr-consistent checkpoint of the database. The algorithms proposed in [9,,] take this approach. The basic idea behind the algorithm in [9] is to divide the transactions into two groups: those before and those after the checkpointing process. This algorithm is non-intrusive but requires a temporarily stored copy of the database. This temporary copy is accessed by transactions for which it cannot be decided, while the checkpointing is in progress, which group they belong to. Pu [] uses coloring (white and black) to distinguish data items that have started checkpointing from data items that have not. Transactions accessing both white and black data items have to be aborted or delayed in order to maintain consistency, which increases transaction response time. Pilarski et al. [] consider checkpoints as checkpoint transactions, one for each data item. In addition, a checkpoint number (CPN) is associated with each checkpoint. By comparing the CPNs, forced checkpoints on data items are taken in order to make every checkpoint useful. The previous two algorithms are coordinated algorithms, in which one process initiates and coordinates the checkpointing activity.

The algorithm proposed by Baldoni et al. [] uses a non-coordinated approach, in which no process initiates checkpointing and each data item is checkpointed independently. Like the algorithm of [], checkpoint numbers are used to synchronize the checkpointing process and forced checkpoints are taken to prevent useless checkpoints. This algorithm is fully distributed but may incur a large number of forced checkpoints, depending on the execution pattern of the transactions. Pilarski et al. [3] formally define the dependency relation caused by transactions among states of data items.
They also analyze checkpointing in a distributed database system by establishing a correspondence between consistent snapshots in a distributed system and tr-consistent checkpoints in a distributed database system. Moreover, they establish sufficient conditions for a set of checkpoints to be part of a tr-consistent global checkpoint. However, they do not establish necessary and sufficient conditions for a set of checkpoints to be part of a tr-consistent global checkpoint. Our goal in this paper is to establish the necessary and sufficient conditions for the checkpoints of a set of data items to be part of a tr-consistent global checkpoint.

Kumar and Moe [3] present a performance evaluation of some existing recovery algorithms for databases. Recently, many researchers have focused on fuzzy checkpointing algorithms [,,] that write dirty pages to disk and require transaction logs for reconstructing a tr-consistent state. Fuzzy checkpointing methods appear to be suitable for in-memory databases (IMDB), which store the data in RAM and back it up on disk [7]. Fuzzy checkpointing does not obstruct transaction processing but requires an undo/redo log to bring the inconsistent checkpoint back to a consistent state. Still, the regular checkpointing approach appears to be suitable for database systems in which the entire database need not be loaded into memory, and hence we focus on such databases only. In the next section, we present the necessary and sufficient conditions for the checkpoints of a set of data items to be part of a tr-consistent global checkpoint.

4. Necessary and sufficient condition

In distributed database systems, it would be ideal if individual data items could be checkpointed without any coordination and a tr-consistent checkpoint could be constructed from the checkpoints of the individual data items whenever it is needed for recovery. For this, we need to know which checkpoints can be combined to construct a tr-consistent global checkpoint. The following theorem establishes the necessary and sufficient condition for a set of checkpoints, one from each data item (i.e., a global checkpoint of the database), to form a tr-consistent global checkpoint of the database with respect to a given set of transactions.

Theorem 1. Let $T = \{T_1, \ldots, T_m\}$ be a set of transactions accessing the database consisting of $n$ data items $X = \{x_1, \ldots, x_n\}$. Assume that each data item is checkpointed by a checkpointing transaction that runs at the site containing the data item. Let $S = \{C_i^{k_i} \mid 1 \le i \le n\}$ be a set of checkpoints, one for each data item, and let $S_T = \{T_{C_i^{k_i}} \mid 1 \le i \le n\}$ be the set of checkpointing transactions that produce $S$. Let $e$ be a schedule over $T$. Then $S$ is a tr-consistent global checkpoint iff there is no path between any two checkpointing transactions belonging to $S_T$ in the global serialization graph corresponding to the schedule $e$.

Proof. (If part) Suppose there is no path in the global serialization graph between any two checkpointing transactions in $S_T$. We prove that the set $S$ forms a tr-consistent global checkpoint. It is sufficient to prove that there exists a serialization order $\sigma_1 \sigma_2$ of $T$ with respect to $e$ such that $\sigma_1 S_T \sigma_2$, i.e., each checkpointing transaction in $S_T$ is executed only after every transaction in $\sigma_1$ has finished execution and before any transaction in $\sigma_2$ has started execution.

We say $T_i \to^{+} S_T$ if there exists a path from $T_i$ to some checkpointing transaction in $S_T$. Similarly, we say $S_T \to^{+} T_i$ if there exists a path from some checkpointing transaction in $S_T$ to $T_i$. Any transaction in $T$ belongs to at least one of the following three sets:

(1) $\sigma_a = \{T_i \in T \mid T_i \to^{+} S_T\}$,

(2) $\sigma_b = \{T_i \in T \mid S_T \to^{+} T_i\}$,

(3) $\sigma_c = \{T_i \in T \mid$ neither $T_i \to^{+} S_T$ nor $S_T \to^{+} T_i\}$.

From Observation 1, we know that $\sigma_c = \emptyset$, since $T_i$ must access at least one data item. In addition, we have $\sigma_a \cup \sigma_b \cup \sigma_c = T$. Since $\sigma_c = \emptyset$, $\sigma_a \cup \sigma_b = T$.

Let $T_v \in \sigma_a$; then $T_v \to^{+} S_T$ by definition. In particular, $T_v \to^{+} T_{C_i^{k_i}}$ for some $i$, which means that $T_v$ has finished accessing $x_i$ before $T_{C_i^{k_i}}$ takes its checkpoint on data item $x_i$.

Claim. $T_v$ must have finished accessing every data item $x_j$ before $T_{C_j^{k_j}} \in S_T$ starts execution.

Proof of claim. The following three cases arise:

(1) $T_v \to^{+} T_{C_j^{k_j}}$. This means $T_v$ has finished accessing $x_j$ before $T_{C_j^{k_j}}$ takes its checkpoint on $x_j$.

(2) $T_{C_j^{k_j}} \to^{+} T_v$. This case cannot arise, since together with $T_v \to^{+} T_{C_i^{k_i}}$ it gives $T_{C_j^{k_j}} \to^{+} T_{C_i^{k_i}}$, a contradiction to the assumption that there is no path between any two checkpointing transactions in $S_T$.

(3) Neither $T_v \to^{+} T_{C_j^{k_j}}$ nor $T_{C_j^{k_j}} \to^{+} T_v$. From Observation 1, $T_v$ does not access $x_j$. In this case we can simply treat $T_v$ as a transaction that has executed before $T_{C_j^{k_j}}$ has started.

Therefore $T_v$ has finished accessing every data item $x_j$ (that it needs to access) before $T_{C_j^{k_j}}$ starts execution. This proves our claim.

So, each transaction in $\sigma_a$ finishes execution before any of the checkpointing transactions in $S_T$ has started execution. Similarly, we can prove that each transaction $T_v \in \sigma_b$ starts accessing any data item $x_j$ only after checkpointing transaction $T_{C_j^{k_j}} \in S_T$ has finished execution. Let $\sigma_1 = \sigma_a$, the set of all transactions that have finished execution before any of the checkpointing transactions in $S_T$ has started execution. Let $\sigma_2 = \sigma_b$, the set of all transactions that have started execution after every checkpointing transaction in $S_T$ has finished execution. We have $\sigma_1 \cup \sigma_2 = T$ and $\sigma_1 \cap \sigma_2 = \sigma_a \cap \sigma_b$. Moreover, $\sigma_a \cap \sigma_b = \emptyset$: if $\sigma_a \cap \sigma_b \ne \emptyset$, let $T_i \in \sigma_a \cap \sigma_b$. Then, by definition of $\sigma_a$ and $\sigma_b$, there exist $T_{C_v^{k_v}}, T_{C_w^{k_w}} \in S_T$ such that $T_i \to^{+} T_{C_v^{k_v}}$ and $T_{C_w^{k_w}} \to^{+} T_i$. Hence $T_{C_w^{k_w}} \to^{+} T_{C_v^{k_v}}$, a contradiction to the assumption that there is no path between any two checkpointing transactions in $S_T$. Therefore, we have $\sigma_1 S_T \sigma_2$.

(Only-if part) Conversely, suppose $S$ is a tr-consistent global checkpoint; we prove that no two elements of $S_T$ have a path between them in the global serialization graph. Suppose there is a path from $T_{C_i^{k_i}} \in S_T$ to $T_{C_j^{k_j}} \in S_T$. Then there exists a transaction $T_c \in T$ such that $T_{C_i^{k_i}} \to^{+} T_c \to T_{C_j^{k_j}}$.
First we show that $T_c$ starts execution after every checkpointing transaction in $S_T$ has finished. Because of the path $T_{C_i^{k_i}} \to^{+} T_c$, we know that $T_c$ must start execution after $T_{C_i^{k_i}} \in S_T$ has executed. Since $S_T$ produces a tr-consistent global checkpoint $S$, by the definition of a tr-consistent global checkpoint $T_c$ must also start execution after every other checkpointing transaction $T_{C_v^{k_v}} \in S_T$, where $v \ne i$, finishes execution. Therefore $T_c$ starts execution after every checkpointing transaction in $S_T$ has executed. On the other hand, on $x_j$, $T_c$ has started execution before $T_{C_j^{k_j}} \in S_T$ has started, due to the edge $T_c \to T_{C_j^{k_j}}$, a contradiction.

Hence a global checkpoint $S$ is tr-consistent with respect to a serializable schedule of a set of transactions iff there is no path between any two checkpointing transactions in $S_T$ in the global serialization graph corresponding to the schedule. □

Theorem 1 is useful for verifying whether a given global checkpoint is tr-consistent. For instance, the five-element set $S$ given for the graph in Fig. forms a tr-consistent global checkpoint because no two elements in $S$ have a path between them. However, this theorem does not help in constructing a tr-consistent global checkpoint incrementally, because the absence of a path between two checkpoints of two different data items does not mean that these two checkpoints together can be part of a tr-consistent global checkpoint. For example, Fig. contains two checkpointing transactions with no path between them whose checkpoints nevertheless cannot belong to a consistent global checkpoint: one of the other data items has no local checkpoint that can be combined with the pair to extend it to a tr-consistent global checkpoint. One checkpoint of that data item cannot be used because it is connected by a path to a checkpointing transaction of the pair, and its remaining checkpoints cannot be used for similar reasons. Therefore, additional restrictions need to be added in order to be able to extend a given set of checkpoints to a tr-consistent global checkpoint.

As mentioned earlier, our goal is to come up with the necessary and sufficient conditions for a set of checkpoints to be part of a tr-consistent global checkpoint. The next theorem addresses this problem; to state it, we first need to introduce some terminology. Netzer and Xu [] introduced the concept of zigzag paths between checkpoints of a distributed computation and used it to establish the necessary and sufficient conditions for a set of checkpoints of a distributed computation to be part of a consistent global checkpoint.
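The path condition of the first theorem above lends itself to a direct check. The following is a minimal sketch, assuming the global serialization graph is available as a plain adjacency dictionary; the node names (`Cx`, `Cy`, `T1`, `T2`) and both example graphs are hypothetical and do not come from the paper's figures.

```python
from itertools import permutations


def has_path(graph, src, dst):
    """Return True if a path with at least one edge leads from src to dst."""
    stack = list(graph.get(src, ()))
    seen = set()
    while stack:
        node = stack.pop()
        if node == dst:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, ()))
    return False


def is_tr_consistent(graph, checkpointing_txns):
    """Path condition: no path between any two distinct checkpointing transactions."""
    return not any(has_path(graph, a, b)
                   for a, b in permutations(checkpointing_txns, 2))


# Hypothetical graphs: Cx and Cy are checkpointing transactions,
# T1 and T2 are normal transactions.
separated = {"T1": ["Cx", "Cy"], "Cx": ["T2"], "Cy": ["T2"]}
chained = {"T1": ["Cx"], "Cx": ["T2"], "T2": ["Cy"]}

print(is_tr_consistent(separated, ["Cx", "Cy"]))  # True
print(is_tr_consistent(chained, ["Cx", "Cy"]))    # False: Cx -> T2 -> Cy
```

In `separated`, every normal transaction runs entirely before or entirely after both checkpointing transactions, which is the ordering $\sigma_1\, S_T\, \sigma_2$ used in the proof. In `chained`, `T2` runs after `Cx` but before `Cy`, so the two checkpoints cannot be combined.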
We generalize their definition of zigzag paths to checkpoints in distributed database systems and use it for establishing a necessary and sufficient condition for a set of checkpoints of a set of data items of a distributed database system to be part of a tr-consistent global checkpoint.

Definition. Let $\mathcal{T}$ be a set of transactions executing on a database. Let $C_i^{k_i}$ be a checkpoint taken by the checkpointing transaction $T_{C_i^{k_i}}$, and let $C_j^{k_j}$ be another checkpoint taken by the checkpointing transaction $T_{C_j^{k_j}}$. We say a zigzag path with respect to $\mathcal{T}$ exists from $T_{C_i^{k_i}}$ to $T_{C_j^{k_j}}$ if there exists a set of transactions $\mathcal{T}' = \{T_1, T_2, \ldots, T_v\} \subseteq \mathcal{T}$ such that

(a) $T_1 \in \mathcal{T}'$ is a transaction such that $T_{C_i^{k_i}} \to T_1$ in the global serialization graph;
(b) for any $T_k \in \mathcal{T}'$ ($1 \le k < v$), $T_{k+1} \in \mathcal{T}'$ ($1 < k+1 \le v$) is a transaction such that
    1: $T_k \leftarrow T_{k+1}$ (we call such an edge a reverse edge); or
    2: $T_k \to T_{k+1}$, or ($T_k \to T_{C_w^{k_w}}$ and $T_{C_w^{k_w}} \to^{+} T_{k+1}$ for some $w$);
(c) $T_v \in \mathcal{T}'$ is a transaction such that $T_v \to T_{C_j^{k_j}}$.

For example, the global serialization graph shown in Fig. contains pairs of checkpointing transactions that are connected by a zigzag path as well as pairs that have no zigzag path between them. Note that a path in the global serialization graph is also a zigzag path, but not conversely. A checkpoint $C_i^{k_i}$ (or the corresponding checkpointing transaction $T_{C_i^{k_i}}$) is involved in a zigzag cycle (z-cycle for short) iff there is a zigzag path from $T_{C_i^{k_i}}$ to itself. The graph in Fig. includes several checkpoints that are involved in z-cycles. Next, we establish the necessary and sufficient condition formally.

Theorem 2. A set $S$ of checkpoints, each checkpoint of which is from a different data item, can belong to the same tr-consistent global checkpoint with respect to a serializable schedule of a set of transactions iff no checkpoint in $S$ has a zigzag path to any checkpoint (including itself) in $S$ in the global serialization graph corresponding to that schedule.

Proof.
(If-Part:) Suppose no checkpoint in $S$ has a zigzag path to any checkpoint (including itself) in $S$. We construct a tr-consistent global checkpoint $S'$ that contains the checkpoints in $S$ and one checkpoint for each data item not represented in $S$ as follows:

For each data item that has no checkpoint in $S$ and that has a checkpoint with a zigzag path to a member of $S$, we include in $S'$ its first checkpoint that has no zigzag path to any checkpoint in $S$. Such a checkpoint is guaranteed to exist because the virtual checkpoint of a data item, representing the state of the data item after all the transactions in $\mathcal{T}$ have terminated, does not have an outgoing zigzag path.

For each data item that has no checkpoint in $S$ and that has no checkpoint with a zigzag path to a member of $S$, we include its initial checkpoint (it is also the first checkpoint that has no zigzag path to any member of $S$, and there cannot be a zigzag path from any checkpoint in $S$ to this initial checkpoint).
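The construction above relies on deciding whether a zigzag path exists between two checkpointing transactions. A breadth-first search is one way to do this. The sketch below follows conditions (a) to (c) of the zigzag-path definition as they read in this text; it does not model any further restriction on reverse edges that the paper may intend. The graph encoding, the function names, and the example graph are assumptions made for illustration.

```python
from collections import deque


def reachable(graph, src):
    """All nodes reachable from src by a path of at least one edge."""
    seen, stack = set(), list(graph.get(src, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, ()))
    return seen


def has_zigzag_path(graph, checkpointing, src, dst):
    """Search for a zigzag path from checkpointing transaction src to dst.

    graph: dict node -> list of successors (global serialization graph).
    checkpointing: set of nodes that are checkpointing transactions.
    """
    nodes = set(graph) | {m for succs in graph.values() for m in succs}
    normal = nodes - set(checkpointing)
    preds = {n: set() for n in nodes}
    for n, succs in graph.items():
        for m in succs:
            preds[m].add(n)

    # (a) start from normal transactions T1 with an edge src -> T1
    frontier = deque(t for t in graph.get(src, ()) if t in normal)
    visited = set(frontier)
    while frontier:
        t = frontier.popleft()
        # (c) the last transaction on the path has an edge into dst
        if dst in graph.get(t, ()):
            return True
        step = set(preds[t])                    # (b1) reverse edge
        for succ in graph.get(t, ()):
            step.add(succ)                      # (b2) forward edge
            if succ in checkpointing:           # (b2) via a checkpointing txn
                step |= reachable(graph, succ)
        for nxt in step:
            if nxt in normal and nxt not in visited:
                visited.add(nxt)
                frontier.append(nxt)
    return False


def can_be_extended(graph, checkpointing, chosen):
    """Zigzag condition: no zigzag path between members of chosen (self included)."""
    return not any(has_zigzag_path(graph, checkpointing, a, b)
                   for a in chosen for b in chosen)


# Hypothetical graph: Cx -> T1 <- T2 -> Cy (T2 -> T1 is traversed as a reverse edge).
example = {"Cx": ["T1"], "T2": ["T1", "Cy"]}
cps = {"Cx", "Cy"}

print(has_zigzag_path(example, cps, "Cx", "Cy"))   # True
print(can_be_extended(example, cps, ["Cx", "Cy"])) # False
print(can_be_extended(example, cps, ["Cx"]))       # True
```

The example has no ordinary path from `Cx` to `Cy`, so a plain path check would accept the pair, yet the zigzag path through the reverse edge rules it out.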

We claim that $S'$ is a tr-consistent global checkpoint. From Theorem 1, it is sufficient to prove that there is no path between any two checkpoints of $S'$ in the global serialization graph. Suppose there is a path from a checkpoint $A \in S'$ to a checkpoint $B \in S'$. Assume that the checkpoint $A$ was taken on data item $x_i$ and checkpoint $B$ was taken on data item $x_j$.

Case 1: $A, B \in S$. This condition implies that a zigzag path exists from $A$ to $B$ (note that a path in the graph is a zigzag path but not conversely), contradicting the assumption that no zigzag path exists between any two checkpoints in $S$.

Case 2: $A \in S' - S$ and $B \in S$. This contradicts the way $S' - S$ is constructed (checkpoints in $S' - S$ are chosen in such a way that no zigzag path exists to any member of $S$ from those checkpoints).

Case 3: $A \in S$ and $B \in S' - S$. $B$ cannot be an initial checkpoint, since no checkpoint can have a path to an initial checkpoint. Then by the choice of $B$, $B$ must be the first checkpoint on $x_j$ that has no zigzag path to any member of $S$. The checkpoint preceding $B$ on $x_j$, say $D$, must have a zigzag path to some member of $S$, say $E$. Since $D$ precedes $B$ on $x_j$, we have $D \to^{+} B$ in the local serialization graph of $x_j$, which also exists in the global serialization graph. Let $T_u$ be the transaction (that accessed $x_j$ and created the edge $T_u \to B$) that lies on the zigzag path from $A$ to $B$. Note that such a transaction exists because $B$ and $D$ are checkpoints of data item $x_j$ and we assume that a checkpoint is taken only after the state of the data item has been changed by one or more transactions.

Claim. There exists a zigzag path from $A$ to $E$ in the global serialization graph.

Proof of claim. Since $D \to^{+} B$, $T_u \to B$, and $B$ is created by a checkpointing transaction that is also a transaction, we get $D \to^{+} T_u$ in the local serialization graph of $x_j$ from Observation 3. Any path in the local serialization graph is also a path in the global serialization graph. Therefore the path $D \to^{+} T_u$ can be found in the global serialization graph.
Then in the global serialization graph, the zigzag path from $A$ to $T_u$, the path $D \to^{+} T_u$ traversed in reverse, and the zigzag path from $D$ to $E$ together constitute a zigzag path from $A$ to $E$, which is a contradiction to the assumption that no zigzag path exists between any two checkpoints in $S$.

Case 4: $A \in S' - S$ and $B \in S' - S$. As in case 3, $B$ must be the first checkpoint of $x_j$ that has no zigzag path to any member of $S$, and $A$ must be the first checkpoint of $x_i$ that has no zigzag path to any member of $S$. Then the checkpoint that precedes $B$ on data item $x_j$, say $D$, must have a zigzag path to some member of $S$, say $E$. Then, as in case 3, there exists a zigzag path from $A$ to $E$. This contradicts the choice of $A$ as the first checkpoint on data item $x_i$ with no zigzag path to any member of $S$.

Therefore $S'$, containing $S$, is a tr-consistent global checkpoint.

(Only-if Part:) Conversely, suppose there exists a zigzag path between two checkpoints in $S$ (including a zigzag cycle); then we show that they cannot belong to the same tr-consistent global checkpoint. Assume that a zigzag path exists from $A$ to $B$ ($A$ could be $B$) and that along such a path the length of consecutive reverse edges is at most $w$. We use induction on $w$ to show that $A$ and $B$ cannot belong to the same consistent global checkpoint.

Base case ($w = 0$): If the length of consecutive reverse edges is at most zero, the zigzag path from $A$ to $B$ is in fact a path from $A$ to $B$. Then, from Theorem 1, $A$ and $B$ cannot belong to the same consistent global checkpoint.

Base case ($w = 1$): Suppose the length of consecutive reverse edges along the zigzag path from $A$ to $B$ is at most one. Let the consecutive reverse edges with length equal to one from $A$ to $B$ be $T_{1,1} \leftarrow T_{2,1}, \ldots, T_{1,u} \leftarrow T_{2,u}$, as shown in Fig. 5a. Suppose those reverse edges are components of the local serialization graphs corresponding to data items $x_{1,1}, \ldots, x_{1,u}$ respectively.

Claim. $x_{1,1}, \ldots, x_{1,u}$ cannot all be equal to $x_i$, the data item on which $A$ is taken.

Proof of claim. Suppose $x_{1,1}, \ldots, x_{1,u}$ are all equal to $x_i$.
Then $A, T_{1,1}, T_{2,1}, \ldots, T_{1,u}, T_{2,u}$ are transactions accessing $x_i$ (note that we use $A$ for the checkpointing transaction that takes the checkpoint $A$ as well as for the checkpoint itself). From an earlier Observation, the following two cases arise:

(1) $A \to^{+} T_{2,u}$. If this is the case, a path $A \to^{+} B$ via $T_{2,u}$ exists and hence $A$ and $B$ cannot be part of a tr-consistent global checkpoint, by Theorem 1.

(2) $T_{2,u} \to^{+} A$. Since $T_{2,u} \to T_{1,u}$, we must have $T_{1,u} \to^{+} A$ from Observation 3. If this is the case, when we consider the reverse edge $T_{1,u-1} \leftarrow T_{2,u-1}$, the following two sub-cases arise:

(2.1) $A \to^{+} T_{2,u-1}$. In this case, a cycle $T_{1,u} \to^{+} A \to^{+} T_{2,u-1} \to^{+} T_{1,u}$ from $T_{1,u}$ to itself exists. However, a cycle cannot exist if the schedule over the normal and checkpointing transactions is serializable.

(2.2) $T_{2,u-1} \to^{+} A$. Since $T_{2,u-1} \to T_{1,u-1}$, we must have $T_{1,u-1} \to^{+} A$ from Observation 3. If this is the case, we need to consider the previous reverse edge $T_{1,u-2} \leftarrow T_{2,u-2}$ in the zigzag path and make a similar argument with that edge. Proceeding like this, we will end up with a path $T_{1,1} \to^{+} A$; since $A \to^{+} T_{1,1}$, we have $A \to^{+} A$, i.e., $A$ is on a cycle, which contradicts the assumption that the schedule over the normal and checkpointing transactions is serializable.

So, our assumption that $x_{1,1}, \ldots, x_{1,u}$ are all equal to $x_i$ is wrong, which proves the claim. This situation is illustrated in Fig. 5b. In this figure, dotted lines indicate the possible paths and dotted lines with an X mark indicate the impossible paths.

Using arguments similar to the one above, we can show that $x_{1,1}, \ldots, x_{1,u}$ cannot all be $x_j$. Fig. 5c illustrates how we can get a contradiction by showing the existence of a cycle. So far, we have proved that there must exist a data item associated with a reverse edge that is different from both $x_i$ and $x_j$. Let us assume such a data item is $x_{1,p}$, with associated reverse edge $T_{1,p} \leftarrow T_{2,p}$.

Fig. 5. Dependency of transactions I.

Next we prove our claim that $A$ and $B$ cannot belong to a tr-consistent global checkpoint. Fig. 5d illustrates the basic idea behind the proof.

On data item $x_{1,1}$, which both $T_{1,1}$ and $T_{2,1}$ have accessed, no checkpoint taken after $T_{1,1}$, say $D_1$ (refer to Fig. 5d), can be combined with $A$ to form a consistent global checkpoint, due to the path $A \to^{+} D_1$ (from Theorem 1). Therefore, on $x_{1,1}$ we can only use some checkpoint $C_1$ taken before $T_{2,1}$ accessed $x_{1,1}$ to construct a tr-consistent global checkpoint containing $A$. Using a similar argument, on $x_{1,2}$, which both $T_{1,2}$ and $T_{2,2}$ have accessed, any checkpoint taken after $T_{1,2}$ accessed it, say $D_2$, cannot be combined with $C_1$ to form a consistent global checkpoint, due to the path $C_1 \to^{+} D_2$ (refer to Fig. 5d). So we have to use some checkpoint $C_2$ on $x_{1,2}$ which was taken before $T_{2,2}$ accessed $x_{1,2}$. Similarly, on $x_{1,p}$, which both $T_{1,p}$ and $T_{2,p}$ have accessed, we have to use some checkpoint $C_p$, which was taken before $T_{2,p}$, to construct a tr-consistent global checkpoint containing $A$.

On the other hand, on data item $x_{1,u}$, which both $T_{1,u}$ and $T_{2,u}$ have accessed, no checkpoint taken before $T_{2,u}$, say $C_u$, can be combined with $B$ to construct a tr-consistent global checkpoint, due to the path $C_u \to^{+} B$. Therefore, on $x_{1,u}$, we can only use some checkpoint $D_u$ taken after $T_{1,u}$ accessed $x_{1,u}$ to construct a tr-consistent global checkpoint containing $B$. Similarly, on $x_{1,u-1}$, which both $T_{1,u-1}$ and $T_{2,u-1}$ have accessed, any checkpoint taken before $T_{2,u-1}$, say $C_{u-1}$, cannot be combined with $D_u$ to construct a tr-consistent global checkpoint, due to the path $C_{u-1} \to^{+} D_u$. So we have to use some checkpoint $D_{u-1}$ on $x_{1,u-1}$ that was taken after $T_{1,u-1}$ accessed $x_{1,u-1}$. Proceeding like this, on $x_{1,p}$, which both $T_{1,p}$ and $T_{2,p}$ have accessed, we have to use some checkpoint $D_p$, which is taken after $T_{1,p}$ accessed $x_{1,p}$, to construct a tr-consistent global checkpoint containing $B$.

Thus, there exists a data item $x_{1,p}$ which is neither $x_i$ nor $x_j$.
On such a data item, we can only use a checkpoint taken before $T_{1,p}$ and $T_{2,p}$ have accessed $x_{1,p}$ to construct a tr-consistent global checkpoint containing $A$; on the other hand, we can only use a checkpoint taken after $T_{1,p}$ and $T_{2,p}$ have accessed it to construct a tr-consistent global checkpoint containing $B$. So, for data item $x_{1,p}$, there is no checkpoint that can be combined with both $A$ and $B$ to construct a tr-consistent global checkpoint. This proves the theorem in the base case with $w = 1$.

Next, assume that if there is a zigzag path from $A$ to $B$ which contains consecutive reverse edges with length at most $k$, then $A$ and $B$ together cannot belong to a tr-consistent global checkpoint. We prove that if there exists a zigzag path from $A$ to $B$ which contains consecutive reverse edges of length $k+1$, then $A$ and $B$ cannot belong to the same tr-consistent global checkpoint.

Suppose the sequences of consecutive reverse edges along the zigzag path from $A$ to $B$ are $T_{1,1} \leftarrow \cdots \leftarrow T_{u_1,1}$ ($u_1 \le k+1$), $T_{1,2} \leftarrow \cdots \leftarrow T_{u_2,2}$ ($u_2 \le k+1$), ..., and $T_{1,v} \leftarrow \cdots \leftarrow T_{u_v,v}$ ($u_v \le k+1$). Thus, on the zigzag path from $A$ to $B$ that we consider, we have consecutive reverse edges of lengths $u_1, \ldots, u_v$ ($u_i \le k+1\ \forall i$). Each of these reverse edges should come from the local serialization graph of a data item. Suppose the reverse edges are edges of the local serialization graphs of data items $x_{1,1}, \ldots, x_{u_1,1}, \ldots, x_{1,v}, \ldots, x_{u_v,v}$ respectively. Fig. 6a shows the zigzag path along with the data items from which each of the reverse edges along the path comes.

First, we show that at least one of the data items $x_{1,1}, \ldots, x_{u_1,1}, \ldots, x_{1,v}, \ldots, x_{u_v,v}$ is not equal to $x_i$ (recall that $A$ is a checkpoint of the data item $x_i$). Suppose $x_{1,1}, \ldots, x_{u_1,1}, \ldots, x_{1,v}, \ldots, x_{u_v,v}$ are all the same as $x_i$. Then $A, T_{1,1}, \ldots, T_{u_1,1}, \ldots, T_{1,v}, \ldots, T_{u_v,v}$ are transactions accessing $x_i$. Based on an earlier Observation, two cases arise:

(1) $A \to^{+} T_{u_v,v}$. If this is the case, a path $A \to^{+} B$ via $T_{u_v,v}$ exists, and hence $A$ and $B$ together cannot be part of a tr-consistent global checkpoint by Theorem 1.

(2) $T_{u_v,v} \to^{+} A$.
Because of the sequence of reverse edges $T_{1,v} \leftarrow \cdots \leftarrow T_{u_v,v}$ on $x_i$, from Observation 3, we have $T_{1,v} \to^{+} A$. Then, when we consider the sequence of reverse edges $T_{1,v-1} \leftarrow \cdots \leftarrow T_{u_{v-1},v-1}$, the following two sub-cases arise:

(2.1) $A \to^{+} T_{u_{v-1},v-1}$. In this case, a cycle (namely, $T_{1,v} \to^{+} A \to^{+} T_{u_{v-1},v-1} \to^{+} T_{1,v}$) from $T_{1,v}$ to itself exists, which contradicts the fact that the schedule over the normal and checkpointing transactions is serializable.

(2.2) $T_{u_{v-1},v-1} \to^{+} A$. Because of the sequence of reverse edges $T_{1,v-1} \leftarrow \cdots \leftarrow T_{u_{v-1},v-1}$ on $x_i$, based on Observation 3, we have $T_{1,v-1} \to^{+} A$. In this case, we need to consider the previous sequence of reverse edges $T_{1,v-2} \leftarrow \cdots \leftarrow T_{u_{v-2},v-2}$ and repeat the analysis of cases (2.1) and (2.2). Continuing this process, we will end up with a cycle in the serialization graph, which contradicts the fact that the schedule is serializable.

This means that our assumption that $x_{1,1}, \ldots, x_{u_1,1}, \ldots, x_{1,v}, \ldots, x_{u_v,v}$ are all $x_i$ is wrong. In Fig. 6b, dotted lines without an X mark show the possible paths and dotted lines with an X mark show the impossible paths. Using similar arguments, we can show that not all the data items $x_{1,1}, \ldots, x_{u_1,1}, \ldots, x_{1,v}, \ldots, x_{u_v,v}$ can be equal to $x_j$. Fig. 6c illustrates this.

So far we have proved that there must exist a data item associated with at least one reverse edge in the zigzag path from $A$ to $B$ that is different from both $x_i$ and $x_j$. Suppose such a data item is $x_{g,p}$ and is associated with the reverse edge $T_{g,p} \leftarrow T_{g+1,p}$, which is one of the reverse edges in the sequence $T_{1,p} \leftarrow \cdots \leftarrow T_{u_p,p}$. Next, we prove that $A$ and $B$ cannot be part of a tr-consistent global checkpoint. Fig. 6d can help in understanding the proof.

On data item $x_{1,1}$, which both $T_{1,1}$ and $T_{2,1}$ have accessed, no checkpoint $D_1$ taken after $T_{1,1}$ has accessed $x_{1,1}$ can be combined with $A$ to construct a consistent global checkpoint, because there is a path from $A$ to $D_1$. Therefore we can only use some checkpoint $C_1$, taken before $T_{2,1}$ on $x_{1,1}$, to construct a consistent global checkpoint containing $A$.
On $x_{1,2}$, which both $T_{1,2}$ and $T_{2,2}$ have accessed, no checkpoint taken after $T_{1,2}$, say $D_2$, can be combined with $C_1$ to form a consistent global checkpoint, because there is a zigzag path from $C_1$ to $D_2$ with consecutive reverse edges of length at most $k$ (by the induction hypothesis). So we have to use some checkpoint $C_2$ on $x_{1,2}$ which was taken before $T_{2,2}$ accessed $x_{1,2}$.

Proceeding like this, on data item $x_{g,p}$, which was accessed by the transactions $T_{g,p}$ and $T_{g+1,p}$, no checkpoint $D_p$ taken after both $T_{g,p}$ and $T_{g+1,p}$ have accessed it can be combined with $C_{p-1}$ to construct a consistent global checkpoint, by the induction hypothesis (due to the existence of a zigzag path containing consecutive reverse edges of length at most $k$). So we have to use some checkpoint $C_p$ which was taken before $T_{g+1,p}$ accessed $x_{g,p}$.

Fig. 6. Dependency of transactions II.

On the other hand, on $x_{1,v}$, which both $T_{1,v}$ and $T_{2,v}$ have accessed, no checkpoint $C_v$ that was taken before $T_{2,v}$ accessed $x_{1,v}$ can be combined with $B$ to construct a tr-consistent global checkpoint, because $C_v$ has a zigzag path to $B$ with consecutive reverse edges of length at most $k$. Therefore, on $x_{1,v}$, we have to use some checkpoint $D_v$ that was taken after $T_{1,v}$ accessed $x_{1,v}$. On $x_{1,v-1}$, which both $T_{1,v-1}$ and $T_{2,v-1}$ have accessed, we cannot use any checkpoint $C_{v-1}$ that was taken before $T_{2,v-1}$ to construct a consistent global checkpoint containing $D_v$, due to the existence of a zigzag path with consecutive reverse edges of length at most $k$. So we have to use some checkpoint $D_{v-1}$ on $x_{1,v-1}$ that was taken after $T_{1,v-1}$ accessed it. Proceeding like this, on $x_{g,p}$, we have to use some checkpoint $D_p$ that was taken after $T_{g,p}$ has accessed it to construct a tr-consistent global checkpoint containing $D_{p+1}$.

Thus, for the data item $x_{g,p}$, which is different from both $x_i$ and $x_j$, no checkpoint that was taken after $T_{g+1,p}$ accessed $x_{g,p}$ can be used to construct a tr-consistent global checkpoint containing $A$, and no checkpoint taken before $T_{g,p}$ accessed $x_{g,p}$ can be used to construct a tr-consistent global checkpoint containing $B$. Since no checkpoint exists between $T_{g+1,p}$ and $T_{g,p}$ on $x_{g,p}$, the data item does not have any checkpoint that can be combined with both $A$ and $B$ to construct a tr-consistent global checkpoint. Therefore, $A$ and $B$ cannot belong to a consistent global checkpoint. This proves the theorem. □

Corollary 1. A checkpoint of a data item in a distributed database can be part of a tr-consistent global checkpoint of the database iff it does not lie on a zigzag cycle.

Proof. Follows from Theorem 2 by taking $S$ as the singleton set containing the checkpoint. □

4.1. Applications

Corollary 1 and Theorem 2 are useful for constructing tr-consistent global checkpoints incrementally.
We can start with any checkpoint of any data item that is not on a z-cycle, and keep adding checkpoints from other data items without violating Theorem 2 until we have finished constructing a tr-consistent global checkpoint of the entire database. This would help in failure recovery, because when a failure occurs the database needs to be restored to a tr-consistent global checkpoint. When data items are checkpointed independently, some of the checkpoints of some of the data items may be useless because they cannot be part of any tr-consistent global checkpoint, as illustrated in Corollary 1. So, Theorem 2 can throw light on designing non-intrusive checkpointing algorithms that allow each of the data items to be checkpointed independently while at the same time making all checkpoints useful.

A federated database system (FDBS) is a collection of cooperating database systems [6,,3,3,]. Kleewein [] discusses practical issues with commercial implementation of federated databases. The individual database systems in an FDBS could be heterogeneous and distributed across several geographically separated sites. In such a system, the individual databases are somewhat autonomous and hence almost all transactions updating a database will be local transactions. Thus, the individual databases can be checkpointed independently in a non-intrusive manner. However, when a failure occurs, all the component databases should be restored to a transaction-consistent global checkpoint. So, constructing a tr-consistent global checkpoint would be useful in such systems.

Federated database systems are likely to play an important role in the future, especially in integrating medical databases. Even though the concept of federated databases was proposed in the early 90s, it has not been widely implemented. FDBSs are suitable for integrating complex data. For example, as Muilu et al. [] point out, large-scale biobank-based post-genome era research projects like GenomEUtwin (an international collaboration between eight Twin Registries) require extensive amounts of genotype and phenotype data combined from different data sources located in different countries. Building a solid infrastructure for accessing such data requires using the model of federated databases. Muilu et al. [] also describe how they constructed a federated database infrastructure for genotype and phenotype information collected in seven European countries and Australia and connected this database setting via a network called TwinNET.

5. Conclusion

Checkpointing has been traditionally used for handling failures in distributed database systems. An efficient checkpointing algorithm should be non-intrusive in the sense that it should not block the normal transactions while checkpoints are taken. A simple approach would be to run a read-only transaction which reads the entire database and stores it in stable storage. The underlying concurrency control algorithm would ensure that the saved state is tr-consistent. This approach would be very inefficient, especially in the presence of long-living transactions. If each data item is independently checkpointed, not all the checkpoints taken may be useful for constructing a tr-consistent global checkpoint of the entire database. We have presented the necessary and sufficient condition for a set of checkpoints of a set of data items in the database to be part of a tr-consistent global checkpoint of the distributed database. This theory helps in determining which checkpoints are useful for constructing tr-consistent global checkpoints and which are not. It also helps in constructing tr-consistent global checkpoints of the database incrementally, starting from a useful checkpoint of a data item.
Moreover, the necessary and sufficient conditions established can throw light on designing non-intrusive checkpointing methods which allow data items to be checkpointed independently while at the same time ensuring that each checkpoint taken is a useful part of a tr-consistent global checkpoint.
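The incremental construction mentioned above can be sketched as a greedy selection. This is a minimal illustration, not the paper's algorithm: it assumes the zigzag relation has already been computed and is supplied as a set of ordered pairs, and the data item names, checkpoint names, and the oldest-first selection policy are all hypothetical.

```python
def build_global_checkpoint(candidates, zigzag):
    """Greedily pick one checkpoint per data item.

    candidates: dict data item -> list of its checkpoints, oldest first.
    zigzag: set of (a, b) pairs meaning "a zigzag path exists from a to b";
            a pair (a, a) marks a checkpoint that lies on a z-cycle.
    Returns a dict data item -> chosen checkpoint, or None if some data item
    has no compatible checkpoint among its candidates.
    """
    chosen = {}
    for item, checkpoints in candidates.items():
        for c in checkpoints:
            on_z_cycle = (c, c) in zigzag
            clashes = any((c, s) in zigzag or (s, c) in zigzag
                          for s in chosen.values())
            if not on_z_cycle and not clashes:
                chosen[item] = c
                break
        else:
            # No candidate of this data item fits with the earlier choices.
            return None
    return chosen


# Hypothetical input: x1 lies on a z-cycle, and x2 has a zigzag path to y1.
candidates = {"x": ["x1", "x2"], "y": ["y1", "y2"]}
zigzag = {("x1", "x1"), ("x2", "y1")}

print(build_global_checkpoint(candidates, zigzag))  # {'x': 'x2', 'y': 'y2'}
```

The zigzag-path theorem suggests that a compatible candidate should exist for every data item as long as each list includes that item's initial and virtual final checkpoints; the `None` branch covers incomplete candidate lists. A recovery procedure would more likely scan newest first to lose less work, which only changes the order of each list.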