A Theory of Redo Recovery

Size: px
Start display at page:

Download "A Theory of Redo Recovery"

Transcription

1 A Theory of Redo Recovery David Lomet Microsoft Research Redmond, WA Mark Tuttle HP Labs Cambridge Cambridge, MA ABSTRACT Our goal is to understand redo recovery. We define an installation graph of operations in an execution, an ordering significantly weaker than conflict ordering from concurrency control. The installation graph explains recoverable system state in terms of which operations are considered installed. This explanation and the set of operations replayed during recovery form an invariant that is the contract between normal operation and recovery. It prescribes how to coordinate changes to system components such as the state, the log, and the cache. We also describe how widely used recovery techniques are modeled in our theory, and why they succeed in providing redo recovery. 1. INTRODUCTION After a system crash, redo recovery tries to reconstruct the state of the database at the time of the crash by redoing some subset of the logged operations in some order. There are many techniques for doing this [2, 3, 4, 14], and there has been some work on classifying them [1, 7] and explaining specific techniques [9]. But what are the general principles underlying redo recovery? For redo recovery to work, the state update and recovery processes must cooperate in some fashion. The goal of this paper is to characterize the nature of this cooperation as precisely as possible. 1.1 State Update Database operations change the state of the database, and state update puts these changes into the stable state. The order in which these updates are made is crucial to the success of redo recovery. Consider the operations A : x y +1 and B : y 2 that read and write variables x and y, both initially 0. Consider Scenario 1 illustrated in Figure 1, in which the operations are invoked in the order A followed by B, and B s changes update the state before the crash, but not A s. Now there is no way for redo recovery reconstruct the value 1 for x simply by redoing one of A or B or both. The problem is that there is a read-write conflict from A to B Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. SIGMOD 2003, June 9-12, 2003, San Diego, CA. Copyright 2003 ACM X/03/06...$5.00. stable state stable state x y +1 y 2 y =2 x =1 y =2 install y Figure 1: Read-write edges are important. y 2 x y +1 x =3 x =3 y =2 install x redo y 2 recovery impossible crash x =3 y =2 crash Figure 2: Write-read edges are unimportant. in the conflict graph, and updating the state with B s changes before A s violates this ordering. The problem is that B s update of y makes it impossible to redo A to regenerate the value of x. One way to ensure that redo recovery is always possible is to use the conflict graph. If state updates are made in conflict graph order, then redo recovery can replay the remaining operations in conflict graph order and recover the state. But state update has more flexibility than this. Consider Scenario 2 illustrated in Figure 2, in which the operations are invoked in the order B followed by A, and A s changes update the state before the crash, but not B s. Now the state can be recovered by redoing B, in spite of the fact that updating the state with A s changes violated the write-read edge from B to A in the conflict graph. A s update to x (which required reading y) does not interfere with B s ability to update y. Hence, the state update order does not need to respect the write-read edges in the conflict graph. To capture this, we define the installation graph to be the conflict graph with the write-read edges removed, and we use this graph to

2 stable state x x +1 y y +1 x y +1 y =1 x =2 y =1 install y redo x y +1 Figure 3: Only exposed variables matter. x =2 y =1 crash guide the state update procedure. We can prove that if state update writes the changes made by operations in an order consistent with installation graph order, then redo recovery can replay the remaining operations in conflict graph order and recover the state. These installed operations form a prefix of the installation graph. As long as the stable state can be understood or explained as the result of changes made by the operations in a prefix of the installation graph, redo recovery can recover the state after a crash. Again, however, state update has more flexibility than this. Consider the operations C : x x +1;y y +1 and D : x y +1 that read and write variables x and y, both initially 0. Consider Scenario 3 illustrated in Figure 3, in which the operations are invoked in the order C followed by D, but only C s change to y updates the stable state before the crash, and not x. Notice that redo recovery can recover the state by simply replaying D. There is a read-write edge from C to D due to the variable x, so the installation graph demands that C s changes are made before D s, but there is really only one change made by C that is visible to later operations, and it isn t the value of x. C s change to y is exposed to D because D reads y, butc s change to x is not exposed to any following operation because D overwrites x before any other operation has a chance to read it. To capture this, we define the notion of an exposed variable. Given a set I of operations that have been installed into the stable state and its complement U of uninstalled operations, we say that a variable x is exposed if no operation in U reads or writes x (meaning that x already has its final value), or if some operation in U accesses x and the first such operation reads x (meaning x has to have the right value if the system crashes now). We can prove that to install an operation, one need only update its exposed variables. We say that a prefix of the installation graph explains the state if each variable x left exposed by the operations in the prefix has the value written by the last operation writing x in the prefix. Our first main result is that if the stable state can be explained by a prefix of the installation graph, then recovery can replay the remaining operations in conflict graph order and recover the state. 1.2 Recovery Informally, the preceding result says that if a prefix of the installation graph explains the values of the exposed variables, then recovery can recover the state by replaying the remaining operations in conflict graph order. Conversely, if recovery chooses to replay a set of operations, then the remaining operations must form a prefix of the installation graph explaining the values of the exposed variables in order for recovery to succeed. This is our second main result. We capture this requirement with a recovery invariant. This invariant says that if, at any time, the recovery procedure would choose to redo some set of operations, then the set of remaining operations form a prefix of the installation graph explaining the state. This is the sense in which state update and recovery must cooperate. The point of this paper is that the recovery invariant serves as contract between state update and recovery that ensures recoverability. We end this paper with a discussion of how our framework explains logical, physical, and physiological redo recovery techniques, and how these techniques maintain the recovery invariant. We go farther, in fact, and show how focusing on maintaining the recovery invariant allows us to generalize physiological operations to log the split of a B-tree node more efficiently than can be done with conventional physiological operations. Making the preceding ideas mathematically precise requires some technical machinery. First, theory is traditionally a bit informal about the relationship between operation sequences and conflict graphs, and we find it helpful to nail down this relationship. Second, theory is also somewhat informal about the relationship between between a conflict graph and the changes made to the state by operations in the graph, so we define a state graph to help us talk about state changes. To some extent, this machinery simply lets us prove intuition many already have about recovery. More significant, however, is our formulation of an abstract redo recovery procedure. Recovery actually involves many subtle issues that we have ignored in this introduction, such as interactions among logs, checkpoints, log scans prior to recovery, and the the method of choosing the set of operations to redo during recovery. Finally, we define the notion of a write graph, a kind of state graph derived from an installation graph, and use this write graph to model how real systems accumulate the effects of multiple operations before changing the stable state and installing these operations. 1.3 Prior Work We have written on recovery theory in the past [12, 13]. Our prior formulations focused primarily on characterizing states from which it is possible to recover. In this work, we focus on the interaction between changing the state and recovery. In particular, our recovery invariant is a precise statement of the contract between the state update process (both during normal operation and during recovery) and the recovery process (and its selection of operations to replay after a crash) in order for these two processes to interact seamlessly. In addition, our work has several other properties: 1. Simpler: Our definition of the installation graph is now a simple weakening of the conflict graph. Our prior definition [12] removed write-write edges in addition to write-read edges, but identifying which write-write edges to remove involved an elaborate construction. It turns out that the two definitions are equivalent, in the sense that a state is explainable by a prefix of one iff it is explainable by a prefix of the other. 2. More precise: Our definition of the state graph lets us capture precisely our intuitive understanding of the relationships among the state, the cache, and the conflict and installation graphs. State graphs unify how we treat stable state and volatile state, and permit us to consider regimes that maintain multiple versions of variables. 3. More abstract: We focus on recovering state without assuming any particular structure for the database implementation, such as how the state is stored on stable or volatile storage,

3 in contrast to [12]. This abstract characterization can be applied in a number of ways to real systems, as demonstrated in Section More complete: We consider general redo tests, instead of restricting attention to redo tests satisfying a particular specification [12]. We also model redo tests and their interaction with the state update process more realistically. Prior formulations [12] modeled the interface between state update and recovery in terms of a particular explanation of the stable state, but we know that real recovery procedures do not explicitly refer to such an explanation at the start of recovery. Real systems fix a redo procedure, and engineer the rest of the system to enforce the recovery invariant with that specific procedure in mind. The remainder of this paper is organized as follows. In Section 2, we define our model of a database and the notions conflict and state graphs. In Section 3, we define the installation graph and exposed variables, and we prove that an explainable state can be recovered. In Section 4, we define the abstract recovery protocol, describe when it will lead to successful recovery, and state the recovery invariant. In Section 5, we define the write graph and show how the write graph can be used to manage the state so that recovery is possible. Section 6 describes how real systems maintain the recovery invariant, and thus successfully provide recovery. Finally, in Section 7 we discuss some future directions for this work. 2. PRELIMINARIES We begin with a model of a recoverable system, the definitions of conflict and state graphs, and the notion of an exposed variable. 2.1 System Model Our system model is quite simple. A full model of a database would require distinguishing between stable state and volatile state, and providing cache managers and log managers to coordinate the movement of information from the volatile state to the stable state. Our notions of a state and changes to this state are indifferent to whether these changes are physically recorded on a disk or in a cache. We study the problem of changing state in a way that allows us to recover the remaining changes if the systems should crash before they are all installed. This model can be extended to a model that more faithfully represents pragmatic implementations of database systems, as we have done [12] and as we intend to do in the future. This model, however, has enough detail to allow us to explain how state changes and recovery are coordinated. A recoverable system has a set of variables and a set of values they can assume. A system state describes the values of the variables at a given point in time. A program accesses or changes the state by invoking system operations, and these changes are recorded in the log with logged operations. These two sets of operations can be quite different, and the only operations of interest to us are the logged operations. Each operation atomically reads a set of variables and then writes a set of variables. These ideas can be modeled mathematically as follows. Fix a set of variables and a set of values. A state is a function that maps each variable to a value. An operation is a function with a fixed set of input variables and a fixed set of output variables. We call these sets of input and output variables the read set and write set, respectively. An operation sequence is a sequence O 1O 2...O k of operations. A state sequence is a sequence S 0S 1S 2...S k of states generated by an operation sequence O 1O 2...O k, where S 0 is an initial state and each S i is the result of applying O i to S i 1. The state sequence generated by an operation sequence obviously depends on the initial state S 0, but the value of S 0 is usually clear from context or unimportant, and we usually omit reference to it. In the next two sections, we define directed graphs to model operation sequences and state sequences. Given a directed graph, the predecessors of a node n is the set of all nodes m such that there is a path from m to n in the graph. A prefix of a directed graph is a subgraph induced by a set of nodes having the following property: If a node is in the prefix, then all of its predecessors are in the prefix. 2.2 Conflict Graph An operation sequence generates a conflict graph defined as follows. For an operation O, the preceding write to x is the preceding operation W that writes x such that there is no operation writing x between W and O. Similarly, the following write to x is the following operation W that writes x such that there is no operation writing x between O and W. The conflict graph generated by an operation sequence is a directed, acyclic graph with each node labeled with an operation. There is an edge from O to P if they conflict in one of the following ways: write-write conflict: O writes x and P writes x and O is P s preceding write. write-read conflict: O writes x and P reads x and O is P s preceding write. read-write conflict: O reads x and P writes x and P is O s following write. We assume that operations labeling nodes are distinct, so we can refer to a node by the operation labeling it. Conversely, a conflict graph generates a set of operation sequences obtained by totally ordering the operations labeling the graph. The following result states that each of these operation sequences can be used to generate the conflict graph. Lemma 1: If σ is any total ordering of the operations labeling a conflict graph C, then C is the conflict graph generated by σ. One consequence of this lemma is that we can model a log as a set of operations ordered only by the conflict graph. It is not necessary to have a totally ordered log reflecting the exact execution order of the operations. Only conflicting logged operations need to be ordered. 2.3 Exposed Variables Conflict graphs help specify what values variables should have during recovery. The recovery process replays a subset of the operations in conflict graph order, bypassing others which are considered installed. If x contains the wrong value at the start of recovery, then an operation replayed during recovery must write to x to give it an appropriate value before any other operation replayed during recovery reads x. Then the current value of x will never be observed and can be considered unexposed. Otherwise the current value of x is exposed because an operation reads it during recovery or it is needed following recovery. Notice that, given a subset of the operations in a conflict graph that read or write x, the conflict graph induces a partial order on these operations, so the notion of a minimal such operation is well-defined. Let C be a conflict graph and let I be a subset of the operations labeling C. For example, I might be the operations labeling a prefix of C, representing the set of installed operations. A variable x is exposed by I if no operation outside of I accesses x,or

4 some operation outside of I accesses x, and a minimal such operation reads x, and x is unexposed otherwise. In particular, a variable x is unexposed by I if some operation outside of I accesses x, and a minimal such operation writes x without reading x. The conflict graph C grows as operations are performed during normal execution. The set I of installed operation grows as updates made by these operations are written into the state. If the conflict graph C grows and the installed set I does not, then the status of a variable x may flip from exposed to unexposed, but once it becomes unexposed by I, it remains unexposed. On the other hand, if the installed set I grows and the conflict graph C does not, then the status of a variable x can flip back and forth between exposed and unexposed. 2.4 State Graphs A state graph is an abstract representation of a state sequence, just as a conflict graph is an abstraction of an operation sequence. The conflict graph tells us how the changes made by operations in the graph interact, but it does not tell us what these changes are. A state graph models the evolution of the state over time. We show that two operation sequences that have the same conflict graph also have the same conflict state graph. A state graph is a directed, acyclic graph satisfying the following properties: Each node n is labeled with a set ops(n) of operations, and a set writes(n) of variable-value pairs x, v, at most one pair x, v for each variable x, and only pairs x, v for variables x written by operations in ops(n). For each pair of nodes m and n, ifwrites(m) and writes(n) both contain a variable-value pair for variable x, then there is a path from m to n or from n to m. We assume that the sets of operations labeling the nodes are disjoint. We say that node n writes v to x if the set writes(n) labeling n contains the pair x, v. We define vars(n) to be the set of variables x such that x, v writes(n) for some value v. An operation sequence generates a state graph as follows. Given an initial state S 0, the state graph S generated by O 1 O n is obtained by relabeling each node n of the conflict graph generated by O 1 O n as follows: the set ops(n) is the singleton set {O i} where O i is the operation labeling n in the conflict graph, and the set writes(n) is the set of all variable-value pairs x, v such that x is in O i s write set and v is the value of x in S i, where S 0S 1 S n is the state sequence generated by O 1 O n. Notice that the value of x in S i is the value that O i writes when it is performed in state S i 1. Since the operations labeling the conflict graph are distinct, the same is true of the state graph, so we can again refer to nodes by the operations that label them. In the preceding definition, an operation sequence O 1 O n was used to generate a state sequence S 0S 1 S n, and the state sequence was used to generate a state graph S. We can also use the state graph S to recover the states in the sequence S 0S 1 S n. The natural state defined by a state graph S maps each variable x to the last value written to x by any node in the graph. Since the definition of a state graph requires that the nodes writing x are totally ordered, the last value written to x is well-defined. Of course, there may be variables that are never written by any node in S or by any x=0,y=0 x=1,y=0 x=1, y=2 x=3, y=2 O: readset{x} writeset{<x,1>} Q: readset{x} writeset{<x,3>} P: readset{x} writeset{<y,2>} Figure 4: Conflict state graph operation in the sequence O 1 O n used to generate S. Values for these variables must come from the initial state S 0 in the sequence S 0S 1 S n used to generate S. Formally, given an initial state S 0, the state determined by a state graph S maps a variable x to a value v, where x, v writes(n) and n is the last node in S with x vars(n), or to the value of x in S 0 if there is no node n with x vars(n). Since a prefix of a state graph is a state graph, every prefix of the state graph S determines a state, and the next lemma shows that we can reconstruct the state S i in the sequence S 0S 1 S n used to generate S by considering the subgraph of S induced by the operations O 1,...,O i. Lemma 2: If the operation sequence O 1 O n generates a state sequence S 0S 1 S n and a state graph S, then S i is the state determined by the prefix ofs induced by the operations O 1,...,O i. In fact, we can prove that any state determined by any prefix of this state graph is reachable by any total ordering of the operations labeling that prefix. Finally, the state graph generated by an operation sequence depends only on the conflict graph of the operation sequence. The crucial observation is that the state graph depends only on the order of conflicting operations. Since the conflict graph characterizes the order of conflicting operations, the conflict graph uniquely determines a state graph. In light of this, we say that the conflict state graph for a conflict graph C is the state graph generated by any operation sequence generating C, since they are all the same. We call the state determined by the entire conflict graph the final state. Our goal will be to recover the final state. Figure 4 illustrates the conflict graph and its state graph produced by the sequence of operations O (reading x and writing x), P (reading x and writing y) and Q (also reading x and writing x). The solid lines separating the operations indicate the system states determined by the prefixes that the lines identify. The values of the variables of these states are given in the rectangles associated with the lines. Most of the results in this section are intuitively clear or are things that one might expect to be true, but we have established them from first principles. Recovery is essentially the problem of reconstructing state without knowing much about the way the state was originally constructed, and state graphs tell us how much flexibility we have when reconstructing the state.

5 3. POTENTIAL RECOVERABILITY In this section, we identify the states that can be recovered by replaying operations. Given a conflict graph, we say that a state is potentially recoverable if we can replay some subset of the operations in the conflict graph from this state in conflict graph order and end up with the final state determined by the conflict graph. 3.1 Installation Graph What sets of operations can appear as installed in a potentially recoverable state? One trivial answer: if a state contains the operations labeling a prefix of the conflict graph, then we can replay the remaining operations in conflict graph order and recover the state. However, this is not the only answer. In Scenario 2 in the introduction, we saw a conflict graph with a write-read edge from B to A, yet we can take a state with A installed and recover the final state by replaying B. The state containing A is potentially recoverable, yet the set {A} does not determine a prefix of the conflict graph. However, {A} does determine a prefix of a graph we call the installation graph. Prefixes of the installation graph include the prefixes of the conflict graph. Their sets are the sets of operations that can appear in potentially recoverable states. The installation graph is the subgraph of the conflict graph obtained by removing the edges resulting solely from write-read conflicts. In the example of Figure 4, the conflict graph contains an edge from O to P that denotes a write-read conflict. Hence, the installation graph for the example in Figure 4, which is shown in Figure 5, does not have an edge from O to P. Only the edges from O to Q (a write-write and read-write conflict) and from P to Q (a read-write conflict) remain. The removed conflict graph edge is shown by a dotted arrow in Figure 5. It is easy to see that the operations labeling a prefix of the conflict graph induce a prefix of the installation graph, but not vice versa. Every prefix of the installation graph induces a prefix of the installation state graph labeled with the same operations. The state determined by a prefix of the installation graph is the state determined by the corresponding installation state graph prefix. This state contains the final values for all variables written by the operations in that prefix (when the operations are executed in conflict graph order). In Figure 5, the conflict graph edge from O to P does not appear in the installation graph and there is an additional recoverable state that can be identified, denoted by the dashed line separating operation nodes P from nodes O from Q. 3.2 Explainable States Does a potentially recoverable state need to contain values for all of the variables written by the operations in a prefix? Not necessarily. In particular, values of unexposed variables are irrelevant. We say that a variable x is exposed by a prefix of the installation graph if x is exposed by the set of operations labeling the prefix, and x is unexposed otherwise. We say that a prefix σ of the installation graph explains a state S if the exposed variables have the same value in S and the state determined by σ. We call states that are explained by a prefix of the installation graph explainable. An unexposed variable need not contain any specific value. We will show that explainable states are potentially recoverable. 3.3 Replaying Operations Consider just the first step of recovering a state, namely replaying the first operation whose effects do not appear in the state. We say that an operation O is applicable to a state S if the values of variables in O s read set are the same in S and the state determined by O s predecessors in the conflict graph. This means that O will read the same values in S as it did in any execution represented by x=0,y=0 x=1,y=0 x=1, y=2 x=3, y=2 O: readset{x} writeset{<x,1>} Q: readset{x} writeset{<x,3>} P: readset{x} writeset{<y,2>} x=0,y=2 Figure 5: Installation state graph illustrating the removal of write-read edges. the conflict graph, so it will write the same values. An operation O is said to be installed in a state S relative to a particular prefix σ that explains S if O is in σ, and is uninstalled otherwise. A minimal uninstalled operation after σ is a minimal operation in the conflict graph that is not in σ. For example, in Figure 5, the minimal uninstalled operation after O is P, while the minimal uninstalled operation after P, a prefix permitted by the installation graph, is O. Note that in all cases, the minimal uninstalled operation sees exactly the same values for its read set, e.g. operation O sees x = 0 even when operation P is installed before it, as it does when operations are executed in conflict graph order. We denote by σ; O the extension of a prefix σ by a minimal uninstalled operation O, and we denote by S; O the state obtained by applying O to S. We can show that if S is a state explained by σ and if O is a minimal uninstalled operation after σ, then O is applicable to S and S; O is a state explained by σ; O. 3.4 Potentially Recoverable States It should be clear that an explainable state is potentially recoverable. If we can identify an installation graph prefix explaining the state, then we can recover the final state by replaying uninstalled operations in conflict graph order. Theorem 3 (Potential Recoverability Theorem): If S is a state explained by a prefix σ of the installation graph, then S is potentially recoverable. PROOF. We proceed by induction on m 0, the number of uninstalled operations. Suppose m =0. Since there are no uninstalled operations, σ contains every operation in the conflict graph. Since σ explains S and since every variable is exposed, each variable x in S has the value written by the last operation writing x in σ and hence in the conflict graph, so each variable x has the same value in S and the state determined by the conflict graph, so S is the state determined by the conflict graph. Suppose m 0 and the induction hypothesis holds for m 1. Since O is a minimal uninstalled operation, O is applicable to S. Further, σ; O is a prefix of the installation graph that explains S; O. Since the number of uninstalled operations after σ; O is m 1, the induction hypothesis implies that replaying these uninstalled operations against S; O in any order consistent with the conflict graph

6 yields the state determined by the conflict graph. Since replaying O and then replaying these operations amounts to replaying the operations uninstalled after σ in an order consistent with the conflict graph, we are done. 4. RECOVERY So far we have characterized the states from which we can recover. Now we focus on the recovery process itself. We describe an abstract recovery procedure and what is required for this procedure to work. A recovery procedure begins with the state and the log as of the time of the crash. The recovery procedure scans the log and examines each log record in turn. It determines whether the operation mentioned in this record should be replayed, and replays the operation if the answer is yes. When the recovery procedure reaches the end of the log, it has rebuilt the system state, and it terminates. The heart of this procedure is the redo test that recovery uses to determine whether a logged operation should be replayed. During normal operation, the system may take a checkpoint. A checkpoint procedure updates the state and the log so as to identify a point in the log prior to which the effects of logged operations are reflected in the state. If the system crashes at a later point, the recovery procedure need only examine the part of the log following this checkpointed log prefix instead of starting at the beginning of the log. Some recovery procedures have an analysis phase in which they scan the log to find information needed by the redo test to determine whether an operation should be replayed. An analysis phase usually happens at most once, at the start of recovery. We model each of these aspects of recovery as follows. 4.1 The Log In practice, a log is a linear sequence of records, each record containing information about the invocation of an operation, together with additional information about this operation and its invocation, with log records maintained in invocation order. Given a conflict graph C,wedefine a log for C to be any directed, acyclic graph with the following properties: Logged operations are from the conflict graph: Each node (record) is labeled with an operation and possibly some other labels, and the set of operations labeling the conflict graph and the log are the same. Log order is consistent with the conflict order: If there is a path from O to P in the conflict graph, then there is a path from O to P in the log. 4.2 The Checkpoint There are many ways that systems implement a checkpoint. It might be as simple as flushing the log to disk and writing a special checkpoint record to the end of the log, indicating where the recovery procedure should begin, or it could be much more complicated. No matter how the checkpoint is implemented, its function is to identify a set of operations (or log records) that the recovery procedure can ignore. We assume that the recovery procedure is called with a set of operations that have been checkpointed. The checkpointed log records usually constitute a prefix of the log, but that is not required. What is required is that the checkpointed log records denote operations that do not need to be replayed during recovery (because these operations are installed). 4.3 The Analysis We permit the analysis phase of recovery to be arbitrary. We assume the recovery procedure has a function analyze that performs procedure recover(state, log, checkpoint) unrecovered = operations(log) - checkpoint analysis = null while unrecovered is not empty O = minimal operation in unrecovered analysis = analyze(state,log,unrecovered,analysis) state = if redo(o,state,log,analysis) then O(state) else state unrecovered = unrecovered - {O} end while Figure 6: A redo recovery procedure. the analysis phase and returns some value called the analysis. This value might be a simple log position or a complicated data structure. In the simple case, the analysis function might map the state and the log at the start of recovery to a position in the log for the start of recovery (the position of the last checkpoint record, for example). In a more complex case, an analysis phase might be performed once for each operation before invoking the redo test on this operation, e.g. to determine the set of currently unrecovered operations and even the result of the previous analysis phase. To cover this range of options, our model permits the analyze function to map a state, a log, a set of operations (the unrecovered operations), and an earlier analysis to another analysis. 4.4 The Recovery Procedure The heart of the recovery procedure is a redo test that returns true or false, given an operation, a state, a log, and an analysis. For a redo test redo and an analysis function analyze, we define the redo recovery procedure to be the algorithm recover given in Figure 6. The recovery procedure is called with a state, a log, and a checkpoint at the time of the crash. It begins by setting the unrecovered operations to the logged operations not appearing in the checkpoint and then considers the unrecovered operations in log order. For each operation O, the procedure goes through an analysis phase, then it invokes the redo test to determine whether the state should be updated by replaying O. This procedure has an analysis phase at the start of every iteration of the loop. The typical case where there is a single analysis phase at the start of recovery is captured when the analysis phase is invoked with the initial null value and is the identity function otherwise. Given a state, a log, and a checkpoint, the recovery procedure replays some subset of the operations operations(log) in the log. Let U be the set of (uninstalled) operations for which the redo test returned true. Then the set I of remaining operations are the installed operations. For every variable v appearing in the recovery procedure, let v i be the value of v at the end of the ith iteration of the main loop, and let v 0 be the initial value of v at the start of the first iteration. Let redo set = {O i : redo(o i, state i 1,log i 1,analysis i)} be the set of operations replayed during the execution; that is, the set of operations O i such that on the ith iteration of the main loop the redo test returned true when applied to O i. Let installed i = operations(log) (redo set unrecovered i). be the set of all logged operations that will not be redone after the ith iteration of the loop, and hence can be considered installed. We can now prove the procedure recover is correct:

7 Corollary 4 (Recovery Corollary): Given a state, a log, a checkpoint, and an execution of the procedure recover, if the set of installed operations operations(log) redo set induces a prefix of the installation graph that explains state, then recover terminates with the state determined by the conflict graph. PROOF SKETCH. It follows by a simple inductive argument that installed l is a prefix that explains state l at the end of each iteration l. In particular, the recovery procedure terminates with a state explained by the entire installation graph. This means that the exposed variables in this final state have the same value in this state and the state determined by the entire installation graph. Since all variables are left exposed by the entire installation graph, and since the state determined by the installation graph and the conflict graph are equal, the recovery procedure terminates in the state determined by the conflict graph. 4.5 Keeping Systems Recoverable We have just shown that the recovery procedure will recover the system state as long as the following is true: Recovery Invariant The set operations(log) redo set induces a prefix ofthe installation graph that explains the state. Supporting recovery is difficult because the truth of this invariant depends on every system component. The state is what needs to be explained. The log is where the operations are recorded. The checkpoint and the log initialize the set of unrecovered operations in the recovery procedure. The redo test determines which of these unrecovered operations are replayed, and hence what prefix of the installation graph must explain the state. Maintaining this invariant requires that every change to the state be accompanied by a corresponding change to the set of operations that the redo test choses to redo. 5. MANAGING SYSTEM STATE If a system updates the stable state in installation graph order, then the stable state can always be recovered. However, real systems don t update the state with changes made by operations just one operation at a time. When they move pages from the cache to the disk, a page frequently contains changes made by a number of operations. This can make installing operations a bit tricky. Consider the operations E : x y +1 then F : y x +1 then G : x x +1 After executing these operations, suppose the system updates the stable state with the value of x in an attempt to install E and G. Such an update violates a read-write installation graph edge between F and G. Similarly, we cannot update the stable state with the value of y, as that update would violate a read-write installation graph edge from E to F. Stable state is unrecoverable if either x or y is updated singly, since we can t recover the other value by replaying any combination of the operations. The system has to update the stable state with the values of both x and y atomically, hence installing the operations E, F, and G atomically. Accumulating the changes from multiple operations can require the system to update sets of variables atomically. Multi-variable write sets can also require this. However, we can take advantage of unexposed variables to keep these sets of variables from getting too large. Consider the operations H : x x +1;y y +1 then J : y 0 The fact that H writes both x and y appears to force us to update x and y atomically to install H. Further, if we accumulate the changes in y, it would appear that we need to install H and J atomically. However, we notice that J s blind write to y makes y unexposed after H. It follows that we need only update the stable state with the value of x, for us to consider H installed, since y s value is unexposed and does not need to be explained. As these examples illustrate, a system has a great deal of flexibility in how to update the stable state. We now define a write graph and operations on this graph that capture this flexibility. The write graph tells the system what sets of operations must be installed atomically, and what updates to the stable state are required in order to install them. We prove that if the system respects the write graph, then the stable state can always be explained by a prefix of the installation graph, and hence that the stable state can always be recovered. 5.1 Write Graph A write graph is a state graph where each node is labeled with a boolean variable installed such that the nodes with installed set to true form a prefix of the write graph, and where the underlying state graph is obtained from an installation state graph by applying the following operations: Install a node: Set the boolean variable installed labeling a node n to true. Every predecessor of n must have installed set to true. Add an edge: Add a directed edge from a node n to a node m. The node m must have installed set to false, and the resulting graph must be acyclic. Collapse nodes: Replace a set S of nodes with a single node n. The resulting graph must be acyclic. For node n, ops(n) is the union of ops(s) for all s S and writes(n) is the set x, v for which there is an s S satisfying: (i) x, v writes(s) and (ii) if x, v writes(t) for some t S then t is ordered before s in the old graph. There is an edge from m to n in the new graph if there is an edge from m to some s S in the old graph, and an edge from n to m in the new graph if there is an edge from some s S to m in the old graph. The value of installed at n is true iff any node in S has installed set to true. Remove a write: Remove a pair x, v from the set writes(n) labeling a node n. For every node m reading x, either m has installed set to true, or m is ordered before n and a node following n writes x without reading it. (For example, m could be ordered before n due to an edge resulting from the Add an edge operation). The simplest write graph is the installation state graph where each node corresponds to an installation graph node. If the installation graph node is labeled with operation O, then the write graph node is labeled with the variable-value pairs that O writes. A system can keep the state potentially recoverable by installing operations in installation graph order. This corresponds to updating the state by choosing nodes of the write graph in write graph order, and atomically updating the state with the values labeling the node. A system might choose to constrain the order of updates more than they are constrained by an installation graph state graph by adding an edge to the state graph with the Add an edge operation. Further, most systems do not maintain multiple versions of a page in cache, but keep a single copy that reflects the effects of all recent operations on the page. The system can enforce a single copy of a

8 x=0,y=0 x=3, y=2 Collapsed Node O: readset{x} writeset{<x,1>} Ops(n) ) = {O,P} Writes(n) ) = {<x,3>} Q: readset{x} writeset{<x,3>} P: readset{x} writeset{<y,2>} x=0,y=2 Figure 7: Write graph showing the collapse of nodes writing to x. page by merging the cache nodes of the write graph that write to the page with the Collapse nodes operation. Collapsing an uninstalled node with an installed node is frequently the way in which systems update the stable system state. If operations can write to multiple variables (pages), then merging write graph nodes that write a common variable can lead to a single write graph node writing a larger number of variables than any operation does on its own [12], requiring atomic writes to large portions of the state. We can sometimes avoid writing a variable (page) x updated by an operation of the write graph by removing the update to x from a node with the Remove a write operation. This is possible exactly when there is no uninstalled node that reads the value of x being removed. The operations on write graphs guarantee that a prefix of the write graph corresponds to a prefix of the installation graph from which it is derived, and the state determined by a prefix of a write graph is explained by the corresponding prefix of the installation graph. Thus, if we update the state in write graph order, then the state remains potentially recoverable: Corollary 5: The state determined by a prefix of a write graph is potentially recoverable. The identification of the states of write graph prefixes as states of installation state graph prefixes makes it obviously true that cache managers exploiting write graphs keep the system state explainable, and hence potentially recoverable. Figure 7 illustrates a write graph derived from our running example of Figure 4 in which all operations writing to a common set of variables are collapsed. That results in operations O and Q being collapsed, making some recoverable states inaccessible. Note in particular, that the edges of this write graph make it clear that the cache manager needs to write y into the state prior to writing x, since operation P needs to be installed prior to the collapsed operations O and Q. 6. REAL RECOVERY METHODS We now turn our attention to how actual recovery methods maintain the recovery invariant. To demonstrate the generality of our work, we consider three generic log-based recovery technologies logical, physical, and physiological recovery [6] and note that every log-based redo recovery method used today that we are aware of fits into one of these categories. These methods all update the state and make other changes that have the effect of updating redo set, and they make these changes in a way that updates the state and redo set atomically. Since these changes are made atomically, the recovery invariant is maintained, and the recover procedure can always recover the state after a crash. In all recovery methods described below, stable state is represented by a single write graph node, the initial or minimum node of the write graph. This node is the result of collapsing into a single node all of nodes of the state graph labeled with operations that are installed in the stable state. 6.1 Logical Recovery A logical operation may be thought of as a mapping from one database state to the next. At least in concept, the entire database (all its pages) may be read, and the entire database may be written. Hence, we need to atomically transform one database state, in its entirety, to a successor database state. This appears difficult, but in fact this kind of transformation was done previously by System R [5], though its operations were not actually logical operations. In System R, system stable state on disk is unchanged between checkpoints. Pages updated since the last checkpoint are maintained partially in a main memory cache and partially in a disk staging area. Pages in the cache contain the most recent updates to the pages, more recent than the updates in the staging area. The system periodically stops normal execution (quiesces). Prior to quiescing, the staging area pages do not constitute a write graph node. During the quiesce, the more recently updated cached pages are written to the staging area, either as new pages or as updates to existing pages. At this point, the staging area becomes the second node of a two node write graph, the other node being the stable state. (Alternatively, during normal system operation, the updates labeling this second write graph node are split between the cache and the staging area, and it is only after quiescence, after the cached updates have been flushed to the staging area, that the staging area itself contains all of the updates labeling this second node.) A checkpoint record is then appended to the log, indicating that the staging area pages now replace prior versions of these pages in the installed system state. Writing this checkpoint record swings a pointer that atomically installs into stable state all operations logged since the previous checkpoint. This collapses the two write graph nodes into a single node. After a crash, the recovery procedure starts with the stable system state identified in the last checkpoint record. It replays all subsequent log operations against that version. Writing the checkpoint record, therefore, simultaneously installs operations (collapses write graph nodes) and removes these operations from redo set by moving them to the checkpoint set. Since the change to state and the change to redo set are done atomically, the recovery invariant is preserved. 6.2 Physical Recovery Early recovery techniques frequently exploited physical recovery, logging the exact bytes of data and the exact locations written by the logged operations. Physical operations do not read data, they only write. So there are no write-read or read-write conflicts in the installation graph, only write-write conflicts. Both whole and partial page logging have been used [1]. Like logical recovery, all operations logged since a last checkpoint record on the log are replayed during recovery Stable system state is represented by the minimum write graph node. The write graph suffix represents the main memory cache

9 holding the accumulated changes not yet reflected in the stable state. In this write graph suffix, there is at most one node writing to any variable, since updates to a page are accumulated in the cache on a page-by-page basis. The installation graph and corresponding state graph consist of chains of nodes, one chain for each page consisting of all operations that access that page. The write graph, which is formed by collapsing all nodes with operations writing the same page, is an initial node followed by a single write graph node for each page. As with logical logging, checkpointing has the effect of installing operations. But now this is much simpler. The checkpoint process shifts log operations from redo set to the checkpoint set. While these operations are in redo set, the variables they update are unexposed, since physical operations do not read variables. Hence, values of these variables are irrelevant to recovery, so we can set them to whatever values are convenient. Prior to writing the checkpoint record, therefore, we set them to the values they possess in the cache (which includes the effects of the moved operations). Writing the checkpoint record therefore has the effect of atomically removing these operations from redo set and, since their effects are already present in the stable state, installing the operations. This atomic installation preserves the recovery invariant. 6.3 Physiological Recovery A physiological operation reads and writes exactly one page. It identifies the page by a physical page identifier, but performs a logical operation on that page. The operation s log record position is called its log sequence number (LSN). Each page of the system state is tagged with the LSN of the last operation that updated it. The LSN is usually on the page, but there are other ways of tagging pages [13]. LSNs increase monotonically with each new operation. Each update operation on the page sets the page LSN to its LSN. Stable state is again represented by the minimum write graph node. The write graph suffix is the same as for physical recovery. Since operations read and write at most one page, there are edges from the initial node to the subsequent write graph nodes, but there are no edges among these subsequent nodes. Consequently, all of these subsequent nodes are uninstalled minimal nodes, and the system is free to install their operation sets in any order. All the operations of the write graph node are installed into the state atomically when its page is written to disk. This atomic installation is modeled by collapsing a minimal node of the write graph into the initial node. This has the effect of updating the LSN of the page in the initial node with the LSN of the last operation writing to the page. After a crash, the recovery procedure scans the log. For each operation, it compares the LSN tagging the page with LSN of the operation. If the page LSN is at least as high the operation s LSN, then the operation is already installed and is bypassed during recovery. It follows that the LSN updating that occurs when write graph nodes are collapsed also has the effect of updating the set redo set of operations replayed during recovery. Since the change to the state and redo set is atomic, the recovery invariant is maintained. 6.4 Generalized LSN Based Recovery While the restriction of physiological operations to read and write exactly one page simplifies cache management, it is possible to exploit LSNs to deal with a broader class of operations. As with physiological operations, we associate the LSN of an updating operation with each variable (page) in its write set when the operation is executed. Thus, as before, the page LSN denotes the last operation that updated it. Since we are writing the updates of a write x=0,y=0 x=3, y=2 X: Old Page Y: New Page O: readset{x} writes{<x,1> x,1>} Collapsed Ops(n) ) = {O,P} Node Writes(n) ) = {<x,3>} Q: readset{x} writes{<x,3> x,3>} Update old node remove records Node split op Read old X, write new Y P: readset{x} writes{<y,2> y,2>} Write Y before X x=0,y=2 Figure 8: Write graph showing the effect of using log operations that read and write different variables to produce a B-tree split. graph node atomically, all variables in an installed operation s write set must be tagged with an LSN that is at least as large as the LSN of the operation. If an operation is uninstalled, all variables of its write set will have LSNs that are less than the operation s LSN. Exploiting log operations that can read and write different variables is potentially very useful for database recovery [11]. A log operation might deal with node splitting in a B-tree by reading the old full B-tree page and writing a new page with half the contents of the old node. Such an operation avoids physically logging the half of a splitting B-tree node that is used to initialize the value of a new node, which is required when physiological operations are used. A subsequent operation then removes the moved half of the contents from the old node. The cache manager must be sure to enforce that the new B-tree node is written before the old node is over-written by the operation that removes the moved contents and completes the split. The write graph of Figure 8 illustrates exactly this. Operation P has the form of the operation that reads the old page x and writes the new page y, while operation Q has the form of the operation writing the old page x to remove the half of its contents that were moved to new page y. This results in the write graph edge from the node for P to the collapsed node for O and Q. This edge provides for exactly the enforcement of installation graph order of a non-trivial sort by the cache manager. This requirement for careful write order flows directly from our recovery theory. 7. DISCUSSION AND DIRECTIONS Our theory allows us to talk about the system state and accumulated changes to the state, and to talk about installing some of these changes and recovering the remaining changes after a crash. With this, we can precisely state and prove the contract that must exist between normal execution when state is updated and recovery when operations are replayed. This contract is our recovery invariant. We have shown how most common recoverable systems are describable in our model and how they enforce this recovery invariant. Using this understanding, one future direction is to define new classes of logged operations having recovery methods with potential advantages over current methods, especially when extending recovery to new areas [10, 13]. Another direction is to make explicit more of the detail and structure present in real database systems. One example is the existence

Byzantine Consensus in Directed Graphs

Byzantine Consensus in Directed Graphs Byzantine Consensus in Directed Graphs Lewis Tseng 1,3, and Nitin Vaidya 2,3 1 Department of Computer Science, 2 Department of Electrical and Computer Engineering, and 3 Coordinated Science Laboratory

More information

Matching Theory. Figure 1: Is this graph bipartite?

Matching Theory. Figure 1: Is this graph bipartite? Matching Theory 1 Introduction A matching M of a graph is a subset of E such that no two edges in M share a vertex; edges which have this property are called independent edges. A matching M is said to

More information

Handout 9: Imperative Programs and State

Handout 9: Imperative Programs and State 06-02552 Princ. of Progr. Languages (and Extended ) The University of Birmingham Spring Semester 2016-17 School of Computer Science c Uday Reddy2016-17 Handout 9: Imperative Programs and State Imperative

More information

A Correctness Proof for a Practical Byzantine-Fault-Tolerant Replication Algorithm

A Correctness Proof for a Practical Byzantine-Fault-Tolerant Replication Algorithm Appears as Technical Memo MIT/LCS/TM-590, MIT Laboratory for Computer Science, June 1999 A Correctness Proof for a Practical Byzantine-Fault-Tolerant Replication Algorithm Miguel Castro and Barbara Liskov

More information

Database Management System Prof. D. Janakiram Department of Computer Science & Engineering Indian Institute of Technology, Madras Lecture No.

Database Management System Prof. D. Janakiram Department of Computer Science & Engineering Indian Institute of Technology, Madras Lecture No. Database Management System Prof. D. Janakiram Department of Computer Science & Engineering Indian Institute of Technology, Madras Lecture No. # 18 Transaction Processing and Database Manager In the previous

More information

Consistency and Set Intersection

Consistency and Set Intersection Consistency and Set Intersection Yuanlin Zhang and Roland H.C. Yap National University of Singapore 3 Science Drive 2, Singapore {zhangyl,ryap}@comp.nus.edu.sg Abstract We propose a new framework to study

More information

Introduction. Storage Failure Recovery Logging Undo Logging Redo Logging ARIES

Introduction. Storage Failure Recovery Logging Undo Logging Redo Logging ARIES Introduction Storage Failure Recovery Logging Undo Logging Redo Logging ARIES Volatile storage Main memory Cache memory Nonvolatile storage Stable storage Online (e.g. hard disk, solid state disk) Transaction

More information

Binary Decision Diagrams

Binary Decision Diagrams Logic and roof Hilary 2016 James Worrell Binary Decision Diagrams A propositional formula is determined up to logical equivalence by its truth table. If the formula has n variables then its truth table

More information

Repetition Through Recursion

Repetition Through Recursion Fundamentals of Computer Science I (CS151.02 2007S) Repetition Through Recursion Summary: In many algorithms, you want to do things again and again and again. For example, you might want to do something

More information

Lecture 2 - Graph Theory Fundamentals - Reachability and Exploration 1

Lecture 2 - Graph Theory Fundamentals - Reachability and Exploration 1 CME 305: Discrete Mathematics and Algorithms Instructor: Professor Aaron Sidford (sidford@stanford.edu) January 11, 2018 Lecture 2 - Graph Theory Fundamentals - Reachability and Exploration 1 In this lecture

More information

Fundamental Properties of Graphs

Fundamental Properties of Graphs Chapter three In many real-life situations we need to know how robust a graph that represents a certain network is, how edges or vertices can be removed without completely destroying the overall connectivity,

More information

AXIOMS FOR THE INTEGERS

AXIOMS FOR THE INTEGERS AXIOMS FOR THE INTEGERS BRIAN OSSERMAN We describe the set of axioms for the integers which we will use in the class. The axioms are almost the same as what is presented in Appendix A of the textbook,

More information

Abstract. A graph G is perfect if for every induced subgraph H of G, the chromatic number of H is equal to the size of the largest clique of H.

Abstract. A graph G is perfect if for every induced subgraph H of G, the chromatic number of H is equal to the size of the largest clique of H. Abstract We discuss a class of graphs called perfect graphs. After defining them and getting intuition with a few simple examples (and one less simple example), we present a proof of the Weak Perfect Graph

More information

Distributed minimum spanning tree problem

Distributed minimum spanning tree problem Distributed minimum spanning tree problem Juho-Kustaa Kangas 24th November 2012 Abstract Given a connected weighted undirected graph, the minimum spanning tree problem asks for a spanning subtree with

More information

Treewidth and graph minors

Treewidth and graph minors Treewidth and graph minors Lectures 9 and 10, December 29, 2011, January 5, 2012 We shall touch upon the theory of Graph Minors by Robertson and Seymour. This theory gives a very general condition under

More information

(Refer Slide Time 6:48)

(Refer Slide Time 6:48) Digital Circuits and Systems Prof. S. Srinivasan Department of Electrical Engineering Indian Institute of Technology Madras Lecture - 8 Karnaugh Map Minimization using Maxterms We have been taking about

More information

Principles of AI Planning. Principles of AI Planning. 7.1 How to obtain a heuristic. 7.2 Relaxed planning tasks. 7.1 How to obtain a heuristic

Principles of AI Planning. Principles of AI Planning. 7.1 How to obtain a heuristic. 7.2 Relaxed planning tasks. 7.1 How to obtain a heuristic Principles of AI Planning June 8th, 2010 7. Planning as search: relaxed planning tasks Principles of AI Planning 7. Planning as search: relaxed planning tasks Malte Helmert and Bernhard Nebel 7.1 How to

More information

Lecture Notes on Contracts

Lecture Notes on Contracts Lecture Notes on Contracts 15-122: Principles of Imperative Computation Frank Pfenning Lecture 2 August 30, 2012 1 Introduction For an overview the course goals and the mechanics and schedule of the course,

More information

Lecture 1. 1 Notation

Lecture 1. 1 Notation Lecture 1 (The material on mathematical logic is covered in the textbook starting with Chapter 5; however, for the first few lectures, I will be providing some required background topics and will not be

More information

Joint Entity Resolution

Joint Entity Resolution Joint Entity Resolution Steven Euijong Whang, Hector Garcia-Molina Computer Science Department, Stanford University 353 Serra Mall, Stanford, CA 94305, USA {swhang, hector}@cs.stanford.edu No Institute

More information

Leveraging Transitive Relations for Crowdsourced Joins*

Leveraging Transitive Relations for Crowdsourced Joins* Leveraging Transitive Relations for Crowdsourced Joins* Jiannan Wang #, Guoliang Li #, Tim Kraska, Michael J. Franklin, Jianhua Feng # # Department of Computer Science, Tsinghua University, Brown University,

More information

A Reduction of Conway s Thrackle Conjecture

A Reduction of Conway s Thrackle Conjecture A Reduction of Conway s Thrackle Conjecture Wei Li, Karen Daniels, and Konstantin Rybnikov Department of Computer Science and Department of Mathematical Sciences University of Massachusetts, Lowell 01854

More information

CPSC 536N: Randomized Algorithms Term 2. Lecture 10

CPSC 536N: Randomized Algorithms Term 2. Lecture 10 CPSC 536N: Randomized Algorithms 011-1 Term Prof. Nick Harvey Lecture 10 University of British Columbia In the first lecture we discussed the Max Cut problem, which is NP-complete, and we presented a very

More information

The Encoding Complexity of Network Coding

The Encoding Complexity of Network Coding The Encoding Complexity of Network Coding Michael Langberg Alexander Sprintson Jehoshua Bruck California Institute of Technology Email: mikel,spalex,bruck @caltech.edu Abstract In the multicast network

More information

CSCI.6962/4962 Software Verification Fundamental Proof Methods in Computer Science (Arkoudas and Musser) Sections p.

CSCI.6962/4962 Software Verification Fundamental Proof Methods in Computer Science (Arkoudas and Musser) Sections p. CSCI.6962/4962 Software Verification Fundamental Proof Methods in Computer Science (Arkoudas and Musser) Sections 10.1-10.3 p. 1/106 CSCI.6962/4962 Software Verification Fundamental Proof Methods in Computer

More information

Compiler Design Prof. Y. N. Srikant Department of Computer Science and Automation Indian Institute of Science, Bangalore

Compiler Design Prof. Y. N. Srikant Department of Computer Science and Automation Indian Institute of Science, Bangalore Compiler Design Prof. Y. N. Srikant Department of Computer Science and Automation Indian Institute of Science, Bangalore Module No. # 10 Lecture No. # 16 Machine-Independent Optimizations Welcome to the

More information

(Refer Slide Time: 0:19)

(Refer Slide Time: 0:19) Theory of Computation. Professor somenath Biswas. Department of Computer Science & Engineering. Indian Institute of Technology, Kanpur. Lecture-15. Decision Problems for Regular Languages. (Refer Slide

More information

6. Lecture notes on matroid intersection

6. Lecture notes on matroid intersection Massachusetts Institute of Technology 18.453: Combinatorial Optimization Michel X. Goemans May 2, 2017 6. Lecture notes on matroid intersection One nice feature about matroids is that a simple greedy algorithm

More information

Recursively Enumerable Languages, Turing Machines, and Decidability

Recursively Enumerable Languages, Turing Machines, and Decidability Recursively Enumerable Languages, Turing Machines, and Decidability 1 Problem Reduction: Basic Concepts and Analogies The concept of problem reduction is simple at a high level. You simply take an algorithm

More information

CPS352 Lecture - The Transaction Concept

CPS352 Lecture - The Transaction Concept Objectives: CPS352 Lecture - The Transaction Concept Last Revised March 3, 2017 1. To introduce the notion of a transaction and the ACID properties of a transaction 2. To introduce the notion of the state

More information

Concurrency Control / Serializability Theory

Concurrency Control / Serializability Theory Concurrency Control / Serializability Theory Correctness of a program can be characterized by invariants Invariants over state Linked list: For all items I where I.prev!= NULL: I.prev.next == I For all

More information

Incompatibility Dimensions and Integration of Atomic Commit Protocols

Incompatibility Dimensions and Integration of Atomic Commit Protocols The International Arab Journal of Information Technology, Vol. 5, No. 4, October 2008 381 Incompatibility Dimensions and Integration of Atomic Commit Protocols Yousef Al-Houmaily Department of Computer

More information

6.001 Notes: Section 4.1

6.001 Notes: Section 4.1 6.001 Notes: Section 4.1 Slide 4.1.1 In this lecture, we are going to take a careful look at the kinds of procedures we can build. We will first go back to look very carefully at the substitution model,

More information

3.7 Denotational Semantics

3.7 Denotational Semantics 3.7 Denotational Semantics Denotational semantics, also known as fixed-point semantics, associates to each programming language construct a well-defined and rigorously understood mathematical object. These

More information

CS352 Lecture - The Transaction Concept

CS352 Lecture - The Transaction Concept CS352 Lecture - The Transaction Concept Last Revised 11/7/06 Objectives: 1. To introduce the notion of a transaction and the ACID properties of a transaction 2. To introduce the notion of the state of

More information

MA651 Topology. Lecture 4. Topological spaces 2

MA651 Topology. Lecture 4. Topological spaces 2 MA651 Topology. Lecture 4. Topological spaces 2 This text is based on the following books: Linear Algebra and Analysis by Marc Zamansky Topology by James Dugundgji Fundamental concepts of topology by Peter

More information

These are not polished as solutions, but ought to give a correct idea of solutions that work. Note that most problems have multiple good solutions.

These are not polished as solutions, but ought to give a correct idea of solutions that work. Note that most problems have multiple good solutions. CSE 591 HW Sketch Sample Solutions These are not polished as solutions, but ought to give a correct idea of solutions that work. Note that most problems have multiple good solutions. Problem 1 (a) Any

More information

CS2 Algorithms and Data Structures Note 10. Depth-First Search and Topological Sorting

CS2 Algorithms and Data Structures Note 10. Depth-First Search and Topological Sorting CS2 Algorithms and Data Structures Note 10 Depth-First Search and Topological Sorting In this lecture, we will analyse the running time of DFS and discuss a few applications. 10.1 A recursive implementation

More information

V Advanced Data Structures

V Advanced Data Structures V Advanced Data Structures B-Trees Fibonacci Heaps 18 B-Trees B-trees are similar to RBTs, but they are better at minimizing disk I/O operations Many database systems use B-trees, or variants of them,

More information

Project and Production Management Prof. Arun Kanda Department of Mechanical Engineering Indian Institute of Technology, Delhi

Project and Production Management Prof. Arun Kanda Department of Mechanical Engineering Indian Institute of Technology, Delhi Project and Production Management Prof. Arun Kanda Department of Mechanical Engineering Indian Institute of Technology, Delhi Lecture - 8 Consistency and Redundancy in Project networks In today s lecture

More information

An Evolution of Mathematical Tools

An Evolution of Mathematical Tools An Evolution of Mathematical Tools From Conceptualization to Formalization Here's what we do when we build a formal model (or do a computation): 0. Identify a collection of objects/events in the real world.

More information

Scaling Optimistic Concurrency Control by Approximately Partitioning the Certifier and Log

Scaling Optimistic Concurrency Control by Approximately Partitioning the Certifier and Log Scaling Optimistic Concurrency Control by Approximately Partitioning the Certifier and Log Philip A. Bernstein Microsoft Research Redmond, WA, USA phil.bernstein@microsoft.com Sudipto Das Microsoft Research

More information

ACONCURRENT system may be viewed as a collection of

ACONCURRENT system may be viewed as a collection of 252 IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 10, NO. 3, MARCH 1999 Constructing a Reliable Test&Set Bit Frank Stomp and Gadi Taubenfeld AbstractÐThe problem of computing with faulty

More information

The transaction. Defining properties of transactions. Failures in complex systems propagate. Concurrency Control, Locking, and Recovery

The transaction. Defining properties of transactions. Failures in complex systems propagate. Concurrency Control, Locking, and Recovery Failures in complex systems propagate Concurrency Control, Locking, and Recovery COS 418: Distributed Systems Lecture 17 Say one bit in a DRAM fails: flips a bit in a kernel memory write causes a kernel

More information

CS122 Lecture 19 Winter Term,

CS122 Lecture 19 Winter Term, CS122 Lecture 19 Winter Term, 2014-2015 2 Dirty Page Table: Last Time: ARIES Every log entry has a Log Sequence Number (LSN) associated with it Every data page records the LSN of the most recent operation

More information

4.1 Review - the DPLL procedure

4.1 Review - the DPLL procedure Applied Logic Lecture 4: Efficient SAT solving CS 4860 Spring 2009 Thursday, January 29, 2009 The main purpose of these notes is to help me organize the material that I used to teach today s lecture. They

More information

Bases of topologies. 1 Motivation

Bases of topologies. 1 Motivation Bases of topologies 1 Motivation In the previous section we saw some examples of topologies. We described each of them by explicitly specifying all of the open sets in each one. This is not be a feasible

More information

Computer Science Technical Report

Computer Science Technical Report Computer Science Technical Report Feasibility of Stepwise Addition of Multitolerance to High Atomicity Programs Ali Ebnenasir and Sandeep S. Kulkarni Michigan Technological University Computer Science

More information

This is already grossly inconvenient in present formalisms. Why do we want to make this convenient? GENERAL GOALS

This is already grossly inconvenient in present formalisms. Why do we want to make this convenient? GENERAL GOALS 1 THE FORMALIZATION OF MATHEMATICS by Harvey M. Friedman Ohio State University Department of Mathematics friedman@math.ohio-state.edu www.math.ohio-state.edu/~friedman/ May 21, 1997 Can mathematics be

More information

Scribe: Virginia Williams, Sam Kim (2016), Mary Wootters (2017) Date: May 22, 2017

Scribe: Virginia Williams, Sam Kim (2016), Mary Wootters (2017) Date: May 22, 2017 CS6 Lecture 4 Greedy Algorithms Scribe: Virginia Williams, Sam Kim (26), Mary Wootters (27) Date: May 22, 27 Greedy Algorithms Suppose we want to solve a problem, and we re able to come up with some recursive

More information

Matching Algorithms. Proof. If a bipartite graph has a perfect matching, then it is easy to see that the right hand side is a necessary condition.

Matching Algorithms. Proof. If a bipartite graph has a perfect matching, then it is easy to see that the right hand side is a necessary condition. 18.433 Combinatorial Optimization Matching Algorithms September 9,14,16 Lecturer: Santosh Vempala Given a graph G = (V, E), a matching M is a set of edges with the property that no two of the edges have

More information

Greedy Algorithms 1. For large values of d, brute force search is not feasible because there are 2 d

Greedy Algorithms 1. For large values of d, brute force search is not feasible because there are 2 d Greedy Algorithms 1 Simple Knapsack Problem Greedy Algorithms form an important class of algorithmic techniques. We illustrate the idea by applying it to a simplified version of the Knapsack Problem. Informally,

More information

Topology and Topological Spaces

Topology and Topological Spaces Topology and Topological Spaces Mathematical spaces such as vector spaces, normed vector spaces (Banach spaces), and metric spaces are generalizations of ideas that are familiar in R or in R n. For example,

More information

Programming and Data Structure

Programming and Data Structure Programming and Data Structure Dr. P.P.Chakraborty Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Lecture # 09 Problem Decomposition by Recursion - II We will

More information

Reasoning About Programs Panagiotis Manolios

Reasoning About Programs Panagiotis Manolios Reasoning About Programs Panagiotis Manolios Northeastern University March 22, 2012 Version: 58 Copyright c 2012 by Panagiotis Manolios All rights reserved. We hereby grant permission for this publication

More information

An Eternal Domination Problem in Grids

An Eternal Domination Problem in Grids Theory and Applications of Graphs Volume Issue 1 Article 2 2017 An Eternal Domination Problem in Grids William Klostermeyer University of North Florida, klostermeyer@hotmail.com Margaret-Ellen Messinger

More information

V Advanced Data Structures

V Advanced Data Structures V Advanced Data Structures B-Trees Fibonacci Heaps 18 B-Trees B-trees are similar to RBTs, but they are better at minimizing disk I/O operations Many database systems use B-trees, or variants of them,

More information

Week - 04 Lecture - 01 Merge Sort. (Refer Slide Time: 00:02)

Week - 04 Lecture - 01 Merge Sort. (Refer Slide Time: 00:02) Programming, Data Structures and Algorithms in Python Prof. Madhavan Mukund Department of Computer Science and Engineering Indian Institute of Technology, Madras Week - 04 Lecture - 01 Merge Sort (Refer

More information

Spanning tree congestion critical graphs

Spanning tree congestion critical graphs Spanning tree congestion critical graphs Daniel Tanner August 24, 2007 Abstract The linear or cyclic cutwidth of a graph G is the minimum congestion when G is embedded into either a path or a cycle respectively.

More information

System Malfunctions. Implementing Atomicity and Durability. Failures: Crash. Failures: Abort. Log. Failures: Media

System Malfunctions. Implementing Atomicity and Durability. Failures: Crash. Failures: Abort. Log. Failures: Media System Malfunctions Implementing Atomicity and Durability Chapter 22 Transaction processing systems have to maintain correctness in spite of malfunctions Crash Abort Media Failure 1 2 Failures: Crash Processor

More information

Greedy Algorithms 1 {K(S) K(S) C} For large values of d, brute force search is not feasible because there are 2 d {1,..., d}.

Greedy Algorithms 1 {K(S) K(S) C} For large values of d, brute force search is not feasible because there are 2 d {1,..., d}. Greedy Algorithms 1 Simple Knapsack Problem Greedy Algorithms form an important class of algorithmic techniques. We illustrate the idea by applying it to a simplified version of the Knapsack Problem. Informally,

More information

Chapter 3 Trees. Theorem A graph T is a tree if, and only if, every two distinct vertices of T are joined by a unique path.

Chapter 3 Trees. Theorem A graph T is a tree if, and only if, every two distinct vertices of T are joined by a unique path. Chapter 3 Trees Section 3. Fundamental Properties of Trees Suppose your city is planning to construct a rapid rail system. They want to construct the most economical system possible that will meet the

More information

Lecture Notes on Induction and Recursion

Lecture Notes on Induction and Recursion Lecture Notes on Induction and Recursion 15-317: Constructive Logic Frank Pfenning Lecture 7 September 19, 2017 1 Introduction At this point in the course we have developed a good formal understanding

More information

Data Structures and Algorithms Dr. Naveen Garg Department of Computer Science and Engineering Indian Institute of Technology, Delhi.

Data Structures and Algorithms Dr. Naveen Garg Department of Computer Science and Engineering Indian Institute of Technology, Delhi. Data Structures and Algorithms Dr. Naveen Garg Department of Computer Science and Engineering Indian Institute of Technology, Delhi Lecture 18 Tries Today we are going to be talking about another data

More information

Lecture Notes on Memory Layout

Lecture Notes on Memory Layout Lecture Notes on Memory Layout 15-122: Principles of Imperative Computation Frank Pfenning André Platzer Lecture 11 1 Introduction In order to understand how programs work, we can consider the functions,

More information

Computing Full Disjunctions

Computing Full Disjunctions Computing Full Disjunctions (Extended Abstract) Yaron Kanza School of Computer Science and Engineering The Hebrew University of Jerusalem yarok@cs.huji.ac.il Yehoshua Sagiv School of Computer Science and

More information

Functional Programming in Haskell Prof. Madhavan Mukund and S. P. Suresh Chennai Mathematical Institute

Functional Programming in Haskell Prof. Madhavan Mukund and S. P. Suresh Chennai Mathematical Institute Functional Programming in Haskell Prof. Madhavan Mukund and S. P. Suresh Chennai Mathematical Institute Module # 02 Lecture - 03 Characters and Strings So, let us turn our attention to a data type we have

More information

6.001 Notes: Section 8.1

6.001 Notes: Section 8.1 6.001 Notes: Section 8.1 Slide 8.1.1 In this lecture we are going to introduce a new data type, specifically to deal with symbols. This may sound a bit odd, but if you step back, you may realize that everything

More information

Representation of Finite Games as Network Congestion Games

Representation of Finite Games as Network Congestion Games Representation of Finite Games as Network Congestion Games Igal Milchtaich To cite this version: Igal Milchtaich. Representation of Finite Games as Network Congestion Games. Roberto Cominetti and Sylvain

More information

Time and Space Lower Bounds for Implementations Using k-cas

Time and Space Lower Bounds for Implementations Using k-cas Time and Space Lower Bounds for Implementations Using k-cas Hagit Attiya Danny Hendler September 12, 2006 Abstract This paper presents lower bounds on the time- and space-complexity of implementations

More information

Propositional Logic. Part I

Propositional Logic. Part I Part I Propositional Logic 1 Classical Logic and the Material Conditional 1.1 Introduction 1.1.1 The first purpose of this chapter is to review classical propositional logic, including semantic tableaux.

More information

Lemma. Let G be a graph and e an edge of G. Then e is a bridge of G if and only if e does not lie in any cycle of G.

Lemma. Let G be a graph and e an edge of G. Then e is a bridge of G if and only if e does not lie in any cycle of G. Lemma. Let G be a graph and e an edge of G. Then e is a bridge of G if and only if e does not lie in any cycle of G. Lemma. If e = xy is a bridge of the connected graph G, then G e consists of exactly

More information

5 MST and Greedy Algorithms

5 MST and Greedy Algorithms 5 MST and Greedy Algorithms One of the traditional and practically motivated problems of discrete optimization asks for a minimal interconnection of a given set of terminals (meaning that every pair will

More information

Mathematically Rigorous Software Design Review of mathematical prerequisites

Mathematically Rigorous Software Design Review of mathematical prerequisites Mathematically Rigorous Software Design 2002 September 27 Part 1: Boolean algebra 1. Define the Boolean functions and, or, not, implication ( ), equivalence ( ) and equals (=) by truth tables. 2. In an

More information

Programming in C++ Prof. Partha Pratim Das Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur

Programming in C++ Prof. Partha Pratim Das Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Programming in C++ Prof. Partha Pratim Das Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Lecture - 33 Overloading Operator for User - Defined Types: Part 1 Welcome

More information

Chapter 17: Recovery System

Chapter 17: Recovery System Chapter 17: Recovery System! Failure Classification! Storage Structure! Recovery and Atomicity! Log-Based Recovery! Shadow Paging! Recovery With Concurrent Transactions! Buffer Management! Failure with

More information

Failure Classification. Chapter 17: Recovery System. Recovery Algorithms. Storage Structure

Failure Classification. Chapter 17: Recovery System. Recovery Algorithms. Storage Structure Chapter 17: Recovery System Failure Classification! Failure Classification! Storage Structure! Recovery and Atomicity! Log-Based Recovery! Shadow Paging! Recovery With Concurrent Transactions! Buffer Management!

More information

Advanced Combinatorial Optimization September 17, Lecture 3. Sketch some results regarding ear-decompositions and factor-critical graphs.

Advanced Combinatorial Optimization September 17, Lecture 3. Sketch some results regarding ear-decompositions and factor-critical graphs. 18.438 Advanced Combinatorial Optimization September 17, 2009 Lecturer: Michel X. Goemans Lecture 3 Scribe: Aleksander Madry ( Based on notes by Robert Kleinberg and Dan Stratila.) In this lecture, we

More information

Maximal Monochromatic Geodesics in an Antipodal Coloring of Hypercube

Maximal Monochromatic Geodesics in an Antipodal Coloring of Hypercube Maximal Monochromatic Geodesics in an Antipodal Coloring of Hypercube Kavish Gandhi April 4, 2015 Abstract A geodesic in the hypercube is the shortest possible path between two vertices. Leader and Long

More information

Lecture Notes on Binary Decision Diagrams

Lecture Notes on Binary Decision Diagrams Lecture Notes on Binary Decision Diagrams 15-122: Principles of Imperative Computation William Lovas Notes by Frank Pfenning Lecture 25 April 21, 2011 1 Introduction In this lecture we revisit the important

More information

Advanced Algorithms Class Notes for Monday, October 23, 2012 Min Ye, Mingfu Shao, and Bernard Moret

Advanced Algorithms Class Notes for Monday, October 23, 2012 Min Ye, Mingfu Shao, and Bernard Moret Advanced Algorithms Class Notes for Monday, October 23, 2012 Min Ye, Mingfu Shao, and Bernard Moret Greedy Algorithms (continued) The best known application where the greedy algorithm is optimal is surely

More information

hal , version 1-11 May 2006 ccsd , version 1-11 May 2006

hal , version 1-11 May 2006 ccsd , version 1-11 May 2006 Author manuscript, published in "Journal of Combinatorial Theory Series A 114, 5 (2007) 931-956" BIJECTIVE COUNTING OF KREWERAS WALKS AND LOOPLESS TRIANGULATIONS OLIVIER BERNARDI ccsd-00068433, version

More information

AXIOMS OF AN IMPERATIVE LANGUAGE PARTIAL CORRECTNESS WEAK AND STRONG CONDITIONS. THE AXIOM FOR nop

AXIOMS OF AN IMPERATIVE LANGUAGE PARTIAL CORRECTNESS WEAK AND STRONG CONDITIONS. THE AXIOM FOR nop AXIOMS OF AN IMPERATIVE LANGUAGE We will use the same language, with the same abstract syntax that we used for operational semantics. However, we will only be concerned with the commands, since the language

More information

5 MST and Greedy Algorithms

5 MST and Greedy Algorithms 5 MST and Greedy Algorithms One of the traditional and practically motivated problems of discrete optimization asks for a minimal interconnection of a given set of terminals (meaning that every pair will

More information

An Anonymous Self-Stabilizing Algorithm For 1-Maximal Matching in Trees

An Anonymous Self-Stabilizing Algorithm For 1-Maximal Matching in Trees An Anonymous Self-Stabilizing Algorithm For 1-Maximal Matching in Trees Wayne Goddard, Stephen T. Hedetniemi Department of Computer Science, Clemson University {goddard,hedet}@cs.clemson.edu Zhengnan Shi

More information

FOUR EDGE-INDEPENDENT SPANNING TREES 1

FOUR EDGE-INDEPENDENT SPANNING TREES 1 FOUR EDGE-INDEPENDENT SPANNING TREES 1 Alexander Hoyer and Robin Thomas School of Mathematics Georgia Institute of Technology Atlanta, Georgia 30332-0160, USA ABSTRACT We prove an ear-decomposition theorem

More information

2386 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 52, NO. 6, JUNE 2006

2386 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 52, NO. 6, JUNE 2006 2386 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 52, NO. 6, JUNE 2006 The Encoding Complexity of Network Coding Michael Langberg, Member, IEEE, Alexander Sprintson, Member, IEEE, and Jehoshua Bruck,

More information

On the Adequacy of Program Dependence Graphs for Representing Programs

On the Adequacy of Program Dependence Graphs for Representing Programs - 1 - On the Adequacy of Program Dependence Graphs for Representing Programs Susan Horwitz, Jan Prins, and Thomas Reps University of Wisconsin Madison Abstract Program dependence graphs were introduced

More information

MATH 682 Notes Combinatorics and Graph Theory II

MATH 682 Notes Combinatorics and Graph Theory II 1 Matchings A popular question to be asked on graphs, if graphs represent some sort of compatability or association, is how to associate as many vertices as possible into well-matched pairs. It is to this

More information

3 No-Wait Job Shops with Variable Processing Times

3 No-Wait Job Shops with Variable Processing Times 3 No-Wait Job Shops with Variable Processing Times In this chapter we assume that, on top of the classical no-wait job shop setting, we are given a set of processing times for each operation. We may select

More information

Network Topology and Equilibrium Existence in Weighted Network Congestion Games

Network Topology and Equilibrium Existence in Weighted Network Congestion Games Network Topology and Equilibrium Existence in Weighted Network Congestion Games Igal Milchtaich, Bar-Ilan University August 2010 Abstract. Every finite noncooperative game can be presented as a weighted

More information

These notes present some properties of chordal graphs, a set of undirected graphs that are important for undirected graphical models.

These notes present some properties of chordal graphs, a set of undirected graphs that are important for undirected graphical models. Undirected Graphical Models: Chordal Graphs, Decomposable Graphs, Junction Trees, and Factorizations Peter Bartlett. October 2003. These notes present some properties of chordal graphs, a set of undirected

More information

Notes for Recitation 8

Notes for Recitation 8 6.04/8.06J Mathematics for Computer Science October 5, 00 Tom Leighton and Marten van Dijk Notes for Recitation 8 Build-up error Recall a graph is connected iff there is a path between every pair of its

More information

11.9 Connectivity Connected Components. mcs 2015/5/18 1:43 page 419 #427

11.9 Connectivity Connected Components. mcs 2015/5/18 1:43 page 419 #427 mcs 2015/5/18 1:43 page 419 #427 11.9 Connectivity Definition 11.9.1. Two vertices are connected in a graph when there is a path that begins at one and ends at the other. By convention, every vertex is

More information

Chapter S:V. V. Formal Properties of A*

Chapter S:V. V. Formal Properties of A* Chapter S:V V. Formal Properties of A* Properties of Search Space Graphs Auxiliary Concepts Roadmap Completeness of A* Admissibility of A* Efficiency of A* Monotone Heuristic Functions S:V-1 Formal Properties

More information

arxiv:submit/ [math.co] 9 May 2011

arxiv:submit/ [math.co] 9 May 2011 arxiv:submit/0243374 [math.co] 9 May 2011 Connectivity and tree structure in finite graphs J. Carmesin R. Diestel F. Hundertmark M. Stein 6 May, 2011 Abstract We prove that, for every integer k 0, every

More information

CS103 Spring 2018 Mathematical Vocabulary

CS103 Spring 2018 Mathematical Vocabulary CS103 Spring 2018 Mathematical Vocabulary You keep using that word. I do not think it means what you think it means. - Inigo Montoya, from The Princess Bride Consider the humble while loop in most programming

More information

On Resolution Proofs for Combinational Equivalence Checking

On Resolution Proofs for Combinational Equivalence Checking On Resolution Proofs for Combinational Equivalence Checking Satrajit Chatterjee Alan Mishchenko Robert Brayton Department of EECS U. C. Berkeley {satrajit, alanmi, brayton}@eecs.berkeley.edu Andreas Kuehlmann

More information

A GRAPH FROM THE VIEWPOINT OF ALGEBRAIC TOPOLOGY

A GRAPH FROM THE VIEWPOINT OF ALGEBRAIC TOPOLOGY A GRAPH FROM THE VIEWPOINT OF ALGEBRAIC TOPOLOGY KARL L. STRATOS Abstract. The conventional method of describing a graph as a pair (V, E), where V and E repectively denote the sets of vertices and edges,

More information

STORING IMAGES WITH FRACTAL IMAGE COMPRESSION

STORING IMAGES WITH FRACTAL IMAGE COMPRESSION STORING IMAGES WITH FRACTAL IMAGE COMPRESSION TYLER MCMILLEN 1. Introduction and Motivation One of the obvious limitations to the storage of images in a form a computer can read is storage space, memory.

More information