Proving the Correctness of Distributed Algorithms using TLA

Khushboo Kanjani, khush@cs.tamu.edu, Texas A&M University
11 May 2007

Abstract

This work is a summary of the Temporal Logic of Actions (TLA), proposed by Leslie Lamport as a language for specifying and verifying concurrent systems. Dijkstra's self-stabilizing mutual exclusion algorithm is discussed to demonstrate the use of TLA.

1 Introduction

Formal methods are mathematically based techniques for the specification, development and verification of software and hardware systems. Formal verification is the act of proving the correctness of algorithms with respect to a property, using formal methods of mathematics. There are two approaches to formal verification, as defined in [1]:

Model Checking: This technique relies on building a finite model of a system and checking that a desired property holds in that model. The check is performed as an exhaustive state space search, which is guaranteed to terminate since the model is finite.

Theorem Proving: Theorem proving is the process of finding a proof of a property from the axioms of the system. The behavior of the system and its desired properties are expressed as formulas in some mathematical logic.

The Temporal Logic of Actions (TLA) is one such logic, aimed at proving the correctness of multiprocess programs.
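The exhaustive search behind model checking can be sketched in a few lines. The following is a minimal illustration, not taken from [1]: the function name `check_invariant` and the toy counter model are assumptions made here for the example. It explores every reachable state of a finite model breadth-first and reports the first state that violates a given invariant.

```python
from collections import deque

def check_invariant(initial_states, next_states, invariant):
    """Explore a finite model exhaustively. Return a reachable state that
    violates the invariant, or None if the invariant holds everywhere."""
    seen = set(initial_states)
    queue = deque(seen)
    while queue:
        state = queue.popleft()
        if not invariant(state):
            return state
        for succ in next_states(state):
            if succ not in seen:      # each state is visited at most once,
                seen.add(succ)        # so the search terminates
                queue.append(succ)
    return None

# Hypothetical toy model: a counter that cycles through 0..4.
def counter_next(c):
    return [(c + 1) % 5]

print(check_invariant([0], counter_next, lambda c: c < 5))   # None
print(check_invariant([0], counter_next, lambda c: c != 3))  # 3
```

Termination follows from the `seen` set: a finite model has finitely many states, and none is enqueued twice.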

The properties that define the correctness of a program are often described in temporal logic. The following is a brief overview of the kinds of logic:

1.1 Logic

Binary Logic has two boolean values, True and False.

Propositional Logic adds the following operators to binary logic:
- $\land$: conjunction (and)
- $\lor$: disjunction (or)
- $\lnot$: negation (not)
- $\Rightarrow$: implication (implies)
- $\equiv$: equivalence

First-Order (Predicate) Logic extends propositional logic with two quantifiers:
- $\exists$: existential quantification (there exists)
- $\forall$: universal quantification (for all)

Temporal Logic quantifies in terms of time and has the following two operators:
- $\Diamond$: now or sometime in the future
- $\Box$: now and forever

Time is viewed as a sequence of states in temporal logic. The Temporal Logic of Actions (TLA) is a combination of two logics: a logic of actions and standard temporal logic. In TLA, the program and its properties are written in the same language. The behavior of the program is written as a temporal formula $\sigma$. To prove that the program satisfies a property $P$, it is sufficient to prove that $\sigma \Rightarrow P$.

1.2 Related Work

The other formal methods based on temporal logic are Unity Logic [4], the logic of Manna and Pnueli [11], and process algebra as developed by Hoare [2] and Milner [9].

Unity logic is based on assertions of the form $\{p\}\, s\, \{q\}$, which denote that the execution of statement $s$ in any state satisfying predicate $p$ results in a state satisfying predicate $q$. Properties of a program are expressed in terms of the basic operators unless, invariant, ensures and $\mapsto$ (leads-to).
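An assertion of the form {p} s {q} can be checked by brute force when the set of states is finite. The sketch below is only an illustration under that assumption: the helper `holds_triple`, the integer states and the increment statement are hypothetical and do not come from [4].

```python
def holds_triple(p, s, q, states):
    """Check {p} s {q} over a finite collection of states: every state
    satisfying p must be mapped by statement s to a state satisfying q."""
    return all(q(s(state)) for state in states if p(state))

# Hypothetical example: states are the integers 0..9, statement is x := x + 1.
states = range(10)
increment = lambda x: x + 1

print(holds_triple(lambda x: x >= 0, increment, lambda x: x > 0, states))  # True
print(holds_triple(lambda x: x >= 0, increment, lambda x: x > 5, states))  # False
```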

The language of temporal logic defined by Manna and Pnueli [11] is built from a state language, used to construct state formulas, and a set of logical and temporal operators. General temporal formulas are constructed by applying the logical and temporal operators to the state formulas.

Process algebra provides a tool for the high-level description of interactions, communications, and synchronizations between a collection of independent processes. Examples are Hoare's Communicating Sequential Processes (CSP) [2] and Milner's Calculus of Communicating Systems (CCS) [9].

2 Definitions

This section gives the definitions used in the logic. The semantic meaning of every object $T$ in the logic is denoted by $[[T]]$. The semantic meanings of state functions, predicates, actions, etc. are stated in Figure 1, taken from [6], in the appendix.

1. Values, Variables and States: A set Val of all possible values of variables is assumed. It includes sets such as the set Nat of natural numbers. The booleans true and false do not belong to Val. The set Var is an infinite set of all variable names. A state is a mapping from Var to Val: a state $s$ assigns a value $s[[x]]$ to a variable $x$. St is the collection of all possible states.
2. State Functions: A state function is a non-boolean expression built from variables and constants. For example: $x + y + 3$.
3. State Predicates: A state predicate is a boolean expression built from variables and constant symbols. For example: $x + y = 1$ and $x, y \in Nat$.
4. Actions: An action represents an atomic operation in a concurrent program. It is a relation between unprimed variables (referring to the old state) and primed variables (referring to the new state after the action is executed). For example: $y' = x + y + 1$. $s[[A]]t$ is true if executing the operation $A$ in state $s$ produces state $t$.

5. Validity: An action $A$ is valid, written $\models A$, if it relates every pair of states. Formally: $\models A \triangleq \forall s, t \in St : s[[A]]t$
6. Rigid Variables: A variable whose value does not change during the execution of the program is called a rigid variable.
7. Enabled Predicate: For any action $A$, the predicate $Enabled\ A$ is defined as follows: $s[[Enabled\ A]] \triangleq \exists t \in St : s[[A]]t$
8. Unchanged Action: For a state function $f$, the action $Unchanged\ f$ describes a step in which the value of $f$ does not change. Formally: $Unchanged\ f \triangleq f' = f$

3 TLA

In TLA, the specification of the system and its desired properties are both stated as TLA formulas. A TLA formula is true or false on a behavior, which is a sequence of states, where a state is an assignment of values to variables.

3.1 Specification

A specification is a formal description of the desired behavior of a program. Writing one can be divided into two steps:
- State the variables that define the system's state.
- State the granularity of the steps that change those variables' values.
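Before turning to the example, the definitions of Section 2 can be made concrete on a small finite universe. The sketch below is an illustration only: restricting to two variables with values 0..3, and taking arithmetic mod 4, are assumptions made here so that all states can be enumerated. In TLA the set Val is not finite and no such restriction exists.

```python
from itertools import product

# Assumed finite universe for illustration: variables x, y ranging over 0..3.
VARS = ("x", "y")
VALUES = range(4)
STATES = [dict(zip(VARS, vals)) for vals in product(VALUES, repeat=len(VARS))]

def action(s, t):
    """The action y' = x + y + 1 as a relation on (old, new) states.
    Arithmetic is taken mod 4 only to keep the universe finite."""
    return t["y"] == (s["x"] + s["y"] + 1) % 4

def enabled(a, s):
    """s[[Enabled A]]: some state t exists with s[[A]]t."""
    return any(a(s, t) for t in STATES)

def unchanged(f):
    """Unchanged f: the action f' = f for a state function f."""
    return lambda s, t: f(t) == f(s)

def valid(a):
    """|= A: s[[A]]t holds for every pair of states s, t."""
    return all(a(s, t) for s in STATES for t in STATES)

s = {"x": 1, "y": 2}
t = {"x": 3, "y": 0}
print(action(s, t))                         # True: (1 + 2 + 1) % 4 == 0
print(enabled(action, s))                   # True
print(unchanged(lambda st: st["x"])(s, t))  # False: x changed from 1 to 3
print(valid(action))                        # False
```

Representing an action as a two-argument predicate mirrors the notation s[[A]]t directly: the first argument plays the role of the unprimed variables and the second that of the primed ones.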

P_1: while true do
       if x_1 = x_n then x_1 := (x_1 + 1) mod (n + 1) end
     end

P_i (i ≠ 1): while true do
       if x_i ≠ x_{i-1} then x_i := x_{i-1} end
     end

Algorithm 1: Dijkstra's self-stabilizing algorithm for mutual exclusion (ME)

Example: Here we give a TLA specification of Dijkstra's well-known self-stabilizing algorithm for mutual exclusion in a ring, described in Algorithm 1. The notation used here is explained in Figure 1. Equation 1 describes the initial condition of the variables. Equation 2 states that for all $i \in [0, N]$, $i \neq 1$, if the value of $x_i$ is not equal to that of its left neighbor, it is assigned that value when process $P_i$ is activated. For $P_1$, equation 3 states that the value of $x_1$ is incremented if it is equal to $x_n$. In equation 4, $w$ is defined as the state function consisting of all the variables in the program. The TLA formulas $C_1, C_2, \ldots, C_n$ describe the behavior of the processes $P_1, P_2, \ldots, P_n$ respectively. All possible executions of the program satisfy the temporal formula defined in equation 6.

$Init_\phi \triangleq \forall i \le n,\ 0 \le x_i \le n$   (1)

$\forall i \in [0, N],\ i \neq 1,\quad C_i \triangleq (x_i \neq x_{i-1}) \land (x_i' = x_{i-1}) \land Unchanged\ \langle AllBut(x_i) \rangle$   (2)

$C_1 \triangleq (x_1 = x_n) \land (x_1' = (x_1 + 1) \bmod (n + 1)) \land Unchanged\ \langle AllBut(x_1) \rangle$   (3)

$w \triangleq \langle x_1, x_2, \ldots, x_n \rangle$   (4)

$C \triangleq C_1 \lor C_2 \lor \ldots \lor C_n$   (5)

$\phi \triangleq Init_\phi \land \Box[C]_w$   (6)
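The actions of Algorithm 1 can be executed directly to observe stabilization. The following sketch is an illustration, not part of the specification above: the function names, the random scheduler, the seed, the ring size n = 5 and the step count are all assumptions chosen for the example. A state is a tuple holding x_1, ..., x_n, and at each step one enabled process is picked at random.

```python
import random

def enabled(x):
    """1-based indices of the processes whose guard holds in state x."""
    n = len(x)
    procs = [1] if x[0] == x[n - 1] else []                        # guard of P_1
    procs += [i for i in range(2, n + 1) if x[i - 1] != x[i - 2]]  # guards of P_i
    return procs

def step(x, i):
    """State reached when process i executes its action; all other
    variables are left unchanged."""
    n = len(x)
    y = list(x)
    y[i - 1] = (x[0] + 1) % (n + 1) if i == 1 else x[i - 2]
    return tuple(y)

def run(x, steps, rng):
    """Execute `steps` actions, each chosen at random among the enabled ones."""
    for _ in range(steps):
        x = step(x, rng.choice(enabled(x)))
    return x

rng = random.Random(0)   # fixed seed, chosen arbitrarily
n = 5
start = tuple(rng.randrange(n + 1) for _ in range(n))  # arbitrary configuration
final = run(start, 1000, rng)
print(start, enabled(start))
print(final, enabled(final))   # a single enabled process after stabilization
```

At least one guard holds in every state (if no P_i with i ≠ 1 is enabled, all values are equal and P_1 is enabled), so `rng.choice` never receives an empty list.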

3.2 Safety Properties

Safety properties assert that something bad never happens. For example, for the problem of mutual exclusion, the safety property is that at most one processor is in the critical section. For the self-stabilizing Algorithm 1, mutual exclusion is guaranteed if only one processor is allowed to change its value. In other words, only one of $C_1, C_2, \ldots, C_n$ is enabled. Safety properties are usually described as invariance properties, with TLA formulas of the form $\Box P$ where $P$ is a predicate. These invariance properties are proved with rule INV1 of Figure 2.

3.3 Fairness Properties

Weak fairness asserts that an action must eventually either be executed or become impossible to execute, even if only briefly. Strong fairness rules out that last condition: either the action is eventually executed, or its execution is eventually always impossible. For an action $A$ and state function $f$, weak fairness (WF) and strong fairness (SF) are expressed as follows:

$WF_f(A) = (\Box\Diamond \langle A \rangle_f) \lor (\Box\Diamond \lnot Enabled\ \langle A \rangle_f)$   (7)

$SF_f(A) = (\Box\Diamond \langle A \rangle_f) \lor (\Diamond\Box \lnot Enabled\ \langle A \rangle_f)$   (8)

For Algorithm 1, starting from an arbitrary initial configuration, the program eventually reaches a safe configuration in which only one processor changes its value. The program guarantees $WF_w(C)$.

4 Verification of the Byzantine Generals algorithm

In [8], the one-traitor oral-message solution to the Byzantine Generals problem is verified using TLA. The specification is divided into three levels and a hierarchical proof is presented. The high-level specification defines the problem statement. The mid-level specification captures the oral-message solution to the problem, which works in the presence of at most one traitor; the underlying communication is ignored. The low-level specification models the way values are transmitted over communication channels. All three specifications are long and so cannot be included here.
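Unlike those specifications, the fairness formulas (7) and (8) of Section 3.3 are short enough to illustrate directly. The sketch below is an assumption-laden illustration rather than a TLA tool: it considers only behaviors that eventually repeat a finite cycle of states forever, for which "infinitely often" means "somewhere in the cycle" and "eventually always" means "everywhere in the cycle". The action, its enabling condition and the example behaviors are hypothetical.

```python
def lasso_steps(cycle):
    """Steps (s, t) that recur forever when `cycle` repeats indefinitely."""
    return [(cycle[k], cycle[(k + 1) % len(cycle)]) for k in range(len(cycle))]

def weak_fair(cycle, a_step, a_enabled):
    """WF: the action step occurs infinitely often, or the action is
    disabled infinitely often."""
    return (any(a_step(s, t) for s, t in lasso_steps(cycle))
            or any(not a_enabled(s) for s in cycle))

def strong_fair(cycle, a_step, a_enabled):
    """SF: the action step occurs infinitely often, or the action is
    eventually always disabled."""
    return (any(a_step(s, t) for s, t in lasso_steps(cycle))
            or all(not a_enabled(s) for s in cycle))

# Hypothetical action on states (count, ready): when ready, increment count.
a_step = lambda s, t: s[1] and t[0] == s[0] + 1
a_enabled = lambda s: s[1]

# The flag toggles forever but the counter is never incremented.
toggling = [(0, True), (0, False)]
print(weak_fair(toggling, a_step, a_enabled))    # True
print(strong_fair(toggling, a_step, a_enabled))  # False
```

The toggling behavior separates the two notions: the action is disabled infinitely often, which satisfies weak fairness, but it is also enabled infinitely often without ever being taken, which violates strong fairness.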

5 Developments

TLA+ [7] provides a language for writing TLA specifications. It can be used for a wide class of systems, from program interfaces (APIs) to distributed systems. It is an extension of TLA that contains operators for defining and manipulating data structures, and syntactic structures for handling large specifications. The syntax for expressions in TLA+ aims to capture some of the richness of ordinary mathematical notation. However, a precise specification in TLA+ can get very long and complicated. TLA+ is good for software and hardware engineers and of little use to researchers concentrating on the design of algorithms.

6 Comments

TLA is good as a formal method for verifying systems, but I feel that it is not good for proving the correctness of distributed algorithms. The designer of an algorithm has an intuition of why the algorithm is correct. TLA only gives a language to specify the behavior of the program. If the behavior is specified correctly, the safety and liveness proofs follow directly by applying the TLA rules. Capturing the complete behavior of the algorithm can get long and complicated. I believe informal proofs give better insight into the correctness of the algorithm.

Some points to be noted about TLA:
- Booleans are distinct from the values of any variable, and so state predicates are different from state functions.
- The variables in TLA have no types. Type-correctness is a provable property and not a syntactic requirement for specifying programs in TLA.
- A specification of a multiprocess program can be decomposed as a conjunction of its processes.
- The rules stated in Figure 2, as described in [6], form a complete proof system for reasoning about programs in TLA.
- There is no distinction between a program and a property in TLA.

References

[1] E.M. Clarke and J.M. Wing. Formal methods: State of the art and future directions. ACM Computing Surveys, 1996.
[2] C.A.R. Hoare. Communicating Sequential Processes. Prentice-Hall International, London, 1985.
[3] Rajeev Joshi, Leslie Lamport, John Matthews, Serdar Tasiran, Mark Tuttle, and Yuan Yu. Checking cache-coherence protocols with TLA+. Formal Methods in System Design, 2003.
[4] K.M. Chandy and J. Misra. Parallel Program Design. Addison-Wesley, 1988.
[5] Leslie Lamport. Proving the correctness of multiprocess programs. IEEE Transactions on Software Engineering, 1977.
[6] Leslie Lamport. The temporal logic of actions. ACM Transactions on Programming Languages and Systems, pages 1-52, 1993.
[7] Leslie Lamport. Specifying Systems: The TLA+ Language and Tools for Hardware and Software Engineers. Addison-Wesley, 2003.
[8] Leslie Lamport and Stephan Merz. Specifying and verifying fault-tolerant systems. International Symposium on Formal Techniques in Real-Time and Fault-Tolerant Systems, 1994.
[9] Robin Milner. A complete inference system for a class of regular behaviors. Journal of Computer and System Sciences, 28:439-466, 1984.
[10] Joao Luis Sobrinho. An algebraic theory of dynamic network routing. ACM Transactions on Networking, 2004.
[11] Z. Manna and A. Pnueli. The Temporal Logic of Reactive and Concurrent Systems. Springer-Verlag, New York, 1991.

Figure 1: Syntax of TLA

Figure 2: Proof Rules of TLA

Figure 3: Quantification in TLA