Efficient Byzantine Broadcast in Wireless Ad-Hoc Networks


Vadim Drabkin, Roy Friedman, Marc Segal
Computer Science Department, Technion - Israel Institute of Technology, Haifa, 32 Israel
{dvadim,roy,marcs}@cs.technion.ac.il

Abstract

This paper presents an overlay based Byzantine tolerant broadcast protocol for wireless ad-hoc networks. The use of an overlay results in a significant reduction in the number of messages. The protocol overcomes Byzantine failures by combining digital signatures, gossiping of message signatures, and failure detectors. These ensure that messages dropped or modified by Byzantine nodes will be detected and retransmitted, and that the overlay will eventually consist of enough correct processes to enable message dissemination. An appealing property of the protocol is that it only requires the existence of one correct node in each one-hop neighborhood. The paper also includes a detailed performance evaluation by simulation.

Keywords: Byzantine failures, broadcast, ad-hoc networks, unreliable failure detectors.

Revised version of technical report TR. This research was supported by the Israeli Science Foundation and by an academic grant from Intel Inc.

Technion - Computer Science Department - Technical Report CS

1 Introduction

Context of this Study: Wireless ad-hoc networks are formed when an ad-hoc collection of devices equipped with wireless communication capabilities happen to be in proximity to each other [46]. Clearly, each pair of such devices whose distance is less than their transmission range can communicate directly with each other. Moreover, if some devices occasionally volunteer to act as forwarders, it is possible to form a multiple hop ad-hoc network. An important element distinguishing these networks from standard networks is that they do not rely on any pre-existing infrastructure or management authority. Also, due to their ad-hoc nature and device mobility, there is no sub-netting to assist routing and data dissemination decisions. Moreover, due to mobility, the physical structure of the network is constantly evolving.

Semi-reliable broadcast is a basic service for many collaborative applications, as it provides nearly reliable dissemination of the same information to many recipients. It ensures that most messages will be received by most of their intended recipients. Yet, implementing semi-reliable broadcast in an efficient manner, and in particular over a wireless ad-hoc network, is far from trivial. It involves ensuring that a message is forwarded to all nodes as well as overcoming possible message losses.

Unlike infrastructure-based networks, in which routers are usually considered to be trusted entities, in ad-hoc networks routing is performed by the devices themselves. Thus, there is a high risk that some of the nodes of an ad-hoc network will act in a Byzantine manner, or in other words, will not respect the networking protocols. This can be due to maliciousness, or simply selfishness (trying to save battery power). The possibility of having Byzantine nodes in the system therefore motivates the development of Byzantine tolerant broadcast protocols for ad-hoc networks.
The simplest way to obtain broadcast in a multiple hop network is by employing flooding [45]. That is, the sender sends the message to everyone in its transmission range. Each device that receives a message for the first time delivers it to the application and also forwards it to all other devices in its range. While this form of dissemination is very robust, it is also very wasteful and may cause a large number of collisions.

Hence, many multicast/broadcast protocols maintain an overlay, which can be thought of as a logical topology superimposed over the physical one, e.g., [25, 4, 47, 48]. The overlay typically covers all nodes, yet each node has a limited number of neighbors. Given an overlay, broadcast messages are flooded only along the arcs of the overlay, thereby reducing the number of messages sent as well as the number of collisions. The overlay composition and structure may be determined by either deterministic or probabilistic methods, and they can change dynamically over time.

On the other hand, having an efficient overlay reduces the robustness of the broadcast protocol against failures, and in particular against Byzantine behavior of overlay nodes. One way around this is to maintain f + 1 node-independent overlays, where f is the assumed maximal number of Byzantine devices, and flood each message along each of these overlays, guaranteeing that each message will eventually arrive despite possible Byzantine nodes [15, 34, 36]. Of course, the price paid by this approach is that every message has to be sent f + 1 times, even if in practice none of the devices suffered from a Byzantine fault. In this paper we propose an approach that reduces this overhead to a single overlay when there are no Byzantine failures.

Contribution of this Work: This paper presents an efficient Byzantine tolerant broadcast protocol for wireless ad-hoc networks. The protocol is based on the following principles: The protocol employs an overlay on which messages are disseminated.
In parallel, signatures of these messages are gossiped by all nodes in the system in an unstructured manner. This allows all nodes to learn about the existence of a message even if some of the overlay nodes fail to forward it, e.g., if they are Byzantine or due to collisions. When a node learns about a message it is missing, it requests the missing message from another node that has it. The benefit of this approach comes from the fact that message signatures are typically much smaller than the messages themselves. Moreover, as gossips are sent periodically, multiple gossip messages are aggregated into one packet, thereby greatly reducing the number of messages generated by the protocol.

Additionally, the protocol employs several failure detectors in order to eliminate from the overlay nodes that act in a noticeably Byzantine manner. Specifically, we rely on a mute failure detector, a verbose failure detector, and a trust failure detector. The mute failure detector detects when a process has failed to send a message with an expected header [17, 18]. The verbose failure detector detects when a node sends messages too often. Finally, the trust failure detector reports suspicions of faulty behavior of nodes based on the other two failure detectors and the history of nodes (see footnote 1).

An interesting property of the failure detectors we use is that they only detect benign failures, such as a failure to send a message with an expected header, sending too many messages, or trying to forge a signed message. They do not detect, for example, sending messages with inconsistent data, or sending messages with different data to different processes. Thus, their properties can be detected locally and they can be implemented in an eventually synchronous environment, such as the timed-asynchronous model [16], regardless of the ratio between the number of Byzantine processes and the entire set of processes. Interestingly, combining this with signatures on messages is enough to overcome Byzantine failures.

An important aspect of our failure detector based approach is its modularity, as the failure detectors encapsulate timing requirements behind a timeless functional specification.
The use of failure detectors greatly simplifies the protocol's structure and enables us to present it with an asynchronous design. This is often considered more elegant and robust than synchronous alternatives, in which timing assumptions are explicit. The result is a protocol that sends a small number of messages when all nodes behave correctly most of the time.

The paper also includes a detailed performance evaluation of this protocol carried out by simulation. In these simulations, we investigate the behavior of the protocol both in failure free runs and when some nodes experience mute failures, as these failures seem to have the most adverse impact on the protocol's performance.

Paper's road-map: The model and basic definitions and assumptions are described in Section 2. Section 3 describes the protocol and its proof of correctness. The results of the performance evaluation are given in Section 4. Section 5 compares our work with related work. We conclude with a discussion in Section 6.

2 System Model and Definitions

In this work we focus on wireless mobile systems. Specifically, we assume a collection of nodes placed in a given finite size area. A node in the system is a device owning an omni-directional antenna that enables wireless communication. A transmission of a node p can be received by all nodes within a disk centered on p whose radius depends on the transmission power, referred to in the following as the transmission disk; the radius of the transmission disk is called the transmission range. The combination of the nodes and the transitive closure of their transmission disks forms a wireless ad-hoc network (see footnote 2).

Footnote 1: Notice that standard definitions of failure detectors require some properties to hold forever. However, this can be bounded along the lines of [24].

Footnote 2: In practice, the transmission range does not behave exactly as a disk due to various physical phenomena. However, this does not matter for the description of the protocol, while a disk assumption greatly simplifies the formal model. In any event, our simulations are carried out on a simulator that models real transmission range behavior, including distortions, background noise, etc.
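To make the disk model concrete, the following minimal Python sketch computes which nodes lie inside a node's transmission disk and which nodes are reachable within k hops. It is only an illustration under the idealized disk assumption of footnote 2; the function names, the 2D coordinates, and the example values are hypothetical and do not come from the paper.

```python
import math
from collections import deque

def direct_neighbors(positions, ranges, p):
    """Nodes strictly inside p's transmission disk (idealized disk model)."""
    px, py = positions[p]
    return {
        q for q, (qx, qy) in positions.items()
        if q != p and math.hypot(qx - px, qy - py) < ranges[p]
    }

def reachable_within(positions, ranges, p, k):
    """Nodes reachable from p in at most k hops, found by breadth-first search."""
    seen = {p}
    frontier = deque([(p, 0)])
    while frontier:
        node, hops = frontier.popleft()
        if hops == k:
            continue  # do not expand beyond k hops
        for q in direct_neighbors(positions, ranges, node):
            if q not in seen:
                seen.add(q)
                frontier.append((q, hops + 1))
    return seen - {p}

# Hypothetical layout: four nodes on a line, all with transmission range 4.
positions = {"a": (0, 0), "b": (3, 0), "c": (6, 0), "d": (20, 0)}
ranges = {"a": 4, "b": 4, "c": 4, "d": 4}
one_hop = direct_neighbors(positions, ranges, "a")
two_hop = reachable_within(positions, ranges, "a", 2)
```

In this layout, node d is outside every disk, so it is not part of the ad-hoc network that a, b, and c form.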

We denote the transmission range of device p by r_p. This means that a node q can only receive messages sent by p if the distance between p and q is smaller than r_p. A node q is a direct neighbor of another node p if q is located within the transmission disk of p. In the following, N(1, p) refers to the set of direct neighbors of a node p and N(k, p) refers to the transitive closure with length k of N(1, p). By considering N(1, p) as a relation (defining the set N(1, p)), we say that a node p has a path to a node q if q appears in the transitive closure of the N(1, p) relation.

As nodes can physically move, there is no guarantee that a neighbor q of p at time t will remain in the transmission disk of p at a later time t' > t. Additionally, messages can be lost. For example, if two nodes p and q transmit a message at the same time and there exists a node r that is a direct neighbor of both, then r will not receive either message, in which case we say that there was a collision. Yet, we assume that a message is delivered with positive probability.

Each device p holds a private key k_p, known only to itself, with which p can digitally sign every message it sends [44]. It is also assumed that each device can obtain the public key of every other device, and can thus authenticate the sender of any signed message.

Finally, we assume an abstract entity called an overlay, which is simply a collection of nodes. Nodes that belong to the overlay are called overlay nodes. Nodes that do not belong to the overlay are called non-overlay nodes. In the following, OVERLAY refers to the set of nodes that belong to the overlay and OL(1, p) = N(1, p) ∩ OVERLAY (the neighbors of p that belong to the overlay). Later in this paper we give examples of a couple of known overlay maintenance protocols that we adapted to our environment.

2.1 Byzantine Failures

Up to f out of the total of n nodes in the system may be Byzantine, meaning that they can arbitrarily deviate from their protocol.
In particular, Byzantine processes may fail to send messages, send too many messages, send messages with false information, or send messages with different data to different nodes. We also assume that correct and Byzantine processes are spread such that the transitive closure of the transmission disks of correct nodes forms a connected graph (clearly, without this assumption, it is impossible to ensure dissemination of messages to all correct nodes). In Section 3.4 we refine this requirement. Yet, a node cannot impersonate another node, which is achieved using digital signatures [44] (see footnote 3).

Nodes that follow their protocol are called correct. If a node is correct, then it is presumed to be correct throughout the execution of the protocol. A node p that sends a message m is called the originator of m. We denote by sig(m) the cryptographic signature of a message m.

Footnote 3: In the implementation of our protocol we use the DSA protocol [44]. For brevity, we do not repeat a discussion about the infrastructure required for this, as this can be found in many papers and text-books on the use of cryptography.

2.2 Failure Detectors and Nodes Architecture

As already mentioned in the Introduction, we assume that each node is equipped with three types of failure detectors, MUTE, VERBOSE, and TRUST (see also the illustration in Figure 1). In this work we assume that each message has a header part and a data part. The header part can be anticipated based on local information only, while the data part cannot. For example, the type of a message (application data, gossip, request for retransmission, etc.), the id of the originator, and a sequence number of the message are part of the header. On the other hand, the information that the application level intended to send, or the actual gossiped information, is part of the data.

[Figure 1: Node Architecture. Components shown: appl, overlay, multicast, network, FD interceptor, and the MUTE, VERBOSE, and TRUST failure detectors.]

Based on this, we define a mute failure as a failure to send a message with an expected header w.r.t. the protocol. Similarly, a verbose failure is sending messages too often w.r.t. the protocol. Note that both types of failures can be detected accurately in a synchronous system based on local knowledge only. This is because in synchronous systems each message has a known bounded deadline, so it is possible to tell that a message is missing. Similarly, it is possible to accurately measure the rate of messages received and verify that it is below an agreed upon threshold.

Obtaining synchronous communication in ad-hoc networks with standard hardware and operating systems is extremely difficult. On the other hand, observations of communication networks indicate that they tend to behave in a timely manner for large fractions of the time. This is captured by the notion of the class P_mute of failure detectors [5, 17, 18, 23]. Such failure detectors are assumed to eventually (i.e., during periods of timely network behavior) detect mute failures accurately. In this eventuality, all nodes that suffer a mute failure are suspected (known as completeness) and only such nodes are suspected (known as accuracy). This approach has the benefit that all synchrony assumptions are encapsulated behind the functional specification of the failure detector (i.e., its ability to eventually detect mute failures in an accurate manner). This also frees protocols that are based on such failure detectors from the implementation details related to timers and timeouts, thus making them both more general and more robust.

In a similar manner to P_mute, we define P_verbose as a class of failure detectors that eventually reliably detect verbose failures. We assume that the failure detector MUTE is in the class P_mute while VERBOSE is in the class P_verbose. The TRUST failure detector collects the reports of MUTE and VERBOSE, as well as detections of messages with bad signatures and other locally observable deviations from the protocol.
In return, TRUST maintains a trust level for each neighboring node. This information is fed into the overlay, as illustrated in Figure 1. As we describe later in the paper, the information obtained from TRUST is used to ensure that there are enough correct nodes in the overlay, so that the correct nodes of the overlay form a connected graph and each correct node is within the transmission disk of an overlay node that does not exhibit detectable Byzantine behavior.

Interval Failure Detectors

Since the specification of P failure detectors requires the accuracy property to hold from some point on forever, they are not practical in a real long running system. Hence, we present a new type of failure detectors called interval failure detectors. We define I_mute as the class of failure detectors that detect mute failures that occur during special intervals, called mute intervals, for the duration of an interval called the suspicion interval.

Formally, the I_mute failure detector is defined by two properties:

Interval Strong Accuracy: Non-mute processes are not suspected by any other correct process during a certain interval that we call the suspicion free interval.

Interval Local Completeness: Every process p that suffers a mute failure with respect to a correct process q during a mute interval is suspected by q during a suspicion interval.

In a similar manner to I_mute, we define I_verbose as a class of failure detectors that detect verbose failures. In Section 3.4 we show that if the failure detector belongs to an interval failure detector class, then during periods of good connectivity messages will be disseminated fast (i.e., via the overlay nodes).

2.3 The Broadcast Problem

Intuitively, the broadcast problem states that a message sent by a correct node should usually be delivered to all correct nodes. We capture this by the eventual dissemination and the validity properties. Eventual dissemination specifies the ability of a protocol to disseminate a message to all the nodes in the system. Validity specifies that when a correct node accepts a message, then this message was indeed generated by the claimed originator. Formally, we assume a primitive broadcast(p, m) that can be invoked by a node p in order to disseminate a message m to all other nodes in the system, and a primitive accept(p, q, m) in which a message claimed to be originated by q is accepted at a node p.

Eventual dissemination: If a correct node p invokes broadcast(p, ·) infinitely often, then eventually every correct node q invokes accept(q, p, ·) infinitely often (see footnote 4).

Validity: If a correct node p invokes accept(p, q, m) and q is correct, then q indeed invoked broadcast(q, m) beforehand. Moreover, for the same message m, a correct node p can only invoke accept(p, q, m) once.

3 The Dissemination Protocol

As indicated in the Introduction, our protocol includes three concurrent tasks.
First, messages are disseminated over the overlay by the overlay nodes. Second, signatures of sent messages are gossiped among all nodes in the system. This allows all nodes to learn about the existence of messages they did not receive, either due to collisions or due to Byzantine behavior by an overlay node. When a node p discovers that it misses a message following a gossip it heard from q, then p requests the missing message from q as well as from its overlay neighbors. The third and final task is the maintenance of the overlay, whose goal is to ensure that the evolving overlay indeed disseminates messages to all correct nodes. Note that the dissemination and recovery tasks are independent of the overlay maintenance. In any event, for performance reasons, most overlay maintenance messages can be piggybacked on gossip messages.

As the protocol and overlay rely on failure detectors, we first describe the interface to these failure detectors in Figure 2 and in Section 3.1. The pseudo-code of the main protocol appears in Figures 3 and 4 and is described in detail in Section 3.2. These figures use two primitives. The primitive broadcast denotes a broadcast of a message with a given TTL value, i.e., it reaches through flooding all nodes in the corresponding hop distance from the sender. The primitive lazycast initiates periodic broadcasting of the given message only to the immediate neighbors of the sender. The overlay maintenance is described in Section 3.3.

Footnote 4: Clearly, with this property it is possible to implement a reliable delivery mechanism. In order to bound the buffers used by such a mechanism, it is common to use flow control mechanisms.

MUTE: expect(message header, set of nodes, one or all)
  This method notifies the MUTE failure detector about an expected message. It accepts as parameters the expected message header, the set of nodes that are supposed to send the message, and a one or all indication. The latter parameter indicates whether ALL nodes are assumed to send the message or only ONE of them.

VERBOSE: indict(node id)
  This method indicts the node with the given node id for being too verbose. It causes the VERBOSE failure detector to increment the suspicion level of node id.

TRUST: suspect(node id, suspicion reason)
  This method notifies the TRUST failure detector that the level of trust of the node node id should be reduced based on the provided suspicion reason.

Figure 2: Failure Detectors Interface

3.1 Interfacing with the Failure Detectors

Recall that the goal of the MUTE failure detector is to detect when a process fails to send a message with a header it is supposed to send. To notify this failure detector about such messages, its interface includes one method called expect (see Figures 1 and 2). This method accepts as parameters a message header to look for, a set of nodes that are supposed to send this message, and an indication of whether all of these nodes must send the message or only one of them is enough. Note that the header passed to this method can include wildcards as well as exact values for each of the header's fields.

In this paper we do not focus on how such a failure detector is implemented. Intuitively, a simple implementation consists of setting a timeout for each message reported to the failure detector with the expect method. When the timer times out, the corresponding nodes that failed to send anticipated messages are suspected for a certain period of time (see discussion in [17, 18]).
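The timeout-based implementation hinted at above could look roughly like the following Python sketch. It is not the authors' implementation: the class name, the explicit `now` arguments (used instead of real timers so the example is deterministic), the `observe` and `tick` methods, and the exact-match header comparison (no wildcard support) are all assumptions made for illustration. Only `expect` and its three parameters come from Figure 2.

```python
class MuteDetector:
    """Timeout-based sketch of a MUTE failure detector (hypothetical design)."""

    def __init__(self, timeout, suspicion_period):
        self.timeout = timeout                    # time allowed for an expected message
        self.suspicion_period = suspicion_period  # how long a suspicion lasts
        self.pending = []                         # outstanding expectations
        self.suspected_until = {}                 # node id -> suspicion expiry time

    def expect(self, header, nodes, mode, now):
        """Register an expected header from `nodes`; mode is 'ONE' or 'ALL'."""
        self.pending.append({"deadline": now + self.timeout, "header": header,
                             "waiting": set(nodes), "mode": mode})

    def observe(self, header, sender):
        """Record that `sender` transmitted a message carrying `header`."""
        for entry in self.pending:
            if entry["header"] == header and sender in entry["waiting"]:
                if entry["mode"] == "ONE":
                    entry["waiting"].clear()        # one sender is enough
                else:
                    entry["waiting"].discard(sender)
        # Drop expectations that are fully satisfied.
        self.pending = [e for e in self.pending if e["waiting"]]

    def tick(self, now):
        """Suspect every node whose expected message is overdue at time `now`."""
        still_pending = []
        for entry in self.pending:
            if now >= entry["deadline"]:
                for node in entry["waiting"]:
                    self.suspected_until[node] = now + self.suspicion_period
            else:
                still_pending.append(entry)
        self.pending = still_pending

    def is_suspected(self, node, now):
        return self.suspected_until.get(node, float("-inf")) > now

# Hypothetical usage: q and r must both send the header; only q does.
fd = MuteDetector(timeout=5, suspicion_period=10)
header = ("DATA", "originator-1", 7)
fd.expect(header, {"q", "r"}, "ALL", now=0)
fd.observe(header, "q")
fd.tick(now=5)
```

Because suspicions expire after `suspicion_period`, a node that was suspected by mistake during a period of poor connectivity is eventually given another chance, which matches the "suspected for a certain period of time" behavior described above.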
The goal of the VERBOSE failure detector is to detect verbose nodes. Such nodes try to overload the system by sending too many messages, which may cause other nodes to react with messages of their own, thereby degrading the performance of the system. Detecting such nodes is therefore useful in order to allow nodes to stop reacting to messages from these nodes. Similarly to MUTE, the VERBOSE failure detector also gets hints from the broadcast protocol about what would constitute a verbose fault. For this, the interface of VERBOSE exports one method called indict. This method simply indicts a process that has sent too many messages of a certain type. Practically, we assume that VERBOSE maintains a counter for each node that was listed in any invocation of its method. The counter is incremented on each such event, and after a given threshold, the node is considered to be a suspect.

VERBOSE also includes a method for specifying general requirements about the minimal spacing between consecutive arrivals of messages of the same type. Such a method is typically invoked at initialization time. As it is not directly accessed by our protocol's code, we do not discuss it any further.

In order to recover from mistakes, both the MUTE and the VERBOSE failure detectors employ an aging mechanism. That is, the suspicion counters for each node are periodically decremented.

Finally, the TRUST failure detector maintains a trust level for every node known to it. TRUST suspects a node q if q is suspected by either the MUTE failure detector or the VERBOSE failure detector, or if the suspect method has been invoked for q.

3.2 The Main Protocol

The Dissemination Task in Detail

Dissemination consists of the following steps (described from the point of view of a node p):

(1) The originator p of a message m sends m || sig(m), i.e., m concatenated with its signature, to all nodes in N(1, p). The header part of m includes a sequence number and the identifier of the originator.

(2) The originator p of m then gossips sig(m) to all nodes in N(1, p).

(3) When a node p receives a message m for the first time, p first verifies that sig(m) matches m. If it does, then p accepts m. If the node that sent m is not an originator of m and is not in OL(1, p), p instructs its MUTE failure detector to expect a transmission of m by any of its overlay neighbors. Moreover, if p is also an overlay node, then p forwards m to all nodes in N(1, p). However, if m does not fit sig(m), then m is ignored and the process that sent it is suspected by the TRUST failure detector.

(4) If a node p receives a message m it has already received beforehand, then m is ignored.

Gossiping and Message Recovery in Detail

Intuitively, the idea here is that nodes gossip about messages they received (or sent) to all their neighbors. This way, if a node hears a gossip about a message that it has never received, it can explicitly request the message both from its overlay neighbor and from the node from which it received the gossip. If any of the contacted nodes has the message, it forwards it to the requesting node. Messages can be purged either after a timeout, or by using a stability detection mechanism.
In this work, we have chosen to use timeout based purging due to its simplicity.

Additionally, there are several mechanisms in place to overcome Byzantine failures (in addition to signatures that detect impersonations). In order to prevent a Byzantine overlay node from blocking the dissemination of a message, the search for a missing message can be initiated by limited flooding with TTL=2, which ensures that the recovery request will reach beyond a single Byzantine overlay node. This is in addition to requesting the message from the process that gossiped about its existence. Also, when a node determines that it has received a request for a missing message too often, or that such a request is unjustified, it notifies its VERBOSE failure detector about it.

More accurately, the gossiping and message recovery task is composed of the following subtasks:

1. When a node p receives a gossip header(m) for a message m it has already received before, then p gossips header(m) to other nodes in N(1, p). Otherwise, p ignores such gossips. In particular, p only gossips about messages it has already received and does not forward gossips about messages it has not received yet. This is done in order to make the recovery process more efficient, and in order to help detect mute failures more accurately (see footnote 5).

Footnote 5: It is possible to piggyback the first gossip of a message by the sender and by overlay nodes on the actual message. This saves one message and makes the recovery of messages a bit faster, since gossips about messages advance slightly faster this way. For clarity of presentation, we separate these two types of messages in the pseudo-code.

Upon send(msg) by application do
(1)   message := msg_id || node_id || msg || sig(msg_id || node_id || msg);
(2)   gossip_message := msg_id || node_id || sig(msg_id || node_id);
(3)   broadcast(message, DATA, ttl=1);
(4)   lazycast(gossip_message, GOSSIP, ttl=1);

Upon receive(message, DATA, ttl) sent by p_j do
(5)   if (have not received this message before) then
(6)     if (authenticate-signature(message) = TRUE) then
(7)       Accept(p_i, p_j, message); /* forward it to the application */
(8)       if (p_j ∉ OVERLAY and p_j is not originator of message) then
(9)         /* The correct message was received (but not from the overlay node) */
(10)        MUTE.expect(message.header, OL(1, current_node), any);
(11)      endif;
(12)      if (current_node ∈ OVERLAY) then
(13)        broadcast(message, DATA, 1);
(14)      else /* the message is correct and I am not in the overlay */
(15)        if (ttl = 2) then
(16)          broadcast(message, DATA, ttl-1);
(17)        endif;
(18)      endif;
(19)      if (already received a gossip message about message before) then
(20)        lazycast(gossip_message, GOSSIP, ttl=1);
(21)      endif;
(22)    else /* the message is not correct */
(23)      TRUST.suspect(p_j, "bad signature reason"); /* notify the trust failure detector */
(24)    endif;
(25)  endif;

Upon receive(gossip_message, GOSSIP, ttl) sent by p_j do
(26)  if (authenticate-signature(gossip_message) = TRUE) then
(27)    if (there is no message that fits the gossip_message) then
(28)      MUTE.expect(gossip_message.header, p_j, any);
(29)      if (p_j is not originator of message that fits the gossip_message) then
(30)        /* The node asks the node that sent the gossip_message and the overlay nodes to */
(31)        /* send the real message */
(32)        broadcast(gossip_message, REQUEST_MSG, ttl=1, p_j);
(33)      endif;
(34)    else /* the message that fits the gossip_message was received */
(35)      if (gossip_message has not been sent yet) then
(36)        lazycast(gossip_message, GOSSIP, ttl=1);
(37)      endif;
(38)    endif;
(39)  else /* the message is not correct */
(40)    TRUST.suspect(p_j, "bad signature reason");
(41)  endif;

Figure 3: Byzantine Dissemination Algorithm (|| denotes concatenation)
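As a rough executable reading of the DATA handler in Figure 3 (lines 5 to 25), the Python sketch below records the actions a node would take instead of performing real network I/O. It is an illustration only: the `Node` class, the dictionary-based message layout, the `log` list, and the `verify` callable standing in for signature authentication are assumptions, and the overlay test on line 8 is approximated by the node's local view of its overlay neighbors, OL(1, p).

```python
class Node:
    """Sketch of the DATA-message handler of Figure 3 (hypothetical structure)."""

    def __init__(self, node_id, in_overlay, overlay_neighbors, verify):
        self.node_id = node_id
        self.in_overlay = in_overlay
        self.overlay_neighbors = set(overlay_neighbors)  # local view of OL(1, p)
        self.verify = verify        # placeholder for authenticate-signature
        self.received = {}          # message id -> message
        self.gossips_seen = set()   # ids for which a gossip was already heard
        self.log = []               # actions recorded instead of real I/O

    def on_data(self, msg, ttl, sender):
        if msg["id"] in self.received:
            return                                              # line 5: duplicate, ignore
        if not self.verify(msg):
            self.log.append(("TRUST.suspect", sender))          # line 23
            return
        self.received[msg["id"]] = msg
        self.log.append(("accept", msg["id"]))                  # line 7
        if sender not in self.overlay_neighbors and sender != msg["originator"]:
            self.log.append(("MUTE.expect", msg["id"]))         # line 10
        if self.in_overlay:
            self.log.append(("broadcast", msg["id"], 1))        # line 13
        elif ttl == 2:
            self.log.append(("broadcast", msg["id"], ttl - 1))  # line 16
        if msg["id"] in self.gossips_seen:
            self.log.append(("lazycast", msg["id"]))            # line 20

# Hypothetical run: an overlay node hears a valid message from a non-overlay
# neighbor, then a duplicate, then a message whose signature check fails.
node = Node("p", in_overlay=True, overlay_neighbors={"o1"},
            verify=lambda m: m["sig_ok"])   # toy check, not real cryptography
good = {"id": ("s", 1), "originator": "s", "sig_ok": True}
bad = {"id": ("s", 2), "originator": "s", "sig_ok": False}
node.on_data(good, ttl=1, sender="x")
node.on_data(good, ttl=1, sender="o1")
node.on_data(bad, ttl=1, sender="x")
```

Recording actions in a log keeps the sketch testable; a real node would replace the log entries with calls to the broadcast and lazycast primitives and to the failure detector interface of Figure 2.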

Upon receive(missing_message, REQUEST_MSG, ttl, p_k) sent by p_j do
(42)  if (authenticate-signature(missing_message) = TRUE) then
(43)    if (current_node ∈ OVERLAY or current_node = p_k) then
(44)      if (message that matches missing_message was received) then
(45)        if (current_node ∈ OVERLAY) then
(46)          VERBOSE.indict(p_j);
(47)        endif;
(48)        broadcast(message, DATA, ttl=1, p_j);
(49)      else /* the message that fits the gossip_message was not received */
(50)        if (p_j is not originator of the message that matches missing_message) then
(51)          if (current_node ∈ OVERLAY) then
(52)            broadcast(missing_message, FIND_MISSING_MSG, 2, p_k);
(53)          endif;
(54)        else
(55)          VERBOSE.indict(p_j);
(56)        endif;
(57)      endif;
(58)    endif;
(59)  else /* the message is not correct */
(60)    TRUST.suspect(p_j, "bad signature reason");
(61)  endif;

Upon receive(missing_message, FIND_MISSING_MSG, ttl, p_k) sent by p_j do
(62)  if (authenticate-signature(missing_message) = TRUE) then
(63)    if (message that matches missing_message was not received) then
(64)      if (ttl = 2) then
(65)        broadcast(missing_message, FIND_MISSING_MSG, ttl-1);
(66)      endif;
(67)    else /* message that matches missing_message was received */
(68)      if (current_node ∈ OVERLAY or current_node = p_k) then
(69)        if (p_j ∈ N(1, current_node)) then
(70)          if (current_node ∈ OVERLAY) then
(71)            VERBOSE.indict(p_j);
(72)          endif;
(73)          broadcast(message, DATA, 1);
(74)        else
(75)          broadcast(message, DATA, 2);
(76)        endif;
(77)      endif;
(78)    endif;
(79)  else /* the message is not correct */
(80)    TRUST.suspect(p_j, "bad signature reason");
(81)  endif;

Figure 4: Byzantine Dissemination Algorithm, continued
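The VERBOSE.indict calls in Figure 4 feed the per-node counters described in Section 3.1. A minimal Python sketch of such a detector, with a suspicion threshold and periodic aging, might look as follows. The class name, the `age` and `is_suspected` methods, the choice to suspect a node once its counter reaches (rather than exceeds) the threshold, and the example values are assumptions for illustration; only `indict` comes from the interface in Figure 2.

```python
class VerboseDetector:
    """Counter-based sketch of a VERBOSE failure detector (hypothetical design)."""

    def __init__(self, threshold, decay=1):
        self.threshold = threshold  # counter value at which a node becomes a suspect
        self.decay = decay          # amount subtracted from each counter when aging
        self.counters = {}          # node id -> current suspicion counter

    def indict(self, node_id):
        """Increment the suspicion counter of `node_id`."""
        self.counters[node_id] = self.counters.get(node_id, 0) + 1

    def age(self):
        """Periodic aging: decrement every counter and forget those that reach zero."""
        for node_id in list(self.counters):
            self.counters[node_id] -= self.decay
            if self.counters[node_id] <= 0:
                del self.counters[node_id]

    def is_suspected(self, node_id):
        return self.counters.get(node_id, 0) >= self.threshold

# Hypothetical usage: q is indicted three times, then one aging period passes.
verbose = VerboseDetector(threshold=3)
for _ in range(3):
    verbose.indict("q")
suspected_before_aging = verbose.is_suspected("q")
verbose.age()
suspected_after_aging = verbose.is_suspected("q")
```

Aging lets a node that was indicted during a burst of legitimate retransmission requests drop back below the threshold, which is the recovery-from-mistakes behavior the text attributes to both MUTE and VERBOSE.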

2. When p receives a gossip header(m) for a message m it has not received, p asks its overlay neighbors and the sender q of the gossip to forward m to it using a REQUEST_MSG message. p also instructs its MUTE failure detector to expect a transmission of m by q. Intuitively, since q gossiped about m, it should have m and supply it when needed. If q gossips about messages that do not exist, or q does not want to supply them, it will be suspected.

3. When an overlay node p receives a REQUEST_MSG for the same message m too many times from the same node q, it causes p's VERBOSE failure detector to suspect q.

4. When an overlay node p receives a REQUEST_MSG for the message m, yet p has not received m, then p sends a FIND_MISSING_MSG message to nodes in OL(2, p) asking them to retransmit m. (The message is sent to overlay nodes at distance 2 in order to bypass a potential neighboring Byzantine node.) Intuitively, if p receives a REQUEST_MSG from q for a message m, and p does not have m, then it means that some neighbor r of q has gossiped header(m) to q. Therefore, at least one node in N(1, q) has m. Since real messages are broadcast by the overlay nodes faster than the gossips about these messages, it means that m is missing, and therefore p asks nodes in OL(2, p) to retransmit m.

5. When an overlay node p receives a FIND_MISSING_MSG message for m from a node q and p has m, then p first broadcasts m to q. If q ∈ N(1, p), then p notifies its VERBOSE failure detector about it. Intuitively, if q is p's neighbor and p is an overlay node that has m, then p has broadcast m to its neighbors and therefore q should have m.

6. When a non-overlay node p receives a FIND_MISSING_MSG message for m (which it gossiped about) from a node q, p broadcasts m to q.

3.3 Overlay Maintenance

Overlay maintenance is executed by a distributed protocol. There is no global knowledge and each node must decide whether it considers itself an overlay node or not.
Thus, the collection of overlay nodes is simply the set of all nodes that consider themselves as such. At the same time, every correct overlay node periodically publishes this fact to its neighbors, so in particular, each overlay node eventually knows about all its correct overlay neighbors. The goal of the protocol is to ensure that the overlay can indeed serve as a good backbone for dissemination of messages. This means that eventually between every pair of correct nodes p and q there will be a path consisting of overlay nodes that do not exhibit externally visible Byzantine behavior. At the same time, for efficiency reasons, the overlay should consist of as few nodes as possible.

For scalability and resiliency reasons, we are interested in a self-stabilizing distributed algorithm in which every node decides whether it participates in the overlay based only on the knowledge of its neighbors. Recall that the neighbors of p are the nodes that appear in the transmission disk of p. Thus, p can communicate directly with them and every message p sends is received by all of them.

In order to enable nodes to decide locally whether they should become overlay nodes, we need some deterministic symmetry breaking rule. In this paper we utilize the overlay maintenance protocols of [21]. The work of [21] defined the goodness number as a generic function that associates each node with a value taken from some ordered domain. The goodness number represents the node's appropriateness to serve in the overlay. This way, it is possible to compare any two nodes using their goodness numbers and to prefer electing the node with the highest value to the overlay. Since in a Byzantine environment nodes can lie about their goodness
number, this becomes a useless criterion. Thus, we replace the notion of a goodness number with the node's id (which is unforgeable, by assumption).

Each node has a local status, which is either active or passive: active means that the node is in the overlay, whereas passive means that it is not. The local state of each node consists of this status and the node's knowledge of the local states of all its neighbors (based on the last local state they reported to it). Additionally, every node p maintains a variable overlay_trust for each of its neighbors q, which can be trusted, untrusted or unknown: untrusted means that the TRUST failure detector of p suspects q; unknown means that the TRUST failure detector of p does not suspect q, but another neighbor of p that p trusts reported to p that it suspects q; and trusted means that p has no reason to suspect q. Also, p records for each neighbor the list of its active neighbors. We assume that overlay maintenance messages are signed as well.

In order to ensure the appropriateness of the overlay, we need to verify that the overlay includes alternatives to each detected mute or verbose node. Ideally, we would like to eliminate these nodes from the overlay, but as they are Byzantine, they may continue to consider themselves overlay nodes. Thus, the best we can do is make sure that there is an alternative path in the overlay that does not pass through such nodes, and that correct nodes do not consider mute and verbose nodes as their overlay neighbors.

The protocol for deciding whether a node should be in the overlay consists of computation steps that are taken periodically and repeatedly by each node. In each computation step, each node makes a local computation about whether it thinks it should be in the overlay or not, and then exchanges its local information with its neighbors. For simplicity, we concentrate below on the local computation steps only.
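The three overlay_trust values defined above can be sketched as a small classification function. This is a minimal illustration under stated assumptions, not the protocol's actual code: the function name, the `own_suspects` set (nodes suspected by p's own TRUST failure detector) and the `reports` mapping (suspicions reported by neighbors) are hypothetical, and "a neighbor that p trusts" is read here as "a neighbor that p's own TRUST failure detector does not suspect".

```python
def overlay_trust(q, own_suspects, reports):
    """Classify neighbor q from the point of view of node p.

    own_suspects: set of node ids suspected by p's own TRUST failure detector.
    reports: dict mapping a neighbor id to the set of node ids that this
             neighbor reported as suspected.
    """
    if q in own_suspects:
        return "untrusted"
    for reporter, reported in reports.items():
        # Assumption: reports from neighbors that p itself suspects are ignored.
        if reporter != q and reporter not in own_suspects and q in reported:
            return "unknown"
    return "trusted"
```

With this reading, a report only downgrades q to unknown when it comes from a neighbor that p does not itself suspect, so a node already suspected by p cannot lower p's trust in q.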
Additionally, if a node p receives a message from its neighbor q in which q reports that it suspects a node r, then p changes r's overlay_trust to unknown, unless p already suspects either q or r. This is done because a Byzantine node might be suspected only by some of its neighbors. Therefore, a node that suspects one of its neighbors should notify its other neighbors about this suspicion in order to preserve connectivity of correct nodes in the overlay. Note that a Byzantine node may abuse this mechanism. In other words, a Byzantine node can cause correct nodes to unnecessarily join the overlay, but it cannot destroy the connectivity of the overlay w.r.t. correct nodes.

Our goal is to ensure that a node elects itself to the overlay if it has the highest identifier among its trusted neighbors. Below, we mention a couple of overlay maintenance protocols that realize this intuition (by making the goodness number of a node be its identifier). Specifically, we have implemented two overlay maintenance protocols, namely the Connected Dominating Set (CDS) and the Maximal Independent Set with Bridges (MIS+B) of [21], augmented with trust levels (i.e., the overlay_trust variable).⁶ Since, other than adding the trust level, the protocols are the same as in [21], we do not repeat them here.

3.4 Correctness Proof

Recall that in Section 2.1 we assumed that there are enough correct nodes so that non-Byzantine nodes form a connected graph. With this assumption, we prove the following validity and eventual dissemination properties. We present proofs only for the MUTE failure detector, since the proofs for the TRUST and VERBOSE failure detectors are trivial. For this, we first introduce a few definitions.
1. gossip_timeout - the time between two consecutive gossip messages by a correct node.
2. request_timeout - the time between receiving a gossip message and sending a request message.
⁶ The CDS and MIS+B protocols in [21] are in fact self-stabilizing generalizations of the work of [48].

3. rebroadcast_timeout - the time between receiving a request message and sending the message that matches the request.
4. β - transmission time (the latency for a message to arrive at the receiver).
5. δ - the number of new messages that are injected into the network every second.
6. max_timeout = gossip_timeout + request_timeout + rebroadcast_timeout + 3β

In the following, a pair of nodes p and q are well connected at time t if both are correct and p ∈ N(1, q) and q ∈ N(1, p) during the time interval [t, t + max_timeout]. We denote this relation by WC(p, q, t). In order to ensure dissemination of messages, we assume the following: starting from some time t_0, for every t > t_0, the graph induced by all pairs of well connected nodes at time t is connected, and this graph includes all correct nodes in the network.⁷ This can be seen as a refinement of the similar requirement in [15] to mobile ad-hoc networks.

Theorem 3.1 The protocol satisfies the validity property.

Proof: According to the protocol, the originator of a message m adds a signature sig(m) and then disseminates m together with sig(m) to other nodes. On receiving m with sig(m), every correct node checks that sig(m) corresponds to m before it accepts m. As a part of the model's basic assumptions, a Byzantine node cannot forge signatures. Therefore, no correct node will accept a message other than m as if it were m. Moreover, according to the protocol, correct nodes filter duplicates of messages they have already received.

Theorem 3.2 The protocol satisfies the eventual dissemination property.

Proof: We show that a message m that is sent infinitely often by a correct originator p is disseminated to all the correct nodes. Assume, by way of contradiction, that there is a message m that is not received by some correct process. Let k be the smallest number such that there exists a correct node q ∈ N(k, p) that does not receive m.
Recall that by assumption, during every time interval all the well connected nodes form a connected graph that includes all correct nodes. Therefore there exists a correct node l ∈ N(k − 1, p) such that the distance between q and l is smaller than r_l and l received m. According to the protocol, l will send a gossip about m to its neighbors and, if requested by its neighbors, l will also send m. Thus, q will receive m either from its overlay node or from l. This contradicts the minimality of k.

Protocol Analysis

In this section we compute a bound on the time required to disseminate a message to all the nodes and present a limit on the size of the buffers that every node must maintain to successfully disseminate all the messages. In order to provide a bound on the dissemination time we assume that messages do not collide.

⁷ One can weaken this requirement by saying that, starting with some time t, there are infinitely many times for which the graph induced by all pairs of well connected nodes is connected, and this graph includes all correct nodes in the network. The price is that the dissemination time of a message to all the nodes will grow proportionally to the durations in which this graph is not connected.

Figure 5: Byzantine overlay

The Dissemination Time: In the following, CR(m, t) refers to the number of correct processes that received m by time t.

Lemma 3.3 Let m be a message sent by some correct process p. Let t_1 be the time at which the first correct process received m. Let t_2 be the time at which the last correct process received m. During the interval [t_1, t_2], for all t_3, t_4 such that t_2 ≥ t_4 > t_3 ≥ t_1 and (t_4 − t_3) ≥ max_timeout, CR(m, t_4) > CR(m, t_3).

Proof: Assume by way of contradiction that the lemma does not hold. Then there are two neighboring nodes u and q such that q received the message m at time t and u does not receive m by time t + max_timeout. Recall our assumption that the graph induced by well connected nodes is connected and includes all correct nodes. Having received m, q broadcasts the gossip that matches m to its neighbors after at most gossip_timeout seconds. If its neighbors do not have m, they send a request message after at most request_timeout seconds, and then after at most rebroadcast_timeout seconds q broadcasts the message to its neighbors. Therefore, after at most max_timeout seconds the message will be disseminated to u. A contradiction.

Theorem 3.4 Let m be a message sent by some correct process p at time t. Then all correct nodes will receive m by time t + max_timeout · (n − 1).

Proof: According to Lemma 3.3, every max_timeout seconds at least one node that has not received the message before will get m. Since the graph of correct nodes is connected and the number of correct nodes is at most n, all correct nodes will receive message m after at most max_timeout · (n − 1) seconds.

The dissemination time depends on the mobility of nodes. If all nodes are static, each message will be disseminated to all the nodes in at most max_timeout · n/2 seconds, where n is the number of nodes.
The explanation for this bound is as follows: According to Lemma 3.3, a node that has message m will broadcast it to its neighbors within at most max_timeout seconds. In the worst case, as illustrated in Figure 5, all nodes that belong to the overlay are Byzantine and therefore all messages will be disseminated using the gossip-request mechanism. Due to the assumption that the graph of correct nodes is connected, the maximal number of hops in the network is n/2 (in every hop there is one Byzantine overlay node and one correct node). Therefore, the message has to pass n/2 hops, which takes at most max_timeout · n/2 seconds. In mobile networks, each message will be disseminated to all the nodes within max_timeout · (n − 1) seconds, according to Theorem 3.4.

Buffers Size: The size of the buffers that every node should have depends on the mobility of the nodes. If none of the nodes move, every node has to hold every message for max_timeout seconds, i.e., the time it takes to disseminate the message to all its neighbors only. Therefore, every node in a static network should have a buffer of size max_timeout · δ messages. In mobile networks, every message should be kept until all the nodes receive the message. As we showed, every message is disseminated to all the nodes within max_timeout · (n − 1) seconds. Therefore, the buffer size of every node in a mobile network should be max_timeout · (n − 1) · δ messages.
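The bounds above can be made concrete with a short calculation. The sketch below simply evaluates the formulas as read from this section (max_timeout, the n/2 and n − 1 dissemination bounds, and the two buffer sizes); the function names and the numeric values in the usage lines are illustrative assumptions, not measurements from the paper.

```python
def max_timeout(gossip_timeout, request_timeout, rebroadcast_timeout, beta):
    """max_timeout = gossip + request + rebroadcast timeouts + 3 * beta."""
    return gossip_timeout + request_timeout + rebroadcast_timeout + 3 * beta


def dissemination_bound(mt, n, static):
    """Worst-case seconds until all n nodes hold a message."""
    # Static network: at most n/2 hops; mobile network: Theorem 3.4.
    return mt * n / 2 if static else mt * (n - 1)


def buffer_size(mt, n, delta, static):
    """Messages a node must buffer when delta new messages arrive per second."""
    # Static: hold each message for max_timeout seconds only.
    # Mobile: hold each message until every node has received it.
    return mt * delta if static else mt * (n - 1) * delta


# Illustrative values only (seconds, nodes, messages per second).
mt = max_timeout(gossip_timeout=2, request_timeout=1,
                 rebroadcast_timeout=1, beta=0.5)
static_time = dissemination_bound(mt, n=100, static=True)
mobile_time = dissemination_bound(mt, n=100, static=False)
static_buf = buffer_size(mt, n=100, delta=10, static=True)
mobile_buf = buffer_size(mt, n=100, delta=10, static=False)
```

With these assumed inputs, max_timeout is 5.5 seconds, so the static bound is 275 seconds against 544.5 seconds in the mobile case, and the mobile buffer is larger than the static one by the factor n − 1.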

In the following sections, we show that messages are propagated fast (via the overlay nodes) during certain periods if the failure detectors behave like eventually perfect failure detectors or like interval failure detectors.

Fast Dissemination with Eventually Perfect Failure Detectors

In the following, we show that if the MUTE failure detector indeed belongs to ◇P_mute, then eventually messages are disseminated to all correct nodes by the overlay. The significance of this is that dissemination along overlay nodes is fast, since it need not wait for the periodic gossip mechanism.

Lemma 3.5 Assume that the MUTE failure detector ∈ ◇P_mute. Then eventually the non-mute overlay nodes form a connected graph COL such that every correct node is either in COL, or within the transmission range of a non-mute node in COL.

Proof: Eventually, ◇P_mute of all correct nodes will suspect all the mute nodes. Thus, the goodness number in the overlay maintenance protocol for mute nodes will be lower than that of all other nodes. Consequently, the overlay built by the maintenance protocol will have the desired property.

Theorem 3.6 If the MUTE failure detector ∈ ◇P_mute then eventually, when there are no collisions, most messages propagate to all the nodes via the overlay nodes.

Proof: In Lemma 3.5, we showed that eventually the non-mute nodes of the overlay form a connected graph that covers all non-mute nodes. Therefore, eventually, all messages are propagated by overlay nodes to all correct nodes, which proves the theorem.

Fast Dissemination with Interval Failure Detectors

In this section we discuss the conditions under which our protocol implements I_mute correctly. We show that if the MUTE failure detector indeed belongs to I_mute, then during periods of good connectivity messages are disseminated to all correct nodes by the overlay.

Observation 3.1 If suspicion_interval ≥ f · mute_interval then there will be an interval CI_i in which at least one overlay node in N_i will be correct.
We call CI_i a correct interval for node i.

Observation 3.2 There exists a suspicion_interval such that there are CI_1, CI_2, ..., CI_n for which CI_1 ∩ CI_2 ∩ ... ∩ CI_n ≠ ∅.

Observation 3.3 In order to prevent false suspicions of the overlay nodes, the mute_interval of the I_mute failure detector should be larger than (n − 1) · max_timeout.

Observation 3.4 If some correct node p decides that it is not in the overlay, then after some finite time all of its correct neighbors know that p ∉ OVERLAY. This follows immediately from the protocols that maintain the overlay.

Let OL(p_1, p_2, t_start, t_end) be a relation such that p_1 believes that p_2 ∈ OL(1, p_1) during the interval [t_start, t_end].

Lemma 3.7 Every Byzantine overlay node q that is mute w.r.t. another node p during an interval [t, t + mute_interval] and satisfies OL(p, q, t, t + mute_interval) will be suspected by p during suspicion_interval.

Proof: Let q be a Byzantine overlay node that does not forward the message m. Let p be a correct node that satisfies OL(p, q, t, t + mute_interval). We assume that q does not forward m to p during mute_interval and show that q will be suspected during suspicion_interval. According to Theorem 3.4, all correct nodes will receive m after at most max_timeout · (n − 1) seconds. Therefore, according to the protocol, after receiving m, p will activate its MUTE failure detector, and if q does not forward messages during mute_interval, it will be suspected by p during suspicion_interval.

Lemma 3.8 Non-mute processes are not suspected by any correct process during suspicion_free_interval.

Proof: A non-mute non-overlay node p cannot be suspected, since according to the protocol a non-overlay node can be suspected by the MUTE failure detector only if it does not forward a message m that it gossiped about. Since p is not mute, it will always forward the message m and therefore will not be suspected. According to Observation 3.4, if p leaves the overlay, then after some finite time that is smaller than mute_interval all its correct neighbors believe that p ∉ OVERLAY. Therefore, p's neighbors will not expect p to broadcast messages and thus will not suspect it either.

Similarly, a non-mute overlay node p cannot be suspected. This is because, according to the protocol, an overlay node can be suspected by the MUTE failure detector only if it does not forward a message m that it received from another overlay node or from the originator of m. Yet, since p is not mute, if p has m, it will always forward it. If p does not have the message m, p will send a FIND_MISSING_MSG message and will receive the missing message m either from nodes that belong to N(2, p) or after at most max_timeout · (n − 1) seconds (as we show in Theorem 3.4).
Once p receives m, it will forward m to its neighboring nodes and therefore p will not be suspected, since mute_interval > (n − 1) · max_timeout (according to Observation 3.3).

Lemma 3.9 Assume that the MUTE failure detector ∈ I_mute. Then there is an interval in which the non-mute overlay nodes form a connected graph such that every correct node is either in the overlay, or within the transmission range of a non-mute overlay node.

Proof: Lemma 3.8 shows that non-mute nodes are not suspected, and according to Lemma 3.7 there is an interval such that I_mute of all correct nodes suspects all mute nodes. Thus, none of the mute nodes will be trusted. Consequently, the overlay built by the maintenance protocol will have the desired property.

Theorem 3.10 If the MUTE failure detector ∈ I_mute then there is an interval such that, when there are no collisions, most messages propagate to all the nodes via the overlay nodes.

Proof: Lemma 3.7 shows that there is a certain interval in which every mute overlay node is suspected. In Lemma 3.9, we showed that there is an interval such that the non-mute nodes of the overlay form a connected graph that covers all non-mute nodes. Therefore, during a certain interval, all messages are propagated by overlay nodes to all correct nodes, which proves the theorem.

4 Results

We have measured the performance of our protocol using the SWANS/JIST simulator [1]. In the simulations, we have compared the performance of our protocol with the performance of flooding on one hand and
