Nemor: A Congestion-Aware Protocol for Anonymous Peer-based Content Distribution

Abstract
As content providers adopt peer-to-peer approaches for content sharing and distribution, they face new challenges in guaranteeing privacy to their clients. Participating peers can glean information from their communication with other peers, such as their identities or the shared data, and use this information for malicious purposes. We present Nemor, a protocol that allows a requesting peer and a corresponding serving peer to remain anonymous to each other and to other participating peers, while protecting the identity of the content being exchanged. Nemor relies on a trusted intermediary, such as a provider-managed tracker, to identify a potential serving peer. A peer in Nemor joins one or more trees. Using a combination of a random walk, a probabilistic jump from one tree to another, and constrained flooding, the requesting and serving peers dynamically construct an overlay path between them. A key differentiator of Nemor is the integrated design of a congestion avoidance mechanism that yields significant performance benefits without compromising anonymity. Using experimental results from PlanetLab and simulations with traces from an operational VoD system, we demonstrate that Nemor outperforms state-of-the-art approaches such as TOR and OneSwarm. Our results confirm that Nemor, while being resilient to attacks on anonymity, achieves high performance and scalability and is suitable for a range of applications, including distribution of large-volume content such as streaming video.
I. INTRODUCTION
As service providers start employing peer-to-peer (P2P) approaches to distribute videos and data, they have to address previously nonexistent privacy risks. For example, with centralized video services, only the content provider knows which videos are available and which ones a user views, and there is explicit agreement on what the provider can and cannot do with this information.
With P2P mechanisms, when a user Alice requests a video, she can learn about the videos shared by the serving peer, say Bob. Conversely, Bob can learn about the video that Alice is interested in. Worse yet, many such peers can collect this data and collude with each other to mine useful information [31], such as user interests and viewing patterns, out of malice or for profit. Thus, for the provider to make similar guarantees about users' privacy, it needs to use anonymous communication in these applications. Moreover, the scenarios we consider impose unique requirements. First, unlike most existing approaches (e.g., [6][3][9][30]), the paramount goal here is not to protect the identity of a user from a central authority (i.e., the provider). Instead, the goal is to protect a user's private information from the other participating users. Second, we cannot sacrifice performance for the sake of anonymity. For example, viewers watching a video might want to stay anonymous, but will not tolerate glitches due to delayed data. Unfortunately, anonymity and performance are often conflicting goals: efficient delivery typically requires detailed information about the network and the participating peers, while anonymity requires that such information be hidden. Existing anonymity approaches primarily focus on providing anonymity, often at the cost of performance. For example, P5 [21] gives strong anonymity guarantees, but is heavyweight and not suited for transferring large files. Protocols like TOR [6] were designed for private communication with a public entity such as a web server or a torrent server. They are best suited for session-oriented communication and are not ideal for P2P transfers, where many small pieces of data are transferred from a large set of peers. Others [10] provide mutual anonymity, but offer only probabilistic guarantees of locating and delivering content, which is not acceptable in a provider setting.
Instead, our goal is to meet the strict anonymity needs of users while designing a protocol that is also efficient and guarantees content delivery. To that end, we present Nemor¹, a protocol for efficient, anonymous content distribution between peers in a provider setting. We assume a content (or service) provider with a large number of subscribing nodes (or peers). Its library is distributed among these peers, and requests are served using P2P mechanisms. Nemor aims to protect the identities of the content-requesting peer (Alice in the rest of the paper) and the content-serving peer (Bob in the rest of the paper) from each other as well as from other peers. In addition, the content should be delivered efficiently while remaining unidentifiable to all participating peers except Alice and Bob. As its name suggests, Nemor uses an overlay consisting of a set of trees. Each peer is a member of one or more trees. Multiple trees are constructed to constrain the size of each tree and, in turn, achieve efficient data transfers. Content delivery in Nemor is divided into two phases: search and retrieval. In the search phase, Alice tries to find a potential serving peer anonymously. This is challenging to do efficiently without revealing Alice's or Bob's identity. Traditionally, it is achieved either by flooding the request over the entire network or through random walks. However, flooding incurs severe overhead, while random walks give only probabilistic guarantees. In Nemor, we take advantage of the provider setting and employ a tracker (owned and operated by the provider) as a trusted intermediary to facilitate the search. The tracker in Nemor tracks the membership information of each peer (i.e., given a peer n, all the trees that n is a member of) and the objects stored by that peer. When searching for an object, Alice contacts the tracker through a secure connection and requests a serving peer for the object.
The tracker identifies Bob as a potential serving peer. Instead of returning Bob's identity, the tracker constructs a secret token that only Bob can decipher and provides it to Alice. The use of the tracker is acceptable in our setting, as the provider does not learn anything that it did not know previously. Further, as we show in Section V.E, there are well-known ways to protect the tracker from attacks.
¹ Nemor is Latin for a grove of trees.

In the retrieval phase, Alice initiates a random walk of the request on one of her trees. The request traverses the tree before eventually jumping to a node (called the Landing Point, or LP) on one of Bob's trees. The peer that jumps the request from Alice's tree to Bob's tree, called the Jumping Point (JP), is recorded in the request. Each node on the path in Alice's tree stores path state information to facilitate content delivery later. On receiving the message, Bob decrypts the token and sends a response by performing a random walk over one of his trees. This response eventually transitions to Alice's tree via the JP recorded in the request. Thus, the paths from Bob to the JP and from Alice to the JP are stitched together, and Bob can then send Alice the requested content along this stitched path. A key differentiator of Nemor is the integrated design of a congestion avoidance mechanism that yields significant performance benefits without weakening anonymity. Since peers are used as relays, their uplinks have to be used carefully; otherwise, throughput degrades. However, the congestion avoidance mechanism cannot be independent of the anonymity mechanism, as this may expose a user's identity. Hence, we incorporate congestion avoidance into the core design of Nemor. Our results show that this mechanism leads to substantially better performance than existing approaches. In this paper, we present a detailed description of Nemor. We qualitatively reason about the anonymity provided by Nemor by considering possible attacks and how they are addressed. We use PlanetLab experiments to compare Nemor with the widely used protocols TOR and OneSwarm and show its performance benefits. For example, our results show that Nemor takes 427 sec on average to download a 25 min video, compared to 821 sec for OneSwarm and 1403 sec for TOR.
Using trace data from a nationally deployed Video-on-Demand service, we run simulations to show that Nemor scales well and handles the vagaries of P2P networks. Our results make the case that Nemor easily satisfies the twin requirements of anonymity and performance.
II. RELATED WORK
Anonymous online communication started with Chaum's seminal work on untraceable email communication [3]. Chaum uses a trusted intermediate node, called the mix, to relay messages from clients. A mix hides the correspondence between its input and output messages, hence protecting the identity of the origin of a message. Most subsequent work, including Crowds [18], Onion Routing [16], Tarzan [9], MorphMix [19], Hordes [12], Cashmere [24], and Information Slicing [11], follows this key design of routing messages through a series of mixes to provide anonymity. While the exact methods differ, they share the same high-level goal of anonymizing only the client initiating the communication; none of them provides anonymity to the responder. APFS [20] provides anonymity to both the sender and the receiver by having them anonymously connect to the same publicly advertised proxy, which relays bidirectional traffic. A similar design is proposed in TOR [6], which offers mutual anonymity through the notion of a hidden service. A serving node publishes the service it offers by advertising the identity of an introduction point. Through the introduction point, requesting and serving nodes agree on a rendezvous point, and both build anonymous circuits to it, over which content can be delivered. Unfortunately, TOR is not very efficient; a study by McCoy et al. [28] showed that TOR's circuit latency can have high variance, with a maximum value of 120 seconds. BitBlender [1] was designed to anonymize BitTorrent transfers. It adds relay peers to the serving-peer list in a torrent file. Relay peers do not have the content and hence relay the request further until it reaches a serving peer.
Content is delivered in the reverse direction of the request path. Anonymity is achieved because the requesting and serving peers are indistinguishable from the relay peers. However, the requesting and serving peers are still exposed to some degree, as they are listed among the group of peers in the torrent file. Mantis [2] provides mutual anonymity by flooding content requests. The requesting peer's identity, however, is not protected and is known to the serving peer, since the serving peer transfers data directly using UDP with a spoofed source address. To reduce the overhead of flooding, a peer can drop the request, which means there is no guarantee that the content will be served. This critical limitation is shared by other protocols as well. Peers in MUTE [23] use pseudo-identities to communicate with each other. A request is flooded with a probabilistic time-to-live counter; as a result, it might never reach the serving peer. The pseudo-identities in MUTE are also vulnerable to attacks [5]. In RWAP [10], a requesting peer constructs and initiates random walks for two messages. The first node to receive both messages reconstructs the request and floods it. The content is sent back over an onion-routed path specified in the request. RWAP cannot guarantee that the request will be served, because the two random walks might never meet at a common node. OneSwarm [26] also floods requests, which leads to high communication overhead and potentially long response times. The identity of the content can be learned from the request message, and nodes in OneSwarm can be overloaded by a high degree of connectivity. OneSwarm provides only a probabilistic guarantee of serving the content, because a node can drop the request when its uplink is highly utilized. DC-net [4] has been shown to ensure both sender and receiver anonymity even in the presence of a global adversary. Such strong anonymity comes at the price of having to broadcast every single packet, which is prohibitively expensive in practice.
P5 [21] tries to address the inefficiencies of DC-net by constructing a broadcast hierarchy. Different broadcast groups in the hierarchy have different sizes (the smaller the group, the less anonymous but the more efficient). A peer can choose which group to join based on how much anonymity it is willing to trade for efficiency. However, to communicate, a peer first needs an offline channel with the other party to send over its broadcast group ID. It is unclear how this channel can be set up, especially in an anonymous P2P setting. P5 also remains inefficient when the content is a large file, because the file is broadcast to multiple groups.
III. ADVERSARY MODEL
We assume an adversary model in which one or more malicious peers attempt to identify Alice, Bob, and/or the content being requested. Malicious peers can collude by exchanging information through a back channel and launch a coordinated attack. However, the information available to a malicious peer is limited to what it can gather locally through its direct interactions with other peers or through exchanges with other colluding peers over the back channel. We place no restriction on what the malicious peers can do locally; they can intercept, modify, drop, and insert messages. However, no peer can decrypt a message without the proper key. Alice or Bob can also be among the malicious peers trying to identify the other endpoint. We emphasize that we do not adopt the commonly used model of a global adversary who has knowledge of all network entities, can observe arbitrary amounts of network traffic, or can launch attacks on every network entity. We argue that such an adversary is unrealistic in the globally distributed P2P network we consider. We aim for a protocol that is efficient and guarantees anonymity in a more practical, large-scale distributed setting.
IV. NEMOR PROTOCOL DESCRIPTION
The goal of Nemor is to allow a requesting peer Alice to obtain some content Obj from a serving peer Bob anonymously: that is, Alice has no knowledge of who Bob is, and vice versa. Also, none of the other peers in the system knows who Alice and Bob are or that they are exchanging Obj. Nemor is designed to allow anonymous data transfer between Alice and Bob without compromising efficiency. Three components of Nemor make this possible: (a) overlay construction and maintenance, (b) content search, and (c) content retrieval.
1) Overlay Construction and Maintenance
Nemor is a P2P overlay network consisting of multiple trees. A node joins the network by connecting to one or more trees. Peers exchange content over dynamically constructed paths across different trees. For simplicity and efficiency, we assume the existence of a central, trusted entity called a tracker that is hosted and controlled by the content provider. The tracker facilitates the construction and maintenance of the trees and also keeps track of the peers in the network and the content they share.
It authenticates peers to ensure their validity, to prevent Sybil attacks [8], and to authorize their content requests. When a peer joins the network, it authenticates itself and registers with the tracker to obtain a shared symmetric key, which is used to encrypt all its future communications with the tracker. When a peer departs from the system, it deregisters with the tracker. The tracker keeps track of the number of trees, the peers in the network, and their tree memberships. While this is a departure from the traditional notion of a tracker, we argue that this additional responsibility does not add significant load on the tracker, yet substantially simplifies anonymous content search (discussed in Section IV.2). To minimize its work, the tracker does not keep information about the network topology (i.e., how peers are connected within a tree). Depending on the deployment, the tracker can be implemented on a single server, on clusters of servers, or as a distributed DHT [14].
a) Handling Node Joins
When a node N joins the network, it first contacts the tracker. The tracker selects a random set of T trees (with T being a system parameter) and then randomly selects a node on each of these trees. N joins these trees and peers with the selected nodes. The tracker starts with a small number of initial trees and bounds the size of each tree by a soft upper bound. The set of T trees is chosen as follows: the tracker first tries to add node N to existing trees, biasing its choice towards smaller trees. The tracker creates new trees when all the existing trees reach the current upper bound. Depending on system needs, the tracker may also adjust the soft upper bound dynamically to keep trees small and balanced while keeping the number of trees manageable. Having multiple trees allows Nemor to limit the scope of flooding to a single, small tree instead of the entire network.
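The join procedure can be sketched as follows. This is a minimal illustration under stated assumptions, not the tracker's actual implementation: the class and method names (`Tracker`, `join`, `_pick_trees`) are hypothetical, the bias towards smaller trees is modeled with an assumed weight of 1/(size + 1), and new trees are created once fewer than T existing trees are still below the soft bound. Dynamic adjustment of the bound is not modeled.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Tracker:
    """Toy model of the tracker's tree bookkeeping (names are hypothetical).

    It records which peers belong to which tree but, as in the text, not
    how peers are connected inside a tree.
    """
    trees_per_node: int                 # T in the text
    soft_bound: int                     # soft upper bound on tree size
    initial_trees: int = 2
    rng: random.Random = field(default_factory=random.Random)
    trees: dict = field(default_factory=dict)       # tree id -> set of peers
    membership: dict = field(default_factory=dict)  # peer -> set of tree ids

    def __post_init__(self):
        for _ in range(self.initial_trees):
            self._new_tree()

    def _new_tree(self):
        tid = len(self.trees)
        self.trees[tid] = set()
        return tid

    def _pick_trees(self):
        # Only trees still below the soft bound are candidates.
        open_trees = [t for t, m in self.trees.items() if len(m) < self.soft_bound]
        # Assumption: create new trees once fewer than T trees have room.
        while len(open_trees) < self.trees_per_node:
            open_trees.append(self._new_tree())
        # Assumption: bias towards small trees with weight 1 / (size + 1).
        chosen = set()
        while len(chosen) < self.trees_per_node:
            candidates = [t for t in open_trees if t not in chosen]
            weights = [1.0 / (len(self.trees[t]) + 1) for t in candidates]
            chosen.add(self.rng.choices(candidates, weights=weights)[0])
        return chosen

    def join(self, node):
        """Return {tree id: random existing member to peer with (None if empty)}."""
        attach = {}
        for tid in sorted(self._pick_trees()):
            members = self.trees[tid]
            attach[tid] = self.rng.choice(sorted(members)) if members else None
            members.add(node)
            self.membership.setdefault(node, set()).add(tid)
        return attach
```

For example, `Tracker(trees_per_node=2, soft_bound=4, rng=random.Random(7))` followed by ten `join` calls places every peer in exactly two trees, and no tree exceeds four members. Keeping only membership (not topology) at the tracker mirrors the text's point that the tracker's extra work stays small.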
With flooding confined to a small tree, its overhead becomes insignificant compared to the content transfer. Our simulations of Nemor show that trees with P nodes have an average diameter of O(log P). Since a path is built across two trees, paths tend to be short, allowing efficient content transfer. We also gain several significant advantages by letting the tracker control overlay construction. First, most P2P networks need a bootstrapping mechanism that allows new peers to connect to existing peers. Using the tracker simplifies this process and prevents colluding nodes from attaching themselves to arbitrary portions of the overlay network in an attempt to compromise anonymity. Second, tree information (count and sizes) is well concealed from malicious peers, which is essential for ensuring anonymity.
b) Handling Peer Departure and Failure
Peer departure and failure are the norm in P2P systems. Handling them is particularly important in Nemor for multiple reasons. Since content is relayed across multiple peers, there is a higher probability of disruption than with direct transfers. Without repair, the entire transfer has to be reinitiated. Retransmitting large video files may result in unacceptable delay and wasted resources, and has also been shown to be susceptible to attacks [15]. More importantly, node failures result in disconnected trees. We discuss only node failures here, since node departures can be handled similarly. When a node X fails, all its neighbors detect the failure and report it to the tracker. The tracker selects one of the reporting neighbors as an anchor node and instructs all the other reporting nodes to connect to the anchor node to repair the tree. We also have to repair the disconnected transfer paths so that the disrupted transfers can resume. We illustrate the repair process in Figure 1. Suppose that node X fails and all five of its neighbors, A, B, C, D, and E, report to the tracker.
Assume the tracker selects A as the anchor node and requests B, C, D, and E to connect to A. The new links (A, B), (A, C), (A, D), and (A, E) are established with a two-way handshake between A and B, A and C, and so on. This enables A to reinstall all the disrupted paths that previously went through X. This process has to be repeated for each tree that node X was a member of; the tracker classifies failure reports by tree and repairs each tree separately. Our repair mechanism results in a small delay in content transfer. In the worst case, when a key node (e.g., JP_Alice or LP_Bob) fails, Alice reinitiates the content request, but our results in Section VI indicate that such instances are very rare. In extreme cases where multiple adjacent nodes fail simultaneously, Nemor cannot repair the tree. In such cases, the broken tree is destroyed and the tracker removes its entry for the tree. Disconnected nodes then contact the tracker, which can have them join other trees.
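The anchor-based repair can be sketched in a few lines. This is a minimal sketch assuming each failure report is a `(reporting_neighbor, failed_node, tree_id)` tuple and each tree is held as an adjacency dict; both representations and the function names `plan_repairs` and `apply_repair` are hypothetical. It covers only the topology repair, not A's reinstallation of transfer paths.

```python
import random
from collections import defaultdict


def plan_repairs(reports, rng=None):
    """Group failure reports by (failed node, tree) and pick one anchor each.

    reports: iterable of (reporting_neighbor, failed_node, tree_id) tuples.
    Returns {(failed_node, tree_id): (anchor, [nodes told to connect to anchor])}.
    """
    rng = rng or random.Random()
    by_tree = defaultdict(set)
    for neighbor, failed, tree in reports:
        by_tree[(failed, tree)].add(neighbor)   # each tree is repaired separately
    plan = {}
    for key, neighbors in by_tree.items():
        ordered = sorted(neighbors)
        anchor = rng.choice(ordered)            # one reporting neighbor becomes the anchor
        plan[key] = (anchor, [n for n in ordered if n != anchor])
    return plan


def apply_repair(adjacency, failed, anchor, others):
    """Remove the failed node and link every other former neighbor to the anchor."""
    for n in adjacency.pop(failed, set()):
        adjacency[n].discard(failed)
    for n in others:
        adjacency[anchor].add(n)
        adjacency[n].add(anchor)
```

Applied to the Figure 1 scenario (X with neighbors A through E), the plan yields one anchor and four nodes to reconnect, and the repaired tree stays connected with one fewer node and one fewer edge, since the five links to X are replaced by four links to the anchor.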

Figure 1. Node failure repair.
2) Content Search
Efficient anonymous content search is challenging. Alice needs to locate a potential serving peer without revealing that she is searching for content or what she is searching for. Similarly, Bob cannot reveal openly that he has the content. Traditionally, this is achieved either by flooding the request over the entire network or through random walks. However, flooding incurs severe overhead, while random walks give only probabilistic guarantees. In Nemor, we leverage the provider environment to address this problem: the tracker keeps track of the peers and the content they share. We believe this is acceptable in our setting because (a) it allows guaranteed and efficient discovery of a potential source for the object, and (b) it does not reveal any new information to the provider. Further, existing approaches can be used to secure the tracker against attacks and against leaking information to malicious users. When Alice wants content Obj, she sends an encrypted request to the tracker. The tracker randomly picks one of the nodes that have the content as the serving peer, say Bob. The tracker also picks a random peer in one of Bob's trees as the Landing Point (LP_Bob) for this request; it can do so because it keeps track of each peer's tree memberships. The tracker assigns a path identifier P_Obj to this transfer and generates an AES session key K_Obj that Bob and Alice use to encrypt and decrypt Obj. The tracker then generates a token REQ that contains all this information. REQ is encrypted using Bob's secret key, so only Bob can retrieve the information in the token. The tracker returns this token to Alice along with LP_Bob, P_Obj, Bob's tree ID T_Bob, and K_Obj, all encrypted using Alice's secret key.
3) Content Retrieval
Having obtained the token, Alice can move on to the content retrieval phase.
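Before following the request through retrieval, the tracker's side of the search can be sketched to show who can read what. This is a hedged sketch, not the paper's implementation: `Sealed` is a hypothetical stand-in that models a ciphertext only as "openable with the right key" (it performs no encryption; the paper uses AES), and the function and dictionary names are illustrative. Excluding Bob himself when choosing LP_Bob is also an assumption.

```python
import random
import secrets
from dataclasses import dataclass


@dataclass(frozen=True)
class Sealed:
    """Stand-in for a ciphertext: only the holder of `key` can open it.

    This is NOT encryption; it only models which party can read which fields.
    """
    key: bytes
    payload: dict

    def open(self, key: bytes) -> dict:
        if not secrets.compare_digest(key, self.key):
            raise PermissionError("wrong key")
        return dict(self.payload)


def handle_search(obj, alice_key, peer_keys, holders, tree_members, membership, rng):
    """Tracker side of the search for `obj` (all argument names hypothetical).

    holders:      obj -> set of peers storing it
    membership:   peer -> set of tree ids it belongs to
    tree_members: tree id -> set of peers in that tree
    peer_keys:    peer -> symmetric key shared with the tracker at registration
    """
    bob = rng.choice(sorted(holders[obj]))                     # serving peer
    t_bob = rng.choice(sorted(membership[bob]))                # one of Bob's trees
    lp_bob = rng.choice(sorted(tree_members[t_bob] - {bob}))   # Landing Point
    p_obj = secrets.token_hex(8)                               # path identifier P_Obj
    k_obj = secrets.token_bytes(16)                            # 128-bit session key K_Obj
    # Token REQ: readable only with Bob's key.
    req = Sealed(peer_keys[bob], {"obj": obj, "path_id": p_obj, "session_key": k_obj})
    # Reply to Alice: everything except Bob's identity, sealed with Alice's key.
    return Sealed(alice_key, {"token": req, "lp": lp_bob, "path_id": p_obj,
                              "tree": t_bob, "session_key": k_obj})
```

Alice can open the outer envelope and learn LP_Bob, T_Bob, P_Obj, and K_Obj, but the inner token only opens with Bob's key, so Bob's identity never appears in anything Alice can read.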
Alice does not know who Bob is or where he is, but she knows that she can reach Bob through LP_Bob. Alice constructs a request that contains P_Obj, LP_Bob, T_Bob, and REQ, and initiates a random walk in one of her trees with the intent of reaching LP_Bob. Alice first locally determines a jumping probability p_Alice. With probability p_Alice she jumps the request directly to LP_Bob; with probability 1 - p_Alice she forwards the request to one of her neighbors, say node N. N then repeats the same process using its own locally determined jumping probability p_N. This process is repeated until the request reaches LP_Bob. The walk is directed, meaning the request is never forwarded backwards; hence, when a leaf node receives a request, it has to jump directly to LP_Bob. We call the node that jumps the request to LP_Bob the Jumping Point (JP_Alice). A directed random walk in a tree is loop-free, so the request always reaches LP_Bob. To protect Alice's identity, each peer on the path rewrites the request by marking itself as the source. Each node on the path stores the path identifier P_Obj and its upstream and downstream neighbors, for later relaying the content back. Jumping the request between trees also makes it hard for an attacker to correlate across requests. After receiving the request, LP_Bob initiates flooding of the request to all the nodes in tree T_Bob. When Bob receives the request, he continues to flood it; otherwise, colluding peers could identify the node that terminates the flooding as Bob. We use flooding, instead of direct delivery of the request, to protect Bob's identity while guaranteeing delivery of the request. Because trees in Nemor are small, the overhead of flooding is limited. From the request and the token REQ within it, Bob learns what he should serve (Obj), the path ID (P_Obj), and the jumping point (JP_Alice).
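The directed random walk on Alice's side can be sketched as a short loop. This is a minimal sketch under assumptions: trees are adjacency dicts, each node's jumping probability is supplied in a dict, and the refusal of a congested neighbor (described in Section IV.4) is modeled by a hypothetical `is_congested` callable. The function name and the soft-state layout `{path_id: (upstream, downstream)}` are illustrative only.

```python
import random


def directed_walk(tree, start, jump_prob, is_congested, path_id, path_state, rng):
    """Forward a request from `start` until some node jumps it to LP_Bob.

    tree:         dict node -> set of neighbors (one Nemor tree, undirected).
    jump_prob:    dict node -> that node's locally chosen jumping probability.
    is_congested: callable(node) -> bool; True means the neighbor refuses.
    path_state:   dict node -> {path_id: (upstream, downstream)} soft state.
    Returns (visited nodes, request as seen by LP_Bob).
    """
    request = {"path_id": path_id}
    path, prev, cur = [start], None, start
    while True:
        request["src"] = cur          # the current holder marks itself as the source
        if rng.random() < jump_prob[cur]:
            break                     # cur jumps the request to LP_Bob
        # Directed walk: never send the request back where it came from.
        forward = [n for n in tree[cur] if n != prev]
        rng.shuffle(forward)
        nxt = next((n for n in forward if not is_congested(n)), None)
        if nxt is None:
            break                     # leaf, or every forward neighbor refused: jump
        path_state.setdefault(cur, {})[path_id] = (prev, nxt)
        prev, cur = cur, nxt
        path.append(cur)
    # The Jumping Point is recorded in the request. Its downstream entry stays
    # None here; it would be filled in when Bob's response walk reaches
    # JP_Alice and the two paths are stitched.
    path_state.setdefault(cur, {})[path_id] = (prev, None)
    request["jp"] = cur
    return path, request
```

Because a tree has no cycles and the walk never returns to its predecessor, every visited node is new and the loop ends at the latest on a leaf, which is the loop-freedom argument in the text. Letting each node draw from its own `jump_prob` entry reflects the locally chosen probabilities that Section V.B relies on.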
Bob responds by initiating a random walk of the response over one of his trees, which eventually jumps to JP_Alice. At JP_Alice, the paths taken by the two random walks are stitched together to form a path connecting Alice and Bob. Each node on that path has stored P_Obj and its upstream and downstream neighbors, so Bob can now transfer the content to Alice. Recall that the tracker provides a session key K_Obj to protect the content during transfer. Bob extracts this key from REQ and encrypts Obj before transferring it; Alice received the key from the tracker, so she can decrypt the content. Alice can also check the integrity of the content against a checksum from the tracker, in case she suspects that Bob or a peer on the path has intentionally corrupted the content. The sequence of steps in Nemor is illustrated in Figure 2. Alice also keeps track of how long she has been waiting for the response. If Bob refuses to serve or fails, a jumping point fails, or the network is congested, Alice does not starve; instead, she restarts the request process using the same information or goes back to the tracker with a new request. P_Obj is maintained as soft state on peers and is destroyed when the transfer completes or when the timer associated with the soft state expires, whichever comes first.
Figure 2. The sequence of steps in Nemor.
4) Avoiding Congestion
P2P systems rely on peers having sufficient bandwidth to transfer the requested content efficiently. Contention for a peer's uplink among different transfers might introduce significant delays and even loss in value. Relaying content over randomly selected nodes, as in TOR and OneSwarm, can lead to very poor performance due to congestion, as shown in Section VI. Hence, Nemor incorporates a simple yet effective congestion avoidance mechanism to achieve high performance. During the random walk of a request (or response), a peer can send a message to the upstream peer refusing to forward the request if its uplink is congested. The upstream peer then tries to continue the random walk through a different neighbor. When all the neighbors refuse, it jumps the request directly to LP_Bob itself. This congestion avoidance approach also adds to Nemor's resilience against a host of attacks that depend on sending repeated requests and on timing analysis, described in the next section. Note that we do not use LP_Bob to stitch the two random walk paths together; this design decision was also made to avoid congested nodes. Recall that the tracker selected LP_Bob, and the tracker has no information about LP_Bob's uplink, which may be congested. Instead, by performing its own random walk of the response, Bob ensures that uncongested links are used on the path back to JP_Alice.
V. RESILIENCE TOWARDS ATTACKS
We discuss Nemor's anonymity and its resilience against known attacks on anonymous systems. Traffic analysis attacks [9][12][16][18] launched by colluding peers are a major concern for many anonymity protocols such as TOR. We therefore also consider similar attacks that are specifically engineered to target Nemor and explain why they are ineffective against it.
A. Nemor's Anonymity
Nemor's anonymity depends on an attacker not being certain whether a peer is the requesting (or serving) peer. We discuss the anonymity of the requesting peer Alice when all the other peers (including Bob) can be malicious and colluding; the anonymity of Bob can be worked out similarly. Assume that in a tree of $X$ peers, there is a set of malicious or compromised peers, possibly including Bob, who collude to try to identify the requesting peer. Through collusion and inference, they identify a subset of $x$ peers known to them. Suppose this subset consists of $k$ good peers (neither malicious nor colluding), so there are $x - k$ malicious peers, with $1 \le k \le x \le X$. However, the total number of peers $X$ is unknown to all the peers.
So the attackers do not know for certain that Alice, or even any one of the $k$ good peers in their known community, is the requesting peer; the requesting peer may very well be one of the remaining $X - x$ peers that are unknown to the attackers. Let the information content [13] for determining that Alice is the requesting peer be $I = -\log p$, where $p$ is the probability that Alice is the requesting peer; $I$ measures the uncertainty of Alice being the requesting peer. With the information on $k$ good nodes and $x - k$ malicious nodes, an attacker can only make a random guess that Alice is the requesting peer from among the $X - (x - k) = X - x + k$ good nodes on the tree. Hence, the information content needed to identify Alice is:
$$I = -\log p = -\log \frac{1}{X - x + k} = \log(X - x + k)$$
However, the attackers do not know $X$. Approaches to estimate the size of a P2P network (i.e., $X$) exist [22][27], but they are all based on active probing and rely on the cooperation of all nodes for accurate estimation. Hence, we have:
Proposition 1: The information content for an attacker to identify the requesting or serving peer is unbounded in Nemor.
Note that Nemor's anonymity differs from the commonly used notion of k-anonymity, where Alice is one of a set of $k$ indistinguishable peers who are all known to an attacker; the attacker can then make a random guess of Alice's identity and be correct with probability $1/k$. Nemor's design ensures that an attacker cannot even determine whether Alice (or Bob) is among a community of $k$ known peers; hence, Nemor's anonymity is stronger than k-anonymity. The key to Alice's anonymity is that she may have good neighbors that may in turn be connected to more good peers, all of which are unknown to the attackers. Consequently, upon receiving a request message from Alice, an attacker cannot be certain whether Alice is the requesting peer or whether the request came from one of the good peers connected to Alice. Next, we discuss attacks that target Nemor's anonymity.
B. Passive Probabilistic Attack on Alice
Consider the case where a sufficient number of nodes in Alice's tree are malicious and can observe or infer all the random walks that traverse Alice (either originating at or transiting through her). Also suppose, contrary to Nemor's design, that all peers execute the random walk process with the same jumping probability $p$. Alice is anonymous as the requesting peer because she may be forwarding messages for nodes unknown to the attackers. In this attack, the attackers attempt to rule out this possibility. Suppose the observed portion of a random walk from Alice has length $L$. If the attackers can guess with high probability that the random walk indeed has length $L$, they succeed in determining that Alice is the requesting peer. Then:
Lemma 1: If there is an observed random walk of length $L$, then the probability that the first peer is the requesting peer is $p$.
Sketch of Proof: Let $X$ be the length of a random walk. Then
$$\Pr(X = L \mid X \ge L) = \frac{\Pr(X = L,\ X \ge L)}{\Pr(X \ge L)} = \frac{\Pr(X = L)}{\Pr(X \ge L)},$$
$$\Pr(X = L) = (1-p)^{L}\,p, \qquad \Pr(X \ge L) = \sum_{i=L}^{\infty} (1-p)^{i}\,p = (1-p)^{L}.$$
Therefore, $\Pr(X = L \mid X \ge L) = p$.
Consequently, given an observed random walk, an attacker can guess that the first node is the requesting peer and be correct with probability $p$, independent of the length of the walk. Now suppose that $r$ random walks from Alice have been observed. Each time an attacker guesses that Alice is the requesting peer, the probability that Alice is actually the requesting peer is $1 - (1-p)^{r}$. This converges to 1 as $r$ grows, meaning an attacker knows with high probability that Alice is the requesting peer, breaking Alice's anonymity. The root cause of this attack is the false assumption that every peer picks the same jumping probability. Nemor's strength is that each peer picks its jumping probability locally and independently, and the congestion avoidance mechanism further increases the attackers' uncertainty. Hence, Nemor is resilient to such analysis attacks.
C. Active Attacks on Alice's Tree
Attackers in Alice's tree can collude and collect information about the tree to try to identify the requesting peer. This falls under the general class of traffic analysis attacks. We describe several such attacks and show why attacks on the tree characteristics cannot break Nemor's anonymity.
1) Attacks on Nemor's Congestion Avoidance Mechanism
Attackers may target Nemor's core congestion avoidance component to compromise Alice's anonymity. Alice can conceal herself as a requesting peer only if the attackers are not sure whether she has other good neighbors. Hence, it is in the attackers' best interest to identify (or exhaust) all of Alice's neighbors through what we call a neighbor exhaustion attack. In this attack, which resembles the surround-target attack [2], malicious neighbors of Alice repeatedly send dummy request messages to a colluding LP through Alice. When Alice tries to forward a request to any of her malicious neighbors, they reject it by falsely claiming to be congested. If Alice has no other good neighbors, she is forced to jump to the colluding LP. Over time, the attackers learn that, with high probability, Alice has no other good neighbor. Later, when Alice sends a request, since none of her malicious neighbors initiated it, they have high confidence that Alice is the requesting peer, breaking Alice's anonymity. However, in Nemor, Alice would actually decline to forward such repeated requests because she becomes congested by them (see Section IV.4). This, coupled with dynamic trees, makes it impossible for attackers to succeed in such an attack within a reasonable time.
2) Identifying Members of a Tree
A variant of the previous attack is one in which the attackers attempt to identify all the good nodes in Alice's tree. Once the attackers know all the members of a tree, it becomes easier for them to identify Alice when she makes a request. To do so, the attackers could again initiate a large number of requests destined to a colluding LP, whose job is to collect the identities of all the JPs. With a sufficient number of requests, every node in the tree eventually jumps at least once and exposes itself to the attackers. However, Nemor is not vulnerable to this attack for three reasons. First, it uses random walks, so attackers must send a massive number of requests to cause all the nodes to jump, but a valid request authorized by the tracker is expensive.
Second, nodes discard repeated requests and decline to participate in a path when they are congested. Third, the dynamic tree adds further protection.

3) Attack to Determine the Tree Topology

Attackers may even attempt to use the previous approach to derive the complete topology of Alice's tree. To do so, in addition to building a list of JPs, the colluding LP also keeps a count of the number of jumps from each JP. With these counts, the attacker tries to construct a hidden Markov chain. However, whether a jump is observed from a node depends both on a request reaching that node and on that node's locally chosen jumping probability. To reconstruct the tree, the attacker would need to determine the transition probabilities of the Markov chain. This is an ill-posed problem that is very hard to solve, especially in a short time. Nodes also drop repeated requests, making the attack more difficult.

4) Timing Analysis Based Attack

By flooding a large number of requests into Alice's tree, an attacker can analyze how long it takes a request to reach a JP to infer which nodes are the attacker's children, grandchildren, and so on. The attacker may then attempt to use this information to determine the requesting peer. Nemor can use common techniques, such as introducing random delays at each node [3], to thwart such timing analysis. Also, peers drop repeated requests and may refuse to participate in a path because of potential congestion. Attackers might also measure the delays between requests coming out of Alice. Alice should therefore not send out many requests at the same time. Instead, she should leave some time between them to mimic the processing delay of relaying a request.

D. Predecessor Attack

The predecessor attack [25] is an effective attack on many anonymity protocols, in which an attacker tries to identify the requesting peer by forcing it to rebuild paths. The attacker forces the content path to be repeatedly rebuilt by interrupting the transfer (dropping the request or the connection).
The requesting node is the only node common to all the rebuilt paths, which eventually exposes its identity. Nemor, interestingly, is immune to this attack. When a transfer fails after a couple of retries, Alice reinitiates a fresh request from the tracker with a new path ID. An attacker cannot correlate the path for this new request with earlier observations to identify Alice. Also, since the Nemor overlay is a tree, all retries of a request from Alice can go through the same neighbor only if Alice's jumping probability is small and this neighbor alone has sufficient bandwidth. When Alice forwards all the requests to the same neighbor, this neighbor will be the only node the attackers see on all the rebuilt paths. This implies that the predecessor attack does not expose Alice.

E. Attacks against the Tracker

In Nemor, we assume that there is a trusted tracker that manages tree membership and assists in locating a content-serving peer. One might argue that the tracker could come under attack, that its content may be polluted, or that it is a single point of failure. But these are common and well-studied issues with centralized servers and are not unique to Nemor. The tracker can easily be implemented as a cluster of servers or as a DHT. Fault tolerance mechanisms such as replication and load balancing can address reliability concerns, and there are well-known techniques to keep data consistent among server clusters or in a DHT. Numerous methods exist to prevent the tracker from being compromised and to secure its contents against unauthorized access or pollution. The tracker plays a role similar to that of a Domain Name System (DNS) server. Again, there is a wealth of work on DNS reliability and security. Known techniques, such as those that prevent DNS cache poisoning [7], can be applied to the tracker.

F. Other Attacks

Attacks from a global observer, such as global traffic analysis [17], the traceback attack [12], and timing analysis [9][16][18][19], are not applicable to Nemor, because such attacks are impractical in our global Internet environment. The nondeterministic duration of each random walk makes it difficult, or even infeasible, to correlate events for timing analysis. Replay attacks can be handled with timestamps and nonces. Cryptanalysis and content corruption aimed at uncovering or scrambling the message content can be handled using strong encryption schemes and error checking techniques. There is a host of attacks on general network services (e.g., DoS, degrading QoS, compromising nodes [9]). However, these are not unique to Nemor, and known techniques for defending against them can be applied here.

VI. PERFORMANCE EVALUATION

We implemented Nemor and compared its performance with existing TOR and OneSwarm implementations on PlanetLab. We also compared the performance of Nemor and TOR at large scale using detailed simulations. Our results validate our claim that Nemor not only offers anonymity to both requesting and serving peers but is also efficient, making it suitable for distributing all forms of content.

A. PlanetLab Experiments

We implemented Nemor in C and deployed it on PlanetLab. We set up a tracker that keeps track of the content stored at each node and of tree membership. When queried by a requesting peer, the tracker returns the information the peer needs to reach a potential serving peer. We compared Nemor with an HTTP client/server setup, TOR, and OneSwarm. In the HTTP setup, each node ran its own HTTP client and server, so the requesting peer could download content directly from the serving peer. This direct P2P transfer serves as our baseline (best-case) performance. We compiled a modified version of the OneSwarm implementation [30] (to bypass its GUI) and deployed it. OneSwarm has two types of content search, text-based and hash-based. Since text-based search incurs more rounds of communication, we used hash-based search to get the best performance from OneSwarm. We also set up our own private TOR network using the unmodified TOR implementation [29]. All nodes ran local TOR proxies that communicated with each other to form a TOR overlay. Each node ran its own hidden service through a local HTTP server, so an HTTP client configured to use its local TOR proxy could request content anonymously through this TOR network. A tracker was set up to provide the hidden service URL for content requests. Note that our TOR network was completely isolated from the heavily used public TOR network. We also took measures to obtain the best performance from TOR (e.g., we avoided known problems with TOR such as the congestion at exit nodes reported in [26]). We used 50 PlanetLab nodes for our experiments.
In all cases, the same 25 nodes were used as requesting peers, while the remaining nodes acted as serving peers. In the first phase, the nodes formed an overlay. In Nemor, a newly joined node contacted the tracker, which picked 3 trees (biased towards smaller trees) and a random node in each tree for the new node to connect to. The tracker started with 3 trees and a soft upper bound of 3 on tree size. As more nodes joined the network, both the number of trees and the tree size grew. With 50 nodes, Nemor constructed 12 trees of about 12 nodes each. In OneSwarm, a new node contacted the tracker, which randomly selected 3 existing nodes for it to peer with. TOR's implementation constructed the overlay automatically. A driver program orchestrated the experiment by commanding a node to request a particular video at a specific time. We report results for an experiment where 25 requesting peers all made different video requests at the same time (a flash crowd situation). Each video was 25 min long and was divided into 50 chunks (3.75 MB each). We pre-populated every video chunk at one random serving peer. The requesting peer downloaded chunks 4 at a time and in order. For each chunk, both Nemor and TOR needed to contact their tracker first for serving peer information. For OneSwarm, we assumed the requesting peer knew the hash value of the chunk, so it could flood the content request immediately. We kept detailed logs to calculate various performance metrics. We repeated every experiment 5 times to minimize the variability due to transient network conditions on PlanetLab and report aggregated results from all these runs.

1) Comparison of Video Transfer Time

Figure 3. Cumulative distribution of video transfer time.
In the first result, we measured the video transfer time for the baseline, OneSwarm, TOR, and Nemor with a congestion avoidance threshold of 6 (i.e., a peer participating in 6 or more transfers considers itself congested). Figure 3 shows the cumulative distribution of the video transfer times for the 25 video requests across the 5 runs. In the baseline, all the videos are transferred within 553 sec, a worst-case goodput of 2.7 Mbps. Nemor performs very close to the baseline: 77.6% of its transfers complete within 553 sec, with a maximum of 901 sec and an average of 427 sec. Nemor provides a goodput of at least 1.66 Mbps, indicating that the price paid for strong mutual anonymity is small. OneSwarm and TOR perform progressively worse. OneSwarm needs 1661 sec to complete all transfers, with an average of 821 sec. TOR needs 2722 sec in the worst case, with an average of 1403 sec. Their worst-case goodputs are 0.9 Mbps and 0.55 Mbps, respectively. On average, Nemor is almost twice as fast as OneSwarm, more than three times as fast as TOR, and only 31% slower than the baseline, which offers no anonymity.

2) Distribution of the Number of Hops

Figure 4. Distribution of number of hops traversed by chunk.

The key difference between the baseline and the anonymity protocols is that the latter use multiple overlay hops to deliver content. Hence, the number of hops and the congestion at each hop dictate the differences in performance. To understand this, we compared the distribution of the number of hops that a chunk traverses in Nemor and in OneSwarm. Figure 4 shows that the number of hops in Nemor varies from 1 to 9 with an average of 2.33, while in OneSwarm it ranges from 1 to 7 with an average of 2.55. TOR is not shown because it always uses the same number of hops. A wide range of hop counts makes it difficult for a malicious peer to guess the requesting or the serving peer. Nemor maintains a wider range of path lengths, which adds to its anonymity, without compromising efficiency, by (1) ensuring that the vast majority of transfers (> 93%) traverse fewer than 4 hops, and (2) using the congestion avoidance mechanism to spread the load more evenly across the nodes.

3) Comparison of Uplink Utilization

The uplink bandwidth at each peer in a P2P system is a vital resource. Nemor's congestion avoidance mechanism uses peer uplinks judiciously and seeks to spread the load across participating nodes to avoid traffic concentration. We compare the uplink utilization of peers in Nemor and OneSwarm by having each peer keep track of the number of transfers that it serves and forwards over time. We plot the cumulative distribution of the maximum number of transfers at each peer in Figure 5.

Figure 5. Cumulative distribution of max uplink utilization.

We see that uplink utilization in Nemor is balanced across the links in the overlay. Less than 23% of the nodes transfer 5 chunks concurrently on a given uplink, and over 90% of the nodes transfer no more than 10 chunks concurrently. Few nodes transfer a large number of chunks in parallel, so the maximum uplink utilization is also small. With OneSwarm, although about 70% of the nodes transfer at most 5 chunks in parallel, 8.4% of the nodes transfer 10 or more, with some nodes transferring as many as 35 chunks concurrently. This indicates that in OneSwarm, nodes can become hot spots congested by a large number of transfers.

4) Comparison of Chunk Response Time

In applications such as video streaming, the response time, i.e., the time between when a request is made and when the first bit of data is received, is critical. It tells how long the user has to wait for the content, or how far in advance a request has to be made so that the content is available when it is needed. We show the distribution of chunk response time in Figure 6. Note that this time does not include the time for contacting the tracker to identify a serving peer. In the baseline, the response time for 90% of the chunks is less than 0.9 sec, and the maximum is 22.2 sec. Nemor's response time is worse than the baseline, with 90% of the chunks served within 19.7 sec and a maximum of 293 sec. TOR and OneSwarm take longer, with 90% of requests taking less than 33 sec in TOR and 57.5 sec in OneSwarm. The maximum response times for TOR and OneSwarm are 1177 and 1295 sec, respectively, which is undesirable because it means they need a larger playout buffer. Small trees in Nemor allow a request to quickly jump to its landing point and then be flooded into Bob's tree. The response is also returned quickly to Alice, so the resulting path is built very quickly, which yields shorter response times. With TOR, a new circuit (potentially with new TCP connections) has to be built for each request. This process introduces a long delay, which is inefficient for P2P transfers where many small pieces of data are obtained from different sources. In fact, using 30-second chunks only works in TOR's favor, since fewer circuits need to be built. OneSwarm performs poorly despite the advantage of using hash-based search. With text-based search, its response time would have been far worse.

Figure 6. Cumulative distribution of chunk response time.
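The worst-case goodput figures reported in 1) follow directly from the video size (50 chunks of 3.75 MB, i.e., 187.5 MB) and the maximum transfer times. The sketch below recomputes them, assuming decimal units (1 MB = 10^6 bytes, 1 Mbps = 10^6 bit/s). The constant and function names are our own. It also checks the implied video bitrate and the playback time per chunk.

```python
# Assumption: decimal units (1 MB = 10**6 bytes, 1 Mbps = 10**6 bit/s).
CHUNKS_PER_VIDEO = 50
CHUNK_MB = 3.75
VIDEO_SEC = 25 * 60                                   # 25-minute videos

VIDEO_MEGABITS = CHUNKS_PER_VIDEO * CHUNK_MB * 8      # 1500 Mb per video


def goodput_mbps(transfer_sec: float) -> float:
    """Goodput of one complete video transfer, in Mbps."""
    return VIDEO_MEGABITS / transfer_sec


# Worst-case (maximum) video transfer times, in seconds.
worst_case_sec = {"baseline": 553, "Nemor": 901, "OneSwarm": 1661, "TOR": 2722}
for name, sec in worst_case_sec.items():
    print(f"{name:<9} {goodput_mbps(sec):.2f} Mbps")

# Video bitrate and playback time per chunk implied by the setup.
print(f"bitrate: {VIDEO_MEGABITS / VIDEO_SEC:.1f} Mbps, "
      f"{VIDEO_SEC / CHUNKS_PER_VIDEO:.0f} sec of video per chunk")

# Average transfer times relative to Nemor.
avg_sec = {"Nemor": 427, "OneSwarm": 821, "TOR": 1403}
for name in ("OneSwarm", "TOR"):
    print(f"{name}/Nemor: {avg_sec[name] / avg_sec['Nemor']:.2f}x")
```

Running it prints goodputs of 2.71, 1.66, 0.90, and 0.55 Mbps. It also prints a 1.0 Mbps bitrate with 30 sec of video per chunk, and average-time ratios of 1.92x and 3.29x. These are consistent with "almost twice" and "more than three times".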
5) Nemor's Sensitivity to Congestion Avoidance Threshold

To understand the impact of the congestion avoidance threshold on Nemor's performance, we ran experiments with this threshold set to c = {2, 6}. Video transfer times for the different thresholds are presented in Figure 7. Transfer times are better with c = 2, because transfers then avoid nodes with 2 or more ongoing transfers, resulting in higher goodput for each transfer. Note, however, that the degradation with c = 6 is not significant: the average transfer time increases by only 73 sec. Chunk response times for the two thresholds are not very different either.

Figure 7. Video transfer time for different congestion avoidance thresholds.

6) Nemor's Sensitivity to Jumping Probability

The jumping probabilities picked by peers have a direct impact on path lengths and hence on Nemor's performance. We studied this impact by varying the upper limit of the range from which jumping probabilities are drawn. For example, with an upper limit of 0.25, all nodes pick their jumping probabilities p from the range [0.0, 0.25]. For lack of space, we only summarize the results. Higher jumping probabilities lead to shorter video transfer times. When p is drawn from [0.0, 1.0], transferring a video takes only 77% of the time it takes when p is drawn from [0.0, 0.25]. This is expected, because with a higher range, nodes are more likely to jump than to forward, resulting in shorter paths and hence shorter transfer times. In fact, the average path length is 2.33 when p is drawn from [0.0, 1.0] and 3.65 when p is drawn from [0.0, 0.25]. Even when p is drawn from [0.0, 0.25], Nemor performs better than OneSwarm and TOR.

B. Trace-driven Simulation

We use simulation results to show the scalability and robustness of Nemor at large scale. We implemented an event-driven simulator of the Nemor protocol and compared it against TOR and a baseline with no anonymity.

Input Trace: We used a request trace from a nationally deployed Video-on-Demand service. The trace contained requests from the busy period of a day, with about 17K requests for videos from 11K clients. The videos varied in type and genre, from very short trailers to full-length movies.

System Parameters: The network had 1000 nodes. Each node had a 1 Mbps uplink and a 24 Mbps downlink (typical of VDSL). For consistency, we used the same values as in the PlanetLab experiments for other parameters, such as the fraction of requesting nodes (50%) and the number of parallel chunk requests (4). We ran micro-benchmarks to measure actual request processing times. They showed that it takes 150 µsec to encrypt and 1600 µsec to decrypt one KB of data using asymmetric keys (used by TOR to build circuits). In contrast, it takes 9 µsec for both encryption and decryption of one KB of data using symmetric keys (used in Nemor). The simulator used these values to account for processing overhead at each node.
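To put the micro-benchmark figures in perspective, the sketch below computes the cost of encrypting a message at one node and decrypting it at the next. It assumes that the 9 µsec symmetric figure applies to encryption and to decryption separately, which is our reading of the text. It also assumes that cost scales linearly with size. The 512 KB size is only illustrative, and this is not a description of how the simulator charges overhead.

```python
# Micro-benchmark figures, in microseconds per KB.
ASYM_ENC_US_PER_KB = 150    # asymmetric encryption (used by TOR to build circuits)
ASYM_DEC_US_PER_KB = 1600   # asymmetric decryption
SYM_US_PER_KB = 9           # symmetric; assumed to apply to each direction


def crypto_cost_us(size_kb: float, enc_us_per_kb: float, dec_us_per_kb: float) -> float:
    """Encrypt at the sender plus decrypt at the receiver, for one hop (linear in size)."""
    return size_kb * (enc_us_per_kb + dec_us_per_kb)


asym_1kb = crypto_cost_us(1, ASYM_ENC_US_PER_KB, ASYM_DEC_US_PER_KB)
sym_1kb = crypto_cost_us(1, SYM_US_PER_KB, SYM_US_PER_KB)
sym_chunk = crypto_cost_us(512, SYM_US_PER_KB, SYM_US_PER_KB)  # illustrative 512 KB

print(f"1 KB, asymmetric: {asym_1kb} us")
print(f"1 KB, symmetric:  {sym_1kb} us ({asym_1kb / sym_1kb:.0f}x cheaper)")
print(f"512 KB, symmetric: {sym_chunk / 1000:.1f} ms per hop")
```

Under these assumptions, 1 KB costs 1750 µsec with asymmetric keys and 18 µsec with symmetric keys, a factor of about 97. Even 512 KB of symmetrically encrypted data adds only about 9.2 msec per hop.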
We assume that the propagation delay between nodes ranges from 5 to 20 msec. We ran each experiment 3 times and report the mean and the 95% confidence interval.

1) Chunk Transfer Time

The transfer time from the simulation indicates the efficiency of the protocol at large scale and its ability to avoid traffic concentration and balance load. Figure 8 plots the time to transfer a chunk in Nemor and TOR, normalized to the baseline. We vary the number of nodes from 200 to 1000 (50% active). For Nemor, we examine two congestion avoidance threshold values, c = {2, 6}. With c = 2, Nemor takes only about 4% extra time to transfer a chunk (17.36 sec compared to 16.68 sec at 800 nodes). With c = 6, Nemor needs less than 30% extra time compared to the baseline. TOR, however, needs 46 sec, nearly three times the baseline.

2) Effect of System Load

We study the effect of increasing system load by increasing the fraction of active peers (peers that request content) from 20% to 80%. An ideal protocol should handle the increased load gracefully. We varied the number of nodes in the network from 200 to 1000 and measured the chunk transfer time of Nemor and TOR, shown in Figure 9. The three lines for Nemor are practically indistinguishable, which means that its transfer time is almost independent of system load. TOR's performance, however, deteriorates rapidly as the system load increases: TOR requires 1.5 times the baseline with 20% active nodes and over 4 times the baseline with 80% active nodes. Note also that Nemor with 80% active nodes performs better than TOR with 20% active nodes (~1.04 vs. 1.5). This is because Nemor's congestion avoidance mechanism distributes load better by routing around congested nodes. Nemor also has a smaller average path length. The lower chunk transfer time is self-reinforcing: transfers finish quickly, which reduces the load on the system sooner.
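Routing around congested nodes, combined with a per-peer jumping probability, can be illustrated with a toy model. The sketch below assumes that a peer first decides locally whether to jump to the landing point. Otherwise it forwards the request to a random neighbor that is not congested, and it falls back to jumping when every neighbor declines. The check order and all names (`Peer`, `next_hop`, `random_walk`, `max_hops`) are hypothetical; Section IV of the paper defines the actual protocol. The threshold semantics follow the text: with threshold c, a peer in c or more transfers considers itself congested.

```python
import random
from dataclasses import dataclass, field


@dataclass(eq=False)
class Peer:
    """A tree member in a toy model of the walk (hypothetical, not Nemor's C code)."""
    name: str
    jump_prob: float                # picked locally and independently by each peer
    active_transfers: int = 0       # transfers this peer is currently serving or relaying
    neighbors: list = field(default_factory=list, repr=False)

    def is_congested(self, threshold: int) -> bool:
        # With threshold c, a peer in c or more transfers considers itself congested.
        return self.active_transfers >= threshold


def next_hop(peer: Peer, threshold: int, rng: random.Random):
    """Return ("jump", None) or ("forward", neighbor) for a request held by `peer`."""
    if rng.random() < peer.jump_prob:
        return "jump", None
    # Congested neighbors decline the request, so only the others are candidates.
    candidates = [n for n in peer.neighbors if not n.is_congested(threshold)]
    if not candidates:
        return "jump", None         # no good neighbor left: fall back to jumping
    return "forward", rng.choice(candidates)


def random_walk(start: Peer, threshold: int, rng: random.Random, max_hops: int = 64):
    """Apply next_hop until some peer jumps; return the peers visited, in order."""
    path = [start]
    for _ in range(max_hops):
        action, nxt = next_hop(path[-1], threshold, rng)
        if action == "jump":
            break
        path.append(nxt)
    return path


# Usage: Bob is congested under c = 2, so Alice's request goes to Carol, who jumps.
rng = random.Random(1)
alice = Peer("alice", jump_prob=0.0)
bob = Peer("bob", jump_prob=1.0, active_transfers=3)
carol = Peer("carol", jump_prob=1.0)
alice.neighbors = [bob, carol]
print([p.name for p in random_walk(alice, threshold=2, rng=rng)])  # ['alice', 'carol']
```

With c = 6, Bob would no longer count as congested and could be chosen as well. This mirrors how a lower threshold steers traffic away from busy peers.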
3) Effect of Chunk Size

We study the effect of chunk size on the transfer time of a video (750 MB at 1 Mbps) by varying the chunk size from 256 KB to 4 MB, as shown in Figure 10. For the baseline, the transfer takes 97 min (less than the 100-min length of the video), and the effect of chunk size is small. Both anonymity protocols perform a little better as the chunk size increases. This is expected: as the chunk size increases, there are fewer chunks and hence fewer paths to set up, which means less control overhead. With Nemor and c = 2, it takes only 104 min to transfer the video, a little more than the baseline. With c = 6, the total transfer time increases to a little over 2 hr. TOR, however, performs much worse and requires about 5 hr to deliver the video even in its best case.

4) Control Overhead

We measure the control traffic each protocol generates to download a chunk as the number of nodes increases. TOR is unaffected, because the number of control messages for building a circuit does not change with network size. In contrast, Nemor's overhead (random walks plus flooding) increases with the number of nodes. This increase grows logarithmically with the number of nodes, because the size of an individual tree grows logarithmically with the number of nodes. However, the total volume of control traffic is small (17 KB in the worst case). The control message overhead per chunk is ~13% when the chunk size is 256 KB but drops to ~3% when the chunk size is 512 KB. We conclude that the control overhead in Nemor is well within acceptable limits.

5) Handling Churn in the P2P Network

We study the effectiveness of Nemor's failure recovery mechanism by synthesizing a high rate of node churn. We use a network of 1000 nodes, with 50% of them actively requesting data. Node arrivals and departures follow a Poisson process, with mean arrival and departure rates of 1 per minute. Out of the 346K chunk transfers, 176 were interrupted due to node churn.
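The churn workload described above can be synthesized as two independent Poisson processes, one for arrivals and one for departures, each with exponentially distributed inter-event times. The sketch below is our own illustration. `poisson_churn` is a hypothetical helper, not part of the authors' simulator. It only generates event times and does not model tree repair.

```python
import random


def poisson_churn(rate_per_min: float, horizon_min: float, rng: random.Random):
    """Return sorted (time_min, kind) events from two independent Poisson processes."""
    events = []
    for kind in ("arrival", "departure"):
        t = rng.expovariate(rate_per_min)       # exponential inter-event gaps
        while t < horizon_min:
            events.append((t, kind))
            t += rng.expovariate(rate_per_min)
    events.sort()
    return events


rng = random.Random(7)
hour = poisson_churn(rate_per_min=1.0, horizon_min=60.0, rng=rng)
arrivals = sum(1 for _, kind in hour if kind == "arrival")
print(f"{arrivals} arrivals, {len(hour) - arrivals} departures in one hour")

# Fraction of chunk transfers interrupted in the reported experiment.
print(f"{176 / 346_000:.3%} of transfers interrupted")   # 0.051%
```

At a rate of 1 per minute, each process averages about 60 events per hour. The reported 176 interruptions out of 346K transfers correspond to roughly 0.05% of transfers.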
The average time to repair a tree affected by a node departure is about 64 msec. No chunk transfer was interrupted more than once. The extra delay that a node departure adds to a transfer is smaller than the delay caused by a node failure, in which case nodes have to wait for a TCP timeout before learning that a neighbor has left the system. Although the tree repair time is small, the average transfer time of a chunk (512 KB) increased by about 2 sec (from 17 sec to 19 sec) after we introduced node churn. This is because, in addition to the tree repair time, churn causes congestion, which increases transfer time.

VII. CONCLUSION

We proposed Nemor, a protocol that ensures the anonymity of a requesting peer and a serving peer, and the confidentiality of the content exchanged, in a provider-managed P2P setting. Nemor builds on the strengths of existing protocols while