LessLog: A Logless File Replication Algorithm for Peer-to-Peer Distributed Systems

Kuang-Li Huang, Tai-Yi Huang and Jerry C. Y. Chou
Department of Computer Science, National Tsing Hua University
Hsinchu, Taiwan 3, R.O.C
{klhuang, tyhuang, cychou}@cs.nthu.edu.tw

Abstract

The technique of replicating frequently accessed files to other nodes is widely used in high-performance distributed systems to reduce the load of the nodes hosting these files. Traditional file replication algorithms rely on the analysis of client-access logs to determine the location of the replicated nodes. In this paper, we present LessLog, a logless file replication algorithm developed for peer-to-peer distributed systems. We first construct a lookup tree for each node. LessLog uses bitwise operations to determine the location of the replicated node without any client-access history. In addition, each replication is guaranteed to reduce the workload of the replicating node by half when requests are evenly distributed. A fault-tolerant LessLog model is also presented. The experimental results show that LessLog successfully and efficiently reduces the load of overloaded nodes.

1 Introduction

Recently, a number of large-scale servers have been implemented as peer-to-peer (P2P) distributed systems [1, 5, 8, 11]. The technique of replicating frequently accessed files to other nodes is widely used to reduce the load of overloaded nodes and improve system performance. Traditional file replication algorithms determine the location of the replicated nodes by carefully analyzing client-access logs. These log-based approaches consume extra system resources such as disk storage and memory. In addition, analyzing client-access logs is both a CPU-intensive and an I/O-intensive task that further hinders system performance. In this paper, we present LessLog, a logless file replication algorithm, to construct a high-performance, load-balanced, and fault-tolerant file system for P2P distributed systems.
LessLog first uses a template tree to build a unique binomial lookup tree for each node. The binomial lookup tree bounds the lookup time at O(log N) in an N-node P2P system. Based on the properties of the template lookup tree, we next use bitwise operations to determine the nodes to which popular files should be replicated when a node is overloaded. The determination of replicated nodes requires no client-access information. In addition, each replication is guaranteed to reduce the workload of the overloaded node by half if requests are evenly distributed. To achieve fault tolerance, LessLog divides a lookup tree into $2^b$ independent and identical subtrees, where $N \le 2^m$ and $b \le m$. Each file is stored as $2^b$ copies in the system. LessLog guarantees fault tolerance for the system as long as the $2^b$ nodes storing the same file do not fail simultaneously. Finally, we provide an automatic recovery mechanism to maintain the integrity of LessLog when nodes join or leave the system.

Figure 1: The virtual lookup tree of a 16-node system

We conducted a series of experiments to compare LessLog with two commonly used file replication methods by the number of replicas created to achieve load balance. The experimental results show that LessLog creates significantly fewer replicas than a random-replication method and slightly more replicas than a log-based method that determines the replicated node by carefully analyzing client-access logs. By implementing a simple counter-based mechanism to remove replicas that are not frequently accessed, we can further reduce the number of replicas created by LessLog.

The rest of this paper is structured as follows. Section 2 describes the system model when $N = 2^m$. Section 3 extends the discussion to the cases when $N < 2^m$. A fault-tolerant LessLog model is presented in Section 4. Section 5 discusses the recovery mechanism when nodes join or leave. The experimental results are presented in Section 6. Section 7 discusses related work. Finally, Section 8 concludes this paper and discusses future work.

2 System Model

We first present the system model and then describe a set of file operations: inserting, getting, replicating, and updating a file. To simplify the discussion, we assume that there are N nodes in the system, where $N = 2^m$. We will remove this assumption later.
2.1 Lookup Tree

For each node in a LessLog system, we first assign it a unique identifier between 0 and $N - 1$. We call this identifier the physical identifier, or PID, of the node. The PID can be assigned randomly or by any user-specified method, as long as no two nodes have the same PID. We use P(i) to denote the node with PID = i.

Figure 2: The binomial lookup tree of P(4) in a 16-node system (node labels, PID:VID: 4:1111, 5:1110, 6:1101, 0:1011, 12:0111, 7:1100, 1:1010, 13:0110, 2:1001, 14:0101, 8:0011, 3:1000, 15:0100, 9:0010, 10:0001, 11:0000)

We construct a unique lookup tree for each node. A lookup tree is a binomial tree consisting of all N nodes. When a node receives a request, we first determine the target node storing the requested file. The node uses the lookup tree of the target node to route the request towards the root of that tree, which is the target node itself. Forwarding continues until the request reaches a node storing the requested file or the target node. Figure 2 shows the lookup tree of P(4) as an example. The top number indicated in each node is the PID of the node. When P(8) receives a request whose target node is P(4), it routes the request to P(0), which in turn routes the request to P(4), if no replicated copy is found on the forwarding path.

We use a virtual lookup tree to construct each of the N physical lookup trees. The node identifier used in the virtual lookup tree is called the VID of the node. For clarity, we present VIDs in binary and PIDs in decimal. The VID of the root node of the virtual lookup tree consists of m consecutive 1 bits. We use Property 1 to construct the virtual lookup tree. The virtual lookup tree exhibits Properties 2 and 3.

Property 1: A node has i child nodes if and only if its VID begins with exactly i consecutive 1 bits; the VID of each of the i child nodes is obtained by converting one of these i consecutive 1 bits to 0.

Property 2: Given a node and its VID, we obtain the VID of its parent node by converting the leftmost 0 bit of its VID to 1.

Property 3: The node of VID i has at least as many offspring nodes as the node of VID j if i > j.

The VID binomial tree shown in Figure 1 is the unique virtual lookup tree of a 16-node system. Since m = 4 in this case, the VID of the root node is $1111_2$.
The node of VID $1110_2$ has 3 child nodes; their VIDs are $0110_2$, $1010_2$, and $1100_2$. For each of these child nodes, converting the leftmost 0 bit to 1 gives the VID of the parent node, $1110_2$. Finally, the nodes of VID $1110_2$ and $1101_2$ have 7 and 3 offspring nodes, respectively.

To construct the physical lookup tree of P(k), we first obtain the bitwise complement of k, denoted by $\bar{k}$. We next obtain the PID of each node in the physical lookup tree by computing $\bar{k} \oplus \mathrm{VID}$ for each VID in the virtual lookup tree, where $\oplus$ stands for the XOR operation. To construct the physical lookup tree of P(4) shown in Figure 2, we first obtain $\bar{4} = 1011_2$. We next compute $1011_2 \oplus \mathrm{VID}$ for each VID in the virtual lookup tree to obtain the PID of each node. Because XOR with a fixed value is a one-to-one and onto mapping, we map one virtual lookup tree to N different physical lookup trees using N different complements. For simplicity, we use the term lookup tree to refer to the physical lookup tree unless otherwise specified. Because $A \oplus B = C$ implies $A \oplus C = B$, we observe Property 4. Finally, we define ψ as a hash function that takes a string as input and outputs a number between 0 and $2^m - 1$.

Property 4: Given the PID of the root node of a lookup tree, we can convert between PID and VID for each node in the lookup tree.

2.2 File Operations

LessLog provides a high-performance file-replication system without any client-access history. Because of the four properties described earlier, the implementation of each file operation below requires only the PID of the node that receives the request and the PID of the target node of the requested file. We determine the PID r of the target node of a requested file by r = ψ(f), where f is information that uniquely identifies the requested file, such as its URL.

Inserting File

Inserting a file is fairly simple in this system model. Let a client issue an insert request to P(k). Upon receiving the request, P(k) first determines the target node P(r) of the request. It next forwards the request to P(r), which then saves the file in its local storage system.

Getting File

We use a lookup tree to resolve a request that accesses a file. Let a client issue an access request to P(k). P(k) first determines the target node P(r) of the request. P(k) next checks whether it has a replicated copy of the requested file, and returns the file directly to the client if a replicated copy is found. Otherwise, P(k) forwards the request to its parent node in the lookup tree of P(r).
Forwarding continues until a replicated copy is found and returned or the target node is reached. P(k) obtains the PID of its parent node in three steps: (1) obtaining its own VID in the lookup tree of P(r) by Property 4, (2) obtaining the VID of its parent node by Property 2, and (3) converting the VID of its parent node to a PID by Property 4. The GETFILE algorithm is outlined below, where this refers to the current node, r = ψ(f), and $FP^r_k$ denotes the PID of the parent node of P(k) in the lookup tree of P(r).

    GETFILE(f)
      k ← this.pid
      if this.FILEEXISTED(f) = true
        then this.RETURNFILE(f)
        else P($FP^r_k$).GETFILE(f)
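The three-step parent computation can be illustrated with a minimal Python sketch. It assumes that PIDs and VIDs are plain m-bit integers; the function names (`to_vid`, `to_pid`, `parent_vid`, `forwarding_path`) are illustrative and do not come from the paper. The sketch ignores replicas, so it always walks all the way to the target node.

```python
def to_vid(pid, r, m):
    """Property 4: PID -> VID in the lookup tree of P(r), via XOR with the complement of r."""
    mask = (1 << m) - 1
    return pid ^ (~r & mask)


def to_pid(vid, r, m):
    """Property 4 in the other direction; XOR is its own inverse."""
    return to_vid(vid, r, m)


def parent_vid(vid, m):
    """Property 2: set the leftmost 0 bit to 1. Returns None for the root (all 1s)."""
    for bit in range(m - 1, -1, -1):
        if not vid & (1 << bit):
            return vid | (1 << bit)
    return None


def forwarding_path(k, r, m):
    """PIDs visited when P(k) forwards a request towards target P(r), assuming no replicas."""
    path = [k]
    vid = parent_vid(to_vid(k, r, m), m)
    while vid is not None:
        path.append(to_pid(vid, r, m))
        vid = parent_vid(vid, m)
    return path


print(forwarding_path(8, 4, 4))
```

For the 16-node example of Figure 2, `forwarding_path(8, 4, 4)` gives the route from P(8) through P(0) to P(4). Each hop sets one more bit of the VID, so a path has at most m hops.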

Replicating File

When a node is overloaded, LessLog reduces its load by replicating popular files to its child nodes. Let P(r) denote an overloaded node and f be the popular file that causes the overload. Because LessLog uses a binomial lookup tree, each child node of P(r) may have a different number of offspring nodes. We sort the child nodes of P(r) from the one with the most offspring nodes to the one with the fewest. We call the sorted list the children list of P(r). For example, the children list of P(4) in Figure 2 is (P(5), P(6), P(0), P(12)). By replicating f to the first node in the children list, LessLog reduces the load of P(r) by half if the requests for f are evenly distributed. If requests are not evenly distributed and P(r) is still overloaded after the first replication, we continue replicating f to the next node in the children list until P(r) is no longer overloaded. A simple counter-based mechanism can be used to remove replicas that are not frequently accessed.

We define $C^r_k(f)$ to denote, in the lookup tree of P(r), the first node in the children list of P(k) that does not have a replicated copy of f. We can use Properties 1 and 4 to determine the PIDs of the child nodes of P(k). The REPLICATEFILE algorithm is outlined below, where this refers to the current node and CREATEFILE creates a replicated copy of f in the local storage.

    REPLICATEFILE(f)
      k ← this.pid
      $C^r_k(f)$.CREATEFILE(f)

Updating File

A file and its replicas are updated in a top-down manner. Let P(k) receive an update request on f. P(k) forwards the request to the target node P(r). Upon receiving the request, P(r) first updates its local copy of f. P(r) next broadcasts the update request to the nodes in its children list. Each child node first checks whether it has a replica of f. If a replica is found, the child node updates its copy and broadcasts the request to its own children list. Otherwise, the child node discards the request.
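As a rough check of the children list and the halving claim, the following Python sketch derives the children list from Properties 1 and 4 for a complete tree ($N = 2^m$). The helper names are hypothetical. The subtree-size formula (a node whose VID begins with i ones roots a subtree of $2^i$ nodes, itself included) is not stated in the paper but follows from Property 1.

```python
def leading_ones(vid, m):
    """Number of consecutive 1 bits at the left end of an m-bit VID."""
    count = 0
    for bit in range(m - 1, -1, -1):
        if not vid & (1 << bit):
            break
        count += 1
    return count


def children_list(k, r, m):
    """PIDs of the children of P(k) in the lookup tree of P(r), most offspring first."""
    comp = ~r & ((1 << m) - 1)          # complement of r (Property 4)
    vid = k ^ comp
    lead = leading_ones(vid, m)
    # Property 1: clear one of the leading 1 bits. Clearing the lowest of them
    # yields the largest child VID, which has the most offspring (Property 3).
    return [(vid & ~(1 << bit)) ^ comp for bit in range(m - lead, m)]


def subtree_size(k, r, m):
    """Nodes in the subtree rooted at P(k), including P(k), in a complete tree."""
    comp = ~r & ((1 << m) - 1)
    return 1 << leading_ones(k ^ comp, m)


kids = children_list(4, 4, 4)
print(kids, [subtree_size(c, 4, 4) for c in kids])
```

In the 16-node tree of P(4), the first child P(5) roots a subtree of 8 nodes, so under evenly distributed requests half of all requests pass through it before reaching P(4). This is the sense in which one replication halves the load.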
3 Advanced System Model

In this section, we extend the discussion to an N-node system where $N < 2^m$. Each node is assigned a unique PID between 0 and $2^m - 1$. We use the same unique $2^m$-node virtual lookup tree as shown in Figure 1 to construct the physical lookup tree for each node. Because $N < 2^m$, not every node in the lookup tree exists. We call the existing nodes the live nodes and the remaining nodes the dead nodes. Figure 3 shows the lookup tree of P(4) in a 14-node system as an example, where m = 4, P(0) and P(5) are dead nodes, and the rest are live nodes.

Figure 3: The binomial lookup tree of P(4) in a 14-node system

Inserting File

Let P(k) receive a request to insert a file f whose target node is P(r). If P(r) is a live node, P(k) forwards the request to P(r), which in turn saves f in its local storage system. Otherwise, P(k) forwards the request to the live node that has the most offspring nodes in the lookup tree of P(r). The decision is fairly straightforward, as such a node will receive more requests than the other live nodes. By Property 3, we define FINDLIVENODE(s, r) to locate, starting with P(s), the live node with the most offspring nodes in the lookup tree of P(r). The determination of such a node requires only s, r, and a few bitwise operations due to the characteristics of the lookup trees. The algorithm ADVANCEDINSERTFILE then follows.

    FINDLIVENODE(s, r)
      if P(s) is alive
        then return P(s)
        else s.vid ← $\bar{r} \oplus s$
          for i ← s.vid − 1 downto 0
            do p ← $\bar{r} \oplus i$
              if P(p) is alive
                then return P(p)
      return false

    ADVANCEDINSERTFILE(f)
      r ← ψ(f)
      FINDLIVENODE(r, r).CREATEFILE(f)

Getting File

Let P(k) receive an access request. If P(r) is a dead node, getting the file takes two steps. First, we use GETFILE, described in the previous section, to search along the path from P(k) to P(r). We extend the definition of $FP^r_k$ to return the PID of the first live ancestor node of P(k) in the lookup tree of P(r). The new $FP^r_k$ can be implemented by Properties 2 and 4. If the first step fails and P(r) is a dead node, the file must be stored in the live node that has the most offspring nodes. We next use FINDLIVENODE(r, r).GETFILE(f) to access this file.
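The FINDLIVENODE pseudocode translates almost line by line into Python. The sketch below assumes that liveness is available as a set of live PIDs (`alive`); that representation, the explicit `m` parameter, and the function name are assumptions made for illustration only.

```python
def find_live_node(s, r, m, alive):
    """Return the PID of P(s) if it is live; otherwise the live node with the
    largest VID below that of P(s) in the lookup tree of P(r), or None."""
    if s in alive:
        return s
    comp = ~r & ((1 << m) - 1)           # complement of r
    s_vid = s ^ comp                     # VID of P(s) in the tree of P(r)
    # Scan VIDs downwards: by Property 3 the first live hit is the node
    # with the most offspring among the remaining candidates.
    for vid in range(s_vid - 1, -1, -1):
        pid = vid ^ comp
        if pid in alive:
            return pid
    return None


# 16-slot system in which P(4) and P(5) are dead.
alive = set(range(16)) - {4, 5}
print(find_live_node(4, 4, 4, alive))
```

With P(4) and P(5) dead, the scan from the root VID $1111_2$ skips $1110_2$ (P(5)) and stops at $1101_2$, which is P(6). The loop visits at most $2^m$ VIDs in the worst case, so a practical implementation would likely consult a compact liveness bitmap rather than a general set.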

Replicating File

To reduce the load of an overloaded node in the advanced system model, we need to consider the effect of dead nodes when determining the location of replicated nodes. We first redefine the children list of P(k) to include every live child node of P(k) and the children list of each dead child node. The children list of P(4) shown in Figure 3 is (P(6), P(7), P(1), P(12), P(13), P(8)), sorted by VID. When the root node P(r) of a lookup tree is overloaded by requests for f, we call $C^r_r(f)$ to replicate f to the first node in the children list of P(r) that does not have a replicated copy.

When P(k) is overloaded by requests for f, where r = ψ(f) and $k \ne r$, we first determine whether there is any live node whose VID is larger than the VID of P(k) in the lookup tree of P(r). If such a node is found, then by the algorithm for getting a file we know that the overload in P(k) results from requests forwarded by the offspring nodes of P(k). To resolve the overload, we simply call $C^r_k(f)$ to replicate f to the children list of P(k).

On the other hand, if there is no live node whose VID is larger than the VID of P(k), the requests for f may come either from the offspring nodes of P(k) or from the remaining nodes, due to the characteristics of FINDLIVENODE. For example, let P(4) and P(5) be the dead nodes in a 14-node system as shown in Figure 3, and let P(6) be overloaded by requests for f, where 4 = ψ(f). Clearly, every request for f in the system will be forwarded to P(6). If a large number of requests come from the offspring nodes of P(k), it makes sense to replicate f to the children list of P(k). Otherwise, we can resolve the overload by replicating f to the children list of P(r). Because there is no client-access history, we cannot tell exactly to which children list f should be replicated. In this case, LessLog makes a proportional choice between these two children lists according to the ratio of the number of offspring nodes of P(k) to the number of remaining nodes.
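The redefined children list can be sketched as a small recursive expansion: live children are kept, and each dead child is replaced by its own (recursively expanded) children list. This is an illustrative Python sketch, not the paper's implementation; the `alive` set and the function names are assumptions.

```python
def leading_ones(vid, m):
    """Number of consecutive 1 bits at the left end of an m-bit VID."""
    count = 0
    for bit in range(m - 1, -1, -1):
        if not vid & (1 << bit):
            break
        count += 1
    return count


def live_children_list(k, r, m, alive):
    """Children list of P(k) in the lookup tree of P(r) when some nodes are dead.
    Returns PIDs sorted by VID, largest first."""
    comp = ~r & ((1 << m) - 1)           # complement of r

    def expand(vid):
        found = []
        for bit in range(m - leading_ones(vid, m), m):
            child = vid & ~(1 << bit)    # Property 1: clear one leading 1 bit
            if (child ^ comp) in alive:
                found.append(child)
            else:
                found.extend(expand(child))   # dead child: take its children instead
        return found

    return [vid ^ comp for vid in sorted(expand(k ^ comp), reverse=True)]


# Figure 3: 14-node system where P(0) and P(5) are dead.
alive = set(range(16)) - {0, 5}
print(live_children_list(4, 4, 4, alive))
```

For the configuration of Figure 3 the sketch reproduces the children list quoted in the text, (P(6), P(7), P(1), P(12), P(13), P(8)).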
Updating File

Since we update a file in a top-down recursive manner, a slight modification of the basic algorithm works in the advanced system model. When a live node receives an update request, it checks whether it has a replica of the requested file. If a replica is found, the live node updates its copy and broadcasts the update request to its children list. Otherwise, the request is discarded. An update request bypasses a dead node and is forwarded to the children list of the dead node, if there is such a list.

4 Fault-Tolerant Model

The ADVANCEDINSERTFILE algorithm initially stores a file in one target node. If the target node fails before any replica of the file is made, the system will fail to respond to requests that access this file. To overcome this problem, LessLog provides a fault-tolerant model that sets aside the last b of the m bits for fault tolerance in an N-node system, where $N \le 2^m$. A file is stored initially at $2^b$ target nodes. LessLog guarantees fault tolerance as long as the $2^b$ target nodes storing the same file do not fail simultaneously.

We first use the same unique virtual lookup tree as shown in Figure 1 to construct each of the $2^m$ physical lookup trees as shown in Figure 3. We next divide each lookup tree into $2^b$ independent and identical subtrees, where all nodes in the same subtree have the same last b bits in their VIDs. We call the first (m − b) bits of each VID the subtree VID and the last b bits the subtree identifier. Figure 4 shows the lookup tree of P(4) in a 16-node system where b = 2. Each of the circles in this figure represents a subtree, and there are 4 subtrees in total in this system. The subtree identifier of the leftmost subtree is $00_2$ and the subtree identifier of the rightmost subtree is $11_2$. The subtree VID of the root node in each subtree is $11_2$. Because each subtree is also a binomial lookup tree, all file operations described in Section 3 still work inside each subtree.

Figure 4: The binomial lookup tree of P(4) in a 16-node system where b = 2

Let P(k) receive a request to insert a file f in an N-node system where $N \le 2^m$ and $2^b$-degree fault tolerance (i.e., each file is stored initially at $2^b$ target nodes) is required. Let r = ψ(f). We first divide the lookup tree of P(r) into $2^b$ identical binomial subtrees. We next use a modified version of FINDLIVENODE to determine the VID of a target node in each of the $2^b$ subtrees. The modified FINDLIVENODE algorithm uses subtree VIDs in its computation. We then use Property 4 to obtain the PID of each target node. Finally, the file-insertion request is forwarded to each of the $2^b$ target nodes and $2^b$ copies of f are created. The algorithms for replicating a file and updating a file work in a similar way. When a node receives a request to access a file, it first follows the algorithm for getting a file described in Section 3 to locate the file in its subtree. If this request generates a fault, i.e., no replica of the file is found in this subtree, we can easily migrate the request to another subtree by changing the subtree identifier, and the file operation continues.

5 Self-organized Mechanism

We present a self-organized mechanism to deal with the cases of joining, leaving, and failing nodes.
Let N denote the number of live nodes in the new system. We assume in our discussion that $N \le 2^m$, so that the same virtual lookup tree is used. If $N > 2^m$, we need a new, larger virtual lookup tree, whose size is a power of two no smaller than N, to reconstruct the whole system. For this reason, we suggest using a larger m when constructing a LessLog system if many node-join activities are expected.

5.1 Joining Node

A node must first obtain a valid PID k before it joins the system. A PID is valid as long as no node in the system uses it. For the sake of performance, we maintain in each live node a status word in which each bit indicates whether the corresponding node is a live node. P(k) next broadcasts to every live node a message registering P(k) as a live node. At the same time, it obtains the updated status word from a neighboring live node.

To make LessLog function properly, we need to copy to P(k) the files that were stored in other nodes due to the absence of P(k). Let f denote such a file. To locate every f in the system, we need to examine each of the $2^m$ lookup trees. In the lookup tree of P(r), f must be stored in the children list of P(k) if k = r, or if $k \ne r$ and there is a live node whose VID is larger than the VID of P(k). We examine each file in the children list and copy a file f to P(k) if r = ψ(f). If $k \ne r$ and there is no live node whose VID is larger than the VID of P(k) in the lookup tree of P(r), f may be stored in a node that is not in the children list of P(k). For example, let P(4) and P(5) be the dead nodes in a 14-node system as shown in Figure 3 and let 4 = ψ(f). ADVANCEDINSERTFILE inserts f into P(6). If P(5) is the joining node, f must be copied back to P(5). In this case, we examine each file in the live node with the largest VID (P(6) in this example) and copy a file f back to P(k) where r = ψ(f).

5.2 Leaving Node

When a node P(k) leaves voluntarily, it first broadcasts to every live node a message registering P(k) as a dead node. The files stored in P(k) can be divided into two categories: inserted files and replicated files. An inserted file is the original copy of a file stored by ADVANCEDINSERTFILE. A replicated file, on the other hand, is one replicated from an overloaded node. We discard the replicated files when P(k) is leaving.
However, we need to copy the inserted files to other nodes in case there are no replicated copies in the system. A file f is an inserted file if k = ψ(f), or if r = ψ(f), $k \ne r$, and there is no live node whose VID is larger than the VID of P(k) in the lookup tree of P(r). For each inserted file f, we call ADVANCEDINSERTFILE with P(k) treated as a dead node to insert f into another node.

5.3 Failing Node

When a node P(k) fails, all its inserted files are lost. If b = 0 and no replica of an inserted file f was made before the failure, the system returns faults for requests accessing f. On the other hand, if b > 0, we can copy f from another subtree and maintain $2^b$-degree fault tolerance as long as the $2^b$ target nodes storing f do not fail simultaneously. When P(i) learns of the failure of P(k), it first broadcasts to every live node a message registering P(k) as a dead node. To locate the set of inserted files in P(k), we examine each of the $2^m$ lookup trees. In the lookup tree of P(r), we determine whether, before its failure, P(k) was the live node with the largest subtree VID in its subtree. If P(k) was such a node, we locate a corresponding node P(j), which is the live node with the largest subtree VID in another subtree. For each inserted file f in P(j), we call ADVANCEDINSERTFILE to insert f into the subtree of P(k).

Figure 5: An evenly-distributed load (y-axis: replicas; x-axis: incoming requests; series: log-based, LessLog, random)

Figure 6: An evenly-distributed load on LessLog (y-axis: replicas; x-axis: incoming requests; series: 1% dead, 2% dead, 3% dead)

6 Experimental Results

We demonstrate the effectiveness of LessLog in reducing the load of overloaded nodes through a series of experiments. We compare LessLog with two commonly used file replication methods by the number of replicas created to achieve load balance. We set the maximum load of a node to 1 requests per second. If a node receives more than 1 requests per second, it is overloaded and a replica can be created to reduce its load. A system is load-balanced if no node is overloaded. In addition, we set m = 1 and b = 0 throughout the experiments.

Figure 5 compares LessLog, a random-replication method, and a log-based replication method. All three methods use the same binomial lookup tree as shown in Figure 2 to resolve a lookup request. There is only one file initially in the system. The number of requests that access this file ranges from 1, to 2, requests per second, and all requests are evenly distributed among all nodes. The random-replication method replicates the file to a random node when a node is overloaded. The log-based method records client-access logs and, by analyzing them, replicates the file to the child node that forwards the most requests. The experimental results show that LessLog uses significantly fewer replicas than the random-replication method. Compared with the log-based method, LessLog uses slightly more replicas to achieve load balance. To further reduce the number of replicas in LessLog, we can implement a simple counter-based mechanism to remove a replica that is not frequently accessed. LessLog successfully and efficiently reduces the load of overloaded nodes without client-access logs.
Figure 6 uses the same experiment to demonstrate the effectiveness of LessLog when there are 1%, 2%, and 3% dead nodes. A similar number of replicas is created in all three configurations. This result shows that LessLog efficiently resolves overload even when an incomplete binomial lookup tree is used. The system with 3% dead nodes creates more replicas as the number of requests increases, due to the incomplete lookup tree. Again, we can remove replicas that are not frequently accessed by a counter-based mechanism.

Figures 7 and 8 repeat the same experiments on a locality model where 8% of the requests are received by 2% of the nodes. Such a locality model often arises when a certain region of the P2P system accesses this file more frequently than the rest of the system. Figure 7 shows that LessLog uses significantly fewer replicas than the random-replication method and slightly more replicas than the log-based method. Figure 8 shows that LessLog creates a similar number of replicas when there are dead nodes. All experimental results demonstrate that, by using only bitwise operations to determine replicated nodes and without spending system resources on client-access logs, LessLog provides a high-performance and load-balanced file replication system for P2P distributed systems.

Figure 7: A locality model (y-axis: replicas; x-axis: incoming requests; series: log-based, LessLog, random)

Figure 8: A locality model on LessLog (y-axis: replicas; x-axis: incoming requests; series: 1% dead, 2% dead, 3% dead)

7 Related Work

A number of P2P file lookup and replication protocols [1, 2, 5, 7–9, 11] have been developed by different research groups recently. Similar to LessLog, Chord [9] uses a binomial lookup tree to bound the lookup path. Files are stored evenly among all nodes by consistent hashing [3]. CAN [7] assigns nodes and files to a d-dimensional space, and each node is responsible for files stored in a particular region. However, there is no file replication mechanism in these lookup protocols. Consequently, the nodes hosting popular files may be overloaded when a large number of requests access these files simultaneously. Plaxton [6], a distributed data location protocol, replicates a popular file based on client-access logs to ensure that a request is forwarded to the geographically closest node with a replica of the requested file. Oceanstore [4], an implementation of Plaxton [6], uses agents to collect and analyze client-access information for determining the location of replicated nodes. Overlook [1] deploys a proxy-like name service on an overlay network to place a replica of a popular file on the node with the most lookup requests. All these protocols use client-access history to determine the location of replicated nodes. Maintaining client-access information consumes extra system resources and hinders system performance.
In contrast, LessLog determines the replicated nodes without any client-access history.

8 Conclusions and Future Work

Traditional file replication algorithms rely on client-access logs to determine the location of replicated nodes. In this paper, we present LessLog, a logless file replication algorithm. LessLog first constructs a binomial lookup tree for each node. The binomial lookup tree bounds the lookup time at O(log N) in an N-node system. LessLog next uses bitwise operations to determine replicated nodes. A fault-tolerant LessLog model is also presented. The experimental results show that, compared with two commonly used file replication methods, LessLog successfully and efficiently achieves load balance without spending any system resources on client-access logs. Future work is to implement LessLog in a large-scale P2P system and obtain performance data in a real-world scenario where nodes dynamically join and leave the system.

References

[1] I. Clarke, O. Sandberg, B. Wiley, and T. W. Hong. Freenet: A Distributed Anonymous Information Storage and Retrieval System. LNCS, 2009:46+, 2001.
[2] A. Fiat and J. Saia. Censorship Resistant Peer-to-Peer Content Addressable Networks. In Proceedings of Symposium on Discrete Algorithms, 2002.
[3] D. Karger, E. Lehman, T. Leighton, M. Levine, D. Lewin, and R. Panigrahy. Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web. In ACM Symposium on Theory of Computing, pages 654–663, May 1997.
[4] J. Kubiatowicz, D. Bindel, Y. Chen, P. Eaton, D. Geels, R. Gummadi, S. Rhea, H. Weatherspoon, W. Weimer, C. Wells, and B. Zhao. OceanStore: An Architecture for Global-Scale Persistent Storage. In Proceedings of ACM ASPLOS, November 2000.
[5] J. Li, J. Jannotti, D. De Couto, D. Karger, and R. Morris. A Scalable Location Service for Geographic Ad-hoc Routing. In Proceedings of the 6th ACM MobiCom, pages 120–130, August 2000.
[6] C. Greg Plaxton, R. Rajaraman, and A. W. Richa. Accessing Nearby Copies of Replicated Objects in a Distributed Environment. In ACM Symposium on Parallel Algorithms and Architectures, pages 311–320, 1997.
[7] S. Ratnasamy, P. Francis, M. Handley, R. Karp, and S. Shenker. A Scalable Content Addressable Network. In Proceedings of ACM SIGCOMM, 2001.
[8] A. Rowstron and P. Druschel. Pastry: Scalable, Decentralized Object Location and Routing for Large-Scale Peer-to-Peer Systems. LNCS, 2001.
[9] I. Stoica, R. Morris, D. Karger, F. Kaashoek, and H. Balakrishnan. Chord: A Scalable Peer-to-Peer Lookup Service for Internet Applications. In Proceedings of ACM SIGCOMM, 2001.
[10] M. Theimer and M. B. Jones. Overlook: Scalable Name Service on an Overlay Network. In Proceedings of the 22nd ICDCS, pages 52–61, 2002.
[11] B. Y. Zhao, J. D. Kubiatowicz, and A. D. Joseph. Tapestry: An Infrastructure for Fault-tolerant Wide-area Location and Routing. Technical Report UCB/CSD-01-1141, UC Berkeley, 2001.