Athens University of Economics and Business. Dept. of Informatics

B.Sc. Thesis Project report: Implementation of the PASTRY Distributed Hash Table lookup service over the NS-3 Network Simulator
by Marios Pomonis
Supervised by Asst. Prof. George Xylomenos
Athens, Greece, June 2011

Abstract

This B.Sc. thesis report documents an implementation of the Pastry Distributed Hash Table (DHT) for the Network Simulator 3 (NS-3) simulation environment. To our knowledge, this thesis is the first and only effort to date to address the lack of a Pastry implementation for the NS-3 discrete event simulator. Given that NS-3 is distributed as open source, it is expected to meet the high demands of the research community and to follow the success and popularity of its predecessor, NS-2. The contribution of this thesis is therefore to provide the NS community with an efficient C++ implementation for research purposes. Moreover, for the sake of code clarity, ease of use and extensibility, this thesis is accompanied by a separate code manual and system documentation.

Acknowledgments

I would like to thank Dr. George Xylomenos for his support and patience while supervising this thesis. I would also like to express my gratitude to Xenofon Vasilakos for his invaluable help and advice.

Table of Contents

1. Introduction
1.1 A Brief Background on Pastry
1.2 Personal Contribution
1.3 Report Structure
2. The Pastry DHT
2.1 Pastry Design
2.2 A Brief Comparison To Other DHTs
3. Implementation Outline
3.1 Overview
3.2 Messages and Proximity Metric
3.3 Maintenance
4. Validation Simulations Results
4.1 Simulation Layout
4.2 Results
5. Conclusion
6. References

1. Introduction

The contribution of this thesis is an efficient implementation of the Pastry Distributed Hash Table in the NS-3 simulator [1], designed with performance, manageability and reusability in mind. NS-3 is the third generation of the NS family of discrete event simulators, written in C++, and the successor of the popular NS-2. NS-3 is widely considered one of the most powerful and robust network simulators and is used extensively by the research community. Moreover, it is open source and is supported by excellent online documentation, which consists of tutorials, a manual and Doxygen-generated code documentation. For more information and guidance on using the implementation provided by this thesis, please refer to the accompanying code manual and system documentation.

1.1 A Brief Background on Pastry

Distributed systems can be described as independent computer systems that co-operate in order to accomplish a shared goal. Their significance is underlined by the recent popularity of Peer-to-Peer (P2P) systems, which currently constitute the most prominent branch of distributed systems. P2P systems are decentralized, distributed overlay networks composed of nodes with equal roles and responsibilities, each acting as both server and client at the same time.

There are two categories of P2P systems, namely unstructured and structured. The main difference between the two lies in how network connections are formed between overlay nodes and how routing takes place. In unstructured systems, nodes are organised in a non-deterministic fashion, e.g., randomly or based on some feature such as node interests, depending on the nature of the application. Structured P2P systems, in contrast, are strictly organised according to a specific algorithm. Distributed hash tables (DHTs) fall under the category of structured P2P systems. As the term denotes, a DHT is a hash table whose entries are spread across different nodes of the P2P system.
The main advantage of DHTs lies in their ability to scale to large numbers of participating nodes, regardless of new node arrivals, graceful or ungraceful departures, and node failures. Pastry [2] is a DHT that satisfies this scalability condition while remaining fault-resilient, and its routing time also scales with the number of nodes in the system. More specifically, under normal operation it routes a message in O(log N) hops, where N stands for the number of nodes in the overlay. Finally, the main advantage of Pastry in comparison to other DHTs proposed in the literature, such as Chord [3], lies in the fact that Pastry takes user-specified node-locality metrics into consideration when organizing its routing process. This locality awareness arguably makes Pastry the most efficient DHT proposed in the literature to date.

1.2 Personal Contribution

This thesis project was developed between September 2010 and May 2011. The first couple of months were mainly spent on understanding the properties and the development philosophy of NS-3, followed by the implementation effort. Throughout this time, I developed an implementation of Pastry in the C++ programming language, within the context of the NS-3 simulation environment. This thesis comes with the respective C++ project, which makes use of NS-3 facilities such as smart pointers and ObjectFactories. I used a simple CSMA topology, leaving testing on other topologies outside the scope of this thesis. During the development phase, I used the Eclipse Integrated Development Environment and the Mercurial source control management system [4]. I also used GDB and Valgrind for debugging, and benefited from the helpful advice I received from the Google group of the NS-3 community.

1.3 Report Structure

The rest of this thesis is organized as follows. Section 2 summarizes the properties and attributes of the Pastry DHT. Section 3 describes the implementation choices made. Section 4 provides a small-scale, proof-of-concept evaluation of the implementation and, finally, Section 5 concludes the report.

2. The Pastry DHT

2.1 Pastry Design

A Pastry node is uniquely identified by a 128-bit nodeid, which expresses the position of the node in a circular identifier space ranging over $[0, 2^{128}-1]$. Every nodeid is chosen uniformly at random from this space. As a result, the choice of identifier is independent of the location or any other attribute of the node.

For routing and maintenance purposes every node keeps three tables: the leaf set, the neighborhood set and the routing table. The leaf set consists of the nodes whose nodeids are numerically closest to that of the current node. The routing table places every known node in the row whose index equals the length, in digits, of the common prefix between its nodeid and the nodeid of the current node. Finally, the neighborhood set contains the nodes that are closest to the current node according to the proximity metric; it is mainly used in the maintenance process.

When routing a message, if its key falls within the range of the leaf set, the message is sent to the leaf-set node whose nodeid is numerically closest to the key, which is the destination node. Otherwise, the node computes the length of the prefix it shares with the message's key and forwards the message to a node whose nodeid shares a prefix with the key that is at least one digit longer. If the routing table contains multiple nodes that meet this requirement, the selection is based on locality: we choose the node that is closest according to the proximity metric. If no such node exists, the message is sent to a node that is numerically closer to the message's key than the current node.

2.2 A Brief Comparison To Other DHTs

DHTs distribute the IDs of objects over an ID space along the lines of the scheme proposed by Plaxton et al. [5]. However, Pastry has a major advantage over other DHTs, as it is designed to route requests by taking a proximity metric into account.
For instance, assuming that this metric is some measure of network proximity between nodes (e.g., latency), Pastry can achieve smaller routing stretch values on average, i.e., a smaller ratio of the distance travelled along the overlay route to the distance of the direct path in the underlying network.

Tapestry. Tapestry [6] is very similar to Pastry; its approach to maintaining locality and offering replication is nevertheless more complex.

Chord. Routing in Chord [3] is different: unlike Pastry, it does not depend on prefix matching between the message key and the current node ID, nor on any proximity metric. A Chord node routes a message to the known node whose ID is numerically closest to the message key. Since Chord does not consider locality at all, routing in Pastry has a clear advantage in this respect. Moreover, the message complexity of management actions such as node joins is $O(\log^2 N)$ in Chord, compared to $O(\log N)$ in Pastry.

CAN. CAN [7] organizes the nodes in a D-dimensional space. The size of the routing table in each node is $O(D)$, which is independent of N and therefore smaller than the $O(\log N)$ of Pastry. However, this comes at the cost of $O(D \cdot N^{1/D})$ routing hops, compared to merely $O(\log N)$ in Pastry.
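The prefix-based routing rule of Section 2.1 can be illustrated with a minimal C++ sketch. It is not taken from the thesis code: the names `KnownNode`, `Digit`, `SharedPrefixLength`, `CircularDistance` and `NextHop` are hypothetical, the ids are shortened to 16 bits (four hexadecimal digits, i.e. b = 4) for readability, and the leaf-set shortcut is omitted, so only the routing-table step and its numerical fallback are shown.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Illustration only: 16-bit ids written as 4 hexadecimal digits (b = 4).
// Real Pastry nodeids are 128 bits long.
using NodeId = std::uint16_t;
constexpr int kBitsPerDigit = 4;
constexpr int kDigits = 4;

// Hypothetical record for a known node and its measured proximity
// (smaller means closer).
struct KnownNode {
  NodeId id;
  double proximity;
};

// Returns digit `pos` of `id`, where position 0 is the most significant digit.
int Digit(NodeId id, int pos) {
  assert(pos >= 0 && pos < kDigits);
  return (id >> (kBitsPerDigit * (kDigits - 1 - pos))) & 0xF;
}

// Number of leading digits that `a` and `b` have in common.
int SharedPrefixLength(NodeId a, NodeId b) {
  int len = 0;
  while (len < kDigits && Digit(a, len) == Digit(b, len)) {
    ++len;
  }
  return len;
}

// Distance between two ids on the circular identifier space.
std::uint32_t CircularDistance(NodeId a, NodeId b) {
  const std::uint32_t direct =
      static_cast<std::uint32_t>(a > b ? a - b : b - a);
  return std::min<std::uint32_t>(direct, 0x10000u - direct);
}

// Picks the next hop for `key`, or returns `self` when no known node is a
// better choice (the local node is then taken to be the destination).
NodeId NextHop(NodeId self, NodeId key, const std::vector<KnownNode>& known) {
  const int ownPrefix = SharedPrefixLength(self, key);

  // Preferred case: a node sharing a longer prefix with the key. Among
  // several such nodes, the one closest by the proximity metric wins.
  const KnownNode* best = nullptr;
  for (const KnownNode& node : known) {
    if (SharedPrefixLength(node.id, key) > ownPrefix &&
        (best == nullptr || node.proximity < best->proximity)) {
      best = &node;
    }
  }
  if (best != nullptr) {
    return best->id;
  }

  // Fallback: a node whose shared prefix is at least as long as ours and
  // whose id is numerically closer to the key than the current choice.
  NodeId choice = self;
  for (const KnownNode& node : known) {
    if (SharedPrefixLength(node.id, key) >= ownPrefix &&
        CircularDistance(node.id, key) < CircularDistance(choice, key)) {
      choice = node.id;
    }
  }
  return choice;
}
```

For example, a node with id 0x1234 routing key 0x12AB shares a two-digit prefix with the key. If it knows 0x12A0 and 0x12AF, both of which share three digits with the key, the sketch forwards to whichever of the two has the smaller proximity value.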

3. Implementation Outline

3.1 Overview

The implementation consists of three different flavors of nodes: BaseOverlayNode, DummyOverlayNode and PastryNode. BaseOverlayNode is capable of identifying itself as a bootstrap or a joining node and of taking the appropriate actions in either case. DummyOverlayNode extends BaseOverlayNode and creates a simple ring of nodes in which each node holds a pointer to the node with the next (numerically greater) nodeid. PastryNode extends BaseOverlayNode and has full Pastry functionality, such as routing and maintenance procedures, computing the proximity between nodes, etc.

We use SHA-1 to produce uniformly distributed nodeids, as suggested in [2]. Nodeids are computed as the hash of the IP address of the node and its port number. This allows a physical node to host multiple overlay nodes in the same ring, one per port number. Finally, a node that wishes to join the overlay must know the IP address and the port of some well-known bootstrapping node. Passing the IP address 0.0.0.0 to a node at creation time results in the node identifying itself as a bootstrap node and creating a new Pastry overlay; any other IP address denotes that the node is not a bootstrap node and must consequently send a join message to trigger its admission into the overlay.

3.2 Messages and Proximity Metric

Communication between the nodes of the system is carried over the User Datagram Protocol (UDP). Pastry makes no reliability assumptions regarding connections; we let the system administrator (i.e., the simulation user) define the reliability conditions of the links, for example by using error-free channels in the topology. Using UDP also minimizes the traffic and the overhead of message exchanges, which is useful in a simulation environment.

The distance between nodes is defined by the proximity metric used by the node.
For the purpose of assessing our implementation, we chose as the metric the time elapsed from sending a packet to the destination node until receiving the respective reply, i.e., the round-trip time. However, this metric can easily be substituted by overriding the methods defined in the superclass ProximityMetric. Finally, we assume that the system administrator (i.e., the simulation user) provides an arriving node with information about a node that is close to it according to the proximity metric; the arriving node therefore does not search for a closer node before the join process.
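The substitution of the proximity metric described above can be sketched as follows. This is an illustration under stated assumptions rather than the interface shipped with the thesis: only the class name `ProximityMetric` comes from the text, while the method `Distance`, the subclasses `RttMetric` and `HopCountMetric`, and the helper `IsCloser` are hypothetical names chosen for this example, and ids are shortened to 64 bits.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <map>

// Illustration only: 64-bit ids stand in for the 128-bit nodeids.
using NodeId = std::uint64_t;

// Hypothetical stand-in for the ProximityMetric superclass. The method name
// and signature are assumptions made for this sketch.
class ProximityMetric {
 public:
  virtual ~ProximityMetric() {}
  // Returns the distance to `peer`; smaller values mean closer nodes.
  virtual double Distance(NodeId peer) const = 0;
};

// Round-trip-time metric: the time from sending a probe to receiving its reply.
class RttMetric : public ProximityMetric {
 public:
  // Stores the round-trip time of one probe (times in seconds).
  void RecordProbe(NodeId peer, double sentAt, double repliedAt) {
    assert(repliedAt >= sentAt);
    rtt_[peer] = repliedAt - sentAt;
  }
  double Distance(NodeId peer) const override {
    const auto it = rtt_.find(peer);
    // Nodes that were never probed are treated as infinitely far away.
    return it == rtt_.end() ? std::numeric_limits<double>::infinity()
                            : it->second;
  }

 private:
  std::map<NodeId, double> rtt_;
};

// Alternative metric, substituted by overriding the same method: hop count.
class HopCountMetric : public ProximityMetric {
 public:
  void SetHops(NodeId peer, unsigned hops) { hops_[peer] = hops; }
  double Distance(NodeId peer) const override {
    const auto it = hops_.find(peer);
    return it == hops_.end() ? std::numeric_limits<double>::infinity()
                             : static_cast<double>(it->second);
  }

 private:
  std::map<NodeId, unsigned> hops_;
};

// Code that depends only on the base class works with either metric.
bool IsCloser(const ProximityMetric& metric, NodeId a, NodeId b) {
  return metric.Distance(a) < metric.Distance(b);
}
```

Because callers such as `IsCloser` see only the base class, swapping the round-trip-time metric for another one would not require changes to the routing or maintenance logic in a design of this kind.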

3.3 Maintenance

We implemented three procedures that preserve the validity of a node's leaf set, routing table and neighborhood set.

The leaf set of a node needs to be updated when a new node arrives or when a node fails. A newly joined node informs the nodes in its leaf set, which in turn make any changes needed. If a node that receives this message needs to update its own leaf set, it forwards the information to the nodes in its own leaf set. This procedure continues until no node that receives such a message needs to update its leaf set.

The validity of the routing table is preserved by the periodic exchange of routing state between a given node and the nodes in its leaf set. The state of a node consists of the nodes in its routing table and the changes made since the last exchange of state with those nodes.

Given that [2] is not very specific on technical details, we decided to enhance maintenance by pinging nodes, i.e., by sending a message to every node in the neighborhood set and waiting for its reply. If the present node receives a reply from every node, then the neighborhood set is up to date; otherwise, non-responding nodes are considered failed and are removed from the routing state. The pinging period is preset to one minute; we performed all simulations with this preset value, leaving any implications of adjusting it out of scope. The simulation user may nevertheless choose a different value that better balances messaging cost against the reliability of the routing state. The period can easily be adjusted by changing the events scheduled in the simulator.
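The ping-based step described above can be sketched in a few lines of C++. The sketch assumes a flattened view of the routing state and does not use the NS-3 scheduler; the names `RoutingState`, `PruneUnresponsive`, `NextPingTime` and `kPingPeriodSeconds` are hypothetical and do not come from the thesis code.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

// Illustration only: 64-bit ids stand in for the 128-bit nodeids.
using NodeId = std::uint64_t;

// Preset pinging period of one minute, as stated in the text.
constexpr double kPingPeriodSeconds = 60.0;

// Hypothetical, flattened view of the three tables kept by a node.
struct RoutingState {
  std::set<NodeId> leafSet;
  std::set<NodeId> routingTable;
  std::set<NodeId> neighborhoodSet;
};

// Applies the outcome of one ping round: every node of the neighborhood set
// that did not reply is considered failed and is removed from all tables.
// Returns the list of nodes that were removed.
std::vector<NodeId> PruneUnresponsive(RoutingState& state,
                                      const std::set<NodeId>& replied) {
  std::vector<NodeId> failed;
  for (NodeId node : state.neighborhoodSet) {
    if (replied.count(node) == 0) {
      failed.push_back(node);
    }
  }
  for (NodeId node : failed) {
    state.neighborhoodSet.erase(node);
    state.leafSet.erase(node);
    state.routingTable.erase(node);
  }
  return failed;
}

// Simulation time (in seconds) at which the next ping round is due.
double NextPingTime(double now) {
  assert(now >= 0.0);
  return now + kPingPeriodSeconds;
}
```

Collecting the failed nodes first and erasing them afterwards avoids modifying the neighborhood set while iterating over it. In an actual NS-3 application the next round would presumably be scheduled as a simulator event rather than computed by a helper such as `NextPingTime`.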

4. Validation Simulations Results

In this section, we describe the small-scale simulations we ran to validate the current implementation against the analysis in [2], the limitations we faced, and the results we obtained.

4.1 Simulation Layout

The simulations were performed on a quad-core Intel Core 2 Quad Q6600 (4 x 2.40 GHz) with 3 GB of main memory, running Fedora 15 Linux. We used the ns-3.12.1 version of the NS-3 simulator. Due to memory limitations, we did not use the periodic maintenance procedures when running the simulations, in an effort to add as many nodes as possible to the system. Because of this, the routing table of each node contains only the entries that were received during the initialization procedure and possibly entries received during the leaf set maintenance procedure. As a consequence, the routing table contains on average fewer than $\log_{2^b} N \times (2^b - 1)$ entries and more hops are required when routing a message. As explained above, this is a consequence of the simulation setup and not of a flaw in the implementation. Finally, in the validation simulations we used a network in which all nodes are connected with each other through CSMA channels, with b = 4, L = 16 and M = 32 (the digit size in bits, the leaf set size and the neighborhood set size, respectively, as defined in [2]), for 50 trials.

4.2 Results

The results are shown in the following figure.

[Figure 1: plot titled "Average Number Of Routing Hops"; x-axis "Number Of Nodes" (0 to 1000), y-axis "Number Of Hops" (0 to 6).]

Fig. 1 Average number of routing hops versus number of Pastry nodes, b = 4, L = 16 and M = 32 and 50 lookups.

As we expected, the average number of hops is slightly greater than the $\log_{2^b} N$ suggested by the authors of Pastry. Again, this is expected due to the limitations described above.
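As a rough point of reference for the quantities discussed in this section, the estimates from [2] can be evaluated directly. The calculation below assumes N = 1000 nodes (the upper end of the x-axis of Fig. 1) and b = 4; it is a worked example, not a measured result.

\[
\log_{2^b} N = \frac{\ln N}{b \ln 2} = \frac{\ln 1000}{4 \ln 2} \approx \frac{6.908}{2.773} \approx 2.49
\]

\[
\log_{2^b} N \times (2^b - 1) \approx 2.49 \times 15 \approx 37.4
\]

Under these assumptions, a fully populated routing table would hold about 37 entries and a lookup would take about 2.5 hops on average, which is the baseline against which the slightly higher averages reported above should be read.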

5. Conclusion

This thesis presented an efficient implementation of the Pastry Distributed Hash Table in the NS-3 simulator, using facilities of the simulator such as smart pointers and ObjectFactories. We described the outline of the implementation, including its messages and maintenance procedures. In the last section we presented results from small-scale validation simulations, which are consistent with the analysis in [2] given the limitations of the simulation setup. This thesis also provides a code manual and system documentation for further assistance.

6. References

1. Network Simulator 3. http://www.nsnam.org/
2. A. I. T. Rowstron and P. Druschel. Pastry: Scalable, decentralized object location, and routing for large-scale peer-to-peer systems. In Proceedings of the IFIP/ACM International Conference on Distributed Systems Platforms, Heidelberg, pages 329-350, November 12-16, 2001.
3. I. Stoica, R. Morris, D. Karger, F. Kaashoek, and H. Balakrishnan. Chord: A scalable peer-to-peer lookup service for internet applications. In Proceedings of the 2001 ACM SIGCOMM Conference, pages 149-160, San Diego, California, USA, 2001.
4. Mercurial. http://mercurial.selenic.com/
5. C. G. Plaxton, R. Rajaraman, and A. W. Richa. Accessing nearby copies of replicated objects in a distributed environment. In ACM Symposium on Parallel Algorithms and Architectures, 1997.
6. B. Y. Zhao, J. D. Kubiatowicz, and A. D. Joseph. Tapestry: An infrastructure for fault-resilient wide-area location and routing. Technical Report UCB//CSD-01-1141, U. C. Berkeley, April 2001.
7. S. Ratnasamy, P. Francis, M. Handley, R. Karp, and S. Shenker. A scalable content-addressable network. In Proc. ACM SIGCOMM '01, San Diego, CA, August 2001.