FUtella: Analysis and Implementation of a Content-Based Peer-to-Peer Network

8th Netties Conference
Technische Universität Ilmenau
September 30th to October 2nd 2002

T. Zahn / H. Ritter / J. Schiller / H. Schweppe

Abstract

FUtella is a content-based peer-to-peer network. It extends the plain file-sharing capabilities of traditional peer-to-peer networks to serve as a dynamic network platform for general knowledge sharing, especially suited for educational environments such as universities and research institutes.

The idea of FUtella is to provide participants with a dynamic platform to exchange information. A typical scenario could be the following: A student arrives on campus carrying her PDA or wireless laptop and needs material for a deeper understanding of a course topic, for example pipelining in x86 CPUs. The student then poses her question to the FUtella network. Instead of flooding the network in Gnutella-like style, FUtella's routing scheme propagates the question to those peers that are most likely to have an adequate answer, and eventually the student receives one or more helpful answers.

FUtella addresses problems common to all peer-to-peer networks, such as efficient query processing and routing, as well as the reduction of network traffic. FUtella is designed to be implementable and usable on all sorts of devices, from small PDAs and wireless laptops to wired desktops. It therefore follows two major design goals: keeping network traffic as low as possible, i.e. being as efficient as possible, while adding as little overhead as possible, i.e. staying lightweight enough to be implementable on devices with little storage and slow network connections.

Introduction

Peer-to-peer is a rapidly evolving field.
A good number of peer-to-peer systems and architectures have been implemented or are currently being implemented: Napster [1], Gnutella [2,3], JXTA [4] and Freenet [6], to name just some of the most popular ones. However, each of those systems has been designed to address certain aspects of peer-to-peer computing while not focusing too closely on others:

Napster: Although often considered the first popular and widely used file-sharing peer-to-peer system, Napster was not a pure peer-to-peer system. Instead, it was based on a hybrid client-server/peer-to-peer architecture. At its core, it had a central server that kept track of all the files being shared, along with the addresses of the peers providing them. When a peer was looking for a certain file, it queried the central server, which then provided the addresses of peers sharing that file. Only the actual file download was done on a peer-to-peer basis.

Gnutella: After the (legal) demise of Napster, Gnutella became a very popular peer-to-peer system for file sharing. Unlike Napster, Gnutella is based on a pure peer-to-peer architecture without (theoretically) any static servers involved. Gnutella has a very simple protocol, which makes it extremely easy to implement. Gnutella messages are byte-oriented (as opposed to XML-based messages) and are therefore very compact. However, its core protocol lacks any sophisticated routing logic (besides loop detection). When a peer is looking for a certain file, it sends a query to all the peers it is aware of. They, in turn, forward the query to all the peers they are aware of, and so forth, until the file has been located or the TTL has expired. This network flooding causes heavy traffic and prevents Gnutella from scaling to a rising number of nodes [10].

JXTA: JXTA [4] differs from the other peer-to-peer systems in that it is not a single-purpose (file-sharing) system but rather a middleware. Developers can use it as a platform to build peer-to-peer applications on top of it without having to worry about low-level peer-to-peer computing. JXTA makes no assumptions about the underlying infrastructure (such as TCP/IP) and therefore has a very generic design, so that it can be used in all sorts of peer-to-peer scenarios. This comes at the expense of significant overhead in the form of a rather complex protocol stack and a three-tier service layer. Nonetheless, JXTA introduces a number of interesting concepts such as full / simple peers and peer groups.

JXTA Search: JXTA Search [5] is a service in JXTA used for efficient query processing. In JXTA Search, every query has to belong to a certain queryspace. Queries are only forwarded to those peers that have registered themselves with the JXTA Search network as being capable of responding to queries of that queryspace.

Freenet: Freenet [6] is a document-storage peer-to-peer system. Besides its strong encryption and security capabilities, Freenet has a very efficient document retrieval algorithm. It has been shown [8,9] that Freenet can locate a document with an average effort of O(lg N), with N being the number of nodes on the net. However, Freenet does not support (meta) searching, updating or deleting.
The search engine FASD [7] adds meta searching to Freenet but still does not address updating and deleting.

FUtella combines approaches found in Gnutella, JXTA, JXTA Search and Freenet and extends them to provide a lightweight, adaptive and efficient peer-to-peer platform for general knowledge sharing. FUtella's architecture is based on the following concepts:

- A node on the FUtella net can be either a full peer or a simple peer. Simple peers do not route other peers' messages.
- Peers sharing a common field of expertise (queryspace) dynamically combine to form knowledge groups.
- Knowledge groups register with the network to be considered for questions belonging to a certain queryspace.
- Query routing forwards a query only to those knowledge groups registered for the queryspace the query belongs to, thus reducing the need for message broadcasts.
- Discovery requests for finding knowledge groups (registrations) on the FUtella network are routed based on Freenet's efficient retrieval routing algorithm.
- Freenet's routing approach is extended to allow for the deletion of registrations.
- For the actual transmission, FUtella uses the compact Gnutella message format.

Querying the FUtella Network

Searching for information is one of the most challenging tasks in peer-to-peer computing. Since there are no static servers in peer-to-peer networks, there are no well-known, predefined places to turn to when a peer is looking for information. The easiest solution is to resort to a brute-force broadcast of the query message. A peer looking for some piece of information simply passes its query on to all the peers it knows of. They then forward the message to all the peers they know of, and so forth. This is exactly the way Gnutella works.
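To make the cost of such a broadcast concrete, the following minimal sketch counts the messages generated by a TTL-limited flood. It is an illustration only, not Gnutella's actual implementation: the function name `flood`, the adjacency-dictionary representation and the four-peer example topology are all assumptions made for this example.

```python
from collections import deque


def flood(neighbors, origin, ttl):
    """Broadcast a query from `origin`.

    `neighbors` maps each peer to the peers it knows of (hypothetical topology).
    Returns (number of other peers reached, number of messages sent).
    """
    seen = {origin}                       # loop detection: peers that already saw the query
    queue = deque([(origin, None, ttl)])  # (peer, peer it got the query from, remaining TTL)
    messages = 0
    while queue:
        peer, sender, remaining = queue.popleft()
        if remaining == 0:
            continue                      # TTL expired: do not forward any further
        for nxt in neighbors[peer]:
            if nxt == sender:
                continue                  # never send the query straight back
            messages += 1                 # every forwarded copy costs one message
            if nxt not in seen:           # duplicates are dropped by the receiver
                seen.add(nxt)
                queue.append((nxt, peer, remaining - 1))
    return len(seen) - 1, messages


# Made-up four-peer network in which B and C both know A, each other and D.
network = {
    "A": ["B", "C"],
    "B": ["A", "C", "D"],
    "C": ["A", "B", "D"],
    "D": ["B", "C"],
}
print(flood(network, "A", ttl=2))  # (3, 6): six messages to reach three peers
```

Even in this tiny network, half of the messages are duplicates that the receiving peer discards, which is the kind of overhead that queryspace-based routing is meant to avoid.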

In order to prevent the network from being flooded, FUtella takes an approach similar to JXTA Search [5]. When a peer is looking for some piece of information, it does not merely broadcast its query to every peer it knows of. Instead, a query has to belong to a specific queryspace. That queryspace can be viewed as an arbitrary description of the topic the query belongs to. A query is only forwarded to those knowledge groups whose fields of expertise match the query's queryspace as closely as possible, i.e. only to knowledge groups that have published a corresponding queryspace with the FUtella network.

In FUtella, every peer maintains a cache of previously discovered knowledge groups that contains the queryspaces and heads of those groups. When a peer initiates a query, it assigns a queryspace to the query. Before it sends off the query, it searches its cache for the knowledge groups whose queryspaces match the queryspace of the query as closely as possible. The peer then directs its query to the knowledge group with the closest match. If no matching knowledge group can be found in the cache, or if the cached knowledge groups no longer exist (i.e. there is no response when trying to connect to them), the peer has to discover appropriate knowledge groups before sending off its query. Knowledge group discovery is explained later in this paper.

Once the peer has determined an appropriate knowledge group, it establishes a connection with the group's head and sends its query or queries. The head then propagates them within its group to find adequate query responses and sends the responses from its group members back to the peer that established the query connection. The query messages and response messages are modeled after the JXTA Search XML scheme.
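The cache lookup described above can be sketched as follows. The paper does not specify how closeness between two queryspaces is measured, so the word-overlap score, the `threshold` parameter and the function names `similarity` and `best_group` are assumptions made purely for illustration. The second cache entry uses a made-up documentation address.

```python
def similarity(a, b):
    """Hypothetical closeness measure: Jaccard overlap of the lower-cased words."""
    words_a = set(a.lower().replace(":", " ").split())
    words_b = set(b.lower().replace(":", " ").split())
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 0.0


def best_group(cache, queryspace, threshold=0.3):
    """Pick the cached knowledge group whose queryspace is closest to the query's.

    `cache` maps a cached queryspace to the address of the group head.
    Returns (head_address, score), or None if nothing matches well enough,
    in which case the peer would have to start a discovery first.
    """
    scored = [(similarity(space, queryspace), head) for space, head in cache.items()]
    if not scored:
        return None
    score, head = max(scored, key=lambda pair: pair[0])
    return (head, score) if score >= threshold else None


cache = {
    "computer architecture": ("160.45.116.151", 4567),
    "operating systems": ("192.0.2.17", 4567),  # made-up example entry
}
print(best_group(cache, "computer architecture:exercise 11"))
# (('160.45.116.151', 4567), 0.5)
print(best_group(cache, "linear algebra"))
# None
```

A peer that receives `None` here, or that cannot connect to the returned head, falls back to knowledge group discovery before sending its query.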
For a practical example, suppose a student is looking for the solution to question 1 of exercise 11 of her computer architecture class (this would, of course, be after the deadline for exercise 11, when the student wants to compare the proposed solution to the one she has handed in). The corresponding query message could look like this:

    <?xml version='1.0'?>
    <request id='1b520c85ac94ea52'
             xmlns='http://www.inf.fu-berlin.de/futella'
             query-space='computer architecture:exercise 11'>
      <query>
        <solution> question 1 </solution>
      </query>
    </request>

Knowledge Groups

In FUtella, knowledge groups are virtual clusters of peers that share a common field of knowledge, i.e. peers that can all answer questions of the same queryspace. Every knowledge group has a head node that serves as the central "contact point" for all peers that want to query the knowledge group. A knowledge group can be associated with more than one queryspace, and more than one knowledge group in FUtella can be associated with the same queryspace.

Creating a Knowledge Group

Any full peer can decide to create a knowledge group. The peer creating the knowledge group automatically becomes the head of the new group. In order to create a new knowledge group, the head has to register it by publishing a registration message with the FUtella network. A registration could look like this:

    <?xml version='1.0'?>
    <register xmlns='http://www.inf.fu-berlin.de/futella'>
      <query-server>
        <ip>160.45.116.151</ip>
        <port>4567</port>
      </query-server>
      <name>computer architecture knowledge group</name>
      <query-space>computer architecture</query-space>
    </register>

Joining a Knowledge Group

A peer that has discovered a knowledge group associated with a queryspace that the peer can also respond to may choose to join that group. In order to join the knowledge group, the peer connects to the head of the group and sends a join request containing information about itself (e.g. its IP address). The peer then waits for a join response from the knowledge group head indicating whether or not the head has accepted the peer.

Knowledge Group Communication

When a peer wants to send a query to a knowledge group, it connects to the knowledge group head and sends it its query. At that point, the head has to decide what to do with the query. If the head itself can provide an answer, it simply generates the appropriate query response and sends it back to the querying peer. If, however, the head cannot answer the query itself, it has to propagate the query to its group members. For this purpose, a knowledge group head maintains a member table containing all group members. Once the head has selected a member, it sends the query to that member. If the group member can answer the query, it prepares a query response message and sends it back to the head. If the group member determines that it cannot answer the query, it is expected to send an empty response message to the head. The head then chooses the next member to forward the query to, and so forth. The head forwards the responses to the querying peer.

Knowledge Group Discovery

Since all queries in FUtella belong to some queryspace, an efficient way of discovering knowledge groups associated with a specific queryspace is essential.
FUtella's knowledge group discovery routing algorithm is modeled after Freenet's [6] document retrieval algorithm. In contrast to Gnutella, Freenet does not simply broadcast a retrieval query (to find a certain document). Instead, retrieval queries are routed to those nodes that are deemed most likely to contain the requested document. It has been shown that Freenet locates a requested document with an average effort of O(lg N), N being the number of nodes on the net [8,9].

FUtella's knowledge group discovery routing algorithm works as follows. A peer wishing to discover a new knowledge group first initiates a discovery message. The aim is to find a peer on the FUtella network that has cached a registration of a knowledge group with a matching queryspace. The peer therefore has to determine which peer is the best one to forward the discovery query to. For this purpose, every peer maintains a routing table. An entry in the table contains both a queryspace and the IP address of a peer (the registration provider) that is known to hold a registration with a matching queryspace in its registration cache.

When peer A receives (or initiates) a discovery query, it searches its cache to see whether it contains the registration of a knowledge group that matches the queryspace of the discovery query. If such a registration is found, a discovery response containing that registration is sent upstream to peer B, from which the discovery query was received. Peer A includes its own address as the registration provider. The upstream peer B, in turn, forwards the discovery response upstream to the peer C from which it had received the discovery query. This process continues until the discovery response arrives at the peer that originally initiated the discovery query. Every intermediate peer that receives the discovery response during that process adds the queryspace and the registration provider to its routing table. It also adds the registration to its cache, thereby replicating the registration. Before forwarding a discovery response upstream, any intermediate peer can arbitrarily declare itself as the registration provider.

If, on the other hand, peer A receives (or initiates) a discovery query and its cache does not contain a matching registration, peer A has to decide where to forward the discovery query to. To do so, peer A searches its routing table for the entry whose queryspace most closely matches the queryspace of the discovery query and forwards the discovery query to that candidate. Upon reception of the discovery query, the candidate searches its cache for a matching registration and, if none is found, also chooses the closest match from its routing table, and so forth. This process continues until a peer has a cached registration matching the discovery query or the TTL of the discovery request has expired.

When peer A receives a discovery query, it checks whether it has received that discovery query before in order to prevent loop-backs. If a loop-back is detected, the peer backtracks the query to the upstream peer B from which it received the query by sending back an empty discovery response. After receiving the empty discovery response, the upstream peer B chooses the second-best match from its routing table and forwards the discovery query to it. In general, when a peer receives an empty discovery response from the n-th best match in its routing table, it chooses the (n+1)-th best match to forward the discovery query to.
Besides loop-back detection, a peer will also send back an empty discovery response if

- the TTL of the discovery query has expired and its cache does NOT contain a matching registration.
- it runs out of entries in its routing table to forward the discovery query to.

[Figure: peers A, B, C, D, head E and members M1 ... Mi of the knowledge group "computer architecture". Numbered steps: 1. Discovery request "computer architecture"; 2. Forward discovery request; 3. C has no cached registration for "computer architecture" -> backtrack; 4. Forward discovery request to 2nd best match; 5. Discovery response containing cached registration; 6. Forward discovery response; 7. Send query; 8. Forward query to member; 9. Query response; 10. Forward response(s).]

Figure 1. Steps involved in discovering a knowledge group ("computer architecture") and querying it.
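The discovery routing described in this section can be sketched as a small in-memory simulation. This is a simplified illustration under several assumptions that are not taken from the paper: peers call each other directly instead of exchanging network messages, an empty discovery response is modeled as `None`, the cache is matched by exact queryspace, routing-table entries are ranked with a hypothetical word-overlap `similarity` function, and the class and method names are invented for this example.

```python
def similarity(a, b):
    """Hypothetical closeness measure: Jaccard overlap of the lower-cased words."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 0.0


class Peer:
    def __init__(self, name):
        self.name = name
        self.cache = {}     # queryspace -> registration (here simply the head's name)
        self.routing = []   # list of (queryspace, registration provider)
        self.seen = set()   # ids of discovery queries already handled

    def discover(self, query_id, queryspace, ttl):
        """Return (registration, provider), or None for an empty response."""
        if query_id in self.seen:
            return None                           # loop-back detected: backtrack
        self.seen.add(query_id)
        if queryspace in self.cache:
            return self.cache[queryspace], self   # answer as registration provider
        if ttl <= 0:
            return None                           # TTL expired without a match
        # Try the routing-table entries from the best match to the worst.
        ranked = sorted(self.routing,
                        key=lambda entry: similarity(entry[0], queryspace),
                        reverse=True)
        for _, candidate in ranked:
            response = candidate.discover(query_id, queryspace, ttl - 1)
            if response is not None:
                registration, provider = response
                self.cache[queryspace] = registration        # replicate the registration
                self.routing.append((queryspace, provider))  # remember the provider
                return response
        return None                               # ran out of routing-table entries


# Scenario loosely following Figure 1: C cannot help, D holds the registration.
a, b, c, d = Peer("A"), Peer("B"), Peer("C"), Peer("D")
d.cache["computer architecture"] = "E"            # E is the knowledge group head
a.routing = [("computer architecture", b)]
b.routing = [("computer hardware", d), ("computer architecture", c)]
c.routing = [("computer architecture", a)]        # would loop back to A

registration, provider = a.discover("q1", "computer architecture", ttl=5)
print(registration, provider.name)                # prints: E D
```

In this run B first tries C, its best match. C's only routing entry leads back to A, which detects the loop, so C returns an empty response and B falls back to its second-best match D. On the way back, both B and A cache the registration and add D to their routing tables.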

Similar to Freenet's retrieval algorithm, FUtella's discovery query routing algorithm has a number of positive side effects:

1. Popular registrations are propagated through the network as intermediate peers replicate and cache the registrations contained in the discovery responses they are routing. Thus, the effort of discovering those knowledge groups is subsequently reduced.
2. Since intermediate peers cache the registrations they are routing, popular registrations move closer to the requester in terms of the network topology.
3. Since discovery responses contain a registration provider, routing a discovery response can broaden the horizon of an intermediate peer in case it did not know the address of the registration provider before.

Inserting a New Registration into the FUtella Network

As in Freenet, insertions in FUtella are handled analogously to discovery queries. When a knowledge group head wants to insert a new knowledge group registration into the network, it searches its routing table for the best match for the queryspace of the new registration and sends the registration message to that peer. That peer caches the registration and inserts the queryspace, with the group head as registration provider, into its routing table. It then finds the best match in its own routing table (excluding the entry that has just been inserted) and forwards the registration to it, and so forth. This process continues until the TTL of the registration message has expired.

The insertion routing algorithm, too, has a number of positive side effects:

1. The caching of the registration by the intermediate peers replicates the registration on the FUtella network, thus eliminating a "single point of failure" (for providing the registration, that is) for that knowledge group.
2. It makes sure that peers expected to be able to provide a registration of a certain queryspace are, indeed, likely to receive it.
3. It helps the group head make its presence known to other peers (the intermediate peers).

Deleting a Registration from the FUtella Network

Due to the dynamic nature of peer-to-peer networks, knowledge groups will form and disappear again at random. It is therefore necessary to extend FUtella's architecture to include the possibility of deleting knowledge group registrations. Note that deletion under Freenet is still considered an open problem and is currently simply not possible.

Since knowledge group registrations can be replicated and cached by many peers over time, there is no way of knowing exactly which peers have cached a particular knowledge group registration. Thus, the only way for a peer to make sure immediately that a certain registration is completely expunged from the FUtella network would be to flood the network with a delete request. Obviously, this is not a feasible approach. FUtella therefore introduces a delayed deletion approach. The central idea is that when one peer discovers that a certain knowledge group no longer exists (most likely because the head no longer responds), it is not vitally important for all other peers to delete the registration of that knowledge group from their caches immediately, too. For instance, if peer A realizes that knowledge group Z no longer exists, peer B may very well "live" with an invalid registration of Z if it is not going to contact Z any time soon.

Deletion under FUtella therefore works as follows. When a peer discovers that a certain knowledge group no longer exists, it initiates a delete request that is routed in exactly the same way insertions are routed. All intermediate peers also delete the matching registration from their caches. When an intermediate peer A has deleted a certain registration from its cache, it uses the registration's queryspace to find the best match in its routing table to forward the delete request to. Suppose peer B is that best match.
Peer A will forward the delete request to peer B. Nonetheless, peer A will NOT erase the entry containing peer B from its routing table. For one thing, peer B may have more than one cached registration for a given queryspace. Furthermore, peer B should remain in peer A's routing table even if the deleted registration was the only one peer B had cached under the given queryspace, because the next time a new knowledge group of that queryspace is inserted, peer B should still be the prime candidate for peer A to forward the new registration to.

Peers that are not on the routing path of the deletion request are left to discover on their own that the knowledge group no longer exists, until the registration is eventually removed from the network.

Conclusion

Peer-to-peer networks on the application layer suffer from the same problems as classic networks on the network/routing layer: scalability, efficiency, overhead, stability, etc. FUtella tackles those problems by combining specific approaches found in other peer-to-peer systems and extending them. FUtella treats knowledge group registrations much like documents so that Freenet's efficient retrieval routing scheme can be employed to discover them. Furthermore, FUtella extends Freenet's routing scheme to allow for the deletion of such registrations, since knowledge groups can form and disappear at random. The effort of propagating actual queries is reduced to simple point-to-point communication, as querying peers directly contact the corresponding knowledge groups. FUtella thus provides a lightweight, adaptive and efficient peer-to-peer platform for general knowledge sharing.

Future Work

The system is currently being implemented and will be experimentally evaluated and compared to the approaches taken in Gnutella and Freenet, respectively.

References

[1] Napster. http://www.napster.com
[2] Gnutella. http://www.gnutella.com
[3] Gnutella Protocol Specification v0.4. http://www.gnutelladev.com/protocol/gnutella-protocol.html, http://www9.limewire.com/developer/gnutella_protocol_0.4.pdf
[4] JXTA. http://www.jxta.org
[5] JXTA Search. http://search.jxta.org
[6] Freenet. http://freenetproject.org
[7] AMR Z. KRONFOL, "FASD: A Fault-tolerant, Adaptive, Scalable, Distributed Search Engine." http://www.cs.princeton.edu/~akronfol/fasd/
[8] IAN CLARKE, OSKAR SANDBERG, BRANDON WILEY, and THEODORE W. HONG, "Freenet: A Distributed Anonymous Information Storage and Retrieval System," in Designing Privacy Enhancing Technologies: International Workshop on Design Issues in Anonymity and Unobservability, LNCS 2009, ed. by Hannes Federrath. Springer: New York (2001).
[9] IAN CLARKE, THEODORE W. HONG, SCOTT G. MILLER, OSKAR SANDBERG, and BRANDON WILEY, "Protecting Free Expression Online with Freenet," IEEE Internet Computing 6(1), 40-49 (2002).
[10] JORDAN RITTER, "Why Gnutella can't scale. No, really." http://www.tch.org/gnutella.html

Authors:

Thomas Zahn
Dr. Hartmut Ritter
Prof. Dr. Jochen Schiller
Prof. Dr. Heinz Schweppe

Freie Universität Berlin