Design and Implementation of a Distributed Object Storage System on Peer Nodes

Diplomarbeit von Roger Kilchenmann aus Zürich

vorgelegt am Lehrstuhl für Praktische Informatik IV, Prof. Dr. W. Effelsberg, Fakultät für Mathematik und Informatik, Universität Mannheim, Mai 2002

Betreuer: Prof. Dr. E. Biersack, Institute Eurecom, Sophia Antipolis


Contents

Abstract
List of Figures

1 Introduction
  1.1 Evolution of Internet Applications
  1.2 A Reliable Storage Network
  1.3 Outline

2 Related Work
  2.1 File Sharing Applications
    Napster
    Gnutella
    FastTrack
    Swarmcast
  2.2 Distributed Storage Applications
    Freenet
    PAST
    Oceanstore/Silverback
    Cooperative File System (CFS)

3 Problem Analysis
  3.1 Overlay Routing Networks
    Chord
  3.2 Increasing Reliability with Redundancy
    IDA Approach
    Replication Approach
    Replica Placement
    Caching and Load Balancing
  3.3 Separation of Data and Metadata
    Metadata Access
    Data Access

4 Framework Design Overview
  4.1 Object Oriented Programming
    Classes and Objects
    Inheritance
    Polymorphism
    Java
  4.2 Framework Layers
    Message Layer
    Lookup and Routing Layer
    Block Storage Layer
    Application Layer

5 Message Layer Implementation
  5.1 Basic Node
  5.2 Thread Re-use
  5.3 Node-to-Node Communication
  5.4 Recursive Message Communication

6 Lookup and Routing Layer Implementation
  6.1 Identifiers
  6.2 Successors and Predecessors
  6.3 Fingers
  6.4 Lookup
    Iterative Method
    Recursive Method
    Virtual Node Tunnelling
  6.5 Join and Leave
    Stabilization Process
    Notification
    Node Join

7 Block Storage Layer Implementation
  7.1 Basic Elements
    Metadata
    Hash
    Paraloader
    Block Location Cache
    Block Replica Cache
  7.2 Storing a Block
  7.3 Fetching a Block
  7.4 Reorganization on Overlay Network Changes
    Metadata Reorganization
    Data Reorganization

8 Application Layer Implementation
  8.1 User Interface
  8.2 File Storage and Retrieval
  8.3 Block Event Methods

9 Performance
  9.1 Iterative vs. Recursive Lookup
  9.2 Virtual Node Tunnelling
  9.3 Overall File Performance

10 Conclusion and Future Work

Bibliography

A Package p2p.layer.message
B Package p2p.layer.lookup
C Package p2p.layer.storage
D Package p2p.layer.application
E Package cache

Ehrenwörtliche Erklärung


Abstract

This work describes the design and implementation of a distributed system that uses empty disk space on Internet hosts for reliable storage of data objects. To increase the reliability of the system, the objects are replicated and distributed to peer-nodes of an overlay network that is spanned over the participating hosts. The Chord overlay network provides a robust and well-scaling binding of objects to nodes, which is used to organize the object replicas in an environment of unreliable hosts that may join or leave the system frequently. It is robust against host failures, and the binding is resolved by an efficient lookup operation that runs in time logarithmic in the number of hosts. Congestion of nodes due to non-uniform object access patterns is avoided by caching and parallel data access, which both distribute the load over many nodes and overcome some disadvantages of Chord's deterministic overlay network topology. The major part of the prototype, implemented in Java, is a versatile object oriented framework architecture. A hierarchy of framework layers provides generalized solutions to problems in peer-to-peer networking. The file storage application itself is a thin layer on top of this framework.


List of Figures

3.1 Chord Key Distribution with Virtual Nodes
Chord Finger Example
Chord Lookup Example
Chord Lookup Hot-Spot
I-Hop and P-List Caching
Layer and Class Hierarchy
Thread Pool Message Processing
Iterative Lookup Pseudo-Code and Message Traffic
Recursive Lookup Pseudo-Code and Message Traffic
Node Join Pseudo-Code and Message Traffic
Thread Interaction in Parallel Download
Data Block Storage
Data Block Retrieval
Reorganization Caused by a Leaving Node
Reorganization Caused by a Joining Node
Splitting a File into Blocks
Comparison of the Iterative and Recursive Lookup Latencies
The Effect of Virtual Node Tunnelling on the Lookup Path Length
Overall File Performance


Chapter 1 Introduction

In the last years peer-to-peer applications received a lot of public attention. On the one hand there was the legal issue about content sharing, and on the other hand the peer-to-peer paradigm seemed to be a new idea to many people. But the early Internet was already designed like that, and the Usenet, which appeared in 1979 and is still very popular, can be seen as one of the first P2P applications because there is no hierarchy or central control and the Network News Transfer Protocol (NNTP) uses peer-to-peer communication between news servers [16]. This early application already shares most of the properties that characterize P2P applications [27]:

- They take advantage of distributed, shared resources such as storage, CPU cycles, and content on peer-nodes
- Peer-nodes have identical capabilities and responsibilities
- Communication between peer-nodes is symmetrical
- Significant autonomy from central servers provides fault tolerance
- They operate in a dynamic environment where frequent join and leave is the norm

The reason why most people consider peer-to-peer applications as something radically new is that in the last decade the paradigm of Internet applications changed from decentralized applications like the Usenet to the server-centric World Wide Web.

1.1 Evolution of Internet Applications

Between 1995 and 1999 the Internet became a mass medium, driven by the "killer" application World Wide Web (WWW). This changed the application paradigm and had an influence on the further development of the Internet architecture. The World Wide Web is a typical client/server application. A Web client, now called browser, connects to a well known server, which returns the page according to the request

of the client and closes the connection. Because the client initiates the communication, only the web server needs a permanent, well known Internet Protocol (IP) address. This behavior allowed the Internet service providers (ISPs) to satisfy the fast growing number of new Internet users by assigning temporary IP addresses to dial-up connections, because the limited IP address space of 2^32 addresses (4 bytes each) was too small to assign a permanent IP address to every user. Temporary dial-up connections together with unpredictable IP addresses demand new concepts to organize and maintain a distributed network. Another property of the client/server based WWW application had impact on the technical development. Because of the asymmetry in the WWW service (page requests are much smaller than the page replies), the dial-up technologies were developed considering that asymmetry. ADSL and V.90 modems have three to eight times higher downstream bandwidth than upstream bandwidth. By removing the distinction between clients and servers, P2P applications have symmetrical bandwidth characteristics. The upstream path of asymmetric connections will limit the total throughput between peer-nodes. Therefore, new mechanisms need to be introduced to use the available bandwidth resources more efficiently. To summarize, the originally symmetric and deterministic Internet architecture became asymmetric and dynamic due to changes in the application preferences. The new generation of P2P applications has to deal with a dynamic and asymmetric environment, which contradicts their inherent symmetry and imposes problems on reliability and efficiency. With technological progress and decreasing prices in computer hardware, new applications for personal computers became possible. Due to increasing hard disk capacities and faster processors, playing and storing audio and video content became very popular.
But exchanging multimedia content over the Internet was difficult and expensive for an inexperienced Internet user. Setting up a WWW or FTP server requires a decent amount of knowledge, and a permanent Internet connection is hardly affordable for a private person. The new generation of P2P applications from the late 90s offers a comfortable and easy way for everybody to publish and share content. This made P2P systems such as Napster very popular and resulted in about 38 million registered Napster users by October 2000 [20].

So far three different types of P2P applications have developed:

- P2P File Sharing - Content driven applications sharing bandwidth and storage resources to provide efficient content distribution and storage. Some related applications are presented in the following Chapter "Related Work".
- P2P Messaging - Human presence is shared across a distributed and decentralized system like the Groove collaboration network [17].
- P2P Computing - The distributed system shares CPU cycles for solving computing problems. The sum of idle CPU cycles on many workstations can replace very expensive supercomputers. A well known example is the SETI@home project [26], which uses idle CPUs on Internet hosts to analyze radio signals from outer space in order to find signals of intelligent origin.

1.2 A Reliable Storage Network

The Java based prototype presented in this work uses empty disk space available on Internet hosts to build a reliable storage system. By April 2002, a typical workstation PC is shipped with a hard disk of about 60 GB storage capacity. After the operating system and some other applications are installed, most of the capacity on the hard disk is still unused. For example, if the software takes 10 GB, the remaining free disk space is 50 GB. For an organization with 100 such workstations, the total amount of unused disk space is 5 TB. Nowadays, most workstations in an organization are connected by a local area network (LAN) that uses the Internet Protocol (IP). This work describes the design of such a system and explains the implementation of a simple file storage application, which uses an underlying framework architecture developed as a major part of this work. The goal is to achieve a maximum of reliability and fault tolerance for the storage service built out of Internet hosts, called nodes in this context. The storage network must be reliable while the nodes themselves are not.
Unlike dedicated file servers, the nodes are workstations generally not shipped with redundant power supplies or RAID (Redundant Array of Inexpensive Disks) systems. Since the workstations are under control of their users, their system availability is not predictable. Users may shut down their workstations or a network link may temporarily fail.

Assuming a heterogeneous and dynamic environment of hosts connected by a high bandwidth and low latency IP network, this work focuses on:

- reliability of data storage
- scalability in terms of the number of hosts and content requests
- efficient usage of the available resources

Further, it is assumed that there are no restrictions concerning firewalls and Network Address Translation (NAT) issues. All hosts are willing to cooperate by relaying messages, and they store data as long as their storage quotas have not been exceeded.

1.3 Outline

This work is organized as follows: In the Chapter "Related Work", existing applications with their solutions to P2P specific problems are analyzed. The next three chapters reflect the general object oriented approach of software development: problem analysis, design and implementation. First, the problem fields need to be identified and a general solution has to be developed. This is done in the Chapter "Problem Analysis". In the second step, the design breaks the general solution into software layers with well defined functionality and interfaces, described in the Chapter "Framework Design Overview". For each layer, the algorithms and some important implementation details are presented in its own chapter. In the Section "Performance", the effect of implementation alternatives and optimizations on performance is examined and the overall file storage performance is evaluated. This work closes with the Chapter "Conclusion and Future Work", where the results of this work are summarized and an outlook on future improvements is given. Important terms are emphasized with bold font when they are introduced. Class and method names are always emphasized with italics.

Chapter 2 Related Work

The field of P2P applications targeted at sharing storage resources can be divided into two groups. The first group offers file sharing and content distribution capabilities. Because content on hosts is shared, the main task for this group of applications is content location and distribution. The different methods of how content items are found and distributed are examined. The second group offers a distributed file system service. The ability of reliable and persistent storage distinguishes it from the first group. Apart from content location and distribution, the mechanisms to achieve reliability in a dynamic and unreliable environment are examined for this group of applications.

2.1 File Sharing Applications

Napster

Although Napster [10] is often referred to as the first P2P application, it does not follow a true P2P concept. Napster can be characterized as a client/server system for centralized content metadata lookup combined with direct client-to-client connections for content delivery. At startup the Napster client software connects to a central Napster server, authenticates itself with login and password and registers its shared content's metadata in a central index database. A content query is sent to the central index server, which processes the query by an index database lookup and returns to the client a list of matching content metadata records containing the network location of the client sharing the content item, its exact filename and some bandwidth and latency information. From this list the user has to choose a client from whom to download the content file. The download reliability is low because only a single unreliable source is used and a broken download is not automatically continued from a different source. Another conceptual problem is using a central server for content location, which is neither a reliable nor a scalable solution. Napster needs to operate several of those central servers to achieve fault tolerance and load balancing, because a single server can only handle a limited number of users simultaneously. Above this threshold, the server will reject connection requests and the client has to try another server. Several servers are necessary to serve the peak load, but at other times they will be idle, which results in poor resource utilization. After connecting to one of the central servers, the client stays connected to it for the whole session. Since each server maintains its own index database, a user will only see a restricted view of the total content available. The handicap of Napster is the centralized index, which simplifies the system but results in a single point of failure and a performance bottleneck.

Gnutella

To avoid the disadvantages of Napster, the Gnutella network is decentralized. The only central component is the host cache service, which is used by the servants, a Gnutella-specific term for a combined client and server, to find a bootstrap node. The Gnutella protocol uses a time-to-live (TTL) scoped flooding for servant and content discovery. A servant is permanently connected to a small number of neighbors. When a servant receives a request from one of its neighbors, it decreases the TTL counter of the request and forwards it to all its neighbors if the TTL is greater than zero. The reply is routed back along the reverse path. There are two important request/reply pairs. A Ping request for discovering new servants is answered with a Pong response message containing the IP address, TCP port and some status information. The other pair is the Query request, which contains query keywords and is answered with a QueryHit if the Query matches some files shared by the servant.
The QueryHit is routed back along the reverse path to the servant that initiated the Query and contains the necessary information to start a direct download of the file, which is done similarly to the HTTP GET command. The main disadvantage of Gnutella is the distributed search based on scoped flooding, which does not scale in terms of the number of servants [21] because the number of messages grows exponentially and consumes much of the servants' bandwidth. To reduce the message load, the next generation of the Gnutella protocol will introduce supernodes, which act as message routing proxies for clients with limited bandwidth. These clients, called shielded nodes, have only a single connection to one supernode, which shields them from routing Gnutella messages. The supernode concept is a result of the node heterogeneity observed in the real world. Not all nodes are really equal concerning their resources, and by far not all users want to share them.
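The exponential growth of the message count under TTL-scoped flooding can be illustrated with a small calculation. The following sketch assumes an idealized, cycle-free network in which every servant has the same number of neighbors; the class and method names are illustrative and not part of any Gnutella implementation.

```java
// Sketch: message count of TTL-scoped flooding in an idealized
// Gnutella-style network (no cycles, uniform degree d).
// The originator sends to d neighbors; every forwarding servant
// sends to its remaining d-1 neighbors until the TTL expires.
public class FloodEstimate {

    /** Total messages generated by one query with the given TTL. */
    static long messages(int ttl, int degree) {
        long total = 0;
        long frontier = degree;          // hop 1: d messages
        for (int hop = 1; hop <= ttl; hop++) {
            total += frontier;
            frontier *= (degree - 1);    // each recipient forwards to d-1 peers
        }
        return total;
    }

    public static void main(String[] args) {
        // With TTL 7 and 4 neighbors, a commonly cited early-Gnutella
        // configuration, a single query already floods thousands of messages.
        for (int ttl = 1; ttl <= 7; ttl++) {
            System.out.println("TTL " + ttl + ": " + messages(ttl, 4) + " messages");
        }
    }
}
```

Real Gnutella networks contain cycles, so duplicate messages are detected and dropped; the model above is therefore an upper bound on useful traffic, but it shows why the message count grows exponentially with the TTL.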

FastTrack

The FastTrack protocol, used in the KaZaA and Morpheus applications [8], is a hybrid, two-layered architecture in which peers connect to supernodes, which themselves are connected together. A supernode acts as a local search hub that maintains the index of the media files shared by each peer connected to it and proxies search requests on behalf of its local peers. FastTrack elects a peer with sufficient bandwidth and processing power to become a supernode if its user has allowed it in the configuration. A search result in FastTrack contains a list of files that match the search criteria. FastTrack uses parallel download and client side caching for file transfers. A file is logically split into segments, and these segments are downloaded from other peers that share the same file or, in the case of client side caching, are currently downloading this file and share the segments downloaded so far until the download is completed. This can increase the download speed significantly, especially for asymmetric dial-up connections, because the limited upstream bandwidths add up. As FastTrack is a proprietary protocol, it is so far difficult to evaluate what scaling properties the supernode network has.

Swarmcast

Swarmcast [15] is a content distribution network. The content provider has to host content on his own server, and Swarmcast's job is to boost the download and to ease the burden on the content provider's server. This is done by parallel downloading and locality based client side caching. For each file, a temporary mesh of client nodes downloading this file is maintained in order to find other close nodes to exchange file parts. The provider's file is broken into parts, which are then encoded into packets with a forward error correction code (FEC) [22]. An (n, k) forward error correction code encodes k source packets into n > k encoded packets.
The encoding is such that any subset of k encoded packets suffices to reconstruct the source data. Swarmcast reduces complexity and communication overhead by randomly sending the packets to other nodes in the mesh, and it uses FEC encoding to avoid the potential overlap of duplicate packets, which would otherwise drastically decrease the utility of each packet. The encoded packets are spread randomly among the nodes of the mesh, which exchange them until the nodes have enough packets to reconstruct the file. After downloading, the nodes should keep the packets in their cache to support the other nodes in the mesh. The system scales nicely because the more requests there are for a file, the more nodes join the mesh and the more packets are cached and exchanged.
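The (n, k) property can be demonstrated with the simplest nontrivial case: a (3, 2) XOR parity code, where any two of the three encoded packets suffice to recover the two source packets. This is only an illustration of the idea; real codecs such as Reed-Solomon codes generalize it to arbitrary n and k, and the class below is not part of Swarmcast.

```java
import java.util.Arrays;

// Illustration of the (n, k) FEC idea with a (3, 2) XOR parity code.
// Both source packets are assumed to have equal length.
public class XorParity {

    /** Encode two source packets into three packets:
     *  the two sources plus their XOR parity. */
    static byte[][] encode(byte[] a, byte[] b) {
        byte[] p = new byte[a.length];
        for (int i = 0; i < a.length; i++) p[i] = (byte) (a[i] ^ b[i]);
        return new byte[][] { a, b, p };
    }

    /** Reconstruct a lost source packet from the surviving source
     *  packet and the parity packet (a ^ b ^ b = a). */
    static byte[] recover(byte[] survivor, byte[] parity) {
        byte[] r = new byte[survivor.length];
        for (int i = 0; i < r.length; i++) r[i] = (byte) (survivor[i] ^ parity[i]);
        return r;
    }

    public static void main(String[] args) {
        byte[] a = "hello ".getBytes();
        byte[] b = "world!".getBytes();
        byte[][] packets = encode(a, b);
        // Pretend packet 0 (source a) was lost: any two packets suffice.
        byte[] restored = recover(packets[1], packets[2]);
        System.out.println(Arrays.equals(restored, a)); // true
    }
}
```

With a general (n, k) code a node never needs a specific packet, only any k distinct ones, which is exactly why random packet exchange in the mesh works without coordination.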

2.2 Distributed Storage Applications

Distributed storage applications must have an active replication strategy to increase reliability. File sharing and content distribution applications, in contrast, rely more on the fact that with a large number of users sharing content, the probability of content being available can be quite high, though without any guarantee. An important difference to file sharing applications is that distributed storage applications in general have a publish process, which adds content items to the system. The location of the content items is not predefined as it is for file sharing applications.

Freenet

Freenet [3] is a distributed publishing system which provides anonymity to publishers and consumers of content. An adaptive network is used to locate content by forwarding requests to nodes that are closer to the key that identifies a content item. On each hop, information about whether the item was found on this path travels in the backward direction and is temporarily stored on the nodes. The next request for the same key takes advantage of this information and gets routed directly to the content source. When a query reaches the content source, the content is propagated along the query's reverse path and cached in the intermediate nodes. Freenet thus uses an intelligent flooding search, where routing information and cached copies are stored along the path. The more requests for a content item, the more cached copies and routing information are available. If there has been no request for a content item in a period of time, the nodes discard it because all routing information about this item on the other nodes has already timed out and the item is not referenced anymore. As a consequence, published content is only stored persistently as long as there is enough demand to keep the routing information alive.
The content objects float around in the network, and there is only temporal and local knowledge about where the content is actually located. To provide anonymity to the publishers and consumers, there is no direct peer-to-peer data transfer. Instead, the content data is routed through the network. Nodes with low bandwidth may become a bottleneck to the system, and the flooding based content lookup scales badly.

PAST

PAST [6] is a persistent peer-to-peer storage utility which replicates complete files on multiple nodes. Pastry [24] is used for message routing and content location. PAST stores a content item on the node whose node identifier nodeid is closest to the file identifier fileid. Routing a message to the closest node is done by choosing a next hop node whose nodeid shares with the fileid a prefix that is at least one

digit longer than the prefix that the fileid shares with the present node's nodeid. The fileid is generated by hashing the filename, and the nodeid is assigned randomly when a node joins the network. The routing path length scales logarithmically in the overall number of nodes in the network. For each file, an individual replication factor k can be chosen, and replicas are stored on the k nodes that are closest to the fileid. Node failures are detected by the Pastry background process of exchanging heartbeat messages with neighbors, which is used to maintain the k replicas. When a node detects a neighbor node's failure, the replica is automatically replaced on another neighbor. Free storage space is used to cache files along the routing path while approaching the closest node during the publish or retrieval process. This can only be done if the file data is routed along the reverse query path; thus there is no direct peer-to-peer file transfer. Similar to the Freenet system, nodes with low bandwidth may become a bottleneck.

Oceanstore/Silverback

Silverback [30] is the archival layer of the Oceanstore system. For routing and content location, Tapestry [32] is used, which is a distributed version of Plaxton's hashed-suffix routing and is quite similar to Pastry. Therefore, the number of hops and messages is logarithmic in the total number of nodes in the network. A file is split into blocks, which are then encoded with a forward error correction (FEC) code into n fragments. The block's binary data is hashed into a blockid, which is used to route the n block fragments to the n closest nodes in terms of the most common suffix of the nodeids and the blockid. The fragments are periodically republished by a file's Responsible Party to increase reliability, and they are cached along the path to reduce the access latency and to balance the load over several nodes.
The system features a file version management, which uses tombstones to reduce storage consumption by storing only the difference of a file block compared to the latest tombstone version of that block.

Cooperative File System (CFS)

CFS [4] is a read only file system built on top of Chord [28], which is used for content location. Chord belongs to the same family of second generation peer-to-peer resource location services as Pastry and Tapestry, which use routing for content location. On each hop, the closest routing alternative is chosen to approach the closest node, defined by a metric on the identifiers generated by a hash function. The basic idea is that nodes closer to the routing target have a more detailed view of the target's neighborhood, and this knowledge is exploited to approach the target. Since Chord was chosen as the lookup and routing service for this work, it is described in detail in the next chapter. Right now, it is enough to know that the hashed identifiers are interpreted as n-bit numbers, which are arranged in a circle by the natural integer order. A file is split into blocks identified by blockids. By definition, the r block

replicas are stored at the successor node of the blockid and its r-1 immediate successor nodes. The successor node is the closest node to an item identified by a blockid or a nodeid; by definition it is the node that immediately follows the item's ID in the circle. When a node joins the circle, a block's successor node can change, and the network has to move some blocks to the new node to maintain the property of storing the block's replicas on the r closest nodes to the blockid. The blockid's successor node is responsible for maintaining the r-1 block replicas on its r-1 successor nodes by periodically verifying their availability and replacing replicas in case of a node failure. Similar to PAST and Silverback, cache replicas are stored along the reverse lookup path when the requested block data is returned, to reduce latency and to balance the load.
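The CFS placement rule can be sketched with a sorted set standing in for the identifier circle. The class and method names below are illustrative, not taken from CFS; the sketch assumes r does not exceed the number of nodes on the ring.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Sketch of the CFS placement rule: the r replicas of a block live on
// the successor of the blockId and its r-1 immediate successor nodes.
// Node IDs and block IDs share one circular identifier space.
public class ReplicaPlacement {

    /** Sorted node IDs on the identifier circle. */
    private final TreeSet<Long> ring = new TreeSet<>();

    void addNode(long nodeId) { ring.add(nodeId); }

    /** The node at or immediately after id on the circle (wrap at zero). */
    long successor(long id) {
        Long s = ring.ceiling(id);
        return (s != null) ? s : ring.first();
    }

    /** The r nodes holding the replicas of blockId (assumes r <= ring size). */
    List<Long> replicaNodes(long blockId, int r) {
        List<Long> nodes = new ArrayList<>();
        long n = successor(blockId);
        while (nodes.size() < r) {
            nodes.add(n);
            n = successor(n + 1);   // next node clockwise on the circle
        }
        return nodes;
    }

    public static void main(String[] args) {
        ReplicaPlacement ring = new ReplicaPlacement();
        for (long id : new long[] { 10, 30, 50, 70, 90 }) ring.addNode(id);
        // Block 42 is stored on node 50 and, with r = 3, also on 70 and 90.
        System.out.println(ring.replicaNodes(42, 3)); // [50, 70, 90]
    }
}
```

The sketch also shows why a join forces reorganization: inserting a node with ID 45 would make it the new successor of block 42, so the replica set shifts to 45, 50 and 70.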

Chapter 3 Problem Analysis

Peer-to-peer applications aim to take advantage of shared resources in a dynamic environment where fluctuating participants are the norm. Hence there is a need for a resource centric addressing scheme working under dynamic conditions [27]. In this work, the resource of interest is free disk space available on Internet hosts. Each host is identified by a unique Internet Protocol (IP) address used for packet routing to this host. To establish communication with a host, an additional port number is necessary to identify the software that handles the communication process on that host. The IP address together with the port number is the network location, which is necessary to communicate with the software on a host managing its storage resources. Moreover, a content item needs a content name that identifies it among all other content items. A resource centric addressing scheme for storage related applications provides a binding from content names to network locations, which is resolved by a lookup operation [25]. The Domain Name System (DNS) [7] is an excellent example of such an addressing scheme. It is a host centric addressing scheme because it was introduced to map human readable host names to host IP addresses. A host name consists of domain names separated by dots, which are interpreted as a hierarchy where the last domain name is the top level domain, like com, net and org. Basically, there are two possibilities for how the binding is stored: in a single flat "hosts.txt" file, or distributed over a hierarchical topology of DNS servers, which store the necessary information to resolve a DNS lookup by traversing the hierarchy. Because the number of lookup steps is limited by the number of hierarchy levels and caching is used on all levels, the DNS scaled to many times its original size [16]. But the binding information in this system is maintained manually, and changes need hours, if not days, to penetrate through the system.
Therefore, it is not well suited for P2P systems with participants that on average stay in the system for less than an hour.

The way Napster resolves the host addresses compares to the single flat file in the DNS. The central real time index on a Napster server stores all binding information to map content name fragments (keywords) to IP addresses, which can be looked up by the clients. The disadvantages of such a solution were already discussed. The distributed lookup by flooding performed by Gnutella provides minimal lookup latency, but trades it against bandwidth and scalability because the number of messages and the bandwidth consumption grow exponentially with the number of nodes. For a file system application, the situation is different from that of file sharing applications. In file sharing applications the content is already stored on hosts, and it has to be discovered there, as Napster and Gnutella do. For an application like the one proposed in this work, the system itself decides where the content items are stored during the publish process, and therefore an addressing scheme based on an overlay network can be used, which resolves bindings by routing. The addressing scheme maps content items to nodes, and this mapping is used to store and retrieve the content items.

3.1 Overlay Routing Networks

An overlay routing network is built of nodes connected together by a network of a distinct topology. This network is a logical overlay network because the nodes communicate over an underlying communication network, but the logical network topology has influence on the routing algorithm. To resolve a binding for a content name, a message is routed to the node that is "closest" to the content name according to a metric on the node identifiers and content names. The communication network address of this "closest" node is the result of the lookup operation. The routing algorithm on each node exploits local routing knowledge to route the message to the "closest" local routing alternative until there is no "closer" routing alternative.
To define closeness, there has to be a metric that applies to both the node identifier space and the content name space. This can be achieved by using a hash function that deterministically maps node identifiers and content names into a flat (and uniformly populated) hash space, on which a metric is chosen to define the closeness. Pastry, Tapestry and Chord are based on this idea and therefore share the same average lookup path length of log(#nodes) hops, but they use different metrics and overlay network topologies.
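The shared identifier space can be sketched in a few lines: node addresses and content names are hashed into the same m-bit range, so one metric can compare them. SHA-1 and m = 16 are illustrative choices here; the class name and example strings are not taken from any of the systems discussed.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

// Sketch: mapping both node addresses and content names into one flat
// m-bit identifier space with a hash function, as Chord, Pastry and
// Tapestry do.
public class FlatIds {

    static final int M = 16;  // identifier length in bits (small for readability)

    /** Hash an arbitrary string into a nonnegative m-bit identifier. */
    static int id(String name) {
        try {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            byte[] digest = sha1.digest(name.getBytes(StandardCharsets.UTF_8));
            // Keep only the low m bits of the digest.
            return new BigInteger(1, digest).mod(BigInteger.ONE.shiftLeft(M)).intValue();
        } catch (java.security.NoSuchAlgorithmException e) {
            throw new AssertionError(e);  // SHA-1 is always available
        }
    }

    public static void main(String[] args) {
        // A node address and a content name land in the same 2^16 space,
        // so a single distance metric can relate keys to nodes.
        System.out.println("node 192.168.0.1:4000 -> " + id("192.168.0.1:4000"));
        System.out.println("content 'thesis.pdf'  -> " + id("thesis.pdf"));
    }
}
```

The uniformity of the hash function is what makes the populated identifier space approximately uniform, which the load balancing arguments in the following sections depend on.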

Chord

Chord is a distributed lookup and routing overlay network using consistent hashing [9], originally introduced for distributed caching. Nodes are organized in a circular topology by using m-bit node IDs interpreted as nonnegative integer numbers wrapped around at zero. The total ordering of the integer numbers assigns each node a predecessor and a successor node. A node ID is generated by applying a hash function to the node's host IP address. Therefore, the overlay network becomes a deterministic function of the host addresses. In other words, the host IP address determines the position in the circle. This makes the overlay network topology completely unaware of the underlying network layer topology, which has some positive and negative effects. Routing a message to the successor, the neighbor in the overlay network, could result in routing to the other side of the world in the underlying IP network, causing high latency. On the other hand, IP network failures in a region do not map to a region of the logical overlay network, which is often used for placing redundant replicas, as in CFS or PAST. A content item is also identified by an m-bit content key, and the binding from keys (hashed content names) to node IDs (hashed host addresses) is defined by the successor function. A key k is located at the node n with the same or the next higher ID than the key k, written as n = successor(k). The content item associated with the key k is not stored in Chord itself. Chord just assigns a Responsible Node whose network location is used to access the content item. If a host operates more than one node, they are called Virtual Nodes, and their node ID is calculated by hashing the host IP address together with a small Virtual Node ID, which is only unique on that host.
The reason to use Virtual Nodes is that for a small number of nodes in the circle, the distance in the identifier space between nodes is not likely to be equally distributed as desired, which results in a unequal distribution of keys per real nodes.
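The derivation of node identifiers described above can be sketched as follows. This is a minimal illustration, not the prototype's actual code: the use of SHA-1 and the string encoding of host address plus Virtual Node index are assumptions made for the example.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

public class NodeId {
    // Hash the host address together with a per-host Virtual Node index
    // into an m-bit identifier on the Chord circle.
    static BigInteger nodeId(String hostAddress, int virtualNodeId, int m) throws Exception {
        MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
        byte[] digest = sha1.digest(
                (hostAddress + "/" + virtualNodeId).getBytes(StandardCharsets.UTF_8));
        // Interpret the digest as a nonnegative integer and reduce it mod 2^m,
        // so that identifiers wrap around at zero.
        return new BigInteger(1, digest).mod(BigInteger.ONE.shiftLeft(m));
    }

    public static void main(String[] args) throws Exception {
        // Two Virtual Nodes on the same host get different positions in the circle.
        System.out.println(nodeId("192.168.1.10", 0, 64));
        System.out.println(nodeId("192.168.1.10", 1, 64));
    }
}
```

Because the ID is a deterministic function of host address and Virtual Node index, every node can recompute its own position after a restart without any coordination.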

Figure 3.1: Distribution of keys per real node depending on the number of Virtual Nodes per real node, for a simulated network with 10^4 real nodes and 10^6 keys.

Figure 3.1, taken from [28], shows that increasing the total number of nodes by introducing multiple Virtual Nodes per real node balances the number of keys per real node. In CFS the number of Virtual Nodes on a host is also used to adjust to the available storage capacity, because a node is not allowed to reject a storage request. The separation of content data and content metadata, discussed later in this chapter, can eliminate this problem by introducing one more degree of freedom. But an equal distribution of keys per real node is still a desirable property, and Virtual Nodes give significant optimization potential for the lookup, as described in the implementation chapters.

Using the same identifiers for nodes and keys leads to a combined lookup and routing. A lookup is resolved by routing a message to the node that is the successor of the key. Every node knows at least two other nodes, its successor and its predecessor. The simple lookup algorithm routes a message around the circle by following the successor pointers until a node with the same ID as the key or the next higher ID is found. The metric used in Chord is the numerical difference between key and node ID. Routing is done by choosing the local routing alternative that minimizes this difference. As a node is only aware of its successor and its predecessor as available routing alternatives, a lookup message always traverses the circle in the direction of the successor pointers, because only this reduces the distance. In the worst case a message has to complete a full turn around the circle before the node that is successor to the key is found. Resolving a lookup with the simple algorithm therefore takes O(#nodes) hops.

As long as every node has a working pointer to its immediate successor in the circle, a successful successor lookup is guaranteed. In a real application with frequent joins and leaves, a single successor pointer is not sufficient to guarantee a successful lookup: a single node failure would break the circle and result in lookup failures. Therefore, redundant successor pointers are used. As long as one working successor pointer is found, the lookup routing can proceed and a successful lookup is guaranteed.

To reduce the average lookup path length to a practical number, a finger table with additional routing information is introduced. Fingers are like shortcuts, used instead of going around the circle from node to node following the successor pointers. Every node divides the circle into m finger intervals of exponentially growing size in powers of 2. For a node n and k = 1 ... m, the k-th finger interval is [n + 2^(k-1) mod 2^m, n + 2^k mod 2^m), has length 2^(k-1), and its finger points to successor(start).

Figure 3.2: An example of a finger interval with the finger pointer (finger[6] of node n = 80 in an m = 7 bit circle).

A finger points to the successor of the interval start, which can result in finger pointers lying outside their corresponding finger interval. The finger nodes are resolved by the Chord lookup function, which returns the successor node of the interval start ID. Using the finger table adds O(m) additional routing alternatives, and the one is chosen that leads closest to the successor of the key. The higher the finger index, the farther away the finger points. Therefore, the finger table is searched in reverse order, starting at finger[m]. If finger i points to a node preceding the key, this hop reduces the distance to the key by 2^(i-1). With a few hops, the distance to the key is quickly reduced, which results in an average lookup path length of O(log(#nodes)). This bound was proven theoretically and verified by controlled experiments in the Chord paper[28].
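The finger interval boundaries described above are simple modular arithmetic; a small sketch (class and method names are illustrative, not taken from the prototype):

```java
import java.math.BigInteger;

public class FingerTable {
    // Start of the k-th finger interval of node n in an m-bit circle:
    // (n + 2^(k-1)) mod 2^m, for k = 1 .. m.
    static BigInteger fingerStart(BigInteger n, int k, int m) {
        BigInteger ringSize = BigInteger.ONE.shiftLeft(m);
        return n.add(BigInteger.ONE.shiftLeft(k - 1)).mod(ringSize);
    }

    public static void main(String[] args) {
        BigInteger n = BigInteger.valueOf(80);
        int m = 7; // 2^7 = 128 identifiers, as in Figure 3.2
        for (int k = 1; k <= m; k++) {
            System.out.println("finger[" + k + "].start = " + fingerStart(n, k, m));
        }
        // finger[6].start = (80 + 32) mod 128 = 112
        // finger[7].start = (80 + 64) mod 128 = 16 (wraps around zero)
    }
}
```

Note that the interval of the highest finger covers half the circle, which is what allows each routed hop to cut the remaining distance roughly in half.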

Figure 3.3 shows a detailed Chord routing example using finger tables in an m = 7 bit circular hash space. Starting at node N32, which wants to resolve the successor of the key K19, N32 looks in its finger routing table for the node that most closely precedes K19. The finger table is searched in reverse order, starting at the finger with index 7. This finger matches the criterion, and therefore the lookup continues at N99. On N99 the finger table is searched again. The 7th finger, N60, does not precede K19, and therefore the 6th finger is tested. This one, pointing to N5, precedes K19, hence the lookup continues on N5. N5 finds N10 as its closest preceding finger. N10 then terminates the lookup, because it can determine that its successor N20 is the successor node of K19.

Figure 3.3: An example of a lookup using the finger table (nodes N5, N10, N20, N32, N80, N99, N110; lookup(K19) started at N32).

The two important properties of Chord are inherited from using ranged hash functions as proposed in consistent hashing[28]:

1. balance property: the number of keys per real node is K/N with high probability (if each real node has O(log N) Virtual Nodes), where K is the number of keys and N is the number of nodes. The responsibility for keys is equally distributed among all nodes.

2. monotony property: when the (N + 1)-th node joins, the binding for O(K/N) keys changes from existing nodes to the new node. In other words, the responsibility for keys changes only from existing nodes to new nodes, never from existing nodes to existing nodes. There is only a local reorganization on a node join.
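The successor function and the monotony property can be illustrated with a sorted map holding the whole ring. This is a deliberately centralized sketch (names hypothetical); the actual prototype distributes this state across nodes and resolves successor(k) by routing.

```java
import java.util.TreeMap;

public class Ring {
    // Node IDs on the circle, kept sorted; values stand in for network locations.
    private final TreeMap<Integer, String> nodes = new TreeMap<>();
    private final int ringSize;

    Ring(int m) { this.ringSize = 1 << m; }

    void join(int nodeId, String host) { nodes.put(nodeId, host); }

    // successor(k): the node with the same or the next higher ID than k,
    // wrapping around at zero.
    int successor(int key) {
        Integer id = nodes.ceilingKey(key % ringSize);
        return (id != null) ? id : nodes.firstKey(); // wrap around
    }

    public static void main(String[] args) {
        Ring ring = new Ring(7);
        ring.join(5, "a"); ring.join(10, "b"); ring.join(20, "c");
        ring.join(32, "d"); ring.join(99, "e");
        System.out.println(ring.successor(19));  // 20
        System.out.println(ring.successor(100)); // wraps around: 5
        // Monotony property: a joining node takes over only the keys between
        // its predecessor and itself; all other bindings are unchanged.
        ring.join(15, "f");
        System.out.println(ring.successor(12));  // now 15 (moved from 20)
        System.out.println(ring.successor(19));  // still 20
    }
}
```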

Chord offers a scalable, robust, and balanced mapping of hashed content names to host network locations (IP address and a Virtual Node ID), which allows communication with these Virtual Nodes over the underlying network layer. The Chord overlay network delegates responsibility for content items to nodes; Chord does not store the data itself. To be consistent with the terminology introduced by Chord, a content item is identified by its key k, which is of course a Chord m-bit identifier. The node that is responsible for a content item k is called the primary Responsible Node RN1^k of key k and is defined by the lookup function as RN1^k = successor(k). In a perfect world with no host failures, the straightforward solution would be to store content items on their primary Responsible Nodes.

3.2 Increasing Reliability with Redundancy

The reliability of data storage on unreliable nodes is increased by adding redundant information and dispersing this information to several nodes. The reliability, expressed as the probability of a successful data access, is determined by:

1. The amount of redundant information added

2. The number of nodes and their independent failure probabilities

3. How the redundant information is distributed over multiple nodes

The problem of storing data blocks on unreliable nodes is closely related to storing data blocks on a hard disk array, as is done in RAID storage solutions [2]. A RAID system is a redundant array of inexpensive disks. This technology was developed to organize small hard disks into arrays in order to replace much more expensive high-capacity disks and to reduce the risk of data loss due to hard disk failures. There are several approaches to distributing data over disks, or, as in this case, storage nodes. Two of them are now discussed.

3.2.1 IDA Approach

IDA stands for Information Dispersal Algorithm, proposed by M. Rabin[19]. The basic idea is to disperse the content of a data block into n fragments. The original data can be reconstructed from any subset of k fragments, where k <= n. One major aspect of this algorithm is that redundancy is added uniformly; there is no distinction between data and parity. This property allows the amount of redundant data to be controlled at fine granularity. To tolerate up to r simultaneous node failures, the data block has to be encoded into n = k + r fragments. If all nodes have the independent failure probability p,

this gives an access reliability of:

p(access) = 1 - sum_{i=r+1}^{n} C(n,i) p^i (1-p)^(n-i)

The redundancy necessary to achieve this reliability is (n-k)/k. It is obvious that the IDA approach needs a smaller amount of storage resources than the straightforward replication approach: for the same reliability, the replication approach needs r times redundancy. The currently available Forward Error Correction (FEC) codes, such as the Reed-Solomon code [22], have encoding times quadratic in the number of encoded blocks n. Tornado codes [12] achieve an encoding time linear in n, but so far there is no free implementation available. A performance comparison of the different codes can be found in [13].

3.2.2 Replication Approach

The other redundancy scheme, block replication, also called mirroring, was already mentioned and compared to the IDA approach. An analysis of past hardware development shows that hard disk storage space has doubled every 18 months, which is often referred to as Moore's Law, and this trend is expected to hold for the next decade[31]. While the capacity per disk is growing, the price per storage unit is falling. As of April 2002, the average hard disk shipped with a workstation can store between 40 and 60 GB. Therefore, disk storage capacity is not considered a limited resource.

Since the prototype will be implemented in Java, one should take into account that Java code executing numeric calculations is likely to be 10 to 30 times slower than native machine code generated from C code. Using FEC codes, which are based on polynomial arithmetic, will always produce significantly higher CPU load than replication. The storage node software is expected to run on workstations with priority given to the user processes, not on dedicated single-purpose servers. Therefore, it should run as a background process with low priority and consume as few CPU cycles as possible.
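The access-reliability formula of the IDA scheme given in Section 3.2.1 is straightforward to evaluate numerically. A small sketch (class and method names are hypothetical): an access fails only if more than r of the n fragment-holding nodes fail simultaneously, leaving fewer than k = n - r fragments for reconstruction.

```java
public class Reliability {
    // Binomial coefficient C(n, i); exact at each step for small n.
    static long binomial(int n, int i) {
        long c = 1;
        for (int j = 0; j < i; j++) c = c * (n - j) / (j + 1);
        return c;
    }

    // p(access) = 1 - sum_{i=r+1}^{n} C(n,i) p^i (1-p)^(n-i),
    // where p is the independent failure probability of a single node.
    static double accessProbability(int n, int r, double p) {
        double tooManyFailures = 0.0;
        for (int i = r + 1; i <= n; i++) {
            tooManyFailures += binomial(n, i) * Math.pow(p, i) * Math.pow(1 - p, n - i);
        }
        return 1.0 - tooManyFailures;
    }

    public static void main(String[] args) {
        // e.g. k = 8 data fragments, r = 4 redundant fragments (n = 12),
        // with a 10% node failure probability
        System.out.printf("%.6f%n", accessProbability(12, 4, 0.1));
    }
}
```

Evaluating such expressions for candidate (n, r) pairs is how the amount of redundancy would be dimensioned against a target access probability.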
For these two reasons, the replication scheme will be used to increase the reliability of the prototype.

3.2.3 Replica Placement

The locations where replicas are stored depend on the overlay network topology that is used. In general, replicas are stored in the logical neighborhood of the primary Responsible Node. When the primary Responsible Node fails, the routing and lookup mechanism of the overlay network will assign the responsibility to another

node logically close to the failed primary Responsible Node. In Chord, a node that failed is replaced by its immediate successor. Either the new node already has a replica, or the new node can ask its neighbors for one. The set of nodes storing replicas of a content item identified by its key k are called Responsible Nodes, defined as {RNi^k | i = 1 ... r}. The name expresses that together they are responsible for increasing the reliability, in terms of the access probability determined by the replication factor r. When this idea is applied to the circular overlay network topology used by Chord, the Responsible Nodes have to be either the primary Responsible Node's r-1 successors or its r-1 predecessors. This decision should take into account how the content items are located by the overlay network, and whether there are implications for caching and load balancing schemes that could be used to improve the access performance.

3.3 Caching and Load Balancing

Chord's balance property results in a uniform distribution of responsibility for content items among all nodes. However, non-uniform information access due to popular content will create hot-spots in the overlay network and congestion in the underlying network if not avoided by caching and load balancing mechanisms. The design of these mechanisms is closely related to the overlay network's topology and its routing algorithms, because both influence the routing path through the overlay network and therefore the locations of hot-spots. Most of the activity in this distributed peer-to-peer system will be caused by locating and accessing content, for which caching is used to increase the performance. Chord itself was developed in the field of distributed cache design based on consistent hashing. Hence, the peer-to-peer design should consider some general design principles for distributed caching[29]:

1. Maintain a hierarchy of metadata that tracks where copies of data are stored.

2. Separate data paths from metadata paths.

3. Use direct cache-to-cache data transfers to avoid store-and-forward delays.

3.3.1 Separation of Data and Metadata

Caching is used to improve download performance by placing or locating cache replicas closer to the user than the content itself, assuming that closer in terms of network proximity will result in higher throughput. In Chord's case, where network proximity is not reflected by the overlay network, it is difficult to find a close replica.

Therefore, a parallel access scheme, which accesses several replicas in parallel, will be used to increase download performance. Further details about parallel access can be found in the implementation chapters. According to the first design principle, a content item's metadata structure contains pointers to the nodes where replicas are located, whether they exist to increase reliability or to distribute load. This metadata structure is then used for parallel access. Accessing a content item is a two-step process:

1. Accessing the metadata information by a primary Responsible Node lookup

2. Using the metadata pointers for parallel access

Following the second design principle, the data and metadata access paths are separated, and for each an individual caching and load balancing scheme is designed that exploits some of its access characteristics. The metadata access caching scheme and the data access caching scheme are explained in detail later in this chapter.

Two ways of separating data and metadata are possible: real and logical separation. Real separation means that a content item's metadata and the replica data are located on different nodes. Logical separation means that data and metadata are on the same node, but are distinguished in the sense of their different roles in the two-step access pattern: first a node is accessed to return metadata, then it is accessed again in the parallel access process.

The idea of real separation was originally developed to overcome a negative effect of Chord's balance and monotony properties. When a node joins the Chord ring, some of the keys the new node becomes responsible for shift to the new node. For an average of K/N keys per node, K/N replicas have to be transferred to the new node. In real life, the circle will be sparsely populated with N nodes and a much higher number K of keys, which makes K/N >> 1. Depending on the number of keys, this can cause high load on the underlying network link between the new node that joins and the existing successor node if the keys are directly transferred from the old node to the new node. From one point of view, this data transfer is not necessary, because the replicas on the old node have not vanished, and therefore there is no need to shift data due to a change of responsibility. In order to drastically reduce the data transferred, instead of moving the real data, the much smaller metadata is shifted from the old node to the new node to reflect the change of responsibility. An additional degree of freedom is introduced, which allows choosing the node where a replica is stored. This has the advantage that nodes with low storage resource usage can be preferred and an explicit balancing of storage resources can be achieved. It is not necessary anymore


More information

Peer-to-Peer Architectures and Signaling. Agenda

Peer-to-Peer Architectures and Signaling. Agenda Peer-to-Peer Architectures and Signaling Juuso Lehtinen Juuso@netlab.hut.fi Slides based on presentation by Marcin Matuszewski in 2005 Introduction P2P architectures Skype Mobile P2P Summary Agenda 1 Introduction

More information

08 Distributed Hash Tables

08 Distributed Hash Tables 08 Distributed Hash Tables 2/59 Chord Lookup Algorithm Properties Interface: lookup(key) IP address Efficient: O(log N) messages per lookup N is the total number of servers Scalable: O(log N) state per

More information

Overview Computer Networking Lecture 16: Delivering Content: Peer to Peer and CDNs Peter Steenkiste

Overview Computer Networking Lecture 16: Delivering Content: Peer to Peer and CDNs Peter Steenkiste Overview 5-44 5-44 Computer Networking 5-64 Lecture 6: Delivering Content: Peer to Peer and CDNs Peter Steenkiste Web Consistent hashing Peer-to-peer Motivation Architectures Discussion CDN Video Fall

More information

Department of Computer Science Institute for System Architecture, Chair for Computer Networks. File Sharing

Department of Computer Science Institute for System Architecture, Chair for Computer Networks. File Sharing Department of Computer Science Institute for System Architecture, Chair for Computer Networks File Sharing What is file sharing? File sharing is the practice of making files available for other users to

More information

Peer-peer and Application-level Networking. CS 218 Fall 2003

Peer-peer and Application-level Networking. CS 218 Fall 2003 Peer-peer and Application-level Networking CS 218 Fall 2003 Multicast Overlays P2P applications Napster, Gnutella, Robust Overlay Networks Distributed Hash Tables (DHT) Chord CAN Much of this material

More information

CIS 700/005 Networking Meets Databases

CIS 700/005 Networking Meets Databases Announcements CIS / Networking Meets Databases Boon Thau Loo Spring Lecture Paper summaries due at noon today. Office hours: Wed - pm ( Levine) Project proposal: due Feb. Student presenter: rd Jan: A Scalable

More information

Distributed Hash Tables: Chord

Distributed Hash Tables: Chord Distributed Hash Tables: Chord Brad Karp (with many slides contributed by Robert Morris) UCL Computer Science CS M038 / GZ06 12 th February 2016 Today: DHTs, P2P Distributed Hash Tables: a building block

More information

Agent and Object Technology Lab Dipartimento di Ingegneria dell Informazione Università degli Studi di Parma. Distributed and Agent Systems

Agent and Object Technology Lab Dipartimento di Ingegneria dell Informazione Università degli Studi di Parma. Distributed and Agent Systems Agent and Object Technology Lab Dipartimento di Ingegneria dell Informazione Università degli Studi di Parma Distributed and Agent Systems Peer-to-Peer Systems & JXTA Prof. Agostino Poggi What is Peer-to-Peer

More information

Peer-to-Peer Systems and Distributed Hash Tables

Peer-to-Peer Systems and Distributed Hash Tables Peer-to-Peer Systems and Distributed Hash Tables CS 240: Computing Systems and Concurrency Lecture 8 Marco Canini Credits: Michael Freedman and Kyle Jamieson developed much of the original material. Selected

More information

A Framework for Peer-To-Peer Lookup Services based on k-ary search

A Framework for Peer-To-Peer Lookup Services based on k-ary search A Framework for Peer-To-Peer Lookup Services based on k-ary search Sameh El-Ansary Swedish Institute of Computer Science Kista, Sweden Luc Onana Alima Department of Microelectronics and Information Technology

More information

Distributed Systems Final Exam

Distributed Systems Final Exam 15-440 Distributed Systems Final Exam Name: Andrew: ID December 12, 2011 Please write your name and Andrew ID above before starting this exam. This exam has 14 pages, including this title page. Please

More information

Distributed Knowledge Organization and Peer-to-Peer Networks

Distributed Knowledge Organization and Peer-to-Peer Networks Knowledge Organization and Peer-to-Peer Networks Klaus Wehrle Group Chair of Computer Science IV RWTH Aachen University http://ds.cs.rwth-aachen.de 1 Organization of Information Essential challenge in?

More information

Debunking some myths about structured and unstructured overlays

Debunking some myths about structured and unstructured overlays Debunking some myths about structured and unstructured overlays Miguel Castro Manuel Costa Antony Rowstron Microsoft Research, 7 J J Thomson Avenue, Cambridge, UK Abstract We present a comparison of structured

More information

15-744: Computer Networking P2P/DHT

15-744: Computer Networking P2P/DHT 15-744: Computer Networking P2P/DHT Overview P2P Lookup Overview Centralized/Flooded Lookups Routed Lookups Chord Comparison of DHTs 2 Peer-to-Peer Networks Typically each member stores/provides access

More information

CS 3516: Advanced Computer Networks

CS 3516: Advanced Computer Networks Welcome to CS 3516: Advanced Computer Networks Prof. Yanhua Li Time: 9:00am 9:50am M, T, R, and F Location: Fuller 320 Fall 2017 A-term 1 Some slides are originally from the course materials of the textbook

More information

Chapter 6 PEER-TO-PEER COMPUTING

Chapter 6 PEER-TO-PEER COMPUTING Chapter 6 PEER-TO-PEER COMPUTING Distributed Computing Group Computer Networks Winter 23 / 24 Overview What is Peer-to-Peer? Dictionary Distributed Hashing Search Join & Leave Other systems Case study:

More information

DHT Overview. P2P: Advanced Topics Filesystems over DHTs and P2P research. How to build applications over DHTS. What we would like to have..

DHT Overview. P2P: Advanced Topics Filesystems over DHTs and P2P research. How to build applications over DHTS. What we would like to have.. DHT Overview P2P: Advanced Topics Filesystems over DHTs and P2P research Vyas Sekar DHTs provide a simple primitive put (key,value) get (key) Data/Nodes distributed over a key-space High-level idea: Move

More information

: Scalable Lookup

: Scalable Lookup 6.824 2006: Scalable Lookup Prior focus has been on traditional distributed systems e.g. NFS, DSM/Hypervisor, Harp Machine room: well maintained, centrally located. Relatively stable population: can be

More information

Lecture 21 P2P. Napster. Centralized Index. Napster. Gnutella. Peer-to-Peer Model March 16, Overview:

Lecture 21 P2P. Napster. Centralized Index. Napster. Gnutella. Peer-to-Peer Model March 16, Overview: PP Lecture 1 Peer-to-Peer Model March 16, 005 Overview: centralized database: Napster query flooding: Gnutella intelligent query flooding: KaZaA swarming: BitTorrent unstructured overlay routing: Freenet

More information

Chapter 10: Peer-to-Peer Systems

Chapter 10: Peer-to-Peer Systems Chapter 10: Peer-to-Peer Systems From Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edition 4, Addison-Wesley 2005 Introduction To enable the sharing of data and resources

More information

Today. Why might P2P be a win? What is a Peer-to-Peer (P2P) system? Peer-to-Peer Systems and Distributed Hash Tables

Today. Why might P2P be a win? What is a Peer-to-Peer (P2P) system? Peer-to-Peer Systems and Distributed Hash Tables Peer-to-Peer Systems and Distributed Hash Tables COS 418: Distributed Systems Lecture 7 Today 1. Peer-to-Peer Systems Napster, Gnutella, BitTorrent, challenges 2. Distributed Hash Tables 3. The Chord Lookup

More information

WSN Routing Protocols

WSN Routing Protocols WSN Routing Protocols 1 Routing Challenges and Design Issues in WSNs 2 Overview The design of routing protocols in WSNs is influenced by many challenging factors. These factors must be overcome before

More information

Scalability In Peer-to-Peer Systems. Presented by Stavros Nikolaou

Scalability In Peer-to-Peer Systems. Presented by Stavros Nikolaou Scalability In Peer-to-Peer Systems Presented by Stavros Nikolaou Background on Peer-to-Peer Systems Definition: Distributed systems/applications featuring: No centralized control, no hierarchical organization

More information

L3S Research Center, University of Hannover

L3S Research Center, University of Hannover , University of Hannover Dynamics of Wolf-Tilo Balke and Wolf Siberski 21.11.2007 *Original slides provided by S. Rieche, H. Niedermayer, S. Götz, K. Wehrle (University of Tübingen) and A. Datta, K. Aberer

More information

Introduction to Peer-to-Peer Networks

Introduction to Peer-to-Peer Networks Introduction to Peer-to-Peer Networks The Story of Peer-to-Peer The Nature of Peer-to-Peer: Generals & Paradigms Unstructured Peer-to-Peer Systems Sample Applications 1 Prof. Dr. Thomas Schmidt http:/www.informatik.haw-hamburg.de/~schmidt

More information

P2P Applications. Reti di Elaboratori Corso di Laurea in Informatica Università degli Studi di Roma La Sapienza Canale A-L Prof.ssa Chiara Petrioli

P2P Applications. Reti di Elaboratori Corso di Laurea in Informatica Università degli Studi di Roma La Sapienza Canale A-L Prof.ssa Chiara Petrioli P2P Applications Reti di Elaboratori Corso di Laurea in Informatica Università degli Studi di Roma La Sapienza Canale A-L Prof.ssa Chiara Petrioli Server-based Network Peer-to-peer networks A type of network

More information

Internet Technology. 06. Exam 1 Review Paul Krzyzanowski. Rutgers University. Spring 2016

Internet Technology. 06. Exam 1 Review Paul Krzyzanowski. Rutgers University. Spring 2016 Internet Technology 06. Exam 1 Review Paul Krzyzanowski Rutgers University Spring 2016 March 2, 2016 2016 Paul Krzyzanowski 1 Question 1 Defend or contradict this statement: for maximum efficiency, at

More information

IPv6: An Introduction

IPv6: An Introduction Outline IPv6: An Introduction Dheeraj Sanghi Department of Computer Science and Engineering Indian Institute of Technology Kanpur dheeraj@iitk.ac.in http://www.cse.iitk.ac.in/users/dheeraj Problems with

More information

Chord : A Scalable Peer-to-Peer Lookup Protocol for Internet Applications

Chord : A Scalable Peer-to-Peer Lookup Protocol for Internet Applications : A Scalable Peer-to-Peer Lookup Protocol for Internet Applications Ion Stoica, Robert Morris, David Liben-Nowell, David R. Karger, M. Frans Kaashock, Frank Dabek, Hari Balakrishnan March 4, 2013 One slide

More information

Internet Technology 3/2/2016

Internet Technology 3/2/2016 Question 1 Defend or contradict this statement: for maximum efficiency, at the expense of reliability, an application should bypass TCP or UDP and use IP directly for communication. Internet Technology

More information

6. Peer-to-peer (P2P) networks I.

6. Peer-to-peer (P2P) networks I. 6. Peer-to-peer (P2P) networks I. PA159: Net-Centric Computing I. Eva Hladká Faculty of Informatics Masaryk University Autumn 2010 Eva Hladká (FI MU) 6. P2P networks I. Autumn 2010 1 / 46 Lecture Overview

More information

LECT-05, S-1 FP2P, Javed I.

LECT-05, S-1 FP2P, Javed I. A Course on Foundations of Peer-to-Peer Systems & Applications LECT-, S- FPP, javed@kent.edu Javed I. Khan@8 CS /99 Foundation of Peer-to-Peer Applications & Systems Kent State University Dept. of Computer

More information

What is Multicasting? Multicasting Fundamentals. Unicast Transmission. Agenda. L70 - Multicasting Fundamentals. L70 - Multicasting Fundamentals

What is Multicasting? Multicasting Fundamentals. Unicast Transmission. Agenda. L70 - Multicasting Fundamentals. L70 - Multicasting Fundamentals What is Multicasting? Multicasting Fundamentals Unicast transmission transmitting a packet to one receiver point-to-point transmission used by most applications today Multicast transmission transmitting

More information

Goals. EECS 122: Introduction to Computer Networks Overlay Networks and P2P Networks. Solution. Overlay Networks: Motivations.

Goals. EECS 122: Introduction to Computer Networks Overlay Networks and P2P Networks. Solution. Overlay Networks: Motivations. Goals CS : Introduction to Computer Networks Overlay Networks and PP Networks Ion Stoica Computer Science Division Department of lectrical ngineering and Computer Sciences University of California, Berkeley

More information

CS514: Intermediate Course in Computer Systems

CS514: Intermediate Course in Computer Systems Distributed Hash Tables (DHT) Overview and Issues Paul Francis CS514: Intermediate Course in Computer Systems Lecture 26: Nov 19, 2003 Distributed Hash Tables (DHT): Overview and Issues What is a Distributed

More information

Opportunistic Application Flows in Sensor-based Pervasive Environments

Opportunistic Application Flows in Sensor-based Pervasive Environments Opportunistic Application Flows in Sensor-based Pervasive Environments Nanyan Jiang, Cristina Schmidt, Vincent Matossian, and Manish Parashar ICPS 2004 1 Outline Introduction to pervasive sensor-based

More information

Distributed Hash Table

Distributed Hash Table Distributed Hash Table P2P Routing and Searching Algorithms Ruixuan Li College of Computer Science, HUST rxli@public.wh.hb.cn http://idc.hust.edu.cn/~rxli/ In Courtesy of Xiaodong Zhang, Ohio State Univ

More information

Lecture 8: Application Layer P2P Applications and DHTs

Lecture 8: Application Layer P2P Applications and DHTs Lecture 8: Application Layer P2P Applications and DHTs COMP 332, Spring 2018 Victoria Manfredi Acknowledgements: materials adapted from Computer Networking: A Top Down Approach 7 th edition: 1996-2016,

More information

Peer-to-peer systems and overlay networks

Peer-to-peer systems and overlay networks Complex Adaptive Systems C.d.L. Informatica Università di Bologna Peer-to-peer systems and overlay networks Fabio Picconi Dipartimento di Scienze dell Informazione 1 Outline Introduction to P2P systems

More information

INF5071 Performance in distributed systems: Distribution Part III

INF5071 Performance in distributed systems: Distribution Part III INF5071 Performance in distributed systems: Distribution Part III 5 November 2010 Client-Server Traditional distributed computing Successful architecture, and will continue to be so (adding proxy servers)

More information

P2P. 1 Introduction. 2 Napster. Alex S. 2.1 Client/Server. 2.2 Problems

P2P. 1 Introduction. 2 Napster. Alex S. 2.1 Client/Server. 2.2 Problems P2P Alex S. 1 Introduction The systems we will examine are known as Peer-To-Peer, or P2P systems, meaning that in the network, the primary mode of communication is between equally capable peers. Basically

More information

Stratos Idreos. A thesis submitted in fulfillment of the requirements for the degree of. Electronic and Computer Engineering

Stratos Idreos. A thesis submitted in fulfillment of the requirements for the degree of. Electronic and Computer Engineering P2P-DIET: A QUERY AND NOTIFICATION SERVICE BASED ON MOBILE AGENTS FOR RAPID IMPLEMENTATION OF P2P APPLICATIONS by Stratos Idreos A thesis submitted in fulfillment of the requirements for the degree of

More information

Middleware and Distributed Systems. Peer-to-Peer Systems. Peter Tröger

Middleware and Distributed Systems. Peer-to-Peer Systems. Peter Tröger Middleware and Distributed Systems Peer-to-Peer Systems Peter Tröger Peer-to-Peer Systems (P2P) Concept of a decentralized large-scale distributed system Large number of networked computers (peers) Each

More information

«Computer Science» Requirements for applicants by Innopolis University

«Computer Science» Requirements for applicants by Innopolis University «Computer Science» Requirements for applicants by Innopolis University Contents Architecture and Organization... 2 Digital Logic and Digital Systems... 2 Machine Level Representation of Data... 2 Assembly

More information

Building a low-latency, proximity-aware DHT-based P2P network

Building a low-latency, proximity-aware DHT-based P2P network Building a low-latency, proximity-aware DHT-based P2P network Ngoc Ben DANG, Son Tung VU, Hoai Son NGUYEN Department of Computer network College of Technology, Vietnam National University, Hanoi 144 Xuan

More information

PEER-TO-PEER (P2P) systems are now one of the most

PEER-TO-PEER (P2P) systems are now one of the most IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, VOL. 25, NO. 1, JANUARY 2007 15 Enhancing Peer-to-Peer Systems Through Redundancy Paola Flocchini, Amiya Nayak, Senior Member, IEEE, and Ming Xie Abstract

More information

CS 268: Lecture 22 DHT Applications

CS 268: Lecture 22 DHT Applications CS 268: Lecture 22 DHT Applications Ion Stoica Computer Science Division Department of Electrical Engineering and Computer Sciences University of California, Berkeley Berkeley, CA 94720-1776 (Presentation

More information

Overlay and P2P Networks. Structured Networks and DHTs. Prof. Sasu Tarkoma

Overlay and P2P Networks. Structured Networks and DHTs. Prof. Sasu Tarkoma Overlay and P2P Networks Structured Networks and DHTs Prof. Sasu Tarkoma 6.2.2014 Contents Today Semantic free indexing Consistent Hashing Distributed Hash Tables (DHTs) Thursday (Dr. Samu Varjonen) DHTs

More information

Peer-to-Peer Applications Reading: 9.4

Peer-to-Peer Applications Reading: 9.4 Peer-to-Peer Applications Reading: 9.4 Acknowledgments: Lecture slides are from Computer networks course thought by Jennifer Rexford at Princeton University. When slides are obtained from other sources,

More information

EEC-684/584 Computer Networks

EEC-684/584 Computer Networks EEC-684/584 Computer Networks Lecture 14 wenbing@ieee.org (Lecture nodes are based on materials supplied by Dr. Louise Moser at UCSB and Prentice-Hall) Outline 2 Review of last lecture Internetworking

More information