Design and Implementation of a Distributed Object Storage System on Peer Nodes. Diplomarbeit by Roger Kilchenmann, Zürich.
Design and Implementation of a Distributed Object Storage System on Peer Nodes

Diploma thesis (Diplomarbeit) by Roger Kilchenmann from Zürich, submitted at the Lehrstuhl für Praktische Informatik IV, Prof. Dr. W. Effelsberg, Fakultät für Mathematik und Informatik, Universität Mannheim, May 2002. Advisor: Prof. Dr. E. Biersack, Institute Eurecom, Sophia Antipolis.
Contents

Abstract
List of Figures

1 Introduction
   Evolution of Internet Applications
   A Reliable Storage Network
   Outline

2 Related Work
   File Sharing Applications
      Napster
      Gnutella
      FastTrack
      Swarmcast
   Distributed Storage Applications
      Freenet
      PAST
      Oceanstore/Silverback
      Cooperative File System (CFS)

3 Problem Analysis
   Overlay Routing Networks
      Chord
   Increasing Reliability with Redundancy
      IDA Approach
      Replication Approach
      Replica Placement
   Caching and Load Balancing
   Separation of Data and Metadata
      Metadata Access
      Data Access

4 Framework Design Overview
   Object Oriented Programming
      Classes and Objects
      Inheritance
      Polymorphism
      Java
   Framework Layers
      Message Layer
      Lookup and Routing Layer
      Block Storage Layer
      Application Layer

5 Message Layer Implementation
   Basic Node
   Thread Re-use
   Node-to-Node Communication
   Recursive Message Communication

6 Lookup and Routing Layer Implementation
   Identifiers
   Successors and Predecessors
   Fingers
   Lookup
      Iterative Method
      Recursive Method
      Virtual Node Tunnelling
   Join and Leave
      Stabilization Process
      Notification
      Node Join

7 Block Storage Layer Implementation
   Basic Elements
      Metadata
      Hash
      Paraloader
      Block Location Cache
      Block Replica Cache
   Storing a Block
   Fetching a Block
   Reorganization on Overlay Network Changes
      Metadata Reorganization
      Data Reorganization

8 Application Layer Implementation
   User Interface
   File Storage and Retrieval
   Block Event Methods

9 Performance
   Iterative vs. Recursive Lookup
   Virtual Node Tunnelling
   Overall File Performance

10 Conclusion and Future Work

Bibliography
A Package p2p.layer.message
B Package p2p.layer.lookup
C Package p2p.layer.storage
D Package p2p.layer.application
E Package cache
Ehrenwörtliche Erklärung
Abstract

This work describes the design and implementation of a distributed system that uses empty disk space on Internet hosts for reliable storage of data objects. To increase the reliability of the system, the objects are replicated and distributed to peer-nodes of an overlay network that is spanned over the participating hosts. The Chord overlay network provides a robust and well-scaling binding of objects to nodes, which is used to organize the object replicas in an environment of unreliable hosts that may join or leave the system frequently. It is robust against host failures, and the binding is resolved by an efficient lookup operation that operates in time logarithmic in the number of hosts. Congestion of nodes due to non-uniform object access patterns is avoided by caching and parallel data access, which both distribute the load over many nodes and overcome some disadvantages of Chord's deterministic overlay network topology. The major part of the prototype implemented in Java consists of a versatile object-oriented framework architecture. A hierarchy of framework layers provides generalized solutions to problems in peer-to-peer networking. The file storage application itself is a thin layer on top of this framework.
List of Figures

3.1 Chord Key Distribution with Virtual Nodes
Chord Finger Example
Chord Lookup Example
Chord Lookup Hot-Spot
I-Hop and P-List Caching
Layer and Class Hierarchy
Thread Pool Message Processing
Iterative Lookup Pseudo-Code and Message Traffic
Recursive Lookup Pseudo-Code and Message Traffic
Node Join Pseudo-Code and Message Traffic
Thread Interaction in Parallel Download
Data Block Storage
Data Block Retrieval
Reorganization Caused by a Leaving Node
Reorganization Caused by a Joining Node
Splitting a File into Blocks
Comparison of the Iterative and Recursive Lookup Latencies
The Effect of Virtual Node Tunnelling on the Lookup Path Length
Overall File Performance
Chapter 1

Introduction

In the last years, peer-to-peer applications have attracted a lot of public attention. On the one hand there was the legal issue about content sharing, and on the other hand the peer-to-peer paradigm seemed to be a new idea to many people. But the early Internet was already designed like that, and the Usenet, which appeared in 1979 and is still very popular, can be seen as one of the first P2P applications because there is no hierarchy or central control and the Network News Transport Protocol (NNTP) uses peer-to-peer communication between news servers [16]. This early application already shares most of the properties that characterize P2P applications [27]:

- They take advantage of distributed, shared resources such as storage, CPU cycles, and content on peer-nodes
- Peer-nodes have identical capabilities and responsibilities
- Communication between peer-nodes is symmetrical
- Significant autonomy from central servers provides fault tolerance
- They operate in a dynamic environment where frequent joining and leaving is the norm

The reason why most people consider peer-to-peer applications as something radically new is that in the last decade the paradigm of Internet applications changed from decentralized applications like the Usenet to the server-centric World Wide Web.

1.1 Evolution of Internet Applications

Between 1995 and 1999 the Internet became a mass medium, driven by the "killer application" World Wide Web (WWW). This changed the application paradigm and influenced the further development of the Internet architecture. The World Wide Web is a typical client/server application. A Web client, now called browser, connects to a well-known server, which returns the page according to the request
of the client and closes the connection. Because the client initiates the communication, only the web server needs a permanent, well-known Internet Protocol (IP) address. This behavior allowed the Internet service providers (ISPs) to satisfy the fast-growing number of new Internet users by assigning temporary IP addresses to dial-up connections, because the limited IP address space of 2^32 addresses (4 bytes each) was too small to assign a permanent IP address to every user. Temporary dial-up connections together with unpredictable IP addresses demand new concepts to organize and maintain a distributed network. Another property of the client/server-based WWW application had an impact on the technical development. Because of the asymmetry in the WWW service, where page requests are much smaller than the page replies, the dial-up technologies were developed with that asymmetry in mind. ADSL and V.90 modems have a three to eight times higher downstream bandwidth than upstream bandwidth. By removing the distinction between clients and servers, P2P applications have symmetric bandwidth characteristics. The upstream path of asymmetric connections will limit the total throughput between peer-nodes. Therefore, new mechanisms need to be introduced to use the available bandwidth resources more efficiently. To summarize, the originally symmetric and deterministic Internet architecture became asymmetric and dynamic due to changes in the application preferences. The new generation of P2P applications has to deal with a dynamic and asymmetric environment, which contradicts their inherent symmetry and imposes problems on reliability and efficiency. With technological progress and decreasing prices in computer hardware, new applications for personal computers became possible. Due to increasing hard disk capacities and faster processors, playing and storing audio and video content became very popular.
But exchanging multimedia content over the Internet was difficult and expensive for an inexperienced Internet user. Setting up a WWW or FTP server requires a decent amount of knowledge, and a permanent Internet connection is hardly affordable for a private person. The new generation of P2P applications from the late 90s offers a comfortable and easy way for everybody to publish and share content. This made P2P systems, such as Napster, very popular and resulted in about 38 million registered Napster users by October 2000 [20].
So far, three different types of P2P applications have developed:

- P2P File Sharing - Content-driven applications sharing bandwidth and storage resources to provide efficient content distribution and storage. Some related applications are presented in the following Chapter "Related Work".
- P2P Messaging - Human presence is shared across a distributed and decentralized system like the Groove collaboration network [17].
- P2P Computing - The distributed system shares CPU cycles for solving computing problems. The sum of idle CPU cycles on many workstations can replace very expensive supercomputers. A well-known example is the SETI@home project [26], which uses idle CPUs on Internet hosts to analyze radio signals from outer space in order to find signals of intelligent origin.

1.2 A Reliable Storage Network

The Java-based prototype presented in this work uses empty disk space available on Internet hosts to build a reliable storage system. By April 2002, a typical workstation PC is shipped with a hard disk of about 60 GB storage capacity. After the operating system and some other applications are installed, most of the capacity on the hard disk is still unused. For example, if the software takes 10 GB, the remaining free disk space is 50 GB. For an organization with 100 such workstations, the total amount of unused disk space is 5 TB. Nowadays, most workstations in an organization are connected by a local area network (LAN) that uses the Internet Protocol (IP). This work describes the design of such a system and explains the implementation of a simple file storage application, which uses an underlying framework architecture developed as a major part of this work. The goal is to achieve a maximum of reliability and fault tolerance for the storage service built out of Internet hosts, called nodes in this context. The storage network must be reliable while the nodes themselves are not.
Unlike dedicated file servers, the nodes are workstations that are generally not equipped with redundant power supplies or RAID (Redundant Array of Inexpensive Disks) systems. Since the workstations are under the control of their users, their availability is not predictable: users may shut down their workstations, or a network link may temporarily fail.
Assuming a heterogeneous and dynamic environment of hosts connected by a high-bandwidth and low-latency IP network, this work focuses on:

- reliability of data storage
- scalability in terms of the number of hosts and content requests
- efficient usage of the available resources

Further, it is assumed that there are no restrictions concerning firewalls and Network Address Translation (NAT) issues. All hosts are willing to cooperate by relaying messages, and they store data as long as their storage quotas have not been exceeded.

1.3 Outline

This work is organized as follows: In the Chapter "Related Work", existing applications and their solutions to P2P-specific problems are analyzed. The next three chapters reflect the general object-oriented approach of software development: problem analysis, design, and implementation. First, the problem fields need to be identified and a general solution has to be developed. This is done in the Chapter "Problem Analysis". In the second step, the design breaks the general solution into software layers with well-defined functionality and interfaces, described in the Chapter "Framework Design Overview". For each layer, the algorithms and some important implementation details are presented in a chapter of its own. In the Section "Performance", the effect of implementation alternatives and optimizations on performance is examined and the overall file storage performance is evaluated. This work closes with the Chapter "Conclusion and Future Work", where the results of this work are summarized and an outlook on future improvements is given. Important terms are emphasized in bold font when they are introduced. Class and method names are always set in italics.
Chapter 2

Related Work

The field of P2P applications targeted at sharing storage resources can be divided into two groups. The first group offers file sharing and content distribution capabilities. Because content on hosts is shared, the main task for this group of applications is content location and distribution. The different methods of how content items are found and distributed are examined. The second group offers a distributed file system service. The ability of reliable and persistent storage distinguishes it from the first group. Apart from content location and distribution, the mechanisms to achieve reliability in a dynamic and unreliable environment are examined for this group of applications.

2.1 File Sharing Applications

Napster

Although Napster [10] is often referred to as the first P2P application, it does not follow a true P2P concept. Napster can be characterized as a client/server system for centralized content metadata lookup combined with direct client-to-client connections for content delivery. At startup, the Napster client software connects to a central Napster server, authenticates itself with login and password, and registers its shared content's metadata in a central index database. A content query is sent to the central index server, which processes the query by an index database lookup and returns to the client a list of matching content metadata records, each containing the network location of the client sharing the content item, its exact filename, and some bandwidth and latency information. From this list the user has to choose a client from whom to download the content file. The download reliability is low because only a single unreliable source is used and a broken download is not automatically continued from a different source. Another conceptual problem is the use of a central server for content location, which is neither a reliable nor a scalable solution. Napster needs to operate several of those central servers to achieve fault tolerance and load balancing, because a single server can only handle a limited number of users simultaneously. Above this threshold, the server will reject connection requests and the client has to try another server. Several servers are necessary to serve peak load, but at other times they will be idle, which results in bad resource allocation. After connecting to one of the central servers, the client stays connected to it for the whole session. Since each server maintains its own index database, a user will only see a restricted view of the total content available. The handicap of Napster is the centralized index, which simplifies the system but results in a single point of failure and a performance bottleneck.

Gnutella

To avoid the disadvantages of Napster, the Gnutella network is decentralized. The only central component is the host cache service, which is used by the servents, a Gnutella-specific term for the combined client and server, to find a bootstrap node. The Gnutella protocol uses a time-to-live (TTL) scoped flooding for servent and content discovery. A servent is permanently connected to a small number of neighbors. When a servent receives a request from one of its neighbors, it decreases the TTL counter of the request and forwards it to all its neighbors if the TTL is greater than zero. The reply is routed back along the reverse path. There are two important request/reply pairs. A Ping request for discovering new servents is answered with a Pong response message containing the IP address, TCP port, and some status information. The other pair is the Query request, which contains query keywords and is answered with a QueryHit if the Query matches some files shared by the servent.
The QueryHit is routed back along the reverse path to the servent that initiated the Query and contains the necessary information to start a direct download of the file, which works similar to the HTTP GET command. The main disadvantage of Gnutella is the distributed search based on scoped flooding, which does not scale in terms of the number of servents [21], because the number of messages grows exponentially and consumes much of the servents' bandwidth. To reduce the number of servents participating in message routing, the next generation of the Gnutella protocol will introduce supernodes, which will act as message routing proxies for clients with limited bandwidth. These clients, called shielded nodes, have only a single connection to one supernode, which shields them from routing Gnutella messages. The supernode concept is a result of the node heterogeneity observed in the real world: not all nodes are equal concerning their resources, and by far not all users want to share them.
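The TTL-scoped flooding described above can be sketched in Java. This is an illustrative toy (class names, the duplicate-suppression set, and the recursive reply path are simplifications, not the actual Gnutella protocol implementation):

```java
import java.util.*;

// Minimal sketch of Gnutella-style TTL-scoped flooding. Each servent
// forwards a query to all its neighbors while TTL > 0; here the recursive
// return value stands in for the reverse-path QueryHit routing.
public class FloodingSketch {

    static class Servant {
        final String name;
        final List<Servant> neighbors = new ArrayList<>();
        final Set<UUID> seen = new HashSet<>();        // suppress duplicate forwards
        final Set<String> sharedFiles = new HashSet<>();

        Servant(String name) { this.name = name; }

        // Collects the names of servents that answer the query with a hit.
        List<String> query(UUID id, String keyword, int ttl, List<String> hits) {
            if (!seen.add(id)) return hits;            // already processed this query
            if (sharedFiles.contains(keyword)) hits.add(name);  // QueryHit
            if (ttl > 0)                               // scoped flooding
                for (Servant n : neighbors) n.query(id, keyword, ttl - 1, hits);
            return hits;
        }
    }

    // Chain A -> B -> C -> D; only D shares the file. With TTL 2 the query
    // dies at C and D is never found, illustrating the limited search scope.
    public static List<String> demo(int ttl) {
        Servant a = new Servant("A"), b = new Servant("B"),
                c = new Servant("C"), d = new Servant("D");
        a.neighbors.add(b); b.neighbors.add(c); c.neighbors.add(d);
        d.sharedFiles.add("song.mp3");
        return a.query(UUID.randomUUID(), "song.mp3", ttl, new ArrayList<>());
    }
}
```

The demo makes the scoping problem concrete: content three hops away is invisible to a query with TTL 2, while raising the TTL multiplies the message count on every hop.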
FastTrack

The FastTrack protocol, used in the KaZaA and Morpheus applications [8], is a hybrid, two-layered architecture of peers connected to supernodes, which themselves are connected together. A supernode acts like a local search hub that maintains the index of the media files being shared by each peer connected to it and proxies search requests on behalf of its local peers. FastTrack elects a peer with sufficient bandwidth and processing power to become a supernode if its user has allowed it in the configuration. A search result in FastTrack contains a list of files that match the search criteria. FastTrack uses parallel download and client-side caching for file transfers. A file is logically split into segments, and these segments are downloaded from other peers that share the same file or, in the case of client-side caching, are themselves downloading this file and share the segments they have received so far until their download is completed. This can increase the download speed significantly, especially for asymmetric dial-up connections, because the limited upstream bandwidths add up. As FastTrack is a proprietary protocol, it is so far difficult to evaluate what scaling properties the supernode network has.

Swarmcast

Swarmcast [15] is a content distribution network. The content provider has to host content on his own server, and Swarmcast's job is to boost the download and to ease the burden on the content provider's server. This is done by parallel downloading and locality-based client-side caching. For each file, a temporary mesh of client nodes downloading this file is maintained in order to find other close nodes to exchange file parts. The provider's file is broken into parts, which are then encoded into packets with a forward error correction (FEC) code [22]. An (n, k) forward error correction code encodes k source packets into n > k encoded packets.
The encoding is such that any subset of k encoded packets suffices to reconstruct the source data. Swarmcast reduces complexity and communication overhead by sending the packets randomly to other nodes in the mesh, and it uses FEC encoding to avoid the potential overlap of duplicate packets, which would otherwise drastically decrease the utility of each packet. The encoded packets are spread randomly among the nodes of the mesh, which exchange them until the nodes have enough packets to reconstruct the file. After downloading, the nodes should keep the packets in their cache to support the other nodes in the mesh. The system scales nicely because the more requests there are for a file, the more nodes join the mesh and the more packets are cached and exchanged.
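The "any k of n suffice" property can be illustrated with the simplest possible erasure code, a single XOR parity packet, so that n = k + 1 and the loss of any one packet can be tolerated. Real (n, k) FEC codes such as those cited in [22] generalize this to arbitrary n > k; the sketch below is only a toy illustration, not Swarmcast's actual encoder:

```java
import java.util.Arrays;

// Toy erasure code with a single XOR parity packet (n = k + 1): any k of
// the k + 1 packets are enough to reconstruct the k source packets.
public class XorParity {

    // Encode k equal-length source packets into k + 1 packets
    // (the k sources followed by one XOR parity packet).
    public static byte[][] encode(byte[][] source) {
        int k = source.length, len = source[0].length;
        byte[][] out = new byte[k + 1][];
        byte[] parity = new byte[len];
        for (int i = 0; i < k; i++) {
            out[i] = source[i].clone();
            for (int j = 0; j < len; j++) parity[j] ^= source[i][j];
        }
        out[k] = parity;
        return out;
    }

    // Reconstruct the k source packets from any k of the k + 1 packets;
    // the single missing packet is marked as null.
    public static byte[][] decode(byte[][] received) {
        int n = received.length, k = n - 1, missing = -1;
        for (int i = 0; i < n; i++) if (received[i] == null) missing = i;
        if (missing == -1 || missing == k)        // all k source packets present
            return Arrays.copyOf(received, k);
        int len = received[(missing + 1) % n].length;
        byte[] rec = new byte[len];               // XOR of all surviving packets
        for (int i = 0; i < n; i++)
            if (i != missing)
                for (int j = 0; j < len; j++) rec[j] ^= received[i][j];
        byte[][] src = new byte[k][];
        for (int i = 0; i < k; i++) src[i] = (i == missing) ? rec : received[i];
        return src;
    }
}
```

Because every source packet equals the XOR of the remaining k packets, a node holding any k of the n packets never needs a particular one, which is exactly why random packet exchange avoids the duplicate-packet problem.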
2.2 Distributed Storage Applications

Distributed storage applications must have an active replication strategy to increase reliability. In contrast, file sharing and content distribution applications rely more on the fact that with a large number of users sharing content, the probability of content being available can be quite high, however without any guarantee. An important difference to file sharing applications is that distributed storage applications in general have a publish process, which adds content items to the system. The location of the content items is not predefined as it is for file sharing applications.

Freenet

Freenet [3] is a distributed publishing system which provides anonymity to publishers and consumers of content. An adaptive network is used to locate content by forwarding requests to nodes that are closer to the key that identifies a content item. On each hop, information about whether the item was found on this path travels in the backward direction and is temporarily stored on the nodes. The next request for the same key takes advantage of this information and gets routed directly to the content source. When the query reaches the content source, the content is propagated along the query's reverse path and cached in the intermediate nodes. Freenet uses an intelligent flooding search, where routing information and cached copies are stored along the path. The more requests for a content item, the more cached copies and routing information are available. If there has been no request for a content item in a period of time, the nodes discard the content item because all routing information about this item on the other nodes has already timed out and the item is not referenced anymore. As a consequence, published content is only stored persistently as long as there is enough demand to keep routing information alive.
The content objects float around in the network, and there is only temporal and local knowledge about where the content is actually located. To provide anonymity to publishers and consumers, there is no direct peer-to-peer data transfer. Instead, the content data is routed through the network. Nodes with low bandwidth may become a bottleneck to the system, and the flooding-based content lookup scales badly.

PAST

PAST [6] is a persistent peer-to-peer storage utility which replicates complete files on multiple nodes. Pastry [24] is used for message routing and content location. PAST stores a content item on the node whose node identifier nodeid is closest to the file identifier fileid. Routing a message to the closest node is done by choosing as next hop a node whose nodeid shares with the fileid a prefix that is at least one
digit longer than the prefix that the fileid shares with the present node's nodeid. The fileid is generated by hashing the filename, and the nodeid is assigned randomly when a node joins the network. The routing path length scales logarithmically in the overall number of nodes in the network. For each file, an individual replication factor k can be chosen, and replicas are stored on the k nodes that are closest to the fileid. Node failures are detected by the Pastry background process of exchanging heartbeat messages with neighbors. When a node detects a neighbor node's failure, the replica is automatically recreated on another neighbor, which maintains the k replicas. Free storage space is used to cache files along the routing path while approaching the closest node during the publish or retrieval process. This can only be done if the file data is routed along the reverse query path. Thus there is no direct peer-to-peer file transfer. Similar to the Freenet system, nodes with low bandwidth may become a bottleneck.

Oceanstore/Silverback

Silverback [30] is the archival layer of the Oceanstore system. For routing and content location, Tapestry [32] is used, which is a distributed version of Plaxton's hashed-suffix routing and is quite similar to Pastry. Therefore, the number of hops and messages is logarithmic in the total number of nodes in the network. A file is split into blocks, which are then encoded with a forward error correction (FEC) code into n fragments. The block's binary data is hashed into a blockid, which is used to route the n block fragments to the n closest nodes in terms of the longest common suffix of the nodeids and the blockid. The fragments are periodically republished by a file's Responsible Party to increase reliability, and they are cached along the path to reduce the access latency and to balance the load over several nodes.
The system features a file version management, which uses tombstones to reduce storage consumption by storing only the difference of a file block compared to the latest tombstone version of the block.

Cooperative File System (CFS)

CFS [4] is a read-only file system built on top of Chord [28], which is used for content location. Chord belongs to the same family of second-generation peer-to-peer resource location services as Pastry and Tapestry, which use routing for content location. On each hop, the closest routing alternative is chosen to approach the closest node, defined by a metric on the identifiers generated by a hash function. The basic idea is that nodes closer to the routing target have a more detailed view of the target's neighborhood, and this knowledge is exploited to approach the target. Since Chord was chosen as the lookup and routing service for this work, it is described in detail in its own section. For now, it is enough to know that the hashed identifiers are interpreted as n-bit numbers, which are arranged in a circle by the natural integer order. A file is split into blocks identified by blockids. By definition, the r block
replicas are stored at the successor node of the blockid and its r - 1 immediate successor nodes. The successor node is the closest node to an item identified by a blockid or a nodeid; per definition, it is the node that immediately follows the item's ID on the circle. When a node joins the circle, a block's successor node can change, and the network has to move some blocks to the new node to maintain the property of storing the block's replicas on the r nodes closest to the blockid. The blockid's successor node is responsible for maintaining the r - 1 block replicas on its r - 1 successor nodes by periodically verifying their availability and replacing replicas in case of a node failure. Similar to PAST and Silverback, cache replicas are stored along the reverse lookup path when the requested block data is returned, to reduce latency and to balance the load.
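The successor-based replica placement used by CFS can be sketched with a sorted map standing in for the Chord circle. This is a simplification with a global view of the ring; a real implementation finds the successor through the distributed Chord lookup, and the helper names are illustrative:

```java
import java.util.*;

// Sketch of CFS-style replica placement on a Chord ring: the r replicas of
// a block live on successor(blockId) and its r - 1 immediate successors.
// Node IDs and block IDs share one identifier space; a TreeMap keyed by
// node ID stands in for the sorted circle.
public class ReplicaPlacement {

    // Returns the r nodes responsible for blockId, walking clockwise from
    // the block's successor and wrapping around at zero.
    public static List<String> replicaSet(TreeMap<Long, String> ring, long blockId, int r) {
        List<String> nodes = new ArrayList<>();
        Map.Entry<Long, String> e = ring.ceilingEntry(blockId);  // successor(blockId)
        if (e == null) e = ring.firstEntry();                    // wrap around zero
        while (nodes.size() < r && nodes.size() < ring.size()) {
            nodes.add(e.getValue());
            e = ring.higherEntry(e.getKey());                    // next node clockwise
            if (e == null) e = ring.firstEntry();
        }
        return nodes;
    }
}
```

When a node joins or leaves, recomputing this set for a block yields a list that differs in exactly one position, which is why only a small amount of data has to move to restore the invariant.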
Chapter 3

Problem Analysis

Peer-to-peer applications aim to take advantage of shared resources in a dynamic environment where fluctuating participants are the norm. Hence there is a need for a resource-centric addressing scheme working under dynamic conditions [27]. In this work, the resource of interest is free disk space available on Internet hosts. Each host is identified by a unique Internet Protocol (IP) host address used for packet routing to this host. To establish communication with a host, an additional port number is necessary to identify the software that handles the communication process on that host. The IP address together with the port number is the network location, which is necessary to communicate with the software on a host managing its storage resources. Moreover, a content item needs a content name that identifies it among all the other content items. A resource-centric addressing scheme for storage-related applications provides a binding from content names to network locations, which is resolved by a lookup operation [25]. The Domain Name System (DNS) [7] is an excellent example of such an addressing scheme. It is a host-centric addressing scheme because it was introduced to map human-readable host names to host IP addresses. A host name consists of domain names separated by dots, which are interpreted as a hierarchy where the last domain name is the top-level domain like com, net, and org. Basically, there are two possibilities how the binding can be stored: in a single flat "hosts.txt" file, or distributed over a hierarchical topology of DNS servers, which store the necessary information to resolve a DNS lookup by traversing the hierarchy. Because the number of lookup steps is limited by the number of hierarchy levels and caching is used on all levels, the DNS scaled to many times its original size [16]. But the binding information in this system is manually maintained, and changes need hours, if not days, to propagate through the system.
Therefore, it is not well suited for P2P systems with participants that on average stay in the system for less than an hour.
The way Napster resolves the host addresses compares to the single flat file in the DNS. The central real-time index on a Napster server stores all binding information to map content name fragments (keywords) to IP addresses, which can be looked up by the clients. The disadvantages of such a solution were already discussed. The distributed lookup by flooding performed by Gnutella provides minimal lookup latency, but trades it against bandwidth and scalability, because the number of messages and the consumed bandwidth grow exponentially with the number of nodes. For a file system application the situation is different from that of file sharing applications. In file sharing applications the content is already stored on hosts, and the system has to discover the content stored on the hosts, as Napster and Gnutella do. For an application like the one proposed in this work, the system itself decides during the publish process where the content items are stored, and therefore an addressing scheme based on an overlay network can be used, which resolves bindings by routing. The addressing scheme maps content items to nodes, and this mapping is used to store and retrieve the content items.

3.1 Overlay Routing Networks

An overlay routing network is built of nodes connected together by a network of distinct topology. This network is a logical overlay network because the nodes communicate over an underlying communication network, but the logical network topology has influence on the routing algorithm. To resolve a binding for a content name, a message is routed to a node that is "closest" to the content name according to a metric on the node identifiers and content names. The communication network address of this "closest" node is the result of the lookup operation. The routing algorithm on each node exploits local routing knowledge to route the message to the "closest" local routing alternative until there is no "closer" routing alternative.
To define closeness, there has to be a metric that applies to both the node identifier space and the content name space. This can be achieved by using a hash function that deterministically maps node identifiers and content names into a flat (and uniformly populated) hash space, on which a metric is chosen to define closeness. Pastry, Tapestry, and Chord are based on this idea and therefore share the same average lookup path length of log(#nodes) hops, but they use different metrics and overlay network topologies.
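A minimal sketch of this idea hashes host addresses and content names into one flat m-bit space and defines closeness by the clockwise distance that Chord uses. The class and helper names are illustrative, not from the thesis framework, and a small m is chosen only for readability:

```java
import java.math.BigInteger;
import java.security.MessageDigest;

// Sketch of a flat identifier space shared by nodes and content items:
// both are hashed into m-bit numbers, and a clockwise-distance metric
// defines which node is "closest" to a content name.
public class IdSpace {
    static final int M = 16;                              // small m for readability
    static final BigInteger SIZE = BigInteger.ONE.shiftLeft(M);

    // Deterministic m-bit ID for any string (host address or content name).
    public static BigInteger id(String s) {
        try {
            byte[] d = MessageDigest.getInstance("SHA-1").digest(s.getBytes("UTF-8"));
            return new BigInteger(1, d).mod(SIZE);        // truncate to m bits
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Clockwise distance from a to b on the circle of 2^m identifiers.
    public static BigInteger distance(BigInteger a, BigInteger b) {
        return b.subtract(a).mod(SIZE);                   // mod() is non-negative
    }
}
```

Because the same hash function is applied to host addresses and content names, comparing a key with a node ID is meaningful, and the node minimizing the chosen distance is the lookup result.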
Chord

Chord is a distributed lookup and routing overlay network using consistent hashing [9], originally introduced for distributed caching. Nodes are organized in a circular topology by using m-bit node IDs interpreted as nonnegative integer numbers wrapped around at zero. The total ordering of the integer numbers assigns each node a predecessor and a successor node. A node ID is generated by applying a hash function to the node's host IP address. Therefore, the overlay network becomes a deterministic function of the host addresses; in other words, the host IP address determines the position in the circle. This makes the overlay network topology completely unaware of the underlying network layer topology, which has some positive and negative effects. Routing a message to the successor, the neighbor in the overlay network, could result in routing to the other side of the world in the underlying IP network, causing high latency. On the other hand, IP network failures in a region do not map to a region of the logical overlay network, which is often used for placing redundant replicas, as in CFS or PAST. A content item is also identified by an m-bit content key, and the binding from keys (hashed content names) to node IDs (hashed host addresses) is defined by the successor function. A key k is located at the node n with the same or the next higher ID than the key k, written as n = successor(k). The content item associated with the key k is not stored in Chord itself. Chord just assigns a Responsible Node whose network location is used to access the content item. If a host operates more than one node, these are called Virtual Nodes, and their node ID is calculated by hashing the host IP address together with a small Virtual Node ID, which is only unique on that host.
The reason to use Virtual Nodes is that for a small number of nodes in the circle, the distances between nodes in the identifier space are unlikely to be as equally distributed as desired, which results in an unequal distribution of keys per real node.
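The effect of Virtual Nodes can be sketched with a small, centralized simulation of the successor binding. The host addresses, hash choice, and class layout below are illustrative assumptions:

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.*;

public class VirtualNodes {
    static BigInteger id(String s) {
        try {
            return new BigInteger(1, MessageDigest.getInstance("SHA-1")
                    .digest(s.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) { throw new RuntimeException(e); }
    }

    // Count keys per real host when each host runs v Virtual Nodes.
    static Map<String, Integer> keysPerHost(List<String> hosts, int v, int numKeys) {
        // Virtual Node ID = hash(host address + ":" + small local index)
        TreeMap<BigInteger, String> ring = new TreeMap<>();
        for (String host : hosts)
            for (int i = 0; i < v; i++)
                ring.put(id(host + ":" + i), host);
        Map<String, Integer> count = new HashMap<>();
        for (String host : hosts) count.put(host, 0);
        for (int k = 0; k < numKeys; k++) {
            BigInteger key = id("key" + k);
            // successor(key): next Virtual Node ID >= key, wrapping around at zero
            Map.Entry<BigInteger, String> succ = ring.ceilingEntry(key);
            if (succ == null) succ = ring.firstEntry();
            count.merge(succ.getValue(), 1, Integer::sum);
        }
        return count;
    }
}
```

Running this with v = 1 and with larger v shows the spread of key counts across hosts shrinking as Virtual Nodes are added, in line with Figure 3.1.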
Figure 3.1: Distribution of keys per real node depending on the number of Virtual Nodes per real node, for a simulated network with 10^4 real nodes and 10^6 keys.

Figure 3.1, taken from [28], shows that increasing the total number of nodes by introducing multiple Virtual Nodes per real node balances the number of keys per real node. In CFS the number of Virtual Nodes on a host is also used to adjust to the available storage capacity, because a node is not allowed to reject a storage request. The separation of content data and content metadata, discussed later in this chapter, can eliminate this problem by introducing one more degree of freedom. But an equal distribution of keys per real node is still a desirable property, and Virtual Nodes give significant optimization potential for the lookup, as described in the implementation section.

Using the same identifiers for nodes and keys leads to a combined lookup and routing: a lookup is resolved by routing a message to the node that is the successor of the key. Every node knows at least two other nodes, its successor and its predecessor. The simple lookup algorithm routes a message around the circle by following the successor pointers until a node with the same ID as the key, or the next higher ID, is found. The metric used in Chord is the numerical difference between key and node ID; routing is done by choosing the local routing alternative that minimizes this difference. As a node is only aware of its successor and its predecessor as available routing alternatives, a lookup message always traverses the circle in the direction of the successor pointers, because only this reduces the distance. In the worst case a message has to complete a full turn of the circle before the node that is the successor of the key is found. Resolving a lookup with the simple algorithm therefore takes O(#nodes) hops. As long as every node has a working pointer to its immediate successor in the circle, a successful successor lookup is guaranteed. In a real application with frequent joins and leaves, however, a single successor pointer is not sufficient to guarantee a successful lookup: a single node failure would break the circle and result in lookup failures. Therefore, redundant successor pointers are used. As long as one working successor pointer is found, the lookup routing can proceed and a successful lookup is guaranteed.

To reduce the average lookup path length to a practical number, a finger table with additional routing information is introduced. Fingers are shortcuts, used instead of going around the circle from node to node following the successor pointers. Every node divides the circle into m finger intervals of exponentially growing size in powers of 2: for a node n, the k-th finger interval (k = 1..m) is [n + 2^(k-1) mod 2^m, n + 2^k mod 2^m), it has length 2^(k-1), and the k-th finger points to successor(start) of that interval.

Figure 3.2: An example of a finger interval with the finger pointer (node n = 80, finger[6], in an m = 7 bit identifier space).

A finger points to the successor of the interval start, which can result in finger pointers lying outside their corresponding finger interval. The finger nodes are resolved by the Chord lookup function, which returns the successor node of the interval's start ID. Using the finger table adds O(m) additional routing alternatives, and the one is chosen that leads closest to the successor of the key. The higher the finger index, the farther away the finger points. Therefore, the finger table is searched in reverse order, starting at finger[m]. If finger i points to a node preceding the key, this hop reduces the distance to the key by at least 2^(i-1). With a few hops, the distance to the key is quickly reduced, which results in an average lookup path length of O(log(#nodes)). This bound was proven theoretically and verified by controlled experiments in the Chord paper[28].
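The finger-based lookup described above can be sketched as follows. This is a simplified, centralized model (m = 7, with node IDs chosen to resemble the examples in this chapter), not the distributed implementation described later:

```java
public class FingerLookup {
    static final int M = 7;                  // identifier bits
    static final int SPACE = 1 << M;         // 128 identifiers
    static final int[] NODES = {5, 10, 20, 32, 80, 99, 110}; // sorted node IDs

    // successor(x): first node with ID >= x, wrapping around at zero
    static int successor(int x) {
        for (int n : NODES) if (n >= x) return n;
        return NODES[0];
    }

    // true if x lies in the circular open interval (a, b)
    static boolean inOpen(int x, int a, int b) {
        return a < b ? (x > a && x < b) : (x > a || x < b);
    }

    // finger[k] of node n points to successor(n + 2^(k-1)), k = 1..M
    static int finger(int n, int k) {
        return successor((n + (1 << (k - 1))) % SPACE);
    }

    // Search the finger table in reverse order for the closest
    // finger node preceding the key.
    static int closestPrecedingFinger(int n, int key) {
        for (int k = M; k >= 1; k--) {
            int f = finger(n, k);
            if (inOpen(f, n, key)) return f;
        }
        return n;
    }

    // Iterative lookup of successor(key), starting at node start.
    static int lookup(int start, int key) {
        int n = start;
        // walk until key falls in (n, successor(n)]
        while (!inOpen(key, n, successor(n)) && key != successor(n))
            n = closestPrecedingFinger(n, key);
        return successor(n);
    }
}
```

With these node IDs, `lookup(32, 19)` hops across the ring and terminates at N20 = successor(19), mirroring the routing example discussed next.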
Figure 3.3 shows a detailed Chord routing example using finger tables in an m = 7 bit circular hash space. Starting at node N32, which wants to resolve the successor of the key K19, N32 looks in its finger routing table for the node that most closely precedes K19. The finger table is searched in reverse order, starting at the finger with index 7. This finger matches the criterion, and therefore the lookup continues at N99. On N99 the finger table is searched again. The 7th finger, N60, does not precede K19, and therefore the 6th finger is tested. This one, pointing to N5, precedes K19, hence the lookup continues on N5. N5 finds N10 as its closest preceding finger. N10 then terminates the lookup, because it can determine that its successor N20 is the successor node of K19.

Figure 3.3: An example of a lookup using the finger table (lookup(K19) issued at N32 in an m = 7 bit identifier space).

The two important properties of Chord are inherited from using ranged hash functions as proposed in consistent hashing[9]:
1. balance property: each real node is responsible for O(K/N) keys with high probability (if each real node runs O(log N) Virtual Nodes), where K is the number of keys and N is the number of nodes. The responsibility for keys is equally distributed among all nodes.
2. monotony property: when the (N+1)-th node joins, the binding for O(K/N) keys changes from existing nodes to the new node. In other words, the responsibility for keys changes only from existing nodes to new nodes, never from existing nodes to existing nodes. There is only a local reorganization on a node join.
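The monotony property can be checked with a small, centralized model of the successor binding. The node IDs and identifier-space size below are illustrative:

```java
import java.util.*;

public class Monotony {
    // successor(key) over a sorted set of node IDs, wrapping at zero
    static int successor(TreeSet<Integer> nodes, int key) {
        Integer s = nodes.ceiling(key);
        return s != null ? s : nodes.first();
    }

    // Map every key in the identifier space to its responsible node.
    static Map<Integer, Integer> bind(TreeSet<Integer> nodes, int space) {
        Map<Integer, Integer> binding = new HashMap<>();
        for (int key = 0; key < space; key++)
            binding.put(key, successor(nodes, key));
        return binding;
    }

    // Returns true if, after newNode joins, every key whose binding
    // changed is now bound to newNode (never to another existing node).
    static boolean joinIsMonotone(TreeSet<Integer> nodes, int newNode, int space) {
        Map<Integer, Integer> before = bind(nodes, space);
        TreeSet<Integer> after = new TreeSet<>(nodes);
        after.add(newNode);
        Map<Integer, Integer> afterBinding = bind(after, space);
        for (int key = 0; key < space; key++)
            if (!before.get(key).equals(afterBinding.get(key))
                    && afterBinding.get(key) != newNode)
                return false;
        return true;
    }
}
```

Whatever node joins, only the keys falling between the new node's predecessor and the new node itself change hands, which is exactly the local reorganization described above.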
Chord offers a scalable, robust, and balanced mapping of hashed content names to host network locations (IP address and a Virtual Node ID), which allows communication with these Virtual Nodes over the underlying network layer. The Chord overlay network delegates responsibility for content items to nodes; Chord does not store the data itself. To be consistent with the terminology introduced by Chord, a content item is identified by its key k, which of course is a Chord m-bit identifier. The node that is responsible for a content item k is called the primary Responsible Node RN1_k of key k and is defined as RN1_k = successor(k) by the lookup function. In a perfect world with no host failures, the straightforward solution would be to store content items on their primary Responsible Nodes.

3.2 Increasing Reliability with Redundancy

The reliability of data storage on unreliable nodes is increased by adding redundant information and dispersing this information to several nodes. The reliability, expressed as the probability of a successful data access, is determined by:
- the amount of redundant information added
- the number of nodes and their independent failure probabilities
- how the redundant information is distributed over multiple nodes

The problem of storing data blocks on unreliable nodes is closely related to storing data blocks on a hard disk array, as is done in RAID storage solutions [2]. A RAID system is a redundant array of inexpensive disks. This technology was developed to organize small hard disks into arrays, replacing much more expensive high-capacity disks and reducing the risk of data loss due to hard disk failures. There are several approaches to distributing data over disks, or, as in this case, over storage nodes. Two of them are now discussed.

3.2.1 IDA Approach

IDA stands for Information Dispersal Algorithm, proposed by M. Rabin[19]. The basic idea is to disperse the content of a data block into n fragments.
The original data can be reconstructed from any subset of k fragments, where k <= n. One major aspect of this algorithm is that redundancy is added uniformly; there is no distinction between data and parity. This property allows the amount of redundant data to be controlled at fine granularity. To tolerate up to r simultaneous node failures, the data block has to be encoded into n = k + r fragments. If all nodes have the independent failure probability p,
this gives an access reliability of

p(access) = 1 - sum_{i=r+1}^{n} C(n,i) p^i (1-p)^(n-i)

The redundancy necessary to achieve this reliability is (n-k)/k. It is obvious that the IDA approach needs a smaller amount of storage resources than the straightforward replication approach: for the same reliability, the replication approach needs r times redundancy. The currently available Forward Error Correcting (FEC) codes, such as the Reed-Solomon code [22], have encoding times quadratic in the number of encoded blocks n. Tornado codes [12] achieve an encoding time linear in n, but so far there is no free implementation available. A performance comparison of the different codes can be found in [13].

3.2.2 Replication Approach

The other redundancy scheme, block replication, also called mirroring, was already mentioned and compared to the IDA approach. An analysis of past hardware development shows that hard disk storage space has doubled every 18 months, which is often referred to as Moore's Law, and this is expected to hold for the next decade[31]. While the capacity per disk is growing, the price per storage unit is falling. As of April 2002, the average hard disk shipped with a workstation can store between 40 and 60 GB. Therefore, disk storage capacity is not considered a limited resource. Since the prototype will be implemented in Java, one should take into account that Java code executing numeric calculations is likely to be 10 to 30 times slower than native machine code generated from C code. Using FEC codes, which are based on polynomial arithmetic, will always produce significantly higher CPU load than replication. The storage node software is expected to run on workstations that give priority to the user's processes, not on dedicated single-purpose servers. Therefore, it should run as a background process with low priority and consume as few CPU cycles as possible.
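The access-reliability formula and the redundancy comparison above can be evaluated numerically. The following is a sketch, not part of the prototype; note that for k = 1 the scheme degenerates to plain replication with n = r + 1 copies, so p(access) reduces to 1 - p^n:

```java
public class Redundancy {
    // binomial coefficient C(n, i)
    static double choose(int n, int i) {
        double c = 1;
        for (int j = 1; j <= i; j++) c = c * (n - j + 1) / j;
        return c;
    }

    // p(access) = 1 - sum_{i=r+1}^{n} C(n,i) p^i (1-p)^(n-i),
    // with n = k + r fragments tolerating up to r node failures of
    // independent failure probability p
    static double accessReliability(int k, int r, double p) {
        int n = k + r;
        double failure = 0;
        for (int i = r + 1; i <= n; i++)
            failure += choose(n, i) * Math.pow(p, i) * Math.pow(1 - p, n - i);
        return 1 - failure;
    }

    // storage overhead of IDA: (n - k) / k = r / k
    static double idaRedundancy(int k, int r) {
        return (double) r / k;
    }
}
```

For example, dispersing a block into n = 10 fragments with k = 8 tolerates r = 2 failures at only 25% storage overhead, while replication tolerating the same 2 failures costs 200%.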
For these two reasons, the replication scheme will be used to increase the reliability of the prototype.

3.2.3 Replica Placement

The locations where replicas are stored depend on the overlay network topology that is used. In general, replicas are stored in the logical neighborhood of the primary Responsible Node. When the primary Responsible Node fails, the routing and lookup mechanism of the overlay network will assign the responsibility to another
node logically close to the failed primary Responsible Node. In Chord, a failed node is replaced by its immediate successor; either the new node already has a replica, or it can ask its neighbors for one. The set of nodes storing replicas of a content item identified by its key k is called the Responsible Nodes and defined as {RNi_k | i = 1...r}. The name expresses that together they are responsible for increasing the reliability, in terms of the access probability determined by the replication factor r. When this idea is applied to the circular overlay network topology used by Chord, the Responsible Nodes have to be either the primary Responsible Node's r-1 successors or its r-1 predecessors. This decision should take into account how content items are located by the overlay network, and whether there are implications for caching and load balancing schemes that could be used to improve access performance.

3.3 Caching and Load Balancing

Chord's balance property results in a uniform distribution of responsibility for content items among all nodes. However, non-uniform information access due to popular content will create hot-spots in the overlay network and congestion in the underlying network if not avoided by caching and load balancing mechanisms. The design of these mechanisms is closely related to the overlay network's topology and its routing algorithms, because both influence the routing path through the overlay network and therefore the locations of hot-spots. Most of the activity in this distributed peer-to-peer system will be caused by locating and accessing content, for which caching is used to increase the performance. Chord itself was developed in the field of distributed cache design based on consistent hashing. Hence, the peer-to-peer design should consider some general design principles for distributed caching[29]:
1.
Maintain a hierarchy of metadata that tracks where copies of data are stored.
2. Separate data paths from metadata paths.
3. Use direct cache-to-cache data transfers to avoid store-and-forward delays.

3.3.1 Separation of Data and Metadata

Caching is used to improve download performance by placing or locating cache replicas closer to the user than the content itself, assuming that closeness in terms of network proximity will result in higher throughput. In Chord's case, where network proximity is not reflected by the overlay network, it is difficult to find a close replica.
Therefore, a parallel access scheme, which accesses several replicas in parallel, will be used to increase download performance. Further details about parallel access can be found in a later section. According to the first design principle, a content item's metadata structure contains pointers to the nodes where replicas - stored to increase reliability or to distribute load - are located. This metadata structure is then used for parallel access. Accessing a content item is a two-step process:
1. Accessing the metadata information through a primary Responsible Node lookup
2. Using the metadata pointers for parallel access

Following the second design principle, the data and metadata access paths are separated, and for each an individual caching and load balancing scheme is designed that exploits some of its access characteristics; the metadata access caching scheme and the data access caching scheme are each explained in detail in later sections. Two ways of separating data and metadata are possible: real and logical separation. Real separation means that a content item's metadata and the replica data are located on different nodes. Logical separation means that data and metadata are on the same node, but are distinguished in the sense of their different roles in the two-step access pattern: first a node is accessed to return metadata, then it is accessed again in the parallel access process. The idea of real separation was originally developed to overcome a negative effect of Chord's balance and monotony properties. When a node joins the Chord ring, some of the keys the new node becomes responsible for shift to the new node. For an average of K/N keys per node, K/N replicas have to be transferred to the new node. In real life, the circle will be sparsely populated with N nodes and a much higher number K of keys, which makes K/N >> 1.
Depending on the number of keys, this can cause a high load on the underlying network link between the joining node and the existing successor node if the keys are directly transferred from the old node to the new node. From one point of view, this data transfer is not necessary, because the replicas on the old node have not vanished, and so there is no need to shift data merely because of a change of responsibility. In order to drastically reduce the data transferred, instead of moving the real data, the much smaller metadata is shifted from the old node to the new node to reflect the change of responsibility. An additional degree of freedom is introduced, which allows choosing the node where a replica is stored. This has the advantage that nodes with low storage resource usage can be preferred, and an explicit balancing of storage resources can be achieved. It is not necessary anymore
15-744: Computer Networking P2P/DHT Overview P2P Lookup Overview Centralized/Flooded Lookups Routed Lookups Chord Comparison of DHTs 2 Peer-to-Peer Networks Typically each member stores/provides access
More informationCS 3516: Advanced Computer Networks
Welcome to CS 3516: Advanced Computer Networks Prof. Yanhua Li Time: 9:00am 9:50am M, T, R, and F Location: Fuller 320 Fall 2017 A-term 1 Some slides are originally from the course materials of the textbook
More informationChapter 6 PEER-TO-PEER COMPUTING
Chapter 6 PEER-TO-PEER COMPUTING Distributed Computing Group Computer Networks Winter 23 / 24 Overview What is Peer-to-Peer? Dictionary Distributed Hashing Search Join & Leave Other systems Case study:
More informationDHT Overview. P2P: Advanced Topics Filesystems over DHTs and P2P research. How to build applications over DHTS. What we would like to have..
DHT Overview P2P: Advanced Topics Filesystems over DHTs and P2P research Vyas Sekar DHTs provide a simple primitive put (key,value) get (key) Data/Nodes distributed over a key-space High-level idea: Move
More information: Scalable Lookup
6.824 2006: Scalable Lookup Prior focus has been on traditional distributed systems e.g. NFS, DSM/Hypervisor, Harp Machine room: well maintained, centrally located. Relatively stable population: can be
More informationLecture 21 P2P. Napster. Centralized Index. Napster. Gnutella. Peer-to-Peer Model March 16, Overview:
PP Lecture 1 Peer-to-Peer Model March 16, 005 Overview: centralized database: Napster query flooding: Gnutella intelligent query flooding: KaZaA swarming: BitTorrent unstructured overlay routing: Freenet
More informationChapter 10: Peer-to-Peer Systems
Chapter 10: Peer-to-Peer Systems From Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edition 4, Addison-Wesley 2005 Introduction To enable the sharing of data and resources
More informationToday. Why might P2P be a win? What is a Peer-to-Peer (P2P) system? Peer-to-Peer Systems and Distributed Hash Tables
Peer-to-Peer Systems and Distributed Hash Tables COS 418: Distributed Systems Lecture 7 Today 1. Peer-to-Peer Systems Napster, Gnutella, BitTorrent, challenges 2. Distributed Hash Tables 3. The Chord Lookup
More informationWSN Routing Protocols
WSN Routing Protocols 1 Routing Challenges and Design Issues in WSNs 2 Overview The design of routing protocols in WSNs is influenced by many challenging factors. These factors must be overcome before
More informationScalability In Peer-to-Peer Systems. Presented by Stavros Nikolaou
Scalability In Peer-to-Peer Systems Presented by Stavros Nikolaou Background on Peer-to-Peer Systems Definition: Distributed systems/applications featuring: No centralized control, no hierarchical organization
More informationL3S Research Center, University of Hannover
, University of Hannover Dynamics of Wolf-Tilo Balke and Wolf Siberski 21.11.2007 *Original slides provided by S. Rieche, H. Niedermayer, S. Götz, K. Wehrle (University of Tübingen) and A. Datta, K. Aberer
More informationIntroduction to Peer-to-Peer Networks
Introduction to Peer-to-Peer Networks The Story of Peer-to-Peer The Nature of Peer-to-Peer: Generals & Paradigms Unstructured Peer-to-Peer Systems Sample Applications 1 Prof. Dr. Thomas Schmidt http:/www.informatik.haw-hamburg.de/~schmidt
More informationP2P Applications. Reti di Elaboratori Corso di Laurea in Informatica Università degli Studi di Roma La Sapienza Canale A-L Prof.ssa Chiara Petrioli
P2P Applications Reti di Elaboratori Corso di Laurea in Informatica Università degli Studi di Roma La Sapienza Canale A-L Prof.ssa Chiara Petrioli Server-based Network Peer-to-peer networks A type of network
More informationInternet Technology. 06. Exam 1 Review Paul Krzyzanowski. Rutgers University. Spring 2016
Internet Technology 06. Exam 1 Review Paul Krzyzanowski Rutgers University Spring 2016 March 2, 2016 2016 Paul Krzyzanowski 1 Question 1 Defend or contradict this statement: for maximum efficiency, at
More informationIPv6: An Introduction
Outline IPv6: An Introduction Dheeraj Sanghi Department of Computer Science and Engineering Indian Institute of Technology Kanpur dheeraj@iitk.ac.in http://www.cse.iitk.ac.in/users/dheeraj Problems with
More informationChord : A Scalable Peer-to-Peer Lookup Protocol for Internet Applications
: A Scalable Peer-to-Peer Lookup Protocol for Internet Applications Ion Stoica, Robert Morris, David Liben-Nowell, David R. Karger, M. Frans Kaashock, Frank Dabek, Hari Balakrishnan March 4, 2013 One slide
More informationInternet Technology 3/2/2016
Question 1 Defend or contradict this statement: for maximum efficiency, at the expense of reliability, an application should bypass TCP or UDP and use IP directly for communication. Internet Technology
More information6. Peer-to-peer (P2P) networks I.
6. Peer-to-peer (P2P) networks I. PA159: Net-Centric Computing I. Eva Hladká Faculty of Informatics Masaryk University Autumn 2010 Eva Hladká (FI MU) 6. P2P networks I. Autumn 2010 1 / 46 Lecture Overview
More informationLECT-05, S-1 FP2P, Javed I.
A Course on Foundations of Peer-to-Peer Systems & Applications LECT-, S- FPP, javed@kent.edu Javed I. Khan@8 CS /99 Foundation of Peer-to-Peer Applications & Systems Kent State University Dept. of Computer
More informationWhat is Multicasting? Multicasting Fundamentals. Unicast Transmission. Agenda. L70 - Multicasting Fundamentals. L70 - Multicasting Fundamentals
What is Multicasting? Multicasting Fundamentals Unicast transmission transmitting a packet to one receiver point-to-point transmission used by most applications today Multicast transmission transmitting
More informationGoals. EECS 122: Introduction to Computer Networks Overlay Networks and P2P Networks. Solution. Overlay Networks: Motivations.
Goals CS : Introduction to Computer Networks Overlay Networks and PP Networks Ion Stoica Computer Science Division Department of lectrical ngineering and Computer Sciences University of California, Berkeley
More informationCS514: Intermediate Course in Computer Systems
Distributed Hash Tables (DHT) Overview and Issues Paul Francis CS514: Intermediate Course in Computer Systems Lecture 26: Nov 19, 2003 Distributed Hash Tables (DHT): Overview and Issues What is a Distributed
More informationOpportunistic Application Flows in Sensor-based Pervasive Environments
Opportunistic Application Flows in Sensor-based Pervasive Environments Nanyan Jiang, Cristina Schmidt, Vincent Matossian, and Manish Parashar ICPS 2004 1 Outline Introduction to pervasive sensor-based
More informationDistributed Hash Table
Distributed Hash Table P2P Routing and Searching Algorithms Ruixuan Li College of Computer Science, HUST rxli@public.wh.hb.cn http://idc.hust.edu.cn/~rxli/ In Courtesy of Xiaodong Zhang, Ohio State Univ
More informationLecture 8: Application Layer P2P Applications and DHTs
Lecture 8: Application Layer P2P Applications and DHTs COMP 332, Spring 2018 Victoria Manfredi Acknowledgements: materials adapted from Computer Networking: A Top Down Approach 7 th edition: 1996-2016,
More informationPeer-to-peer systems and overlay networks
Complex Adaptive Systems C.d.L. Informatica Università di Bologna Peer-to-peer systems and overlay networks Fabio Picconi Dipartimento di Scienze dell Informazione 1 Outline Introduction to P2P systems
More informationINF5071 Performance in distributed systems: Distribution Part III
INF5071 Performance in distributed systems: Distribution Part III 5 November 2010 Client-Server Traditional distributed computing Successful architecture, and will continue to be so (adding proxy servers)
More informationP2P. 1 Introduction. 2 Napster. Alex S. 2.1 Client/Server. 2.2 Problems
P2P Alex S. 1 Introduction The systems we will examine are known as Peer-To-Peer, or P2P systems, meaning that in the network, the primary mode of communication is between equally capable peers. Basically
More informationStratos Idreos. A thesis submitted in fulfillment of the requirements for the degree of. Electronic and Computer Engineering
P2P-DIET: A QUERY AND NOTIFICATION SERVICE BASED ON MOBILE AGENTS FOR RAPID IMPLEMENTATION OF P2P APPLICATIONS by Stratos Idreos A thesis submitted in fulfillment of the requirements for the degree of
More informationMiddleware and Distributed Systems. Peer-to-Peer Systems. Peter Tröger
Middleware and Distributed Systems Peer-to-Peer Systems Peter Tröger Peer-to-Peer Systems (P2P) Concept of a decentralized large-scale distributed system Large number of networked computers (peers) Each
More information«Computer Science» Requirements for applicants by Innopolis University
«Computer Science» Requirements for applicants by Innopolis University Contents Architecture and Organization... 2 Digital Logic and Digital Systems... 2 Machine Level Representation of Data... 2 Assembly
More informationBuilding a low-latency, proximity-aware DHT-based P2P network
Building a low-latency, proximity-aware DHT-based P2P network Ngoc Ben DANG, Son Tung VU, Hoai Son NGUYEN Department of Computer network College of Technology, Vietnam National University, Hanoi 144 Xuan
More informationPEER-TO-PEER (P2P) systems are now one of the most
IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, VOL. 25, NO. 1, JANUARY 2007 15 Enhancing Peer-to-Peer Systems Through Redundancy Paola Flocchini, Amiya Nayak, Senior Member, IEEE, and Ming Xie Abstract
More informationCS 268: Lecture 22 DHT Applications
CS 268: Lecture 22 DHT Applications Ion Stoica Computer Science Division Department of Electrical Engineering and Computer Sciences University of California, Berkeley Berkeley, CA 94720-1776 (Presentation
More informationOverlay and P2P Networks. Structured Networks and DHTs. Prof. Sasu Tarkoma
Overlay and P2P Networks Structured Networks and DHTs Prof. Sasu Tarkoma 6.2.2014 Contents Today Semantic free indexing Consistent Hashing Distributed Hash Tables (DHTs) Thursday (Dr. Samu Varjonen) DHTs
More informationPeer-to-Peer Applications Reading: 9.4
Peer-to-Peer Applications Reading: 9.4 Acknowledgments: Lecture slides are from Computer networks course thought by Jennifer Rexford at Princeton University. When slides are obtained from other sources,
More informationEEC-684/584 Computer Networks
EEC-684/584 Computer Networks Lecture 14 wenbing@ieee.org (Lecture nodes are based on materials supplied by Dr. Louise Moser at UCSB and Prentice-Hall) Outline 2 Review of last lecture Internetworking
More information