Providing File Services using a Distributed Hash Table

Lars Seipel, Alois Schuette
University of Applied Sciences Darmstadt, Department of Computer Science,
Schoefferstr. 8a, 64295 Darmstadt, Germany
lars.seipel@stud.h-da.de, alois.schuette@h-da.de
http://www.fbi.h-da.de

Published at IS 2015 (The 11th International Conference on Interactive Systems, Ulyanovsk 2015).

Abstract. This paper describes the redesign of an existing Chord-based P2P infrastructure. The new system employs erasure coding to improve the availability and durability of user data. In addition, we give an example of how to use the infrastructure as a base for building a secure, distributed file system.

Keywords: DHT, Peer-to-Peer, Distributed File System, Encryption

1 Introduction

The AChord system emerged from a student project at Hochschule Darmstadt (University of Applied Sciences). It was originally built to support instant messaging applications. This paper describes a redesign of the core infrastructure with the purpose of making it a suitable base for other kinds of distributed applications. Our ambition is to encourage experimentation and to ease the development and deployment of new networked peer-to-peer applications. As an example, we present a simple file system that uses AChord as a storage backend for encrypted file system data.

2 Distributed Hash Table

In peer-to-peer systems there needs to be a mechanism for locating resources available from peers. To accomplish this task, we make use of the Chord system [6]. Chord provides us with a basic, yet powerful, lookup primitive that answers a single question: given a key, which node is responsible for dealing with it?

Chord is built around an m-bit circular identifier space, where m corresponds to the output size of a base hash function (e.g. SHA-1 [5]). Both keys and nodes are mapped into this space. The identifier space can be visualized as a circle with nodes representing points on it, as shown in Fig. 1.

Fig. 1. Identifier circle of size 2^3 with nodes at 1, 3 and 6

When looking up a key, the node that most closely follows it on the circle (and thereby in the identifier space) is called its successor. This is the node responsible for handling that particular key. Each node is aware of its immediate predecessor and successor. Additional routing information is kept in a data structure called the finger table and is used to speed up the lookup process. It can be shown that, in a Chord network of n nodes, the number of peers to contact for resolving a key to its successor is, with high probability, O(log n) [6].

How the resulting answer is used is generally up to the application. A common theme is the storage of data (in the form of key/value pairs) on participating nodes. Such systems are called distributed hash tables (or DHTs) for their conceptual similarity to hash tables used as in-memory data structures.
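To make the lookup primitive concrete, the following minimal sketch (in Python) maps keys and node names onto a tiny 8-bit identifier circle, builds each node's finger table, and resolves a key to its successor by greedily following the closest preceding finger. The 8-bit ring, the invented node names, and the global view of all node identifiers are simplifications for illustration; it shows the general Chord idea, not AChord's implementation.

    import hashlib

    M = 8                       # tiny identifier space (2**M ids) for illustration;
    RING = 2 ** M               # Chord with SHA-1 uses m = 160

    def chord_id(data):
        # map a key or node name into the m-bit circular identifier space
        return int.from_bytes(hashlib.sha1(data).digest(), "big") % RING

    def successor(ids, k):
        # first node identifier that is >= k, wrapping around the circle
        for n in sorted(ids):
            if n >= k:
                return n
        return min(ids)

    def in_interval(x, a, b):
        # True if x lies in the half-open ring interval (a, b]
        if a < b:
            return a < x <= b
        return x > a or x <= b  # the interval wraps past zero

    def finger_table(n, ids):
        # finger[i] = successor(n + 2**i): the routing state each node keeps
        return [successor(ids, (n + 2 ** i) % RING) for i in range(M)]

    def find_successor(start, key, ids):
        # greedy lookup: keep following the closest preceding finger until we
        # sit directly in front of the node that owns the key
        fingers = {n: finger_table(n, ids) for n in ids}
        succ = {n: successor(ids, (n + 1) % RING) for n in ids}
        node, hops = start, 0
        while not in_interval(key, node, succ[node]):
            nxt = next((f for f in reversed(fingers[node])
                        if in_interval(f, node, (key - 1) % RING)), succ[node])
            node, hops = nxt, hops + 1
        return succ[node], hops

    nodes = {chord_id(name.encode()) for name in ("alpha", "beta", "gamma", "delta")}
    key = chord_id(b"some-block")
    owner, hops = find_successor(min(nodes), key, nodes)
    print(f"key {key} is handled by node {owner} ({hops} hops)")

With n nodes on the ring, this greedy descent contacts O(log n) peers, which is the bound cited above; each node only has to keep its successor pointer and a finger table of size m.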

3 Block Storage

Relying upon the described lookup mechanism, we were able to build a distributed system for the storage of data blocks. Nodes provide an interface to users through which blocks can be uploaded to or retrieved from the system. There is no inherent structure to these blocks. As far as the receiving node is concerned, they're just opaque chunks of data. Applications give meaning to blocks by interpreting them in a specific way.

A stored block is associated with a key by which it can be retrieved again later. This key is specified by the client when it uploads a block into the system. Thus, the fundamental operations available to a client are put(key, value) and get(key), as shown in Table 1.

Operation          Description
get(key)           Retrieve the value associated with key
put(key, value)    Store value under key

Table 1. Core interface provided to clients

The node where a block should be kept is designated by doing a Chord lookup operation for its key. This determines its successor, the node responsible for handling the given key. As a result, uploaded blocks are distributed across all participating nodes. When a node joins the system and responsibilities change, the new node can request a list of keys from its peers and download the corresponding values so that it can serve them to clients.
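The put/get interface and the key hand-off on a node join can be pictured with a toy, purely in-memory stand-in for the block store. The node identifiers and key values below are invented; in the real system the owner of a key is found through a Chord lookup and blocks travel to remote peers.

    class ToyDht:
        """Every block lives on the successor of its key; a joining node
        takes over the keys it is now responsible for."""

        def __init__(self, node_ids):
            self.store = {n: {} for n in node_ids}  # node id -> {key: block}

        def _owner(self, key):
            # stand-in for a real Chord lookup
            ids = sorted(self.store)
            return next((n for n in ids if n >= key), ids[0])

        def put(self, key, block):
            self.store[self._owner(key)][key] = block

        def get(self, key):
            return self.store[self._owner(key)].get(key)

        def join(self, new_id):
            # the new node asks its peers for the keys it should now serve
            self.store[new_id] = {}
            for peer, blocks in self.store.items():
                if peer == new_id:
                    continue
                for k in [k for k in blocks if self._owner(k) == new_id]:
                    self.store[new_id][k] = blocks.pop(k)

    dht = ToyDht([40, 120, 200])
    dht.put(70, b"some block")            # key 70 lands on node 120
    dht.join(90)                          # node 90 now owns keys in (40, 90]
    assert dht.get(70) == b"some block"   # ... and now serves key 70
    assert 70 in dht.store[90]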

3.1 Fragment Maintenance

There's an obvious omission so far: nodes can fail, and such failures may lead to blocks being (transiently or permanently) unavailable to the system. A solution to this problem involves adding redundancy to the stored data, either by replicating whole blocks or by encoding them using an erasure-resilient code. An erasure code is a form of forward error correction where a chunk of data is encoded into n fragments, of which any m (where m < n) are sufficient to rebuild the original source data. We implement Rabin's Information Dispersal Algorithm [4], which has the handy property of being able to generate new fragments that are, with high probability, distinct, without the need to specify which fragments were generated previously.

The n created fragments are distributed across nodes as follows: if a node is determined to be the successor of a given key, the fragments are stored on it and on the n-1 nodes immediately following it in the identifier space (a small sketch of this placement rule is given at the end of this subsection). This matches the approach used by MIT's DHash [1], from which we've also borrowed the block maintenance protocol.

The purpose of this protocol is to detect when fragments are misplaced (that is, stored on a node that shouldn't actually have them, a situation that can arise after other nodes have joined the system) or can no longer be found in the system at all (e.g. following a node failure). When a node detects that it is in possession of a fragment for a key for which it is not one of the n successors (or slightly more than that, in anticipation of future failures), it tries to resolve the situation by offering the fragment to one of the correct peers until one of them accepts it (because it holds no fragment for the corresponding key). In any case, the misplaced fragment is then deleted to avoid the existence of multiple copies of the same fragment in the system.

Another part of the maintenance protocol is concerned with recreating fragments that are considered missing. Here, a node synchronizes its local fragment storage with each of its n-1 immediate successors. If a node is the successor for a stored block, then it and the n-1 nodes following it should hold a fragment for that block. When a violation of this rule is discovered during synchronization, the affected node performs a get operation, thereby reconstructing the block. Now, with the original block data available, it is able to create a new fragment to store locally.

Efficient synchronization across nodes makes use of the fact that a cryptographic hash value of a piece of data can be considered a fingerprint and can be used to identify that data. A Merkle tree (also called a hash tree) is a data structure originally proposed by Ralph C. Merkle in the context of digital signatures [3]. Its characteristic property is the labelling of internal nodes with a hash computed over their children. Each node maintains such a tree, in which the keys of held fragments conceptually function as the leaves. Thus, the root of the Merkle tree describes the full set of fragments held by a node. The tree is constructed such that the root node's ith child (where 0 <= i < 64) summarizes the set of fragments held in the interval [i * 2^160/64, (i+1) * 2^160/64), assuming a 160-bit identifier space [1].

Synchronization is done by exchanging tree nodes between hosts, starting from the root. Descent stops when either matching labels are found (indicating that the corresponding sub-trees are the same and can thus be skipped) or we reach a leaf node that is present on one host but not the other. This triggers the above-mentioned fragment re-creation process on the host that is missing the fragment.
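To illustrate the placement rule from the beginning of this subsection, the sketch below computes, for a given key, the n nodes that should hold its fragments, together with the misplacement check a node can run over its locally held keys. The value of n and the global view of node identifiers are assumptions made for the sketch; the encoding itself (Rabin's IDA) and the peer communication for offering misplaced fragments are not shown.

    N_FRAGMENTS = 5                  # n: fragments generated per block (illustrative)

    def successor_list(ids, key, count):
        # the successor of key plus the count-1 nodes following it on the circle
        ring = sorted(ids)
        start = next((i for i, n in enumerate(ring) if n >= key), 0)
        return [ring[(start + j) % len(ring)] for j in range(count)]

    def fragment_holders(ids, key):
        return successor_list(ids, key, N_FRAGMENTS)

    def misplaced_keys(node, held_keys, ids):
        # keys this node stores fragments for although it is not one of the
        # n designated successors; such fragments should be offered to a
        # correct peer and then deleted locally
        return [k for k in held_keys if node not in fragment_holders(ids, k)]

    nodes = [12, 60, 101, 150, 180, 220, 240]
    print(fragment_holders(nodes, 105))          # [150, 180, 220, 240, 12]
    print(misplaced_keys(60, [105, 30], nodes))  # [105]; key 30's holders include 60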

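The tree-based synchronization can be pictured at the root level: each of the root's 64 children covers one interval of the 160-bit space, and comparing the child labels of two hosts shows which sub-trees still need to be exchanged. The sketch below is a simplification in that labels are computed directly over the sorted keys of an interval rather than over a recursive tree; it only illustrates the interval layout and the comparison step.

    import hashlib

    BITS, FANOUT = 160, 64

    def child_index(key):
        # i such that key falls into [i * 2**160 / 64, (i+1) * 2**160 / 64)
        return key * FANOUT >> BITS

    def child_labels(keys):
        # label of each root child: here simply a hash over the sorted keys it covers
        buckets = {}
        for k in sorted(keys):
            buckets.setdefault(child_index(k), []).append(k)
        return {i: hashlib.sha1(b"".join(k.to_bytes(20, "big") for k in ks)).hexdigest()
                for i, ks in buckets.items()}

    def differing_children(keys_a, keys_b):
        # intervals whose labels differ; only these sub-trees need further exchange
        la, lb = child_labels(keys_a), child_labels(keys_b)
        return sorted(i for i in la.keys() | lb.keys() if la.get(i) != lb.get(i))

    host_a = {int.from_bytes(hashlib.sha1(s).digest(), "big") for s in (b"f1", b"f2", b"f3")}
    host_b = host_a - {min(host_a)}      # host B is missing one fragment
    print(differing_children(host_a, host_b))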
4 AChord File System Interface

A library provides its users with a familiar file-like interface. Its main task is the mapping of file system calls (open/create, read, write, ...) to get and put operations in the distributed hash table. The library also provides for security by authenticating and en-/decrypting data blocks as they leave and enter the local system.

4.1 Files

A file is stored by dividing its contents into blocks of a given size. Before uploading a block, it is encrypted and extended with a message authentication code (MAC) which, on later retrieval, can be used to verify the authenticity and integrity of the block data. We compute a cryptographic hash (SHA-1) over the resulting ciphertext and use it as the key to upload the block into the distributed hash table. Given this key, the block can be retrieved again at a later point. Therefore, we can consider this hash to be the block's address. Schemes like this are generally referred to as Content-Addressable Storage.

The result of uploading a file's contents is a list of block hashes. To present these to the user as a single entity, a mechanism is needed by which it is possible to identify all blocks belonging to a specific file. A fundamental concept in disk-based file systems is the i-node. In [7], Tanenbaum and Bos describe it as listing the attributes and disk addresses of a file's blocks. Substituting DHT keys for disk addresses, this description matches our use as well.

Assuming 8 KiB blocks, we can only store around 400 SHA-1 hashes in a single block (8192 bytes divided by 20 bytes per hash), amounting to just a few megabytes of data. The solution consists of storing the keys of blocks that themselves contain block addresses. This indirection scheme can be extended to yield double (and triple, ...) indirect blocks, which increase the possible file size the system can handle. As an optimization for small files and to reduce unnecessary overhead, the addresses of the first few blocks are stored directly. The higher levels of indirection are only used when the file reaches a certain size.

Taken together with a few pieces of metadata (like the modification time), the first level of block addresses forms the i-node block. It and all the indirection blocks it refers to are encrypted and authenticated in the same way as is done for file data. Since retrieving the i-node block allows one to reach all other parts of a file, the file can be addressed by specifying the key under which this block is stored.
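The file layout just described can be sketched end to end with a dictionary standing in for the DHT. The block size, the number of direct addresses, the metadata fields, and the seal() placeholder (the point where encryption and the MAC would be applied) are assumptions made for this illustration, not the values or code used by AChord.

    import hashlib, os, time

    BLOCK_SIZE = 8192          # 8 KiB blocks (matches the estimate above)
    DIRECT = 12                # number of directly stored addresses (an assumption)

    dht = {}                   # a dict standing in for the put/get interface

    def seal(plaintext):
        # placeholder: the real system encrypts the block and appends a MAC here
        return plaintext

    def put_block(data):
        # content addressing: the SHA-1 of the (sealed) block becomes its key
        sealed = seal(data)
        key = hashlib.sha1(sealed).digest()
        dht[key] = sealed
        return key

    def store_file(contents):
        # upload the contents block by block and return the i-node block's key
        addrs = [put_block(contents[i:i + BLOCK_SIZE])
                 for i in range(0, len(contents), BLOCK_SIZE)]
        direct, rest = addrs[:DIRECT], addrs[DIRECT:]
        indirect = put_block(b"".join(rest)) if rest else b""  # single indirection only
        inode = b"".join([
            int(time.time()).to_bytes(8, "big"),   # a bit of metadata: modification time
            len(contents).to_bytes(8, "big"),      # file size
            indirect.ljust(20, b"\0"),             # address of the indirect block
            b"".join(direct),                      # direct block addresses
        ])
        return put_block(inode)                    # the i-node block is stored the same way

    inode_key = store_file(os.urandom(100_000))    # 13 data blocks: 12 direct + 1 indirect
    print(len(dht), "blocks stored, i-node key", inode_key.hex())

Reading a file back reverses the process: fetch the i-node block by its key, then fetch the blocks named by its direct and indirect addresses.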

4.2 Directories

In the library interface, files are referred to by user-chosen names. These need to be translated to the keys corresponding to the file's i-node block. The conceptual structure defining this mapping is the directory. We implement simple, linear directories strongly reminiscent of those used in the UNIX research system [2, 7], but without the 14-character file name limit. Thus, a directory entry consists of a 20-byte block address (referring to the i-node block) and a length-prefixed name of variable size.

Directory entries can also refer to other directories. A file path of a/b/c is thus resolved by using the directory a to look up the name b, giving a block address that, when used as a DHT key, allows retrieval of the directory called b. The process is repeated for c, yielding the i-node block of the given file.
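A possible encoding of such directory entries, and the path resolution built on top of it, might look as follows. The two-byte length prefix, the dictionary standing in for the DHT, and the fact that directories are stored as plain blocks here (rather than behind i-nodes of their own) are simplifications made for the sake of the sketch.

    import hashlib, struct

    dht = {}                                   # key -> block

    def put_block(data):
        key = hashlib.sha1(data).digest()
        dht[key] = data
        return key

    def pack_dir(entries):
        # entries: {name: 20-byte address}; each entry is address + length + name
        out = b""
        for name, addr in entries.items():
            encoded = name.encode()
            out += addr + struct.pack(">H", len(encoded)) + encoded
        return out

    def unpack_dir(block):
        entries, pos = {}, 0
        while pos < len(block):
            addr = block[pos:pos + 20]
            (length,) = struct.unpack_from(">H", block, pos + 20)
            entries[block[pos + 22:pos + 22 + length].decode()] = addr
            pos += 22 + length
        return entries

    def resolve(root_key, path):
        # walk a path like "a/b/c" from the root directory to an i-node key
        key = root_key
        for component in path.split("/"):
            key = unpack_dir(dht[key])[component]
        return key

    file_inode = put_block(b"pretend this is an i-node block")
    dir_b = put_block(pack_dir({"c": file_inode}))
    dir_a = put_block(pack_dir({"b": dir_b}))
    root = put_block(pack_dir({"a": dir_a}))
    assert resolve(root, "a/b/c") == file_inode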

5 Conclusion

The reworked AChord system provides reliable block storage to distributed programs. Through the use of erasure coding, it improves the availability and durability of stored user data. The redesign was motivated by the desire to make it a useful base for a wider range of applications. This can be considered a success: the new system already provides the underpinnings for a flock of new applications, including a file synchronization program. It has also been adopted by the original instant messaging user base.

The system leans heavily on the prior work on Chord and DHash done at the Massachusetts Institute of Technology (MIT). It should be noted, though, that our implementation has not seen as much testing (especially in the wide area) as the one created at MIT.

The file system interface allows applications to access stored data using a proven programming abstraction. Written data is encrypted and spread across nodes, with the application programmer being able to use a conventional file-like interface.

References

1. Cates, J.: Robust and Efficient Data Management for a Distributed Hash Table. Master's thesis, Massachusetts Institute of Technology (May 2003)
2. Lions, J.: A Commentary on the Sixth Edition UNIX Operating System. Department of Computer Science, The University of New South Wales (1977)
3. Merkle, R.C.: A digital signature based on a conventional encryption function. In: Advances in Cryptology - CRYPTO '87, pp. 369-378. Springer (1988)
4. Rabin, M.O.: Efficient dispersal of information for security, load balancing, and fault tolerance. Journal of the ACM 36(2), 335-348 (1989)
5. National Institute of Standards and Technology: Secure Hash Standard. FIPS PUB 180-1, US Department of Commerce (1995)
6. Stoica, I., Morris, R., Liben-Nowell, D., Karger, D.R., Kaashoek, M.F., Dabek, F., Balakrishnan, H.: Chord: a scalable peer-to-peer lookup protocol for internet applications. IEEE/ACM Transactions on Networking 11(1), 17-32 (2003)
7. Tanenbaum, A.S., Bos, H.: Modern Operating Systems. 4th edn. Prentice Hall Press, Upper Saddle River, NJ, USA (2014)