C²: Adaptive Load Balancing for Metadata Server Cluster in Cloud-scale File Systems


Quanqing Xu¹, Rajesh Vellore Arumugam¹, Khai Leong Yong¹, Yonggang Wen², Yew-Soon Ong²
¹ Data Storage Institute, A*STAR, Singapore {Xu Quanqing, Rajesh VA, YONG Khai Leong}@dsi.a-star.edu.sg
² Nanyang Technological University {ygwen, asysong}@ntu.edu.sg

Abstract. In Cloud-scale file systems, balancing the request workload across a metadata server cluster is critical for avoiding performance bottlenecks and improving quality of service. Many good approaches have been proposed for load balancing in distributed file systems. Some of them focus on global namespace balancing, making the metadata distribution across metadata servers as uniform as possible. However, they do not work well under skewed request distributions, which impair load balancing but simultaneously increase the effectiveness of caching and replication. In this paper, we propose Cloud Cache (C²), an adaptive load balancing scheme for the metadata server cluster in Cloud-scale file systems. It combines an adaptive cache diffusion scheme with an adaptive replication scheme to cope with the request load balancing problem, and it can be integrated into existing distributed metadata management approaches to efficiently improve their load balancing performance. Experimental results from trace-driven simulations demonstrate the efficiency and scalability of C².

1 Introduction

Modern Cloud-scale file systems that store EB-scale (10^18 or 2^60 bytes) data [1, 2] separate file data access from metadata transactions to achieve high performance and scalability. EB-scale data is managed with a distributed file system in a data center to support many computations, e.g., the Large Synoptic Survey Telescope, in which there are more than 10^18 files. Data is stored on a storage cluster of numerous servers directly accessed by clients via the network, while metadata is managed separately by a metadata server (MDS) cluster consisting of a few dedicated servers. The dedicated MDS cluster manages the global namespace and the directory hierarchy of the file system, the mapping from files to objects, and the permissions of files and directories. The MDS cluster allows for concurrent data transfers between large numbers of clients and storage servers, and it must provide efficient metadata service performance under demanding workloads, e.g., thousands of clients updating the same directory or accessing the same file.

Compared to the overall data space, the size of metadata is relatively small, typically 0.1% to 1% of the data space⁴, but it is still large in absolute terms in EB-scale file systems, e.g., 1 PB to 10 PB for 1 EB of data. Besides, 50% to 80% of all file system accesses are to metadata [3]. Therefore, in order to achieve high performance and scalability, the MDS cluster architecture must be carefully designed and implemented to avoid potential bottlenecks caused by metadata requests. To efficiently handle the workload generated by a large number of clients, metadata should be properly partitioned so as to evenly distribute metadata traffic across the MDS cluster. At the same time, to deal with changing workloads, a scalable metadata management mechanism [4] is necessary to provide highly efficient metadata performance for mixed workloads generated by tens of thousands of concurrent clients.

Concurrent accesses from a large number of clients to large-scale distributed storage cause request load imbalance among metadata servers and inefficient use of the metadata cache. Distributed caching is a widely deployed technique to handle request load imbalance and reduce request latency, and it is both orthogonal and complementary to the load balancing technique proposed in [5]. Meanwhile, distributed replication is also able to decrease the retrieval latency of metadata items. Our experience yields two insights: 1) replicas of cached metadata items can balance the request workload, and 2) increasing the number of replicas does help handle bursts of workload. In our previous work [5], we considered the storage load problem by balancing metadata storage, but we did not take the request load problem into account. The goal of request load balancing is to assign tasks to nodes in a distributed system so that all available resources are utilized as uniformly as possible. In distributed systems, the typical solutions to rebalance the request workload are: 1) preemptively migrating a heavily requested item from an overloaded node to an underloaded one, or 2) aborting a request at the overloaded node and transferring it to a different node that holds a replica.

Like other distributed services, distributed metadata management faces the following two questions: 1) How to distribute the workload across metadata servers? and 2) How to reduce the retrieval latency of metadata items? The system performance relies upon the answers to these questions. We have to deal with potentially unpredictable shifts in the request workload, e.g., flash crowds [6] or adversarial access patterns such as a denial-of-service attack [7]. An imbalanced load causes long retrieval latencies of metadata items and impairs the system's overall performance. In EB-scale file systems, the performance of distributed metadata management depends critically on distributing metadata items across MDSs so as to balance the request workload. Unfortunately, the optimal metadata placement is likely to change over time because of workload changes and dynamic system membership. Therefore, it is common to periodically calculate a new assignment of metadata items to MDSs, either on demand or at regular intervals as MDS membership changes occur.

In this paper, we propose an adaptive load balancing approach named C² to solve the above problems. We consider how to find an efficient caching and replication scheme that automatically adapts to changing workloads in EB-scale file systems.
By analyzing a running workload of requests to metadata items, it calculates a new load-balancing plan and then migrates metadata items when their request rates exceed the request capacity of the nodes that maintain them.

⁴ dcslab.hanyang.ac.kr/nvramos8/ethanmiller.pdf

The input to our migration plan consists of an initial state of metadata items on virtual nodes, a given load balancing requirement, and the nodes' request capacities. Our goal is to find a migration plan that moves the metadata from the initial state to a final load-balanced state in the minimum number of rounds. Moreover, the overlay network topology and metadata access information are utilized for metadata replication decisions.

The rest of the paper is organized as follows. Section 2 describes the problem definition. The adaptive cache diffusion mechanism is presented in Section 3. Section 4 introduces the adaptive replication scheme in C². In Section 5 we present performance evaluation results of C². Section 6 describes related work. In Section 7 we conclude this paper.

2 Problem Definition

2.1 Traces Analyzed

We analyze three real traces, as shown in Table 1. Microsoft refers to the Microsoft Windows build server production traces [8] from BuildServer0 to BuildServer7 within 24 hours, and its data size is 223.7GB (including access pattern information). Harvard is a research and email NFS trace used by a large Harvard research group [9], and its data size is 158.6GB (including access pattern information). We implemented a metadata crawler that performs a recursive walk of the file system using stat() to extract file/directory metadata. Using the metadata crawler, the Linux trace was fetched from 22 Linux servers in our data center; it is different from and much bigger than the Linux trace in [5]. Its file system metadata size is 4.53GB, and its data size is 3.5TB.

Table 1: Traces

Trace      # of files   Path metadata   Max. length
Harvard    7,936,19     176M            18
Microsoft  7,725,       M               34
Linux      1,271,66     786M

2.2 Load Balancing

A distributed metadata server cluster must guarantee good load balancing so that it can meet its throughput and latency goals, and both partitioning and replication can be combined to make it scalable. It has to balance two kinds of load: 1) storage load and 2) request load. The storage load is static, since each metadata item requires constant storage capacity on its node. Capacity is typically load-balanced using a hashing-based approach [10]. The request load is dynamic, since it arises from handling user queries. Metadata should be distributed as uniformly as possible among nodes, and no node should have to cope with many more query requests than another node. Although some schemes can balance the utilization of storage space, they do not balance the request load, in which hot spots often occur, i.e., some items are requested more than others.

Many real-world workloads have uneven request distributions. Distributed systems typically balance the request load in the following ways. Some systems dynamically move data from overloaded servers to underloaded servers to make the request load uniform. Others rely upon replication, directing queries to underloaded nodes that hold replicas, which substantially improves load balancing [11].

2.3 DROP with Caching

DROP [5] leverages pathname-based locality-preserving hashing (LpH) for metadata distribution and location, avoiding the overhead of hierarchical directory traversal. To access data, a client hashes the pathname of the file with the same LpH function to locate the MDS that contains the metadata of the file, and then contacts that MDS. This makes metadata access extremely efficient, typically involving a single message to a single MDS. While sacrificing only negligible metadata locality, DROP uses an efficient histogram-based dynamic load balancing mechanism to balance the storage load. We can leverage the namespace locality in keys by caching metadata items within the same domain from lookup results, reducing total metadata lookup traffic. DROP maintains namespace locality in metadata placement, so clients do not need to request metadata from many nodes, and repetitive lookups are avoided because of the lookup cache mechanism. A large amount of locality exists in distributed systems, e.g., file access locality in P2P systems [12], and it is the basis for distributed caching techniques.

The metadata server architecture is shown in Figure 1. DROP is an SSD/NVM-based key-value store, where the key is a pathname and the value is its inode information. C² deals with request load balancing: the lookup cache stores metadata items from recent query results, so future query requests that access keys in cached key ranges entirely bypass the lookup step. Clients could also use a lookup cache with DirHash or FileHash, which randomly distribute directories or files across metadata servers according to their pathnames, but it would be less effective since future queries may not request keys in recently accessed key ranges. Cache entries may become stale because of dynamic system membership. DROP falls back to a normal lookup when a metadata item is not found. A stale cache entry does not affect correctness, but it impairs retrieval latency. When a file/directory is updated, DROP is responsible for inserting new versions of its metadata along the entire path to the root. This ensures that each read has a consistent view of the metadata, and it implies that each write must update all the metadata along the full path. When writing temporary files, DROP avoids this overhead with a t-second write-back cache, which also serves as a buffer. Due to this buffer, multiple reads of the same metadata occurring within a t-second window only require it to be retrieved once. Metadata items seen by clients may be stale by up to t seconds because of this cache, but incomplete writes will never be seen.

Fig. 1: Metadata Server Architecture: the system interface, Cloud Cache (C²) with its replication engine, lookup cache and failover policy for request load balancing, and the SSD/NVM-based key-value store (DROP) with a locality-preserving hashing ring and dynamic load balancing for storage load balancing.

2.4 Problem Formulation

Given a set of nodes S = {S_i, i = 1, ..., n}, each storing a subset of the metadata items D = {D_j, j = 1, ..., m}, and a specified set of move operations, each of which specifies which item needs to be moved from one node to another, the question we face and address is how to schedule these move operations. For each metadata item d, there is a subset of source MDSs S_d and target MDSs T_d. In the beginning, only the MDSs in S_d have metadata item d, and all the MDSs in T_d want to receive it. An MDS in T_d becomes a source of item d after it receives item d. Our goal is to find a metadata migration plan using the minimum number of rounds, under the constraint that in each round an MDS takes part in the transfer of only one item, either as a sender or a receiver. This is an NP-hard problem [13].

There are a set of nodes S and a set of metadata items D. Initially, each MDS stores a subset of items. A transfer graph G = (V, E) is built, in which each node represents a virtual node and an edge e = (u, v) represents a metadata item to be moved from node u to node v. Over time, metadata items may be moved to another MDS for load balancing. Note that the transfer graph can be a multi-graph, with multiple edges between two nodes, when multiple metadata items are moved from one node to another.

There are two situations: 1) the request load of an item is smaller than a given request load threshold l_t for a node, and 2) the request load of an item is larger than l_t for a node. The Microsoft trace shows that the hottest file accounts for over 2.5% of total requests, and the combined CDF of the hottest 125 files is close to 90% [8]. This tells us that the hottest file is much more popular than any file outside the hottest 125. Suppose that there are 20 metadata servers, each of which has five virtual nodes, so there are 100 virtual nodes in total. Any virtual node that maintains the hottest file will be overloaded. For the first situation, we can use the adaptive cache diffusion discussed in Section 3, while for the second one, we can use the adaptive replication scheme described in Section 4.
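
As a concrete illustration of the migration plan defined above, the sketch below greedily packs the edges of the transfer multi-graph into rounds so that each virtual node sends or receives at most one item per round. It is an illustrative heuristic under that single constraint, not the exact planner used by C² (which, as noted, faces an NP-hard problem); the node names and item labels are hypothetical.

```python
def schedule_migration_rounds(transfers):
    """Greedily pack item transfers into rounds so that each virtual node takes
    part in at most one transfer per round, either as sender or receiver.

    transfers: list of (source, target, item) edges of the transfer multi-graph.
    Returns a list of rounds; the transfers in one round run in parallel.
    Illustrative heuristic only; optimal round scheduling is NP-hard [13].
    """
    remaining = list(transfers)
    rounds = []
    while remaining:
        busy = set()                 # nodes already transferring in this round
        this_round, deferred = [], []
        for src, dst, item in remaining:
            if src not in busy and dst not in busy:
                this_round.append((src, dst, item))
                busy.update((src, dst))
            else:
                deferred.append((src, dst, item))
        rounds.append(this_round)
        remaining = deferred
    return rounds

# Hypothetical example: two overloaded nodes shedding hot items to two underloaded ones.
plan = schedule_migration_rounds([
    ("A1", "B1", "b"), ("A1", "B1", "d"), ("D1", "B1", "j"), ("D1", "C1", "l"),
])
for i, rnd in enumerate(plan, 1):
    print(f"round {i}: {rnd}")
```

In this toy example the four transfers need three rounds, because B1 can receive only one item per round; a plan that spreads targets across more underloaded nodes converges in fewer rounds.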

3 Adaptive Cache Diffusion

We first present an adaptive cache diffusion approach that leads to low migration overhead and fast convergence. Load-stealing and load-shedding are used to achieve this goal. Cache space is used for retrieval operations of DROP, in which a cached metadata item is placed at a virtual node to accelerate subsequent retrievals. It might be replaced via LRU soon after it is created.

3.1 System Model

A physical metadata server might host a set of virtual nodes N = {n_1, n_2, ..., n_d} with a set of loads L = {l_1, l_2, ..., l_d}. Load is applied to metadata servers via their virtual nodes, i.e., metadata server S has load L_S = Σ_{i=1}^{d} l_i. An MDS is said to be load-balanced when it satisfies Definition 1, i.e., the largest load is less than t² times the smallest load in the DROP system. According to Definition 1, an MDS has an upper target L_u = t·L̄ and a lower target L_l = L̄/t. If an MDS finds itself receiving more load than L_u, it considers itself overloaded. Conversely, it considers itself underloaded if it receives less load than L_l. MDSs may want to operate below their capacities to prevent variations in workload from causing temporary overload.

Definition 1 (MDS i is load-balanced). MDS i is load-balanced if its load satisfies 1/t ≤ L_i/L̄ ≤ t (t ≥ 2), where L̄ is the average load.

File popularity [9] follows Zipf request distributions. The Zipf property of file access patterns is a basic fact of nature: a small number of objects are greatly popular, but there is a long tail of unpopular requests. In a Zipf workload, destinations are ranked by popularity. Zipf's law states that the popularity of the i-th most popular object is proportional to i^(-α), in which α is the Zipf coefficient. Usually, Zipf distributions look linear when plotted on a log-log scale. Figure 2 shows the popularity distribution of file/directory metadata items in the Microsoft and Harvard traces. Like the Internet, the metadata request distribution observed in both traces also follows a Zipf distribution.

Fig. 2: Read and write distribution: (a) Microsoft Windows trace, (b) Harvard trace.
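
To make Definition 1 and the Zipf skew concrete, the sketch below generates a Zipf-distributed request stream, spreads it over 100 virtual nodes (e.g., 20 MDSs with five virtual nodes each) with a naive modulo placement, and flags nodes whose load ratio L_i/L̄ falls outside [1/t, t]. All parameters (10,000 items, 100,000 requests, α = 1.2) are illustrative; this is not the simulator used in Section 5.

```python
import numpy as np

def zipf_popularity(num_items, alpha=1.2):
    """Zipf's law: the i-th most popular item has weight proportional to i^(-alpha)."""
    ranks = np.arange(1, num_items + 1)
    weights = ranks ** (-alpha)
    return weights / weights.sum()

def is_balanced(load, mean_load, t=2.0):
    """Definition 1: a node is load-balanced if 1/t <= L_i / L_bar <= t."""
    ratio = load / mean_load
    return (1.0 / t) <= ratio <= t

rng = np.random.default_rng(0)
num_items, num_nodes = 10_000, 100
popularity = zipf_popularity(num_items)
requests = rng.choice(num_items, size=100_000, p=popularity)

# Naive placement: item -> virtual node by modulo; the node holding the hottest
# item receives a large fraction of all requests and violates Definition 1.
loads = np.bincount(requests % num_nodes, minlength=num_nodes)
mean_load = loads.mean()
imbalanced = [n for n in range(num_nodes) if not is_balanced(loads[n], mean_load)]
print("mean load:", mean_load, "nodes violating Definition 1:", imbalanced)
```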

3.2 Load Shedding

Load-shedding means that an overloaded node attempts to offload requests to one or more underloaded ones, and it is well suited to the DROP MDS cluster. An overloaded node n_1 transfers an item x to another node n_2, and simultaneously creates a redirection pointer to n_2. The item x can also be replicated at n_2, increasing redundancy and allowing n_1 to control how much load is shed. In Section 4, we will explain how to effectively place multiple replicas using a multiple-choice scheme.

There are m metadata items on a node, with a tuple of loads l_1, l_2, ..., l_m and a tuple of probabilities p_1, p_2, ..., p_m. When this node has a cache of size c > 0, the c most frequently requested items all hit the cache of this node, with two tuples of positive numbers l_1, l_2, ..., l_c and p_1, p_2, ..., p_c respectively. Let L' = t·L̄; this node is overloaded if Σ_{i=1}^{c} l_i > L'. Deciding which items to keep and which to reassign to other nodes, so as to minimize metadata migration from this node, can therefore be formulated as a 0-1 Knapsack Problem, which is NP-hard:

maximize  z = Σ_{i=1}^{c} p_i x_i                        (1a)
s.t.      Σ_{i=1}^{c} l_i x_i ≤ L'                       (1b)
          x_i ∈ {0, 1},  i ∈ {1, 2, ..., c}              (1c)

Constraint (1b) ensures that the total load of the metadata items kept on this node does not exceed L'. Constraint (1c) states that each item i is either kept (x_i = 1) or not (x_i = 0).

3.3 Load Stealing

Load-stealing means that an underloaded node n_1 seeks out load to take from one or more overloaded nodes. The load-stealing node finds such a node n_2 and makes a replica of an item x held by n_2, which creates a redirection pointer to n_1 for the item x. A natural idea is to have n_1 attempt to steal metadata items for which n_1 already has a redirection pointer. A metadata item can be placed using multiple choices, and it is associated with one of its r hash locations, which is further explained in Section 4. There are a number of candidate metadata items from other nodes, with two tuples of positive numbers l'_1, l'_2, ..., l'_{c'} and p'_1, p'_2, ..., p'_{c'} respectively. If Σ_{i=1}^{c} l_i < L', this node is load-balanced and can take on some items using its spare cache space of L' − L_e. Determining which items to take from overloaded nodes so as to maximize the cache utilization of this node can likewise be formulated as a 0-1 Knapsack Problem:

maximize  z = Σ_{i=1}^{c'} p'_i x_i                      (2a)
s.t.      Σ_{i=1}^{c'} l'_i x_i ≤ L' − L_e               (2b)
          x_i ∈ {0, 1},  i ∈ {1, 2, ..., c'}             (2c)

where

L_e = Σ_{i=1}^{c} l_i                                    (3)

is the existing load of this node.
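
For concreteness, here is a standard dynamic-programming solution to the 0-1 knapsack that both load shedding and load stealing rely on. The item loads, probabilities and the capacity (a stand-in for L' or L' − L_e) are made up, and the loads are assumed to be integers, so this is a sketch of the decision step rather than C²'s actual implementation.

```python
def knapsack_01(loads, probs, capacity):
    """0-1 knapsack: pick the subset of items to keep (load shedding) or to take
    on (load stealing) so that their total load fits within `capacity` while
    their total request probability is maximized. Loads are non-negative ints."""
    n = len(loads)
    best = [0.0] * (capacity + 1)
    keep = [[False] * (capacity + 1) for _ in range(n)]
    for i in range(n):
        for c in range(capacity, loads[i] - 1, -1):
            cand = best[c - loads[i]] + probs[i]
            if cand > best[c]:
                best[c] = cand
                keep[i][c] = True
    kept, c = [], capacity
    for i in range(n - 1, -1, -1):      # back-track to recover the chosen set
        if keep[i][c]:
            kept.append(i)
            c -= loads[i]
    return sorted(kept)

# Load shedding on an overloaded node with capacity L' = 10 (hypothetical values):
item_loads = [6, 4, 3, 2]               # per-item request load
item_probs = [0.40, 0.25, 0.20, 0.15]   # per-item share of requests
kept = knapsack_01(item_loads, item_probs, capacity=10)
migrate = [i for i in range(len(item_loads)) if i not in kept]
print("keep items", kept, "-> migrate items", migrate)   # keep [0, 1], migrate [2, 3]
```

Load stealing reuses the same routine with the candidate items from overloaded nodes and the spare capacity L' − L_e as the knapsack capacity.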

Fig. 3: Cache Diffusion, (a) before and (b) after. Each metadata server has only one virtual node for illustration.

3.4 Traffic Control

During load balancing, a metadata item may be migrated multiple times. DROP uses metadata pointers to minimize the metadata migration overhead. For a metadata pointer, a node retrieves the metadata only when it has held the pointer for longer than the pointer's stabilization time. Using metadata pointers only temporarily hurts metadata locality while the load is being balanced. Besides reducing load balancing overhead, pointers can also make writes succeed even when the target node is at capacity, since they can be used to divert metadata items from heavy nodes to light nodes. The node at full capacity will eventually shed some load when balancing the load, causing only temporary additional indirection. Suppose that a node X is heavily loaded, and a node Y takes some items of X to reduce some of X's load. Now X must transfer some of its metadata items to Y. Instead of having X immediately shed some of its metadata items to Y, Y initially maintains metadata pointers to X. Later Y transfers the pointers to Z, and Z ultimately retrieves the actual metadata from X and deletes the pointers.

Figure 3 gives an example of cache diffusion. Before cache diffusion there is load imbalance, as shown in Figure 3(a): there are four nodes, where A1 and D1 are overloaded, while B1 and C1 are underloaded. After running our cache diffusion approach, we obtain good load balancing, as shown in Figure 3(b), where the loads of the four nodes all lie in [L̄/t, t·L̄] (t = 2). The migrated items are found via the routing tables of the nodes that are responsible for the items.

4 Adaptive Replication Scheme

We propose a novel metadata replication mechanism to further balance the request workload by placing multiple replicas of popular metadata items on different nodes. In DROP, a ZooKeeper-based linearizable consistency mechanism proposed in [14] keeps metadata consistent among MDSs.

4.1 Random Node Selection

We first present an effective random node selection strategy to achieve coarse load balancing.

Let h denote a hash function that maps virtual nodes onto the ring, and let H = {h_1, h_2, ..., h_k} denote a set of hash functions mapping metadata items onto the ring. The number of replicas r is calculated as r = f/θ, where f is the access frequency of a metadata item and θ is a given threshold. An item x is inserted as a primary replica using the hash function h, and its r − 1 replicas are placed on nodes selected using the k hash functions. Lookups are initiated to find the nodes associated with each of these k hash values by calculating h_1(x), h_2(x), ..., h_k(x). According to the mapping given by h, the k lookups can be executed in parallel to find the virtual nodes n_1, n_2, ..., n_k in charge of these hash values. After querying the loads of these nodes, the underloaded ones are chosen. To decrease the overhead of searching for additional nodes, redirection pointers are used. In addition to storing replicas at the r − 1 underloaded nodes N_u^{r−1}, the other candidate nodes {S − N_u^{r−1}} store a pointer x → N_u^{r−1}. To search for the item x, a single query is performed by choosing a hash function h_j at random in an effort to locate one of the nodes in N_u^{r−1}. If n_j does not have x, n_j forwards the query request using its pointer x → N_u^{r−1}. Query requests therefore take at most one extra step, which is needed with probability (k − r + 1)/k if h_j is chosen uniformly at random from the k choices. This incurs the overhead of maintaining the additional pointers, but the cost of storing actual items and any associated computation dominates that of the stored pointers. In addition, we need to decide how to select r − 1 nodes from N_u on which to place x's replicas.

4.2 Topology-aware Replica Placement

To select the r − 1 underloaded nodes N_u^{r−1}, we first consider the network topological characteristics of nodes so that we place the replicas of an item on nodes topologically adjacent to the node in charge of the item in DROP. In this way, we reduce network bandwidth consumption and query latency while achieving better load balancing. We employ an effective topology-aware replica placement scheme by introducing a technique that discovers the topological information of nodes. The key is how to represent and maintain the network topology information so that the topologically close nodes are easily discovered for a given node. The distributed binning scheme [15] is a simple approach for this purpose. For example, there is a topology table for node 7, as shown in Figure 4. The landmark ordering information is employed as part of the node identification information. Three landmark nodes L_1, L_2 and L_3 are used, and the link latencies from node 7 to the three landmark nodes fall within [0, 2), [2, 8), and greater than 8 ms, respectively. Nodes with the same or similar ordering information are topologically close, e.g., node 7:11 is topologically closer to node 26:12 than to node 124:212, meaning that the link latency to node 26:12 is much smaller than to node 124:212. For each entry in the table, the first item is the order information, and the second consists of several records, each of which includes a node ID and its workload. An entry with a tuple [o, (id, w), ] records the workloads that came from the nodes with the common order information o in the past given period.
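
A minimal sketch of this topology-aware selection is shown below. It bins measured landmark latencies into the order string described above, and then ranks candidate nodes by how many landmark bins they share with the node in charge of the item, breaking ties by lower load. The bin boundaries, node IDs, latencies and loads are all invented for illustration; the real scheme follows the distributed binning technique of [15] rather than this exact code.

```python
# Illustrative latency bins (ms), assuming three landmark nodes as in the example.
BINS = [2, 8]

def latency_bin(latency_ms):
    """Map a landmark latency to its bin index: 0 for [0, 2), 1 for [2, 8), 2 otherwise."""
    for i, bound in enumerate(BINS):
        if latency_ms < bound:
            return i
    return len(BINS)

def landmark_order(latencies):
    """Order string for a node, e.g. latencies (1, 5, 12) -> '012'."""
    return "".join(str(latency_bin(l)) for l in latencies)

def closeness(order_a, order_b):
    """Topological proximity: number of landmark positions whose bins agree."""
    return sum(a == b for a, b in zip(order_a, order_b))

def pick_replica_nodes(home_order, candidates, r_minus_1):
    """Pick r-1 nodes, preferring topologically close and lightly loaded candidates."""
    ranked = sorted(candidates,
                    key=lambda n: (-closeness(home_order, n["order"]), n["load"]))
    return [n["id"] for n in ranked[:r_minus_1]]

# Hypothetical candidate nodes (IDs, latencies and loads are made up).
candidates = [
    {"id": 125, "order": landmark_order((1, 4, 10)), "load": 120},
    {"id": 256, "order": landmark_order((1, 6, 15)), "load": 200},
    {"id": 124, "order": landmark_order((12, 1, 3)), "load": 60},
]
home = landmark_order((1, 5, 12))                     # node in charge of the hot item
print(pick_replica_nodes(home, candidates, r_minus_1=2))   # -> [125, 256]
```
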
To choose a node with adequate workload capacity to store a replica, the node n_1 in charge of a popular item contacts those candidate nodes by sending a message. The nodes selected to store the replicas reply with their order information and estimated workload. Meanwhile, direct links to the replicas are created on node n_1.

Fig. 4: A sample topology table on node 7:11, with entries of the form (MetadataId, Order, (NodeID, Workload)). When three replicas of MetadataId 635 with access count 45 are to be placed in DROP, the chosen nodes are 125, 256 and 558.

4.3 Directory-based Replica Diffusion

We present an efficient directory-based replica diffusion technique in DROP. A replica, as a copy of a cached metadata item, is placed in the DROP overlay by its insertion operation. DROP stores directories of pointers to metadata item replicas that are stored on virtual nodes, but their locations are not related to the structure of the locality-preserving hashing. When a node has a metadata item whose request count exceeds a given threshold, it creates a directory for the item, chooses r − 1 virtual nodes with the topology-aware replica placement scheme, stores the item replicas at those nodes, and records them in its directory. When the directory receives a request for the item, it returns directory entries pointing to the individual replicas of the item in a single response message. The directory node monitors the request rate for the item to determine whether a new replica should be created. When the request rate reaches a given threshold, the directory node creates a new replica and updates the list of pointers to replicas of the item. In chain-based replica diffusion, by contrast, the r replicas of an item are placed on its primary node and its r − 1 followers. In both replica diffusion techniques, a node has to serve a request if it holds a replica of the requested metadata item. In chain-based replica diffusion, a node pushes out a replica of the item one overlay hop closer to the source node of the last request if the request rate has exceeded its capacity; this also offloads some of the demand onto more nodes that serve requests. Compared to chain-based replica diffusion, the directory-based approach has three advantages: 1) faster replica transmission, 2) higher query parallelism, and 3) better load balancing, because the r − 1 nodes are chosen with k random hash functions.

5 Performance Evaluation

In this section, we evaluate the performance of C² using one synthetic workload-based simulation and two detailed trace-driven simulations.

We have developed a detailed event-driven simulator to validate and evaluate our design decisions and choices. First, we empirically evaluate the convergence rate of C². Second, we measure the metadata migration overhead of C². Lastly, we measure the scalability of our adaptive replication scheme. We define the Load Factor as Load Factor = Max. Load / Min. Load. Each MDS has five virtual nodes, and the Linux trace follows a Zipf distribution with α = 1.2. All simulation experiments are conducted on a Linux server with four dual-core AMD Opteron(TM) 2.6GHz processors and 8.0GB of RAM, running 64-bit Ubuntu. All experiments are repeated three times, and average results are reported.

5.1 Convergence Rate

In this section, we measure the convergence rate of the three approaches with C² on the three traces. Convergence rate is critically important in distributed systems. It includes two metrics: 1) the number of rounds, which measures how many rounds are needed to reach load balancing, and 2) the time cost, which measures how quickly load balancing is achieved. Figure 5 depicts the number of rounds on the three traces for the metadata server cluster when using all three methods with C². At most five rounds are needed to converge to load balancing on the Linux trace, and at most four rounds on both the Microsoft trace and the Harvard trace. This is because the Linux trace has more metadata items with high access frequencies than the Microsoft and Harvard traces, so the system takes longer to reach load balancing on the Linux trace than on the other two. As shown in Figure 5(b) and Figure 5(c), the system is already in a state of load balancing before running DirHash with C² or FileHash with C² in the cluster of ten MDSs.

Fig. 5: Number of Rounds with Varying the Number of Metadata Servers: (a) Linux trace, (b) Microsoft Windows trace, (c) Harvard trace.

Figure 6 shows how long it takes for the three approaches with C² to reach load balancing. Figure 6(b) illustrates that DROP with C² has a much longer time cost than the other two approaches because the Microsoft trace has only three first-level directories and more pronounced locality than the other two traces. Figure 6(c) shows that DROP with C² is close to the other two approaches in time cost because the Harvard trace has the most first-level directories among the three traces and the worst locality among them. Figure 6(a) demonstrates that DROP with C² has a somewhat longer time cost than the other two approaches. This is because the Linux trace has more first-level directories than the Microsoft trace and far fewer than the Harvard trace, so its locality is worse than that of the Microsoft trace and better than that of the Harvard trace.

Figures 5 and 6 illustrate that the deployed techniques are highly efficient.

Fig. 6: Time Cost (seconds) with Varying the Number of Metadata Servers: (a) Linux trace, (b) Microsoft Windows trace, (c) Harvard trace.

5.2 Migration Overhead

As file and directory metadata items are accessed more or less frequently, the request workload distribution in the system changes, and the system may have to migrate cached metadata to maintain request load balancing. Figure 7 shows that the metadata migration overhead scales well. We perform this experiment as follows. Due to skewed query requests, the MDSs in the DROP system are not in a satisfactory load-balancing state at the beginning. Metadata items in the Microsoft and Harvard traces are accessed according to their real-world historical access information, while those in the Linux trace are requested according to the Zipf-like distribution.

Fig. 7: Migration Overhead with Varying the Number of Metadata Servers: (a) Linux trace, (b) Microsoft Windows trace, (c) Harvard trace.

During this period, the system repeatedly falls out of load balance; whenever that happens, it runs C² to bring itself back to a good load-balancing state. Figure 8 demonstrates that all three methods with C² bring the system to good load balancing. We investigate how many metadata items are migrated from the beginning to the end. Figure 7(a) shows that the three methods cause 24.67%, 17.23% and 16.1% of items to be migrated on average, respectively, and Figure 7(b) illustrates that they cause 27.68%, 21.13% and 23.59% of items to be migrated on average, respectively. Note that Figure 7(c) demonstrates that DirHash with C² causes more items to be migrated than the other two approaches, because it achieves better load balancing than them on the Harvard trace, as shown in Figure 8(c).

As we showed earlier for the convergence rate, C² also tries to reduce the metadata migration overhead at each step by making decisions based on the 0-1 knapsack formulations used in load shedding and load stealing, as shown in Sections 3.2 and 3.3.

Fig. 8: Load Balancing (Load Factor) with Varying the Number of Metadata Servers: (a) Linux trace, (b) Microsoft Windows trace, (c) Harvard trace.

5.3 Replication Overhead

In this section, we examine how many replicas are necessary for heavily requested metadata items to maintain good load balancing. Note that we do not count the main replica. Figure 9 shows that our adaptive replication scheme scales well with different numbers of metadata servers. When running the scheme on the Microsoft trace, the number of replicas varies only slightly as the MDS cluster size increases, with a maximum value of 2.96 and a minimum value of 1.0. When running the scheme on the other two traces, the number of replicas rises somewhat more noticeably as the MDS cluster size increases, but the scalability is still excellent on both traces. The maximum numbers of replicas are 7.6 and 6.68, and the minimum numbers are 3.8 and 3.17, on the Harvard trace and the Linux trace respectively.

Fig. 9: Replicas for frequently accessed metadata items (Harvard, Microsoft and Linux traces).

6 Related Work

In recent years, many load balancing schemes have been proposed for distributed metadata organization and management.

Online Migration. The virtual node approach [10] was proposed to cope with the imbalance of the key distribution caused by the hash function. A number of virtual nodes with random IDs are generated within a physical server, thereby reducing the load imbalance. However, the use of virtual nodes greatly increases the amount of routing metadata in each server, causing more maintenance overhead and increasing the number of hops per lookup. In addition, it does not take item popularity into account. In contrast, the dynamic ID approach uses only a single ID per server [16]. The load of a server can be adjusted by assigning it a more suitable ID in the namespace. However, this solution requires IDs to be reassigned to maintain load balancing, resulting in high overhead due to transferring items and updating overlay links. Our motivation for studying the online migration problem lies in how to efficiently migrate metadata within an MDS cluster in large-scale storage systems.

Caching and Replication. Hot spots are handled with caches that store popular items in the network, and query requests are resolved whenever cache hits occur along the lookup path. Solutions addressing the uneven popularity of objects are based on caching and replication. Path replication replicates objects on all nodes along the full lookup path, e.g., DHash [17] replicates objects on k successors with caching along the lookup path. In the k-choice load balancing approach [11], multiple hashes are employed to generate a set of IDs for a node, and one of the IDs is chosen at join time to minimize the differences between capacity and load for itself and the other nodes affected by its join. Unfortunately, the last several hops of a lookup are precisely the ones that can least be optimized [18]. Furthermore, a fixed number of replicas does not work well since the request load is dynamic: resources may be wasted if the number is set too high, while the replicas may not be enough to support a high request load if it is set too low. Our replication-based solution is similar to the k-choice approach, but with a flexible number of replicas and a topology-aware replica placement strategy.

7 Conclusions

In this paper, we present an adaptive load balancing approach named C² that handles request load balancing for the metadata server cluster in Cloud-scale file systems. C² exploits the tension between load balancing on the one hand and caching and replication on the other, i.e., skewed request distributions impair load balancing but simultaneously raise the effectiveness of caching and replication. The cache serves the most popular items, ensuring that the nodes maintaining them do not become performance bottlenecks, and multiple hash functions are exploited to place multiple replicas, thereby balancing the load caused by the most frequently accessed items. Our approach enables the system to achieve good load balancing even when the query request workload is heavily skewed. Extensive simulation results show significant improvements in maintaining a more balanced distributed metadata management system, leading to excellent scalability and performance.

Acknowledgement

The authors would like to thank Garth Gibson from Carnegie Mellon University and Jun Wang from the University of Central Florida for their help. This work is supported by the A*STAR Thematic Strategic Research Programme (TSRP) Grant.

References

1. Raicu, I., Foster, I.T., Beckman, P.: Making a case for distributed file systems at Exascale. In: LSAP (2011)
2. Amer, A., Long, D., Schwarz, T.: Reliability Challenges for Storing Exabytes. In: International Conference on Computing, Networking and Communications (ICNC), CNC Workshop (2014)
3. Ousterhout, J.K., Costa, H.D., Harrison, D., Kunze, J.A., Kupfer, M.D., Thompson, J.G.: A Trace-Driven Analysis of the UNIX 4.2 BSD File System. In: SOSP (1985)
4. Hua, Y., Zhu, Y., Jiang, H., Feng, D., Tian, L.: Supporting Scalable and Adaptive Metadata Management in Ultralarge-Scale File Systems. IEEE Trans. Parallel Distrib. Syst. 22(4) (2011)
5. Xu, Q., Arumugam, R.V., Yong, K.L., Mahadevan, S.: DROP: Facilitating distributed metadata management in EB-scale storage systems. In: MSST (2013)
6. Wendell, P., Freedman, M.J.: Going viral: flash crowds in an open CDN. In: Internet Measurement Conference (2011)
7. Fan, B., Lim, H., Andersen, D.G., Kaminsky, M.: Small cache, big effect: provable load balancing for randomly partitioned cluster services. In: SoCC (2011)
8. Kavalanekar, S., Worthington, B.L., Zhang, Q., Sharda, V.: Characterization of storage workload traces from production Windows Servers. In: IISWC (2008)
9. Ellard, D., Ledlie, J., Malkani, P., Seltzer, M.I.: Passive NFS Tracing of Email and Research Workloads. In: FAST (2003)
10. Stoica, I., Morris, R., Karger, D.R., Kaashoek, M.F., Balakrishnan, H.: Chord: A scalable peer-to-peer lookup service for internet applications. In: SIGCOMM (2001)
11. Ledlie, J., Seltzer, M.I.: Distributed, secure load balancing with skew, heterogeneity and churn. In: INFOCOM (2005)
12. Gummadi, P.K., Dunn, R.J., Saroiu, S., Gribble, S.D., Levy, H.M., Zahorjan, J.: Measurement, modeling, and analysis of a peer-to-peer file-sharing workload. In: SOSP (2003)
13. Khuller, S., Kim, Y.A., Wan, Y.C.J.: Algorithms for data migration with cloning. In: PODS (2003)
14. Xu, Q., Arumugam, R., Yong, K.L., Mahadevan, S.: Efficient and Scalable Metadata Management in EB-scale File Systems. IEEE Transactions on Parallel and Distributed Systems 99(PrePrints) (2013)
15. Ratnasamy, S., Handley, M., Karp, R.M., Shenker, S.: Topologically-Aware Overlay Construction and Server Selection. In: INFOCOM (2002)
16. Naor, M., Wieder, U.: Novel architectures for P2P applications: The continuous-discrete approach. ACM Transactions on Algorithms 3(3) (2007)
17. Dabek, F., Kaashoek, M.F., Karger, D.R., Morris, R., Stoica, I.: Wide-Area Cooperative Storage with CFS. In: SOSP (2001)
18. Gopalakrishnan, V., Silaghi, B.D., Bhattacharjee, B., Keleher, P.J.: Adaptive Replication in Peer-to-Peer Systems. In: ICDCS (2004)


Peer-to-Peer Systems. Chapter General Characteristics

Peer-to-Peer Systems. Chapter General Characteristics Chapter 2 Peer-to-Peer Systems Abstract In this chapter, a basic overview is given of P2P systems, architectures, and search strategies in P2P systems. More specific concepts that are outlined include

More information

Understanding Chord Performance

Understanding Chord Performance CS68 Course Project Understanding Chord Performance and Topology-aware Overlay Construction for Chord Li Zhuang(zl@cs), Feng Zhou(zf@cs) Abstract We studied performance of the Chord scalable lookup system

More information

SOLVING LOAD REBALANCING FOR DISTRIBUTED FILE SYSTEM IN CLOUD

SOLVING LOAD REBALANCING FOR DISTRIBUTED FILE SYSTEM IN CLOUD 1 SHAIK SHAHEENA, 2 SD. AFZAL AHMAD, 3 DR.PRAVEEN SHAM 1 PG SCHOLAR,CSE(CN), QUBA ENGINEERING COLLEGE & TECHNOLOGY, NELLORE 2 ASSOCIATE PROFESSOR, CSE, QUBA ENGINEERING COLLEGE & TECHNOLOGY, NELLORE 3

More information

Replication, Load Balancing and Efficient Range Query Processing in DHTs

Replication, Load Balancing and Efficient Range Query Processing in DHTs Replication, Load Balancing and Efficient Range Query Processing in DHTs Theoni Pitoura, Nikos Ntarmos, and Peter Triantafillou R.A. Computer Technology Institute and Computer Engineering & Informatics

More information

Exploiting Communities for Enhancing Lookup Performance in Structured P2P Systems

Exploiting Communities for Enhancing Lookup Performance in Structured P2P Systems Exploiting Communities for Enhancing Lookup Performance in Structured P2P Systems H. M. N. Dilum Bandara and Anura P. Jayasumana Colorado State University Anura.Jayasumana@ColoState.edu Contribution Community-aware

More information

On Smart Query Routing: For Distributed Graph Querying with Decoupled Storage

On Smart Query Routing: For Distributed Graph Querying with Decoupled Storage On Smart Query Routing: For Distributed Graph Querying with Decoupled Storage Arijit Khan Nanyang Technological University (NTU), Singapore Gustavo Segovia ETH Zurich, Switzerland Donald Kossmann Microsoft

More information

A Peer-to-Peer Architecture to Enable Versatile Lookup System Design

A Peer-to-Peer Architecture to Enable Versatile Lookup System Design A Peer-to-Peer Architecture to Enable Versatile Lookup System Design Vivek Sawant Jasleen Kaur University of North Carolina at Chapel Hill, Chapel Hill, NC, USA vivek, jasleen @cs.unc.edu Abstract The

More information

March 10, Distributed Hash-based Lookup. for Peer-to-Peer Systems. Sandeep Shelke Shrirang Shirodkar MTech I CSE

March 10, Distributed Hash-based Lookup. for Peer-to-Peer Systems. Sandeep Shelke Shrirang Shirodkar MTech I CSE for for March 10, 2006 Agenda for Peer-to-Peer Sytems Initial approaches to Their Limitations CAN - Applications of CAN Design Details Benefits for Distributed and a decentralized architecture No centralized

More information

Content Overlays (continued) Nick Feamster CS 7260 March 26, 2007

Content Overlays (continued) Nick Feamster CS 7260 March 26, 2007 Content Overlays (continued) Nick Feamster CS 7260 March 26, 2007 Administrivia Quiz date Remaining lectures Interim report PS 3 Out Friday, 1-2 problems 2 Structured vs. Unstructured Overlays Structured

More information

Virtual Multi-homing: On the Feasibility of Combining Overlay Routing with BGP Routing

Virtual Multi-homing: On the Feasibility of Combining Overlay Routing with BGP Routing Virtual Multi-homing: On the Feasibility of Combining Overlay Routing with BGP Routing Zhi Li, Prasant Mohapatra, and Chen-Nee Chuah University of California, Davis, CA 95616, USA {lizhi, prasant}@cs.ucdavis.edu,

More information

Finding Data in the Cloud using Distributed Hash Tables (Chord) IBM Haifa Research Storage Systems

Finding Data in the Cloud using Distributed Hash Tables (Chord) IBM Haifa Research Storage Systems Finding Data in the Cloud using Distributed Hash Tables (Chord) IBM Haifa Research Storage Systems 1 Motivation from the File Systems World The App needs to know the path /home/user/my pictures/ The Filesystem

More information

Proactive Caching for Better than Single-Hop Lookup Performance

Proactive Caching for Better than Single-Hop Lookup Performance Proactive Caching for Better than Single-Hop Lookup Performance Venugopalan Ramasubramanian and Emin Gün Sirer Cornell University, Ithaca NY 4853 ramasv, egs @cs.cornell.edu Abstract High lookup latencies

More information

The Google File System

The Google File System The Google File System Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung December 2003 ACM symposium on Operating systems principles Publisher: ACM Nov. 26, 2008 OUTLINE INTRODUCTION DESIGN OVERVIEW

More information

EAD: An Efficient and Adaptive Decentralized File Replication Algorithm in P2P File Sharing Systems

EAD: An Efficient and Adaptive Decentralized File Replication Algorithm in P2P File Sharing Systems EAD: An Efficient and Adaptive Decentralized File Replication Algorithm in P2P File Sharing Systems Haiying Shen Department of Computer Science and Computer Engineering University of Arkansas, Fayetteville,

More information

Web Caching and Content Delivery

Web Caching and Content Delivery Web Caching and Content Delivery Caching for a Better Web Performance is a major concern in the Web Proxy caching is the most widely used method to improve Web performance Duplicate requests to the same

More information

Small-World Overlay P2P Networks: Construction and Handling Dynamic Flash Crowd

Small-World Overlay P2P Networks: Construction and Handling Dynamic Flash Crowd Small-World Overlay P2P Networks: Construction and Handling Dynamic Flash Crowd Ken Y.K. Hui John C. S. Lui David K.Y. Yau Dept. of Computer Science & Engineering Computer Science Department The Chinese

More information

Athens University of Economics and Business. Dept. of Informatics

Athens University of Economics and Business. Dept. of Informatics Athens University of Economics and Business Athens University of Economics and Business Dept. of Informatics B.Sc. Thesis Project report: Implementation of the PASTRY Distributed Hash Table lookup service

More information

Evaluation Study of a Distributed Caching Based on Query Similarity in a P2P Network

Evaluation Study of a Distributed Caching Based on Query Similarity in a P2P Network Evaluation Study of a Distributed Caching Based on Query Similarity in a P2P Network Mouna Kacimi Max-Planck Institut fur Informatik 66123 Saarbrucken, Germany mkacimi@mpi-inf.mpg.de ABSTRACT Several caching

More information

Scalability In Peer-to-Peer Systems. Presented by Stavros Nikolaou

Scalability In Peer-to-Peer Systems. Presented by Stavros Nikolaou Scalability In Peer-to-Peer Systems Presented by Stavros Nikolaou Background on Peer-to-Peer Systems Definition: Distributed systems/applications featuring: No centralized control, no hierarchical organization

More information

Distributed Web Crawling over DHTs. Boon Thau Loo, Owen Cooper, Sailesh Krishnamurthy CS294-4

Distributed Web Crawling over DHTs. Boon Thau Loo, Owen Cooper, Sailesh Krishnamurthy CS294-4 Distributed Web Crawling over DHTs Boon Thau Loo, Owen Cooper, Sailesh Krishnamurthy CS294-4 Search Today Search Index Crawl What s Wrong? Users have a limited search interface Today s web is dynamic and

More information

Write a technical report Present your results Write a workshop/conference paper (optional) Could be a real system, simulation and/or theoretical

Write a technical report Present your results Write a workshop/conference paper (optional) Could be a real system, simulation and/or theoretical Identify a problem Review approaches to the problem Propose a novel approach to the problem Define, design, prototype an implementation to evaluate your approach Could be a real system, simulation and/or

More information

08 Distributed Hash Tables

08 Distributed Hash Tables 08 Distributed Hash Tables 2/59 Chord Lookup Algorithm Properties Interface: lookup(key) IP address Efficient: O(log N) messages per lookup N is the total number of servers Scalable: O(log N) state per

More information

Content Overlays. Nick Feamster CS 7260 March 12, 2007

Content Overlays. Nick Feamster CS 7260 March 12, 2007 Content Overlays Nick Feamster CS 7260 March 12, 2007 Content Overlays Distributed content storage and retrieval Two primary approaches: Structured overlay Unstructured overlay Today s paper: Chord Not

More information

DISTRIBUTED COMPUTER SYSTEMS ARCHITECTURES

DISTRIBUTED COMPUTER SYSTEMS ARCHITECTURES DISTRIBUTED COMPUTER SYSTEMS ARCHITECTURES Dr. Jack Lange Computer Science Department University of Pittsburgh Fall 2015 Outline System Architectural Design Issues Centralized Architectures Application

More information

The Google File System

The Google File System October 13, 2010 Based on: S. Ghemawat, H. Gobioff, and S.-T. Leung: The Google file system, in Proceedings ACM SOSP 2003, Lake George, NY, USA, October 2003. 1 Assumptions Interface Architecture Single

More information

The Google File System

The Google File System The Google File System Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung Google SOSP 03, October 19 22, 2003, New York, USA Hyeon-Gyu Lee, and Yeong-Jae Woo Memory & Storage Architecture Lab. School

More information

Semester Thesis on Chord/CFS: Towards Compatibility with Firewalls and a Keyword Search

Semester Thesis on Chord/CFS: Towards Compatibility with Firewalls and a Keyword Search Semester Thesis on Chord/CFS: Towards Compatibility with Firewalls and a Keyword Search David Baer Student of Computer Science Dept. of Computer Science Swiss Federal Institute of Technology (ETH) ETH-Zentrum,

More information

ACONTENT discovery system (CDS) is a distributed

ACONTENT discovery system (CDS) is a distributed 54 IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, VOL. 22, NO. 1, JANUARY 2004 Design and Evaluation of a Distributed Scalable Content Discovery System Jun Gao and Peter Steenkiste, Senior Member, IEEE

More information

Back-Up Chord: Chord Ring Recovery Protocol for P2P File Sharing over MANETs

Back-Up Chord: Chord Ring Recovery Protocol for P2P File Sharing over MANETs Back-Up Chord: Chord Ring Recovery Protocol for P2P File Sharing over MANETs Hong-Jong Jeong, Dongkyun Kim, Jeomki Song, Byung-yeub Kim, and Jeong-Su Park Department of Computer Engineering, Kyungpook

More information

Load Balancing in Peer-to-Peer Systems

Load Balancing in Peer-to-Peer Systems Load Balancing in Peer-to-Peer Systems Haiying Shen Computer Science and Computer Engineering Department University of Arkansas Fayetteville, Arkansas, USA 2 Abstract Structured peer-to-peer (P2P) overlay

More information

Peer-to-Peer Systems and Distributed Hash Tables

Peer-to-Peer Systems and Distributed Hash Tables Peer-to-Peer Systems and Distributed Hash Tables CS 240: Computing Systems and Concurrency Lecture 8 Marco Canini Credits: Michael Freedman and Kyle Jamieson developed much of the original material. Selected

More information

SHHC: A Scalable Hybrid Hash Cluster for Cloud Backup Services in Data Centers

SHHC: A Scalable Hybrid Hash Cluster for Cloud Backup Services in Data Centers 2011 31st International Conference on Distributed Computing Systems Workshops SHHC: A Scalable Hybrid Hash Cluster for Cloud Backup Services in Data Centers Lei Xu, Jian Hu, Stephen Mkandawire and Hong

More information

Three Layer Hierarchical Model for Chord

Three Layer Hierarchical Model for Chord Three Layer Hierarchical Model for Chord Waqas A. Imtiaz, Shimul Shil, A.K.M Mahfuzur Rahman Abstract Increasing popularity of decentralized Peer-to-Peer (P2P) architecture emphasizes on the need to come

More information

Effects of Churn on Structured P2P Overlay Networks

Effects of Churn on Structured P2P Overlay Networks International Conference on Automation, Control, Engineering and Computer Science (ACECS'14) Proceedings - Copyright IPCO-214, pp.164-17 ISSN 2356-568 Effects of Churn on Structured P2P Overlay Networks

More information

Today. Why might P2P be a win? What is a Peer-to-Peer (P2P) system? Peer-to-Peer Systems and Distributed Hash Tables

Today. Why might P2P be a win? What is a Peer-to-Peer (P2P) system? Peer-to-Peer Systems and Distributed Hash Tables Peer-to-Peer Systems and Distributed Hash Tables COS 418: Distributed Systems Lecture 7 Today 1. Peer-to-Peer Systems Napster, Gnutella, BitTorrent, challenges 2. Distributed Hash Tables 3. The Chord Lookup

More information

Routing Table Construction Method Solely Based on Query Flows for Structured Overlays

Routing Table Construction Method Solely Based on Query Flows for Structured Overlays Routing Table Construction Method Solely Based on Query Flows for Structured Overlays Yasuhiro Ando, Hiroya Nagao, Takehiro Miyao and Kazuyuki Shudo Tokyo Institute of Technology Abstract In structured

More information

Jinho Hwang (IBM Research) Wei Zhang, Timothy Wood, H. Howie Huang (George Washington Univ.) K.K. Ramakrishnan (Rutgers University)

Jinho Hwang (IBM Research) Wei Zhang, Timothy Wood, H. Howie Huang (George Washington Univ.) K.K. Ramakrishnan (Rutgers University) Jinho Hwang (IBM Research) Wei Zhang, Timothy Wood, H. Howie Huang (George Washington Univ.) K.K. Ramakrishnan (Rutgers University) Background: Memory Caching Two orders of magnitude more reads than writes

More information

Towards Efficient Load Balancing in Structured P2P Systems

Towards Efficient Load Balancing in Structured P2P Systems Towards Efficient Load Balancing in Structured P2P Systems Yingwu Zhu Department of ECECS University of Cincinnati zhuy@ececs.uc.edu Yiming Hu Department of ECECS University of Cincinnati yhu@ececs.uc.edu

More information

VFS Interceptor: Dynamically Tracing File System Operations in real. environments

VFS Interceptor: Dynamically Tracing File System Operations in real. environments VFS Interceptor: Dynamically Tracing File System Operations in real environments Yang Wang, Jiwu Shu, Wei Xue, Mao Xue Department of Computer Science and Technology, Tsinghua University iodine01@mails.tsinghua.edu.cn,

More information

Distributed File Systems

Distributed File Systems Distributed File Systems Today l Basic distributed file systems l Two classical examples Next time l Naming things xkdc Distributed File Systems " A DFS supports network-wide sharing of files and devices

More information

Be Fast, Cheap and in Control with SwitchKV. Xiaozhou Li

Be Fast, Cheap and in Control with SwitchKV. Xiaozhou Li Be Fast, Cheap and in Control with SwitchKV Xiaozhou Li Goal: fast and cost-efficient key-value store Store, retrieve, manage key-value objects Get(key)/Put(key,value)/Delete(key) Target: cluster-level

More information

Architectures for Distributed Systems

Architectures for Distributed Systems Distributed Systems and Middleware 2013 2: Architectures Architectures for Distributed Systems Components A distributed system consists of components Each component has well-defined interface, can be replaced

More information

Ceph: A Scalable, High-Performance Distributed File System

Ceph: A Scalable, High-Performance Distributed File System Ceph: A Scalable, High-Performance Distributed File System S. A. Weil, S. A. Brandt, E. L. Miller, D. D. E. Long Presented by Philip Snowberger Department of Computer Science and Engineering University

More information

Query Processing Over Peer-To-Peer Data Sharing Systems

Query Processing Over Peer-To-Peer Data Sharing Systems Query Processing Over Peer-To-Peer Data Sharing Systems O. D. Şahin A. Gupta D. Agrawal A. El Abbadi Department of Computer Science University of California at Santa Barbara odsahin, abhishek, agrawal,

More information

TSP-Chord: An Improved Chord Model with Physical Topology Awareness

TSP-Chord: An Improved Chord Model with Physical Topology Awareness 2012 International Conference on Information and Computer Networks (ICICN 2012) IPCSIT vol. 27 (2012) (2012) IACSIT Press, Singapore TSP-Chord: An Improved Chord Model with Physical Topology Awareness

More information

A Square Root Topologys to Find Unstructured Peer-To-Peer Networks

A Square Root Topologys to Find Unstructured Peer-To-Peer Networks Global Journal of Computer Science and Technology Network, Web & Security Volume 13 Issue 2 Version 1.0 Year 2013 Type: Double Blind Peer Reviewed International Research Journal Publisher: Global Journals

More information

A Chord-Based Novel Mobile Peer-to-Peer File Sharing Protocol

A Chord-Based Novel Mobile Peer-to-Peer File Sharing Protocol A Chord-Based Novel Mobile Peer-to-Peer File Sharing Protocol Min Li 1, Enhong Chen 1, and Phillip C-y Sheu 2 1 Department of Computer Science and Technology, University of Science and Technology of China,

More information

Toward Energy-efficient and Fault-tolerant Consistent Hashing based Data Store. Wei Xie TTU CS Department Seminar, 3/7/2017

Toward Energy-efficient and Fault-tolerant Consistent Hashing based Data Store. Wei Xie TTU CS Department Seminar, 3/7/2017 Toward Energy-efficient and Fault-tolerant Consistent Hashing based Data Store Wei Xie TTU CS Department Seminar, 3/7/2017 1 Outline General introduction Study 1: Elastic Consistent Hashing based Store

More information