Distributed Two-way Trees for File Replication on Demand


Ramprasad Tamilselvan
Department of Computer Science
Golisano College of Computing and Information Sciences
Rochester, NY

Abstract

Edge data centers significantly reduce content access time. An edge data center pulls content from the origin servers, stores it locally, and serves it to clients. Its storage system must be flexible and dynamic enough to handle popular files and sudden traffic peaks for individual files. In this paper, we propose an algorithm called the two-way tree, which replicates files efficiently based on demand. The experimental results show that the two-way tree algorithm relieves hot spots in the storage cluster and performs significantly better than the existing peer-to-peer storage system during traffic peaks.

I. INTRODUCTION

Content caching reduces the access time of web content. Global edge networks store content close to users, which shortens access time and also reduces the burden on the origin servers. Because edge networks store heterogeneous content, file popularity changes dynamically, and these changes create hot spots in the storage cluster. Replicating files dynamically within the cluster relieves hot spots and reduces the access time of popular files.

Data traffic in edge networks keeps increasing, so edge networks use mini data centers to store the cache. Such a data center comprises proxy servers and a storage cluster. The storage cluster uses a peer-to-peer distributed system to reduce content access time. In the existing system, the number of replicas is constant: popular and non-popular files have the same number of copies in the storage cluster. When the cluster receives many requests for popular files, all the systems in the cluster try to access the servers that store those files.
This overloads those servers and increases file access time.

The random tree algorithm [1] uses consistent hashing to relieve hot spots in web servers. It creates a virtual tree for each file and stores the file at the root node of the tree. The leaf nodes receive the initial requests. A node that does not have the file forwards the request to its parent, and this repeats until the request reaches a node that has the file; that node serves the client. The main problem with the random tree is that requests for popular and non-popular files follow the same path through the same tree. Although this reduces the access time of popular files, it increases the access time of non-popular files.

The two-way tree algorithm also uses consistent hashing and random trees, but it decouples the tree into a fat tree and a slim tree. Initial requests use the fat tree to reach the content quickly. Once a file becomes popular, requests for it use the slim tree, which replicates the file to fewer nodes based on demand. This makes the storage cluster more flexible: it relieves hot spots in the servers and reduces the access time of both popular and non-popular files.

We developed a simulation that generates requests from the web traces available from UC Berkeley. The simulation requests content from two systems, one using peer-to-peer distributed storage and one using the two-way tree algorithm, and we use its results to evaluate the performance of both.

II. RELATED WORK

Early systems used file replication mainly to back up data. As Internet traffic grew, peer-to-peer distributed systems began using replication to avoid overloading servers. In existing peer-to-peer distributed systems such as Cassandra, the replication factor can be set to a value greater than 1, so that replication both backs up data and distributes load among the servers.
However, in these systems the replication factor is configured statically and does not change with demand.

Li et al. [2] proposed Tachyon to replicate data based on demand. Tachyon was later renamed Alluxio and developed as a standalone system. Alluxio acts as an intermediate caching layer between traditional storage and the computation framework to provide faster access to data. It has three components: a master, workers, and clients. Computation frameworks use Alluxio clients to access the server. Workers store, retrieve, and cache data when required, and data sharing among workers reduces the workload on the storage servers. In this way, popular data is cached in Alluxio workers and served at memory speed.

Blowfish [3] is a distributed data store that achieves a dynamic storage-performance tradeoff. Scarlett [4] replicates popular content efficiently in MapReduce clusters. Both systems replicate files in the storage layer, so they are closely coupled with it. They are not suitable for heterogeneous environments in which the storage layers run different software.

Karger et al. [1] proposed distributed caching protocols for relieving hot spots in servers. The protocol uses consistent hashing and random trees. A virtual tree is built for each file, and the file is placed at the root node of the tree. All initial requests arrive at the leaf nodes and are passed from each node to its parent until they reach a node that has the file. Each node keeps track of the popularity of files. When a file becomes popular, the node that has the file replicates it to its leaf nodes.

We propose an algorithm called the two-way tree, which replicates files efficiently based on demand. The algorithm is based on consistent hashing and random trees, but it decouples the lookup tree from the replication tree to improve the lookup time complexity. It is used in a system that acts as an intermediate layer between the storage and application layers.

III. RANDOM TREES

A. Tree Construction

A random tree algorithm constructs a virtual tree for each file in the storage, as shown in Fig. 1, and places the file at the root node of the tree. A server in the random tree is addressed using hash_function(filename, level, position). The hash function takes the file name, the level of the node in the tree, and the position of the node in the tree, and returns the server id. A node can compute its parent for a given file with the same hash function: it passes the file name, the parent position, and the parent level, and gets back the parent's server id. The parent position and the parent level can be computed from the degree of the tree. The initial requests are received by the leaf nodes of the random tree. The server ids of the leaf nodes for the file can also be computed using the hash function.
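The addressing scheme described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the paper does not give the hash function itself, so `server_id` uses SHA-256 reduced modulo the number of servers as a stand-in for consistent hashing, and the numbering convention (root at level 0, positions counted from 0 within a level) and the helper names `parent_of` and `path_to_root` are hypothetical.

```python
import hashlib


def server_id(filename: str, level: int, position: int, n_servers: int) -> int:
    """Map a virtual tree node of a file's tree to a server id.

    Assumption: SHA-256 modulo n_servers stands in for the paper's
    consistent hashing, whose exact form is not given.
    """
    key = f"{filename}:{level}:{position}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big") % n_servers


def parent_of(level: int, position: int, degree: int):
    """Return (level, position) of the parent, or None for the root.

    Assumption: the root is level 0 and positions start at 0 in each level.
    """
    if level == 0:
        return None
    return level - 1, position // degree


def path_to_root(filename: str, level: int, position: int,
                 degree: int, n_servers: int) -> list:
    """Server ids visited when a request climbs from a node to the root."""
    path = []
    node = (level, position)
    while node is not None:
        path.append(server_id(filename, node[0], node[1], n_servers))
        node = parent_of(node[0], node[1], degree)
    return path


# A leaf at level 2, position 3 of a binary tree has its parent at (1, 1).
print(parent_of(2, 3, 2))
print(path_to_root("index.html", 2, 3, degree=2, n_servers=7))
```

Because the file name is part of the hash key, each file gets its own mapping of tree nodes to servers, which is what spreads the roots of different files across the cluster.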
The level and the position of the leaf nodes can be computed from the degree of the lookup tree.

Fig. 1. Random tree (n = 7 and d = 2), showing the lookup path and the replication path.

B. Limitations

A random tree algorithm relieves hot spots in the storage cluster by replicating files based on demand. When a node in the random tree receives a request for a file, it forwards the request to its parent, and this repeats until the request reaches a node that has the file, which then serves the request. The overhead of forwarding requests affects the overall performance of the system. Increasing the degree of the random tree shortens the path, but with a fat tree the number of replicas in the storage cluster is no longer under control.

IV. TWO-WAY TREES

A. Tree Construction

In the two-way tree algorithm, two virtual trees are constructed for each file: one for the lookup path and one for the replication path, as shown in Fig. 2 and Fig. 3 respectively. The degree of the lookup tree is larger than the degree of the replication tree. The fat lookup tree significantly shortens the lookup path, while the slim replication tree controls the number of replicas in the system. The server id of a node is retrieved using hash_func(file_name, level, position, degree). The hash function takes the file name, the level of the node in the tree, and the position of the node in the tree. The degree parameter is the degree of either the lookup tree or the replication tree, depending on the type of request.

Fig. 2. Two-way tree (lookup) (n = 7 and D = 6), showing the lookup path.

Fig. 3. Two-way tree (replication) (n = 7 and d = 2), showing the replication path.

B. Handling Read Requests

In the two-way tree algorithm, a server forwards requests to its parent node in both the lookup tree and the replication tree. The mode of the request is UP in the lookup tree and LEFT in the replication tree. Algorithm 1 describes how a server handles a read request in the lookup tree.
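As a small worked illustration of why the fat lookup tree of Section IV-A shortens the lookup path, the sketch below counts how many hops separate the deepest node from the root for the tree sizes in Fig. 1 to Fig. 3. It assumes the virtual tree is complete and filled level by level, which matches the figure captions but is not stated explicitly in the text; the function name `tree_height` is hypothetical.

```python
def tree_height(n_nodes: int, degree: int) -> int:
    """Hops from the deepest node to the root of a complete tree.

    Assumption: levels are filled one after another, so level k holds
    up to degree**k nodes.
    """
    if n_nodes < 1 or degree < 2:
        raise ValueError("need at least one node and a degree of 2 or more")
    height = 0
    capacity = 1      # nodes that fit in levels 0..height
    level_width = 1   # nodes that fit in the current deepest level
    while capacity < n_nodes:
        level_width *= degree
        capacity += level_width
        height += 1
    return height


print(tree_height(7, 2))  # replication tree, n = 7, d = 2 -> 2
print(tree_height(7, 6))  # lookup tree, n = 7, D = 6 -> 1
```

With seven servers, a request entering at a leaf of the degree-2 tree needs two forwards to reach the root, while the degree-6 lookup tree needs only one.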
The server keeps track of the number of requests it has forwarded to its parent node in the lookup tree and in the replication tree. If the file is not available on the server, it forwards the request to its parent node in the lookup tree. As the popularity of the file increases, the server instead forwards read requests for the file to its parent node in the replication tree. Once the popularity of the file is high, the server sends a write request to its parent node in the replication tree. When the file is available on the server, the server serves the client directly. If the server then receives a larger number of requests for the file, it shares the load by forwarding some requests to its parent node in the replication tree. As the listings suggest, a file's color tracks its state on the server: a white file is not stored locally, a gray file has been requested for replication and its requests are queued, and a black file is stored and can be served.

Algorithm 1 Handling read requests in the server in the lookup tree

    procedure READ(UP, file, sender_id)
        countup ← countup + 1
        if file.color = white then
            if countup < threshold1 then
                forward read(UP, file, this.id) to up_parent
            else if countup < threshold2 then
                forward read(LEFT, file, this.id) to left_parent
            else
                file.color ← gray
                forward write(LEFT, file, this.id) to left_parent
                enqueue(file)
        else
            if countup < threshold3 OR leaf OR left_parent(sender_id) then
                if file.color = black then
                    serve client
                else if file.color = gray then
                    enqueue(file)
            else
                forward read(LEFT, file, sender_id) to left_parent(sender_id)

Algorithm 2 describes how a server handles read requests in the replication tree (mode LEFT). The server keeps track of the number of requests it has forwarded to its parent node in the replication tree. While the file is not yet popular on the server, the server forwards read requests to its parent node in the replication tree. Once the file is popular on the server, the server forwards a write request to its parent node in the replication tree. When the file is available on the server, the server serves the client.

Algorithm 2 Handling read requests in the server in the replication tree

    procedure READ(LEFT, file, sender_id)
        countleft ← countleft + 1
        if file.color = white then
            if countleft < threshold1 then
                forward read(LEFT, file, this.id) to left_parent
            else
                file.color ← gray
                forward write(LEFT, file, this.id) to left_parent
        else if file.color = gray then
            enqueue(file)
        else
            serve client

C. Handling Write Requests

A write request for a file indicates that the file has become popular in the storage cluster.
When a server receives a write request and does not have the file, it forwards the write request to its parent node in the replication tree. Otherwise, it replicates the file to the child node that sent the request. The write request keeps track of all the servers on its path so that every one of them receives a copy of the file. Algorithm 3 describes how a server handles write requests. When a server receives a write request with mode RIGHT, it replicates the file to the next server id recorded in the write request and forwards the write request to that server, again with mode RIGHT. Algorithm 4 describes how a server handles write requests with mode RIGHT.

Algorithm 3 Handling write requests in the server in the replication tree

    procedure WRITE(LEFT, file, sender_id)
        if file.color = black then
            replicate file to sender_id
            forward write(RIGHT, file, sender_id)
        else if file.color = white then
            file.color ← gray
            forward(LEFT, file, this.id) to left_parent

Algorithm 4 Handling replication requests in the server in the replication tree

    procedure WRITE(RIGHT, file, sender_id)
        store file
        forward(RIGHT, file, this.id) to right_child

D. Time Complexity Analysis

This section discusses the time complexity of the lookup path length. The lookup path in the random tree has length O(log_d n), where n is the number of servers in the storage cluster and d is the degree of the random tree. The lookup path in the two-way tree has length O(log_D n), where D is the degree of the lookup tree. Because D is larger than d, log_D n is smaller than log_d n, so the lookup path in the two-way tree is shorter than in the random tree. This reduces the overhead of forwarding requests to the server that has the file and improves the performance of the storage system.

V. SIMULATIONS

We developed simulations of the two-way tree system, the random tree storage system, and the peer-to-peer distributed storage system to compare the performance of the storage cluster. This section describes each simulation.

A. Peer to Peer Distributed Storage System

The storage cluster uses a peer-to-peer distributed storage system to improve its performance. Cassandra is an example of a peer-to-peer system for key-value storage. The peer-to-peer system uses consistent hashing, which assigns a range of hash keys to each server in the storage cluster. It locates a file by computing a hash key from the file name and stores the file on the server whose range contains that key.

Any server in the storage cluster can receive a read request for a file. If the file is available on that server, the server serves the client directly. Otherwise, the server requests the file from the peer server that has it and then serves the client; it does not store the content obtained from the peer. The replication factor in the peer-to-peer system is configurable, but it is static. With a replication factor of 2, the system stores two copies of each file in the cluster. We developed a system that simulates a peer-to-peer system with the functionality described above.
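The placement rule described for the peer-to-peer system can be sketched as a small consistent-hashing ring. This is a minimal illustration, not the simulation used in the paper: the class name `Ring`, the use of SHA-256, one ring point per server, and placing the extra copies on the next servers clockwise are all assumptions made for the example.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """64-bit hash used for both servers and file names (assumed choice)."""
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


class Ring:
    """Consistent-hashing ring with a static replication factor."""

    def __init__(self, server_names, replication_factor=2):
        if not 1 <= replication_factor <= len(server_names):
            raise ValueError("replication factor must be between 1 and the server count")
        self.replication_factor = replication_factor
        # Each server owns the key range that ends at its point on the ring.
        self._points = sorted((_hash(name), name) for name in server_names)
        self._keys = [point for point, _ in self._points]

    def owners(self, filename: str) -> list:
        """Servers holding the file: the first server clockwise from the
        file's hash key, followed by its successors (assumed placement)."""
        start = bisect.bisect_right(self._keys, _hash(filename)) % len(self._points)
        return [self._points[(start + i) % len(self._points)][1]
                for i in range(self.replication_factor)]


ring = Ring([f"server-{i}" for i in range(7)], replication_factor=2)
print(ring.owners("index.html"))  # two distinct servers, fixed for this file
```

The list returned for a file never changes with its popularity, which is the static behavior that the two-way tree is designed to improve on.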

B. Random Tree Storage Simulation

The random tree system constructs a virtual tree for each file in the storage cluster. The leaf nodes of the tree receive the initial requests for the file. If the file is available on a server, the server serves the request. Otherwise, it forwards the request to its parent node, and this repeats until the request reaches a node that has the file. When the popularity of the file on a server reaches a configured threshold, the server replicates the file to its child nodes. We developed a system that simulates the random tree algorithm as described above.

C. Two-way Tree Storage Simulation

The two-way tree system constructs a virtual lookup tree and a virtual replication tree for each file in the storage cluster. The degree of the lookup tree is larger than the degree of the replication tree. A server forwards requests to its parent node in the lookup tree and uses the replication tree for file replication. We developed a simulation of the two-way tree algorithm.

VI. SIMULATION EXPERIMENTS

In this section, we discuss the trace-driven approach, the performance metrics used for the evaluation, and the system configurations.

A. A Trace-driven Approach

We used a trace-driven approach to test our simulations, with web traces from real proxy servers as input. For our experiments, we used the web trace data available from UC Berkeley and preprocessed it to suit our simulation. This approach lets us test the system against real traffic patterns.

B. System Configurations

The experiments were conducted on a Linux machine with 8 GB of memory and a 2 GHz Intel Core i5 processor. The simulations were developed in Java 8.

C. Simulation Configurations

Some properties are common to all three simulations, so that all of them are evaluated in the same environment.
Each server in the storage cluster processes one client request and one peer request per second. Each server completes processing a request in exactly one second, and forwarding a request to its parent node also takes one second.

VII. RESULTS

In this section, we present the performance results of the random tree algorithm, the two-way tree algorithm, and the peer-to-peer system. In the base experiment, the storage cluster has 7 servers. The degree of the random tree is 2. In the two-way tree, the degree of the lookup tree is 6 and the degree of the replication tree is 2. In the scaled configuration experiment, the storage cluster has 15 servers. The degree of the random tree is 2. In the two-way tree, the degree of the lookup tree is 14 and the degree of the replication tree is 2.

A. Random Tree Vs Two-way Tree

Fig. 4 plots the maximum queue length of the random tree and the two-way tree measured during the simulation. The maximum queue length of the servers in the storage cluster is measured at intervals of 5000 seconds. The graph shows that the maximum queue length of the random tree and of the two-way tree is the same and stays low, which indicates that no hot spots occur in the storage cluster. Fig. 5 plots the number of files served by the random tree and the two-way tree in the simulation. The graph shows that the two-way tree system serves more files than the random tree. The fat lookup tree is what allows the two-way tree to serve more files.

Fig. 4. Maximum queue length in the random tree and the two-way tree system.

B. Two-way Tree Vs Peer to Peer System

Fig. 6 plots the maximum queue length in the two-way tree and the peer-to-peer system. The graph shows that the maximum queue length of the two-way tree stays low throughout the simulation, while the maximum queue length of the peer-to-peer system has a spike at the 60,000th second.
This indicates that hot spots occur in the peer-to-peer system. Fig. 7 shows the number of files served by the two-way tree system and the peer-to-peer system. The plot shows that the two-way tree system performs better than the peer-to-peer system around the time of the spike. Together, the two plots show that hot spots occur in the storage cluster of the peer-to-peer system at that time. When hot spots occur, the peer-to-peer system serves fewer files than the two-way tree system.

Fig. 5. Number of files served by the random tree and the two-way tree system.

Fig. 6. Maximum queue length in the peer to peer system and the two-way tree system.

Fig. 7. Number of files served by the peer to peer system and the two-way tree system.

C. Scaled Configuration: Two-way Tree Vs Peer to Peer System

This experiment evaluates the performance of the two-way tree and the peer-to-peer system when the number of servers in the storage cluster is increased. Fig. 8 plots the maximum queue length of the peer-to-peer system and the two-way tree system. The plot shows that no hot spots occur in the peer-to-peer system when the number of servers in the storage cluster is increased, and the maximum queue length of the two-way tree is the same as that of the peer-to-peer system. Fig. 9 plots the number of files served by the peer-to-peer system and the two-way tree system. The plot shows that the two systems perform the same when there are more servers in the cluster.

Fig. 8. Maximum queue length in the peer to peer system and the two-way tree system (scaled configuration).

VIII. CONCLUSION

In this paper, we designed an algorithm called the two-way tree to replicate files dynamically, based on demand, in the storage cluster. Like the random tree algorithm, the two-way tree algorithm relieves hot spots in the storage cluster, but it performs better than the random tree. Decoupling the lookup path from the replication path significantly reduces the lookup time complexity of the two-way tree compared to the random tree. The experiments use data collected from proxy servers in a real network. The results show that the two-way tree relieves hot spots in the storage cluster and performs better than the peer-to-peer system during traffic peaks. The storage cluster in edge data centers can use the two-way tree algorithm to replicate files dynamically based on demand.

Fig. 9. Number of files served by the peer to peer system and the two-way tree system (scaled configuration).

REFERENCES

[1] D. Karger, E. Lehman, T. Leighton, R. Panigrahy, M. Levine, and D. Lewin, "Consistent hashing and random trees: Distributed caching protocols for relieving hot spots on the world wide web," in Proceedings of the twenty-ninth annual ACM symposium on Theory of computing. ACM, 1997.

[2] H. Li, A. Ghodsi, M. Zaharia, E. Baldeschwieler, S. Shenker, and I. Stoica, "Tachyon: Memory throughput I/O for cluster computing frameworks," memory, vol. 18, p. 1.

[3] A. Khandelwal, R. Agarwal, and I. Stoica, "Blowfish: Dynamic storage-performance tradeoff in data stores," in NSDI, 2016.

[4] G. Ananthanarayanan, S. Agarwal, S. Kandula, A. Greenberg, I. Stoica, D. Harlan, and E. Harris, "Scarlett: coping with skewed content popularity in mapreduce clusters," in Proceedings of the sixth conference on Computer systems. ACM, 2011.

LessLog: A Logless File Replication Algorithm for Peer-to-Peer Distributed Systems

LessLog: A Logless File Replication Algorithm for Peer-to-Peer Distributed Systems LessLog: A Logless File Replication Algorithm for Peer-to-Peer Distributed Systems Kuang-Li Huang, Tai-Yi Huang and Jerry C. Y. Chou Department of Computer Science National Tsing Hua University Hsinchu,

More information

Consistent Hashing. Overview. Ranged Hash Functions. .. CSC 560 Advanced DBMS Architectures Alexander Dekhtyar..

Consistent Hashing. Overview. Ranged Hash Functions. .. CSC 560 Advanced DBMS Architectures Alexander Dekhtyar.. .. CSC 56 Advanced DBMS Architectures Alexander Dekhtyar.. Overview Consistent Hashing Consistent hashing, introduced in [] is a hashing technique that assigns items (keys) to buckets in a way that makes

More information

Cache Management for In Memory. Jun ZHANG Oct 15, 2018

Cache Management for In Memory. Jun ZHANG Oct 15, 2018 Cache Management for In Memory Analytics Jun ZHANG Oct 15, 2018 1 Outline 1. Introduction 2. LRC: Dependency aware caching 3. OpuS: Fair cache sharing in multi tenant cloud 4. SP Cache: Load balancing

More information

A Structured Overlay for Non-uniform Node Identifier Distribution Based on Flexible Routing Tables

A Structured Overlay for Non-uniform Node Identifier Distribution Based on Flexible Routing Tables A Structured Overlay for Non-uniform Node Identifier Distribution Based on Flexible Routing Tables Takehiro Miyao, Hiroya Nagao, Kazuyuki Shudo Tokyo Institute of Technology 2-12-1 Ookayama, Meguro-ku,

More information

Load Sharing in Peer-to-Peer Networks using Dynamic Replication

Load Sharing in Peer-to-Peer Networks using Dynamic Replication Load Sharing in Peer-to-Peer Networks using Dynamic Replication S Rajasekhar, B Rong, K Y Lai, I Khalil and Z Tari School of Computer Science and Information Technology RMIT University, Melbourne 3, Australia

More information

Semester Thesis on Chord/CFS: Towards Compatibility with Firewalls and a Keyword Search

Semester Thesis on Chord/CFS: Towards Compatibility with Firewalls and a Keyword Search Semester Thesis on Chord/CFS: Towards Compatibility with Firewalls and a Keyword Search David Baer Student of Computer Science Dept. of Computer Science Swiss Federal Institute of Technology (ETH) ETH-Zentrum,

More information

Dynamic Load Sharing in Peer-to-Peer Systems: When some Peers are more Equal than Others

Dynamic Load Sharing in Peer-to-Peer Systems: When some Peers are more Equal than Others Dynamic Load Sharing in Peer-to-Peer Systems: When some Peers are more Equal than Others Sabina Serbu, Silvia Bianchi, Peter Kropf and Pascal Felber Computer Science Department, University of Neuchâtel

More information

Cascaded Coded Distributed Computing on Heterogeneous Networks

Cascaded Coded Distributed Computing on Heterogeneous Networks Cascaded Coded Distributed Computing on Heterogeneous Networks Nicholas Woolsey, Rong-Rong Chen, and Mingyue Ji Department of Electrical and Computer Engineering, University of Utah Salt Lake City, UT,

More information

Making Gnutella-like P2P Systems Scalable

Making Gnutella-like P2P Systems Scalable Making Gnutella-like P2P Systems Scalable Y. Chawathe, S. Ratnasamy, L. Breslau, N. Lanham, S. Shenker Presented by: Herman Li Mar 2, 2005 Outline What are peer-to-peer (P2P) systems? Early P2P systems

More information

Early Measurements of a Cluster-based Architecture for P2P Systems

Early Measurements of a Cluster-based Architecture for P2P Systems Early Measurements of a Cluster-based Architecture for P2P Systems Balachander Krishnamurthy, Jia Wang, Yinglian Xie I. INTRODUCTION Peer-to-peer applications such as Napster [4], Freenet [1], and Gnutella

More information

Survey on MapReduce Scheduling Algorithms

Survey on MapReduce Scheduling Algorithms Survey on MapReduce Scheduling Algorithms Liya Thomas, Mtech Student, Department of CSE, SCTCE,TVM Syama R, Assistant Professor Department of CSE, SCTCE,TVM ABSTRACT MapReduce is a programming model used

More information

BlowFish: Dynamic Storage-Performance Tradeoff in Data Stores

BlowFish: Dynamic Storage-Performance Tradeoff in Data Stores BlowFish: Dynamic Storage-Performance Tradeoff in Data Stores Anurag Khandelwal, Rachit Agarwal, and Ion Stoica, University of California, Berkeley https://www.usenix.org/conference/nsdi/technical-sessions/presentation/khandelwal

More information

Jinho Hwang and Timothy Wood George Washington University

Jinho Hwang and Timothy Wood George Washington University Jinho Hwang and Timothy Wood George Washington University Background: Memory Caching Two orders of magnitude more reads than writes Solution: Deploy memcached hosts to handle the read capacity 6. HTTP

More information

A Micro Partitioning Technique in MapReduce for Massive Data Analysis

A Micro Partitioning Technique in MapReduce for Massive Data Analysis A Micro Partitioning Technique in MapReduce for Massive Data Analysis Nandhini.C, Premadevi.P PG Scholar, Dept. of CSE, Angel College of Engg and Tech, Tiruppur, Tamil Nadu Assistant Professor, Dept. of

More information

Getafix: Workload-aware Distributed Interactive Analytics

Getafix: Workload-aware Distributed Interactive Analytics Getafix: Workload-aware Distributed Interactive Analytics Presenter: Mainak Ghosh Collaborators: Le Xu, Xiaoyao Qian, Thomas Kao, Indranil Gupta, Himanshu Gupta Data Analytics 2 Picture borrowed from https://conferences.oreilly.com/strata/strata-ny-2016/public/schedule/detail/51640

More information

Correlation based File Prefetching Approach for Hadoop

Correlation based File Prefetching Approach for Hadoop IEEE 2nd International Conference on Cloud Computing Technology and Science Correlation based File Prefetching Approach for Hadoop Bo Dong 1, Xiao Zhong 2, Qinghua Zheng 1, Lirong Jian 2, Jian Liu 1, Jie

More information

Improved MapReduce k-means Clustering Algorithm with Combiner

Improved MapReduce k-means Clustering Algorithm with Combiner 2014 UKSim-AMSS 16th International Conference on Computer Modelling and Simulation Improved MapReduce k-means Clustering Algorithm with Combiner Prajesh P Anchalia Department Of Computer Science and Engineering

More information

A Scalable Content- Addressable Network

A Scalable Content- Addressable Network A Scalable Content- Addressable Network In Proceedings of ACM SIGCOMM 2001 S. Ratnasamy, P. Francis, M. Handley, R. Karp, S. Shenker Presented by L.G. Alex Sung 9th March 2005 for CS856 1 Outline CAN basics

More information

Real-time Scheduling of Skewed MapReduce Jobs in Heterogeneous Environments

Real-time Scheduling of Skewed MapReduce Jobs in Heterogeneous Environments Real-time Scheduling of Skewed MapReduce Jobs in Heterogeneous Environments Nikos Zacheilas, Vana Kalogeraki Department of Informatics Athens University of Economics and Business 1 Big Data era has arrived!

More information

Shaking Service Requests in Peer-to-Peer Video Systems

Shaking Service Requests in Peer-to-Peer Video Systems Service in Peer-to-Peer Video Systems Ying Cai Ashwin Natarajan Johnny Wong Department of Computer Science Iowa State University Ames, IA 500, U. S. A. E-mail: {yingcai, ashwin, wong@cs.iastate.edu Abstract

More information

Distributed Implementation of BG Benchmark Validation Phase Dimitrios Stripelis, Sachin Raja
