Efficient Metadata Management in Cloud Computing


The Open Cybernetics & Systemics Journal, 2015, 9, 1485-1489. Open Access.

Yu Shuchun 1,* and Huang Bin 2

1 Department of Computer Engineering, Huaihua University, Huaihua, Hunan, 418008, P.R. China; 2 School of Mathematics and Computer Science, Guizhou Normal University, Guiyang, Guizhou, 550001, P.R. China

Abstract: Existing metadata management methods suffer from low access efficiency when they solve the directory renaming problem. This paper proposes a metadata management method based on directory path redirection (DPRD), which consists of a data distribution method based on the directory path and a directory renaming method based on directory path redirection. Experiments show that DPRD effectively removes the loss of access efficiency caused by directory renaming.

Keywords: Cloud computing, directory path, redirection, metadata.

1. INTRODUCTION

With the prevalence of Internet applications and data-intensive computing, many new application systems have appeared in cloud computing environments. These systems are mainly characterized by [1-3]: (1) an enormous number of stored files, in some cases reaching the trillions and still growing rapidly; (2) an enormous number of users and daily accesses, reaching the billions. For example, Facebook [4] currently stores more than 140 billion pictures, and both its number of users and its daily accesses exceed one billion. With so many files and accesses, scalability and performance become the bottleneck for applying cloud computing efficiently, and efficient metadata management is a key technology for breaking through this bottleneck. At present, the main metadata management methods are the lookup table [5], sub-tree partition, hashing, and directory path fixed numbering.
The typical application of the lookup table is the single metadata server adopted by Google, which stores the metadata table on one server and suits storage systems with relatively few files. For a distributed storage system with numerous files, however, that server becomes the bottleneck of system performance. The sub-tree partition method [6, 7] divides the single namespace of the file system into independent sub-trees by directory level, and each metadata server in the cluster is responsible for one or several sub-trees. It scales well, but because a directory is spread over several metadata servers, locating metadata requires traversing directories across servers, which lowers access efficiency. The hash method [8, 9] determines where a file's metadata is stored from the file identifier (such as the full path name of the file). It removes the limitation of a single metadata server, but renaming a directory requires migrating the related metadata. Directory path fixed numbering (DPFN) [10, 11] assigns a globally unique ID (DPID) to each directory path. The DPID stays unchanged for the lifetime of the directory path, and the metadata of all files (or sub-directories) under the directory path is placed and retrieved according to the hash value of the DPID. This avoids the metadata migration caused by directory renaming, but it relies on a directory path index server to manage the mapping between directory paths and DPIDs, which lowers access efficiency.

To address the low access efficiency of current metadata management methods that solve the directory renaming problem, this paper proposes a metadata management method based on directory path redirection (DPRD).
First, it places and retrieves metadata by hashing the directory path, which removes the bottleneck of the directory path index server and allows concurrent access by multiple users. Second, it records only the redirected directory path space, and this space is distributed across all metadata servers, which reduces the search space for directory paths. Finally, by redirecting directory paths, it provides correct access to metadata without migrating metadata when directories are renamed. It can thus effectively resolve the low access efficiency caused by directory renaming.

2. ANALYSIS OF THE DIRECTORY PATH FIXED NUMBER

Directory path fixed numbering (DPFN) avoids migrating metadata when a directory is renamed. Its basic idea is as follows: each directory path is given a globally unique fixed ID (DPID), which remains unchanged for the lifetime of the directory path; a directory path index server (DPIS) maintains the mapping between each directory path and its DPID; to place or retrieve the metadata of an object, a client must first obtain the DPID from the DPIS, and the remaining operations are then completed on the metadata server (MDS) corresponding to the hash value of the DPID. The metadata distribution and access pattern are shown in Fig. (1).
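The DPFN idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class `DirectoryPathIndexServer`, the helper `mds_for`, the use of SHA-1, and the modulo mapping from hash value to MDS number are all assumptions made for illustration. The point it shows is that a rename re-keys the central table but leaves the DPID, and therefore the target MDS, unchanged.

```python
import hashlib
from itertools import count


def mds_for(key, n_mds):
    """Map a string key to an MDS number (SHA-1 and modulo are assumptions)."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_mds


class DirectoryPathIndexServer:
    """Hypothetical stand-in for the DPIS: one central table, path -> DPID."""

    def __init__(self):
        self._dpid_of = {}
        self._next_id = count(1)

    def register(self, dir_path):
        # Assign a fresh DPID only the first time a path is seen.
        if dir_path not in self._dpid_of:
            self._dpid_of[dir_path] = next(self._next_id)
        return self._dpid_of[dir_path]

    def lookup(self, dir_path):
        # Every client access goes through this single table.
        return self._dpid_of[dir_path]

    def rename(self, old_path, new_path):
        # Re-key every path at or below old_path; the DPIDs stay untouched.
        for path in list(self._dpid_of):
            if path == old_path or path.startswith(old_path + "\\"):
                suffix = path[len(old_path):]
                self._dpid_of[new_path + suffix] = self._dpid_of.pop(path)


dpis = DirectoryPathIndexServer()
dpid_before = dpis.register("\\D2\\D3")
mds_before = mds_for(str(dpid_before), n_mds=4)

dpis.rename("\\D2", "\\E2")  # "\E2" is a made-up new name
dpid_after = dpis.lookup("\\E2\\D3")
mds_after = mds_for(str(dpid_after), n_mds=4)
```

Because `mds_after` equals `mds_before`, no metadata has to move, but every lookup still passes through the one index table, which is the access bottleneck the paper attributes to DPFN.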

Fig. (1). Metadata distribution and access in DPFN. [Figure omitted: Hash(DPID) entries held by MDS 0, MDS 1, ..., MDS n-1.]

Based on this idea, DPFN accesses a file f as follows:
(1) The user sends a request to the DPIS for the DPID of file f;
(2) The DPIS searches the entire directory path space for the DPID of file f;
(3) The DPIS returns the DPID of file f to the user;
(4) The user sends an access request to the MDS selected by the hash value of the DPID, MDS hash(DPID);
(5) MDS hash(DPID) searches for the metadata of file f;
(6) MDS hash(DPID) returns the metadata of f to the user.

Although DPFN avoids the metadata migration caused by renaming a directory, it lowers access efficiency, mainly for two reasons: (1) every user must first access the DPIS to obtain the DPID, so accesses from multiple users are serialized; (2) the DPIS records the mapping between every directory path and its DPID, so the search space is large.

3. METADATA DISTRIBUTION METHOD BASED ON DIRECTORY PATH

The directory path of a file is the path from the root directory to its parent directory. The directory path of file f is denoted DPf. As shown in Fig. (2), the directory path of file6 is DPfile6 = \D2\D3. DPRD distributes metadata as follows: it hashes the directory path of a file and places the file's metadata on the metadata server corresponding to the hash value; that is, the MDS storing the metadata of file f is Hf = hash(DPf). For instance, for the file system shown in Fig. (2), if the hash value of each directory path is as shown in Fig. (3), then the directory and file metadata are distributed as shown in Fig. (4).

Fig. (2). A file system.

Fig. (3). DP and hash (DP).

Fig. (4). Metadata distribution.
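The placement rule Hf = hash(DPf) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the paper's code: the hash function (SHA-1), the modulo mapping onto MDS numbers, the helper names, and the files other than file6 are all hypothetical.

```python
import hashlib
from collections import defaultdict


def directory_path(file_path):
    """Return DPf: the path from the root to the file's parent directory."""
    parent, _sep, _name = file_path.rpartition("\\")
    return parent or "\\"


def mds_for(dir_path, n_mds):
    """Hf = hash(DPf), reduced to an MDS number (SHA-1 and modulo are assumptions)."""
    digest = hashlib.sha1(dir_path.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_mds


def distribute(file_paths, n_mds):
    """Group file paths by the MDS that would hold their metadata."""
    placement = defaultdict(list)
    for path in file_paths:
        placement[mds_for(directory_path(path), n_mds)].append(path)
    return dict(placement)


# file6 under \D2\D3 follows the text; the other two files are made up.
files = ["\\D2\\D3\\file6", "\\D2\\D3\\file7", "\\D1\\file1"]
placement = distribute(files, n_mds=3)
```

Any client can evaluate `mds_for` locally, so no central index server is contacted, and files that share a directory path always land on the same MDS.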

4. EFFICIENT METADATA ORGANIZATION METHOD

Files under the same directory share the same directory path and therefore the same hash value, so the metadata of all files in a directory is placed on the same metadata server. To make searching and directory operations convenient, DPRD organizes metadata in a three-layer structure, as shown in Fig. (5). The bottom layer is the physical layer, which stores metadata as a heap file and thereby uses storage space efficiently. In the physical layer all metadata is distributed randomly, whereas most directory operations act on all files in a directory or in the sub-tree rooted at that directory. To improve the efficiency of directory operations, DPRD builds a logic layer over the physical layer, which uses an index to present the randomly scattered file metadata as a logical whole. The top layer is the mapping layer, a table (DT) of the directory paths held on the same MDS. Each item of the DT is a pair <hash (DP), pointer>, in which hash (DP) is the hash value of a directory path and the pointer points to the logic block of that DP.

Fig. (5). Metadata organization. [Figure omitted: Hash(DP 0), Hash(DP 1), Hash(DP 2) entries above the physical layer.]

5. RENAMING DIRECTORY BASED ON REDIRECTION OF DIRECTORY PATH

When a directory dir is renamed to dir', every directory path DPdir containing dir becomes DPdir', and the metadata server responsible for the files under it changes from Hash(DPdir) to Hash(DPdir'). To keep later operations locating metadata correctly, the metadata under DPdir would have to be migrated from Hash(DPdir) to Hash(DPdir') synchronously, which severely degrades system performance.
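The cost described above can be made concrete with a small Python sketch. It lists, for a rename, every directory path that changes and the MDS that the old and new paths hash to. The helper names, the SHA-1 hash, the modulo mapping, the example tree (including the position of D4 under D3), and the new name \E2 are assumptions for illustration only.

```python
import hashlib


def mds_for(dir_path, n_mds):
    """Hash a directory path to an MDS number (SHA-1 and modulo are assumptions)."""
    digest = hashlib.sha1(dir_path.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_mds


def renamed(dir_path, old_dir, new_dir):
    """Rewrite dir_path if it is old_dir or lies below it; otherwise return it unchanged."""
    if dir_path == old_dir or dir_path.startswith(old_dir + "\\"):
        return new_dir + dir_path[len(old_dir):]
    return dir_path


def migration_plan(dir_paths, old_dir, new_dir, n_mds):
    """List (old path, new path, old MDS, new MDS) for every path the rename changes."""
    plan = []
    for dp in dir_paths:
        new_dp = renamed(dp, old_dir, new_dir)
        if new_dp != dp:
            plan.append((dp, new_dp, mds_for(dp, n_mds), mds_for(new_dp, n_mds)))
    return plan


# Hypothetical directory paths loosely modelled on Fig. (2).
paths = ["\\D1", "\\D2", "\\D2\\D3", "\\D2\\D3\\D4"]
plan = migration_plan(paths, "\\D2", "\\E2", n_mds=8)

for old_dp, new_dp, old_mds, new_mds in plan:
    print(f"{old_dp} -> {new_dp}: MDS {old_mds} -> MDS {new_mds}")
```

Every entry in `plan` whose two MDS numbers differ stands for a whole directory of file metadata that a plain hash scheme would have to move before the rename could complete.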
DPRD first uses directory path redirection to avoid metadata migration at rename time. It then takes advantage of the periodic fluctuation of the load: metadata whose directory paths have been redirected is migrated to its correct position when the system load is relatively light. This improves the efficiency of the rename operation.

Directory path redirection works as follows: each MDS holds a redirection directory table (RDT), which records the directory paths whose metadata is not stored locally. Each item of the RDT is a pair <hash (DP), pointer>, in which hash (DP) is the hash value of a directory path and the pointer points to the logic block of that DP. The pointer consists of two parts, A and B: A is the number of the MDS that hosts the metadata of the redirected directory path, and B is the first address of its logical block. With directory path redirection, the metadata distribution and organizational structure are as shown in Fig. (6).

Comparing DPFN and DPRD shows that: (1) multi-user accesses are carried out serially in DPFN and concurrently in DPRD. (2) DPFN records the entire directory path space, whereas DPRD records only the redirected directory path space; this space is distributed over N metadata servers by hashing the directory path, so each metadata server stores about 1/N of the redirected directory space, which greatly reduces the search space of each access. (3) When a directory path has not changed, DPFN still has to access the directory path index server to obtain the DPID, whereas DPRD finds the metadata directly on the metadata server corresponding to the directory path. In summary, DPRD is superior to DPFN in access efficiency.

6. EXPERIMENT AND RESULT ANALYSIS

6.1. Experimental Environment

Our testing infrastructure had 126 machines on 4 racks connected by Gigabit Ethernet switches. Intra-rack bisection bandwidth was 14 Gbps, while inter-rack bisection bandwidth was 6.5 Gbps.
Each machine had two 2.4 GHz Intel Xeon CPUs, 4 GB of main memory, and two 7200 RPM SCSI disks of 200 GB each. Machines ran Red Hat Enterprise Linux AS 4 with kernel version 2.6.9. In the following experiments, the distribution of nested directories by depth in each sub-tree and the distribution of the subdirectory count per directory follow the generative, probabilistic model [12, 13], which closely matches the directory depths observed in metadata snapshots of more than ten thousand file systems taken from 2001 to 2004. Each client machine ran 4 parallel threads, each with one outstanding request issued as soon as the previous one completed. The MDSs and the clients ran on separate computers.

Fig. (6). Metadata organization with directory path redirection. [Figure omitted: each MDS holds Hash(DP) entries and Hash(MDP) entries.]

6.2. Experiment Result and Analysis

First, the metadata operation performance of DPRD was tested under different directory redirection rates (denoted r). The directory redirection rate is the ratio of redirected directory paths to total directory paths. In this test, a large number of directories were made first, then files were created under each existing directory, and finally the metadata of each file was read. The result is shown in Fig. (7). System performance is best when r = 0 and declines gradually as r increases: when r = 0 a single network access retrieves the metadata, but when r > 0 an access to a redirected directory may need two network accesses. For a fixed r, the growth rate of throughput declines gradually as the number of MDSs increases. A possible reason is that, as more MDSs raise the throughput, the probability of hitting a redirected directory also increases.

Fig. (7). Access performance under different redirection rates.

Fig. (8). Performance comparison for DPRD and DPFN.

Second, DPRD and DPFN were compared in making directories, creating files, and reading metadata; the result is shown in Fig. (8).
Fig. (8) shows that DPRD outperforms DPFN in both performance and scalability; the cause is the bottleneck created by the directory path index server in DPFN. In DPRD, reading metadata and creating a file each complete in one network access, so the two operations perform similarly. However, creating a file requires a write lock on the directory file of the directory path, whereas reading metadata requires only a read lock and therefore allows higher concurrency, so metadata reading performs better than file creation. Making a directory requires two network accesses, one to modify the directory file of the directory path and one to create the new directory file, so it performs worse than both file creation and metadata reading.

Finally, the rename operation was tested on directories containing 1, 2, 4, 8 and 16 files. The result shows that the response delay of renaming a directory depends only on the number of sub-directories under the renamed directory and not on the number of files under it. The reason is that, on the one hand, DPRD does not move file metadata when a directory is renamed but defers the migration until the load is light; on the other hand, renaming a directory changes all the directory paths below it, and each changed path requires a redirection. The test also showed that the response delay of renaming a directory is independent of the number of MDSs, because the number of redirection operations needed for a rename has nothing to do with the directory path.

CONCLUSION

Our distributed metadata management strategy DPRD addresses the following problems. First, sub-tree partition always needs to traverse the directory hierarchy, and its partitioning of metadata among the MDS cluster is coarse-grained. DPRD hashes directory paths to avoid directory traversal. Second, although the hash method, which identifies a file by its full pathname, locates the destination MDS directly and quickly, it cannot efficiently handle situations such as renaming a directory. DPRD adopts directory path redirection to avoid the overhead of moving the metadata of files under a renamed directory. Third, DPFN also avoids this renaming overhead by assigning a globally unique ID (DPID) to each directory path, but it uses a directory path index server to manage the mapping between directory paths and DPIDs, which becomes a bottleneck. DPRD distributes metadata by directory path to break through this bottleneck. In addition, DPFN records the entire directory path space, whereas DPRD records only the redirected directory path space, and the redirected directory space is distributed over N metadata servers by hashing the directory path, so each MDS stores about 1/N of the redirected directory space, which greatly reduces the search space of each access.
We have described our distributed metadata management strategy DPRD and presented performance comparisons, and the performance results are encouraging.

CONFLICT OF INTEREST

The authors confirm that this article content has no conflict of interest.

ACKNOWLEDGEMENTS

The authors gratefully acknowledge the support of the National Basic Research Program of China (973 Program) under Grant No. 2011CB302601, the National High Technology Research and Development Program of China (863 Program) under Grant No. 2011AA01A202, the Science and Technology Program of Hunan Province under Grants 2013FJ4335 and 2013FJ4295, and the constructing program of the key discipline in Huaihua University.

REFERENCES

[1] S. V. Patil, G. Gibson, S. Lang, and M. Polte, Giga+: scalable directories for shared file systems, In: PDSW '07: Proceedings of the 2nd International Workshop on Petascale Data Storage, New York, NY, USA, 2007, pp. 26-29.
[2] Y. Hua, Y. Zhu, H. Jiang, D. Feng, and L. Tian, Scalable and adaptive metadata management in ultra large-scale file systems, In: ICDCS '08: Proceedings of the 28th International Conference on Distributed Computing Systems, Washington, DC, USA, 2008, pp. 403-410.
[3] J. Xing, J. Xiong, N. Sun, and J. Ma, Adaptive and scalable metadata management to support a trillion files, In: Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, Portland, Oregon, November 14-20, 2009.
[4] Facebook Website, www.facebook.com.
[5] S. Ghemawat, H. Gobioff, and S.-T. Leung, The Google File System, In: SOSP, 2003.
[6] M. Satyanarayanan, J. J. Kistler, P. Kumar, M. E. Okasaki, E. H. Siegel, and D. C. Steere, Coda: a highly available file system for a distributed workstation environment, IEEE Transactions on Computers, vol. 39, no. 4, pp. 447-459, 1990.
[7] S. A. Weil, K. T. Pollack, S. A. Brandt, and E. L. Miller, Dynamic metadata management for petabyte-scale file systems, In: Proceedings of IEEE/ACM SC2004 Conference, Pittsburgh, PA, USA, 2004, pp. 523-534.
[8] O. Rodeh, and A. Teperman, zFS - a scalable distributed file system using object disks, In: Proceedings of the 20th IEEE/11th NASA Goddard Conference on Mass Storage Systems and Technologies, San Diego, California, USA, 2003, pp. 207-218.
[9] S. A. Brandt, E. L. Miller, D. D. E. Long, and L. Xue, Efficient metadata management in large distributed file systems, In: IEEE Symposium on Mass Storage Systems, San Diego, California, USA, 2003, pp. 290-298.
[10] Z. Liu, and X. M. Zhou, A metadata management method based on directory path, Journal of Software, vol. 18, no. 2, pp. 236-245, 2007.
[11] J. Wang, D. Feng, F. Wang, and C. Lu, MHS: a distributed metadata management strategy, The Journal of Systems and Software, vol. 82, no. 12, pp. 2004-2011, 2009.
[12] N. Agrawal, W. J. Bolosky, and J. R. Douceur, A five-year study of file-system metadata, In: Proceedings of the 5th USENIX Conference on File and Storage Technologies, Berkeley, CA, USA, 2007, pp. 31-45.
[13] N. Agrawal, A. C. Arpaci-Dusseau, and R. H. Arpaci-Dusseau, Generating realistic impressions for file-system benchmarking, In: Proceedings of the 7th USENIX Conference on File and Storage Technologies, San Francisco, California, USA, 2009, pp. 125-138.

Received: June 10, 2015. Revised: July 29, 2015. Accepted: August 15, 2015.

Shuchun and Bin; Licensee Bentham Open. This is an open access article licensed under the terms of the (https://creativecommons.org/licenses/by/4.0/legalcode), which permits unrestricted, noncommercial use, distribution and reproduction in any medium, provided the work is properly cited.