Remote Direct Storage Management for Exa-Scale Storage

pp. 15-20
http://dx.doi.org/10.14257/astl.2016.139.04

Dong-Oh Kim, Myung-Hoon Cha, Hong-Yeon Kim

Storage System Research Team, High Performance Computing Research Department, Electronics and Telecommunications Research Institute, 218 Gajeong-ro, Yuseong-gu, Daejeon, 34129, Korea
{dokim, mhcha, kimhy}@etri.re.kr

Abstract. Recently, storage systems have been growing in size to hold ever larger amounts of data. Most storage research focuses on raising capacity and bandwidth, but the efficiency of file management is becoming even more important for operating Exa-Scale Storage. In this paper, we present Remote Direct Storage Management (RDSM) for Exa-Scale Storage. RDSM allows users to easily manage server-side files and to easily use storage-specific functions. With RDSM, file copy is on average 30% faster than cp and file movement is on average 240 times faster than mv in Linux.

Keywords: Exa-Scale Storage, file management, distributed file system, FUSE-based file system, client utility

1 Introduction

Recently, the need for Exa-Scale Storage has increased with the demand for high-capacity storage. However, building Exa-Scale Storage raises many problems, such as file system, network, and power problems [1,2]. Most storage research focuses on raising capacity and bandwidth. However, if file management is inefficient, most of the resources in Exa-Scale Storage are wasted on file management itself. Thus, the efficiency of file management is becoming even more important for operating Exa-Scale Storage. Exa-Scale Storage will have a larger number of volumes than Peta-Scale Storage, and these volumes can be used simultaneously by a variety of applications and multiple users. Exa-Scale Storage will also contain various storage devices and networks as technology advances [2,3].
In such a complex environment, management costs vary significantly depending on how file management is performed: when a user runs an application in the Exa-Scale environment, its processing time and cost depend on the processing method [4-7]. In this paper, we present Remote Direct Storage Management (RDSM) for Exa-Scale Storage. RDSM allows a client application to have the storage itself carry out file management; that is, RDSM converts an external instruction into an internal storage instruction for efficient processing. In addition, RDSM allows users to easily use storage-specific functions, supporting more efficient storage utilization.

The remainder of this paper is organized as follows. Section 2 describes the concept of RDSM. Section 3 explains the implementation of RDSM in MAHA-FS. Section 4 presents the performance evaluation results of RDSM. Finally, Section 5 concludes the paper.

ISSN: 2287-1233 ASTL Copyright 2016 SERSC

2 RDSM

In this section, we describe the concept of RDSM. RDSM provides a way for the client to process files efficiently by having the file processing performed in the storage. Figure 1 shows the process of moving files between volumes on Linux, and Figure 2 shows the process of moving files between volumes using RDSM.

Fig. 1. Basic process of moving files
Fig. 2. Process of moving files using RDSM

As shown in Figure 1, because file movement is processed at the client level, processing is slow and consumes a significant amount of resources. RDSM, in contrast, provides a way to manage files directly in the storage from a remote client. As shown in Figure 2, with RDSM the engine receives a command and performs the file movement directly between the volumes.

The RDSM engine consists of RDSM commands and the RDSM manager. An RDSM command is a user-defined command used to call an internal management API of the Exa-Scale Storage. RDSM commands are transmitted to the RDSM manager of the Exa-Scale Storage server through the FUSE architecture. The RDSM manager interprets each received request (RDSM command), verifies that it contains a valid command and appropriate parameters, and asks the RDSM worker to execute the command. RDSM enables remote control through the POSIX API rather than through a separate interface, so RDSM-based applications can be created without compiling the kernel.

3 Development of RDSM

In this section, we describe the implementation of RDSM in MAHA-FS.
MAHA-FS, which is similar to HDFS [8] and GFS [9], is a FUSE-based large-scale distributed file system that runs on thousands of commodity servers in HPC (High Performance Computing) environments [10]. MAHA-FS is the HPC version of GLORY-FS [11] and was developed by ETRI; GLORY-FS is a FUSE-based large-scale distributed file system used in cloud computing. MAHA-FS consists of an MDS (Metadata Server), multiple DSs (Data Servers), multiple FUSE clients, and multiple utilities. In particular, MAHA-FS supports combining different types of disks, such as SSDs (Solid-State Drives), HDDs (Hard Disk Drives), and MAIDs (Massive Arrays of Idle Disks).

MAHA-FS performs file management according to the requested RDSM command. Figure 3 shows the system architecture of RDSM in MAHA-FS.

Fig. 3. Architecture of RDSM in MAHA-FS

As shown in Figure 3, user applications and the RDSM utility call RDSM commands through the POSIX API. Each request is forwarded via the FUSE client to the RDSM manager on the MDS of MAHA-FS. An RDSM command is a user-defined command used to call an internal management API of the Exa-Scale Storage. Table 1 lists the RDSM commands in MAHA-FS.

Table 1. RDSM commands in MAHA-FS

RDSM command   Parameter 1             Parameter 2
maha_cp        <source file info.>     <destination file info.>
maha_mv        <source file info.>     <destination file info.>
set_disk       <source file info.>     ssd | hdd | maid

maha_cp copies a file directly from <source file info.> to <destination file info.>. maha_mv moves a file directly from <source file info.> to <destination file info.>. set_disk is a MAHA-FS-specific command that migrates the file given by <source file info.> to the specified disk type; MAHA-FS supports three disk types (ssd, hdd, and maid). The RDSM manager analyzes each received request against the predefined RDSM commands: it interprets the request, verifies that it contains a valid command and appropriate parameters, and asks the RDSM worker to perform the file management.

The RDSM worker executes the command by calling a function of the RDSM command library or a MAHA-FS utility. The RDSM command library consists of functions that call internal functions of MAHA-FS or process a given command. The RDSM worker processes maha_cp and maha_mv by calling the corresponding functions in the RDSM command library, and processes set_disk by calling the migration utility. For example, to move the file test.dat to an SSD, you can simply call the POSIX API setxattr("test.dat", "set_disk", "ssd", 3, 3). If the last parameter (flags) of setxattr is 3, the FUSE client treats the call as an RDSM command. Figures 4 and 5 show the file information, as reported by a MAHA-FS utility, before and after running the set_disk command.

Fig. 4. File information before set_disk
Fig. 5. File information after set_disk

The bottom of Figures 4 and 5 shows the location information of the file's chunk. As shown in Figure 4, the chunk is stored on the HDD with ID 7a7dd101. As shown in Figure 5, after set_disk runs, the chunk is stored on the SSD with ID 6a7defa2.

4 Performance Evaluation

In this section, we evaluate the performance of RDSM experimentally. The experiments used 1 MDS, 5 DSs, and 1 client node. Each node has two Intel Xeon E5-2609 2.4 GHz CPUs and 32 GB of memory, and each DS node has 8 HDDs. Every node runs Red Hat Enterprise Linux 6.2 (Linux 2.6.32-220.el6.x86_64) with FUSE 2.8.3-4.el6, and the file system is MAHA-FS.

This paper compares cp with RDSM_cp and mv with RDSM_mv. cp and mv are the standard Linux utilities; RDSM_cp and RDSM_mv are simple utilities that call maha_cp and maha_mv from Table 1. In each test, <source file info.> and <destination file info.> specify files on different volumes. Figure 6 shows the execution time of cp, mv, RDSM_cp, and RDSM_mv as a function of file size with 3 DSs.
Figure 7 shows the execution time of cp, mv, RDSM_cp, and RDSM_mv for a 4 GB file as the number of DSs varies.
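The gap between mv and RDSM_mv reported in this section follows from the client-level process of Figure 1. The following C sketch illustrates that path under a stated assumption: the source and destination volumes appear on the client as separate file systems, so rename(2) fails with EXDEV, which is the case in which GNU mv falls back to copying the file and then removing the source. The function names client_move and copy_then_unlink are hypothetical; the sketch does not preserve ownership, permissions, or timestamps as mv does, and it is not the code used in these experiments.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: copy src to dst through a client-side buffer, then
 * unlink src. Every byte is read into the client and written back out.
 * EINTR handling is omitted for brevity. Returns 0 on success, -1 on error. */
int copy_then_unlink(const char *src, const char *dst) {
    int in = open(src, O_RDONLY);
    if (in < 0)
        return -1;
    int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (out < 0) {
        close(in);
        return -1;
    }
    char buf[1 << 16];
    ssize_t n = 0;
    int rc = 0;
    while ((n = read(in, buf, sizeof buf)) > 0) {
        ssize_t off = 0;
        while (off < n) { /* write() may accept fewer bytes than asked */
            ssize_t w = write(out, buf + off, (size_t)(n - off));
            if (w < 0) {
                rc = -1;
                break;
            }
            off += w;
        }
        if (rc != 0)
            break;
    }
    if (n < 0)
        rc = -1;
    close(in);
    if (close(out) < 0)
        rc = -1;
    if (rc == 0)
        rc = unlink(src); /* remove the source only after a complete copy */
    return rc;
}

/* Hypothetical client-level move: a rename when both paths are on the same
 * file system, otherwise (EXDEV) a copy through the client plus unlink. */
int client_move(const char *src, const char *dst) {
    if (rename(src, dst) == 0)
        return 0;
    if (errno != EXDEV)
        return -1;
    return copy_then_unlink(src, dst);
}
```

On the EXDEV path every byte of the file crosses the client twice (read, then write), so the cost grows with file size; RDSM_mv instead sends a single command and leaves the move to the storage, as in Figure 2.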

Fig. 6. Execution time as the file size changes
Fig. 7. Execution time as the number of DSs changes

As shown in Figure 6, RDSM_cp is on average 30% faster than cp, and RDSM_mv is on average 240 times faster than mv. By using RDSM, RDSM_cp eliminates the client network overhead of cp, and RDSM_mv eliminates mv's data movement between volumes. As shown in Figure 7, execution time decreases as the number of DSs increases; RDSM_cp is up to 47% faster than cp, and RDSM_mv is up to 370 times faster than mv.

5 Conclusion

Efficient file processing has become more important in Exa-Scale environments. We therefore presented RDSM, a method for remotely and directly managing files in Exa-Scale Storage, and implemented the RDSM manager in MAHA-FS. With RDSM, file copy is up to 47% faster than cp and file movement is up to 370 times faster than mv in Linux. The biggest advantage of RDSM is that it lets a client easily call the server's administrative functions, so RDSM users can manage files efficiently and easily use the various storage-specific functions of the storage. Future work should study an effective method of transferring files between the client and Exa-Scale Storage using RDSM, as well as a way to minimize unnecessary data movement during I/O processing in client applications.

Acknowledgments. This work was supported by an Institute for Information & communications Technology Promotion (IITP) grant funded by the Korea government (MSIP) (No. R0126-15-1082, Management of Developing ICBMS (IoT, Cloud, Bigdata, Mobile, Security) Core Technologies and Development of Exascale Cloud Storage Technology).

References
1. Kunkel, J. M., Kuhn, M., Ludwig, T.: Exascale Storage Systems - An Analytical Study of Expenses. Characteristics of Future Systems. pp. 116--134 (2014)
2. Nadkarni, A.: EMC Elastic Cloud Storage - Blueprint for Exascale Storage. White paper, EMC (2016)
3. Aloisio, G., Fiore, S.: Towards Exascale Distributed Data Management. International Journal of High Performance Computing Applications, vol. 23, issue 4, pp. 398--400 (2009)
4. Dreyfus, E.: FUSE and beyond: bridging file systems. In: Proceedings of EuroBSDcon, pp. 1--14. Sofia (2014)
5. FUSE: Filesystem in Userspace, http://fuse.sourceforge.net
6. Ishiguro, S., Murakami, J., Oyama, Y., Tatebe, O.: Optimizing Local File Accesses for FUSE-Based Distributed Storage. In: Proceedings of the International Workshop on Data-Intensive Scalable Computing Systems (DISCS 12), pp. 760--765. IEEE (2012)
7. Rajgarhia, A., Gehani, A.: Performance and Extension of User Space File Systems. In: ACM Symposium on Applied Computing (SAC 00), pp. 206--213. ACM Press, New York (2010)
8. HDFS: Hadoop Distributed File System, http://hadoop.apache.org/
9. Ghemawat, S., Gobioff, H., Leung, S.: The Google File System. In: 9th ACM Symposium on Operating Systems Principles (SOSP 03), pp. 20--40. ACM Press, New York (2003)
10. Kim, Y. C., Kim, D. O., Kim, H. Y., Kim, Y. K., Choi, W.: MAHA-FS: A Distributed File System for High Performance Metadata Processing and Random IO. KIPS Transactions on Software and Data Engineering, vol. 2, issue 2, pp. 91--96 (2013)
11. Min, Y. S., Jin, K. S., Kim, H. Y., Kim, Y. K.: A Trend to Distributed File Systems for Cloud Computing. Electronics and Telecommunications Trends, vol. 24, issue 4, pp. 55--68 (2009)