Remote Direct Storage Management for Exa-Scale Storage

Dong-Oh Kim, Myung-Hoon Cha, Hong-Yeon Kim

Storage System Research Team, High Performance Computing Research Department, Electronics and Telecommunications Research Institute, 218 Gajeong-ro, Yuseong-gu, Daejeon, 34129, Korea
{dokim, mhcha,

Abstract. Recently, storage sizes have been increasing in order to store large amounts of data. Most storage research focuses on raising capacity and bandwidth. However, the efficiency of file management is becoming even more important in Exa-Scale Storage operations. In this paper, we present a method of Remote Direct Storage Management (RDSM) for Exa-Scale Storage. RDSM allows users to easily manage server-side files and to easily use storage-specific functions. By utilizing RDSM, file copy is 30% faster on average than cp (up to 47%), and file movement is 240 times faster on average than mv (up to 370 times) on Linux.

Keywords: Exa-Scale Storage, file management, distributed file system, FUSE-based file system, client utility

ISSN: ASTL Copyright 2016 SERSC

1 Introduction

Recently, the need for Exa-Scale Storage has increased with the demand for high-capacity storage. However, building Exa-Scale Storage raises many problems, such as file system, network, and power problems [1,2]. Most storage research focuses on raising capacity and bandwidth. However, if file management processing is inefficient, most of the resources in Exa-Scale Storage are wasted on file management. The efficiency of file management is therefore becoming even more important in Exa-Scale Storage operations.

Exa-Scale Storage will have a larger number of volumes than Peta-Scale Storage, and these volumes can be used at the same time by a variety of applications and multiple users. Exa-Scale Storage also contains various storage devices and networks due to advances in technology [2,3]. In this complex environment, management costs vary significantly depending on how file management is performed: when a user runs an application, the processing time and processing cost vary according to the processing method [4-7].

In this paper, we present a method of Remote Direct Storage Management (RDSM) for Exa-Scale Storage. RDSM allows a client application to manage storage for file management. That is, RDSM converts an external instruction into an internal storage instruction for efficient processing. In addition, RDSM allows users to easily use storage-specific functions, supporting more efficient storage utilization.

The remainder of this paper is organized as follows. Section 2 describes the concept of RDSM. Section 3 explains the implementation of RDSM in MAHA-FS. Section 4 examines the performance evaluation results of RDSM. Lastly, the conclusion is presented in Section 5.

2 RDSM

In this section, we describe the concept of RDSM. RDSM provides a method for the client to process files effectively, so that file processing can be performed inside the storage. Figure 1 shows the process of moving files between volumes on Linux. Figure 2 shows the process of moving files between volumes using RDSM.

Fig. 1. Basic process of moving files

Fig. 2. Process of moving files using RDSM

As shown in Figure 1, because file movement is processed at the client level, it is slow and consumes a significant amount of resources. RDSM, in contrast, provides a method for directly managing files in the storage remotely. As shown in Figure 2, when RDSM is used, the engine receives a command and performs the file movement directly between the volumes.

The RDSM engine is composed of the RDSM command and the RDSM manager. An RDSM command is a user-defined command used to call the internal management API of Exa-Scale Storage. RDSM commands are transmitted to the RDSM manager of the Exa-Scale Storage server according to the FUSE architecture. The RDSM manager interprets the received requests (RDSM commands), verifies that they contain appropriate commands and parameters, and requests the RDSM worker to perform the command. RDSM enables remote control through the POSIX API rather than through a separate interface, so an application can be created with RDSM without kernel compiling.

3 Development of RDSM

In this section, we describe the implementation of RDSM in MAHA-FS.
MAHA-FS, which is similar to HDFS [8] and GFS [9], is a FUSE-based large-scale distributed file system using thousands of commodity servers in HPC (High Performance Computing) environments [10]. MAHA-FS is the HPC version of GLORY-FS [11] and was developed by ETRI. GLORY-FS is a FUSE-based large-scale distributed file system used in cloud computing. MAHA-FS is composed of an MDS (Metadata Server), multiple DSs (Data Servers), multiple FUSE clients, and multiple utilities. In particular, MAHA-FS can combine different types of disks, such as SSDs (Solid-State Drives), HDDs (Hard Disk Drives), and MAIDs (Massive Arrays of Idle Disks).

MAHA-FS performs file management according to the requested RDSM command. Figure 3 shows the system architecture of RDSM in MAHA-FS.

Fig. 3. Architecture of RDSM in MAHA-FS

As shown in Figure 3, the user application and the RDSM utility call the RDSM command through the POSIX API. The request is forwarded to the RDSM manager of the MDS in MAHA-FS via the FUSE clients. An RDSM command is a user-defined command used to call the internal management API of Exa-Scale Storage. Table 1 shows examples of RDSM commands in MAHA-FS.

Table 1. RDSM command in MAHA-FS

RDSM command    parameter 1            parameter 2
maha_cp         <source file info.>    <destination file info.>
maha_mv         <source file info.>    <destination file info.>
set_disk        <source file info.>    ssd, hdd, or maid

maha_cp is the command to copy a file directly from <source file info.> to <destination file info.>. maha_mv is the command to move a file directly from <source file info.> to <destination file info.>. set_disk is a special command in MAHA-FS that migrates the file given by <source file info.> to the specified disk type. MAHA-FS supports three disk types.

The RDSM manager analyzes the received requests against the pre-defined RDSM commands. It interprets each request, verifies that it contains an appropriate command and parameters, and requests the RDSM worker to perform the file management.
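The interpret, verify, and dispatch steps of the RDSM manager can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the MAHA-FS implementation: the names RdsmRequest, validate, RdsmManager, and stub_worker are hypothetical, and the worker functions are stubs that return a description of the action instead of touching any storage. Only the three command names and the three disk types come from Table 1.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Command names and disk types as listed in Table 1.
COMMANDS = ("maha_cp", "maha_mv", "set_disk")
DISK_TYPES = ("ssd", "hdd", "maid")


@dataclass(frozen=True)
class RdsmRequest:
    """Hypothetical in-memory form of a received RDSM command."""
    command: str   # RDSM command name
    source: str    # parameter 1: <source file info.>
    argument: str  # parameter 2: destination info or disk type


def validate(req: RdsmRequest) -> None:
    """Raise ValueError unless the request matches a Table 1 command."""
    if req.command not in COMMANDS:
        raise ValueError(f"unknown RDSM command: {req.command!r}")
    if not req.source:
        raise ValueError("missing <source file info.>")
    if req.command == "set_disk":
        if req.argument not in DISK_TYPES:
            raise ValueError(f"unsupported disk type: {req.argument!r}")
    elif not req.argument:
        raise ValueError("missing <destination file info.>")


class RdsmManager:
    """Verifies a request, then hands it to the worker for that command."""

    def __init__(self, worker: Dict[str, Callable[[str, str], str]]):
        self._worker = worker

    def handle(self, req: RdsmRequest) -> str:
        validate(req)
        return self._worker[req.command](req.source, req.argument)


def stub_worker() -> Dict[str, Callable[[str, str], str]]:
    """Stand-in worker: describes the action instead of performing it."""
    return {
        "maha_cp": lambda src, dst: f"copy {src} -> {dst}",
        "maha_mv": lambda src, dst: f"move {src} -> {dst}",
        "set_disk": lambda src, disk: f"migrate {src} -> {disk}",
    }


if __name__ == "__main__":
    manager = RdsmManager(stub_worker())
    print(manager.handle(RdsmRequest("set_disk", "test.dat", "ssd")))
```

Keeping validation separate from dispatch mirrors the description above: a malformed request is rejected before any worker function is called.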

The RDSM worker processes the command by calling a function of the RDSM command library or a utility of MAHA-FS. The RDSM command library consists of a number of functions that call an internal function of MAHA-FS or process a given command. In the RDSM worker, maha_cp and maha_mv are processed by calling the appropriate functions in the RDSM command library, and set_disk is processed by calling the migration utility.

For example, to move the file "test.dat" onto the SSD, you can simply call the POSIX API: setxattr("test.dat", "set_disk", "ssd", 3, 3). If the last parameter (flags) of the setxattr function is 3, the FUSE client treats the call as an RDSM command. Figures 4 and 5 show the information of the file before and after running the set_disk command, as reported by the utility of MAHA-FS.

Fig. 4. File information before set_disk

Fig. 5. File information after set_disk

The bottom of Figure 4 and Figure 5 shows the location information of the chunk. As shown in Figure 4, a chunk of the file is stored on the HDD with id 7a7dd101. As shown in Figure 5, after running set_disk the chunk of the file is stored on the SSD with id 6a7defa2.

4 Performance Evaluation

In this section, we verify the performance of RDSM through experiments. The performance evaluation was conducted using 1 MDS, 5 DS, and 1 client node. Each node has two Intel Xeon E GHz CPUs and 32GB of memory. Each DS node has 8 HDDs. On each node, the OS is "Red Hat Enterprise Linux 6.2, Linux el6.x86_64", FUSE is " el6", and the file system is MAHA-FS.

This paper compares cp with RDSM_cp and mv with RDSM_mv. The cp and mv applications are provided in Linux. The RDSM_cp and RDSM_mv applications are simple utilities that call maha_cp and maha_mv from Table 1. <source file info.> and <destination file info.> each specify a file on a different volume. Figure 6 shows the execution time of cp, mv, RDSM_cp, and RDSM_mv according to file size at 3 DS. Figure 7 shows the execution time of cp, mv, RDSM_cp, and RDSM_mv when processing a 4GB file as the number of DS changes.
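The paper describes RDSM_cp and RDSM_mv only as simple utilities that call maha_cp and maha_mv. A minimal sketch of what such a client-side utility might look like is given here, built on the setxattr call shape shown in Section 3. The function names build_rdsm_call, send_rdsm, rdsm_cp, rdsm_mv, and set_disk are hypothetical. The paper shows the call shape only for set_disk, so passing the source as the path and the destination as the attribute value for maha_cp and maha_mv is an assumption. Python's os.setxattr is available on Linux only and takes the size from the length of the value, so the explicit size argument of the C call does not appear.

```python
import os
from typing import Tuple

# Per the paper, flags == 3 makes the FUSE client treat setxattr as an
# RDSM command.
RDSM_FLAGS = 3


def build_rdsm_call(path: str, command: str, parameter: str) -> Tuple[str, str, bytes, int]:
    """Return the (path, name, value, flags) arguments for os.setxattr."""
    if not path or not command or not parameter:
        raise ValueError("path, command, and parameter must be non-empty")
    return (path, command, parameter.encode("utf-8"), RDSM_FLAGS)


def send_rdsm(path: str, command: str, parameter: str) -> None:
    """Issue the RDSM command; only meaningful on a MAHA-FS FUSE mount."""
    os.setxattr(*build_rdsm_call(path, command, parameter))


def rdsm_cp(source: str, destination: str) -> None:
    # Assumption: the source is the path and the destination is the value.
    send_rdsm(source, "maha_cp", destination)


def rdsm_mv(source: str, destination: str) -> None:
    # Assumption: same argument mapping as rdsm_cp.
    send_rdsm(source, "maha_mv", destination)


def set_disk(source: str, disk_type: str) -> None:
    # Mirrors setxattr("test.dat", "set_disk", "ssd", 3, 3) from Section 3.
    send_rdsm(source, "set_disk", disk_type)


if __name__ == "__main__":
    # Show the arguments without touching any file system.
    print(build_rdsm_call("test.dat", "set_disk", "ssd"))
```

On a file system other than MAHA-FS the setxattr call is expected to fail, because the flag value 3 has no special meaning there; the argument builder is kept separate so it can be inspected without a mount.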

Fig. 6. Execution time on the file size changes

Fig. 7. Execution time on the number of DS changes

As shown in Figure 6, RDSM_cp is 30% faster on average than cp, and RDSM_mv is 240 times faster on average than mv. By using RDSM, RDSM_cp eliminates the client network overhead of cp, and RDSM_mv eliminates the data movement between volumes that mv performs. As shown in Figure 7, the execution time is reduced by increasing the number of DS. RDSM_cp is up to 47% faster than cp, and RDSM_mv is up to 370 times faster than mv.

5 Conclusion

The efficient processing of files has become more important in Exa-Scale environments. We therefore presented RDSM as a method of directly managing files for Exa-Scale Storage remotely. The RDSM manager was implemented in MAHA-FS. By utilizing RDSM, file copy is up to 47% faster than cp and file movement is up to 370 times faster than mv on Linux. The biggest advantage of RDSM is that it allows the administrative functions of the server to be called easily from the client. In this way, an RDSM user can manage files efficiently and easily use the various storage-specific functions of the storage.

Future work should study an effective file transfer method between the client and Exa-Scale Storage utilizing RDSM. A way to minimize unwanted data movement during I/O processing in the client application is also required.

Acknowledgments. This work was supported by an Institute for Information & communications Technology Promotion (IITP) grant funded by the Korea government (MSIP) (No. R , Management of Developing ICBMS (IoT, Cloud, Bigdata, Mobile, Security) Core Technologies and Development of Exascale Cloud Storage Technology).

References

1. Kunkel, J. M., Kuhn, M., Ludwig, T.: Exascale Storage Systems - An Analytical Study of Expenses 2. Characteristics of Future Systems. pp (2014)
2. Nadkarni, A.: EMC Elastic Cloud Storage - Blueprint for Exascale Storage. White paper, EMC (2016)
3. Aloisio, G., Fiore, S.: Towards Exascale Distributed Data Management. In: International Journal of High Performance Computing Applications archive, vol. 23, issue 4, pp (2009)
4. Dreyfus, E.: FUSE and beyond: bridging file systems. In: Proceeding of the EuroBSDcon, pp The Sofia (2014)
5. FUSE: Filesystem in Userspace,
6. Ishiguro, S., Murakami, J., Oyama, Y., Tatebe, O.: Optimizing Local File Accesses for FUSE-Based Distributed Storage. In: Proceedings of the International Workshop on Data-Intensive Scalable Computing Systems (DISCS 12), pp IEEE, (2012)
7. Rajgarhia, A., Gehani, A.: Performance and Extension of User Space File Systems. In: the ACM Symposium on Applied Computing (SAC 00), pp ACM Press, New York, (2010)
8. HDFS: Hadoop Distributed File System,
9. Ghemawat, S., Gobioff, H., Leung, S.: The Google File System. In: 9th ACM Symposium on Operating Systems Principles (SOSP 03), pp ACM Press, New York, (2003)
10. Kim, Y. C., Kim, D. O., Kim, H. Y., Kim, Y. K., Choi, W.: MAHA-FS: A Distributed File System for High Performance Metadata Processing and Random IO. KIPS Transactions on Software and Data Engineering, vol. 2, issue 2, pp (2013)
11. Min, Y. S., Jin, K. S., Kim, H. Y., Kim, Y. K.: A Trend to Distributed File Systems for Cloud Computing. Electronics and Telecommunications Trends, vol. 24, issue 4, pp (2009)

Adaptation of Distributed File System to VDI Storage by Client-Side Cache

Adaptation of Distributed File System to VDI Storage by Client-Side Cache Adaptation of Distributed File System to VDI Storage by Client-Side Cache Cheiyol Kim 1*, Sangmin Lee 1, Youngkyun Kim 1, Daewha Seo 2 1 Storage System Research Team, Electronics and Telecommunications

More information

Analysis of Virtual Machine Scalability based on Queue Spinlock

Analysis of Virtual Machine Scalability based on Queue Spinlock , pp.15-19 http://dx.doi.org/10.14257/astl.2017.148.04 Analysis of Virtual Machine Scalability based on Queue Spinlock Seunghyub Jeon, Seung-Jun Cha, Yeonjeong Jung, Jinmee Kim and Sungin Jung Electronics

More information

Optimizing Local File Accesses for FUSE-Based Distributed Storage

Optimizing Local File Accesses for FUSE-Based Distributed Storage Optimizing Local File Accesses for FUSE-Based Distributed Storage Shun Ishiguro 1, Jun Murakami 1, Yoshihiro Oyama 1,3, Osamu Tatebe 2,3 1. The University of Electro-Communications, Japan 2. University

More information

Deep Learning Based Real-time Object Recognition System with Image Web Crawler

Deep Learning Based Real-time Object Recognition System with Image Web Crawler , pp.103-110 http://dx.doi.org/10.14257/astl.2016.142.19 Deep Learning Based Real-time Object Recognition System with Image Web Crawler Myung-jae Lee 1, Hyeok-june Jeong 1, Young-guk Ha 2 1 Department

More information

A Study on the IoT Sensor Interaction Transmission System based on BigData

A Study on the IoT Sensor Interaction Transmission System based on BigData Vol.123 (SoftTech 2016), pp.220-224 http://dx.doi.org/10.14257/astl.2016.123.41 A Study on the IoT Sensor Interaction Transmission System based on BigData Jin-Tae Park 1, Gyung-Soo Phyo 1 and Il-Young

More information

SMCCSE: PaaS Platform for processing large amounts of social media

SMCCSE: PaaS Platform for processing large amounts of social media KSII The first International Conference on Internet (ICONI) 2011, December 2011 1 Copyright c 2011 KSII SMCCSE: PaaS Platform for processing large amounts of social media Myoungjin Kim 1, Hanku Lee 2 and

More information

Network Intrusion Forensics System based on Collection and Preservation of Attack Evidence

Network Intrusion Forensics System based on Collection and Preservation of Attack Evidence , pp.354-359 http://dx.doi.org/10.14257/astl.2016.139.71 Network Intrusion Forensics System based on Collection and Preservation of Attack Evidence Jong-Hyun Kim, Yangseo Choi, Joo-Young Lee, Sunoh Choi,

More information

Trajectory Planning for Mobile Robots with Considering Velocity Constraints on Xenomai

Trajectory Planning for Mobile Robots with Considering Velocity Constraints on Xenomai , pp.1-5 http://dx.doi.org/10.14257/astl.2014.49.01 Trajectory Planning for Mobile Robots with Considering Velocity Constraints on Xenomai Gil Jin Yang and Byoung Wook Choi *, Seoul National University

More information

Byte Index Chunking Approach for Data Compression

Byte Index Chunking Approach for Data Compression Ider Lkhagvasuren 1, Jung Min So 1, Jeong Gun Lee 1, Chuck Yoo 2, Young Woong Ko 1 1 Dept. of Computer Engineering, Hallym University Chuncheon, Korea {Ider555, jso, jeonggun.lee, yuko}@hallym.ac.kr 2

More information

A Design of Building Group Management Service Framework for On-Going Commissioning

A Design of Building Group Management Service Framework for On-Going Commissioning , pp.84-88 http://dx.doi.org/10.14257/astl.2014.49.18 A Design of Building Group Management Service Framework for On-Going Commissioning Taehyung Kim 1, Youn Kwae Jeong 1 and Il Woo Lee 1, 1 Electronics

More information

Design of Ontology Engine Architecture for L-V-C Integrating System

Design of Ontology Engine Architecture for L-V-C Integrating System , pp.225-230 http://dx.doi.org/10.14257/astl.2016.139.48 Design of Ontology Engine Architecture for L-V-C Integrating System Gap-Jun Son 1, Yun-Hee Son 2 and Kyu-Chul Lee * 1,2,* Department of Computer

More information

Batch Inherence of Map Reduce Framework

Batch Inherence of Map Reduce Framework Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 4, Issue. 6, June 2015, pg.287

More information

The Google File System. Alexandru Costan

The Google File System. Alexandru Costan 1 The Google File System Alexandru Costan Actions on Big Data 2 Storage Analysis Acquisition Handling the data stream Data structured unstructured semi-structured Results Transactions Outline File systems

More information

Analyzing and Improving Load Balancing Algorithm of MooseFS

Analyzing and Improving Load Balancing Algorithm of MooseFS , pp. 169-176 http://dx.doi.org/10.14257/ijgdc.2014.7.4.16 Analyzing and Improving Load Balancing Algorithm of MooseFS Zhang Baojun 1, Pan Ruifang 1 and Ye Fujun 2 1. New Media Institute, Zhejiang University

More information

A Robust Cloud-based Service Architecture for Multimedia Streaming Using Hadoop

A Robust Cloud-based Service Architecture for Multimedia Streaming Using Hadoop A Robust Cloud-based Service Architecture for Multimedia Streaming Using Hadoop Myoungjin Kim 1, Seungho Han 1, Jongjin Jung 3, Hanku Lee 1,2,*, Okkyung Choi 2 1 Department of Internet and Multimedia Engineering,

More information

A Personal Information Retrieval System in a Web Environment

A Personal Information Retrieval System in a Web Environment Vol.87 (Art, Culture, Game, Graphics, Broadcasting and Digital Contents 2015), pp.42-46 http://dx.doi.org/10.14257/astl.2015.87.10 A Personal Information Retrieval System in a Web Environment YoungDeok

More information

Google File System (GFS) and Hadoop Distributed File System (HDFS)

Google File System (GFS) and Hadoop Distributed File System (HDFS) Google File System (GFS) and Hadoop Distributed File System (HDFS) 1 Hadoop: Architectural Design Principles Linear scalability More nodes can do more work within the same time Linear on data size, linear

More information

MapReduce. U of Toronto, 2014

MapReduce. U of Toronto, 2014 MapReduce U of Toronto, 2014 http://www.google.org/flutrends/ca/ (2012) Average Searches Per Day: 5,134,000,000 2 Motivation Process lots of data Google processed about 24 petabytes of data per day in

More information

Distributed File Systems II

Distributed File Systems II Distributed File Systems II To do q Very-large scale: Google FS, Hadoop FS, BigTable q Next time: Naming things GFS A radically new environment NFS, etc. Independence Small Scale Variety of workloads Cooperation

More information

An Efficient Provable Data Possession Scheme based on Counting Bloom Filter for Dynamic Data in the Cloud Storage

An Efficient Provable Data Possession Scheme based on Counting Bloom Filter for Dynamic Data in the Cloud Storage , pp. 9-16 http://dx.doi.org/10.14257/ijmue.2016.11.4.02 An Efficient Provable Data Possession Scheme based on Counting Bloom Filter for Dynamic Data in the Cloud Storage Eunmi Jung 1 and Junho Jeong 2

More information

CLOUD-SCALE FILE SYSTEMS

CLOUD-SCALE FILE SYSTEMS Data Management in the Cloud CLOUD-SCALE FILE SYSTEMS 92 Google File System (GFS) Designing a file system for the Cloud design assumptions design choices Architecture GFS Master GFS Chunkservers GFS Clients

More information

CA485 Ray Walshe Google File System

CA485 Ray Walshe Google File System Google File System Overview Google File System is scalable, distributed file system on inexpensive commodity hardware that provides: Fault Tolerance File system runs on hundreds or thousands of storage

More information

The Design and Implementation of a BLE-based WebD2D Service for Android Smartphone

The Design and Implementation of a BLE-based WebD2D Service for Android Smartphone , pp.1-5 http://dx.doi.org/10.14257/astl.2017.146.01 The Design and Implementation of a BLE-based WebD2D Service for Android Smartphone Do-Hyung Kim 1, Seok-Jin Yoon 1, Hyung-Seok Lee 1 and Jae-Ho Lee

More information

BigData and Map Reduce VITMAC03

BigData and Map Reduce VITMAC03 BigData and Map Reduce VITMAC03 1 Motivation Process lots of data Google processed about 24 petabytes of data per day in 2009. A single machine cannot serve all the data You need a distributed system to

More information

A Simple Model for Estimating Power Consumption of a Multicore Server System

A Simple Model for Estimating Power Consumption of a Multicore Server System , pp.153-160 http://dx.doi.org/10.14257/ijmue.2014.9.2.15 A Simple Model for Estimating Power Consumption of a Multicore Server System Minjoong Kim, Yoondeok Ju, Jinseok Chae and Moonju Park School of

More information

A New Key-value Data Store For Heterogeneous Storage Architecture Intel APAC R&D Ltd.

A New Key-value Data Store For Heterogeneous Storage Architecture Intel APAC R&D Ltd. A New Key-value Data Store For Heterogeneous Storage Architecture Intel APAC R&D Ltd. 1 Agenda Introduction Background and Motivation Hybrid Key-Value Data Store Architecture Overview Design details Performance

More information

The Google File System

The Google File System October 13, 2010 Based on: S. Ghemawat, H. Gobioff, and S.-T. Leung: The Google file system, in Proceedings ACM SOSP 2003, Lake George, NY, USA, October 2003. 1 Assumptions Interface Architecture Single

More information

A Novel Model for Home Media Streaming Service in Cloud Computing Environment

A Novel Model for Home Media Streaming Service in Cloud Computing Environment , pp.265-274 http://dx.doi.org/10.14257/ijsh.2013.7.6.26 A Novel Model for Home Media Streaming Service in Cloud Computing Environment Yun Cui 1, Myoungjin Kim 1 and Hanku Lee1, 2,* 1 Department of Internet

More information

Building Ubiquitous Computing Environment Using the Web of Things Platform

Building Ubiquitous Computing Environment Using the Web of Things Platform , pp.105-109 http://dx.doi.org/10.14257/astl.2013 Building Ubiquitous Computing Environment Using the Web of Things Platform Woo-Chang Shin Dept. of Computer Science, at SeoKyeong University 16-1 Jungneung-Dong

More information

ABSTRACT I. INTRODUCTION

ABSTRACT I. INTRODUCTION International Journal of Scientific Research in Computer Science, Engineering and Information Technology 2018 IJSRCSEIT Volume 3 Issue 3 ISS: 2456-3307 Hadoop Periodic Jobs Using Data Blocks to Achieve

More information

Dell Technologies IoT Solution Surveillance with Genetec Security Center

Dell Technologies IoT Solution Surveillance with Genetec Security Center Dell Technologies IoT Solution Surveillance with Genetec Security Center Surveillance December 2018 H17435 Configuration Best Practices Abstract This guide is intended for internal Dell Technologies personnel

More information

Research on Implement Snapshot of pnfs Distributed File System

Research on Implement Snapshot of pnfs Distributed File System Applied Mathematics & Information Sciences An International Journal 2011 NSP 5 (2) (2011), 179S-185S Research on Implement Snapshot of pnfs Distributed File System Liu-Chao, Zhang-Jing Wang, Liu Zhenjun,

More information

-Presented By : Rajeshwari Chatterjee Professor-Andrey Shevel Course: Computing Clusters Grid and Clouds ITMO University, St.

-Presented By : Rajeshwari Chatterjee Professor-Andrey Shevel Course: Computing Clusters Grid and Clouds ITMO University, St. -Presented By : Rajeshwari Chatterjee Professor-Andrey Shevel Course: Computing Clusters Grid and Clouds ITMO University, St. Petersburg Introduction File System Enterprise Needs Gluster Revisited Ceph

More information

Write a technical report Present your results Write a workshop/conference paper (optional) Could be a real system, simulation and/or theoretical

Write a technical report Present your results Write a workshop/conference paper (optional) Could be a real system, simulation and/or theoretical Identify a problem Review approaches to the problem Propose a novel approach to the problem Define, design, prototype an implementation to evaluate your approach Could be a real system, simulation and/or

More information

MAPREDUCE FOR BIG DATA PROCESSING BASED ON NETWORK TRAFFIC PERFORMANCE Rajeshwari Adrakatti

MAPREDUCE FOR BIG DATA PROCESSING BASED ON NETWORK TRAFFIC PERFORMANCE Rajeshwari Adrakatti International Journal of Computer Engineering and Applications, ICCSTAR-2016, Special Issue, May.16 MAPREDUCE FOR BIG DATA PROCESSING BASED ON NETWORK TRAFFIC PERFORMANCE Rajeshwari Adrakatti 1 Department

More information

UK LUG 10 th July Lustre at Exascale. Eric Barton. CTO Whamcloud, Inc Whamcloud, Inc.

UK LUG 10 th July Lustre at Exascale. Eric Barton. CTO Whamcloud, Inc Whamcloud, Inc. UK LUG 10 th July 2012 Lustre at Exascale Eric Barton CTO Whamcloud, Inc. eeb@whamcloud.com Agenda Exascale I/O requirements Exascale I/O model 3 Lustre at Exascale - UK LUG 10th July 2012 Exascale I/O

More information

CPSC 426/526. Cloud Computing. Ennan Zhai. Computer Science Department Yale University

CPSC 426/526. Cloud Computing. Ennan Zhai. Computer Science Department Yale University CPSC 426/526 Cloud Computing Ennan Zhai Computer Science Department Yale University Recall: Lec-7 In the lec-7, I talked about: - P2P vs Enterprise control - Firewall - NATs - Software defined network

More information

Introduction The Project Lustre Architecture Performance Conclusion References. Lustre. Paul Bienkowski

Introduction The Project Lustre Architecture Performance Conclusion References. Lustre. Paul Bienkowski Lustre Paul Bienkowski 2bienkow@informatik.uni-hamburg.de Proseminar Ein-/Ausgabe - Stand der Wissenschaft 2013-06-03 1 / 34 Outline 1 Introduction 2 The Project Goals and Priorities History Who is involved?

More information

Data Centers and Cloud Computing

Data Centers and Cloud Computing Data Centers and Cloud Computing CS677 Guest Lecture Tim Wood 1 Data Centers Large server and storage farms 1000s of servers Many TBs or PBs of data Used by Enterprises for server applications Internet

More information

Fast Forward I/O & Storage

Fast Forward I/O & Storage Fast Forward I/O & Storage Eric Barton Lead Architect 1 Department of Energy - Fast Forward Challenge FastForward RFP provided US Government funding for exascale research and development Sponsored by 7

More information

Data Centers and Cloud Computing. Slides courtesy of Tim Wood

Data Centers and Cloud Computing. Slides courtesy of Tim Wood Data Centers and Cloud Computing Slides courtesy of Tim Wood 1 Data Centers Large server and storage farms 1000s of servers Many TBs or PBs of data Used by Enterprises for server applications Internet

More information

A Polygon Rendering Method with Precomputed Information

A Polygon Rendering Method with Precomputed Information A Polygon Rendering Method with Precomputed Information Seunghyun Park #1, Byoung-Woo Oh #2 # Department of Computer Engineering, Kumoh National Institute of Technology, Korea 1 seunghyunpark12@gmail.com

More information

Enosis: Bridging the Semantic Gap between

Enosis: Bridging the Semantic Gap between Enosis: Bridging the Semantic Gap between File-based and Object-based Data Models Anthony Kougkas - akougkas@hawk.iit.edu, Hariharan Devarajan, Xian-He Sun Outline Introduction Background Approach Evaluation

More information

Data Centers and Cloud Computing. Data Centers

Data Centers and Cloud Computing. Data Centers Data Centers and Cloud Computing Slides courtesy of Tim Wood 1 Data Centers Large server and storage farms 1000s of servers Many TBs or PBs of data Used by Enterprises for server applications Internet

More information

4th National Conference on Electrical, Electronics and Computer Engineering (NCEECE 2015)

4th National Conference on Electrical, Electronics and Computer Engineering (NCEECE 2015) 4th National Conference on Electrical, Electronics and Computer Engineering (NCEECE 2015) Benchmark Testing for Transwarp Inceptor A big data analysis system based on in-memory computing Mingang Chen1,2,a,

More information

Topics. Big Data Analytics What is and Why Hadoop? Comparison to other technologies Hadoop architecture Hadoop ecosystem Hadoop usage examples

Topics. Big Data Analytics What is and Why Hadoop? Comparison to other technologies Hadoop architecture Hadoop ecosystem Hadoop usage examples Hadoop Introduction 1 Topics Big Data Analytics What is and Why Hadoop? Comparison to other technologies Hadoop architecture Hadoop ecosystem Hadoop usage examples 2 Big Data Analytics What is Big Data?

More information

Designing Next Generation FS for NVMe and NVMe-oF

Designing Next Generation FS for NVMe and NVMe-oF Designing Next Generation FS for NVMe and NVMe-oF Liran Zvibel CTO, Co-founder Weka.IO @liranzvibel Santa Clara, CA 1 Designing Next Generation FS for NVMe and NVMe-oF Liran Zvibel CTO, Co-founder Weka.IO

More information

Supporting Collaborative 3D Editing over Cloud Storage

Supporting Collaborative 3D Editing over Cloud Storage , pp.33-37 http://dx.doi.org/10.14257/astl.2015.107.09 Supporting Collaborative 3D Editing over Cloud Storage Yeoun-Ui Ha 1, Jae-Hwan Jin 2, Myung-Joon Lee 3 Department of Electrical/Electronic and Computer

More information

Design of Self-Adaptive System Observation over Internet of Things

Design of Self-Adaptive System Observation over Internet of Things , pp.165-171 http://dx.doi.org/10.14257/astl.2015.117.39 Design of Self-Adaptive System Observation over Internet of Things Young-Joo Kim 1, Jong-Soo Seok 1, Moon Soo Lee 1, Jeong-Si Kim 1, and YungJoon

More information

PLATFORM AND SOFTWARE AS A SERVICE THE MAPREDUCE PROGRAMMING MODEL AND IMPLEMENTATIONS

PLATFORM AND SOFTWARE AS A SERVICE THE MAPREDUCE PROGRAMMING MODEL AND IMPLEMENTATIONS PLATFORM AND SOFTWARE AS A SERVICE THE MAPREDUCE PROGRAMMING MODEL AND IMPLEMENTATIONS By HAI JIN, SHADI IBRAHIM, LI QI, HAIJUN CAO, SONG WU and XUANHUA SHI Prepared by: Dr. Faramarz Safi Islamic Azad

More information
