Research on Mass Image Storage Platform Based on Cloud Computing

6th International Conference on Sensor Network and Computer Engineering (ICSNCE 2016)

Research on Mass Image Storage Platform Based on Cloud Computing

Xiaoqing Zhou1, a, *, Jiaxiu Sun2, b and Zhiyong Zhou1, c
1 Center of Computing, China West Normal University, Nanchong, China
2 College of Business, China West Normal University, Nanchong, China
a 369656820@qq.com, b zhousun123@163.com, c 449735106@qq.com
*The corresponding author

Keywords: Massive images; Cloud computing; Distributed computing; Storage framework

Abstract. With the rapid development and wide application of the Internet, the storage of massive image data has become a pressing problem: traditional storage architectures suffer from low management efficiency, limited storage capacity and high cost. Hadoop offers a new approach, but Hadoop itself is not well suited to handling small files. This paper proposes a Hadoop-based storage framework for massive image files and relieves the NameNode memory bottleneck caused by an excessive number of small files through a classification algorithm in a preprocessing module and the introduction of an efficient first-level index mechanism. Tests show that the system is safe, easy to maintain and scales well, and therefore achieves good results.

Introduction

With the rapid development and wide application of the Internet, ever more large portal sites, e-business websites and online communities have appeared. These websites hold large picture collections whose size grows quickly. Traditional technical architectures cannot cope with massive images, so how to build a cheap and efficient picture storage management system that still satisfies highly concurrent access has become a problem to be solved [1]. Based on an analysis of Hadoop's HDFS, a study of MapReduce and an analysis of the business requirements of picture storage, this paper proposes a massive image storage module based on Hadoop [2, 3, 4].
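The small-file strategy just described, merging correlated images into larger container files and resolving each picture through a flat first-level index, can be sketched as follows. This is an illustrative Python sketch: the container format and the function names are assumptions for explanation, not Hadoop's actual SequenceFile API.

```python
# Illustrative sketch: pack many small image files into one container file
# (standing in for a Hadoop SequenceFile) and keep a flat first-level index
# mapping image name -> (offset, length) inside the container, so a picture
# is located with a single lookup and one ranged read.

import io

def pack(images):
    """Concatenate small files into one container; return (bytes, index)."""
    buf = io.BytesIO()
    index = {}
    for name, data in images.items():
        index[name] = (buf.tell(), len(data))  # first-level index entry
        buf.write(data)
    return buf.getvalue(), index

def read_image(container, index, name):
    """Locate a picture with one index lookup and one ranged read."""
    offset, length = index[name]
    return container[offset:offset + length]

container, index = pack({"a.jpg": b"AAAA", "b.jpg": b"BBBBBB", "c.jpg": b"CC"})
```

With thousands of images per container, HDFS tracks one file instead of thousands, which is what relieves the NameNode's memory.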
The system has a Master/Slave architecture and is built on Hadoop's HDFS distributed file system; its hardware runs on clusters of commodity Linux machines. High fault tolerance, fast response and load balancing are achieved through internal monitoring while services are provided externally; moreover, the system supports highly concurrent and reliable applications.

Overall Platform Design

Framework and Structure of the Overall Platform. The system adopts an HA architecture and smooth expansion, which ensure the availability and scalability of the whole file system. It also uses a flat data organization with an efficient, extensible first-level index mechanism, so the SequenceFile holding a picture and the picture's offset within it can be located quickly. Reliable and stable storage on every node is achieved through load balancing and the design of the cache system. Considering the heterogeneous structure, distribution and diversity of massive image data, as well as the organization of the system, the platform employs a three-layer MVC design; as a result the structure is clearer and the system is easier to extend. See Fig. 1 for the whole framework of the system.

Analysis of the Platform Layers

Data Resource Layer. The data resource layer is the basis of the whole platform and the fundamental part of cloud storage. Storage devices can be FC fibre-channel devices, IP devices such as NAS and iSCSI, or DAS devices such as SCSI or SAS [5].

2016. The authors. Published by Atlantis Press.

Cloud storage devices are usually numerous and distributed across different geographical areas, linked to each other through a wide area network, the Internet or an FC fibre-channel network. On top of the storage devices runs a unified storage management system that provides logical virtualization of the devices, multi-link redundancy management, and status monitoring and fault maintenance of the hardware.

Figure 1. The overall framework of the platform structure (application interface layer with request routing; business logic layer with business center servers; data access connections; data resource layer with storage nodes)

Business Logic Layer. The business logic layer processes massive image data in parallel and manages the configuration of the entire platform. It is the most important and most technically demanding part of the cloud storage system. Its main functions are to coordinate the multiple underlying storage devices, to provide a uniform API to the upper application services, and to shield the lower storage devices from them. Implementing this management layer requires distributed file systems, grid computing and cluster technology, that is, the realization of cloud computing. From the angle of service provision, this layer is also responsible for managing the data, including data encryption, data backup and disaster tolerance; this secures the data, keeps the upper business stable, and thus ensures the security and stability of the entire system.

Application Interface Layer. The application interface layer is the flexible, changeable part of cloud storage: it provides convenient, easy-to-use and friendly interfaces to users. Each system offers its own services according to its own business, which is exactly where the advantage of cloud storage lies.

Stratified Platform Module Structure Analysis.
Considering the system's functions, the whole system can be divided into the data resource layer, the business logic layer and the application interface layer, each composed of several modules.

Data Resource Layer Module Structure. At the bottom is the data storage layer, consisting mainly of the storage module. For massive data storage, the different data sources provided by different databases must be shielded by the storage module, which provides the database access services; in this way the system meets the requirements of massive data storage and gains the extendibility and completeness that make it easy to manage and deploy. A large number of low-cost machines are combined into a cluster through Hadoop's HDFS, which provides massive storage capacity [6].

Business Logic Layer Module Structure. The business logic layer is the core of the whole system and the key element of its design and development. It uses distributed database technology and Linux cluster technology to provide the main function of parallel loading and storage of massive data. The processed data is stored in the system's distributed database; besides the massive parallel processing of data, this layer also provides management support services that keep the system running normally. It consists of five functional modules.

Image File Pre-Processing Module. The image file pre-processing module preprocesses the image files, designs the file names and manages the picture metadata. Using a classification algorithm, it merges strongly correlated files into SequenceFiles, which greatly reduces the number of files in HDFS, and the design of the image file names lets images be read easily. In Hadoop, metadata is normally just simple block information kept in the NameNode's memory, but here every stored picture carries its own metadata: picture name, description, upload time, uploader and other information [7]. Against a background of massive numbers of pictures, this information can reach several gigabytes. Since it cannot be kept in the NameNode's memory, we use Hadoop's HBase to store the picture metadata [8].

Storage Control Module. The storage control module provides a web interface and unified commands for storage management, so the nodes of the storage layer can be managed through the web. Hadoop opens port 50070 for viewing NameNode information; through this interface you can see the currently healthy nodes, the failed nodes and the NameNode log.

Caching Service Module. The caching service module builds the cache buffer: client requests are first screened by the cache. We use Redis to build the caching system. Redis is similar to the traditional Memcached; both are <key/value> storage systems, but Redis supports more value types, including String, List, Set and ZSet. These data types support push/pop, add/remove, intersection, union, difference and other rich operations, and the operations are atomic [9]. On this basis, Redis supports several ways of sorting and master-slave synchronization. If the load is large, a single Redis instance cannot meet the business needs, so Redis can be clustered and multiple instances established. Because Redis supports a master-slave structure, one instance can be made responsible for writing cache data while several other instances serve cache reads.

Business Processing Module. The business processing module handles image processing tasks. Pictures uploaded to Internet applications usually need processing, such as generating thumbnails for uploaded pictures or segmenting uploaded portraits. With bulk uploads, if picture processing is concentrated on the application server it occupies the server's CPU and degrades the quality of the application service. The traditional approach is to dedicate a separate machine to image processing, which decouples it from the application. Since the emergence of MapReduce, however, pictures can be processed with MapReduce: the processing is distributed across the cheap machines that store the images, and results are stored directly after processing, which not only saves hardware resources but also spreads the processing load over the nodes.

Load Balancing Module. The load balancing module keeps the system stable under highly concurrent use. We deploy load balancing for the entire system using HAProxy's round-robin algorithm, dividing the pressure of front-end user requests among the web image servers [10]. HAProxy provides high availability, load balancing and proxying for TCP and HTTP-based applications; it is a free, fast and reliable solution. Requests are sent to different servers according to whether they read or write.
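The routing idea above, read/write separation with reads spread round-robin over the image servers, can be sketched as follows. The server names and the routing function are illustrative assumptions, not HAProxy configuration; HAProxy implements this with its roundrobin balance algorithm.

```python
# Sketch of the load balancing module's request routing: writes go to a
# dedicated write path, reads are distributed round-robin over the pool of
# web image servers. All names below are illustrative.

import itertools

READ_SERVERS = ["img-web-1", "img-web-2", "img-web-3"]  # assumed read pool
WRITE_SERVER = "img-write-1"                            # assumed write path
_rr = itertools.cycle(READ_SERVERS)                     # round-robin cursor

def route(request):
    """Pick a back-end server for one request dict with a 'method' key."""
    if request["method"] == "GET":     # reads: next server in the cycle
        return next(_rr)
    return WRITE_SERVER                # writes: always the write path

first = route({"method": "GET", "path": "/img/1.jpg"})
second = route({"method": "GET", "path": "/img/2.jpg"})
```

Successive reads land on successive servers, so request pressure is divided evenly, which is the behavior the module relies on.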
Picture read requests are sent to the image server, which reads the picture metadata through the cache on the one hand and locates the image through the NameNode on the other.

Application Interface Layer Structure. This layer consists of a user-oriented GUI interface module and a computation-oriented API interface module.

GUI Interface Module. This module faces users and provides them with various kinds of operational tools, so that users have easy access to the massive data storage and processing.
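The read path described above, cache first and storage layer only on a miss, is the classic cache-aside pattern. The sketch below uses plain dicts as stand-ins for Redis and the HDFS storage layer; the names are illustrative.

```python
# Sketch of the cache-aside read path: try the cache, fall through to the
# storage layer on a miss, and fill the cache so the next request is served
# from memory. Dicts stand in for Redis and HDFS.

storage = {"photo1.jpg": b"<jpeg bytes>"}   # stand-in for the HDFS layer
cache = {}                                   # stand-in for Redis
misses = 0

def get_image(name):
    """Cache-aside read: hit -> cache; miss -> storage, then fill cache."""
    global misses
    if name in cache:
        return cache[name]                   # cache hit: no storage IO
    misses += 1
    data = storage[name]                     # miss: one read from storage
    cache[name] = data                       # populate cache for next time
    return data

get_image("photo1.jpg")   # first request misses and fills the cache
get_image("photo1.jpg")   # second request is served from the cache
```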

API Interface Module. For advanced users with higher and more specific requirements, we provide an API for building application systems and assigning computing and storage tasks, so that the required application functions can be realized.

System Realization and Result Analysis

Software Preparation. Operating system: Ubuntu 9.04. Distributed file system: Hadoop 0.20.2; running Hadoop requires a JDK environment, and this paper uses JDK 1.6.0_31. The cluster consists of one NameNode server, one JobTracker server and four DataNode servers. Image server: Nginx 0.9.6. Caching software: Redis. Load balancing software: HAProxy. Java environment: JRE 6. Java development tool: Eclipse 3.2.

Research Result Analysis

Cluster Safety Test. For the safety test of the NameNode, monitoring between the NameNode and the standby NameNode2 is realized through the Heartbeat agent. When we cut the network connection of the NameNode, Heartbeat quickly binds the VIP to the address of NameNode2. Checks show the service keeps operating normally, so the research aim of the cluster safety test is achieved.

Picture Storage Joint Test. We first had the application servers issue concurrent store and fetch operations. Pictures were chosen randomly for analysis; their sizes lie between 2 KB and 10 MB, and the read results are shown in Fig. 2.

Figure 2. Image read analysis. Figure 3. Image write analysis

In the storage system built on Hadoop, read TPS (transactions per second) increases with the number of threads, but the growth gradually slows; the first peak is reached at about 70 threads, after which adding read threads no longer increases TPS stably.
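Curves like those in Fig. 2 are obtained by running an increasing number of concurrent reader threads and counting completed operations per second. A minimal sketch of that measurement method is below; an in-memory dict stands in for the real image cluster, so absolute numbers from this sketch are meaningless, only the method mirrors the test.

```python
# Sketch of a read-TPS measurement: n_threads concurrent readers run for a
# fixed interval and the completed operations are divided by the elapsed
# time. The in-memory store is a stand-in for the image cluster.

import threading
import time

def measure_tps(read_one, n_threads, duration=0.2):
    """Run n_threads readers for `duration` seconds; return operations/sec."""
    counts = [0] * n_threads
    stop = time.monotonic() + duration
    def worker(i):
        while time.monotonic() < stop:
            read_one()                       # one read operation
            counts[i] += 1
    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(counts) / duration

store = {"p.jpg": b"x" * 2048}               # one 2 KB stand-in picture
tps = measure_tps(lambda: store["p.jpg"], n_threads=4)
```

Sweeping n_threads and plotting the returned TPS reproduces the shape of the curves discussed above.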
Traditional NFS storage systems require multiple IO operations per picture read, so the system reaches its load limit at about 40 threads. The new Hadoop-based storage system reduces the number of IO operations, so under the same machine configuration it can sustain more concurrent read operations. We then analyzed random picture writes. Picture sizes again range from 2 KB to 10 MB, covering both small and large pictures. After concurrent client requests, HDFS stores the pictures; the results are shown in Fig. 3. Write TPS peaks at about 60 threads; beyond that, adding write threads no longer increases TPS stably. The write performance of the new Hadoop storage system is superior to that of NFS, which ensures the high throughput of the system.
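The effect of the IO reduction can be illustrated with a made-up back-of-envelope model: if every read costs a fixed number of disk IOs and the nodes sustain a fixed IO budget per second, cutting the IOs per read directly raises the achievable read TPS. The budget and IO counts below are assumptions for illustration, not the paper's measurements.

```python
# Made-up throughput model of the NFS comparison: read TPS is bounded by the
# aggregate disk IO budget divided by the IO cost of one read.

DISK_IOPS_BUDGET = 2000        # assumed aggregate IOs/second the nodes sustain

def max_read_tps(ios_per_read):
    """Upper bound on read TPS given the per-read IO cost."""
    return DISK_IOPS_BUDGET / ios_per_read

nfs_tps = max_read_tps(4)      # NFS-style path: several IOs per picture read
hdfs_tps = max_read_tps(1)     # packed storage: one indexed read per picture
```

Quartering the per-read IO cost quadruples the ceiling, which is the qualitative effect seen in the read curves: the packed Hadoop store keeps scaling after the NFS system has hit its limit.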

By analyzing the read and write curves under multi-thread concurrency, one can see that the new Hadoop-based storage system optimizes the storage procedure, reduces the number of IO operations and improves the carrying capacity of the system; overall it outperforms NFS systems. Especially in the face of highly concurrent reads and writes, the new HDFS storage system guarantees the stability of the system and the security of the data. We can therefore conclude that the new Hadoop storage system is able to solve the problem of massive storage.

Summary

This paper designs and develops a Hadoop-based storage platform for massive pictures and data, adopting Linux cluster technology and parallel distributed database technology. It employs a series of picture and data management techniques, such as the HDFS distributed file system, the Map/Reduce parallel computation model and the HBase database, and also uses Redis and HAProxy to build the cache and the load balancing, so that the whole system reaches a stable and healthy condition. Built on many cheap commodity computers, the platform can meet the requirement of efficient picture and data management. The realized platform modules show that the system scales well and is easy to maintain, so the technical route and design method adopted by the system are effective and feasible.

References

[1] X.Y. Zhao, Y. Yang, L.L. Sun and Y. Chen: Journal of Computer Applications, Hadoop-based storage architecture for mass MP3 files, Vol. 32 (2012) No. 6, p. 1724.
[2] L. Lin: Design and Analysis of the Mass Image Storage Model Based on Hadoop (MS., Shandong University of Technology, China 2011, 11), p. 56.
[3] J. Cui, T.S. Li and H.X. Lan: Journal of Computer Research and Development, Design and Development of the Mass Data Storage Platform Based on Hadoop, Vol. 49 (2012) No. 7, p. 12.
[4] J. Liu: Computer & Telecommunication, Design and Research of the Mass Teaching Resources Storage Platform Based on Hadoop, Vol. 33 (2013) No. 5, p. 07.
[5] J.H. Tai: Research of Mass Data Storage Technology Based on Hadoop Platform (MS., Wuhan University of Technology, China 2012, 11), p. 06.
[6] Tom White: Hadoop: The Definitive Guide (O'Reilly Press, America 2009).
[7] J.X. Zhang, Z.M. Gu and C. Zheng: Application Research of Computers, Survey of research progress on cloud computing, Vol. 27 (2010) No. 2, p. 429.
[8] Y.J. Zhao: Intelligent Computer and Applications, Study on Data Storage Structures based on Cloud Computing, Vol. 4 (2014) No. 4, p. 90.
[9] Y. Chen: Computer Measurement & Control, High Concurrent Access Optimization in HDFS Based on Cache for Mass Images, Vol. 22 (2015) No. 8, p. 2669.
[10] Y.J. Tu, Y. Bai: Computer Measurement & Control, Design of Cloud Computing Safe Storage Strategy Based on Hadoop and Double Key, Vol. 22 (2014) No. 8, p. 2629.