New research on Key Technologies of unstructured data cloud storage

Similar documents
Huge Data Analysis and Processing Platform based on Hadoop Yuanbin LI1, a, Rong CHEN2

The Establishment of Large Data Mining Platform Based on Cloud Computing. Wei CAI

An Indian Journal FULL PAPER ABSTRACT KEYWORDS. Trade Science Inc. The study on magnanimous data-storage system based on cloud computing

Crop Production Management Information System Design and Implementation

4th National Conference on Electrical, Electronics and Computer Engineering (NCEECE 2015)

Construction Scheme for Cloud Platform of NSFC Information System

Construction and Application of Cloud Data Center in University

The Design of Distributed File System Based on HDFS Yannan Wang 1, a, Shudong Zhang 2, b, Hui Liu 3, c

Research and Design of Education and Teaching Resource Management System based on ASP.NET Technology

Research on Digital Library Platform Based on Cloud Computing

Research on the Interoperability Architecture of the Digital Library Grid

High Performance Computing on MapReduce Programming Framework

Research on 3G Terminal-Based Agricultural Information Service

IBM Spectrum NAS, IBM Spectrum Scale and IBM Cloud Object Storage

A Fast and High Throughput SQL Query System for Big Data

Every SAS Cloud has a Silver Lining. Letting your data reign in the cloud

DDSF: A Data Deduplication System Framework for Cloud Environments

The Research and Design of the Application Domain Building Based on GridGIS

SQL Query Optimization on Cross Nodes for Distributed System

The power quality intelligent monitoring system based on cloud computing Jie Bai 1a, Changpo Song 2b

Data safety for digital business. Veritas Backup Exec WHITE PAPER. One solution for hybrid, physical, and virtual environments.

When, Where & Why to Use NoSQL?

Embedded Technosolutions

The Design and Implementation of Disaster Recovery in Dual-active Cloud Center

Bringing OpenStack to the Enterprise. An enterprise-class solution ensures you get the required performance, reliability, and security

Next-generation IT Platforms Delivering New Value through Accumulation and Utilization of Big Data

THE GEOSPATIAL DATA CLOUD: AN IMPLEMENTATION OF APPLYING CLOUD COMPUTING IN GEOSCIENCES*

Grid Computing with Voyager

Design of Physical Education Management System Guoquan Zhang

Research and Design of Data Storage Scheme for Electric Power Big Data

Design on Office Automation System based on Domino/Notes Lijun Wang1,a, Jiahui Wang2,b

Design and Implementation of High-Speed Real-Time Data Acquisition and Processing System based on FPGA

Information Retrieval System Based on Context-aware in Internet of Things. Ma Junhong 1, a *

Addressing Data Management and IT Infrastructure Challenges in a SharePoint Environment. By Michael Noel

An Indian Journal FULL PAPER. Trade Science Inc. Research on data mining clustering algorithm in cloud computing environments ABSTRACT KEYWORDS

Research Article Mobile Storage and Search Engine of Information Oriented to Food Cloud

Part 1: Indexes for Big Data

A Study on Load Balancing Techniques for Task Allocation in Big Data Processing* Jin Xiaohong1,a, Li Hui1, b, Liu Yanjun1, c, Fan Yanfang1, d

Upgrade Your MuleESB with Solace s Messaging Infrastructure

Ubiquitous and Mobile Computing CS 525M: Virtually Unifying Personal Storage for Fast and Pervasive Data Accesses

Linux Automation.

Processing Technology of Massive Human Health Data Based on Hadoop

FROM A RIGID ECOSYSTEM TO A LOGICAL AND FLEXIBLE ENTITY: THE SOFTWARE- DEFINED DATA CENTRE

Study on XML-based Heterogeneous Agriculture Database Sharing Platform

Was ist dran an einer spezialisierten Data Warehousing platform?

Bigtable: A Distributed Storage System for Structured Data By Fay Chang, et al. OSDI Presented by Xiang Gao

A priority based dynamic bandwidth scheduling in SDN networks 1

Introduction to the Active Everywhere Database

Optimizing and Managing File Storage in Windows Environments

Lecture 9: MIMD Architecture

A SURVEY ON SCHEDULING IN HADOOP FOR BIGDATA PROCESSING

Global Journal of Engineering Science and Research Management

A Brief Discussion on the Research and Implementation of Enterprise-level Unstructured Data Management Platform

The Nasuni Security Model

Design and Implementation of Remote Push System of Resources Based on Internet

NetApp Clustered Data ONTAP 8.2 Storage QoS Date: June 2013 Author: Tony Palmer, Senior Lab Analyst

Process Solutions. Uniformance PHD. Product Information Note

OPENSTACK PRIVATE CLOUD WITH GITHUB

SDS: A Scalable Data Services System in Data Grid

Europeana DSI 2 Access to Digital Resources of European Heritage

Optimizing Parallel Access to the BaBar Database System Using CORBA Servers

Open Access Apriori Algorithm Research Based on Map-Reduce in Cloud Computing Environments

The Analysis Research of Hierarchical Storage System Based on Hadoop Framework Yan LIU 1, a, Tianjian ZHENG 1, Mingjiang LI 1, Jinpeng YUAN 1

Designing Database Solutions for Microsoft SQL Server 2012

Dynamic Data Placement Strategy in MapReduce-styled Data Processing Platform Hua-Ci WANG 1,a,*, Cai CHEN 2,b,*, Yi LIANG 3,c

Enabling Seamless Sharing of Data among Organizations Using the DaaS Model in a Cloud

Unstructured Data Migration and Dump Technology of Large-scale Enterprises

SEMANTIC WEB POWERED PORTAL INFRASTRUCTURE

Start Now with Information Governance

Yunfeng Zhang 1, Huan Wang 2, Jie Zhu 1 1 Computer Science & Engineering Department, North China Institute of Aerospace

Lecture 9: MIMD Architectures

The Construction of Open Source Cloud Storage System for Digital Resources

A New Model of Search Engine based on Cloud Computing

Market Report. Scale-out 2.0: Simple, Scalable, Services- Oriented Storage. Scale-out Storage Meets the Enterprise. June 2010.

Research on Approach of Equipment Status and Operation Information Acquisition Based on Equipment Control Bus

SAP IQ Software16, Edge Edition. The Affordable High Performance Analytical Database Engine

Lecture 9: MIMD Architectures

Research and Application of Mobile Geographic Information Service Technology Based on JSP Chengtong GUO1, a, Yan YAO1,b

Four Steps to Unleashing The Full Potential of Your Database

Construction of SSI Framework Based on MVC Software Design Model Yongchang Rena, Yongzhe Mab

Improvements and Implementation of Hierarchical Clustering based on Hadoop Jun Zhang1, a, Chunxiao Fan1, Yuexin Wu2,b, Ao Xiao1

Research on the Application of Bank Transaction Data Stream Storage based on HBase Xiaoguo Wang*, Yuxiang Liu and Lin Zhang

High Availability Server for IIoT Applications

Parallel Geospatial Data Management for Multi-Scale Environmental Data Analysis on GPUs DOE Visiting Faculty Program Project Report

Cloud Services. Introduction

Sony Adopts Cisco Solution for Global IPv6 Project

PARALLEL PROGRAM EXECUTION SUPPORT IN THE JGRID SYSTEM

Protecting Mission-Critical Application Environments The Top 5 Challenges and Solutions for Backup and Recovery

STATE OF MODERN APPLICATIONS IN THE CLOUD

GPU ACCELERATED DATABASE MANAGEMENT SYSTEMS

Industrial system integration experts with combined 100+ years of experience in software development, integration and large project execution

A High-Performance Storage and Ultra- High-Speed File Transfer Solution for Collaborative Life Sciences Research

CA485 Ray Walshe Google File System

ADAPTIVE HANDLING OF 3V S OF BIG DATA TO IMPROVE EFFICIENCY USING HETEROGENEOUS CLUSTERS

Low Friction Data Warehousing WITH PERSPECTIVE ILM DATA GOVERNOR

Application of Redundant Backup Technology in Network Security

Developing Microsoft Azure Solutions (70-532) Syllabus

Decentralized Distributed Storage System for Big Data

Chelsio Communications. Meeting Today s Datacenter Challenges. Produced by Tabor Custom Publishing in conjunction with: CUSTOM PUBLISHING

Computer Life (CPL) ISSN: Simulation and Implementation of Cloud Computing Based on CloudSim

Transcription:

2017 International Conference on Computing, Communications and Automation(I3CA 2017) New research on Key Technologies of unstructured data cloud storage Songqi Peng, Rengkui Liua, *, Futian Wang State Key Lab Of Rail Traffic Control & Safety,Beijing Jiaotong University, Beijing 100044, China a rkliu@bjtu.edu.cn * Corresponding author Keyword: Unstructured Data; Cloud Storage; Database Abstract: From the traditional to today's data network text files, pictures, mainstream audio and video, the Internet is gradually changing the data structure from unstructured data, which is unstructured data growth and a variety of network data storage has brought new challenges. In this paper, various solutions of massive non structured data storage problems, summarize the key problems to realize unified storage of unstructured data, the design and implementation of an unstructured data using the data storage function of unified batch processing framework, solve all kinds of problems of the non uniform node data processing. Introduction With the rapid development of Internet, the relationship between enterprises and the Internet is more and more close. Many information flows through the Internet, which makes the data on the Internet now reach an unpredictable level. Maintenance information needs a lot of manpower, technology and other valuable resources. These data are filled on the Internet, the vast majority of them have their own different formats of documents, pictures and videos and other unstructured data [1-2]. The of unstructured data is considered to be a major problem in today's Internet technology, because the past can effectively structure data tools and techniques for unstructured data and therefore not applicable. Many commercial applications have proved that the traditional relational database can manage structured data, but in recent years many rely on unstructured data network applications, network media development spawned in non relational database structure, exposed more and more obvious limitations of the data, in particular the performance and reliability of the rapid expansion of the problem unstructured data show 3]. This paper studies the solution of various kinds of massive unstructured data storage, analyzes all the problems existing in the storage system, and summarizes the key issues to achieve the unified storage of unstructured data. Then, with a massive, heterogeneous and unstructured data association and other features for the storage problem, put forward through the unified storage platform to solve the metadata of unstructured data, unified data interface, consistency and key issues of heterogeneous storage and data availability, high integrated, and other types of storage facilities. And a mixture of various types of data storage problem selection mechanism to effectively through heterogeneous storage devices. At the same time, based on the unified storage platform, an unstructured data batch framework with unified data storage function is designed and implemented, which solves the problem of unified processing of heterogeneous data types. Cloud storage technology Cloud storage is mainly used to store large amounts of data to actively solve problems. It can not only provide specialized storage solutions, but also publish storage business separately. Cloud storage is an application model based on Web, which has the characteristics of low cost and extensibility. It is a service concept, not real memory nor specific device. Using connectivity to the Internet, users enjoy the ability to share the storage pool with shared cloud storage. Users do not Copyright (2017) Francis Academic Press, UK 108

need to know the contents of the system, do not need to know how to store, it is transparent to all equipment users, at any time and space authorized users can use the network connection to use cloud storage, cloud services 4-5. The cloud storage data architecture model shown in Figure 1. Figure 1. The architecture of cloud data storage service. With the rapid development of modern network information technology, the information data has increased exponentially, formed in the era of big data, user generated data stored in the user data stored in the cloud environment has put forward higher requirements that need to be addressed: (1) efficient mass data storage and access requirements, users appear to hundreds of millions of monthly the dynamic query SQL data record, billions of dollars in relational database is inefficient, in the era of big data, the urgent need to solve the problem of data storage and efficient access to large amounts of data; positive development; (2) high concurrent read and write the database, the Internet network, the key to the user as the center, according to the personalized information the user needs to generate dynamic pages and information, such as the current micro Bo, this form of high concurrent access load data, usually form each Tens of thousands of seconds to read and write requirements; (3) high availability and scalability of the database requirements, system structure based on Web, it is very difficult to extend the database, when the user access to the database server is increasing rapidly, not simply the use of hardware and service node scalability and load balancing. Provide maintenance, upgrade and migration form stop uninterrupted data for web service requirements, will reduce the user experience: C4) support for unstructured data processing needs of the port, the relational database greatly limits the data processing and data types, various types of data can not be achieved in the future in the user requirements. Unstructured data cloud storage hierarchy The storage and use of unstructured data is very common. Many systems have to upload attachments, pictures, press releases and document functions. At present, however, most implementations are stored by creating a writable directory on the server. Unstructured data is often larger, requiring more bandwidth and a certain computing power of the server, which has some impact on some of the server's high performance requirements. Server cluster synchronization. As applications require large-scale cluster support, the traditional approach will face more challenges. In order to synchronize data between nodes in each server, we need some similar network storage techniques to solve them. Many servers are uploaded to the server side of the Troy Trojan program invasion, most of these implementations because of vulnerability generated by uploading files. The storage requirements of the traditional file system for unstructured data must be the directory of the filesystem, which is 6-7 writable. Cloud storage is not required, and similar functions can be implemented using other methods, but technically advanced cloud storage has some advantages. Cloud storage is stored and read in the form of object storage, which is responsible for the actual content of the document. The high scalability, massive, high reliability and repeated file merging of cloud storage will help to improve the quality of storage service. The unstructured data cloud storage architecture is based on this design, as shown in Figure 2. 109

Unstructured data Video monitoring and data storage Multi-user online storage service Enterprise data services Software download service Unstructured data access cluster Application layer User Access authorization Security policy Session layer BLOB data Meta data Data layer Routing layer Relational database storage cluster Physical layer Figure 2. Cloud storage tier architecture of unstructured data The application layer provides unstructured data application interface, the interface is composed of various types of data storage service providers to develop the storage applications, such as online storage, network drives, video, data hosting and software download service. At this point, users face a virtual, unlimited capacity cloud storage space without taking into account the physical location of the storage space and data when the user submits the data. The session layer is responsible for user, authority allocation, spatial allocation, and storage security policies. The layer relies on the security level and develops different security programs to ensure data security. The role of data layer is the unified of unstructured data and metadata. Unstructured data ranges from MB to GB, and size and metadata information, such as data identifiers, file length, type and other attribute information, the total length of not more than 1 KB, the difference in the amount of data between the two. Therefore, different data and metadata need to be stored on the network bandwidth, and computing resources should be used for different types of data storage strategies. Thus, figure 1 will be broken down into a data layer, a service data store, and a metadata store. The routing layer is responsible for cloud nodes, interoperability and storage path access interfaces and back-end storage device computing. The physical layer of unstructured data storage provides storage space and computing resources, and is responsible for maintaining physical path storage nodes. The purpose of this system is to make full use of the existing communication subnet and equipment without adding hardware input. Unstructured data cloud storage system structure design In order to realize the effective of unstructured data, many companies and individuals at home and abroad have done a lot of research. The most important is divided into two categories: one is to convert the unstructured data based on the technology of semi structured data; the other is the conversion of unstructured data to structured data, the final data will be stored in a relational database. Unstructured to structured data conversion mostly adopts unstructured data, structured data and semi-structured data. Therefore, through the storage and of relational database, the data structure is obtained. According to the requirements of the project, "structured data, unstructured data, semistructured data conversion method and gradually enlarge the application on the basis of the data structure of file metadata extraction concept structure standard to realize the conversion function of the template file name save the converted files, create a document template, unstructured data file table and structured data association, as shown in Figure 3 shows. The system is composed of database, file system, template library, file format definition module, metadata extraction module, template creation and module, middle module, data representation and data conversion module. The whole 110

system is divided into three layers: interface application layer, application logic layer and data storage layer.. File system (client) FTP server Application layer interface Local host Access the file File format definition Metadata extraction Templates create and manage Save template Application logic layer Intermediate data representation Template library XML data conversion Template information Create table structure Document template table Document the results table Data storage layer Figure 3. Unstructured data cloud storage system structure The application layer interface provides the user interface graphical data interface of the application program, the user can use the structured unstructured data conversion operation, regardless of the specific data conversion. The program logic layer is composed of five functional modules of the system structure, with emphasis on the implementation of the business logic structure to the unstructured data conversion system. The application layer interface client file system in the acquisition of analog output file, make a request for data conversion, then the application receives a request sent by the client, will need to convert the file transfer to the data conversion module. After the module receives the file, it determines which program to convert according to the type of file. Then, the work of the five functional modules, file metadata extraction, set up the appropriate document template, and then realize unstructured to semi-structured data conversion, the processed data is written in the database table simulation results. Then, the application results are converted back to the user, and the user is prompted for the next data conversion, and finally the entire process of data conversion is completed. The data storage layer collects the database tables used by the system, such as document templates, document association tables, simulation results tables, and so on. The document template needs to create a document association table before the system runs. The data simulation table is the unstructured file data after the structured data is converted. When the data conversion is complete, the system will associate the relevant information to the file table. Conclusion Non structured data analysis based on the rapid growth trend on the Internet, introduces the solutions proposed by researchers at home and abroad, and non structured data storage, storage of these solutions can solve the massive unstructured data, and to ensure that the expansion of the system. However, different data types of unstructured data and different data have different storage characteristics, and how to store these different types of unstructured data in a unified way becomes an urgent problem. This paper presents a unified unstructured data storage platform, unstructured data storage interface provides a unified model, combined with the underlying implementation of different types of unstructured data in heterogeneous storage, and in this heterogeneous storage infrastructure, to ensure the consistency of the data and the use of high. On this basis, combining a number of unstructured data structures on the storage platform, a large number of unstructured data resources 111

and storage resources can be fully integrated in the processing process to achieve efficient data processing. Acknowledgements Fund Project: National Natural Science Foundation of China (51578057); State Key Laboratory of Rail Traffic Control and Safety (Beijing Jiaotong University) Independent research project (RCS2016ZT007). References [1] Nicolae B. High throughput data-compression for cloud storage[m]//data Management in Grid and Peer-to-Peer Systems. Springer Berlin Heidelberg, 2010: 1-12. [2] Calder B, Wang J, Ogus A, et al. Windows Azure Storage: a highly available cloud storage service with strong consistency[c]//proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles. ACM, 2011: 143-157. [3] Prahlad A, Muller M S, Kottomtharayil R, et al. Performing data storage operations with a cloud storage environment, including automatically selecting among multiple cloud storage sites: U.S. Patent Application 12/751,651[P]. 2010-3-31. [4] Zhang D W, Sun F Q, Cheng X, et al. Research on hadoop-based enterprise file cloud storage system[c]//awareness Science and Technology (icast), 2011 3rd International Conference on. IEEE, 2011: 434-437. [5] Wang Q, Wang C, Ren K, et al. Enabling public auditability and data dynamics for storage security in cloud computing[j]. Parallel and Distributed Systems, IEEE Transactions on, 2011, 22(5): 847-859. [6] Wang C, Ren K, Lou W, et al. Toward publicly auditable secure cloud data storage services[j]. Network, IEEE, 2010, 24(4): 19-24. [7] Lin H Y, Tzeng W G. A secure erasure code-based cloud storage system with secure data forwarding[j]. Parallel and Distributed Systems, IEEE Transactions on, 2012, 23(6): 995-1003. 112