Yunfeng Zhang 1, Huan Wang 2, Jie Zhu 1 1 Computer Science & Engineering Department, North China Institute of Aerospace

Similar documents
An Indian Journal FULL PAPER. Trade Science Inc. Research on data mining clustering algorithm in cloud computing environments ABSTRACT KEYWORDS

Data Mining in the Application of E-Commerce Website

Introduction to Data Mining

This tutorial has been prepared for computer science graduates to help them understand the basic-to-advanced concepts related to data mining.

The Establishment of Large Data Mining Platform Based on Cloud Computing. Wei CAI

BioTechnology. An Indian Journal FULL PAPER. Trade Science Inc. Study on secure data storage based on cloud computing ABSTRACT KEYWORDS

Data Mining Technology Based on Bayesian Network Structure Applied in Learning

RETRACTED ARTICLE. Web-Based Data Mining in System Design and Implementation. Open Access. Jianhu Gong 1* and Jianzhi Gong 2

WKU-MIS-B10 Data Management: Warehousing, Analyzing, Mining, and Visualization. Management Information Systems

International Journal of Data Mining & Knowledge Management Process (IJDKP) Vol.7, No.3, May Dr.Zakea Il-Agure and Mr.Hicham Noureddine Itani

Clustering Analysis based on Data Mining Applications Xuedong Fan

Construction of the Library Management System Based on Data Warehouse and OLAP Maoli Xu 1, a, Xiuying Li 2,b

Study on the Application Analysis and Future Development of Data Mining Technology

Analysis on the technology improvement of the library network information retrieval efficiency

The Comparative Study of Machine Learning Algorithms in Text Data Classification*

An Indian Journal FULL PAPER ABSTRACT KEYWORDS. Trade Science Inc. The study on magnanimous data-storage system based on cloud computing

Knowledge Discovery and Data Mining

2. (a) Briefly discuss the forms of Data preprocessing with neat diagram. (b) Explain about concept hierarchy generation for categorical data.

A Data Classification Algorithm of Internet of Things Based on Neural Network

The application of OLAP and Data mining technology in the analysis of. book lending

Data Mining Course Overview

Multisource Remote Sensing Data Mining System Construction in Cloud Computing Environment Dong YinDi 1, Liu ChengJun 1

Research and Application of E-Commerce Recommendation System Based on Association Rules Algorithm

An Application of Genetic Algorithm for Auto-body Panel Die-design Case Library Based on Grid

Improvement and Simulation of Data Mining Algorithm Based on Association Rules

Keywords Fuzzy, Set Theory, KDD, Data Base, Transformed Database.

Research on Technologies in Smart Substation

Information Push Service of University Library in Network and Information Age

Web Data mining-a Research area in Web usage mining

Summary of Last Chapter. Course Content. Chapter 3 Objectives. Chapter 3: Data Preprocessing. Dr. Osmar R. Zaïane. University of Alberta 4

Graph Structure Over Time

Understanding Rule Behavior through Apriori Algorithm over Social Network Data

Comparison of FP tree and Apriori Algorithm

Design and Realization of Data Mining System based on Web HE Defu1, a

Lecture Topic Projects 1 Intro, schedule, and logistics 2 Data Science components and tasks 3 Data types Project #1 out 4 Introduction to R,

A Customer Segmentation Mining System on the Web Platform

Data Mining and Warehousing

1. Inroduction to Data Mininig

Improving the Efficiency of Fast Using Semantic Similarity Algorithm

Clustering Algorithms In Data Mining

Data Mining and Business Process Management of Apriori Algorithm

Correlation Based Feature Selection with Irrelevant Feature Removal

The Design and Application of GIS Mathematical Model Database System with Meta-algorithm Li-Zhijiang

Data Preprocessing Yudho Giri Sucahyo y, Ph.D , CISA

DATA MINING II - 1DL460

New research on Key Technologies of unstructured data cloud storage

Improved Frequent Pattern Mining Algorithm with Indexing

Written Exam Data Warehousing and Data Mining course code:

Traffic Flow Prediction Based on the location of Big Data. Xijun Zhang, Zhanting Yuan

Open Access Apriori Algorithm Research Based on Map-Reduce in Cloud Computing Environments

Research on Design and Application of Computer Database Quality Evaluation Model

Research on Design Information Management System for Leather Goods

Data- and Rule-Based Integrated Mechanism for Job Shop Scheduling

Analytical model A structure and process for analyzing a dataset. For example, a decision tree is a model for the classification of a dataset.

Knowledge Engineering and Data Mining. Knowledge engineering has 6 basic phases:

APPLICATION OF DYNAMIC CLUSTERING ALGORITHM IN MEDICAL SURVEILLANCE

Customer Clustering using RFM analysis

Ancient Kiln Landscape Evolution Based on Particle Swarm Optimization and Cellular Automata Model

Tag Based Image Search by Social Re-ranking

An Intelligent Retrieval Platform for Distributional Agriculture Science and Technology Data

BioTechnology. An Indian Journal FULL PAPER. Trade Science Inc. Research on motion tracking and detection of computer vision ABSTRACT KEYWORDS

Exploration of Fault Diagnosis Technology for Air Compressor Based on Internet of Things

Supervised and Unsupervised Learning (II)

Data Mining. Ryan Benton Center for Advanced Computer Studies University of Louisiana at Lafayette Lafayette, La., USA.

Data mining overview. Data Mining. Data mining overview. Data mining overview. Data mining overview. Data mining overview 3/24/2014

International Journal of Advance Engineering and Research Development. A Survey on Data Mining Methods and its Applications

Frequent Item Set using Apriori and Map Reduce algorithm: An Application in Inventory Management

Research on Approach of Equipment Status and Operation Information Acquisition Based on Equipment Control Bus

The Promotion Channel Investigation of BIM Technology Application

Study on Personalized Recommendation Model of Internet Advertisement

The research and design of user interface in parallel computer system

Research of the Optimization of a Data Mining Algorithm Based on an Embedded Data Mining System

Design of Coal Mine Power Supply Monitoring System

Research on An Electronic Map Retrieval Algorithm Based on Big Data

Chapter 4 Data Mining A Short Introduction

Research on Data Mining and Statistical Analysis Xiaoyao Lu1, a

Defining a Data Mining Task. CSE3212 Data Mining. What to be mined? Or the Approaches. Task-relevant Data. Estimation.

An Evolutionary Algorithm for Mining Association Rules Using Boolean Approach

data-based banking customer analytics

Framework Research on Privacy Protection of PHR Owners in Medical Cloud System Based on Aggregation Key Encryption Algorithm

Design of Underground Current Detection Nodes Based on ZigBee

Network Traffic Classification Based on Deep Learning

A Network-Based Management Information System for Animal Husbandry in Farms

Intelligent management of on-line video learning resources supported by Web-mining technology based on the practical application of VOD

Database and Knowledge-Base Systems: Data Mining. Martin Ester

Discovering Advertisement Links by Using URL Text

The Solutions to Some Key Problems of Solar Energy Output in the Belt and Road Yong-ping GAO 1,*, Li-li LIAO 2 and Yue-shun HE 3

A Novel Data Mining Platform Design with Dynamic Algorithm Base

Data Clustering Hierarchical Clustering, Density based clustering Grid based clustering

International Conference on Advances in Mechanical Engineering and Industrial Informatics (AMEII 2015)

Analyzing Working of FP-Growth Algorithm for Frequent Pattern Mining

Crop Production Management Information System Design and Implementation

Question Bank. 4) It is the source of information later delivered to data marts.

A Comparative Study of Data Mining Process Models (KDD, CRISP-DM and SEMMA)

Research on Social Relationship Network System based on MongoDB

D B M G Data Base and Data Mining Group of Politecnico di Torino

The discuss of the dead-lock issue in event-driven system

Landslide Monitoring Point Optimization. Deployment Based on Fuzzy Cluster Analysis.

Multi-dimensional database design and implementation of dam safety monitoring system

CE4031 and CZ4031 Database System Principles

Transcription:

[Type text] [Type text] [Type text] ISSN : 0974-7435 Volume 10 Issue 20 BioTechnology 2014 An Indian Journal FULL PAPER BTAIJ, 10(20), 2014 [12526-12531] Exploration on the data mining system construction based on massive database Yunfeng Zhang 1, Huan Wang 2, Jie Zhu 1 1 Computer Science & Engineering Department, North China Institute of Aerospace Engineering, Langfang, 065000, (CHINA) 2 Science Department, North China Institute of Aerospace Engineering, Langfang, 065000, (CHINA) ABSTRACT After years of rapid development, database and information technology have already evolved from the original file processing into a complicated and powerful database system. Nowadays, the primary task is how to further explore the database functions. This paper explores the data mining system construction based on a massive database. Constructing the data mining system can further explore the database functions based on a massive database, which is beneficial to the decision-makers to find useful data fast and accurately. So they can make the most reasonable and effective decisions on the basis of these data. According to the authors experiences, this research provides a few constructing ways of the mining system, which can be shared to other peers. KEYWORDS A massive database; The data mining system; Construction. Trade Science Inc.

BTAIJ, 10(20) 2014 Yunfeng Zhang et al. 12527 INTRODUCTION Because of the construction of the database, it is possible to store and extract massive electronic information. But the digital information resource database is very huge. Therefore, one of the vitally important research topics in the field of a massive database is how to research and extract the useful information faster and better. The data mining technology means the unique means and processes to search the useful information in the database. It is essential for the users to master a way of data mining so that they can more rapidly and accurately get the information. Hence, this paper explores the data mining system construction based on a massive database. According to the authors experiences, this research provides a few constructing ways of the mining system, which can be shared to the other peers. THE DATA MINING TECHNOLOGY The content and essence of the data mining With the deepening of the research on the data mining technology, it is clearly recognized that there are three main technical supports for the research of the data mining, including database, artificial intelligence and mathematical statistics. The database In 1980s, the database was very popular. It was applied in all walks of life and it had become a trend and culture. On one hand, the culture of database spread fast, and the database as the foundation of the knowledge source was very solid. On the other hand, the starting point of knowledge acquisition was increased greatly owing to the fact that the database formalized and organized a special field. So the knowledge would be searched and extracted belonging to the database in the future. Therefore, under this circumstance, the database scientists have invested more energy to study the data mining. Artificial intelligence The experts in Artificial Intelligence have embarked on inference based on cases. In particular, the scientists in machine learning are not satisfied with the ivory tower which they constructed in small sample learning mode. They have paid attention to the big data sample which is partial, noisy, numerous, random and ambiguous in the reality. Gradually, they have also devoted themselves to the data mining research. Mathematical statistics Mathematical statistics is a very active and vital subject in applied mathematics. It was born earlier than the invention of the computer and there have been a history of hundreds of years. The foundation of modern information consulting industry is the powerful and effective mathematical statistics means and instruments. At the current information era, consulting industry has been well- developed, but the combination of the database and mathematical statistics has still been at the first stage and demanded a long time to develop. The knowledge can be discovered by the data mining as shown in TABLE 1. TABLE 1 : The knowledge types discovered by the data mining Types Generalization Knowledge Association Knowledge Deviation Knowledge Difference Knowledge Prediction Knowledge Characteristic Knowledge Definition The knowledge reflects the common characters of the same type stuff. The knowledge reflects the dependence and association between stuffs. The knowledge reveals the abnormal phenomenon of the stuff deviating from the norm. The knowledge reflects the attribute differences between different stuffs Predict the future data on the basis of the historical and current data. The knowledge reflects all aspects of characteristics of the stuffs. All knowledge can be found at different levels of conception. With the increase in the conception tree through microcosm, mesocosm and macrocosm, the demands to make decisions of different clients from different levels can be satisfied. For instance, in the database of a supermarket, a classic association rule can be easily concluded. The clients who buy bread and butter stand a good chance to buy milk too. Besides, there is a great possibility that the clients who buy food use the credit card. The rule concluded from the database is very helpful for the business to make the sales plan and the strategy. The procedures of the database mining and the composition of the mining system The database mining is the process to discover the knowledge. The specific procedures are shown in Figure 1. There are seven steps. (1) Data cleaning: clean noise or inconsistent data; (2) Data integration: combine data sources;

12528 Exploration on the data mining system construction based on massive database BTAIJ, 10(20) 2014 (3) Data selection: search and analyze the related data in the database; (4) Data conversion: convert or unify the data to be the form of a suitable for mining, for example, aggregate operation, summary and so on; (5) Data mining: the basic step, extract the data model by using the intelligent method; (6) Data evaluation: measure the data according to some interesting degree and recognize the really interesting model represented the knowledge; (7) Knowledge representation: provide the mined knowledge for the users by the means of knowledge representation and visualization technology. Figure 1 : The procedures of the data mining The database mining procedures can realize the interaction with the knowledge database or the users, providing the interesting models for the users and storing the new knowledge in the database. From this point of view, the data mining just one step in the whole process but the most important step. However, in the database industry and the database research community, the knowledge discovery in the database is a normal term in the data mining. Thus, the generalized point of the database mining is that data mining means the process to mine the interesting knowledge from the numerous data stored in the database. From this point of view, the database mining system includes the following parts, whose system construction is as shown in Figure 2. (1) Database: a database can restore information and it is possible to integrate or clean data in the database; (2) Data warehouse server: according to the users demands of the data mining, extract the related data in the data warehouse server; (3) Knowledge base: it is a kind of domain knowledge used to evaluate the interesting degree of the result pattern or guide the research. The concept of this kind of knowledge base is hierarchical, including the knowledge about the users confirmation; (4) Data mining engine: it is the essential part of the data mining, which is combined by a group of function module, and it is used to characterize, classify, convert and deviation analyze; (5) Data evaluation module: it is generally measured by the interesting degree, and it can interact with the data mining module, which makes it possible for the search to focus on the interesting model; (6) Graphical user interface: the communication between the users and the data mining system is realized in this module, which allows the system to interact with the users, assigns the data mining, provides information, helps search focus, and does the exploratory data mining.

BTAIJ, 10(20) 2014 Yunfeng Zhang et al. 12529 Figure 2 : The classic data mining system The functions of the data mining system Data cleaning and generalization The data mining system can generalize the existing data to a higher level. Applying the generalized integration algorithm of GDBR, the complexity of the space and time can be related conditionally. Then, the method of adopting N- Gram can search effectively and accurately for the repeated records which are similar in the system, and then sort and test them. Intelligent operations such as normative insertion, deletion, exchange and replacement can cope with the common spelling mistake, cleaning the data. But there are a few deviations in the precision detection by adopting the normal elimination basic algorithm, so this system improves the elimination basic algorithm, applying the principles of statistics reasonably and combining with the direct and converse repeat matrixes, which is able to increase the detection rate of spelling mistake and the correct rate of modification. The mining function of the data According to the related association rule and the time sequence rule, the system classifies and aggregates the data by the means of the data mining, achieving the expected application aim of the data mining system. Realize Apriori algorithm by searching and integrating the frequent item sets among the data. Then, form the association rule by the frequent item sets. The methods are as following. Suppose record the frequent item set as I, and record all the nonvoid subsets in I as a. If the value of support (I)/support (a) is greater than minconf, the rule a=> (1-a) will be output directly. If the nonvoid subsets in I do not correspond to the condition, the related rule will not be output. In other words, the association rule is not formed as a. Similar with the association rule, but the time sequence rule tends to the time association of the item sets in the system. The time sequence rule in this system is formed by AprionAII. In a broad sense, the association rule includes strong rule, exception rule and random rule. The rule a small amount of data obeyed is represented by exception rule. Although the number is small, its confidence is high. It is the rule formed for the unknown information at this stage and unpredictable information. The system setting of the minimum confidence is realized by the exception association rule. So the system can form CAR and ECAR and delete SCAR. For those well defined data and classified data, the data category which is representative can be formed, and the according classification standard of the data belonging to the unknown category can be formed, which is the classifier. In this system, applying the interval classifier can increase the level of the correct rate and classification accuracy and decrease the over deep tree extension of the decision tree classifier. Clustering algorithm means to merge high-density clusters and adopt CURE algorithm to mark different clusters by many representative points. So a certain cluster distributed framework is formed. Then recognize effectively the special shape, which enlarges the data processing and enhances processing capacity. The main clustering means which this system adopts is the hierarchical cluster procedure. Before utilizing the means, the data mining system will divide space distribution automatically, which makes the information object form many data units. Then according to the characteristic of the unit, computer the clusters distribution. Another unique clustering algorithm is the density-based clustering algorithm. Through improving Dbscan algorithm, the data division can be realized by the small division cluster, and the acceleration of the

12530 Exploration on the data mining system construction based on massive database BTAIJ, 10(20) 2014 algorithm speed is realized by selecting the representative expanding seed points of the neighboring objects, and the whole database clustering is realized by the sample data clustering. It makes the clustering algorithm of the system more effective. THE CONSTRUCTION METHOD OF THE DATA MINING SYSTEM BASED ON A MASSIVE DATABASE The overall frame structure settings The system integrates all kinds of related modules closely to form the data structure being hierarchical, including the distinctive operation function of multiple outputs, multiple data sources and multiple parameters. As a result, the independence between each mining operation module can be realized, which results a more functional and more stable system. As an integrated system, there is a coordinating and unifying association among modules, which promotes to realize standard and systematic operations of the data parameters, mining results, and data sources applied by each module. On account of the data mining system based on a massive database, the data mining range of the system must be enlarged, realizing the mining objects not only existing in the database, but also in the according files. So the according file information processing method is provided by the system. To present more easily the mining results and realize the remote decision support analysis, the system also has the function of restoring the mining results automatically, expanding the application range. Because it is people who operate the computer, this system is equipped with good operation interface. It is very convenient for the users and decision makers to do the decision analysis and make a accurate decision. Module settings According to the above frame structure of this system, the following modules are set especially to realize the related functions of the data mining system. (1) The mining module can achieve the mining operation function, which mines different data in the database. Each mining module is independent. The database management module can control a single module. The storage module is the site of data source. Through the mining to read in the corresponding data in the mining base, the basis of data is provided for other modules. (2) The main function of the preprocess module is to filter, define and format the data source, further improving the operability and practicability of the whole system. The main sub-modules are data mapping, column mapping and type mapping. The data mapping is to map a source table to become ID type, and then form the relevant comparison table. Different data are mapped to form a unified module which is worth to mine. The column mapping is to extract the useful column from the data source, which is benefit to decrease the number of data and accelerate the computing speed. The type mapping is to convert the type of data source. The conversion is compulsive to unify the different types of data in the database, which is benefit for mining. (3) Storage module is to operate uniformly the data in the whole database. However, the external files must be imported first, and then are stored and controlled. ODBC technology is adopted at lower layer interface. Memory index and buffer function are utilized to accelerate the computing power of the system. The core module of the whole system is the mining management module. All kinds of information the users achieved from the database by the mining shall be stored in the mining base. The mining base is directly set in the system database, which is convenient for transferring and management. The mining base management includes all types of operations in the process of the data mining, data preparation and data storage. When the mining base stores the operation information, it is sequential so that it is conducive to the convenient operations. But the data mining operation is dependent in the whole process of mining. It needs a source, which is the operation result of the other data mining and results a new mining. Besides, the new result may be the data source of another mining process. Interface settings The main interface of this system is similar with Explorer interface, which is very user-friendly, beautiful and high operability. Different mining results are presented by different graphic technologies. The system application form presents the results of generalization and cleaning. Tree structure presents the decision tree. Two dimension and three dimension points are used to present the clustering structure. All types of rules and models are presented by text. CONCLUSION At present, there are a number of researches on constructing the data mining system based on a massive database. All these researches aim to mine the database deeply, so the decision makers can find the useful data resource fast and accurately from the numerous data, and make an effective decision on the basis of these useful data resource. This paper explores the data mining system construction based on a massive database. Firstly, it introduces the data mining technology, including the content and essence of the data mining, the database mining procedures and the composition of the mining system, and the functions of the data mining system. Then, it discusses the methods of constructing the database mining system, including the overall frame structure settings, module settings and interface settings. Although it is simple, it has its own characteristics. Because an increasing number of related data integration systems are issued and gained acceptance, the enterprises shall construct database mining system on the basis of their own features and demands so that provide better service for them and improve the application benefit and economic benefit.

BTAIJ, 10(20) 2014 Yunfeng Zhang et al. 12531 REFERENCES [1] Shi Yong; Construction Analysis of Knowledge Intelligent Query Data Mining System, Coal Technology, 7, (2012). [2] Tang Hui, Li Haichen, Wang Binbin; Framework of an Automated Data Mining System Based on Intelligent Agents, Journal of Library and Information Science in Agriculture, 12, (2010). [3] Chen Yufeng, Zhang Hongyan, Jing Song; Study on Rural Migrant Workers Employment Recommend System Based on Data Mining, Journal of Anhui Agriculture, 33 (2011).