Data Mining Technology Based on Bayesian Network Structure Applied in Learning


pp. 67-71
http://dx.doi.org/10.14257/astl.2016.137.12

Chunhua Wang, Dong Han
College of Information Engineering, Huanghuai University, Henan, China
Corresponding E-mail: Flyequn@163.com

Abstract: Originating in Bayesian statistics, the Bayesian network offers a distinctive way of expressing uncertain knowledge, rich probabilistic expressiveness, and an incremental learning method that integrates prior knowledge. It represents the probability distributions of objects and the causal relations among them, and has become one of the most prominent of the current data mining methods. Starting from a brief introduction to data mining, and combining an explanation of the data mining process with an analysis of Bayesian networks, the paper investigates the implementation of Bayesian networks applied in learning.

Keywords: Bayesian network; Learning method; Data mining

1 Introduction

Data mining, also known as knowledge discovery in databases, is the non-trivial process of extracting implicit, previously unknown, but potentially useful information from large, incomplete, noisy, fuzzy, and random data. Data mining draws on many disciplines, including databases, artificial intelligence, statistics, and machine learning. Among them, the three most important fields are databases, machine learning, and statistics. These different historical influences have led scholars to hold different views on the function of data mining.

1.1 Introduction of Data Mining

With the development of global informatization, automatic data acquisition tools and mature database technologies have led to massive amounts of data being stored in databases.
It is very important to extract reliable, novel, and effective knowledge that people can understand from such massive data, and data mining has therefore attracted great attention in the information industry. Its application fields are extensive, covering agriculture, learning diagnostics, business management, product control, market analysis, engineering design, scientific research, and so on.

Data mining is related to many disciplines and methods, so there are a variety of ways to classify it.

According to the task, data mining can be divided into classification or warning model discovery, data summarization, clustering, association rule discovery, sequence pattern discovery, dependency relation or dependency model discovery, exception discovery, trend discovery, and so on.

According to the object being mined, it can cover relational databases, object-oriented databases, spatial databases, temporal databases, text databases, multimedia databases, heterogeneous databases, legacy databases, the Web, and so on.

According to the method, it can be divided into machine learning methods, statistical methods, neural network methods, and database methods. Machine learning methods can be divided into inductive learning methods (decision trees, rule induction, etc.), case-based learning, active learning, genetic algorithms, etc. Statistical analysis methods can be divided into regression (multivariate regression, autoregression, etc.), discriminant analysis (Bayesian discriminant, Fisher discriminant, nonparametric discriminant, etc.), cluster analysis (hierarchical clustering, clustering segmentation, etc.), exploratory analysis (principal component analysis, correlation analysis, etc.), and so on. Artificial neural network methods can be divided into feedforward neural networks (BP algorithm) and self-organizing neural networks (self-organizing feature map, competitive learning, etc.). Database methods mainly include multidimensional data analysis and attribute-oriented induction.

ISSN: 2287-1233 ASTL. Copyright 2016 SERSC

1.2 Data Mining Process

Data mining is the process of mining interesting knowledge from large amounts of data stored in databases, data warehouses, or other information repositories. This definition is given by Jiawei Han in his work Data Mining: Concepts and Techniques. Data mining also refers to the process of discovering knowledge from massive data, as shown in Figure 1, which represents the knowledge discovery process well.

Fig. 1. Flow Chart of Data Mining

1.3 The Determination of Business Logics

An important step in data mining is to define the business problem clearly and determine the purpose of the mining. Mining data simply for the sake of data mining is blind and will not succeed.

Data Preparation. Preprocessing the data: eliminating inconsistent and noisy data and combining data from different data sources.

Choice of Data. Searching for all internal and external data relating to the business objects, and choosing from them the data suitable for the data mining application.

Data Transformation. Transforming the data into a form suitable for the chosen mining algorithm.

Data Mining. Mining the acquired and transformed data using intelligent methods. Apart from choosing the right mining algorithm, the rest of the work can be completed automatically.

Knowledge Assessment. Interpreting and evaluating the results. The analytical methods adopted are generally determined by the data mining operations, and visualization techniques are often used.

Knowledge Representation. The knowledge obtained through analysis is provided to the users or integrated into the organizational structure of the business information system.

1.4 Classification and Forecasting

Based on a known training set, classification finds models or functions that describe and distinguish data classes or concepts, assigns each group and entity to a class according to the classification information, and uses the models to predict the class of objects whose class label is unknown. In forecasting, by contrast, the predicted values are numerical data.
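The distinction between classification (a discrete class label) and forecasting (a numeric value) can be illustrated with a minimal sketch. The nearest-centroid classifier, the least-squares line, and the toy data below are assumptions chosen for brevity; the paper does not prescribe any particular model.

```python
from statistics import mean

def fit_nearest_centroid(samples, labels):
    """Learn one centroid (mean vector) per class from a labelled training set."""
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    return {y: [mean(col) for col in zip(*xs)] for y, xs in by_class.items()}

def classify(centroids, x):
    """Predict the class whose centroid is closest (squared Euclidean distance)."""
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda y: dist2(centroids[y]))

def fit_line(xs, ys):
    """Ordinary least squares for y = a * x + b, used to forecast a numeric value."""
    mx, my = mean(xs), mean(ys)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx

# Hypothetical toy data, invented for illustration only.
train_x = [(1.0, 1.0), (1.2, 0.8), (4.0, 4.2), (3.8, 4.0)]
train_y = ["low", "low", "high", "high"]

# Classification: the output is a class label for an object of unknown class.
centroids = fit_nearest_centroid(train_x, train_y)
predicted_class = classify(centroids, (3.5, 3.9))

# Forecasting: the output is a number.
a, b = fit_line([1, 2, 3, 4], [2.0, 4.0, 6.0, 8.0])
predicted_value = a * 5 + b
```

Both functions learn from known examples; they differ only in the type of the predicted quantity.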
1.5 Cluster Analysis

Cluster analysis takes a collection of data objects and groups entities with the same characteristics into one category, so that objects in the same category are as similar as possible while objects in different categories differ widely. Certain rules are then used to describe the common properties of each category. Clustering is, in essence, an unsupervised learning method: its purpose is to find similarities and differences in a data set and to group data objects sharing common characteristics into the same category. The characteristics of each cluster can usually be analyzed and explained.
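As a concrete instance of the unsupervised grouping described above, the following is a minimal k-means sketch. The choice of k-means, the initialisation from the first k points, and the toy points are assumptions made for illustration; the paper does not name a specific clustering algorithm here.

```python
def kmeans(points, k, iterations=10):
    """Plain k-means: alternate between assigning points and updating centroids.

    Initial centroids are simply the first k points (a simplification).
    """
    centroids = [list(p) for p in points[:k]]
    assignment = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster with the nearest centroid.
        assignment = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, c in zip(points, assignment) if c == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids

# Hypothetical 2-D points forming two visible groups; no class labels are given.
points = [(1.0, 1.0), (5.0, 5.0), (1.2, 0.8), (5.2, 4.8), (0.9, 1.1), (4.9, 5.1)]
labels, centers = kmeans(points, k=2)
```

No labels are supplied to the algorithm: the two categories emerge only from the similarity between points, and each resulting centroid summarises the common properties of its cluster.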

1.6 Verifying Mining Model

Verification is the process of assessing how a mining model performs on real data. Before a mining model is deployed in the production environment, it must be verified by understanding its quality and characteristics. A variety of methods can be used to assess the quality and features of a data mining model. First, various statistical validity measures can be used to determine whether there are problems in the data or the model. Second, the data can be divided into a training set and a test set in order to test the accuracy of prediction. Finally, business experts can be asked to check the results of the data mining model to determine whether the patterns found are meaningful in the target business scenario. All of these methods are very useful in creating, testing, and optimizing models that solve specific problems and can be used repeatedly.

Because the Apriori algorithm operates on Boolean data, the historical evaluation data need to be converted. The food category is mapped to S_x and its sub-class to S_xy, where x is the category number and y is the sub-class number. The quarter is mapped to J_1 to J_4 and the month to M_1 to M_12. The enterprise risk level is coded as Q_L, Q_M, Q_S for high, medium, and low risk respectively; likewise, the official risk is coded as G_L, G_M, G_S, the trade risk as T_L, T_M, T_S, and the result of the risk assessment as P_L, P_M, P_S. The encoded data are shown in Table 1.

Table 1. Data after being encoded

Category, Sub-class, Enterprise risk, Official risk, Trade risk, Quarter, Month, Result
S_01, S_015, Q_S, G_S, T_S, J_3, M_9, P_S
S_02, S_021, Q_M, G_M, T_L, J_2, M_4, P_M
S_02, S_022, Q_M, G_S, T_L, J_2, M_4, P_M
S_01, S_15, Q_L, G_M, T_S, J_3, M_8, P_M
S_02, S_012, Q_M, G_M, T_M, J_2, M_5, P_M
S_02, S_021, Q_M, G_L, T_M, J_2, M_6, P_M
S_02, S_022, Q_S, G_M, T_M, J_1, M_1, P_S
S_01, S_15, Q_L, G_L, T_M, J_2, M_5, P_L

2 The Implementation of Learning Based on Bayesian Network

A Bayesian network constructed from the user's prior knowledge is called a prior Bayesian network, and a Bayesian network obtained by combining a prior Bayesian network with data is called a posterior Bayesian network. The process of obtaining the posterior Bayesian network from the prior Bayesian network is known as Bayesian network learning. A Bayesian network can keep learning: the posterior Bayesian network obtained in one round of learning can serve as the prior Bayesian network for the next round. Before each round, users can adjust the prior Bayesian network so that the new Bayesian network better reflects the knowledge contained in the data, as shown in Figure 2.
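The prior-to-posterior cycle can be sketched for the parameters of a single conditional probability table. This is only an illustration under stated assumptions: the network structure (one edge from enterprise risk Q to assessment result P), the uniform pseudo-count prior, and the split into two batches are hypothetical and are not specified in the paper. The observations are the (Q, P) code pairs read from the eight rows of Table 1, and structure learning is not covered.

```python
def update(prior, observations):
    """Return posterior pseudo-counts: prior counts plus observed (parent, child) counts."""
    posterior = {parent: dict(row) for parent, row in prior.items()}
    for parent, child in observations:
        posterior[parent][child] += 1
    return posterior

def cpt(counts):
    """Normalise pseudo-counts into the conditional distribution P(child | parent)."""
    return {
        parent: {child: n / sum(row.values()) for child, n in row.items()}
        for parent, row in counts.items()
    }

levels = ["L", "M", "S"]  # high, medium, low risk

# Prior parameters: one pseudo-count per cell (an assumed uniform prior).
prior = {q: {p: 1 for p in levels} for q in levels}

# (enterprise risk Q, assessment result P) pairs from Table 1, rows 1-4 and 5-8.
batch_1 = [("S", "S"), ("M", "M"), ("M", "M"), ("L", "M")]
batch_2 = [("M", "M"), ("M", "M"), ("S", "S"), ("L", "L")]

posterior_1 = update(prior, batch_1)        # first round of learning
posterior_2 = update(posterior_1, batch_2)  # the previous posterior acts as the new prior

p_given_q = cpt(posterior_2)
```

With count-based parameters of this kind, learning in two rounds gives the same result as learning from all eight rows at once, which is what allows each posterior to be reused as the next prior. An adjustment by the user before a round would correspond to editing the pseudo-counts in the prior.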

Fig. 2. Continuous Learning Graph of a Bayesian Network

3 Conclusion

In practical applications, therefore, the learned results should be combined with professional knowledge when analyzing the rules produced by data mining. At the same time, these rules provide clues and inspiration for professional research.