Research on Applications of Data Mining in Electronic Commerce. Xiuping YANG 1, a


International Conference on Education Technology, Management and Humanities Science (ETMHS 2015)

Research on Applications of Data Mining in Electronic Commerce

Xiuping YANG 1, a
1 Computer Science Department, Guangdong AIB Polytechnic College, Guangzhou 510507, China
a yangxiuping@126.com

Keywords: Data Mining; Electronic Commerce; Logistic Regression; Service Evaluation

Abstract. With the rapid development of computer science and technology, combining machine learning algorithms with electronic commerce has become necessary. Low efficiency among e-commerce participants can cause unpredictable but frequent problems such as shipment delays and shipping errors, and these problems ultimately harm the participating enterprises. A proper evaluation of e-commerce efficiency is therefore an important way to improve management. In this paper, we propose a knowledge mining approach based on Rough Set Theory (RST) to handle the vague and inaccurate information in supplier evaluation and to mine the rule knowledge that links input variables to adverse situations. The RST output is then used as the feature set and passed to logistic regression (LR) to grade products on an e-commerce website. The proposed method, called RST-LR, comprises discretization of the attribute values, filtering of the minimum attribute set, definition of the evaluation criteria, and establishment of a ranking and assessment system with accuracy calculation. We test the algorithm in simulation experiments and report its accuracy.

Introduction

With the wide application of computer networks, the rapid development of the Internet, the growing number of people going online, and the great improvement of Internet transaction security technology, e-commerce has gradually been accepted by the public and has become a mainstream business model [1].
At the same time, through business-to-business (B2B) and business-to-consumer (B2C) e-commerce practice, many companies have realized their goal of opening an online store, supported by real-time online payment, supply chain management (SCM), and other mechanisms. They can manage their logistics and capital efficiently and provide a safe and reliable online transaction environment, so as to attract more people to shop online. The global online shopping market is growing year by year.

Data mining is an interdisciplinary topic. Depending on the requirements of the specific task, we can use statistical methods, neural networks, online analytical processing, genetic algorithms, decision trees, rough sets, and so on.

(1) Statistical analysis methods. These mainly refer to statistical methods for data analysis, such as correlation analysis, regression analysis, cluster analysis, and principal component analysis.
(2) Neural network based methods. A neural network processes data by simulating the way neurons in the human brain process information. The most common type is the Back Propagation (BP) neural network.
(3) Online analytical processing. Online analytical processing differs from traditional online transaction processing (OLTP), which is mainly used to handle transactions in systems such as civil aviation booking systems and bank deposit systems.
(4) Genetic algorithms. A genetic algorithm (GA) is a kind of evolutionary algorithm. Its basic principle is to simulate biological evolution: the parameters of the problem are encoded as chromosomes, and selection, crossover, and mutation are applied iteratively until chromosomes that meet the optimization objective are obtained.
(5) Decision tree algorithms. A decision tree algorithm is a kind of inductive learning method.

The main purpose of this paper is to evaluate the service quality and efficiency of e-commerce, and we introduce RST for this purpose.
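The selection, crossover, and mutation loop described in item (4) can be illustrated with a minimal sketch. The bit-string encoding, the toy "count the ones" fitness function, tournament selection, elitism, and all parameter values are assumptions chosen for illustration; the paper does not specify its GA configuration.

```python
import random

def fitness(chromosome):
    """Toy objective (an assumption): number of 1-bits, to be maximized."""
    return sum(chromosome)

def tournament_select(population, rng, k=3):
    """Selection: return the fittest of k randomly drawn chromosomes."""
    return max(rng.sample(population, k), key=fitness)

def crossover(parent_a, parent_b, rng):
    """Single-point crossover: prefix of one parent, suffix of the other."""
    point = rng.randint(1, len(parent_a) - 1)
    return parent_a[:point] + parent_b[point:]

def mutate(chromosome, rng, rate=0.02):
    """Mutation: flip each bit independently with probability `rate`."""
    return [1 - bit if rng.random() < rate else bit for bit in chromosome]

def run_ga(n_bits=20, pop_size=30, generations=40, seed=0):
    rng = random.Random(seed)
    # Encode candidate solutions as bit-string chromosomes.
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]
    initial_best = max(fitness(c) for c in population)
    for _ in range(generations):
        elite = max(population, key=fitness)  # elitism: keep the current best
        children = []
        for _ in range(pop_size - 1):
            child = crossover(tournament_select(population, rng),
                              tournament_select(population, rng), rng)
            children.append(mutate(child, rng))
        population = [elite] + children
    return max(population, key=fitness), initial_best

best, initial_best = run_ga()
print(fitness(best), "ones out of", len(best))
```

Because the best chromosome of each generation is carried over unchanged, the best fitness never decreases from one generation to the next.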
By making effective use of the mined rule set together with logistic regression, we can provide appropriate guidance on the service quality and efficiency of e-commerce. The experiment involves the following procedures: data collection, data preprocessing, discretization, attribute reduction, reduction filtering, rule generation, rule filtering, identification prediction, accuracy calculation, and the diagnostic system. The results show that web data mining can greatly improve service quality and efficiency when used in e-commerce.

2015. The authors - Published by Atlantis Press

Our Proposed Methodology

General Structure of Proposed Algorithm. The main purpose of this paper is to evaluate the service quality and the service efficiency of e-commerce, and we introduce RST for this purpose. In the following, we use flowcharts to describe the procedure [3].

Fig. 1 The General Web Mining
Fig. 2 The Proposed Methodology

Data Collection and Pre-processing. The UCI database is maintained by the University of California, Irvine, and is widely used in machine learning. The database currently contains 264 datasets, and the number is still growing. UCI datasets are commonly used as standard test datasets and follow internationally recognized standards. In addition, the empirical study is designed with reference to the actual situation of the Chinese financial market.

In this paper, we use the Amazon Commerce reviews dataset from UCI as the main data source. The dataset is used for authorship identification in online Writeprint, a new research field of pattern recognition. The dataset is multivariate, and it is derived from customer reviews on the Amazon e-commerce website. Most previous studies conducted identification experiments with two to ten authors. In the online context, however, the reviews to be identified usually have many more potential authors, and identification algorithms are usually not suited to a large number of target classes. To examine the robustness of identification algorithms, we identified the 50 most active users (each represented by a unique ID and username) who frequently posted reviews in these newsgroups. We collected 30 reviews from each author.
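The discretization of attribute values that RST-LR relies on can be illustrated with a small sketch. The paper does not say which discretization scheme it uses, so equal-width binning, the function name `equal_width_bins`, and the sample values below are all assumptions made for illustration.

```python
def equal_width_bins(values, n_bins=3):
    """Map each numeric value to a bin index in 0..n_bins-1 using equal-width intervals."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: everything falls into a single bin.
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum value would land in bin n_bins, so clamp it to the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

# Hypothetical raw attribute values for six reviews (not taken from the dataset).
word_counts = [0, 2, 3, 5, 9, 10]
codes = equal_width_bins(word_counts, n_bins=3)
print(codes)  # [0, 0, 0, 1, 2, 2]
```

Rough set analysis works on discrete attribute values, which is why continuous attributes are coded into a small number of intervals before attribute reduction.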
This paper uses this database and applies rough sets to deal with the vague and imprecise information that arises in the evaluation of e-commerce.

Rough Set for Feature Representation. When the rough set approach deals with data, it defines the information table, in the standard notation of rough set theory, as $S = (U, A, V, f)$, where $U$ is the universe of objects (the search set), $A$ is the attribute set, and $V$ is the set of attribute values. The information function is $f: U \times A \to V$, which can also be described as $f(x, a) \in V$ for every $x \in U$ and $a \in A$. Let $X \subseteq U$, and let $P \subseteq A$ be a subset of attributes [2]. The indiscernibility relation must then satisfy the following formula:

$$IND(P) = \{(x, y) \in U \times U : f(x, a) = f(y, a) \ \text{for all } a \in P\} \qquad (1)$$

Based on the above definition, and writing $[x]_P$ for the equivalence class of $x$ under $IND(P)$, the lower approximation can be defined as $\underline{P}X = \{x \in U : [x]_P \subseteq X\}$ and the upper approximation as $\overline{P}X = \{x \in U : [x]_P \cap X \neq \emptyset\}$. The boundary region is described as:

$$PN(X) = \overline{P}X - \underline{P}X \qquad (2)$$

There are four remarks relating to this formula. If $X$ satisfies $\underline{P}X \neq \emptyset$, $X$ can be regarded as an RST element of the dataset. The approximation accuracy used to assess identification error is calculated as:

$$\mu_P(X) = \frac{\mathrm{card}(\underline{P}X)}{\mathrm{card}(\overline{P}X)}, \qquad 0 \le \mu_P(X) \le 1 \qquad (3)$$

Experiment Analysis and Simulation

This section empirically validates the proposed RST-LR method, which is based on rough set theory (RST) and logistic regression (LR), for product ranking. The experimental procedure is as follows: first, collect data according to the experiment design; second, extract features from the processed data; third, train and test the model. The experimental steps are shown in Fig. 3.

Fig. 3 The Flowchart of the Proposed Method (boxes: Pre-Processing, Feature Extraction, Select the Cost Function, Recognize Sample, Model Training, Computing)

Experimental Dataset. The Amazon Commerce reviews data used in our experiments was collected by the National Engineering Research Center for E-Learning. The dataset is derived from customer reviews on the Amazon e-commerce website and is used to identify the authors of those reviews. Most previous works carried out identification experiments with two to ten authors. In the network environment, however, there are usually many more potential authors to be recognized, and recognition methods are usually not suited to a large number of target classes. The dataset contains 1500 samples with 10000 attributes. We divide it into two parts: a training set of 1000 samples and a prediction set of 500 samples.

The Evaluation Criterion. To validate the advantages of the proposed approach, we employ identification accuracy as the evaluation criterion. It is defined as follows, where TP denotes true positives, TN true negatives, FP false positives, and FN false negatives.
$$\mathrm{Accuracy} = \frac{TN + TP}{TN + TP + FN + FP} \qquad (4)$$

The General Result. In the first experiment, we apply the RST-LR approach to product ranking and use the Amazon Commerce reviews, collected from the Amazon Commerce website for authorship identification, as the experimental samples. The dataset was collected by the National Engineering Research Center for E-Learning from customer reviews on Amazon. We use accuracy and precision as the evaluation criteria to verify the effect of RST-LR on product ranking. The parameters of RST-LR are determined with the standard algorithm and left at their default configuration, and the trained RST-LR model is then used to run identification. The experiment aims to validate the advantage and robustness of the RST-LR method for product ranking in comparison with other methods. The dataset is randomly partitioned into a training set and a test set, and the test is performed for multiple rounds with the default settings.

The reasons for these results lie mainly in three aspects.

(1) The RST-LR method can be applied when the sample data is large in scale, complex in dimension, and contains a large amount of heterogeneous information.
(2) The learning method selects the model parameters of RST-LR according to the distribution of the input data, which gives RST-LR better adaptability.
(3) The framework of the proposed algorithm is composed of comprehensive procedures that sequentially maximize the identification ability.

The result is shown in Fig. 4.

Fig. 4 The Experimental Illustration

Summary

The quality of e-commerce service efficiency bears on the survival of e-commerce businesses, so this paper presents an RST-based evaluation system for e-commerce service efficiency. In applying the model, however, some aspects should be considered. There are several attribute reduction methods. Here we adopt the genetic algorithm, and the fitness function of the genetic algorithm is designed as the rate of cases covered by the attribute set. The result of attribute reduction is not unique, and many minimal attribute sets can be obtained. Which set to adopt should depend not only on a comparison of the accuracy of the identification results but also on the advice of experts in the field. We can also filter rules if necessary: rules with weak predictive ability should be filtered out, and the others kept.
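The rough set quantities used in this paper (lower and upper approximations, the ratio in Eq. (3), and the observation that attribute reduction can yield more than one minimal attribute set) can be illustrated on a toy decision table. The table, the attribute names, and the helper functions below are hypothetical. The sketch finds reducts by exhaustive search over attribute subsets that preserve the positive region, which is a swapped-in stand-in for the genetic algorithm mentioned above and is practical only for a handful of attributes.

```python
from itertools import combinations

# Hypothetical decision table (not from the paper): condition attributes a, b, c
# and decision attribute d.
TABLE = {
    "x1": {"a": 0, "b": 0, "c": 1, "d": "good"},
    "x2": {"a": 0, "b": 0, "c": 1, "d": "bad"},
    "x3": {"a": 1, "b": 0, "c": 0, "d": "good"},
    "x4": {"a": 1, "b": 1, "c": 0, "d": "bad"},
    "x5": {"a": 0, "b": 1, "c": 1, "d": "bad"},
}

def partition(table, attrs):
    """Equivalence classes of the indiscernibility relation on `attrs`."""
    classes = {}
    for obj, row in table.items():
        key = tuple(row[a] for a in attrs)
        classes.setdefault(key, set()).add(obj)
    return list(classes.values())

def lower_upper(table, attrs, target):
    """Lower and upper approximations of the object set `target`."""
    lower, upper = set(), set()
    for block in partition(table, attrs):
        if block <= target:   # block lies entirely inside the target set
            lower |= block
        if block & target:    # block overlaps the target set
            upper |= block
    return lower, upper

def positive_region(table, attrs, decision):
    """Union of the lower approximations of all decision classes."""
    pos = set()
    for d_block in partition(table, [decision]):
        pos |= lower_upper(table, attrs, d_block)[0]
    return pos

def reducts(table, cond_attrs, decision):
    """Minimal attribute subsets that keep the full positive region."""
    full = positive_region(table, cond_attrs, decision)
    found = []
    for size in range(1, len(cond_attrs) + 1):
        for subset in combinations(cond_attrs, size):
            if any(set(r) <= set(subset) for r in found):
                continue  # contains a smaller reduct, so it is not minimal
            if positive_region(table, subset, decision) == full:
                found.append(subset)
    return found

good = {obj for obj, row in TABLE.items() if row["d"] == "good"}
lower, upper = lower_upper(TABLE, ["a", "b", "c"], good)
accuracy = len(lower) / len(upper)          # ratio of Eq. (3)
all_reducts = reducts(TABLE, ["a", "b", "c"], "d")
print(sorted(lower), sorted(upper), accuracy, all_reducts)
```

On this table the objects x1 and x2 are indiscernible but carry different decisions, so they fall into the boundary region, and the attributes a and c carry the same information, so both (a, b) and (b, c) come out as minimal attribute sets. Choosing between such sets is where the expert advice mentioned above would come in.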
In the future, we plan to use deep learning algorithms to optimize the proposed method. For example, Wang [4] uses a deep learning method to classify data.

References

[1] Chen, F.-L. and Liu, S.-F., A Neural-Network Approach To Recognize Defect Spatial Pattern In Semiconductor Fabrication, IEEE Transactions on Semiconductor Manufacturing, Vol. 13, No. 3, pp. 366-373, 2010.
[2] Han, J. and Kamber, M., Data Mining: Concepts and Techniques, San Francisco, CA: Morgan Kaufmann Publishers, 2001.
[3] Johnson, D. S., Approximation Algorithms for Combinatorial Problems, Journal of Computer and System Sciences, Vol. 9, pp. 258-278, 1974.
[4] Wang, H., Classifying Gray-Scale SAR Images: A Deep Learning Approach, Machine Learning and Applications: An International Journal (MLAIJ), Vol. 1, No. 1, September 2014.