BP Neural Network Based On Genetic Algorithm Applied In Text Classification


2011 International Conference on Information Management and Engineering (ICIME 2011)
IPCSIT vol. 52 (2012) © (2012) IACSIT Press, Singapore
DOI: 10.7763/IPCSIT.2012.V52.75

SUN Ai-xiang and LI Ming-hui
Management Institute, Shandong University of Technology, Zibo, China

Corresponding author. Tel.: + 5065339594. E-mail address: aixiang2@63.com.

Abstract. The BP neural network is one of the most commonly used methods in text classification. The BP learning algorithm has had some success, but it has drawbacks: the error decreases slowly, weight adjustment takes a long time, the many iterations required make convergence slow, and training often falls into a local minimum and cannot converge to the given error. To overcome these shortcomings, this paper constructs a BP neural network combined with a genetic algorithm. In the learning process, the weights are encoded as chromosomes, the fitness of each chromosome is computed, and genetic iteration proceeds until convergence. The algorithm is then applied to text classification. The experimental results show that, measured by F-measure, the accuracy of text classification is greatly improved.

Keywords: text classification, BP, genetic algorithm, F-measure

1. Introduction

Since the 1990s, with the rapid development of network technology, the volume of information has expanded at high speed, and it will keep growing ever faster. It is now very difficult to estimate the total amount of information. Among the many information carriers, text is the most important: according to statistics, 80% of information is in the form of text, and text is growing faster than the other carriers. Text mining has become the most important branch of data mining and a rapidly growing research area.

Text classification is one of the most important text mining technologies. It can be applied in many fields, for example information filtering, information retrieval, and digital libraries. However, its accuracy directly affects these applications: if classification accuracy is low, the result does not help and may even do harm. The classification algorithm is the core of text classification technology and largely determines its accuracy. The BP neural network is one of the most commonly used methods in text classification, but the BP algorithm converges slowly and easily converges to a local minimum. The genetic algorithm can be combined with the BP algorithm to overcome these shortcomings.

Genetic algorithms are optimization algorithms based on the biological principle of "survival of the fittest". They use simple coding techniques to represent complex data structures and apply genetic operations (selection, mutation, crossover) to improve the fitness of the population, in order to reach a satisfactory or optimal solution of the problem. In this paper, the BP neural network based on a genetic algorithm is applied to text classification [1]. The results show that, measured by F-measure, the accuracy of text classification is greatly improved.

2. Text Classification

Text classification assigns one or more class labels from a set of predefined categories to a test document. Because text is not structured data, it must first be transformed into a structured form that the computer can recognize and process directly. The structured form must fully reflect the characteristics of the text itself and highlight its differences from other texts.

The Vector Space Model (VSM) is currently the most widely used text representation model [2]. It was proposed by G. Salton in the 1960s and was successfully applied in the SMART text retrieval system. In the vector space model, each text is expressed as a vector. Transforming texts into vectors requires a series of pre-processing steps such as word segmentation, stemming, stop-word removal, and dimension reduction.

3. BP Neural Network

The BP (Back Propagation) neural network [3-4] is currently the most widely used neural network. Its full name is the artificial neural network based on the back-propagation algorithm. It is commonly referred to as a three-layer feed-forward network or three-layer perceptron. The three layers are the input layer, the hidden layer, and the output layer.
The features of the BP neural network are: the neurons of one layer are fully connected with the neurons of the adjacent layers; neurons in the same layer are not connected to each other; and there are no feedback connections between the neurons of one layer and those of other layers. The hierarchical structure of the BP neural network is shown in Figure 1.

Fig. 1: BP neural network.

The learning process of the BP neural network consists of two phases: the forward propagation of information and the back propagation of error. The input-layer neurons receive input information from the outside world and pass it to the neurons of the middle layer. The middle layer is the internal information-processing layer, responsible for transforming the information. Depending on the required information capacity, the middle layer can be designed as a single hidden layer or as several hidden layers. The last hidden layer passes information to each neuron of the output layer. After further processing, the network has completed one forward-propagation pass, and the output layer delivers the result to the outside world.

When the actual output does not agree with the expected output, the network enters the error back-propagation phase. The error propagates from the output layer back to the hidden layer and the input layer, layer by layer, and the weights are corrected along the way by gradient descent. The repeated cycle of forward propagation and error back propagation continuously adjusts the weights and constitutes the training process of the BP neural network. The process continues until the output error of the network falls to an acceptable level or a preset number of learning iterations is reached.
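The two phases described above can be illustrated with a minimal sketch of a three-layer network trained on a single example. It is not the authors' implementation: the sigmoid activation, the squared-error loss, the learning rate `lr`, the layer sizes, and the omission of bias terms are all assumptions made for illustration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(x, w_hidden, w_output):
    """Forward propagation: input -> hidden -> output (biases omitted)."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    output = [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w_output]
    return hidden, output


def train_step(x, target, w_hidden, w_output, lr=0.5):
    """One forward pass followed by one back-propagation update.

    Returns the squared error measured before the weights are updated.
    """
    hidden, output = forward(x, w_hidden, w_output)
    # Output-layer deltas: output error times the sigmoid derivative.
    delta_out = [(o - t) * o * (1.0 - o) for o, t in zip(output, target)]
    # Hidden-layer deltas: output deltas propagated back through w_output.
    delta_hid = [
        h * (1.0 - h) * sum(delta_out[k] * w_output[k][j] for k in range(len(output)))
        for j, h in enumerate(hidden)
    ]
    # Gradient-descent corrections, applied layer by layer.
    for k, row in enumerate(w_output):
        for j in range(len(row)):
            row[j] -= lr * delta_out[k] * hidden[j]
    for j, row in enumerate(w_hidden):
        for i in range(len(row)):
            row[i] -= lr * delta_hid[j] * x[i]
    return 0.5 * sum((o - t) ** 2 for o, t in zip(output, target))


def demo(steps=200, seed=0):
    """Train a hypothetical 3-4-2 network on one made-up example."""
    rng = random.Random(seed)
    w_hidden = [[rng.uniform(0, 1) for _ in range(3)] for _ in range(4)]
    w_output = [[rng.uniform(0, 1) for _ in range(4)] for _ in range(2)]
    x, target = [0.2, 0.7, 0.1], [1.0, 0.0]
    return [train_step(x, target, w_hidden, w_output) for _ in range(steps)]
```

Calling `demo()` returns the error recorded at each step, which makes the gradual, iteration-heavy error reduction of plain gradient descent easy to inspect.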

The BP learning algorithm has had some success, but it has drawbacks: the error decreases slowly, weight adjustment takes a long time, the many iterations required make convergence slow, and training often falls into a local minimum and cannot converge to the given error. To overcome these shortcomings, this paper constructs a BP neural network combined with a genetic algorithm. In the learning process, the weights are encoded as chromosomes, the fitness of each chromosome is computed, and genetic iteration proceeds until convergence. The algorithm is applied to text classification, and the experimental results show that, measured by F-measure, the accuracy of text classification is greatly improved.

4. BP Neural Network Based on Genetic Algorithm

In the learning process, the weights are encoded as a chromosome, an appropriate fitness function is selected, and genetic iteration proceeds until convergence [5, 6].

4.1. The representation form of neural network and genetic factors

To facilitate the genetic operations, a string can be used to represent the BP neural network topology. The string is composed of the weights. A weight of the BP neural network is written $W_{ij}^{k}$, where $k$ is the index of the layer, and $W_{ij}^{k}$ is the connection weight between the $i$-th neuron in layer $k$ and the $j$-th neuron in layer $k+1$. $W_{ij}^{k}$ lies in the range $[0, 1]$. The BP neural network can then be expressed as:

$$W_{11}^{1}\; W_{12}^{1}\; W_{21}^{1}\; W_{22}^{1}\; W_{11}^{2}\; W_{12}^{2}\; W_{21}^{2}\; W_{22}^{2}$$

Through this encoding, the topology of the neural network is represented as a specific structure of genetic factors in which the network information is stored, so genetic operations can be applied in order to find the best chromosome.

4.2. The learning process of BP neural network based on GA

The learning process of the BP neural network based on GA is shown in Figure 2 [7, 8].

[Figure 2 flowchart: Population initialization -> Evaluate individual -> Satisfy? If yes (Y): Output the individual. If no (N): Genetic operation -> Next generation -> Evaluate individual.]

Fig. 2: The process of BP neural network based on GA.

4.2.1. The initialization of the population

As in conventional optimization algorithms, an initial point must be given before the iteration starts. The difference is that conventional optimization algorithms start from a single initial point, whereas genetic algorithms start from several. Here, the set of initial points is the initial population.

4.2.2. The evaluation function

In the initial stage, the selectivity of the input vectors toward the connection vectors is reduced, which increases each connection vector's chance of winning and reduces the deviation between the input vectors and the connection vectors as quickly as possible. This raises the speed of convergence, but convergence is still slow.

Every individual in the population must be evaluated by the evaluation function. Assume the evaluation function is formula (1):

$$f = \frac{1}{\left| y_i - y_t \right|} \qquad (1)$$

where $y_i$ is the desired output, taken from the sample $(x_i, y_i)$, and $y_t$ is the actual output of the neural network when the input is $x_i$.

4.2.3. Genetic operations

Because text data is high-dimensional, feature selection and feature extraction must be carried out to reduce its dimension before clustering. After feature extraction, a word may be mapped to more than one dimension of the input space, which makes it very difficult to determine the initial connection weights in this way.

(1) Selection operation

Individuals are selected according to their fitness. The larger an individual's fitness, the greater its chance of being selected to take part in reproduction; accordingly, the smaller an individual's fitness, the greater its chance of being eliminated. The number of eliminated individuals is determined by the elimination probability $p_s$. If the population contains $pop$ individuals, then the $p_s \cdot pop$ individuals with the poorest fitness are eliminated and do not enter the next iteration. To keep the population size fixed in the next iteration, the individuals eliminated by the selection operation are replaced by high-fitness individuals obtained through the mutation operation.

(2) Mutation operation

Given the characteristics of the chromosome (the neural network representation), there is no crossover operator, only a mutation operator. The mutation rate $p_m$ is an important parameter of the genetic algorithm. The mutation operation randomly selects $p_m \cdot pop$ individuals from the preserved individuals. For each selected individual, a randomly chosen weight $W_{ij}^{k}$ is changed by adding an increment $\Delta W_{ij}^{k}$, as in formulas (2) and (3):

$$\Delta W_{ij}^{k} = N(0, b_m) \qquad (2)$$

$$b_m = \lambda \left( f / f_{\max} \right) \qquad (3)$$

where $\lambda$ is a coefficient and $N(0, b_m)$ is the Gaussian distribution with mean 0 and variance $b_m$. The greater the fitness $f$, the greater $b_m$, and so the more likely the increment $\Delta W_{ij}^{k}$ is to deviate from its mean of 0.

(3) GA termination conditions

The GA termination condition can take different forms. In this paper, all operations terminate when an individual reaches the accuracy required by the user. Formula (4) is usually chosen as the convergence criterion:

$$\left| f(k+1) - f(k) \right| \le \varepsilon \qquad (4)$$

where $f(k)$ is the fitness of the best individual in generation $k$.

5. Experimental Data and Analysis

5.1. Test corpus

This experiment uses the Chinese text classification corpus [9] compiled by Tan Song-bo and Wang Yue-fen. For training, 200 texts are extracted from the corpus: 40 on finance and economics, 40 on computers, 40 on sports, 40 on health, and 40 on real estate. For testing, another 200 texts with the same distribution (40 per category) are extracted from the corpus. After preprocessing, each of these 400 texts is converted into a 100-dimensional vector.
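The GA learning loop of Section 4.2 (initial population, evaluation, elimination of the least fit, Gaussian mutation, and the convergence test on the best fitness) can be sketched as follows. This is a hedged illustration rather than the authors' code: `toy_error` is a hypothetical stand-in for the network's output error, the small constant in `fitness`, the default `lam`, the clamping of weights to [0, 1], and filling the eliminated slots with the fittest mutants are assumptions chosen to make the sketch runnable.

```python
import random


def toy_error(chromosome, target=(0.3, 0.8, 0.5, 0.1)):
    """Hypothetical stand-in for the network error of a weight string."""
    return sum((w - t) ** 2 for w, t in zip(chromosome, target))


def fitness(chromosome, error_fn):
    # Reciprocal of the error; the small constant (an assumption)
    # avoids division by zero for a perfect individual.
    return 1.0 / (error_fn(chromosome) + 1e-9)


def mutate(chromosome, f, f_max, lam, rng):
    """Add a Gaussian increment with variance b_m = lam * f / f_max to one weight."""
    b_m = lam * f / f_max
    child = list(chromosome)
    idx = rng.randrange(len(child))
    # random.gauss expects a standard deviation, hence the square root.
    child[idx] += rng.gauss(0.0, b_m ** 0.5)
    # Assumption: keep the weight inside the [0, 1] range used by the encoding.
    child[idx] = min(1.0, max(0.0, child[idx]))
    return child


def evolve(error_fn, n_weights, pop=100, ps=0.1, pm=0.2, lam=0.1,
           eps=1e-3, max_gen=200, seed=0):
    """Mutation-only GA; assumes pm >= ps so the eliminated slots can be refilled."""
    rng = random.Random(seed)
    population = [[rng.uniform(0.0, 1.0) for _ in range(n_weights)]
                  for _ in range(pop)]
    n_elim = round(ps * pop)   # individuals eliminated per generation
    n_mut = round(pm * pop)    # individuals mutated per generation
    history = []               # best fitness of each generation
    for _ in range(max_gen):
        scored = sorted(((fitness(c, error_fn), c) for c in population),
                        key=lambda fc: fc[0], reverse=True)
        f_max = scored[0][0]
        history.append(f_max)
        # Convergence criterion: best fitness changed by at most eps.
        if len(history) > 1 and abs(history[-1] - history[-2]) <= eps:
            break
        survivors = scored[:pop - n_elim]
        parents = rng.sample(survivors, n_mut)
        mutants = [mutate(c, f, f_max, lam, rng) for f, c in parents]
        mutants.sort(key=lambda c: fitness(c, error_fn), reverse=True)
        # Refill the eliminated slots with the fittest mutants.
        population = [c for _, c in survivors] + mutants[:n_elim]
    return scored[0][1], history
```

A call such as `best, history = evolve(toy_error, n_weights=4)` returns the best weight string and the best fitness per generation. Because the survivors are kept unchanged, the best fitness never decreases from one generation to the next; note also that a criterion based on a single generation-to-generation difference can stop the search early when no mutant improves on the current best.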

5.2. The criteria to evaluate the effectiveness of text classification

F-measure is used in evaluating text retrieval systems [10]. It combines two evaluation criteria of text retrieval: precision (abbreviated as P; also known as purity) and recall (abbreviated as R). For evaluating text classification accuracy, the precision and recall of a category $i$ are defined as formula (5):

$$P(i) = \frac{N_1}{N_3}, \qquad R(i) = \frac{N_1}{N_2} \qquad (5)$$

where
$N_1$ is the number of texts correctly assigned to category $i$ by the classifier,
$N_2$ is the number of all texts in category $i$,
$N_3$ is the number of all texts assigned to category $i$ by the classifier.

The F-measure of category $i$ is defined as formula (6):

$$F_{\beta}(i) = \frac{(\beta^{2} + 1)\, P(i)\, R(i)}{\beta^{2} P(i) + R(i)} \qquad (6)$$

Usually $\beta = 1$ is taken, so that recall and precision have the same weight, as in formula (7):

$$F_{1}(i) = \frac{2\, P(i)\, R(i)}{P(i) + R(i)} \qquad (7)$$

For the classification results as a whole, the overall F-measure is the weighted average of the $F_1$ values of the categories, as shown in formula (8):

$$F\text{-}measure = \frac{\sum_{i} \left( n_i \, F_{1}(i) \right)}{\sum_{i} n_i} \qquad (8)$$

where $n_i$ is the number of all texts in category $i$.

5.3. The implementation-specific parameters of the GABP algorithm

Group size: pop = 100
BP neural network: 100 nodes in the input layer, 5 nodes in the output layer, 20 nodes in the hidden layer
$W_{ij}^{k}$: initial values drawn randomly from [0, 1]
Elimination probability: $p_s$ = 0.1
Mutation probability: $p_m$ = 0.2
$\varepsilon$ = 0.001

5.4. The experimental results

The results of the comparison are shown in Figure 3. The figure shows that, for the same test corpus, the classification results of BP are mediocre, while the classification results of GABP are greatly improved.

[Figure 3: chart comparing GABP and BP; vertical axis from 0 to 1.2, horizontal axis from 1 to 6.]
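The evaluation criteria of Section 5.2 can be computed with a short sketch like the one below. It takes precision as the share of texts assigned to a category that truly belong to it, and recall as the share of a category's texts that the classifier recovers. The function name `f_measure`, the label lists, and the choice to score an empty denominator as 0 are assumptions for illustration.

```python
def f_measure(true_labels, predicted_labels, beta=1.0):
    """Return per-category F values and the overall weighted F-measure."""
    per_category = {}
    weighted_sum = 0.0
    for c in sorted(set(true_labels)):
        # n1: texts correctly assigned to category c
        n1 = sum(1 for t, p in zip(true_labels, predicted_labels)
                 if t == c and p == c)
        # n2: all texts that truly belong to category c
        n2 = sum(1 for t in true_labels if t == c)
        # n3: all texts the classifier assigned to category c
        n3 = sum(1 for p in predicted_labels if p == c)
        precision = n1 / n3 if n3 else 0.0
        recall = n1 / n2 if n2 else 0.0
        denom = beta ** 2 * precision + recall
        f = (beta ** 2 + 1) * precision * recall / denom if denom else 0.0
        per_category[c] = f
        weighted_sum += n2 * f  # weight each category by its size
    return per_category, weighted_sum / len(true_labels)


# Made-up labels, not data from the experiment.
true = ["sports", "sports", "health", "health"]
pred = ["sports", "health", "health", "health"]
per_cat, overall = f_measure(true, pred)
```

With these made-up labels, "sports" has precision 1 and recall 0.5, "health" has precision 2/3 and recall 1, and the overall value is the size-weighted mean of the two per-category scores.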

Fig. 3: Comparison of convergence generations.

6. Conclusion

This paper describes a new algorithm, GABP, applied to text classification. In the learning process, the weights are encoded as chromosomes, the fitness of each chromosome is computed, and genetic iteration proceeds until convergence. GABP overcomes the shortcomings of the original BP algorithm: the error decreases slowly, weight adjustment takes a long time, the many iterations required make convergence slow, and training often falls into a local minimum and cannot converge to the given error. It is therefore a well-suited text classification algorithm. The results show that, measured by F-measure, GABP greatly increases the accuracy of text classification.

7. References

[1] Wang An-lin. Complex System's Analysis and Modeling. Shanghai: Shanghai Jiaotong University Press, 2004.
[2] Salton G. Automatic Text Processing [M]. Addison-Wesley Publishing Company, 1988.
[3] Zhang Liming. Artificial Neural Network Model and Its Application [M]. Shanghai: Fudan University Press, 1992.
[4] Wu Jiantong, Wang Jian hua. Neural Network Technology and Its Application [M]. Harbin: Harbin Institute of Technology Press, 1998.
[5] Wang Chongjun. A genetic algorithm based on BP neural network algorithm and its application [J]. Nanjing University, 2003, 39 (5): 459-466.
[6] Yen G G, Lu Haiming. Hierarchical Genetic Algorithm Based on Neural Network Design [C] // IEEE Symposium on Combinations of Evolutionary Computation and Neural Networks. 2000.
[7] Liu Xu, Xue Fuzhen, Tang Lei. Adaptive genetic algorithm based on multi-variable system design method of approximate model [J]. Chemicals and Instruments, 2009, 36 (): 27-30.
[8] Yang Guo jun, Cui Ping yuan, Li Lin-lin. Genetic Algorithm in Neural Network Control and Implementation [J]. System Simulation, 200, 3 (5): 567-570.
[9] Tan Song-bo, Wang Yue-fen. Chinese text classification corpus - TanCorpV1.0. http://www.searchforum.org.cn/tansongbo/corpus.php
[10] David H, Heikki M, Padhraic S. Principles of Data Mining [M]. Translated by Zhang Yinkui, Liao Li, Song Jun, et al. Beijing: Machinery Industry Press, 2003.