Association Rule Mining Based on Estimation of Distribution Algorithm for Blood Indices


Xinyu Zhang
College of Information Science and Engineering, Northeastern University, Shenyang, China
E-mail: zhangxinyu1995@126.com

Botu Xue
College of Information Science and Engineering, Northeastern University, Shenyang, China
E-mail: xuebotu1994@163.com

Guanghui Su
College of Information Science and Engineering, Northeastern University, Shenyang, China
E-mail: suguanghui@foxmail.com

Jianjiang Cui*
Inst. Intelligent Systems, Northeastern University, Shenyang, China
E-mail: cuijianjiang@ise.neu.edu.cn
*Corresponding Author

Abstract

To overcome the limitations of the Apriori algorithm and of association rule mining based on the Genetic Algorithm (GA), this paper proposes a new association rule mining algorithm based on population-based incremental learning (PBIL), which is a kind of estimation of distribution algorithm. The proposed algorithm keeps the advantages of GA-based association rule mining in coding and in the fitness function. It updates the population through a probability vector that has learning properties. As a result, it converges faster and searches more effectively than GA. In experiments on mining association rules from blood indices data, PBIL performs better than GA in running time and convergence speed, and it also achieves better search results. In addition, this paper proposes a parallel association rule mining algorithm based on PBIL and designs a cloud-computing system architecture for blood indices analysis. This provides an example of applying the new algorithm in cloud computing.

Keywords: Distribution estimation algorithm; Probability vector; Blood indices; Parallel algorithm; Cloud computing

I. INTRODUCTION

Association rule mining is an important branch of data mining. By collecting and analyzing many records of items in a database, valuable relationships hidden in large amounts of data can be found [1]. Association rule analysis is particularly significant for medical data.
By mining medical data, potential relationships between various diseases and health indicators can be found, which supports medical research and disease diagnosis [2]. The Apriori algorithm is the most typical algorithm for association rule mining. The traditional Apriori algorithm must scan the database many times and generates vast candidate sets, so it scales poorly. To overcome this weakness, some scholars have proposed using intelligent optimization algorithms to mine association rules. In 2004, Li Ying [3] applied a generalized genetic algorithm to improve the Apriori algorithm. Apriori is first used to search for partial association rules, and the genetic algorithm then searches for global association rules, which reduces the number of database traversals. In 2012, Shiwei Chen [4] proposed an association rule mining method based on an interest measure and a genetic algorithm, which improved the quality of the mined rules. In 2016, Donghao Xu [5] proposed an association rule mining method based on an improved particle swarm optimization (PSO) algorithm, and verified the advantage of PSO over the genetic algorithm for association rule mining. With the rapid development of computer technology, cloud computing has become a direction for the future development of distributed computing. The MapReduce programming framework proposed by Google is a representative cloud computing technology. It is suitable for distributed processing of large-scale datasets and has high computational efficiency [6]. Therefore, some scholars have proposed association rule mining methods based on Hadoop and other cloud computing technologies, together with parallel algorithms for association rule mining. In 2011, Zhang Sheng [7] proposed an Apriori algorithm based on cloud computing. It deploys the MRM-Apriori algorithm in the MapReduce framework and achieves good speed. Building on this research, this paper proposes an association rule mining algorithm based on an estimation of distribution algorithm, PBIL.
Experiments comparing it with the Apriori algorithm and with GA-based association rule mining show that the new algorithm is effective. A parallel version of the algorithm, suitable for the MapReduce framework, is also designed. Finally, an implementation plan for a blood indices analysis system based on cloud computing is presented.

II. THE CLASSICAL ASSOCIATION RULE MINING ALGORITHM

The Apriori algorithm is one of the most typical algorithms in the data mining field. It uses a level-wise iterative method to produce high-dimensional frequent item sets from low-dimensional frequent item sets. Association rules are then produced from the frequent item sets. The detailed mathematical model is given in [8]. The support degree in the Apriori algorithm is defined as follows. Suppose the whole transaction set contains m transactions and n of them contain a given item set. Then the support degree of that item set is n/m. The support degree of an item set A is written supp(A).

The candidate sets in the Apriori algorithm are generated layer by layer, and the frequent item sets of a layer are produced only after a full scan of the database. Therefore, if the database is very large, this process consumes a lot of memory and reduces the efficiency of the algorithm.

III. ASSOCIATION RULE MINING BASED ON GENETIC ALGORITHM (GA)

A. Association Rules Model Based on GA

Based on the analysis of the Apriori algorithm, association rule mining is divided into two parts. The first is finding frequent item sets in the transaction database. The second is generating association rules from the frequent item sets found [9]. The first part carries most of the workload and is the direct cause of Apriori's low efficiency. Therefore, the genetic algorithm (GA) can be used to perform a global search for frequent item sets. GA is an effective global optimization algorithm. It uses binary coding and evolves the population with genetic operators. This lets it keep extracting frequent item sets from the transaction database at high speed. It also avoids operations such as join and prune, which require frequent database scans. As a result, both computing speed and mining precision improve. The concrete implementation steps are shown in Fig. 2.

B. The Weakness of Association Rule Mining Based on GA

When mining association rules, the genetic algorithm carries over to the next generation the individuals in each generation that meet the frequent item set requirement. These individuals are removed from the previous generation. This selection method has two main purposes: to retain excellent individuals and to keep population diversity. These two requirements conflict. If too many excellent individuals are retained in each generation, population diversity decreases and premature convergence becomes likely. The search for frequent item sets is then incomplete, and the extracted association rules are partial. If too few excellent individuals are retained, the algorithm converges more slowly and computing time grows, which erodes the speed advantage of GA-based association rule mining.

IV. ASSOCIATION RULE MINING ALGORITHM BASED ON ESTIMATION OF DISTRIBUTION ALGORITHM (EDA)

A. Description of EDA

To overcome the disadvantages of the genetic algorithm in mining association rules, an EDA can be used. EDA is a kind of evolutionary algorithm developed from the genetic algorithm. It first selects the best individuals from the population and extracts statistical information from them. It then uses this information to build a probability model. Finally, it samples the model to update the population, increasing the number of high-fitness individuals, until the termination condition is met. In this way it can increase the number of good individuals while keeping population diversity [10]. Because an EDA generates new solutions from a probability distribution, it can also reach optimal solutions in fewer iterations. It can effectively prevent the local optima and premature convergence that affect GA on higher-order or long building-block problems [11].

B. PBIL Probability Model

When a problem has mutually independent variables, the PBIL algorithm, a typical form of EDA, can be used.
PBIL is mainly applied to binary-coded optimization problems. It collects the values of the variables, each of which is 0 or 1, to build a probability vector. It then uses this vector to estimate the one-dimensional marginal distribution of each variable. Assume a binary gene population with N gene positions (mutually independent variables) and M individuals that evolves continuously. The gene population at the t-th generation can be expressed as

$$X(t)=\left\{X_g^i(t)\in\{0,1\}\;\middle|\;g=1,\dots,M;\ i=1,\dots,N\right\}\tag{1}$$

Here t is the generation number of the gene population and $X^i(t)$ denotes the i-th gene position (independent variable) at the t-th generation. $X_g^i(t)$ is the code of the i-th gene position (variable) of the g-th individual at the t-th generation. The coding state of every variable at the t-th generation is then used to build the probability vector

$$P_t=\left(P_t^1,P_t^2,P_t^3,\dots,P_t^N\right)\tag{2}$$

For each gene position (variable), we count the individuals whose code at that position is 1 and compute their proportion among all M individuals of the current population. This proportion is the distribution probability of the variable:

$$P_t^i=\frac{1}{M}\sum_{g=1}^{M}X_g^i(t)\tag{3}$$
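As an illustration of Eqs. (2) and (3), the following minimal Python/NumPy sketch computes the probability vector of a small binary population. Rows are individuals g and columns are gene positions i. The function names and the toy population are hypothetical and chosen only for illustration. The sketch also shows that, under PBIL's independence assumption, the probability of a whole individual is the product of the per-position marginals.

```python
import numpy as np


def probability_vector(population) -> np.ndarray:
    """Eq. (3): P_t^i = (1/M) * sum over g of X_g^i(t), for every gene position i.

    `population` is an M x N array of 0/1 codes: one row per individual g,
    one column per gene position (independent variable) i.
    """
    x = np.asarray(population)
    if x.ndim != 2 or x.shape[0] == 0:
        raise ValueError("expected a non-empty M x N array of 0/1 codes")
    return x.mean(axis=0)  # fraction of individuals coded 1 at each position


def individual_probability(p, individual) -> float:
    """Probability of one binary individual when the N positions are treated as
    independent (the PBIL assumption): the product of the one-dimensional marginals."""
    p = np.asarray(p, dtype=float)
    bits = np.asarray(individual)
    return float(np.prod(np.where(bits == 1, p, 1.0 - p)))


# Hypothetical toy population: M = 3 individuals, N = 4 gene positions.
X_t = np.array([[1, 0, 1, 1],
                [0, 0, 1, 0],
                [1, 1, 1, 0]])
P_t = probability_vector(X_t)                     # column means: 2/3, 1/3, 1, 1/3
print(P_t)
print(individual_probability(P_t, [1, 0, 1, 0]))  # (2/3) * (2/3) * 1 * (2/3) = 8/27
```

Only the vector of N marginals is stored, never a joint table over all 2^N codes. This is what keeps PBIL cheap as the number of indices grows.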

When the EDA is applied, it first generates a random initial population and computes the fitness value of every individual. It then ranks all individuals by fitness. Individuals with greater fitness values are regarded as more advanced. Truncation selection is then used to pick a certain number of advanced individuals. The truncation rate is called selrate, so the number m of advanced individuals is

$$m=\mathrm{selrate}\times M\tag{4}$$

The advanced population therefore consists of the first m ranked individuals of the original population. PBIL obtains the probability vector from these advanced individuals through Eq. (3). It then samples from the probability vector to obtain the next-generation population. To make the probability vector describe the distribution of the advanced individuals faster and more accurately, PBIL updates it with the Hebbian rule from machine learning theory. That is, the probability of each variable is adjusted linearly at a given learning rate $\alpha$ [12]:

$$P_{t+1}^i=(1-\alpha)\,P_t^i+\alpha\,\frac{1}{m}\sum_{g=1}^{m}X_g^i(t)\tag{5}$$

Here m is the number of advanced individuals selected at the t-th generation, and the sum runs over those m individuals. Sampling from the probability vector works as follows. For each gene position of a new individual, a random number in [0, 1) is generated. If it is less than the corresponding probability $P_t^i$, the gene is set to 1; otherwise it is set to 0. Each gene is therefore 1 with probability $P_t^i$.

C. Codes of Individuals and Item Set

When mining association rules, the EDA also adopts binary coding. If a patient has an abnormal blood index, the value of that index is set to 1. If the index is normal, the value is set to 0.

D. Selection of Fitness Function

The fitness function is designed to reflect how frequent an item set is and to recognize frequent item sets.
An item set is frequent when its support is greater than the minimum support, so the fitness function can be defined as

$$\mathrm{fitness}\left(X_g(t)\right)=\frac{\mathrm{Supp}\left(X_g(t)\right)}{\mathrm{MinSupp}}\tag{6}$$

Here $X_g(t)$ is the g-th individual at the t-th generation. $\mathrm{Supp}(X_g(t))$ is the support of the item set corresponding to that individual, and MinSupp is the minimum support given by the user. A fitness value greater than 1 means that the support of the corresponding item set exceeds the minimum support, so the item set is frequent. Such an individual is kept as a member of the advanced population that updates the probability vector. A fitness value less than 1 means the item set is not frequent, and the individual is eliminated directly.

E. Procedures of Mining Association Rules by PBIL

Based on the analysis above, the specific steps of association rule mining based on the PBIL algorithm are shown in Fig. 1.

[Figure 1. Flow chart of PBIL mining association rules. Steps: initialize the population and parameters (n individuals); compute individual fitness and rank by Eq. (6); save individuals with fitness > 1; select the advanced population (m individuals) by truncation, Eq. (4); compute the probability vector $P_t$ by Eq. (3), and after the first iteration modify it linearly by Eq. (5); sample (n - m) individuals to build the next-generation population; repeat until the termination condition is satisfied; decode the individuals and obtain the association rules.]

V. ANALYSIS OF EXPERIMENTAL RESULTS

A. Construction of the Transaction Database

All data in this experiment come from anonymous routine blood test sheets provided by a second-grade hospital. A routine blood test sheet provides 9 test indices [13]: white blood cell count (WBC), neutrophil (NE), lymphocyte (LY), monocyte (MO), eosinophil (EOS), basophil (BASO), red blood cell count (RBC), hemoglobin (Hb) and hematocrit (HCT). A total of 255 laboratory sheets were randomly selected to build the transaction database.
The data on each laboratory sheet form one transaction (item set) of the transaction database. Each item is encoded from the test result on the sheet, following the encoding rule in Section IV.C: a normal index is encoded as 0 and an abnormal index as 1.
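The encoding rule above and the procedure of Fig. 1, with Eqs. (3) to (6), can be sketched in Python/NumPy as follows. This is a minimal sketch under stated assumptions and is not the authors' MATLAB implementation. The toy transaction matrix is invented and is not the hospital data. All function names are hypothetical. Each individual is read as a candidate item set in which a 1 marks an included index. The empty item set is treated as infrequent. The m advanced individuals are assumed to be carried into the next generation, as the (n - m) sample size in Fig. 1 suggests. The defaults selrate = 0.4 and learnrate = 0.1 follow Section V.C, while the small population size is chosen only so the toy example runs quickly.

```python
import numpy as np


def support(itemset, transactions) -> float:
    """Fraction of transactions (rows of 0/1 codes) in which every item coded 1
    in `itemset` is also 1, i.e. the support of the corresponding item set."""
    mask = np.asarray(itemset).astype(bool)
    if not mask.any():
        return 0.0  # assumption: the empty item set never counts as frequent
    return float(np.all(transactions[:, mask] == 1, axis=1).mean())


def fitness(itemset, transactions, min_supp) -> float:
    """Eq. (6): Supp / MinSupp; a value above 1 marks a frequent item set."""
    return support(itemset, transactions) / min_supp


def pbil_frequent_itemsets(transactions, min_supp, pop_size=50, iterations=100,
                           selrate=0.4, learnrate=0.1, seed=0):
    """Hypothetical PBIL search for frequent item sets, following Fig. 1."""
    rng = np.random.default_rng(seed)
    n_items = transactions.shape[1]
    m = max(1, int(selrate * pop_size))                    # Eq. (4)
    pop = rng.integers(0, 2, size=(pop_size, n_items))    # random initial population
    p = None
    frequent = set()
    for _ in range(iterations):
        fit = np.array([fitness(x, transactions, min_supp) for x in pop])
        for x, f in zip(pop, fit):
            if f > 1:                                       # save individuals with fitness > 1
                frequent.add(tuple(int(b) for b in x))
        advanced = pop[np.argsort(-fit, kind="stable")[:m]]  # truncation selection
        p_adv = advanced.mean(axis=0)                      # Eq. (3) on the advanced individuals
        # First iteration: use Eq. (3) directly; afterwards: linear update of Eq. (5).
        p = p_adv if p is None else (1 - learnrate) * p + learnrate * p_adv
        # Sampling: a gene is 1 when a uniform random number falls below P_t^i.
        sampled = (rng.random((pop_size - m, n_items)) < p).astype(int)
        pop = np.vstack([advanced, sampled])               # next-generation population
    return frequent


# Invented toy data: 6 transactions x 5 indices (1 = abnormal, 0 = normal).
transactions = np.array([[1, 1, 0, 0, 1],
                         [1, 1, 0, 0, 0],
                         [1, 1, 1, 0, 0],
                         [0, 1, 1, 0, 0],
                         [1, 0, 0, 1, 0],
                         [1, 1, 0, 0, 1]])
found = pbil_frequent_itemsets(transactions, min_supp=0.3)
for bits in sorted(found):
    print([i for i, b in enumerate(bits) if b], round(support(np.array(bits), transactions), 3))
```

Because the search is stochastic, a run may miss some frequent item sets. However, every item set it reports has passed the fitness > 1 test of Eq. (6).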

B. The Experiments and Results Analysis

To verify the advantages of the PBIL algorithm for mining association rules, PBIL was compared with the classical Apriori algorithm and with the association rule mining algorithm based on the genetic algorithm (GA). The comparison covered three aspects: the association rules mined, the time needed to extract frequent item sets, and the convergence of PBIL. Following the analysis in Section III.A, an association rule mining model based on GA was designed. Its steps are shown in Fig. 2.

[Figure 2. Flow chart of GA mining association rules. Steps: initialize the population and parameters (n individuals); compute individual fitness and rank by Eq. (6); save individuals with fitness > 1; extract the individuals with fitness > 1 to form intermediate population 1 (Popmid1); the remaining (n - m) individuals form intermediate population 2 (Popmid2); apply crossover and mutation to Popmid2 to obtain the next-generation population; repeat until the termination condition is satisfied; decode the individuals and obtain the association rules.]

Using the blood indices data of Section V.A, the algorithms above were implemented in MATLAB 2015a on a Windows 10 system with an Intel Core i5 processor.

C. Analysis of Association Rules

The transaction database of Section V.A (255 transactions, 9 data items) was mined with the association rule mining algorithm based on PBIL. The settings were: minimum support 0.12 (30/255); minimum confidence 0.7; population size (Popsize) 500; iterations (Iteration) 100; truncation selectivity (selrate) 0.4; learning rate (learnrate) 0.1. After similar rules in the output were merged, the final result is shown in Table I.

TABLE I. FOUND RULES

| Rule number | Rule premise | Rule result | Support level | Confidence level |
|---|---|---|---|---|
| 1 | WBC, NE | LY | 0.1294 | 0.8250 |
| 2 | WBC, BASO | LY | 0.1294 | 0.7174 |
| 3 | WBC, Hb | HCT | 0.1804 | 0.7302 |
| 4 | NE, BASO | WBC | 0.1216 | 0.8185 |
| 5 | NE, RBC | Hb | 0.1608 | 0.7885 |
| 6 | NE, Hb | WBC | 0.1412 | 0.8571 |
| 7 | BASO, Hb | EOS | 0.1216 | 0.7949 |
| 8 | HCT, RBC | Hb | 0.1294 | 0.8049 |

The rules in Table I can also be obtained with the classical Apriori algorithm. Table I shows the associations between the blood indices.
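The following assumes the standard definitions: the support of a rule A → B is supp(A ∪ B), and its confidence is supp(A ∪ B) / supp(A). Under these definitions the Support and Confidence columns of Table I are mutually consistent. For rule 1, for example, they imply supp(WBC, NE) ≈ 0.1294 / 0.8250 ≈ 0.157, i.e. about 40 of the 255 sheets (and 0.1294 ≈ 33/255). The second stage of mining, which generates rules from frequent item sets, can be sketched in Python as below. This is a hypothetical illustration and not the authors' code. The five transactions are invented, and frequent item sets are enumerated by brute force, which is practical only for a handful of indices such as the 9 used here.

```python
from itertools import combinations


def frequent_supports(transactions, min_supp):
    """Support of every item set whose support exceeds min_supp (brute force)."""
    items = sorted(set().union(*transactions))
    n = len(transactions)
    supports = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            itemset = frozenset(combo)
            supp = sum(itemset <= t for t in transactions) / n
            if supp > min_supp:
                supports[itemset] = supp
    return supports


def generate_rules(supports, min_conf):
    """Emit A -> S - A for every frequent S and non-empty proper subset A with
    confidence supp(S) / supp(A) >= min_conf. Every subset of a frequent item set
    is itself frequent, so supports[A] is always present."""
    rules = []
    for itemset, supp_s in supports.items():
        for k in range(1, len(itemset)):
            for premise in combinations(sorted(itemset), k):
                a = frozenset(premise)
                conf = supp_s / supports[a]
                if conf >= min_conf:
                    rules.append((a, itemset - a, supp_s, conf))
    return rules


# Invented example: each set lists the abnormal indices on one sheet.
transactions = [{"WBC", "NE", "LY"}, {"WBC", "NE", "LY"}, {"WBC", "NE"},
                {"LY"}, {"WBC", "Hb"}]
rules = generate_rules(frequent_supports(transactions, min_supp=0.3), min_conf=0.7)
for premise, result, supp, conf in rules:
    print(sorted(premise), "->", sorted(result), f"support={supp:.2f}", f"confidence={conf:.2f}")
```

With these invented transactions the sketch keeps four rules. One of them is {WBC} → {NE}, with support 0.60 and confidence 0.75.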
For instance, association rule 1 in Table I indicates that abnormal white blood cell, neutrophil and lymphocyte results tend to occur together. Among the sheets with abnormal WBC and NE, 82.5% also show abnormal LY, and all three abnormalities co-occur on 12.94% of all sheets. According to this rule, patients with abnormal white blood cells can therefore be advised to prevent or treat diseases caused by abnormal lymphocytes, and vice versa.

D. Mining Time Comparison between Algorithms

As analyzed in Section III.A, association rule mining can be divided into two stages. The first stage finds the frequent item sets in the transaction database and accounts for most of the computing time. The second stage generates association rules from the frequent item sets found. The second stage is the same for all three algorithms, so it is sufficient to compare the time each algorithm spends searching for frequent item sets.

E. The Relationship between Mining Time and the Number of Transactions

In this experiment, the minimum support is 2/255 and the number of data items (indices) is 9. The number of transactions is varied while the three algorithms search for the same number of frequent item sets, and their computing times are compared. The parameters of PBIL and GA are: population size 500; iterations 100; PBIL truncation selectivity 0.4; learning rate 0.1; GA crossover probability 0.8; mutation probability 0.01. The computing times are shown in Fig. 3, and the numbers of frequent item sets found are listed in Table II. In Fig. 3 the abscissa is the number of transactions in the transaction set and the ordinate is the computing time. The gray, orange and blue curves show the computing time of Apriori, GA and PBIL, respectively.

[Figure 3. Operation time of algorithms at different transaction amounts.]

TABLE II. NUMBER OF FREQUENT ITEM SETS FOUND FOR DIFFERENT NUMBERS OF TRANSACTIONS

| Order | Number of transactions | Number of frequent item sets |
|---|---|---|
| 1 | 20 | 95 |
| 2 | 40 | 197 |
| 3 | 60 | 205 |
| 4 | 80 | 250 |
| 5 | 100 | 253 |
| 6 | 120 | 331 |
| 7 | 140 | 332 |
| 8 | 160 | 336 |
| 9 | 180 | 356 |
| 10 | 200 | 373 |

With the same number of data items, the computing time of all three algorithms grows as the number of transactions increases. Apriori takes the longest, exceeding both PBIL and GA. PBIL evolves its population from a probability model and has learning properties, so its convergence is directed. GA evolves its population by crossover and mutation, which involve considerable randomness and no learning. As a result, PBIL converges faster than GA, and the gap widens as the data volume grows.

F. The Relationship between Mining Time and the Number of Data Items

In this experiment, the minimum support is 50/255 and the number of transactions is 255. The number of data items (indices) is varied while the three algorithms search for the same number of frequent item sets, and their computing times are compared. The computing times are shown in Fig. 4. The parameter settings and the numbers of frequent item sets found are listed in Table III.

TABLE III. NUMBER OF FREQUENT ITEM SETS FOUND FOR DIFFERENT NUMBERS OF DATA ITEMS

| Order | Number of data items | Number of frequent item sets | Population size | Iterations |
|---|---|---|---|---|
| 1 | 3 | 6 | 20 | 5 |
| 2 | 4 | 7 | 25 | 10 |
| 3 | 5 | 8 | 35 | 10 |
| 4 | 6 | 11 | 65 | 20 |
| 5 | 7 | 16 | 100 | 40 |
| 6 | 8 | 23 | 150 | 40 |
| 7 | 9 | 24 | 200 | 50 |

[Figure 4. Operation time of algorithms at different data item amounts.]

As Fig. 4 shows, with the same number of transactions, the computing time of all three algorithms grows with the number of data items. The computing time of Apriori grows fastest with the data dimension. PBIL and GA obtain frequent item sets by searching for them, so they are less affected by the data dimension. In addition, PBIL is much faster than GA. PBIL evolves the population with a probability vector and builds a probability model for each variable of the individual. When a dimension is added to the data, a corresponding component is added to the probability vector. The variables are mutually independent, and each evolves with its own probability. This greatly reduces the effect of increasing the dimension. GA, by contrast, amplifies the effect of increasing the dimension, because its crossover and mutation keep adding high-dimensional data into the evolving population. Therefore, PBIL performs better than the other two algorithms.

G. Convergence of PBIL

This experiment verifies that PBIL converges better than GA. The data set has 9 transactions and 5 data items, the minimum support is 2/9, the population size is 15 and the number of iterations is 100. The other parameters are the same as above. The best fitness value of each generation was recorded over 20 runs. The GA curve with the fastest convergence among its 20 runs was compared with the convergence curve of PBIL, and the result is shown in Fig. 5. The maximum fitness value, reached by the frequent item set with maximum support, is 4.5. The two curves show each algorithm's ability to find this item set. PBIL's best fitness value per generation reaches 4.5 first, and GA lags behind PBIL by at least 50 iterations. Moreover, PBIL found 8 frequent item sets while GA found 4, and PBIL also took less time. Therefore, PBIL has better convergence and searching ability.
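The value 4.5 follows directly from Eq. (6). With MinSupp = 2/9, the largest fitness attainable on this 9-transaction data set belongs to an item set that occurs in every transaction:

$$\mathrm{fitness}_{\max}=\frac{\mathrm{Supp}_{\max}}{\mathrm{MinSupp}}=\frac{9/9}{2/9}=4.5$$

A best fitness of 4.5 in Fig. 5 therefore means that the algorithm has found an item set present in all 9 transactions.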

[Figure 5. Convergence curves of PBIL vs. GA: best fitness value (0 to 5) against iteration times (1 to 101).]

2017 International Conference on Computer Network, Electronic and Automation (ICCEA 2017)

VI. MOBILE ANALYSIS SYSTEM OF BLOOD INDICES BASED ON CLOUD COMPUTING

A. Parallel Algorithm of Association Rule Mining Based on PBIL

The running speed of the association rule mining algorithm based on PBIL depends on the population size and the number of iterations. The database can therefore be decomposed into smaller transaction databases, each mined in parallel with the PBIL algorithm, and the results gathered afterwards. Based on this idea, the popular cloud computing framework Hadoop [14] is used. The algorithm is designed under Hadoop's MapReduce framework. The map function performs data decomposition and mining, and the reduce function gathers the mining results. The framework of the algorithm is shown in Fig. 6. Because the map functions run in parallel, searching for all frequent item sets takes approximately the same time as searching one small database. This greatly improves the efficiency of the algorithm.

[Figure 6. Framework of the parallel algorithm for mining association rules: in the parallel operation of Hadoop, the transaction database D is split into small data sets D1, D2, ..., Dn. Each Map function searches its part and outputs <support level, frequent item set> key-value pairs. The Reduce functions gather the frequent item sets by support level and generate the association rules (Rule 1, ..., Rule n).]

B. Blood-Health Indices Analysis System Based on Cloud Computing

The framework of the blood-health indices analysis system based on cloud computing is shown in Fig. 7. First, the system runs the parallel algorithm of Section VI.A on a parallel computing cluster configured with the Hadoop framework. The newly built database is connected to the hospital's database so that the data are updated dynamically. To make the system easier to use, an Android mobile application has been developed to help analyze blood indices.
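The map/reduce decomposition of Section VI.A can be simulated in a single Python process, as below. This is a hypothetical sketch and not Hadoop code. The partitions stand in for D1, ..., Dn, a brute-force local search stands in for the per-partition PBIL search, and the transactions are invented. A reducer that receives only locally frequent item sets does not see the counts from partitions where a set fell below the threshold, so it cannot compute exact global supports. For that reason the sketch adds a second counting pass over the gathered candidates. This is the two-phase SON scheme, added here for illustration; it is not part of Fig. 6.

```python
from collections import Counter
from functools import reduce
from itertools import combinations


def local_frequent(partition, min_supp):
    """Map, phase 1: item sets whose support inside this partition exceeds min_supp.
    Brute-force stand-in for the per-partition PBIL search."""
    counts = Counter()
    for t in partition:
        for k in range(1, len(t) + 1):
            for combo in combinations(sorted(t), k):
                counts[frozenset(combo)] += 1
    return {s for s, c in counts.items() if c / len(partition) > min_supp}


def count_candidates(partition, candidates):
    """Map, phase 2: exact count of every global candidate in this partition."""
    return Counter({s: sum(s <= t for t in partition) for s in candidates})


def parallel_frequent(transactions, n_parts, min_supp):
    parts = [transactions[i::n_parts] for i in range(n_parts)]  # D1, ..., Dn
    # Reduce, phase 1: a globally frequent set is locally frequent in at least one part.
    candidates = set().union(*(local_frequent(p, min_supp) for p in parts))
    # Reduce, phase 2: sum the per-partition counts and apply the global threshold.
    totals = reduce(lambda a, b: a + b,
                    (count_candidates(p, candidates) for p in parts), Counter())
    n = len(transactions)
    return {s: c / n for s, c in totals.items() if c / n > min_supp}


# Invented sheets: each set lists the abnormal indices of one patient.
transactions = [{"WBC", "NE", "LY"}, {"WBC", "NE"}, {"LY", "MO"}, {"WBC", "NE", "LY"},
                {"RBC", "Hb", "HCT"}, {"Hb", "HCT"}, {"WBC", "Hb"}, {"RBC", "Hb", "HCT"}]
result = parallel_frequent(transactions, n_parts=3, min_supp=0.3)
for itemset, supp in sorted(result.items(), key=lambda kv: -kv[1]):
    print(sorted(itemset), round(supp, 3))
```

Phase 1 is correct by a pigeonhole argument. If an item set's support were at most min_supp in every partition, its total count would be at most min_supp × n. Any globally frequent set must therefore appear among the candidates.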
Clients can upload their abnormal indices to the system. The system then searches the computing results for the indices that form association rules with the uploaded ones. It then reads the referred illness symptoms and treatment methods. Finally, it sends this information to the mobile client, which completes the online analysis of the illness.

[Figure 7. Architecture of blood indices analysis system based on cloud computing. Recoverable components: the Blood Indices Analysis System; the PBIL parallel algorithm running on a computer cluster based on Hadoop; routine blood test data saved in HBase; APIs for reading the referred association rules and diagnoses.]

VII. CONCLUSION

This paper analyzes the disadvantages of the traditional Apriori algorithm and of the GA-based association rule mining model in running time and search quality. It then proposes a new association rule mining model based on PBIL and shows experimentally that this model searches for frequent item sets better than the Apriori and GA algorithms. Moreover, the paper designs a parallel association rule mining algorithm based on PBIL and the architecture of a blood-health indices analysis system based on cloud computing. Together these provide a practical application example of the algorithm.

REFERENCES

[1] Jiawei Han. Data Mining: Concept and Technology [M]. Beijing: China Machine Press, 2004: 137-147.
[2] Xiaomin D. Research on Mining Common Risk Factors of Multi-diseases and Predicting Disease [D]. Taiyuan University of Technology, 2013.
[3] Yin Li, Changxu Cao, Jianghong Ren, et al. Application of General Algorithm (GGA) in the Improvement of Apriori Algorithm [J]. Computer and Modernization, 2004(11): 1-3.

[4] Shiwei Chen. Research on Association Rule Mining Based on Interest and Genetic Algorithm [D]. Zhejiang University, 2012.
[5] Donghao Chen, Hongwei Li, Tieying Zhang, et al. Application of Improved PSO Algorithm in Spatial Association Rule Mining [J]. Science of Surveying and Mapping, 2016, 41(2): 168-172.
[6] Lämmel R. Google's MapReduce Programming Model Revisited [J]. Science of Computer Programming, 2008, 70(1): 1-30.
[7] Sheng Zhang. An Apriori-based Algorithm of Association Rules Based on Cloud Computing [J]. Communications Technology, 2011, 44(6): 141-143.
[8] Zhengchan Rao, Nanbo Fan. A Review of Associative Rule Mining Apriori Algorithm [J]. Computer Era, 2012(9): 11-13.
[9] Guoyan Xu, Yuqing Shi. Application of Genetic Algorithm in Association Rule Mining [J]. Computer Engineering, 2002, 28(7): 122-124.
[10] Zhang Q. On Stability of Fixed Points of Limit Models of Univariate Marginal Distribution Algorithm and Factorized Distribution Algorithm [J]. IEEE Transactions on Evolutionary Computation, 2004, 8(1): 80-93.
[11] Shude Zhou, Zengqi Sun. A Survey on Estimation of Distribution Algorithm [J]. Acta Automatica Sinica, 2007, 33(2): 113-124.
[12] H. Muhlenbein, T. Mahnig. Convergence Theory and Application of the Factorized Distribution Algorithm [J]. J. Comput. Inf. Technol., 1999, 7(1): 19-32.
[13] Qin Y J, Sun J S, Wang B Y. The Differences of the Blood Routine Indices in Patients with Fatty Liver and Non-fatty Liver [J]. Journal of Clinical Hepatology, 2010.
[14] Qiang Xu, Zhenjiang Wang. Practice of Cloud-computing Application Developing. Beijing: China Machine Press, 2012: 64-67.