The Application of Clustering Algorithm Based on Improved Canopy-Kmeans in Operators Data
3rd International Conference on Engineering Technology and Application (ICETA 2016) ISBN:

Haoqian Mai & Lianglun Cheng*
College of Computer Science and Technology, Guangdong University of Technology, Guangzhou, Guangdong, China

ABSTRACT: The Kmeans algorithm is commonly used for user segmentation in operators' data, but its k value is difficult to determine. The Canopy algorithm can help Kmeans determine k, but it is strongly affected by its radius parameters. To solve these problems, an improved Canopy-Kmeans algorithm is proposed. First, the initial data are divided into K1 coarse clusters by the Canopy algorithm with small radii. Then, split and merge operations reconstruct the K1 coarse clusters into K2 convergent clusters (K1 ≥ K2). Finally, the K2 cluster centers are used as the initial centers of the Kmeans algorithm. In simulation experiments, the improved Canopy-Kmeans algorithm performed well in running time, clustering results and square error.

Keywords: Canopy-Kmeans; clustering; split; merge

1 INTRODUCTION

Data mining is a technology that discovers latent rules in large, incomplete, noisy and fuzzy data [1]. With the advent of the big data era, traditional operators have started to adopt intelligent data management. To provide finer-grained personalized services, segmenting the user base effectively and reasonably has become a key issue in operator management. Segmenting customers according to their individual characteristics and designing personalized marketing strategies for different customer groups is therefore a vital task for operators facing fierce market competition. At present, clustering is the main data mining technique used for customer segmentation.
Clustering algorithms fall mainly into the following categories: partitioning clustering [2], hierarchical clustering [3], density-based clustering [4] and grid-based clustering [5]. Among them, Kmeans is frequently used; its time complexity is O(nkt), where n is the number of samples, k the number of clusters and t the number of iterations [6]. However, the traditional Kmeans algorithm depends heavily on the choice of initial cluster centers: if the initial centers are poor, it easily falls into a local optimum and produces bad clustering results [7]. Ref. [8] proposed an improved Kmeans that selects the points closest to the center of the data set as the cluster centers, and applied it to customer segmentation with good effect. Ref. [9] proposed selecting the initial cluster centers by maximum distance, which reduces both the dependence on initial center selection and the time cost. Ref. [10] proposed an improved Kmeans that optimizes the initial cluster centers using minimum variance according to the sample distribution; its clustering results are stable and strongly resistant to noise. Besides the selection of initial cluster centers, determining the value of K is another problem of the Kmeans algorithm that also affects the clustering result [11].

This paper proposes an improved Canopy-Kmeans method. First, the initial data are divided into K1 coarse clusters by the Canopy algorithm with small radii. Then, split and merge operations reconstruct the K1 coarse clusters into K2 convergent clusters (K1 ≥ K2). Finally, the K2 cluster centers are used as the initial centers of the Kmeans algorithm.

*Corresponding author: mhqian@yeah.net
2 CANOPY-KMEANS CLUSTERING ALGORITHM

2.1 Kmeans algorithm

Kmeans is a typical distance-based clustering algorithm in which distance serves as the measure of similarity: the shorter the distance between two objects, the greater their similarity. Kmeans assumes that clusters consist of neighboring objects, and its goal is to obtain compact, well-separated clusters whose objects are highly similar within a cluster and dissimilar across clusters [12]. The steps of the Kmeans algorithm are as follows:

(1) Randomly select K objects from the data set as the initial centers.
(2) Calculate the distance from each remaining object to each center, and assign the object to the nearest cluster. Similarity is usually measured by the Euclidean distance; with c a cluster center and x a sample point in n dimensions:

$$d(x, c) = \sqrt{\sum_{i=1}^{n} (x_i - c_i)^2} \quad (1)$$

(3) Recalculate the center of each cluster.
(4) Repeat steps (2) and (3) until the criterion function converges. The mean square error is generally used as the clustering criterion function:

$$J = \frac{1}{n} \sum_{i=1}^{k} \sum_{j=1}^{n_i} \lVert x_{ij} - c_i \rVert^2 \quad (2)$$

where x_ij is the j-th sample of cluster i, c_i is the center of cluster i, n_i is the number of samples in cluster i, and n is the total number of samples.

However, the Kmeans algorithm has the following drawbacks. First, the value of K must be given in advance and is very difficult to estimate; in most cases we do not know how many categories the given data set should be divided into. Second, the initial cluster centers have a great influence on the final result: if they are poorly chosen, an effective clustering result cannot be obtained.

2.2 Canopy algorithm

Canopy is commonly combined with Kmeans as a pre-clustering step and can be used to determine the number of clusters. With Canopy clustering, the data set is divided into k subsets by setting the radii, and the centers of these subsets can be used as the initial centers of Kmeans.
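Because the Canopy output is meant to seed Kmeans, the Kmeans steps (1)-(4) of Section 2.1 are sketched below in Python so that the loop can accept externally supplied initial centers. This is a minimal illustration, not the authors' Matlab implementation: the names `kmeans` and `sq_dist` are hypothetical, convergence is assumed to mean that J of Eq. (2) stops decreasing by more than `tol`, and an empty cluster simply keeps its previous center (also an assumption).

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance, i.e. the square of Eq. (1)."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans(points, init_centers=None, k=None, max_iter=100, tol=1e-9, seed=0):
    """Kmeans following steps (1)-(4) of Section 2.1 (illustrative sketch).

    Pass init_centers to seed the run (for example with Canopy output);
    otherwise k points are drawn at random as in step (1).
    Returns (labels, centers, J) with J the criterion of Eq. (2).
    """
    if init_centers is None:
        centers = random.Random(seed).sample(list(points), k)  # step (1)
    else:
        centers = [tuple(c) for c in init_centers]
    prev_j = math.inf
    for _ in range(max(1, max_iter)):
        # step (2): assign every point to its nearest center
        labels = [min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
                  for p in points]
        # step (3): recompute each center as the mean of its members
        new_centers = []
        for i, c in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(c)  # assumption: empty cluster keeps its center
        centers = new_centers
        # step (4): J of Eq. (2), mean squared distance to the assigned center
        j = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels)) / len(points)
        if prev_j - j <= tol:
            break
        prev_j = j
    return labels, centers, j
```

For example, `kmeans([(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)], init_centers=[(0.0, 0.0), (10.0, 0.0)])` separates the two left points from the two right points after one reassignment, with centers at (0.0, 0.5) and (10.0, 0.5).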
Because the Canopy algorithm reduces the number of distance comparisons, it shortens clustering time and improves computational efficiency. The steps of the Canopy algorithm are as follows:

Step1: Put all data into a list, and initialize two distance thresholds: the loose threshold T1 and the tight threshold T2 (T1 > T2).
Step2: Randomly select a point as the center of the first Canopy cluster, and delete it from the list.
Step3: Take a point from the list and calculate its distance d to each Canopy center. If d < T2, the point belongs to that cluster; if T2 ≤ d ≤ T1, the point is given a weak label; if the distance d to every Canopy center is greater than T1, the point becomes a new Canopy center. Finally, delete the point from the list.
Step4: Repeat Step3 until the list is empty, then recalculate the cluster centers.

However, the performance of the Canopy algorithm depends on the radii T1 and T2. When T1 is too large, a point may belong to multiple Canopy clusters, which increases computing time; when T2 is too large, the number of clusters decreases. T1 and T2 are therefore generally set by experience or experimental tests, which affects the accuracy and efficiency of classification. To solve these problems, an improved Canopy-Kmeans algorithm is proposed.

3 IMPROVED CANOPY-KMEANS CLUSTERING ALGORITHM

The improved algorithm consists of three steps. First, the initial data are divided into K1 coarse clusters by the Canopy algorithm with small radii T1 and T2. Second, split and merge operations reconstruct the K1 coarse clusters into K2 convergent clusters (K1 ≥ K2). Finally, the K2 cluster centers are used as the initial centers of the Kmeans algorithm.
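The first stage reuses the standard Canopy procedure (Steps 1-4 of Section 2.2) with small T1 and T2. A minimal Python sketch of that procedure follows; the names `canopy`, `euclidean` and `mean` are hypothetical. Step 2's random choice is modelled by shuffling the list, and each center is recomputed from its strongly bound points only (d < T2), which is an assumption because the text does not say whether weakly labelled points count.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance of Eq. (1)."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def mean(points):
    """Component-wise mean of a non-empty list of points."""
    return tuple(sum(col) / len(points) for col in zip(*points))


def canopy(points, t1, t2, seed=0):
    """Canopy pass following Steps 1-4 of Section 2.2 (illustrative sketch).

    Returns (centers, weak): the recomputed canopy centers and, for each
    canopy, the points that only received a weak label (T2 <= d <= T1).
    """
    if not t1 > t2 > 0:
        raise ValueError("expected T1 > T2 > 0")
    pending = list(points)                    # Step 1: all data in a list
    if not pending:
        return [], []
    random.Random(seed).shuffle(pending)
    first = pending.pop()                     # Step 2: random first center
    seeds, members, weak = [first], [[first]], [[]]
    while pending:                            # Step 4: until the list is empty
        p = pending.pop()                     # Step 3: take the next point
        dists = [euclidean(p, s) for s in seeds]
        nearest = min(range(len(seeds)), key=lambda i: dists[i])
        if dists[nearest] > t1:               # beyond T1 of every center: new canopy
            seeds.append(p)
            members.append([p])
            weak.append([])
        elif dists[nearest] < t2:             # within T2: belongs to that canopy
            members[nearest].append(p)
        else:                                 # T2 <= d <= T1: weak label only
            for i, d in enumerate(dists):
                if d <= t1:
                    weak[i].append(p)
    centers = [mean(m) for m in members]      # Step 4: recompute the centers
    return centers, weak
```

With two tight pairs far apart, such as `[(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]` and T1 = 1.0, T2 = 0.5, the sketch yields two canopies centered near (0.05, 0.0) and (5.05, 5.0). Shrinking T1 and T2 tends to produce more canopies, which is the K1 > K behaviour the first stage relies on.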
The architecture of the algorithm is shown in Figure 1.

3.1 The initial Canopy algorithm

The Canopy algorithm is used to obtain a coarse cluster number K1 that is greater than the final cluster number K, which provides a good initial state for the splitting and merging of the second stage. The Canopy clustering process is as follows:

A) Obtain data D from the operator, and preprocess it by handling missing values and outliers and quantifying the features.
B) Set small initial radii T1 and T2 according to expert knowledge and the business background.
C) Obtain K1 rough clusters by running the Canopy algorithm on data D.

The initial Canopy clustering yields a larger number of clusters with wide coverage and high compactness. This helps avoid local optima caused by an inappropriate selection of cluster centers and greatly reduces the running time of clustering.

Figure 1. The architecture of the improved Canopy-Kmeans algorithm: data from the operator → data preprocessing → Canopy clustering (first clustering, generating K1 rough clusters) → merging and splitting (second clustering, generating K2 clusters) → Kmeans clustering (third clustering, according to user requirements) → output of the final clustering results.

3.2 Cluster splitting and merging

The K1 coarse clusters obtained in the first stage are adjusted by merge and split operations until the clusters converge or the maximum number of iterations is reached. The algorithm steps are as follows:

Step1: Initialize the control parameters on the basis of expert knowledge and the business:
TE: the upper limit of the standard deviation of each feature component (a cluster whose standard deviation exceeds TE should be split);
TC: the minimum distance between two cluster centers (two clusters whose centers are closer than TC should be merged);
NS: the maximum number of iterations.

Step2: Split operation. When the standard deviation of a cluster is greater than the specified threshold, the cluster is divided into two. Calculate the standard deviation vector of the samples of each cluster:

$$\sigma_j = (\sigma_{1j}, \sigma_{2j}, \ldots, \sigma_{nj})^T \quad (3)$$

where each component is

$$\sigma_{ij} = \sqrt{\frac{1}{N_j} \sum_{k=1}^{N_j} (x_{ik} - c_{ij})^2} \quad (4)$$

Here i is the index of the feature dimension, j is the index of the cluster, N_j is the number of samples in cluster j, x_ik is the i-th component of the k-th sample in cluster j, and c_ij is the i-th component of the center of cluster j. Find the maximum component σ_j,max of each standard deviation vector σ_j. If σ_j,max is greater than TE, the cluster is split into two new cluster centers C_k and C_k+1: C_k is obtained by adding σ_j,max/2 to the center component corresponding to σ_j,max, and C_k+1 by subtracting σ_j,max/2 from that component.
If no cluster meets the split condition, the procedure moves to the merge operation; otherwise it goes to Step4.

Step3: Merge operation. Calculate the distance D_ij between the centers of every pair of clusters. When D_ij is less than TC, the two clusters are merged into one. The center distance D_ij is calculated as:

$$D_{ij} = \sqrt{\sum_{k=1}^{n} (C_{ik} - C_{jk})^2} \quad (5)$$

Merging two clusters i and j that meet the merge condition gives the new center:

$$C_k^{*} = \frac{N_i C_i + N_j C_j}{N_i + N_j} \quad (6)$$

In this formula, the center vectors of the two merged clusters are weighted by their sample counts N_i and N_j, so C_k^* is the true mean vector of the merged cluster.

Step4: If the process has converged (judged by the criterion function of Eq. (2)) or the number of iterations exceeds NS, terminate the algorithm. Otherwise, increase the iteration count by one and return to Step2 to adjust the cluster centers.

Step5: After the clusters are initialized, identify outlier clusters using square error, similarity and separability metrics, and delete them.

3.3 Customizable Kmeans clustering

The K2 clusters obtained in the second stage can be used as the initial cluster centers of Kmeans. Users can also adjust the value of k as needed: to obtain more clusters, split the cluster with the largest standard deviation; to obtain fewer clusters, merge the two clusters whose centers are closest. Finally, the chosen k and the corresponding centers are passed to the traditional Kmeans algorithm for clustering. The flow chart of the algorithm is shown in Figure 2.
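The split and merge loop of Section 3.2 (Eqs. (3)-(6)) can be sketched in Python as below. This is an illustrative reading, not the authors' code, and the names `split_merge`, `assign`, `dist` and `mean` are hypothetical. Several details are assumptions: points are reassigned to their nearest center at the start of every iteration, only the closest qualifying pair is merged per iteration, an iteration with neither a split nor a merge is treated as convergence (instead of evaluating Eq. (2)), and Step 5's outlier handling is omitted.

```python
import math


def dist(a, b):
    """Center distance of Eq. (5) (Euclidean)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(points):
    """Component-wise mean of a non-empty list of points."""
    return tuple(sum(col) / len(points) for col in zip(*points))


def assign(points, centers):
    """Group points by nearest center, dropping centers with no points."""
    groups = [[] for _ in centers]
    for p in points:
        nearest = min(range(len(centers)), key=lambda i: dist(p, centers[i]))
        groups[nearest].append(p)
    return [g for g in groups if g]


def split_merge(points, centers, te, tc, ns):
    """Refine coarse Canopy centers by splitting and merging (sketch of 3.2).

    te: max per-dimension standard deviation (TE); tc: min center
    distance (TC); ns: max number of iterations (NS).
    """
    for _ in range(ns):
        groups = assign(points, centers)
        centers = [mean(g) for g in groups]
        changed = False
        next_centers = []
        # Step 2: split clusters whose largest std component exceeds TE
        for g, c in zip(groups, centers):
            sigma = [math.sqrt(sum((p[i] - c[i]) ** 2 for p in g) / len(g))
                     for i in range(len(c))]                      # Eq. (4)
            m = max(range(len(sigma)), key=lambda i: sigma[i])
            if sigma[m] > te:
                up, down = list(c), list(c)
                up[m] += sigma[m] / 2                             # C_k
                down[m] -= sigma[m] / 2                           # C_k+1
                next_centers += [tuple(up), tuple(down)]
                changed = True
            else:
                next_centers.append(c)
        if not changed and len(centers) > 1:
            # Step 3: merge the closest pair if their distance is below TC
            d, a, b = min((dist(centers[a], centers[b]), a, b)
                          for a in range(len(centers))
                          for b in range(a + 1, len(centers)))
            if d < tc:
                na, nb = len(groups[a]), len(groups[b])
                merged = tuple((na * x + nb * y) / (na + nb)
                               for x, y in zip(centers[a], centers[b]))  # Eq. (6)
                next_centers = [c for i, c in enumerate(centers) if i not in (a, b)]
                next_centers.append(merged)
                changed = True
        centers = next_centers
        if not changed:  # Step 4 (assumed test): nothing split or merged
            break
    return centers
```

For instance, starting from four coarse centers that over-segment two tight blobs, the merges collapse them to two centers. In the full pipeline the returned centers would then seed an ordinary Kmeans run as described in Section 3.3.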
Figure 2. Algorithm flowchart: start → input the parameters (radii, maximum standard deviation, minimum center distance, etc.) → Canopy clustering → calculate the cluster centers, standard deviations and inter-cluster distances → if a cluster's standard deviation is greater than TE, split it; otherwise, if the distance between two cluster centers is less than TC, merge them → repeat until convergence or the maximum number of iterations is reached → delete outlier clusters → customizable Kmeans clustering → end.

4 EXPERIMENTS AND RESULT ANALYSIS

4.1 Experimental preparation

The experiments were run on a PC with the Windows 7 operating system and 8 GB of memory. The algorithm was implemented in Matlab, and the data were provided by the operator.

4.2 Experimental design

To achieve better user segmentation, we analyze data on users' attributes, consumption behavior and communication records, and build a data mining model. This model is then applied to formulate marketing policies for different customer groups and to maximize benefits. The experimental data are described in Table 1. The experiments use attributes selected from Table 1; the training data are randomly sampled from the original data and divided into 10 groups (ranging from 1 million to 10 million records). The training data follow a normal distribution, which is intended to ensure the randomness of the experimental data and the accuracy of the experimental results.

Experiment 1: For the same amounts of data, the proposed algorithm is compared with the traditional Canopy algorithm in running efficiency; the results are shown in Table 2 and Figure 3.
In the parameter settings, the radii T1 and T2 of the traditional Canopy algorithm were set to 0.5 and 0.75, and those of the proposed algorithm were set to the smaller values 0.25 and … According to the golden-section evaluation function in [13], the maximum cluster standard deviation TE and the minimum cluster center distance TC were calculated as 0.09 and 3, respectively. When the data volume is small, the running times of the two algorithms show no obvious difference; once the data volume reaches a certain scale, however, the running time of the proposed algorithm grows more slowly and it converges significantly faster than the traditional Canopy algorithm.

Experiment 2: Taking the 10th data set as an example, we observe the influence of the initial radius on the clustering results. The results are shown in Table 3 and Figure 4, with the tight threshold T2 taking the values 0.05 and 0.5. As Figure 4 shows, the clustering results of the traditional Canopy algorithm are easily affected by the radius: as T2 increases, the number of clusters decreases. In contrast, the proposed algorithm is very stable and hardly affected by the radius, with the number of clusters staying close to 5.

Table 1. Main data attributes and descriptions used in the experiment.
Attribute name | Description
IMSI | A unique identification number for a mobile user
BRND_CD | The operator's brand type
INNET_DUR | Total service time of the user from first use to now (unit: month)
BI_AGE_CNT | User's age as registered with the operator
F_ACCTBAL_AMT | Monthly account balance
ARPU | A standard measure of the operator's income (per month)
NB_ARPU | The part of ARPU from the operator's main data business income
GPRS_FLUX | Total monthly data traffic of the user (2G+3G+4G)
G4_FLUX | Monthly 4G data traffic of the user
G3_FLUX | Monthly 3G data traffic of the user
MOU | A measure of telecommunications usage (unit: minute)
INT_NORM_ROAM_CALL_CNT | Monthly number of calls made outside the service area
F_NORM_ROAM_CALL_DUR | Monthly duration of calls made outside the service area
INT_NORM_4G_FRD_PCT | Proportion of 4G customers among the Top-20 most frequently contacted friends
INT_NORM_HFLUX_FRD_PCT | Proportion of high-traffic customers among the Top-20 most frequently contacted friends
INT_NORM_HARPU_FRD_PCT | Proportion of high-ARPU customers among the Top-20 most frequently contacted friends
TERM_BRND_CD | The terminal brand
SI_INVOICE_FLAG | Flag indicating whether the customer has applied for a fee invoice

Table 2. The relationship between the amount of data and running time (columns: amount of data / ten thousand records; running time / s of the proposed algorithm; running time / s of traditional Canopy).
Figure 3. The chart of computing efficiency.
Table 3. The relationship between radius T2 and the number of clusters (columns: radius T2; number of clusters of the proposed algorithm; number of clusters of traditional Canopy).
Figure 4. The comparison of clustering results with radius.

Experiment 3: The square error can be used to evaluate how concentrated a cluster is: the smaller the square error, the more concentrated and similar the objects within a cluster. Table 4 and Figure 5 show that the proposed algorithm outperforms the traditional Canopy algorithm in square error.

Table 4. The relationship between the amount of data and square error (columns: amount of data / ten thousand records; square error / 10^3 of the proposed algorithm; square error / 10^3 of traditional Canopy).
Figure 5. The change of square error with the amount of data.
5 CONCLUSIONS

Kmeans is a common clustering method in data mining. To address the weaknesses of the Kmeans and Canopy algorithms, an improved Canopy-Kmeans algorithm is proposed. The proposed method retains the advantages of the traditional Canopy-Kmeans algorithm and can also adjust the clustering result according to actual needs. Experiments show that the algorithm is helpful for customer segmentation in personalized operation. In future work, we will combine this method with personalized recommendation for more in-depth data mining research and applications.

ACKNOWLEDGEMENT

This work is supported by the National Joint Funds of Guangdong Province support project (No. U…) and the National Natural Science Foundation of China for Young Scholars (No. …). All support is gratefully acknowledged.

REFERENCES

[1] Zhao C., Wu Y., Gao H. Study on knowledge acquisition of the telecom customers' consuming behavior based on data mining. 4th International Conference on Wireless Communications, Networking and Mobile Computing (WiCOM'08), IEEE, pp. 1-5.
[2] Soua M., Kachouri R., Akil M. A new hybrid binarization method based on K-means. International Symposium on Communications, Control and Signal Processing (ISCCSP), IEEE.
[3] Tang X.Q., Zhu P. Hierarchical clustering problems and analysis of fuzzy proximity relation on granular space. IEEE Transactions on Fuzzy Systems, 21(5).
[4] Smiti A., Elouedi Z. DBSCAN-GM: An improved clustering method based on Gaussian Means and DBSCAN techniques. 2012 IEEE 16th International Conference on Intelligent Engineering Systems (INES), IEEE.
[5] Tsai C.F., Hu Y.C. Enhancement of efficiency by thrifty search of interlocking neighbor grids approach for grid-based data clustering. 2013 International Conference on Machine Learning and Cybernetics (ICMLC), IEEE, 3.
[6] Qin X., Zheng S., Huang Y., et al. Improved K-Means algorithm and application in customer segmentation. 2010 Asia-Pacific Conference on Wearable Computing Systems (APWCS), IEEE.
[7] Han L.B., Wang Q., Jiang Z.F. Improved k-means initial clustering center selection algorithm. Computer Engineering and Applications, 46(17).
[8] Du W., Zhao C.R., Huang W.J. Application of improved Kmeans cluster algorithm to customer segmentation. Journal of Hebei University of Economics and Business, 35(1).
[9] Zhai D.H., Yu J., Gao F. K-means text clustering algorithm based on initial cluster centers selection according to maximum distance. Application Research of Computers, 31(3).
[10] Xie J.Y., Wang Y.E. K-means algorithm based on minimum deviation initialized clustering centers. Computer Engineering, 40(8).
[11] Mehar A.M., Matawie K., Maeder A. Determining an optimal value of K in K-means clustering. 2013 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), IEEE.
[12] Na S., Xumin L., Yong G. Research on k-means clustering algorithm: An improved k-means clustering algorithm. 2010 Third International Symposium on Intelligent Information Technology and Security Informatics (IITSI), IEEE.
[13] Zhang L.N., Jiang X.H., Na R.S. Research of large sample data clustering method based on improved ISODATA algorithm. Journal of Inner Mongolia Agricultural University (Natural Sciences Edition), (1).
More informationResearch on Data Mining Technology Based on Business Intelligence. Yang WANG
2018 International Conference on Mechanical, Electronic and Information Technology (ICMEIT 2018) ISBN: 978-1-60595-548-3 Research on Data Mining Technology Based on Business Intelligence Yang WANG Communication
More informationAutomatic Shadow Removal by Illuminance in HSV Color Space
Computer Science and Information Technology 3(3): 70-75, 2015 DOI: 10.13189/csit.2015.030303 http://www.hrpub.org Automatic Shadow Removal by Illuminance in HSV Color Space Wenbo Huang 1, KyoungYeon Kim
More informationSYDE Winter 2011 Introduction to Pattern Recognition. Clustering
SYDE 372 - Winter 2011 Introduction to Pattern Recognition Clustering Alexander Wong Department of Systems Design Engineering University of Waterloo Outline 1 2 3 4 5 All the approaches we have learned
More informationAutomatic Grayscale Classification using Histogram Clustering for Active Contour Models
Research Article International Journal of Current Engineering and Technology ISSN 2277-4106 2013 INPRESSCO. All Rights Reserved. Available at http://inpressco.com/category/ijcet Automatic Grayscale Classification
More informationTraffic Flow Prediction Based on the location of Big Data. Xijun Zhang, Zhanting Yuan
5th International Conference on Civil Engineering and Transportation (ICCET 205) Traffic Flow Prediction Based on the location of Big Data Xijun Zhang, Zhanting Yuan Lanzhou Univ Technol, Coll Elect &
More informationDesign of student information system based on association algorithm and data mining technology. CaiYan, ChenHua
5th International Conference on Mechatronics, Materials, Chemistry and Computer Engineering (ICMMCCE 2017) Design of student information system based on association algorithm and data mining technology
More informationResearch on QR Code Image Pre-processing Algorithm under Complex Background
Scientific Journal of Information Engineering May 207, Volume 7, Issue, PP.-7 Research on QR Code Image Pre-processing Algorithm under Complex Background Lei Liu, Lin-li Zhou, Huifang Bao. Institute of
More informationIntegration of information security and network data mining technology in the era of big data
Acta Technica 62 No. 1A/2017, 157 166 c 2017 Institute of Thermomechanics CAS, v.v.i. Integration of information security and network data mining technology in the era of big data Lu Li 1 Abstract. The
More informationEnergy Optimized Routing Algorithm in Multi-sink Wireless Sensor Networks
Appl. Math. Inf. Sci. 8, No. 1L, 349-354 (2014) 349 Applied Mathematics & Information Sciences An International Journal http://dx.doi.org/10.12785/amis/081l44 Energy Optimized Routing Algorithm in Multi-sink
More informationCluster Analysis. Mu-Chun Su. Department of Computer Science and Information Engineering National Central University 2003/3/11 1
Cluster Analysis Mu-Chun Su Department of Computer Science and Information Engineering National Central University 2003/3/11 1 Introduction Cluster analysis is the formal study of algorithms and methods
More informationHybrid ant colony optimization algorithm for two echelon vehicle routing problem
Available online at www.sciencedirect.com Procedia Engineering 15 (2011) 3361 3365 Advanced in Control Engineering and Information Science Hybrid ant colony optimization algorithm for two echelon vehicle
More informationA Review on Cluster Based Approach in Data Mining
A Review on Cluster Based Approach in Data Mining M. Vijaya Maheswari PhD Research Scholar, Department of Computer Science Karpagam University Coimbatore, Tamilnadu,India Dr T. Christopher Assistant professor,
More informationSOMSN: An Effective Self Organizing Map for Clustering of Social Networks
SOMSN: An Effective Self Organizing Map for Clustering of Social Networks Fatemeh Ghaemmaghami Research Scholar, CSE and IT Dept. Shiraz University, Shiraz, Iran Reza Manouchehri Sarhadi Research Scholar,
More information数据挖掘 Introduction to Data Mining
数据挖掘 Introduction to Data Mining Philippe Fournier-Viger Full professor School of Natural Sciences and Humanities philfv8@yahoo.com Spring 2019 S8700113C 1 Introduction Last week: Association Analysis
More informationResearch on the Wood Cell Contour Extraction Method Based on Image Texture and Gray-scale Information.
, pp. 65-74 http://dx.doi.org/0.457/ijsip.04.7.6.06 esearch on the Wood Cell Contour Extraction Method Based on Image Texture and Gray-scale Information Zhao Lei, Wang Jianhua and Li Xiaofeng 3 Heilongjiang
More informationA Kind of Wireless Sensor Network Coverage Optimization Algorithm Based on Genetic PSO
Sensors & Transducers 2013 by IFSA http://www.sensorsportal.com A Kind of Wireless Sensor Network Coverage Optimization Algorithm Based on Genetic PSO Yinghui HUANG School of Electronics and Information,
More informationSpectral Methods for Network Community Detection and Graph Partitioning
Spectral Methods for Network Community Detection and Graph Partitioning M. E. J. Newman Department of Physics, University of Michigan Presenters: Yunqi Guo Xueyin Yu Yuanqi Li 1 Outline: Community Detection
More informationA Data Classification Algorithm of Internet of Things Based on Neural Network
A Data Classification Algorithm of Internet of Things Based on Neural Network https://doi.org/10.3991/ijoe.v13i09.7587 Zhenjun Li Hunan Radio and TV University, Hunan, China 278060389@qq.com Abstract To
More informationAn Energy Efficiency Routing Algorithm of Wireless Sensor Network Based on Round Model. Zhang Ying-Hui
Joint International Mechanical, Electronic and Information Technology Conference (JIMET 2015) An Energy Efficiency Routing Algorithm of Wireless Sensor Network Based on Round Model Zhang Ying-Hui Software
More informationData Mining and Data Warehousing Henryk Maciejewski Data Mining Clustering
Data Mining and Data Warehousing Henryk Maciejewski Data Mining Clustering Clustering Algorithms Contents K-means Hierarchical algorithms Linkage functions Vector quantization SOM Clustering Formulation
More informationAN IMPROVED K-MEANS CLUSTERING ALGORITHM FOR IMAGE SEGMENTATION
AN IMPROVED K-MEANS CLUSTERING ALGORITHM FOR IMAGE SEGMENTATION WILLIAM ROBSON SCHWARTZ University of Maryland, Department of Computer Science College Park, MD, USA, 20742-327, schwartz@cs.umd.edu RICARDO
More informationRelated Work The Concept of the Signaling. In the mobile communication system, in addition to transmit the necessary user information (usually voice
International Conference on Information Science and Computer Applications (ISCA 2013) The Research and Design of Personalization preferences Based on Signaling analysis ZhiQiang Wei 1,a, YiYan Zhang 1,b,
More informationLecture-17: Clustering with K-Means (Contd: DT + Random Forest)
Lecture-17: Clustering with K-Means (Contd: DT + Random Forest) Medha Vidyotma April 24, 2018 1 Contd. Random Forest For Example, if there are 50 scholars who take the measurement of the length of the
More informationHARD, SOFT AND FUZZY C-MEANS CLUSTERING TECHNIQUES FOR TEXT CLASSIFICATION
HARD, SOFT AND FUZZY C-MEANS CLUSTERING TECHNIQUES FOR TEXT CLASSIFICATION 1 M.S.Rekha, 2 S.G.Nawaz 1 PG SCALOR, CSE, SRI KRISHNADEVARAYA ENGINEERING COLLEGE, GOOTY 2 ASSOCIATE PROFESSOR, SRI KRISHNADEVARAYA
More informationAnalyzing Outlier Detection Techniques with Hybrid Method
Analyzing Outlier Detection Techniques with Hybrid Method Shruti Aggarwal Assistant Professor Department of Computer Science and Engineering Sri Guru Granth Sahib World University. (SGGSWU) Fatehgarh Sahib,
More informationUAV Motion-Blurred Image Restoration Using Improved Continuous Hopfield Network Image Restoration Algorithm
Journal of Information Hiding and Multimedia Signal Processing c 207 ISSN 2073-422 Ubiquitous International Volume 8, Number 4, July 207 UAV Motion-Blurred Image Restoration Using Improved Continuous Hopfield
More informationPerformance Measure of Hard c-means,fuzzy c-means and Alternative c-means Algorithms
Performance Measure of Hard c-means,fuzzy c-means and Alternative c-means Algorithms Binoda Nand Prasad*, Mohit Rathore**, Geeta Gupta***, Tarandeep Singh**** *Guru Gobind Singh Indraprastha University,
More informationA COMPETITION BASED ROOF DETECTION ALGORITHM FROM AIRBORNE LIDAR DATA
A COMPETITION BASED ROOF DETECTION ALGORITHM FROM AIRBORNE LIDAR DATA HUANG Xianfeng State Key Laboratory of Informaiton Engineering in Surveying, Mapping and Remote Sensing (Wuhan University), 129 Luoyu
More informationAn Application of Genetic Algorithm for Auto-body Panel Die-design Case Library Based on Grid
An Application of Genetic Algorithm for Auto-body Panel Die-design Case Library Based on Grid Demin Wang 2, Hong Zhu 1, and Xin Liu 2 1 College of Computer Science and Technology, Jilin University, Changchun
More informationDESIGN AND IMPLEMENTATION OF VARIABLE RADIUS SPHERE DECODING ALGORITHM
DESIGN AND IMPLEMENTATION OF VARIABLE RADIUS SPHERE DECODING ALGORITHM Wu Di, Li Dezhi and Wang Zhenyong School of Electronics and Information Engineering, Harbin Institute of Technology, Harbin, China
More informationMachine Learning and Data Mining. Clustering (1): Basics. Kalev Kask
Machine Learning and Data Mining Clustering (1): Basics Kalev Kask Unsupervised learning Supervised learning Predict target value ( y ) given features ( x ) Unsupervised learning Understand patterns of
More informationAn algorithm of lips secondary positioning and feature extraction based on YCbCr color space SHEN Xian-geng 1, WU Wei 2
International Conference on Advances in Mechanical Engineering and Industrial Informatics (AMEII 015) An algorithm of lips secondary positioning and feature extraction based on YCbCr color space SHEN Xian-geng
More informationT-S Neural Network Model Identification of Ultra-Supercritical Units for Superheater Based on Improved FCM
Research Journal of Applied Sciences, Engineering and echnology 4(4): 247-252, 202 ISSN: 2040-7467 Maxwell Scientific Organization, 202 Submitted: March 2, 202 Accepted: April 03, 202 Published: July 5,
More informationA Fast Speckle Reduction Algorithm based on GPU for Synthetic Aperture Sonar
Vol.137 (SUComS 016), pp.8-17 http://dx.doi.org/1457/astl.016.137.0 A Fast Speckle Reduction Algorithm based on GPU for Synthetic Aperture Sonar Xu Kui 1, Zhong Heping 1, Huang Pan 1 1 Naval Institute
More informationCLUSTERING BIG DATA USING NORMALIZATION BASED k-means ALGORITHM
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology ISSN 2320 088X IMPACT FACTOR: 5.258 IJCSMC,
More informationDigital Image Processing. Prof. P.K. Biswas. Department of Electronics & Electrical Communication Engineering
Digital Image Processing Prof. P.K. Biswas Department of Electronics & Electrical Communication Engineering Indian Institute of Technology, Kharagpur Image Segmentation - III Lecture - 31 Hello, welcome
More informationData Clustering Hierarchical Clustering, Density based clustering Grid based clustering
Data Clustering Hierarchical Clustering, Density based clustering Grid based clustering Team 2 Prof. Anita Wasilewska CSE 634 Data Mining All Sources Used for the Presentation Olson CF. Parallel algorithms
More informationAlgorithm research of 3D point cloud registration based on iterative closest point 1
Acta Technica 62, No. 3B/2017, 189 196 c 2017 Institute of Thermomechanics CAS, v.v.i. Algorithm research of 3D point cloud registration based on iterative closest point 1 Qian Gao 2, Yujian Wang 2,3,
More informationTemperature Calculation of Pellet Rotary Kiln Based on Texture
Intelligent Control and Automation, 2017, 8, 67-74 http://www.scirp.org/journal/ica ISSN Online: 2153-0661 ISSN Print: 2153-0653 Temperature Calculation of Pellet Rotary Kiln Based on Texture Chunli Lin,
More informationAn Adaptive Threshold LBP Algorithm for Face Recognition
An Adaptive Threshold LBP Algorithm for Face Recognition Xiaoping Jiang 1, Chuyu Guo 1,*, Hua Zhang 1, and Chenghua Li 1 1 College of Electronics and Information Engineering, Hubei Key Laboratory of Intelligent
More informationPrivacy-Preserving of Check-in Services in MSNS Based on a Bit Matrix
BULGARIAN ACADEMY OF SCIENCES CYBERNETICS AND INFORMATION TECHNOLOGIES Volume 15, No 2 Sofia 2015 Print ISSN: 1311-9702; Online ISSN: 1314-4081 DOI: 10.1515/cait-2015-0032 Privacy-Preserving of Check-in
More informationA priority based dynamic bandwidth scheduling in SDN networks 1
Acta Technica 62 No. 2A/2017, 445 454 c 2017 Institute of Thermomechanics CAS, v.v.i. A priority based dynamic bandwidth scheduling in SDN networks 1 Zun Wang 2 Abstract. In order to solve the problems
More informationResearch on Hybrid Network Technologies of Power Line Carrier and Wireless MAC Layer Hao ZHANG 1, Jun-yu LIU 2, Yi-ying ZHANG 3 and Kun LIANG 3,*
2017 International Conference on Computer, Electronics and Communication Engineering (CECE 2017) ISBN: 978-1-60595-476-9 Research on Hybrid Network Technologies of Power Line Carrier and Wireless MAC Layer
More informationAn Improved DFSA Anti-collision Algorithm Based on the RFID-based Internet of Vehicles
2016 2 nd International Conference on Energy, Materials and Manufacturing Engineering (EMME 2016) ISBN: 978-1-60595-441-7 An Improved DFSA Anti-collision Algorithm Based on the RFID-based Internet of Vehicles
More informationComparative Study of Subspace Clustering Algorithms
Comparative Study of Subspace Clustering Algorithms S.Chitra Nayagam, Asst Prof., Dept of Computer Applications, Don Bosco College, Panjim, Goa. Abstract-A cluster is a collection of data objects that
More informationCHAPTER-6 WEB USAGE MINING USING CLUSTERING
CHAPTER-6 WEB USAGE MINING USING CLUSTERING 6.1 Related work in Clustering Technique 6.2 Quantifiable Analysis of Distance Measurement Techniques 6.3 Approaches to Formation of Clusters 6.4 Conclusion
More informationAn Efficient Clustering for Crime Analysis
An Efficient Clustering for Crime Analysis Malarvizhi S 1, Siddique Ibrahim 2 1 UG Scholar, Department of Computer Science and Engineering, Kumaraguru College Of Technology, Coimbatore, Tamilnadu, India
More informationOrganization and Retrieval Method of Multimodal Point of Interest Data Based on Geo-ontology
, pp.49-54 http://dx.doi.org/10.14257/astl.2014.45.10 Organization and Retrieval Method of Multimodal Point of Interest Data Based on Geo-ontology Ying Xia, Shiyan Luo, Xu Zhang, Hae Yong Bae Research
More informationResearch and Improvement on K-means Algorithm Based on Large Data Set
www.ijecs.in International Journal Of Engineering And Computer Science ISSN:2319-7242 Volume 6 Issue 7 July 2017, Page No. 22145-22150 Index Copernicus value (2015): 58.10 DOI: 10.18535/ijecs/v6i7.40 Research
More informationMetric and Identification of Spatial Objects Based on Data Fields
Proceedings of the 8th International Symposium on Spatial Accuracy Assessment in Natural Resources and Environmental Sciences Shanghai, P. R. China, June 25-27, 2008, pp. 368-375 Metric and Identification
More informationMultiDimensional Signal Processing Master Degree in Ingegneria delle Telecomunicazioni A.A
MultiDimensional Signal Processing Master Degree in Ingegneria delle Telecomunicazioni A.A. 205-206 Pietro Guccione, PhD DEI - DIPARTIMENTO DI INGEGNERIA ELETTRICA E DELL INFORMAZIONE POLITECNICO DI BARI
More informationWeb Data mining-a Research area in Web usage mining
IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661, p- ISSN: 2278-8727Volume 13, Issue 1 (Jul. - Aug. 2013), PP 22-26 Web Data mining-a Research area in Web usage mining 1 V.S.Thiyagarajan,
More informationDynamic Clustering of Data with Modified K-Means Algorithm
2012 International Conference on Information and Computer Networks (ICICN 2012) IPCSIT vol. 27 (2012) (2012) IACSIT Press, Singapore Dynamic Clustering of Data with Modified K-Means Algorithm Ahamed Shafeeq
More information5th International Conference on Information Engineering for Mechanics and Materials (ICIMM 2015)
5th International Conference on Information Engineering for Mechanics and Materials (ICIMM 2015) An Improved Watershed Segmentation Algorithm for Adhesive Particles in Sugar Cane Crystallization Yanmei
More information