Cluster Analysis of Electrical Behavior

Journal of Computer and Communications, 2015, 3, 88-93
Published Online May 2015 in SciRes. http://www.scirp.org/journal/jcc
http://dx.doi.org/10.4236/jcc.2015.35011

Lin Liu
School of Electrical and Electronic Engineering, North China Electric Power University, Beijing, China
Email: bhdliulin@163.com

Received February 2015

Abstract
In this paper, we apply clustering analysis from data mining to the power system. We adopt the K-means clustering algorithm to analyze customer load and identify similar behavior among electricity customers, and we adopt principal component analysis (PCA) to make the clustering result visible. Simulation and analysis in MATLAB verify that the clustering is reasonable. The conclusions of this paper can provide an important basis for peak regulation, stable operation, and security of the power system.

Keywords
K-Means Clustering Analysis, Principal Component Analysis, Power System

1. Introduction
In the age of big data, massive amounts of information affect our work and lives every second, and data mining and clustering analysis are becoming more and more important. At the same time, with the rapid development of our national economy, power consumption keeps growing. Our current power supply relies mainly on thermal power, so to ensure the stable operation of the power system, power dispatch and peak regulation are becoming more and more important. Clustering analysis of customer power load is a key link in power decision-making. Therefore, this paper focuses on the application of big data in the power system.

Clustering algorithms can be classified in different ways according to different standards. Commonly used algorithms in clustering analysis include the K-means clustering algorithm, the agglomerative hierarchical clustering algorithm, the self-organizing map (SOM) neural network clustering algorithm, and the fuzzy C-means (FCM) clustering algorithm [1]. By comparison, we find that K-means and FCM both have good overall performance; however, FCM is too complex for our use.
Power system data are produced every second, so the K-means algorithm stands out for its high efficiency. We therefore select the K-means clustering algorithm to analyze customer power load. Load can then be balanced according to the resulting classes, and different services can be provided to different kinds of customers. The distinguishing feature of this article is that every step is analyzed, from the generation of customer power load data to data clustering.

2. The Source Data of Power Load
The source data used for clustering analysis in this paper come from reference [2]. We sample the reference data and then interpolate, which regenerates the data. We select 4 classes of power load, each with 100 data sets, for a total of 400 data sets. To analyze one day's power load, a sample is taken every 10 minutes, so each data set contains 144 values (as shown in Figure 1).

How to cite this paper: Liu, L. (2015) Cluster Analysis of Electrical Behavior. Journal of Computer and Communications, 3, 88-93. http://dx.doi.org/10.4236/jcc.2015.35011

3.1. The Algorithm and Process
Clustering is one of the important research topics in data mining. It is the process of grouping physical objects into multiple classes or clusters [3] [4]. Objects in the same cluster are as similar as possible, while objects in different clusters are as different as possible. Clustering can handle different field types, discover clusters of arbitrary shape, and process abnormal data. It is not sensitive to data order and depends little on professional knowledge.

The K-means algorithm is one of the most classic clustering algorithms in common use today. It has advantages in the following three aspects [5]:
1) It is quick and simple;
2) It is efficient and scalable for large data sets;
3) It has nearly linear time complexity and is suitable for mining large data sets.

The time complexity of the K-means clustering algorithm is a function of n, k, and t, where n is the number of objects in the data set, t is the number of iterations, and k is the number of clusters. This paper therefore uses the K-means clustering algorithm to analyze customer load, following the flow chart shown in Figure 2.

3.2. The Steps of K-Means Algorithm
1) Randomly select k points from the data set $\{x_i\}_{i=1}^{N}$ as the initial clustering centers $\mu_1, \mu_2, \ldots, \mu_k$, where N is the number of samples.
2) For each sample point $x_i$ in the data set, calculate the Euclidean distance between it and each clustering center, and obtain its category label:
$$\mathrm{label}_i = \arg\min_{j} \|x_i - \mu_j\|^2, \quad i = 1, \ldots, N;\ j = 1, \ldots, k \qquad (1)$$
3) Recalculate the k cluster centers according to Formula (2):
$$\mu_j = \frac{1}{N_j} \sum_{x \in \mu_j} x, \quad j = 1, \ldots, k \qquad (2)$$
In the formula, $N_j$ is the number of objects in cluster $\mu_j$.

Figure 1. Source data: customer load.

Figure 2. Flow chart.

4) Repeat step 2) and step 3) until the convergence criterion function is satisfied.

The evaluation of convergence is based on the squared-error criterion, as shown in Formula (3):
$$E = \sum_{i=1}^{k} \sum_{x \in \mu_i} \|x - m_i\|^2 \qquad (3)$$
In the formula, E is the sum of squared errors over all objects in the database, x is a point in space, and $m_i$ is the mean of cluster $\mu_i$. This objective function makes the generated clusters as compact and as independent as possible.

Using the above K-means algorithm, cluster analysis was performed on the data. The result shows that customer load divides clearly into 4 categories. The clustering centers are shown in Figure 3.

4. Visualization of Clustering Results
Because of the large amount of data, with 144 sampling moments selected every day, we cannot express the clustering results directly. To make the clustering results visible, we use principal component analysis (PCA) to study them. PCA is a mathematical method of dimensionality reduction. It transforms many correlated variables into a new set of mutually uncorrelated variables [6]. Using as few variables as possible to express as much information as possible is one of the basic principles of PCA. For the clustering analysis, the 144-dimensional data are mapped to a 3-dimensional space and then analyzed. That is, 3 principal components are selected. The calculation steps of PCA are introduced below.

Standardization of the original data
Assume the sample observation data matrix is
$$X = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1p} \\ x_{21} & x_{22} & \cdots & x_{2p} \\ \vdots & \vdots & & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{np} \end{pmatrix}$$
The original data are then standardized as follows:
$$x_{ij}^{*} = \frac{x_{ij} - \bar{x}_j}{\sqrt{\mathrm{Var}(x_j)}} \quad (i = 1, 2, \ldots, n;\ j = 1, 2, \ldots, p)$$
where
$$\bar{x}_j = \frac{1}{n} \sum_{i=1}^{n} x_{ij}$$

and
$$\mathrm{Var}(x_j) = \frac{1}{n-1} \sum_{i=1}^{n} (x_{ij} - \bar{x}_j)^2 \quad (j = 1, 2, \ldots, p)$$

Calculation of the sample correlation coefficient matrix
For convenience, assume that the standardized data are still denoted by X. The correlation coefficient matrix of the standardized data is
$$R = \begin{pmatrix} r_{11} & r_{12} & \cdots & r_{1p} \\ r_{21} & r_{22} & \cdots & r_{2p} \\ \vdots & \vdots & & \vdots \\ r_{p1} & r_{p2} & \cdots & r_{pp} \end{pmatrix}$$
where
$$r_{ij} = \mathrm{cov}(x_i, x_j) = \frac{\sum_{k=1}^{n} (x_{ki} - \bar{x}_i)(x_{kj} - \bar{x}_j)}{n-1} \quad (n > 1)$$

Calculation of the characteristic values and corresponding characteristic vectors of the correlation coefficient matrix R
Characteristic values: $\lambda_1, \lambda_2, \ldots, \lambda_p$
Characteristic vectors: $a_i = (a_{i1}, a_{i2}, \ldots, a_{ip}),\ i = 1, 2, \ldots, p$

Selection of the important principal components and their expression
Principal component analysis yields p principal components, but because the variance of each successive component decreases, the quantity of information it carries declines as well. In practical analysis, the first k principal components are selected according to their contribution, usually so that the cumulative contribution rate reaches more than 85%, to ensure that the integrated variables carry most of the information of the original variables. Here
$$\text{contribution rate} = \frac{\lambda_i}{\sum_{k=1}^{p} \lambda_k}$$
and the coefficients are
$$\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}$$

Calculation of the principal component scores
Using the standardized original data, the principal component values are calculated from the expression for each sample. This gives the new data of every sample on each principal component, that is, the principal component scores. The specific form is as follows:
$$\begin{pmatrix} F_{11} & F_{12} & \cdots & F_{1k} \\ F_{21} & F_{22} & \cdots & F_{2k} \\ \vdots & \vdots & & \vdots \\ F_{n1} & F_{n2} & \cdots & F_{nk} \end{pmatrix}$$
where
$$F_{ij} = a_{j1} x_{i1} + a_{j2} x_{i2} + \cdots + a_{jp} x_{ip} \quad (i = 1, 2, \ldots, n;\ j = 1, 2, \ldots, k)$$

After finishing the clustering analysis and the principal component analysis, we draw the scatter plot of the clustering results in MATLAB, as shown in Figure 5.

5. Simulation Analysis
To verify the rationality of the K-means algorithm used in this paper, we use MATLAB simulation analysis. Figure 1 shows the customer load source data before clustering analysis, and Figure 3 shows each clustering center after clustering. Comparing the two pictures, all of customer load 1 was clustered into category D, customer load 2 into category B, customer load 3 into category A, and customer load 4 into category C, which shows that our clustering method is reasonable.

Figure 4 depicts the distance from all points to each clustering center:
Subgraph A shows the distance from all 4 kinds of customer load to clustering center A. All points of customer load 3 are nearest to clustering center A.
Subgraph B shows the distance from all 4 kinds of customer load to clustering center B. All points of customer load 2 are nearest to clustering center B.
Subgraph C shows the distance from all 4 kinds of customer load to clustering center C. All points of customer load 4 are nearest to clustering center C.
Subgraph D shows the distance from all 4 kinds of customer load to clustering center D. All points of customer load 1 are nearest to clustering center D.

This is consistent with the definition of the clustering center, and the relationship is also compatible with that exhibited by Figure 1 and Figure 3.

Based on the earlier K-means clustering and principal component analysis, we draw the visualization map of the clustering results, shown in Figure 5. From the figure, we can also directly see that the customer load divides into 4 categories. These four categories correspond to the four types of customer load in the source data, which further confirms the correctness of this clustering analysis.

Figure 3. Clustering center.
Figure 4. The distance between each point and each clustering center.
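The clustering procedure evaluated in this section can be sketched in a few lines of Python. This is a minimal illustration, not the MATLAB code used for the simulation: the function names `squared_distance` and `kmeans`, the seeded random initialization, and the stopping rule (labels unchanged between two passes) are assumptions made for the example, and the demo data are synthetic stand-ins for the 4 x 100 daily load curves of 144 samples each.

```python
import math
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100, seed=0):
    """Cluster `points` (a list of equal-length numeric lists) into k groups.

    Returns (centers, labels, sse_history). Hypothetical helper for this sketch.
    """
    rng = random.Random(seed)
    # Step 1: pick k data points at random as the initial centers.
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    sse_history = []
    for _ in range(max_iter):
        # Step 2: label each point with the index of its nearest center.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        # Step 3: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
        # Squared-error criterion E for the current partition.
        sse_history.append(
            sum(squared_distance(p, centers[lab])
                for p, lab in zip(points, new_labels))
        )
        # Step 4 (assumed stopping rule): stop when the labels no longer change.
        if new_labels == labels:
            break
        labels = new_labels
    return centers, labels, sse_history


if __name__ == "__main__":
    noise = random.Random(1)
    # Synthetic stand-in data: 4 made-up load shapes, 100 noisy curves each,
    # 144 samples per curve (one sample every 10 minutes).
    shapes = [
        lambda t: 1.0 + 0.5 * math.sin(2 * math.pi * t / 144),
        lambda t: 3.0 + 0.5 * math.cos(2 * math.pi * t / 144),
        lambda t: 5.0 + 0.5 * math.sin(4 * math.pi * t / 144),
        lambda t: 7.0,
    ]
    data = [[f(t) + noise.gauss(0, 0.1) for t in range(144)]
            for f in shapes for _ in range(100)]
    centers, labels, sse = kmeans(data, k=4)
    print("passes:", len(sse), "final E:", round(sse[-1], 3))
```

Because the initial centers are chosen at random, a run can settle in a local minimum of E, so in practice several seeds are usually tried and the partition with the smallest final E is kept.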

Figure 5. The scatter of the clustering results.

6. Conclusion
In this paper, the K-means clustering method from data mining was applied to the clustering analysis of customer power load in the power system, and principal component analysis was used to visualize the clustering results. The results support the rationality and correctness of the clustering. This provides an important basis for power system decision-making and helps ensure the stable operation of the power system.

References
[1] Feng, X.P. and Zhang, T.F. (2010) Comparison of Four Clustering Methods. Microcomputer and Application, 16.
[2] Zhang, M.M., Chen, J.Q., Wang, K., Peng, B. and Wu, H. (2014) The Multi Time Scale Coordinated Orderly Power Use Centralized Decision Method. Automation of Electric Power Systems, 38, 70-77.
[3] Xu, J., Huang, Y.L. and Li, F. (2004) Research on Comparing the Sequential Learning with Batch Learning for K-Means. Computer Science, 31, 56-58.
[4] Keogh, E. and Pazzani, M. (1998) An Enhanced Representation of Time Series Which Allows Fast and Accurate Classification, Clustering and Relevance Feedback. Proceedings of the 3rd International Conference of Knowledge Discovery and Data Mining, The Association for the Advancement of Artificial Intelligence, New York, 239-241.
[5] Zhang, S.X., Liu, J.M., Zhao, B.Z. and Cao, J.P. (2013) Analysis of Cloud Computing of Residential Consumption Behavior Model Based on. Power System Technology, 37, 1542-1546.
[6] Zhuo, J.W., Li, B.W., Wei, Y.S. and Qin, J. (2014) Application of MATLAB in Mathematical Modeling. The Beihang University Press, Beijing, 39-4.