Comprehensive analysis and evaluation of big data for main transformer equipment based on PCA and Apriority

Similar documents
Research on big data risk assessment of major transformer defects and faults fusing power grid, equipment and environment based on SVM

Intelligent Control of Micro Grid: A Big Data-Based Control Center

Research on the raw data processing method of the hydropower construction project

Quality Assessment of Power Dispatching Data Based on Improved Cloud Model

Coordinated and Unified Control Scheme of IP and Optical Networks for Smart Power Grid

Design of student information system based on association algorithm and data mining technology. CaiYan, ChenHua

Research on Design and Application of Computer Database Quality Evaluation Model

Data Mining and Business Process Management of Apriori Algorithm

Research and Application of E-Commerce Recommendation System Based on Association Rules Algorithm

Extraction of cross-sea bridges from GF-2 PMS satellite images using mathematical morphology

The Comparative Study of Machine Learning Algorithms in Text Data Classification*

Implementation and performance test of cloud platform based on Hadoop

Jacquard Control System of Warp Knitting Machine Based on Embedded System

Research on Reconfigurable Instrument Technology of Portable Test System of Missiles

Research on Heterogeneous Communication Network for Power Distribution Automation

A Design of Remote Monitoring System based on 3G and Internet Technology

Design of Remote GPRS-based Gas Data Monitoring System

Traffic Flow Prediction Based on the location of Big Data. Xijun Zhang, Zhanting Yuan

Trust4All: a Trustworthy Middleware Platform for Component Software

A Data Classification Algorithm of Internet of Things Based on Neural Network

Research on Hybrid Network Technologies of Power Line Carrier and Wireless MAC Layer Hao ZHANG 1, Jun-yu LIU 2, Yi-ying ZHANG 3 and Kun LIANG 3,*

The Establishment of Large Data Mining Platform Based on Cloud Computing. Wei CAI

Research on Two - Way Interactive Communication and Information System Design Analysis Dong Xu1, a

A New Method Of VPN Based On LSP Technology

The Research of Internet of Things in Operation and Maintenance for Distribution Grid

A fault diagnosis system for PV power station based on global partitioned gradually approximation method

Implementation and Design of Security Configuration Check Toolkit for Classified Evaluation of Information System

Combining Review Text Content and Reviewer-Item Rating Matrix to Predict Review Rating

The research and implementation of coalfield spontaneous combustion of carbon emission WebGIS based on Silverlight and ArcGIS server

Research and Design of Data Storage Scheme for Electric Power Big Data

The Vibration Characteristics Analysis of Damping System of Wallmounted Airborne Equipment Based on FEM

manufacturing process.

Test Analysis of Serial Communication Extension in Mobile Nodes of Participatory Sensing System Xinqiang Tang 1, Huichun Peng 2

Efficient Path Finding Method Based Evaluation Function in Large Scene Online Games and Its Application

Modal Analysis of Three Dimensional Numerical Control Laser Cutting Machine Based on Finite Element Method Yun-Xin CHEN*

Measuring the power consumption of social media applications on a mobile device

Design of Precise Control and Dynamic Simulation of Manipulator for Die-casting Mould

Open Access Research on the Prediction Model of Material Cost Based on Data Mining

Study on One Map Organizational Model for Land Resources Data Used in Supervision and Service

Open Access Algorithm of Context Inconsistency Elimination Based on Feedback Windowing and Evidence Theory for Smart Home

Nodes Energy Conserving Algorithms to prevent Partitioning in Wireless Sensor Networks

Research on Approach of Equipment Status and Operation Information Acquisition Based on Equipment Control Bus

Application of Augmented Reality Technology in Workshop Production Management

Preliminary Research on Distributed Cluster Monitoring of G/S Model

Power Distribution Analysis For Electrical Usage In Province Area Using Olap (Online Analytical Processing)

The application of OLAP and Data mining technology in the analysis of. book lending

Principles of E-network modelling of heterogeneous systems

The Design and Implementation of Disaster Recovery in Dual-active Cloud Center

Mobile 3D laser scanning technology application in the surveying of urban underground rail transit

Research on Scalable Zigbee Wireless Sensor Network Expansion Solution

Bridge Surface Crack Detection Method

Open Access Research on the Data Pre-Processing in the Network Abnormal Intrusion Detection

Research Article Modeling and Simulation Based on the Hybrid System of Leasing Equipment Optimal Allocation

Design of EIB-Bus Intelligent Control System Based on Touch Screen Control

Exploration of Fault Diagnosis Technology for Air Compressor Based on Internet of Things

Implementation of RS-485 Communication between PLC and PC of Distributed Control System Based on VB

An Improved KNN Classification Algorithm based on Sampling

Design of Smart Home System Based on ZigBee Technology and R&D for Application

Research and Design of Key Technology of Vertical Search Engine for Educational Resources

An Information Management System of Spot Check in Wind Farms

Prediction of traffic flow based on the EMD and wavelet neural network Teng Feng 1,a,Xiaohong Wang 1,b,Yunlai He 1,c

Open Access Apriori Algorithm Research Based on Map-Reduce in Cloud Computing Environments

Microelectronics Reliability

A Kind of Wireless Sensor Network Coverage Optimization Algorithm Based on Genetic PSO

2017 2nd International Conference on Communications, Information Management and Network Security (CIMNS 2017) ISBN:

An intelligent LED landscape lighting system

Fault detection with principal component pursuit method

Landslide Monitoring Point Optimization. Deployment Based on Fuzzy Cluster Analysis.

Frequent Item Set using Apriori and Map Reduce algorithm: An Application in Inventory Management

A Data Collecting and Caching Mechanism for Gateway Middleware in the Web of Things

Developing a Roadmap for A Smart(er) and Strong Transmission Grid

Design of the Power Online Monitoring System Based on LabVIEW

Web Data mining-a Research area in Web usage mining

Research on Geo-information Data Model for Preselected Areas of Geological Disposal of Highlevel Radioactive Waste

Alternative Improving the Quality of Sub-Voltage Transmission System using Static Var Compensator

A NOVEL SECURED BOOLEAN BASED SECRET IMAGE SHARING SCHEME

Research on Algorithm Schema of Parametric Architecture Design Based on Schema Theory

The Study of Intelligent Scheduling Algorithm in the Vehicle ECU based on CAN Bus

Application of SQL in Data Management about TBM Breaking Test Machine

Developing Control System of Electrical Devices with Operational Expense Prediction

Research on the Improvement of the RTLAB Simulation Efficiency in DG

UAV Motion-Blurred Image Restoration Using Improved Continuous Hopfield Network Image Restoration Algorithm

RoboCup 2014 Rescue Simulation League Team Description. <CSU_Yunlu (China)>

Design and Implementation of Agricultural Information Resources Vertical Search Engine Based on Nutch

An Abnormal Data Detection Method Based on the Temporal-spatial Correlation in Wireless Sensor Networks

Research on the Checkpoint Server Selection Strategy Based on the Mobile Prediction in Autonomous Vehicular Cloud

Wavelength Estimation Method Based on Radon Transform and Image Texture

Open Access The Three-dimensional Coding Based on the Cone for XML Under Weaving Multi-documents

Distributed Bottom up Approach for Data Anonymization using MapReduce framework on Cloud

Modal and harmonic response analysis of key components of robotic arm based on ANSYS

Big Data Analytics: Research Needs. Ali Ghassemian

Serial Communication Based on LabVIEW for the Development of an ECG Monitor

Hole repair algorithm in hybrid sensor networks

A Decision Support System Based on SSH and DWR for the Retail Industry

Research on Quality Inspection method of Digital Aerial Photography Results

Data Cleaning for Power Quality Monitoring

Comparison of 3-D Fracture Analysis Methods Based on ANSYS Workbench

Big Data and Fans Economy Oriented Star Node Mining Improved Algorithm in Social Networks Based on Acquaintance Immunization

Mining Quantitative Association Rules on Overlapped Intervals

Performance Degradation Assessment and Fault Diagnosis of Bearing Based on EMD and PCA-SOM

Transcription:

IOP Conference Series: Earth and Environmental Science PAPER OPEN ACCESS Comprehensive analysis and evaluation of big data for main transformer equipment based on PCA and Apriority To cite this article: Lijuan Guo et al 2018 IOP Conf. Ser.: Earth Environ. Sci. 108 052026 View the article online for updates and enhancements. This content was downloaded from IP address 148.251.232.83 on 12/10/2018 at 02:18

Comprehensive analysis and evaluation of big data for main transformer equipment based on PCA and Apriority Lijuan Guo 1, Haijun Yan 1, Yongqi Hao 2,* and Yun Chen 3 1 Guangxi Electric Power Research Institute, Nanning, China; 2 School of Electrical Engineering, Southwest Jiaotong University, Chengdu, China; 3 School of Electrical Engineering, Tsinghua University, Bejing, China. *Corresponding author e-mail: haoyongqi001@163.com Abstract. With the power supply level of urban power grid toward high reliability development, it is necessary to adopt appropriate methods for comprehensive evaluation of existing equipment. Considering the wide and multi-dimensional power system data, the method of large data mining is used to explore the potential law and value of power system equipment. Based on the monitoring data of main transformer and the records of defects and faults, this paper integrates the data of power grid equipment environment. Apriori is used as an association identification algorithm to extract the frequent correlation factors of the main transformer, and the potential dependence of the big data is analyzed by the support and confidence. Then, the integrated data is analyzed by PCA, and the integrated quantitative scoring model is constructed. It is proved to be effective by using the test set to validate the evaluation algorithm and scheme. This paper provides a new idea for data fusion of smart grid, and provides a reference for further evaluation of big data of power grid equipment. 1. Introduction With the development of urban power grid, the automated level of power grid equipment is also improved. As a result, the data generated by the power grid, both kinds and quantities, is increasing rapidly. Because of the universality and multi-dimension of power grid data, the disadvantages of traditional methods for data modeling and analysis will be highlighted. Due to the low utilization of data, the existing equipment quantitative evaluation system and model have been suspected by power system staff. In order to solve the above problems, in the case of the continuous expansion of data types and the unreliability of data, the machine learning algorithm is used to avoid the drawbacks of traditional data modeling and analysis. Then, the big data mining technology is used to quickly extract the potential law of data, so as to form an elastic comprehensive evaluation scheme. The data in this paper are from the monitoring, defect, test and other data of the whole network main transformer equipment in Guangxi Province. Firstly, the existing data is pretreated and fused, and then the evaluation scheme based on PCA and Apriority is designed for the main transformer. Finally, the algorithm is compiled and implemented, and the experimental results are analyzed. Content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI. Published under licence by Ltd 1

2. Data fusion Big data has been widely used in commercial, financial and other fields, the development of intelligent power grid also provides a basis for big data applications[1]-[7].according to data sources, smart grid data can be divided into two broad categories: one is the internal data of the grid, and the other is external data. The internal data from the electricity information collection system (collection system information, CIS), marketing system, wide area monitoring system (wide area measurement system, WAMS), production management system, distribution management system (production management system, PMS), the energy management system (energy management system, EMS), detection and monitoring systems and equipment customer service system, financial management system data. The external data comes from electric vehicle charging management system, weather information system, geographic information system (GIS), public service department, Internet and so on. These data are scattered in different places, managed by different units/departments, and have the characteristics of decentralized placement and distributed management [8]-[10].. Figure 1. Data fusion and specific data structure of power equipment Therefore, on the basis of data fusion analysis, this paper uses Apriority algorithm as the main algorithm of equipment association analysis, and uses PCA to analyze and evaluate each attribute data. Finally, a test set is extracted to analyze the effect of this evaluation scheme. 3. Raise and analysis of the evaluation scheme It is difficult to analyze and evaluate the main transformer, such as large amount of information, complex types, and the association analysis of multidimensional data. At the same time, due to the lack of effective integrated information management and data mining analysis methods, all kinds of data cannot be fully utilized. The cost of operation and maintenance of transmission and transformation equipment is high during the whole life cycle. Thus, to make a comprehensive evaluation of power transmission and transformation equipment, it is necessary to fuse the data first, and the logic is shown below. In the data preprocessing and fusion stage, according to the equipment data sheet, combined with accounting information, construct the data associated with a main transformer only ID, and associated monitoring, inspection and important defects and fault data. In the data preprocessing and fusion stage, according to the equipment data sheet, the big data is combined with accounting information. The paper constructs the data associated with the main transformer only ID, to achieve data association of monitoring, inspection and defects as well as faults. Considering the related factors of environment and power grid data, the power grid data and environment data are extracted from each kind of data, which is related to the equipment ID, so as to fuse all the data and realize the pretreatment. In Figure 2, the level of pollution zone reflects the environmental factors 2

corresponding to the equipment. Cumulative number of failures and the number of months running normally reflect the information storage of the power grid. The equipment itself includes the commissioning time, voltage level and type and so on. As can be seen from the fusion data, the device related factors are multi-class and cross-system. On the basis of preprocessing, firstly, the fusion data is analyzed by Apriori algorithm. Apriori algorithm is one of the most influential algorithms for mining frequent itemsets of Boolean Association rules. Its core is a recursive algorithm based on the idea of two stage frequency set. In the process of association analysis, the program needs to identify specific indicators to measure the intensity of association rules. If there is a combination of the event M and the event N in the record of the total transaction, the probability of the combination is represented by the support. Support is also a measure of the importance of association rules. The accuracy of association rules is expressed in confidence, and the intensity of association rules is also expressed. Specific definitions are as follows: count( M N) Sup( M N) (1) count( T ) count( M N) Con( M N) (2) count( N) Where Sup and Con indicate support and confidence respectively. Count is the number of events. The minimum support and minimum confidence need to be determined before the association analysis is carried out. When the support of the event set is greater than the minimum support, then the set of events becomes the frequent set. After analyzing the data of each attribute, it is necessary to determine the specific quantitative evaluation model for the main transformer equipment. The fusion data contains 12 categories, so the PCA algorithm is used to achieve dimensionality reduction and determine the relevant principal components. First, the first linear combination F1 is selected, and when the Var (F1) is larger, it means that the more information F1 contains. If the F1 variance is the largest in all linear combinations, then F1 is the first principal component. In the process of defining the other principal components, you need to make sure Cov (F1, Fi) =0 to avoid the information already in the F1. On the basis of the above algorithm, a big data computing and analysis platform based on Clementine is built, and the corresponding value rules are extracted by the algorithm analysis results. Finally, the training model is tested and analyzed. 4. Analysis of machine learning results As mentioned above, the Apriori algorithm is used to do the association analysis of the fused data, and get the following experimental results. Table 1. Association rules analysis of equipment factors Consequent Antecedent Support% Confidence% C8=c C7=500000.0 5.854481407 86.31355932 C8=c C7=500000.0andC9=1.0 5.854481407 86.31355932 C8=c C7=500000.0andC1=0.0 5.581603036 85.73333333 C8=c C7=500000.0andC1=0.0andC9=1.0 5.581603036 85.73333333 Table 1 shows that equipment status, equipment failure, and voltage level are closely related to the level of pollution zone. Because the environmental impact can affect the health status of equipment and evaluation results, so the above analysis results are in line with the actual situation. To better analyze the relationships among other factors, the minimum support and minimum confidence limits are reduced 3

and level of pollution zone are filtered. Table 2 is the correlation result of the Apriority algorithm without environmental factors. As can be seen from the table below, vendor information is associated with unit type. Therefore, it is possible that the same device type may come from the same vendor during the equipment purchase process. The real time failure of the main transformer equipment has a potential connection with the manufacturer and equipment type. The analysis results of the relevant equipment also provide a new procurement analysis thinking for the purchasing department. Table 2. Association rules analysis of different influencing factors of equipment without environmental factors Consequent Antecedent Support% Confidence% of Limited by Share Ltd of Limited by Share Ltd and C7=110000.0 of Limited by Share Ltd and C9=1.0 of Limited by Share Ltd and C7=110000.0 and C9=1.0 The frequent set of association analysis by Apriority algorithm can get more potential rules related to equipment. But according to the actual situation of the data, it is necessary to make a quantitative assessment of the main transformer equipment. Because the equipment type, ID and the manufacturer are non-numerical information, and the impact on the comprehensive evaluation is small. Thus, the data is filtered. The processed data can be used to quantify the level of pollution zone (the higher the value, the more serious the polluted area), and the following principal component analysis results can be obtained: Table 3. Total variance explained Component Extraction sums of squared loadings Total Of variance% Cumulative% 1 2.399 29.991 29.991 2 1.153 14.409 44.4 3 1.001 12.51 56.91 4

Component 1 2 3 Table 4. Principal component expression Principal component factor expression 0.5545*C1+0.0001434*C5-0.0000001257*C7-0.1132*C8-0.1459*C9+0.0001861*C10+0.09762*C11-0.00654*C14-12.4 0.6603*C1-0.00004177*C5-0.000005102*C7+0.6716*C8-1.644*C9+0.00002944*C10+0.05093*C11+0.0006849*C14+0.6396 2.083*C1-0.0001796*C5+0.000005672*C7+0.23*C8+0.5981*C9+ 0.00002413*C10+0.1343*C11+0.003166*C14+3.039 Equipment ID Table 5. Equipment scoring test Component 1 score Component 2 score Component 3 score 0403000B335256FCF34B9D9D4141080DF870DB 1.612201803 1.324544313 0.853208699 04090119F9CF1FDF0644F093353A4BAEB249DE -1.496936968 1.477876317 0.196934653 e289e388f2c1461f830cf6ab7af6b524 1.735557945-1.195175817 3.274299404 As you can see in Table 3, the three principal components represent the information in the data well. In the principal component expression, the expression coefficients of equipment run time, voltage level and time stamp are very small, so these factors are not the dominant factor of scoring. The principal component 1 mainly considers the scores under various factors, while the main component 2 focuses on the environmental impact score. The principal component 3 can be used as the real-time risk assessment score of the equipment. By extracting the test set, table 5 shows that the first device has a better overall score, but there is also an impact on the risk of failure. The risk of the second equipment is small and the operating environment is poor. The final equipment risk assessment score is greater, indicating that the equipment has a greater operational risk. 5. Summary Based on PCA and Apriority power transmission equipment analysis and evaluation program research shows: Equipment status, equipment failure, and pollution level have a strong correlation. At the same time, there is a certain interconnection between the manufacturer and the equipment type as well as the real-time health status of the equipment. The quantitative evaluation constructed by principal component analysis can directly provide reference for all aspects of equipment. This research provides a new way for grid equipment environment data fusion of smart grid. A new exploration is made for the assessment of equipment, which provides a reference for more efficient models in the future and a more accurate assessment of device status. References [1] H. Jiang, K. Wang, Y. Wang, M. Gao and Y. Zhang, "Energy big data: A survey," in IEEE Access, vol. 4, no., pp. 3844-3861, 2016. [2] K. Wang; H. Li; Y. Feng; G. Tian, "Big Data Analytics for System Stability Evaluation Strategy in the Energy Internet," in IEEE Transactions on Industrial Informatics, vol.pp, no.99, pp.1-1 [3] J. Hu and A. V. Vasilakos, "Energy Big Data Analytics and Security: Challenges and Opportunities," in IEEE Transactions on Smart Grid, vol. 7, no. 5, pp. 2423-2436, Sept. 2016. [4] J. Choi, M. Kim and J. Yoon, "Implementation of the Big Data Management System for Demand Side Energy Management," 2015 IEEE International Conference on Computer and Information Technology; Ubiquitous Computing and Communications; Dependable, Autonomic and Secure Computing; Pervasive Intelligence and Computing, Liverpool, 2015, pp. 1515-1520. 5

[5] M. Shah, P. K. Shukla and R. Pandey, "Phase level energy aware map reduce scheduling for big data applications," 2016 International Conference on Signal Processing, Communication, Power and Embedded System (SCOPES), Paralakhemundi, Odisha, India, 2016, pp. 532-535. [6] F. Xiao, S. Wang and C. Fan, "Mining Big Building Operational Data for Building Cooling Load Prediction and Energy Efficiency Improvement," 2017 IEEE International Conference on Smart Computing (SMARTCOMP), Hong Kong, China, 2017, pp. 1-3. [7] Z. Zhou et al., "Game-Theoretical Energy Management for Energy Internet With Big Data-Based Renewable Power Forecasting," in IEEE Access, vol. 5, no., pp. 5731-5746, 2017. [8] K. Yang, Q. Yu, S. Leng, B. Fan and F. Wu, "Data and Energy Integrated Communication Networks for Wireless Big Data," in IEEE Access, vol. 4, no., pp. 713-723, 2016. [9] E. Casalicchio, L. Lundberg and S. Shirinbad, "An Energy-Aware Adaptation Model for Big Data Platforms," 2016 IEEE International Conference on Autonomic Computing (ICAC), Wurzburg, 2016, pp. 349-350. [10] Z. Niu and B. He, "A Study of Big Data Computing Platforms: Fairness and Energy Consumption," 2016 IEEE International Conference on Cloud Engineering Workshop (IC2EW), Berlin, 2016, pp. 207-209. 6