Feature Extraction and Selection for Automatic Fault Diagnosis of Rotating Machinery


Francisco de A. Boldt 1,2, Thomas W. Rauber 1, Flávio M. Varejão 1

1 Departamento de Informática, Universidade Federal do Espírito Santo (UFES), Av. Fernando Ferrari, 514, Goiabeiras, Vitória, ES, Brazil
2 Instituto Federal de Educação, Ciência e Tecnologia do Espírito Santo (IFES), Rodovia ES, Km 6,5, Manguinhos, Serra, ES, Brazil

{fboldt,thomas,fvarejao}@inf.ufes.br

Abstract. In this work we present three feature extraction models applied to vibration data from rotating machinery for bearing fault diagnosis. Vibration signals acquired by accelerometers are submitted to different feature extraction modules. Our tests suggest that pooling heterogeneous feature sets achieves better results than using a single extraction model. In addition, different classifiers are compared for performance optimization, the K-Nearest-Neighbor and the Support Vector Machine.

1. Introduction

Automatic fault diagnosis of complex machinery has economic and safety-related advantages. Identifying a fault in its initial stage allows the early replacement of damaged parts. This type of predictive maintenance is preferable to its preventive counterpart, which replaces parts that are not necessarily defective. Pattern recognition techniques have been widely used in automatic fault diagnosis of rotating machinery [Wandekokem et al. 2011, Xia et al. 2012, Liu 2012, Wu et al. 2012]. We use the supervised learning paradigm to diagnose bearing failures. The study of bearings is motivated by the fact that they play an important role in a wide range of rotating machines, and a sophisticated fault diagnosis system can be built with the help of computational intelligence techniques. Experimental results are shown for the Case Western Reserve University (CWRU) Bearing Data [CWRU 2013]. Model-free diagnosis needs to extract relevant data from the problem domain in order to train the diagnosis system algorithms.
We use three basic feature extraction models, generating one dataset for each model and assembling a global feature pool. This approach is motivated by the high plausibility that several feature sets contain more discriminative information than a single feature set. A subsequent, necessary step to filter out the most discriminative information is feature selection, which in general increases accuracy and reduces the computational cost of the fault classifier. We use a greedy heuristic, Sequential Forward Selection, as the search algorithm. As the classifier paradigm, we compare the K-Nearest Neighbor algorithm (K-NN) [Cover and Hart 1967], as a representative of a less sophisticated method, to the Support Vector Machine (SVM) [Burges 1998]. Leave-one-out and 10-fold cross-validation are used for performance estimation. The main contribution of this work is the variety of diagnosed faults and feature extraction models applied, together with the appropriate use of machine learning and feature selection techniques for the considered CWRU bearing data. In the reviewed literature, frequently only a few classes are considered and no cross-validation is employed in the tests, splitting the data

only once into a training and a test set, e.g. [Xia et al. 2012, Wu et al. 2012]. No classifier at all is used in [Wu et al. 2012, Luo et al. 2013]; only the visual inspection of peaks in frequency graphs suggests the discriminative power of the method. Our system is capable of detecting 21 classes of bearing conditions, employing two classifier models and two cross-validation techniques. To the best of our knowledge, no paper uses such a variety of feature extraction models for the CWRU bearing data, nor has any used multivariate feature selection techniques. The rest of this paper is organized as follows: In section 2 we present the feature extraction models used to describe the bearing faults. Section 3 presents the machine learning techniques used. Experimental results are shown in section 4 and section 5 presents our conclusions and future work.

2. Feature Extraction

Vibration signals, collected by accelerometers, are widely used in automatic rotating machine failure diagnosis [Wandekokem et al. 2011, Xia et al. 2012, Liu 2012, Wu et al. 2012]. The signals collected from the machinery are not directly usable for diagnosis, so it is necessary to extract static features. We use three basic models of feature extraction techniques. Statistical models are applied in the time and frequency domains, while wavelet package analysis represents an extraction in the time-frequency domain [Xia et al. 2012]. Complex envelope analysis completes the methods used in the frequency domain.

2.1. Statistical Model

We used ten statistical features in the time domain and three in the frequency domain. As a representative set we chose the features proposed in [Xia et al. 2012], c.f. Table 1 and Table 2.
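As an illustration, the time domain statistics of Table 1 can be computed directly from a signal segment. The sketch below implements a representative subset in plain Python; the function name and the returned keys are hypothetical, chosen here for readability.

```python
import math

def time_domain_features(x):
    """Representative subset of the Table 1 statistics for one signal segment x.
    Illustrative helper; the paper's full set has ten time domain features."""
    n = len(x)
    mean = sum(v for v in x) / n
    sigma = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    rms = math.sqrt(sum(v * v for v in x) / n)          # root mean square (RMS)
    sra = (sum(math.sqrt(abs(v)) for v in x) / n) ** 2  # square root of the amplitude (SRA)
    kv = sum(((v - mean) / sigma) ** 4 for v in x) / n  # kurtosis value (KV)
    sv = sum(((v - mean) / sigma) ** 3 for v in x) / n  # skewness value (SV)
    ppv = max(x) - min(x)                               # peak-peak value (PPV)
    cf = max(abs(v) for v in x) / rms                   # crest factor (CF)
    return {"rms": rms, "sra": sra, "kv": kv, "sv": sv, "ppv": ppv, "cf": cf}
```

In the experiments each vibration record is split into segments first, and one such feature vector is computed per segment.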
Table 1 presents the definitions of the statistical features in the time domain: root mean square (RMS), square root of the amplitude (SRA), kurtosis value (KV), skewness value (SV), peak-peak value (PPV), crest factor (CF), impulse factor (IF), margin factor (MF), shape factor (SF) and kurtosis factor (KF). Table 2 presents the definitions of the statistical features in the frequency domain: frequency center (FC), RMS frequency (RMSF) and root variance frequency (RVF).

2.2. Wavelet Package Analysis

The wavelet package analysis is a time-frequency domain method which permits the level-by-level decomposition of a signal using a wavelet function. The decomposition results in 2^l signals, where l is the number of desired levels. We follow the procedure proposed in [Xia et al. 2012], which uses Daubechies 4 as the mother wavelet and refines down to the fourth decomposition level. The energy calculated in each leaf node is used as a final feature.

2.3. Complex Envelope Spectrum

The complex envelope spectrum allows the calculation of the energy in the frequency bands where the faults manifest themselves. There are four characteristic frequencies at which faults can occur. Knowing the shaft rotational frequency, they are the fundamental cage frequency, the ball pass inner raceway frequency, the ball pass outer raceway frequency and the ball spin frequency [McInerny and Dai 2003]. For the complex envelope analysis, first a high-pass filter was applied, in order to eliminate the influence of the low frequency vibrations caused by noise, unbalance and misalignment. Subsequently, an analytical signal was calculated by applying the Hilbert transform to the original signal and adding it in quadrature to it. The magnitude

of the Fourier transform of the analytical signal translates the characteristic bearing fault frequencies to the low frequency band. The final features are the narrow band energy around the expected fault frequencies and their harmonics.

Table 1. Time domain statistical feature set of the vibration signal

$X_{rms} = \left(\frac{1}{N}\sum_{i=1}^{N} x_i^2\right)^{1/2}$
$X_{sra} = \left(\frac{1}{N}\sum_{i=1}^{N} \sqrt{|x_i|}\right)^2$
$X_{kv} = \frac{1}{N}\sum_{i=1}^{N} \left(\frac{x_i - \bar{x}}{\sigma}\right)^4$
$X_{sv} = \frac{1}{N}\sum_{i=1}^{N} \left(\frac{x_i - \bar{x}}{\sigma}\right)^3$
$X_{ppv} = \max(x_i) - \min(x_i)$
$X_{cf} = \frac{\max |x_i|}{\left(\frac{1}{N}\sum_{i=1}^{N} x_i^2\right)^{1/2}}$
$X_{if} = \frac{\max |x_i|}{\frac{1}{N}\sum_{i=1}^{N} |x_i|}$
$X_{mf} = \frac{\max |x_i|}{\left(\frac{1}{N}\sum_{i=1}^{N} \sqrt{|x_i|}\right)^2}$
$X_{sf} = \frac{\left(\frac{1}{N}\sum_{i=1}^{N} x_i^2\right)^{1/2}}{\frac{1}{N}\sum_{i=1}^{N} |x_i|}$
$X_{kf} = \frac{X_{kv}}{\left(\frac{1}{N}\sum_{i=1}^{N} x_i^2\right)^2}$

Table 2. Frequency domain statistical feature set of the vibration signal

$X_{fc} = \frac{1}{N}\sum_{i=1}^{N} f_i$
$X_{rmsf} = \left(\frac{1}{N}\sum_{i=1}^{N} f_i^2\right)^{1/2}$
$X_{rvf} = \left(\frac{1}{N}\sum_{i=1}^{N} (f_i - X_{fc})^2\right)^{1/2}$

3. Machine Learning Methods

The supervised learning paradigm [Bishop et al. 2006] can be used in automatic fault diagnosis. For classification, this approach needs a dataset with labeled patterns to train a classifier. The classifier performance can be estimated using labeled patterns not used during training. We present two well-known classifier algorithms, two performance estimation methods, and one feature selection search algorithm with two different selection criteria. We consider the chosen set of pattern recognition techniques an appropriate toolbox to approach optimality with respect to the compactness and accuracy of the proposed diagnosis system. The K-Nearest Neighbor algorithm (K-NN) [Cover and Hart 1967] classifies a new pattern according to the majority vote of its closest neighbors, usually using the Euclidean distance. The benefit of this architecture is its simplicity and its theoretical properties with

respect to the error bound. The Support Vector Machine (SVM) [Burges 1998] training algorithm creates a maximum-margin separation hyperplane between two classes. In order to enhance the linear separability beyond the original Euclidean space, the SVM maps the input vectors into a high-dimensional feature space through some nonlinear mapping [Vapnik 1999], using a kernel function. To classify more than two classes, one can use a one-against-all approach. We use the C-SVM classification architecture with the Radial Basis Function (RBF) kernel.

K-fold cross-validation performance estimation splits the data D into k approximately equal parts D_1, ..., D_k, and learns with the reduced data set D \ D_i, 1 <= i <= k, with one part left out. The part D_i left out is used as the test set [Bouckaert 2004]. A special case of cross-validation is leave-one-out, where the number of folds equals the number of samples. We use the estimated accuracy as the performance criterion. The global accuracy of cross-validation is estimated as the mean over the k folds, $ACC_{global} = \frac{1}{k}\sum_{i=1}^{k} ACC_i$.

In this study we use a large number of features, as shown in section 2. Feature selection generally improves prediction performance and simultaneously reduces the problem dimensionality, providing faster and more cost-effective predictors and allowing a better understanding of the underlying processes that generate the data [Guyon and Elisseeff 2003]. We use Sequential Forward Selection (SFS), which is a good compromise between exploring the search space sufficiently and computational cost. In order to select k from a total of Q features, SFS initializes with an empty feature set Y. Features are iteratively added to Y, according to some selection criterion. The algorithm stops and returns Y when |Y| = k. We employ two selection criteria. Interclass distance expresses the separation of classes according to some distance measure, mainly the Euclidean distance.
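A minimal sketch of SFS with the interclass distance criterion, in plain Python, is shown below. The helper names are hypothetical, and the ICD measure is assumed here to be the mean pairwise Euclidean distance between class centroids, since the text only specifies a distance-based measure.

```python
import math

def interclass_distance(X, y, subset):
    """ICD criterion (assumed form): mean pairwise Euclidean distance between
    class centroids, computed on the candidate feature subset only."""
    classes = sorted(set(y))
    centroids = []
    for c in classes:
        rows = [[x[j] for j in subset] for x, label in zip(X, y) if label == c]
        centroids.append([sum(col) / len(rows) for col in zip(*rows)])
    dists = [math.dist(a, b)
             for i, a in enumerate(centroids) for b in centroids[i + 1:]]
    return sum(dists) / len(dists)

def sfs(X, y, k, criterion):
    """Greedy Sequential Forward Selection: grow the set Y one feature at a
    time, always adding the feature that maximizes the selection criterion."""
    selected, remaining = [], list(range(len(X[0])))
    while len(selected) < k:
        best = max(remaining, key=lambda j: criterion(X, y, selected + [j]))
        selected.append(best)
        remaining.remove(best)
    return selected
```

The EMEP criterion fits the same interface: `criterion` would instead run a full cross-validated error estimation on the candidate subset, which is why it is far more expensive than ICD.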
SFS with the estimated mean error probability runs a complete performance estimation for each available feature $f_j \in X$, $f_j \notin Y$, with the candidate set $\{f_j\} \cup Y$, $j \in \{1, ..., Q\}$, where X is the complete set of all features. The feature that most increases, or least decreases, the performance criterion is joined to Y. Ties are resolved arbitrarily.

4. Experimental Results

We used as a benchmark for our methods the bearing dataset provided by the Bearing Data Center of Case Western Reserve University [CWRU 2013]. This publicly available benchmark allows an objective comparison of the proposed method to other research work. This dataset is composed of vibration signals of normal and faulty bearings extracted from a 2 hp Reliance Electric motor. The faults were introduced at specific positions of the bearings, using electrical-discharge machining, with fault diameters of 0.007, 0.014, 0.021 and 0.028 inches. A dynamometer induced loads of 0, 1, 2 and 3 hp, changing the shaft rotation from 1797 to 1720 rpm. One model of bearing was used on the drive end and another on the fan end. Three accelerometers collected the vibration data, placed on the drive end, the fan end and the base of the motor. Not all data files contain the base plate data, so we did not use this sensor in our experiments. As done in other work [Xia et al. 2012, Liu 2012, Wu et al. 2012], we split the signals into several parts before the feature extraction, aiming at a better classification performance estimation. We split the signals into 15 parts, resulting in a total of 2415 samples. Preliminary

experiments showed that this was the maximum possible division without considerable loss of accuracy.

4.1. Identified Classes

The identified classes can be labeled according to the bearing, the bearing state (normal or defective), the fault severity (depth) and the motor load. Another main contribution of our work is the large number of machine condition classes: normal, plus three faults (ball, inner race and outer race) times three severities (0.007, 0.014, 0.021 in) times two bearing models, plus two faults (ball, inner race) times one severity (0.028 in) times one bearing model (drive end), resulting in 1 + 18 + 2 = 21 classes. Table 3 presents the distribution and description of the classes used in our experiments. We are able to identify not only the fault class, but within the same class also the severity of the fault. [Xia et al. 2012, Liu 2012, Wu et al. 2012] identify only a small number of classes, from one sensor position (drive end). [Xia et al. 2012] uses four classes, normal, ball, inner race and outer race, or fixes a fault class and then distinguishes among its severities.

Table 3. Class distribution and description

Class  Name          Samples  Distribution  Description
1      Ball DE       -        -             Ball fault in the drive end bearing.
2      Ball FE       -        -             Ball fault in the fan end bearing.
3      Ball DE       -        -             Ball fault in the drive end bearing.
4      Ball FE       -        -             Ball fault in the fan end bearing.
5      Ball DE       -        -             Ball fault in the drive end bearing.
6      Ball FE       -        -             Ball fault in the fan end bearing.
7      Ball DE       -        -             Ball fault in the drive end bearing.
8      InnerRace DE  -        -             Inner race fault in the drive end bearing.
9      InnerRace FE  -        -             Inner race fault in the fan end bearing.
10     InnerRace DE  -        -             Inner race fault in the drive end bearing.
11     InnerRace FE  -        -             Inner race fault in the fan end bearing.
12     InnerRace DE  -        -             Inner race fault in the drive end bearing.
13     InnerRace FE  -        -             Inner race fault in the fan end bearing.
14     InnerRace DE  -        -             Inner race fault in the drive end bearing.
15     Normal        -        -             Normal bearing.
16     OuterRace DE  -        -             Outer race fault in the drive end bearing.
17     OuterRace FE  -        -             Outer race fault in the fan end bearing.
18     OuterRace FE  -        -             Outer race fault in the fan end bearing.
19     OuterRace DE  -        -             Outer race fault in the drive end bearing.
20     OuterRace DE  -        -             Outer race fault in the drive end bearing.
21     OuterRace FE  -        -             Outer race fault in the fan end bearing.

4.2. Feature Extraction Models

We used three basic feature extraction models: the complex envelope spectrum, statistical features extracted from the time and frequency domains, and wavelet package analysis in the time-frequency domain. Table 4 shows the number of features extracted by each model. We employed the K-NN classifier, with 1, 3, 5 and 7 as the values of K, to compare the diagnostic quality of the feature extraction models. We used the leave-one-out performance estimator with accuracy as the quality criterion. Table 5 compares the accuracy of the different feature extraction techniques.
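The leave-one-out protocol with K-NN used for Table 5 can be sketched as follows. This is a didactic pure-Python version with hypothetical function names; the actual experiments involve 2415 samples and 130 features, where an optimized implementation would be preferable.

```python
import math

def knn_predict(train_X, train_y, x, k=1):
    """Majority vote among the k nearest training patterns (Euclidean distance)."""
    neighbors = sorted(zip(train_X, train_y), key=lambda p: math.dist(p[0], x))[:k]
    votes = {}
    for _, label in neighbors:
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)

def leave_one_out_accuracy(X, y, k=1):
    """Each sample is classified by a model trained on all remaining samples."""
    hits = 0
    for i in range(len(X)):
        rest_X = X[:i] + X[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        hits += knn_predict(rest_X, rest_y, X[i], k) == y[i]
    return hits / len(X)
```

Since K-NN has no training phase beyond storing the data, leave-one-out is affordable here; for the SVM experiments below, the cheaper 10-fold protocol is used instead.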

Table 4. Number of features of the feature extraction models used

Feature Extraction Model   Number of Features
Complex Envelope Spectrum  72
Statistical Features       26
Wavelet Package Analysis   32
All together               130

Table 5. Estimated K-NN accuracy for each feature extraction model

Extraction Model     K=1      K=3      K=5      K=7
Envelope             97.76%   97.64%   97.43%   97.35%
Statistical          96.44%   96.36%   95.94%   95.69%
Wavelet              99.83%   99.88%   99.79%   99.63%
Global Feature Pool  99.96%   99.92%   99.92%   99.92%

4.3. Feature Selection

Sequential Forward Selection (SFS) was used with two selection criteria: the estimated mean error probability (EMEP) and the interclass distance (ICD). The 1-NN algorithm generally showed higher accuracy than other values of K. We increased the number of SFS-selected features in steps of three, to reduce the computational complexity, until reaching the final number of 90 features. The maximum number of 90 selected features provided a sufficiently expressive evolution of the selection criterion. Fig. 1 shows the relation between the estimated accuracy and the number of selected features, up to 30 features, for each of the criteria used. From feature number 27 to 90 we estimated a 0% error, suggesting the benefit of feature selection. The experiment illustrates that, for fewer than nine features, the EMEP selection criterion performs better than ICD, but for nine or more features the performance is equal or very similar. A collateral conclusion is that the CWRU dataset is relatively easy to classify, which in a certain way contradicts the low scores reported in [Wang et al. 2012], where an auto-regressive model of the time-domain signal, combined with an SVD decomposition, is proposed.

Figure 1. Accuracy by selection criterion (EMEP and ICD) versus number of features selected

Fig. 2 compares the EMEP to the ICD feature selection criterion for the three different feature models (envelope, statistical and wavelet package). The quality of each of these three models reflects itself in the particular number of selected features of each model.
The horizontal axis shows the total number of features selected during the SFS search. The vertical axis shows the number of selected features belonging to each of the three feature models. For instance, with the EMEP criterion, after having reached six selected features, zero are wavelet features, two are statistical features and four are envelope features. The main difference between the EMEP

and ICD selection criteria is the preference for a certain feature extraction model. While EMEP chooses the models in a balanced manner, the ICD criterion prefers the envelope model.

Figure 2. Number of selected features per feature model (envelope, statistical, wavelet) versus total number of features selected, for the EMEP and ICD criteria

4.4. Experiments with the Support Vector Machine

For the SVM experiments we used 10-fold cross-validation, due to the high computational cost of SVM training. The SVM type used was the C-SVM, with the radial basis kernel function. The RBF intrinsic parameter γ and the regularization parameter C = 1 were set based on preliminary experiments. As in the case of K-NN, the feature increment step was set to three, until reaching 60 selected features. For the SVM we used only the EMEP selection criterion, because the experiments with K-NN suggested a considerable inferiority of the ICD criterion; since the computational cost of the performance estimation with the SVM classifier is high, we discarded the ICD criterion. The classifiers that used the wavelet package analysis feature model exhibited an estimated accuracy considerably higher than those trained with the other two feature models, and higher than the global feature pool. After selecting 9 features from the global pool, the accuracy was even higher than for the wavelet features alone. The results are shown in Table 6.

Table 6. Estimated SVM accuracy for different feature models

Extraction Model                      SVM
Envelope                              85.47%
Statistical                           92.88%
Wavelet                               99.30%
Global Feature Pool                   90.56%
9 Selected from Global Feature Pool   99.38%
18 Selected from Global Feature Pool  99.96%

5. Conclusions and Future Work

The K-NN classifier trained with the global feature pool showed a higher estimated accuracy than each of the isolated feature models. This behavior did not occur when the SVM was used as the classifier.
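The 10-fold estimation loop of section 4.4 can be written generically. The sketch below uses hypothetical names and plain Python; a C-SVM from a library such as LIBSVM or scikit-learn could be plugged in where the placeholder classifier stands.

```python
def k_fold_accuracy(X, y, train_and_predict, k=10):
    """Mean accuracy over k held-out folds (interleaved index split)."""
    n = len(X)
    folds = [list(range(i, n, k)) for i in range(k)]
    accuracies = []
    for fold in folds:
        held_out = set(fold)
        train_X = [X[i] for i in range(n) if i not in held_out]
        train_y = [y[i] for i in range(n) if i not in held_out]
        predictions = train_and_predict(train_X, train_y, [X[i] for i in fold])
        correct = sum(p == y[i] for p, i in zip(predictions, fold))
        accuracies.append(correct / len(fold))
    return sum(accuracies) / len(accuracies)

def majority_class_classifier(train_X, train_y, test_X):
    """Placeholder for the C-SVM: always predicts the most frequent training label."""
    most_common = max(set(train_y), key=train_y.count)
    return [most_common] * len(test_X)
```

Leave-one-out, used for the K-NN experiments, is the special case k = n of the same loop.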
The feature selection by the SFS search increased the accuracy of both classifiers, K-NN and SVM. When the SFS used the EMEP criterion to select the features, it chose among the feature models in a more balanced manner than with the ICD criterion, which preferred the envelope features over the other two. For nine features or more, the SVM

classifier achieved a higher estimated accuracy than when it used any specific feature extraction model, including when the three feature extraction techniques were used together. In future work we will test Artificial Neural Networks, additional feature models, more feature selection techniques, ensembles of classifiers, other application domains and other performance estimation metrics to optimize the global quality of the fault diagnosis system.

References

Bishop, C. M. et al. (2006). Pattern Recognition and Machine Learning, volume 1. Springer, New York.

Bouckaert, R. R. (2004). Estimating replicability of classifier learning experiments. In Proceedings of the Twenty-First International Conference on Machine Learning, page 15. ACM.

Burges, C. J. (1998). A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 2(2).

Cover, T. and Hart, P. (1967). Nearest neighbor pattern classification. IEEE Transactions on Information Theory, 13(1).

CWRU (2013). Case Western Reserve University, Bearing Data Center. eecs.cwru.edu/laboratory/bearing.

Guyon, I. and Elisseeff, A. (2003). An introduction to variable and feature selection. The Journal of Machine Learning Research, 3.

Liu, J. (2012). Shannon wavelet spectrum analysis on truncated vibration signals for machine incipient fault detection. Measurement Science and Technology, 23(5).

Luo, J., Yu, D., and Liang, M. (2013). A kurtosis-guided adaptive demodulation technique for bearing fault detection based on tunable-Q wavelet transform. Measurement Science and Technology, 24(5).

McInerny, S. A. and Dai, Y. (2003). Basic vibration signal processing for bearing fault detection. IEEE Transactions on Education, 46(1).

Vapnik, V. (1999). The Nature of Statistical Learning Theory. Springer-Verlag, New York.

Wandekokem, E. D., Mendel, E., Fabris, F., Valentim, M., Batista, R. J., Varejão, F. M., and Rauber, T. W. (2011).
Diagnosing multiple faults in oil rig motor pumps using support vector machine classifier ensembles. Integrated Computer-Aided Engineering, 18(1).

Wang, Y., Kang, S., Jiang, Y., Yang, G., Song, L., and Mikulovich, V. (2012). Classification of fault location and the degree of performance degradation of a rolling bearing based on an improved hyper-sphere-structured multi-class support vector machine. Mechanical Systems and Signal Processing, 29.

Wu, S.-D., Wu, P.-H., Wu, C.-W., Ding, J.-J., and Wang, C.-C. (2012). Bearing fault diagnosis based on multiscale permutation entropy and support vector machine. Entropy, 14(8).

Xia, Z., Xia, S., Wan, L., and Cai, S. (2012). Spectral regression based fault feature extraction for bearing accelerometer sensor signals. Sensors, 12(10).


More information

Bias-Variance Analysis of Ensemble Learning

Bias-Variance Analysis of Ensemble Learning Bias-Variance Analysis of Ensemble Learning Thomas G. Dietterich Department of Computer Science Oregon State University Corvallis, Oregon 97331 http://www.cs.orst.edu/~tgd Outline Bias-Variance Decomposition

More information

Applying Supervised Learning

Applying Supervised Learning Applying Supervised Learning When to Consider Supervised Learning A supervised learning algorithm takes a known set of input data (the training set) and known responses to the data (output), and trains

More information

Machine Learning and Pervasive Computing

Machine Learning and Pervasive Computing Stephan Sigg Georg-August-University Goettingen, Computer Networks 17.12.2014 Overview and Structure 22.10.2014 Organisation 22.10.3014 Introduction (Def.: Machine learning, Supervised/Unsupervised, Examples)

More information

Input and Structure Selection for k-nn Approximator

Input and Structure Selection for k-nn Approximator Input and Structure Selection for k- Approximator Antti Soramaa ima Reyhani and Amaury Lendasse eural etwork Research Centre Helsinki University of Technology P.O. Box 5400 005 spoo Finland {asorama nreyhani

More information

A Novel Fault Identifying Method with Supervised Classification and Unsupervised Clustering

A Novel Fault Identifying Method with Supervised Classification and Unsupervised Clustering A Novel Fault Identifying Method with Supervised Classification and Unsupervised Clustering Tao Xu Department of Automation Shenyang Aerospace University China xutao@sau.edu.cn Journal of Digital Information

More information

Classification of Subject Motion for Improved Reconstruction of Dynamic Magnetic Resonance Imaging

Classification of Subject Motion for Improved Reconstruction of Dynamic Magnetic Resonance Imaging 1 CS 9 Final Project Classification of Subject Motion for Improved Reconstruction of Dynamic Magnetic Resonance Imaging Feiyu Chen Department of Electrical Engineering ABSTRACT Subject motion is a significant

More information

Department of Electromechanical Engineering, University of Burgos, Burgos 09006, Spain;

Department of Electromechanical Engineering, University of Burgos, Burgos 09006, Spain; Sensors 2014, 14, 20713-20735; doi:10.3390/s141120713 Article OPEN ACCESS sensors ISSN 1424-8220 www.mdpi.com/journal/sensors An SVM-Based Classifier for Estimating the State of Various Rotating Components

More information

Detecting Bearing Defects under High Noise Levels: A Classifier Fusion Approach

Detecting Bearing Defects under High Noise Levels: A Classifier Fusion Approach Detecting Bearing Defects under High Noise Levels: A Classifier Fusion Approach Luana Batista, Bechir Badri, Robert Sabourin, Marc Thomas lbatista@livia.etsmtl.ca, bechirbadri@yahoo.fr, {robert.sabourin,

More information

TWRBF Transductive RBF Neural Network with Weighted Data Normalization

TWRBF Transductive RBF Neural Network with Weighted Data Normalization TWRBF Transductive RBF eural etwork with Weighted Data ormalization Qun Song and ikola Kasabov Knowledge Engineering & Discovery Research Institute Auckland University of Technology Private Bag 9006, Auckland

More information

CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS

CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS CHAPTER 4 CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS 4.1 Introduction Optical character recognition is one of

More information

Louis Fourrier Fabien Gaie Thomas Rolf

Louis Fourrier Fabien Gaie Thomas Rolf CS 229 Stay Alert! The Ford Challenge Louis Fourrier Fabien Gaie Thomas Rolf Louis Fourrier Fabien Gaie Thomas Rolf 1. Problem description a. Goal Our final project is a recent Kaggle competition submitted

More information

Genetic Algorithm Based Discriminant Feature Selection for Improved Fault Diagnosis of Induction Motor

Genetic Algorithm Based Discriminant Feature Selection for Improved Fault Diagnosis of Induction Motor Int'l Conf. Artificial Intelligence ICAI'7 2 Genetic Algorithm Based Discriminant Feature Selection for Improved Fault Diagnosis of Induction Motor Young-Hun Kim, M M Manjurul Islam, Rashedul Islam 2,

More information

Contents. Preface to the Second Edition

Contents. Preface to the Second Edition Preface to the Second Edition v 1 Introduction 1 1.1 What Is Data Mining?....................... 4 1.2 Motivating Challenges....................... 5 1.3 The Origins of Data Mining....................

More information

Advanced Video Content Analysis and Video Compression (5LSH0), Module 8B

Advanced Video Content Analysis and Video Compression (5LSH0), Module 8B Advanced Video Content Analysis and Video Compression (5LSH0), Module 8B 1 Supervised learning Catogarized / labeled data Objects in a picture: chair, desk, person, 2 Classification Fons van der Sommen

More information

Efficient Tuning of SVM Hyperparameters Using Radius/Margin Bound and Iterative Algorithms

Efficient Tuning of SVM Hyperparameters Using Radius/Margin Bound and Iterative Algorithms IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 13, NO. 5, SEPTEMBER 2002 1225 Efficient Tuning of SVM Hyperparameters Using Radius/Margin Bound and Iterative Algorithms S. Sathiya Keerthi Abstract This paper

More information

Random Forest A. Fornaser

Random Forest A. Fornaser Random Forest A. Fornaser alberto.fornaser@unitn.it Sources Lecture 15: decision trees, information theory and random forests, Dr. Richard E. Turner Trees and Random Forests, Adele Cutler, Utah State University

More information

Content Based Image Retrieval system with a combination of Rough Set and Support Vector Machine

Content Based Image Retrieval system with a combination of Rough Set and Support Vector Machine Shahabi Lotfabadi, M., Shiratuddin, M.F. and Wong, K.W. (2013) Content Based Image Retrieval system with a combination of rough set and support vector machine. In: 9th Annual International Joint Conferences

More information

Instance-based Learning CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2015

Instance-based Learning CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2015 Instance-based Learning CE-717: Machine Learning Sharif University of Technology M. Soleymani Fall 2015 Outline Non-parametric approach Unsupervised: Non-parametric density estimation Parzen Windows K-Nearest

More information

Naïve Bayes for text classification

Naïve Bayes for text classification Road Map Basic concepts Decision tree induction Evaluation of classifiers Rule induction Classification using association rules Naïve Bayesian classification Naïve Bayes for text classification Support

More information

CS6716 Pattern Recognition

CS6716 Pattern Recognition CS6716 Pattern Recognition Prototype Methods Aaron Bobick School of Interactive Computing Administrivia Problem 2b was extended to March 25. Done? PS3 will be out this real soon (tonight) due April 10.

More information

Forward Feature Selection Using Residual Mutual Information

Forward Feature Selection Using Residual Mutual Information Forward Feature Selection Using Residual Mutual Information Erik Schaffernicht, Christoph Möller, Klaus Debes and Horst-Michael Gross Ilmenau University of Technology - Neuroinformatics and Cognitive Robotics

More information

Unsupervised Feature Selection for Sparse Data

Unsupervised Feature Selection for Sparse Data Unsupervised Feature Selection for Sparse Data Artur Ferreira 1,3 Mário Figueiredo 2,3 1- Instituto Superior de Engenharia de Lisboa, Lisboa, PORTUGAL 2- Instituto Superior Técnico, Lisboa, PORTUGAL 3-

More information

NUMERICAL ANALYSIS OF ROLLER BEARING

NUMERICAL ANALYSIS OF ROLLER BEARING Applied Computer Science, vol. 12, no. 1, pp. 5 16 Submitted: 2016-02-09 Revised: 2016-03-03 Accepted: 2016-03-11 tapered roller bearing, dynamic simulation, axial load force Róbert KOHÁR *, Frantisek

More information

Measurement 46 (2013) Contents lists available at SciVerse ScienceDirect. Measurement

Measurement 46 (2013) Contents lists available at SciVerse ScienceDirect. Measurement Measurement 6 () 55 56 Contents lists available at SciVerse ScienceDirect Measurement journal homepage: www.elsevier.com/locate/measurement Fault diagnosis of rotating machinery based on the statistical

More information

Influence of geometric imperfections on tapered roller bearings life and performance

Influence of geometric imperfections on tapered roller bearings life and performance Influence of geometric imperfections on tapered roller bearings life and performance Rodríguez R a, Calvo S a, Nadal I b and Santo Domingo S c a Computational Simulation Centre, Instituto Tecnológico de

More information

Discriminative classifiers for image recognition

Discriminative classifiers for image recognition Discriminative classifiers for image recognition May 26 th, 2015 Yong Jae Lee UC Davis Outline Last time: window-based generic object detection basic pipeline face detection with boosting as case study

More information

Machine Learning Techniques for Data Mining

Machine Learning Techniques for Data Mining Machine Learning Techniques for Data Mining Eibe Frank University of Waikato New Zealand 10/25/2000 1 PART VII Moving on: Engineering the input and output 10/25/2000 2 Applying a learner is not all Already

More information

FAULT DETECTION AND ISOLATION USING SPECTRAL ANALYSIS. Eugen Iancu

FAULT DETECTION AND ISOLATION USING SPECTRAL ANALYSIS. Eugen Iancu FAULT DETECTION AND ISOLATION USING SPECTRAL ANALYSIS Eugen Iancu Automation and Mechatronics Department University of Craiova Eugen.Iancu@automation.ucv.ro Abstract: In this work, spectral signal analyses

More information

Univariate Margin Tree

Univariate Margin Tree Univariate Margin Tree Olcay Taner Yıldız Department of Computer Engineering, Işık University, TR-34980, Şile, Istanbul, Turkey, olcaytaner@isikun.edu.tr Abstract. In many pattern recognition applications,

More information

Decision Trees Dr. G. Bharadwaja Kumar VIT Chennai

Decision Trees Dr. G. Bharadwaja Kumar VIT Chennai Decision Trees Decision Tree Decision Trees (DTs) are a nonparametric supervised learning method used for classification and regression. The goal is to create a model that predicts the value of a target

More information

Kernel Methods and Visualization for Interval Data Mining

Kernel Methods and Visualization for Interval Data Mining Kernel Methods and Visualization for Interval Data Mining Thanh-Nghi Do 1 and François Poulet 2 1 College of Information Technology, Can Tho University, 1 Ly Tu Trong Street, Can Tho, VietNam (e-mail:

More information

Neural Networks. CE-725: Statistical Pattern Recognition Sharif University of Technology Spring Soleymani

Neural Networks. CE-725: Statistical Pattern Recognition Sharif University of Technology Spring Soleymani Neural Networks CE-725: Statistical Pattern Recognition Sharif University of Technology Spring 2013 Soleymani Outline Biological and artificial neural networks Feed-forward neural networks Single layer

More information

ECG782: Multidimensional Digital Signal Processing

ECG782: Multidimensional Digital Signal Processing ECG782: Multidimensional Digital Signal Processing Object Recognition http://www.ee.unlv.edu/~b1morris/ecg782/ 2 Outline Knowledge Representation Statistical Pattern Recognition Neural Networks Boosting

More information

Empirical Evaluation of Feature Subset Selection based on a Real-World Data Set

Empirical Evaluation of Feature Subset Selection based on a Real-World Data Set P. Perner and C. Apte, Empirical Evaluation of Feature Subset Selection Based on a Real World Data Set, In: D.A. Zighed, J. Komorowski, and J. Zytkow, Principles of Data Mining and Knowledge Discovery,

More information

A Feature Selection Method to Handle Imbalanced Data in Text Classification

A Feature Selection Method to Handle Imbalanced Data in Text Classification A Feature Selection Method to Handle Imbalanced Data in Text Classification Fengxiang Chang 1*, Jun Guo 1, Weiran Xu 1, Kejun Yao 2 1 School of Information and Communication Engineering Beijing University

More information

Lecture 10 September 19, 2007

Lecture 10 September 19, 2007 CS 6604: Data Mining Fall 2007 Lecture 10 September 19, 2007 Lecture: Naren Ramakrishnan Scribe: Seungwon Yang 1 Overview In the previous lecture we examined the decision tree classifier and choices for

More information

Basis Functions. Volker Tresp Summer 2017

Basis Functions. Volker Tresp Summer 2017 Basis Functions Volker Tresp Summer 2017 1 Nonlinear Mappings and Nonlinear Classifiers Regression: Linearity is often a good assumption when many inputs influence the output Some natural laws are (approximately)

More information

Rule extraction from support vector machines

Rule extraction from support vector machines Rule extraction from support vector machines Haydemar Núñez 1,3 Cecilio Angulo 1,2 Andreu Català 1,2 1 Dept. of Systems Engineering, Polytechnical University of Catalonia Avda. Victor Balaguer s/n E-08800

More information

Scale-Invariance of Support Vector Machines based on the Triangular Kernel. Abstract

Scale-Invariance of Support Vector Machines based on the Triangular Kernel. Abstract Scale-Invariance of Support Vector Machines based on the Triangular Kernel François Fleuret Hichem Sahbi IMEDIA Research Group INRIA Domaine de Voluceau 78150 Le Chesnay, France Abstract This paper focuses

More information

Bearing fault detection using multi-scale fractal dimensions based on morphological covers

Bearing fault detection using multi-scale fractal dimensions based on morphological covers Shock and Vibration 19 (2012) 1373 1383 1373 DOI 10.3233/SAV-2012-0679 IOS Press Bearing fault detection using multi-scale fractal dimensions based on morphological covers Pei-Lin Zhang a,bingli a,b,,

More information

Performance Evaluation of Various Classification Algorithms

Performance Evaluation of Various Classification Algorithms Performance Evaluation of Various Classification Algorithms Shafali Deora Amritsar College of Engineering & Technology, Punjab Technical University -----------------------------------------------------------***----------------------------------------------------------

More information

The Fault Diagnosis of Wind Turbine Gearbox Based on Improved KNN

The Fault Diagnosis of Wind Turbine Gearbox Based on Improved KNN www.seipub.org/aee Advances in Energy Engineering (AEE) Volume 3, 2015 doi: 10.14355/aee.2015.03.002 The Fault Diagnosis of Wind Turbine Gearbox Based on Improved KNN Long Peng 1, Bin Jiao 2, Hai Liu 3,

More information

Fuzzy Entropy based feature selection for classification of hyperspectral data

Fuzzy Entropy based feature selection for classification of hyperspectral data Fuzzy Entropy based feature selection for classification of hyperspectral data Mahesh Pal Department of Civil Engineering NIT Kurukshetra, 136119 mpce_pal@yahoo.co.uk Abstract: This paper proposes to use

More information

The Machine Part Tool in Observer 9

The Machine Part Tool in Observer 9 Application Note The Machine Part Tool in SKF @ptitude Observer 9 Introduction The machine part tool in SKF @ptitude Observer is an important part of the setup. By defining the machine parts, it is possible

More information

Support Vector Machines: Brief Overview" November 2011 CPSC 352

Support Vector Machines: Brief Overview November 2011 CPSC 352 Support Vector Machines: Brief Overview" Outline Microarray Example Support Vector Machines (SVMs) Software: libsvm A Baseball Example with libsvm Classifying Cancer Tissue: The ALL/AML Dataset Golub et

More information

Sensor-based Semantic-level Human Activity Recognition using Temporal Classification

Sensor-based Semantic-level Human Activity Recognition using Temporal Classification Sensor-based Semantic-level Human Activity Recognition using Temporal Classification Weixuan Gao gaow@stanford.edu Chuanwei Ruan chuanwei@stanford.edu Rui Xu ray1993@stanford.edu I. INTRODUCTION Human

More information

Face Recognition Using Vector Quantization Histogram and Support Vector Machine Classifier Rong-sheng LI, Fei-fei LEE *, Yan YAN and Qiu CHEN

Face Recognition Using Vector Quantization Histogram and Support Vector Machine Classifier Rong-sheng LI, Fei-fei LEE *, Yan YAN and Qiu CHEN 2016 International Conference on Artificial Intelligence: Techniques and Applications (AITA 2016) ISBN: 978-1-60595-389-2 Face Recognition Using Vector Quantization Histogram and Support Vector Machine

More information

Performance Analysis of Data Mining Classification Techniques

Performance Analysis of Data Mining Classification Techniques Performance Analysis of Data Mining Classification Techniques Tejas Mehta 1, Dr. Dhaval Kathiriya 2 Ph.D. Student, School of Computer Science, Dr. Babasaheb Ambedkar Open University, Gujarat, India 1 Principal

More information

CHAPTER 3. Preprocessing and Feature Extraction. Techniques

CHAPTER 3. Preprocessing and Feature Extraction. Techniques CHAPTER 3 Preprocessing and Feature Extraction Techniques CHAPTER 3 Preprocessing and Feature Extraction Techniques 3.1 Need for Preprocessing and Feature Extraction schemes for Pattern Recognition and

More information

Time Series Classification in Dissimilarity Spaces

Time Series Classification in Dissimilarity Spaces Proceedings 1st International Workshop on Advanced Analytics and Learning on Temporal Data AALTD 2015 Time Series Classification in Dissimilarity Spaces Brijnesh J. Jain and Stephan Spiegel Berlin Institute

More information

Introduction to Support Vector Machines

Introduction to Support Vector Machines Introduction to Support Vector Machines CS 536: Machine Learning Littman (Wu, TA) Administration Slides borrowed from Martin Law (from the web). 1 Outline History of support vector machines (SVM) Two classes,

More information

Stability of Feature Selection Algorithms

Stability of Feature Selection Algorithms Stability of Feature Selection Algorithms Alexandros Kalousis, Jullien Prados, Phong Nguyen Melanie Hilario Artificial Intelligence Group Department of Computer Science University of Geneva Stability of

More information

Enhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques

Enhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques 24 Enhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques Enhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques Ruxandra PETRE

More information

CS570: Introduction to Data Mining

CS570: Introduction to Data Mining CS570: Introduction to Data Mining Classification Advanced Reading: Chapter 8 & 9 Han, Chapters 4 & 5 Tan Anca Doloc-Mihu, Ph.D. Slides courtesy of Li Xiong, Ph.D., 2011 Han, Kamber & Pei. Data Mining.

More information

Cover Page. The handle holds various files of this Leiden University dissertation.

Cover Page. The handle   holds various files of this Leiden University dissertation. Cover Page The handle http://hdl.handle.net/1887/22055 holds various files of this Leiden University dissertation. Author: Koch, Patrick Title: Efficient tuning in supervised machine learning Issue Date:

More information

MIT 801. Machine Learning I. [Presented by Anna Bosman] 16 February 2018

MIT 801. Machine Learning I. [Presented by Anna Bosman] 16 February 2018 MIT 801 [Presented by Anna Bosman] 16 February 2018 Machine Learning What is machine learning? Artificial Intelligence? Yes as we know it. What is intelligence? The ability to acquire and apply knowledge

More information

Hybrid Approach for Classification using Support Vector Machine and Decision Tree

Hybrid Approach for Classification using Support Vector Machine and Decision Tree Hybrid Approach for Classification using Support Vector Machine and Decision Tree Anshu Bharadwaj Indian Agricultural Statistics research Institute New Delhi, India anshu@iasri.res.in Sonajharia Minz Jawaharlal

More information

6. Dicretization methods 6.1 The purpose of discretization

6. Dicretization methods 6.1 The purpose of discretization 6. Dicretization methods 6.1 The purpose of discretization Often data are given in the form of continuous values. If their number is huge, model building for such data can be difficult. Moreover, many

More information

Slides for Data Mining by I. H. Witten and E. Frank

Slides for Data Mining by I. H. Witten and E. Frank Slides for Data Mining by I. H. Witten and E. Frank 7 Engineering the input and output Attribute selection Scheme-independent, scheme-specific Attribute discretization Unsupervised, supervised, error-

More information

Pattern Classification Using Neuro Fuzzy and Support Vector Machine (SVM) - A Comparative Study

Pattern Classification Using Neuro Fuzzy and Support Vector Machine (SVM) - A Comparative Study Pattern Classification Using Neuro Fuzzy and Support Vector Machine (SVM) - A Comparative Study Dr. Maya Nayak 1 and Er. Jnana Ranjan Tripathy 2 Department of Information Technology, Biju Pattnaik University

More information

A Boosting-Based Framework for Self-Similar and Non-linear Internet Traffic Prediction

A Boosting-Based Framework for Self-Similar and Non-linear Internet Traffic Prediction A Boosting-Based Framework for Self-Similar and Non-linear Internet Traffic Prediction Hanghang Tong 1, Chongrong Li 2, and Jingrui He 1 1 Department of Automation, Tsinghua University, Beijing 100084,

More information

Supervised Learning Classification Algorithms Comparison

Supervised Learning Classification Algorithms Comparison Supervised Learning Classification Algorithms Comparison Aditya Singh Rathore B.Tech, J.K. Lakshmipat University -------------------------------------------------------------***---------------------------------------------------------

More information

Support Vector Machines

Support Vector Machines Support Vector Machines RBF-networks Support Vector Machines Good Decision Boundary Optimization Problem Soft margin Hyperplane Non-linear Decision Boundary Kernel-Trick Approximation Accurancy Overtraining

More information

CS 229 Midterm Review

CS 229 Midterm Review CS 229 Midterm Review Course Staff Fall 2018 11/2/2018 Outline Today: SVMs Kernels Tree Ensembles EM Algorithm / Mixture Models [ Focus on building intuition, less so on solving specific problems. Ask

More information

Clustering Analysis based on Data Mining Applications Xuedong Fan

Clustering Analysis based on Data Mining Applications Xuedong Fan Applied Mechanics and Materials Online: 203-02-3 ISSN: 662-7482, Vols. 303-306, pp 026-029 doi:0.4028/www.scientific.net/amm.303-306.026 203 Trans Tech Publications, Switzerland Clustering Analysis based

More information

Using PageRank in Feature Selection

Using PageRank in Feature Selection Using PageRank in Feature Selection Dino Ienco, Rosa Meo, and Marco Botta Dipartimento di Informatica, Università di Torino, Italy {ienco,meo,botta}@di.unito.it Abstract. Feature selection is an important

More information

Minimal Test Cost Feature Selection with Positive Region Constraint

Minimal Test Cost Feature Selection with Positive Region Constraint Minimal Test Cost Feature Selection with Positive Region Constraint Jiabin Liu 1,2,FanMin 2,, Shujiao Liao 2, and William Zhu 2 1 Department of Computer Science, Sichuan University for Nationalities, Kangding

More information

BENCHMARKING ATTRIBUTE SELECTION TECHNIQUES FOR MICROARRAY DATA

BENCHMARKING ATTRIBUTE SELECTION TECHNIQUES FOR MICROARRAY DATA BENCHMARKING ATTRIBUTE SELECTION TECHNIQUES FOR MICROARRAY DATA S. DeepaLakshmi 1 and T. Velmurugan 2 1 Bharathiar University, Coimbatore, India 2 Department of Computer Science, D. G. Vaishnav College,

More information

Contributions to the diagnosis of kinematic chain components operation by analyzing the electric current and temperature of the driving engine

Contributions to the diagnosis of kinematic chain components operation by analyzing the electric current and temperature of the driving engine Fourth International Conference Modelling and Development of Intelligent Systems October 28 - November 1, 2015 Lucian Blaga University Sibiu - Romania Contributions to the diagnosis of kinematic chain

More information