Automatic Paroxysmal Atrial Fibrillation Based on not Fibrillating ECGs



Schattauer GmbH

E. Ros, S. Mota, F. J. Toro, A. F. Díaz, F. J. Fernández
Department of Architecture and Computer Technology, University of Granada, Spain

Summary

Objectives: The objective of the paper is to describe an automatic algorithm for Paroxysmal Atrial Fibrillation (PAF) detection, based on parameters extracted from ECG traces with no atrial fibrillation episode. The modular automatic classification algorithm for PAF diagnosis is developed and evaluated with different parameter configurations.

Methods: The database used in this study was provided by Physiobank for The Computers in Cardiology Challenge. Each ECG file in this database was translated into a 48-parameter vector. The modular classification algorithm used for PAF diagnosis was based on the K nearest neighbours. Several configuration options were evaluated to optimize the classification performance.

Results: Different configurations of the proposed modular classification algorithm were tested. The uniparametric approach achieved a top classification rate of 76%. A multiparametric approach was configured using the 5 parameters with the highest discrimination power, and a top classification rate of 80% was achieved; different functions to typify the parameters were tested. Finally, two automatic parametric scanning strategies, the Forward and Backward methods, were adopted. These approaches achieved a top classification rate of 92%.

Conclusions: A modular classification algorithm based on the K nearest neighbours was designed. The classification performance of the algorithm was evaluated using different parameter configurations, typification functions and numbers of K-neighbours. The automatic parametric scanning techniques achieved much better results than the previously tested configurations.
Keywords
Paroxysmal Atrial Fibrillation, automatic diagnosis, ECG signal processing

Methods Inf Med 2004; 43:

1. Introduction

The automatic diagnosis of patients who suffer from PAF by analysing ECG registers that do not contain explicit PAF episodes is a difficult task. An international initiative that addressed this problem recently concluded [1, 2], but the results were not definitive, so the problem remains open. The development of an automatic algorithm for PAF detection consists of two stages:

- Parameter characterization of PAF. This stage is described in [3], which defines a parameter set to be used for PAF detection applications.
- Automatic classification algorithm set-up. In this stage, the classification strategy and the parameter selection that optimise its performance are defined.

The present paper describes an automatic classification algorithm that discriminates between PAF patients and healthy subjects (Section 2). The algorithm takes as inputs parameters extracted from ECG traces that do not contain explicit fibrillation episodes. Section 3 studies how the performance is affected by different algorithm set-ups, and Section 4 discusses the results.

2. Methodology

As described in [3], each ECG register is translated into a 48-component vector (P_1, ..., P_48). The characterization parameters described in [3] are low-level parameters (P wave amplitude, P wave width, etc.). Each parameter represents a different physical characteristic with a very different range. To make a multiparametric distance scheme possible as the classification kernel, all these parameters must be typified (normalized): for instance, the average (M_i) of each parameter is calculated over the whole database and then each parameter is divided by its corresponding M_i, i.e. p_i = P_i / M_i.

A modular classification algorithm based on the K nearest neighbours has been used for this application.
The labelled vectors work as references of the classification system. For each new non-labelled vector, the Euclidean distances to the labelled vectors are calculated. The labels of the K closest neighbours are consulted, and the final label is decided through a voting scheme as the label of the majority of the K neighbours. The scheme of the complete classification system is shown in Figure 1. In this way the classification algorithm is modular: new parameters can be added easily, since only the dimension considered in the Euclidean distance calculation step has to be modified. This modularity enables the parameter scanning techniques described in the next section.

3. Results

Due to the small size of the training database (25 patients and 25 healthy subjects), the classification rate is evaluated in 50 cycles with the leave-one-out method. In each cycle, one vector is selected as the test element and classified according to the scheme described above, with the other 49 labelled vectors as classification references. In each cycle the classification result updates one of four counters: True_Positive (TP), True_Negative (TN), False_Positive (FP) and False_Negative (FN). The final classification rate (CR), the sensitivity (SE) and the specificity (SP) are calculated from these counters, which accumulate the classification results of the 50 cycles. All the values of CR, SE and SP reported in the following tables are expressed in %.

A more appropriate study of the classification performance requires a specific test data set. In fact, a test database is available via WWW [1]. Any registered user can send a file of labels obtained by a classification algorithm for the test database, and the classification rate is returned automatically as the only result. The number of accesses is very restricted and the labels of the test registers are kept secret. Because of that, no specific test classification result is reported in the Discussion section. It has to be remarked, however, that the test results with the proposed approach are much lower (below 70%) than the classification results reported in the next sections. These large disparities seem to be caused by significant statistical differences between the training vector database and the test vector database, as also concluded by the top-scoring authors in [1]. We expected to overcome these disparities using the normalized parameters proposed in [3], but this is not the case.

A multifactorial statistical study has been carried out with the Analysis of Variance method. It shows that the algorithm performance depends significantly on several factors: the number of K neighbours, the typification function and the number of considered parameters (the dimension of the input vectors). We study the influence of these factors below.

Uniparametric Classifier

Table 1 summarizes the classification results of the best parameters for different numbers of K neighbours.
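As a rough illustration of the evaluation procedure described in this section, the following Python sketch implements a K-nearest-neighbour vote with Euclidean distance and a leave-one-out loop that accumulates the TP, TN, FP and FN counters. It is not the authors' code: the function names, the 0/1 label encoding (1 = PAF patient) and the two-parameter toy data are assumptions made for the example, and the parameters are typified by their mean as in Section 2.

```python
import math
from collections import Counter

def typify_by_mean(vectors):
    """Divide each parameter by its mean over the whole database (p_i = P_i / M_i)."""
    n = len(vectors[0])
    means = [sum(v[i] for v in vectors) / len(vectors) for i in range(n)]
    return [[v[i] / means[i] for i in range(n)] for v in vectors]

def knn_label(test, refs, ref_labels, k):
    """Return the majority label among the k references closest to `test`."""
    order = sorted(range(len(refs)), key=lambda j: math.dist(test, refs[j]))
    votes = Counter(ref_labels[j] for j in order[:k])
    return votes.most_common(1)[0][0]  # an odd k avoids ties with two classes

def leave_one_out(vectors, labels, k):
    """Classify every vector against all the others; return (CR, SE, SP) in %."""
    tp = tn = fp = fn = 0
    for i, truth in enumerate(labels):
        refs = vectors[:i] + vectors[i + 1:]
        ref_labels = labels[:i] + labels[i + 1:]
        predicted = knn_label(vectors[i], refs, ref_labels, k)
        if predicted == 1 and truth == 1:
            tp += 1
        elif predicted == 0 and truth == 0:
            tn += 1
        elif predicted == 1 and truth == 0:
            fp += 1
        else:
            fn += 1
    cr = 100.0 * (tp + tn) / len(labels)
    se = 100.0 * tp / (tp + fn) if tp + fn else 0.0
    sp = 100.0 * tn / (tn + fp) if tn + fp else 0.0
    return cr, se, sp

# Hypothetical toy database: two parameters, 3 healthy (0) and 3 PAF (1) vectors.
raw = [[1.0, 10.0], [1.2, 11.0], [0.9, 9.5],
       [3.0, 30.0], [3.2, 31.0], [2.9, 29.0]]
labels = [0, 0, 0, 1, 1, 1]
print(leave_one_out(typify_by_mean(raw), labels, k=3))
```

Adding a parameter only lengthens the vectors passed to `math.dist`, which is the modularity property the text relies on.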
The best results in Table 1 are obtained with 9 neighbours; this optimum depends on the degree of overlap of the data.

Fig. 1 Classification algorithm scheme.

Multiparametric Classifier

To implement a multiparametric classification scheme, the input vector must be typified so that the parameters become dimensionless values. We have studied three typification functions:

A. The average (M_i) of each parameter (P_i) is calculated. Then each parameter is typified: p_i = P_i / M_i.
B. The standard deviation of each parameter (SD_i) is calculated. Then each parameter is typified as follows: p_i = (P_i - M_i) / SD_i.
C. We define a gain term β_i = α C_i, where α is a scale factor (with possible values 1, 10 and 100) and C_i is a discrimination power estimator calculated for each parameter by using it in a uniparametric classification scheme (C_i is the classification rate (%) of each parameter in Table 1). Finally, the typification is given by the following expression: p_i = β_i P_i / M_i. In this way each dimension of the parametric space is deformed according to the discrimination power (C_i) of its corresponding parameter.

To evaluate how the classification performance changes with the typification mode, we have calculated the classification results using the 5 parameters with the greatest individual discrimination power (C_i). Table 2 summarizes the classification results obtained with the different typification functions. Typification mode A and typification mode C are almost equivalent (with values of α from 1 to 100), because the discrimination powers of the different parameters used as weights are similar.

Table 1 Classification results obtained by the best uniparametric classifier, varying the number of K-neighbours. The parameters are described in [3].

Table 2 Performance variation due to the typification mode. Results obtained using a multiparametric scheme with the best 5 parameters and taking into account different numbers of neighbours in the voting step.

Automatic Configuration: Parameter Scanning

The parameter space has been scanned sequentially in order to maximize the classification performance while reducing the number of parameters on which it is based. Two well-known methods have been used for this sequential scanning task [4]:

Forward Method. In a first step a uniparametric classifier is used, trying all the parameters, and the parameter that leads to the best classification performance is selected. In a second step, the classifier is used with the selected parameter and a new one, trying combinations with every parameter not yet selected, and the second parameter leading to the best result is also selected. In this way the parameter set on which the classifier is based grows by one parameter in each cycle until all the parameters are considered. The performance curve obtained as the parameter set grows increases for the first parameters and decreases when too many parameters are taken into account. The best parameter set is finally selected, i.e. the minimum parameter set with the maximum classification performance.

Backward Method. In a first step all the parameters are used in the classifier. In a second step one of the parameters is dropped, trying each parameter individually, and the parameter set that reaches the best classification is kept. In this way, a different parameter is dropped in each cycle. The final parameter set is again the one that maximizes the classification performance with a minimum size.
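The Forward Method can be sketched in Python as a greedy loop around a leave-one-out score. This is only a minimal sketch under stated assumptions, not the authors' implementation: `loo_rate`, `forward_select`, the default k = 1 and the three-parameter toy data are hypothetical, the vectors are assumed to be typified already, and the score is the plain classification rate.

```python
import math
from collections import Counter

def loo_rate(vectors, labels, params, k=1):
    """Leave-one-out classification rate (%) using only the `params` columns."""
    proj = [[v[p] for p in params] for v in vectors]
    hits = 0
    for i, test in enumerate(proj):
        others = [j for j in range(len(proj)) if j != i]
        others.sort(key=lambda j: math.dist(test, proj[j]))
        votes = Counter(labels[j] for j in others[:k])
        if votes.most_common(1)[0][0] == labels[i]:
            hits += 1
    return 100.0 * hits / len(proj)

def forward_select(vectors, labels, score=loo_rate):
    """Add one parameter per cycle; return the smallest set with the top score."""
    remaining = list(range(len(vectors[0])))
    selected, best_set, best_score = [], [], -1.0
    while remaining:
        # Try every unused parameter together with the ones already selected.
        cand = max(remaining, key=lambda p: score(vectors, labels, selected + [p]))
        selected.append(cand)
        remaining.remove(cand)
        current = score(vectors, labels, selected)
        if current > best_score:  # strict '>' keeps the earliest, smallest set
            best_set, best_score = list(selected), current
    return best_set, best_score

# Hypothetical toy data: only parameter 1 separates the two classes.
toy_vectors = [[5.0, 1.0, 2.0], [1.0, 1.1, 7.0], [9.0, 0.9, 4.0],
               [2.0, 3.0, 3.0], [8.0, 3.1, 6.0], [4.0, 2.9, 1.0]]
toy_labels = [0, 0, 0, 1, 1, 1]
print(forward_select(toy_vectors, toy_labels))
```

The Backward Method follows the same pattern with the loop reversed: it starts from all parameters and drops, in each cycle, the one whose removal gives the best score. Both scans are greedy, so neither explores every possible subset.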

Once the classification performance (CP) is defined, either simply as the classification rate (CR) or as a more sophisticated function, these two methods can maximize it. Tables 3 and 4 show the results of applying the forward and backward methods to maximize the classification rate (CR) using a minimum number of parameters in the classification process.

The classification performance (CP) can also be defined as a function of the classification rate (CR), the sensitivity (SE) and the specificity (SP), using the following expression:

$$CP = CR + \frac{SE}{SE + SP} \tag{1}$$

In this way, the classification rate is maximized, and the ratio SE/(SE+SP) is maximized among the cases with equivalent CR values. The contribution of this second component is always between 0 and 1, so it is significant only between cases with equivalent classification rate values. This target function takes into account the importance of the sensitivity in this application, given that the ECG is a non-invasive method: other more specific tests can be applied to subjects indicated as PAF patients by this algorithm.

Tables 5 and 6 show the results of applying the forward and backward methods to maximize the CP defined in expression (1), using a minimum number of parameters in the classification process.

Table 3 Forward Method: Obtains the maximum classification rate with a minimum number of parameters, growing the parameter set.

Table 4 Backward Method: Obtains the maximum classification rate with a minimum number of parameters, reducing the parameter set.

Table 5 Forward Method: Obtains the maximum CP with a minimum number of parameters, growing the parameter set.

4. Discussion

The uniparametric approach reaches top CR values of 76%. A multiparametric approach has been configured using the 5 parameters with the highest discrimination power, reaching top CR values of 80% and testing different typification functions. Finally, two automatic parametric scanning strategies are adopted: the Forward and Backward methods. Tables 3 to 6 show the results obtained with these approaches, with a top CR of 92%.

In Tables 5 and 6 it is observed that the values of SE reached with expression (1) are higher than the ones obtained using CP simply as CR. With both CP functions the best results are obtained using a single closest neighbour configuration, with a top score of CR = 92% and SE = 96% when CP is defined as in expression (1). As expected, the definition of CP in expression (1) leads to better SE levels, but it also affects the obtained CR scores, because it leads the search through different areas of the parameter space.
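The tie-breaking role of the second term of the classification performance can be illustrated with a few lines of Python. The sketch assumes that expression (1) has the form CP = CR + SE/(SE + SP), which is what the description of its two components suggests; the function names and the three sets of counters are hypothetical and are not taken from the paper's tables.

```python
def rates(tp, tn, fp, fn):
    """Return (CR, SE, SP) in % from the four accumulated counters."""
    cr = 100 * (tp + tn) / (tp + tn + fp + fn)
    se = 100 * tp / (tp + fn) if tp + fn else 0.0
    sp = 100 * tn / (tn + fp) if tn + fp else 0.0
    return cr, se, sp

def classification_performance(cr, se, sp):
    """Assumed form of expression (1): CR plus a term that stays within [0, 1]."""
    return cr + (se / (se + sp) if se + sp else 0.0)

# Hypothetical counters for 25 patients and 25 healthy subjects.
config_a = rates(tp=24, tn=22, fp=3, fn=1)  # CR = 92, SE = 96, SP = 88
config_b = rates(tp=22, tn=24, fp=1, fn=3)  # CR = 92, SE = 88, SP = 96
config_c = rates(tp=20, tn=25, fp=0, fn=5)  # CR = 90, SE = 80, SP = 100

for name, cfg in (("A", config_a), ("B", config_b), ("C", config_c)):
    print(name, cfg, round(classification_performance(*cfg), 3))
```

Configurations A and B share the same CR, so the one with the higher sensitivity gets the higher CP. Configuration C has a lower CR and therefore a lower CP than both: with 50 subjects, CR changes in steps of 2 percentage points, while the second term can contribute at most 1.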

Table 6 Backward Method: Obtains the maximum CP with a minimum number of parameters, reducing the parameter set.

5. Conclusion

Using a parameter set that characterizes the PAF pathology with acceptable efficiency [3], a modular classification algorithm based on the K nearest neighbours has been designed. The classification performance of the algorithm has been evaluated using different parameter configurations, typification functions and numbers of K-neighbours. The automatic parametric scanning techniques reach much better results than the previously tested configurations. It is important to remark that the parametric scanning processes with the Forward and Backward methods are not exhaustive, and therefore new scanning techniques may improve the reported results.

References

Goldberger AL, Amaral LAN, Glass L, Hausdorff JM, Ivanov PCh, Mark RG, Mietus JE, Moody GB, Peng CK, Stanley HE. PhysioBank, PhysioToolkit, and Physionet: Components of a New Research Resource for Complex Physiologic Signals. Circulation 2000; 101(23): e215-e220 [Circulation Electronic Pages; circ.ahajournals.org/cgi/content/full/101/23/e215].
3. Mota S, Ros E, Fernández FJ, Díaz AF, Prieto A. ECG Parameter Characterization of Paroxysmal Atrial Fibrillation. BSI 2002,
Narendra PM, Fukunaga K. A branch and bound algorithm for feature subset selection. IEEE Trans on Comp 1977; 26:

Correspondence to:
Eduardo Ros
Departamento de Arquitectura y Tecnología de Computadores
E.T.S.I. Informática
Universidad de Granada, Spain
C/Periodista Daniel Saucedo, s/n
Granada, Spain
eduardo@atc.ugr.es
