Arrhythmia Classification via k-means based Polyhedral Conic Functions Algorithm
Full Research Paper / CSCI-ISPC

Emre Cimen
Anadolu University, Industrial Engineering, Eskisehir, Turkey
ecimen@anadolu.edu.tr

Gurkan Ozturk
Anadolu University, Industrial Engineering, Eskisehir, Turkey
gurkan.o@anadolu.edu.tr

Abstract
Heart disease is one of the important causes of death. Computer aided arrhythmia classification plays an important role in diagnosing heart diseases, and the ECG signal from the heart is generally used in these systems. In this study, we used ECG data obtained from the MIT-BIH database to classify arrhythmias. We selected 5 classes: normal beat (N), right bundle branch block (RBBB), left bundle branch block (LBBB), atrial premature contraction (APC) and ventricular premature contraction (VPC). We applied the k-means based Polyhedral Conic Functions (k-means PCF) algorithm to classify instances. The performance of the proposed classifier is shown with numerical experiments. With the proposed algorithm we obtained a 98 % accuracy rate. This test result is compared with other well-known classification methods.

Keywords: arrhythmia; classification; clustering; mathematical programming.

I. INTRODUCTION
Cardiovascular diseases are known as the most important cause of death. According to the World Health Report of 2000, 7 million people die for this reason every year. 13% of deaths among men and 12% of deaths among women are due to coronary artery diseases, which cause heart attacks [1]. The heart consists of myocardial muscle that contracts rhythmically. These rhythmic contractions circulate blood through the body. Before each contraction of the heart, an electrical signal is generated that consists of P, Q, R, S and T waves. The heart beats in response to the electrical impulse generated by the sinoatrial (SA) node. Discharge of the electrical impulse from a site other than the SA node, or problems in impulse transmission, cause arrhythmia.
While some arrhythmia types are not dangerous, others, such as ventricular tachycardia, cause sudden death. To protect people from this kind of sudden death, researchers work on early warning systems. Arrhythmias are diagnosed via electrocardiogram (ECG), rhythm Holter, event recorder, effort test, echocardiogram, cardiac catheterization and electrophysiological study (EPS). ECG is the most practical of these methods. The ECG device amplifies and filters the electrical signal of the heart, so that heart diseases can be diagnosed easily.

Fig. 1. PQRST signal

There are many important studies in the literature. In [2], researchers allocate manually detected heartbeats to one of the five beat classes recommended by the ANSI/AAMI EC57:1998 standard, i.e., normal beat, ventricular ectopic beat (VEB), supraventricular ectopic beat (SVEB), fusion of a normal beat and a VEB, or unknown beat type. 44 non-pacemaker recordings of the MIT-BIH arrhythmia database are used in the study. Their feature sets are based on ECG morphology, heartbeat intervals and RR intervals. In [3], researchers present a patient-adaptable algorithm for ECG heartbeat classification. This algorithm is based on an automatic classifier and a clustering algorithm. Both the classifier and the clustering algorithm include features from the RR interval series and morphology descriptors calculated from the wavelet transform. The algorithm was comprehensively evaluated on several ECG databases for comparison purposes. In [4], the authors developed an adaptive system for the automatic processing of the ECG that classifies heartbeats into one of the five beat classes recommended by the ANSI/AAMI EC57:1998 standard. With this study they illustrate the ability to provide a beneficial automatic arrhythmia monitoring system. In [5], researchers used Hidden Markov Modeling (HMM). QRS complexes and R-R intervals were used in the model. The Hidden Markov Modeling approach combines structural and statistical knowledge of the ECG signal in a single parametric model. They estimated model parameters from training data using an iterative, maximum likelihood reestimation algorithm. In [6], researchers developed an algorithm based on the support vector machine (SVM). They applied two different preprocessing methods: higher order statistics (HOS) and Hermite characterization of the QRS complex. They obtain two neural classifiers by combining the SVM network with these preprocessing methods. They reported the results of numerical experiments for the recognition of 13 heart rhythm types on the basis of ECG waveforms. In [7], researchers used the MIT-BIH database and worked on 4 arrhythmia classes. They obtained a 95.9% accuracy rate. In [8], the wavelet transform is used and a 98% accuracy rate is obtained; 1200 test and 1200 training data points from 6 classes are used. In [9], researchers used artificial neural networks on the MIT-BIH database and obtained a 92% accuracy rate. In [10], the Support Vector Machines (SVM) algorithm is used and signals from the MIT-BIH database are classified with a 99% accuracy rate. In [11], the wavelet transform is used on 3 classes selected from the MIT-BIH database. The accuracy rate of that algorithm is 97%.

In this study we use ECG data obtained from the MIT-BIH database. We select 5 classes: normal beat (N), right bundle branch block (RBBB), left bundle branch block (LBBB), atrial premature contraction (APC) and ventricular premature contraction (VPC). We apply the k-means based Polyhedral Conic Functions (k-means PCF) algorithm to classify instances. Section II gives a brief description of the k-means PCF algorithm. Section III describes the data handling and preprocessing procedures.
We present numerical experiments in Section IV and conclusions in Section V.

II. PCF BASED CLASSIFICATION ALGORITHMS
The concept of polyhedral conic separability, based on polyhedral conic functions (PCFs), was first introduced in [12] (see also [13]). An algorithm for computing polyhedral conic functions that separate two sets was developed in [12]. This algorithm randomly chooses a data point from one of the sets as the first vertex and computes the first PCF. Then all data points of this set that are separated by the obtained PCF are removed from the set, and the next vertex is randomly selected from the remaining points. This process continues until all points of the selected set are separated. The classifier is constructed as the pointwise minimum of all obtained PCFs. Despite some promising results, such an approach may suffer from over-fitting. This algorithm was also used for arrhythmia classification by the authors in [14].

Another algorithm, based on a biobjective integer programming approach, was introduced in [13]. The objectives in this approach are to minimize the number of PCFs separating the sets and to maximize the number of correctly classified points. Although this algorithm still over-fits on some data sets, it reduces the problem in comparison with the first algorithm. However, it is time consuming on large data sets.

There are also other PCF based classification algorithms. A piecewise linear classifier based on polyhedral conic and max-min separabilities was introduced in [15], and an incremental piecewise linear classifier based on polyhedral conic separation in [16].

A. k-means based PCF Algorithm
In this approach the classifier is designed by combining the polyhedral conic separation approach with the k-means clustering technique [17]. The k-means algorithm is applied to find the vertices of the PCFs, and then a PCF is found for each cluster by solving a linear programming problem.
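As a minimal sketch of how such a classifier is evaluated, the following Python snippet computes a single PCF and the pointwise minimum over several PCFs. It assumes the PCF form $g(x) = w^T (x - c) + \xi \|x - c\|_1 - \gamma$ from [12], with vertex $c$, and the convention that a negative value indicates membership of the class. The function names `pcf` and `pointwise_min` and all parameter values are hypothetical and chosen only for illustration; they are not taken from the authors' implementation.

```python
def pcf(x, w, xi, gamma, c):
    """Evaluate g(x) = w.(x - c) + xi * ||x - c||_1 - gamma (assumed PCF form)."""
    diff = [xj - cj for xj, cj in zip(x, c)]
    linear = sum(wj * dj for wj, dj in zip(w, diff))
    l1_norm = sum(abs(dj) for dj in diff)
    return linear + xi * l1_norm - gamma


def pointwise_min(x, pcfs):
    """Classifier value for one class: the minimum over all of its PCFs."""
    return min(pcf(x, *params) for params in pcfs)


# Two hypothetical PCFs given as (w, xi, gamma, c); the vertices c stand in
# for two cluster centers of the same class.
pcfs = [
    ([0.0, 0.0], 1.0, 1.0, [0.0, 0.0]),
    ([0.0, 0.0], 1.0, 1.0, [5.0, 5.0]),
]

inside = pointwise_min([0.2, -0.3], pcfs)   # near the first vertex
outside = pointwise_min([2.5, 2.5], pcfs)   # far from both vertices
print(inside < 0, outside > 0)
```

With these illustrative parameters each PCF is negative inside an L1 ball of radius 1 around its vertex, so the pointwise minimum is negative on the union of the two balls. This is why several PCFs with different vertices can describe a class that is not a single convex region.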
The k-means PCF classifier differs from those given in [12, 13], where the final classifier is obtained by sequentially eliminating the correctly classified points; in this algorithm the classifier is constructed in one step using the cluster centers found by the k-means algorithm. Using a clustering algorithm significantly decreases the number of vertices, and consequently the number of PCFs, which helps to avoid over-fitting. Moreover, the use of linear programming techniques makes the algorithm applicable to large datasets.

The k-means based PCF algorithm can be summarized as follows. Assume that we are given a finite point set $A \subset \mathbb{R}^n$ with $p$ classes. More specifically, the set $A = \{a^i \in \mathbb{R}^n : i \in I\}$, $I = \{1, 2, \dots, m\}$, and its classes $A_j$, $j = 1, \dots, p$, are given. For each $A_j$ we construct the following set:

$$B_j = \bigcup_{t = 1,\, t \neq j}^{p} A_t.$$

To solve the classification problem, the dataset is split into two subsets, the training set and the test set, respectively:

$$A^{tr} = \{a^i \in A : i \in I^{tr}\}, \quad I^{tr} \subset I, \qquad A^{te} = A \setminus A^{tr}.$$

Step 0: Set $j := 0$ and select the number of clusters, $k$.

Step 1: Set $j := j + 1$ and select the sets $A_j$ and $B_j$.

Step 2: Apply the k-means algorithm to the set $A_j$ to find $k$ clusters $A_{js}$ and their centroids $c^{js}$, $s = 1, \dots, k$.

Step 3: Find the $k$ PCFs $g_{js}(x)$, $s = 1, \dots, k$, with the parameters $(w^{js}, \xi^{js}, \gamma^{js})$ for class $j$ by solving the linear programming problem $(P_{js})$ for each cluster $A_{js}$:

$$
\begin{aligned}
(P_{js}) \quad \min \quad & \frac{1}{|A_{js}|} \sum_{i \in I_{js}} y_i + \frac{1}{|B_j|} \sum_{l \in L_j} z_l \\
\text{s.t.} \quad & (w^{js})^T (a^i - c^{js}) + \xi^{js} \|a^i - c^{js}\|_1 - \gamma^{js} + 1 \le y_i, \quad \forall i \in I_{js}, \\
& -(w^{js})^T (b^l - c^{js}) - \xi^{js} \|b^l - c^{js}\|_1 + \gamma^{js} + 1 \le z_l, \quad \forall l \in L_j, \\
& y_i \ge 0, \quad z_l \ge 0,
\end{aligned}
$$

where $I_{js} = \{i : a^i \in A_{js}\}$ and $L_j = \{l : b^l \in B_j\}$.

Step 4: Construct the separating function for class $j$ as follows:

$$g_j(x) = \min_{s = 1, \dots, k} g_{js}(x).$$

Step 5: If $j < p$ go to Step 1; otherwise the algorithm terminates.

Fig. 2. Illustrative example dataset with 3 classes [17].

Fig. 3. Separating functions for different clusters [17].

Fig. 4. Final classifier for class-green [17].

III. DATASET
The MIT-BIH Arrhythmia Database contains 48 half-hour excerpts of two-channel ambulatory ECG recordings, obtained from 47 subjects studied by the BIH Arrhythmia Laboratory between 1975 and 1979. 25 of the subjects are men aged 32 to 89 and 22 are women aged 23 to 89. Twenty-three recordings were chosen at random from a set of 4000 24-hour ambulatory ECG recordings collected from a mixed population of inpatients (about 60%) and outpatients (about 40%) at Boston's Beth Israel Hospital; the remaining 25 recordings were selected from the same set to include less common but clinically significant arrhythmias that would not be well represented in a small random sample [18]. The collected analog data were digitized with an analog-to-digital converter (ADC), and the signals were passed through a 0.1-100 Hz band-pass filter.

In this study we select 100 PQRST signals for each class from MIT-BIH files 100, 106, 109, 111, 114, 116, 119, 124, 200, 207, 209, 212 and 214. First, the continuous signal is cropped into windows. In the cropping process the R peaks are located, and the signal is then cropped from 61 samples to the left of each R peak to 38 samples to its right. In this way we get vectors with 100 features. The R-to-R peak interval is very important and characteristic for arrhythmic ECG signals, and most studies use this information. For this reason we add the R-to-R peak distance to all vectors, so we get data vectors with 101 features.

Fig. 5. All collected samples from MIT-BIH database

In addition to these steps, a median filter is used to eliminate the baseline voltage. This step reduces in-class distances and eliminates noise in the signals.

Fig. 6. Example signals before median filtering
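The preprocessing described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' Matlab code: the baseline is removed by subtracting a running median, the kernel width of 5 samples is arbitrary, filtering is done before cropping, and the R-to-R feature is taken as the distance to the previous R peak. The names `remove_baseline` and `beat_vector` are hypothetical.

```python
from statistics import median


def remove_baseline(signal, kernel=5):
    """Subtract a running median (a baseline estimate) from the signal.

    Assumption: the kernel width is illustrative, not a value from the paper.
    """
    half = kernel // 2
    baseline = [
        median(signal[max(0, i - half): i + half + 1])
        for i in range(len(signal))
    ]
    return [s - b for s, b in zip(signal, baseline)]


def beat_vector(signal, r_peaks, idx, left=61, right=38):
    """Crop 61 samples left and 38 right of R peak `idx`, then append the RR distance."""
    r = r_peaks[idx]
    if idx < 1 or r - left < 0 or r + right >= len(signal):
        raise ValueError("R peak too close to the signal boundary")
    window = signal[r - left: r + right + 1]   # 61 + 1 + 38 = 100 samples
    rr = r - r_peaks[idx - 1]                  # assumed: distance to previous R peak
    return window + [rr]


# Synthetic stand-in for an ECG trace: flat baseline at 2.0 with unit spikes
# at two hypothetical R peak positions.
signal = [2.0] * 400
r_peaks = [100, 250]
for r in r_peaks:
    signal[r] = 3.0

filtered = remove_baseline(signal)
vec = beat_vector(filtered, r_peaks, 1)
print(len(vec), vec[61], vec[-1])
```

Each resulting vector has 100 waveform samples followed by one interval feature, matching the 101 features described in the text. The R peak always lands at index 61 of the window, which keeps beats aligned across instances.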
Fig. 7. Example signals after median filtering

IV. COMPUTATIONAL RESULTS
The dataset obtained from the MIT-BIH database includes 500 instances and 101 features, as mentioned in the previous section. The preprocessing steps were carried out in Matlab. The proposed algorithm was implemented in C++. Many papers on arrhythmia classification use the MIT-BIH database, but researchers handle the data and choose PQRST signals in different ways. Comparing accuracies with those papers can give an idea of the success of the algorithm, but only using exactly the same dataset gives a fair comparison. For this reason we compared the computational results with other well-known classifiers using Weka. Test accuracies are given in Table I. Accuracies are calculated with 10-fold cross validation.

TABLE I. TEST ACCURACIES OF ALGORITHMS

Method         Accuracy
k-means PCF    98.0 %
J48            93.6 %
Logistic       96.2 %
SMO            96.0 %
KStar          97.0 %
IBk            98.8 %
Bagging        94.6 %
BayesNet       82.4 %

One can see that the best result is obtained by the IBk algorithm. The second most successful algorithm is k-means PCF, the proposed one. We do not report running times, because all of the algorithms solved the problem in a short time.

V. CONCLUSIONS
In this paper we applied the k-means based PCF algorithm to arrhythmia classification. MIT-BIH, an arrhythmia database commonly chosen by researchers, is used to collect PQRST signals. With this research we show that the k-means based PCF algorithm is successful in classifying arrhythmias. In the numerical tests the IBk algorithm is the most successful one and the proposed approach is the second among all the well-known algorithms compared. In future work, we may investigate ways of implementing this algorithm in real-time embedded systems.

ACKNOWLEDGMENT
The authors would like to thank the anonymous referees for their criticism and comments, which helped to improve the quality of the paper. The authors also thank cardiologist Dr.
Özcan Yücel for his help in analyzing the ECG signals, and Prof. Dr. Ömer Nezih Gerek for his guidance in signal processing. This study was supported by Anadolu University Scientific Research Projects Commission under the grant no: 1103f035.

REFERENCES
[1] F. Hu, M. Jiang, L. Celentano and Y. Xiao, "Robust medical ad hoc sensor networks (MASN) with wavelet-based ECG data mining," Ad Hoc Networks, vol. 6, pp. 986-1012, September 2008.
[2] P. De Chazal, M. O'Dwyer and R. B. Reilly, "Automatic classification of heartbeats using ECG morphology and heartbeat interval features," IEEE Transactions on Biomedical Engineering, vol. 51, pp. 1196-1206, July 2004.
[3] M. Llamedo and J. P. Martinez, "An Automatic Patient-Adapted ECG Heartbeat Classifier Allowing Expert Assistance," IEEE Transactions on Biomedical Engineering, vol. 59, pp. 2312-2320, August 2012.
[4] P. De Chazal and R. B. Reilly, "A Patient-Adapting Heartbeat Classifier Using ECG Morphology and Heartbeat Interval Features," IEEE Transactions on Biomedical Engineering, vol. 53, pp. 2535-2543, December 2006.
[5] D. A. Coast, R. M. Stern, G. G. Cano and S. A. Briller, "An approach to cardiac arrhythmia analysis using hidden Markov models," IEEE Transactions on Biomedical Engineering, vol. 37, pp. 826-836, September 1990.
[6] S. Osowski, L. T. Hoai and T. Markiewicz, "Support vector machine-based expert system for reliable heartbeat recognition," IEEE Transactions on Biomedical Engineering, vol. 51, pp. 582-589, April 2004.
[7] Y. H. Hu, S. Palreddy and W. J. Tompkins, "A patient-adaptable ECG beat classifier using a mixture of experts approach," IEEE Transactions on Biomedical Engineering, vol. 44, pp. 891-900, September 1997.
[8] E. Uslu and G. Bilgin, "Classification of heart arrhythmias by using wavelet and merged wavelet packet transforms," IEEE 16th Signal Processing, Communication and Applications Conference (SIU), September 2008.
[9] S. G. Artis, R. G. Mark and G. B. Moody, "Detection of Atrial Fibrillation Using Artificial Neural Network," Computers in Cardiology Proceedings, September 1991.
[10] B. M. Asl, S. K. Setarehdan and M. Mohebbi, "Support vector machine-based arrhythmia classification using reduced features of heart rate variability signal," Artificial Intelligence in Medicine, vol. 44, pp. 51-64, September 2008.
[11] A. R. Sahab and Y. M. Gilmalek, "ECG arrhythmias classification using wavelet transform and neural networks," Proceedings of the 2010 international conference on Mathematical models for engineering science, pp. 256-258.
[12] R. N. Gasimov and G. Ozturk, "Separation via polyhedral conic functions," Optimization Methods and Software, vol. 21, pp. 527-540, 2006.
[13] G. Ozturk, "A New Mathematical Programming Approach to Solve Classification Problems," PhD thesis, Eskisehir Osmangazi University, Institute of Science, 2007. (in Turkish).
[14] E. Cimen, "Arrhythmia Classification via Polyhedral Conic Functions," bachelor degree final project, Anadolu University, Faculty of Engineering, June 2011.
[15] A. M. Bagirov, J. Ugon, D. Webb, G. Ozturk and R. Kasımbeyli, "A novel piecewise linear classifier based on polyhedral conic and max-min separabilities," TOP, vol. 21, pp. 3-24, April 2013.
[16] G. Ozturk, A. M. Bagirov and R. Kasımbeyli, "An incremental piecewise linear classifier based on polyhedral conic separation," Machine Learning, vol. 101, pp. 397-413, October 2015.
[17] G. Ozturk and M. T. Ciftci, "Clustering based polyhedral conic functions algorithm in classification," Journal of Industrial and Management Optimization, vol. 11, pp. 921-932, July 2015.
[18] MIT-BIH Arrhythmia Database, physionet.org/physiobank/database/mitdb/, September 2016.