Comparative Study of Classification Techniques (SVM, Logistic Regression and Neural Networks) to Predict the Prevalence of Heart Disease


International Journal of Machine Learning and Computing, Vol. 5, No. 5, October 2015

Comparative Study of Classification Techniques (SVM, Logistic Regression and Neural Networks) to Predict the Prevalence of Heart Disease

Divyansh Khanna, Rohan Sahu, Veeky Baths, and Bharat Deshpande

Abstract: This paper presents a comparative study of commonly used machine learning algorithms for predicting the prevalence of heart disease. It uses the publicly available Cleveland Dataset and models the classification techniques on it. It brings out the differences between the models and evaluates their accuracy in predicting heart disease. We show that less complex models, such as logistic regression and support vector machines with a linear kernel, give more accurate results than their more complex counterparts. We use the F1 score and ROC curves as evaluation measures. Through this effort, we aim to provide a benchmark and improve on earlier ones in the field of heart disease diagnostics using machine learning classification techniques.

Index Terms: Cleveland heart disease dataset, classification, SVM, neural networks.

Manuscript received December 8, 2014; revised April, 2015. Divyansh Khanna, Rohan Sahu, and Bharat Deshpande are with the Department of Computer Science, Birla Institute of Technology and Sciences, Pilani, Goa Campus, 403726 India (e-mail: divyanshkhanna09@gmail.com, rohan9605@gmail.com, bmd@goa.bits-pilani.ac.in). Veeky Baths is with the Department of Biological Science, Birla Institute of Technology and Sciences, Pilani, Goa Campus, 403726 India (e-mail: veeky@goa.bits-pilani.ac.in).

DOI: 10.7763/IJMLC.2015.V5.544

I. INTRODUCTION

An important aspect of medical research is the prediction of various diseases and the analysis of the factors that cause them. In this work, we focus on heart disease, specifically the University of California, Irvine (UCI) heart disease dataset. Various researchers have investigated this dataset for better prediction measures. Through our effort, we bring out a comparative understanding of different algorithms in estimating heart disease accurately.

The plan of this paper is as follows. Section II provides an insight into the dataset used. Section III covers past research in this field. Section IV gives an overview of the classification models implemented; it tries to give an understanding of the working of the models and what makes them so successful. Section V describes the performance evaluation mechanisms frequently employed in this field of analysis. Section VI has our results, and Section VII concludes the paper with a summary of findings and future research directions.

II. DATASET DETAILS

The dataset used in our study is the publicly available Cleveland Heart Disease Dataset from the UCI repository [1]. The UCI heart disease dataset consists of a total of 76 attributes. However, the majority of existing studies have used the processed version of the data, consisting of 303 instances with only 14 attributes. Different datasets have been based on the UCI heart disease data. Computational intelligence researchers, however, have mainly used the Cleveland dataset consisting of 14 attributes. The 14 attributes of the Cleveland dataset, along with their values and data types, are as follows:

- Age: in years
- Sex: male, female
- Chest pain type: (a) typical angina (angina), (b) atypical angina (abnang), (c) non-anginal pain (notang), (d) asymptomatic (asympt). These are denoted by numbers 1 to 4
- Trestbps: patient's resting blood pressure in mm Hg at the time of admission to the hospital
- Chol: serum cholesterol in mg/dl
- Fbs: Boolean measure indicating whether fasting blood sugar is greater than 120 mg/dl (1 = true; 0 = false)
- Restecg: electrocardiographic results during rest
- Thalach: maximum heart rate achieved
- Exang: Boolean measure indicating whether exercise-induced angina has occurred
- Oldpeak: ST depression induced by exercise relative to rest
- Slope: the slope of the ST segment for peak exercise
- Ca: number of major vessels (0-3) colored by fluoroscopy
- Thal: the heart status (normal, fixed defect, reversible defect)
- Class attribute: the value is either healthy or heart disease (sick type: 1, 2, 3, and 4). For our purposes, we indicate heart disease by 1 and healthy by 0.

For the purpose of this research, the multi-class classification problem is converted to a binary classification problem. This facilitates better application of the models and also gives a better outlook on the overall problem statement at hand. For this study, the data was split into two equal parts, i.e., training data and testing data. The models were trained on one half and, after selection of parameters through cross-validation, tested for accuracy on the test data. This keeps a sufficient amount of data from biasing the models, giving a completely fresh perspective for testing.
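The binary relabelling and the equal train/test split described in this section can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the records are made-up toy rows (age, cholesterol, original class value), the helper names `binarize_target` and `split_in_half` are hypothetical, and the seed is arbitrary. The study's own preprocessing code is not given in the paper.

```python
import random

def binarize_target(num):
    """Map the original class value (0 = healthy, 1-4 = sick) to 0/1."""
    return 0 if num == 0 else 1

def split_in_half(rows, seed=0):
    """Shuffle a copy of rows and split it into two halves."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]

# Hypothetical toy records: (age, chol, original class value 0-4).
records = [(63, 233, 0), (67, 286, 2), (67, 229, 1), (37, 250, 0),
           (41, 204, 0), (56, 236, 3), (62, 268, 4), (57, 354, 0)]

# Collapse the four "sick" levels into a single positive class.
labelled = [(age, chol, binarize_target(num)) for age, chol, num in records]

# One half for training (and cross-validation), the other held back for testing.
train, test = split_in_half(labelled, seed=42)
```

With an odd number of rows, such as the 303 instances of the processed dataset, the two halves produced by this sketch differ in size by one.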

III. PAST RESEARCH

Over the past years, a lot of work has gone into better and more accurate models for the Heart Disease Dataset. The work by Nahar J. et al. (2013) [2] gives a knowledge driven approach. Initially, Logistic Regression was used by Dr. Robert Detrano to obtain 77% accuracy (Detrano, 1989 [3]). Newton Cheung utilized C4.5, Naive Bayes, BNND and BNNF algorithms and reached classification accuracies of 81.11%, 81.48%, 81.11% and 80.96%, respectively (Cheung, 2001 [4]). Polat et al. proposed a method that uses an artificial immune system (AIS) and obtained 84.5% classification accuracy (Polat et al., 2005 [5]). More results were reported using the ToolDiag and WEKA tools. Our study has utilized Python and supporting machine learning libraries.

In the case of medical data diagnosis, many researchers have used 10-fold cross-validation on the total data and reported the result for disease detection, while other researchers have not used this method for heart disease prediction. For our work, we have used a train-test split along with cross-validation for optimal parameter selection.

IV. BRIEF INTRODUCTION TO MODELS IMPLEMENTED

In this section, we give an understanding of the techniques we have used in this study. We discuss Logistic Regression, Support Vector Machines and Neural Networks. Along with each of them, we provide the implementation details of the models, such as the cross-validation scheme and the number of hidden units used for prediction of results. Throughout this section we try to maintain a balance between intuitive understanding and mathematical formulation, though the former overshadows the latter in certain cases for better expression of ideas.

A. Logistic Regression

Logistic Regression is a standard classification technique based on the probabilistic statistics of the data. It is used to predict a binary response from one or more predictor variables. Let us assume our hypothesis is given by $h_\theta(x)$. We will choose:

$$h_\theta(x) = g(\theta^T x) = \frac{1}{1 + e^{-\theta^T x}} \tag{1}$$

where

$$g(z) = \frac{1}{1 + e^{-z}} \tag{2}$$

is called the logistic function or the sigmoid function. Assuming all the training examples are generated independently, it is easier to maximize the log likelihood. Similar to the derivation for standard linear regression, we can use gradient ascent on the log likelihood (equivalently, gradient descent on its negative) to reach the optimum. The updates are given by

$$\theta := \theta + \alpha \nabla_\theta \ell(\theta)$$

where $\ell$ is the log likelihood function and $\alpha$ is the learning rate.

In our use of logistic regression, we applied L2 regularization along with 5-fold and 10-fold cross-validation on the training dataset. The LR model gives a test data accuracy of 86%-88% and an F1 score of 0.87-0.88.

B. Support Vector Machines

Support Vector Machines (SVMs) are among the most popular and effective machine learning algorithms, widely used in classification and recognition tasks in supervised learning. They have a very strong theoretical background that makes them indispensable in this field. The basic idea behind SVMs is as follows: there is some unknown and non-linear dependency (mapping, function) y = f(x) between some high-dimensional input vector x and the output y. There is no information about the underlying joint distributions of the input data vectors. The only information available is the training data, which makes SVMs a true member of the supervised learning class of algorithms. SVMs construct a hyperplane that separates two classes. Essentially, the algorithm tries to achieve maximum separation of the classes.

Fig. 1. Linear margin classifier.

Fig. 1 shows a maximal margin classifier for a two-dimensional data problem. The same can be achieved for data of any dimension. The support vectors are those data points which fall on the boundary planes. As the name suggests, these vectors can be understood to be supporting the hyperplane in classifying the data according to the learned classifier. The following is the primal optimization problem for finding the optimal margin classifier:

$$\min_{w,\, b} \ \frac{1}{2}\lVert w \rVert^2 \quad \text{s.t.} \quad y^{(i)}\left(w^T x^{(i)} + b\right) \ge 1, \quad i = 1, \ldots, m \tag{3}$$

We can write the constraints as

$$g_i(w) = -y^{(i)}\left(w^T x^{(i)} + b\right) + 1 \le 0 \tag{4}$$

We now have one such constraint for each training example. We construct the Lagrangian for our optimization problem, take it up as a dual optimization problem and solve it under the KKT conditions. After solving, the decision function becomes

$$w^T x + b = \left(\sum_{i=1}^{m} \alpha_i y^{(i)} x^{(i)}\right)^T x + b = \sum_{i=1}^{m} \alpha_i y^{(i)} \langle x^{(i)}, x \rangle + b \tag{5}$$

If the data is not linearly separable, as in Fig. 2, the function ϕ(·) may be used to map each data point x into a higher dimensional space, and we then try to obtain a maximally separating hyperplane in that space as a classifier. Specifically, given a feature mapping ϕ, we define the corresponding kernel to be

$$K(x, z) = \phi(x)^T \phi(z) \tag{6}$$

Now we can replace the inner product $\langle x, z \rangle$ everywhere in our algorithm by the kernel K(x, z). Given ϕ, we can easily compute the kernel. However, because of the high dimensionality involved, it is computationally very expensive to calculate ϕ(x). The kernel trick is used for obtaining the dot products without explicitly mapping the data points into a higher dimensional space. This helps us evade the curse of dimensionality in a simple way.

Fig. 2. Non-linear margin classifier.

In our implementation, we work with a grid search technique. Grid search trains an SVM with each pair (C, γ) in the Cartesian product of two sets of candidate values (where C and γ are each chosen from a manually specified set of hyperparameter values) and evaluates their performance on a held-out validation set. We search for the hyperparameters of the model which give the least error and the best approximation. We use three SVM kernels: a linear kernel, a polynomial kernel and a radial basis function kernel. The model for each kernel chooses a set of parameters from a given set and fits them using cross-validation. The results vary for different kernels. Mostly, the accuracies on the test data vary in the range of 84%-87%.

C. Neural Networks

Neural Networks are an extremely popular set of algorithms used extensively in all sub-fields of Artificial Intelligence, concisely introduced in [6] and thoroughly explained in [7]. The strength of a connection between neurons i and j is referred to as $w_{ij}$. Basically, a neural network consists of three sets of neurons: the visible set V, the hidden set H, and the output set O. The set V consists of neurons which receive the signals and pass them on to the hidden neurons in set H.
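To make the kernel identity of Eq. (6) in the SVM subsection concrete, the following sketch checks numerically that a degree-2 polynomial kernel gives the same value as an explicit dot product in the mapped space. The feature map `phi`, the 2-D toy points and the value of `gamma` are illustrative assumptions and are not taken from the study; the RBF kernel is included only because it is one of the three kernels the text mentions.

```python
import math

def dot(u, v):
    """Plain inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def phi(x):
    """Explicit degree-2 feature map for a 2-D input (illustrative choice)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)

def poly2_kernel(x, z):
    """K(x, z) = (x . z)^2, computed without ever forming phi(x)."""
    return dot(x, z) ** 2

def rbf_kernel(x, z, gamma):
    """K(x, z) = exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

x, z = (1.0, 2.0), (3.0, -1.0)
explicit = dot(phi(x), phi(z))   # inner product in the mapped 3-D space
implicit = poly2_kernel(x, z)    # same value from the original 2-D inputs
```

For this map the saving is small, but the RBF kernel corresponds to an infinite-dimensional feature space, so there the kernel form is the only practical way to evaluate the inner product.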
In supervised learning, the training set consists of input patterns as well as their correct results in the form of the precise activation of all output neurons. Each neuron accepts a weighted set of inputs and responds with an output, which is the weighted sum plus the bias, processed by an activation function. The learning ability of a neural network depends on its architecture and the algorithmic method applied during training. The training procedure can be stopped once the difference between the network output and the desired (actual) output is less than a certain tolerance value. Thereafter, the network is ready to produce outputs for new inputs that were not used during the learning procedure.

1) Single layer perceptron and back propagation

A single layer perceptron (SLP) is a feed-forward network having only one layer of variable weights and one layer of output neurons. Along with the input layer there is a bias neuron. For better performance, more than one layer of trainable weights is used in the perceptron before the final output layer. An SLP is capable of representing only linearly separable data, using straight lines, whereas a two-stage perceptron is capable of classifying convex polygons by further processing these straight lines.

An extremely important algorithm used to train multi-stage perceptrons with semi-linear activation functions is back-propagation of errors. The idea behind the algorithm is as follows. Given a training example (x, y), we first run a forward pass to compute all the activations throughout the network. Then, for each node i in layer l, an error term $\delta_i^{(l)}$ is computed, which measures how responsible that node is for any errors in the output. For an output node, the error is the direct difference between the network's activation and the true target value, which is given to us. For the intermediate error term $\delta_i^{(l)}$ of hidden unit i in layer l, we use the weighted average of the error terms of the nodes that use $a_i^{(l)}$ as an input. We make use of a momentum value of 0.1. It specifies what fraction of the previous weight change is added to the new weight change.
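The neuron output and the momentum rule described above can be sketched as follows for a single sigmoid output neuron. This is a minimal illustration, not the study's implementation: the learning rate of 0.5, the toy input and the helper names are assumptions, while the momentum of 0.1 is the value quoted in the text. The output error term is taken as the direct difference between activation and target, as the text describes.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def neuron_output(weights, bias, inputs):
    """Weighted sum of the inputs plus the bias, passed through a sigmoid."""
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)

def momentum_step(weights, gradients, prev_changes, learning_rate, momentum):
    """One gradient-descent step; a fraction of the previous change is added."""
    changes = [-learning_rate * g + momentum * pc
               for g, pc in zip(gradients, prev_changes)]
    new_weights = [w + c for w, c in zip(weights, changes)]
    return new_weights, changes

# One toy training example for a single output neuron.
inputs, target = [1.0, 2.0], 1.0
weights, bias = [0.0, 0.0], 0.0
learning_rate, momentum = 0.5, 0.1   # 0.5 is an assumed rate; 0.1 is from the text

activation = neuron_output(weights, bias, inputs)   # sigmoid(0) = 0.5
delta = activation - target                          # output error term
gradients = [delta * x for x in inputs]              # gradient w.r.t. each weight

weights, changes = momentum_step(weights, gradients, [0.0, 0.0],
                                 learning_rate, momentum)
# Re-using the same gradients isolates the effect of the carried-over change.
weights2, changes2 = momentum_step(weights, gradients, changes,
                                   learning_rate, momentum)
```

In the second step each weight change is the plain gradient step plus one tenth of the first change, which is the persistence effect that the momentum factor controls.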
Momentum basically allows a change to the weights to persist for a number of adjustment cycles. The magnitude of the persistence is controlled by the momentum factor. In our implementation, we use 15 sigmoid hidden units for appropriate feature extraction. The final result is calculated over 20 epochs (so that the weights are learned well enough for sensible prediction), with a softmax layer as the output layer of neurons. The model gives test and train data accuracies in the range 83%-85%.

2) Radial basis function network

RBFN is an alternative to the more widely used MLP network and requires less computer time for network training. RBFN consists of three layers: an input layer, a hidden (kernel) layer, and an output layer. The nodes within each layer are fully connected to the previous layer, as elaborated in [8] and Fig. 3. The transfer functions of the hidden nodes are RBFs. An RBF is symmetrical about a given mean or center point in a multi-dimensional space. In the RBFN, a number of hidden nodes with RBF activation functions are connected in a feed-forward parallel architecture. The parameters associated with the RBFs are optimized during network training. The RBF expansion for one hidden layer and a Gaussian RBF with centers $\mu_j$ and width parameters $\sigma_j$ is represented by

$$Y_k(X) = \sum_{j=1}^{H} W_{kj} \exp\left(-\frac{\lVert X - \mu_j \rVert^2}{2\sigma_j^2}\right) \tag{7}$$

where H is the number of hidden neurons in the layer, W are the corresponding layer's weights and X is the input vector. Estimating µ can be a challenge in using RBFNs. The centers can be chosen randomly or estimated using K-Means clustering. In our study, we use K-Means to find the centroids µ for the 15 RBF neurons by fitting on the training data. For σ, we take the standard deviations of the points in each cluster. This also matches the intuition behind RBF activation. These values are then used in the RBF activations of the neural network.

Fig. 3. Radial basis function network.

This model gives test and train data accuracies in the range 78%-84%. The variation occurs because of the selection mechanism of the centers.

3) Generalized regression neural network

A GRNN (Specht 1991 [9]) is a variation of the radial basis neural networks, based on kernel regression networks. Unlike back propagation networks, a GRNN does not require an iterative training procedure. It approximates any arbitrary function between input and output vectors, drawing the function estimate directly from the training data. In addition, it is consistent: as the training set size becomes large, the estimation error approaches zero, with only mild restrictions on the function.

GRNN consists of four layers: input layer, pattern layer, summation layer and output layer. The summation layer has two kinds of neurons, S-summation and D-summation neurons. The S-summation neuron computes the sum of the weighted responses of the pattern layer. The D-summation neuron, on the other hand, calculates the un-weighted outputs of the pattern neurons. The output layer merely divides the output of each S-summation neuron by that of each D-summation neuron, yielding the predicted value for an unknown input vector:

$$Y(x) = \frac{\sum_{i=1}^{n} y_i \, F(x, x_i)}{\sum_{i=1}^{n} F(x, x_i)} \tag{8}$$

$$F(x, x_i) = \exp\left(-\frac{D_i^2}{2\sigma^2}\right) \tag{9}$$

$$D_i^2 = (x - x_i)^T (x - x_i) \tag{10}$$

F is the radial basis function between x, the point of inquiry, and $x_i$, the training samples, which are used as the means. The distance between a training sample and the point of prediction is used as a measure of how well that training sample can represent the point of prediction. As this distance becomes bigger, the value of $F(x, x_i)$ becomes smaller, and therefore the contribution of that training sample to the prediction is relatively small. The smoothness parameter σ is the only parameter of the procedure. For our study, this value was chosen to be 1.30.
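The GRNN prediction of Eqs. (8)-(10) reduces to a kernel-weighted average of the training targets, which the sketch below implements in plain Python. It assumes the Gaussian form F = exp(-D²/(2σ²)) for Eq. (9); the one-dimensional toy data and the fallback to the plain mean when every kernel value underflows to zero are illustrative assumptions, not details from the study.

```python
import math

def grnn_predict(x, train_x, train_y, sigma):
    """Kernel-weighted average of the training targets, Eqs. (8)-(10)."""
    num = 0.0   # S-summation neuron: sum of y_i * F(x, x_i)
    den = 0.0   # D-summation neuron: sum of F(x, x_i)
    for xi, yi in zip(train_x, train_y):
        d2 = sum((a - b) ** 2 for a, b in zip(x, xi))   # Eq. (10)
        f = math.exp(-d2 / (2.0 * sigma ** 2))          # Eq. (9)
        num += yi * f
        den += f
    if den == 0.0:
        # Assumed fallback: the query is so far from every sample that all
        # kernel values underflow, so return the unweighted mean target.
        return sum(train_y) / len(train_y)
    return num / den                                     # Eq. (8)

# Hypothetical 1-D toy data with binary targets.
train_x = [(0.0,), (1.0,), (2.0,), (3.0,)]
train_y = [0, 0, 1, 1]

near_healthy = grnn_predict((0.0,), train_x, train_y, sigma=0.1)
near_sick = grnn_predict((3.0,), train_x, train_y, sigma=0.1)
midpoint = grnn_predict((1.5,), train_x, train_y, sigma=1.3)
```

For binary classification the continuous output can be thresholded, for example at 0.5; that threshold is an assumption here, since the text does not state how the GRNN output was converted to a class label.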
The search for the smoothness parameter has to take several aspects into account, depending on the application the predicted output is used for. We used the holdout method. In the holdout method, one sample of the entire set is removed and, for a fixed σ, GRNN is used to predict this sample from the reduced set of training samples. The squared difference between the predicted value of the removed training sample and the training sample itself is then calculated and stored. This removal and prediction of samples for the chosen σ is repeated for each sample vector. After finishing this process, the mean of the squared differences is calculated for the run. The whole process is then repeated for many different values of σ. This way we obtain the most suitable σ, the one with the least error. GRNN gives a test data accuracy of 89%.

V. PERFORMANCE EVALUATION AND COMPARISON

In this section, we go through the results and comparative measures implemented in the study. In all studies, the comparison techniques play an important role. They define how different models are to be compared and thus whether the predicted results will be useful for further applications. We start with the measures used, followed by a discussion of our findings.

1) F1-Score: It is a very commonly used measure of a test's accuracy. It combines both the precision and the recall of the test to compute the score. Precision is the number of true positives divided by the sum of true positives and false positives. Similarly, recall is the number of true positives divided by the sum of true positives and false negatives, which is the total number of elements that actually belong to the positive class. In binary classification, recall is also referred to as sensitivity. Recall alone is not a sufficient measure (100% sensitivity can be achieved by predicting all the test cases to be positive), so it is usually combined with precision in the form of the F1 score.
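The definitions of precision and recall given above translate directly into code. The sketch below uses made-up label vectors and computes F1 in its standard balanced form, 2PR/(P + R); the second call illustrates the remark that predicting every case as positive reaches 100% recall while precision suffers.

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall and balanced F1 for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2.0 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical test labels and two sets of predictions.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_model = [1, 0, 1, 0, 1, 0, 1, 0]     # one false negative, one false positive
y_all_positive = [1] * len(y_true)     # trivially achieves recall = 1.0

model_scores = precision_recall_f1(y_true, y_model)
trivial_scores = precision_recall_f1(y_true, y_all_positive)
```

On these toy labels the model obtains precision, recall and F1 of 0.75 each, while the all-positive predictor has recall 1.0 but precision 0.5, giving an F1 of about 0.67.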
The F1 score can be interpreted as a weighted average of the precision and recall, where an F1 score reaches its best value at 1 and its worst at 0. For our purpose, we use the balanced F1 score, which is the harmonic mean of precision and recall.

2) ROC: The ROC curve, or receiver operating characteristic, is a graphical plot of the true positive rate (which is the same as sensitivity) against the false positive rate (which is one minus the specificity) at various threshold settings. Our concern is the Area Under the ROC Curve (AUC). It tells how well the test can separate the group being tested into those with and without the disease in question. Recently, the correctness of ROC curves has been questioned because in some cases AUC can be quite noisy as a classification measure. Nevertheless, it gives a good enough result in our case. The larger the area, the better.

In Table I and Table II, we can see the F1 scores of each of the models along with their respective accuracies. A similar intuition is reflected in the ROC curves in Fig. 4 and Fig. 5. The RBF network doesn't fare as well as the other networks.
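As a small illustration of the ROC quantities defined above, the sketch below computes the true and false positive rates at one threshold, and the AUC through its rank interpretation: the fraction of (positive, negative) pairs in which the positive case receives the higher score, with ties counted as one half. The scores and labels are made-up values, and the function names are hypothetical.

```python
def tpr_fpr(y_true, scores, threshold):
    """True and false positive rates when predicting 1 for score >= threshold."""
    tp = sum(1 for t, s in zip(y_true, scores) if t == 1 and s >= threshold)
    fp = sum(1 for t, s in zip(y_true, scores) if t == 0 and s >= threshold)
    n_pos = sum(1 for t in y_true if t == 1)
    n_neg = len(y_true) - n_pos
    return tp / n_pos, fp / n_neg

def roc_auc(y_true, scores):
    """AUC as the share of positive/negative pairs ranked in the right order."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical classifier scores for four test cases.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

auc = roc_auc(y_true, scores)              # 3 of 4 pairs ordered correctly
point = tpr_fpr(y_true, scores, 0.5)       # one point on the ROC curve
```

The pairwise loop is quadratic in the number of test cases, which is acceptable for a test set of about 150 records; a sort-based computation would be preferable for large datasets.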

TABLE I: LOGISTIC REGRESSION AND SVM RESULTS

| Model | Kernel | CV | Accuracy (test data) % | Accuracy (train data) % | F1 score |
|---|---|---|---|---|---|
| Logistic regression | NA | 5-fold | 86.8 | 84.8 | 0.87 |
| Logistic regression | NA | 10-fold | 88.2 | 82.8 | 0.88 |
| SVM | Linear | 5-fold | 87.6 | 84.2 | 0.88 |
| SVM | Linear | 10-fold | 87 | 82 | 0.87 |
| SVM | RBF | 5-fold | 84.8 | 86 | 0.85 |
| SVM | RBF | 10-fold | 84.8 | 86 | 0.85 |
| SVM | Poly | 5-fold | 84.7 | 85.5 | 0.85 |
| SVM | Poly | 10-fold | 84.7 | 85.5 | 0.85 |

TABLE II: NEURAL NETWORK RESULTS

| Model | No. of hidden units | Parameters | Accuracy (test data) % | Accuracy (train data) % | F1 score |
|---|---|---|---|---|---|
| Back propagation | 15 | NA | 83-85 | 83-85 | 0.86 |
| RBF (K-Means) | 15 | NA | 78-84 | 78-84 | 0.8 |
| GRNN | NA | 1.3 | 89 | NA | 0.88 |

Fig. 4. ROC curve for grid search (SVM).

Performance results are presented based on the prediction outcomes on the test set. As is evident from the adjoining tables and plots, Logistic Regression and the SVM approach give better performance, particularly with a linear kernel. Among neural networks, the GRNN method stands out, while the RBF network doesn't prove to be very useful. The number of hidden units also plays a small role in defining the shape of the predictions. For a network with a very large number of hidden neurons, e.g. 150, the training accuracy increases by a significant margin but the F1 score suffers a minor fall. The ROC curve of the RBF network also shows that, for a given set of parameters, its area under the curve is low compared to the other network classification techniques.

Fig. 5. ROC curve for neural networks.

VI. RESULTS AND DISCUSSION

A common approach to reporting the performance of classifiers is to perform 10-fold cross-validation on a provided dataset and report performance results on that dataset. However, this method is expected to be biased toward the training data and may not reflect the expected performance when applied to real-life data.
So, in addition to the generally used 10-fold cross-validation, we have also performed a train-test split on the dataset and then used 10-fold cross-validation to select the best parameters for training.

VII. CONCLUSIONS

This study provides a benchmark for present research in the field of heart disease prediction. The dataset used is the Cleveland Heart Disease Dataset, which is to an extent curated, but is a valid standard for research. This paper has provided details on the comparison of classifiers for the detection of heart disease. We have implemented logistic regression, support vector machines and neural networks for classification. The results suggest that SVM methodologies are a very good technique for accurate prediction of heart disease, especially considering classification accuracy as a performance measure. The Generalized Regression Neural Network gives remarkable results, considering its novelty and unorthodox approach compared to classical models. Overall, for the heart disease dataset, simpler methods like logistic regression and SVM with a linear kernel prove to be more impressive. This study can be further extended by utilizing these results in building technologies for accurate prediction of heart disease in hospitals. It can enhance the capabilities of traditional methods and reduce human error, thereby making a contribution to the science of medical diagnosis and analysis.

ACKNOWLEDGEMENTS

We would like to sincerely thank the Cardiology Department at Goa Medical College for helping us understand the medical details of this project. Their support also helped us validate the 14 attributes chosen for this project. We would also like to thank the Dean of Goa Medical College for allowing us to interact with doctors at GMC.

REFERENCES

[1] UCI. (2010). V. A. Medical Center, Long Beach and Cleveland Clinic Foundation: Robert Detrano, M.D., Ph.D. [Online]. Available: http://archive.ics.uci.edu/ml/machine-learning-databases/heart-disease/heart-disease.names
[2] J. Nahar, Computational intelligence for heart disease diagnosis: A medical knowledge driven approach, Expert Systems with Applications, pp. 96-104, 2013.
[3] R. Detrano, A. Janosi, W. Steinbrunn, M. Pfisterer, J. Schmid, S. Sandhu et al., International application of a new probability algorithm for the diagnosis of coronary artery disease, American Journal of Cardiology, vol. 6, pp. 304-310, 1989.
[4] N. Cheung, Machine learning techniques for medical analysis, B.Sc. Thesis, School of Information Technology and Electrical Engineering, University of Queensland, 2001.
[5] K. Polat, S. Sahan, H. Kodaz, and S. Gunes, A new classification method to diagnosis heart disease: Supervised artificial immune system (AIRS), in Proc. the Turkish Symposium on Artificial Intelligence and Neural Networks (TAINN), 2005.
[6] D. Kriesel. A Brief Introduction to Neural Networks. [Online]. Available: http://www.dkriesel.com/en/science/neural_networks
[7] C. M. Bishop, Neural Networks for Pattern Recognition, Oxford University Press, 1995.
[8] T. Hastie, R. Tibshirani, and J. Friedman, Elements of Statistical Learning, Springer-Verlag, New York, 2001.
[9] D. F. Specht, A general regression neural network, IEEE Transactions on Neural Networks, vol. 2, no. 6, 1991.
[10] Y. S. Abu-Mostafa. Learning with data. [Online]. Available: http://amlbook.com/support.html
[11] V. N. Vapnik, Statistical Learning Theory, John Wiley & Sons, Inc., New York, 1998.
[12] S. A. Hannan, R. R. Manza, and R. J. Ramteke, Generalized regression neural network and radial basis function for heart disease diagnosis, International Journal of Computer Applications, vol. 7, no. 13, 2010.

Divyansh Khanna is a student completing his B.E. (hons) in computer science and M.Sc. (hons) in mathematics at BITS Pilani KK Birla Goa Campus. He did his schooling in New Delhi. His research interests include machine learning, neural networks and parallel computing. He has worked as a summer technology intern in a tech firm in Hyderabad and will be working towards his thesis. His current projects include porting popular machine learning algorithms to a parallelized implementation to increase speedup. He was awarded the Inspire Scholarship for meritorious performance in the national level board examination.

Rohan Sahu is a computer science graduate from BITS-Pilani, Goa. He has worked at Oracle Bangalore in the field of database development. He has spent the last year working on machine learning and data science projects, including a research stint at BITS-Pilani, Goa, where he worked on applying machine learning to the field of healthcare. Rohan is currently working at a digital health firm in Gurgaon, India, where he handles analytics and pre-sales roles.

Veeky Baths is an associate professor at BITS Pilani Goa. Veeky's research areas and core competences are in the fields of systems biology, machine learning, biological sequence analysis and metabolic network analysis. Veeky is applying graph theory and computational approaches to understand the integrity and robustness of metabolic networks, which will be a great help for knowledge based drug discovery. When complex biological networks are represented as a graph, they become amenable to various types of mathematical analysis and there is a reduction in complexity. Graphs or networks are abstract representations of more complex biological processes. Veeky joined the Department of Biological Sciences in 2005. He obtained his B.Sc. degree from Ravenshaw University and his M.Sc. in bioinformatics from the Orissa University of Agriculture and Technology. He completed his Ph.D. degree in science at BITS Pilani K.K. Birla Goa Campus in 2011. He then obtained an MBA from Goa Institute of Management.

Bharat Deshpande received his Ph.D. degree from IIT Bombay in 1998. He then received a postdoctoral fellowship from the Department of Atomic Energy. Dr. Deshpande joined BITS Pilani in 2001 and moved to BITS Pilani, Goa Campus in 2005. Since 2006 he has been heading the Department of Computer Science & Information Systems at Goa. Apart from basic courses in computer science, he has taught specialized courses such as algorithms, theory of computation, parallel computing, artificial intelligence and a few more. His research interests are in the areas of complexity theory, parallel algorithms, and data mining. Over the years he has supervised numerous masters and doctoral students. He has many national and international publications to his credit. He is also on the board of studies of the University of Goa and College of Engineering, Pune. Dr. Deshpande was also the vice president of the Goa Chapter of the Computer Society of India and is currently vice president of the ACM Professional Goa Chapter.