LAWRA: a layered wrapper feature selection approach for network attack detection


SECURITY AND COMMUNICATION NETWORKS
Security Comm. Networks 2015; 8: Published online 26 May 2015 in Wiley Online Library (wileyonlinelibrary.com)

RESEARCH ARTICLE

LAWRA: a layered wrapper feature selection approach for network attack detection

Sangeeta Bhattacharya* and Subramanian Selvakumar
Department of Computer Science and Engineering, National Institute of Technology, Tiruchirappalli, Tamil Nadu, India

ABSTRACT

The feature selection phase in network attack detection is mostly classifier based, while clustering techniques are used for labeling and creating compact training datasets. Because clustering finds natural groupings in the data, in this paper a clustering-based layered wrapper feature selection approach, LAWRA, is proposed for selecting appropriate features for attack detection. The existing layered feature selection approaches in attack detection are unable to give results with high precision and recall because of their dependence on classifier accuracy, fitness value, and so on. Hence, LAWRA uses two external cluster validity indices, F-measure and the Fowlkes–Mallows index, for feature selection. The two indices are the harmonic and geometric mean of precision and recall, respectively, so each identifies features that give the attack detection algorithm both high precision and high recall. The first layer of LAWRA identifies the feature subset that best distinguishes between normal and attack instances, and the second layer identifies the best cooperating features using cooperative game theory. Experiments have been conducted on the NSL-KDD dataset, and LAWRA has been compared with the existing approaches using different classifiers. The results show that LAWRA gives better overall accuracy and F-measure than the other approaches. Copyright © 2015 John Wiley & Sons, Ltd.
KEYWORDS
network attack; feature selection; layered; wrapper; external cluster validity indices

*Correspondence
Sangeeta Bhattacharya, Department of Computer Science and Engineering, National Institute of Technology, Tiruchirappalli, Tamil Nadu, India. @nitt.edu

1. INTRODUCTION

The Internet is very important in connecting people throughout the world; its usage increased 10-fold from 1999 to 2013 [1]. The increased Internet usage has given rise to different types of network attacks and cybercrimes. It can be seen from [2] that, round the clock, cyber attacks are being carried out from different countries. Such cybercrime accounts for an annual cost of almost 400 billion dollars to the global economy [3], which demands efficient detection mechanisms. It has been shown that the efficiency of detection algorithms, in terms of computational complexity and accuracy, depends highly on the selected feature set [4]. Thus, selecting appropriate features is one of the important data-preprocessing steps in attack detection. Feature selection can be achieved using filter and wrapper methods [5]. While filter methods evaluate each feature independently based on some goodness criteria, wrapper methods [6] evaluate the possible feature subsets with respect to a particular learning algorithm. As such, wrapper methods are computationally expensive but result in a better feature subset. Feature selection can be supervised, where features are selected by checking feature relevancy with respect to the class. In unsupervised feature selection methods, class labels are not available, and feature relevancy is calculated using data variance and partition quality. In the intrusion and attack detection paradigm, feature selection has mostly been supervised [7–13]. Also, it is seen that the existing feature selection approaches are based on a classifier's performance, thus biasing the feature selection phase towards that particular classifier.
Such feature selection approaches, when used with other classifiers, result in high false alarms. Unsupervised clustering-based feature selection algorithms have not been much explored in attack detection. Clustering has mostly been applied for detecting intrusions and attacks, if any [14,15], and in the data-preprocessing stage for obtaining labels of unlabeled datasets and creating compact datasets [11,16,17]. Feature selection using clustering can be achieved by internal or external validity indices [18–22]. While internal indices use intra-cluster similarity to evaluate a clustering result, external validity indices are used for validating the clustering result against a ground truth such as a reference cluster or the existing labels of the dataset [23]. The main aim of this paper is to identify features for attack detection from the available labeled attack datasets. Hence, in this paper, a clustering-based layered wrapper feature selection approach, LAWRA, is proposed using two external validity indices, F-measure (FM) and the Fowlkes–Mallows index (FMI). A layered approach helps in identifying features possessing various properties, such as relevancy with the class and cooperativeness among the features, so as to improve the efficiency of the classifiers. FM and FMI are the harmonic and geometric mean of precision and recall, respectively. Precision is the proportion of actual attacks among all the instances flagged as attacks by the detection algorithm, and recall is the proportion of attack instances that were correctly identified. Hence, a feature set identified using FM and FMI increases the efficiency of a detection algorithm in detecting attacks. Also, it is shown in [24] that FM and FMI are good measures for evaluating clustering results on large datasets. The first layer of the proposed approach, LAWRA, helps in identifying the feature subset with the best distinguishing ability between normal and attack instances, while the second layer identifies the features with good coalition ability with other features using a cooperative game theory framework [25]. The clustering algorithm used for feature selection in the proposed approach is one that produces mutually exclusive clusters, where each instance belongs to a single cluster. In this paper, the NSL-KDD dataset [26] has been used for the experiments to analyze the performance of the proposed approach. The WEKA tool [27] and MATLAB R2012a (The MathWorks, Inc., Natick, Massachusetts, USA) have been used to carry out all the experiments.
The rest of the paper is organized as follows: In Section 2, a review of the related work is given. Section 3 gives the motivation for this research. Section 4 describes the proposed model and the proposed algorithms. Section 5 discusses the results of the experiments performed. Finally, Section 6 presents the conclusion and the ongoing work.

2. RELATED WORK

2.1. Layered feature selection approach in attack detection

In [28], a bilayer feature selection approach was used where the two layers select features based on the global maxima and local maxima of classification accuracy, thus making the feature selection classifier dependent. In [29], two layers were also used, selecting features based on the Pearson correlation between the features and the class; a fitness value was used to rank the features. The drawback of this scheme is that it is difficult to determine a suitable fitness value.

2.2. Cluster validity indices for feature selection

Unsupervised feature selection can be categorized into two approaches: feature clustering and data clustering. In this paper, the second approach is followed. In [18], a wrapper unsupervised feature-clustering approach has been proposed where two internal validity indices, viz. the scatter separability and maximum likelihood criteria, have been used for evaluating the candidate feature subsets, while in [19], the internal validity index Silhouette criterion has been used for selecting the feature subset. However, for complex clusters, internal indices do not correlate well with the algorithm error. The adjusted Rand index has been used in [20] for ranking the features and selecting the required features; deciding the number of features from a ranking of features may be difficult. In [21], features were ranked according to cluster importance, calculated using different decision tree-based measures on the output of a single-pass clustering algorithm.
However, to identify the feature subset according to feature importance, a rapid changing point needs to be determined, which is difficult. In [22], a generalized feature selection framework was defined using greedy forward selection, and clusters were evaluated using three different validity indices: adjusted Rand, Jaccard, and Fowlkes–Mallows. A threshold was used to determine the feature subset, and determining an appropriate value of the threshold is difficult.

2.3. Cooperative game theory for feature selection

In [30] and [31], a wrapper approach was proposed where features were ranked according to the Shapley value of each feature. The Shapley value was based on classification accuracy, and a feature ranking generated using classification accuracy makes it dependent on the classifier. In [32] and [33], mutual information was used to define the relevancy of the features with the class and the redundancy among the features. While in [32] the features were ranked according to the Shapley value of each feature, in [33] features were ranked by calculating the Banzhaf power of each feature.

3. MOTIVATION

Clustering-based feature selection identifies the features that produce natural partitions of the data and thus enables detection of new attacks. The existing feature selection approaches in attack detection are mostly classifier dependent and, when used with other classifiers, result in high false alarms. This motivated us to propose a classifier-independent, clustering-based feature selection in this paper to overcome the drawback of the existing approaches and to improve the performance of attack detection algorithms.

4. LAWRA: A PROPOSED LAYERED WRAPPER FEATURE SELECTION APPROACH

The proposed feature selection approach, LAWRA, is shown in Figure 1. The three main modules in the proposed approach are the Feature Ranker, the Best Feature Identifier (layer 1), and the Best Feature Correlator (layer 2). These modules are described as follows:

4.1. Feature Ranker module

This module takes as input the original feature set S and returns the ranked feature set S. In this paper, a nonparametric method is used for generating the ranks of the features; nonparametric methods have been used in [34] for feature ranking. As the underlying distribution of the network data is unknown and only two groups, normal and attack, are considered in this paper, the nonparametric Mann–Whitney U-test is best suited. A parameter called effect size [35] is used to rank the features. The effect size ES gives the degree of association between the two groups and is calculated as given in Equation (1):

ES = |z*| / √n   (1)

where z* is the z-score, which indicates whether there exists a significant difference between the two groups, and n is the sum of the sample sizes of the two groups. The effect size is between 0 and 1, and according to [35], an effect size of 0.10 is small, 0.30 is medium, and 0.50 is large. Thus, the features of S are ranked from high effect size to low effect size and stored in the set S.

4.2. Best Feature Identifier (layer 1) module

There are two submodules in this layer: one is F-measure-based feature selection (FFS) and the other is conditional mutual information (CMI)-based feature selection (CMIFS). The two submodules work in parallel and take the ranked feature set S as input.

Figure 1. LAWRA: proposed layered wrapper feature selection approach. CMI, conditional mutual information.
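As an illustration of the Feature Ranker step, the effect size of Equation (1) for one feature can be computed from its values in the two groups. The following Python sketch uses a hypothetical function name, the large-sample z approximation of the Mann–Whitney U-test, and average ranks for ties (no tie correction in the variance):

```python
import math

def effect_size(normal, attack):
    """ES = |z| / sqrt(n), where z is the large-sample z-score of the
    Mann-Whitney U statistic for the two groups and n = n1 + n2."""
    pooled = sorted([(v, 0) for v in normal] + [(v, 1) for v in attack])
    # Assign average ranks to tied values.
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        for k in range(i, j):
            ranks[k] = (i + 1 + j) / 2.0  # mean of ranks i+1 .. j
        i = j
    # Rank sum of the "normal" group gives the U statistic.
    r1 = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    n1, n2 = len(normal), len(attack)
    u = r1 - n1 * (n1 + 1) / 2.0
    mu = n1 * n2 / 2.0
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mu) / sigma
    return abs(z) / math.sqrt(n1 + n2)
```

Well-separated groups yield an ES near the "large" threshold of 0.50, while heavily overlapping groups yield values near zero; the features are then sorted from high ES to low ES.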

F-measure-based feature selection. In this submodule, Algorithm FFS is used to generate the feature subset F_FM. For Algorithm FFS, the external cluster validity criterion FM is used to evaluate each feature. FM is defined as follows: given a cluster i and class j, FM is the harmonic mean of the precision P and recall R of cluster i, as given in Equation (2),

FM = ((β² + 1) · P · R) / (β² · P + R)   (2)

where β = 1, P = the number of data points common to cluster i and class j divided by the number of data points in cluster i, and R = the number of data points common to cluster i and class j divided by the number of data points in class j [23]. Forward selection search is used in Algorithm FFS to add features to the set F_FM from the set S. A feature f is relevant to the set F_FM if FM(F_FM ∪ {f}, L_D) > FM(F_FM, L_D) and redundant if FM(F_FM ∪ {f}, L_D) = FM(F_FM, L_D). L_D denotes the existing class labels of the dataset, which are used for calculating FM.

Algorithm FFS (F-Measure Based Feature Selection)
Input: (i) Dataset D; (ii) ranked feature set S; (iii) class labels of dataset L_D
Output: Feature set F_FM
Steps:
1. Initialize F_FM = {}
2. Initialize BestMeasure = 0
3. for each feature f_i of S, where i = 1, …, n, do
   a) F_I = F_FM ∪ {f_i}
   b) CurrCluster = Cluster(D, F_I)
   c) CurrMeasure = FM(CurrCluster, L_D)
   d) if CurrMeasure > BestMeasure then
      i) F_FM = F_I
      ii) BestMeasure = CurrMeasure

CMI-based feature selection. In this submodule, Algorithm CMIFS is used to generate the feature subset F_CMI, using CMI. The CMI of two random variables X and Y given another random variable Z is computed using Equation (3),

I(X; Y | Z) = H(X | Z) − H(X | Y, Z)   (3)

where H(X | Z) is the conditional entropy of X given Z and H(X | Y, Z) is the conditional entropy of X given both Y and Z [36].
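A minimal sketch of Algorithm FFS in Python follows; here `cluster_fn` is a hypothetical stand-in for the simple k-means clustering step, and `f_measure` is one common matching-based reading of Equation (2) with β = 1, taking the best-matching cluster per class and weighting by class size:

```python
from collections import Counter

def f_measure(cluster_labels, class_labels):
    """For each class, take the best harmonic mean of precision and recall
    over all clusters, weighted by class size (beta = 1)."""
    n = len(class_labels)
    score = 0.0
    for cls, cls_size in Counter(class_labels).items():
        best = 0.0
        for cl, cl_size in Counter(cluster_labels).items():
            common = sum(1 for cu, cy in zip(cluster_labels, class_labels)
                         if cu == cl and cy == cls)
            if common:
                p, r = common / cl_size, common / cls_size
                best = max(best, 2 * p * r / (p + r))
        score += cls_size / n * best
    return score

def ffs(data, ranked_features, labels, cluster_fn):
    """Algorithm FFS: greedy forward selection; keep a candidate feature
    only if it strictly improves the F-measure of the clustering."""
    selected, best = [], 0.0
    for f in ranked_features:
        fm = f_measure(cluster_fn(data, selected + [f]), labels)
        if fm > best:
            selected, best = selected + [f], fm
    return selected
```

A clustering that exactly reproduces the class partition yields an F-measure of 1.0; Algorithm CMIFS follows the same forward-selection skeleton with the CMI criterion in place of FM.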
If f is the feature considered for inclusion in the set F_CMI, such that f ∉ F_CMI, and L_D is the set of class labels of the given dataset, then the mutual information of the set F_CMI with the class L_D should be maximized in the presence of f. Thus, in this paper, the CMI I(F_CMI; L_D | f) is calculated in order to evaluate the impact of the newly added feature f on the information shared by the already selected feature set F_CMI with the class L_D. Hence, f is relevant to the set F_CMI if CMI(F_CMI ∪ {f}, L_D) > CMI(F_CMI, L_D) and redundant if CMI(F_CMI ∪ {f}, L_D) = CMI(F_CMI, L_D). In this algorithm also, features are added to the set F_CMI from the set S using forward selection search.

Algorithm CMIFS (Conditional Mutual Information Based Feature Selection)
Input: (i) Dataset D; (ii) ranked feature set S; (iii) class labels of dataset L_D
Output: Feature set F_CMI
Steps:
1. Initialize F_CMI = {f_1 : f_1 ∈ S}
2. BestMeasure = MutualInformation(F_CMI, L_D)
3. for each feature f_i of S, where i = 2, …, n, do
   a) CurrMeasure = CMI(F_CMI, L_D, f_i)
   b) if CurrMeasure > BestMeasure then
      i) F_CMI = F_CMI ∪ {f_i}
      ii) BestMeasure = CurrMeasure

4.3. Best Feature Correlator (layer 2) module

There are two submodules in this layer: one is the Fowlkes–Mallows-based Banzhaf index calculation and the other is the selection of k features. The input to this module is the feature set F_diff, as shown in Figure 1, and the output is the top k cooperating features.

Fowlkes–Mallows-based Banzhaf index calculation. In this submodule, the external cluster validity index FMI is used within a cooperative game theory framework. This submodule takes the feature set F_diff as input, which is calculated using Equation (4),

F_diff = (F_FM − F_CMI) ∪ (F_CMI − F_FM)   (4)

Cooperative game theory. In cooperative game theory, there are two elements: (i) a set of n players and (ii) a characteristic function v(S) that specifies the value of different subsets of the n players. Each subset of the n players is a coalition in which players come to an agreement.
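The conditional mutual information of Equation (3) can be estimated from discrete (e.g. MDL-discretized) feature and label samples; a Python sketch with hypothetical helper names:

```python
import math
from collections import Counter

def cond_entropy(xs, cond):
    """H(X | C) from paired samples of discrete values."""
    n = len(xs)
    joint = Counter(zip(xs, cond))           # counts of (x, c) pairs
    marginal = Counter(cond)                 # counts of c values
    # H(X|C) = -sum over (x, c) of p(x, c) * log2 p(x | c)
    return -sum((c / n) * math.log2(c / marginal[cond_val])
                for (_, cond_val), c in joint.items())

def cond_mutual_info(xs, ys, zs):
    """I(X; Y | Z) = H(X | Z) - H(X | Y, Z), as in Equation (3)."""
    return cond_entropy(xs, zs) - cond_entropy(xs, list(zip(ys, zs)))
```

When Y fully determines X, the CMI equals H(X | Z); when X and Y are independent given Z, it is zero.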
v(S) is zero for an empty coalition, while for each nonempty coalition, v(S) ∈ ℝ is a scalar value. The contribution of each player to the coalition can be found by the use of a value function, which associates a value with each player. The value of each player in the coalition can be found using the Banzhaf power index [37]. A critical player is one whose leaving a winning coalition turns the coalition into a losing one. The power of a player is proportional to the number of times the player has been identified as critical. In this work, each feature of F_diff is a player. The Banzhaf power index of each feature in the set F_diff is calculated using Algorithm FMBIC. The Banzhaf power index determines how many times a feature has been critical in making a coalition win and hence indicates the cooperativeness of the feature with the different feature coalitions. Thus, a high Banzhaf power index of a feature indicates that the feature is cooperative with most of the feature coalitions and hence with most of the features. Therefore, the higher the Banzhaf power indices of a set of features, the better the detection algorithm (classifier) result.

Algorithm FMBIC. FMI is defined as follows: given a cluster i and class j, FMI is the geometric mean of the precision P and recall R of cluster i, as given in Equation (5),

FMI = √(P · R)   (5)

where P = the number of data points common to cluster i and class j divided by the number of data points in cluster i, and R = the number of data points common to cluster i and class j divided by the number of data points in class j [23]. In Algorithm FMBIC, FMI is used to determine the winning or losing status of a coalition. FMI is first calculated for the original feature set F_diff. Then, a coalition Co of any size is winning when it satisfies Equation (6),

FMI(Co) > FMI(F_diff)   (6)

For a winning coalition, the power of each feature f is calculated as follows: if FMI(Co \ {f}) < FMI(F_diff), then the feature f is considered a critical player and the power of f is increased by 1.
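The critical-player counting described in the prose above can be sketched in Python. In this illustrative stand-in, `score` plays the role of FMI computed on a clustering over a feature subset, `baseline` the role of FMI(F_diff), and the function and parameter names are hypothetical:

```python
from itertools import combinations

def banzhaf_power(features, score, baseline, omega=3):
    """For each feature f, count the coalitions (of size at most omega,
    drawn from the other features) that win with f (score > baseline)
    but lose without it (score < baseline), mirroring the critical-player
    rule stated in the text."""
    power = {f: 0 for f in features}
    for f in features:
        rest = [g for g in features if g != f]
        for size in range(1, omega + 1):
            for combo in combinations(rest, size):
                coalition = set(combo)
                if score(coalition | {f}) > baseline and score(coalition) < baseline:
                    power[f] += 1
    return power
```

In LAWRA, `score` would be the FMI of the simple k-means clustering restricted to the coalition's features, and `baseline` the FMI obtained on the full set F_diff.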
The computational complexity of Algorithm FMBIC is proportional to the number of evaluated feature subsets of F_diff. Hence, in order to reduce the computational complexity, the coalition size for each feature is limited by a parameter ω, as in [32,33]. Although ω may differ for different datasets, the experiments in [32] and [33] have shown that ω = 3 gives good performance for the Support Vector Machine (SVM) classifier for most datasets. Hence, in this work, ω = 3 has been used. The output of this submodule is the Banzhaf power index vector BV of the set F_diff. In this algorithm also, L_D is the set of existing class labels of the dataset, which is used for calculating FMI.

Algorithm FMBIC (Fowlkes–Mallows based Banzhaf Index Calculation)
Input: (i) Dataset D; (ii) feature set F; (iii) class labels of dataset L_D
Output: BV: Banzhaf power index vector of the set F
Steps:
1. CurrCluster = Cluster(D, F)
2. BestMeasure = FMI(CurrCluster, L_D)
3. for each feature f_i of F, where i = 1, …, n, do BI(i) = 0
4. for each feature f_i of F, where i = 1, …, n, do
   a) Create feature subsets C = {c_1, …, c_t} over F \ {f_i}, with subset size limited by ω
   b) for each subset c_j of C, where j = 1, …, t, do
      i) CurrCluster = Cluster(D, c_j)
      ii) CurrMeasure = FMI(CurrCluster, L_D)
      iii) if CurrMeasure > BestMeasure then
         a) F_I = c_j ∪ {f_i}
         b) ICluster = Cluster(D, F_I)
         c) IMeasure = FMI(ICluster, L_D)
         d) if IMeasure < BestMeasure then increment BI(i) by one

Selection of k features. This submodule takes the Banzhaf power index vector BV as input. The process for selecting the value of k is as follows: (i) the feature set F_diff is arranged in decreasing order of BV; (ii) each feature of F_diff is added to the set F sequentially, and the corresponding FM of the set F is calculated.
(iii) The position of the feature f that gives the highest FM value for the set F is the value of k.

Combine. The final feature subset F_F is the combination of the feature set F_C and the k features from layer 2.

5. EXPERIMENTS AND RESULTS

In this section, the dataset used for the experiments, the experimental setup, the preprocessing, and the experimental results are discussed.
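The three-step selection of k can be sketched as follows (Python; `fm_of` is a hypothetical stand-in for clustering the data on a feature set and computing its FM, and `features_by_power` is assumed to be already sorted in decreasing Banzhaf power order):

```python
def select_k(features_by_power, base_features, fm_of):
    """Add F_diff features in decreasing Banzhaf power order and keep the
    prefix whose length k maximises the F-measure of the growing set."""
    best_k = 0
    best_fm = fm_of(list(base_features))
    current = list(base_features)
    for k, f in enumerate(features_by_power, start=1):
        current = current + [f]   # step k adds the k-th strongest feature
        fm = fm_of(current)
        if fm > best_fm:
            best_k, best_fm = k, fm
    return features_by_power[:best_k]
```

With this skeleton, the position of the peak FM value directly determines k, as in the text.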

Table I. Features in NSL-KDD dataset.

Sl. no.  Feature name                  Type      Category
1.   duration                      Cont.     Basic
2.   protocol_type                 Discrete  Basic
3.   service                       Discrete  Basic
4.   flag                          Cont.     Basic
5.   src_bytes                     Cont.     Basic
6.   dst_bytes                     Cont.     Basic
7.   land                          Cont.     Basic
8.   wrong_fragment                Cont.     Basic
9.   urgent                        Cont.     Basic
10.  hot                           Cont.     Content
11.  num_failed_logins             Cont.     Content
12.  logged_in                     Binary    Content
13.  num_compromised               Cont.     Content
14.  root_shell                    Binary    Content
15.  su_attempted                  Binary    Content
16.  num_root                      Cont.     Content
17.  num_file_creations            Cont.     Content
18.  num_shells                    Cont.     Content
19.  num_access_files              Cont.     Content
20.  num_outbound_cmds             Cont.     Content
21.  is_hot_login                  Binary    Content
22.  is_guest_login                Binary    Content
23.  count                         Cont.     Time-based
24.  srv_count                     Cont.     Time-based
25.  serror_rate                   Cont.     Time-based
26.  srv_serror_rate               Cont.     Time-based
27.  rerror_rate                   Cont.     Time-based
28.  srv_rerror_rate               Cont.     Time-based
29.  same_srv_rate                 Cont.     Time-based
30.  diff_srv_rate                 Cont.     Time-based
31.  srv_diff_host_rate            Cont.     Time-based
32.  dst_host_count                Cont.     Connection-based
33.  dst_host_srv_count            Cont.     Connection-based
34.  dst_host_same_srv_rate        Cont.     Connection-based
35.  dst_host_diff_srv_rate        Cont.     Connection-based
36.  dst_host_same_src_port_rate   Cont.     Connection-based
37.  dst_host_srv_diff_host_rate   Cont.     Connection-based
38.  dst_host_serror_rate          Cont.     Connection-based
39.  dst_host_srv_serror_rate      Cont.     Connection-based
40.  dst_host_rerror_rate          Cont.     Connection-based
41.  dst_host_srv_rerror_rate      Cont.     Connection-based

5.1. Dataset used

The dataset used for the experiments in this paper is NSL-KDD, which was created from the KDD Cup 99 intrusion detection dataset. KDD Cup 99 has some well-known problems, as found in [38] and [39]; one more problem is the presence of duplicate records. In NSL-KDD, such duplicate records have been removed, and the dataset has been validated using seven learning algorithms.
In this paper, the datasets KDDTrain+ and KDDTest+, given in [40], have been used as the training and test data for the experiments. The datasets consist of the 41 features shown in Table I and are classified into two classes, normal and anomaly, where the anomaly class represents the attack class. As the main aim is to identify the features capable of distinguishing normal and attack network data, the binary-class dataset was preferred for this work.

5.2. Experimental setup

MATLAB R2012a has been used for implementing the proposed feature selection approach. For the feature selection phase, only the KDDTrain+ dataset has been used. WEKA has been used for verifying the obtained result using four different classifiers: naive Bayes, SVM (LibSVM in WEKA), knn (IBk in WEKA), and AdaBoost. The classifiers have been chosen according to [41]. KDDTrain+ has been used for training the classifiers, and KDDTest+ has been used for testing the performance of the trained models. Ten-fold cross-validation has been used for training the classifiers, which partitions the dataset into 10 equal parts. At each of the 10 folds, 9 partitions were used for training and 1 partition was used for validating the model; the partition used for validation was different for each of the 10 folds.

5.3. Preprocessing

Before using the proposed approach on the KDDTrain+ dataset for feature selection, the numeric features have been normalized to the range [0, 1]. The nominal and binary features have been kept unchanged. Before training the classifiers, the numeric features of both the training and test datasets have been discretized using a static supervised approach based on the minimum description length criterion [42]. Simple k-means has been used as the clustering algorithm. The reasons for using simple k-means are that it is easy to use, it assigns each instance to exactly one cluster, and it can handle numeric features.
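The [0, 1] normalization of the numeric features amounts to per-column min-max scaling; a minimal sketch (hypothetical helper name; how the original experiments handled constant columns is not stated, so mapping them to 0.0 here is an assumption):

```python
def min_max_normalize(column):
    """Scale a numeric feature column to [0, 1]; constant columns map to 0."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]  # assumption: constant feature -> 0.0
    return [(v - lo) / (hi - lo) for v in column]
```

Applying this to each numeric column of KDDTrain+ leaves the nominal and binary features unchanged, as described above.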
Hence, for Algorithms FFS and FMBIC, only the 38 numeric and binary features have been considered, whereas Algorithm CMIFS uses the discretized dataset and therefore considers all 41 features.

5.4. Experimental results

The class labels L_D of the dataset in Algorithms FFS, CMIFS, and FMBIC are the classes, normal and anomaly, of the NSL-KDD dataset.

Results of the proposed approach, LAWRA. In Table I, it can be seen that feature numbers 2, 3, and 4 are nominal in type. Hence, these three features have not been considered for Algorithms FFS and FMBIC, which use simple k-means, but have been considered for Algorithm CMIFS; the nominal features selected by Algorithm CMIFS were retained in the final feature set.

Figure 2. Effect of the features in F_diff: (a) F-measure and (b) overall accuracy of four classifiers.

In order to determine the value of k, the FM of the 20 features in the set F_diff is calculated and plotted in Figure 2a. The process starts with the feature having the highest Banzhaf power index and then incrementally adds features from F_diff, from the highest to the lowest Banzhaf power index. Similarly, Figure 2b plots the overall accuracy of the four classifiers, naive Bayes, SVM, knn, and AdaBoost, for the set F_C as features from F_diff are incrementally added, again from the highest to the lowest Banzhaf power index. The overall accuracy of the classifiers is obtained from 10-fold cross-validation on the training set KDDTrain+. In Figure 2, each step on the x-axis corresponds to the addition of a feature from the set F_diff; for instance, step 12 in Figure 2a represents the inclusion of the 12 features with the highest Banzhaf power index. It can be seen from Figure 2a that FM reaches its highest value at the 15th step, which corresponds to the first 15 features with the highest Banzhaf power index. Hence, in this work, k = 15 is chosen. From Figure 2b, it is observed that the classifiers' overall accuracy converges at the 13th step, which corresponds to the first 13 features having the highest Banzhaf power index, and remains steady for the remaining steps. These classification results agree with the feature set identified by LAWRA and hence validate the selection of the first 15 features with high Banzhaf power index for k. The final feature set returned by LAWRA is {1, 3, 5, 6, 8, 10, 11, 12, 13, 17, 19, 27, 29, 34, 38, 39}, of size 16.

Comparison with the existing approaches.
The performance of the feature set obtained using LAWRA is compared with that of four other approaches: (i) the linear correlation-based approach [29]; (ii) the Feature Vitality Based Reduction Method (FVBRM) algorithm [10], whose feature set is selected using the naive Bayes classifier; (iii) the CfsSubsetEval algorithm [43], which evaluates the worth of a subset of features by considering the individual predictive ability of each feature along with the degree of redundancy between them; and (iv) the ConsistencySubsetEval algorithm [43], which evaluates the worth of a subset of features by the level of consistency in the class values when the training instances are projected onto the subset of features. The search algorithm used for (iii) and (iv) was BestFirst, which searches the space of feature subsets by greedy hill climbing augmented with a backtracking facility. The direction used for searching the feature set is forward [43].

Table II. Overall accuracy and F-measure of different feature selection approaches (rows: Nil, i.e., all 41 features; LC; FVBRM; CfS + BF; ConSub + BF; and LAWRA, the proposed approach; columns: OA and FM for each of the classifiers naive Bayes, SVM, knn, and AdaBoost, and the number of features selected). OA, overall accuracy; FM, F-measure; LC, linear correlation-based; CfS + BF, CfsSubsetEval + BestFirst; ConSub + BF, ConsistencySubsetEval + BestFirst. The values in bold in Table II highlight that the OA and FM of the proposed approach, LAWRA, are better than those of the other approaches.

Accuracy and F-measure comparison. The performance metrics chosen for comparison are as follows:

(i) Overall accuracy = (|TP| + |TN|) / (|TP| + |FN| + |TN| + |FP|)
(ii) FM = 2|TP| / (2|TP| + |FN| + |FP|)

where TP, TN, FP, and FN (true positives, true negatives, false positives, and false negatives) represent the numbers of attack instances correctly classified, normal instances correctly classified, normal instances incorrectly identified as attacks, and attack instances incorrectly identified as normal by the classifiers, respectively. Table II shows the comparison of the overall accuracy and FM of LAWRA with all 41 features and the other approaches for the four classifiers. The overall accuracy was obtained by training the classifiers on the KDDTrain+ dataset using 10-fold cross-validation and then testing on the KDDTest+ dataset. The number of features selected by each approach is shown in the last column of Table II. From Table II, it can be seen that the performance of LAWRA is better than that of the other approaches for most of the classifiers, both in overall accuracy and in FM. The only exception is the SVM performance of FVBRM, which is better than that of LAWRA. For the two classifiers knn and AdaBoost, the performance of the original 41 features is slightly better than that of LAWRA; however, the feature set selected by LAWRA has only 16 features, which reduces the training and testing time compared with 41 features.

Receiver operating characteristic (ROC) curve. Figure 3 shows the ROC curves (true positive rate versus false positive rate) of the four classifiers for the class attack, for the different feature selection approaches. The notations are the same as in Table II.

Figure 3. Receiver operating characteristic (ROC) curves for the class attack: (a) naive Bayes, (b) SVM, (c) knn, (d) AdaBoost.
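The overall accuracy and F-measure used in the comparison can be computed directly from the confusion-matrix counts; a minimal sketch (hypothetical function names, with FM written in its equivalent 2·P·R/(P + R) form):

```python
def overall_accuracy(tp, tn, fp, fn):
    """Fraction of all instances (attack and normal) classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)

def f_measure(tp, fp, fn):
    """Harmonic mean of precision and recall for the attack class,
    equal to 2*TP / (2*TP + FN + FP)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

For example, a classifier with 80 true positives, 90 true negatives, 10 false positives, and 20 false negatives has an overall accuracy of 0.85.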
It can be seen from Figure 3 that for the class attack, LAWRA is better than all the approaches in naive Bayes. For SVM and knn, LAWRA has a higher true positive rate and lower false positive rate than the other approaches initially but not so good afterwards. For AdaBoost, LAWRA and FVBRM have almost the same performance, which is better than the other approaches. 6. CONCLUSION In this paper, a layered wrapper approach of feature selection, LAWRA, has been proposed for attack detection. LAWRA used two external cluster validity indices, FM and Fowlkes Mallows, which are based on precision and recall. The first layer of LAWRA identified the feature subsets that produced good distinction between normal and 3466 Security Comm. Networks 2015; 8: John Wiley & Sons, Ltd.

9 S. Bhattacharya and S. Selvakumar LAWRA attack data. The second layer utilized the first layer information to determine the features with good coalition capability with other features using cooperative game theory framework. LAWRA was compared with some of the existing feature selection approaches used in attack detection. Four classifiers were used for evaluating the performance of the feature selection approaches on NSL- KDD dataset. The experimental results show that LAWRA performs better than the other approaches both in terms of overall accuracy and FM. Also, the ROC curves show that LAWRA has a higher true positive and lower false positive rate for the classifiers. The ongoing work is the design of an attack detection algorithm by using LAWRA. REFERENCES 1. (Accessed on 1 December 2014) 2. (Accessed on 1 December 2014) 3. Net Losses: Estimating the Global Cost of Cybercrime, Economic impact of cybercrime II, Center for Strategic and International Studies, June 2014, McAFee. 4. Mukkamala S, Sung AH. Significant feature selection using computational intelligent techniques for intrusion detection. In Advanced Methods for Knowledge Discovery from Complex, Data Maulik U, Holder LB, Cook DJ (eds). Springer: London, 2005; Chen Y, Li Y, Cheng X, Guo L. Survey and taxonomy of feature selection algorithms in intrusion detection system. Proceedings of Second SKLOIS Conference on Information Security and Cryptology, Beijing, China, 2006; Kohavi R, John GH. Wrappers for feature subset selection. Artificial Intelligence 1997; 97: Li Y, Wang JL, Tian ZH, Lu TB, Young C. Building lightweight intrusion detection system using wrapperbased feature selection mechanisms. Computers & Security 2009; 28: Amiri F, Yousefi MMR, Lucas C, Shakery A, Yazdani N. Mutual information-based feature selection for intrusion detection systems. Journal of Network and Computer Applications 2011; 34(4): Sindhu SSS, Geetha S, Kannan A. 
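Since the two indices used by LAWRA are simply the harmonic and geometric means of precision and recall, they are straightforward to reproduce. The following minimal Python sketch (not the authors' code) computes both from a precision-recall pair:

```python
from math import sqrt

def f_measure(precision, recall):
    """F-measure: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def fowlkes_mallows(precision, recall):
    """Fowlkes-Mallows index: the geometric mean of precision and recall."""
    return sqrt(precision * recall)

# Example: precision 0.9, recall 0.6
print(f_measure(0.9, 0.6))       # → 0.72
print(fowlkes_mallows(0.9, 0.6))  # → ~0.7348
```

Because the harmonic mean never exceeds the geometric mean, F-measure is always at most the Fowlkes-Mallows value for the same precision and recall, and both indices reward a balance between the two rates.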
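This excerpt does not show the exact payoff function used by LAWRA's second layer, but the coalition idea from cooperative game theory can be illustrated with the classic Banzhaf power index on a toy weighted voting game (the weights and quota below are hypothetical, not from the paper):

```python
from itertools import combinations

def banzhaf(n_players, is_winning):
    """Raw Banzhaf index: for each player, the fraction of coalitions of the
    other players in which that player is a swing, i.e. joining turns a
    losing coalition into a winning one."""
    players = range(n_players)
    index = [0.0] * n_players
    for i in players:
        others = [p for p in players if p != i]
        swings = 0
        for r in range(len(others) + 1):
            for coalition in combinations(others, r):
                s = set(coalition)
                if not is_winning(s) and is_winning(s | {i}):
                    swings += 1
        # 2^(n-1) coalitions of the remaining players
        index[i] = swings / 2 ** (n_players - 1)
    return index

# Toy weighted voting game: weights [3, 2, 1], quota 4 (hypothetical numbers)
weights = [3, 2, 1]
quota = 4
win = lambda s: sum(weights[p] for p in s) >= quota
print(banzhaf(3, win))  # → [0.75, 0.25, 0.25]
```

In a feature selection setting, the players would be features and `is_winning` would be replaced by a real-valued payoff measuring the quality of a feature coalition; the swing count then becomes an average marginal contribution, which is the spirit of Banzhaf-style feature scoring.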


More information

International Journal of Research in Advent Technology, Vol.7, No.3, March 2019 E-ISSN: Available online at

International Journal of Research in Advent Technology, Vol.7, No.3, March 2019 E-ISSN: Available online at Performance Evaluation of Ensemble Method Based Outlier Detection Algorithm Priya. M 1, M. Karthikeyan 2 Department of Computer and Information Science, Annamalai University, Annamalai Nagar, Tamil Nadu,

More information

Effect of Principle Component Analysis and Support Vector Machine in Software Fault Prediction

Effect of Principle Component Analysis and Support Vector Machine in Software Fault Prediction International Journal of Computer Trends and Technology (IJCTT) volume 7 number 3 Jan 2014 Effect of Principle Component Analysis and Support Vector Machine in Software Fault Prediction A. Shanthini 1,

More information

EVALUATIONS OF THE EFFECTIVENESS OF ANOMALY BASED INTRUSION DETECTION SYSTEMS BASED ON AN ADAPTIVE KNN ALGORITHM

EVALUATIONS OF THE EFFECTIVENESS OF ANOMALY BASED INTRUSION DETECTION SYSTEMS BASED ON AN ADAPTIVE KNN ALGORITHM EVALUATIONS OF THE EFFECTIVENESS OF ANOMALY BASED INTRUSION DETECTION SYSTEMS BASED ON AN ADAPTIVE KNN ALGORITHM Assosiate professor, PhD Evgeniya Nikolova, BFU Assosiate professor, PhD Veselina Jecheva,

More information

Estimating Missing Attribute Values Using Dynamically-Ordered Attribute Trees

Estimating Missing Attribute Values Using Dynamically-Ordered Attribute Trees Estimating Missing Attribute Values Using Dynamically-Ordered Attribute Trees Jing Wang Computer Science Department, The University of Iowa jing-wang-1@uiowa.edu W. Nick Street Management Sciences Department,

More information

Intrusion Detection System with FGA and MLP Algorithm

Intrusion Detection System with FGA and MLP Algorithm Intrusion Detection System with FGA and MLP Algorithm International Journal of Engineering Research & Technology (IJERT) Miss. Madhuri R. Yadav Department Of Computer Engineering Siddhant College Of Engineering,

More information

System Health Monitoring and Reactive Measures Activation

System Health Monitoring and Reactive Measures Activation System Health Monitoring and Reactive Measures Activation Alireza Shameli Sendi Michel Dagenais Department of Computer and Software Engineering December 10, 2009 École Polytechnique, Montreal Content Definition,

More information

Anomaly detection using machine learning techniques. A comparison of classification algorithms

Anomaly detection using machine learning techniques. A comparison of classification algorithms Anomaly detection using machine learning techniques A comparison of classification algorithms Henrik Hivand Volden Master s Thesis Spring 2016 Anomaly detection using machine learning techniques Henrik

More information

A hybrid network intrusion detection framework based on random forests and weighted k-means

A hybrid network intrusion detection framework based on random forests and weighted k-means Ain Shams Engineering Journal (2013) 4, 753 762 Ain Shams University Ain Shams Engineering Journal www.elsevier.com/locate/asej www.sciencedirect.com ELECTRICAL ENGINEERING A hybrid network intrusion detection

More information

Application of the Generic Feature Selection Measure in Detection of Web Attacks

Application of the Generic Feature Selection Measure in Detection of Web Attacks Application of the Generic Feature Selection Measure in Detection of Web Attacks Hai Thanh Nguyen 1, Carmen Torrano-Gimenez 2, Gonzalo Alvarez 2 Slobodan Petrović 1, and Katrin Franke 1 1 Norwegian Information

More information

Using Google s PageRank Algorithm to Identify Important Attributes of Genes

Using Google s PageRank Algorithm to Identify Important Attributes of Genes Using Google s PageRank Algorithm to Identify Important Attributes of Genes Golam Morshed Osmani Ph.D. Student in Software Engineering Dept. of Computer Science North Dakota State Univesity Fargo, ND 58105

More information

Intrusion Detection System based on Support Vector Machine and BN-KDD Data Set

Intrusion Detection System based on Support Vector Machine and BN-KDD Data Set Intrusion Detection System based on Support Vector Machine and BN-KDD Data Set Razieh Baradaran, Department of information technology, university of Qom, Qom, Iran R.baradaran@stu.qom.ac.ir Mahdieh HajiMohammadHosseini,

More information

An Effective Performance of Feature Selection with Classification of Data Mining Using SVM Algorithm

An Effective Performance of Feature Selection with Classification of Data Mining Using SVM Algorithm Proceedings of the National Conference on Recent Trends in Mathematical Computing NCRTMC 13 427 An Effective Performance of Feature Selection with Classification of Data Mining Using SVM Algorithm A.Veeraswamy

More information

The Un-normalized Graph p-laplacian based Semi-supervised Learning Method and Speech Recognition Problem

The Un-normalized Graph p-laplacian based Semi-supervised Learning Method and Speech Recognition Problem Int. J. Advance Soft Compu. Appl, Vol. 9, No. 1, March 2017 ISSN 2074-8523 The Un-normalized Graph p-laplacian based Semi-supervised Learning Method and Speech Recognition Problem Loc Tran 1 and Linh Tran

More information

CS249: ADVANCED DATA MINING

CS249: ADVANCED DATA MINING CS249: ADVANCED DATA MINING Classification Evaluation and Practical Issues Instructor: Yizhou Sun yzsun@cs.ucla.edu April 24, 2017 Homework 2 out Announcements Due May 3 rd (11:59pm) Course project proposal

More information

CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS

CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS CHAPTER 4 CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS 4.1 Introduction Optical character recognition is one of

More information

An Empirical Study of Lazy Multilabel Classification Algorithms

An Empirical Study of Lazy Multilabel Classification Algorithms An Empirical Study of Lazy Multilabel Classification Algorithms E. Spyromitros and G. Tsoumakas and I. Vlahavas Department of Informatics, Aristotle University of Thessaloniki, 54124 Thessaloniki, Greece

More information

Network Anomaly Detection using Co-clustering

Network Anomaly Detection using Co-clustering Network Anomaly Detection using Co-clustering Evangelos E. Papalexakis, Alex Beutel, Peter Steenkiste Department of Electrical & Computer Engineering School of Computer Science Carnegie Mellon University,

More information

Association Rule Mining in Big Data using MapReduce Approach in Hadoop

Association Rule Mining in Big Data using MapReduce Approach in Hadoop GRD Journals Global Research and Development Journal for Engineering International Conference on Innovations in Engineering and Technology (ICIET) - 2016 July 2016 e-issn: 2455-5703 Association Rule Mining

More information

Published by: PIONEER RESEARCH & DEVELOPMENT GROUP ( 1

Published by: PIONEER RESEARCH & DEVELOPMENT GROUP (  1 Cluster Based Speed and Effective Feature Extraction for Efficient Search Engine Manjuparkavi A 1, Arokiamuthu M 2 1 PG Scholar, Computer Science, Dr. Pauls Engineering College, Villupuram, India 2 Assistant

More information

Keyword Extraction by KNN considering Similarity among Features

Keyword Extraction by KNN considering Similarity among Features 64 Int'l Conf. on Advances in Big Data Analytics ABDA'15 Keyword Extraction by KNN considering Similarity among Features Taeho Jo Department of Computer and Information Engineering, Inha University, Incheon,

More information

ECG782: Multidimensional Digital Signal Processing

ECG782: Multidimensional Digital Signal Processing ECG782: Multidimensional Digital Signal Processing Object Recognition http://www.ee.unlv.edu/~b1morris/ecg782/ 2 Outline Knowledge Representation Statistical Pattern Recognition Neural Networks Boosting

More information

Collaborative Security Attack Detection in Software-Defined Vehicular Networks

Collaborative Security Attack Detection in Software-Defined Vehicular Networks Collaborative Security Attack Detection in Software-Defined Vehicular Networks APNOMS 2017 Myeongsu Kim, Insun Jang, Sukjin Choo, Jungwoo Koo, and Sangheon Pack Korea University 2017. 9. 27. Contents Introduction

More information

3 Feature Selection & Feature Extraction

3 Feature Selection & Feature Extraction 3 Feature Selection & Feature Extraction Overview: 3.1 Introduction 3.2 Feature Extraction 3.3 Feature Selection 3.3.1 Max-Dependency, Max-Relevance, Min-Redundancy 3.3.2 Relevance Filter 3.3.3 Redundancy

More information

DATA REDUCTION TECHNIQUES TO ANALYZE NSL-KDD DATASET

DATA REDUCTION TECHNIQUES TO ANALYZE NSL-KDD DATASET INTERNATIONAL JOURNAL OF COMPUTER ENGINEERING & TECHNOLOGY (IJCET) International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-6367(Print), ISSN 0976 6367(Print) ISSN 0976 6375(Online)

More information

Ensemble-based multi-filter feature selection method for DDoS detection in cloud computing

Ensemble-based multi-filter feature selection method for DDoS detection in cloud computing Osanaiye et al. EURASIP Journal on Wireless Communications and Networking (2016) 2016:130 DOI 10.1186/s13638-016-0623-3 RESEARCH Ensemble-based multi-filter feature selection method for DDoS detection

More information