LAWRA: a layered wrapper feature selection approach for network attack detection


SECURITY AND COMMUNICATION NETWORKS
Security Comm. Networks 2015; 8: Published online 26 May 2015 in Wiley Online Library (wileyonlinelibrary.com)

RESEARCH ARTICLE

LAWRA: a layered wrapper feature selection approach for network attack detection

Sangeeta Bhattacharya* and Subramanian Selvakumar
Department of Computer Science and Engineering, National Institute of Technology, Tiruchirappalli, Tamil Nadu, India

ABSTRACT

The feature selection phase in network attack detection is mostly classifier based, while clustering techniques are used for labeling and creating compact training datasets. Because clustering finds natural groupings in the data, in this paper a clustering-based layered wrapper feature selection approach, LAWRA, is proposed for selecting appropriate features for attack detection. The existing layered feature selection approaches in attack detection are unable to give results with high precision and recall because of their dependence on classifier accuracy, fitness value, and so on. Hence, LAWRA uses two external cluster validity indices, F-measure and the Fowlkes–Mallows index, for feature selection. The two indices are the harmonic and geometric mean of precision and recall, respectively, so each identifies features that give the attack detection algorithm both high precision and high recall. The first layer of LAWRA identifies the feature subset that best distinguishes between normal and attack instances, and the second layer identifies the best cooperating features using cooperative game theory. Experiments have been conducted on the NSL-KDD dataset, and LAWRA has been compared with the existing approaches using different classifiers. The results show that LAWRA gives better overall accuracy and F-measure than the other approaches. Copyright © 2015 John Wiley & Sons, Ltd.
KEYWORDS
network attack; feature selection; layered; wrapper; external cluster validity indices

*Correspondence
Sangeeta Bhattacharya, Department of Computer Science and Engineering, National Institute of Technology, Tiruchirappalli, Tamil Nadu, India. @nitt.edu

1. INTRODUCTION

The Internet is very important in connecting people throughout the world; its usage increased 10-fold from 1999 to 2013 [1]. The increased Internet usage has given rise to different types of network attacks and cybercrimes. It can be seen from [2] that, round the clock, cyber attacks are being carried out from different countries. Such cybercrime accounts for an annual cost of almost 400 billion dollars to the global economy [3], which demands efficient detection mechanisms. It has been shown that the efficiency of detection algorithms, in terms of computational complexity and accuracy, depends highly on the selected feature set [4]. Thus, selecting appropriate features is one of the important data-preprocessing steps in attack detection. Feature selection can be achieved using filter and wrapper methods [5]. While filter methods evaluate each feature independently based on some goodness criteria, wrapper methods [6] evaluate the possible feature subsets with respect to a particular learning algorithm. As such, wrapper methods are computationally expensive but result in a better feature subset. Feature selection can be supervised, where features are selected by checking feature relevancy with respect to the class. In unsupervised feature selection methods, class labels are not available, and feature relevancy is calculated using data variance and partition quality. In the intrusion and attack detection paradigm, feature selection has mostly been supervised [7–13]. Also, it is seen that the existing feature selection approaches are based on a classifier's performance, thus biasing the feature selection phase towards that particular classifier.
Such feature selection approaches, when used with other classifiers, result in high false alarms. Unsupervised clustering-based feature selection algorithms have not been much explored in attack detection. Clustering has mostly been applied for detecting intrusions and attacks, if any [14,15], and in the data-preprocessing stage for obtaining labels of unlabeled datasets and creating compact datasets [11,16,17]. Feature selection using clustering can be achieved by internal or external validity indices [18–22]. While internal indices use intra-cluster similarity to evaluate a clustering result, external validity indices are used for validating the clustering result against a ground truth such as a reference cluster or the existing labels of the dataset [23]. The main aim of this paper is to identify features for attack detection from the available labeled attack datasets. Hence, in this paper, a clustering-based layered wrapper feature selection approach, LAWRA, is proposed using two external validity indices, F-measure (FM) and the Fowlkes–Mallows index (FMI). A layered approach helps in identifying features possessing various properties, such as relevancy with the class and cooperativeness among the features, so as to improve the efficiency of the classifiers. FM and FMI are the harmonic and geometric mean of precision and recall, respectively. Precision is the proportion of actual attacks among all the instances flagged as attacks by the detection algorithm, and recall is the proportion of attack instances that were correctly identified. Hence, a feature set identified using FM and FMI increases the efficiency of a detection algorithm in detecting attacks. Also, it is shown in [24] that FM and FMI are good measures for evaluating clustering results on large datasets. The first layer of the proposed approach, LAWRA, helps in identifying the feature subset with the best distinguishing ability between normal and attack instances, while the second layer identifies the features with good coalition ability with other features using a cooperative game theory framework [25]. The clustering algorithm used for feature selection in the proposed approach is one that produces mutually exclusive clusters, where each instance belongs to a single cluster. In this paper, the NSL-KDD dataset [26] has been used for the experiments to analyze the performance of the proposed approach. The WEKA tool [27] and MATLAB R2012a (The MathWorks, Inc., Natick, Massachusetts, USA) have been used to carry out all the experiments.
The rest of the paper is organized as follows: In Section 2, a review of the related work is given. Section 3 gives the motivation for this research. Section 4 describes the proposed model and the proposed algorithms. Section 5 discusses the results of the experiments performed. Finally, Section 6 presents the conclusion and the ongoing work.

2. RELATED WORK

2.1. Layered feature selection approach in attack detection

In [28], a bilayer feature selection approach was used where the two layers select features based on the global maxima and local maxima of classification accuracy, thus making the feature selection classifier dependent. In [29], two layers were also used, selecting features based on the Pearson correlation between the features and the class; a fitness value was used to rank the features. The drawback of this scheme is that it is difficult to determine a suitable fitness value.

2.2. Cluster validity indices for feature selection

Unsupervised feature selection can be categorized into two approaches: feature clustering and data clustering. In this paper, the second approach is followed. In [18], a wrapper unsupervised feature-clustering approach has been proposed where two internal validity indices, viz. the scatter separability and maximum likelihood criteria, have been used for evaluating the candidate feature subsets, while in [19], the internal validity index Silhouette criterion has been used for selecting the feature subset. However, for complex clusters, internal indices do not correlate well with the algorithm error. The adjusted Rand index has been used in [20] for ranking the features and selecting the required features; deciding the number of features from a ranking of features may be difficult. In [21], features were ranked according to cluster importance, calculated using different decision tree-based measures on the output of a single-pass clustering algorithm.
However, to identify the feature subset according to feature importance, a rapid changing point needs to be determined, which is difficult. In [22], a generalized feature selection framework was defined using greedy forward selection, and clusters were evaluated using three different validity indices: adjusted Rand, Jaccard, and Fowlkes–Mallows. A threshold was used to determine the feature subset, and determining an appropriate value of the threshold is difficult.

2.3. Cooperative game theory for feature selection

In [30] and [31], a wrapper approach was proposed where features were ranked according to the Shapley value of each feature. The Shapley value was based on classification accuracy, and a feature ranking generated using classification accuracy makes it dependent on the classifier. In [32] and [33], mutual information was used to define the relevancy of the features with the class and the redundancy among the features. While in [32] the features were ranked according to the Shapley value of each feature, in [33] features were ranked by calculating the Banzhaf power of each feature.

3. MOTIVATION

Clustering-based feature selection identifies the features that produce natural partitions of the data and thus enables detection of new attacks. The existing feature selection approaches in attack detection are mostly classifier dependent and, when used with other classifiers, result in high false alarms. This motivated us to propose a classifier-independent, clustering-based feature selection in this paper to overcome the drawback of the existing approaches and to improve the performance of attack detection algorithms.

4. LAWRA: A PROPOSED LAYERED WRAPPER FEATURE SELECTION APPROACH

The proposed feature selection approach, LAWRA, is shown in Figure 1. The three main modules in the proposed approach are the Feature Ranker, the Best Feature Identifier (layer 1), and the Best Feature Correlator (layer 2). These modules are described as follows:

4.1. Feature Ranker module

This module takes as input the original feature set S and returns the ranked feature set S. In this paper, a nonparametric method is used for generating the ranks of the features; nonparametric methods have been used in [34] for feature ranking. As the underlying distribution of the network data is unknown and only two groups, normal and attack, are considered in this paper, the nonparametric Mann–Whitney U-test is best suited. A parameter called effect size [35] is used to rank the features. The effect size ES gives the degree of association between the two groups and is calculated as given in Equation (1):

ES = |z*| / √n   (1)

where z* is the z-score, which indicates whether there exists a significant difference between the two groups, and n is the sum of the sample sizes of the two groups. The effect size is between 0 and 1, and according to [35], an effect size of 0.10 is small, 0.30 is medium, and 0.50 is large. Thus, the features of S are ranked from high effect size to low effect size and stored in the set S.

4.2. Best Feature Identifier (layer 1) module

There are two submodules in this layer: one is F-measure-based feature selection (FFS) and the other is conditional mutual information (CMI)-based feature selection (CMIFS). The two submodules work in parallel and take the ranked feature set S as input.

Figure 1. LAWRA: proposed layered wrapper feature selection approach. CMI, conditional mutual information.
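As an illustration of the Feature Ranker step, the effect size of Equation (1) for one feature can be computed from its values in the two groups. The following Python sketch uses a hypothetical function name, the large-sample z approximation of the Mann–Whitney U-test, and average ranks for ties (no tie correction in the variance):

```python
import math

def effect_size(normal, attack):
    """ES = |z| / sqrt(n), where z is the large-sample z-score of the
    Mann-Whitney U statistic for the two groups and n = n1 + n2."""
    pooled = sorted([(v, 0) for v in normal] + [(v, 1) for v in attack])
    # Assign average ranks to tied values.
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        for k in range(i, j):
            ranks[k] = (i + 1 + j) / 2.0  # mean of ranks i+1 .. j
        i = j
    # Rank sum of the "normal" group gives the U statistic.
    r1 = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    n1, n2 = len(normal), len(attack)
    u = r1 - n1 * (n1 + 1) / 2.0
    mu = n1 * n2 / 2.0
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mu) / sigma
    return abs(z) / math.sqrt(n1 + n2)
```

Well-separated groups yield an ES near the "large" threshold of 0.50, while heavily overlapping groups yield values near zero; the features are then sorted from high ES to low ES.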

F-measure-based feature selection. In this submodule, Algorithm FFS is used to generate the feature subset F_FM. For Algorithm FFS, the external cluster validity criterion FM is used to evaluate each feature. FM is defined as follows: given a cluster i and class j, FM is the harmonic mean of the precision P and recall R of cluster i, as given in Equation (2),

FM = ((β² + 1) · P · R) / (β² · P + R)   (2)

where β = 1, P = the number of data points common to cluster i and class j divided by the number of data points in cluster i, and R = the number of data points common to cluster i and class j divided by the number of data points in class j [23]. Forward selection search is used in Algorithm FFS to add features to the set F_FM from the set S. A feature f is relevant to the set F_FM if FM(F_FM ∪ {f}, L_D) > FM(F_FM, L_D) and redundant if FM(F_FM ∪ {f}, L_D) = FM(F_FM, L_D). L_D denotes the existing class labels of the dataset, which are used for calculating FM.

Algorithm FFS (F-Measure Based Feature Selection)
Input: (i) Dataset D; (ii) ranked feature set S; (iii) class labels of dataset L_D
Output: Feature set F_FM
Steps:
1. Initialize F_FM = {}
2. Initialize BestMeasure = 0
3. for each feature f_i of S, where i = 1, …, n, do
   a) F_I = F_FM ∪ {f_i}
   b) CurrCluster = Cluster(D, F_I)
   c) CurrMeasure = FM(CurrCluster, L_D)
   d) if CurrMeasure > BestMeasure then
      i) F_FM = F_I
      ii) BestMeasure = CurrMeasure

CMI-based feature selection. In this submodule, Algorithm CMIFS is used to generate the feature subset F_CMI, using CMI. The CMI of two random variables X and Y given another random variable Z is computed using Equation (3),

I(X; Y | Z) = H(X | Z) − H(X | Y, Z)   (3)

where H(X | Z) is the conditional entropy of X given Z and H(X | Y, Z) is the conditional entropy of X given both Y and Z [36].
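A minimal sketch of Algorithm FFS in Python follows; here `cluster_fn` is a hypothetical stand-in for the simple k-means clustering step, and `f_measure` is one common matching-based reading of Equation (2) with β = 1, taking the best-matching cluster per class and weighting by class size:

```python
from collections import Counter

def f_measure(cluster_labels, class_labels):
    """For each class, take the best harmonic mean of precision and recall
    over all clusters, weighted by class size (beta = 1)."""
    n = len(class_labels)
    score = 0.0
    for cls, cls_size in Counter(class_labels).items():
        best = 0.0
        for cl, cl_size in Counter(cluster_labels).items():
            common = sum(1 for cu, cy in zip(cluster_labels, class_labels)
                         if cu == cl and cy == cls)
            if common:
                p, r = common / cl_size, common / cls_size
                best = max(best, 2 * p * r / (p + r))
        score += cls_size / n * best
    return score

def ffs(data, ranked_features, labels, cluster_fn):
    """Algorithm FFS: greedy forward selection; keep a candidate feature
    only if it strictly improves the F-measure of the clustering."""
    selected, best = [], 0.0
    for f in ranked_features:
        fm = f_measure(cluster_fn(data, selected + [f]), labels)
        if fm > best:
            selected, best = selected + [f], fm
    return selected
```

A clustering that exactly reproduces the class partition yields an F-measure of 1.0; Algorithm CMIFS follows the same forward-selection skeleton with the CMI criterion in place of FM.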
If f is the feature considered for inclusion in the set F_CMI, such that f ∉ F_CMI, and L_D is the set of class labels of the given dataset, then the mutual information of the set F_CMI with the class L_D should be maximized in the presence of f. Thus, in this paper, the CMI I(F_CMI; L_D | f) is calculated in order to evaluate the impact of the newly added feature f on the information shared by the already selected feature set F_CMI with the class L_D. Hence, f is relevant to the set F_CMI if CMI(F_CMI ∪ {f}, L_D) > CMI(F_CMI, L_D) and redundant if CMI(F_CMI ∪ {f}, L_D) = CMI(F_CMI, L_D). In this algorithm also, features are added to the set F_CMI from the set S using forward selection search.

Algorithm CMIFS (Conditional Mutual Information Based Feature Selection)
Input: (i) Dataset D; (ii) ranked feature set S; (iii) class labels of dataset L_D
Output: Feature set F_CMI
Steps:
1. Initialize F_CMI = {f_1 : f_1 ∈ S}
2. BestMeasure = MutualInformation(F_CMI, L_D)
3. for each feature f_i of S, where i = 2, …, n, do
   a) CurrMeasure = CMI(F_CMI, L_D, f_i)
   b) if CurrMeasure > BestMeasure then
      i) F_CMI = F_CMI ∪ {f_i}
      ii) BestMeasure = CurrMeasure

4.3. Best Feature Correlator (layer 2) module

There are two submodules in this layer: one is the Fowlkes–Mallows-based Banzhaf index calculation and the other is the selection of k features. The input to this module is the feature set F_diff, as shown in Figure 1, and the output is the top k cooperating features.

Fowlkes–Mallows-based Banzhaf index calculation. In this submodule, the external cluster validity index FMI is used within a cooperative game theory framework. This submodule takes the feature set F_diff as input, which is calculated using Equation (4),

F_diff = (F_FM − F_CMI) ∪ (F_CMI − F_FM)   (4)

Cooperative game theory. In cooperative game theory, there are two elements: (i) a set of n players and (ii) a characteristic function v(S) that specifies the value of different subsets of the n players. Each subset of the n players is a coalition in which players come to an agreement.
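The conditional mutual information of Equation (3) can be estimated from discrete (e.g. MDL-discretized) feature and label samples; a Python sketch with hypothetical helper names:

```python
import math
from collections import Counter

def cond_entropy(xs, cond):
    """H(X | C) from paired samples of discrete values."""
    n = len(xs)
    joint = Counter(zip(xs, cond))           # counts of (x, c) pairs
    marginal = Counter(cond)                 # counts of c values
    # H(X|C) = -sum over (x, c) of p(x, c) * log2 p(x | c)
    return -sum((c / n) * math.log2(c / marginal[cond_val])
                for (_, cond_val), c in joint.items())

def cond_mutual_info(xs, ys, zs):
    """I(X; Y | Z) = H(X | Z) - H(X | Y, Z), as in Equation (3)."""
    return cond_entropy(xs, zs) - cond_entropy(xs, list(zip(ys, zs)))
```

When Y fully determines X, the CMI equals H(X | Z); when X and Y are independent given Z, it is zero.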
v(S) is zero for an empty coalition, while for each nonempty coalition, v(S) ∈ ℝ is a scalar value. The contribution of each player to the coalition can be found by the use of a value function, which associates a value with each player. The value of each player in the coalition can be found using the Banzhaf power index [37]. A critical player is one whose leaving a winning coalition turns the coalition into a losing one. The power of a player is proportional to the number of times the player has been identified as critical. In this work, each feature of F_diff is a player. The Banzhaf power index of each feature in the set F_diff is calculated using Algorithm FMBIC. The Banzhaf power index determines how many times a feature has been critical in making a coalition win and hence indicates the cooperativeness of the feature with the different feature coalitions. Thus, a high Banzhaf power index of a feature indicates that the feature is cooperative with most of the feature coalitions and hence with most of the features. Therefore, the higher the Banzhaf power indices of a set of features, the better the detection algorithm (classifier) result.

Algorithm FMBIC. FMI is defined as follows: given a cluster i and class j, FMI is the geometric mean of the precision P and recall R of cluster i, as given in Equation (5),

FMI = √(P · R)   (5)

where P = the number of data points common to cluster i and class j divided by the number of data points in cluster i, and R = the number of data points common to cluster i and class j divided by the number of data points in class j [23]. In Algorithm FMBIC, FMI is used to determine the winning or losing status of a coalition. FMI is first calculated for the original feature set F_diff. Then, a coalition Co of any size is winning when it satisfies Equation (6),

FMI(Co) > FMI(F_diff)   (6)

For a winning coalition, the power of each feature f is calculated as follows: if FMI(Co \ {f}) < FMI(F_diff), then the feature f is considered a critical player and the power of f is increased by 1.
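The critical-player counting described in the prose above can be sketched in Python. In this illustrative stand-in, `score` plays the role of FMI computed on a clustering over a feature subset, `baseline` the role of FMI(F_diff), and the function and parameter names are hypothetical:

```python
from itertools import combinations

def banzhaf_power(features, score, baseline, omega=3):
    """For each feature f, count the coalitions (of size at most omega,
    drawn from the other features) that win with f (score > baseline)
    but lose without it (score < baseline), mirroring the critical-player
    rule stated in the text."""
    power = {f: 0 for f in features}
    for f in features:
        rest = [g for g in features if g != f]
        for size in range(1, omega + 1):
            for combo in combinations(rest, size):
                coalition = set(combo)
                if score(coalition | {f}) > baseline and score(coalition) < baseline:
                    power[f] += 1
    return power
```

In LAWRA, `score` would be the FMI of the simple k-means clustering restricted to the coalition's features, and `baseline` the FMI obtained on the full set F_diff.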
The computational complexity of Algorithm FMBIC is proportional to the number of evaluated feature subsets of F_diff. Hence, in order to reduce the computational complexity, the coalition size for each feature is limited by a parameter ω, as in [32,33]. Although ω may differ for different datasets, the experiments in [32] and [33] have shown that ω = 3 gives good performance for the Support Vector Machine (SVM) classifier for most datasets. Hence, in this work, ω = 3 has been used. The output of this submodule is the Banzhaf power index vector BV of the set F_diff. In this algorithm also, L_D is the set of existing class labels of the dataset, which is used for calculating FMI.

Algorithm FMBIC (Fowlkes–Mallows based Banzhaf Index Calculation)
Input: (i) Dataset D; (ii) feature set F; (iii) class labels of dataset L_D
Output: BV: Banzhaf power index vector of the set F
Steps:
1. CurrCluster = Cluster(D, F)
2. BestMeasure = FMI(CurrCluster, L_D)
3. for each feature f_i of F, where i = 1, …, n, do BI(i) = 0
4. for each feature f_i of F, where i = 1, …, n, do
   a) Create feature subsets C = {c_1, …, c_t} over F \ {f_i}, with subset size limited by ω
   b) for each subset c_j of C, where j = 1, …, t, do
      i) CurrCluster = Cluster(D, c_j)
      ii) CurrMeasure = FMI(CurrCluster, L_D)
      iii) if CurrMeasure > BestMeasure then
         a) F_I = c_j ∪ {f_i}
         b) ICluster = Cluster(D, F_I)
         c) IMeasure = FMI(ICluster, L_D)
         d) if IMeasure < BestMeasure then increment BI(i) by one

Selection of k features. This submodule takes the Banzhaf power index vector BV as input. The process for selecting the value of k is as follows: (i) the feature set F_diff is arranged in decreasing order of BV; (ii) each feature of F_diff is added to the set F sequentially, and the corresponding FM of the set F is calculated.
(iii) The position of the feature f that gives the highest FM value for the set F is the value of k.

Combine. The final feature subset F_F is the combination of the feature set F_C and the k features from layer 2.

5. EXPERIMENTS AND RESULTS

In this section, the dataset used for the experiments, the experimental setup, the preprocessing, and the experimental results are discussed.
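The three-step selection of k can be sketched as follows (Python; `fm_of` is a hypothetical stand-in for clustering the data on a feature set and computing its FM, and `features_by_power` is assumed to be already sorted in decreasing Banzhaf power order):

```python
def select_k(features_by_power, base_features, fm_of):
    """Add F_diff features in decreasing Banzhaf power order and keep the
    prefix whose length k maximises the F-measure of the growing set."""
    best_k = 0
    best_fm = fm_of(list(base_features))
    current = list(base_features)
    for k, f in enumerate(features_by_power, start=1):
        current = current + [f]   # step k adds the k-th strongest feature
        fm = fm_of(current)
        if fm > best_fm:
            best_k, best_fm = k, fm
    return features_by_power[:best_k]
```

With this skeleton, the position of the peak FM value directly determines k, as in the text.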

Table I. Features in NSL-KDD dataset.

Sl. no.  Feature name                  Type      Category
1.   duration                      Cont.     Basic
2.   protocol_type                 Discrete  Basic
3.   service                       Discrete  Basic
4.   flag                          Cont.     Basic
5.   src_bytes                     Cont.     Basic
6.   dst_bytes                     Cont.     Basic
7.   land                          Cont.     Basic
8.   wrong_fragment                Cont.     Basic
9.   urgent                        Cont.     Basic
10.  hot                           Cont.     Content
11.  num_failed_logins             Cont.     Content
12.  logged_in                     Binary    Content
13.  num_compromised               Cont.     Content
14.  root_shell                    Binary    Content
15.  su_attempted                  Binary    Content
16.  num_root                      Cont.     Content
17.  num_file_creations            Cont.     Content
18.  num_shells                    Cont.     Content
19.  num_access_files              Cont.     Content
20.  num_outbound_cmds             Cont.     Content
21.  is_hot_login                  Binary    Content
22.  is_guest_login                Binary    Content
23.  count                         Cont.     Time-based
24.  srv_count                     Cont.     Time-based
25.  serror_rate                   Cont.     Time-based
26.  srv_serror_rate               Cont.     Time-based
27.  rerror_rate                   Cont.     Time-based
28.  srv_rerror_rate               Cont.     Time-based
29.  same_srv_rate                 Cont.     Time-based
30.  diff_srv_rate                 Cont.     Time-based
31.  srv_diff_host_rate            Cont.     Time-based
32.  dst_host_count                Cont.     Connection-based
33.  dst_host_srv_count            Cont.     Connection-based
34.  dst_host_same_srv_rate        Cont.     Connection-based
35.  dst_host_diff_srv_rate        Cont.     Connection-based
36.  dst_host_same_src_port_rate   Cont.     Connection-based
37.  dst_host_srv_diff_host_rate   Cont.     Connection-based
38.  dst_host_serror_rate          Cont.     Connection-based
39.  dst_host_srv_serror_rate      Cont.     Connection-based
40.  dst_host_rerror_rate          Cont.     Connection-based
41.  dst_host_srv_rerror_rate      Cont.     Connection-based

5.1. Dataset used

The dataset used for the experiments in this paper is NSL-KDD, which was created from the KDD Cup 99 intrusion detection dataset. KDD Cup 99 has some well-known problems, as found in [38] and [39]; one more problem is the presence of duplicate records. In NSL-KDD, such duplicate records have been removed, and the dataset has been validated using seven learning algorithms.
In this paper, the datasets KDDTrain+ and KDDTest+, given in [40], have been used as the training and test data for the experiments. The datasets consist of the 41 features shown in Table I and are classified into two classes, normal and anomaly, where the anomaly class represents the attack class. As the main aim is to identify the features capable of distinguishing normal and attack network data, the binary-class dataset was preferred for this work.

5.2. Experimental setup

MATLAB R2012a has been used for implementing the proposed feature selection approach. For the feature selection phase, only the KDDTrain+ dataset has been used. WEKA has been used for verifying the obtained result using four different classifiers: naive Bayes, SVM (LibSVM in WEKA), knn (IBk in WEKA), and AdaBoost. The classifiers have been chosen according to [41]. KDDTrain+ has been used for training the classifiers, and KDDTest+ has been used for testing the performance of the trained models. Ten-fold cross-validation has been used for training the classifiers, which partitions the dataset into 10 equal parts. At each of the 10 folds, 9 partitions were used for training and 1 partition was used for validating the model; the partition used for validation was different for each of the 10 folds.

5.3. Preprocessing

Before using the proposed approach on the KDDTrain+ dataset for feature selection, the numeric features have been normalized to the range [0, 1]. The nominal and binary features have been kept unchanged. Before training the classifiers, the numeric features of both the training and test datasets have been discretized using a static supervised approach based on the minimum description length criterion [42]. Simple k-means has been used as the clustering algorithm. The reasons for using simple k-means are that it is easy to use, it assigns each instance to exactly one cluster, and it can handle numeric features.
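The [0, 1] normalization of the numeric features amounts to per-column min-max scaling; a minimal sketch (hypothetical helper name; how the original experiments handled constant columns is not stated, so mapping them to 0.0 here is an assumption):

```python
def min_max_normalize(column):
    """Scale a numeric feature column to [0, 1]; constant columns map to 0."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]  # assumption: constant feature -> 0.0
    return [(v - lo) / (hi - lo) for v in column]
```

Applying this to each numeric column of KDDTrain+ leaves the nominal and binary features unchanged, as described above.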
Hence, for Algorithms FFS and FMBIC, only the 38 numeric and binary features have been considered, whereas Algorithm CMIFS uses the discretized dataset and therefore considers all 41 features.

5.4. Experimental results

The class labels L_D of the dataset in Algorithms FFS, CMIFS, and FMBIC are the classes, normal and anomaly, of the NSL-KDD dataset.

Results of the proposed approach, LAWRA. In Table I, it can be seen that feature numbers 2, 3, and 4 are nominal in type. Hence, these three features have not been considered for Algorithms FFS and FMBIC, which use simple k-means, but have been considered for Algorithm CMIFS; the nominal features selected by Algorithm CMIFS were retained in the final feature set.

Figure 2. Effect of the features in F_diff: (a) F-measure and (b) overall accuracy of four classifiers.

In order to determine the value of k, the FM of the 20 features in the set F_diff is calculated and plotted in Figure 2a. The process starts with the feature having the highest Banzhaf power index and then incrementally adds features from F_diff, from the highest to the lowest Banzhaf power index. Similarly, Figure 2b plots the overall accuracy of the four classifiers, naive Bayes, SVM, knn, and AdaBoost, for the set F_C as features from F_diff are incrementally added, again from the highest to the lowest Banzhaf power index. The overall accuracy of the classifiers is obtained from 10-fold cross-validation on the training set KDDTrain+. In Figure 2, each step on the x-axis corresponds to the addition of a feature from the set F_diff; for instance, step 12 in Figure 2a represents the inclusion of the 12 features with the highest Banzhaf power index. It can be seen from Figure 2a that FM reaches its highest value at the 15th step, which corresponds to the first 15 features with the highest Banzhaf power index. Hence, in this work, k = 15 is chosen. From Figure 2b, it is observed that the classifiers' overall accuracy converges at the 13th step, which corresponds to the first 13 features having the highest Banzhaf power index, and remains steady for the remaining steps. These classification results agree with the feature set identified by LAWRA and hence validate the selection of the first 15 features with high Banzhaf power index for k. The final feature set returned by LAWRA is {1, 3, 5, 6, 8, 10, 11, 12, 13, 17, 19, 27, 29, 34, 38, 39}, of size 16.

Comparison with the existing approaches.
The performance of the feature set obtained using LAWRA is compared with that of four other approaches: (i) the linear correlation-based approach [29]; (ii) the Feature Vitality Based Reduction Method (FVBRM) algorithm [10], whose feature set is selected using the naive Bayes classifier; (iii) the CfsSubsetEval algorithm [43], which evaluates the worth of a subset of features by considering the individual predictive ability of each feature along with the degree of redundancy between them; and (iv) the ConsistencySubsetEval algorithm [43], which evaluates the worth of a subset of features by the level of consistency in the class values when the training instances are projected onto the subset of features. The search algorithm used for (iii) and (iv) was BestFirst, which searches the space of feature subsets by greedy hill climbing augmented with a backtracking facility. The direction used for searching the feature set is forward [43].

Table II. Overall accuracy and F-measure of different feature selection approaches (rows: Nil, i.e., all 41 features; LC; FVBRM; CfS + BF; ConSub + BF; and LAWRA, the proposed approach; columns: OA and FM for each of the classifiers naive Bayes, SVM, knn, and AdaBoost, and the number of features selected). OA, overall accuracy; FM, F-measure; LC, linear correlation-based; CfS + BF, CfsSubsetEval + BestFirst; ConSub + BF, ConsistencySubsetEval + BestFirst. The values in bold in Table II highlight that the OA and FM of the proposed approach, LAWRA, are better than those of the other approaches.

Accuracy and F-measure comparison. The performance metrics chosen for comparison are as follows:

(i) Overall accuracy = (|TP| + |TN|) / (|TP| + |FN| + |TN| + |FP|)
(ii) FM = 2|TP| / (2|TP| + |FN| + |FP|)

where TP, TN, FP, and FN (true positives, true negatives, false positives, and false negatives) represent the numbers of attack instances correctly classified, normal instances correctly classified, normal instances incorrectly identified as attacks, and attack instances incorrectly identified as normal by the classifiers, respectively. Table II shows the comparison of the overall accuracy and FM of LAWRA with all 41 features and the other approaches for the four classifiers. The overall accuracy was obtained by training the classifiers on the KDDTrain+ dataset using 10-fold cross-validation and then testing on the KDDTest+ dataset. The number of features selected by each approach is shown in the last column of Table II. From Table II, it can be seen that the performance of LAWRA is better than that of the other approaches for most of the classifiers, both in overall accuracy and in FM. The only exception is the SVM performance of FVBRM, which is better than that of LAWRA. For the two classifiers knn and AdaBoost, the performance of the original 41 features is slightly better than that of LAWRA; however, the feature set selected by LAWRA has only 16 features, which reduces the training and testing time compared with 41 features.

Receiver operating characteristic (ROC) curve. Figure 3 shows the ROC curves (true positive rate versus false positive rate) of the four classifiers for the class attack, for the different feature selection approaches. The notations are the same as in Table II.

Figure 3. Receiver operating characteristic (ROC) curves for the class attack: (a) naive Bayes, (b) SVM, (c) knn, (d) AdaBoost.
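The overall accuracy and F-measure used in the comparison can be computed directly from the confusion-matrix counts; a minimal sketch (hypothetical function names, with FM written in its equivalent 2·P·R/(P + R) form):

```python
def overall_accuracy(tp, tn, fp, fn):
    """Fraction of all instances (attack and normal) classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)

def f_measure(tp, fp, fn):
    """Harmonic mean of precision and recall for the attack class,
    equal to 2*TP / (2*TP + FN + FP)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

For example, a classifier with 80 true positives, 90 true negatives, 10 false positives, and 20 false negatives has an overall accuracy of 0.85.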
It can be seen from Figure 3 that for the class attack, LAWRA is better than all the approaches in naive Bayes. For SVM and knn, LAWRA has a higher true positive rate and lower false positive rate than the other approaches initially but not so good afterwards. For AdaBoost, LAWRA and FVBRM have almost the same performance, which is better than the other approaches. 6. CONCLUSION In this paper, a layered wrapper approach of feature selection, LAWRA, has been proposed for attack detection. LAWRA used two external cluster validity indices, FM and Fowlkes Mallows, which are based on precision and recall. The first layer of LAWRA identified the feature subsets that produced good distinction between normal and 3466 Security Comm. Networks 2015; 8: John Wiley & Sons, Ltd.

9 S. Bhattacharya and S. Selvakumar LAWRA attack data. The second layer utilized the first layer information to determine the features with good coalition capability with other features using cooperative game theory framework. LAWRA was compared with some of the existing feature selection approaches used in attack detection. Four classifiers were used for evaluating the performance of the feature selection approaches on NSL- KDD dataset. The experimental results show that LAWRA performs better than the other approaches both in terms of overall accuracy and FM. Also, the ROC curves show that LAWRA has a higher true positive and lower false positive rate for the classifiers. The ongoing work is the design of an attack detection algorithm by using LAWRA. REFERENCES 1. (Accessed on 1 December 2014) 2. (Accessed on 1 December 2014) 3. Net Losses: Estimating the Global Cost of Cybercrime, Economic impact of cybercrime II, Center for Strategic and International Studies, June 2014, McAFee. 4. Mukkamala S, Sung AH. Significant feature selection using computational intelligent techniques for intrusion detection. In Advanced Methods for Knowledge Discovery from Complex, Data Maulik U, Holder LB, Cook DJ (eds). Springer: London, 2005; Chen Y, Li Y, Cheng X, Guo L. Survey and taxonomy of feature selection algorithms in intrusion detection system. Proceedings of Second SKLOIS Conference on Information Security and Cryptology, Beijing, China, 2006; Kohavi R, John GH. Wrappers for feature subset selection. Artificial Intelligence 1997; 97: Li Y, Wang JL, Tian ZH, Lu TB, Young C. Building lightweight intrusion detection system using wrapperbased feature selection mechanisms. Computers & Security 2009; 28: Amiri F, Yousefi MMR, Lucas C, Shakery A, Yazdani N. Mutual information-based feature selection for intrusion detection systems. Journal of Network and Computer Applications 2011; 34(4): Sindhu SSS, Geetha S, Kannan A. 
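Since the two indices used by LAWRA are simply the harmonic and geometric means of precision and recall, they are straightforward to reproduce. The following minimal Python sketch (not the authors' code) computes both from a precision-recall pair:

```python
from math import sqrt

def f_measure(precision, recall):
    """F-measure: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def fowlkes_mallows(precision, recall):
    """Fowlkes-Mallows index: the geometric mean of precision and recall."""
    return sqrt(precision * recall)

# Example: precision 0.9, recall 0.6
print(f_measure(0.9, 0.6))       # → 0.72
print(fowlkes_mallows(0.9, 0.6))  # → ~0.7348
```

Because the harmonic mean never exceeds the geometric mean, F-measure is always at most the Fowlkes-Mallows value for the same precision and recall, and both indices reward a balance between the two rates.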
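This excerpt does not show the exact payoff function used by LAWRA's second layer, but the coalition idea from cooperative game theory can be illustrated with the classic Banzhaf power index on a toy weighted voting game (the weights and quota below are hypothetical, not from the paper):

```python
from itertools import combinations

def banzhaf(n_players, is_winning):
    """Raw Banzhaf index: for each player, the fraction of coalitions of the
    other players in which that player is a swing, i.e. joining turns a
    losing coalition into a winning one."""
    players = range(n_players)
    index = [0.0] * n_players
    for i in players:
        others = [p for p in players if p != i]
        swings = 0
        for r in range(len(others) + 1):
            for coalition in combinations(others, r):
                s = set(coalition)
                if not is_winning(s) and is_winning(s | {i}):
                    swings += 1
        # 2^(n-1) coalitions of the remaining players
        index[i] = swings / 2 ** (n_players - 1)
    return index

# Toy weighted voting game: weights [3, 2, 1], quota 4 (hypothetical numbers)
weights = [3, 2, 1]
quota = 4
win = lambda s: sum(weights[p] for p in s) >= quota
print(banzhaf(3, win))  # → [0.75, 0.25, 0.25]
```

In a feature selection setting, the players would be features and `is_winning` would be replaced by a real-valued payoff measuring the quality of a feature coalition; the swing count then becomes an average marginal contribution, which is the spirit of Banzhaf-style feature scoring.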


More information

International Journal of Research in Advent Technology, Vol.7, No.3, March 2019 E-ISSN: Available online at

International Journal of Research in Advent Technology, Vol.7, No.3, March 2019 E-ISSN: Available online at Performance Evaluation of Ensemble Method Based Outlier Detection Algorithm Priya. M 1, M. Karthikeyan 2 Department of Computer and Information Science, Annamalai University, Annamalai Nagar, Tamil Nadu,

More information

Effect of Principle Component Analysis and Support Vector Machine in Software Fault Prediction

Effect of Principle Component Analysis and Support Vector Machine in Software Fault Prediction International Journal of Computer Trends and Technology (IJCTT) volume 7 number 3 Jan 2014 Effect of Principle Component Analysis and Support Vector Machine in Software Fault Prediction A. Shanthini 1,

More information

EVALUATIONS OF THE EFFECTIVENESS OF ANOMALY BASED INTRUSION DETECTION SYSTEMS BASED ON AN ADAPTIVE KNN ALGORITHM

EVALUATIONS OF THE EFFECTIVENESS OF ANOMALY BASED INTRUSION DETECTION SYSTEMS BASED ON AN ADAPTIVE KNN ALGORITHM EVALUATIONS OF THE EFFECTIVENESS OF ANOMALY BASED INTRUSION DETECTION SYSTEMS BASED ON AN ADAPTIVE KNN ALGORITHM Assosiate professor, PhD Evgeniya Nikolova, BFU Assosiate professor, PhD Veselina Jecheva,

More information

Estimating Missing Attribute Values Using Dynamically-Ordered Attribute Trees

Estimating Missing Attribute Values Using Dynamically-Ordered Attribute Trees Estimating Missing Attribute Values Using Dynamically-Ordered Attribute Trees Jing Wang Computer Science Department, The University of Iowa jing-wang-1@uiowa.edu W. Nick Street Management Sciences Department,

More information

Intrusion Detection System with FGA and MLP Algorithm

Intrusion Detection System with FGA and MLP Algorithm Intrusion Detection System with FGA and MLP Algorithm International Journal of Engineering Research & Technology (IJERT) Miss. Madhuri R. Yadav Department Of Computer Engineering Siddhant College Of Engineering,

More information

System Health Monitoring and Reactive Measures Activation

System Health Monitoring and Reactive Measures Activation System Health Monitoring and Reactive Measures Activation Alireza Shameli Sendi Michel Dagenais Department of Computer and Software Engineering December 10, 2009 École Polytechnique, Montreal Content Definition,

More information

Anomaly detection using machine learning techniques. A comparison of classification algorithms

Anomaly detection using machine learning techniques. A comparison of classification algorithms Anomaly detection using machine learning techniques A comparison of classification algorithms Henrik Hivand Volden Master s Thesis Spring 2016 Anomaly detection using machine learning techniques Henrik

More information

A hybrid network intrusion detection framework based on random forests and weighted k-means

A hybrid network intrusion detection framework based on random forests and weighted k-means Ain Shams Engineering Journal (2013) 4, 753 762 Ain Shams University Ain Shams Engineering Journal www.elsevier.com/locate/asej www.sciencedirect.com ELECTRICAL ENGINEERING A hybrid network intrusion detection

More information

Application of the Generic Feature Selection Measure in Detection of Web Attacks

Application of the Generic Feature Selection Measure in Detection of Web Attacks Application of the Generic Feature Selection Measure in Detection of Web Attacks Hai Thanh Nguyen 1, Carmen Torrano-Gimenez 2, Gonzalo Alvarez 2 Slobodan Petrović 1, and Katrin Franke 1 1 Norwegian Information

More information

Using Google s PageRank Algorithm to Identify Important Attributes of Genes

Using Google s PageRank Algorithm to Identify Important Attributes of Genes Using Google s PageRank Algorithm to Identify Important Attributes of Genes Golam Morshed Osmani Ph.D. Student in Software Engineering Dept. of Computer Science North Dakota State Univesity Fargo, ND 58105

More information

Intrusion Detection System based on Support Vector Machine and BN-KDD Data Set

Intrusion Detection System based on Support Vector Machine and BN-KDD Data Set Intrusion Detection System based on Support Vector Machine and BN-KDD Data Set Razieh Baradaran, Department of information technology, university of Qom, Qom, Iran R.baradaran@stu.qom.ac.ir Mahdieh HajiMohammadHosseini,

More information

An Effective Performance of Feature Selection with Classification of Data Mining Using SVM Algorithm

An Effective Performance of Feature Selection with Classification of Data Mining Using SVM Algorithm Proceedings of the National Conference on Recent Trends in Mathematical Computing NCRTMC 13 427 An Effective Performance of Feature Selection with Classification of Data Mining Using SVM Algorithm A.Veeraswamy

More information

The Un-normalized Graph p-laplacian based Semi-supervised Learning Method and Speech Recognition Problem

The Un-normalized Graph p-laplacian based Semi-supervised Learning Method and Speech Recognition Problem Int. J. Advance Soft Compu. Appl, Vol. 9, No. 1, March 2017 ISSN 2074-8523 The Un-normalized Graph p-laplacian based Semi-supervised Learning Method and Speech Recognition Problem Loc Tran 1 and Linh Tran

More information

CS249: ADVANCED DATA MINING

CS249: ADVANCED DATA MINING CS249: ADVANCED DATA MINING Classification Evaluation and Practical Issues Instructor: Yizhou Sun yzsun@cs.ucla.edu April 24, 2017 Homework 2 out Announcements Due May 3 rd (11:59pm) Course project proposal

More information

CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS

CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS CHAPTER 4 CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS 4.1 Introduction Optical character recognition is one of

More information

An Empirical Study of Lazy Multilabel Classification Algorithms

An Empirical Study of Lazy Multilabel Classification Algorithms An Empirical Study of Lazy Multilabel Classification Algorithms E. Spyromitros and G. Tsoumakas and I. Vlahavas Department of Informatics, Aristotle University of Thessaloniki, 54124 Thessaloniki, Greece

More information

Network Anomaly Detection using Co-clustering

Network Anomaly Detection using Co-clustering Network Anomaly Detection using Co-clustering Evangelos E. Papalexakis, Alex Beutel, Peter Steenkiste Department of Electrical & Computer Engineering School of Computer Science Carnegie Mellon University,

More information

Association Rule Mining in Big Data using MapReduce Approach in Hadoop

Association Rule Mining in Big Data using MapReduce Approach in Hadoop GRD Journals Global Research and Development Journal for Engineering International Conference on Innovations in Engineering and Technology (ICIET) - 2016 July 2016 e-issn: 2455-5703 Association Rule Mining

More information

Published by: PIONEER RESEARCH & DEVELOPMENT GROUP ( 1

Published by: PIONEER RESEARCH & DEVELOPMENT GROUP (  1 Cluster Based Speed and Effective Feature Extraction for Efficient Search Engine Manjuparkavi A 1, Arokiamuthu M 2 1 PG Scholar, Computer Science, Dr. Pauls Engineering College, Villupuram, India 2 Assistant

More information

Keyword Extraction by KNN considering Similarity among Features

Keyword Extraction by KNN considering Similarity among Features 64 Int'l Conf. on Advances in Big Data Analytics ABDA'15 Keyword Extraction by KNN considering Similarity among Features Taeho Jo Department of Computer and Information Engineering, Inha University, Incheon,

More information

ECG782: Multidimensional Digital Signal Processing

ECG782: Multidimensional Digital Signal Processing ECG782: Multidimensional Digital Signal Processing Object Recognition http://www.ee.unlv.edu/~b1morris/ecg782/ 2 Outline Knowledge Representation Statistical Pattern Recognition Neural Networks Boosting

More information

Collaborative Security Attack Detection in Software-Defined Vehicular Networks

Collaborative Security Attack Detection in Software-Defined Vehicular Networks Collaborative Security Attack Detection in Software-Defined Vehicular Networks APNOMS 2017 Myeongsu Kim, Insun Jang, Sukjin Choo, Jungwoo Koo, and Sangheon Pack Korea University 2017. 9. 27. Contents Introduction

More information

3 Feature Selection & Feature Extraction

3 Feature Selection & Feature Extraction 3 Feature Selection & Feature Extraction Overview: 3.1 Introduction 3.2 Feature Extraction 3.3 Feature Selection 3.3.1 Max-Dependency, Max-Relevance, Min-Redundancy 3.3.2 Relevance Filter 3.3.3 Redundancy

More information

DATA REDUCTION TECHNIQUES TO ANALYZE NSL-KDD DATASET

DATA REDUCTION TECHNIQUES TO ANALYZE NSL-KDD DATASET INTERNATIONAL JOURNAL OF COMPUTER ENGINEERING & TECHNOLOGY (IJCET) International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-6367(Print), ISSN 0976 6367(Print) ISSN 0976 6375(Online)

More information

Ensemble-based multi-filter feature selection method for DDoS detection in cloud computing

Ensemble-based multi-filter feature selection method for DDoS detection in cloud computing Osanaiye et al. EURASIP Journal on Wireless Communications and Networking (2016) 2016:130 DOI 10.1186/s13638-016-0623-3 RESEARCH Ensemble-based multi-filter feature selection method for DDoS detection

More information