CHAPTER 5 CONTRIBUTORY ANALYSIS OF NSL-KDD CUP DATA SET


An intrusion detection system (IDS) monitors network activity through incoming and outgoing data to assess how the data is used, thereby identifying suspicious activity and raising an alert when there is a sign of intrusion [5]. There are two types of intrusion detection techniques, known as misuse detection and anomaly detection. Misuse detection is possible only for those attacks whose prior knowledge is present in the data set used for training the model [13]. The challenge is to develop an efficient model for real-time intrusion detection that can be applied to online data. Anomaly detection [14], also called the profile-based detection approach, is one such technique: it adapts to the normal behavior of the user or network and applies statistical measures to events or activities to decide whether an encountered event is normal or not [15]. Although a number of measures are available to analyze the performance of an IDS, this study focuses on only two key performance metrics: detection rate (DR) and false alarm rate (FAR). The efficiency of an IDS can be expressed in terms of these two metrics, which can be depicted in the form of an ROC curve [46].

5.1 INTRODUCTION

The ultimate aim in the development of an IDS is to achieve the highest accuracy. The two basic techniques used in intrusion detection have their own advantages and disadvantages. Misuse detection detects known attacks well, with a lower FAR, but fails to identify novel attacks, whereas the strength of anomaly detection is its ability to detect unknown attacks, at the cost of a high FAR [70], [71]. The KDD Cup data set, whose attributes can be labeled in four classes, has played a key role in the study and analysis of IDS. The objective of this study is to assess the contribution of the attributes from each of these four classes in achieving high DR and low FAR. Machine learning algorithms are employed to study the classification of the KDD Cup data set into two classes, normal and anomalous data.

Different variants of the KDD Cup data set are created with respect to the four labels, and each of these variants is simulated on a set of three algorithms. The results derived from each data variant are analyzed and compared to reach a broad conclusion. This empirical study compiles the findings for DR and FAR in IDS with respect to the data under each of the four labels. The study contributes to the estimation of the attributes needed to achieve maximum DR and minimum FAR simultaneously, while remaining consistent with earlier findings that the basic labeled attributes are indispensable to intrusion detection.

Further, an attribute ranking technique is used to rank the 41 attributes of the KDD Cup data set with reference to IDS. The label of each ranked attribute is identified, and the ranked attributes are studied with respect to the four attribute labels. The results are compared with the existing results of the attribute label studies [43, 72] presented in this and the last chapter, and with feature selection studies, thus validating the contribution of each of the four attribute labels for IDS. The study can be useful to researchers experimenting in the area of feature selection and reduction. Its results can also be useful when a new database is to be developed for IDS. This study does not focus on individual attributes, because the characteristics of an attribute may change with platform and protocol; instead, it clarifies the role of the attribute labels, which remain almost the same. The study can help reduce data complexity while identifying the major attributes of a particular label that are significant in obtaining high DR and low FAR at the same time.

Attribute Class/Label   Abbreviation   Attributes
Basic                   B              1-9
Content                 C              10-22
Traffic                 T              23-31
Host                    H              32-41

Table 5.1: Categorization of attributes with four labels.

In this practical study, the NSL-KDD Cup data set is used. As discussed in the last chapter, this data set has 42 attributes, of which 41 are classified under one of the following labels: Basic, Content, Traffic, or Host [31], [35]. The categorization of the 41 attributes under the four labels is specified in Table 5.1.

The selected data set has several possible arrangements: the records may be classified into two classes, normal and anomalous, or into five classes, namely normal, Denial of Service (DoS) attack, User to Root (U2R) attack, Remote to Local (R2L) attack and Probe attack.

5.1.1 FEATURE RANKING RULES

Feature selection refers to the process of identifying prominent attributes with respect to their contribution in achieving the desired goal. The lower the dimensionality of the data set, the lighter the system developed on top of it [5]. Reducing the number of attributes always entails some loss of information, so it is necessary to understand the basic requirement of the developed system, so that the results obtained with the original number of attributes can be compared with the results obtained with the reduced number of attributes. Some rules are designed to ascertain the significance of an attribute for IDS and are listed in Table 5.2, where A stands for Accuracy, FP for False Positive and FN for False Negative. Each rule describes the change observed when the feature under study is removed. If A increases while FP and FN both decrease, the feature is concluded to be insignificant. If A decreases while FP and FN increase, the feature is identified as important. In the third case, FN increases while A and FP remain constant, and the feature is treated as important. In all other cases (marked X in the table), the feature is considered important.

A           FP          FN          Feature Significance
Increases   Decreases   Decreases   Insignificant
Decreases   Increases   Increases   Important
Constant    Constant    Increases   Important
X           X           X           Important

Table 5.2: Rules to determine feature significance.

5.2 OBJECTIVE

The objective of this research is to study and interpret the role of the 41 attributes of the NSL-KDD Cup data set, grouped under the four labels specified in Table 5.1, on DR and FAR for IDS. The focus is not to analyze the contribution of each of the 41 attributes individually for feature selection, but to study their cumulative effect per label. The results of this study can, however, be used to improve the process of feature selection at a later stage. The goal of any efficient IDS is to achieve maximum DR with minimum FAR [47].

A further objective of this study is to validate the contribution of the above-mentioned labels reported in previous studies [43], [72] for IDS.

This is done in two steps: first, by ranking the individual attributes of the KDD Cup data set and converting the results to the four labels; and second, by comparing the previously observed label contributions with the ranker results and with already available feature selection results. This chapter aims to deduce which of the four categories of labeled attributes contribute significantly to achieving high DR and low FAR. The conclusions drawn from this empirical study can help overcome a limitation of the training data which, in the case of anomaly detection, tends to over-protect the network against intrusions and thereby increases the FAR. Hence the audit data used in anomaly detection to detect novel attacks can be enhanced so that the FAR becomes negligible.

In misuse detection, also known as signature-based IDS, the performance depends largely on the known signatures of the attacks. These signatures are obtained from the data set used in the detection of intrusions. This data set is generally derived from online data exchange over a period of time, covering different types of possible intrusion attacks. The quality of the data used for detecting intrusion attacks is therefore of utmost importance, because the chances of detection are high if the data set referenced by the IDS encompasses most of the attacks. Hence the attributes of the data referenced by the IDS should be selected critically to ensure maximum coverage of attacks. Duplicated and unnecessary attributes also need to be identified and eliminated from the data set, because this elimination lowers the complexity of the data set and hence reduces the time taken to detect an attack. The contribution of the various attributes of the data set referenced by the IDS therefore needs to be estimated.

Studying the contribution of each attribute to intrusion detection makes it possible to rank the attributes in the order of their usefulness for detecting intrusions effectively. The ranking can help eliminate the attributes that are least important with respect to IDS. This exclusion of attributes reduces the dimensionality of the data set, thereby adding efficiency to the IDS.

5.2.1 DESIGN

Fig. 5.1 shows the design of the proposed work. A systematic approach is used to make the fifteen possible configurations of the KDD Cup data set based on the four labels given to the attributes.
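The fifteen configurations follow directly from the four labels, and a short sketch can make the enumeration concrete. The Python below is illustrative only and is not part of the original Weka workflow; the function names are hypothetical, and the per-label attribute counts (9, 13, 9, 10) are taken from the single-label rows of Table 5.3.

```python
from itertools import combinations

# Attributes per label, as given by the single-label rows of Table 5.3.
LABEL_SIZES = {"B": 9, "C": 13, "T": 9, "H": 10}


def label_configurations(labels="BCTH"):
    """Return every non-empty combination of labels, largest first."""
    configs = []
    for size in range(len(labels), 0, -1):
        for combo in combinations(labels, size):
            configs.append("".join(combo))
    return configs


def attribute_count(config):
    """Number of attributes kept when only the given labels are retained."""
    return sum(LABEL_SIZES[label] for label in config)


if __name__ == "__main__":
    # Prints the serial number, combination and attribute count of each configuration.
    for number, config in enumerate(label_configurations(), start=1):
        print(number, config, attribute_count(config))
```

Enumerating by decreasing size reproduces the order used in Table 5.3 (BCTH first, the single labels last), and summing the label sizes gives the attribute count of each configuration.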

Figure 5.1: Architectural design of proposed work. (Blocks: list the fifteen different label configurations of data; prepare training data file; prepare test data file; training, testing and simulation; confusion matrix; repeat for fifteen data files; tabulate the results; result analysis.)

Sr. No.  Attribute Class Combination  # Attributes  B  C  T  H
1        BCTH                         41
2        BCT                          31                     X
3        BCH                          32                  X
4        BTH                          28               X
5        CTH                          32            X
6        BC                           22                  X  X
7        BT                           18               X     X
8        BH                           19               X  X
9        CT                           22            X        X
10       CH                           23            X     X
11       TH                           19            X  X
12       B                            9                X  X  X
13       C                            13            X     X  X
14       T                            9             X  X     X
15       H                            10            X  X  X

Table 5.3: Combinations of attributes with maximum four labels for KDD Cup data set. An X marks a label excluded from the combination.

The total number of attribute labels is four (N = 4), hence sixteen different combinations are possible (2^N). The NULL combination, comprising no label and zero attributes, is excluded. Hence, there are fifteen combinations available to form different configurations of the data set (2^N - 1) [43]. The data set, which includes the training as well as the test files, is preprocessed individually to develop the fifteen configurations of Table 5.3. Out of the total 41 attributes (excluding the class attribute), the attributes not required for a given configuration are removed from the training and test data files. The last attribute, Class, which remains in all fifteen configurations, describes whether the instance is a normal record or an anomalous one.

5.3 EXPERIMENTAL SETUP

The experimental setup covers the data set employed for the study, the tool used for simulation and the research methodology applied to conduct the tests that generate the results. Weka [64], [67] is used for preprocessing and for simulation of the KDD Cup data set on the chosen classification algorithms. The KDD Cup data set files for training and testing are preprocessed in Weka. The fifteen data set configurations are simulated for three classification algorithms: Random Forest, OneR and Naïve Bayes. This study considers the binary classification data set whose details are listed in Table 5.4.

Table 5.4: Data instances of NSL-KDD Cup data set. (Columns: Normal Class Instances, Anomalous Class Instances, Total. Rows: KDDTrain+_20Percent (Training Data), KDDTest+ (Test Data).)

For the validation study, the number of attributes under scrutiny is 41, and the last attribute is the class, which gives the classification result. The instances used are those of the KDDTrain+_20Percent file. The Ranker algorithm [64] is used to rank the 41 attributes of the KDD Cup data set. This algorithm ranks the attributes by their individual assessment: Ranker is the search technique and InfoGainAttributeEval is the attribute evaluator. The higher the information gain, the better the capability of the attribute to discriminate between classes.
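To illustrate what an information gain ranking measures, the following sketch computes H(Class) - H(Class | Attribute) for nominal attributes and sorts the attributes by that score. It is a simplified illustration, not Weka's implementation: numeric attributes would first need to be discretized, which is omitted here, and the function names and the toy records in the usage note are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(values, labels):
    """H(class) - H(class | attribute) for one nominal attribute."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_attributes(columns, labels):
    """Return (attribute name, gain) pairs sorted from highest to lowest gain."""
    gains = {name: information_gain(vals, labels) for name, vals in columns.items()}
    return sorted(gains.items(), key=lambda item: item[1], reverse=True)
```

With four toy records labeled `["normal", "normal", "anomaly", "anomaly"]`, an attribute with values `["SF", "SF", "S0", "S0"]` separates the classes perfectly and scores 1 bit, while an attribute that is constant across all records scores 0, which corresponds to the zero-merit attributes at the bottom of a ranking.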

5.3.1 EVALUATION METRICS

The evaluation metrics help assess the performance of an IDS. The metrics most commonly used to measure the efficiency of an IDS are accuracy, DR, FAR, precision and F-score. All these metrics are derived from the four basic result elements of any classification algorithm (TP, TN, FP and FN), presented in the form of a confusion matrix, which sets the actual instance classes against the predicted classification results. A good IDS tries to achieve the maximum possible accuracy, F-score and DR with minimum FAR.

5.4 SIMULATION RESULTS

5.4.1 IMPLEMENTATION

The implementation of the design presented in the last section is shown with the help of snapshots of the Weka tool.

Figure 5.2: Preprocessing of data set.

The implementation is the same as presented in the last chapter, except that instead of the Random Tree algorithm, the algorithms used are Naïve Bayes, Random Forest and OneR. As in the last chapter, Fig. 5.2 depicts the preprocessing of the data set. Fig. 5.3 shows the list of training files and Fig. 5.4 shows the list of test data files.

Figure 5.3: List of training files.

Figure 5.4: List of test files.

Fifteen training and test files are again considered for implementation. The purpose is to test the results on another set of algorithms and to validate them further against the existing studies. The number of instances is the same in each of the fifteen training files and in each of the fifteen test files.

5.4.2 OBSERVATIONS

The simulation results from the confusion matrix for the fifteen configurations of the data set are shown in Table 5.5 for Naïve Bayes, Table 5.6 for Random Forest and Table 5.7 for the OneR algorithm. The results comprise the TP, TN, FP and FN values for each of the fifteen combinations with respect to the three selected classifiers.

Table 5.5: Result set for Naïve Bayes algorithm. (Columns: Sr. No., Attribute Class Combination, TN, FN, FP, TP. Rows: the fifteen combinations BCTH, BCT, BCH, BTH, CTH, BC, BT, BH, CT, CH, TH, B, C, T, H.)

The summary of results for DR is presented in Table 5.8 and for FAR in Table 5.9. The key metrics used in the study are DR and FAR. The classification results in the form of DR and FAR for all fifteen attribute class combinations are presented for the Random Forest, OneR and Naïve Bayes classifiers. These classification results are used to compute the evaluation metrics and thereby to assess and compare the performance of the IDS.
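As a worked illustration of how DR and FAR are obtained from the confusion matrix entries, the sketch below assumes the standard definitions, with the anomalous class treated as positive: DR = TP / (TP + FN) and FAR = FP / (FP + TN). The chapter does not spell these formulas out, so they are an assumption here, and the counts in the usage note are hypothetical, not values from Tables 5.5 to 5.7.

```python
def detection_rate(tp, fn):
    """DR (%): share of anomalous records that are flagged as anomalous."""
    return 100.0 * tp / (tp + fn)


def false_alarm_rate(fp, tn):
    """FAR (%): share of normal records that are wrongly flagged as anomalous."""
    return 100.0 * fp / (fp + tn)


def accuracy(tp, tn, fp, fn):
    """Accuracy (%): share of all records that are classified correctly."""
    return 100.0 * (tp + tn) / (tp + tn + fp + fn)
```

For hypothetical counts TP = 90, FN = 10, FP = 5 and TN = 95, these functions give a DR of 90%, a FAR of 5% and an accuracy of 92.5%. A configuration is preferable when it raises the first value and lowers the second at the same time.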

Table 5.6: Result set for Random Forest algorithm. (Columns: Sr. No., Attribute Class Combination, TN, FN, FP, TP. Rows: the fifteen combinations BCTH, BCT, BCH, BTH, CTH, BC, BT, BH, CT, CH, TH, B, C, T, H.)

Table 5.7: Result set for OneR algorithm. (Columns: Sr. No., Attribute Class Combination, TN, FN, FP, TP. Rows: the same fifteen combinations.)

Table 5.8: Detection rate for Random Forest, OneR and Naïve Bayes algorithm. (Columns: Sr. No., Attribute Class Combination, Detection Rate (%) for Random Forest, OneR and Naïve Bayes. Rows: the same fifteen combinations.)

Table 5.9: False alarm rate for Random Forest, OneR and Naïve Bayes algorithm. (Columns: Sr. No., Attribute Class Combination, False Alarm Rate (%) for Random Forest, OneR and Naïve Bayes. Rows: the same fifteen combinations.)

Rank  Ranked Attribute              Attribute Class  Average Merit
1     src_bytes                     B
2     service                       B
3     dst_bytes                     B
4     flag                          B
5     diff_srv_rate                 T
6     same_srv_rate                 T
7     dst_host_srv_count            H
8     dst_host_same_srv_rate        H
9     dst_host_diff_srv_rate        H
10    dst_host_serror_rate          H
11    logged_in                     C
12    dst_host_srv_serror_rate      H
13    serror_rate                   T                0.39
14    count                         T
15    srv_serror_rate               T
16    dst_host_srv_diff_host_rate   H
17    dst_host_count                H
18    dst_host_same_src_port_rate   H
19    srv_diff_host_rate            T
20    srv_count                     T
21    dst_host_srv_rerror_rate      H

Table 5.10: Simulation results on ranker algorithm (ranking 1-21), highest rank on top.

Rank  Ranked Attribute              Attribute Class  Average Merit
22    protocol_type                 B
23    rerror_rate                   T
24    dst_host_rerror_rate          H
25    srv_rerror_rate               T
26    duration                      B
27    hot                           C
28    wrong_fragment                B                0.01
29    num_compromised               C
30    num_root                      C
31    num_access_files              C
32    is_guest_login                C
33    num_file_creations            C
34    su_attempted                  C
35    root_shell                    C                0
36    land                          B                0
37    num_shells                    C                0
38    num_failed_logins             C                0
39    urgent                        B                0
40    num_outbound_cmds             C                0
41    is_host_login                 C                0

Table 5.11: Simulation results on ranker algorithm (ranking 22-41), highest rank on top.

In this chapter, the results are analyzed for DR and FAR separately, with emphasis on the one-class, two-class and three-class combinations of attributes for all three classification algorithms under study. Conclusions are drawn only for those highlighted points on the plots where each of the three classification algorithms shows significant behavior. This emphasizes the dominant behavior of each label class and thus of its associated attributes. Algorithms from different classes of machine learning are therefore considered, to ensure that the results are unbiased and in agreement with one another.

The results of the ranker algorithm simulated on the NSL-KDD data set attributes are listed in Table 5.10 and Table 5.11. The observations include the average merit of each attribute and the observed ranking. The tables also present the label of each ranked attribute. The focus of this validation part of the study is to carry out the analysis according to the four labels, not attribute by attribute. For this purpose, the results for the individual attributes are grouped under the four labels.

5.4.3 DISCUSSION

Fig. 5.5 to Fig. 5.7 present the DR plots for the three classification algorithms with respect to one, two and three labeled attribute combinations respectively.

Figure 5.5: Detection rate distribution considering single attribute class for Random Forest, OneR and Naïve Bayes.

Considering the analysis of DR, Fig. 5.5 depicts the plot of DR with respect to a single class of attributes. The green arrow highlights the high DR of all three classifiers for the content labeled attributes, and the red arrow highlights the low DR for the traffic labeled attributes. Hence it can be concluded from Fig. 5.5 that the content class attributes contribute significantly towards achieving high DR, whereas the traffic class attributes lower it.

Figure 5.6: Detection rate distribution considering two attribute classes for Random Forest, OneR and Naïve Bayes.

Figure 5.7: Detection rate distribution considering three attribute classes for Random Forest, OneR and Naïve Bayes.

In Fig. 5.6, the red arrow highlights the combination of basic and host (BH) labeled attributes, which shows a lower DR than the basic and traffic (BT) combination for all three classifiers. Hence, the attributes of the host label perform poorly for DR compared to the traffic labeled attributes. Similarly, Fig. 5.7 shows the plot of all the three-label combinations in comparison to the original set of four labeled attributes. The green arrow highlights the BCT combination, which shows a DR nearly equal to that of the BCTH labeled attributes. The red arrow highlights the CTH combination, in which the basic attributes are absent, and it can be observed that the DR falls significantly for all three classification algorithms.

Considering FAR for the three classifiers, Fig. 5.8 to Fig. 5.10 are plotted with respect to one, two and three class attributes respectively. Fig. 5.8 shows the FAR for the three classifiers with respect to a single class of attributes. The green arrow highlights that the FAR is minimum for the basic attributes for all three classifiers, whereas the red arrow highlights the contribution of the content labeled attributes towards a higher FAR. It can also be observed that the FAR is lower for the traffic labeled attributes than for the content labeled attributes.

Figure 5.8: False alarm rate distribution considering single attribute class for Random Forest, OneR and Naïve Bayes.

The arrow in Fig. 5.9 highlights the BH labeled attributes, which present a better FAR than the BT class of attributes for two of the three classifiers, whereas the third classifier, OneR, shows a constant value. Similarly, the arrow in Fig. 5.10 emphasizes the BCT labeled attributes, in which the host attributes are absent and which show a comparatively high FAR for the three classifiers; hence it can be concluded that the host attributes contribute positively to trimming down the FAR.

Figure 5.9: False alarm rate distribution considering two attribute classes for Random Forest, OneR and Naïve Bayes.

Figure 5.10: False alarm rate distribution considering three attribute classes for Random Forest, OneR and Naïve Bayes.

In this section, the results are discussed with emphasis on the key observations. The three classification algorithms under study are intentionally selected from different classes of machine learning algorithms to make sure that the outcome of the simulation is independent of any particular classifier. The analysis of the results shows that the outcome of the BC (22 attributes) labeled data set is moderate, whereas the BTH and CTH configurations show poor results in comparison to the BCTH data configuration. Another observation of prime concern is that the contribution of the BH (19 attributes) labeled attributes is almost equivalent to the contribution of the BCTH (41 attributes) data set configuration. Hence, it can be deduced that the BH labeled attributes give results similar to those of the BCTH configuration at a lower computational cost, as the number of attributes is reduced significantly.

Table 5.12 concludes the studied behavior of DR and FAR with respect to the class-wise distribution of attributes for the KDD Cup data set. It is observed that the Basic label attributes contribute the most to achieving the highest DR, whereas the contribution of the Host attributes is the least. Similarly, the contribution of the Basic label attributes is the highest in achieving minimum FAR, whereas the Content label attributes have the least significant role in reducing FAR. Therefore, the four classes of attributes are ranked for high DR and low FAR separately, with rank 1 denoting the maximum dominance.

Ranking of Attribute Labels   High Detection Rate   Low False Alarm Rate
1                             Basic                 Basic
2                             Content               Host
3                             Traffic               Traffic
4                             Host                  Content

Table 5.12: Summary of results.

Hence, Table 5.12 summarizes the class-wise contribution of the 41 attributes in accomplishing high DR and low FAR, which can help recognize significant attributes with respect to these four labels. The results of this study can be used further for feature selection: instead of trying all 41 attributes, feature selection can be applied to selected labels.
It can be inferred from Table 5.10 and Table 5.11 that the basic labeled attributes need minimum feature reduction, whereas the traffic class attributes need maximum feature reduction, followed by the content and host classes.

Considering the validation part of the study, Table 5.13 presents the number of attributes of each label falling in the top 10 ranks, ranks 11 to 20, ranks 21 to 30 and ranks 31 to 41. Fig. 5.11 is plotted from the ranking of attributes and their labels given in Table 5.10 and Table 5.11. For example, in ranks 1 to 10, four attributes of the basic label, two attributes of the traffic label and four attributes of the host label are present. The distribution of all four labels is presented similarly for every ranking category.

Ranking    Basic  Content  Traffic  Host
1 to 10    4      0        2        4
11 to 20   0      1        5        4
21 to 30   3      3        2        2
31 to 41   2      9        0        0

Table 5.13: Distribution of ranked attributes in labels.

Figure 5.11: Ranked attributes under four attribute labels.

Figure 5.11 highlights the contribution of each of the four labels to the attribute ranking, going from the highest to the lowest rank. It can be observed from the figure that 44% of the basic attributes lie in the top ten ranks, along with 40% of the host attributes and 22% of the traffic attributes. It is interesting to note that no attribute of the content label stands in the top ten ranks. Another remarkable observation is that 69% of the content attributes stand in the ranking category 31 to 41. All the traffic and host attributes stand within the top 30 ranks. It can also be noted that 78% of the basic attributes lie within the top 30 ranks.
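The label counts and percentages quoted above can be re-derived mechanically from the attribute classes listed in Table 5.10 and Table 5.11. The sketch below is an illustrative check rather than part of the original study; the string of labels is transcribed from those two tables in rank order, and the function names are hypothetical.

```python
from collections import Counter

# Attribute class at each rank (rank 1 first), transcribed from
# Table 5.10 and Table 5.11 in blocks of ranks 1-10, 11-20, 21-30, 31-41.
RANKED_LABELS = "BBBBTTHHHH" "CHTTTHHHTT" "HBTHTBCBCC" "CCCCCBCCBCC"

BUCKETS = {"1-10": (1, 10), "11-20": (11, 20), "21-30": (21, 30), "31-41": (31, 41)}


def distribution(ranked=RANKED_LABELS):
    """Count the attributes of each label inside every rank bucket."""
    return {name: Counter(ranked[lo - 1:hi]) for name, (lo, hi) in BUCKETS.items()}


def share_in_range(label, lo, hi, ranked=RANKED_LABELS):
    """Rounded percentage of a label's attributes ranked between lo and hi."""
    return round(100 * ranked[lo - 1:hi].count(label) / ranked.count(label))
```

For instance, `share_in_range("B", 1, 10)` gives 44 and `share_in_range("C", 31, 41)` gives 69, matching the percentages stated for the basic and content labels.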

It can be observed that the attributes ranked from 35 to 41 have zero average merit, that is, zero information gain. In other words, these attributes do not contribute to the classification process. Out of these seven attributes, five belong to the content class; that is, 38% of the content class attributes have zero significance. Similarly, in the rank range from 28 to 34, the contribution of the attributes is least significant. Out of these seven attributes, six belong to the content class. From this discussion, it can be concluded that 85% of the 13 content class attributes are unimportant. Finally, it is worth mentioning that 80% of the host attributes and 78% of the traffic attributes stand in the top 20 ranks.

Table 5.14: Feature selection based studies. (Columns: Feature Selection, Reduced Features, Basic, Content, Traffic, Host. Rows: Ranker Variants, Multiple Feature Evaluation, BestFirst+CFSSubsetEval, GeneticSearch+CFSSubsetEval, GreedyStepwise+CFSSubsetEval.)

These results can be validated against the studies on the four attribute labels and against feature selection techniques. Table 5.14 lists the results of some feature selection based studies for an overall comparison. In the case of the ranker variants [26], different feature selection methods are used with the ranker algorithm, such as information gain attribute evaluation, gain ratio attribute evaluation and correlation attribute evaluation. With these, the features are reduced to 33, whose label details are provided in the table. In the multiple feature evaluation technique [27], the focus is on removing those attributes that have no role in identifying an attack. This is done by preparing two lists of significant features, one identified by the classifiers and the other central to all attack classes. A common list of significant features is identified, on which the Gradually ADD feature and Gradually DELETE feature techniques are applied, resulting in 11 reduced features.

The other entries in Table 5.14 present the search methods BestFirst, GeneticSearch and GreedyStepwise with CFSSubsetEval [28]. The results obtained from BestFirst and GreedyStepwise are the same, not only in the number of reduced attributes but also in the number of attributes selected from each label.

Fig. 5.12 is prepared with reference to the results presented in Table 5.14. The figure presents the percentage contribution of each label for each feature selection technique. For example, when the variants of the ranker algorithm are used, 89% of the 9 basic attributes, 62% of the 13 content attributes, 100% of the 9 traffic attributes and 80% of the 10 host attributes are present in the reduced set of 33 attributes. It can be inferred from this figure that, in all five evaluations, the minimum contribution comes from the content label. In other words, most of the content class attributes are not considered contributing elements. In the case of BestFirst, for example, only 8% of the content attributes are present in the reduced data set, which means that 92% of the attributes of this label are redundant. In four of the cases, the traffic label attributes prevail over the host attributes, the exception being GeneticSearch. Finally, the basic label attributes count above all others in the entire set of studies, with 67% to 89% of these attributes retained, which indicates their definite inclusion.

Figure 5.12: Contribution of attribute labels in feature selection.

The studies on attribute labels presented in this chapter and the last chapter [43, 72], carried out with different classification algorithms, give results that are almost the same as those observed in the ranker study and in the other feature selection based studies.
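The percentage contribution of a label is the number of its attributes retained by a technique divided by the size of the label. The sketch below works this through for the ranker variants; the per-label counts (8, 8, 9, 8) are back-computed here from the percentages and the total of 33 quoted in the text, and the function name is hypothetical.

```python
# Attributes per label in the full data set (see Table 5.3).
LABEL_SIZES = {"B": 9, "C": 13, "T": 9, "H": 10}


def label_shares(selected):
    """Rounded percentage of each label's attributes kept in a reduced set."""
    return {
        label: round(100 * selected.get(label, 0) / size)
        for label, size in LABEL_SIZES.items()
    }


# Back-computed from the quoted 89%, 62%, 100% and 80% (an assumption, see above).
ranker_variants = {"B": 8, "C": 8, "T": 9, "H": 8}
```

Calling `label_shares(ranker_variants)` reproduces the quoted shares of 89%, 62%, 100% and 80%, and the counts sum to the 33 reduced features.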

The overall comparison of the work, comprising the label based study, the ranker results and the feature selection based studies, is presented in Table 5.15, thereby validating the label based studies. The results for the basic and content labels are exactly the same for each style of label evaluation, and there is no significant difference in the contribution of the traffic and host labels.

Ranking of Attribute Labels   Ranker Based   Label Based   Feature Selection Based
1                             Basic          Basic         Basic
2                             Host           Traffic       Traffic
3                             Traffic        Host          Host
4                             Content        Content       Content

Table 5.15: Ranking of attribute labels by feature selection and label based studies.

5.5 CONCLUSION

The KDD Cup data set was used to examine the behavior of the DR and FAR metrics for IDS. The 41 attributes of the KDD Cup data set were classified under the Basic, Content, Traffic and Host labels. This thesis explores the contribution of the KDD Cup data set attributes, with respect to these four labels, to improving DR and FAR. The study was carried out on the Random Forest, OneR and Naïve Bayes classification algorithms. A significant contribution of the basic class attributes was observed for IDS, along with notable observations about the attributes of the other labels. Finally, the four attribute labels were ranked by their dominance in enhancing detection and reducing the FAR.

Further, the study validates the contribution of the four labels of the KDD data set attributes by ranking the individual attributes. The validation study was supplemented with research on feature selection, which focuses on the essential attributes only. The attribute label evaluation and validation is done by analyzing the ranker results by attribute label rather than attribute by attribute, so that the attribute label dominance in IDS can be compared with the results of various feature selection techniques rearranged into labeled attribute results.

It is concluded that the basic label attributes are the most significant and the content attributes are the least significant. The contributions of the traffic and host attributes are quite close, but a difference is observable when DR and FAR are considered.

This study can help improve the data set by reducing the bias of results towards attributes of a particular label, which results in high FAR, and hence can enhance the data set to attain an efficient IDS for anomaly detection. The results of this study can be used for selective feature selection on attributes of particular labels rather than on all the individual attributes.

5.6 SUMMARY

This chapter analyses the contribution of the four class labels of the NSL-KDD Cup data set with respect to DR and FAR. The process is implemented for three machine learning algorithms: Naïve Bayes, Random Forest and OneR. The results for all fifteen configurations are compared for each of the three algorithms. Chapter 6 presents a proposed Negative-Positive Ratio (NPR) metric, whose aim is to provide a generalized evaluation of machine learning algorithms. This metric is designed keeping in view the imbalance in the data sets used for IDS.


FUZZY KERNEL C-MEANS ALGORITHM FOR INTRUSION DETECTION SYSTEMS FUZZY KERNEL C-MEANS ALGORITHM FOR INTRUSION DETECTION SYSTEMS 1 ZUHERMAN RUSTAM, 2 AINI SURI TALITA 1 Senior Lecturer, Department of Mathematics, Faculty of Mathematics and Natural Sciences, University

More information

ScienceDirect. Analysis of KDD Dataset Attributes - Class wise For Intrusion Detection

ScienceDirect. Analysis of KDD Dataset Attributes - Class wise For Intrusion Detection Available online at www.sciencedirect.com ScienceDirect Procedia Computer Science 57 (2015 ) 842 851 3rd International Conference on Recent Trends in Computing 2015 (ICRTC-2015) Analysis of KDD Dataset

More information

Detection of DDoS Attack on the Client Side Using Support Vector Machine

Detection of DDoS Attack on the Client Side Using Support Vector Machine Detection of DDoS Attack on the Client Side Using Support Vector Machine Donghoon Kim * and Ki Young Lee** *Department of Information and Telecommunication Engineering, Incheon National University, Incheon,

More information

A Hybrid Anomaly Detection Model using G-LDA

A Hybrid Anomaly Detection Model using G-LDA A Hybrid Detection Model using G-LDA Bhavesh Kasliwal a, Shraey Bhatia a, Shubham Saini a, I.Sumaiya Thaseen a, Ch.Aswani Kumar b a, School of Computing Science and Engineering, VIT University, Chennai,

More information

INTRUSION DETECTION SYSTEM

INTRUSION DETECTION SYSTEM INTRUSION DETECTION SYSTEM Project Trainee Muduy Shilpa B.Tech Pre-final year Electrical Engineering IIT Kharagpur, Kharagpur Supervised By: Dr.V.Radha Assistant Professor, IDRBT-Hyderabad Guided By: Mr.

More information

An Efficient Decision Tree Model for Classification of Attacks with Feature Selection

An Efficient Decision Tree Model for Classification of Attacks with Feature Selection An Efficient Decision Tree Model for Classification of Attacks with Feature Selection Akhilesh Kumar Shrivas Research Scholar, CVRU, Bilaspur (C.G.), India S. K. Singhai Govt. Engineering College Bilaspur

More information

Feature Reduction for Intrusion Detection Using Linear Discriminant Analysis

Feature Reduction for Intrusion Detection Using Linear Discriminant Analysis Feature Reduction for Intrusion Detection Using Linear Discriminant Analysis Rupali Datti 1, Bhupendra verma 2 1 PG Research Scholar Department of Computer Science and Engineering, TIT, Bhopal (M.P.) rupal3010@gmail.com

More information

A Study on NSL-KDD Dataset for Intrusion Detection System Based on Classification Algorithms

A Study on NSL-KDD Dataset for Intrusion Detection System Based on Classification Algorithms ISSN (Online) 2278-121 ISSN (Print) 2319-594 Vol. 4, Issue 6, June 215 A Study on NSL-KDD set for Intrusion Detection System Based on ification Algorithms L.Dhanabal 1, Dr. S.P. Shantharajah 2 Assistant

More information

CHAPTER 4 DATA PREPROCESSING AND FEATURE SELECTION

CHAPTER 4 DATA PREPROCESSING AND FEATURE SELECTION 55 CHAPTER 4 DATA PREPROCESSING AND FEATURE SELECTION In this work, an intelligent approach for building an efficient NIDS which involves data preprocessing, feature extraction and classification has been

More information

Data Reduction and Ensemble Classifiers in Intrusion Detection

Data Reduction and Ensemble Classifiers in Intrusion Detection Second Asia International Conference on Modelling & Simulation Data Reduction and Ensemble Classifiers in Intrusion Detection Anazida Zainal, Mohd Aizaini Maarof and Siti Mariyam Shamsuddin Faculty of

More information

International Journal of Scientific & Engineering Research, Volume 6, Issue 6, June ISSN

International Journal of Scientific & Engineering Research, Volume 6, Issue 6, June ISSN International Journal of Scientific & Engineering Research, Volume 6, Issue 6, June-2015 1496 A Comprehensive Survey of Selected Data Mining Algorithms used for Intrusion Detection Vivek Kumar Srivastava

More information

Big Data Analytics: Feature Selection and Machine Learning for Intrusion Detection On Microsoft Azure Platform

Big Data Analytics: Feature Selection and Machine Learning for Intrusion Detection On Microsoft Azure Platform Big Data Analytics: Feature Selection and Machine Learning for Intrusion Detection On Microsoft Azure Platform Nachirat Rachburee and Wattana Punlumjeak Department of Computer Engineering, Faculty of Engineering,

More information

Anomaly detection using machine learning techniques. A comparison of classification algorithms

Anomaly detection using machine learning techniques. A comparison of classification algorithms Anomaly detection using machine learning techniques A comparison of classification algorithms Henrik Hivand Volden Master s Thesis Spring 2016 Anomaly detection using machine learning techniques Henrik

More information

Journal of Asian Scientific Research EFFICIENCY OF SVM AND PCA TO ENHANCE INTRUSION DETECTION SYSTEM. Soukaena Hassan Hashem

Journal of Asian Scientific Research EFFICIENCY OF SVM AND PCA TO ENHANCE INTRUSION DETECTION SYSTEM. Soukaena Hassan Hashem Journal of Asian Scientific Research journal homepage: http://aessweb.com/journal-detail.php?id=5003 EFFICIENCY OF SVM AND PCA TO ENHANCE INTRUSION DETECTION SYSTEM Soukaena Hassan Hashem Computer Science

More information

Classification of Attacks in Data Mining

Classification of Attacks in Data Mining Classification of Attacks in Data Mining Bhavneet Kaur Department of Computer Science and Engineering GTBIT, New Delhi, Delhi, India Abstract- Intrusion Detection and data mining are the major part of

More information

Machine Learning for Network Intrusion Detection

Machine Learning for Network Intrusion Detection Machine Learning for Network Intrusion Detection ABSTRACT Luke Hsiao Stanford University lwhsiao@stanford.edu Computer networks have become an increasingly valuable target of malicious attacks due to the

More information

NAVAL POSTGRADUATE SCHOOL THESIS

NAVAL POSTGRADUATE SCHOOL THESIS NAVAL POSTGRADUATE SCHOOL MONTEREY, CALIFORNIA THESIS NEURAL DETECTION OF MALICIOUS NETWORK ACTIVITIES USING A NEW DIRECT PARSING AND FEATURE EXTRACTION TECHNIQUE by Cheng Hong Low September 2015 Thesis

More information

Experiments with Applying Artificial Immune System in Network Attack Detection

Experiments with Applying Artificial Immune System in Network Attack Detection Kennesaw State University DigitalCommons@Kennesaw State University KSU Proceedings on Cybersecurity Education, Research and Practice 2017 KSU Conference on Cybersecurity Education, Research and Practice

More information

Genetic-fuzzy rule mining approach and evaluation of feature selection techniques for anomaly intrusion detection

Genetic-fuzzy rule mining approach and evaluation of feature selection techniques for anomaly intrusion detection Pattern Recognition 40 (2007) 2373 2391 www.elsevier.com/locate/pr Genetic-fuzzy rule mining approach and evaluation of feature selection techniques for anomaly intrusion detection Chi-Ho Tsang, Sam Kwong,

More information

Why Machine Learning Algorithms Fail in Misuse Detection on KDD Intrusion Detection Data Set

Why Machine Learning Algorithms Fail in Misuse Detection on KDD Intrusion Detection Data Set Why Machine Learning Algorithms Fail in Misuse Detection on KDD Intrusion Detection Data Set Maheshkumar Sabhnani and Gursel Serpen Electrical Engineering and Computer Science Department The University

More information

PAYLOAD BASED INTERNET WORM DETECTION USING NEURAL NETWORK CLASSIFIER

PAYLOAD BASED INTERNET WORM DETECTION USING NEURAL NETWORK CLASSIFIER PAYLOAD BASED INTERNET WORM DETECTION USING NEURAL NETWORK CLASSIFIER A.Tharani MSc (CS) M.Phil. Research Scholar Full Time B.Leelavathi, MCA, MPhil., Assistant professor, Dept. of information technology,

More information

A COMPARATIVE STUDY OF CLASSIFICATION MODELS FOR DETECTION IN IP NETWORKS INTRUSIONS

A COMPARATIVE STUDY OF CLASSIFICATION MODELS FOR DETECTION IN IP NETWORKS INTRUSIONS A COMPARATIVE STUDY OF CLASSIFICATION MODELS FOR DETECTION IN IP NETWORKS INTRUSIONS 1 ABDELAZIZ ARAAR, 2 RAMI BOUSLAMA 1 Assoc. Prof., College of Information Technology, Ajman University, UAE 2 MSIS,

More information

Feature Selection in UNSW-NB15 and KDDCUP 99 datasets

Feature Selection in UNSW-NB15 and KDDCUP 99 datasets Feature Selection in UNSW-NB15 and KDDCUP 99 datasets JANARTHANAN, Tharmini and ZARGARI, Shahrzad Available from Sheffield Hallam University Research Archive (SHURA) at: http://shura.shu.ac.uk/15662/ This

More information

DATA REDUCTION TECHNIQUES TO ANALYZE NSL-KDD DATASET

DATA REDUCTION TECHNIQUES TO ANALYZE NSL-KDD DATASET INTERNATIONAL JOURNAL OF COMPUTER ENGINEERING & TECHNOLOGY (IJCET) International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-6367(Print), ISSN 0976 6367(Print) ISSN 0976 6375(Online)

More information

Novel Technique of Extraction of Principal Situational Factors for NSSA

Novel Technique of Extraction of Principal Situational Factors for NSSA 48 Novel Technique of Extraction of Principal Situational Factors for NSSA Pardeep Bhandari, Asst. Prof.,Computer Sc., Doaba College, Jalandhar bhandaridcj@gmail.com Abstract The research on Network Security

More information

Modeling Intrusion Detection Systems With Machine Learning And Selected Attributes

Modeling Intrusion Detection Systems With Machine Learning And Selected Attributes Modeling Intrusion Detection Systems With Machine Learning And Selected Attributes Thaksen J. Parvat USET G.G.S.Indratrastha University Dwarka, New Delhi 78 pthaksen.sit@sinhgad.edu Abstract Intrusion

More information

Independent degree project - first cycle Bachelor s thesis 15 ECTS credits

Independent degree project - first cycle Bachelor s thesis 15 ECTS credits Fel! Hittar inte referenskälla. - Fel! Hittar inte referenskälla.fel! Hittar inte referenskälla. Table of Contents Independent degree project - first cycle Bachelor s thesis 15 ECTS credits Master of Science

More information

A hybrid network intrusion detection framework based on random forests and weighted k-means

A hybrid network intrusion detection framework based on random forests and weighted k-means Ain Shams Engineering Journal (2013) 4, 753 762 Ain Shams University Ain Shams Engineering Journal www.elsevier.com/locate/asej www.sciencedirect.com ELECTRICAL ENGINEERING A hybrid network intrusion detection

More information

Feature Selection in the Corrected KDD -dataset

Feature Selection in the Corrected KDD -dataset Feature Selection in the Corrected KDD -dataset ZARGARI, Shahrzad Available from Sheffield Hallam University Research Archive (SHURA) at: http://shura.shu.ac.uk/17048/ This document is the author deposited

More information

Association Rule Mining in Big Data using MapReduce Approach in Hadoop

Association Rule Mining in Big Data using MapReduce Approach in Hadoop GRD Journals Global Research and Development Journal for Engineering International Conference on Innovations in Engineering and Technology (ICIET) - 2016 July 2016 e-issn: 2455-5703 Association Rule Mining

More information

Adaptive Framework for Network Intrusion Detection by Using Genetic-Based Machine Learning Algorithm

Adaptive Framework for Network Intrusion Detection by Using Genetic-Based Machine Learning Algorithm IJCSNS International Journal of Computer Science and Network Security, VOL.9 No.4, April 2009 55 Adaptive Framework for Network Intrusion Detection by Using Genetic-Based Machine Learning Algorithm Wafa'

More information

Best First and Greedy Search Based CFS and Naïve Bayes Algorithms for Hepatitis Diagnosis

Best First and Greedy Search Based CFS and Naïve Bayes Algorithms for Hepatitis Diagnosis Best First and Greedy Search Based CFS and Naïve Bayes Algorithms for Hepatitis Diagnosis CHAPTER 3 BEST FIRST AND GREEDY SEARCH BASED CFS AND NAÏVE BAYES ALGORITHMS FOR HEPATITIS DIAGNOSIS 3.1 Introduction

More information

LAWRA: a layered wrapper feature selection approach for network attack detection

LAWRA: a layered wrapper feature selection approach for network attack detection SECURITY AND COMMUNICATION NETWORKS Security Comm. Networks 2015; 8:3459 3468 Published online 26 May 2015 in Wiley Online Library (wileyonlinelibrary.com)..1270 RESEARCH ARTICLE LAWRA: a layered wrapper

More information

An Intelligent CRF Based Feature Selection for Effective Intrusion Detection

An Intelligent CRF Based Feature Selection for Effective Intrusion Detection 44 The International Arab Journal of Information Technology An Intelligent CRF Based Feature Selection for Effective Intrusion Detection Sannasi Ganapathy 1, Pandi Vijayakumar 2, Palanichamy Yogesh 1,

More information

Fuzzy Grids-Based Intrusion Detection in Neural Networks

Fuzzy Grids-Based Intrusion Detection in Neural Networks Fuzzy Grids-Based Intrusion Detection in Neural Networks Izani Islam, Tahir Ahmad, Ali H. Murid Abstract: In this paper, a framework is used for intrusion detection that shows the effectiveness of data

More information

The Caspian Sea Journal ISSN: A Study on Improvement of Intrusion Detection Systems in Computer Networks via GNMF Method

The Caspian Sea Journal ISSN: A Study on Improvement of Intrusion Detection Systems in Computer Networks via GNMF Method Available online at http://www.csjonline.org/ The Caspian Sea Journal ISSN: 1578-7899 Volume 10, Issue 1, Supplement 4 (2016) 456-461 A Study on Improvement of Intrusion Detection Systems in Computer Networks

More information

Analysis of network traffic features for anomaly detection

Analysis of network traffic features for anomaly detection Mach Learn (2015) 101:59 84 DOI 10.1007/s10994-014-5473-9 Analysis of network traffic features for anomaly detection Félix Iglesias Tanja Zseby Received: 9 December 2013 / Accepted: 16 October 2014 / Published

More information

Data Mining Approaches for Network Intrusion Detection: from Dimensionality Reduction to Misuse and Anomaly Detection

Data Mining Approaches for Network Intrusion Detection: from Dimensionality Reduction to Misuse and Anomaly Detection Data Mining Approaches for Network Intrusion Detection: from Dimensionality Reduction to Misuse and Anomaly Detection Iwan Syarif 1,2, Adam Prugel-Bennett 1, Gary Wills 1 1 School of Electronics and Computer

More information

Anomaly based Network Intrusion Detection using Machine Learning Techniques.

Anomaly based Network Intrusion Detection using Machine Learning Techniques. Anomaly based Network Intrusion etection using Machine Learning Techniques. Tushar Rakshe epartment of Electrical Engineering Veermata Jijabai Technological Institute, Matunga, Mumbai. Vishal Gonjari epartment

More information

DESIGN AND EVALUATION OF MACHINE LEARNING MODELS WITH STATISTICAL FEATURES

DESIGN AND EVALUATION OF MACHINE LEARNING MODELS WITH STATISTICAL FEATURES EXPERIMENTAL WORK PART I CHAPTER 6 DESIGN AND EVALUATION OF MACHINE LEARNING MODELS WITH STATISTICAL FEATURES The evaluation of models built using statistical in conjunction with various feature subset

More information

Classifying Network Intrusions: A Comparison of Data Mining Methods

Classifying Network Intrusions: A Comparison of Data Mining Methods Association for Information Systems AIS Electronic Library (AISeL) AMCIS 2005 Proceedings Americas Conference on Information Systems (AMCIS) 2005 Classifying Network Intrusions: A Comparison of Data Mining

More information

CHAPTER 2 DARPA KDDCUP99 DATASET

CHAPTER 2 DARPA KDDCUP99 DATASET 44 CHAPTER 2 DARPA KDDCUP99 DATASET 2.1 THE DARPA INTRUSION-DETECTION EVALUATION PROGRAM The number of intrusions is to be found in any computer and network audit data are plentiful as well as ever-changing.

More information

Toward Building Lightweight Intrusion Detection System Through Modified RMHC and SVM

Toward Building Lightweight Intrusion Detection System Through Modified RMHC and SVM Toward Building Lightweight Intrusion Detection System Through Modified RMHC and SVM You Chen 1,2, Wen-Fa Li 1,2, Xue-Qi Cheng 1 1 Institute of Computing Technology, Chinese Academy of Sciences 2 Graduate

More information

A Combined Anomaly Base Intrusion Detection Using Memetic Algorithm and Bayesian Networks

A Combined Anomaly Base Intrusion Detection Using Memetic Algorithm and Bayesian Networks International Journal of Machine Learning and Computing, Vol. 2, No. 5, October 2012 A Combined Anomaly Base Intrusion Detection Using Memetic Algorithm and Bayesian Networks H. M. Shirazi, A. Namadchian,

More information

System Health Monitoring and Reactive Measures Activation

System Health Monitoring and Reactive Measures Activation System Health Monitoring and Reactive Measures Activation Alireza Shameli Sendi Michel Dagenais Department of Computer and Software Engineering December 10, 2009 École Polytechnique, Montreal Content Definition,

More information

Hybrid Feature Selection for Modeling Intrusion Detection Systems

Hybrid Feature Selection for Modeling Intrusion Detection Systems Hybrid Feature Selection for Modeling Intrusion Detection Systems Srilatha Chebrolu, Ajith Abraham and Johnson P Thomas Department of Computer Science, Oklahoma State University, USA ajith.abraham@ieee.org,

More information

Collaborative Security Attack Detection in Software-Defined Vehicular Networks

Collaborative Security Attack Detection in Software-Defined Vehicular Networks Collaborative Security Attack Detection in Software-Defined Vehicular Networks APNOMS 2017 Myeongsu Kim, Insun Jang, Sukjin Choo, Jungwoo Koo, and Sangheon Pack Korea University 2017. 9. 27. Contents Introduction

More information

FEATURE SELECTION TECHNIQUES

FEATURE SELECTION TECHNIQUES CHAPTER-2 FEATURE SELECTION TECHNIQUES 2.1. INTRODUCTION Dimensionality reduction through the choice of an appropriate feature subset selection, results in multiple uses including performance upgrading,

More information

Performance Analysis of Big Data Intrusion Detection System over Random Forest Algorithm

Performance Analysis of Big Data Intrusion Detection System over Random Forest Algorithm Performance Analysis of Big Data Intrusion Detection System over Random Forest Algorithm Alaa Abd Ali Hadi Al-Furat Al-Awsat Technical University, Iraq. alaaalihadi@gmail.com Abstract The Internet has

More information

Network Anomaly Detection using Co-clustering

Network Anomaly Detection using Co-clustering Network Anomaly Detection using Co-clustering Evangelos E. Papalexakis, Alex Beutel, Peter Steenkiste Department of Electrical & Computer Engineering School of Computer Science Carnegie Mellon University,

More information

ATwo Stage Intrusion Detection Intelligent System

ATwo Stage Intrusion Detection Intelligent System ATwo Stage Intrusion Detection Intelligent System Nevrus Kaja, Adnan Shaout and Di Ma The University of Michigan Dearborn, United States Abstract Security is becoming an inherited and amplified problem

More information

An Active Rule Approach for Network Intrusion Detection with Enhanced C4.5 Algorithm

An Active Rule Approach for Network Intrusion Detection with Enhanced C4.5 Algorithm I. J. Communications, Network and System Sciences, 2008, 4, 285-385 Published Online November 2008 in SciRes (http://www.scirp.org/journal/ijcns/). An Active Rule Approach for Network Intrusion Detection

More information

CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS

CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS CHAPTER 4 CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS 4.1 Introduction Optical character recognition is one of

More information

An Intelligent Clustering Algorithm for High Dimensional and Highly Overlapped Photo-Thermal Infrared Imaging Data

An Intelligent Clustering Algorithm for High Dimensional and Highly Overlapped Photo-Thermal Infrared Imaging Data An Intelligent Clustering Algorithm for High Dimensional and Highly Overlapped Photo-Thermal Infrared Imaging Data Nian Zhang and Lara Thompson Department of Electrical and Computer Engineering, University

More information

An Ensemble Data Mining Approach for Intrusion Detection in a Computer Network

An Ensemble Data Mining Approach for Intrusion Detection in a Computer Network International Journal of Science and Engineering Investigations vol. 6, issue 62, March 2017 ISSN: 2251-8843 An Ensemble Data Mining Approach for Intrusion Detection in a Computer Network Abisola Ayomide

More information

Lecture Notes on Critique of 1998 and 1999 DARPA IDS Evaluations

Lecture Notes on Critique of 1998 and 1999 DARPA IDS Evaluations Lecture Notes on Critique of 1998 and 1999 DARPA IDS Evaluations Prateek Saxena March 3 2008 1 The Problems Today s lecture is on the discussion of the critique on 1998 and 1999 DARPA IDS evaluations conducted

More information

Network Traffic Measurements and Analysis

Network Traffic Measurements and Analysis DEIB - Politecnico di Milano Fall, 2017 Introduction Often, we have only a set of features x = x 1, x 2,, x n, but no associated response y. Therefore we are not interested in prediction nor classification,

More information

A Detailed Analysis on NSL-KDD Dataset Using Various Machine Learning Techniques for Intrusion Detection

A Detailed Analysis on NSL-KDD Dataset Using Various Machine Learning Techniques for Intrusion Detection A Detailed Analysis on NSL-KDD Dataset Using Various Machine Learning Techniques for Intrusion Detection S. Revathi Ph.D. Research Scholar PG and Research, Department of Computer Science Government Arts

More information

Evaluating Classifiers

Evaluating Classifiers Evaluating Classifiers Charles Elkan elkan@cs.ucsd.edu January 18, 2011 In a real-world application of supervised learning, we have a training set of examples with labels, and a test set of examples with

More information

Multiple Classifier Fusion With Cuttlefish Algorithm Based Feature Selection

Multiple Classifier Fusion With Cuttlefish Algorithm Based Feature Selection Multiple Fusion With Cuttlefish Algorithm Based Feature Selection K.Jayakumar Department of Communication and Networking k_jeyakumar1979@yahoo.co.in S.Karpagam Department of Computer Science and Engineering,

More information

A Network Intrusion Detection System Architecture Based on Snort and. Computational Intelligence

A Network Intrusion Detection System Architecture Based on Snort and. Computational Intelligence 2nd International Conference on Electronics, Network and Computer Engineering (ICENCE 206) A Network Intrusion Detection System Architecture Based on Snort and Computational Intelligence Tao Liu, a, Da

More information

Feature Ranking in Intrusion Detection Dataset using Combination of Filtering Methods

Feature Ranking in Intrusion Detection Dataset using Combination of Filtering Methods Feature Ranking in Intrusion Detection Dataset using Combination of Filtering Methods Zahra Karimi Islamic Azad University Tehran North Branch Dept. of Computer Engineering Tehran, Iran Mohammad Mansour

More information

A COMPARATIVE STUDY OF DATA MINING ALGORITHMS FOR NETWORK INTRUSION DETECTION IN THE PRESENCE OF POOR QUALITY DATA (complete-paper)

A COMPARATIVE STUDY OF DATA MINING ALGORITHMS FOR NETWORK INTRUSION DETECTION IN THE PRESENCE OF POOR QUALITY DATA (complete-paper) A COMPARATIVE STUDY OF DATA MINING ALGORITHMS FOR NETWORK INTRUSION DETECTION IN THE PRESENCE OF POOR QUALITY DATA (complete-paper) Eitel J.M. Lauría Marist College Eitel.Lauria@Marist.edu Giri K. Tayi

More information

International Journal of Computer Engineering and Applications, Volume XI, Issue XII, Dec. 17, ISSN

International Journal of Computer Engineering and Applications, Volume XI, Issue XII, Dec. 17,   ISSN RULE BASED CLASSIFICATION FOR NETWORK INTRUSION DETECTION SYSTEM USING USNW-NB 15 DATASET Dr C Manju Assistant Professor, Department of Computer Science Kanchi Mamunivar center for Post Graduate Studies,

More information

Intrusion Detection Using Data Mining Technique (Classification)

Intrusion Detection Using Data Mining Technique (Classification) Intrusion Detection Using Data Mining Technique (Classification) Dr.D.Aruna Kumari Phd 1 N.Tejeswani 2 G.Sravani 3 R.Phani Krishna 4 1 Associative professor, K L University,Guntur(dt), 2 B.Tech(1V/1V),ECM,

More information

Weka ( )

Weka (  ) Weka ( http://www.cs.waikato.ac.nz/ml/weka/ ) The phases in which classifier s design can be divided are reflected in WEKA s Explorer structure: Data pre-processing (filtering) and representation Supervised

More information

Evaluation Metrics. (Classifiers) CS229 Section Anand Avati

Evaluation Metrics. (Classifiers) CS229 Section Anand Avati Evaluation Metrics (Classifiers) CS Section Anand Avati Topics Why? Binary classifiers Metrics Rank view Thresholding Confusion Matrix Point metrics: Accuracy, Precision, Recall / Sensitivity, Specificity,

More information

Signature Based Intrusion Detection using Latent Semantic Analysis

Signature Based Intrusion Detection using Latent Semantic Analysis Signature Based Intrusion Detection using Latent Semantic Analysis Jean-Louis Lassez, Ryan Rossi, Stephen Sheel Department of Computer Science Coastal Carolina University {jlassez, raross, steves}@coastal.edu

More information

A Comparative Study of Locality Preserving Projection and Principle Component Analysis on Classification Performance Using Logistic Regression

A Comparative Study of Locality Preserving Projection and Principle Component Analysis on Classification Performance Using Logistic Regression Journal of Data Analysis and Information Processing, 2016, 4, 55-63 Published Online May 2016 in SciRes. http://www.scirp.org/journal/jdaip http://dx.doi.org/10.4236/jdaip.2016.42005 A Comparative Study

More information

Ensemble-based multi-filter feature selection method for DDoS detection in cloud computing

Ensemble-based multi-filter feature selection method for DDoS detection in cloud computing Osanaiye et al. EURASIP Journal on Wireless Communications and Networking (2016) 2016:130 DOI 10.1186/s13638-016-0623-3 RESEARCH Ensemble-based multi-filter feature selection method for DDoS detection

More information

Two Level Anomaly Detection Classifier

Two Level Anomaly Detection Classifier Two Level Anomaly Detection Classifier Azeem Khan Dublin City University School of Computing Dublin, Ireland raeeska2@computing.dcu.ie Shehroz Khan Department of Information Technology National University

More information

A Comparison Between the Silhouette Index and the Davies-Bouldin Index in Labelling IDS Clusters

A Comparison Between the Silhouette Index and the Davies-Bouldin Index in Labelling IDS Clusters A Comparison Between the Silhouette Index and the Davies-Bouldin Index in Labelling IDS Clusters Slobodan Petrović NISlab, Department of Computer Science and Media Technology, Gjøvik University College,

More information

Flow-based Anomaly Intrusion Detection System Using Neural Network

Flow-based Anomaly Intrusion Detection System Using Neural Network Flow-based Anomaly Intrusion Detection System Using Neural Network tational power to analyze only the basic characteristics of network flow, so as to Intrusion Detection systems (KBIDES) classify the data

More information

2. On classification and related tasks

2. On classification and related tasks 2. On classification and related tasks In this part of the course we take a concise bird s-eye view of different central tasks and concepts involved in machine learning and classification particularly.

More information

SOFTWARE DEFECT PREDICTION USING IMPROVED SUPPORT VECTOR MACHINE CLASSIFIER

SOFTWARE DEFECT PREDICTION USING IMPROVED SUPPORT VECTOR MACHINE CLASSIFIER International Journal of Mechanical Engineering and Technology (IJMET) Volume 7, Issue 5, September October 2016, pp.417 421, Article ID: IJMET_07_05_041 Available online at http://www.iaeme.com/ijmet/issues.asp?jtype=ijmet&vtype=7&itype=5

More information

A Back Propagation Neural Network Intrusion Detection System Based on KVM

A Back Propagation Neural Network Intrusion Detection System Based on KVM International Journal of Innovation Engineering and Science Research Open Access A Back Propagation Neural Network Intrusion Detection System Based on KVM ABSTRACT Jiazuo Wang Computer Science Department,

More information

Business Club. Decision Trees

Business Club. Decision Trees Business Club Decision Trees Business Club Analytics Team December 2017 Index 1. Motivation- A Case Study 2. The Trees a. What is a decision tree b. Representation 3. Regression v/s Classification 4. Building

More information

The basic arrangement of numeric data is called an ARRAY. Array is the derived data from fundamental data Example :- To store marks of 50 student

The basic arrangement of numeric data is called an ARRAY. Array is the derived data from fundamental data Example :- To store marks of 50 student Organizing data Learning Outcome 1. make an array 2. divide the array into class intervals 3. describe the characteristics of a table 4. construct a frequency distribution table 5. constructing a composite

More information

Network Traffic Measurements and Analysis

Network Traffic Measurements and Analysis DEIB - Politecnico di Milano Fall, 2017 Sources Hastie, Tibshirani, Friedman: The Elements of Statistical Learning James, Witten, Hastie, Tibshirani: An Introduction to Statistical Learning Andrew Ng:

More information
