INTRUSION DETECTION SYSTEM


Project Trainee: Muduy Shilpa, B.Tech Pre-final year, Electrical Engineering, IIT Kharagpur, Kharagpur
Supervised By: Dr. V. Radha, Assistant Professor, IDRBT-Hyderabad
Guided By: Mr. Jagannath Kranthi, Research Fellow, IDRBT-Hyderabad

CERTIFICATE

This is to certify that this project has been successfully completed to my satisfaction and that the goals set at the outset of this endeavor have been worked upon to the best of the student's abilities and resources. I hereby allow this project to be presented for evaluation with my full consent.

Supervisor: Dr. V. Radha

Intrusion Detection using Support Linear Machine

M. Shilpa (a, b), Jaganath Kranthi (*), V. Radha (*, a)

(a) Institute for Development and Research in Banking Technology, Castle Hills Road #1, Masab Tank, Hyderabad 500 057 (A P) INDIA
(b) Department of Electrical Engineering, Indian Institute of Technology-Kharagpur, Kharagpur 721 302 (W B) INDIA
* Corresponding Author. Ph: +91-40-2353 4981

Introduction

Intrusion Detection: Intrusion detection is the process of monitoring the events occurring in a computer system or network and analyzing them for signs of intrusions, defined as attempts to bypass the security mechanisms of a computer or network (that is, to compromise the confidentiality, integrity or availability of information resources).

Intrusion Detection System (IDS): A combination of software and hardware that attempts to perform intrusion detection and raises an alarm when a possible intrusion happens.

Attack classes: Probe, DoS, U2R, R2L.

Due to the proliferation of high-speed Internet access, more and more organizations are becoming vulnerable to potential cyber attacks, such as network intrusions. The sophistication of cyber attacks, as well as their severity, has also increased recently.

Cyber attacks (intrusions) are actions that attempt to bypass the security mechanisms of computer systems. They are caused by:
- Attackers accessing the system from the Internet
- Insider attackers: authorized users attempting to gain and misuse non-authorized privileges

[Figure: typical intrusion scenario, showing scanning activity against a computer network]

Why do we need Intrusion Detection

[Figure: machine with vulnerability]

- Security mechanisms always have inevitable vulnerabilities
- Current firewalls are not sufficient to ensure security in computer networks
- Security holes are caused by allowances made to users/programmers/administrators
- Insider attacks
- Multiple levels of data confidentiality in commercial and government organizations need multilayer protection in firewalls

Traditional Intrusion Detection Systems

Traditional intrusion detection system (IDS) tools (e.g. SNORT) are based on signatures of known attacks.

Example of a SNORT rule (MS-SQL Slammer worm):

any -> udp port 1434 (content:"|81 F1 03 01 04 9B 81 F1 01|"; content:"sock"; content:"send")

www.snort.org

Limitations
- The signature database has to be manually revised for each new type of discovered intrusion
- They cannot detect emerging cyber threats
- Substantial latency in the deployment of newly created signatures

Data mining based IDSs can alleviate these limitations.

Taxonomy of Computer Attacks

Intrusions can be classified according to several categories:
- Attack type (Denial of Service (DoS), scan, worms/Trojan horses, compromises (R2L, U2R), ...)
- Number of network connections involved in the attack: single-connection cyber attacks vs. multiple-connection cyber attacks
- Source of the attack: multiple vs. single, inside vs. outside
- Environment (network, host, P2P, wireless networks, ...)
- Automation (manual, automated, semi-automated attacks)
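The signature-based matching behind a rule like the Slammer example above can be sketched as follows. This is a simplified illustration in Python, not Snort's actual engine: the packet dictionaries, the `matches_signature` helper and the payloads are hypothetical, and only the protocol, port and content patterns come from the rule quoted in the text.

```python
# Hypothetical, simplified signature check; Snort's real rule engine is far richer.
SLAMMER_SIGNATURE = {
    "protocol": "udp",
    "dst_port": 1434,
    "contents": [
        bytes.fromhex("81F10301049B81F101"),  # hex content pattern from the rule
        b"sock",
        b"send",
    ],
}


def matches_signature(packet, signature):
    """Return True if protocol, destination port and every content pattern match."""
    if packet["protocol"] != signature["protocol"]:
        return False
    if packet["dst_port"] != signature["dst_port"]:
        return False
    return all(pattern in packet["payload"] for pattern in signature["contents"])


# Made-up packets for illustration only.
suspicious = {
    "protocol": "udp",
    "dst_port": 1434,
    "payload": b"\x04" + bytes.fromhex("81F10301049B81F101") + b"..sock..send",
}
variant = {
    "protocol": "udp",
    "dst_port": 1434,
    "payload": b"\x04" + bytes.fromhex("81F10301049B81F102") + b"..sock..send",
}

print(matches_signature(suspicious, SLAMMER_SIGNATURE))
print(matches_signature(variant, SLAMMER_SIGNATURE))
```

The second packet differs from the first by a single byte and no longer matches, which illustrates the limitation listed above: a new variant stays undetected until a new signature is written.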

Types of Computer Attacks

- DoS (Denial of Service) attacks: attempt to shut down a network, computer, or process, or otherwise deny the use of resources or services to authorized users. This category includes distributed DoS attacks.
- Probe (probing, scanning) attacks: the attacker uses network services to collect information about a host (e.g. a list of valid IP addresses, what services it offers, what the operating system is).
- Compromises: attackers use known vulnerabilities such as buffer overflows and weak security to gain privileged access to hosts.
  - R2L (Remote to Login) attacks: an attacker who has the ability to send packets to a machine over a network (but does not have an account on that machine) gains access (either as a user or as root) to the machine and performs harmful operations.
  - U2R (User to Root) attacks: an attacker who has access to a local account on a computer system is able to elevate his or her privileges by exploiting a bug in the operating system or in a program installed on the system.
- Trojan horses / worms: attacks that aggressively replicate on other hosts (worms are self-replicating; Trojan horses are downloaded by users).

Source of Computer Attacks

- Attacks may be launched from a single location or from several different locations (single-source attacks vs. distributed/coordinated attacks).
- Attacks may also be targeted at a single destination or at many different destinations.
- Network data from several sites needs to be analyzed in order to detect distributed attacks.

Intrusion Detection Taxonomy

- Information source: host-based ID, network-based ID, wireless-network ID, application logs, sensor alerts
- Analysis strategy: anomaly detection vs. misuse detection
- Data mining approach vs. traditional techniques
- Time aspects in analysis: real-time analysis vs. off-line analysis
- Architecture: single centralized vs. distributed & heterogeneous
- Activeness: active reaction vs. passive reaction
- Continuality: continuous analysis vs. periodic analysis

IDS - Analysis Strategy

Misuse detection is based on extensive knowledge of patterns associated with known attacks, provided by human experts.
- Existing approaches: pattern (signature) matching, expert systems, state transition analysis, data mining
- Major limitations: unable to detect novel and unanticipated attacks; the signature database has to be revised for each new type of discovered attack

Anomaly detection is based on profiles that represent the normal behavior of users, hosts, or networks, and detects attacks as significant deviations from these profiles.
- Major benefit: potentially able to recognize unforeseen attacks
- Major limitation: possibly high false alarm rate, since detected deviations do not necessarily represent actual attacks
- Major approaches: statistical methods, expert systems, clustering, neural networks, support vector machines, outlier detection schemes

IDS - Time Aspects in Analysis

Real-time IDS
- Analyzes the data while the sessions are in progress (e.g. network sessions for network intrusion detection, login sessions for host-based intrusion detection)
- Raises an alarm immediately when an attack is detected

Off-line IDS
- Analyzes the data after the information about the sessions has already been collected (post-analysis)
- Useful for understanding the attackers' behavior

Standard measures for evaluating IDSs:
- Detection rate: ratio between the number of correctly detected attacks and the total number of attacks
- False alarm (false positive) rate: ratio between the number of normal connections that are incorrectly classified as attacks (False Alarms in Table) and the total number of normal connections
- Trade-off between detection rate and false alarm rate
- Performance (processing speed + propagation + reaction)
- Fault tolerance (resistance to attacks, recovery, resistance to subversion)

Data
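The two rate definitions above can be made concrete with a small Python sketch. The function name, the label strings and the sample data are illustrative assumptions. The sketch also assumes that an attack counts as detected whenever it is flagged as any kind of attack, which the text does not specify.

```python
def detection_and_false_alarm_rates(true_labels, predicted_labels, normal_label="normal"):
    """Compute (detection rate, false alarm rate) from per-connection labels.

    Assumption: an attack counts as detected if it is predicted as any
    non-normal class, even when the predicted attack type is wrong.
    """
    attacks = detected = normals = false_alarms = 0
    for truth, predicted in zip(true_labels, predicted_labels):
        if truth == normal_label:
            normals += 1
            if predicted != normal_label:
                false_alarms += 1  # normal connection raised an alarm
        else:
            attacks += 1
            if predicted != normal_label:
                detected += 1  # attack raised an alarm
    detection_rate = detected / attacks if attacks else 0.0
    false_alarm_rate = false_alarms / normals if normals else 0.0
    return detection_rate, false_alarm_rate


# Made-up labels for illustration only.
truth = ["normal", "dos", "probe", "normal", "u2r", "normal"]
predicted = ["normal", "dos", "normal", "dos", "u2r", "normal"]

rate, false_alarm = detection_and_false_alarm_rates(truth, predicted)
print(f"detection rate = {rate:.3f}, false alarm rate = {false_alarm:.3f}")
```

In this toy sample two of the three attacks are flagged and one of the three normal connections raises an alarm. Lowering a detector's alarm threshold usually raises both rates together, which is the trade-off noted above.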

DARPA Data Set

The DARPA evaluation data set has been made available by MIT Lincoln Laboratory under DARPA sponsorship (Lippmann & Cunningham, 1999). To date, three data sets are available for evaluation: DARPA 98, 99 and 2000 (DARPA 1998 data). Each more recent data set contains new attacks. A subset of the DARPA intrusion detection data set is used for off-line analysis. In the DARPA intrusion detection evaluation program, an environment was set up to acquire raw TCP/IP dump data for a network by simulating a typical U.S. Air Force LAN. The LAN was operated like a real environment, but was blasted with multiple attacks (Kendall, 1998) (Webster, 1998). For each TCP/IP connection, 41 quantitative and qualitative features were extracted (Lee & Stolfo, 1998).

The 41 extracted features fall into three categories:
- intrinsic features, which describe the individual TCP/IP connections and can be obtained from network audit trails;
- content-based features, which describe the payload of the network packet and can be obtained from the data portion of the network packet;
- traffic-based features, which are computed over a specific window (connection time or number of connections).

DoS and Probe attacks involve several connections in a short time frame, whereas R2L and U2R attacks are embedded in the data portions of the connection and often involve just a single connection. Traffic-based features therefore play an important role in deciding whether a particular network activity is engaged in probing or not.

Attack types fall into four main categories:
1. Denial-of-Service (DoS): deny legitimate requests to a system.
2. Remote-to-Local (R2L): unauthorized access from a remote machine.
3. User-to-Root (U2R): unauthorized access to local super user (root) privileges.
4. Probing: surveillance and information gathering attacks.

Network-based data is provided in tcpdump files and host-based data in BSM audit log files.

1. PROPOSED SCHEME

In this research work, we propose a scheme for network intrusion detection using 5 SVMs, one for each class of data. In the proposed approach, SVM-RFE (Guyon, 2002) is first employed for feature selection. The 5 SVMs are then trained. The proposed approach is applied to build a network intrusion detector, a predictive model capable of distinguishing between bad connections, called intrusions or attacks, and good normal connections.

The dataset used in this study is obtained from KDDcup99, which was held in conjunction with KDD-99, the Fifth International Conference on Knowledge Discovery and Data Mining (Lee et al., 1998). The dataset is very large and unbalanced, with 41 features that give details about basic TCP features and time- and window-based features. Table 1 presents the attribute information of the data analyzed in this research study. There are approximately 5 million training records and 0.5 million testing records. Both training and testing data subsets cover four major attack categories, viz. Probing/Scanning, DoS, User-to-Root and Remote-to-Local. Table 2 presents these attack categories.

Attribute selection has been done using SVM. First, the SVM is used to obtain the ranks of all the attributes, and then a threshold limit is set. All the attributes whose ranks fall above the threshold are kept. A new dataset is prepared using the reduced attributes, and this dataset is employed for training. By following this procedure with various threshold limits, 22 features have been selected which give almost the same accuracy and sensitivity as the 41-attribute dataset. With the 22-feature dataset, the testing time, the training time and the memory space are all reduced considerably.

Table 1. Attribute information of the dataset analyzed

No  Feature             No  Feature             No  Feature
 1  duration            15  su_attempted        29  same_srv_rate
 2  protocol_type       16  num_root            30  diff_srv_rate
 3  service             17  num_file_creations  31  srv_diff_host_rate
 4  flag                18  num_shells          32  dst_host_count
 5  src_bytes           19  num_access_files    33  dst_host_srv_count
 6  dst_bytes           20  num_outbound_cmds   34  dst_host_same_srv_rate
 7  land                21  is_host_login       35  dst_host_diff_srv_rate
 8  wrong_fragment      22  is_guest_login      36  dst_host_same_src_port_rate
 9  urgent              23  count               37  dst_host_srv_diff_host_rate
10  hot                 24  srv_count           38  dst_host_serror_rate
11  num_failed_logins   25  serror_rate         39  dst_host_srv_serror_rate
12  logged_in           26  srv_serror_rate     40  dst_host_rerror_rate
13  num_compromised     27  rerror_rate         41  dst_host_srv_rerror_rate
14  root_shell          28  srv_rerror_rate

[Figure: Proposed 5-class SVM intrusion detection architecture. Components shown: Internet, Firewall, Network Data Pre-Processor, SVM1 (normal), SVM2 (DoS), SVM3 (R2L), SVM4 (U2R), SVM5 (Probe), Flag?, Server, System Administrator.]
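The attribute-selection procedure described above (rank the attributes, apply a threshold, build a reduced dataset) might look roughly like the Python sketch below. It assumes the ranking score is the squared weight of each feature in a trained linear SVM, which is the ranking criterion used by SVM-RFE. The weights, the threshold and the helper names are hypothetical; they are not taken from the report.

```python
def rank_features(weights):
    """Order feature names by squared weight, most important first."""
    return sorted(weights, key=lambda name: weights[name] ** 2, reverse=True)


def select_features(weights, threshold):
    """Keep the features whose squared weight exceeds the threshold."""
    return [name for name in rank_features(weights) if weights[name] ** 2 > threshold]


def reduce_record(record, selected):
    """Project one connection record onto the selected features."""
    return {name: record[name] for name in selected}


# Hypothetical weights, standing in for those of a trained linear SVM.
hypothetical_weights = {
    "src_bytes": 0.05,
    "flag": -0.9,
    "service": 0.7,
    "urgent": 0.01,
    "logged_in": 0.4,
}

selected = select_features(hypothetical_weights, threshold=0.1)
record = {"src_bytes": 181, "flag": 9, "service": 22, "urgent": 0, "logged_in": 1}

print(selected)
print(reduce_record(record, selected))
```

Repeating the selection with different thresholds and retraining on each reduced dataset corresponds to the search that, according to the text, ended with 22 features.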

3.1 DATA PREPROCESSING

The dataset under consideration is very large, with 0.5 million training records and 0.05 million testing records, and many records appear more than once. We treated such records as duplicates, which add redundancy to the data. Training a machine learning model on redundant data leads to overtraining, which degrades the performance of the resulting model. We therefore removed the redundant records from both the training data and the test data. After removing the redundant records, the training data is reduced to 0.145 million records and the test dataset to 0.077 million records. This modified data is then used for detecting network intrusions.
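Duplicate removal of the kind described above can be sketched in a few lines of Python. The sketch assumes that each record is a sequence of field values and that two records are duplicates only when every field matches. The function name and the sample rows are illustrative.

```python
def remove_duplicate_records(records):
    """Return the records with exact duplicates dropped, keeping first occurrences in order."""
    seen = set()
    unique = []
    for record in records:
        key = tuple(record)  # hashable key covering all fields, including the label
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


# Made-up connection records: (duration, protocol_type, service, flag, label)
records = [
    (0, "tcp", "http", "SF", "normal"),
    (0, "icmp", "ecr_i", "SF", "smurf"),
    (0, "icmp", "ecr_i", "SF", "smurf"),
    (0, "tcp", "http", "SF", "normal"),
    (2, "tcp", "ftp", "SF", "normal"),
]

unique_records = remove_duplicate_records(records)
print(len(records), "->", len(unique_records))
```

Keeping the label inside the comparison key means that two connections with identical features but different labels are both retained. Whether the report handled such conflicting records this way is not stated.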

Table 2. Transformation of the classes

Attack types                                                 Class   Class
normal                                                       normal  normal
apache2, back, land, mailbomb, neptune, pod, processtable,   DoS     attack
  smurf, teardrop, udpstorm
buffer_overflow, loadmodule, ps, perl, rootkit, xterm,       U2R     attack
  httptunnel, sqlattack, imap
multihop, phf, ftp_write, guess_passwd, spy, snmpattack,     R2L     attack
  snmpguess, sendmail, named, warezclient, warezmaster,
  worm, xlock, xsnoop
mscan, nmap, ipsweep, portsweep, saint, satan                Probe   attack

SVMs Briefly Explained

Support vector machines, or SVMs, are learning machines that place the training vectors in a high-dimensional feature space, labeling each vector by its class. SVMs classify data by determining a set of vectors from the training set, called support vectors, which outline a hyperplane in the feature space. SVMs provide a generic mechanism to fit the surface of the hyperplane to the data through the use of a kernel function. The user may provide a function (e.g., linear, polynomial, or sigmoid) to the SVMs during the training process, which selects support vectors along the surface of this function. The number of free parameters used in the SVMs depends on the margin that separates the two classes but not on the number of input features, so SVMs do not require a reduction in the number of features in order to avoid overfitting, an apparent advantage in applications such as intrusion detection. Another primary advantage of SVMs is the low expected probability of generalization errors.

There are other reasons that we use SVMs for intrusion detection. The first is speed: as real-time performance is of primary importance to IDSs, any classifier that can potentially run fast is worth considering. The second reason is scalability: SVMs are relatively insensitive to the number of data points, and the classification complexity does not depend on the dimensionality of the feature space, so they can potentially learn a larger set of patterns and thus scale better than neural networks. Finally, SVMs give highly accurate classification of the patterns.

Final results

REDUCED ATTRIBUTES (22) AFTER USING SVM

No  Feature                      No  Feature                 No  Feature
 1  land                          9  srv_serror_rate         17  serror_rate
 2  num_failed_logins            10  dst_host_srv_count      18  dst_host_rerror_rate
 3  dst_host_srv_diff_host_rate  11  dst_host_same_srv_rate  19  logged_in
 4  flag                         12  num_compromised         20  is_guest_login
 5  dst_host_srv_serror_rate     13  srv_count               21  su_attempted
 6  dst_host_diff_srv_rate       14  duration                22  root_shell
 7  service                      15  dst_host_count
 8  srv_diff_host_rate           16  srv_serror_rate

Table 3. Performance of 5 SVMs using 41 features

Class   Accuracy (%)
Normal  92.26
DoS     95.67
Probe   99.46
R2L     94.86
U2R     99.9

Table 4. Results obtained using SVM (all values in %)

            Full features (41 features)          Reduced features (22 features)
Classifier  Accuracy  Sensitivity  Specificity   Accuracy  Sensitivity  Specificity
SVM         92.26     90.54        99.82         90.34     88.16        99.83

The results presented in this report are not strictly comparable to (Chen et al., 2009), because they worked with the original data without removing the redundant records during preprocessing. The accuracy obtained by our proposed approach with reduced features and SVM as the classifier is 90.34%, with 88.16% sensitivity and 99.83% specificity. (Chen et al., 2009) reported an accuracy of 89.13%, a sensitivity of 86.72% and a specificity of XX%, where they also trained the SVM using reduced-feature data. The results obtained by our proposed approach are thus comparable to those of (Chen et al., 2009).
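As an illustration of the kernel functions named in the SVM overview and of how several per-class SVMs might be combined, here is a minimal Python sketch. The report does not say how the outputs of the five SVMs are merged, so picking the class with the highest decision value is an assumption. The support vectors, coefficients, kernel parameters and class models are toy values, not trained ones, and only two of the five classes are shown to keep the sketch short.

```python
import math


def linear_kernel(u, v):
    """Plain dot product."""
    return sum(a * b for a, b in zip(u, v))


def polynomial_kernel(u, v, degree=2, coef0=1.0):
    """(u . v + coef0) ** degree; degree and coef0 are illustrative defaults."""
    return (linear_kernel(u, v) + coef0) ** degree


def sigmoid_kernel(u, v, gamma=0.5, coef0=0.0):
    """tanh(gamma * u . v + coef0); gamma and coef0 are illustrative defaults."""
    return math.tanh(gamma * linear_kernel(u, v) + coef0)


def decision_value(x, support_vectors, coefficients, bias, kernel):
    """f(x) = sum_i c_i * K(s_i, x) + b, with c_i the signed weight of support vector s_i."""
    return sum(c * kernel(s, x) for s, c in zip(support_vectors, coefficients)) + bias


def classify(x, models, kernel=linear_kernel):
    """Assumed combination rule: the class whose SVM gives the largest decision value wins."""
    scores = {
        label: decision_value(x, m["sv"], m["coef"], m["bias"], kernel)
        for label, m in models.items()
    }
    return max(scores, key=scores.get)


# Toy one-vs-rest models with a single support vector each (hypothetical values).
toy_models = {
    "normal": {"sv": [(1.0, 0.0)], "coef": [1.0], "bias": 0.0},
    "DoS": {"sv": [(0.0, 1.0)], "coef": [1.0], "bias": 0.0},
}

print(classify((0.9, 0.1), toy_models))
print(classify((0.2, 0.8), toy_models))
```

Passing `kernel=polynomial_kernel` or `kernel=sigmoid_kernel` to `classify` swaps the kernel without changing the rest of the decision rule.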