Lecture Notes on Critique of 1998 and 1999 DARPA IDS Evaluations

Prateek Saxena
March 3, 2008

1 The Problems

Today's lecture discusses the critique of the 1998 and 1999 DARPA IDS evaluations conducted by MIT Lincoln Laboratory. The lecture highlights some of the problems with carrying out such evaluations for IDSs, explains why real-world conditions make them hard, and concludes with a discussion of common IDS evaluation pitfalls that apply to systems research broadly, not just to IDS evaluation.

Three deep problems emerge from reading the paper:

Authentic traffic/activity - There is no such thing as "authentic" or "typical" traffic. Studies of network activity show large variation from site to site, across locations, and so on. The large inherent diversity of operations across the Internet means that any artificial synthesis of so-called authentic traffic must be treated skeptically.

Desirability of black-box testing (and ranking) - Funding agencies often define aggressive goals in terms of metrics to achieve, and metric-based numbers are needed to justify the research expenditure. Explaining the crucial benefits of a system to someone outside the area tends to be hard, so one sometimes reports highlights that are easy to understand and sell rather than the technical intricacies that are really meaningful. These problems arise from the desire to rank systems based on a black-box view of their operation rather than on a model of how they operate.

Great need for illumination/insight into the IDS's mechanism - If you want to rank systems, you cannot do it from black-box views; you need a model-based understanding of their capabilities. There is no single benchmark that is conclusively best and can be used for evaluation, so insight-based study of the IDS under evaluation is needed to judge its effectiveness.
2 Problems with evaluation studies in systems CS

Little emphasis on reproducibility - Unlike in other, more mature sciences, credit is seldom given to work that reproduces previously reported results. Many works lack rigor and discipline in the evaluation process; sometimes even the original authors cannot reproduce their own earlier results.

Evaluation data is unpublished - This is largely due to the tension between the need to reveal the full structure of the data and the need to maintain privacy and anonymity. Such data collection efforts are often expensive undertakings.

3 Follow-up work supporting such large-scale system evaluations

LARIAT - Follow-up work on the 1999 IDS evaluation by MIT Lincoln Laboratory. It describes machinery to synthesize sites, with components that try to generate realistic background user traffic and network attacks. It was published at the IEEE Aerospace Conference, March 9-16, 2002.

It is useful to know about some other efforts to make such evaluations possible:

DETER - A testbed for network security technology: a public facility for medium-scale, repeatable experiments in computer security, located at USC ISI and UC Berkeley, with 300 PC systems running Utah's Emulab software. An experimenter can access DETER remotely to develop, configure, and manipulate collections of nodes and links with arbitrary network topologies. The current problem is that there is no realistic attack module or background-noise generator plugin for the framework; attack distribution is a problem.

PREDICT - A huge trace repository. It is not public, and there are several legal issues in working with it.

KDD Cup - Its goal is to provide datasets from real-world problems to demonstrate the applicability of different knowledge discovery and machine learning techniques. The 1999 KDD intrusion detection contest used a labelled version of the 1998 DARPA dataset, annotated with connection features. There are several problems [1] with this data; for instance, researchers have found average TCP packet size to be the best-correlating metric for attacks, which clearly points out the inefficacy of using this dataset.

A question was raised in class: Why isn't Bro in the study?
Primary reason: as a research system, Bro lacks a comprehensive set of detectors (signatures, application protocols), because the emphasis is on building a framework and understanding the underlying issues rather than on coverage. (McHugh mentions this concern in his paper.) Its detection approach also does not fall into the evaluation's two categories, though the Lincoln Laboratory folks were fine with this (and in fact encouraged adding it as a way to illuminate what other approaches can do). Finally, there was the perceived risk of being classified for life based on a contrived benchmark result.
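The KDD Cup artifact noted above, that a semantically shallow statistic such as average TCP packet size correlates strongly with attacks, can be checked with a short script. The sketch below is purely illustrative: the feature values and labels are synthetic toy data, not real KDD records.

```python
# Sketch: how well does a single trivial feature separate "attack" from
# "normal" records? A near-perfect score for such a feature suggests a
# simulation artifact of the kind reported for the DARPA/KDD data [1].
# All values below are made up for illustration.

def best_threshold_accuracy(values, labels):
    """Best accuracy achievable by thresholding one feature alone."""
    best = 0.0
    for t in sorted(set(values)):
        # Predict "attack" when value >= t ...
        acc = sum((v >= t) == l for v, l in zip(values, labels)) / len(labels)
        # ... and also allow the reversed direction (predict when value < t).
        best = max(best, acc, 1.0 - acc)
    return best

# feature = mean TCP packet size of a connection; label = 1 for attack
avg_pkt_size = [1490, 1487, 1492, 1488, 212, 340, 515, 288]
is_attack    = [1,    1,    1,    1,    0,   0,   0,   0]

acc = best_threshold_accuracy(avg_pkt_size, is_attack)
print(f"single-feature threshold accuracy: {acc:.2f}")  # prints 1.00 here
```

If a lone, shallow feature scores near 1.0 on a benchmark, results obtained on that benchmark should be treated skeptically.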

Figure 1: The basic idea of the ROC curve: a plot of true positives against false positives.

4 Problems with the Taxonomy Used in the Evaluation

The taxonomy used to classify attacks in the 1998 evaluation was: DoS, R2L, U2R, and Surveillance. The problems with using such a taxonomy are:

- "Surveillance" is not a well-defined term.
- It is an attacker-centric taxonomy, based on how costly the classes of attacks are to defend against.
- It gives no insight into which class of attacks the system is targeting.

The critique proposes an alternative, insight-based taxonomy. For example, one such taxonomy could be based on how the IDS treats incoming traffic: does the IDS look at network-layer headers, or only at application data? If it does not deal with IP header data and works only on application-specific attacks, then it obviously cannot deal with IP-layer attacks. This is perhaps a more principled way to analyze different IDSs than the strategy used in the evaluation paper.

5 Problems with using ROC

ROC stands for Receiver Operating Characteristic.

The basic idea: every IDS has knobs, parameters the user can tune to control the IDS's behavior. Tuning the knobs yields a curve on the TP-vs-FP graph, as shown in Fig. 1. If the curve for an IDS A strictly dominates the curve for an IDS B, then intuitively A is better.

5.1 Core problems with the ROC used in the 1998 evaluation

Denominator and unit of analysis - The evaluation plotted false alarms per day on a log-scale y-axis against the number of attacks detected on the x-axis. The measurement of false positives should instead be on a per-event basis, not a per-day basis. What events could we use? A natural choice is the unit of analysis that the specific IDS uses: if the IDS performs packet-based analysis (such as Snort), then packets should be the unit; similarly, if sessions are the unit of analysis (as in Bro), or time windows (as in anomaly-based IDSs), then the corresponding unit should be the basis of analysis.

There are other problems with using such an evaluation technique:

- The view that an IDS returns a boolean answer to the question "Is instance A an attack?" holds only for some systems. How do we accommodate the others in the analysis?
- How do the results of an evaluation in one environment transfer to another environment?
- It is hard to compare IDSs that have different knobs.

It is important to characterize the specifics of an IDS using models of its detection, so that one can take away sound conclusions from the paper. It is hard to obtain such models for commercial IDSs.

To illustrate the difficulty of reporting comparisons between IDSs, consider the performance comparison between Snort, Bro, and an improved version of Snort. Due to diversity across measurement platforms, the reported performance figures differed considerably between a Pentium IV-based test platform and a Pentium III-based one.
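The knob-tuning and per-unit-denominator ideas above can be sketched concretely. The following is a minimal illustration, assuming a detector that assigns a numeric score to each analysis unit (packet, session, or time window): sweeping the detection threshold yields (false-positive rate, true-positive rate) points, with the false-alarm denominator taken per unit analyzed rather than per day, and one curve can then be checked for strict dominance over another. All scores and labels are made up for illustration.

```python
# Sketch: sweep a detector's threshold "knob" to trace an ROC curve, with the
# false-alarm rate normalized per analysis unit, then compare two detectors.

def roc_points(scores, labels):
    """(false-positive rate, true-positive rate) at each threshold setting."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = []
    for t in sorted(set(scores), reverse=True):
        tp = sum(s >= t and l for s, l in zip(scores, labels))
        fp = sum(s >= t and not l for s, l in zip(scores, labels))
        points.append((fp / neg, tp / pos))  # denominator: benign units seen
    return points

def dominates(curve_a, curve_b):
    """True if A achieves at least B's TP rate at every FP rate B attains."""
    def tp_at(curve, fpr):
        feasible = [tp for f, tp in curve if f <= fpr]
        return max(feasible) if feasible else 0.0
    return all(tp_at(curve_a, f) >= tp for f, tp in curve_b)

labels   = [1, 1, 1, 0, 0, 0, 0, 0]          # 1 = unit carries an attack
scores_a = [.9, .8, .7, .6, .3, .2, .2, .1]  # detector A's per-unit scores
scores_b = [.9, .5, .4, .6, .3, .2, .2, .1]  # detector B's per-unit scores

curve_a = roc_points(scores_a, labels)
curve_b = roc_points(scores_b, labels)
print(dominates(curve_a, curve_b))  # prints True: A's curve dominates B's
```

Note how the comparison only makes sense once both detectors are evaluated over the same units and the same denominator; this is exactly what breaks down when systems with different units of analysis are forced onto one "false alarms per day" axis.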
The relative ordering of performance was the same on the two test platforms (Snort > Bro > Snort', writing Snort' for the improved version), but Bro ran slower on the Pentium III than on the Pentium IV while Snort ran faster, thereby narrowing the gap between the two systems. See [2] for details.

6 Common IDS research pitfalls

Machine-learning-based IDSs often fail to explore why the system classifies a given instance the way it does. It is often useful to report the internal coefficients of the resulting support vector machine or neural network, to explain which factors contribute most heavily to the result.

Use of data - The evaluation traces used must be different from the development traces. This is analogous to keeping test data separate from training data in machine-learning techniques. Ideally, data used in evaluation should come from multiple sites, accessed multiple times; this is often violated, and data is not aggregated into one before evaluation. The data used often does not meet obvious requirements:

How diverse is the data? This determines how well the results scale.

What is the quality of the data? Details such as the correctness of timestamps in traces are often important. Is the accompanying metadata also stored? This determines reproducibility: for instance, if you do not store the IP-to-host mappings at the time of the experiments, then the data may not be reproducible later.

Other artifacts - Does the supposedly innocuous data contain malicious activity?

False positives are a major concern, and a system needs to investigate them if they arise. Techniques for dealing with false positives, especially when they are numerous:

- Randomly sample and pick one - often a good strategy.
- Prioritize the false-positive list, and proceed in highest-priority-first order.

False negatives can be measured by taking a large labelled dataset.

References

[1] M. Mahoney and P. Chan. An analysis of the 1999 DARPA/Lincoln Laboratory evaluation data for network anomaly detection. In Proceedings of Recent Advances in Intrusion Detection (RAID) 2003, volume 2820 of Lecture Notes in Computer Science, pages 220-237. Springer-Verlag, September 8-10, 2003.

[2] Robin Sommer and Vern Paxson. Enhancing byte-level network intrusion detection signatures with context. In CCS '03: Proceedings of the 10th ACM Conference on Computer and Communications Security, pages 262-271. ACM Press, 2003.