Neural Networks in Statistical Anomaly Intrusion Detection

ZHENG ZHANG, JUN LI, C. N. MANIKOPOULOS, JAY JORGENSON and JOSE UCLES
ECE Department, New Jersey Inst. of Tech., University Heights, Newark, NJ 07102, USA
Department of Mathematics, CUNY, Convent Ave. at 138 ST., New York, NY 10031, USA
Network Security Solutions, 5 Independence Blvd. 3rd FL., Warren, NJ 07059, USA

Abstract: - In this paper, we report on experiments in which we used neural networks for statistical anomaly intrusion detection systems. The five types of neural networks that we studied were: Perceptron; Backpropagation; Perceptron-Backpropagation-Hybrid; Fuzzy ARTMAP; and Radial-Based Function. We collected four separate data sets from different simulation scenarios, and these data sets were used to test various neural networks with different hidden neurons. Our results showed that the classification capabilities of BP and PBH outperform those of other neural networks.

Key-Words: - Security, Intrusion Detection, Statistical Anomaly Detection, Neural Network Classification, Perceptron, Backpropagation, Perceptron-Backpropagation-Hybrid, Fuzzy ARTMAP, Radial-Based Function

1 Introduction
The ubiquity of the Internet raises serious concerns about the security of computer infrastructures and the integrity of sensitive data. Network intrusion detection is a very efficient approach to protect networks and computers from malicious network-based attacks. The basic assumption of intrusion detection is that an intruder's behavior will be noticeably different from that of legitimate users.

Intrusion detection techniques can be partitioned into two complementary trends: misuse detection and anomaly detection. Misuse detection systems, such as [1][2], model the known attacks and scan the system for the occurrences of these patterns. Anomaly detection systems, such as [3][7], flag intrusions by observing significant deviations from typical or expected behavior of the systems or users.

Statistical Modeling and Neural Networks are widely applied in building anomaly intrusion detection systems. For example, NIDES [3] represents user or system behaviors by a set of statistical variables and detects the deviation between the observed and the standard activities. A system that identifies intrusions using packet filtering and neural networks was introduced in [4]. The work of Ghosh et al. [7] studied the employment of neural networks to detect anomalous and unknown intrusions against a software system. In [8], we presented the prototype of a hierarchical anomaly network intrusion detection system that uses statistical models and neural networks to detect attacks.

As the kernels of many anomaly IDS, neural networks have profound impacts on system performance and efficiency, but little research has been completed that compares the output of neural networks as applied to IDS problems. In this paper, we present our experiments concerning the performances of five different types of neural networks. Section 2 introduces the statistical model that we are using. Section 3 describes the neural networks we tested. In Section 4, we report the test bed and the attack schemes we simulated. Some experimental results are also presented in that section. Section 5 draws some conclusions and outlines future work.

2 Statistical Model
Statistics have been used in anomaly intrusion detection systems [3]; however, most of these systems simply measure the means and the variances of some variables and detect whether certain thresholds are exceeded. SRI's NIDES [5][3] developed a more sophisticated statistical algorithm by using a $\chi^2$-like test to measure the similarity between short-term and long-term profiles. Our current statistical model uses an algorithm similar to that of NIDES but with major modifications. Therefore, we will first briefly introduce some basic information about the NIDES statistical algorithm.

In NIDES, user profiles are represented by a number of probability density functions. Let $S$ be the sample space of a random variable and events $E_1, E_2, \ldots, E_k$ a mutually exclusive partition of $S$. Assume $p_i$ is the expected probability of the occurrence of the event $E_i$, and let $p'_i$ be the frequency of the occurrence of $E_i$ during a given time interval. Let $N$ denote the total number of occurrences. The NIDES statistical algorithm used a $\chi^2$-like test to determine the similarity between the expected and actual distributions through the statistic:

$$Q = N \sum_{i=1}^{k} \frac{(p'_i - p_i)^2}{p_i}$$

When $N$ is large and the events $E_1, E_2, \ldots, E_k$ are independent, $Q$ approximately follows a $\chi^2$ distribution with $(k - 1)$ degrees of freedom. However, in a real-time application the above two assumptions generally cannot be guaranteed, thus empirically $Q$ may not follow a $\chi^2$ distribution. NIDES solved this problem by building an empirical probability distribution for $Q$ which is updated daily in real-time operation.

In our system, since we are using neural networks to identify possible intrusions, we are not so concerned with the actual distribution of $Q$. However, because network traffic is not stationary and network-based attacks may have different time durations, varying from a couple of seconds to several hours, we need an algorithm which is capable of efficiently monitoring network traffic with different time windows. Based on the above observations, we used a layer-window statistical model, Fig. 1, with each layer-window corresponding to one different detection time slice. The newly arrived events will first be stored in the event buffer of layer 1. The stored events will be compared with the reference model of that layer and the results are fed into neural networks to detect the network status during that time window. The event buffer will be emptied once it becomes full, and the stored events will be averaged and forwarded to the event buffer of layer 2. The same process will be repeated recursively until it arrives at the top level, where the events will simply be dropped after processing.
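The two mechanisms just described can be sketched in Python as a rough illustration, not as the authors' implementation. The function `nides_q` evaluates the $\chi^2$-like statistic, and the hypothetical class `LayerWindow` mimics the buffer cascade: it stores events, averages them when the buffer fills, and forwards the average to the next layer. The buffer capacity, the representation of an event as a list of frequencies, and all names are assumptions made for the example. The comparison against each layer's reference model and the neural network stage are left out.

```python
from typing import List, Optional, Sequence


def nides_q(expected: Sequence[float], observed: Sequence[float], n: int) -> float:
    """Chi-square-like statistic: Q = N * sum((p'_i - p_i)^2 / p_i)."""
    if len(expected) != len(observed):
        raise ValueError("expected and observed must have the same length")
    return n * sum((o - e) ** 2 / e for e, o in zip(expected, observed))


class LayerWindow:
    """One layer-window: buffers events and forwards their average upward."""

    def __init__(self, capacity: int, upper: Optional["LayerWindow"] = None) -> None:
        self.capacity = capacity      # assumed buffer size for this layer
        self.upper = upper            # next layer, or None at the top level
        self.buffer: List[List[float]] = []
        self.averages: List[List[float]] = []  # kept only so the example can be inspected

    def push(self, event: Sequence[float]) -> None:
        self.buffer.append(list(event))
        if len(self.buffer) < self.capacity:
            return
        # Buffer is full: average component-wise, empty it, forward the result.
        average = [sum(column) / self.capacity for column in zip(*self.buffer)]
        self.buffer.clear()
        self.averages.append(average)
        if self.upper is not None:
            self.upper.push(average)
        # At the top level the averaged event is not forwarded any further.


top = LayerWindow(capacity=2)
bottom = LayerWindow(capacity=2, upper=top)
for event in ([0.0, 1.0], [0.5, 0.5], [1.0, 0.0], [0.5, 0.5]):
    bottom.push(event)

print(bottom.averages)                          # [[0.25, 0.75], [0.75, 0.25]]
print(top.averages)                             # [[0.5, 0.5]]
print(nides_q([0.5, 0.5], [0.25, 0.75], n=8))   # 2.0
```

With a capacity of two per layer, four low-level events yield two averaged events at the bottom layer and one at the top, which illustrates how each layer observes a longer time slice than the one beneath it.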
The similarity-measuring algorithm that we are using is shown below:

$$Q = f(N) \cdot \left[ \sum_{i=1}^{k} \left| p'_i - p_i \right| + \max_{i=1}^{k} \left( \left| p'_i - p_i \right| \right) \right]$$

where $f(N)$ is a function that takes into account the total number of occurrences during a time window.

Fig. 1 Statistical Model (an Event Report feeds Layer-Window 1, Layer-Window 2, ..., Layer-Window M; each layer-window has its own Event Buffer and Reference Model)

Besides similarity measurements, we also designed an algorithm for the real-time updating of the reference model. Let $p_{old}$ be the reference model before updating, $p_{new}$ be the reference model after updating, and $p_{obs}$ be the observed user activity within a time window. The formula to update the reference model is

$$p_{new} = s \cdot \alpha \cdot p_{obs} + (1 - s \cdot \alpha) \cdot p_{old}$$

in which $\alpha$ is the predefined adaptation rate and $s$ is the value generated by the output of the neural network. Assume that the output of the neural network is a continuous variable $t$ bounded between two values, one meaning intrusion with absolute certainty and the other meaning no intrusion, again with complete confidence. In between, the values of $t$ indicate proportionate levels of certainty. The function for calculating $s$ is piecewise: $s$ takes the value $t$ when $t$ satisfies a threshold condition, and a fixed value otherwise. Through the above equations, we ensured that the reference model would be updated actively for normal traffic while kept unchanged when attacks occurred. The attack events will be diverted and stored for use as attack scripts in neural network learning.

3 Neural Networks
Neural networks are widely considered an efficient approach to adaptively classify patterns, but the high computation intensity and the long training cycles greatly hinder their applications. In [4][7], BP neural networks were used to detect anomalous user activities. In [8], we deployed a hybrid neural network paradigm [6], called perceptron-backpropagation-hybrid (or PBH) network, which is a superposition of a perceptron and a small backpropagation network. In order to comprehensively investigate the performances of neural networks, we examined five different types of neural networks: Perceptron, BP, PBH, Fuzzy ARTMAP and RBF.

The perceptron [9], Fig. 2, is the simplest form of a neural network used for the classification of linearly separable patterns. It consists of a single neuron with adjustable synapses and threshold. Although our data sets will not, in general, be linearly separable, we are using the perceptron as a baseline to measure the performances of other neural networks.

Fig. 2 Perceptron architecture (inputs $x_1, x_2, \ldots, x_{N-1}, x_N$; threshold $\theta$; output $y$)

The Backpropagation network [9], or BP, Fig. 3, is a multilayer feedforward network, which contains an input layer, one or more hidden layers, and an output layer. BP networks have strong generalization capabilities and have been applied successfully to solve some difficult and diverse problems. We tested BP networks with the number of hidden neurons ranging from 2 to 8.

Fig. 3 BP architecture (hidden layer; error signal)

The Perceptron-backpropagation hybrid network [6], or PBH, Fig. 4, is a superposition of a perceptron and a small backpropagation network. PBH networks are capable of exploring both linear and nonlinear correlations between the input stimulus vectors and the output values. We tested PBH networks with the number of hidden neurons ranging up to 8.

Fig. 4 PBH architecture

Fuzzy ARTMAP [10] in its most general form is a system of two Fuzzy ART networks, ART a and ART b, whose F2 layers are connected by a subsystem referred to as a match tracking system. We are using a simplified version of Fuzzy ARTMAP [11], Fig. 5, which is implemented for classification problems. We tested ARTMAP networks with the number of category neurons ranging from 2 to 8.

Fig. 5 Fuzzy ARTMAP architecture (inputs $x_1, x_2, \ldots, x_{P-1}, x_P$; complement layer $C_1, C_2, \ldots, C_{2P}$; Fuzzy ART; category layer)

The Radial-basis function network [9], or RBF, Fig. 6, involves three entirely different layers. The input layer is made up of source nodes. The second layer is a hidden layer of high enough dimension, which serves a different purpose from that in a BP network. The output layer supplies the response of the network to the activation patterns applied to the input layer. We tested RBF networks with hidden neurons ranging from 2 to 8.
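As a concrete reference point for the perceptron baseline described earlier in this section, the following is a minimal Python sketch of a single neuron with adjustable weights and threshold, trained with the classic perceptron rule. It is an illustration under stated assumptions rather than the NeuralWorks configuration used in the experiments: the ±1 output coding, the learning rate, the epoch limit, and the toy AND-gate data are all chosen for the example.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]  # (input vector, target in {-1, +1})


def predict(weights: Sequence[float], threshold: float, x: Sequence[float]) -> int:
    """Fire (+1) when the weighted input sum reaches the threshold, else -1."""
    activation = sum(w * xi for w, xi in zip(weights, x))
    return 1 if activation >= threshold else -1


def train_perceptron(samples: Sequence[Sample], epochs: int = 20,
                     rate: float = 1.0) -> Tuple[List[float], float]:
    """Classic perceptron rule: adjust weights and threshold on each mistake."""
    weights = [0.0] * len(samples[0][0])
    threshold = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, target in samples:
            if predict(weights, threshold, x) != target:
                mistakes += 1
                weights = [w + rate * target * xi for w, xi in zip(weights, x)]
                threshold -= rate * target
        if mistakes == 0:  # every sample classified correctly: stop early
            break
    return weights, threshold


# Toy, linearly separable data (logical AND), used only for illustration.
and_gate: List[Sample] = [
    ([0.0, 0.0], -1), ([0.0, 1.0], -1), ([1.0, 0.0], -1), ([1.0, 1.0], 1),
]
weights, threshold = train_perceptron(and_gate)
print([predict(weights, threshold, x) for x, _ in and_gate])  # [-1, -1, -1, 1]
```

On data that is not linearly separable, the loop simply runs out of epochs with mistakes remaining, which is why the perceptron serves here only as a baseline.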

Fig. 6 RBF architecture (inputs $x_1, x_2, \ldots, x_{P-1}, x_P$; hidden layer of Green's functions $G$)

In our experiments, we used NeuralWorks Professional II/PLUS to build all of the neural networks depicted above.

4 Experimental Results
In this section, we will present our simulation approach and the results of applying our statistical models and the different neural networks to detect network-based attacks. First the testbed configuration and the simulation specifications will be introduced in subsection 4.1, and then subsection 4.2 reports the testing results.

4.1 Testbed
We used a virtual network built with simulation tools to generate attack scenarios. The experimental testbed that we built using OPNET, a powerful network simulation facility, is shown in Fig. 7. The testbed is a BaseX LAN that consists of workstations and a server.

Fig. 7 Simulation Testbed

We simulated the UDP flooding attack within the testbed. To extensively test the performances of neural networks, we ran four independent scenarios with different typical traffic loads and attack traffic. Table 1 lists the traffic loads of the simulation scenarios.

             Typical Traffic   Attack Traffic
Scenario 1   6kbps             5kbps
Scenario 2   6kbps             kbps
Scenario 3   2Mbps             5kbps
Scenario 4   2Mbps             kbps

Table 1 Traffic Loads of the Four Simulation Scenarios

4.2 Results
For each simulation scenario, we collected records of network traffic. We divided these data into two separate sets, one for training and the other for testing, in a 6:4 ratio. In each scenario, the system was trained for a fixed number of epochs. We evaluated the performances of the neural networks based on the mean squared root errors and the misclassification rates of the outputs. The misclassification rate is defined as the percentage of the inputs that are misclassified by neural networks during one epoch, which includes both false positive and false negative misclassifications. In the rest of this section, we will present and analyze the simulation results of the neural networks one by one.

4.2.1 Perceptron
The mean squared root errors and the misclassification rates of the perceptrons within the four simulation scenarios are tabulated in Table 2.

             MSR Error   Misclass rate
Scenario 1   .68564      .6725
Scenario 2   .75895      .22
Scenario 3   .738548     .233889
Scenario 4   .635356     .9444

Table 2 The simulation results of perceptrons

We can see that the perceptrons performed poorly in all four scenarios: mean squared root errors are between .6 and .7, and misclassification rates are between . and .2. Both the MSR errors and the misclassification rates are unacceptably high for an IDS.

4.2.2 Fuzzy ARTMAP and RBF
The results of Fuzzy ARTMAP and RBF nets are shown in Fig. 8 to Fig. 11. The x-axes of the figures represent the number of category neurons in Fuzzy ARTMAP and the hidden neurons in RBF. The y-axes represent the lowest Mean Squared Root Errors and the lowest Misclassification Rates that these neural nets achieved within the training epochs.
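The two evaluation measures defined in subsection 4.2 can be sketched in Python as follows. This is an illustrative reading, not the evaluation code used in the experiments: "mean squared root error" is interpreted here as the root of the mean squared error, which is an assumption, and the function names, boolean labels, and sample values are made up for the example.

```python
import math
from typing import Dict, Sequence


def rms_error(targets: Sequence[float], outputs: Sequence[float]) -> float:
    """Root of the mean squared difference between targets and outputs."""
    squared = [(t - o) ** 2 for t, o in zip(targets, outputs)]
    return math.sqrt(sum(squared) / len(squared))


def misclassification(is_attack: Sequence[bool],
                      flagged: Sequence[bool]) -> Dict[str, float]:
    """Fraction of inputs misclassified, counting both error types."""
    # False positive: normal traffic flagged as an attack.
    false_pos = sum(1 for a, f in zip(is_attack, flagged) if f and not a)
    # False negative: an attack that was not flagged.
    false_neg = sum(1 for a, f in zip(is_attack, flagged) if a and not f)
    return {
        "false_positives": false_pos,
        "false_negatives": false_neg,
        "rate": (false_pos + false_neg) / len(is_attack),
    }


truth = [True, True, False, False, False]   # made-up ground truth
flags = [True, False, True, False, False]   # made-up classifier decisions
result = misclassification(truth, flags)
print(result["rate"])                       # 0.4
print(rms_error([1.0, 0.0], [0.5, 0.5]))    # 0.5
```

Reporting the false positive and false negative counts separately, alongside the combined rate, makes it easier to see which kind of error dominates for a given network.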

Fig. 8 MSR errors of Fuzzy ARTMAP (x-axis: # of category neurons; y-axis: MSR Error)
Fig. 9 Misclassification rates of Fuzzy ARTMAP (x-axis: # of category neurons; y-axis: Misclassification Rate)
Fig. 10 MSR errors of RBF (x-axis: # of hidden neurons; y-axis: MSR Error)
Fig. 11 Misclassification rates of RBF (x-axis: # of hidden neurons; y-axis: Misclassification Rate)

From the above figures, we can see that, as the number of hidden neurons increases, the performances of both ARTMAP and RBF networks improve. In most of the cases, both of them outperformed perceptrons.

4.2.3 BP and PBH
The results of BP and PBH nets are illustrated from Fig. 12 to Fig. 15.

Fig. 12 MSR errors of BP (x-axis: # of hidden neurons; y-axis: MSR Error)
Fig. 13 Misclassification rates of BP (x-axis: # of hidden neurons; y-axis: Misclassification Rate)
Fig. 14 MSR errors of PBH (x-axis: # of hidden neurons; y-axis: MSR Error)
Fig. 15 Misclassification rates of PBH (x-axis: # of hidden neurons; y-axis: Misclassification Rate)

The figures indicate that BP and PBH networks have similar performances, and that both neural networks consistently perform better than the other three types of neural networks. The curves in these figures are flat: the MSR errors and misclassification rates do not decrease as the number of hidden neurons increases. We believe the reason is that, because we only deployed one attacking technique, UDP flooding attack, in our simulations, our data sets are too simple for BP and PBH. In the future, we will incorporate more Denial-of-Service attacking techniques into our simulation, thus providing additional tests, and possibly greater challenges, for the neural networks under consideration.

5 Conclusions
In this paper, we described our experiments of testing different neural networks for statistical anomaly intrusion detection. The results showed that BP and PBH nets outperform Perceptron, Fuzzy ARTMAP and RBF. Thus, the classification capabilities of BP and PBH are more desirable for statistical anomaly intrusion detection systems.

Acknowledgements
Our research was partially supported by a Phase I SBIR contract with US Army. We would also like to thank OPNET Technologies, Inc. TM, for providing the OPNET simulation software.

References:
[1] G. Vigna, R. A. Kemmerer, NetSTAT: a network-based Intrusion Detection Approach, Proceedings of 14th Annual Computer Security Applications Conference, 1998, pp. 25-34.
[2] W. Lee, S. J. Stolfo, K. Mok, A Data Mining Framework for Building Intrusion Detection Models, Proceedings of 1999 IEEE Symposium of Security and Privacy, pp. 120-132.
[3] A. Valdes, D. Anderson, Statistical Methods for Computer Usage Anomaly Detection Using NIDES, Technical report, SRI International, January 1995.
[4] J. M. Bonifacio, et al., Neural Networks Applied in Intrusion Detection System, IEEE, 1998, pp. 205-210.
[5] H. S. Javitz, A. Valdes, The NIDES Statistical Component: Description and Justification, Technical report, SRI International, March 1993.
[6] R. M. Dillon, C. N. Manikopoulos, Neural Net Nonlinear Prediction for Speech Data, IEEE Electronics Letters, Vol. 27, Issue 10, May 1991, pp. 824-826.
[7] A. K. Ghosh, J. Wanken, F. Charron, Detecting Anomalous and Unknown Intrusions Against Programs, Proceedings of IEEE 14th Annual Computer Security Applications Conference, 1998, pp. 259-267.
[8] Z. Zhang, et al., A Hierarchical Anomaly Network Intrusion Detection System Using Neural Network Classification, to appear in Proceedings of 2001 WSES International Conference on: Neural Networks and Applications (NNA '01), Feb. 2001.
[9] Simon Haykin, Neural Network: A Comprehensive Foundation, Macmillan College Publishing Company, 1994.
[10] G. A. Carpenter, et al., Fuzzy ARTMAP: An adaptive resonance architecture for incremental learning of analog maps, International Joint Conference on Neural Networks, June 1992.
[11] NeuralWare Inc., Neural Computing: A Technology Handbook for NeuralWorks Professional II/PLUS and NeuralWorks Explorer, NeuralWare Inc., 1998.