Neural Networks in Statistical Anomaly Intrusion Detection

ZHENG ZHANG, JUN LI, C. N. MANIKOPOULOS, JAY JORGENSON and JOSE UCLES
ECE Department, New Jersey Inst. of Tech., University Heights, Newark, NJ 07102, USA
Department of Mathematics, CUNY, Convent Ave. at 138 ST., New York, NY 10031, USA
Network Security Solutions, 5 Independence Blvd. 3rd FL., Warren, NJ 07059, USA

Abstract: - In this paper, we report on experiments in which we used neural networks for statistical anomaly intrusion detection systems. The five types of neural networks that we studied were: Perceptron; Backpropagation (BP); Perceptron-Backpropagation-Hybrid (PBH); Fuzzy ARTMAP; and Radial-Based Function (RBF). We collected four separate data sets from different simulation scenarios, and these data sets were used to test various neural networks with different numbers of hidden neurons. Our results showed that the classification capabilities of BP and PBH outperform those of the other neural networks.

Key-Words: - Security, Intrusion Detection, Statistical Anomaly Detection, Neural Network Classification, Perceptron, Backpropagation, Perceptron-Backpropagation-Hybrid, Fuzzy ARTMAP, Radial-Based Function

1 Introduction
The ubiquity of the Internet raises serious concerns about the security of computer infrastructures and the integrity of sensitive data. Network intrusion detection is an effective approach to protecting networks and computers from malicious network-based attacks. The basic assumption of intrusion detection is that an intruder's behavior will be noticeably different from that of legitimate users.

Intrusion detection techniques can be partitioned into two complementary trends: misuse detection and anomaly detection. Misuse detection systems, such as [1][2], model the known attacks and scan the system for occurrences of these patterns. Anomaly detection systems, such as [3][7], flag intrusions by observing significant deviations from the typical or expected behavior of the systems or users.

Statistical modeling and neural networks are widely applied in building anomaly intrusion detection systems.
For example, NIDES [3] represents user or system behaviors by a set of statistical variables and detects the deviation between the observed and the standard activities. A system that identifies intrusions using packet filtering and neural networks was introduced in [4]. The work of Ghosh et al. [7] studied the use of neural networks to detect anomalous and unknown intrusions against a software system. In [8], we presented the prototype of a hierarchical anomaly network intrusion detection system that uses statistical models and neural networks to detect attacks.

As the kernels of many anomaly IDSs, neural networks have a profound impact on system performance and efficiency, but little research has compared the output of different neural networks as applied to IDS problems. In this paper, we present our experiments on the performance of five different types of neural networks. Section 2 introduces the statistical model that we are using. Section 3 describes the neural networks we tested. In Section 4, we report the test bed and the attack schemes we simulated, together with some experimental results. Section 5 draws some conclusions and outlines future work.

2 Statistical Model
Statistics have been used in anomaly intrusion detection systems [3]; however, most of these systems simply measure the means and the variances of some variables and detect whether certain thresholds are exceeded. SRI's NIDES [5][3] developed a more sophisticated statistical algorithm by using a χ²-like test to measure the similarity between short-term and long-term profiles. Our current statistical model uses an algorithm similar to that of NIDES, but with major modifications. Therefore, we will first briefly introduce some basic information about the NIDES statistical algorithm.

In NIDES, user profiles are represented by a number of probability density functions. Let S be the sample space of a random variable and let the events E_1, E_2, ..., E_k be a mutually exclusive partition of S. Assume p_i is the expected probability of the occurrence of the event E_i, and let p'_i be the frequency of the occurrence of E_i during a given time interval. Let N denote the total number of occurrences. The NIDES statistical algorithm used a χ²-like test to determine the similarity between the expected and actual distributions through the statistic:

$$Q = N \sum_{i=1}^{k} \frac{(p'_i - p_i)^2}{p_i}$$

When N is large and the events E_1, E_2, ..., E_k are independent, Q approximately follows a χ² distribution with (k - 1) degrees of freedom. However, in a real-time application the above two assumptions generally cannot be guaranteed, thus empirically Q may not follow a χ² distribution. NIDES solved this problem by building an empirical probability distribution for Q which is updated daily in a real-time operation.

In our system, since we are using neural networks to identify possible intrusions, we are not so concerned with the actual distribution of Q. However, because network traffic is not stationary and network-based attacks may have different time durations, varying from a couple of seconds to several hours, we need an algorithm which is capable of efficiently monitoring network traffic with different time windows. Based on the above observations, we used a layer-window statistical model, Fig. 1, with each layer-window corresponding to one different detection time slice.

The newly arrived events will first be stored in the event buffer of layer 1. The stored events will be compared with the reference model of that layer and the results are fed into neural networks to detect the network status during that time window. The event buffer will be emptied once it becomes full, and the stored events will be averaged and forwarded to the event buffer of layer 2. The same process will be repeated recursively until it arrives at the top level, where the events will simply be dropped after processing.
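The buffering and forwarding steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the system's implementation: each event is taken to be a vector of observed frequencies over the k bins, the buffer is compared with the reference model only when it becomes full, the NIDES-style statistic Q is used purely as a stand-in for the comparison, and N is approximated by the number of buffered events. The names `LayerWindow`, `nides_q`, `average` and `capacity` are hypothetical.

```python
from typing import List, Optional, Sequence


def nides_q(observed: Sequence[float], expected: Sequence[float], n: int) -> float:
    """Chi-square-like statistic Q = N * sum_i (p'_i - p_i)^2 / p_i.

    Assumes every expected probability p_i is strictly positive.
    """
    return n * sum((po - pe) ** 2 / pe for po, pe in zip(observed, expected))


def average(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Component-wise mean of equally sized frequency vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


class LayerWindow:
    """One detection time slice: an event buffer plus a reference model."""

    def __init__(self, capacity: int, reference: Sequence[float],
                 parent: Optional["LayerWindow"] = None) -> None:
        self.capacity = capacity          # hypothetical buffer size
        self.reference = list(reference)  # expected probabilities p_i
        self.parent = parent              # next layer up, or None at the top
        self.buffer: List[List[float]] = []
        self.scores: List[float] = []     # values a classifier would consume

    def push(self, event: Sequence[float]) -> None:
        self.buffer.append(list(event))
        if len(self.buffer) < self.capacity:
            return
        # Buffer is full: compare with the reference model, then empty it.
        summary = average(self.buffer)
        self.scores.append(nides_q(summary, self.reference, len(self.buffer)))
        self.buffer.clear()
        # Forward the averaged events upward; the top layer just drops them.
        if self.parent is not None:
            self.parent.push(summary)


# Two layers, each holding two events before it is emptied.
top = LayerWindow(capacity=2, reference=[0.5, 0.5])
bottom = LayerWindow(capacity=2, reference=[0.5, 0.5], parent=top)
for event in ([1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 0.0]):
    bottom.push(event)
print(bottom.scores, top.scores)
```

Because each layer only sees averages from the layer below, the higher layers cover progressively longer time windows at no extra storage cost, which is the property the layer-window design relies on.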
The similarity-measuring algorithm that we are using is shown below:

$$Q = f(N) \cdot \left[ \sum_{i=1}^{k} |p'_i - p_i| + \max_{i=1}^{k} |p'_i - p_i| \right]$$

where f(N) is a function that takes into account the total number of occurrences during a time window.

[Fig. 1 Statistical Model: event reports feed the event buffer and reference model of each layer-window, from layer-window 1 up to layer-window M]

Besides similarity measurements, we also designed an algorithm for the real-time updating of the reference model. Let p_old be the reference model before updating, p_new be the reference model after updating, and p_obs be the observed user activity within a time window. The formula to update the reference model is

$$p_{new} = s \cdot \alpha \cdot p_{obs} + (1 - s \cdot \alpha) \cdot p_{old}$$

in which α is the predefined adaptation rate and s is the value generated by the output of the neural network. Assume that the output of the neural network is a continuous variable t between -1 and 1, where -1 means intrusion with absolute certainty and 1 means no intrusion, again with complete confidence. In between, the values of t indicate proportionate levels of certainty. The function for calculating s is

$$s = \begin{cases} t, & \text{if } t \ge 0 \\ 0, & \text{otherwise} \end{cases}$$

Through the above equations, we ensured that the reference model would be updated actively for normal traffic while being kept unchanged when attacks occurred. The attack events will be diverted and stored for use as attack scripts in neural network learning.

3 Neural Networks
Neural networks are widely considered an efficient approach to adaptively classifying patterns, but their high computational intensity and long training cycles greatly hinder their application. In [4][7], BP neural networks were used to detect anomalous user activities. In [8], we deployed a hybrid neural network paradigm [6], called the perceptron-backpropagation-hybrid (or PBH) network, which is a superposition of a perceptron and a small backpropagation network. In order to comprehensively investigate the performance of neural networks, we examined five different types of neural networks: Perceptron, BP, PBH, Fuzzy ARTMAP and RBF.

The perceptron [9], Fig. 2, is the simplest form of a neural network used for the classification of linearly separable patterns. It consists of a single neuron with adjustable synapses and threshold. Although our data sets will not, in general, be linearly separable, we are using the perceptron as a baseline to measure the performance of the other neural networks.

[Fig. 2 Perceptron architecture: inputs x_1 ... x_N, threshold θ, output y]

The backpropagation network [9], or BP, Fig. 3, is a multilayer feedforward network, which contains an input layer, one or more hidden layers, and an output layer. BP networks have strong generalization capabilities and have been applied successfully to solve some difficult and diverse problems. We tested BP networks with the number of hidden neurons ranging from 2 to 8.

[Fig. 3 BP architecture: input layer, hidden layer, output layer, with the error signal propagated backward]

The perceptron-backpropagation hybrid network [6], or PBH, Fig. 4, is a superposition of a perceptron and a small backpropagation network. PBH networks are capable of exploring both linear and nonlinear correlations between the input stimulus vectors and the output values. We tested PBH networks with up to 8 hidden neurons.

[Fig. 4 PBH architecture]

Fuzzy ARTMAP [10] in its most general form is a system of two Fuzzy ART networks, ART_a and ART_b, whose F2 layers are connected by a subsystem referred to as a match tracking system. We are using a simplified version of Fuzzy ARTMAP [11], Fig. 5, which is implemented for classification problems. We tested ARTMAP networks with the number of category neurons ranging from 2 to 8.

[Fig. 5 Fuzzy ARTMAP architecture: inputs x_1 ... x_P, complement coding, Fuzzy ART, category layer]

The radial-basis function network [9], or RBF, Fig. 6, involves three entirely different layers. The input layer is made up of source nodes. The second layer is a hidden layer of high enough dimension, which serves a different purpose from that in a BP network. The output layer supplies the response of the network to the activation patterns applied to the input layer. We tested RBF networks with the number of hidden neurons ranging from 2 to 8.
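To make the perceptron baseline of this section concrete, the following is a minimal sketch of a single neuron with adjustable weights and a threshold θ, trained with the classic perceptron learning rule. It is an illustration only: the networks in this paper were built with a commercial package, and the ±1 class labels, the learning rate, the epoch limit and the function names `predict` and `train_perceptron` are all assumptions made for the example.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]  # (input vector, target label in {-1, +1})


def predict(weights: Sequence[float], threshold: float, x: Sequence[float]) -> int:
    """Return +1 if the weighted sum reaches the threshold, else -1."""
    total = sum(w * xi for w, xi in zip(weights, x))
    return 1 if total >= threshold else -1


def train_perceptron(samples: Sequence[Sample], epochs: int = 20,
                     rate: float = 1.0) -> Tuple[List[float], float]:
    """Classic perceptron rule: adjust weights and threshold on each error."""
    weights = [0.0] * len(samples[0][0])
    threshold = 0.0
    for _ in range(epochs):
        errors = 0
        for x, target in samples:
            if predict(weights, threshold, x) != target:
                errors += 1
                # Move the decision boundary toward the misclassified sample.
                for i, xi in enumerate(x):
                    weights[i] += rate * target * xi
                threshold -= rate * target
        if errors == 0:  # every sample classified correctly: stop early
            break
    return weights, threshold


# A linearly separable toy problem (logical AND with -1/+1 labels).
and_samples: List[Sample] = [
    ([0.0, 0.0], -1),
    ([0.0, 1.0], -1),
    ([1.0, 0.0], -1),
    ([1.0, 1.0], 1),
]
w, theta = train_perceptron(and_samples)
print([predict(w, theta, x) for x, _ in and_samples])
```

On data that are not linearly separable, the loop never reaches zero errors and simply stops after `epochs` passes, which is why a perceptron is useful here only as a lower bound on performance.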

[Fig. 6 RBF architecture: inputs x_1 ... x_P, hidden layer of Green's functions G, output layer]

In our experiments, we used NeuralWorks Professional II/PLUS to build all of the neural networks described above.

4 Experimental Results
In this section, we present our simulation approach and the results of applying our statistical models and the different neural networks to detect network-based attacks. The testbed configuration and the simulation specifications are introduced in subsection 4.1, and subsection 4.2 reports the testing results.

4.1 Testbed
We used a virtual network built with simulation tools to generate attack scenarios. The experimental testbed that we built using OPNET, a powerful network simulation facility, is shown in Fig. 7. The testbed is a -BaseX LAN that consists of workstations and a server.

[Fig. 7 Simulation Testbed]

We simulated the UDP flooding attack within the testbed. To extensively test the performance of the neural networks, we ran four independent scenarios with different typical traffic loads and attack traffic. Table 1 lists the traffic loads of the simulation scenarios.

              Typical Traffic   Attack Traffic
Scenario 1    6kbps             5kbps
Scenario 2    6kbps             kbps
Scenario 3    2Mbps             5kbps
Scenario 4    2Mbps             kbps

Table 1 Traffic Loads of the Four Simulation Scenarios

4.2 Results
For each simulation scenario, we collected records of network traffic. We divided these data into two separate sets, one set of 6 data for training and the other of 4 data for testing. In each scenario, the system was trained for epochs. We evaluated the performance of the neural networks based on the mean squared root (MSR) errors and the misclassification rates of the outputs. The misclassification rate is defined as the percentage of the inputs that are misclassified by the neural networks during one epoch, which includes both false positive and false negative misclassifications. In the rest of this section, we present and analyze the simulation results of the neural networks one by one.

4.2.1 Perceptron
The mean squared root errors and the misclassification rates of the perceptrons within the four simulation scenarios are tabulated in Table 2.

              MSR Error   Misclass. rate
Scenario 1    .68564      .6725
Scenario 2    .75895      .22
Scenario 3    .738548     .233889
Scenario 4    .635356     .9444

Table 2 The simulation results of perceptrons

We can see that the perceptrons performed poorly in all four scenarios: mean squared root errors are between .6 and .7, and misclassification rates are between . and .2. Both the MSR errors and the misclassification rates are unacceptably high for an IDS.

4.2.2 Fuzzy ARTMAP and RBF
The results of the Fuzzy ARTMAP and RBF nets are shown in Fig. 8 to Fig. 11. The x-axes of the figures represent the number of category neurons in Fuzzy ARTMAP and the number of hidden neurons in RBF. The y-axes represent the lowest mean squared root errors and the lowest misclassification rates that these neural nets achieved within the training epochs.
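The two evaluation measures used throughout this section can be written down compactly. The sketch below is an illustration under stated assumptions rather than the evaluation code used in the experiments: "mean squared root error" is read here as the square root of the mean squared difference between network outputs and targets, class decisions are represented as booleans (`True` for attack), and the function names `msr_error` and `misclassification_rate` are hypothetical.

```python
import math
from typing import Sequence, Tuple


def msr_error(outputs: Sequence[float], targets: Sequence[float]) -> float:
    """Square root of the mean squared output error (assumed definition)."""
    squared = [(o - t) ** 2 for o, t in zip(outputs, targets)]
    return math.sqrt(sum(squared) / len(squared))


def misclassification_rate(predicted: Sequence[bool],
                           actual: Sequence[bool]) -> Tuple[float, int, int]:
    """Return (rate, false positives, false negatives) for one epoch.

    A false positive is normal traffic flagged as an attack; a false
    negative is an attack that was not flagged. Both count as errors.
    """
    false_pos = sum(1 for p, a in zip(predicted, actual) if p and not a)
    false_neg = sum(1 for p, a in zip(predicted, actual) if a and not p)
    return (false_pos + false_neg) / len(actual), false_pos, false_neg


# Four inputs: one false alarm and one missed attack.
rate, fp, fn = misclassification_rate(
    predicted=[True, False, True, False],
    actual=[True, False, False, True],
)
print(rate, fp, fn)
print(msr_error([1.0, -1.0], [1.0, 1.0]))
```

Reporting the two error types separately, as the tuple does, is a design choice of this sketch; the paper itself reports only the combined rate.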

[Fig. 8 MSR errors of Fuzzy ARTMAP: MSR error vs. # of category neurons]

[Fig. 9 Misclassification rates of Fuzzy ARTMAP: misclassification rate vs. # of category neurons]

[Fig. 10 MSR errors of RBF: MSR error vs. # of hidden neurons]

[Fig. 11 Misclassification rates of RBF: misclassification rate vs. # of hidden neurons]

From the above figures, we can see that, as the number of category or hidden neurons increases, the performance of both the ARTMAP and RBF networks improves. In most of the cases, both of them outperformed the perceptrons.

4.2.3 BP and PBH
The results of the BP and PBH nets are illustrated in Fig. 12 to Fig. 15.

[Fig. 12 MSR errors of BP: MSR error vs. # of hidden neurons]

[Fig. 13 Misclassification rates of BP: misclassification rate vs. # of hidden neurons]

[Fig. 14 MSR errors of PBH: MSR error vs. # of hidden neurons]

[Fig. 15 Misclassification rates of PBH: misclassification rate vs. # of hidden neurons]

The figures indicate that BP and PBH networks have similar performance, and that both consistently perform better than the other three types of neural networks. The curves in these figures are flat: the MSR errors and misclassification rates do not decrease as the number of hidden neurons increases. We believe the reason is that, because we deployed only one attacking technique, the UDP flooding attack, in our simulations, our data sets are too simple for BP and PBH. In the future, we will incorporate more Denial-of-Service attacking techniques into our simulation, thus providing additional tests, and possibly greater challenges, for the neural networks under consideration.

5 Conclusions
In this paper, we described our experiments in testing different neural networks for statistical anomaly intrusion detection. The results showed that BP and PBH nets outperform Perceptron, Fuzzy ARTMAP and RBF. Thus, the classification capabilities of BP and PBH are more desirable for statistical anomaly intrusion detection systems.

Acknowledgements
Our research was partially supported by a Phase I SBIR contract with the US Army. We would also like to thank OPNET Technologies, Inc.™ for providing the OPNET simulation software.

References:
[1] G. Vigna, R. A. Kemmerer, NetSTAT: A Network-based Intrusion Detection Approach, Proceedings of 14th Annual Computer Security Applications Conference, 1998, pp. 25-34.
[2] W. Lee, S. J. Stolfo, K. Mok, A Data Mining Framework for Building Intrusion Detection Models, Proceedings of 1999 IEEE Symposium of Security and Privacy, pp. 120-132.
[3] A. Valdes, D. Anderson, Statistical Methods for Computer Usage Anomaly Detection Using NIDES, Technical report, SRI International, January 1995.
[4] J. M. Bonifacio, et al., Neural Networks Applied in Intrusion Detection System, IEEE, 1998, pp. 205-210.
[5] H. S. Javitz, A. Valdes, The NIDES Statistical Component: Description and Justification, Technical report, SRI International, March 1993.
[6] R. M. Dillon, C. N. Manikopoulos, Neural Net Nonlinear Prediction for Speech Data, IEEE Electronics Letters, Vol. 27, Issue 10, May 1991, pp. 824-826.
[7] A. K. Ghosh, J. Wanken, F. Charron, Detecting Anomalous and Unknown Intrusions Against Programs, Proceedings of IEEE 14th Annual Computer Security Applications Conference, 1998, pp. 259-267.
[8] Z. Zhang, et al., A Hierarchical Anomaly Network Intrusion Detection System Using Neural Network Classification, to appear in Proceedings of 2001 WSES International Conference on: Neural Networks and Applications (NNA '01), Feb. 2001.
[9] Simon Haykin, Neural Networks: A Comprehensive Foundation, Macmillan College Publishing Company, 1994.
[10] G. A. Carpenter, et al., Fuzzy ARTMAP: An adaptive resonance architecture for incremental learning of analog maps, International Joint Conference on Neural Networks, June 1992.
[11] NeuralWare Inc., Neural Computing: A Technology Handbook for NeuralWorks Professional II/PLUS and NeuralWorks Explorer, NeuralWare Inc., 1998.