International Journal of Electronics and Communication Engineering & Technology (IJECET)
Volume 6, Issue 12, Dec 2015, pp. 43-48, Article ID: IJECET_06_12_007
Available online at http://www.iaeme.com/ijecetissues.asp?jtype=ijecet&vtype=6&itype=12
ISSN Print: 0976-6464 and ISSN Online: 0976-6472
IAEME Publication

COMPARISON OF THE ACCURACY OF BIVARIATE REGRESSION AND BOX PLOT ANALYSIS IN DETECTING DDOS ATTACKS

Samer Charbaji
Department of Electrical and Computer Engineering, Faculty of Engineering and Architecture, American University of Beirut, Lebanon

ABSTRACT
In this paper, we compare the accuracy of two host-based statistical methods, the univariate box plot and bivariate linear regression, in detecting distributed denial of service (DDOS) attacks. The receiver operating characteristic (ROC) curves of the two methods are plotted, and the comparison is based on these two curves. The results of our study show that the two methods have very similar ROC curves and detection accuracy on our data set.

Cite this Article: Samer Charbaji. Comparison of the Accuracy of Bivariate Regression and Box Plot Analysis in Detecting DDOS Attacks. International Journal of Electronics and Communication Engineering & Technology, 6(12), 2015, pp. 43-48. http://www.iaeme.com/ijecet/issues.asp?jtype=ijecet&vtype=6&itype=12

1. INTRODUCTION
DDOS attacks pose a serious threat to the security of the internet. They overwhelm online servers and hosts by sending large numbers of packets from many different sources, thereby taking them off the network [1]. The effectiveness of DDOS attacks can be attributed to the availability of many simple yet powerful attack tools that automate the attack and require little knowledge on the part of the attacker. These tools carry out several types of flooding attacks, such as HTTP request, UDP, and SYN floods [2].
Intrusion Detection Systems (IDSs) are used to detect attacks. They analyze and process network data to detect unusual behavior or exploitation attempts on the network, and then classify the observed patterns as either attacks or normal traffic. The main assumption behind IDSs is that normal and attacker traffic differ; as a result, detection can occur by capturing normal traffic patterns and comparing them with current network behavior in order to identify and single out anomalous user behavior [3]. This approach is complicated by the unpredictability of both attacks and normal user behavior. The unpredictability of attacks stems from the variety of attack methods and patterns, whereas the unpredictability of normal user behavior is attributed to flash crowds: sudden increases in legitimate user traffic that resemble DDOS attacks [4].

IDS detection methods can be divided into two general categories: signature-based and profile-based detection [5]. Signature-based methods collect data and search for matches between the data and the signatures of known attacks. Profile-based methods, on the other hand, collect data to establish profiles of normal user behavior and then search the network for deviations from these profiles. Statistical techniques are among the most widely used in profile-based detection [6]. They rely on collecting normal user traffic data over a period of time and then conducting statistical tests to single out anomalous activity.

The purpose of this paper is to compare the accuracy of two host-based statistical techniques in detecting DDOS attacks. The first is a simple univariate statistic, the box plot, which detects outliers and classifies them as attacks [7]; the second is bivariate regression analysis, which uses the observed traffic data to predict the occurrence of an attack [8]. The accuracy of the two methods is compared through their receiver operating characteristic (ROC) curves.

2. RELATED WORK
Many researchers have applied statistical techniques to DDOS detection, and some have compared different methods.
Feinstein et al [9] applied entropy calculation and the chi-square statistic to packet header fields, such as source and destination addresses and ports, to detect DDOS attacks, and compared the two methods. Xiao et al [10] used correlation-based k-nearest neighbor analysis to classify traffic into different data sets; new data was compared against these data sets to identify attacks, and the effectiveness of the method was compared against other statistical methods. Jin et al [11] calculated the correlation between different header fields, such as the RST, SYN, and FIN flags, to detect changes in network behavior. Gautam et al [12] used multivariate linear regression on system log file attributes, such as page faults, allocation, and cache bytes, to detect attacks. Om et al [13] used principal component analysis and Gaussian mixture distributions on system log file attributes, such as page writes and cache faults, to detect anomalous traffic, and compared the two. Mok et al [6] used logistic regression on header fields such as protocol type and TCP flags to detect attacks. Wu et al [5] applied factor analysis to source and destination IP addresses and ports, along with the protocol type and TCP flags, and then calculated the Mahalanobis distance to detect anomalous traffic variations.

Other researchers have adopted different techniques for DDOS detection. Ye et al [14] extracted user session features (request rates, transition patterns, popularity of the requested objects, and size of the requested objects) and applied clustering analysis to them. User behaviors are grouped into clusters, and attacks are detected when they fail to fit into any of the clusters.
Liu et al [16] used neural networks that take traffic duration, intensity, and packet numbers as inputs to detect network attacks. Saied et al [17] used artificial neural networks trained with patterns of both normal and abnormal traffic to perform detection.

3. IMPLEMENTATION
The comparison consisted of three stages: collecting network data, analyzing the detection accuracy of each method, and comparing the results. Network data was collected on a host with Wireshark over a period of three weeks. The data was aggregated hour by hour and exported as a CSV file to Excel, where it was analyzed. The data contained both normal and malicious network traffic, and the analysis consisted of counting the total number of SYN and FIN packets in each hour of data. The malicious traffic consisted of distributed SYN flood attacks executed with an automated SYN flooding tool, Hyenae FE. After the SYN and FIN totals were obtained for every hour and placed in a spreadsheet, as seen in Figure 1, the difference between the two was computed. With data acquisition complete, the statistical methods could be applied and their results compared.

Figure 1 SYN and FIN Analyzed

4. BOX PLOT
The box plot method works by calculating the mean and standard deviation of the data set and then classifying as attacks the data points that lie more than a threshold number of standard deviations above the mean. In our implementation, the difference between the SYN total and the FIN total was used as the parameter, and its mean and standard deviation were calculated. The threshold, expressed in standard deviations, was varied several times, and the resulting percentage detection and percentage false positives (false alarms) were calculated according to Equations (1) and (2), where a false positive occurs when normal traffic is incorrectly classified as an attack.
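Written out, the feature, the decision rule described above, and the two rates can be sketched as follows. The symbols are introduced here for illustration only: S_h and F_h are the SYN and FIN totals for hour h, N is the number of hours, k is the threshold in standard deviations, and TP, FN, FP and TN count attack hours detected, attack hours missed, normal hours flagged, and normal hours passed. The rates assume that Equations (1) and (2) take the standard ROC form (true positive rate and false positive rate), and the use of the sample (N - 1) standard deviation is likewise an assumption.

```latex
\begin{aligned}
d_h &= S_h - F_h, \qquad
\bar{d} = \frac{1}{N}\sum_{h=1}^{N} d_h, \qquad
s = \sqrt{\frac{1}{N-1}\sum_{h=1}^{N}\left(d_h - \bar{d}\right)^2} \\
\text{hour } h \text{ is classified as an attack} &\iff d_h > \bar{d} + k\,s \\
\%\,\text{Detection} &= 100 \times \frac{TP}{TP + FN}, \qquad
\%\,\text{False Alarm} = 100 \times \frac{FP}{FP + TN}
\end{aligned}
```

Lowering k flags more hours, which can only raise both rates; sweeping k therefore traces the ROC curve from the origin toward (100, 100).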
A plot of % Detection as a function of % False Alarm was then constructed; this is the ROC curve of the box plot method, shown in Figure 2.
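The threshold sweep that produces such a curve can be sketched in Python as below. This is a minimal illustration, not the author's spreadsheet procedure: the function names, the toy hourly SYN-FIN differences in `diffs`, and the attack labels in `labels` are hypothetical, and the rates assume the standard ROC definitions of detection (TP / (TP + FN)) and false alarm (FP / (FP + TN)). Note that the classic Tukey box plot flags outliers with quartile-based fences (1.5 times the interquartile range beyond the quartiles); the sketch instead follows the mean-plus-k-standard-deviations rule described in this section.

```python
from statistics import mean, stdev


def rates(flagged, labels):
    """Return (% Detection, % False Alarm) for boolean predictions.

    Assumes the standard ROC rates: detection = TP / (TP + FN) and
    false alarm = FP / (FP + TN). Both classes must be present, otherwise
    the division fails.
    """
    pairs = list(zip(flagged, labels))
    tp = sum(1 for f, y in pairs if f and y)
    fn = sum(1 for f, y in pairs if not f and y)
    fp = sum(1 for f, y in pairs if f and not y)
    tn = sum(1 for f, y in pairs if not f and not y)
    return 100.0 * tp / (tp + fn), 100.0 * fp / (fp + tn)


def box_plot_roc(diffs, labels, ks):
    """Trace ROC points by flagging hours whose SYN-FIN difference exceeds
    mean + k * (sample) standard deviation, for each threshold k in ks."""
    mu, sigma = mean(diffs), stdev(diffs)
    points = []
    for k in ks:
        flagged = [d > mu + k * sigma for d in diffs]
        points.append(rates(flagged, labels))
    return points


# Hypothetical toy data: one SYN-FIN difference per hour, 1 = attack hour.
diffs = [2, 0, 3, 1, 4, 120, 95, 2, 150, 1]
labels = [0, 0, 0, 0, 0, 1, 1, 0, 1, 0]

ks = [-1, 0, 1, 2]
for k, (det, fa) in zip(ks, box_plot_roc(diffs, labels, ks)):
    print(f"k={k:>2}: detection={det:5.1f}%  false alarm={fa:5.1f}%")
```

On this toy data, k = 0 separates the classes completely (100% detection, 0% false alarm), k = 1 misses the smaller attack hour (about 66.7% detection), and k = -1 flags every hour (100% on both axes). Each k contributes one point of the curve.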
Figure 2 ROC of Box Plot (ROC Curve - Box Plot: % Detection, 0 to 100, versus % False Alarm, 0 to 40)

5. BIVARIATE REGRESSION
Regression is an explanatory and predictive technique that predicts a dependent variable from one or more independent variables. In our implementation, we used bivariate regression with the difference between the SYN total and the FIN total as the independent variable and Attack/No-Attack as the dependent variable, coded as 0 for No-Attack and 1 for Attack. Performing the analysis yields the predictions shown in Figure 3. The predictions fall between 0 and 1, and each data point is classified as an attack or as normal traffic according to a threshold placed between 0 and 1. We vary this threshold and calculate % Detection and % False Alarm using Equations (1) and (2) for each threshold value, which gives the ROC curve of the bivariate regression shown in Figure 4.

Figure 3 Regression Analysis
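A comparable sketch for the bivariate regression fits an ordinary least-squares line of the 0/1 Attack/No-Attack label on the hourly SYN-FIN difference and sweeps the decision threshold. The function names and toy data are hypothetical, the closed-form least-squares formulas are an assumption about how the fit was computed, and the rates again assume the standard ROC definitions. Unlike logistic regression, a least-squares line is not bounded to [0, 1], so on other data some predictions can fall slightly outside that range; thresholding works the same way regardless.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = a + b * x with a single predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def rates(flagged, labels):
    """(% Detection, % False Alarm), assuming TP/(TP+FN) and FP/(FP+TN)."""
    pairs = list(zip(flagged, labels))
    tp = sum(1 for f, y in pairs if f and y)
    fn = sum(1 for f, y in pairs if not f and y)
    fp = sum(1 for f, y in pairs if f and not y)
    tn = sum(1 for f, y in pairs if not f and not y)
    return 100.0 * tp / (tp + fn), 100.0 * fp / (fp + tn)


def regression_roc(diffs, labels, thresholds):
    """Regress the 0/1 attack label on the SYN-FIN difference, then flag an
    hour as an attack when its predicted value exceeds each threshold."""
    a, b = fit_line(diffs, labels)
    preds = [a + b * d for d in diffs]
    points = [rates([p > t for p in preds], labels) for t in thresholds]
    return preds, points


# Hypothetical toy data: one SYN-FIN difference per hour, 1 = attack hour.
diffs = [2, 0, 3, 1, 4, 120, 95, 2, 150, 1]
labels = [0, 0, 0, 0, 0, 1, 1, 0, 1, 0]

thresholds = [0.5, 0.8, 1.5]
preds, points = regression_roc(diffs, labels, thresholds)
for t, (det, fa) in zip(thresholds, points):
    print(f"threshold={t}: detection={det:5.1f}%  false alarm={fa:5.1f}%")
```

On this toy data the fitted predictions range from slightly below 0 to about 1.19, which illustrates the caveat above; thresholds of 0.5 and 0.8 give 100% and about 66.7% detection, both with no false alarms.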
Figure 4 ROC-Bivariate Regression (ROC Curve-Bivariate Regression: % Detection, 0 to 100, versus % False Alarm, 0 to 35)

6. RESULTS
Comparing the ROC curves of the two methods, we find that they are very similar. In both, % Detection rises sharply at first as % False Alarm increases slightly. % Detection then reaches a plateau, after which it continues to increase linearly with % False Alarm.

This comparison shows that bivariate regression, which is more computationally intensive, gives results comparable to those of the simple box plot method on our data set. In this study, the added complexity of the method did not improve the accuracy of the results: the simpler method gave slightly better detection, incurring a smaller false alarm percentage to reach the same percentage detection.

7. CONCLUSION
Both bivariate regression analysis and the univariate box plot can be used to detect attacks with variable sensitivity. The results of our study show that, despite its simplicity and small computational requirements, the box plot method provides results comparable to, and slightly better than, those of the more computationally intensive and complicated bivariate regression. The complexity and computational intensity of a method therefore do not always yield better results. Future research should apply both methods to different, more encompassing data sets in order to track the difference between them and obtain a more general comparison.

REFERENCES
[1] No, Giseop, and Ilkyeun Ra. An efficient and reliable DDoS attack detection using a fast entropy computation method. Communications and Information Technology, 2009. ISCIT 2009. 9th International Symposium on. IEEE, 2009.
[2] Chen, Yonghong, Xinlei Ma, and Xinya Wu. DDoS detection algorithm based on preprocessing network traffic predicted method and chaos theory. Communications Letters, IEEE 17.5 (2013): 1052-1054.
[3] Tan, Zhiyuan, et al. Triangle-area-based multivariate correlation analysis for effective denial-of-service attack detection. Trust, Security and Privacy in Computing and Communications (TrustCom), 2012 IEEE 11th International Conference on. IEEE, 2012.
[4] Thapngam, Theerasak, et al. Distributed Denial of Service (DDoS) detection by traffic pattern analysis. Peer-to-Peer Networking and Applications 7.4 (2014): 346-358.
[5] Wu, Ningning, and Jing Zhang. Factor analysis based anomaly detection. Information Assurance Workshop, 2003. IEEE Systems, Man and Cybernetics Society. IEEE, 2003.
[6] Mok, Min Seok, So Young Sohn, and Yong Han Ju. Random effects logistic regression model for anomaly detection. Expert Systems with Applications 37.10 (2010): 7162-7166.
[7] Williamson, David F., Robert A. Parker, and Juliette S. Kendrick. The box plot: a simple visual method to interpret data. Annals of Internal Medicine 110.11 (1989): 916-921.
[8] Retherford, Robert D., and Minja Kim Choe. Bivariate linear regression. Statistical Models for Causal Analysis (1993): 1-28.
[9] Feinstein, Laura, et al. Statistical approaches to DDoS attack detection and response. DARPA Information Survivability Conference and Exposition, 2003. Proceedings. Vol. 1. IEEE, 2003.
[10] Xiao, Peng, et al. Detecting DDoS attacks against data center with correlation analysis. Computer Communications 67 (2015): 66-74.
[11] Jin, Shuyuan, and Daniel S. Yeung. A covariance analysis model for DDoS attack detection. Communications, 2004 IEEE International Conference on. Vol. 4. IEEE, 2004.
[12] Gautam, Sunil Kumar, and Hari Om. Multivariate Linear Regression Model for Host Based Intrusion Detection. Computational Intelligence in Data Mining - Volume 3. Springer India, 2015. 361-371.
[13] Om, Hari, and Tanmoy Hazra. Statistical Techniques in Anomaly Intrusion Detection System. International Journal of Advances in Engineering and Technology (2012).
[14] Ye, Chengxu, Kesong Zheng, and Chuyu She. Application layer DDoS detection using clustering analysis. Computer Science and Network Technology (ICCSNT), 2012 2nd International Conference on. IEEE, 2012.
[15] Alshawi, Imad S., Kareem R. Alsaiedy, Vinita Yadav, and Rashmi Ravat. Defense Framework (Stream) For Stream-Based DDOS Attacks on Manet. International Journal of Electronics and Communication Engineering & Technology, 5(1), 2014, pp. 42-52.
[16] Liu, Lei, et al. Anomaly diagnosis based on regression and classification analysis of statistical traffic features. Security and Communication Networks 7.9 (2014): 1372-1383.
[17] Saied, Alan, Richard E. Overill, and Tomasz Radzik. Detection of known and unknown DDoS attacks using Artificial Neural Networks. Neurocomputing 172 (2016): 385-393.