Online Traffic Classification Based on Sub-Flows


Victor Pasknel de A. Ribeiro, Raimir Holanda Filho
Master's Course in Applied Computer Sciences, University of Fortaleza (UNIFOR), Fortaleza, Ceará, Brazil
paskel@unifor.br, raimir@unifor.br

José Everardo Bessa Maia
Department of Statistics and Computing, State University of Ceará (UECE), Fortaleza, Ceará, Brazil
jmaia@uece.br

Abstract

Traffic classification by application class provides useful information for various network engineering and administration tasks. However, offline classification of flows limits its practical application to auditing, long-term planning and other analytical tasks. Research on traffic classification is therefore moving towards accurate and efficient methods that can serve online tasks such as traffic monitoring, traffic shaping and other application-specific operations. In this work we apply the One-Against-All (OAA) approach to two online classification strategies based on statistical features of TCP subflows. One uses the first N packets of the bidirectional TCP session; the other uses subflows of N packets starting at a random position in the flow. In our variant of the OAA approach, the problem of classifying an object into one of M classes is reduced to M binary classification problems with an associated decision rule, each of them possibly using a different subset of features and a different subflow size. We investigated the effect of varying N on the classification results, as well as the smallest set of variables for each of these problems. This study used the Naïve Bayes classifier.

Keywords: Online traffic classification, one-against-all classification.

I. INTRODUCTION

Online traffic classification may be a core part of network management systems, automated intrusion detection systems and denial-of-service attack detection. Commonly deployed IP traffic classification techniques often involve direct inspection of port numbers and/or packet payloads, yet the efficacy of such techniques is diminishing. Traffic classification based on statistical methods and machine learning techniques, which inspects only packet header information, has attracted a great deal of interest.

We propose an online traffic classification architecture using a supervised One-Against-All (OAA) algorithm. The approach has the advantage of allowing the use of highly specialized binary classifiers. In our variant of the OAA approach, the problem of classifying an object into one of M classes is reduced to M binary classification problems with an associated decision rule.

In this work we applied the OAA approach to two online classification strategies based on statistical features of TCP subflows. One uses the first N packets of the bidirectional TCP session; the other uses subflows of N packets starting at a random position in the flow. These two strategies are compared with each other and against the results obtained from a multiclass Naïve Bayes classifier. We investigated the effect of varying N on the classification results and the use of a reduced set of variables in each of the binary problems, each of them possibly using a different subset of features.

The tests were performed using recent real traces with 5 different application classes. The results show a precision of about 98.45% when using initial subflows and of 94.86% when the classification is based on random subflows. These values are 4.21 and 7.34 percentage points above those of a multiclass classifier. In this strategy, the optimum number of features and the optimum subflow size per class ranged from 5 to 20 and from 5 to 8, respectively.

The paper is organized as follows. Section II presents a brief review of the most relevant recent works that are closest to our approach. Section III introduces some background on Bayesian classification and its application to IP traffic classification; it then reviews the Naïve Bayes classifier and presents our complete proposal. Section IV describes the procedure for collecting and labeling the data used in the tests. Section V illustrates our proposal experimentally, with our results and analysis. Section VI concludes the paper with some remarks and future work.

II. RELATED WORKS

More recent publications have approached network traffic classification from an online perspective [1], [2]. Compared with the approach based on full flows, subflow classification reduces the processing load (throughput) and does not require waiting until the end of the flow (delay).

Two aspects are important for the classification activity: the feature selection method and the classification algorithm. Feature selection algorithms fall into three broad categories: the filter model, the wrapper model and the hybrid model [3]. Machine learning algorithms have been used under the assumption that a class of traffic can be identified by statistical analysis of traffic features.

Regarding feature selection for online classification, Zhang et al. [4] benchmark two different algorithms for identifying a feature subset suitable for clustering algorithms, which they treat as a critical question in online traffic classification.

In [5] the authors evaluate the effectiveness of statistical methods for the online traffic classification problem. The paper evaluates three different flow feature sets

that are used to capture distinct properties of each application; two of them consist of features generated from full flows, and the third is made up of early subflow statistics derived from the first few packets of each flow.

In the work proposed in [6], a pretrained Naïve Bayes model is used for classification based on the statistical behavior of a traffic flow, such as average segment size, variance of payload size and initial window size. In total, 10 features are collected from traffic flows, and a precision of up to 96% was achieved when classifying the traffic into 10 different application classes. However, in that work the packets are collected only from the beginning of TCP flows, and the impact of random subflows was not evaluated.

In [7], a classifier is proposed that uses statistics derived from the most recent N packets taken at any arbitrary point in a flow's lifetime. The classifier was trained using statistical features calculated over multiple short subflows extracted from full flows generated by the target application. The approach, however, is applied only to identify a game application. In [8], the authors of [7] extend their previous work on training with multiple subflows to include an unsupervised machine learning algorithm for automated subflow selection. As in the previous work, the reported accuracy is limited to an online game application.

The work proposed here differs from previous works by combining the following set of assumptions: 1) characterization of an optimal classification subflow size for each application class; 2) further reduction, for each class, of the number of features that must be calculated online while still achieving acceptable performance; and 3) use of low-complexity statistical techniques.

III. THE USED OAA APPROACH

A. Bayesian Classification and Naïve Bayes

In this work, we have used the Naïve Bayes technique [9].
Consider a collection of flows $x = (x_1, \ldots, x_n)$, where each flow $x_i$ is described by $m$ discriminators $\{d_1^{(i)}, \ldots, d_m^{(i)}\}$ that can take either numerical or discrete values. In the context of Internet traffic, $d_j^{(i)}$ is a discriminator of flow $x_i$; for example, it may represent the mean inter-arrival time of packets in the flow $x_i$. In this paper, each flow $x_i$ belongs to exactly one of the mutually exclusive classes. The supervised Bayesian classification problem deals with building a statistical model that describes each class based on some training data, and where each new flow $y$ receives a probability of being classified into a particular class according to the Bayes rule below,

$$p(c_j \mid y) = \frac{p(c_j)\, f(y \mid c_j)}{\sum_{c_k} p(c_k)\, f(y \mid c_k)} \tag{1}$$

where $p(c_j)$ denotes the probability of obtaining class $c_j$ independently of the observed data, $f(y \mid c_j)$ is the distribution function (or the probability of $y$ given $c_j$) and the denominator acts as a normalizing constant.

The Naïve Bayes technique considered in this paper assumes the independence of the discriminators $d_1, \ldots, d_m$ as well as a simple Gaussian behavior for each of them. The classification rule consists of choosing the class with maximum membership probability, according to equation (2):

$$\hat{c} = \arg\max_{c_j}\; p(c_j \mid y) \tag{2}$$

B. Feature and Subflow Size Selection

Since many features can be used for flow classification, the classifier may have to deal with a large feature set containing irrelevant and redundant features, which causes a slower classification process, higher resource consumption and poorer classification accuracy. Feature selection therefore plays a vital role in performance optimization. How to find an optimal subflow feature set is still a critical question. Feature selection methods have been successfully applied to classification but seldom applied to online clustering, due to the unavailability of class label information.

The Wrapper [10] evaluator was used in this work for feature selection. Wrapper evaluates features using precision estimates produced by the learning algorithm that will be used for the classification, in this case Naïve Bayes. A selection was performed for each binary classifier, producing a specific set of features for each class. The Java implementation of the Wrapper evaluator found in Weka [11] was used for the selection of features in each class model created. The Naïve Bayes classifier was used as the learning algorithm and Best First was selected as the search method for the Wrapper evaluator.

The following steps were performed for both subflow strategies (initial and random packets) in order to select a reduced number of features and the subflow sizes:

1. An OAA dataset is created for each class and each subflow size analyzed. In this research, we studied the effect of varying the size of subflows from 5 to 8 packets.

2. The Wrapper evaluator is executed on each dataset created in step 1. The variables and subflow size of each class are selected based on the highest result obtained from the Wrapper evaluator. In case of a tie between the highest results, the model with the smallest subflow size is selected.

The results obtained from Wrapper while analyzing the N initial packets are presented in Fig. 1. The optimum number of attributes and the optimum subflow size obtained for each class are shown in Table I. The description of all the features selected for each class can be seen in Table X (Appendix).
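The per-class models described in Sections III.A and III.B can be illustrated with a minimal sketch. It is not the Weka implementation used in this work: the function names, the variance floor, the feature indices in `feature_subsets` and the toy data are all assumptions made for illustration. Each class gets its own binary (class versus rest) Gaussian Naïve Bayes model over its own feature subset, the posterior follows equation (1) for the two-class case, and the final decision follows equation (2).

```python
import math

def fit_gaussian(rows):
    """Per-feature (mean, variance) for a list of equal-length feature vectors."""
    n = len(rows)
    stats = []
    for column in zip(*rows):
        mean = sum(column) / n
        var = sum((v - mean) ** 2 for v in column) / n
        stats.append((mean, max(var, 1e-9)))  # assumed variance floor
    return stats

def log_likelihood(x, stats):
    """log f(x | c) under the independence and Gaussian assumptions."""
    return sum(
        -0.5 * math.log(2 * math.pi * var) - (v - mean) ** 2 / (2 * var)
        for v, (mean, var) in zip(x, stats)
    )

def train_oaa(samples, labels, feature_subsets):
    """One binary 'class vs. rest' model per class, each on its own features."""
    models = {}
    for cls, subset in feature_subsets.items():
        pos = [[x[i] for i in subset] for x, y in zip(samples, labels) if y == cls]
        neg = [[x[i] for i in subset] for x, y in zip(samples, labels) if y != cls]
        prior = len(pos) / len(samples)
        models[cls] = (subset, prior, fit_gaussian(pos), fit_gaussian(neg))
    return models

def posterior(x, model):
    """Equation (1) for the two-class problem, computed in log space."""
    subset, prior, pos_stats, neg_stats = model
    xs = [x[i] for i in subset]
    log_pos = math.log(prior) + log_likelihood(xs, pos_stats)
    log_neg = math.log(1 - prior) + log_likelihood(xs, neg_stats)
    top = max(log_pos, log_neg)
    num = math.exp(log_pos - top)
    return num / (num + math.exp(log_neg - top))

def classify(x, models):
    """Equation (2): pick the class whose binary model gives the highest probability."""
    return max(models, key=lambda cls: posterior(x, models[cls]))

# Toy data, invented for illustration only.
# Columns: mean packet size, mean inter-arrival time, packet count.
samples = [
    [1200, 0.010, 5], [1100, 0.020, 6], [1300, 0.015, 5],   # HTTP
    [300, 0.50, 20], [350, 0.60, 22], [280, 0.55, 19],      # MAIL
    [700, 0.10, 60], [750, 0.12, 65], [680, 0.09, 58],      # P2P
]
labels = ["HTTP"] * 3 + ["MAIL"] * 3 + ["P2P"] * 3
feature_subsets = {"HTTP": [0, 1], "MAIL": [1, 2], "P2P": [0, 2]}  # hypothetical

models = train_oaa(samples, labels, feature_subsets)
print(classify([1250, 0.012, 5], models))
```

In a fuller version, each class would also carry its own subflow size, so the feature vector passed to a given binary model would be computed over that class's number of packets.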

Figure 1. Wrapper Initial Packets

Note that for some classes (e.g. HTTP) the accuracy rate drops as the subflow size is increased. This phenomenon has been observed in other studies [7]. It occurs because the attributes computed over the initial packets alone differentiate this class better than the same attributes diluted over more packets of the flow.

TABLE I. INITIAL PACKETS
Class   Number of Attributes   Subflow Size   Applications
CHAT    6                      7              MSN Messenger
HTTP    9                      5              Browsers
MAIL    6                      6              SMTP, IMAP, POP
P2P     12                     7              BitTorrent, Gnutella, eDonkey
SSL     5                      8              HTTPS (SSL/TLS)

Fig. 2 presents the results obtained from Wrapper while analyzing N packets taken from a random position of the original flow. Table II shows the optimum number of features and the optimum subflow size obtained for each class. The description of all the features selected for each class can be seen in Table XI (Appendix).

Figure 2. Wrapper Random Packets

TABLE II. RANDOM PACKETS
Class   Number of Attributes   Subflow Size   Applications
CHAT    5                      8              MSN Messenger
HTTP    10                     8              Browsers
MAIL    20                     8              SMTP, IMAP, POP
P2P     7                      8              BitTorrent, Gnutella, eDonkey
SSL     14                     8              HTTPS (SSL/TLS)

C. The Classification Procedure

Previous works that applied a statistical approach to Internet traffic classification tried to solve the following problem: given a fixed subflow size, how to select the smallest set of statistical features to classify the Internet traffic [6], [7], [8]. Our work also follows a statistical approach, but in a different way. First, we start from the hypothesis that the best set of discriminators and the minimum subflow size for classifying each traffic class do not match the best set of discriminators and subflow size for classifying all traffic classes simultaneously. We then try to obtain, for each traffic class, the set of features that best identifies that class against all others. Furthermore, for each class we look for the optimal number of packets per subflow.

Fig. 3 shows the architecture of the proposed classifier, which consists of three modules: preprocessing, training and class identification. The next paragraphs describe in detail the implementation of these modules in our work.

Figure 3. Architecture of proposed classifier

The preprocessing module starts with the capture of network packets in promiscuous mode. The capture is limited to the TCP/IP headers and does not consider any information from the packet payload. The captured packets are stored temporarily and then the flows are reconstructed. As we applied a flow-based approach in this work, this phase is very important. After that, the flow identification phase starts. In this phase, each flow must be labeled with an application class. This identification is carried out manually when the traces are not generated in a controlled way. After flow reconstruction and identification, the third phase consists of feature extraction. These features are used in the next module (training) to train the classifier.

The training of the classifier consists of determining the best combination of features and subflow size. That combination is found by running the Wrapper evaluator and taking the best accuracy ratio, as described in the previous section.

In the class identification module, the procedure consists of taking a new flow and classifying it according to the Naïve Bayes

classifier of each binary classification problem, using the discriminators and the subflow size associated with each class. For each subflow, we calculate the class membership probability. The subflow is then assigned to the class whose model gives the highest probability. This classification approach may be applied to initial subflows as well as to random subflows; in other words, it can be applied to a set of packets extracted from any position in the flow.

IV. DATA AND MEASUREMENTS

This section describes some basic definitions used in this research, as well as the data acquisition and labeling procedures.

A. Flow Definition

Our proposal is based on the analysis of traffic flows. A traffic flow consists of a stream of packets transmitted between a pair of hosts [12]. A flow can also be defined as a 5-tuple: IP addresses (source and destination), port numbers (source and destination) and a protocol (TCP or UDP). Only TCP traffic was analyzed in this research. TCP flows are initiated with a 3-way handshake and are considered finished if either of two conditions is met: FIN and/or RST flags are seen in the TCP header, or no packet is transmitted between the hosts during an interval of 60 seconds.

B. The Data Acquisition and Labeling Procedure

To verify the validity of our approach, we must run the proposed methodology on traces of network traffic. A number of steps must be performed in order to obtain all the necessary data: capturing raw packets, flow reconstruction and class labeling. The first step consists of capturing network packets using a network interface card in promiscuous mode. The captured packets are stored temporarily and then the flows are reconstructed. During the final step, each flow must be labeled with a flow class. The labeling of each flow was performed in a semi-automated manner using the payload inspection tool OpenDPI [13] and the Jpcap library [14].
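The flow definition of Section IV.A can be sketched as a small flow table. This is only an illustrative sketch, not the Jpcap-based tooling used in this work: the names `FlowTable`, `Flow` and `flow_key` are hypothetical, a flow is opened on the first SYN packet as a simplification of the 3-way handshake, and packets that arrive after a flow has been closed are simply ignored.

```python
from dataclasses import dataclass, field

IDLE_TIMEOUT = 60.0  # seconds without packets before a flow is considered finished

@dataclass
class Flow:
    packets: list = field(default_factory=list)
    last_seen: float = 0.0

def flow_key(src, dst, proto="TCP"):
    """Direction-independent 5-tuple: both directions map to the same flow."""
    return (min(src, dst), max(src, dst), proto)

class FlowTable:
    def __init__(self):
        self.active = {}     # flow key -> Flow still in progress
        self.finished = []   # flows closed by FIN/RST or by the idle timeout

    def add_packet(self, ts, src, dst, flags=()):
        """src and dst are (ip, port) tuples; flags is an iterable of TCP flag names."""
        flags = set(flags)
        key = flow_key(src, dst)
        flow = self.active.get(key)
        # The idle timeout is only checked when the next packet of the flow arrives.
        if flow is not None and ts - flow.last_seen > IDLE_TIMEOUT:
            self._finish(key)
            flow = None
        if flow is None:
            if "SYN" not in flags:
                return  # not the start of a TCP session: ignore (simplification)
            flow = self.active[key] = Flow()
        flow.packets.append((ts, src, dst, flags))
        flow.last_seen = ts
        if flags & {"FIN", "RST"}:
            self._finish(key)

    def _finish(self, key):
        self.finished.append(self.active.pop(key))

# Example session with made-up addresses.
client, server = ("10.0.0.1", 40000), ("192.0.2.10", 80)
table = FlowTable()
table.add_packet(0.00, client, server, {"SYN"})
table.add_packet(0.01, server, client, {"SYN", "ACK"})
table.add_packet(0.02, client, server, {"ACK"})
table.add_packet(0.50, client, server, {"ACK", "PSH"})
table.add_packet(0.60, server, client, {"FIN", "ACK"})
print(len(table.finished), len(table.finished[0].packets))
```

Sorting the two endpoints inside `flow_key` is one simple way to keep both directions of a bidirectional session in the same flow; a real implementation would also need to remember which endpoint is the client so that per-direction features can be computed.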
The proposed online classifier was trained and validated using 3 traffic datasets (referred to as T1, T2 and T3) collected from the network gateway of the University of Fortaleza from April 26 to 28. Table III summarizes the classes, applications and the total number of flows found within each dataset.

TABLE III. SUMMARY OF DATASETS
Class   Number of Flows (T1, T2, T3)   Applications
CHAT                                   MSN Messenger
HTTP                                   Browsers
MAIL                                   SMTP, IMAP, POP
P2P                                    BitTorrent, Gnutella, eDonkey
SSL                                    HTTPS (SSL/TLS)

Each dataset was collected during periods of 1 hour (morning, afternoon and night) and contains network traffic from the following classes (and corresponding applications): CHAT (MSN Messenger), HTTP (Browsers), MAIL (SMTP, IMAP and POP), P2P (BitTorrent, Gnutella and eDonkey) and SSL (SSL/TLS). Random flows (5,000 per class) were selected for the training phase of the proposed classifier.

C. Features and Subflows

The features used for flow classification in this research were calculated based only on information obtained from packet headers, such as packet size and TCP flags. Neither payload inspection nor port numbers are used when calculating this group of features. The features are calculated for each direction of a bidirectional flow (client to server and server to client).

The subflows used in this research consist of groups of N packets taken from complete flows. We selected subflows with N varying from 5 to 8 packets. For each original flow, two subflows are extracted: the first contains the initial N packets, while the second is formed by N packets taken from a random position of the original flow. The statistical features are extracted from the subflows and are used to classify the entire flow. When the number of packets in the flow is lower than N, the entire flow is used to extract the features and serves as both the initial and the random subflow. As the result of the final step, each subflow is represented as a vector of features. The evaluation procedure used cross-validation with 10 partitions.

V. RESULTS AND DISCUSSION

In short, our approach consists of applying the OAA classification strategy using different subflow sizes and different discriminators for each traffic class. To evaluate this strategy we compared its performance against the multiclass Naïve Bayes classifier (its traditional form) on the dataset described earlier.

A. Main Results

For a fair comparison, the multiclass Naïve Bayes classifier was trained using the same subflow sizes used in the OAA approach and the best set of features obtained with the Wrapper evaluator. The following metrics were used for performance measurement:

$$\text{precision} = \frac{\text{true positives}}{\text{true positives} + \text{false positives}}$$

$$\text{recall} = \frac{\text{true positives}}{\text{true positives} + \text{false negatives}}$$

Tables IV and V show the confusion matrices for the multiclass Naïve Bayes and OAA classifiers, respectively, when using subflows containing the initial packets.

TABLE IV. CONFUSION MATRIX, INITIAL PACKETS (BEST N = 8), MULTICLASS NAÏVE BAYES
CHAT   HTTP   MAIL   P2P   SSL

TABLE V. CONFUSION MATRIX, INITIAL PACKETS, OAA
CHAT   HTTP   MAIL   P2P   SSL

Tables VI and VII show the confusion matrices for the multiclass Naïve Bayes and OAA classifiers, respectively, when using subflows made up of random packets.

TABLE VI. CONFUSION MATRIX, RANDOM PACKETS (BEST N = 8), MULTICLASS NAÏVE BAYES
CHAT   HTTP   MAIL   P2P   SSL

TABLE VII. CONFUSION MATRIX, RANDOM PACKETS, OAA
CHAT   HTTP   MAIL   P2P   SSL

Table VIII presents the recall (flows, packets and bytes) for both the multiclass Naïve Bayes and OAA classifiers (subflows containing initial or random packets).

TABLE VIII. RESULTS (RECALL)
Classifier          Flows     Bytes     Packets
Multiclass Initial  94.24%    94.04%    94.13%
OAA Initial         98.45%    98.68%    98.58%
Multiclass Random   87.55%    88.14%    87.82%
OAA Random          94.86%    94.74%    94.73%

The precision values (flows, packets and bytes) calculated for the multiclass Naïve Bayes and OAA classifiers are presented in Table IX.

TABLE IX. RESULTS (PRECISION)
Classifier          Flows     Bytes     Packets
Multiclass Initial  94.3%     83.01%    92.98%
OAA Initial         98.47%    98.53%    98.99%
Multiclass Random   87.82%    73.53%    86.12%
OAA Random          95%       93.22%    95.5%

B. Discussion

The results of the OAA approach were consistently better than those obtained with the multiclass approach for all classes, using either initial or random subflows. Some additional insights can be extracted from these tables. First, observe that with random subflows the highest percentage of confusion is between the HTTP and SSL classes, whereas better results are obtained when using subflows of initial packets. This occurs because the initial packets carry information specific to these protocols.
In the remainder of the flow, from which the random subflows are extracted, this information is not present.

An advantage of using multiple subflow sizes and different features for each class in online classification is that, as soon as the subflow size of a class is reached, the feature extraction and the associated OAA classifier can be applied immediately. We do not need to wait for the packet collection to complete the longest subflow of the set. In our example, the subflow size ranges from 5 to 8 packets. Thus, while the last 3 packets are still being received, we can already evaluate the classifiers of all classes whose subflow size is less than 8.

VI. CONCLUSIONS

In this work we evaluated a new approach for online classification of Internet traffic. Its main characteristic is the use of the OAA approach based on the Naïve Bayes classifier, with the following optimizations: subflow size, and the number and specific set of features for each class. The performance of this approach was tested for subflows composed of the initial packets of each flow and also for subflows extracted from random positions of the flows. The classification outcomes, measured by accuracy, precision and recall, were also compared against those produced by the multiclass Naïve Bayes classifier.

The results show the superior performance of the proposed approach and also yield some interesting insights. The OAA Naïve Bayes approach using subflows of initial packets is consistently better than the others. This is a consequence of the fact that relevant information about the protocols is present in the initial packets but is not accessible in the random subflows. The cost of this advantage is the requirement to acquire the initial packets of each flow. As a consequence, we can speculate that, under fair conditions, algorithms based on initial subflows will always outperform those based on random subflows.

REFERENCES

[1] Laurent Bernaille, Renata Teixeira, Ismael Akodkenou, Augustin Soule, Kave Salamatian, "Traffic classification on the fly," ACM SIGCOMM Computer Communication Review, v.36 n.2, April 2006.
[2] A. Este, F. Gringoli, L. Salgarelli, "Support Vector Machines for TCP traffic classification," Elsevier Computer Networks, 53(14).
[3] J. Erman, M. Arlitt, A. Mahanti, "Traffic Classification Using Clustering Algorithms," SIGCOMM '06 Workshops, September 11-15, 2006.
[4] J. Zhang, Z. Qian, G. Shou, Y. Hu, "An automated online traffic flow classification scheme," Fifth International Conference on Intelligent Information Hiding and Multimedia Signal Processing, Kyoto, Japan, September.
[5] Yu Wang and Shun-Zheng Yu, "Supervised Learning Real-time Traffic Classifiers," Journal of Networks, Vol. 4, No. 7, September.
[6] Wei Li, Kaysar Abdin, Robert Dann, and Andrew Moore, "Approaching real-time network traffic classification," Technical Report RR-06-12, Department of Computer Science, Queen Mary, University of London, December.
[7] T. Nguyen and G. Armitage, "Training on multiple sub-flows to optimise the use of machine learning classifiers in real-world IP networks," in Proc. IEEE 31st Conference on Local Computer Networks, Tampa, Florida, USA, November.
[8] T.T.T. Nguyen, G. Armitage, "Clustering to Assist Supervised Machine Learning for Real-Time IP Traffic Classification," IEEE International Conference on Communications (ICC 2008), Beijing, China, 19-23 May.
[9] A. Patcha and J. Park, "An overview of anomaly detection techniques: Existing solutions and latest technological trends," Comput. Netw. 51, 12 (Aug. 2007).
[10] M. A. Hall, "Correlation-based feature selection for machine learning," Ph.D. thesis, Department of Computer Science, University of Waikato, Hamilton, New Zealand.
[11] WEKA 3.6 (as of September 2010).
[12] A. W. Moore, D. Zuev, M. Crogan, "Discriminators for use in flow-based classification," in Passive & Measurement Workshop 2003 (PAM2005), August.
[13] OpenDPI, Ipoque's DPI software's Open Source Version (as of September 2010).
[14] Jpcap: Java library for capturing and sending network packets (as of September 2010).

TABLE X. OAA MODELS INITIAL PACKETS
Maximum Inter-Packet (Server to Client) Windows Maximum Window Size Mean of Segment Bytes Variance of Control Bytes Variance of Inter-Packet Third Quartile of Ethernet Bytes Maximum of Bytes in Ethernet Packet (Client to Variance of Segment Bytes Window Minimum Inter-Packet (Server to Client) Average Window Size Mean of Inter-Packet Length Variation Average Window Size (Client to Window Maximum of Segment Bytes Average Window Size Maximum Inter-Packet Arrival Interval Variance of Inter-Packet Total of Packets (Client to Maximum of Inter-Packet Length Variation (Client to Variance of Inter-Packet (Client to Third Quartile of Bytes in Ethernet Packet (Client to Average Window Size (Client to Mean of Segment Bytes (Server First Quartile of Bytes in IP Packet Average Window Size (Server Total of Zero Windows (Server Median of Bytes in IP Packet Total of Packets (Client to Mean of Segment Bytes Window (Client to Window (Server to Client)

TABLE XI. OAA MODELS RANDOM PACKETS
Minimum of Bytes in Minimum of Bytes in Minimum of Bytes in Ethernet Ethernet Packet (Client to Ethernet Packet Packet Window Variance of Segment Bytes Maximum Window Size Window Total of Actual Packets Mean of Inter-Packet (Client to First Quartile of Inter-Packet Length Variation Maximum Window Size Total of Actual Packets Average Window Size Median of Inter-Packet Arrival Interval First Quartile of Bytes in Ethernet Packet First Quartile of Inter-Packet Minimum of Bytes in IP Packet Maximum of Bytes in Ethernet Packet Variance of Control Bytes (Client to Standard Deviation of Inter-Packet (Client to Median of Inter-Packet Arrival Interval (Client to Maximum Window Size (Client to Total of TCP packets with SYN Flag Window Minimum of Inter-Packet Length Variation Maximum of Bytes in Ethernet Packet Median of Inter-Packet Arrival Interval (Server Maximum Window Size (Server Window Mean of Control Bytes Total of Zero Windows Window Window Window Median of Inter-Packet First Quartile of Inter-Packet Minimum of Inter-Packet Length Variation (Client to Variance of Segment Bytes Average Window Size (Client to Window Window Total of Actual Packets Variance of Segment Bytes Window
