Establishing Virtual Private Network Bandwidth Requirement at the University of Wisconsin Foundation


by Joe Madden

In conjunction with ECE 539: Introduction to Artificial Neural Networks and Fuzzy Systems, Fall 2010

Abstract

The focus of this study is to determine the optimal allotment of bandwidth for users at the University of Wisconsin Foundation who connect to the physical network via a Virtual Private Network (VPN) connection. Using artificial neural network pattern classification techniques, the UW Foundation can ensure quality of service across all of its connection types, which is critical to business success. The objective is to produce an accurate estimate of usage from a classification system that examines incoming and outgoing packets. Maximum likelihood estimation, k-nearest neighbor methods, and multi-layer perceptron modeling are used to determine the optimal setting.

Table of Contents

Background
Objective
Methodology
Analysis
    Maximum Likelihood Estimation (Gaussian)
    k-Nearest Neighbor Classification
    Back-Propagation Multi-Layer Perceptron
Discussion
References

Background

The University of Wisconsin Foundation is the organization responsible for acquiring funding for academics at the University of Wisconsin-Madison. Many employees at the UW Foundation travel to locations around the world to meet with alumni and corporate sponsors to build involvement with the university. The Information Technology department at the UW Foundation has provided its employees the ability to connect to its network while away from the office via a Cisco Virtual Private Network (VPN) client. Each time a person connects, a log captures the employee who made the connection, the length of the connection, and the amount of data transferred over the network.

Objective

Since December 2008, the University of Wisconsin Foundation has been capturing this VPN log information. Roughly 2 VPN connections have been made, and concepts learned in Introduction to Artificial Neural Networks and Fuzzy Systems will be used to determine the minimum bandwidth needed to operate the VPN client for years to come. Allocating the minimum amount of resources frees attention for other critical areas of the organization while ensuring Quality of Service for the VPN connections.

Methodology

The study uses a data set of incoming and outgoing packet counts, which make up the [2x1] feature vector. The class vector is the [3x1] form used in class: each session is labeled [1 0 0]', [0 1 0]', or [0 0 1]', based on a rule that examines strictly the length of the session. Session length is used as the classifying parameter because of the following regression output:

Response: Session Length

Predictor          Coef       SE Coef    T
Constant           3831.4     309.1      12.39
Incoming Packets   0.97936    0.03787    25.86
Outgoing Packets   -0.56696   0.02652    -21.38

As one can see, both incoming and outgoing packet counts are statistically significant predictors of session length, so session length alone is used to define the class vector. Class labels were then assigned so that the three classes are evenly populated: the sessions were sorted from shortest to longest and each third was placed in its own bin. For the purposes of the three-way cross-validation study, the original data set is used, and two additional data sets were created by changing the sorting scheme to examine the incoming and outgoing packets, respectively.
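For concreteness, the binning rule can be written in a few lines of MATLAB. This is a hedged sketch, not the actual analysis code; the variable names (t for session length, y for labels, T for the one-hot class vectors) are illustrative assumptions.

    % t: N-by-1 vector of session lengths
    [~, order] = sort(t);                          % shortest to longest
    N = numel(t);
    y = zeros(N, 1);
    y(order(1:floor(N/3)))              = 1;       % short sessions
    y(order(floor(N/3)+1:floor(2*N/3))) = 2;       % medium sessions
    y(order(floor(2*N/3)+1:end))        = 3;       % long sessions
    % One-hot [3x1] class vectors: class 2 becomes [0 1 0]'
    T = zeros(3, N);
    T(sub2ind([3 N], y', 1:N)) = 1;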

After a training set has been created, the following tests are performed. Pattern classification (maximum likelihood estimation) and clustering (k-means and self-organizing map) tests will be run; then a testing vector will be examined to determine a best-case estimate. The final examination uses a back-propagation multi-layer perceptron model tested at a variety of the parameter settings seen throughout the course. Each of the tests is analyzed and compared to traditional statistical analysis. The objective of the tests is to beat a conservative estimate made by a 9% confidence interval of the historical data.

Analysis

Pattern Classification (Maximum Likelihood with Gaussian Likelihood Function)

The first section of analysis of the VPN logs used a method for placing an optimal class label on the testing vector. mldemo_uni.m was the MATLAB file responsible for the analysis, and it yielded fairly poor results. The confusion matrix and classification rate are as follows:

Confusion Matrix
 46    86     7
196    24   166
  2     1   411

Classification Rate: 61.63%

By inspection, the first and third classes were much easier to classify than the second. The following is a plot of the results:

[Figure: Incoming vs. Outgoing Packets scatter plot of the data]

Based on the chart, it is clear that the outgoing and incoming packets roughly follow a linear trend. The data appear to follow a log-normal distribution, however, while the classifier assumes a Gaussian likelihood, and this mismatch is likely the reason for such a poor classification rate.

[Figure: Incoming vs. Outgoing Packets with a classified point attempt, zoomed in near the origin]

As one can see, the classification image reflects the poor overall rate. It is interesting to note from this image that packets near the origin are not easily picked up by the algorithm. The following is a snapshot of the whole data set:

[Figure: Incoming vs. Outgoing Packets with a classified point attempt, over the full data range]
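For reference, a minimal sketch of this type of classifier, class-conditional Gaussians fit by maximum likelihood with equal priors, is given below. It is not the actual mldemo_uni.m code, and all variable names are illustrative.

    function [pred, C] = gaussian_mle_classify(Xtr, ytr, Xte, yte)
    % Fit one Gaussian per class on the training data, then assign each
    % test point to the class with the highest log-likelihood.
    K = max(ytr);
    mu = cell(K, 1);  S = cell(K, 1);
    for k = 1:K
        Xk = Xtr(ytr == k, :);
        mu{k} = mean(Xk, 1);
        S{k}  = cov(Xk);
    end
    pred = zeros(size(Xte, 1), 1);
    for i = 1:size(Xte, 1)
        ll = zeros(K, 1);
        for k = 1:K
            d = Xte(i, :) - mu{k};
            ll(k) = -0.5*log(det(S{k})) - 0.5*(d / S{k})*d';
        end
        [~, pred(i)] = max(ll);                 % most likely class
    end
    C = accumarray([yte pred], 1, [K K]);       % rows: true, columns: predicted
    end
    % Classification rate: sum(diag(C)) / sum(C(:))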

In addition to the original data set, the other two sets (sorted by incoming and outgoing packets, respectively) were also run through the same test. Their classification rates are:

Packets Sorted   Classification Rate
Incoming         4.46%
Outgoing         4.19%

k-Nearest Neighbor Classifier

The k-nearest neighbor classifier is an interesting pattern classification algorithm because its classification rate depends on the number of nearest neighbors selected in conjunction with the data provided. Using the same data as the maximum likelihood estimation, the following was seen for k = {1, 2, 3, 4, 5}:

[Figure: Classification error rate vs. k]

A local minimum occurs at k=7, but it may not be the global minimum. One may notice, however, that the classification rate at k=1 is an improvement over the rate found with the MLE analysis. In search of the global minimum, the next analysis focused on k = {1, 2, ..., 20}; the sweep itself is sketched below.
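A hedged sketch of that sweep (Euclidean distance, majority vote; variable names are illustrative, not the original course code):

    ks = 1:20;
    errRate = zeros(size(ks));
    for j = 1:numel(ks)
        pred = zeros(size(Xte, 1), 1);
        for i = 1:size(Xte, 1)
            d2 = sum((Xtr - Xte(i, :)).^2, 2);    % squared distances to all training points
            [~, idx] = sort(d2);
            pred(i) = mode(ytr(idx(1:ks(j))));    % majority vote among the k nearest
        end
        errRate(j) = 100 * mean(pred ~= yte);     % percent classification error
    end
    plot(ks, errRate), xlabel('k'), ylabel('% classification error')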

[Figure: Classification error rate vs. k, for k = 1 to 20]

As one can see, the k-nearest neighbor classifier performs about as poorly as the Gaussian MLE approximation. The following analysis uses a three-way cross-validation study. The first component of this section analyzes the data from the first and second training and testing sets:

[Figure: Classification error rate vs. k (Set 1 & Set 2)]

The next plot examines the first and third data sets:

[Figure: Classification error rate vs. k (Set 1 & Set 3)]

Finally, sets two and three are analyzed:

[Figure: Classification error rate vs. k (Set 2 & Set 3)]

The vast difference from all of the previous tests is easy to notice: the classification rate is greater than 90% when looking at sets two and three. Unfortunately, the plot does not show the classification at k=1, but the rate there is 93.4%.
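The pairing itself can be written compactly. The sketch below assumes the three sorted data sets are stored as structs with fields X and y, and that knn_sweep wraps the k-NN loop sketched earlier; both names are illustrative assumptions.

    sets  = {set1, set2, set3};     % sorted by session length, incoming, outgoing
    pairs = [1 2; 1 3; 2 3];        % train on the first of each pair, test on the second
    for p = 1:size(pairs, 1)
        tr = sets{pairs(p, 1)};
        te = sets{pairs(p, 2)};
        err = knn_sweep(tr.X, tr.y, te.X, te.y, 1:20);
        fprintf('Set %d vs Set %d: best error %.2f%%\n', pairs(p, 1), pairs(p, 2), min(err));
    end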

The final set of analyses using the k-nearest neighbor classifier normalizes each data set to a mean value of 0 and a standard deviation of 1. The three-way cross-validation is then repeated (the normalization itself is sketched after the plots):

[Figure: Classification error rate vs. k, Set 1 & Set 2 (standardized data)]

[Figure: Classification error rate vs. k, Set 1 & Set 3 (standardized data)]

[Figure: Classification error rate vs. k, Set 2 & Set 3 (standardized data)]
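The normalization is an ordinary z-score. A two-line sketch, assuming the statistics are taken from the training set and reused on the test set (per-set normalization, as described above, would compute them separately):

    mu = mean(Xtr, 1);  sd = std(Xtr, 0, 1);
    Xtr_z = (Xtr - mu) ./ sd;     % mean 0, standard deviation 1 per feature
    Xte_z = (Xte - mu) ./ sd;     % reuse the training statistics on the test set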

It is interesting to see the improvement across the 1v2 and 1v3 tests, while the 2v3 test yielded the poorest improvement. Also, the strangest classification rate was at k=1 neighbor under the last test. Further discussion of these results appears in the final section.

Back-Propagation Multi-Layer Perceptron Model

The first step in any multi-layer perceptron model is determining proper parameters. From previous work with multi-layer perceptrons, it is clear that a small learning rate (alpha), a momentum constant of ½, and a large number of iterations are needed to find the best possible path. Once again, the three data sets are examined, with the length-of-session set first. With 1 hidden layer of 2 neurons, it yielded the following results:

Confusion matrix: 98 248 748 464
Classification Rate: 79.38%

The same data was tested using a 1-layer model with 2 neurons in the hidden layer.

Confusion matrix: 916 312 691 21
Classification Rate: 79.%

Next, the second set of data (incoming packets) was compared in the same fashion (a 1-layer model with 1 neuron in the hidden layer and a 1-layer model with 2 neurons in the hidden layer):

Confusion matrix: 1171 39 62 63
Classification Rate: 9.33%

Confusion matrix: 121 731 484
Classification Rate: 93.13%

Lastly, we examine the third set of data using the same parameters:

Confusion matrix: 1133 77 624 91
Classification Rate: 94.78%

Confusion matrix: 168 142 61 6
Classification Rate: 91.97%
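The training loop behind these runs can be sketched as follows. This is a hedged, minimal back-propagation implementation using the parameters named above (small alpha, momentum ½, many iterations); the sizes and variable names are illustrative, not the original course code.

    % X: N-by-2 features; T: 3-by-N one-hot targets (as built earlier)
    nh = 2; alpha = 0.01; mom = 0.5; epochs = 10000;
    [N, dIn] = size(X);  K = size(T, 1);
    W1 = 0.1*randn(nh, dIn+1);  W2 = 0.1*randn(K, nh+1);   % weights incl. bias
    dW1 = zeros(size(W1));      dW2 = zeros(size(W2));
    sig = @(a) 1 ./ (1 + exp(-a));
    Xb = [X, ones(N, 1)]';                   % (dIn+1)-by-N inputs with bias row
    for e = 1:epochs
        H  = sig(W1 * Xb);                   % hidden-layer activations
        Hb = [H; ones(1, N)];
        Y  = sig(W2 * Hb);                   % network outputs
        dY = (T - Y) .* Y .* (1 - Y);        % output delta (sigmoid derivative)
        dH = (W2(:, 1:nh)' * dY) .* H .* (1 - H);   % back-propagated hidden delta
        dW2 = alpha * (dY * Hb') / N + mom * dW2;   % gradient step with momentum
        dW1 = alpha * (dH * Xb') / N + mom * dW1;
        W2 = W2 + dW2;  W1 = W1 + dW1;
    end
    [~, pred] = max(Y, [], 1);               % predicted class for each session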

By brief examination, it is clear that the network architecture containing 2 neurons in the hidden layer is the preferable configuration. Also, the second and third data sets are once again more useful for classifying the network traffic than session time alone.

Discussion

Two key observations can be made about the maximum likelihood estimation using a Gaussian negative log-likelihood function: the data from feature one and feature two are not linearly separable, and it is possible that the data do not follow a normal distribution. In fact, when the session length is plotted as a histogram, we have the following:

[Figure: Histogram of session length]

Statistics gathered by the software package Arena declared the distribution to most likely be log-normal. Unfortunately, due to time constraints, a log transform could not be incorporated into this analysis, but the other techniques fortunately provided a better picture.
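For future work, that log transform would be a one-line change under the log-normal assumption; log1p (i.e., log(1+x)) is assumed here so that sessions with zero packets stay finite:

    Xlog = log1p(X);    % classify in log space, then rerun the Gaussian MLE classifier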

The k-nearest neighbor (k-NN) classifier yielded more interesting results. The initial study, examining the original data set, did not perform very well; in fact, it had nearly the same result as the MLE classifier. While the first plot revealed a local minimum, which led to a study of k = 1 to 20, the results were not the best for determining a likely estimate of incoming and outgoing packets. Three-way cross-validation yielded more usable results than the previously described sections. When examining the data sets at a higher level, it makes sense that the second and third data sets would react well to each other, while the first data set seems to be inseparable for classification purposes. "The main drawback of the voting k-NN rule is that it implicitly assumes the k nearest neighbors of a data point x to be contained in a region of relatively small volume, so that sufficiently good resolution in the estimates of the different conditional densities can be obtained" (Denoeux). Denoeux's point was clearly seen in effect without the three-way cross-validation techniques, because using the packets in conjunction with the length of session yielded an improvement over the separate tests; it seems as if introducing a different type of data structure with a similar classifying pattern helps the classifier. The final development from the k-NN tests was the lack of influence of session length. Even though the statistical evidence for it is overwhelming, k-NN brought a different perspective to this issue.

Lastly, the multi-layer perceptron brought a much different approach to the classification problem at hand. Through two different structures, the neural networks confirm that the incoming and outgoing packets are difficult to classify from their session time alone. "One major characteristic of back-propagation classifiers is long training times. Training times are typically longer when complex decision regions are required and when networks have more hidden layers" (Lippmann). While the cost of introducing this specific multi-layer perceptron technique may be a bit high, it was substantially more efficient than the k-NN testing, and it led to better results in general.

As one can see throughout the analysis, a common theme has been the discrepancy between the regression model and the artificial neural network models. One possible explanation is that a user's length of session was found to be significantly related under a multiple linear regression technique because packets generally do increase as session time increases, whereas the specific nature of artificial neural networks has pinned down the precise classification of certain packets. Therefore, the overall recommendation of this analysis is to continue the use of artificial neural networks for classifying Virtual Private Network connections. While the classification process demonstrated here may not be developed enough to immediately introduce rules within the Information Technology department, it is a useful starting point for ensuring quality of service is maintained across on-site and off-site connections. In conclusion, the length of sessions among VPN users should not be the decisive factor in determining the extent of network traffic at the University of Wisconsin Foundation.

References

Denoeux, T. "A k-nearest neighbor classification rule based on Dempster-Shafer theory." In Classic Works of the Dempster-Shafer Theory of Belief Functions, Studies in Fuzziness and Soft Computing. Springer, 2008.

Lippmann, R. P. "Pattern Classification Using Neural Networks." IEEE Communications Magazine, 1989.