Establishing Virtual Private Network Bandwidth Requirement at the University of Wisconsin Foundation


by Joe Madden

In conjunction with ECE 539: Introduction to Artificial Neural Networks and Fuzzy Systems, Fall 2010

Abstract

The focus of this study is to determine the optimal allotment of bandwidth for users at the University of Wisconsin Foundation who connect to the physical network via a Virtual Private Network (VPN) connection. Using artificial neural network pattern classification techniques, the UW Foundation can ensure quality of service across all of its connection types, which is critical to business success. The objective is to produce an accurate estimate of usage from a classification system that examines incoming and outgoing packets. Maximum likelihood estimation, k-nearest neighbor methods, and multi-layer perceptron modeling are used to determine the optimal setting.

Table of Contents

Background
Objective
Methodology
Analysis
    Maximum Likelihood Estimation (Gaussian)
    k-Nearest Neighbor Classification
    Back-Propagation Multi-Layer Perceptron
Discussion
References

Background

The University of Wisconsin Foundation is the organization responsible for acquiring funding for academics at the University of Wisconsin-Madison. Many employees at the UW Foundation travel to locations around the world to meet with alumni and corporate sponsors to build involvement with the university. The Information Technology department at the UW Foundation has provided its employees the ability to connect to its network while away from the office via a Cisco Virtual Private Network (VPN) client. Each time a person connects, a log captures the employee who made the connection, the length of the connection, and the amount of data transferred over the network.

Objective

Since December 2008, the University of Wisconsin Foundation has been capturing this VPN log information. Roughly 2 VPN connections have been made, and concepts learned in Introduction to Artificial Neural Networks and Fuzzy Systems will be used to determine the minimum bandwidth needed to operate the VPN client for years to come. Allocating the minimum amount of resources frees attention for other critical areas of the organization while ensuring Quality of Service for the VPN connections.

Methodology

The study uses a data set of incoming and outgoing packet counts, which make up the [2x1] feature vector. The class vector is the [3x1] form used in class: each session is labeled [1 0 0]', [0 1 0]', or [0 0 1]', based on a rule that examines strictly the length of the session. Session length is used as the classifying parameter because of the following regression output:

Response: Session Length

Predictor          Coef       SE Coef    T
Constant           3831.4     309.1      12.39
Incoming Packets   0.97936    0.03787    25.86
Outgoing Packets   -0.56696   0.02652    -21.38

As one can see, both incoming and outgoing packet counts are statistically significant predictors of session length, so session length alone is used to define the class vector. Class labels were then assigned so that the three classes are evenly populated: the sessions were sorted from shortest to longest and each third was placed in its own bin. For the purposes of the three-way cross-validation study, the original data set is used, and two additional data sets were created by changing the sorting scheme to examine the incoming and outgoing packets, respectively.
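For concreteness, the binning rule can be written in a few lines of MATLAB. This is a hedged sketch, not the actual analysis code; the variable names (t for session length, y for labels, T for the one-hot class vectors) are illustrative assumptions.

    % t: N-by-1 vector of session lengths
    [~, order] = sort(t);                          % shortest to longest
    N = numel(t);
    y = zeros(N, 1);
    y(order(1:floor(N/3)))              = 1;       % short sessions
    y(order(floor(N/3)+1:floor(2*N/3))) = 2;       % medium sessions
    y(order(floor(2*N/3)+1:end))        = 3;       % long sessions
    % One-hot [3x1] class vectors: class 2 becomes [0 1 0]'
    T = zeros(3, N);
    T(sub2ind([3 N], y', 1:N)) = 1;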

After a training set has been created, the following tests are performed. Pattern classification (maximum likelihood estimation) and clustering (k-means and self-organizing map) tests will be run; then a testing vector will be examined to determine a best-case estimate. The final examination uses a back-propagation multi-layer perceptron model tested at a variety of the parameter settings seen throughout the course. Each of the tests is analyzed and compared to traditional statistical analysis. The objective of the tests is to beat a conservative estimate made by a 9% confidence interval of the historical data.

Analysis

Pattern Classification (Maximum Likelihood with Gaussian Likelihood Function)

The first section of analysis of the VPN logs used a method for placing an optimal class label on the testing vector. mldemo_uni.m was the MATLAB file responsible for the analysis, and it yielded fairly poor results. The confusion matrix and classification rate are as follows:

Confusion Matrix
 46    86     7
196    24   166
  2     1   411

Classification Rate: 61.63%

By inspection, the first and third classes were much easier to classify than the second. The following is a plot of the results:

[Figure: Incoming vs. Outgoing Packets scatter plot of the data]

Based on the chart, it is clear that the outgoing and incoming packets roughly follow a linear trend. The data appear to follow a log-normal distribution, however, while the classifier assumes a Gaussian likelihood, and this mismatch is likely the reason for such a poor classification rate.

[Figure: Incoming vs. Outgoing Packets with a classified point attempt, zoomed in near the origin]

As one can see, the classification image reflects the poor overall rate. It is interesting to note from this image that packets near the origin are not easily picked up by the algorithm. The following is a snapshot of the whole data set:

[Figure: Incoming vs. Outgoing Packets with a classified point attempt, over the full data range]
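For reference, a minimal sketch of this type of classifier, class-conditional Gaussians fit by maximum likelihood with equal priors, is given below. It is not the actual mldemo_uni.m code, and all variable names are illustrative.

    function [pred, C] = gaussian_mle_classify(Xtr, ytr, Xte, yte)
    % Fit one Gaussian per class on the training data, then assign each
    % test point to the class with the highest log-likelihood.
    K = max(ytr);
    mu = cell(K, 1);  S = cell(K, 1);
    for k = 1:K
        Xk = Xtr(ytr == k, :);
        mu{k} = mean(Xk, 1);
        S{k}  = cov(Xk);
    end
    pred = zeros(size(Xte, 1), 1);
    for i = 1:size(Xte, 1)
        ll = zeros(K, 1);
        for k = 1:K
            d = Xte(i, :) - mu{k};
            ll(k) = -0.5*log(det(S{k})) - 0.5*(d / S{k})*d';
        end
        [~, pred(i)] = max(ll);                 % most likely class
    end
    C = accumarray([yte pred], 1, [K K]);       % rows: true, columns: predicted
    end
    % Classification rate: sum(diag(C)) / sum(C(:))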

In addition to the original data set, the other two sets (sorted by incoming and outgoing packets, respectively) were also run through the same test. Their classification rates are:

Packets Sorted   Classification Rate
Incoming         4.46%
Outgoing         4.19%

k-Nearest Neighbor Classifier

The k-nearest neighbor classifier is an interesting pattern classification algorithm because its classification rate depends on the number of nearest neighbors selected in conjunction with the data provided. Using the same data as the maximum likelihood estimation, the following was seen for k = {1, 2, 3, 4, 5}:

[Figure: Classification error rate vs. k]

A local minimum occurs at k=7, but it may not be the global minimum. One may notice, however, that the classification rate at k=1 is an improvement over the rate found with the MLE analysis. In search of the global minimum, the next analysis focused on k = {1, 2, ..., 20}; the sweep itself is sketched below.
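A hedged sketch of that sweep (Euclidean distance, majority vote; variable names are illustrative, not the original course code):

    ks = 1:20;
    errRate = zeros(size(ks));
    for j = 1:numel(ks)
        pred = zeros(size(Xte, 1), 1);
        for i = 1:size(Xte, 1)
            d2 = sum((Xtr - Xte(i, :)).^2, 2);    % squared distances to all training points
            [~, idx] = sort(d2);
            pred(i) = mode(ytr(idx(1:ks(j))));    % majority vote among the k nearest
        end
        errRate(j) = 100 * mean(pred ~= yte);     % percent classification error
    end
    plot(ks, errRate), xlabel('k'), ylabel('% classification error')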

[Figure: Classification error rate vs. k, for k = 1 to 20]

As one can see, the k-nearest neighbor classifier performs about as poorly as the Gaussian MLE approximation. The following analysis uses a three-way cross-validation study. The first component of this section analyzes the data from the first and second training and testing sets:

[Figure: Classification error rate vs. k (Set 1 & Set 2)]

The next plot examines the first and third data sets:

[Figure: Classification error rate vs. k (Set 1 & Set 3)]

Finally, sets two and three are analyzed:

[Figure: Classification error rate vs. k (Set 2 & Set 3)]

The vast difference from all of the previous tests is easy to notice: the classification rate is greater than 90% when looking at sets two and three. Unfortunately, the plot does not show the classification at k=1, but the rate there is 93.4%.
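The pairing itself can be written compactly. The sketch below assumes the three sorted data sets are stored as structs with fields X and y, and that knn_sweep wraps the k-NN loop sketched earlier; both names are illustrative assumptions.

    sets  = {set1, set2, set3};     % sorted by session length, incoming, outgoing
    pairs = [1 2; 1 3; 2 3];        % train on the first of each pair, test on the second
    for p = 1:size(pairs, 1)
        tr = sets{pairs(p, 1)};
        te = sets{pairs(p, 2)};
        err = knn_sweep(tr.X, tr.y, te.X, te.y, 1:20);
        fprintf('Set %d vs Set %d: best error %.2f%%\n', pairs(p, 1), pairs(p, 2), min(err));
    end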

The final set of analyses using the k-nearest neighbor classifier normalizes each data set to a mean value of 0 and a standard deviation of 1. The three-way cross-validation is then repeated (the normalization itself is sketched after the plots):

[Figure: Classification error rate vs. k, Set 1 & Set 2 (standardized data)]

[Figure: Classification error rate vs. k, Set 1 & Set 3 (standardized data)]

[Figure: Classification error rate vs. k, Set 2 & Set 3 (standardized data)]
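The normalization is an ordinary z-score. A two-line sketch, assuming the statistics are taken from the training set and reused on the test set (per-set normalization, as described above, would compute them separately):

    mu = mean(Xtr, 1);  sd = std(Xtr, 0, 1);
    Xtr_z = (Xtr - mu) ./ sd;     % mean 0, standard deviation 1 per feature
    Xte_z = (Xte - mu) ./ sd;     % reuse the training statistics on the test set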

It is interesting to see the improvement across the 1v2 and 1v3 tests, while the 2v3 test yielded the poorest improvement. Also, the strangest classification rate was at k=1 neighbor under the last test. Further discussion of these results appears in the final section.

Back-Propagation Multi-Layer Perceptron Model

The first step in any multi-layer perceptron model is determining proper parameters. From previous work with multi-layer perceptrons, it is clear that a small learning rate (alpha), a momentum constant of ½, and a large number of iterations are needed to find the best possible path. Once again, the three data sets are examined, with the length-of-session set first. With 1 hidden layer of 2 neurons, it yielded the following results:

Confusion matrix: 98 248 748 464
Classification Rate: 79.38%

The same data was tested using a 1-layer model with 2 neurons in the hidden layer.

Confusion matrix: 916 312 691 21
Classification Rate: 79.%

Next, the second set of data (incoming packets) was compared in the same fashion (a 1-layer model with 1 neuron in the hidden layer and a 1-layer model with 2 neurons in the hidden layer):

Confusion matrix: 1171 39 62 63
Classification Rate: 9.33%

Confusion matrix: 121 731 484
Classification Rate: 93.13%

Lastly, we examine the third set of data using the same parameters:

Confusion matrix: 1133 77 624 91
Classification Rate: 94.78%

Confusion matrix: 168 142 61 6
Classification Rate: 91.97%
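The training loop behind these runs can be sketched as follows. This is a hedged, minimal back-propagation implementation using the parameters named above (small alpha, momentum ½, many iterations); the sizes and variable names are illustrative, not the original course code.

    % X: N-by-2 features; T: 3-by-N one-hot targets (as built earlier)
    nh = 2; alpha = 0.01; mom = 0.5; epochs = 10000;
    [N, dIn] = size(X);  K = size(T, 1);
    W1 = 0.1*randn(nh, dIn+1);  W2 = 0.1*randn(K, nh+1);   % weights incl. bias
    dW1 = zeros(size(W1));      dW2 = zeros(size(W2));
    sig = @(a) 1 ./ (1 + exp(-a));
    Xb = [X, ones(N, 1)]';                   % (dIn+1)-by-N inputs with bias row
    for e = 1:epochs
        H  = sig(W1 * Xb);                   % hidden-layer activations
        Hb = [H; ones(1, N)];
        Y  = sig(W2 * Hb);                   % network outputs
        dY = (T - Y) .* Y .* (1 - Y);        % output delta (sigmoid derivative)
        dH = (W2(:, 1:nh)' * dY) .* H .* (1 - H);   % back-propagated hidden delta
        dW2 = alpha * (dY * Hb') / N + mom * dW2;   % gradient step with momentum
        dW1 = alpha * (dH * Xb') / N + mom * dW1;
        W2 = W2 + dW2;  W1 = W1 + dW1;
    end
    [~, pred] = max(Y, [], 1);               % predicted class for each session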

By brief examination, it is clear that the network architecture containing 2 neurons in the hidden layer is the preferable configuration. Also, the second and third data sets are once again more useful for classifying the network traffic than session time alone.

Discussion

Two key observations can be made about the maximum likelihood estimation using a Gaussian negative log-likelihood function: the data from feature one and feature two are not linearly separable, and it is possible that the data do not follow a normal distribution. In fact, when the session length is plotted as a histogram, we have the following:

[Figure: Histogram of session length]

Statistics gathered by the software package Arena declared the distribution to most likely be log-normal. Unfortunately, due to time constraints, a log transform could not be incorporated into this analysis, but the other techniques fortunately provided a better picture.
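For future work, that log transform would be a one-line change under the log-normal assumption; log1p (i.e., log(1+x)) is assumed here so that sessions with zero packets stay finite:

    Xlog = log1p(X);    % classify in log space, then rerun the Gaussian MLE classifier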

The k-nearest neighbor (k-NN) classifier yielded more interesting results. The initial study, examining the original data set, did not perform very well; in fact, it had nearly the same result as the MLE classifier. While the first plot revealed a local minimum, which led to a study of k = 1 to 20, the results were not the best for determining a likely estimate of incoming and outgoing packets. Three-way cross-validation yielded more usable results than the previously described sections. When examining the data sets at a higher level, it makes sense that the second and third data sets would react well to each other, while the first data set seems to be inseparable for classification purposes. "The main drawback of the voting k-NN rule is that it implicitly assumes the k nearest neighbors of a data point x to be contained in a region of relatively small volume, so that sufficiently good resolution in the estimates of the different conditional densities can be obtained" (Denoeux). Denoeux's point was clearly seen in effect without the three-way cross-validation techniques, because using the packets in conjunction with the length of session yielded an improvement over the separate tests; it seems as if introducing a different type of data structure with a similar classifying pattern helps the classifier. The final development from the k-NN tests was the lack of influence of session length. Even though the statistical evidence for it is overwhelming, k-NN brought a different perspective to this issue.

Lastly, the multi-layer perceptron brought a much different approach to the classification problem at hand. Through two different structures, the neural networks confirm that the incoming and outgoing packets are difficult to classify from their session time alone. "One major characteristic of back-propagation classifiers is long training times. Training times are typically longer when complex decision regions are required and when networks have more hidden layers" (Lippmann). While the cost of introducing this specific multi-layer perceptron technique may be a bit high, it was substantially more efficient than the k-NN testing, and it led to better results in general.

As one can see throughout the analysis, a common theme has been the discrepancy between the regression model and the artificial neural network models. One possible explanation is that a user's length of session was found to be significantly related under a multiple linear regression technique because packets generally do increase as session time increases, whereas the specific nature of artificial neural networks has pinned down the precise classification of certain packets. Therefore, the overall recommendation of this analysis is to continue the use of artificial neural networks for classifying Virtual Private Network connections. While the classification process demonstrated here may not be developed enough to immediately introduce rules within the Information Technology department, it is a useful starting point for ensuring quality of service is maintained across on-site and off-site connections. In conclusion, the length of sessions among VPN users should not be the decisive factor in determining the extent of network traffic at the University of Wisconsin Foundation.

References

Denoeux, T. "A k-nearest neighbor classification rule based on Dempster-Shafer theory." In Classic Works of the Dempster-Shafer Theory of Belief Functions, Studies in Fuzziness and Soft Computing. Springer, 2008.

Lippmann, R. P. "Pattern Classification Using Neural Networks." IEEE Communications Magazine, 1989.