Statistical based Approach for Packet Classification

Similar documents
Internet Traffic Classification using Machine Learning

Automated Traffic Classification and Application Identification using Machine Learning. Sebastian Zander, Thuy Nguyen, Grenville Armitage

Improved Classification of Known and Unknown Network Traffic Flows using Semi-Supervised Machine Learning

4. The transport layer

Smart Home Network Management with Dynamic Traffic Distribution. Chenguang Zhu Xiang Ren Tianran Xu

CSC Network Security

SVILUPPO DI UNA TECNICA DI RICONOSCIMENTO STATISTICO DI APPLICAZIONI SU RETE IP

Internet Security: Firewall

W is a Firewall. Internet Security: Firewall. W a Firewall can Do. firewall = wall to protect against fire propagation

Internet Traffic Classification Using Machine Learning. Tanjila Ahmed Dec 6, 2017

Can we trust the inter-packet time for traffic classification?

Bittorrent traffic classification

CSE 565 Computer Security Fall 2018

Tree-Based Minimization of TCAM Entries for Packet Classification

CyberP3i Course Module Series

ASA Access Control. Section 3

Distributed Systems. 27. Firewalls and Virtual Private Networks Paul Krzyzanowski. Rutgers University. Fall 2013

Real-Time Protocol (RTP)

Intrusion Detection Using Data Mining Technique (Classification)

Why Firewalls? Firewall Characteristics

Activating Intrusion Prevention Service

Cisco ASA Next-Generation Firewall Services

Network Support for Multimedia

Chapter 8. Network Troubleshooting. Part II

Modular Policy Framework. Class Maps SECTION 4. Advanced Configuration

ndpi & Machine Learning A future concrete idea

Developing the Sensor Capability in Cyber Security

Project Proposal. ECE 526 Spring Modified Data Structure of Aho-Corasick. Benfano Soewito, Ed Flanigan and John Pangrazio

Early Application Identification

Modeling Intrusion Detection Systems With Machine Learning And Selected Attributes

Managing SonicWall Gateway Anti Virus Service

Comparison of Firewall, Intrusion Prevention and Antivirus Technologies

Table of Contents...2 Abstract...3 Protocol Flow Analyzer...3

Network Security. Thierry Sans

Applied IT Security. System Security. Dr. Stephan Spitz 6 Firewalls & IDS. Applied IT Security, Dr.

APPENDIX F THE TCP/IP PROTOCOL ARCHITECTURE

Objectives: (1) To learn to capture and analyze packets using wireshark. (2) To learn how protocols and layering are represented in packets.

INTRODUCTION TO ICT.

DDOS Attack Prevention Technique in Cloud

Lecture Notes on Critique of 1998 and 1999 DARPA IDS Evaluations

Software Defined Networking

Internet Protocol version 6

Keywords Machine learning, Traffic classification, feature extraction, signature generation, cluster aggregation.

Detecting Malicious Hosts Using Traffic Flows

Detecting Specific Threats

KUPF: 2-Phase Selection Model of Classification Records

UNIT 2 TRANSPORT LAYER

10 Key Things Your VoIP Firewall Should Do. When voice joins applications and data on your network

An Ensemble Data Mining Approach for Intrusion Detection in a Computer Network

Detecting Botnets Using Cisco NetFlow Protocol

Application Intelligence and Integrated Security Using Cisco Catalyst 6500 Supervisor Engine 32 PISA

liberate, (n): A library for exposing (traffic-classification) rules and avoiding them efficiently

BUILDING A NEXT-GENERATION FIREWALL

Big Data Analytics for Host Misbehavior Detection

Mapping Mechanism to Enhance QoS in IP Networks

Configuring Application Visibility and Control for Cisco Flexible Netflow

Generic Architecture. EECS 122: Introduction to Computer Networks Switch and Router Architectures. Shared Memory (1 st Generation) Today s Lecture

Wireless Networks And Cross-Layer Design: An Implementation Approach

A Firewall Architecture to Enhance Performance of Enterprise Network

TOWARDS HIGH-PERFORMANCE NETWORK APPLICATION IDENTIFICATION WITH AGGREGATE-FLOW CACHE

Distributed Systems. 29. Firewalls. Paul Krzyzanowski. Rutgers University. Fall 2015

Traffic and Performance Visibility for Cisco Live 2010, Barcelona

BitTorrent Traffic Classification

Firewall Simulation COMP620

Tunneling Activities Detection Using Machine Learning Techniques

Network Defenses 21 JANUARY KAMI VANIEA 1

Network Defenses 21 JANUARY KAMI VANIEA 1

Application Firewalls

FPGA based Network Traffic Analysis using Traffic Dispersion Graphs

Basic Concepts in Intrusion Detection

DESIGN AND IMPLEMENTATION OF OPTIMIZED PACKET CLASSIFIER

Chapter 8 roadmap. Network Security

Can Passive Mobile Application Traffic be Identified using Machine Learning Techniques

n Learn about the Security+ exam n Learn basic terminology and the basic approaches n Implement security configuration parameters on network

K-Nearest-Neighbours with a Novel Similarity Measure for Intrusion Detection

EECS 122: Introduction to Computer Networks Switch and Router Architectures. Today s Lecture

Machine Learning based Traffic Classification using Low Level Features and Statistical Analysis

HOW TO CHOOSE A NEXT-GENERATION WEB APPLICATION FIREWALL

Automation the process of unifying the change in the firewall performance

Sirindhorn International Institute of Technology Thammasat University

High Ppeed Circuit Techniques for Network Intrusion Detection Systems (NIDS)

Efficient Flow based Network Traffic Classification using Machine Learning

An advanced data leakage detection system analyzing relations between data leak activity

Configuring attack detection and prevention 1

Network Security and Cryptography. December Sample Exam Marking Scheme

Network Analysis of Point of Sale System Compromises

Fast and Reconfigurable Packet Classification Engine in FPGA-Based Firewall

3.2 COMMUNICATION AND INTERNET TECHNOLOGIES

Optimizing the Internet Quality of Service and Economics for the Digital Generation. Dr. Lawrence Roberts President and CEO,

Indicate whether the statement is true or false.

* Knowledge of Adaptive Security Appliance (ASA) firewall, Adaptive Security Device Manager (ASDM).

9. Wireshark I: Protocol Stack and Ethernet

UDP: Datagram Transport Service

Configuring QoS CHAPTER

CSCD 433/533 Advanced Networks

Managed IP Services from Dial Access to Gigabit Routers

Network Defenses KAMI VANIEA 1

9th Slide Set Computer Networks

NORMALIZATION INDEXING BASED ENHANCED GROUPING K-MEAN ALGORITHM

Hardware Assisted Recursive Packet Classification Module for IPv6 etworks ABSTRACT

Transcription:

Statistical based Approach for Packet Classification Dr. Mrudul Dixit 1, Ankita Sanjay Moholkar 2, Sagarika Satish Limaye 2, Devashree Chandrashekhar Limaye 2 Cummins College of engineering for women, Pune 1 Assistant Professor, 2 Student ABSTRACT Increasing demand for large network bandwidths leads to the growth in the data traffic. As a consequence of this packet classification is required for handling the drastically increasing the traffic on both the edge and the core devices. In today s world, large applications have been developed like service-aware routing, intrusion detection and prevention systems, intelligent routing, traffic management and shaping. Hence use of intelligent and efficient packet classification techniques is recommended. Traditional classification techniques such as Port based classification, Deep packet inspection, etc. have certain drawbacks. Hence there is a need for more efficient classification techniques to route internet packets to their specific applications. One such technique is Statistical Classification. This paper gives an overview of its advantages and limitations. KEYWORDS Statistical Classification, Internet Packets, Statistical Parameters, Packet Classification, Port based Classification, Supervised learning, Internet traffic. I. INTRODUCTION Packet Classification is a process of classifying or processing internet packets for meeting certain objectives such as routing and filtering. Access control, traffic engineering, intrusion detection, software defined networking and many other network services require the discrimination of packets based on the multiple fields of packet headers, which is called packet classification. Packet classification features provide the capability to partition network traffic into multiple priority levels or classes of service. The objective of the packet classification is to classify the internet packets by applying a set of predefined rules. Each rule consists of some set of components in a range matching expression. A packet is said to be matched if and only if a desired rule matches the particular field of the packet and satisfies the matching expression. A2 A3 A4 A1 A5 R2 R3 R4 R1 R5 CLASSIFIER Internet Packets FIG 1: PACKET CLASSIFICATION 25 Dr. Mrudul Dixit, Ankita Sanjay Moholkar, Sagarika Satish Limaye, Devashree Chandrashekhar Limaye

Internet packet is sent to the particular application(a) if it matches the rule(r), else it is dropped. PERFORMANCE MEASURES Packetclassification techniques can be chosen on the basis of certain performance measures. They are as follows:- a. Accuracy It is the measure of correct classifications to total classifications. The error rate should be as low as possible for all the classifications, and significantly for crucial key classes. b. Speed There is a trade-off between the accuracy and speed( at which classifier does it's work). A highly accurate system is comparatively slower. Hence the classifier is chosen based on the speed required for a particular application. c. Comprehensibility It is a measure of the ease with which a human operator can use the classifier. d. Time to learn - For real-time systems, it is necessary to learn a classification rule quickly or make adjustments quickly, or read only a few observations to establish the rule. [5] II. APPLICATIONS OF PACKET CLASSIFICATION 1. ROUTING Routers are devices that route packets towards their destinations, and physical links that transport packets from one router to another. Routers need to have the capability todistinguish and isolate traffic belonging to different flows. The ability to classify each incomingpacket to determine the flow it belongs to is called packet classification, and could be based onan arbitrary number of fields in the packet header. FIG 2: ROUTING 2. INTRUSION DETECTION SYSTEM Intrusion Detection system is a software which helps us to protect our system from other system when other person tries to access our system through network. Predictive type of modeling is used to predict the output based on past data. Classification is used to predict the output by relying on past collected data. Intrusion 26 Dr. Mrudul Dixit, Ankita Sanjay Moholkar, Sagarika Satish Limaye, Devashree Chandrashekhar Limaye

detection system monitors network traffic for identifying harmful packets. Packet classification is an important to the IDS functionality as it is in charge of scanning network packet header [7]. FIG 3: INTRUSION DETECTION SYSTEM 3. FIREWALL Packet filtering: a type of packet classification is often part of a firewall program for protecting a local network from unwanted intrusion. Firewall unlike intrusion detection system does not block malicious packets it simply drops packets blocked by user. The process of passing or dropping a packet is based on source and destination addresses, ports, or protocols. Packet filter is a program used in software firewall. Each packet is examined by the packet classifier based on a specific set of rules, and depending upon these rules a decision is made on whether to drop or accept the packet. FIG 4: FIREWALL 27 Dr. Mrudul Dixit, Ankita Sanjay Moholkar, Sagarika Satish Limaye, Devashree Chandrashekhar Limaye

4. SECURITY AND NETWORK MONITORING Security and network monitoring applications require packet classification and filtration. Security and network monitoring systems are used for monitoring a large network and identifying breaches and threats and taking corrective measures [6]. III. TRADITIONAL METHODS FOR PACKET CLASSIFICATION Packet classification means finding out in a set of filters the highest priority filter matching the packet. The classification task has become more complex and is based on at least five fields these fields are dependent upon the type of classification process used. 1. Port based classification: A port number is the logical address of each application or process that uses a network or the Internet to communicate. Port numbers range from 0-65535. It is the most common method that traffic classification is based on: associating a well-known port number with a given traffic type. This method needs access only to the header of the packets. It matches port numbers to applications where an application is associated with well-defined port numbers (0-1024), example: http used for browsing has port number 80; SMTP used for email has port number 25, etc. It is fast and simple to implement. No complex computations required. But this traditional technique has some limitations: Some port numbers may be the dynamic ones that are not registered with IANA (port numbers above 49151). Port numbers in the system can be manipulated and hence packet classification can fail. Useful only for the applications and services, which use fixed port numbers IP layer encryption may obscure the TCP and UDP port numbers thus making it unable to identify the actual port numbers.[1] Payload Based Classification (Deep Packet Inspection): A payload refers to the actual data that is to be transferred to the desired destination. It is appended with a header having parameters like source and destination addresses for transport. This method looks into the headers as well as the payloads inside the packet. It performs scanning of the payload bit by bit to identify a predefined sequence stream of a certain network protocol. The stored sequences of the bit steams are compared with those of thepayloads and classified accordingly. But this traditional technique has some limitations: It has significant complexities and processing on the traffic devices. Direct payload analysis may lead to violation of privacy policies. It is also difficult to maintain bit stream sequences having high hit ratios. To reduce trace file size, packets are recorded with limited length. Hence, signature may not be contained in that part. It lacks support for many applications (example: Skype) Packet fragmentation also leads to computational complexity. [1] IV. MODERN TECHNIQUE FOR PACKET CLASSIFICATION To overcome the limitations of the traditional techniques, a new method has been found which deals with the statistical parameters of the packets for efficient classification. 28 Dr. Mrudul Dixit, Ankita Sanjay Moholkar, Sagarika Satish Limaye, Devashree Chandrashekhar Limaye

STATISTICAL CLASSIFICATION Statistical based classification techniques is referred to as a modern technique for packet classification done at transport or network layer. A statistical classification algorithm is built on the basis of cluster analysis. Traffic is taken from the datasets made up of thousands of flows for many different application protocols. The classic approach of training and testing shows that cluster analysis yields very good results with very little information. The investigated applications are characterized from a signature at the network layer that is used to recognize such applications even when the port number is not significant[3]. It deals with only a certain statistical parameters and not with the headers or data of the packets. It uses parameters such as: Distribution of flow duration The incoming packet flow is separated according to the incoming time and is further distributed on the basis of number of flows of various applications. FIG 5: DISTRIBUTION OF FLOW DURATION [2] Flow idle time - It is a measure of the time for which the flow remains idle and no incoming packets are received. For e.g. - Packets received till(p.r.t): 9.54 p.m. Packets received after(p.r.a): 9.57 p.m. Hence, flow idle time = P.r.a P.r.t = 3 minutes (no packets received for 3 minutes as the flow was idle). Packet inter-arrival time It is the mean or average of the time lapse between the data packets arriving at a certain host. Total packet length It is the total length of a single packet including its header and data. Number of packets It is the number of incoming packets in a given period of time. Minimum/maximum, average and standard deviation of packet length - Standard deviation of the packet lengths is found in order to differentiate the packets. Minimum/maximum, average and standard deviation of inter-arrival time - Standard deviation of the inter-arrival time is found in order to differentiate the packets. PACKET NO. 1 PACKET NO. 2 PACKET NO. 3 PACKET INTER-ARRIVAL TIME TOTAL PACKET LENGTH FIG 6: STATISTICAL PARAMETERS: PACKET LENGTH AND INTER-ARRIVAL TIME 29 Dr. Mrudul Dixit, Ankita Sanjay Moholkar, Sagarika Satish Limaye, Devashree Chandrashekhar Limaye

These parameters are different for different applications. The traffic can be real-time, or can be obtained from the datasets such as NSL-KDD, DARPA, CAIDA, etc. The incoming packets having similar parameters are grouped into clusters, where one particular cluster belongs to one particular application. The formation of these clusters is done using machine learning (supervised learning). GROUP 1: Packet inter-arrival time: 20-50ms GROUP 2: Packet inter-arrival time: 30-45s Flow idle time: Flow idle time: APPLICATION 1 APPLICATION 2 1-3 minutes 2-5 minutes FIG 7: FORMATION OF GROUPS Supervised Learning: Supervised learning creates knowledge structures that support the task of classifying new instances into predefined classes [3]. The machine that is to be trained for packet classification is given a set of parameters (example: inter arrival time). In supervised learning there are two types of datasets: training and testing. Training: The learning phase that examines the provided data (called the training dataset) and constructs (builds) a classification model [3]. Testing: The model that has been built in the training phase is used to classify new unseen instance [3]. The training dataset is used by the classifier to study the data recognize the patterns in the pre-defined parameter and form groups according to the application, based on a probability model (like in Naïve Bayes ). And then it is tested using the testing dataset. And based on these results efficiency and other parameters are obtained. Various supervised learning algorithms that can be used for packet classification are: Decision Trees Support Vector Machine K-Nearest Mapping Naïve Bayes Random Forest DATASET TRAINING DATASETS TESTING DATASETS DATA NORMALIZATION DATA NORMALIZATION ML ALGORITHMS APPLICATION 1 APPLICATION 3 APPLICATION 2 FIG 8: STATISTICAL CLASSIFICATION USING ML ALGORITHMS 30 Dr. Mrudul Dixit, Ankita Sanjay Moholkar, Sagarika Satish Limaye, Devashree Chandrashekhar Limaye

V. WHY STATISTICAL CLASSIFICATION OVER TRADITIONAL METHODS? The main limitation of deep packet inspection c that is security breach on instances where data is encrypted is avoided as in case of statistical approach inspection of packet is not needed. Statistical approach does not need information like port number so any tampering with the port number, which is the limitation of port based classification, is avoided. Statistical classification is in real time and hence is adaptable to changes that happen in real time, it can train itself to adapt to the changes that happen over time. Statistical classification attains high accuracy even though it is based on very limited information. The statistical approach is faster than the traditional approach as it is based on the flow characteristics and does not require information like the source and destination IP etc. from the form the header. VI. CONCLUSION This paper gives an overview of packet classification and compares the traditional and modern techniques used for packet classification. It also reviews the limitations of the traditional techniques, and states how modern technique called Statistical Classification can overcome those limitations.it can be seen that the modern technique of packet classification is more adaptable, as it is based on machine learning, than the traditional methods. The performance measures like accuracy, speed, comprehensibilityand time to learn are better for statistical classification than any traditional method, as statistical classification works in real time. This modern technique is very efficient, accurate, and capable of classifying traffic in real time, which is very important in the era of growing Internet traffic. REFERENCES [1]CiprianDobre, Internet Traffic Classification based on Flow Statistical Properties with Machine Int.J.NetworkMgmt 2015; 00:1-14 [2] Myung-Sup Kim Young J, Flow-based Characteristic Analysis of Internet Application Traffic. Learning, [3]Thuy T.T. Nguyen and Grenville Armitage, A Survey of Techniques for Internet Traffic Classification using Machine Learning. [4]Andrea Baiocchi, On-the-fly statistical classification of internet traffic at application layer based on cluster analysis. [5] Book- Donald Michie, David Stiegelhalter, Charles Taylor,Machine learningneural and Statistical classification [6] Chapter 3: reconfigurable firewall based on FPGA-based parameterized content addressable memory [7]V. Jaiganesh Dr. P. Sumathi A.Vinitha, Classification Algorithms in Intrusion Detection System: A Survey 31 Dr. Mrudul Dixit, Ankita Sanjay Moholkar, Sagarika Satish Limaye, Devashree Chandrashekhar Limaye