Clustering Analysis for Malicious Network Traffic

Jie Wang, Lili Yang, Jie Wu and Jemal H. Abawajy
School of Information Science and Engineering, Central South University, Changsha, China. Email: jwang,liliyang@csu.edu.cn
Department of Computer and Information Sciences, College of Science and Technology, Temple University, Philadelphia, USA. Email: jiewu@temple.edu
School of Information Technology, Deakin University, Burwood 3125, Australia. Email: jemal@deakin.edu.au

Outline
• Introduction
• Malicious Network Traffic Clustering Scheme
• Clustering Algorithm Based on Seeding-Expanding
• Evaluation
• Conclusions

Introduction
• Identifying which phase an attack is in as early as possible is essential for detecting attacks early.
• Clustering analysis is well suited to grouping network traffic by attack phase.
• However, existing clustering algorithms give no control over how similar the data points within a cluster must be, so they are not appropriate for detecting malicious network flows.
• New methods?

Malicious Network Traffic Clustering Scheme
• Flow-level features

Malicious Network Traffic Clustering Scheme
• Feature preprocessing
  - A continuous-valued attribute is discretized by partitioning its range into multiple intervals.
  - A nominal attribute with M states can be encoded as asymmetric binary attributes by creating one new binary attribute for each of the M states.
  - For an object with a given state value, the binary attribute representing that state is set to 1, while the remaining binary attributes are set to 0.
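The two preprocessing rules can be sketched in a few lines of Python. This is only an illustration: the cut points, the protocol list, and the function names are hypothetical, and encoding the discretized interval itself as a binary group is an assumption rather than something the slides state.

```python
from bisect import bisect_right

def discretize(value, cut_points):
    """Return the index of the interval that contains value.

    cut_points must be sorted; k cut points define k + 1 intervals.
    """
    return bisect_right(cut_points, value)

def one_hot(state, states):
    """Encode a nominal state as one binary attribute per possible state."""
    return [1 if state == s else 0 for s in states]

# Hypothetical feature definitions, chosen only for this example.
DURATION_CUTS = [1.0, 10.0, 60.0]      # seconds; yields 4 intervals
PROTOCOLS = ["tcp", "udp", "icmp"]     # M = 3 nominal states

def encode_flow(duration, protocol):
    """Turn one continuous and one nominal feature into a binary vector."""
    interval = discretize(duration, DURATION_CUTS)
    # Assumption: the interval index is treated as a nominal state.
    duration_bits = one_hot(interval, range(len(DURATION_CUTS) + 1))
    return duration_bits + one_hot(protocol, PROTOCOLS)

vector = encode_flow(4.2, "udp")
print(vector)  # [0, 1, 0, 0, 0, 1, 0]
```

A flow lasting 4.2 seconds falls into the second duration interval, and "udp" sets the second protocol bit, so every flow ends up as a fixed-length binary vector.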

Malicious Network Traffic Clustering Scheme
• Clustering malicious network traffic, starting from flow identification, involves four steps:
  - Flow identification: flows are first constructed by aggregating traffic on the 5-tuple flow identifier (source IP, destination IP, source port, destination port, protocol).
  - Feature extraction: features are extracted from the flows.
  - Feature preprocessing.
  - Unsupervised learning and clustering.
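As a rough sketch of the first two steps, packets can be grouped by their 5-tuple and simple per-flow features computed from each group. The `Packet` record, the sample addresses, and the two features (packet count and byte count) are hypothetical; the slides do not list the actual feature set.

```python
from collections import defaultdict, namedtuple

# Hypothetical packet record; a real capture would carry more fields.
Packet = namedtuple("Packet", "src_ip dst_ip src_port dst_port proto size")

def build_flows(packets):
    """Aggregate packets into flows keyed by the 5-tuple."""
    flows = defaultdict(list)
    for p in packets:
        key = (p.src_ip, p.dst_ip, p.src_port, p.dst_port, p.proto)
        flows[key].append(p)
    return flows

def extract_features(flow_packets):
    """Compute two illustrative flow-level features."""
    sizes = [p.size for p in flow_packets]
    return {"packets": len(sizes), "bytes": sum(sizes)}

packets = [
    Packet("10.0.0.1", "10.0.0.2", 1234, 80, "tcp", 60),
    Packet("10.0.0.1", "10.0.0.2", 1234, 80, "tcp", 1500),
    Packet("10.0.0.3", "10.0.0.2", 5555, 53, "udp", 80),
]

flows = build_flows(packets)
features = {key: extract_features(pkts) for key, pkts in flows.items()}
```

The first two packets share a 5-tuple and therefore form one flow; the third starts a second flow.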

Clustering Algorithm Based on Seeding-Expanding
• Similarity computing
  - The asymmetric binary similarity between the objects i and j can be computed as

    $$\mathrm{sim}(i, j) = \frac{q}{q + r + s}$$

  - where q is the number of attributes that equal 1 for both objects i and j, r is the number of attributes that equal 1 for object i but 0 for object j, and s is the number of attributes that equal 0 for object i but 1 for object j.
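A small Python sketch of this similarity, assuming it is the usual definition for asymmetric binary attributes, q / (q + r + s), which is the Jaccard coefficient. The function name is hypothetical, and returning 0.0 when both vectors are all zeros is a choice made here, not something taken from the slides.

```python
def asymmetric_binary_similarity(a, b):
    """Similarity of two equal-length 0/1 vectors; 0-0 matches are ignored."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    q = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)  # 1 in both
    r = sum(1 for x, y in zip(a, b) if x == 1 and y == 0)  # 1 only in a
    s = sum(1 for x, y in zip(a, b) if x == 0 and y == 1)  # 1 only in b
    total = q + r + s
    # Assumption: two all-zero vectors are treated as having similarity 0.
    return q / total if total else 0.0

sim = asymmetric_binary_similarity([1, 0, 1, 1, 0], [1, 1, 0, 1, 0])
print(sim)  # 0.5
```

Here q = 2, r = 1 and s = 1, so the similarity is 2 / 4 = 0.5; the shared 0 in the last position does not count toward the result.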

Clustering Algorithm Based on Seeding-Expanding
• The SE algorithm contains the following steps:
  - Weight computing. For each data point d_i, the weight of d_i, Weight(d_i), is the sum of sim(d_i, d_j) over the data points d_j.
  - Seed selection. First, all data points are sorted by weight in decreasing order to form a candidate queue Q = {q_1, q_2, ..., q_n}. Then two data points, s_1 and s_2, are selected from Q as seeds. The first seed, s_1, is selected as follows: [equation missing from the transcript]. The second seed, s_2, is the element that follows s_1 in the queue Q.

Clustering Algorithm Based on Seeding-Expanding
  - Seed expanding. A data point q in the candidate queue is added to a seed's cluster only if its similarity to that seed s is maximal.
  - Noise removal. Clusters with fewer than 3 members are treated as noise.
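The weight computing, queue construction, and noise removal steps can be sketched as follows. This is a partial illustration, not the authors' implementation: seed selection and seed expanding are left out because the transcript does not preserve the seed-selection rule. The function names are hypothetical, the similarity is assumed to be q / (q + r + s), and excluding a point's similarity to itself from its weight is also an assumption.

```python
def jaccard(a, b):
    """Assumed similarity: q / (q + r + s) over two 0/1 vectors."""
    q = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    diff = sum(1 for x, y in zip(a, b) if x != y)  # r + s
    return q / (q + diff) if (q + diff) else 0.0

def compute_weights(points):
    """Weight of each point: sum of its similarities to the other points."""
    n = len(points)
    return [
        sum(jaccard(points[i], points[j]) for j in range(n) if j != i)
        for i in range(n)
    ]

def candidate_queue(points):
    """Indices of the points sorted by decreasing weight."""
    weights = compute_weights(points)
    return sorted(range(len(points)), key=lambda i: weights[i], reverse=True)

def remove_noise(clusters, min_size=3):
    """Drop clusters with fewer than min_size members."""
    return [c for c in clusters if len(c) >= min_size]

points = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 0, 1],
]
queue = candidate_queue(points)
print(queue)  # [2, 0, 1, 3]
```

The point with the highest total similarity to the others comes first in the queue, which is why well-connected points are the natural seed candidates.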

Evaluation
• SE is evaluated on a DDoS sample that includes five stages:
  - IP sweep of the AFB from a remote site;
  - Probe of the live IPs to look for the sadmind daemon running on Solaris hosts;
  - Break-ins via the sadmind vulnerability, both successful and unsuccessful, on those hosts;
  - Installation of the trojan mstream DDoS software on three hosts at the AFB;
  - Launching the DDoS.

Evaluation
• Results and discussion
  - Three common indexes are used to evaluate the clustering results: purity, the Rand Index (RI), and the F-measure.
  - The results are illustrated in Figure 1, Figure 2, and Figure 3.
  - In these figures, the horizontal axis is the threshold r adopted by the SE algorithm. As r increases, cluster purity, the Rand Index, and the F-measure all increase.
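For reference, purity and the Rand Index can be computed as in the sketch below, using their standard textbook definitions. The cluster assignments and stage labels are made up for illustration and are not the paper's data. The F-measure is left out because several clustering variants of it exist and the slides do not say which one is used.

```python
from collections import Counter
from itertools import combinations

def purity(clusters, labels):
    """Fraction of points that belong to the majority class of their cluster."""
    total = 0
    for c in set(clusters):
        members = [l for k, l in zip(clusters, labels) if k == c]
        total += Counter(members).most_common(1)[0][1]
    return total / len(labels)

def rand_index(clusters, labels):
    """Fraction of point pairs on which the clustering and the labels agree."""
    agree = 0
    pairs = 0
    for i, j in combinations(range(len(labels)), 2):
        same_cluster = clusters[i] == clusters[j]
        same_class = labels[i] == labels[j]
        agree += same_cluster == same_class
        pairs += 1
    return agree / pairs

# Hypothetical example: six flows, two clusters, three attack stages.
clusters = [0, 0, 0, 1, 1, 1]
labels = ["scan", "scan", "probe", "probe", "probe", "ddos"]

p = purity(clusters, labels)      # 4 / 6
ri = rand_index(clusters, labels) # 9 / 15
```

Both indexes lie between 0 and 1, and higher values mean the clusters line up more closely with the known attack stages.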

Evaluation
• We also compare the SE algorithm with K-Means, using threshold r = 0.9 for SE. The results are shown in Table II.
• According to Table II, the SE algorithm achieves higher clustering performance than K-Means for every input cluster number.

Evaluation
• In the next experiments, we vary the number of seeds chosen during seed selection from one to six. The clustering results are illustrated in Figures 4, 5, and 6.

Evaluation
• Figure 4 shows the purity values for different numbers of clustering seeds. When threshold r is larger than 0.5, clustering obtains good results no matter how many seeds are selected in each iteration. A similar conclusion can be drawn from Figures 5 and 6.

Evaluation
• We compare the number of iterations required with different seed numbers. The results are illustrated in Figure 7.
• According to Figure 7, the number of iterations decreases gradually as the number of seeds increases. Beyond a certain number of seeds, however, the iteration count stays the same.

Conclusions
• In this paper, we propose a clustering scheme based on Two-Seed-Expanding, which clusters attack flows into different phases.
• We also discuss the clustering's effectiveness when different seed numbers are used in the Seed-Expanding algorithm.
• Experimental results show that Two-Seed-Expanding performs better in attack flow clustering than K-Means and than Seed-Expanding variants that use other seed numbers.
• Next, we will do further research on how to distinguish attack flows from normal flows based on the clustering results.

Thank you!
Comments and questions are welcome.
Central South University