Detecting Malicious Hosts Using Traffic Flows


Detecting Malicious Hosts Using Traffic Flows
Miguel Pupo Correia, joint work with Luís Sacramento
NavTalks, Lisboa, June 2017

Outline
- Motivation
- Approach
- Evaluation
- Conclusion

Motivation
Scenario:
- Large national telco/ISP connected to its own provider
- Huge amount of traffic in/out, much of it encrypted
- Possibly new attacks / new variants
[Figure: national ISP connected to international ISP]

Motivation
Compromised hosts carry out attacks such as:
- Distributed denial-of-service attacks
- Exfiltrating confidential data
- Sending spam
- Mapping the network
- Contacting bot command-and-control (C&C) centers
- etc.

Network Intrusion Detection Systems
Traditional NIDSs:
- Knowledge-based: require signatures of attacks
  - Not good for new attacks
- Behavior-based: require clean traffic for training
  - Where to get it in our scenario?
- Most do deep packet inspection, which is infeasible with too much traffic

Our approach
- Detection framework to detect malicious hosts based on real traffic
- Not knowledge-based, to avoid the need for signatures
- Not behavior-based, as no training traffic exists
- No deep packet inspection, as it is slow
- Detects hosts doing new attacks or new variants

Key ideas
- Collect traffic data summarized as network flows
- Extract data about hosts from flows
- Use unsupervised machine learning / clustering to get information that humans can understand without previous knowledge about attacks
- Use supervised machine learning / a classifier to automatically assign clusters to classes/categories
  - ex: web servers, mail servers, hosts sending spam, hosts doing distributed denial of service, ...
- Manually label new clusters

The approach
Loop:
- Collect flows for a period of time (e.g., 1 day)
- Extract from the flows data about hosts with MapReduce
- Use clustering to create groups of hosts
- Use classifier to automatically classify hosts
- Manually label remaining ones
- Repeat for next period
[Figure: cycle diagram with the stages Extract, Cluster, Classify automatically, Classify manually]
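The loop above can be sketched as a single function that wires the four stages together. This is only an illustrative skeleton: the function name `run_period`, the stage signatures, and the toy stage implementations are assumptions made for the example and do not come from the talk.

```python
def run_period(flows, extract, cluster, auto_label, manual_label):
    """One pass of the loop for the flows collected in one period."""
    hosts = extract(flows)        # {host: feature vector}
    clusters = cluster(hosts)     # {cluster id: [hosts]}
    labels, newly_labelled = {}, {}
    for cid, members in clusters.items():
        label = auto_label(cid, members, hosts)  # None if the cluster is unknown
        if label is None:
            label = manual_label(cid, members, hosts)  # human in the loop
            newly_labelled[cid] = label  # would be used to retrain the classifier
        labels[cid] = label
    return labels, newly_labelled

# Toy stand-ins for the real stages (hypothetical, for illustration only).
def extract(flows):
    counts = {}
    for src, _dst in flows:
        counts[src] = counts.get(src, 0) + 1
    return {host: [n] for host, n in counts.items()}

def cluster(hosts):
    groups = {"low": [], "high": []}
    for host, (n,) in hosts.items():
        groups["high" if n >= 3 else "low"].append(host)
    return {cid: members for cid, members in groups.items() if members}

known = {"low": "ordinary client"}

def auto_label(cid, members, hosts):
    return known.get(cid)

def manual_label(cid, members, hosts):
    return "needs analyst review"

flows = [("a", "x"), ("b", "x"), ("c", "x"), ("c", "y"), ("c", "z")]
labels, newly_labelled = run_period(flows, extract, cluster, auto_label, manual_label)
```

Passing the stages in as functions keeps the loop independent of the concrete clustering algorithm and classifier, which are covered by the later slides.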

The approach
Loop:
- Collect flows for a period of time (e.g., 1 day)

Flows
- Flow: sequence of related packets observed during an interval of time
- A flow is defined in terms of a subset of src IP, dest IP, protocol, src port, dest port; ex: (*, 1.2.3.4, TCP, *, 80)
- NetFlow: monitoring approach created by Cisco
  - Idea is to capture data about network flows
  - Data: begin/end of flow timestamps, no. of packets, no. of bytes
- Variants: IPFIX (standard based on NetFlow v9), sFlow, ...
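To make the flow definition concrete, the sketch below matches packets against a wildcard flow key such as (*, 1.2.3.4, TCP, *, 80) and summarizes the matching packets into the fields listed above (begin/end timestamps, number of packets, number of bytes). The `Packet` type, the use of `None` as the `*` wildcard, and the sample packets are assumptions for illustration; this is not how a NetFlow exporter is implemented.

```python
from typing import NamedTuple

class Packet(NamedTuple):
    src_ip: str
    dst_ip: str
    protocol: str
    src_port: int
    dst_port: int
    size: int         # bytes
    timestamp: float  # seconds

def matches(key, pkt):
    """True if the packet fits the flow key; None in the key acts as '*'."""
    fields = (pkt.src_ip, pkt.dst_ip, pkt.protocol, pkt.src_port, pkt.dst_port)
    return all(k is None or k == f for k, f in zip(key, fields))

def summarize(key, packets):
    """Summarize the packets that belong to the flow identified by `key`."""
    selected = [p for p in packets if matches(key, p)]
    if not selected:
        return None
    return {
        "start": min(p.timestamp for p in selected),
        "end": max(p.timestamp for p in selected),
        "packets": len(selected),
        "bytes": sum(p.size for p in selected),
    }

# The example key from the slide: any source, destination 1.2.3.4, TCP, port 80.
web_key = (None, "1.2.3.4", "TCP", None, 80)
packets = [
    Packet("10.0.0.1", "1.2.3.4", "TCP", 40000, 80, 600, 1.0),
    Packet("10.0.0.2", "1.2.3.4", "TCP", 40001, 80, 400, 2.5),
    Packet("10.0.0.1", "1.2.3.4", "UDP", 40002, 53, 100, 3.0),
]
summary = summarize(web_key, packets)
```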

Flow collection
- Flows collected on NetFlow-enabled border routers
[Figure: national ISP connected to international ISP]

The approach
Loop:
- Collect flows for a period of time (e.g., 1 day)
- Extract from the flows data about hosts with MapReduce

Host data extraction
- Flow format: <Source IP, Destination IP, Source Port, Destination Port, Protocol, TCP Flags, #Bytes, #Packets, Duration>
- Use MapReduce for extracting data per host (IP)
  - aggregated by source or destination IP address

Host data extraction
Host features (data) extracted by MapReduce:
[Table: host features]
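The per-host extraction can be sketched as an in-memory map/shuffle/reduce over flow records in the format given above. The feature names used here (`flows`, `distinct_peers`, `distinct_dst_ports`, `bytes`, `packets`) are hypothetical stand-ins, since the talk's feature table is not reproduced in this transcript, and a real deployment would run on a MapReduce framework rather than in a single process.

```python
from collections import defaultdict
from typing import NamedTuple

class Flow(NamedTuple):
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str
    tcp_flags: str
    n_bytes: int
    n_packets: int
    duration: float

def map_flow(flow, key="src"):
    """Map: emit (host, value), keyed by source or destination IP."""
    host = flow.src_ip if key == "src" else flow.dst_ip
    peer = flow.dst_ip if key == "src" else flow.src_ip
    yield host, (peer, flow.dst_port, flow.n_bytes, flow.n_packets)

def reduce_host(host, values):
    """Reduce: aggregate all values of one host into illustrative features."""
    values = list(values)
    return {
        "host": host,
        "flows": len(values),
        "distinct_peers": len({v[0] for v in values}),
        "distinct_dst_ports": len({v[1] for v in values}),
        "bytes": sum(v[2] for v in values),
        "packets": sum(v[3] for v in values),
    }

def extract_host_features(flows, key="src"):
    groups = defaultdict(list)  # shuffle: group mapped values by host
    for flow in flows:
        for host, value in map_flow(flow, key):
            groups[host].append(value)
    return {host: reduce_host(host, vals) for host, vals in groups.items()}

flows = [
    Flow("10.0.0.1", "1.2.3.4", 40000, 80, "TCP", "SA", 1200, 10, 1.5),
    Flow("10.0.0.1", "5.6.7.8", 40001, 25, "TCP", "S", 300, 3, 0.2),
    Flow("10.0.0.2", "1.2.3.4", 40002, 80, "TCP", "SA", 500, 5, 0.7),
]
by_source = extract_host_features(flows, key="src")
by_destination = extract_host_features(flows, key="dst")
```

Running the extraction once per aggregation key gives two views of each host: what it sends (source key) and what it receives (destination key).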

The approach
Loop:
- Collect flows for a period of time (e.g., 1 day)
- Extract from the flows data about hosts with MapReduce
- Use clustering to create groups of hosts

Unsupervised ML / clustering
- Idea: group similar hosts in clusters (sets)
- Why? Humans can understand and classify a few clusters, not zillions of hosts
- How?
  - Normalize every feature into the range [0,1]
  - Run a clustering algorithm, e.g., K-Means, to get k clusters
  - k can be defined, e.g., with the elbow method (finds the elbow, i.e., the point where adding more clusters no longer improves the modelling of the data)
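The clustering recipe above (normalize to [0,1], run K-Means, inspect the elbow) can be sketched in plain Python as follows. The feature rows, the seeded random initialization, and the use of within-cluster sum of squares ("inertia") as the elbow criterion are assumptions for illustration; a production system would more likely use a library implementation.

```python
import random

def min_max_normalize(rows):
    """Rescale every feature (column) into [0, 1]; constant columns map to 0.0."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(point, centroids):
    return min(range(len(centroids)), key=lambda c: sq_dist(point, centroids[c]))

def k_means(points, k, iterations=100, seed=0):
    """Lloyd's algorithm; returns (assignment, centroids, inertia)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        assignment = [nearest(p, centroids) for p in points]
        updated = []
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            # Keep the old centroid if a cluster ends up empty.
            updated.append([sum(col) / len(members) for col in zip(*members)]
                           if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    assignment = [nearest(p, centroids) for p in points]
    inertia = sum(sq_dist(p, centroids[a]) for p, a in zip(points, assignment))
    return assignment, centroids, inertia

# Hypothetical per-host features: [number of flows, bytes sent].
raw = [[10, 1000], [12, 1100], [11, 900],
       [500, 90000], [520, 95000], [480, 88000]]
normalized = min_max_normalize(raw)

# Elbow method: look for the k after which inertia stops dropping sharply.
inertias = {k: k_means(normalized, k)[2] for k in range(1, 5)}
```

On this toy data the inertia falls steeply from k = 1 to k = 2, so the elbow is at k = 2 (two groups of hosts).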

The approach
Loop:
- Collect flows for a period of time (e.g., 1 day)
- Extract from the flows data about hosts with MapReduce
- Use clustering to create groups of hosts
- Use classifier to automatically classify hosts
- Manually label remaining ones

Intrusion detection with flows
- Each cluster contains hosts with similar behavior
  - ex: web servers, mail servers, hosts sending spam, hosts doing denial of service, ...
- What to do with them? (at cruise speed)
  - Already seen? Use the classifier to classify automatically
  - Never seen? Label manually, with the help of the feature values
    - Focus attention on smaller clusters with odd feature distribution; often malicious
    - Retrain the classifier
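The advice to focus on smaller clusters with an odd feature distribution could be turned into a simple review queue, as sketched below. The size threshold `max_fraction` and the "oddness" score (squared distance of a cluster centroid from the size-weighted mean of all centroids) are assumptions chosen for the example; the talk does not specify how analysts prioritize clusters.

```python
from collections import Counter

def review_queue(assignment, centroids, max_fraction=0.1):
    """Return ids of small clusters, most unusual first, for manual labelling."""
    sizes = Counter(assignment)
    total = len(assignment)
    dims = len(centroids[0])
    # Size-weighted mean of the centroids, i.e. the mean over all hosts.
    global_mean = [
        sum(centroids[c][d] * sizes[c] for c in sizes) / total
        for d in range(dims)
    ]

    def oddness(c):
        return sum((centroids[c][d] - global_mean[d]) ** 2 for d in range(dims))

    small = [c for c in sizes if sizes[c] / total <= max_fraction]
    return sorted(small, key=lambda c: (-oddness(c), sizes[c]))

# Hypothetical clustering result: 100 hosts in three clusters.
assignment = [0] * 90 + [1] * 7 + [2] * 3
centroids = [[0.1, 0.1], [0.2, 0.15], [0.9, 0.95]]
queue = review_queue(assignment, centroids)
```

Here cluster 0 is skipped because it holds 90% of the hosts, and cluster 2 is queued before cluster 1 because its centroid lies far from the overall mean.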

Supervised ML / classification
- Naïve solution: use labelled hosts to train a binary Support Vector Machine (SVM) classifier
- Samples/hosts classified as benign or malicious
- Finds a hyperplane that separates the samples
- Classifies new samples (hosts)

The approach
Loop:
- Collect flows for a period of time (e.g., 1 day)
- Extract from the flows data about hosts with MapReduce
- Use clustering to create groups of hosts
- Use classifier to automatically classify hosts
- Manually label remaining ones
- Repeat for next period
[Figure: cycle diagram with the stages Extract, Cluster, Classify automatically, Classify manually]
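The supervised step (a binary SVM that finds a separating hyperplane between benign and malicious hosts) can be illustrated with a small linear SVM trained by sub-gradient descent on the hinge loss. The linear kernel, the training procedure, the hyperparameters, and the two-feature samples are all assumptions for this sketch; the talk only states that a binary SVM is used.

```python
def decision(w, b, x):
    """Signed score of x relative to the hyperplane w.x + b = 0."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def train_linear_svm(samples, labels, epochs=500, lr=0.05, reg=0.01):
    """Minimize L2-regularized hinge loss; labels are +1 (malicious) or -1 (benign)."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            if y * decision(w, b, x) < 1:
                # Inside the margin or misclassified: move the hyperplane.
                w = [wi - lr * (reg * wi - y * xi) for wi, xi in zip(w, x)]
                b += lr * y
            else:
                # Correct with enough margin: only apply regularization.
                w = [wi - lr * reg * wi for wi in w]
    return w, b

def classify(w, b, x):
    return "malicious" if decision(w, b, x) >= 0 else "benign"

# Hypothetical normalized host features, already labelled by an analyst.
samples = [[0.1, 0.1], [0.2, 0.1], [0.1, 0.3],   # benign
           [0.9, 0.8], [0.8, 0.9], [0.7, 0.7]]   # malicious
labels = [-1, -1, -1, 1, 1, 1]

w, b = train_linear_svm(samples, labels)
predictions = [classify(w, b, x) for x in samples]
```

After manual labelling of new clusters, the same training function would be rerun on the enlarged labelled set, which corresponds to the "retrain classifier" step.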

Tool interface
[Figure: tool interface]

Tool: interactive visualization of a cluster
[Figure: interactive visualization of a cluster]

Evaluation
Two parts:
- Synthetic dataset (ISCX)
  - Designed for IDSs
  - Flows are labelled
  - Allows validating the approach
- Real dataset collected at the telco
  - No ground truth

ISCX dataset evaluation
[Table: Saturday clusters; cells show Avg; StdDev per feature]
- Low values
- Brute-force SSH attack found during this day (cluster 3)
- Maximum for SSH connections (and high, not seen in table)

Telco dataset evaluation
[Table: # hosts and cluster data with source aggregation key (i.e., aggregated by IP inside the ISP), 1st part]

Telco dataset evaluation
[Table: # hosts and cluster data with source aggregation key, 2nd part]

Source aggregation key, cluster 15 (i.e., aggregated by IP inside the ISP)
[Figure: charts of # hosts]
- Spammer or denial of service (?)
- High connectivity to various users, many ports, receiving communication on IRC port, communication through HTTP, high number of packets sent, high number of bytes

Source aggregation key, cluster 21
[Figure: charts of # hosts]
- Bot communicating with C&C server
- High IRC communication + high average packet size
- Confirmed by accessing the IP of the C&C server

Telco dataset evaluation summary
[Table: evaluation summary]

Conclusion
- Network intrusion detection for identifying malicious hosts using flows...
  - ...without having to say how entities misbehave
- Use clustering (unsupervised ML) to reduce the size of the problem and a classifier (supervised ML) to automate classification
- Keep humans in the loop; mandatory with evolving threats
- Detects attacks involving many packets, not low-traffic attacks like buffer overflows or SQL injection

Thanks! Questions?
This work was supported by national funds through Fundação para a Ciência e a Tecnologia (FCT) with reference UID/CEC/50021/2013 (INESC-ID)