Multidimensional Aggregation for DNS monitoring

Similar documents
MAD 12 Monitoring the Dynamics of Network Traffic by Recursive Multi-dimensional Aggregation. Midori Kato, Kenjiro Cho, Michio Honda, Hideyuki Tokuda

Outline. Motivation. Our System. Conclusion

Identifying Anomalous Traffic Using Delta Traffic. Tsuyoshi KONDOH and Keisuke ISHIBASHI Information Sharing Platform Labs. NTT

DNSSM: A Large Scale Passive DNS Security Monitoring Framework

Classification of Log Files with Limited Labeled Data

Connection Logging. Introduction to Connection Logging

Botnets Behavioral Patterns in the Network

Detecting Malicious Hosts Using Traffic Flows

A Two-Layered Anomaly Detection Technique based on Multi-modal Flow Behavior Models

Connection Logging. About Connection Logging

ANOMALY DETECTION USING HOLT-WINTERS FORECAST MODEL

Concept: Traffic Flow. Prof. Anja Feldmann, Ph.D. Dr. Steve Uhlig

Network Anomaly Detection Using Autonomous System Flow Aggregates

Detecting Anomalies in Netflow Record Time Series by Using a Kernel Function

Detecting malware even when it is encrypted

Detecting malware even when it is encrypted

Intrusion Detection by Combining and Clustering Diverse Monitor Data

Configuring the Botnet Traffic Filter

Packet Classification Using Dynamically Generated Decision Trees

PreSTA: Preventing Malicious Behavior Using Spatio-Temporal Reputation. Andrew G. West November 4, 2009 ONR-MURI Presentation

Measurement: Techniques, Strategies, and Pitfalls. David Andersen CMU

Naming in Distributed Systems

Paloalto Networks PCNSA EXAM

Distributed Denial of Service

Network Heartbeat Traffic Characterization. Mackenzie Haffey Martin Arlitt Carey Williamson Department of Computer Science University of Calgary

PoP Level Mapping And Peering Deals

PhD-FSTC The Faculty of Sciences, Technology and Communication DISSERTATION. Defense held on 30/03/2012 in Luxembourg. to obtain the degree of

Assessment of Security Threats via Network Topology Analysis: An Initial Investigation

Detecting Spammers with SNARE: Spatio-temporal Network-level Automatic Reputation Engine

How to Configure DNS Sinkholing in the Firewall

Trisul Network Analytics - Traffic Analyzer

CAMNEP: Multistage Collective Network Behavior Analysis System with Hardware Accelerated NetFlow Probes

OSSIM Fast Guide

Request for Proposal (RFP) for Supply and Implementation of Firewall for Internet Access (RFP Ref )

Mapping Internet Sensors with Probe Response Attacks

Intelligent Network Management Using Graph Differential Anomaly Visualization Qi Liao

Internet Security: Firewall

Basic Concepts in Intrusion Detection

Chapter 8 roadmap. Network Security

Design of a Stream-based IP Flow Record Query Language

High-Dimensional Incremental Divisive Clustering under Population Drift

Network layer: Overview. Network layer functions IP Routing and forwarding NAT ARP IPv6 Routing

CMPE 150/L : Introduction to Computer Networks. Chen Qian Computer Engineering UCSC Baskin Engineering Lecture 12

Key Technologies for Security Operations. Copyright 2014 EMC Corporation. All rights reserved.

KNOM Tutorial Internet Traffic Matrix Measurement and Analysis. Sue Bok Moon Dept. of Computer Science

Network layer: Overview. Network Layer Functions

Data Mining for Improving Intrusion Detection

Network Security: Firewall, VPN, IDS/IPS, SIEM

The Future of Threat Prevention

CS 43: Computer Networks. 21: The Network Layer & IP November 7, 2018

CS118 Discussion, Week 6. Taqi

Accelerating OpenFlow SDN Switches with Per-Port Cache

Overview of nicter - R&D project against Cyber Attacks in Japan -

Distributed Systems. 27. Firewalls and Virtual Private Networks Paul Krzyzanowski. Rutgers University. Fall 2013

Basics of communication. Grundlagen der Rechnernetze Introduction 31

Mapping Internet Sensors with Probe Response Attacks

SentinelOne Technical Brief

Distributed Systems. 29. Firewalls. Paul Krzyzanowski. Rutgers University. Fall 2015

Multi-phase IRC Botnet & Botnet Behavior Detection Model

IPv6 Classification. PacketShaper 11.8

Configuring Antivirus Devices

ECEN 689 Special Topics in Data Science for Communications Networks

Improved Detection of Low-Profile Probes and Denial-of-Service Attacks*

Threat Detection and Mitigation for IoT Systems using Self Learning Networks (SLN)

CYBER ANALYTICS. Architecture Overview. Technical Brief. May 2016 novetta.com 2016, Novetta

Managing Latency in IPS Networks

Version 5.3 Rev A Student Guide

Behavior Based Malware Analysis: A Perspective From Network Traces and Program Run-Time Structure

Detection of DNS Traffic Anomalies in Large Networks

Listening to the Network: Leveraging Network Flow Telemetry for Security Applications Darren Anstee EMEA Solutions Architect

intelop Stealth IPS false Positive

The following topics describe how to configure correlation policies and rules.

Analyzing Cacheable Traffic in ISP Access Networks for Micro CDN applications via Content-Centric Networking

McAfee Network Security Platform Administration Course

Module 4: Index Structures Lecture 13: Index structure. The Lecture Contains: Index structure. Binary search tree (BST) B-tree. B+-tree.

CS 5114 Network Programming Languages Data Plane. Nate Foster Cornell University Spring 2013

Improving the Performance of OLAP Queries Using Families of Statistics Trees

Configuring the Botnet Traffic Filter

Traffic in Network /8. Background. Initial Experience. Geoff Huston George Michaelson APNIC R&D. April 2010

W is a Firewall. Internet Security: Firewall. W a Firewall can Do. firewall = wall to protect against fire propagation

Computer Networks Security: intro. CS Computer Systems Security

Emerging Threat Intelligence using IDS/IPS. Chris Arman Kiloyan

GPU ACCELERATED SELF-JOIN FOR THE DISTANCE SIMILARITY METRIC

Towards Traffic Anomaly Detection via Reinforcement Learning and Data Flow

Spam Mitigation using Spatio temporal Reputations from Blacklist History*

Device Management Basics

Classification and Regression Trees

Advances in Data Management Principles of Database Systems - 2 A.Poulovassilis

Technical Report CIDDS-002 data set

Intrusion Detection and Malware Analysis

Network Layer: Control/data plane, addressing, routers

Power Attack Defense: Securing Battery-Backed Data Centers

Access Control. Access Control Overview. Access Control Rules and the Default Action

Chapter 8. Virtual Memory

Multidimensional Indexes [14]

Port Mirroring in CounterACT. CounterACT Technical Note

Automated Threat Management - in Real Time. Vectra Networks

SD-WAN Deployment Guide (CVD)

Stager. A Web Based Application for Presenting Network Statistics. Arne Øslebø

Device Management Basics

Transcription:

Multidimensional Aggregation for DNS monitoring Jérôme François, Lautaro Dolberg, Thomas Engel jerome.francois@inria.fr 03/11/15

2 1 Motivation 2 Aggregation 3 MAM 4 DNS applications 5 DNS monitoring 6 Results 7 Going further 8 Conclusion Outline

3 1 Motivation 2 Aggregation 3 MAM 4 DNS applications 5 DNS monitoring 6 Results 7 Going further 8 Conclusion Outline

4 DNS monitoring DNS traffic reflects host activities and behaviors Internet threats growing: Phishing, Malware, botnet, Spoofed Domains, data ex-filtration, etc. Identify malicious domains behavior by assessing associations between names and IP subnets (and how this evolves) Passive DNS analysis: easy to collect, reflect user activities without tracking individually them from all collected DNS answers collected over multiple weeks, is it possible to detect divergent behaviors?

5 1 Motivation 2 Aggregation 3 MAM 4 DNS applications 5 DNS monitoring 6 Results 7 Going further 8 Conclusion Outline

6 Spatio temporal aggregation: State of the art Aguri QofIS 2001: subnetwork prefix based aggreagation Danak NSS 2011: Aguri applied to anomaly detection TreeTop Usenix Sec 2010: DNS domain based aggregation

Aggregation Scalable way to represent information Outline relevant correlated facts reduce storage needs and post processing time Temporal and Spatial aggregation Aggregation temporal: time windows split (β) spatial: keep nodes with activity > α e.g. traffic volume, aggregate the others into their parents needs hierarchical relationships Heterogeneous Data No specific order 1st Source IP@, 2nd Destination IP@ Auto adjust to Information Granularity /18 /24 /27 subnetworks... 7

8 Mutidimensional Aggregation Example PORT PROTO KB TIME SOURCE DEST 80 TCP 1491 2010 02 24 0 2 : 2 0 : 1 5 1 9 2. 1 6 8. 6. 2 9 2. 2 5 0. 2 2 1. 8 2 110 TCP 988 2010 02 24 0 2 : 2 0 : 1 9 1 9 2. 1 6 8. 8. 2 9 2. 2 5 0. 2 2 3. 8 7 443 TCP 902 2010 02 24 0 2 : 2 0 : 2 7 1 9 2. 1 6 8. 1 1. 2 9 2. 2 5 0. 2 2 0. 8 2 110 TCP 1513 2010 02 24 0 2 : 2 0 : 2 9 1 9 2. 1 6 8. 1 1 2. 1 9 2. 2 5 0. 2 2 2. 8 1 80 TCP 1205 2010 02 24 0 2 : 2 0 : 2 9 1 9 2. 1 6 8. 1 1. 1 9 2. 2 5 0. 2 2 0. 8 2 80 TCP 1491 2010 02 24 0 2 : 2 0 : 3 1 1 9 2. 1 6 8. 1. 2 9 2. 2 5 0. 2 2 0. 8 3 110 TCP 1467 2010 02 24 0 2 : 2 0 : 3 9 1 9 2. 1 6 8. 1 2. 2 9 2. 2 5 0. 2 2 1. 8 1 80 TCP 927 2010 02 24 0 2 : 2 0 : 3 9 1 9 2. 1 6 8. 1 2. 2 9 2. 2 5 0. 2 2 0. 8 2 443 TCP 1294 2010 02 24 0 2 : 2 0 : 3 9 1 9 2. 1 6 8. 1 1. 1 9 2. 2 5 0. 2 2 3. 8 2 110 TCP 940 2010 02 24 0 2 : 2 0 : 4 9 1 9 2. 1 6 8. 2 1. 2 9 2. 2 5 0. 2 2 1. 8 1 80 TCP 917 2010 02 24 0 2 : 2 0 : 4 9 1 9 2. 1 6 8. 2 3. 1 9 2. 2 5 0. 2 2 0. 8 2 443 TCP 460 2010 02 24 0 2 : 2 0 : 5 9 1 9 2. 1 6 8. 2 6. 2 9 2. 2 5 0. 2 2 0. 8 5

9 Mutidimensional Aggregation Example app:mail src_ip: next_bit(17,32) dst_ip: next_bit(17,32) app: ROOT src_ip: 192.168.0.0/17 dst_ip: 92.250.220.0/22 6.91% 100.00% app:same... app: $.v3.pop.get.mail.root app: $.v3.pop.get.mail.root app: ROOT app: HTTP.Web.ROOT app: $.Secure.HTTP.Web.ROOT src_ip: 192.168.12.2/32 src_ip: 192.168.112.1/32 src_ip: 192.168.0.0/20 src_ip: 192.168.0.0/19 src_ip: 192.168.11.1/32 dst_ip: 92.250.221.81/32 dst_ip: 92.250.222.81/32 dst_ip: 92.250.220.0/22 dst_ip: 92.250.220.80/29 dst_ip: 92.250.223.82/32 10.79% 10.79% 11.13% 11.13% 13.90% 24.87% 10.13% 36.78% 9.52% 9.52% Destination port Source IP Destination IP app: $.HTTP.Web.ROOT src_ip: 192.168.6.2/32 dst_ip: 92.250.221.82/32 app: $.HTTP.Web.ROOT src_ip: 192.168.1.2/32 dst_ip: 92.250.220.83/32 app: $.HTTP.Web.ROOT src_ip: 192.168.8.0/21 dst_ip: 92.250.220.82/32 10.97% 10.97% 10.97% 10.97% 15.68% 15.68%

10 1 Motivation 2 Aggregation 3 MAM 4 DNS applications 5 DNS monitoring 6 Results 7 Going further 8 Conclusion Outline

Data processing cycle User Input Data 101 0111 110 1001 110 1011 110 1001 111 0000 110 0101 110 0100 110 1001 110 0001 Data Model Aggregated Tree MAM Data Parsing Node creation Node insertion Aggregation Nodes constructed based on input data and continuously included in the tree Aggregation: at the final step vs. when the tree size is too large 11

12 Underlying Data Model Data Model

13 Data Structure Tree based structure: Root node and multiple children Directions How to find the right path to insert a node within a tree? Every hierarchical data can be implemented (MaM can be easily extended) common ancestor between two nodes direction function IP@ binary function (0,1) as next bit value DNS: every level name is a direction ports: service taxonomy

14 Node Insertion (Branching Point) New Node Data Structure

14 Node Insertion (Branching Point) New Node Data Structure

14 Node Insertion (Branching Point) Data Structure New Node

15 Optimization Aggregation From leafs to root node On a complete tree of a time window Large data structures in memory before aggregation Online Strategies (before the end of the time window) Tree size > MAX NODES aggregation Root LRU Aggregation is triggered Aggregation is triggered in from root node the least recently used node RAM + + Performance - - -

16 1 Motivation 2 Aggregation 3 MAM 4 DNS applications 5 DNS monitoring 6 Results 7 Going further 8 Conclusion Outline

17 Applications Output of MaM = sequence of trees monitoring the network using these trees trees are well known data structure distance metrics, kernel functions, homomorphisms,... manual vs automated analysis visual inspection

18 User inputs Data + parsing function List of attributes to extract + dimensions (definition of dimensions if not supported by default) parameters: aggregation threshold (α), time window size (β), max nodes (2000), strategy (LRU) monitoring the network using these trees

19 1 Motivation 2 Aggregation 3 MAM 4 DNS applications 5 DNS monitoring 6 Results 7 Going further 8 Conclusion Outline

20 Contributions Malicious domains names are usually changing IP association. How can this be exploited? Large Scale Aggregation: DNS and IP addresses, into single data structure. Steadiness Metrics: Formal measure of DNS and Subnetwork address association over time. Metric Validation: Long term experiments using Passive DNS Database.

21 Data sample

22 With MAM is possible to generate aggregated views combining multiple dimensions at the same time. Hierarchically derived from data model Provides different levels of granularity Accelerates Post processing dns:.root ip: 0.0.0.0/0 0.00% 100.00% DNS IP dns: $.betterblock.org..root ip: 173.201.208.1/32 33.33% 33.33% dns: $.blockbuster.org..root ip: 72.233.2.58/32 33.33% 33.33% dns: $.balconytv.com..root ip: 75.101.145.87/32 33.33% 33.33% Accumulated activity Node activity

23 Experiments & Data set The objectives of the experiments are: Discriminate between malicious and normal domains Attack detection ability Performance decay

23 Experiments & Data set The objectives of the experiments are: Discriminate between malicious and normal domains Attack detection ability Performance decay Passive DNS + Blacklist

24 Monitoring Logs to Time Series of Trees An aggregation process outputs a series of trees Monitoring aggregated series of trees i.e T 1... T m Metrics correlate IP subnets Domain names Volume of Traffic

24 Monitoring Logs to Time Series of Trees An aggregation process outputs a series of trees Monitoring aggregated series of trees i.e T 1... T m Metrics correlate IP subnets Domain names Volume of Traffic

25 Metrics I Tree comparison, how to establish a similarity criteria?

25 Metrics I Tree comparison, how to establish a similarity criteria? sim(n1, n2) = α IP sim(n1, n2) + β DNS sim(n1, n2) + γ vol sim(n1, n2)

25 Metrics I Tree comparison, how to establish a similarity criteria? sim(n1, n2) = α IP sim(n1, n2) + β DNS sim(n1, n2) + γ vol sim(n1, n2) IP sim(n1, n2) = 1 n1 prefix len n2 prefix len 32

25 Metrics I Tree comparison, how to establish a similarity criteria? sim(n1, n2) = α IP sim(n1, n2) + β DNS sim(n1, n2) + γ vol sim(n1, n2) IP sim(n1, n2) = 1 n1 prefix len n2 prefix len 32 DNS sim(n1, n2) = n1 dns n2.dns n1 dns n2 dns

25 Metrics I Tree comparison, how to establish a similarity criteria? sim(n1, n2) = α IP sim(n1, n2) + β DNS sim(n1, n2) + γ vol sim(n1, n2) IP sim(n1, n2) = 1 n1 prefix len n2 prefix len 32 DNS sim(n1, n2) = n1 dns n2.dns n1 dns n2 dns vol sim(n1, n2) = 1 0.01 n1 acc vol n2 acc vol

26 Metrics II Two goals at different levels 1. Detecting the presence of an anomaly in the traffic: sim metric is between two nodes maximise this metric for each node n1 T i, n2 T i 1, n2 = most sim(n1) stead(n1) = sim(n1, n2) + µ stead(n2) n T pers(t i ) = i stead(n) {n T i } 2. Identifying the anomaly, i.e. the domains and IP addresses look for nodes with the smallest stead values (1)

27 Experiments Aggregation Window Time Length Macro: Up to 52 weeks Micro: 10 weeks maximum Malicious data Time: Periodically, Steady Proportion Aggregation Granularity

28 1 Motivation 2 Aggregation 3 MAM 4 DNS applications 5 DNS monitoring 6 Results 7 Going further 8 Conclusion Outline

29 Results Malicious domains causes a drop on average steadiness: Micro

30 Results Malicious domains causes a drop on average steadiness: Macro

31 Results II Accuracy: Steadiness as metric for filtering malicious domains

32 1 Motivation 2 Aggregation 3 MAM 4 DNS applications 5 DNS monitoring 6 Results 7 Going further 8 Conclusion Outline

33 MAM extensions define any hierarchical dimension successfully applied to different domains: vehicular networks, Netflow monitoring again MAM is only producing trees = aggregation metrics / feature engineering methods / machine learning but data to handle are squeezed to a smaller scale

34 Performances Number of nodes main performance parameter when computing metrics depends on the aggregation threshold (α) = minimum of activity to not be aggregated DNS monitoring α = 2% avg. = 2200 nodes / weekly tree 13000 IP addresses / week 5300 domain names / week

35 Other use case Dataset from major ISP in Luxembourg Capture: 26 Days, 60,000 flows/sec at peak hours IP Address: 279815 unique IP addesses using 64470 different UDP and TCP Ports Extracting: Timestamp, IP Source and Destination Addresses, TCP/UDP source and destination ports, traffic Volume in bytes Anomaly detection Raw output Visually enhanced output Automated analysis

Trees as text with indentation Raw output [src_ip-->0.0.0.0/0 dst_ip-->0.0.0.0/0 ] 92 (0.19% / 100.00%) [src_ip-->0.0.0.0/1 dst_ip-->0.0.0.0/1 ] 3104 (6.34% / 19.30%) [src_ip-->32.0.0.0/3 dst_ip-->96.0.0.0/3 ] 3868 (7.91% / 12.95%) [src_ip-->43.160.0.0/11 dst_ip-->120.194.118.20/32 ] 2470 (5.05% / 5.05%) [src_ip-->97.254.47.254/32 dst_ip-->138.146.47.197/32 ] 3581 (7.32% / 7.32%) [src_ip-->128.0.0.0/1 dst_ip-->0.0.0.0/1 ] 4182 (8.55% / 47.08%) [src_ip-->128.0.0.0/3 dst_ip-->97.254.0.0/16 ] 3734 (7.63% / 19.32%) [src_ip-->128.0.0.0/4 dst_ip-->97.254.64.0/18 ] 3012 (6.16% / 6.16%) 5.53%) 7.03%) [src_ip-->137.57.71.255/32 dst_ip-->97.254.131.93/32 ] 2706 (5.53% / [src_ip-->128.0.0.0/2 dst_ip-->0.0.0.0/1 ] 3223 (6.59% / 19.22%) [src_ip-->135.251.160.3/32 dst_ip-->97.254.23.33/32 ] 3438 (7.03% / [src_ip-->128.0.0.0/5 dst_ip-->97.254.128.0/21 ] 2740 (5.60% / 5.60%) [src_ip-->0.0.0.0/0 dst_ip-->0.0.0.0/1 ] 2504 (5.12% / 26.11%) [src_ip-->138.146.47.197/32 dst_ip-->97.254.47.254/32 ] 7030 (14.37% / 14.37%) 36

37 pictures (integrated in GUI) improvement Visually enhanced output node size: importance of the represented attributes (feature space usage) node color: instability of the represented attributes ( new events) needs to be user-defined semantics can be freely chosen

38 1 Motivation 2 Aggregation 3 MAM 4 DNS applications 5 DNS monitoring 6 Results 7 Going further 8 Conclusion Outline

MaM Conclusion Scalable aggregation of heterogeneous data Easily extensible to new features (geolocated IP flows, vehicular networks DNS monitoring MaM only performs aggregation Needs to define: hierarchical order, metrics and methods to analyze References General description + theoretical foundations + network traffic monitoring Dolberg L., François J., Engel T., Efficient Multidimensional Aggregation for Large Scale Monitoring, USENIX LISA 2012 DNS trafic monitoring Dolberg L., François J., Engel T., Multi-dimensional Aggregation for DNS Monitoring, to appear in IEEE LCN 2013 39

Multidimensional Aggregation for DNS monitoring Jérôme François, Lautaro Dolberg, Thomas Engel jerome.francois@inria.fr 03/11/15