Entropy-Based Measurement of IP Address Inflation in the Waledac Botnet

Rhiannon Weaver (1), Chris Nunnery (2), Gautam Singaraju (2), Brent ByungHoon Kang (3)
(1) CERT/SEI, (2) University of North Carolina, (3) George Mason University
January 11, 2011

Introduction

The botnet question: how big is it?
- Size relates to potential threat and adaptability.
- Relative size can help us prioritize mitigation efforts.

Current research thinks about size in two ways (Rajab et al.):
- Count of active individuals at any particular point in time
- Footprint: count of all unique individuals across the entire history

What's an individual?
- We often count and report IP addresses.
- We often want to know the number of machines.
- NAT and DHCP can inflate or deflate our estimates.

What effect does IP vs. machine measurement have on a footprint count?

Title Deconstruction and Roadmap

This research:
- Extends Rajab's footprint count to a distribution that weights individuals by their level of activity
- Introduces a measurement of IP address inflation based on relative entropy of footprint distributions
- Shows how to use relative entropy to discover NAT/DHCP properties of sub-networks, which is useful for prioritizing blacklisting and cleanup efforts
- Presents some results from applying these concepts to data (IP addresses and unique IDs) collected from the Waledac botnet

IP Address Inflation Rate (R)

The effect on a population estimate of counting IP addresses instead of machines:
- R > 1 for a machine moving among a DHCP pool
- R < 1 for several machines using the same NAT address

We can study inflation rates directly in visible botnets (IPs and IDs available).
Network policy information can be transferable to hidden botnets (only IPs are observable).

Inflation Rate of a Footprint Measurement

For a visible botnet, let
- I = set of observed IP addresses
- H = set of observed machines
both cumulative across the recorded active history.

A naive measurement of the footprint inflation rate is simply:

$$ R_N(I, H) = \frac{|I|}{|H|} $$

- Interpretation: breadth and spread
- What is missing? Relative popularity and visibility of IPs and individuals
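As a small illustration of the naive rate, the sketch below counts distinct IP addresses and distinct machine IDs in a list of observations. The function name, the `(ip, machine_id)` record layout, and the sample addresses are assumptions made for this example; they are not taken from the Waledac data.

```python
def naive_inflation_rate(observations):
    """Return |I| / |H| for an iterable of (ip, machine_id) pairs.

    The (ip, machine_id) layout is a hypothetical record format
    chosen for this sketch.
    """
    ips = {ip for ip, _ in observations}
    machines = {machine for _, machine in observations}
    if not machines:
        raise ValueError("need at least one observed machine")
    return len(ips) / len(machines)


# One machine seen at three addresses from a DHCP pool: R_N > 1.
dhcp_like = [("10.0.0.1", "m1"), ("10.0.0.2", "m1"), ("10.0.0.3", "m1")]

# Four machines sharing a single NAT address: R_N < 1.
nat_like = [("192.0.2.7", "m1"), ("192.0.2.7", "m2"),
            ("192.0.2.7", "m3"), ("192.0.2.7", "m4")]

print(naive_inflation_rate(dhcp_like))  # 3.0
print(naive_inflation_rate(nat_like))   # 0.25
```

Because only set sizes are used, an address seen once counts as much as one seen thousands of times.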

An Activity-based Footprint Distribution

An individual j (IP address or machine) is observed over time due to its network activity $a_j$:
- Scan hits
- Number of log-ins to the C&C server
- Number of P2P clients contacted, etc.

For a population J, define the footprint distribution $p_J(j)$:

$$ p_J(j) = \frac{a_j}{\sum_{k \in J} a_k} $$

This distribution weights every individual by its associated activity (temporal or volumetric).

Entropy and Inflation

Shannon entropy $S(p_J)$ of a footprint distribution $p_J$ measures its uniformity:

$$ S(p_J) = -\sum_{j \in J} p_J(j) \ln[p_J(j)] $$

For footprint distributions $p_I$ and $p_H$, we define the entropy-based IP inflation rate $R_E$ as

$$ R_E(p_I, p_H) = \exp[S(p_I) - S(p_H)] $$

Note:
- Maximal (uniform) entropy among N items is equal to $\ln(N)$.
- $R_E = R_N$ when $p_I$ and $p_H$ are uniform, but $R_E$ extends inflation to apply to unequal distributions.
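The definitions of the footprint distribution, Shannon entropy, and the entropy-based rate can be sketched in a few lines of Python. The function names, the `(ip, machine_id, activity)` record layout, and the sample values are assumptions made for illustration; the activity value stands for whatever count (log-ins, scan hits) is being measured.

```python
import math
from collections import Counter


def footprint_distribution(activity):
    """Normalize a {individual: activity} mapping to probabilities."""
    total = sum(activity.values())
    if total <= 0:
        raise ValueError("total activity must be positive")
    # Individuals with zero activity carry no probability mass.
    return {k: a / total for k, a in activity.items() if a > 0}


def shannon_entropy(dist):
    """S(p) = -sum p ln p, using natural logarithms."""
    return -sum(p * math.log(p) for p in dist.values())


def entropy_inflation_rate(records):
    """R_E = exp[S(p_I) - S(p_H)] for (ip, machine_id, activity) records."""
    ip_activity, host_activity = Counter(), Counter()
    for ip, machine, activity in records:
        ip_activity[ip] += activity
        host_activity[machine] += activity
    s_i = shannon_entropy(footprint_distribution(ip_activity))
    s_h = shannon_entropy(footprint_distribution(host_activity))
    return math.exp(s_i - s_h)


# Uniform activity: 4 IPs and 2 machines, so R_E matches R_N = 4 / 2.
uniform = [("ip1", "m1", 5), ("ip2", "m1", 5),
           ("ip3", "m2", 5), ("ip4", "m2", 5)]

# One machine, 4 IPs, but nearly all activity on one address.
skewed = [("ip1", "m1", 97), ("ip2", "m1", 1),
          ("ip3", "m1", 1), ("ip4", "m1", 1)]

print(entropy_inflation_rate(uniform))  # approximately 2.0
print(entropy_inflation_rate(skewed))   # roughly 1.18, well below R_N = 4
```

In the skewed case the naive rate is 4, while the entropy-based rate stays close to 1 because three of the four addresses account for very little of the activity.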

Studying Sub-networks

Connections between IPs and individuals form a graph G that has inflation rate $R_E(G)$.

[Figure: graph of connections between IP addresses and machine IDs; node labels are not recoverable from the transcription]

[Figure: graph of connections between IP addresses and machine IDs, continued; node labels are not recoverable from the transcription]

The Graph Properties of IP Inflation

- $R_E(G_l)$ can be measured for any sub-graph $G_l \subseteq G$ with associated activity $a_l$.
- Equivalence classes are the only partitions of I or H that satisfy the rate-preserving equality, which expresses $R_E(G)$ in terms of the per-class rates $R_E(G_l)$ and the activity shares $a_l / a_L$ [the operator combining the terms over l is not recoverable from the transcription].

Pruning within ASN to find sub-networks

We would like to interpret equivalence classes as independent networks, but they often traverse ASN or even country boundaries.

To obtain a more interpretable set of equivalence classes, create a sub-graph $G_R \subseteq G$:
- Find the modal ASN $M_h$ of each unique individual h.
- Remove from G (set $a_{hi}$ to 0) any edge (h, i) such that $i \notin M_h$.

This restricts strongly connected components in $G_R$ to within-ASN clusters.
The set of removed edges A has weight equal to $R_E(G) / R_E(G_R)$.
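The pruning step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: edges are stored in a hypothetical `{(host, ip): activity}` dictionary, the modal ASN is taken to be the ASN carrying most of a host's activity (the slides do not say whether the mode is weighted by activity), pruned edges are dropped rather than set to zero, and equivalence classes are computed as connected components of the host-IP graph with a small union-find.

```python
from collections import Counter, defaultdict


def prune_to_modal_asn(edges, asn_of_ip):
    """Keep only edges whose IP lies in the host's modal ASN.

    edges: {(host, ip): activity}; asn_of_ip: {ip: asn}.
    Assumption: the mode is activity-weighted; ties go to the ASN seen first.
    """
    asn_activity = defaultdict(Counter)
    for (host, ip), activity in edges.items():
        asn_activity[host][asn_of_ip[ip]] += activity
    modal = {h: c.most_common(1)[0][0] for h, c in asn_activity.items()}
    return {(h, i): a for (h, i), a in edges.items()
            if asn_of_ip[i] == modal[h]}


def equivalence_classes(edges):
    """Connected components of the bipartite host-IP graph (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for host, ip in edges:
        root_h, root_i = find(("host", host)), find(("ip", ip))
        if root_h != root_i:
            parent[root_h] = root_i

    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    return list(groups.values())


# Hypothetical data: m1 is mostly seen in AS1 but once in AS2,
# which links it to m2's network before pruning.
edges = {("m1", "a1"): 10, ("m1", "a2"): 8, ("m1", "b1"): 1, ("m2", "b1"): 20}
asn_of_ip = {"a1": "AS1", "a2": "AS1", "b1": "AS2"}

pruned = prune_to_modal_asn(edges, asn_of_ip)
print(len(equivalence_classes(edges)))   # 1
print(len(equivalence_classes(pruned)))  # 2
```

Removing the single cross-ASN edge splits one class that spans two ASNs into two within-ASN classes.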

Application: Waledac Logs (12/04-22/2009)

[Figure: Waledac architecture. Tiers labeled UTS, TSL, Repeater, and Spammer; UTS and TSL are marked as botmaster-owned infrastructure, Repeater and Spammer as infected hosts.]

Used SiLK to analyze 44 million log files over 3 different graphs:

Graph   |I|      |H|      %a_l   R_N    R_E
G       667033   172283   1.00   3.87   4.56
G_L
G_R

Removing Aliases to obtain G_L

[Figure: histogram of nonzero mobility scores. x-axis: Nonzero Mobility Score (log scale, 1e-09 to 1e+05); y-axis: Probability (0.00 to 0.10).]

Graph   |I|      |H|      %a_l   R_N    R_E
G       667033   172283   1.00   3.87   4.56
G_L     548997   172238   0.92   3.18   2.27
G_R

Pruning within ASN to obtain G_R

Graph   |I|      |H|      %a_l   R_N    R_E
G       667033   172283   1.00   3.87   4.56
G_L     548997   172238   0.92   3.18   2.27
G_R     475665   172238   0.86   2.76   2.00

Equivalence Classes in G_R

[Figure: scatter plot of equivalence classes. x-axis: effective number of hashes, exp[S(p_H)] (log scale, 1 to 256); y-axis: effective number of IPs, exp[S(p_I)] (log scale, 1 to 2048); classes A, B, C, and D are labeled.]

A Tale of Four Networks

Graph   |I|    |H|   a_l      R_N     R_E
A       6789   438   317435   15.50   9.08
B       145    533   119684   0.27    0.89
C       5      5     296      1.00    0.45
D       16     16    1746     1.00    6.06

[Figure: network A, ranked footprint distributions for IP addresses and machine IDs; x-axis ranks run from 1st to 438th (machine IDs) and 6789th (IP addresses).]

A Tale of Four Networks (B)

[Figure: network B, ranked footprint distributions for IP addresses and machine IDs; x-axis ranks run from 1st to 145th (IP addresses) and 533rd (machine IDs).]

A Tale of Four Networks (C)

[Figure: network C, ranked footprint distributions for IP addresses and machine IDs; x-axis ranks run from 1st to 5th.]

A Tale of Four Networks (D)

[Figure: network D, ranked footprint distributions for IP addresses and machine IDs; x-axis ranks run from 1st to 16th.]

Summary and Future Work

With this method and data, we are trying to answer a larger question: can we learn about individuals in a hidden botnet by studying a visible one?
- Find specific static regions of NAT or DHCP pools across the world and transfer this information to hidden botnets
- Create a tool/method that adjusts raw IP address counts for network structure
- Learn how to find a set of most likely equivalence classes when only IPs are visible

We are currently looking into learning about Conficker from this study of Waledac.

Extra Slides

Subversive uses of SiLK

- Each hash (e.g., 55530ea22bfee564631490025e) is assigned a unique integer ID (e.g., 10345).
- Each hash is marked as Repeater (R) or Spammer (S) level.
- Each login is stored as a SiLK record using rwtuc:

    sip           dip    stime                tcpflags
    111.222.33.4  10345  2009/12/20T00:14:12  S
    222.33.44.5   10345  2009/12/22T00:03:55  S
    ...

    rwtuc UTS-formatted.txt --output-path=UTSlogs.rw

Subversive uses of SiLK (continued)

- Inter-ASN network created with a tuple file:

    sip            dip
    111.222.33.4   25667
    223.156.255.4  25667
    ...

    rwfilter UTSlogs.rw --tuple-file=edgestoremove.txt \
        --pass=interasnlogs.rw --fail=intraasnlogs.rw

- Equivalence class IDs and ASNs stored as P-maps:

    rwfilter UTSlogs.rw --pmap-file=eqclass:eqclasses.pmap \
        --pmap-src=eq2100 --pass=stdout \
      | rwstats --sip --threshold=1 > EQ2100-IP-distribution.txt

- Summary tables created using rwuniq:

    rwuniq intraasnlogs.rw --pmap-file=eqclass:eqclasses.pmap \
        --pmap-file=asn:asns.pmap --fields=src-eqclass,src-asn \
        --flows --sip-distinct --dip-distinct --stime

    src-eqclass  src-asn                     Records  stime-earliest       sip-distin  dip-distin
    EQ0          "AS5089 NTL Group Limited"  596      2009/12/12T21:14:45  1           1
    EQ1          "AS4766 Korea Telecom"      45       2009/12/05T10:41:33  1           1
    EQ3          "AS1221 Telstra Pty Ltd"    55       2009/12/08T04:43:00  10          1
    EQ4          "AS17858 KRNIC"             628      2009/12/04T12:42:34  2           1