A Taxonomy of Botnet Structures

Martin Lyckander martily 08/04/2016

About the paper
- David Dagon, Guofei Gu, Christopher P. Lee, Wenke Lee
- Georgia Institute of Technology
- Published in 2007

What is a botnet?
- Hosts under control of a third party
- Infection vectors vary
- Can be self-propagating
- Different botnets use different means of communication
- Various capabilities:
  - Spam
  - DDoS
  - Keylogging / data exfiltration
  - Scanning / brute-forcing
  - Click fraud
- A bot leaves the botnet for one of two categories of reasons:
  - Random failures
  - Targeted responses
- Botnet topology can be seen as a network graph

The botmaster

The need for a taxonomy
- Botnets are diverse
- Size may vary greatly
- The threat of a botnet is not only about the number of infected hosts
  - High-speed internet vs ADSL
  - Uptime of nodes in the botnet
- Determine the potential of the botnet being analysed

Purpose of a taxonomy
- (a) Assist the defender in identifying possible types of botnets
- (b) Describe key properties of botnet classes, so researchers may focus their efforts on beneficial response technologies
- A method that takes down one type of botnet is not necessarily as effective against other types

Metrics
- Effectiveness
- Robustness
- Efficiency

Effectiveness
- Measure of overall utility to the botmaster
- Size (the giant component, S) and bandwidth
- The giant component is the largest online, connected portion of bots reachable by the botmaster
  - In a DDoS: the largest number of bots that can receive and execute commands
- Botnets are diurnal, which affects available bandwidth
- Bandwidth is often related to link speed
  - This is probably a lesser factor today in some parts of the world than when the paper was written
  - Home routers in botnets: http://www.securityweek.com/large-ddos-botnet-powered-routersinfected- spike -malware
  - In the future: IoT, cellphones
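To make the giant component concrete, here is a minimal sketch that counts the largest connected group of online bots in a graph. The function name, the adjacency-dict representation and the toy topology are assumptions made for illustration; they are not taken from the paper.

```python
from collections import deque

def giant_component_size(adjacency, online):
    """Size of the largest connected group of online nodes.

    adjacency: dict mapping node -> set of neighbour nodes (undirected graph).
    online: set of nodes that are currently up (hypothetical input).
    """
    seen = set()
    best = 0
    for start in online:
        if start in seen:
            continue
        # Breadth-first search restricted to online nodes.
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            node = queue.popleft()
            size += 1
            for nb in adjacency.get(node, ()):
                if nb in online and nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        best = max(best, size)
    return best

# Toy topology: a chain 0-1-2-3 plus a separate pair 4-5.
toy = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}, 4: {5}, 5: {4}}
print(giant_component_size(toy, {0, 1, 2, 3, 4, 5}))  # 4
print(giant_component_size(toy, {0, 1, 3, 4, 5}))     # 2, node 2 is offline
```

Taking node 2 offline splits the chain, which is why S can drop faster than the raw number of infected hosts.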

Effectiveness cont.
- Average available bandwidth from a bot: B
- Complex problem for a single link; for botnets, even harder
- B is the average cumulative bandwidth available to the botmaster under ideal circumstances
- The paper classifies bots based on link speed
  - Modem (type 1)
  - DSL/cable (type 2)
  - High-speed internet (type 3)
- For each class: P = probability of a bot belonging to the class, M = maximum network bandwidth, A = network bandwidth, W = probability of a bot being online
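The slide lists the symbols but not how they combine. The sketch below assumes B is the sum over the three link classes of (M - A) * P * W, reading A as the bandwidth already consumed by the host's normal traffic. Both the formula and every number in the example are assumptions for illustration, not values from the paper.

```python
def average_available_bandwidth(classes):
    """Assumed form: B = sum over classes of (M - A) * P * W.

    Each class is a dict with:
      M: maximum network bandwidth of the link class
      A: bandwidth assumed to be in normal use (interpretation, see lead-in)
      P: probability that a bot belongs to this class
      W: probability that a bot of this class is online
    """
    return sum((c["M"] - c["A"]) * c["P"] * c["W"] for c in classes)

# Hypothetical values in Kbps, chosen only to exercise the function.
example_classes = [
    {"M": 56, "A": 6, "P": 0.2, "W": 0.1},          # type 1: modem
    {"M": 1_000, "A": 200, "P": 0.5, "W": 0.25},    # type 2: DSL/cable
    {"M": 10_000, "A": 2_000, "P": 0.3, "W": 1.0},  # type 3: high speed
]
print(average_available_bandwidth(example_classes))
```

In this made-up example the high-speed class dominates the total even though it holds fewer bots than the DSL/cable class.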

Efficiency
- Communication in the botnet: C&C messages, updates or data exfiltration
- Network diameter
  - The geodesic length between nodes
  - Degrees of separation
  - Six degrees of separation: l = 6
- The inverse, l-inv, is used in the taxonomy
  - l is the average length of the shortest path connecting two nodes
  - If l-inv is small, the communication can be classified as slow
  - l-inv = 0: no connection
  - l-inv = 1: fully connected
  - d(v,w) = distance between nodes v and w
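A small sketch of how l-inv could be computed, assuming it is the average of 1/d(v,w) over all ordered pairs of distinct nodes, with unreachable pairs contributing 0. That reading matches the two boundary cases on the slide (0 for no connection, 1 for fully connected); the function names and graph representation are illustrative.

```python
from collections import deque

def bfs_distances(adjacency, source):
    """Hop counts from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in adjacency[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist

def inverse_geodesic_length(adjacency):
    """Average of 1/d(v,w) over ordered pairs v != w (assumed definition)."""
    n = len(adjacency)
    if n < 2:
        return 0.0
    total = 0.0
    for v in adjacency:
        dist = bfs_distances(adjacency, v)
        # Unreachable nodes are absent from dist and so add nothing.
        total += sum(1.0 / d for w, d in dist.items() if w != v)
    return total / (n * (n - 1))

mesh = {v: {w for w in range(4) if w != v} for v in range(4)}
isolated = {v: set() for v in range(4)}
chain = {0: {1}, 1: {0, 2}, 2: {1}}
print(inverse_geodesic_length(mesh))      # 1.0
print(inverse_geodesic_length(isolated))  # 0.0
print(inverse_geodesic_length(chain))
```

Using the inverse avoids infinite distances when the graph is disconnected, which is the usual state of a botnet with offline members.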

Efficiency cont.
- Distance does not refer to the physical connections between the nodes
  - One physical hop (LAN) between two hosts could be several hops in the botnet
- The topology is defined by the botmaster
- The ideal network diameter is l-inv = 1

Robustness
- The network diameter (l-inv) is also relevant for robustness
  - High connectivity between bots means high fault tolerance
- Bots are added to and removed from the botnet constantly
- Instead of only using the network diameter, local transitivity can be used to measure redundancy
  - Given three nodes u, v, w with the existing pairs {u, w} and {u, v}, local transitivity measures the likelihood of v and w also being connected
- Clustering coefficient: the average degree of local transitivity, γ (gamma)
  - Ev is the number of edges between the neighbours of node v
  - Kv is the number of neighbours of node v

Robustness cont.
- The three nodes u, v, w form a triad
- γ measures the number of triads divided by the maximal number of triads
- γ = 1 means that the botnet topology is a complete mesh
- Local transitivity is important for some types of botnets
  - Warez
  - Key/password cracking
  - Brute-forcing
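The clustering coefficient can be sketched as follows, assuming the standard definition: for each node, the number of links among its neighbours divided by the maximum possible number of such links, averaged over all nodes. Nodes with fewer than two neighbours are counted as 0 here, which is a convention chosen for this sketch.

```python
from itertools import combinations

def local_transitivity(adjacency, v):
    """Fraction of neighbour pairs of v that are themselves linked."""
    neighbours = adjacency[v]
    k = len(neighbours)
    if k < 2:
        return 0.0  # convention for this sketch: no pair, no triad
    links = sum(1 for a, b in combinations(neighbours, 2) if b in adjacency[a])
    return links / (k * (k - 1) / 2)

def clustering_coefficient(adjacency):
    """Average local transitivity over all nodes."""
    return sum(local_transitivity(adjacency, v) for v in adjacency) / len(adjacency)

complete_mesh = {v: {w for w in range(4) if w != v} for v in range(4)}
star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
print(clustering_coefficient(complete_mesh))  # 1.0
print(clustering_coefficient(star))           # 0.0
```

The star shows why a hub topology scores poorly on this metric: the leaves share a hub but have no links among themselves.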

Botnet network models

Erdős Rényi Random Graph Models
- Botnet structured as a random graph
- Each node is connected with equal probability to each of the other N-1 nodes
  - This means that a bot must know the addresses of all other bots to potentially create an edge
- Botmasters limit the maximum number of connections for their hosts
- Random graphs require some central logging of nodes in the network
  - Easy for honeypot operators to discover infections
- A challenge for botnets distributed through scanning/spam
  - The first bot in an infection chain does not get information about subsequent infections
  - Scanning for active bots is a possibility
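For reference, a minimal sketch of a random graph generator, assuming the common G(n, p) formulation in which every pair of nodes is linked independently with probability p. The function name and parameters are illustrative, and the loop has the global view of all nodes that the slide notes a real bot would need.

```python
import random
from itertools import combinations

def random_graph(n, p, seed=0):
    """Undirected G(n, p) graph as a dict of neighbour sets."""
    rng = random.Random(seed)  # seeded so runs are repeatable
    adjacency = {v: set() for v in range(n)}
    for u, v in combinations(range(n), 2):
        if rng.random() < p:
            adjacency[u].add(v)
            adjacency[v].add(u)
    return adjacency

graph = random_graph(n=1000, p=0.0055)
average_degree = sum(len(nb) for nb in graph.values()) / len(graph)
print(average_degree)  # expected value is p * (n - 1), about 5.5 here
```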

Erdős Rényi Random Graph Models

Watts-Strogatz Small World Models
- Network is created in a ring
- Each node has a probability of being connected to nodes on the opposite side of the ring
- During spreading in a self-propagating botnet:
  - A new infection can receive a list of previously infected victims
  - When an infected host passes the list of victims along to new infections, it appends its own address
  - The number of addresses in the list is typically limited to hinder security researchers
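The ring-plus-shortcuts idea can be sketched as below. Note the simplification: the original Watts-Strogatz procedure rewires existing ring edges, while this sketch only adds shortcut edges (often called the Newman-Watts variant). All names and parameters are assumptions for illustration, and n is assumed to be larger than 2 * k.

```python
import random

def small_world(n, k, p, seed=0):
    """Ring of n nodes, each linked to its k nearest neighbours per side.

    With probability p, each node also gets one shortcut to a random
    node it is not yet linked to (shortcut variant, not rewiring).
    """
    rng = random.Random(seed)
    adjacency = {v: set() for v in range(n)}
    for v in range(n):
        for step in range(1, k + 1):
            w = (v + step) % n
            adjacency[v].add(w)
            adjacency[w].add(v)
    for v in range(n):
        if rng.random() < p:
            candidates = [w for w in range(n) if w != v and w not in adjacency[v]]
            if candidates:
                w = rng.choice(candidates)
                adjacency[v].add(w)
                adjacency[w].add(v)
    return adjacency

ring = small_world(n=10, k=1, p=0.0)
print(sorted(len(nb) for nb in ring.values()))  # every node has degree 2
```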

Barabási-Albert Scale Free Model
- Highly connected central nodes (hubs)
- Leaf nodes have fewer connections
- IRC-based botnets
- Very vulnerable to targeted responses by researchers
  - Taking down the central hubs, e.g. the IRC servers used
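A minimal sketch of preferential attachment, the growth rule usually associated with the Barabási-Albert model: each new node links to m existing nodes, chosen with probability proportional to their current degree. The function name, the seed clique and the parameters are assumptions for illustration.

```python
import random

def scale_free(n, m, seed=0):
    """Grow an undirected graph by preferential attachment."""
    rng = random.Random(seed)
    # Start from m + 1 fully connected seed nodes.
    adjacency = {v: set(range(m + 1)) - {v} for v in range(m + 1)}
    # Each node appears in this list once per link it has, so a uniform
    # pick from the list is a degree-proportional pick of a node.
    endpoints = [v for v in adjacency for _ in adjacency[v]]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        adjacency[new] = set()
        for t in targets:
            adjacency[new].add(t)
            adjacency[t].add(new)
            endpoints.extend([new, t])
    return adjacency

net = scale_free(n=200, m=2)
degrees = sorted((len(nb) for nb in net.values()), reverse=True)
print(degrees[:5], degrees[-5:])  # a few hubs versus many low-degree leaves
```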

P2P models
- Structured and unstructured topologies
- Unstructured P2P botnets tend to have link distributions similar to those of scale free botnets
- Some nodes have a much larger peer list than others
- Distributed hash table (DHT)
- Structured botnets are more similar to random networks, as each bot in the botnet is connected to approximately the same number of other bots
- Kazaa/Gnutella

Response strategies
- The proposed response strategies are based on previous research and an empirical study of two different botnets in January 2006
- Previously known: targeting C&C infrastructure is efficient!

Random graph and P2P models
- Empirical studies have shown a median node degree of k = 5.5
- Network diameter increases logarithmically with k, but only for larger values of k
  - Realistic values show linear growth
- Giant component (S): the number of hosts reachable by the botmaster
- Local transitivity (γ) also increases logarithmically, but not for realistic values of k

Random graph and P2P models - loss of nodes
- Targeted responses and random failures have the same effect
  - Low impact!
- P2P networks often have k equal to log N, where N is the size of the botnet
  - Therefore slightly more resilient than random graphs
- All three metrics degrade at a constant rate as nodes are lost
- Random graph and P2P botnets are very resilient
- Remediation techniques
  - Remove a large number of nodes at once
  - Targeted responses: address list poisoning, P2P index poisoning

Watts-Strogatz model
- Research shows some botnets using this model
- Low utility to the botmaster
  - The average degree in a small world model is equal to the number of edges each vertex has
  - Constant decay of all metrics as nodes are removed
- Other advantages
  - Stealthy propagation
  - Anonymity
- In other domains, researchers state that the small world model is essentially a random graph

Scale free and unstructured P2P models
- Targeted responses are highly effective
- The core size, C, is the number of bots which function as hubs
  - Distributing commands
- 5k botnet
  - Adding a large number of cores does not affect network diameter
  - γ measures the number of triads
  - The dip in the graph is caused by core nodes forming squares, while triads are measured locally
  - Upon adding more cores, transitivity grows as core nodes also form triads

Transitivity loss in scale free
- The botmaster wishes to avoid transitivity loss
- A low number of core nodes makes the botnet vulnerable to takedowns
- Increasing the number of links for leaf nodes makes the dip smaller
- A high link count makes bots vulnerable to anomaly detection (e.g. netflow analysis)
- Figure: changes in transitivity vs core size

Scale free targeted responses and random loss
- Centralizing information makes the network vulnerable
- Targeted responses are highly effective
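The difference between targeted responses and random loss can be illustrated on a toy hub-and-leaf network. This is a hypothetical topology built for the sketch (three interconnected hubs with 50 leaves each), not one of the graphs simulated in the paper, and the helper names are made up.

```python
import random
from collections import deque

def largest_component(adjacency, removed):
    """Size of the largest connected group after deleting `removed` nodes."""
    alive = set(adjacency) - removed
    seen, best = set(), 0
    for start in alive:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            node = queue.popleft()
            size += 1
            for nb in adjacency[node]:
                if nb in alive and nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        best = max(best, size)
    return best

def hub_and_leaf(cores, leaves_per_core):
    """Fully connected core nodes, each with its own set of leaf nodes."""
    adjacency = {c: set(range(cores)) - {c} for c in range(cores)}
    next_id = cores
    for c in range(cores):
        for _ in range(leaves_per_core):
            adjacency[next_id] = {c}
            adjacency[c].add(next_id)
            next_id += 1
    return adjacency

toy_net = hub_and_leaf(cores=3, leaves_per_core=50)  # 153 nodes in total
# Targeted response: take down the three best-connected nodes.
hubs = set(sorted(toy_net, key=lambda v: len(toy_net[v]), reverse=True)[:3])
# Random loss: the same number of nodes, picked uniformly.
random_loss = set(random.Random(0).sample(sorted(toy_net), 3))

print(largest_component(toy_net, hubs))         # 1, only isolated leaves remain
print(largest_component(toy_net, random_loss))  # most of the network survives
```

Removing the same number of nodes has a very different effect depending on whether the hubs are among them, which is the point the slide makes about centralized information.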

Case study: Nugache botnet
- Uses the WASTE file sharing protocol
- Hard-coded IP addresses are used to retrieve a list of initial peers
- Continues to connect to and discover new peers
- Spread through P2P; the resulting mesh is a scale free network
- Low link count for each leaf node
- Figure: link count in Nugache leaf nodes

Takedown of the ZeroAccess botnet (not covered in the paper)
- Click fraud, search hijacking
- P2P based
  - New peers were pushed to all bots using a broadcast mechanism
  - Unstructured
- Cost online advertisers $2.7 million each month
- More than 2 million infected hosts, 800k active each day
- Takedown in 2013 by Microsoft, Europol and the FBI
  - Sinkholed 18 IP addresses and 49 domains
  - Targeted the mechanism used to broadcast new configurations/updates to newly infected bots
  - P2P layer was still intact, botmasters still making money
- Botnet still alive today, but at limited capacity
- http://www.darkreading.com/attacks-and-breaches/microsoft-fails-to-nuke-zeroaccess-botnet/d/d-id/1113008
- https://news.microsoft.com/2013/12/05/microsoft-the-fbi-europol-and-industry-partners-disrupt-the-notorious-zeroaccess-botnet/#sm.0000a9ziod396dqxqk714erddbw47

Empirical study: Available bandwidth in botnets
- Botnet 1: 50,000 unique members, sample size of 7,326
  - Measured in January 2005
- Botnet 2: 48,000 unique members, sample size of 3,391
  - Measured in January 2006

Bandwidth in botnets cont.
- Taking diurnal activity into account, with [2, 4, 24] for each class of bots
- Botnet 1 has a DDoS capability of ~1 Gbps
- Botnet 2 has 2,000 fewer members, but only about half the DDoS capability
- Could potentially be used to determine which botnet to target in takedowns
- Targeted responses against high-speed bots can be very impactful

                               Botnet 1    Botnet 2
Average available bandwidth    ~53 Kbps    ~39 Kbps
Accounted for diurnal          ~22 Kbps    ~14 Kbps
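The ~1 Gbps figure can be checked with simple arithmetic, assuming aggregate capability is the number of unique members multiplied by the diurnal-adjusted per-bot average, and that Kbps means 1,000 bits per second. Both assumptions are mine; the slide does not state how the total was derived.

```python
def aggregate_gbps(members, per_bot_kbps):
    """Members times per-bot Kbps, converted to Gbps (decimal units)."""
    return members * per_bot_kbps * 1_000 / 1_000_000_000

botnet1 = aggregate_gbps(50_000, 22)  # about 1.1 Gbps
botnet2 = aggregate_gbps(48_000, 14)  # about 0.67 Gbps
print(botnet1, botnet2, botnet2 / botnet1)
```

Under these assumptions botnet 2 comes out at roughly 60% of botnet 1 despite having 96% of its membership, consistent with the point that size alone is a poor measure of threat.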

Summary
- Proposed metrics to measure a botnet's utility to the botmaster
- Structured P2P botnets and random graph botnets are resilient to both targeted responses and random failures
- Targeted responses are effective on scale free botnets

Questions?

Further reading
- Paper published in 2013 about resilience of different P2P botnets
- P2PWNED - Modeling and Evaluating the Resilience of Peer-to-Peer Botnets
- http://www.ieee-security.org/tc/sp2013/papers/4977a097.pdf