Chapter 6
A SUBSYSTEM FOR FAST (IP) FLUX BOTNET DETECTION

6.1 Introduction

6.1.1 Motivation

Content Delivery Networks (CDNs) and Round-Robin DNS (RRDNS) are the two standard methods used for resource sharing [68]. RRDNS is a load-balancing technique in which the IP addresses returned for a name rotate in round-robin fashion. CDNs are collections of servers located in different data centers around the world, in order to decrease the resource access time for users. These techniques are employed by legitimate commercial organizations for the benefit of end users.

Unfortunately, fast flux networks are built using the same techniques as CDNs and RRDNS. Fast fluxing takes advantage of DNS-based load balancing by masking its own activities as if it were a CDN. Fast fluxing uses many IP addresses that are hidden behind a single domain name, just as large CDNs and antivirus providers do. The IP addresses of domains involved in fast fluxing change with extreme frequency in round-robin
fashion, with a very short Time-To-Live (TTL) for each DNS Resource Record (RR).

Figure 6.1: Workflow of a fast-flux service

Figure 6.1 shows a pictorial representation of a fast-flux service network. Fast flux domain names map to a new set of IP addresses, which are assigned to infected machines (the botnet). When the domain name is accessed, the connection is automatically directed to one of the infected computers within a short interval of time. Botnets use the fast fluxing mechanism in order to conceal their Command and Control (C&C) servers. This technique prevents the identification of C&C servers by changing IP addresses frequently, so that they cannot be detected.

Single fluxing is the process in which the IP addresses of malicious domains are altered within a short period of time. Double fluxing is the process in which the name servers (NS) of the domains are fluxed along with the IPs. Fast flux botnets have been, and still are, responsible for a plethora of activities such as money mule recruitment sites, phishing sites, illegal online pharmacies, illegal adult content sites, malicious browser exploit
sites and web traps (distributing malware) [69]. To classify domains exhibiting fast fluxing properties, supervised machine learning classifiers are used. Table 6.1 shows some of the fast flux domains and their associated IP addresses in a particular interval of time. These active domain names have been selected from the ATLAS [70] global fast flux database.

Table 6.1: Fast-flux domains and their associated IP addresses
Domain names: getghosted.com, welish.com
IP list columns: First set of IPs, Second set of IPs, Third set of IPs
IP addresses: 71.184.254.78, 74.128.244.203, 78.185.249.199, 86.203.225.37, 108.0.74.147, 205.251.243.12, 176.32.100.244, 75.92.87.214, 98.31.36.136, 108.0.74.147, 67.8.240.119, 74.128.244.203, 74.100.177.137, 75.136.19.32, 93.172.95.120, 123.20.163.22, 203.130.15.5, 176.32.100.244, 176.32.101.148

6.1.2 Contributions

In this research, the following contributions are made. A system was developed that detects single and double flux in real time with a high accuracy of 98.2% and a very low false positive rate. Two services were provided, one for detection and the other for monitoring the detected fast-flux service networks (FFSNs), and a One-Class SVM (OCSVM) with a linear programming approach was employed to classify the data after training the system with eight relevant features of fast flux domains. The framework was deployed at an Internet service provider location for more
than a year. The framework was able to identify several fast flux IPs which were neither blacklisted nor had any history.

6.2 System Developed

The system was developed and deployed with the aim of detecting fast-flux domains automatically in real time. After detecting a fast-flux domain, the system monitors its activities regularly at intervals of five seconds. Figure 6.2 presents the architectural diagram of the implemented system, which consists of four modules: 1) Data Collection, 2) Feature Selection, 3) Classification, and 4) Continuous Monitoring.

6.2.1 Data Collection

A passive sensor collects DNS queries and responses from the DNS servers. It captures network traffic from the DNS servers and passes it to an application. One sensor can handle data from multiple DNS servers. The collected DNS logs contain the details of user queries. The parsing unit parses the logs and extracts only those domain names which have more than one IP address. The domain names and IP addresses are then fed into the feature selection module to extract relevant features.

6.2.2 Feature Selection

Features play the most important role in classification. The quality of the features determines the accuracy of classification, so the utmost care must be given to feature extraction. Eight features, such as Domain Age, TTL, Number of IP addresses of a distinct DNS A record,
Autonomous System Distribution, National Distribution, and Organizational Distribution, were extracted for modelling.

Figure 6.2: Architecture diagram

TTL: TTL (Time-To-Live) determines the life span of a record in a network. A record will be deleted from the DNS server cache when the specified TTL expires [71], so that a new IP address will be given to the bots. Fast flux domains maintain
shorter TTL for each record.

Number of IP addresses of a distinct DNS Query: "A record" stands for Address record, and an A record lookup resolves a hostname to an IP address [72]. In fast flux, the TTL for each record is very short, and hence each time the TTL expires, a new set of IP addresses will be resolved. Thus the total number of accumulated IP addresses will be very large.

Network Diversity: Diversity of network is another characteristic of fast flux. The IPs of fast flux domains will be located in different networks.

Autonomous System Distribution: In order to reduce the chance of detection, fast flux domains will have IP addresses from different Autonomous Systems [15]. Hence the number of distinct Autonomous Systems will be high for a single domain.

Geographical Distribution: A considerable number of fast flux domains have IPs distributed among multiple countries.

Double Fluxing Characterization: A second layer of complexity arises when a domain name shows frequent changes in its name servers.

Organization: The IP addresses of a domain are owned by multiple organizations; this is a defining characteristic of fast flux domains.
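The features described above can be illustrated with a minimal Python sketch. It is not the implementation used in this work: the `IpInfo` record, the `extract_features` function, the feature names, and the use of distinct /16 prefixes as a stand-in for network diversity are all assumptions made for illustration. The per-IP metadata (Autonomous System number, country, organization) is assumed to be supplied by some external lookup source.

```python
from dataclasses import dataclass
from ipaddress import ip_network


@dataclass(frozen=True)
class IpInfo:
    """Hypothetical per-IP metadata, assumed to come from an external lookup."""
    asn: int
    country: str
    organization: str


def extract_features(a_records, ns_records, ip_info, domain_age_days):
    """Build an eight-feature dictionary for one domain.

    a_records: non-empty list of (ip_string, ttl_seconds) pairs accumulated
        over the observation window.
    ns_records: list of name server host names seen for the domain.
    ip_info: dict mapping ip_string -> IpInfo; IPs without an entry are
        ignored for the AS, country and organization counts.
    """
    ips = {ip for ip, _ in a_records}
    # Assumption: network diversity is approximated by distinct /16 prefixes.
    networks = {ip_network(f"{ip}/16", strict=False) for ip in ips}
    known = [ip_info[ip] for ip in ips if ip in ip_info]
    return {
        "domain_age_days": domain_age_days,
        "min_ttl": min(ttl for _, ttl in a_records),
        "unique_ips": len(ips),
        "unique_networks": len(networks),
        "unique_asns": len({info.asn for info in known}),
        "unique_countries": len({info.country for info in known}),
        "unique_name_servers": len(set(ns_records)),
        "unique_organizations": len({info.organization for info in known}),
    }
```

A domain whose dictionary shows a low `min_ttl` together with high counts for IPs, networks, Autonomous Systems, countries and organizations matches the fast flux profile described in this section; the resulting vector is the kind of input a classifier would be trained on.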
6.3 Continuous Monitoring

The detected domain names are then fed into the continuous monitoring system. Here, the system continuously monitors the targeted domains at intervals of five seconds. It actively queries each detected fast flux domain to check for any change in its IP addresses.

Fast flux domains having the textual properties of Domain Generation Algorithm (DGA) names will also be checked. One of the salient textual features is having more randomness than a legitimate domain name. Randomness can be the inclusion of digits, the absence of dictionary words, unusual word length, etc. This checking is nontrivial because a suspected domain may sometimes be a good domain name that shows several fast-flux characteristics. To remove these kinds of false positive results, the system gives these domain names back to the feature collection module with the newly available IPs for further classification. This part of the system therefore plays an important role in removing false positives and false negatives. The use of a powerful machine learning algorithm together with the reputation checker system gives better prediction accuracy for every domain.

From the analysis, a new category of fast-flux domains was observed; these domains have only one IP address at a time, which fluxes continuously. Domains of this kind are discarded from the first analysis, so all suspected domains are monitored separately to detect their fluxing characteristics. These domains also have high TTL values. Based on these characteristics, the suspected domains are observed. In addition, a white list is kept to remove good domains.
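The textual randomness cues mentioned in this section (digits, absence of dictionary words, unusual length) can be sketched as follows. This is an illustrative sketch only, not the checker used in the deployed system: the function names, the tiny `SAMPLE_WORDS` list, the choice of Shannon entropy as a randomness measure, and the restriction to the leftmost label of the domain are all assumptions.

```python
import math
from collections import Counter

# Tiny illustrative word list (assumption); a real check would use a full dictionary.
SAMPLE_WORDS = {"mail", "news", "shop", "book", "cloud", "bank", "ghost", "get"}


def shannon_entropy(text):
    """Shannon entropy of the string, in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def textual_features(domain, words=SAMPLE_WORDS):
    """Compute simple randomness cues for the leftmost label of a domain name."""
    label = domain.lower().split(".")[0]
    digits = sum(ch.isdigit() for ch in label)
    return {
        "length": len(label),
        "digit_ratio": digits / len(label) if label else 0.0,
        "entropy": shannon_entropy(label),
        "has_dictionary_word": any(word in label for word in words),
    }
```

For example, `textual_features("x7k2q9zt.net")` yields a high digit ratio, a high entropy and no dictionary word, whereas `textual_features("getghosted.com")` contains dictionary words and no digits. Any thresholds applied to these values would have to be tuned on labelled data.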
6.4 Experimental Results

The developed system detects fast flux domains in real time. The system was trained on the fast flux features. Since the output of the classifier depends on the training data, the collected training set includes all the combinations of features discussed earlier. For testing the system, active fast flux domains were chosen from the ATLAS global fast-flux database, data were collected manually from the in-house network, and data for normal domains were collected from Alexa [63]. The system queried domain names from real-time DNS logs and from the domains listed in the ATLAS and Alexa databases.

In order to filter out the domain names that are part of CDNs [73], a reverse lookup of the domain was performed. If the reverse lookup of the A records of a domain does not show any similarity in its answer section, the domain is most likely a fast-flux domain. This step helps to increase the accuracy of the analysis from 98.2% to almost 100%.

Figures 6.3 (a) and 6.3 (b) show the number of unique IP addresses of fast-flux and non-fast-flux domains. From the figures it can be seen that fast-flux domains have more IP addresses than normal domains. Figures 6.4 (a) and 6.4 (b) show the number of unique name servers for both categories. Figures 6.5 (a) and 6.5 (b) show the number of unique networks.
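The reverse-lookup filter for CDNs described in this section can be sketched as below. The sketch assumes that the PTR host names for the domain's IP addresses have already been obtained (for instance with Python's `socket.gethostbyaddr`), and it measures "similarity" as the share of names that fall under the same two-label parent suffix. The function names, the two-label suffix rule and the 0.8 threshold are hypothetical choices for illustration, not values taken from the deployed system.

```python
from collections import Counter


def ptr_suffix(ptr_name, labels=2):
    """Return the last `labels` labels of a PTR host name, lower-cased."""
    parts = ptr_name.rstrip(".").lower().split(".")
    return ".".join(parts[-labels:])


def ptr_similarity(ptr_names):
    """Fraction of PTR names sharing the most common suffix (0.0 if empty)."""
    if not ptr_names:
        return 0.0
    counts = Counter(ptr_suffix(name) for name in ptr_names)
    return counts.most_common(1)[0][1] / len(ptr_names)


def looks_like_cdn(ptr_names, threshold=0.8):
    """True when the reverse-lookup answers mostly fall under one parent domain."""
    return ptr_similarity(ptr_names) >= threshold
```

Under these assumptions, a domain whose IPs reverse-resolve to names such as `a1.cdn.example.net` and `a2.cdn.example.net` would be treated as CDN-like and filtered out, while a domain whose IPs resolve to unrelated consumer ISP host names would remain a fast-flux candidate. A two-label suffix is a simplification and would need a public-suffix-aware rule for names under suffixes such as `co.uk`.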
Figure 6.3: (a) IP addresses of fast-flux domains, (b) IP addresses of legitimate domains

Figure 6.4: (a) Name servers of fast-flux domains, (b) Name servers of normal domains
Figure 6.5: (a) Networks of fast-flux domains, (b) Networks of normal domains

6.5 Conclusion

A system was developed to detect single fluxing as well as double fluxing propagating in a monitored network in real time. To do so, a one-class SVM was applied for classification, and a continuous monitoring system was applied to track the activities of detected domains in real time. The system detects very active domains, confirmed by the ATLAS global fast-flux database, with an accuracy of 98.2%. The developed system is capable of efficiently filtering out CDNs, antivirus software services, etc., which cause false positives in the detection of fast fluxing domains.