Discovering new malicious domains using DNS and big data
Case study: Fast Flux domains
Dhia Mahjoub, OpenDNS
May 25th, 2013
Background
- Attackers seek to keep their operations online at all times
- The network (the hosting infrastructure) is CRUCIAL
- Spam, phishing, malware distribution, botnets
Fast Flux
IP FLUX via DNS records: same query, different responses
- paypalz.com = 1.1.1.1, 1.1.1.2, 1.1.1.3
- ad.malware.cn = 2.2.2.2, 2.2.2.3, 2.2.2.4
- p2p.botnet.com = 3.3.3.3, 3.3.3.4, 3.3.3.5
DOUBLE IP FLUX via DNS records: same name server, different responses
- ns.botnet.com = 4.4.4.4, 4.4.4.5, 4.4.4.6
Characteristics
- Responses for the domain's IPs change very frequently
- Responses for the domain's NSs also change frequently
- Large number of resource records
Must shut down or block all content servers and name servers via DNS records
How to detect Fast Flux?
Evidence collection
Active probing (successive digs over time)
  + Easy to implement
  - Latency in detection and a chatty process
Passive probing (passive DNS)
  + No latency and more discreet
  - Need to have a passive DNS database
Decision making
- Rule-set-based detection (like an IDS)
- Machine learning
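Active probing can be sketched roughly as follows: resolve the same name several times and count how many distinct IPs accumulate and how often the answer set changes. This is a minimal illustration, not the presenter's tooling. The function names, the number of rounds, and the interval are assumptions, and the default resolver relies on the standard library's `socket.gethostbyname_ex`, which does not expose TTLs.

```python
import socket
import time


def system_resolve(domain):
    """Resolve A records with the OS resolver (no TTL information here)."""
    try:
        _, _, addresses = socket.gethostbyname_ex(domain)
        return addresses
    except OSError:
        # Resolution failures are treated as an empty answer.
        return []


def probe(domain, rounds=5, interval=60.0, resolve=system_resolve):
    """Query `domain` `rounds` times; return the set of IPs seen in each round."""
    snapshots = []
    for i in range(rounds):
        snapshots.append(set(resolve(domain)))
        if i < rounds - 1:
            time.sleep(interval)
    return snapshots


def flux_summary(snapshots):
    """Summarise how much the answers moved across successive probes."""
    all_ips = set().union(*snapshots)
    changes = sum(1 for a, b in zip(snapshots, snapshots[1:]) if a != b)
    return {"distinct_ips": len(all_ips), "changes": changes}
```

Passing a custom `resolve` callable makes the sketch testable offline. The `time.sleep` between rounds is exactly the detection latency the slide lists as a drawback.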
Features
Machine Learning Solution
Algorithm: Random forest classifier
Training data set:
- Positive set: known fast flux domains (from the security community and our blacklist)
- Negative set: known benign domains (Alexa top domains)
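A minimal sketch of this setup with scikit-learn's `RandomForestClassifier` might look like the following. The three features (TTL, number of IPs, number of countries) and the synthetic value ranges are assumptions chosen for illustration only. The data is randomly generated, not taken from the blacklist or Alexa sets, and the real model's features and hyperparameters are not given on the slide.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)


def synth(n, ttl_range, ips_range, countries_range):
    """Generate n synthetic rows of [ttl, nb_ips, nb_countries] (hypothetical features)."""
    return np.column_stack([
        rng.integers(*ttl_range, size=n),
        rng.integers(*ips_range, size=n),
        rng.integers(*countries_range, size=n),
    ])


# Synthetic stand-ins for the positive (fast flux) and negative (benign) sets.
X_pos = synth(200, (1, 300), (5, 40), (2, 15))
X_neg = synth(600, (300, 86400), (1, 4), (1, 2))
X = np.vstack([X_pos, X_neg])
y = np.array([1] * len(X_pos) + [0] * len(X_neg))

# Hold out a quarter of the labeled data for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
```

Because the synthetic classes are deliberately well separated, the score here says nothing about real-world performance. The block only shows the shape of the train and evaluate loop.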
Training the Classifier
- Extract domains from our blacklist (BL) where nb IPs >= 3 and that have been live for the past week
- Filter and keep domains with ttl <= 90 or (nb IPs >= 3 and nb countries >= 2 and ttl <= 14400), with ttl in seconds (14400 s = 4 hours)
- Add fast flux domains published by the security community
- Positive set: 2000+ FF domains with high accuracy
- Negative set: 25000+ domains from the Alexa top 1 million
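The two selection rules for the positive set can be written down directly. The sketch below assumes a hypothetical `DomainStats` record holding the per-domain values. Only the thresholds come from the slide.

```python
from dataclasses import dataclass


@dataclass
class DomainStats:
    """Hypothetical per-domain record; field names are illustrative."""
    name: str
    ttl: int                # seconds
    nb_ips: int
    nb_countries: int
    live_past_week: bool


def is_candidate(d):
    """First step: blacklisted domain with at least 3 IPs, live for the past week."""
    return d.nb_ips >= 3 and d.live_past_week


def passes_filter(d):
    """Second step: very low TTL, or spread across IPs and countries with TTL <= 4 hours."""
    return d.ttl <= 90 or (d.nb_ips >= 3 and d.nb_countries >= 2 and d.ttl <= 14400)


def select_positive_set(domains):
    """Apply both steps in order and return the surviving domain names."""
    return [d.name for d in domains if is_candidate(d) and passes_filter(d)]
```

Note that a domain with a TTL above 90 seconds needs at least two hosting countries to survive, so a long-TTL domain hosted in a single country is dropped even if it has many IPs.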
Performance on sample labeled data
Random Forest accuracy: 99% (233 FF and 600 benign)

                   Predicted positive   Predicted negative
Actual positive    TP (229)             FN (4)
Actual negative    FP (0)               TN (600)
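The headline accuracy follows from the four confusion-matrix counts. As a quick check, the standard metrics can be derived like this. The variable names are illustrative, and only the counts come from the slide.

```python
# Confusion-matrix counts from the slide.
tp, fn, fp, tn = 229, 4, 0, 600

total = tp + fn + fp + tn                 # 833 labeled samples
accuracy = (tp + tn) / total              # share of correct predictions, about 0.9952
precision = tp / (tp + fp)                # 1.0, since there are no false positives
recall = tp / (tp + fn)                   # share of FF domains caught, about 0.9828
false_positive_rate = fp / (fp + tn)      # 0.0

print(f"accuracy={accuracy:.4f} precision={precision:.4f} recall={recall:.4f}")
```

With zero false positives on this sample, all four errors are missed fast flux domains rather than benign domains flagged by mistake.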
OpenDNS Network Map
DNS big data
- querylogs
- authlogs
Platform and tools used
- Pig on Hadoop cluster
- Raw logs on HDFS
- scikit-learn: Python module for machine learning; integrates with matplotlib, NumPy, SciPy
- Redis for in-memory lookup of domain features
- Python, shell
Daily FF detection workflow
1. 9.5+ million unique valid domains/day (with TTL < 4 hours)
2. Obtain IPs, NSs, and IPs of NSs (with TTLs)
3. Build features (21 features)
4. Build fast flux classifier model from the labeled data (BL + Alexa)
5. Run classifier on unlabeled filtered daily data
6. Filter out domains already in BL and WL
7. Build clusters of related domains, IPs, NSs
8. Keep clusters of domains recently registered
=> A few hundred new FF domains discovered daily
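Step 7 does not say how the clusters are built. One plausible reading, shown here purely as an assumption, is to treat domains, IPs, and name servers as nodes of a graph and take its connected components using a small union-find structure. The node tagging scheme and the `cluster` function are illustrative.

```python
from collections import defaultdict


def cluster(edges):
    """Group nodes into connected components.

    `edges` is an iterable of (node_a, node_b) pairs, where each node is a
    tagged tuple such as ("domain", "a.example") or ("ip", "192.0.2.1").
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    return list(groups.values())
```

Two domains end up in the same cluster as soon as they share an IP or a name server, directly or through a chain of other domains, which is why a later filter such as step 8 is needed to keep clusters manageable.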
Example day's numbers
- Daily log of 9,609,478 domains with IP, TTL
- 435,837 domains have resolving NS, TTL
- 410,072 unique NSs, with IP, TTL
- 125,021 domains with all features
- 1,320 discovered FF domains
Main daily discoveries
- Work-from-home, fat-loss, and fake-news spam domains
- Russian dating domains
- Canadian Pharmacy domains
- Kelihos downloader domains used by BH and Red Kit EK
- Various Trojan CnC domains
Expand the FF discovered set
1. Take the set of FF discovered domains
2. Using SGraph (passive DNS), get all IPs that the FF domains resolve to
3. Get all domains that resolve to those IPs
4. Apply filtering heuristics
=> Expand the graph of FF domains and IPs
- More accurate with FF botnet IPs than with VPS
- Applies also to association by name server
=> Flag new, fresh suspicious domains
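The expansion steps can be sketched against an in-memory list of (domain, IP) pairs standing in for the passive DNS store. SGraph's actual interface is not described in the slides, so nothing here reflects it. The filtering heuristic used below (skip IPs that host more than `max_domains_per_ip` domains, as shared hosting would) is an assumption, since the slide does not specify which heuristics are applied.

```python
from collections import defaultdict


def build_index(records):
    """Index passive DNS (domain, ip) pairs in both directions."""
    domain_to_ips = defaultdict(set)
    ip_to_domains = defaultdict(set)
    for domain, ip in records:
        domain_to_ips[domain].add(ip)
        ip_to_domains[ip].add(domain)
    return domain_to_ips, ip_to_domains


def expand(seed_domains, records, max_domains_per_ip=50):
    """Return (ips, new_domains) reached in one hop from the seed domains."""
    domain_to_ips, ip_to_domains = build_index(records)
    seeds = set(seed_domains)

    # Step 2: all IPs the seed domains resolve to.
    ips = set()
    for domain in seeds:
        ips |= domain_to_ips.get(domain, set())

    # Steps 3 and 4: co-hosted domains, skipping heavily shared IPs (assumed heuristic).
    new_domains = set()
    for ip in ips:
        hosted = ip_to_domains[ip]
        if len(hosted) > max_domains_per_ip:
            continue
        new_domains |= hosted
    return ips, new_domains - seeds
```

The same function works for association by name server if the records are (domain, name server) pairs instead of (domain, IP) pairs.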
Use cases (SGraph demo)
- ns[1-4].mydomainvps.pl
- ns[1-4].speedyvps.su
- ns[1-4].funnyns.su
- ns[1-4].feva.pl
- ns[1-4].kimd.pl
- ns[1-4].sl8.pl
- xixuungo.dota.fi: combines dynamic DNS and fast flux
Use case: FF botnet size
1. Take a daily sample of Kelihos domains (56 domains)
2. From SGraph (passive DNS), get all IPs they resolve to (4821 IPs, 1048 alive)
3. Get all domains that resolve to those IPs. Extract only 2LDs registered in 2013 (357 domains)
4. Get all IPs that these domains resolve to (52565 IPs)
=> Total final number of unique IPs is 52565, of which 12368 are alive (an estimate of the size of the botnet)
FF botnet size (cont'd)
[Diagram: expansion graph alternating between sets of domains (D) and IPs]
- Set-1: 56 Kelihos domains
- Set-2: 4821 IPs (1048 alive)
- Set-3: 357 domains
- Set-4: 52565 IPs (12368 alive)
Use case: FF botnet IPs map
Thank you (Q & A)