From Correlation to Causation: Active Delay Injection for Service Dependency Detection
Christopher Kruegel
Computer Security Group
ARO MURI Meeting
ICSI, Berkeley, November 15, 2012

[Figure: project overview diagram. Components: Real World Enterprise Network; Simulation/Live Security Exercises; Sensor Alerts; Data; Correlation Engine; Mission Model; Cyber-Assets Model; Mission; Cyber-Assets; Impact Analysis; COAs. Annotations: analysis to get an up-to-date view of cyber-assets; analysis to determine dependencies between assets and missions; analyze and characterize attackers; predict future actions; create a semantically rich view of cyber-mission status.]
ARO MURI Meeting, Berkeley, November 15, 2011

Motivation
- Thrust I: Obtaining an up-to-date view of the available cyber-assets
  - Need to know and model the assets on your network
  - Network services (beyond IP addresses and ports)
- Thrust II: Obtaining an understanding of the dependencies between missions and assets
  - Find dependencies and redundancies between services
  - Find relationships (mappings) between missions and assets
  - Find assets and activities critical for the network (or a particular mission)

Accomplishments
- Year 1
  - models to fingerprint specific programs and network services
  - track services and identify bot-infected machines
- Year 2
  - service dependency model
  - algorithms for ranking assets and services
- Year 3
  - develop techniques and tools to extract indirect dependencies between missions (activities) and assets
  - develop techniques and tools to determine the effects of service failures (using fault injection)

Quick Recap and Updates
- Determine relationships between services
  - one service relies on another one (direct dependency)
  - two services needed together (indirect dependency)
[Figure: example dependency graphs. Labels: DNS, Web, LDAP, Mail; nodes A, B, C]

Quick Recap and Updates
- Extract activities and their related assets
  - activity = set of services that cooperate to achieve a higher-level goal
  - activities are the building blocks for missions
  - of course, this could be done manually
  - we propose an automated approach (not all activities are obvious)
- We proposed an approach based on passive observation of network traffic
  - conducted experiments in the CS network at

Quick Recap and Updates
- In this period, we evaluated our tool on traffic collected at LBNL
  - 6.33 billion records (150 GB of NetFlow)
  - 15 days' worth of data
  - 5,593 missions and 998 backup relations
  - interesting examples currently under investigation

Extracting Dependencies
- Basic idea of our passive activity extraction approach
  - Find multiple services that are all correlated
  - The intuition is that multiple services that work together do so for a purpose; the network is leveraged to achieve a certain goal
- Problems
  - correlation does not imply causation
  - false positives
  - direction of dependency cannot be determined

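To illustrate why passive correlation alone cannot give a direction, here is a minimal sketch that correlates per-window flow counts of services. The slides do not say which correlation measure the tool uses; Pearson correlation and all counts below are assumptions made for illustration only.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with >= 2 points")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant series carries no correlation signal
    return cov / math.sqrt(vx * vy)

# Hypothetical per-window flow counts (not taken from the experiments).
web = [12, 30, 7, 25, 18, 40]
ldap = [11, 29, 8, 24, 17, 41]  # rises and falls together with web
ntp = [5, 5, 6, 5, 4, 5]        # unrelated background traffic

print("web vs ldap:", round(pearson(web, ldap), 3))
print("web vs ntp: ", round(pearson(web, ntp), 3))
# The measure is symmetric, so it cannot tell which service depends on which.
print("ldap vs web:", round(pearson(ldap, web), 3))
```

Because `pearson(web, ldap)` equals `pearson(ldap, web)`, a high score only says the two services are active together; it says nothing about which one is the cause.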
Extracting Dependencies
- Basic idea: perform active discovery
  - actively perturb traffic for service A and monitor how service B reacts
  - when B depends on service A, we expect to see the effect of the perturbation
  - when B does not depend on A, there should be no effect
- How to introduce perturbations
  - introduce delays into requests (flows) to service A
  - active watermarking, but for flows, not for packets

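The perturbation idea can be sketched as a small simulation. This is not the deployed delay mechanism: it assumes time is cut into windows as long as the delay, that a fraction of the flows arriving in even-numbered windows is held back by one window, and that arrivals are uniform. The names `release_time` and `count_per_window` are hypothetical.

```python
import random

DELAY = 0.5  # seconds; the window length is assumed to equal the delay

def release_time(arrival, rng, fraction, delay=DELAY):
    """Time at which a flow to service A is forwarded.

    Flows arriving in an even-numbered window are delayed with
    probability `fraction`, which pushes them into the next window.
    """
    in_hold_window = int(arrival // delay) % 2 == 0
    if in_hold_window and rng.random() < fraction:
        return arrival + delay
    return arrival

def count_per_window(times, n_windows, delay=DELAY):
    """Count forwarded flows in each window of length `delay`."""
    counts = [0] * n_windows
    for t in times:
        i = int(t // delay)
        if i < n_windows:
            counts[i] += 1
    return counts

rng = random.Random(1)
n_windows = 2000
arrivals = [rng.uniform(0, n_windows * DELAY) for _ in range(20000)]
released = [release_time(t, rng, fraction=0.3) for t in arrivals]

counts = count_per_window(released, n_windows)
idle = counts[0::2]  # windows that lost the delayed flows
busy = counts[1::2]  # windows that received them
print("mean idle:", sum(idle) / len(idle))
print("mean busy:", sum(busy) / len(busy))
```

If service B depends on A, the same alternating idle/busy pattern should show up in B's request counts; if it does not, B's counts should look the same in both kinds of window.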
Introducing Delays
[Figure: Service A and Service B, with the idle period and the busy period marked]

Introducing Delays
- In the real world, idle and busy periods are not as easily detectable
  - unrelated requests
  - unexpected delays
  - caching effects
- Need (many) more than one observation period (window)
- Need to perform statistical tests

Statistical Tests
- Unknown distribution of service requests: $D(\mu, \sigma)$
- If the service has a dependency, delaying a fraction $\rho$ of requests results in
  - Idle period: $D_1(\mu(1-\rho), \sigma_1)$
  - Busy period: $D_2(\mu(1+\rho), \sigma_2)$
- Null hypothesis: the two services are independent, hence $\mu_{\text{idle}} = \mu_{\text{busy}}$

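A short worked step makes the test target explicit. Writing $X_{\text{idle}}$ and $X_{\text{busy}}$ for the request counts observed in one idle and one busy window (notation introduced here, not on the slide), and reading $\rho$ as the fraction of delayed requests, the model above gives:

```latex
\begin{aligned}
H_0 &: \mu_{\text{idle}} = \mu_{\text{busy}} \\
H_1 &: \mu_{\text{idle}} \neq \mu_{\text{busy}} \\
\mathbb{E}[X_{\text{busy}} - X_{\text{idle}}]
    &= \mu(1+\rho) - \mu(1-\rho) = 2\rho\mu
\end{aligned}
```

Under independence the expected difference is zero; under a dependency it grows linearly with both the delayed fraction $\rho$ and the base request rate $\mu$, which is why small $\rho$ needs more windows to detect.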
Statistical Tests
- Independent samples t-test
- We can do better: paired samples t-test

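The formulas from this slide did not survive extraction, so the following is only a sketch of the two textbook statistics, not necessarily the exact variants used in the talk. The independent-samples version is written in the Welch (unequal variance) form as an assumption, and the counts are hypothetical. It shows why pairing adjacent idle and busy windows helps when the overall load drifts.

```python
import math
import statistics

def independent_t(a, b):
    """t statistic for two independent samples (Welch form, an assumption)."""
    se = math.sqrt(statistics.variance(a) / len(a)
                   + statistics.variance(b) / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se

def paired_t(a, b):
    """t statistic for paired samples, computed on differences a[i] - b[i]."""
    if len(a) != len(b):
        raise ValueError("paired samples must have equal length")
    d = [x - y for x, y in zip(a, b)]
    return statistics.mean(d) / (statistics.stdev(d) / math.sqrt(len(d)))

# Hypothetical request counts: the load varies a lot from one window pair
# to the next, but each busy window is slightly above its idle partner.
idle = [10, 52, 31, 75, 18, 64, 40, 23]
busy = [13, 55, 35, 77, 22, 66, 44, 25]

print("independent t:", round(independent_t(busy, idle), 2))
print("paired t:     ", round(paired_t(busy, idle), 2))
```

The independent test sees the small shift drowned in window-to-window variance, while the paired test cancels that variance by working on per-pair differences.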
Statistical Tests
- Even better: paired Wilcoxon test
- When the null hypothesis is rejected, we have found a dependency
- For all three tests, we can show that increasing the number of sample intervals will eventually allow us to make a decision (even when the fraction of delayed requests is very small)

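A minimal sketch of the paired Wilcoxon (signed-rank) test follows, under stated simplifications: zero differences are dropped, tied absolute differences get average ranks, and the p-value uses the normal approximation without a tie correction, which is rough for small samples. The counts are hypothetical.

```python
import math

def wilcoxon_signed_rank(a, b):
    """Return (W_plus, z, p) for paired samples a and b.

    W_plus is the sum of ranks of positive differences a[i] - b[i];
    z and the two-sided p come from the normal approximation.
    """
    d = [x - y for x, y in zip(a, b) if x != y]  # drop zero differences
    n = len(d)
    if n == 0:
        raise ValueError("all paired differences are zero")
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over the run of tied absolute differences.
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    w_plus = sum(r for r, x in zip(ranks, d) if x > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return w_plus, z, p

# Hypothetical paired window counts (busy window vs. its idle partner).
idle = [10, 52, 31, 75, 18, 64, 40, 23]
busy = [13, 55, 35, 77, 22, 66, 44, 25]

w_plus, z, p = wilcoxon_signed_rank(busy, idle)
print("W+ =", w_plus, " z =", round(z, 2), " p =", round(p, 4))
```

Because the test only uses the signs and ranks of the differences, it does not assume the counts are normally distributed, which fits the "unknown distribution" premise of the earlier slide.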
Simulations
- Demonstrate the desirable properties of the system (more data yields more precise results)

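The simulation plots are not part of this transcript. As a stand-in, the sketch below draws idle and busy counts from the model on the statistical-tests slide and shows how the paired t statistic grows with the number of window pairs. The Gaussian noise, the parameter values, and the seed are all assumptions, not the settings used in the talk.

```python
import math
import random
import statistics

def paired_t(busy, idle):
    """Paired-samples t statistic on the differences busy[i] - idle[i]."""
    d = [b - i for b, i in zip(busy, idle)]
    return statistics.mean(d) / (statistics.stdev(d) / math.sqrt(len(d)))

def simulate(n_windows, mu, rho, sigma, rng):
    """Draw n_windows (idle, busy) pairs and return the paired t statistic.

    Idle counts have mean mu*(1-rho), busy counts mean mu*(1+rho);
    Gaussian noise with standard deviation sigma is an assumption.
    """
    idle = [rng.gauss(mu * (1 - rho), sigma) for _ in range(n_windows)]
    busy = [rng.gauss(mu * (1 + rho), sigma) for _ in range(n_windows)]
    return paired_t(busy, idle)

rng = random.Random(2011)
# Only 1% of requests are delayed, so the effect per window is tiny.
results = {n: simulate(n, mu=100.0, rho=0.01, sigma=10.0, rng=rng)
           for n in (10, 100, 1000, 10000)}
for n, t in results.items():
    print(f"{n:>6} window pairs: t = {t:.2f}")
```

The expected statistic scales with the square root of the number of pairs, so even a very small delayed fraction eventually crosses any fixed rejection threshold.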
Real World Experiment
- Installed a delay mechanism at the CS Department
- Perturbed connections from CS lab machines to 54 services
- 3.5 months' worth of data
- 11.5 million connections to interesting services
- 500 ms delay introduced

Results
- 331 dependencies
  - file server depends on DNS
  - mail applications depend on the file server
  - file server depends on backup file server
  - LDAP server depends on backup LDAP servers
  - web server depends on LDAP server
- Direction can be detected
  - here, some services depend on NFS
    (('128.111.43.46', 1172, 6), ('128.111.43.46', 1174, 6), ('128.111.43.46', 2049, 6))
- Causality analysis can remove false positives
    (('128.111.41.24', 21, 6), ('128.111.41.39', 5308, 6))

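The result tuples above can be read programmatically. The sketch below assumes each inner tuple is (IP address, port, IP protocol number), which matches the values shown (protocol 6 is TCP, port 2049 is the standard NFS port) but is not stated on the slide. The `Service` type and helper names are hypothetical.

```python
import ast
from typing import NamedTuple

class Service(NamedTuple):
    ip: str
    port: int
    proto: int  # assumed to be the IP protocol number

PROTO_NAMES = {6: "TCP", 17: "UDP"}

def parse_services(text):
    """Parse a printed tuple of (ip, port, proto) tuples into Service records."""
    return [Service(*item) for item in ast.literal_eval(text)]

def describe(service):
    """Render a service as ip:port/protocol."""
    proto = PROTO_NAMES.get(service.proto, str(service.proto))
    return f"{service.ip}:{service.port}/{proto}"

nfs_group = parse_services(
    "(('128.111.43.46', 1172, 6), ('128.111.43.46', 1174, 6), "
    "('128.111.43.46', 2049, 6))"
)
for s in nfs_group:
    print(describe(s))
```

`ast.literal_eval` only evaluates Python literals, so it is a safe way to read such printed tuples back in without executing arbitrary code.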
Conclusions
- Work focused on Thrust II
- Leveraging service models to rank network assets and to build a foundation for impact analysis and what-if scenarios
- Active discovery of dependencies
  - introduced a novel flow watermarking scheme
  - multiple statistical tests to identify even small perturbations
- Simulations and experimental evaluation

Future Work
- Develop techniques and tools to extract asset information and latent service capabilities through active probing
  - useful to find services that are not actively contacted
  - identify service dependencies and causality with better confidence
- Leveraging dependencies for sophisticated what-if analysis
- Semantic analysis and labeling of network assets (what is a network proxy, NAT device, ...) based on network behaviors

Thank You