Intrusion Detection by Combining and Clustering Diverse Monitor Data
TSS/ACC Seminar, April 5, 2016
Atul Bohara and Uttam Thakore
PI: Bill Sanders
Outline
- Motivation
- Overview of the approach
- Feature extraction and selection
- Clustering
- Intrusion detection
- Results
- Future directions
Motivation
Monitoring in enterprise systems is extremely diverse and verbose.
Problems:
- High false positive rate and verbosity
- Limited ability to combine and analyze heterogeneous data together
- Significant input required from system experts
Image: http://blog.bro.org/22//monster-logs.html
Image: http://blog.wildpackets.com/28//28/simplify_analysis_-_packet-based_traffic_netflow_statistics_in_one_ui.html
Our Contributions
- We fuse data from the host-level and network-level context to perform anomaly detection
- We use unsupervised clustering to identify usage behavior patterns in the data and detect anomalous behavior
- We find attacks that are undetectable with individual monitors alone
Overview of Approach
Data Sources (System Logs, Firewall Logs) -> Feature Extraction -> Feature Selection & Fusion -> Cluster Analysis -> Intrusion Detection
[Pipeline diagram, stage: Data Sources (System Logs, Firewall Logs)]
Dataset Description
- VAST Challenge 2011, Mini Challenge 2 dataset [link]
- Small enterprise network
- Types of logs:
  - Network-level: firewall logs, Snort IDS logs
  - Host-level: operating system security event logs (system logs)
- Attacks were injected into the logs
Threat Model
- Network flooding attacks
  - Distributed Denial of Service (DDoS) from the Internet
  - Port scan from an external host
  - Port scan from workstations
- Behavior-changing malware
  - Worm installed on workstations
[Pipeline diagram, stage: Feature Extraction]
Feature Extraction
Four types of features:
- Identification: IP address and timestamp
- Network traffic-based: source/destination IP addresses and ports, TCP connections
- Service-based: connections to different types of servers, e.g., DNS, database, web
- Authentication-based: significant authentication events from system logs
Features are aggregated into one-minute time intervals.
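The one-minute aggregation can be sketched as follows. This is only an illustration, not the authors' implementation: the record layout (`ts`, `src_ip`, `dst_ip`, `src_port`), the timestamp format, and the feature names are assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime


def minute_bucket(ts: str) -> str:
    """Truncate a 'YYYY-MM-DD HH:MM:SS' timestamp to its one-minute interval."""
    return datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").strftime("%Y-%m-%d %H:%M")


def extract_firewall_features(records):
    """Aggregate raw firewall records into one feature row per (source IP, minute).

    Each record is a hypothetical dict with keys ts, src_ip, dst_ip, src_port.
    """
    groups = defaultdict(lambda: {"dst_ips": set(), "src_ports": set(), "connections": 0})
    for r in records:
        key = (r["src_ip"], minute_bucket(r["ts"]))  # identification features
        g = groups[key]
        g["dst_ips"].add(r["dst_ip"])
        g["src_ports"].add(r["src_port"])
        g["connections"] += 1
    return {
        key: {
            "n_unique_dst_ips": len(g["dst_ips"]),
            "n_unique_src_ports": len(g["src_ports"]),
            "n_connections": g["connections"],
        }
        for key, g in groups.items()
    }


# Made-up sample records (documentation IP ranges, arbitrary date).
records = [
    {"ts": "2000-01-01 08:00:05", "src_ip": "192.0.2.10", "dst_ip": "198.51.100.1", "src_port": 40001},
    {"ts": "2000-01-01 08:00:40", "src_ip": "192.0.2.10", "dst_ip": "198.51.100.2", "src_port": 40002},
    {"ts": "2000-01-01 08:00:59", "src_ip": "192.0.2.10", "dst_ip": "198.51.100.1", "src_port": 40002},
    {"ts": "2000-01-01 08:01:03", "src_ip": "192.0.2.10", "dst_ip": "198.51.100.1", "src_port": 40003},
]
features = extract_firewall_features(records)
```

Keying each row by (IP address, minute) keeps the identification features separate from the counts, which is what a later join across monitors would need.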
Extracted Features
Example system log features:
- IP address
- Timestamp
- # failed logon events from this host (4625)
- # special privileges assignment to new logon (4672)
- # target domain name = NT AUTHORITY
- # remote interactive logons (logon type = 10)
- # NTLM authentications/logons
- # distinct subject logon IDs
Example firewall log features:
- IP address
- Timestamp
- # of unique destination IPs
- # of unique source ports
- # of connections built
- # of accesses to DNS server IPs
- # of accesses to database IPs
IP address and timestamp are the identification features (shown in orange on the slide).
[Pipeline diagram, stage: Feature Selection & Fusion]
Feature Selection
Not all features are equal!
- Some are correlated
  - E.g., number of NTLM authentications and number of authentication attempts with host name starting with WS
- Some are not useful for clustering
  - E.g., number of successful logon events
- High dimensionality problem
Techniques for feature selection:
- Pearson correlation coefficient to remove strongly correlated features
- Comparison of normalized average feature values across clusters
[Figures: system log feature distributions; firewall log feature distributions]
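A minimal sketch of correlation-based pruning is shown below. The greedy keep-first strategy, the 0.9 threshold, and the feature names and values are assumptions for illustration; the slides only state that the Pearson correlation coefficient is used to remove strongly correlated features.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # constant column: correlation undefined, treated as uncorrelated
    return cov / (sx * sy)


def drop_correlated(columns, threshold=0.9):
    """Greedily keep a feature only if |r| with every already-kept feature is below threshold.

    threshold=0.9 is an assumed cut-off, not a value from the slides.
    """
    kept = []
    for name in columns:
        if all(abs(pearson(columns[name], columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Made-up per-interval feature values.
columns = {
    "n_ntlm_auth": [1, 4, 2, 8, 5],
    "n_ws_hostname_auth": [2, 8, 4, 16, 10],  # exactly twice the first column
    "n_failed_logons": [0, 1, 0, 0, 3],
}
selected = drop_correlated(columns)
```

Here the second column is a scaled copy of the first, so it is dropped, while the weakly correlated third column is kept.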
Extracted Features

System log:
| Feature type | Number of features |
|---|---|
| Identification | 2 |
| Service-based | 2 |
| Authentication-based | 32 |
| Total | 36 |
| Total after selection | 2 |

Firewall log:
| Feature type | Number of features |
|---|---|
| Identification | 2 |
| Network traffic-based | 6 |
| Service-based | 9 |
| Total | 17 |
| Total after selection | 2 |
Fusion
We fuse the logs using an inner join on the identification features.
[Figure: firewall feature vector and syslog feature vector combined into a fused feature vector; legend: identification, service-based, network traffic-based, authentication-based]
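The inner join on identification features can be sketched as below, assuming each monitor's features are stored as a dict keyed by (IP address, one-minute interval). The storage layout and the feature names are hypothetical.

```python
def fuse(firewall_rows, syslog_rows):
    """Inner join: keep only (ip, minute) keys that appear in both monitors.

    Assumes the two monitors use distinct feature names, so merging the
    per-key dicts does not overwrite any value.
    """
    fused = {}
    for key in firewall_rows.keys() & syslog_rows.keys():
        fused[key] = {**firewall_rows[key], **syslog_rows[key]}
    return fused


# Made-up feature rows keyed by (IP address, minute).
firewall_rows = {
    ("192.0.2.10", "2000-01-01 08:00"): {"n_connections": 3},
    ("192.0.2.11", "2000-01-01 08:00"): {"n_connections": 1},
}
syslog_rows = {
    ("192.0.2.10", "2000-01-01 08:00"): {"n_failed_logons": 2},
    ("192.0.2.12", "2000-01-01 08:00"): {"n_failed_logons": 0},
}
fused = fuse(firewall_rows, syslog_rows)
```

An inner join discards intervals seen by only one monitor, so every fused vector has a complete set of network and host features.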
[Pipeline diagram, stage: Cluster Analysis]
Clustering Techniques
Apply k-means and DBSCAN clustering algorithms.

| Algorithm | Type | Cluster shape | Noise handling | Parameter selection |
|---|---|---|---|---|
| k-means | Centroid-based | Spherical clusters | No | WCSD, silhouettes |
| DBSCAN | Density-based | Arbitrary-shaped clusters | Yes | k-dist graph |
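To make the DBSCAN row concrete, here is a small pure-Python sketch of the textbook algorithm together with the k-dist values used to pick `eps`. It is not the implementation used in the talk; the sample points, `eps`, and `min_pts` are made up, and `math.dist` requires Python 3.8 or later.

```python
from math import dist  # Euclidean distance, Python 3.8+

NOISE = -1


def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, NOISE for outliers."""
    labels = [None] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be re-labeled as a border point
            continue
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but does not expand
                continue
            if labels[j] is not None:
                continue  # already assigned to a cluster
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point: keep expanding
        cluster += 1
    return labels


def k_distances(points, k):
    """Distance from each point to its k-th nearest neighbor, sorted descending.

    Plotting these values gives the k-dist graph; a knee in the curve is a
    common heuristic for choosing eps.
    """
    out = []
    for i, p in enumerate(points):
        d = sorted(dist(p, q) for j, q in enumerate(points) if j != i)
        out.append(d[k - 1])
    return sorted(out, reverse=True)


# Two tight groups and one isolated point (made-up 2-D data).
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),
          (10.0, 0.0)]
kdist = k_distances(points, 2)
labels = dbscan(points, eps=0.5, min_pts=3)
```

In this toy data the isolated point has a much larger k-distance than the rest, which is why it ends up labeled as noise rather than forced into a cluster, as k-means would do.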
Cluster Analysis: DBSCAN Clustering on Firewall Logs
Cluster sizes:
- Outliers: 8
- Cluster 1: 4876
- Cluster 2: 825
- Cluster 3: 8
- Cluster 4: 39
- Cluster 5: 2
- Cluster 6: 53
- Cluster 7: 84
[Figure: clusters plotted on principal components PC1, PC2, PC3]
[Figure: normalized average feature values for Clusters 1, 2, 6, and 7]
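The normalized average feature values could be computed along these lines. The sketch assumes min-max normalization of each feature over the whole dataset followed by a per-cluster mean; the slides do not state which normalization was used, and the data is made up.

```python
def normalized_cluster_means(rows, labels):
    """Min-max scale each feature to [0, 1], then average it within each cluster.

    rows: list of equal-length numeric tuples; labels: one cluster label per row.
    """
    n_features = len(rows[0])
    lo = [min(r[f] for r in rows) for f in range(n_features)]
    hi = [max(r[f] for r in rows) for f in range(n_features)]

    def scale(v, f):
        # A constant feature carries no information; map it to 0.
        return 0.0 if hi[f] == lo[f] else (v - lo[f]) / (hi[f] - lo[f])

    sums, counts = {}, {}
    for row, label in zip(rows, labels):
        acc = sums.setdefault(label, [0.0] * n_features)
        for f, v in enumerate(row):
            acc[f] += scale(v, f)
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


# Made-up data: cluster 0 has low values, cluster 1 has high values.
rows = [(0, 0), (2, 0), (10, 4), (10, 4)]
row_labels = [0, 0, 1, 1]
means = normalized_cluster_means(rows, row_labels)
```

Putting every feature on the same [0, 1] scale makes the per-cluster bars comparable, so a cluster whose average stands out on a few features is easy to spot.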
Cluster Analysis: DBSCAN Clustering on Firewall + System Logs
Cluster sizes:
- Outliers: 8
- Cluster 1: 25342
- Cluster 2: 54
- Cluster 3: 37
- Cluster 4: 23
[Figure: clusters plotted on principal components PC1, PC2, PC3]
[Figure: normalized average feature values for Clusters 1, 2, 3, and 4]
[Pipeline diagram, stage: Intrusion Detection]
Intrusion Detection Approach
- More than 8% of data points are captured within 3 clusters
- These clusters contain more than 5% of hosts and have high probability mass at low values
Our approach: examine the size and the distribution of hosts for each cluster
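One way to examine cluster size and host distribution is sketched below. The share threshold and the rule "small clusters are candidates for inspection" are assumptions made for this example; the slides do not give the exact decision rule.

```python
from collections import defaultdict


def summarize_clusters(labels, hosts):
    """Per cluster: share of all data points and number of distinct hosts."""
    total = len(labels)
    members = defaultdict(list)
    for label, host in zip(labels, hosts):
        members[label].append(host)
    return {
        label: {"share": len(h) / total, "n_hosts": len(set(h))}
        for label, h in members.items()
    }


def candidate_anomalies(summary, max_share=0.2):
    """Clusters holding less than max_share of the data (hypothetical threshold)."""
    return sorted(label for label, s in summary.items() if s["share"] < max_share)


# Made-up assignment: nine points from three hosts in cluster 0, one point in cluster 1.
point_labels = [0] * 9 + [1]
point_hosts = ["a", "b", "c"] * 3 + ["d"]
summary = summarize_clusters(point_labels, point_hosts)
suspects = candidate_anomalies(summary)
```

A cluster that is both small and confined to a few hosts is a natural starting point for closer inspection of its feature values.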
Intrusion Detection Approach (contd.)
Clusters -> Normal or Anomalous
Diagram labels: Feature Distributions, Distances, Normalcy
[Figure: per-cluster feature distribution panels]
Intrusion Detection Results: Firewall Logs
Anomalous clusters: 6, 5, 3, 4, 7
- Cluster 6: DoS by external hosts
- Cluster 5: Port scan by internal hosts
- Clusters 3, 4, 7: Anomalous but not malicious
[Figure: PC1/PC2/PC3 cluster plot and average feature value panels]
Intrusion Detection Results: Firewall + System Logs
Anomalous clusters: 2, 4, 3
- Cluster 2: Worm-infected host
- Cluster 4: Port scan by internal hosts
- Cluster 3: Anomalous but not malicious
[Figure: PC1/PC2/PC3 cluster plot and average feature value panels for Clusters 1, 2, 3, and 4]
Intrusion Detection Summary

| Data | Cluster ID | % Data points | No. of unique hosts | Represented attack | Significant features |
|---|---|---|---|---|---|
| Firewall | 6 | .72 | 5 | DoS | # of unique source ports, # of connections built, # of connections torn down |
| Firewall | 5 | .65 | 3 | Port scan | # of unique destination IPs |
| Firewall + system log | 4 | .9 | 2 | Port scan | # of connections built, # of connections torn down |
| Firewall + system log | 2 | | | Worm | # anonymous target user names, # NTLM authentications, # session keys requested |
Conclusion
- Intrusion detection using clustering techniques
  - Without labelling the data
  - Without an explicit profile for normal behavior
- Generic time-aware features to detect malicious behavior
  - Can be used for other attack types, e.g., brute-force attacks and data exfiltration
- Allows data fusion across monitors
  - Additional visibility into the system behavior
- Average feature value analysis
  - More holistic view
  - Data reduction
Discussion
- Works well for attacks that change system behavior, including zero-days
- Complementary to rule-based intrusion detection approaches
- Might not work properly for attacks that do not change the outward behavior of hosts, such as privilege escalation
  - However, a better choice of features might change this for some attacks
Future Directions
- Attack classes and features
  - Classify security attacks and the respective features to detect them
  - Data-driven feature selection
- Clustering algorithm choice
  - Hierarchical clustering
  - Distribution-based clustering
- Online classification
  - Online clustering
  - Train a classifier using cluster labels
Questions?
Thank you!