Polygraph: Automatically Generating Signatures for Polymorphic Worms

Similar documents
Misleading Worm Signature Generators Using Deliberate Noise Injection

SSL Automated Signatures

Limits of Learning-based Signature Generation with Adversaries

Survey of Polymorphic Worm Signatures. Mesra, Ranchi, India. Mesra, Ranchi, India. Abstract

A Novel Approach to Detect and Prevent Known and Unknown Attacks in Local Area Network

Automated Signature Generation: Overview and the NoAH Approach. Bernhard Tellenbach

Evading Network Anomaly Detection Sytems - Fogla,Lee. Divya Muthukumaran

Polymorphic Blending Attacks. Slides by Jelena Mirkovic

Hamsa : Fast Signature Generation for Zero-day Polymorphic Worms with Provable Attack Resilience

D1.2: Attack Detection and Signature Generation

An Efficient Signature-Based Approach for Automatic Detection of Internet Worms over Large-Scale Networks

Network Intrusion Detection with Semantics-Aware Capability

Activating Intrusion Prevention Service

Network Intrusion Detection with Semantics-Aware Capability W. Scheirer, M. Chuah {wjs3, Department of Computer Science and

Generating Simplified Regular Expression Signatures for Polymorphic Worms

A Modular Approach for Implementation of Honeypots in Cyber Security

Feature Omission Vulnerabilities: Thwarting Signature Generation for Polymorphic Worms

Different attack manifestations Network packets OS calls Audit records Application logs Different types of intrusion detection Host vs network IT

Project Proposal. ECE 526 Spring Modified Data Structure of Aho-Corasick. Benfano Soewito, Ed Flanigan and John Pangrazio

PolyS: Network-based Signature Generation for Zero-day Polymorphic Worms

COMPUTER worms are serious threats to the Internet,

CSE 565 Computer Security Fall 2018

Binary Protector: Intrusion Detection in Multitier Web Applications

Network-based and Attack-resilient Length Signature Generation for Zero-day Polymorphic Worms

Syntax vs. Semantics: Competing Approaches to Dynamic Network Intrusion Detection

Network Traffic Classification by Common Subsequence Finding

ShieldGen: Automatic Data Patch Generation for Unknown Vulnerabilities with Informed Probing

Defending against Polymorphic Attacks: Recent Results and Open Questions

Polymorphic Worm Detection Using Structural Information of Executables

"GET /cgi-bin/purchase?itemid=109agfe111;ypcat%20passwd mail 200

An Architecture for Generating Semantic-Aware Signatures

Lecture 12 Malware Defenses. Stephen Checkoway University of Illinois at Chicago CS 487 Fall 2017 Slides based on Bailey s ECE 422

Intrusion Detection System

(WHASG) Automatic SNORT Signatures Generation by using Honeypot

Fast and Effective Worm Fingerprinting via Machine Learning

Towards Automatic Generation of Vulnerability- Based Signatures

2. INTRUDER DETECTION SYSTEMS

Basic Concepts in Intrusion Detection

Project Proposal. ECE 526 Spring Modified Data Structure of Aho-Corasick. Benfano Soewito, Ed Flanigan and John Pangrazio

CIS 551 / TCOM 401 Computer and Network Security. Spring 2007 Lecture 12

Intrusion Detection by Combining and Clustering Diverse Monitor Data

Advanced Allergy Attacks: Does a Corpus Really Help?

Comparison of Firewall, Intrusion Prevention and Antivirus Technologies

Intrusion Detection System (IDS) IT443 Network Security Administration Slides courtesy of Bo Sheng

Mapping Internet Sensors with Probe Response Attacks

Traffic Classification Using Visual Motifs: An Empirical Evaluation

Network Security Terms. Based on slides from gursimrandhillon.files.wordpress.com

Enhancing Byte-Level Network Intrusion Detection Signatures with Context

CUTE: traffic Classification Using TErms

Firewalls, Tunnels, and Network Intrusion Detection

Mapping Internet Sensors with Probe Response Attacks

ACS / Computer Security And Privacy. Fall 2018 Mid-Term Review

Developing the Sensor Capability in Cyber Security

McPAD and HMM-Web: two different approaches for the detection of attacks against Web applications

Intrusion Detection. Comp Sci 3600 Security. Introduction. Analysis. Host-based. Network-based. Distributed or hybrid. ID data standards.

Efficient Content-Based Detection of Zero-Day Worms

Encrypted mal-ware detection

White Paper. New Gateway Anti-Malware Technology Sets the Bar for Web Threat Protection

Intrusion Detection System using AI and Machine Learning Algorithm

Measuring Intrusion Detection Capability: An Information- Theoretic Approach

Symantec Ransomware Protection

Anagram: A Content Anomaly Detector Resistant to Mimicry Attack 1

Intrusion Detection Systems and Network Security

Polymorphic Blending Attacks

Revealing Skype Traffic: When Randomness Plays with You

OSSIM Fast Guide

CIS 551 / TCOM 401 Computer and Network Security. Spring 2006 Lecture 22

CSE543 - Computer and Network Security Module: Intrusion Detection

CSE543 - Computer and Network Security Module: Intrusion Detection

VisFlowConnect-IP: An Animated Link Analysis Tool For Visualizing Netflows

Anomaly Detection of Network Traffic Based on Analytical Discrete Wavelet Transform. Author : Marius SALAGEAN, Ioana FIROIU 10 JUNE /06/10

Survey of Cyber Moving Targets. Presented By Sharani Sankaran

Artificial Immune System against Viral Attack

Intrusion Detection and Malware Analysis

WormShield: Fast Worm Signature Generation with Distributed Fingerprint Aggregation

Computer Security: Principles and Practice

CIH

SentinelOne Technical Brief

Autograph: Toward Automated, Distributed Worm Signature Detection

Intrusion Detection System

ERT Threat Alert New Risks Revealed by Mirai Botnet November 2, 2016

Practical Techniques for Regeneration and Immunization of COTS Applications

Self-Learning Systems for Network Intrusion Detection

UMSSIA INTRUSION DETECTION

CompTIA E2C Security+ (2008 Edition) Exam Exam.

OpenFire: Opening Networks to Reduce Network Attacks on Legitimate Services

Intrusion Detection Systems

Tool Manual (Version I)

Actual4Test. Actual4test - actual test exam dumps-pass for IT exams

HYBRID HONEYPOT -SYSTEM FOR PRESERVING PRIVACY IN NETWORKS

IDS: Signature Detection

Emerging Threat Intelligence using IDS/IPS. Chris Arman Kiloyan

Mahalanobis Distance Map Approach for Anomaly Detection

ACS-3921/ Computer Security And Privacy. Chapter 9 Firewalls and Intrusion Prevention Systems

Gladiator Incident Alert

Prospector: accurate analysis of heap and stack overflows by means of age stamps

CSC 574 Computer and Network Security. Firewalls and IDS

Collaborative Intrusion Detection System : A Framework for Accurate and Efficient IDS. Outline

CE Advanced Network Security

Network Anomaly Detection Using Autonomous System Flow Aggregates

Transcription:

Polygraph: Automatically Generating Signatures for Polymorphic Worms James Newsome Brad Karp Dawn Song Presented by: Jeffrey Kirby

Overview Motivation Polygraph Signature Generation Algorithm Evaluation Attack Analysis Related Work Conclusion 11/11/10 2

Motivation Worms A computer worm is a software program that is designed to copy itself from one computer to another, without human interaction. Unlike a computer virus, a worm can copy itself automatically. [2] Some popular examples: ILOVEYOU (2000) Code Red (2001) Sasser (2004) Conficker (2008) Stuxnet (2010) 11/11/10 3

Motivation Mitigation Strategies Better, more secure software Alter source code (commercial software?) Defend against every threat Security updates, patches After-the-fact Intrusion Detection Systems (IDSs) Prevent the exploit from accessing the system 11/11/10 4

Motivation Intrusion Detection Systems (IDSs) Monitors network for suspicious activity or traffic May send an alert when such activity or traffic is found May block traffic from source IP address May be network-based (NIDS) or host-based (HIDS) May be signature-based (SBS) or anomaly-based (ABS) 11/11/10 5

Motivation Signature-based Intrusion Detection Systems (SBSs) Compares flows on the network against a database of signatures corresponding to known attacks A signature may be some portion of a worm's payload which is long enough to correctly match the worm without causing false positives Signatures are created by security experts who study network traces after an attack has taken place (hours or days) IDS only as good as the signatures used for matching 11/11/10 6

Motivation Signature-based Intrusion Detection Systems (SBSs) In order to react more quickly to attacks, automated signature generation has been developed Pattern-based Honeycomb Autograph Earlybird Polygraph Semantic-based TaintCheck motif-finding 11/11/10 7

Motivation Signature-based Intrusion Detection Systems (SBSs) Whether manual or automated signature generation, both assume there is a unique contiguous substring which will remain invariant across worm connections and is long enough to be specific But, a worm may be polymorphic, meaning that its content can change among different instances while still having the same effect In this case, a single contiguous substring will not match all variants of a given worm Polymorphic libraries are readily available 11/11/10 8

Polygraph Three classes of bytes 1. Invariant bytes must remain the same in order for the exploit to function properly (e.g. frames / overwrite values) 2. Code bytes encrypted and/or obfuscated code which will be executed by the worm 1. Wildcard bytes values do not affect exploit functionality 11/11/10 9

Polygraph Three Signature Classes for Polymorphic Worms 1. Conjunction signatures matches if all tokens in the set are found in the payload 2. Token-subsequence signatures similar to conjunction but with order constraint therefore, more specific than conjunction 3. Bayes signatures tokens present in the flow used to calculate a probability that the flow is a worm considered a worm if the score is above some threshold 11/11/10 10

Polygraph Signature Generation 1. Preprocessing This process separates the suspicious flows into tokens A token is a contiguous byte sequence 2. Single Signature Generation Conjunction Token-subsequence Bayes 3. Multiple Signature Generation 11/11/10 11

Polygraph Preprocessing extract all distinct substrings with minimum length α which occur in at least K out of the total n samples distinct means that a substring should not be considered a token if it is a substring of another token, unless it is found in at least K out of n samples not as a substring of the larger token Example: if HTTP is a token (occurs in at least K out of n samples), TTP is only considered a token if it is found in at least K out of n samples not as a substring of HTTP 11/11/10 12

Polygraph Preprocessing (cont'd.) uses a modification of longest substring algorithm to return a set of substrings which include all distinct substrings non-distinct substrings may be pruned to discard irrelevant, nontoken data from this point on, each flow is represented as a sequence of tokens only this process runs in linear time with respect to the number of samples 11/11/10 13

Polygraph Single Signature Generation (Conjunction) Conjunction signature is an unordered set of tokens This is trivially found by taking the union of all tokens from each sample in the flow pool 11/11/10 14

Polygraph Single Signature Generation (Token-subsequence) Token-subsequence is an ordered list of tokens Need to find a token-subsequence which is present in every sample in the flow pool These token subsequences may be represented as regular expressions Example: Comparing two strings o x n x e x z x t w o x y t w o y o y n y e y z Which is better?.*o.*n.*e.*z.*.*two.* 11/11/10 15

Polygraph Single Signature Generation (Token-subsequence) Use Smith-Waterman algorithm to find best subsequence (best meaning that it has a smaller chance of producing false positives) Each subsequence is given a score: 1 for a match Subtract W g for every gap (.*) not including leading and trailing gaps (W g = 0.8) Higher score means better subsequence.*o.*n.*e.*z.* = 4-3(0.8) = 1.6.*two.* = 3-0(0.8) = 3 The algorithm is applied iteratively to the entire pool of samples 11/11/10 16

Polygraph Single Signature Generation (Bayes) Bayes provides a probabilistic matching Given probability distributions for sets of tokens corresponding to worm flows and innocuous flows, a sample flow can be classified as either a worm or innocuous by calculating which distribution its token set is more likely to have been generated by. Worms P(W S) = x Sample flow Innocuous P(I S) = y 11/11/10 17

Polygraph Multiple Signature Generation Need to generate set of signatures given a suspicious flow pool which may contain multiple worms as well as innocuous flows (noise) The Bayes algorithm needs no modification for this situation Conjunction and token subsequence require clustering Clustering divides the suspicious pool into clusters, each of which contains similar flows A signature is generated for each cluster 11/11/10 18

Polygraph Multiple Signature Generation (Clustering) Cluster quality: Should not be too general (high false positive rate) Should not be too specific (high false negative rate) 11/11/10 19

Polygraph Multiple Signature Generation (Clustering) Hierarchical Clustering method: Start with s flows and s clusters, each with its own (very specific) signature Iteratively merge clusters and generate new (less specific) signatures for each cluster Choose the two clusters to merge by calculating which pair gives the lowest false positive rate when compared against the innocuous pool Merging stops when only one cluster remains OR no two clusters may be merged without resulting in a signature which gives an unacceptably high false positive rate The signatures from each cluster(with enough samples) form the set of signatures 11/11/10 20

Evaluation Experimental Setup Evaluate each signature generation algorithm where the suspicious flow pool contains: 1. One worm / No innocuous flows 2. One worm / Some innocuous flows 3. Multiple worms / Some innocuous flows Token-extraction threshold K = 3 Minimum token length α = 2 Minimum cluster size is 3 5 independent trials for each experiment 11/11/10 21

Evaluation Experimental Setup Generate signatures for polymorphic versions of three real-world exploits: 1. Apache-Knacker (HTTP) 2. ATPhttpd (HTTP) 3. BIND-TSIG (DNS) Simulate ideal polymorphic engine by filling wildcard and code bytes with randomly chosen data 11/11/10 22

Evaluation Experimental Setup HTTP experiments 5-day network trace (45,111 flows) as innocuous pool 10-day trace (125,301) as evaluation trace Evaluation trace used to measure false positive rate of signatures Also, noise was drawn at random from the evaluation trace DNS experiments 24 hour DNS trace First 500,000 flows as innocuous Last 1,000,000 flows as evaluation trace 11/11/10 23

Evaluation Experiment 1. One worm / No innocuous flows 11/11/10 24

Evaluation Experiment 1. One worm / No innocuous flows 11/11/10 25

Evaluation Experiment 1. One worm / No innocuous flows Experiment Summary: Conjunction and token subsequence exhibit significantly fewer false positives when compared to a single substring (longest / best) Token subsequence has lower false positive rate than conjunction due to ordering property Correct signature generation requires suspicious pool size to be greater than 2 (otherwise signatures are too specific) 11/11/10 26

Evaluation Experiment 2. One worm / Some innocuous flows 11/11/10 27

Evaluation Experiment 2. One worm / Some innocuous flows Experiment Summary: Polygraph generates a signature for the 5 worm flows which are correct and no signatures are generated for the innocuous traffic Bayes performs well until noise reaches 80% of suspicious flow pool (may be adjusted for high noise ratios) With HTTP exploits, when sufficient noise is present signatures are created for clusters of some of the innocuous flows (noise which is significantly different from that in the innocuous pool) 11/11/10 28

Evaluation Experiment 3. Multiple Worms / Some Innocuous Flows 11/11/10 29

Evaluation Experiment 3. Multiple Worms / Some Innocuous Flows Experiment Summary: Similar to Experiment 2 Conjunction and token subsequence signatures were generated for each of the polymorphic worms Bayes generated a single signature for both worms No false positives until noise ratio is high 11/11/10 30

Attack Analysis Polygraph-specific attacks Coincidental-pattern attack: Attacker's polymorphic engine chooses wildcard bytes from a smaller distribution This causes coincidental substrings to be present which do not occur in every sample of the worm 11/11/10 31

Attack Analysis Polygraph-specific attacks Red herring attack: Worm initially uses fixed tokens as wildcard bytes After some time, this value is changed causing previously generated signatures to no longer match Bayes is resilient to this type of attack Innocuous pool poisoning attack: Attacker creates innocuous flows which the signature for a polymorphic worm will match This will cause the signature to cause a high false positive rate If using clustering, the signature may not be generated to match this worm Otherwise, polygraph may decide that the signature is too general Mitigate by using older innocuous data or distributed innocuous pools 11/11/10 32

Attack Analysis Polygraph-specific attacks Long-tail attack: Because networks flows are only revealed one packet at a time, it may be possible for a network exploit to occur before the signature can be matched Especially for token subsequence, because of its order requirement, this attack may be mitigated by using only enough of the signature to ensure a low false positive rate Also, buffering the streams until it can be determined whether the flow is a worm or not would fix this problem 11/11/10 33

Related Work Signature-based Intrusion Detection Systems (SBSs) Pattern-based Honeycomb Autograph Earlybird Polygraph Semantic-based TaintCheck motif-finding (from computational biology) 11/11/10 34

Related Work Signature-based Intrusion Detection Systems (SBSs) Honeycomb uses longest common substring (LCS) algorithm generates signatures from honeypot traffic (i.e. malicious) Source: [4] 11/11/10 35

Related Work Signature-based Intrusion Detection Systems (SBSs) TaintCheck (semantic-based) runs programs to be monitored on an emulator monitors data from the attack payload at the processorinstruction level detects format string attacks detects overwrite attacks works without source code of monitored program (e.g. commercial software) may be used to verify the quality of other automated signatures 11/11/10 36

Conclusion Polygraph generates quality signatures automatically Three classes of signatures Conjunction Token subsequence Bayes Which should be used? All have advantages and disadvantages No one algorithm is superior for every worm Best approach is to use all three and use the best signature (fewest false positives and false negatives) 11/11/10 37

References [1] J. Newsome, B. Karp, D. Song. Polygraph: Automatically Generating Signatures for Polymorphic Worms. In 2005 IEEE Symposium on Security and Privacy. 23 May 2005. http://www.ece.cmu.edu/~dawnsong/papers/polygraph.pdf [2] Microsoft Security. What is a computer worm? http://www.microsoft.com/security/worms/whatis.aspx [3] Tony Bradley. Introduction to Intrusion Detection Systems (IDS). http://netsecurity.about.com/cs/hackertools/a/aa030504.htm [4] C. Kreibich, J. Crowcroft. Honeycomb: Creating Intrusion Detection Signatures Using Honeypots. http://www.icir.org/christian/publications/honeycomb-hotnetsii.pdf [5] J. Newsome, D. Song. Dynamic Taint Analysis for Automatic Detection, Analysis, and Signature Generation of Exploits on Commodity Software. http://valgrind.org/docs/newsome2005.pdf 11/11/10 38

Questions? 11/11/10 39

!!!Beat Florida!!! 11/11/10 40