A multi-step attack-correlation method with privacy protection

Similar documents
Developing the Sensor Capability in Cyber Security

Lecture Topic Projects 1 Intro, schedule, and logistics 2 Data Science components and tasks 3 Data types Project #1 out 4 Introduction to R,

Data Mining Part 3. Associations Rules

Mining Temporal Association Rules in Network Traffic Data

Data Structures. Notes for Lecture 14 Techniques of Data Mining By Samaher Hussein Ali Association Rules: Basic Concepts and Application

A Flexible Approach to Intrusion Alert Anonymization and Correlation

Integration of information security and network data mining technology in the era of big data

Research and Improvement of Apriori Algorithm Based on Hadoop

A New Technique to Optimize User s Browsing Session using Data Mining

Hybrid Feature Selection for Modeling Intrusion Detection Systems

Open Access Apriori Algorithm Research Based on Map-Reduce in Cloud Computing Environments

EFFICIENT TRANSACTION REDUCTION IN ACTIONABLE PATTERN MINING FOR HIGH VOLUMINOUS DATASETS BASED ON BITMAP AND CLASS LABELS

SEQUENTIAL PATTERN MINING FROM WEB LOG DATA

Pattern Discovery Using Apriori and Ch-Search Algorithm

An Approach for Privacy Preserving in Association Rule Mining Using Data Restriction

Web Service Usage Mining: Mining For Executable Sequences

Graph Theory for Modelling a Survey Questionnaire Pierpaolo Massoli, ISTAT via Adolfo Ravà 150, Roma, Italy

Managed Security Services - Automated Analysis, Threat Analyst Monitoring and Notification

Pattern Mining in Frequent Dynamic Subgraphs

/08/$ IEEE 630

Research on Applications of Data Mining in Electronic Commerce. Xiuping YANG 1, a

A Rule-Based Intrusion Alert Correlation System for Integrated Security Management *

Performance Based Study of Association Rule Algorithms On Voter DB

PSON: A Parallelized SON Algorithm with MapReduce for Mining Frequent Sets

A Framework for Securing Databases from Intrusion Threats

CHAPTER-5 A HASH BASED APPROACH FOR FREQUENT PATTERN MINING TO IDENTIFY SUSPICIOUS FINANCIAL PATTERNS

CS 5540 Spring 2013 Assignment 3, v1.0 Due: Apr. 24th 11:59PM

Tutorial on Association Rule Mining

Design and Simulation Implementation of an Improved PPM Approach

UAPRIORI: AN ALGORITHM FOR FINDING SEQUENTIAL PATTERNS IN PROBABILISTIC DATA

A Criminal Intrudes into a Bank in Geneva Korean agents. Canadian agents make the arrest. Argentinian investigators. discover. attack came from Seoul

Minimal Test Cost Feature Selection with Positive Region Constraint

A Technical Analysis of Market Basket by using Association Rule Mining and Apriori Algorithm

Remotely Sensed Image Processing Service Automatic Composition

Generating Cross level Rules: An automated approach

An advanced data leakage detection system analyzing relations between data leak activity

Research on outlier intrusion detection technologybased on data mining

Extracting Attack Scenarios Using Intrusion Semantics

Data Mining. 3.3 Rule-Based Classification. Fall Instructor: Dr. Masoud Yaghini. Rule-Based Classification

Towards Traffic Anomaly Detection via Reinforcement Learning and Data Flow

CMPUT 391 Database Management Systems. Data Mining. Textbook: Chapter (without 17.10)

On Reduct Construction Algorithms

CHAPTER 3 ASSOCIATION RULE MINING WITH LEVELWISE AUTOMATIC SUPPORT THRESHOLDS

Block-Based Connected-Component Labeling Algorithm Using Binary Decision Trees

Analytical Techniques for Anomaly Detection Through Features, Signal-Noise Separation and Partial-Value Association

Research and Application of E-Commerce Recommendation System Based on Association Rules Algorithm

Multivariate Correlation Analysis based detection of DOS with Tracebacking

An Improved Apriori Algorithm for Association Rules

Introduction to Security

D-Optimal Designs. Chapter 888. Introduction. D-Optimal Design Overview

Improved Frequent Pattern Mining Algorithm with Indexing

Efficiently Mining Positive Correlation Rules

IMPLEMENTATION AND COMPARATIVE STUDY OF IMPROVED APRIORI ALGORITHM FOR ASSOCIATION PATTERN MINING

APRIORI ALGORITHM FOR MINING FREQUENT ITEMSETS A REVIEW

The Comparative Study of Machine Learning Algorithms in Text Data Classification*

A Modified Apriori Algorithm

A Pandect on Association Rule Hiding Techniques

CE Advanced Network Security

Keywords: Parallel Algorithm; Sequence; Data Mining; Frequent Pattern; sequential Pattern; bitmap presented. I. INTRODUCTION

1 Linear programming relaxation

CYBER ANALYTICS. Architecture Overview. Technical Brief. May 2016 novetta.com 2016, Novetta

A Survey of Sequential Pattern Mining

Mining Quantitative Association Rules on Overlapped Intervals

Mapping Internet Sensors with Probe Response Attacks

4.1 Interval Scheduling

Data Mining Techniques

The Transpose Technique to Reduce Number of Transactions of Apriori Algorithm

Chapter 6: Basic Concepts: Association Rules. Basic Concepts: Frequent Patterns. (absolute) support, or, support. (relative) support, s, is the

Improved Apriori Algorithms- A Survey

TKS: Efficient Mining of Top-K Sequential Patterns

Security and Compliance Powered by the Cloud. Ben Friedman / Strategic Accounts Director /

Chapter 4: Mining Frequent Patterns, Associations and Correlations

A Modified Apriori Algorithm for Fast and Accurate Generation of Frequent Item Sets

Source Anonymous Message Authentication and Source Privacy using ECC in Wireless Sensor Network

Mapping Internet Sensors with Probe Response Attacks

Pattern Mining. Knowledge Discovery and Data Mining 1. Roman Kern KTI, TU Graz. Roman Kern (KTI, TU Graz) Pattern Mining / 42

ALIENVAULT USM FOR AWS SOLUTION GUIDE

ADAPTING QUERY OPTIMIZATION TECHNIQUES FOR EFFICIENT ALERT CORRELATION*

AN ADAPTIVE PATTERN GENERATION IN SEQUENTIAL CLASSIFICATION USING FIREFLY ALGORITHM

NON-CENTRALIZED DISTINCT L-DIVERSITY

Web Page Classification using FP Growth Algorithm Akansha Garg,Computer Science Department Swami Vivekanad Subharti University,Meerut, India

620 HUANG Liusheng, CHEN Huaping et al. Vol.15 this itemset. Itemsets that have minimum support (minsup) are called large itemsets, and all the others

Definition, Detection, and Evaluation of Meeting Events in Airport Surveillance Videos

Chapter 4: Association analysis:

Value Added Association Rules

What Is Data Mining? CMPT 354: Database I -- Data Mining 2

Symantec Endpoint Protection Integration Component User's Guide. Version 7.0

Hillstone T-Series Intelligent Next-Generation Firewall Whitepaper: Abnormal Behavior Analysis

Data Access Paths for Frequent Itemsets Discovery

Bayesian Learning Networks Approach to Cybercrime Detection

CSE 565 Computer Security Fall 2018

An Approximate Approach for Mining Recently Frequent Itemsets from Data Streams *

FP-Growth algorithm in Data Compression frequent patterns

Data Mining Part 5. Prediction

USING FREQUENT PATTERN MINING ALGORITHMS IN TEXT ANALYSIS

Intrusion Detection System

IBM Proventia Management SiteProtector Policies and Responses Configuration Guide

Association Rules. Berlin Chen References:

Temporal Weighted Association Rule Mining for Classification

CHAPTER V ADAPTIVE ASSOCIATION RULE MINING ALGORITHM. Please purchase PDF Split-Merge on to remove this watermark.

Transcription:

A multi-step attack-correlation method with privacy protection Research paper A multi-step attack-correlation method with privacy protection ZHANG Yongtang 1, 2, LUO Xianlu 1, LUO Haibo 1 1. Department of Computer Science and Technology, Guangdong Neusoft Institute, Foshan 528225, China 2. Jiangxi Microsoft Technology Center, Nanchang 330003, China Abstract: In the era of global Internet security threats, there is an urgent need for different organizations to cooperate and jointly fight against cyber attacks. We present an algorithm that combines a privacy-preserving technique and a multi-step attack-correlation method to better balance the privacy and availability of alarm data. This algorithm is used to construct multistep attack scenarios by discovering sequential attack-behavior patterns. It analyzes the time-sequential characteristics of attack behaviors and implements a support-evaluation method. Optimized candidate attack-sequence generation is applied to solve k-anonymity method is applied to this algorithm to preserve privacy. Experimental results indicate that the algorithm has better performance Key words: Citation 1 Introduction In recent years, global security threats, e.g., the Code, My Doom This creates an urgent need for mutual cooperation and share security-incident alarm data for security privacy protection, it is easy to be the attacker, lead to of an individual or organization, excluding public privacy preserves sensitive information that the including characteristics of the sensitive data and the data representation. Normally, privacy refers to records, etc. At present, most of the multi-step attack-correlation methods are carried out based on original police organizations fail to provide alarm data because of privacy protection, the multi-step attack alarm-

134 Journal of Communications and Information Networks data association results may be affected because of the lack of accurate information. Therefore, effectively associating privacy-protection technology requirements. This significantly affects research, Since 2000, research on the multi-step attack correlation analysis of alarm data has gradually increased, and achieved a number of results. Mange F proposed proposed consequences to construct attack scenarios. Debar H, et al. used similar methods in the describe an attack, and automatically generated association rules according to the description of the prerequisites and consequences. Through the association rules, they discovered a relationship This type of method uses the causal relationship of is very complex, the design requirement is very high, to find the cause-and-effect relationship, there is a defect. In 2009, Wu B attack-scene fragments, and then connect the attack this method relies heavily on domain and expert summarized the of multi-step attack-correlation methods. He proposed attack sequence, he extracts the largest sequential attack-behavior pattern. In recent years, the multi-step attack the correlation research in the field of intrusion detection and prevention research is a hot topic. Ringer H A to study the alarm data, and found the alarm data sharing may encounter attack. He suggested to use a hash function or anonymous hash function of sensitive alarm properties, to a certain extent, the protection of alarm-attribute value, this method can easily be abused. proposed using a hierarchy to anonymize sensitive attribute values to protect the privacy of sensitive data. This method introduces the concept of discrete generation measure. By modifying the similarity function, it redefines the relationship, based on the attack probability, and applies privacy protection methods to the alarm data after the alarm correlation. the original similarity function should be modified for different sensitivity properties. If the original similarity function of a sensitive attribute is more complex, then modifying the attribute-similarity redefining the relationship based on the attack increasing the number of false alarms in the attack scene graph. multi-step attack methods has a long history in its specific application, choosing appropriate privacy- information, and providing good information support

A multi-step attack-correlation method with privacy protection 135 for information sharing, have become important research topics for many scholars and experts. algorithm is proposed in this paper to discover sequential attack-behavior patterns in a protected alert set. By attempting to determine the sequences for global multi-step attacks, this method can be take proper actions to efficiently decrease the impact of such intrusions. With an affordable information in the alert set. 2 PPMAC briefing illustrated in Fig.1. engine, a secure third-party multi-step attack correlation the function units of the privacy-preserving agent and monitor. domains are stored in the intrusion event database, The data cleaner is used to reduce the number and enhance the quality of these alerts. After sanitizing the refined data received from the data cleaner, the privacy-preserving agent submits the data to the Finally, the monitor receives the correlation results of generates a global sequence of attack behaviors are mapped onto the attack-signature identifiers of PPMAC engine domain 1 privacy-preserving agent domain 2 domain n protected alerts for multi-step attack correlation global attack sequence generator intrusion event DB 1 intrusion event DB 2 intrusion event DB n load raw alerts load raw alerts load raw alerts candidate attack sequence DB data cleaner 1 data cleaner 2 monitor data cleaner n sequential patterns of multi-step attacks as feedback maximal attack behavior set handler maximal attack sequence miner Figure 1

136 Journal of Communications and Information Networks Sequence DB stores the selected candidate attack Maximal Attack-Behavior-Set Handler produces all Attack-Sequence Miner produces multi-step attack scenarios from the protected alert set based on attackbehavior sequence analysis. The mining results can be to defend against complex intrusions. Considering the different sensitivity levels of various attributes in the alert, this paper improves the k-anonymity method, a classical privacy- as much as possible during the process of attribute anonymization. 3 PPMAC algorithms The method of mining sequential attack-behavior patterns is based on the concept that the various multi- logically perform a certain action before carrying explicitly illustrates the general character of multi-step attacks. Consequently, sequential-alert pattern analysis can be used to correlate various suspicious activities attacks. Definition 1 Attribute vector: The attribute vector of each intrusion alert is defined as: {date, start- al = {at 1,, at 8 }. Alert set: These are triggered by IDSs are detected. Let AL number of attribute vectors al. Definition 3 k-anonymity: Let sd = {sd 1,,sd i }, 0<i 8 be a vector of selected attributes from al are the sensitive items to be protected, sd al. SD is the set of sd, as a subset extracted from AL, i.e., SD AL. AL satisfies k-anonymity if and only if each vector in SD k occurrences. Step 1 improved k-anonymity method. One of the major privacy-preserving methods generally uses a quasi-identifier to support judging k-anonymity properties are met. When preserving privacy in intrusion alerts, the sensitive The k-anonymity method is applied to selected sensitive attributes in SD remain unchanged. The discrete values of sensitive alert attributes in AL are replaced by their generalized security policy. For example, the original destination- mask. Uncertainty is introduced into the alert set, but the alert regulations are kept for mining. The generalized alert set AL G, as the output of the engine for multi-step attack-correlation analysis. Multi-step attack-correlation algorithm. This step discovers maximal attack sequences, based on an Apriori-like theory. attack-type strings and a series of consecutive integers is

A multi-step attack-correlation method with privacy protection 137 type alerts in AL G are converted to integers according to the mapping rule. Attack set S {s 1, s 2,, s g }: The set of attack-signature identifiers according to the attack types reported by the IDSs. Alert attributes generally have some relationships among the different steps of a multi-step attack All alerts in AL G are classified into groups by recognize single-hop and multi-hop attacks. mapping and grouping, the global attack-sequence generator generates a global attack sequence. Definition 5 Attack sequence <a 1, a 2,, a n >: A a i i n S. We sort all alerts in each group by ascending starttime after the mapping and grouping procedure to organize the global attack sequences by groups. attack scenario is a collection of intrusion events Therefore, an iterative process is designed to retrieve sequence. After applying the process for each global sequence, the attack steps that fall into the same sequence. The candidate attack-sequence set is obtained by combining the attack sequences, and then is stored in the sequence matrix database. The set of candidate attack sequences is denoted as CAS, and the candidate attack sequences in CAS are denoted as cas p, 1 p TS TS is the total number of candidate attack sequences. Definition 6 sequences A=<a 1, a 2,, a n > and B=<b 1, b 2,, b m > m n a 1 =b 1 and A is a subsequence of B A can be obtained by deleting some data from B A is contained in sequence B, or sequence B supports sequence A, denoted as B A. In a set of attack sequences, an attack sequence is maximal if it is not contained in any other sequences. The key target of mining sequential attack behavior patterns is to find the maximal attack sequences among all candidate attack sequences by group. represent an attack scenario. The support-evaluation method is applied to determine the maximal attack sequences. Definition 7 Support of attack sequence SUP AS : sequences supporting attack sequence A, denoted as CS, and the total number of candidate attack sequences, denoted as TS, is given: Support of attack behavior SUP AB : The attack behavior a i in the global attack sequence, denoted as AB, and the number of total behaviors contained in this global attack sequence, denoted as TB Attack behaviors that satisfy the predefined min_sup maximal attack behaviors. The procedure to obtain the maximal attack behavior set is also the process to produce an attack

138 Journal of Communications and Information Networks denoted as L 1. The maximal attack-behavior set handler produces L 1 total number of maximal attack behaviors is denoted as TA. generated by joining the maximal attack-sequence Algorithm 1 L 1 for //each attack behavior a i in attack set S If SUP AB a i min_sup then a i belongs to L 1 end if end for elements in the maximal attack sequences must be the ones in the maximal attack-behavior set. maximal attack-sequence miner looks for small attack scenarios, and finds progressively larger ones. This a candidate maximal attack-sequence set C y and finding a corresponding maximal attack-sequence set L y y denotes the length or the number of attack behaviors in the maximal attack sequence. derived from L y to C y+1. It discovers all the maximal attack sequences of L y in the given intrusion-event sequences. L y 1 The candidate maximal attack-sequence set C y is C y is denoted as C y and the candidate maximal attack sequences in C y are denoted as C y q q C y top to bottom and read one attack sequence each time. The support count of each candidate maximal attack sequence in C y in the attack sequence. We obtain the maximal attack sequences of L y by deleting those candidate min_ sup. The pseudo code for retrieving L y Algorithm 2.

A multi-step attack-correlation method with privacy protection 139 L y for //each candidate attack sequence CAS p CAS for //each candidate maximal attack sequence C y q in C y if CAS p C y q then SUP AS C y q end if end for end for for //each candidate maximal attack sequence C y q C y if SUP AS C y q min_sup then C y q L y end if end for datasets, one of the most authoritative attack-scenario datasets from the DEF CON organization. prevention and detection system. Tab.1 summarizes Table 1 type of intrusion alert number of alerts before correlation 150 218 5 in C y, this candidate attack sequence is demonstrated fragmentation overlap 91 to not support any sequences in C y. The sequence SCAN FIN is then flagged in the candidate attack sequence set process for producing L y+1. This pruning technique to reduce the number of candidate sequences in the subsequent passes. be derived from L y to C y+1. All the maximal attack sequences discovered by domains through the monitor. 4 Experimental results This section includes the validation of the effectiveness DDOS mstream client to handler behavior sequences. We collected the original experimental data from an experimental data set containing 5 290 intrusion alarm records. Tab intrusion-alarm data after the original intrusion alarm and privacy protection. In the experiment, destination- topology and the services running on it. These

140 Journal of Communications and Information Networks k protocol source-port k destination-port destination-port attack-type 10.10.1.20 8 080 10.10.*.* * * 110.121.50.119 * 123.122.12.120 10.10.*.* * agent according to the improved k-anonymity method k = 2. algorithm, as AprioriAll algorithm discovered too in exhausting computation. Table 3k comparison items AprioriAll number of real attack scenarios number of discovered attack scenarios 52 number of correct attack scenarios 38 39 false positive ratio false negative ratio the correct multi-step attack scenarios on the protected attack scenarios after the privacy preserving process. Because the attack type attributes are fully involved in the process of discovering attack behavior sequences. to our novel method of producing candidate maximal attack sequences. (a) runtime/minutes (b) runtime/minutes 7 6 5 4 3 2 1 0 14 12 10 8 6 4 2 0 PPMAC W=2 h GSP AprioriAll 1.00 0.75 0.50 0.33 0.25 minimum support/% PPMAC W=6 h GSP AprioriAll 1.00 0.75 0.50 0.33 0.25 minimum support/%

A multi-step attack-correlation method with privacy protection 141 W is smaller and min_sup the sensitive attributes in the original alerts. With the decreasing min_sup, more maximal attack of producing candidate maximal attack-sequence sets. decreasing min_sup. maximal attack sequences in the experimental dataset, leads to extending the length of the candidate attack effort of both factors should be held liable for the rapid increase in processing time. runtime/minutes 10 9 8 7 6 5 4 3 2 1 0 min_sup=1% min_sup=0.33% min_sup=0.25% 2 4 6 8 10 12 attack scenario time window Figure 3 The effectiveness of the number of alerts versus the cost is roughly linear, despite the different ranges of alert numbers. runtime/minutes 18 16 14 12 10 8 6 4 2 0 time window=2 time window=8 time window=10 500 5 000 10 000 number of alerts minimum support 5 Conclusions and future work Sequential pattern-mining techniques are considered an essential part of the alert correlation and analysis sensitive-information disclosure during the process. performance and accuracy in recognizing multi-step attack scenarios. developments in the corresponding area, and make it more feasible to discover hidden multi-hop attacks direction for future research. defined according to off-line analysis of the attack produce candidate attack sequences to discover multistep attack scenarios remains an open question. References

142 Journal of Communications and Information Networks 2011, 200: 101-110. 85-98. 2013, 2820: 91-112. Madison Department of Computer Sciences, 2010. About the authors ZHANG Yongtang and communications engineering in 2005 from the Central China Normal University. He is currently associate professor, and as a systems analyst at the Jiangxi Microsoft Technology Center. At present, his main research interests are communications and LUO Haibo application engineering from Wuhan University. He is currently in Guangdong Neusoft Institute, as a lecturer. His research interests are information- LUO Xianlu degree in computer applications in 2002 from in the Guangdong Neusoft Institute as an associate professor and systems analyst. At present, his main