Research on outlier intrusion detection technologybased on data mining

Similar documents
A new way to optimize LDPC code in gaussian multiple access channel

The Comparative Study of Machine Learning Algorithms in Text Data Classification*

Fast K-nearest neighbors searching algorithms for point clouds data of 3D scanning system 1

Database and Knowledge-Base Systems: Data Mining. Martin Ester

Detection and Deletion of Outliers from Large Datasets

Integration of information security and network data mining technology in the era of big data

Low power consumption of smart phone reading mode enabled with electrophoretic electronic display

Distance-based Outlier Detection: Consolidation and Renewed Bearing

Egemen Tanin, Tahsin M. Kurc, Cevdet Aykanat, Bulent Ozguc. Abstract. Direct Volume Rendering (DVR) is a powerful technique for

A Cloud Based Intrusion Detection System Using BPN Classifier

Nonparametric Importance Sampling for Big Data

CS4445 Data Mining and Knowledge Discovery in Databases. A Term 2008 Exam 2 October 14, 2008

Privacy-Preserving of Check-in Services in MSNS Based on a Bit Matrix

An Intelligent Clustering Algorithm for High Dimensional and Highly Overlapped Photo-Thermal Infrared Imaging Data

International Journal of Research in Advent Technology, Vol.7, No.3, March 2019 E-ISSN: Available online at

Time Series Classification in Dissimilarity Spaces

Mining N-most Interesting Itemsets. Ada Wai-chee Fu Renfrew Wang-wai Kwong Jian Tang. fadafu,

Research on time optimal trajectory planning of 7-DOF manipulator based on genetic algorithm

Edge detection based on single layer CNN simulator using RK6(4)

size, runs an existing induction algorithm on the rst subset to obtain a rst set of rules, and then processes each of the remaining data subsets at a

Obtaining Rough Set Approximation using MapReduce Technique in Data Mining

CLUSTERING BIG DATA USING NORMALIZATION BASED k-means ALGORITHM

Hybrid Feature Selection for Modeling Intrusion Detection Systems

PROJECTION MODELING SIMPLIFICATION MARKER EXTRACTION DECISION. Image #k Partition #k

highest cosine coecient [5] are returned. Notice that a query can hit documents without having common terms because the k indexing dimensions indicate

Chapter 1, Introduction

Yunfeng Zhang 1, Huan Wang 2, Jie Zhu 1 1 Computer Science & Engineering Department, North China Institute of Aerospace

A multi-step attack-correlation method with privacy protection

The Detection Method of Website Administrator Account Password Brute-Force Attack by Flow Characteristics

FPSMining: A Fast Algorithm for Mining User Preferences in Data Streams

Anomaly Detection on Data Streams with High Dimensional Data Environment

Object Modeling from Multiple Images Using Genetic Algorithms. Hideo SAITO and Masayuki MORI. Department of Electrical Engineering, Keio University

Individualized Error Estimation for Classification and Regression Models

A Network Intrusion Detection System Architecture Based on Snort and. Computational Intelligence

Suitability analysis in contactor s panel data with interval & real numbers

An Abnormal Data Detection Method Based on the Temporal-spatial Correlation in Wireless Sensor Networks

Feature Selection Technique to Improve Performance Prediction in a Wafer Fabrication Process

Noval Stream Data Mining Framework under the Background of Big Data

Towards New Heterogeneous Data Stream Clustering based on Density

An Approach for Privacy Preserving in Association Rule Mining Using Data Restriction

Parallelization of K-Means Clustering Algorithm for Data Mining

Iteration Reduction K Means Clustering Algorithm

A Population-Based Learning Algorithm Which Learns Both. Architectures and Weights of Neural Networks y. Yong Liu and Xin Yao

Open Access Research on the Data Pre-Processing in the Network Abnormal Intrusion Detection

PREDICTION OF POPULAR SMARTPHONE COMPANIES IN THE SOCIETY

Role of big data in classification and novel class detection in data streams

An Enhanced K-Medoid Clustering Algorithm

Networks for Control. California Institute of Technology. Pasadena, CA Abstract

Chapter 3. Foundations of Business Intelligence: Databases and Information Management

International Journal of Scientific & Engineering Research, Volume 4, Issue 7, July-2013 ISSN

The Global Standard for Mobility (GSM) (see, e.g., [6], [4], [5]) yields a

CLASSIFICATION FOR SCALING METHODS IN DATA MINING

Exploratory Analysis: Clustering

Extensions to RTP to support Mobile Networking: Brown, Singh 2 within the cell. In our proposed architecture [3], we add a third level to this hierarc

Dierencegraph - A ProM Plugin for Calculating and Visualizing Dierences between Processes

Deep Tensor: Eliciting New Insights from Graph Data that Express Relationships between People and Things

Virtual Multi-homing: On the Feasibility of Combining Overlay Routing with BGP Routing

Comparative Study of Subspace Clustering Algorithms

Dynamic Optimization of Generalized SQL Queries with Horizontal Aggregations Using K-Means Clustering

Chapter S:II. II. Search Space Representation

KBSVM: KMeans-based SVM for Business Intelligence

Cluster quality 15. Running time 0.7. Distance between estimated and true means Running time [s]

Telecommunication and Informatics University of North Carolina, Technical University of Gdansk Charlotte, NC 28223, USA

Chapter 2 Basic Structure of High-Dimensional Spaces

A Data Classification Algorithm of Internet of Things Based on Neural Network

Energy efficient optimization method for green data center based on cloud computing

Sequences Modeling and Analysis Based on Complex Network

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY

Video Inter-frame Forgery Identification Based on Optical Flow Consistency

Prediction of traffic flow based on the EMD and wavelet neural network Teng Feng 1,a,Xiaohong Wang 1,b,Yunlai He 1,c

International Journal of Data Mining & Knowledge Management Process (IJDKP) Vol.7, No.3, May Dr.Zakea Il-Agure and Mr.Hicham Noureddine Itani

Contents. Preface to the Second Edition

Iterative Removing Salt and Pepper Noise based on Neighbourhood Information

Outlier Detection Using Unsupervised and Semi-Supervised Technique on High Dimensional Data

Method to Study and Analyze Fraud Ranking In Mobile Apps

Preprocessing of Stream Data using Attribute Selection based on Survival of the Fittest

Recognition. Clark F. Olson. Cornell University. work on separate feature sets can be performed in

Explore Co-clustering on Job Applications. Qingyun Wan SUNet ID:qywan

H-D and Subspace Clustering of Paradoxical High Dimensional Clinical Datasets with Dimension Reduction Techniques A Model

Algorithm research of 3D point cloud registration based on iterative closest point 1

An Improved Document Clustering Approach Using Weighted K-Means Algorithm

Research Article A Novel Steganalytic Algorithm based on III Level DWT with Energy as Feature

Normalization based K means Clustering Algorithm

Concept Tree Based Clustering Visualization with Shaded Similarity Matrices

NDoT: Nearest Neighbor Distance Based Outlier Detection Technique

International Journal of Advance Research in Computer Science and Management Studies

A STUDY OF SOME DATA MINING CLASSIFICATION TECHNIQUES

Ecient Implementation of Sorting Algorithms on Asynchronous Distributed-Memory Machines

Metric and Identification of Spatial Objects Based on Data Fields

Fuzzy Entropy based feature selection for classification of hyperspectral data

Ecient Processor Allocation for 3D Tori. Wenjian Qiao and Lionel M. Ni. Department of Computer Science. Michigan State University

Summary. Machine Learning: Introduction. Marcin Sydow

K- Nearest Neighbors(KNN) And Predictive Accuracy

A Method Based on RBF-DDA Neural Networks for Improving Novelty Detection in Time Series

An Overview of various methodologies used in Data set Preparation for Data mining Analysis

Ke Wang and Soon Hock William Tay and Bing Liu. National University of Singapore. or rules are nearly as specic as the data itself.

The Transpose Technique to Reduce Number of Transactions of Apriori Algorithm

A Privacy Preserving Model for Ownership Indexing in Distributed Storage Systems

EVALUATIONS OF THE EFFECTIVENESS OF ANOMALY BASED INTRUSION DETECTION SYSTEMS BASED ON AN ADAPTIVE KNN ALGORITHM

IJREAT International Journal of Research in Engineering & Advanced Technology, Volume 1, Issue 5, Oct-Nov, ISSN:

Transcription:

Acta Technica 62 (2017), No. 4A, 635640 c 2017 Institute of Thermomechanics CAS, v.v.i. Research on outlier intrusion detection technologybased on data mining Liang zhu 1, 2 Abstract. With the rapid development of information technology, network security issues become increasingly prominent. Intrusion detection technology is a detection technology for all kinds of network attacks. The intrusion detection system needs to adapt to high-dimensional and massive network trac. In order to solve this problem, this paper proposes an outlier mining algorithm based on attribute correlation and outlier probability to detect intrusion behavior, and improve the performance of intrusion detection system by combining data mining technology with intrusion detection technology. Finally, the experiment shows that the algorithm can eectively nd the intrusion behavior, and improve the performance of intrusion detection system. Key words. Data mining, outlier, intrusion detection. 1. Introduction With the rapid development of computer network technology, computer network is widely used in various industries and elds. The computer brings great convenience to people's production and life. However, because the computer network has information sharing, terminal distribution and network open and other functional characteristics, there are many security risks. Therefore, the research on network information security technology has become a very important research topic of information security technologies in the elds such as computer communication. 2. Basic principle of outlier detection 2.1. Outlier mining technology Data mining [1] refers to the non-trivial process of discovering data that is not known and inherently useful from large amounts of data in the database. Among 1 Workshop 1 - College of Computer Science, Chongqing University, Chongqing400044,China 2 Workshop 2 - School of Information Engineering, ChongqingIndustry Polytechnic College, Chongqing401120,China; e-mail: jsly2001@sina.com http://journal.it.cas.cz

636 LIANG ZHU them, the outlier mining technology is a key mining technology in the data mining industry, which is included in the basic requirements of data mining. The main task requirement is to discover unique data information in the specied data set. The unique data here is outlier data. Outlier [2 3] is an object that obviously deviates from the observed value of most objects, which is thought to be generated by a dierent mechanism. 2.2. Intrusion detection technology Intrusion detection [4] is a technology developed in recent years. It is a security mechanism that can dynamically monitor, predict and defend against system intrusion. Through the dynamic monitoringof the operation of the computer network, system architecture and other aspects to identify the means and purpose of intrusion. Intrusion detection has a variety of features, including real-time monitoring, intelligent control, dynamic response and conguration convenience. The main goal is through dealing with the network behavior characteristics or the system log to distinguish the user behavior those who violate the security policy or threat to the system security, so as to ensure that the system resources are not abused, to prevent the omission of system data, being articially modied and wantonly destructed. Intrusion Detection System [5] (IDS) takes intrusion detection as a new security protection, based on the overall characteristics of the target, the system objectives, audit documents, abnormal behavior and activity records. There are several dierent classication methods for intrusion detection systems from dierent perspectives. 3. Application and Implementation of Outlier Detection Technology 3.1. Algorithm of outlier intrusion detection technology An outlier mining algorithm based on outliers [6 7] uses an outlier to represent a data point, and the outlier probability is a number from 0 to 1. For any data set, the maximum and minimum values of the outlier are xed. The algorithm rst analyzes the attribute relevance of the data set and obtains the relevant information of the attribute set. And then the attribute reduction is carried out according to the relevant information of the attribute set, and the optimal attribute subset which can keep the original information of the data set as much as possible is obtained. Finally, the outlier probability of the data set is calculated on the optimal attribute subset. Select the outlier data according to outlier probability, that is, the nal result required. The ow chart of the implementation of specic algorithms is shown in Figure 1. It can be seen that the algorithm optimizes the attribute set by attribute reduction, and obtains the optimal attribute subset. Therefore, the algorithm is suitable for high-dimensional data sets. In addition, the algorithm uses the outlier probability to select the outlier data, which makes the algorithm keep a uniform standard on dierent data sets, so the algorithm can adapt well to the vast majority of data

THERMAL EFFECT ON VIBRATION 637 Fig. 1. Flow chart of algorithm implementation sets. For a given attribute A i A j A the attribute relevance betweena i and A j are described through the formula (1): γ A i A j = Wherein A is the set of attributes, N is the number of rows of matrix Z n*d. When there is zki=zkj in matrix Z n*d, Pk = 0, otherwise Pk = 1. It can be seen that the higher the γ (Ai, Aj) value, the stronger the correlation between Ai and Aj. For a given threshold ε, if the correlation between any attribute in the matrix Z (n*d) and the actual data is greater than a given threshold value ε, it is assumed that the attribute is redundant and needs to be deleted from the attribute set. After using the above inference for each attribute in the attribute set, the attribute reduction of the algorithm can get the optimal attribute subset. N k 1 P k N (1) b pq = a qp n k=1 a qp (2)

638 LIANG ZHU Therefore, the probability that the data point xp belongs to the outlier dataset C0 is expressed as follows: P x p C 0 = 1 b qp (3) qp After adding the weight, the algorithm can more accurately calculate the dissimilarity between two data points. Because in the calculation of the dissimilarity of two data points the dierent attributes will account for dierent proportions according to their dierent importance. The attribute with a larger eect on the dissimilarity values between data points will account for a larger proportion when calculating dissimilarity, whereas the attribute with a smaller eect on dissimilarity between data points will take less when calculating dissimilarity proportion. 3.2. Algorithm Implementation and Feasibility Analysis In order to evaluate the intrusion detection system, two common performance parameters in the intrusion detection system are discussed: data error rate and data leakage rate. The data error rate corresponds to the case where the normal data is seen as outlier data in the outlier mining algorithm, and the data omission rate corresponds to the outlier data hidden in the normal data in the outlier mining algorithm. Therefore, when the outlier mining algorithm is applied to the intrusion detection system, the ratio of these two cases needs to be considered. In addition, the rst two stages of the algorithm mainly deal with the attribute set of the data, analyze the relevance of the data set, reduce the attribute set, and delete the redundant attributes. The intrusion detection system in the data collection of many attributes, attribute set has a lot of redundant attributes. After the rst two stages of the algorithm, the data stream will preserve the main attributes, which is very benecial to the detection of intrusion data. At the same time, because the attribute reduction of the algorithm does not need to be running all the time, the algorithm can run the rst two stages by extracting small sample, and then run the third stage in practice to complete the intrusion data detection. This will be very eective for intrusion detection systems which has a high requirement on real-time performance. 4. Experimental testing and results analysis 4.1. Design and implementation process of the experiment The attributes in the data set are of both numeric and parameter symbol types. For the attribute in the parameter symbol type, this paper rst carries on the numerical correspondence to the data of the parameter symbol type, then carries on the standardized calculation to the mapped data. For attributes that are of continuous numeric type, discrete analysis is needed. In addition, the measurement units of the selected attributes are dierent, and the results are very obvious in the processing of

THERMAL EFFECT ON VIBRATION 639 the data. In order to facilitate the analysis, this paper uses the normalized interval to measure the dierence between the dierent units. 4.2. Analysis of experimental results In verication experiment of algorithm in the intrusion detection, two indicators including data detection rate (Detection Rate) and data false rate (False Rate) are selected to represent the eciency of the implementation of the algorithm. Compare other algorithms to verify the eciency of the algorithm. The data detection rate is the ratio of the number of true outliers detected by the algorithm to the total number of outliers in the data set, ie the data detection rate (DR) = number of true outliers detected (D) / total number of outliers (N). Data false rate (FR) is the ratio of the number of the normal data which is mistaken as outliers to number of the normal data in the data set, ie the data error rate (FR) = number of the normal data which is mistaken as outliers (F) / number of the normal data (N) * 100%. Table 1. Experimental results of several algorithms Algorithm Name DR % FR % Algorithm This Paper of 96.2 4.57 EOS 95.1 3.96 SPOD 94.3 4.21 OMBGI 95.6 4.02 ENBROD 93.9 3.92 Fig. 2. Algorithm running time comparison diagram It can be seen from the gure that when the data set is small, the running time of each algorithm is negligible. With the increase of the scale of the data set, the running time of the algorithm is also increasing, and the increasing trend is also increasing. The algorithm proposed in this paper can meet the requirements of intrusion detection and can be eectively applied to the actual intrusion detection system to analyze and process the network data.

640 LIANG ZHU 5. Conclusion This paper analyzes the outlier mining technology and intrusion detection technology. It is considered that the data to be processed by the current intrusion detection system is massive and of high dimension with the increasing of the amount of network dat. Therefore, this paper proposes an outlier data mining algorithm based on attribute correlation and outlier probability. The three steps of algorithm implementation are described in detail, and the application and feasibility of the algorithm in intrusion detection system are analyzed. References [1] G. H. Orair, D. C. Gupta, W. Meira: Distance-based outlier detection: consolidation and renewed bearing. Proceedings of the Vldb Endowment 3 (2010), Nos. 12, 14691480. [2] J. S. Tomar, A. K. Gupta: Survey on Outlier Detection in Data Stream. International Journal of Computer Applications 98 (1985), No. 2, 257262. [3] R. H. Gutierrez, P. A. A. Laura: Online bad data detection using kernel density estimation. Power & Energy Society General Meeting. IEEE 18 (1985), No. 3, 171 180. [4] R. P. Singh, S. K. Jain: A near-linear time approximation algorithm for angle-based outlier detection in high-dimensional data. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 7 (2004), No. 1, 4152. [5] M. N. Gaikwad, K. C. Deshmukh: TExploiting Active Sub-areas for Multi-copy Routing in VDTNs,. Applied Mathematical Modelling 29 (2005), No. 9, 797804. [6] S. Chakraverty, R. Jindal, V. K. Agarwal: Exploiting Active Sub-areas for Multicopy Routing in VDTNs. Indian Journal of Engineering and Materials Sciences 12 (2005) 521528. [7] N. L. Khobragade, K. C. Deshmukh: A Time-Ecient Connected Densest Subgraph Discovery Algorithm for Big Data. Sadhana 30 (2005), No. 4, 555563. [8] Y. F. Zhou, Z. M. Wang: Fast and Scalable Outlier Detection with Approximate Nearest Neighbor Ensembles. J Sound and Vibration 316 (2008), Nos. 15, 198210. Received November 16, 2016