Using Sub-sequence Information with kNN for Classification of Sequential Data


Pradeep Kumar (1,2), M. Venkateswara Rao (1,2), P. Radha Krishna (1), and Raju S. Bapi (2)

(1) Institute for Development and Research in Banking Technology (IDRBT), Castle Hills, Masab Tank, Hyderabad, India
(2) University of Hyderabad, Gachibowli, Hyderabad, India
{pradeepkumar, prkrishna}@idrbt.ac.in, mvrao@mtech.idrbt.ac.in, bapics@uohyd.ernet.in

G. Chakraborty (Ed.): ICDCIT 2005, LNCS 3816, pp. 536-546, Springer-Verlag Berlin Heidelberg 2005

Abstract. With the enormous growth of data that exhibit sequentiality, it has become important to investigate the impact of the sequential information embedded within the data, and an efficient classification of sequential data is needed. k-Nearest Neighbor (kNN) has been used and shown to be an efficient classification technique for two-class problems. This paper uses a sliding window approach to extract sub-sequences of various lengths and classifies sessions using kNN. We conducted experiments on the DARPA 98 IDS dataset using various distance/similarity measures: Jaccard similarity, Cosine similarity, Euclidean distance and the Binary Weighted Cosine (BWC) measure. Our results demonstrate that sub-sequence information enhances kNN classification accuracy for sequential data, irrespective of the distance/similarity metric used.

Keywords: Sequence mining, k-Nearest Neighbor Classification, Similarity/Distance metric, Intrusion detection.

1 Introduction

Data are vital for a commercial organization. These data may be sequential or non-sequential in nature. Sequence mining helps in discovering formal relations in sequence data. Sequence pattern mining is the mining of frequently occurring patterns related to time or other sequences [7, 5]. An example of a rule that a sequence mining algorithm would discover is: a user who has visited the rediff website is likely to visit the yahoo website within the next five page visits.

Sequence mining plays a vital role in domains such as telecommunication records, protein classification, signal processing and intrusion detection. It is important to note that datasets in these problems need not have inherent temporality [7, 5]. Studies on sequential pattern mining mostly concentrate on symbolic patterns [, 0, 7], whereas numerical curve patterns usually belong to the scope of trend analysis and prediction in statistical time series analysis. Several parameters also influence the results of sequential pattern mining: the duration of the time sequence (T), the event folding window (w) and the time interval between two events (int).

If we assign w as the whole duration T, we get time-independent frequent patterns. An example of such a rule is "In 1999, customers who bought PCs also bought digital cameras." If w is set to its smallest value, that is, no event sequence folding occurs, then all events are considered to be discrete time events. A rule such as "Customers who bought a hard disk and then a memory chip are likely to buy a CD-Writer later on" is an example of this case. If w is set to something between this minimum and T, events occurring within sliding windows of the specified length are considered. An example rule is "Sale of PCs in the month of April 1999 is maximum."

Sequential data are growing at a rapid pace. A pre-defined collection of historical data with their observed nature helps in determining the nature of a newly arriving data stream, and hence is useful in classifying the new data stream. In data mining, classification algorithms are popularly used for exploring the relationships among various object features under various conditions. Sequence data sets are similar in nature, except that they have an additional temporal dimension [22].

Classification algorithms help in predicting future trends as well as in extracting a model of important data classes. Many classification algorithms have been proposed by researchers in machine learning [2], expert systems [20] and statistics [8]. Classification algorithms have been successfully applied to problems where the dependent variable (class variable) depends on non-sequential independent (explanatory) variables [3]. Typical classification algorithms are Support Vector Machines, Decision Trees, Bayesian Classification, Neural Networks, k-Nearest Neighbor (kNN) and Association Classification. To deal with sequential information, sequential data are usually transformed into non-sequential variables. This leads to a loss of the sequential information in the data.
Although traditional classification algorithms are robust and efficient for modeling non-sequential data, they fail to capture the sequential information of the dataset.

Intrusion detection is the process of monitoring and analyzing the events occurring in a computer system in order to detect signs of security problems [2]. Computer security can be achieved by maintaining audit data. Cryptographic techniques, authentication means and firewalls have gained importance with the advent of new technologies. With the ever-increasing size of audit data logs, it becomes crucial for network administrators and security analysts to use an efficient Intrusion Detection System (IDS) to reduce the monitoring activity. Data mining techniques provide important contributions to the field of intrusion detection.

IDSs based on examining sequences of system calls often define the normal behavior of an application by sliding a window of fixed size across a sequence of system call traces. System call traces are normally produced with programs like strace on Linux systems and truss on Solaris systems. Several methods have been proposed for storing system call trace information and using it to detect anomalies in an IDS. Forrest et al. [5, 9] stored normal behavior by sliding a window of fixed size L across the sequence of system call traces and recording which system call followed the system call in position 0 at offsets 1 through L-1. Liao et al. [2] applied a kNN classifier with the Cosine similarity measure, considering frequencies of system calls with sliding window size w = 1. Similar work with a modified similarity measure that combines Cosine and Jaccard has also been carried out in [8].

The central theme of this paper is to investigate whether the information stored in sub-sequences plays a role in building a classifier. We combine the sequence analysis problem with the kNN classification algorithm to design an efficient classifier for sequential data. Sequence analysis can be categorized into two types, depending on the nature of the treatment: either the whole sequence is considered as one unit, or sub-sequences of different sizes are considered. Our hypothesis is that the sequence, or order, of information plays a role in sequence classification. We extracted sequence information from sub-sequences and used this information for building various distance/similarity metrics. With the appropriate distance/similarity metric, a new session is classified using the kNN classifier. To evaluate the efficiency and behavior of the classifier with the encoded vector measures, the Receiver Operating Characteristics (ROC) curve is used. Experiments are conducted on the DARPA 98 IDS [3] dataset to show the viability of our model.

Unlike other classification algorithms, the kNN classification algorithm does not build a classifier in advance. Hence, it is suitable for classification of data streams. Whenever a new data stream arrives, kNN finds the k nearest neighbors of the new data stream in the training data set using some distance/similarity metric [4, 6]. kNN is the best choice for making a good classifier when simplicity and accuracy are the important issues [].

The rest of the paper is organized as follows. Section 2 gives a brief description of the nearest neighbor classification algorithm. In Section 3, we briefly discuss the distance/similarity measures used in the experiments. In Section 4, we outline our proposed approach. Section 5 provides the experimental results on the DARPA 98 IDS dataset. Finally, we conclude in Section 6.

2 Nearest Neighbor Classification

kNN classifiers are based on learning by analogy. The kNN classification algorithm assumes that all instances correspond to points in an n-dimensional space. The nearest neighbors of an instance are determined by a distance/similarity measure.
When a new sample arrives, a kNN classifier searches the training dataset for the k samples closest to the new sample, using a distance/similarity measure, in order to determine the nature of the new sample. These k samples are known as the k nearest neighbors of the new sample. The new sample is assigned the most common class of its k nearest neighbors. The nearest neighbor algorithm can be summarized as follows:

Begin
  Training
    Construct the training sample T from the given dataset D.
  Classification
    Given a new sample s to be classified,
    let I_1, ..., I_k denote the k instances from T that are nearest to s.
    Return the most common class among these k nearest neighbors.
    The returned class is the class of the new sample.
End

In the nearest neighbor model, the choice of a suitable distance function and of the number of nearest neighbors (k) is crucial. The value k represents the complexity of the nearest neighbor model: the model is less adaptive with higher k values [7].
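The summary above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function names, the toy training data and the use of a negated squared distance as the similarity are all assumptions made for the example.

```python
from collections import Counter


def knn_classify(sample, training, k, similarity):
    """Return the most common label among the k training items most similar to `sample`.

    `training` is a list of (features, label) pairs; `similarity` is any
    function for which larger values mean "closer".
    """
    ranked = sorted(training, key=lambda item: similarity(sample, item[0]), reverse=True)
    top_labels = [label for _, label in ranked[:k]]
    return Counter(top_labels).most_common(1)[0][0]


def negative_squared_distance(a, b):
    # Hypothetical helper: turns a distance into a similarity ("larger is closer").
    return -sum((x - y) ** 2 for x, y in zip(a, b))


# Toy data, invented purely for illustration.
toy_training = [
    ((0.0, 0.0), "normal"),
    ((0.1, 0.2), "normal"),
    ((0.2, 0.1), "normal"),
    ((1.0, 1.0), "abnormal"),
    ((0.9, 1.1), "abnormal"),
]

print(knn_classify((0.15, 0.15), toy_training, k=3, similarity=negative_squared_distance))
```

Because no model is built in advance, all of the work happens at query time; the choice of `similarity` and `k` fully determines the behavior.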

3 Distance/Similarity Measures

A distance/similarity measure plays an important role in classifying or grouping observations into homogeneous groups. In other words, a distance/similarity measure establishes the relationship between the rows of the data matrix, and so provides the preliminary information for identifying homogeneous groups. For any pair of observations $x_i$ and $x_j$, the measure is a function of the corresponding row vectors in the data matrix:

$$D_{ij} = f(x_i, x_j), \quad i, j = 1, 2, 3, \ldots, n$$

For an accurate classifier, it is important to formulate a metric that determines whether an event is deemed normal or anomalous. In this section, we briefly discuss the Jaccard similarity measure, the Cosine similarity measure, the Euclidean distance measure and the BWC measure. We used sub-sequence information with these measures in the kNN classifier for cross-comparison purposes.

3.1 Jaccard Similarity Function

The Jaccard similarity function is used for measuring similarity between binary values [9]. It is defined as the degree of commonality between two sets, measured as the ratio of the number of elements common to X and Y to the number of elements possessed by X or Y. If X and Y are two distinct sets, the similarity between X and Y is:

$$S(X, Y) = \frac{|X \cap Y|}{|X \cup Y|}$$

Consider two sets X = {M, N, P, Q, R, M, S, Q} and Y = {P, M, N, Q, M, P, P}. Then $X \cap Y$ = {M, N, P, Q} and $X \cup Y$ = {M, N, P, Q, R, S}. Thus, the similarity between X and Y is 4/6 ≈ 0.67.

3.2 Cosine Similarity

Cosine similarity is a common vector-based similarity measure that is widely used in text databases [6]. The Cosine similarity metric measures the angle between the directions of two vectors, irrespective of their lengths. The Cosine similarity between two vectors X and Y is given by:

$$S(X, Y) = \frac{X \cdot Y}{\|X\| \, \|Y\|}$$

The Cosine similarity measure cannot be applied directly to sets. Sets are first converted into vectors in an n-dimensional vector space, and the Cosine similarity measure is then applied to these transformed vectors to find the angular similarity. For the two sets X = {M, N, P, Q, R, M, S, Q} and Y = {P, M, N, Q, M, P, P}, the equivalent transformed frequency vectors are $X_v = \langle 2, 1, 1, 2, 1, 1 \rangle$ and $Y_v = \langle 2, 1, 3, 1, 0, 0 \rangle$. The Cosine similarity of the transformed vectors is $10 / \sqrt{12 \cdot 15} \approx 0.745$.
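The two worked examples above (Jaccard and Cosine on X and Y) can be reproduced with a short Python sketch. The helper names are hypothetical, and the alphabetical ordering of symbols in the frequency vectors is an assumption chosen so that the vectors come out in the order M, N, P, Q, R, S.

```python
import math
from collections import Counter

X = ["M", "N", "P", "Q", "R", "M", "S", "Q"]
Y = ["P", "M", "N", "Q", "M", "P", "P"]


def jaccard(x, y):
    """Shared distinct elements divided by all distinct elements."""
    sx, sy = set(x), set(y)
    union = sx | sy
    return len(sx & sy) / len(union) if union else 0.0


def frequency_vectors(x, y):
    """Map both sequences onto one vocabulary and count occurrences."""
    vocabulary = sorted(set(x) | set(y))
    cx, cy = Counter(x), Counter(y)
    return [cx[s] for s in vocabulary], [cy[s] for s in vocabulary]


def cosine(x, y):
    """Cosine of the angle between the two frequency vectors."""
    xv, yv = frequency_vectors(x, y)
    dot = sum(a * b for a, b in zip(xv, yv))
    norm = math.sqrt(sum(a * a for a in xv)) * math.sqrt(sum(b * b for b in yv))
    return dot / norm if norm else 0.0


print(frequency_vectors(X, Y))   # ([2, 1, 1, 2, 1, 1], [2, 1, 3, 1, 0, 0])
print(round(jaccard(X, Y), 3))   # 0.667
print(round(cosine(X, Y), 3))    # 0.745
```

Note that Jaccard ignores how often a symbol occurs, while Cosine depends on the counts; this difference is what the BWC measure later combines.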

3.3 Euclidean Distance

Euclidean distance is a widely used distance measure for vector spaces [6]. For two vectors X and Y in an n-dimensional Euclidean space, it is defined as the square root of the sum of the squared differences of the corresponding dimensions of the vectors. Mathematically, it is given as

$$D(X, Y) = \left( \sum_{s=1}^{n} (X_s - Y_s)^2 \right)^{1/2}$$

As with the Cosine similarity metric, the Euclidean measure cannot be applied directly to sets. The same approach used for the Cosine similarity measure, transforming sets into vectors, is applicable here as well. For the two sets X = {M, N, P, Q, R, M, S, Q} and Y = {P, M, N, Q, M, P, P}, the equivalent transformed frequency vectors are $X_v = \langle 2, 1, 1, 2, 1, 1 \rangle$ and $Y_v = \langle 2, 1, 3, 1, 0, 0 \rangle$. The Euclidean distance between the transformed vectors is $\sqrt{7} \approx 2.646$.

3.4 Binary Weighted Cosine (BWC) Metric

Rawat et al. [8] proposed the BWC similarity measure for measuring similarity across sequences of system calls. They showed the effectiveness of the proposed measure on IDS, applying the kNN classification algorithm with the BWC metric to enhance the capability of the classifier. The BWC similarity measure considers both the number of shared elements between two sets and the frequencies of those elements in the traces. The similarity measure between two sequences X and Y is given by

$$S(X, Y) = \frac{X \cdot Y}{\|X\| \, \|Y\|} \cdot \frac{|X \cap Y|}{|X \cup Y|}$$

The BWC measure is derived from both the Cosine similarity and the Jaccard similarity measures. Since the Cosine similarity measure is a contributing component of the BWC similarity measure, BWC is also a vector-based similarity measure. The transformation step is the same as that carried out for sets in the Cosine similarity measure or the Euclidean measure.
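A minimal Python sketch of the Euclidean and BWC measures on the same example sets follows. It assumes that BWC is the product of the Cosine term and the Jaccard term, which is how the description above reads; the function names are hypothetical and this is not the implementation of Rawat et al.

```python
import math
from collections import Counter

X = ["M", "N", "P", "Q", "R", "M", "S", "Q"]
Y = ["P", "M", "N", "Q", "M", "P", "P"]


def to_vectors(x, y):
    """Frequency vectors of x and y over their shared, sorted vocabulary."""
    vocabulary = sorted(set(x) | set(y))
    cx, cy = Counter(x), Counter(y)
    return [cx[s] for s in vocabulary], [cy[s] for s in vocabulary]


def euclidean(x, y):
    """Square root of the sum of squared per-dimension differences."""
    xv, yv = to_vectors(x, y)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(xv, yv)))


def bwc(x, y):
    """Assumed form: cosine similarity weighted by the Jaccard (binary) similarity."""
    xv, yv = to_vectors(x, y)
    dot = sum(a * b for a, b in zip(xv, yv))
    norm = math.sqrt(sum(a * a for a in xv)) * math.sqrt(sum(b * b for b in yv))
    union = set(x) | set(y)
    if not norm or not union:
        return 0.0
    return (dot / norm) * (len(set(x) & set(y)) / len(union))


print(round(euclidean(X, Y), 3))  # 2.646
print(round(bwc(X, Y), 3))        # 0.497
```

Euclidean is a distance (smaller means closer), whereas BWC, like Cosine and Jaccard, is a similarity (larger means closer); any kNN code built on them has to rank accordingly.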
For the two sets X = {M, N, P, Q, R, M, S, Q} and Y = {P, M, N, Q, M, P, P}, the Cosine similarity is 0.745 and the Jaccard similarity is 0.667. Hence, the computed BWC similarity measure comes out to be 0.745 × 0.667 ≈ 0.497.

4 Proposed Methodology

This section illustrates the methodology for extracting sequential information from the sets, so that it can be used by various vector-based distance/similarity metrics. We considered sub-sequences of fixed sizes: 1, 2, 3. This fixed-size sub-sequence is called a window. The window is slid over the traces of system calls to find the unique sub-sequences of fixed length s over the whole dataset. A frequency count of each sub-sequence is recorded. Consider the following sequence, which consists of traces of system calls:

execve open mmap open mmap mmap mmap mmap mmap open mmap exit

(Figure: sliding window of size 3 over the sequence.)

The total length of the sequence is 12. With sliding window size w = 3, the total number of sub-sequences of size 3 is 12 - 3 + 1 = 10. These 10 sub-sequences of size 3 are:

execve open mmap
open mmap open
mmap open mmap
open mmap mmap
mmap mmap mmap
mmap mmap mmap
mmap mmap mmap
mmap mmap open
mmap open mmap
open mmap exit

Among these 10 sliding-window-sized sub-sequences, the unique sub-sequences and their frequencies are as follows:

execve open mmap    1
mmap open mmap      2
open mmap open      1
mmap mmap open      1
open mmap mmap      1
open mmap exit      1
mmap mmap mmap      3

With these encoded frequencies for sub-sequences, we can apply any vector-based distance/similarity measure, thus incorporating the sequential information into the vector space. A traditional classification algorithm, the kNN classification algorithm [4, 7], can then be used with a suitable distance/similarity metric to build an efficient classifier.

Our proposed methodology consists of two phases, namely a training phase and a testing phase. Dataset D consists of m sessions, each of variable length. Initially, in the training phase, all the unique sub-sequences of size s are extracted from the whole dataset. Let n be the number of unique sub-sequences of size s generated from the dataset D. A matrix C of size m × n is constructed, where $C_{ij}$ is the count of the j-th unique sub-sequence in the i-th session. A distance/similarity metric is constructed by applying a distance/similarity measure over the matrix C. The model is trained with a dataset consisting of normal sessions.

In the testing phase, whenever a new process P comes to the classifier, the classifier first looks for the presence of any new sub-sequence of size s. If a new sub-sequence is found, the new process is marked as abnormal.
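The sliding-window extraction, the count matrix C and the new-sub-sequence check can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the function names are hypothetical, sessions are assumed to be lists of system call names, and the vocabulary is simply sorted to give the matrix a fixed column order.

```python
from collections import Counter


def subsequences(trace, s):
    """All contiguous windows of length s; there are len(trace) - s + 1 of them."""
    return [tuple(trace[i:i + s]) for i in range(len(trace) - s + 1)]


def build_count_matrix(sessions, s):
    """Return (vocabulary, C) where C[i][j] counts sub-sequence j in session i."""
    counts = [Counter(subsequences(trace, s)) for trace in sessions]
    vocabulary = sorted(set().union(*counts))
    matrix = [[c[sub] for sub in vocabulary] for c in counts]
    return vocabulary, matrix


def has_unseen_subsequence(trace, s, vocabulary):
    """True if the trace contains a window that never occurred in training."""
    known = set(vocabulary)
    return any(sub not in known for sub in subsequences(trace, s))


trace = "execve open mmap open mmap mmap mmap mmap mmap open mmap exit".split()
windows = subsequences(trace, 3)
print(len(windows))                               # 10
print(Counter(windows)[("mmap", "mmap", "mmap")])  # 3

vocabulary, C = build_count_matrix([trace], 3)
print(len(vocabulary))                            # 7
print(has_unseen_subsequence("execve open close exit".split(), 3, vocabulary))  # True
```

With s = 1 the matrix reduces to plain system call frequencies, so no order information is encoded; larger s encodes progressively more of the ordering.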
When there is no new sub-sequence in the new process P, the similarity of the new process with all the training sessions is calculated. If the similarity between any session in the training set and the new process is equal to 1, the process is marked as normal. Otherwise, the k highest similarity values between the new process P and the training dataset are picked, and the average similarity over these k nearest neighbors is calculated. If the average similarity value is greater than a user-defined threshold value (τ), the new process P is marked as normal; otherwise P is marked as abnormal.

5 Experimental Results

Experiments were conducted using the k-Nearest Neighbor classifier with the Jaccard similarity function, the Cosine similarity measure, the Euclidean distance and the BWC metric.
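As a point of reference for the experiments, the testing-phase decision rule of Section 4 can be sketched as below. The function name, its arguments and the tolerance used for the "similarity equal to 1" test are assumptions for illustration; the sketch also assumes a similarity measure (larger means closer), so a distance such as Euclidean would need the comparisons reversed.

```python
def classify_process(similarities, k, threshold, has_new_subsequence=False):
    """Label a new process from its similarities to all training sessions.

    similarities: one similarity value per (normal) training session.
    threshold:    the user-defined value tau.
    """
    if has_new_subsequence:
        return "abnormal"            # unseen sub-sequence: flagged immediately
    if not similarities:
        raise ValueError("at least one training session is required")
    if any(abs(sim - 1.0) < 1e-12 for sim in similarities):
        return "normal"              # exact match with a training session
    top_k = sorted(similarities, reverse=True)[:k]
    average = sum(top_k) / len(top_k)
    return "normal" if average > threshold else "abnormal"


print(classify_process([0.9, 0.8, 0.2], k=2, threshold=0.8))   # normal
print(classify_process([0.9, 0.8, 0.2], k=2, threshold=0.9))   # abnormal
```

Sweeping `threshold` over a range of values and recording the resulting labels is what produces the different operating points of an ROC curve.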

Each distance/similarity metric was evaluated individually with the kNN classifier on the DARPA 98 IDS dataset. The DARPA 98 IDS dataset consists of TCPDUMP and BSM audit data. The network traffic of an Air Force Local Area Network was simulated to collect the TCPDUMP and BSM audit data [3]. The audit logs contain seven weeks of training data and two weeks of testing data. There were 38 types of network-based attacks and several realistic intrusion scenarios conducted in the midst of normal background data. A detailed discussion of the DARPA dataset is given in [2].

For the experiments, 605 unique processes, free from all types of attacks, were used as the training dataset. Testing was conducted on 5285 normal processes. To test the detection capability of the proposed approach, we incorporated 55 intrusive sessions into our test data. For the kNN classification experiments, k = 5 was considered. Experiments were carried out with each of the distance/similarity measures discussed in Section 3 (Jaccard similarity, Cosine similarity, Euclidean distance and BWC similarity) at different sub-sequence lengths (sliding window sizes) L = 1, 3, 5. Here, L = 1 means that no sequential information is captured, whereas for L > 1 some amount of order information across elements of the data is preserved.

Fig. 1. ROC curve for the Jaccard similarity metric using kNN classification for k = 5 (curves for sub-sequence lengths L = 1, 3, 5; x-axis: False Positive Rate; y-axis: Detection Rate)

To analyze the efficiency of the classifier, the ROC curve is used. The ROC curve is a useful tool for analyzing two-class problems [4], particularly in situations where rarely occurring events must be detected. The ROC curve depicts the relationship between the False Positive Rate (FPR) and the Detection Rate (DR) at various threshold values. DR is the ratio of the number of intrusive (abnormal) sessions detected correctly to the total number of intrusive sessions. FPR is defined as the number of normal processes detected as abnormal, divided by the total number of normal processes. The ROC curve gives an idea of the trade-off between FPR and DR achieved by the classifier. An ideal ROC curve would be parallel to the FPR axis at DR equal to 1.
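The DR and FPR definitions above translate directly into code. The sketch below is illustrative only: the function name and the toy scores are invented, and it assumes each test session has already been reduced to its average kNN similarity, with a session flagged as abnormal when that score does not exceed the threshold τ.

```python
def roc_points(normal_scores, intrusive_scores, thresholds):
    """Return (threshold, FPR, DR) for each threshold value.

    A session is flagged abnormal when its average k-NN similarity is
    less than or equal to the threshold.
    """
    points = []
    for tau in thresholds:
        detected = sum(score <= tau for score in intrusive_scores)
        false_alarms = sum(score <= tau for score in normal_scores)
        dr = detected / len(intrusive_scores)
        fpr = false_alarms / len(normal_scores)
        points.append((tau, fpr, dr))
    return points


# Toy scores, invented purely for illustration.
normal = [0.9, 0.8, 0.7, 0.95]
intrusive = [0.3, 0.6, 0.75]
for tau, fpr, dr in roc_points(normal, intrusive, [0.5, 0.72, 0.8]):
    print(f"tau={tau:.2f}  FPR={fpr:.2f}  DR={dr:.2f}")
```

Raising τ flags more sessions as abnormal, so DR and FPR both rise together; the ROC curve makes that trade-off visible.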

Fig. 2. ROC curve for the Cosine similarity metric using kNN classification for k = 5 (curves for sub-sequence lengths L = 1, 3, 5; x-axis: False Positive Rate; y-axis: Detection Rate)

Fig. 3. ROC curve for the Euclidean distance metric using kNN classification for k = 5 (curves for sub-sequence lengths L = 1, 3, 5; x-axis: False Positive Rate; y-axis: Detection Rate)

Fig. 4. ROC curve for the BWC similarity metric using kNN classification for k = 5 (curves for sub-sequence lengths L = 1, 3, 5; x-axis: False Positive Rate; y-axis: Detection Rate)

The ROC curves for the Jaccard similarity measure, the Cosine similarity measure, the Euclidean distance measure and the BWC measure are shown in Fig. 1, 2, 3 and 4 respectively. It can be observed from Fig. 1, 2, 3 and 4 that as the sliding window size increases from L = 1 to L = 5, a high DR (close to the ideal value of 1) is observed with all the distance/similarity metrics. The rate of increase in false positives is lower for the Jaccard similarity measure ( ) than for the other distance/similarity metrics, namely Cosine similarity (0.-0.4), Euclidean distance ( ) and BWC similarity (0.-0.7). Table 1 depicts the factor (FPR or threshold value) that was traded off in order to achieve a high DR. For example, in the case of the Jaccard similarity measure, FPR was traded off for threshold values (highlighted in bold face) in order to achieve a high DR.

Table 1. Results for different distance/similarity metrics (columns: threshold τ and FPR for each of the Jaccard similarity, Cosine similarity, Euclidean distance and BWC similarity measures; rows: L = 1, 3, 5)

Thus, our results support the hypothesis that the classification accuracy of sequential data can be improved by incorporating the order information embedded in sequences. We also performed experiments with different k values for the nearest neighbor classifier with all four measures.

Table 2. False positive rate at maximum attained detection rate for different sub-sequence lengths and different distance/similarity measures at k = 7 (columns: L = 1, 3, 5; rows: Jaccard similarity, Euclidean distance, Cosine similarity, BWC measure)

Table 2 presents the false positive rate at the maximum attained detection rate for sub-sequence lengths L = 1, 3, 5 with all the distance/similarity measures for k = 7. It can be observed that, as per the trend, the FPR increases with increasing sub-sequence length for all four measures. We also performed experiments with k = 10, and the trend was found to be consistent (results are not included here).

6 Conclusion

Using intrusion detection as an example domain, we demonstrated in this paper the usefulness of sub-sequence information for kNN classification of sequential data. We presented results on the DARPA 98 IDS dataset in which we systematically varied the length of the sliding window from 1 to 5 and used various distance/similarity measures: Jaccard similarity, Cosine similarity, Euclidean distance and the BWC similarity measure. As the amount of sub-sequence information is increased, a high DR is achieved with all four measures. Our results show that if order information is made available, a traditional classifier such as kNN can be adapted to the sequence classification problem. We are currently working on the design of a new similarity measure for capturing complete sequential information. Although the current paper presented results in the domain of information security, we feel this methodology can be adopted for domains such as web mining, text mining and bio-informatics.

References

1. Agrawal, R., Faloutsos, C., Swami, A.: Efficient similarity search in sequence databases. In: Proceedings of the 4th Int'l Conference on Foundations of Data Organization and Algorithms, Chicago, IL, 1993.
2. Bace, R.: Intrusion Detection. Macmillan Technical Publishing.
3. Buckinx, W., Moons, E., Van den Poel, D., Wets, G.: Customer-Adapted Coupon Targeting Using Feature Selection. Expert Systems with Applications 26.
4. Dasarathy, B.V.: Nearest-Neighbor Classification Techniques. IEEE Computer Society Press, Los Alamitos, CA.
5. Forrest, S., Hofmeyr, S.A., Somayaji, A., Longstaff, T.A.: A Sense of Self for UNIX Processes. In: Proceedings of the IEEE Symposium on Security and Privacy, pp. 120-128, Los Alamitos, CA, 1996. IEEE Computer Society Press.
6. Giudici, P.: Applied Data Mining: Statistical Methods for Business and Industry. Wiley.
7. Han, J., Kamber, M.: Data Mining: Concepts and Techniques. Morgan Kaufmann Publishers.
8. Hastie, T., Tibshirani, R., Friedman, J.H.: The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer.
9. Hofmeyr, S.A., Forrest, S., Somayaji, A.: Intrusion Detection Using Sequences of System Calls. Journal of Computer Security 6, 1998.
10. Keogh, E., Chakrabarti, K., Pazzani, M., Mehrotra, S.: Locally adaptive dimensionality reduction for indexing large time series databases. In: Proceedings of the ACM SIGMOD Conference on Management of Data, Santa Barbara, CA.
11. Khan, M., Ding, Q., Perrizo, W.: k-Nearest Neighbor Classification on Spatial Data Streams Using P-Trees. In: Proceedings of the 6th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining.
12. Liao, Y., Vemuri, V.R.: Using Text Categorization Techniques for Intrusion Detection. USENIX Security Symposium 2002.
13. MIT Lincoln Laboratory.
14. Marques de Sá, J.P.: Pattern Recognition: Concepts, Methods and Applications. Springer-Verlag.
15. Pujari, A.K.: Data Mining Techniques. Universities Press, India.
16. Qian, G., Sural, S., Gu, Y., Pramanik, S.: Similarity between Euclidean and cosine angle distance for nearest neighbor queries. SAC 2004.
17. Ratanamahatana, C.A., Keogh, E.: Making Time-series Classification More Accurate Using Learned Constraints. In: Proceedings of the SIAM International Conference on Data Mining (SDM '04), Lake Buena Vista, Florida.
18. Rawat, S., Pujari, A.K., Gulati, V.P., Vemuri, V.R.: Intrusion Detection using Text Processing Techniques with a Binary-Weighted Cosine Metric. International Journal of Information Security, Springer-Verlag, submitted.
19. Sam's String Metrics.
20. Weiss, S.M., Kulikowski, C.A.: Computer Systems That Learn: Classification and Prediction Methods from Statistics, Neural Nets, Machine Learning, and Expert Systems (Machine Learning Series). Morgan Kaufmann Publishers Inc., San Francisco, CA, USA.
21. Mitchell, T.M.: Machine Learning. McGraw-Hill.
22. Wang, J.T.L., Zaki, M.J., Toivonen, H.T.T., Shasha, D.: Data Mining in Bioinformatics. Springer-Verlag, 2005.


More information

HOT asax: A Novel Adaptive Symbolic Representation for Time Series Discords Discovery

HOT asax: A Novel Adaptive Symbolic Representation for Time Series Discords Discovery HOT asax: A Novel Adaptive Symbolic Representation for Time Series Discords Discovery Ninh D. Pham, Quang Loc Le, Tran Khanh Dang Faculty of Computer Science and Engineering, HCM University of Technology,

More information

Encoding Words into String Vectors for Word Categorization

Encoding Words into String Vectors for Word Categorization Int'l Conf. Artificial Intelligence ICAI'16 271 Encoding Words into String Vectors for Word Categorization Taeho Jo Department of Computer and Information Communication Engineering, Hongik University,

More information

Data Distortion for Privacy Protection in a Terrorist Analysis System

Data Distortion for Privacy Protection in a Terrorist Analysis System Data Distortion for Privacy Protection in a Terrorist Analysis System Shuting Xu, Jun Zhang, Dianwei Han, and Jie Wang Department of Computer Science, University of Kentucky, Lexington KY 40506-0046, USA

More information

Mining High Order Decision Rules

Mining High Order Decision Rules Mining High Order Decision Rules Y.Y. Yao Department of Computer Science, University of Regina Regina, Saskatchewan, Canada S4S 0A2 e-mail: yyao@cs.uregina.ca Abstract. We introduce the notion of high

More information

WEB PAGE RE-RANKING TECHNIQUE IN SEARCH ENGINE

WEB PAGE RE-RANKING TECHNIQUE IN SEARCH ENGINE WEB PAGE RE-RANKING TECHNIQUE IN SEARCH ENGINE Ms.S.Muthukakshmi 1, R. Surya 2, M. Umira Taj 3 Assistant Professor, Department of Information Technology, Sri Krishna College of Technology, Kovaipudur,

More information

Network Intrusion Detection Using Fast k-nearest Neighbor Classifier

Network Intrusion Detection Using Fast k-nearest Neighbor Classifier Network Intrusion Detection Using Fast k-nearest Neighbor Classifier K. Swathi 1, D. Sree Lakshmi 2 1,2 Asst. Professor, Prasad V. Potluri Siddhartha Institute of Technology, Vijayawada Abstract: Fast

More information

Data Mining Course Overview

Data Mining Course Overview Data Mining Course Overview 1 Data Mining Overview Understanding Data Classification: Decision Trees and Bayesian classifiers, ANN, SVM Association Rules Mining: APriori, FP-growth Clustering: Hierarchical

More information

DECISION TREE INDUCTION USING ROUGH SET THEORY COMPARATIVE STUDY

DECISION TREE INDUCTION USING ROUGH SET THEORY COMPARATIVE STUDY DECISION TREE INDUCTION USING ROUGH SET THEORY COMPARATIVE STUDY Ramadevi Yellasiri, C.R.Rao 2,Vivekchan Reddy Dept. of CSE, Chaitanya Bharathi Institute of Technology, Hyderabad, INDIA. 2 DCIS, School

More information

Keywords: clustering algorithms, unsupervised learning, cluster validity

Keywords: clustering algorithms, unsupervised learning, cluster validity Volume 6, Issue 1, January 2016 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Clustering Based

More information

An Improved Apriori Algorithm for Association Rules

An Improved Apriori Algorithm for Association Rules Research article An Improved Apriori Algorithm for Association Rules Hassan M. Najadat 1, Mohammed Al-Maolegi 2, Bassam Arkok 3 Computer Science, Jordan University of Science and Technology, Irbid, Jordan

More information

Keyword Extraction by KNN considering Similarity among Features

Keyword Extraction by KNN considering Similarity among Features 64 Int'l Conf. on Advances in Big Data Analytics ABDA'15 Keyword Extraction by KNN considering Similarity among Features Taeho Jo Department of Computer and Information Engineering, Inha University, Incheon,

More information

Rank Measures for Ordering

Rank Measures for Ordering Rank Measures for Ordering Jin Huang and Charles X. Ling Department of Computer Science The University of Western Ontario London, Ontario, Canada N6A 5B7 email: fjhuang33, clingg@csd.uwo.ca Abstract. Many

More information

Spatial Topology of Equitemporal Points on Signatures for Retrieval

Spatial Topology of Equitemporal Points on Signatures for Retrieval Spatial Topology of Equitemporal Points on Signatures for Retrieval D.S. Guru, H.N. Prakash, and T.N. Vikram Dept of Studies in Computer Science,University of Mysore, Mysore - 570 006, India dsg@compsci.uni-mysore.ac.in,

More information

Lecture Topic Projects 1 Intro, schedule, and logistics 2 Data Science components and tasks 3 Data types Project #1 out 4 Introduction to R,

Lecture Topic Projects 1 Intro, schedule, and logistics 2 Data Science components and tasks 3 Data types Project #1 out 4 Introduction to R, Lecture Topic Projects 1 Intro, schedule, and logistics 2 Data Science components and tasks 3 Data types Project #1 out 4 Introduction to R, statistics foundations 5 Introduction to D3, visual analytics

More information

An Efficient Clustering for Crime Analysis

An Efficient Clustering for Crime Analysis An Efficient Clustering for Crime Analysis Malarvizhi S 1, Siddique Ibrahim 2 1 UG Scholar, Department of Computer Science and Engineering, Kumaraguru College Of Technology, Coimbatore, Tamilnadu, India

More information

Modeling System Calls for Intrusion Detection with Dynamic Window Sizes

Modeling System Calls for Intrusion Detection with Dynamic Window Sizes Modeling System Calls for Intrusion Detection with Dynamic Window Sizes Eleazar Eskin Computer Science Department Columbia University 5 West 2th Street, New York, NY 27 eeskin@cs.columbia.edu Salvatore

More information

Enhancing Cluster Quality by Using User Browsing Time

Enhancing Cluster Quality by Using User Browsing Time Enhancing Cluster Quality by Using User Browsing Time Rehab M. Duwairi* and Khaleifah Al.jada'** * Department of Computer Information Systems, Jordan University of Science and Technology, Irbid 22110,

More information

Data Mining. Introduction. Hamid Beigy. Sharif University of Technology. Fall 1395

Data Mining. Introduction. Hamid Beigy. Sharif University of Technology. Fall 1395 Data Mining Introduction Hamid Beigy Sharif University of Technology Fall 1395 Hamid Beigy (Sharif University of Technology) Data Mining Fall 1395 1 / 21 Table of contents 1 Introduction 2 Data mining

More information

Effective Intrusion Type Identification with Edit Distance for HMM-Based Anomaly Detection System

Effective Intrusion Type Identification with Edit Distance for HMM-Based Anomaly Detection System Effective Intrusion Type Identification with Edit Distance for HMM-Based Anomaly Detection System Ja-Min Koo and Sung-Bae Cho Dept. of Computer Science, Yonsei University, Shinchon-dong, Seodaemoon-ku,

More information

Individualized Error Estimation for Classification and Regression Models

Individualized Error Estimation for Classification and Regression Models Individualized Error Estimation for Classification and Regression Models Krisztian Buza, Alexandros Nanopoulos, Lars Schmidt-Thieme Abstract Estimating the error of classification and regression models

More information

Enhancing Cluster Quality by Using User Browsing Time

Enhancing Cluster Quality by Using User Browsing Time Enhancing Cluster Quality by Using User Browsing Time Rehab Duwairi Dept. of Computer Information Systems Jordan Univ. of Sc. and Technology Irbid, Jordan rehab@just.edu.jo Khaleifah Al.jada' Dept. of

More information

Evaluation Measures. Sebastian Pölsterl. April 28, Computer Aided Medical Procedures Technische Universität München

Evaluation Measures. Sebastian Pölsterl. April 28, Computer Aided Medical Procedures Technische Universität München Evaluation Measures Sebastian Pölsterl Computer Aided Medical Procedures Technische Universität München April 28, 2015 Outline 1 Classification 1. Confusion Matrix 2. Receiver operating characteristics

More information

MULTIVARIATE TEXTURE DISCRIMINATION USING A PRINCIPAL GEODESIC CLASSIFIER

MULTIVARIATE TEXTURE DISCRIMINATION USING A PRINCIPAL GEODESIC CLASSIFIER MULTIVARIATE TEXTURE DISCRIMINATION USING A PRINCIPAL GEODESIC CLASSIFIER A.Shabbir 1, 2 and G.Verdoolaege 1, 3 1 Department of Applied Physics, Ghent University, B-9000 Ghent, Belgium 2 Max Planck Institute

More information

Dynamic Clustering of Data with Modified K-Means Algorithm

Dynamic Clustering of Data with Modified K-Means Algorithm 2012 International Conference on Information and Computer Networks (ICICN 2012) IPCSIT vol. 27 (2012) (2012) IACSIT Press, Singapore Dynamic Clustering of Data with Modified K-Means Algorithm Ahamed Shafeeq

More information

Using Association Rules for Better Treatment of Missing Values

Using Association Rules for Better Treatment of Missing Values Using Association Rules for Better Treatment of Missing Values SHARIQ BASHIR, SAAD RAZZAQ, UMER MAQBOOL, SONYA TAHIR, A. RAUF BAIG Department of Computer Science (Machine Intelligence Group) National University

More information

Web page recommendation using a stochastic process model

Web page recommendation using a stochastic process model Data Mining VII: Data, Text and Web Mining and their Business Applications 233 Web page recommendation using a stochastic process model B. J. Park 1, W. Choi 1 & S. H. Noh 2 1 Computer Science Department,

More information

Data Mining. Introduction. Hamid Beigy. Sharif University of Technology. Fall 1394

Data Mining. Introduction. Hamid Beigy. Sharif University of Technology. Fall 1394 Data Mining Introduction Hamid Beigy Sharif University of Technology Fall 1394 Hamid Beigy (Sharif University of Technology) Data Mining Fall 1394 1 / 20 Table of contents 1 Introduction 2 Data mining

More information

An Improvement of Centroid-Based Classification Algorithm for Text Classification

An Improvement of Centroid-Based Classification Algorithm for Text Classification An Improvement of Centroid-Based Classification Algorithm for Text Classification Zehra Cataltepe, Eser Aygun Istanbul Technical Un. Computer Engineering Dept. Ayazaga, Sariyer, Istanbul, Turkey cataltepe@itu.edu.tr,

More information

Enhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques

Enhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques 24 Enhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques Enhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques Ruxandra PETRE

More information

An Empirical Study of Lazy Multilabel Classification Algorithms

An Empirical Study of Lazy Multilabel Classification Algorithms An Empirical Study of Lazy Multilabel Classification Algorithms E. Spyromitros and G. Tsoumakas and I. Vlahavas Department of Informatics, Aristotle University of Thessaloniki, 54124 Thessaloniki, Greece

More information

Correlation Based Feature Selection with Irrelevant Feature Removal

Correlation Based Feature Selection with Irrelevant Feature Removal Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 4, April 2014,

More information

Mining Frequent Itemsets for data streams over Weighted Sliding Windows

Mining Frequent Itemsets for data streams over Weighted Sliding Windows Mining Frequent Itemsets for data streams over Weighted Sliding Windows Pauray S.M. Tsai Yao-Ming Chen Department of Computer Science and Information Engineering Minghsin University of Science and Technology

More information

NETWORK FAULT DETECTION - A CASE FOR DATA MINING

NETWORK FAULT DETECTION - A CASE FOR DATA MINING NETWORK FAULT DETECTION - A CASE FOR DATA MINING Poonam Chaudhary & Vikram Singh Department of Computer Science Ch. Devi Lal University, Sirsa ABSTRACT: Parts of the general network fault management problem,

More information

Database and Knowledge-Base Systems: Data Mining. Martin Ester

Database and Knowledge-Base Systems: Data Mining. Martin Ester Database and Knowledge-Base Systems: Data Mining Martin Ester Simon Fraser University School of Computing Science Graduate Course Spring 2006 CMPT 843, SFU, Martin Ester, 1-06 1 Introduction [Fayyad, Piatetsky-Shapiro

More information

Image Classification Using Wavelet Coefficients in Low-pass Bands

Image Classification Using Wavelet Coefficients in Low-pass Bands Proceedings of International Joint Conference on Neural Networks, Orlando, Florida, USA, August -7, 007 Image Classification Using Wavelet Coefficients in Low-pass Bands Weibao Zou, Member, IEEE, and Yan

More information

Preprocessing of Stream Data using Attribute Selection based on Survival of the Fittest

Preprocessing of Stream Data using Attribute Selection based on Survival of the Fittest Preprocessing of Stream Data using Attribute Selection based on Survival of the Fittest Bhakti V. Gavali 1, Prof. Vivekanand Reddy 2 1 Department of Computer Science and Engineering, Visvesvaraya Technological

More information

Anomaly Detection on Data Streams with High Dimensional Data Environment

Anomaly Detection on Data Streams with High Dimensional Data Environment Anomaly Detection on Data Streams with High Dimensional Data Environment Mr. D. Gokul Prasath 1, Dr. R. Sivaraj, M.E, Ph.D., 2 Department of CSE, Velalar College of Engineering & Technology, Erode 1 Assistant

More information

Text Categorization. Foundations of Statistic Natural Language Processing The MIT Press1999

Text Categorization. Foundations of Statistic Natural Language Processing The MIT Press1999 Text Categorization Foundations of Statistic Natural Language Processing The MIT Press1999 Outline Introduction Decision Trees Maximum Entropy Modeling (optional) Perceptrons K Nearest Neighbor Classification

More information

An Immune Concentration Based Virus Detection Approach Using Particle Swarm Optimization

An Immune Concentration Based Virus Detection Approach Using Particle Swarm Optimization An Immune Concentration Based Virus Detection Approach Using Particle Swarm Optimization Wei Wang 1,2, Pengtao Zhang 1,2, and Ying Tan 1,2 1 Key Laboratory of Machine Perception, Ministry of Eduction,

More information

Improved Frequent Pattern Mining Algorithm with Indexing

Improved Frequent Pattern Mining Algorithm with Indexing IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661,p-ISSN: 2278-8727, Volume 16, Issue 6, Ver. VII (Nov Dec. 2014), PP 73-78 Improved Frequent Pattern Mining Algorithm with Indexing Prof.

More information

DATA MINING II - 1DL460

DATA MINING II - 1DL460 DATA MINING II - 1DL460 Spring 2016 A second course in data mining http://www.it.uu.se/edu/course/homepage/infoutv2/vt16 Kjell Orsborn Uppsala Database Laboratory Department of Information Technology,

More information

Dynamic Data in terms of Data Mining Streams

Dynamic Data in terms of Data Mining Streams International Journal of Computer Science and Software Engineering Volume 1, Number 1 (2015), pp. 25-31 International Research Publication House http://www.irphouse.com Dynamic Data in terms of Data Mining

More information

An Ensemble Data Mining Approach for Intrusion Detection in a Computer Network

An Ensemble Data Mining Approach for Intrusion Detection in a Computer Network International Journal of Science and Engineering Investigations vol. 6, issue 62, March 2017 ISSN: 2251-8843 An Ensemble Data Mining Approach for Intrusion Detection in a Computer Network Abisola Ayomide

More information

NDoT: Nearest Neighbor Distance Based Outlier Detection Technique

NDoT: Nearest Neighbor Distance Based Outlier Detection Technique NDoT: Nearest Neighbor Distance Based Outlier Detection Technique Neminath Hubballi 1, Bidyut Kr. Patra 2, and Sukumar Nandi 1 1 Department of Computer Science & Engineering, Indian Institute of Technology

More information

A Comparison of Text-Categorization Methods applied to N-Gram Frequency Statistics

A Comparison of Text-Categorization Methods applied to N-Gram Frequency Statistics A Comparison of Text-Categorization Methods applied to N-Gram Frequency Statistics Helmut Berger and Dieter Merkl 2 Faculty of Information Technology, University of Technology, Sydney, NSW, Australia hberger@it.uts.edu.au

More information

Learning the Three Factors of a Non-overlapping Multi-camera Network Topology

Learning the Three Factors of a Non-overlapping Multi-camera Network Topology Learning the Three Factors of a Non-overlapping Multi-camera Network Topology Xiaotang Chen, Kaiqi Huang, and Tieniu Tan National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy

More information

IMPLEMENTATION AND COMPARATIVE STUDY OF IMPROVED APRIORI ALGORITHM FOR ASSOCIATION PATTERN MINING

IMPLEMENTATION AND COMPARATIVE STUDY OF IMPROVED APRIORI ALGORITHM FOR ASSOCIATION PATTERN MINING IMPLEMENTATION AND COMPARATIVE STUDY OF IMPROVED APRIORI ALGORITHM FOR ASSOCIATION PATTERN MINING 1 SONALI SONKUSARE, 2 JAYESH SURANA 1,2 Information Technology, R.G.P.V., Bhopal Shri Vaishnav Institute

More information

PESIT- Bangalore South Campus Hosur Road (1km Before Electronic city) Bangalore

PESIT- Bangalore South Campus Hosur Road (1km Before Electronic city) Bangalore Data Warehousing Data Mining (17MCA442) 1. GENERAL INFORMATION: PESIT- Bangalore South Campus Hosur Road (1km Before Electronic city) Bangalore 560 100 Department of MCA COURSE INFORMATION SHEET Academic

More information

TOWARDS NEW ESTIMATING INCREMENTAL DIMENSIONAL ALGORITHM (EIDA)

TOWARDS NEW ESTIMATING INCREMENTAL DIMENSIONAL ALGORITHM (EIDA) TOWARDS NEW ESTIMATING INCREMENTAL DIMENSIONAL ALGORITHM (EIDA) 1 S. ADAEKALAVAN, 2 DR. C. CHANDRASEKAR 1 Assistant Professor, Department of Information Technology, J.J. College of Arts and Science, Pudukkottai,

More information

An Empirical Study of Hoeffding Racing for Model Selection in k-nearest Neighbor Classification

An Empirical Study of Hoeffding Racing for Model Selection in k-nearest Neighbor Classification An Empirical Study of Hoeffding Racing for Model Selection in k-nearest Neighbor Classification Flora Yu-Hui Yeh and Marcus Gallagher School of Information Technology and Electrical Engineering University

More information

Detection and Deletion of Outliers from Large Datasets

Detection and Deletion of Outliers from Large Datasets Detection and Deletion of Outliers from Large Datasets Nithya.Jayaprakash 1, Ms. Caroline Mary 2 M. tech Student, Dept of Computer Science, Mohandas College of Engineering and Technology, India 1 Assistant

More information

Parallel Popular Crime Pattern Mining in Multidimensional Databases

Parallel Popular Crime Pattern Mining in Multidimensional Databases Parallel Popular Crime Pattern Mining in Multidimensional Databases BVS. Varma #1, V. Valli Kumari *2 # Department of CSE, Sri Venkateswara Institute of Science & Information Technology Tadepalligudem,

More information

Review on Data Mining Techniques for Intrusion Detection System

Review on Data Mining Techniques for Intrusion Detection System Review on Data Mining Techniques for Intrusion Detection System Sandeep D 1, M. S. Chaudhari 2 Research Scholar, Dept. of Computer Science, P.B.C.E, Nagpur, India 1 HoD, Dept. of Computer Science, P.B.C.E,

More information

Analysis of Dendrogram Tree for Identifying and Visualizing Trends in Multi-attribute Transactional Data

Analysis of Dendrogram Tree for Identifying and Visualizing Trends in Multi-attribute Transactional Data Analysis of Dendrogram Tree for Identifying and Visualizing Trends in Multi-attribute Transactional Data D.Radha Rani 1, A.Vini Bharati 2, P.Lakshmi Durga Madhuri 3, M.Phaneendra Babu 4, A.Sravani 5 Department

More information

Effect of Principle Component Analysis and Support Vector Machine in Software Fault Prediction

Effect of Principle Component Analysis and Support Vector Machine in Software Fault Prediction International Journal of Computer Trends and Technology (IJCTT) volume 7 number 3 Jan 2014 Effect of Principle Component Analysis and Support Vector Machine in Software Fault Prediction A. Shanthini 1,

More information

String Vector based KNN for Text Categorization

String Vector based KNN for Text Categorization 458 String Vector based KNN for Text Categorization Taeho Jo Department of Computer and Information Communication Engineering Hongik University Sejong, South Korea tjo018@hongik.ac.kr Abstract This research

More information

Detection of Missing Values from Big Data of Self Adaptive Energy Systems

Detection of Missing Values from Big Data of Self Adaptive Energy Systems Detection of Missing Values from Big Data of Self Adaptive Energy Systems MVD tool detect missing values in timeseries energy data Muhammad Nabeel Computer Science Department, SST University of Management

More information

Automated Website Fingerprinting through Deep Learning

Automated Website Fingerprinting through Deep Learning Automated Website Fingerprinting through Deep Learning Vera Rimmer 1, Davy Preuveneers 1, Marc Juarez 2, Tom Van Goethem 1 and Wouter Joosen 1 NDSS 2018 Feb 19th (San Diego, USA) 1 2 Website Fingerprinting

More information

Combination of PCA with SMOTE Resampling to Boost the Prediction Rate in Lung Cancer Dataset

Combination of PCA with SMOTE Resampling to Boost the Prediction Rate in Lung Cancer Dataset International Journal of Computer Applications (0975 8887) Combination of PCA with SMOTE Resampling to Boost the Prediction Rate in Lung Cancer Dataset Mehdi Naseriparsa Islamic Azad University Tehran

More information

Temporal Weighted Association Rule Mining for Classification

Temporal Weighted Association Rule Mining for Classification Temporal Weighted Association Rule Mining for Classification Purushottam Sharma and Kanak Saxena Abstract There are so many important techniques towards finding the association rules. But, when we consider

More information

TiP: Analyzing Periodic Time Series Patterns

TiP: Analyzing Periodic Time Series Patterns ip: Analyzing Periodic ime eries Patterns homas Bernecker, Hans-Peter Kriegel, Peer Kröger, and Matthias Renz Institute for Informatics, Ludwig-Maximilians-Universität München Oettingenstr. 67, 80538 München,

More information

Comparing the Performance of Frequent Itemsets Mining Algorithms

Comparing the Performance of Frequent Itemsets Mining Algorithms Comparing the Performance of Frequent Itemsets Mining Algorithms Kalash Dave 1, Mayur Rathod 2, Parth Sheth 3, Avani Sakhapara 4 UG Student, Dept. of I.T., K.J.Somaiya College of Engineering, Mumbai, India

More information

K-Mean Clustering Algorithm Implemented To E-Banking

K-Mean Clustering Algorithm Implemented To E-Banking K-Mean Clustering Algorithm Implemented To E-Banking Kanika Bansal Banasthali University Anjali Bohra Banasthali University Abstract As the nations are connected to each other, so is the banking sector.

More information

Sequences Modeling and Analysis Based on Complex Network

Sequences Modeling and Analysis Based on Complex Network Sequences Modeling and Analysis Based on Complex Network Li Wan 1, Kai Shu 1, and Yu Guo 2 1 Chongqing University, China 2 Institute of Chemical Defence People Libration Army {wanli,shukai}@cqu.edu.cn

More information

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY A PATH FOR HORIZING YOUR INNOVATIVE WORK REVIEW ON CONTENT BASED IMAGE RETRIEVAL BY USING VISUAL SEARCH RANKING MS. PRAGATI

More information

C-NBC: Neighborhood-Based Clustering with Constraints

C-NBC: Neighborhood-Based Clustering with Constraints C-NBC: Neighborhood-Based Clustering with Constraints Piotr Lasek Chair of Computer Science, University of Rzeszów ul. Prof. St. Pigonia 1, 35-310 Rzeszów, Poland lasek@ur.edu.pl Abstract. Clustering is

More information

Lecture Notes on Critique of 1998 and 1999 DARPA IDS Evaluations

Lecture Notes on Critique of 1998 and 1999 DARPA IDS Evaluations Lecture Notes on Critique of 1998 and 1999 DARPA IDS Evaluations Prateek Saxena March 3 2008 1 The Problems Today s lecture is on the discussion of the critique on 1998 and 1999 DARPA IDS evaluations conducted

More information

UMCS. Annales UMCS Informatica AI 7 (2007) Data mining techniques for portal participants profiling. Danuta Zakrzewska *, Justyna Kapka

UMCS. Annales UMCS Informatica AI 7 (2007) Data mining techniques for portal participants profiling. Danuta Zakrzewska *, Justyna Kapka Annales Informatica AI 7 (2007) 153-161 Annales Informatica Lublin-Polonia Sectio AI http://www.annales.umcs.lublin.pl/ Data mining techniques for portal participants profiling Danuta Zakrzewska *, Justyna

More information

International Journal of Computer Science Trends and Technology (IJCST) Volume 5 Issue 4, Jul Aug 2017

International Journal of Computer Science Trends and Technology (IJCST) Volume 5 Issue 4, Jul Aug 2017 International Journal of Computer Science Trends and Technology (IJCST) Volume 5 Issue 4, Jul Aug 17 RESEARCH ARTICLE OPEN ACCESS Classifying Brain Dataset Using Classification Based Association Rules

More information

Clustering Algorithms for Data Stream

Clustering Algorithms for Data Stream Clustering Algorithms for Data Stream Karishma Nadhe 1, Prof. P. M. Chawan 2 1Student, Dept of CS & IT, VJTI Mumbai, Maharashtra, India 2Professor, Dept of CS & IT, VJTI Mumbai, Maharashtra, India Abstract:

More information

International Journal of Advance Foundation and Research in Science & Engineering (IJAFRSE) Volume 1, Issue 2, July 2014.

International Journal of Advance Foundation and Research in Science & Engineering (IJAFRSE) Volume 1, Issue 2, July 2014. A B S T R A C T International Journal of Advance Foundation and Research in Science & Engineering (IJAFRSE) Information Retrieval Models and Searching Methodologies: Survey Balwinder Saini*,Vikram Singh,Satish

More information

DATA MINING II - 1DL460

DATA MINING II - 1DL460 DATA MINING II - 1DL460 Spring 2012 A second course in data mining!! http://www.it.uu.se/edu/course/homepage/infoutv2/vt12 Kjell Orsborn! Uppsala Database Laboratory! Department of Information Technology,

More information

Intrusion Detection Using Data Mining Technique (Classification)

Intrusion Detection Using Data Mining Technique (Classification) Intrusion Detection Using Data Mining Technique (Classification) Dr.D.Aruna Kumari Phd 1 N.Tejeswani 2 G.Sravani 3 R.Phani Krishna 4 1 Associative professor, K L University,Guntur(dt), 2 B.Tech(1V/1V),ECM,

More information

Effective Pattern Similarity Match for Multidimensional Sequence Data Sets

Effective Pattern Similarity Match for Multidimensional Sequence Data Sets Effective Pattern Similarity Match for Multidimensional Sequence Data Sets Seo-Lyong Lee, * and Deo-Hwan Kim 2, ** School of Industrial and Information Engineering, Hanu University of Foreign Studies,

More information

A Rough Set Approach for Generation and Validation of Rules for Missing Attribute Values of a Data Set

A Rough Set Approach for Generation and Validation of Rules for Missing Attribute Values of a Data Set A Rough Set Approach for Generation and Validation of Rules for Missing Attribute Values of a Data Set Renu Vashist School of Computer Science and Engineering Shri Mata Vaishno Devi University, Katra,

More information

Comparison of Optimization Methods for L1-regularized Logistic Regression

Comparison of Optimization Methods for L1-regularized Logistic Regression Comparison of Optimization Methods for L1-regularized Logistic Regression Aleksandar Jovanovich Department of Computer Science and Information Systems Youngstown State University Youngstown, OH 44555 aleksjovanovich@gmail.com

More information

IJREAT International Journal of Research in Engineering & Advanced Technology, Volume 1, Issue 5, Oct-Nov, ISSN:

IJREAT International Journal of Research in Engineering & Advanced Technology, Volume 1, Issue 5, Oct-Nov, ISSN: IJREAT International Journal of Research in Engineering & Advanced Technology, Volume 1, Issue 5, Oct-Nov, 20131 Improve Search Engine Relevance with Filter session Addlin Shinney R 1, Saravana Kumar T

More information

MULTIDIMENSIONAL INDEXING TREE STRUCTURE FOR SPATIAL DATABASE MANAGEMENT

MULTIDIMENSIONAL INDEXING TREE STRUCTURE FOR SPATIAL DATABASE MANAGEMENT MULTIDIMENSIONAL INDEXING TREE STRUCTURE FOR SPATIAL DATABASE MANAGEMENT Dr. G APPARAO 1*, Mr. A SRINIVAS 2* 1. Professor, Chairman-Board of Studies & Convener-IIIC, Department of Computer Science Engineering,

More information