A Genetic-Fuzzy Classification Approach to improve High-Dimensional Intrusion Detection System

Imen Gaied 1, Farah Jemili 1, and Ouajdi Korbaa 1

1 MARS Laboratory, ISITCom, University of Sousse, Tunisia
gaiedimen@gmail.com Jmili_farah@yahoo.fr Ouajdi.Korbaa@centraliens-lile.org

Abstract. With the increasing number of attacks and the growing scale of connected networks over the past few years, researchers have been led to look for new ways to judge the relevance, severity and correlation of network attacks. A high-dimensional intrusion detection system is a promising dynamic protection component in the security field. In this work we propose an optimized classification scheme that coordinates several techniques for generating fuzzy association rules from a large data set. Our main goal is to improve the detection rate of attacks in a real-time environment by using the one-versus-one decomposition while keeping the false alarm rate as low as possible. In addition, we aim to reduce the loss of knowledge by using a suitable n-dimensional overlap function to model the conjunction in fuzzy rules, so as to provide sufficient classification accuracy. An aggregation method is then used to obtain the final decision. To evaluate the performance of our approach, we carried out an experimental study. The results show that our approach outperforms other classifiers by providing the highest detection accuracy together with a low false alarm rate and low time consumption.

Keywords: Intrusion detection system, OVO decomposition, N-dimensional overlap function, Fuzzy association rules, Detection rate, False positives.

1 Introduction

Network attacks have recently grown exponentially in number and increased in severity. Developing an efficient Intrusion Detection System (IDS) has therefore become an important open problem that is receiving considerable attention from the research community.
The main focus of this research work is to put forward an intelligent and accurate intrusion detection approach using linguistic Fuzzy Rule-Based Classification Systems (FRBCS), built on the synergy between preprocessing techniques, pairwise learning and n-dimensional overlap functions. This approach aims to explore the large search space, achieve higher accuracy, maintain high confidence and good coverage of the data set, and provide the user with high-quality rules. To our knowledge, this is the first research paper which suggests this approach for intrusion detection problems.

To reach these goals, we first have to deal with the imbalanced distribution of classes in high-dimensional problems. In fact, the accuracy of a classifier is directly affected by the quality of the training data used to construct the final model. We therefore preprocess the data sets to remove or correct noisy patterns. Then, we use the One-Versus-One (OVO) decomposition to decrease the complexity and the noise effect of the original problem [1]. To further decrease the effects of noise, we consider the fuzzy association rule learning algorithm known as FARC-HD (Fuzzy Association Rule-based Classification model for High-Dimensional problems) [2] in order to obtain the most accurate and highest-quality rules. Nevertheless, using the product as the T-norm in the baseline FARC-HD algorithm with an OVO decomposition produces confidence values with low variation, especially when the fuzzy rules have a larger number of antecedents. These values are used in the aggregation step to reach the final decision. Consequently, some robust aggregation methods are affected by this undesirable condition and can lead to a poorer classification than the original FARC-HD. To address this problem, we substitute the product T-norm by a suitable overlap function [3][4]. These prior works defined n-dimensional overlap functions in order to obtain outputs from the base classifiers that lie in a wider range. Hence, more information is preserved for the subsequent aggregation process, which can improve the classification in the OVO scheme. To evaluate the performance of this new methodology on the intrusion detection problem, a comparison study was performed with another approach based on the original FARC-HD and the OVO strategy [5].
Accordingly, we show the validity of these proposals in improving the accuracy of intrusion detection compared to other models.

The remainder of this paper is organized as follows. Section 2 first describes the operating principle of the FARC-HD algorithm and then classification via pairwise techniques. After that, it presents the Binary Tree of SVM as an aggregation method, and finally gives a brief definition of n-dimensional overlap functions. The experimental framework of the proposed approach is presented in Section 3. The experiments and results are discussed in Section 4. The conclusion and future work are given in Section 5.

2 Proposed methodology

2.1 FARC-HD algorithm

In this work we use the FARC-HD algorithm [2] to obtain high-quality rules in a high-dimensional problem. The classification problem consists of m training examples $x_p = (x_{p1}, \dots, x_{pn}, C_p)$, $p = 1, 2, \dots, m$, from M classes, where $x_{pi}$ is the value of the i-th attribute ($i = 1, 2, \dots, n$) of the p-th example. A fuzzy rule has the following form:

$$\text{Rule } R_j:\ \text{If } x_1 \text{ is } A_{j1} \text{ and } \dots \text{ and } x_n \text{ is } A_{jn} \text{ then Class} = C_j \text{ with } RW_j \qquad (1)$$

where $R_j$ is the label of the j-th rule, $x = (x_1, \dots, x_n)$ is an n-dimensional pattern vector, $A_{ji}$ is an antecedent fuzzy set, $C_j$ is the class label and $RW_j$ is the rule weight.
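As an illustration, the rule form in (1) can be represented as a small data structure. This is a minimal sketch, assuming attributes normalized to [0, 1] and evenly spaced triangular fuzzy sets; the names `triangular`, `uniform_partition` and `FuzzyRule` are hypothetical and are not taken from the FARC-HD implementation. The sketch also allows a rule to use only a subset of the attributes, which is an assumption made for brevity.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


def triangular(x: float, a: float, b: float, c: float) -> float:
    """Membership degree of x in a triangular fuzzy set with feet a, c and peak b."""
    if x == b:
        return 1.0
    if x <= a or x >= c:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def uniform_partition(n_labels: int) -> List[Tuple[float, float, float]]:
    """Evenly spaced triangular labels on [0, 1] (an illustrative assumption)."""
    step = 1.0 / (n_labels - 1)
    return [(i * step - step, i * step, i * step + step) for i in range(n_labels)]


@dataclass
class FuzzyRule:
    antecedents: Dict[int, int]  # attribute index i -> index of the label A_ji
    consequent: str              # class label C_j
    weight: float                # rule weight RW_j

    def memberships(self, x: Sequence[float], partition) -> List[float]:
        """Membership degree of each antecedent attribute value of pattern x."""
        return [triangular(x[i], *partition[label])
                for i, label in self.antecedents.items()]


# Hypothetical rule: "If x_0 is label 1 and x_2 is label 4 then Class = DOS with 0.9"
rule = FuzzyRule(antecedents={0: 1, 2: 4}, consequent="DOS", weight=0.9)
degrees = rule.memberships([0.25, 0.9, 0.875], uniform_partition(5))
```

The list `degrees` holds one membership degree per antecedent; how these degrees are combined into a single value depends on the conjunction operator chosen for the classifier.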

The learning process consists of three main steps: fuzzy association rule mining for classification, candidate rule prescreening, and genetic rule selection with lateral tuning. A search tree is built for each class to obtain the fuzzy rule base. The search space is reduced by generating only the rules with high support and high confidence. To preselect the strongest rules, a rule assessment criterion based on the "subgroup discovery" mechanism is used [8]. In order to decrease the computational cost, an evolutionary algorithm then performs a lateral tuning of the fuzzy sets [9] and picks the final best rules from the rule base.

To classify a new example, FARC-HD applies a fuzzy reasoning method, called additive combination, composed of four steps: matching degree, association degree, confidence degree and classification.

In the first step, the strength of activation of the if-part of every rule in the Rule Base (RB) with the pattern $x_p$ is computed by means of a conjunction operator (T-norm):

$$\mu_{A_j}(x_p) = T(\mu_{A_{j1}}(x_{p1}), \dots, \mu_{A_{jn}}(x_{pn})), \quad j = 1, \dots, L \qquad (2)$$

In the second step, the association degree of the pattern $x_p$ with each rule $R_j$ in the RB is computed, where k denotes the consequent class of the rule (k = Class($R_j$)). A combination operator h combines the matching degree with the Rule Weight (RW):

$$b_j^k = h(\mu_{A_j}(x_p), RW_j^k), \quad k = 1, \dots, M;\ j = 1, \dots, L \qquad (3)$$

In the third step, an aggregation function f (the sum in the case of FARC-HD) combines the positive association degrees computed in the previous step:

$$y_k = f(b_j^k,\ j = 1, \dots, L \text{ and } b_j^k > 0), \quad k = 1, \dots, M \qquad (4)$$

In the last step, a decision function F predicts the class that obtains the highest confidence degree:

$$F(y_1, \dots, y_M) = \arg\max_{k = 1, \dots, M} (y_k) \qquad (5)$$

2.2 Classification by using decomposition strategies: OVO

In the OVO scheme, the original multi-class problem with m classes (in this section m denotes the number of classes) is divided into $m(m-1)/2$ binary sub-problems, each of which aims to distinguish a pair of classes $C_i$, $C_j$. When a new pattern is presented to each binary classifier, a pair of confidence degrees $r_{ij}$, $r_{ji}$ is given in favor of the two classes $C_i$, $C_j$ (the class with the larger confidence is the output class of this classifier). The outputs (confidence degrees) of all binary classifiers are collected in a score matrix R and combined by an aggregation model to make the final class prediction.

$$R = \begin{bmatrix} - & r_{12} & \cdots & r_{1m} \\ r_{21} & - & \cdots & r_{2m} \\ \vdots & & \ddots & \vdots \\ r_{m1} & r_{m2} & \cdots & - \end{bmatrix} \qquad (6)$$

In the literature [6] we distinguish different aggregation methods: the Max-Wins rule, known as the voting strategy (VOTE); the Winner Weighted Voting (WINW), whose validity was proven in [4]; and the methods based on preference relations (ND, LVPC). Please refer to [6][4][10] for more information. In this work, we instead adopt the Binary Tree of SVM (BTS) architecture [10] for the final decision.

The BTS is a tree-structured architecture, easily extended to any type of binary classifier, that recursively constructs a binary tree. At the beginning, the root node holds the full list of classes, and a binary classifier is selected at random and trained in order to obtain a separating plane. Then, all the samples in the node are assigned to the two subnodes derived from the two classes. To complete the binary tree, this strategy is applied in every node until a leaf node containing only one class is reached. At decision time, the binary classifier that discriminates between two classes is also used to separate the remaining classes simultaneously, so that more than one class can be removed from the list at each node. In addition, to avoid any false assumption, especially when a leaf contains more than one class, the output is computed with the voting strategy.

(a) Multiclass problem (b) Architecture of BTS
Fig. 1. Six-class problem determined by BTS [10]

Fig. 1 illustrates this concept on a six-class problem. The first node classifier discriminates classes 1 and 2. On the one hand, when class 1 is predicted, classes 4 and 6 are removed (when a testing sample is at position A, it is unlikely to belong to class 4 or 6). On the other hand, when class 2 is predicted, only class 1 is taken out. Therefore, classes 3 and 5 are maintained in the two next nodes by using probabilistic outputs, because class 3 is near the decision function (when a testing sample is at position B, it still has a chance of being classified as class 3) and because class 5 cannot be distinguished by the classifier in the root node.
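The path from the fuzzy reasoning method of Eqs. (2)-(4) to the score matrix of Eq. (6) and its aggregation can be sketched as follows. This is a minimal illustration and not the authors' implementation: it assumes the product T-norm of the baseline FARC-HD, takes the combination operator h to be the product, and shows only two simple aggregations (VOTE and a winner-weighted vote in the spirit of WINW). The BTS tree itself is not reproduced. All function names and the sample numbers are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

# One fired rule for a given pattern: (membership degrees of its antecedents,
# consequent class index, rule weight). This layout is illustrative only.
FiredRule = Tuple[Sequence[float], int, float]


def confidences(rules: List[FiredRule], classes: Tuple[int, int]) -> Dict[int, float]:
    """Confidence degree of each class of one binary sub-problem."""
    y = {c: 0.0 for c in classes}
    for degrees, cls, weight in rules:
        matching = math.prod(degrees)    # Eq. (2) with the product T-norm
        association = matching * weight  # Eq. (3), h assumed to be the product
        if association > 0:
            y[cls] += association        # Eq. (4), f = sum
    return y


def score_matrix(pair_rules: Dict[Tuple[int, int], List[FiredRule]], m: int) -> List[List[float]]:
    """Build the matrix R of Eq. (6); the unused diagonal is left at 0."""
    R = [[0.0] * m for _ in range(m)]
    for (i, j), rules in pair_rules.items():
        y = confidences(rules, (i, j))
        R[i][j], R[j][i] = y[i], y[j]
    return R


def vote(R: List[List[float]]) -> int:
    """VOTE: every binary classifier gives one vote to its more confident class."""
    m = len(R)
    votes = [sum(R[i][j] > R[j][i] for j in range(m) if j != i) for i in range(m)]
    return max(range(m), key=lambda i: votes[i])


def winner_weighted_vote(R: List[List[float]]) -> int:
    """Each winning class receives its confidence instead of a unit vote (assumed form)."""
    m = len(R)
    scores = [sum(R[i][j] for j in range(m) if j != i and R[i][j] > R[j][i])
              for i in range(m)]
    return max(range(m), key=lambda i: scores[i])


# Made-up fired rules for a three-class problem (classes 0, 1, 2).
pair_rules = {
    (0, 1): [([0.8, 0.5], 0, 0.9), ([0.2, 0.4], 1, 0.7)],
    (0, 2): [([0.6, 0.5], 0, 0.8), ([0.9, 0.9], 2, 1.0)],
    (1, 2): [([0.3, 0.2], 1, 0.5), ([0.7, 0.6], 2, 0.9)],
}
R = score_matrix(pair_rules, m=3)
predicted_by_vote = vote(R)
predicted_by_weighted_vote = winner_weighted_vote(R)
```

In this made-up example both aggregations select class 2, which wins two of its binary comparisons. Ties are resolved in favor of the lowest class index, which is a simplification.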

2.3 Classification by using N-Dimensional Overlap Functions

As mentioned above, the confidence degrees provided by the original FARC-HD have low variation, which is a disadvantage for the aggregation process performed in the OVO scheme. This negative effect is explained by the use of the product T-norm to model the conjunction. To provide a better synergy between FARC-HD and the OVO decomposition, the proposed solution substitutes the product in the association degree by an n-dimensional overlap function, so as to obtain results in a wider range and to stop penalizing the rules that have a large number of antecedents. In this way, the outputs of the base classifiers become more suitable for the subsequent aggregation step. In our case we use the MIN operation as overlap function; it preserves the discrimination capability of FARC-HD and satisfies idempotence:

$$O_n(x_1, \dots, x_n) = \min(x_1, \dots, x_n) \qquad (7)$$

3 Experimental framework

In this section, we first describe the data sets selected for the experimental study (Section 3.1). Next, we provide details of the base classifiers used in the study by describing their configuration parameters and aggregation strategies (Section 3.2). Finally, we present the measures employed to evaluate the performance of each classifier (Section 3.3).

3.1 Datasets: KDD CUP 99

The experiments were carried out step by step on the KDD99 benchmark data set, which has been used by many researchers to evaluate their IDS [5][7][12]. It has been the most widespread and complete data set chosen for data mining applications. It is divided into labelled data for the training phase (about 5,000,000 records) and unlabelled data for the test base, named "corrected KDD" (about 311,000 records, including 14 new types of attacks). Each record describes a TCP/IP connection by 41 attributes, and the attacks fall into four main classes: Probe, R2L, U2R and DOS.
Unfortunately, this data set has some disadvantages: the redundant nature of the alerts and a broad base of 41 attributes with an imbalanced class distribution. These problems affect the detection of rare attacks more than that of dominant ones. To resolve this problem, some researchers have used data sets of different sizes prepared by random selection [5][7].

3.2 Configuration parameters used for the study

In this section, we briefly present the configuration of the FARC-HD algorithm used as base classifier and the different combination methods for the OVO decomposition scheme. We chose 5 labels per attribute for the fuzzy sets, which take the form of triangular membership functions (MFs). We considered four aggregation methods in addition to our proposed BTS method: VOTE, the Non-Dominance Criterion (ND), the Learning Valued Preference for Classification (LVPC) and WINW. As conjunction operators, we selected the Product (PROD), the Geometric Mean (GM), the Harmonic Mean (HM) and the Minimum operator (MIN). As inference procedure we used the additive combination. Furthermore, we fixed the minimum support to 0.05, the minimum confidence to 0.8, the maximum depth of the search tree to 3, the k parameter for the prescreening to 2, the maximum number of evaluations to 4,327 and the population size to 50 [2].

3.3 Performance measure of intrusion detection

Unfortunately, it is not easy to design a satisfactory evaluation strategy that leads to sound conclusions. Indeed, the methods used previously show three major drawbacks: the use of a non-representative test base, the lack of a rigorous evaluation methodology, and the use of improper metrics. For these reasons, we considered the following evaluation measures:

Classification Rate (CR): a classical metric, also called overall accuracy. It is the number of correctly predicted instances (TP+TN) divided by the total number of instances (n).

Cohen's kappa (Kappa) [11]: an alternative measure to the CR. It scores the successes independently for each class and aggregates them, whereas the CR scores all the successes over all the classes. Kappa is convenient for measuring classification performance because it is less sensitive to the randomness caused by a different number of instances in each class.

Mean F-Score: the average of the F-Score of each class, where the F-Score represents the trade-off between precision and recall. This measure is commonly used to evaluate rare-class prediction.

Detection Rate (DR): an important measure for intrusion detection. It is the number of correctly detected attacks (TP) divided by the total number of attacks (TP+FN).
False Positive Rate (FPR): the ratio between the number of normal examples detected as attacks (FP) and the total number of normal examples (FP+TN).

Receiver Operating Characteristic (ROC): used for comparing detectors with each other. The ROC curve is generated by plotting the true positive rate (sensitivity) against the false positive rate (1 - specificity) for each threshold or decision cutoff. A data point in the upper left corner corresponds to an optimal high DR with a low FPR.

Area under the ROC Curve (AUC): a performance measure used to compare IDS, with an area of 1 representing a perfect test, since it classifies all positive and negative cases correctly.

4 Experimental Study

In this section, we discuss the experiments conducted to show the performance of our suggested approach on large data sets. All the experiments were implemented in Java on an Intel Pentium IV personal computer with 2.40 GHz and 7.88 GB of RAM.
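The measures of Section 3.3 can be computed from confusion-matrix counts. The sketch below is a minimal illustration, assuming the standard definitions of Cohen's kappa and of the per-class F-Score; the function names and the sample counts are hypothetical and do not come from the experiments reported here.

```python
from typing import Dict, List


def binary_rates(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """CR, DR and FPR from the counts of the attack-versus-normal problem."""
    return {
        "CR": (tp + tn) / (tp + fp + tn + fn),  # overall accuracy
        "DR": tp / (tp + fn),                   # detected attacks / all attacks
        "FPR": fp / (fp + tn),                  # false alarms / all normal examples
    }


def cohen_kappa(cm: List[List[int]]) -> float:
    """Cohen's kappa for a square confusion matrix (rows: actual, columns: predicted)."""
    n = sum(map(sum, cm))
    k = len(cm)
    observed = sum(cm[i][i] for i in range(k)) / n
    # Agreement expected by chance, from the row and column totals.
    expected = sum(sum(cm[i]) * sum(row[i] for row in cm) for i in range(k)) / (n * n)
    return (observed - expected) / (1 - expected)


def mean_f_score(cm: List[List[int]]) -> float:
    """Unweighted average of the per-class F-Score."""
    k = len(cm)
    scores = []
    for i in range(k):
        tp = cm[i][i]
        predicted = sum(row[i] for row in cm)
        actual = sum(cm[i])
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        total = precision + recall
        scores.append(2 * precision * recall / total if total else 0.0)
    return sum(scores) / k


# Made-up counts, for illustration only.
rates = binary_rates(tp=90, fp=5, tn=95, fn=10)
cm = [[50, 10], [5, 35]]
kappa = cohen_kappa(cm)
mfs = mean_f_score(cm)
```

Because the mean F-Score gives every class the same weight, a poorly detected rare class lowers it much more than it lowers the CR, which is why it is reported alongside the overall accuracy.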

4.1 Datasets and pre-processing

We applied three main preprocessing phases [7]. First, we removed all the repeated records in the entire KDD training and test sets and retained only one copy of each record. Second, in order to reduce the complexity of this large data set, we performed feature selection by applying factorial multiple correspondence analysis [12] to extract the most relevant features of the data; we retained just 9 relevant attributes. Third, a normalization step, performed with the Min-Max function, is necessary to make the data suitable for fuzzy logic.

4.2 Classifier structure

We want to study here how the different overlap functions and the different aggregation strategies affect the rule base size on the one hand and the computation time of the learning algorithm on the other. To do so, a 5-fold cross-validation model was considered. We used the whole 10% KDD data set, with 90% of the instances randomly selected for training and 10% for testing. We considered the average number of rules and of antecedents per rule for each overlap function. The results are presented in Table 1. Table 2 shows the computation time of each proposed method. Here, we need to develop an optimized IDS with low-cost rules, which is the case with the MIN conjunction function and the BTS aggregation method.

Table 1. Number of rules and antecedents by rule for each conjunction function (columns: conjunction function, number of rules, number of antecedents by rule; rows: PROD, GM, HM, MIN)

Table 2. Computation time for each conjunction function (columns: conjunction function, then computation time in minutes for VOTE, ND, LVPC, WINW and BTS; rows: PROD, GM, HM, MIN)

4.3 Intrusion detection performance comparison

This section aims to analyze which method is the most robust strategy for IDS problems. The experimental study was carried out in three steps. First, we studied the effect of our proposed overlap function and aggregation method, which are considered the most appropriate to improve the final performance of our suggested IDS. Second, we emphasize the effectiveness of our IDS in contrast with the other approaches. Finally, we adopted ROC graphs and the AUC measure, which are a convenient way of comparing detectors.
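The four conjunction operators compared in this study (PROD, GM, HM and MIN) can be sketched as follows. This is a minimal illustration under the usual definitions of these operators; the function names are hypothetical, and the convention that the harmonic mean is 0 when any degree is 0 is an assumption made to keep the function defined.

```python
import math
from typing import Sequence


def product(degrees: Sequence[float]) -> float:
    """PROD: the product T-norm used by the baseline FARC-HD."""
    return math.prod(degrees)


def geometric_mean(degrees: Sequence[float]) -> float:
    """GM: n-th root of the product of the n degrees."""
    return math.prod(degrees) ** (1.0 / len(degrees))


def harmonic_mean(degrees: Sequence[float]) -> float:
    """HM: n divided by the sum of reciprocals; 0 if any degree is 0 (assumed)."""
    if min(degrees) == 0:
        return 0.0
    return len(degrees) / sum(1.0 / d for d in degrees)


def minimum(degrees: Sequence[float]) -> float:
    """MIN: the smallest degree."""
    return min(degrees)


# A rule whose three antecedents are all matched with degree 0.8.
degrees = [0.8, 0.8, 0.8]
results = {
    "PROD": product(degrees),
    "GM": geometric_mean(degrees),
    "HM": harmonic_mean(degrees),
    "MIN": minimum(degrees),
}
```

For these three antecedents the product yields 0.512, while GM, HM and MIN all return 0.8 because they are idempotent. Adding further antecedents shrinks the product even more, which illustrates why rules with many antecedents are penalized under the product T-norm.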

Table 3 shows the results on the test partition (10% of the data set) of the baseline FARC-HD algorithm within the OVO strategy (FARCHD-OVO) for each aggregation method. Tables 4, 5 and 6 present the results of FARCHD-OVO with the GM, HM and MIN overlap functions respectively. It is important to note that the MIN function reaches a good trade-off between DR and FPR, with a DR of 95.80% and an FPR of 1.41%, compared to the other overlap functions: GM with a DR of 94.25% and an FPR of 0.67%, and HM with a DR of 94.95% and an FPR of 0.65%.

Table 3. Test evaluation of FARCHD-OVO-PROD approach (columns: approach, CR, Kappa, MFM, DR, FPR; rows: FARCHD-ND-PROD [5], FARCHD-VOTE-PROD, FARCHD-LVPC-PROD, FARCHD-WINW-PROD, FARCHD-BTS-PROD)

Table 4. Test evaluation of FARCHD-OVO-GM approach (columns: approach, CR, Kappa, MFM, DR, FPR; rows: FARCHD-ND-GM, FARCHD-VOTE-GM, FARCHD-LVPC-GM, FARCHD-WINW-GM, FARCHD-BTS-GM)

Table 5. Test evaluation of FARCHD-OVO-HM approach (columns: approach, CR, Kappa, MFM, DR, FPR; rows: FARCHD-ND-HM, FARCHD-VOTE-HM, FARCHD-LVPC-HM, FARCHD-WINW-HM, FARCHD-BTS-HM)

Table 6. Test evaluation of FARCHD-OVO-MIN approach (columns: approach, CR, Kappa, MFM, DR, FPR; rows: FARCHD-ND-MIN, FARCHD-VOTE-MIN, FARCHD-LVPC-MIN, FARCHD-WINW-MIN, FARCHD-BTS-MIN)

When comparing the results of our suggested FARCHD-BTS-MIN scheme with the other schemes, we see a significant improvement, in particular in the classification rate, Cohen's kappa and the detection rate. It must also be pointed out that our approach is considered a specific solution for correctly detecting the boundaries of all classes, because its mean F-Score, like that of the FARCHD-VOTE-PROD method, is high compared with the other methods. These schemes thus prove able to detect rare attacks alongside the other categories of attacks. Additionally, our approach provides the best trade-off between two important metrics, the detection rate and the false alarm rate, with 95.80% and 1.41% respectively.

It is important to stress that few existing methods in the cited references address this strategy in intrusion detection. Among these studies, the work in [5] opted for the FARCHD-ND-PROD scheme and aimed to prove its effectiveness in detecting rare attacks alongside the other classes of attacks. Our method clearly outperforms this latter method: it has proven effective in detecting rare attacks, with 68.24% compared to 64.75% for the FARCHD-ND-PROD method.

To complete our experimental study, Fig. 2 illustrates the performance of our method using ROC graphs and the AUC measure, compared with the other detection methods that use the MIN operator. To calculate the multi-class AUC ("Total AUC"), we averaged the AUC of each class. Considering the ROC curve of the FARCHD-BTS-MIN classifier, we notice that this curve always lies above those of the other classifiers and yields the highest AUC. The rest of the measures are given in Table 7.

Fig. 2. ROC Curve

Table 7. AUC value for each aggregation method using the MIN overlap function (columns: ND, VOTE, LVPC, WINW, BTS; row: AUC)

5 Conclusion

In this work we put forward an intelligent misuse intrusion detection approach based on an interpretable model, using linguistic rules of the FARC-HD algorithm, pairwise learning built on the architecture known as the BTS method, and the MIN operator as conjunction function.
Our new approach is characterized by good interpretability, a high attack detection rate with a low false alarm rate, and a high computational speed. This conclusion is supported by the experiments that have shown the effectiveness of our approach. Nevertheless, detecting rare attacks is still in need of improvement.

References

1. Saez, J.A., Galar, M., Luengo, J., Herrera, F.: Analyzing the presence of noise in multiclass problems: alleviating its influence with the one-vs-one decomposition. Knowledge and Information Systems, vol. 38, no. 1 (2014)
2. Alcala-Fdez, J., Alcala, R., Herrera, F.: A fuzzy association rule based classification model for high-dimensional problems with genetic rule selection and lateral tuning. IEEE Transactions on Fuzzy Systems, vol. 19, no. 5 (2011)
3. Elkano, M., Galar, M., Sanz, J., Bustince, H.: Fuzzy rule-based classification systems for multi-class problems using binary decomposition strategies: On the influence of n-dimensional overlap functions in the fuzzy reasoning method. Information Sciences, vol. 332 (2016)
4. Elkano, M., Galar, M., Sanz, J.A., Fernandez, A., Barrenechea, E., Herrera, F., Bustince, H.: Enhancing multiclass classification in FARC-HD fuzzy classifier: On the synergy between n-dimensional overlap functions and decomposition strategies. IEEE Transactions on Fuzzy Systems, vol. 23, no. 5 (2015)
5. Elhag, S., Fernandez, A., Bawakid, A., Alshomrani, S., Herrera, F.: On the combination of genetic fuzzy systems and pairwise learning for improving detection rates on intrusion detection systems. Expert Systems with Applications, vol. 42, no. 1 (2015)
6. Galar, M., Fernandez, A., Barrenechea, E., Bustince, H., Herrera, F.: An overview of ensemble methods for binary classifiers in multi-class problems: Experimental study on one-vs-one and one-vs-all schemes. Pattern Recognition, vol. 44, no. 8 (2011)
7. Gaied, I., Jemili, F., Korbaa, O.: Intrusion detection based on Neuro-Fuzzy classification. International Conference on Computer Systems and Applications AICCSA 2015, vol. 5, pp. 1-8, November (2015)
8. Kavsek, B., Lavrac, N.: APRIORI-SD: Adapting association rule learning to subgroup discovery. Applied Artificial Intelligence, vol. 20, no. 7 (2006)
9. Alcala, R., Herrera, F.: A proposal for the genetic lateral tuning of linguistic fuzzy systems and its interaction with rule selection. IEEE Transactions on Fuzzy Systems, vol. 15, no. 4 (2007)
10. Fei, B., Liu, J.: Binary tree of SVM: a new fast multiclass training and classification algorithm. IEEE Transactions on Neural Networks, vol. 17, no. 3 (2006)
11. Cohen, J.: A coefficient of agreement for nominal scales. Educational and Psychological Measurement, vol. 20 (1960)
12. Jemili, F., Zaghdoud, M., Ben Ahmed, M.: Intrusion Detection based on Hybrid Propagation in Bayesian Networks. In Proceedings of the IEEE International Conference on Intelligence and Security Informatics, Dallas (2009)


More information

Chapter 3: Supervised Learning

Chapter 3: Supervised Learning Chapter 3: Supervised Learning Road Map Basic concepts Evaluation of classifiers Classification using association rules Naïve Bayesian classification Naïve Bayes for text classification Summary 2 An example

More information

CS6375: Machine Learning Gautam Kunapuli. Mid-Term Review

CS6375: Machine Learning Gautam Kunapuli. Mid-Term Review Gautam Kunapuli Machine Learning Data is identically and independently distributed Goal is to learn a function that maps to Data is generated using an unknown function Learn a hypothesis that minimizes

More information
