A Software Architecture for Classifying Users in e-payment Systems


In Proceedings of the First Italian Conference on Cybersecurity (ITASEC17), Venice, Italy. Copyright © 2017 for this paper by its authors. Copying permitted for private and academic purposes.

Gianluigi Folino and Francesco Sergio Pisani
Institute of High Performance Computing and Networking (ICAR-CNR)
{gianluigi.folino,fspisani}@icar.cnr.it

Abstract

In modern payment systems, the user is often the weakest link in the security chain. Identifying the key vulnerabilities associated with user behavior, and implementing measures to protect payment systems against them, is a really hard task. To this aim, we designed an architecture that divides the users of a payment system into pre-defined classes according to the type of vulnerability enabled. In this way, it is possible to direct actions (information campaigns, alerts, etc.) towards targeted users of a specific group. Unfortunately, the data useful to classify the user typically presents many missing features. To overcome this issue, a tool was developed, based on artificial intelligence and adopting a meta-ensemble model, to operate efficiently with missing data. Each ensemble evolves a function for combining the classifiers, which does not need any extra phase of training on the original data. The approach is validated on a well-known real dataset of Unix users, demonstrating its effectiveness.

1 Introduction

In the last few years, as a consequence of our interconnected society, interest in cyber security problems has been steadily increasing, and cyber crime seriously threatens national governments and the economy of many industries [3]. Countermeasures must be undertaken to protect the vulnerabilities and weaknesses of the systems from potential attacks, in order to minimize the risks. In addition, computer network activities, human actions, etc. generate large amounts of data, and this aspect must be seriously taken into account.

Many works in the literature [4][5] remark that the most serious threats and vulnerabilities concern users and their incorrect behavior. In modern payment systems, the user is often the weakest link in the security chain. Indeed, in 2015, according to Kaspersky Lab research, 73% of organizations had internal data breach issues that can severely compromise their business. Typical examples are the use of insecure passwords, saving critical data unencrypted on a notebook, etc. In particular for e-payment systems, the users, the operators and even the top managers of these systems are often unaware of the risks associated with the vulnerabilities enabled by their behavior, which can be dangerous and costly. Therefore, the protection of payment systems cannot be separated from the analysis of the vulnerabilities related to the users, so that the appropriate countermeasures can be undertaken.

In light of these considerations, it is a very hard task to identify the key vulnerabilities associated with user behavior and to implement measures that protect payment systems against these kinds of vulnerabilities. A scalable solution for dealing with the security weaknesses derived from the human factor must consider a number of critical aspects: profiling users for better and more focused actions, analyzing large logs in real time, and working efficiently in the case of missing data.

Data mining techniques can be used to fight cybercriminals efficiently, to alleviate the effect of their actions or to prevent them, especially in the presence of large datasets [1]. In particular, classification is used efficiently in many cyber security applications, e.g., classification of user behavior, risk and attack analysis, intrusion detection systems, etc. However, in this particular domain, datasets often have different numbers of features, and each attribute can have a different importance and cost. Furthermore, the entire system must also work if some features are missing. Therefore, it is very unlikely that a single classification algorithm performs well on all the datasets, especially in the presence of changes and under real-time and scalability constraints.

In the ensemble learning paradigm [6][7], multiple classification models are trained by a predictive algorithm, and then their predictions are combined to classify new tuples. This paradigm presents a number of advantages over using a single model: it reduces the variance of the error, the bias, and the dependence on a single dataset, and it works well in the case of unbalanced classes; furthermore, the ensemble can be built in an incremental way and can be easily implemented on a distributed environment.

In order to protect e-payment systems from the problems concerning user behavior, a general architecture was designed in this work. It monitors user behavior, divides the users of a payment system into pre-defined classes according to the type of vulnerability enabled, and makes it possible to direct suitable actions (information campaigns, alerts, etc.) towards targeted users of a specific group. Typically, the data useful to classify the user presents many missing features.
To overcome this issue, a classification tool was also used, based on artificial intelligence (i.e., an evolutionary-based algorithm) and adopting a meta-ensemble model to operate efficiently with missing data. For each ensemble, the function for combining the classifiers, which does not need any extra phase of training on the original data, is evolved by a genetic programming (GP) tool. Therefore, in the case of changes in the data, the function can be recomputed in an incremental way, with a moderate computational effort; this aspect, together with the advantages of running on parallel/distributed architectures, makes the algorithm suitable to operate under the real-time constraints typical of a cyber security problem. The tool was developed in a previous work [2], but here it is adopted in this novel architecture.

Among the ensemble-based strategies for coping with missing data, we recall the work in [8]. The authors build all the possible LCPs (Local Complete Patterns), i.e., a partition of the original dataset into complete datasets without any missing features; a different classifier is built on each LCP, and the classifiers are then combined to predict the class label on the basis of a voting matrix. The experiments compared the proposed approach with two techniques for coping with missing data, i.e., deletion and imputation, on small datasets, and show that the approach outperforms the other two techniques. However, the phase of building the LCPs can be really expensive.

Another related work is Learn++.MF [9], an ensemble-based algorithm with base classifiers trained on random subsets of the features of the original dataset. The approach generates a large number of classifiers, each trained on a different feature subset. In practice, the instances with missing attributes are classified by the models generated on the subsets of the remaining features.
Then, the algorithm uses a majority voting strategy in order to assign the correct class, under the condition that at least one classifier must cover the instance. When the number of attributes is high, it is infeasible to build classifiers with all the possible sets of features; therefore, the subset of the features is iteratively updated to favor the selection of those features that were previously undersampled. However, this limits the real applicability of the approach to datasets with a low number of attributes.

The rest of the paper is structured as follows: in Section 2 some related works concerning the classification of user profiles are illustrated; Section 3 is devoted to some background information concerning the problem of missing data and incomplete datasets and the ensemble of classifiers; Section 4 introduces the general architecture of the proposed approach; in Section 5, the main algorithm used to classify the user is explained; Section 6 shows the experiments conducted to verify the effectiveness of the approach and to compare it with other similar approaches; finally, Section 7 concludes the work.

2 Methods for classifying user profiles

The inspiration for the approach taken in this paper comes from a project on cyber security, in which one of the main tasks consists in dividing the users of an e-payment system into homogeneous groups on the basis of their weaknesses or vulnerabilities from the cyber security point of view. In this way, the provider of an e-payment system can conduct a different information and prevention campaign for each class of users, with obvious advantages in terms of time and cost savings. In addition, specialized security policies can be applied to the users of a specific class.

This technique is usually named segmentation, i.e., the process of classifying customers into homogeneous groups (segments), so that each group of customers shares enough characteristics to make it viable for a company to design specific offerings or products for it. It is based on a preliminary investigation aimed at identifying the variables (segmentation variables) necessary to distinguish one class of customers from the others. Typically, the goal is to increase purchases and/or to improve customer satisfaction. Different techniques can be employed to perform this task; in order to cope with large datasets, the most used are based on data mining approaches, mainly clustering and classification, although many other techniques can be employed (see [10] for a survey of these techniques).
Another issue to be considered in constructing the different profiles is the information collection process used to gather raw information about the user, which can be conducted through direct user intervention, or implicitly, through software that monitors user activity. Finally, profiles maintaining the same information over time are considered static, in contrast to dynamic profiles, which can be modified or improved over time [11].

In the general case of computer user profiling, the entire audit source can include information from a variety of sources, such as command line calls issued by users, system call monitoring for unusual application use/events, database/file accesses, and the organization's policy management rules and compliance logs. The type of analysis used is primarily the modeling of statistical features, such as the frequency of events, the duration of events, the co-occurrence of multiple events combined through logical operators, and the sequence or transition of events.

An interesting approach to computer user modeling is the process of learning about ordinary computer users by observing the way they use the computer. In this case, the behavior of a computer user is represented as the sequence of commands she/he types during her/his work. This sequence is transformed into a distribution of relevant subsequences of commands in order to derive a profile that defines the user's behavior. The ABCD (Agent behavior Classification based on Distributions of relevant events) algorithm discussed in [12] is an interesting approach using this technique.

3 Background: Classification, Ensemble and Missing Data

In this section, some background is given; in particular, the main methods to cope with missing data and incomplete datasets are illustrated, together with the paradigm of the ensemble of classifiers used for the data mining task of classification.

Classification is one of the most important data mining tasks and consists in assigning a class label (from a pre-defined set of classes) to a set of unclassified cases on the basis of the input data. Typically, the classification algorithm is trained on a dataset, named the training set, consisting of multiple records, each having multiple attributes or features. The final result is an accurate description or model for each class using the features present in the data. Finally, this model is used to classify test data for which the class is not known.

The ensemble paradigm combines multiple (heterogeneous or homogeneous) models in order to classify new unseen instances. Usually, an ensemble model improves accuracy and robustness over single-model methods. For example, in hard domains such as cyber security, where real-time requirements do not permit re-training the base models, ensemble strategies can use combining functions that do not need the original training set [13][14] and that also work in an incremental way [15]. Different schemes can be considered to generate the classifiers and to combine the ensemble, i.e., the same learning algorithm can be trained on different datasets and/or different algorithms can be trained on the same dataset. In practice, after a number of classifiers are built, usually using part of the dataset, the predictions of the different classifiers are combined and a common decision is taken.

When we consider data with missing features, some main patterns are commonly used.
We recall: missing completely at random (MCAR), which describes data in which the complete cases are a random sample of the original dataset, i.e., the probability of a feature being missing is independent of the value of that or any other feature in the dataset; and missing at random (MAR), which describes data that are missing for reasons related to completely observed variables in the dataset. Finally, in the missing not at random (MNAR) case, the probability that an entry is missing depends on both observed and unobserved values in the dataset. Therefore, even if MCAR is easier to handle, we try to cope with the MAR case, as it is a more realistic model and it is suitable for many real-world applications, e.g., the cyber security scenario.

Data mining algorithms, and classification algorithms in particular, must handle the problem of missing values in useful features. The presence of missing features complicates the classification process, as the effective prediction may depend heavily on the way missing values are treated. Typically, missing values are present in both the training and the testing data, i.e., the same sources of data are not available for all the users. However, in our approach, without any loss of generality, we suppose that the training dataset is complete. Even when a moderate number of training tuples present missing data, the problem can be reduced to the previous case simply by deleting all the incomplete tuples. However, handling missing data by eliminating cases with missing data will bias the results if the remaining cases are not representative of the entire sample. Therefore, different techniques can be used to handle these missing features (see [16] for a detailed list of them).
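The bias introduced by deleting incomplete tuples can be illustrated with a small, purely hypothetical example. The records, class names and feature values below are invented for illustration and do not come from the paper; missingness is tied to the fully observed class label, which is a MAR-style pattern.

```python
from collections import Counter

# Hypothetical toy records: (user_class, behavioural_feature or None).
# The feature is missing mostly for "expert" users, i.e. missingness
# depends on an observed variable (a MAR-style pattern).
records = [
    ("novice", 3), ("novice", 5), ("novice", 4), ("novice", 2),
    ("expert", 9), ("expert", None), ("expert", None), ("expert", None),
]


def class_shares(rows):
    """Return the fraction of rows belonging to each class."""
    counts = Counter(label for label, _ in rows)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


# Listwise deletion: keep only the tuples without missing values.
complete_cases = [row for row in records if row[1] is not None]

full_shares = class_shares(records)
kept_shares = class_shares(complete_cases)

print("all tuples:     ", full_shares)
print("complete tuples:", kept_shares)
```

In this toy sample the two classes are balanced before deletion, while the complete cases are dominated by one class, so a classifier trained only on them would see a distorted class distribution.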
In addition to the above-mentioned strategy of removing any tuple with missing values, we recall the option of using a classification algorithm that can deal with missing values in the training phase, and the strategy of imputing all missing values before training the classification algorithm, i.e., replacing each missing value with a meaningful estimate. In our particular problem, however, we are more interested in handling groups of missing features, and consequently, we focus on constructing a classifier on the incomplete dataset directly.

More formally, in our scenario, we have k datasets D_1, D_2, ..., D_k; typically each dataset comes from a different source of data, but can be used to predict the same class. Therefore, the corresponding i-th tuple of the different datasets can be used to predict the class of the same user. However, a particular tuple of a dataset can be missing, i.e., all the features belonging to the same source of data of that tuple are missing. Without any loss of generality, even a problem of missing features in a single incomplete dataset can be reduced to the previous one, by grouping tuples with the same missing features.

4 An architecture for classification of user profiles in e-payment systems

In our scenario, the classes into which the users are divided are identified on the basis of their expertise in computer science and in the domain of e-payment systems. Indeed, most of the vulnerabilities are associated with behaviors and practices that correlate with knowledge of the computer and/or of the e-payment system. For example, contrary to common belief, a vulnerability study found that software developers are the most vulnerable to attacks [4]. Furthermore, overconfidence and the consequent download and installation of a number of applications can cause vulnerabilities; in the same way, misconfigurations of the system, due to an inconsistent application of security measures associated with a lack of competency, could enable other kinds of vulnerabilities. Similar behaviors can be seen in the activities of the users of an e-payment system.

Figure 1: A general architecture to monitor and to classify the users of an e-payment system and to undertake the needed countermeasures.

Given these considerations, Figure 1 shows the general architecture proposed in this work to mitigate the consequences of user behavior. The information concerning the user is supplied by different sources of information or monitoring tools (generally automatic software analyzing the actions and the behavior of the users). Going more into detail, user datasets can include demographic and education information, e.g., name, age, country, education level, computer knowledge, task knowledge, etc., and may also include information concerning the context in which the users operate and the roles they have in the systems. In addition to these data, which usually do not change over a reasonable amount of time, the monitoring tool collects operational and behavioral data (e.g., the IP addresses from which users connect to the system, the operating system and browser used, the duration of the session, etc.), for which changes over time should also be considered. Finally, we also collect user input (i.e., commands entered using the keyboard or via the GUI, using the mouse, etc.). This information should be captured in a dynamic way, by logging user actions.

Unfortunately, not all these kinds of data are present for each user, for clear reasons of privacy and for a number of other reasons (e.g., users have different roles and therefore it is possible to monitor only some types of user, some users do not want to give authorization to disclose some data, etc.). Therefore, for different users, some sources are missing, and this problem must be faced efficiently in order to obtain an accurate classification.

Using these data, the users of the e-payment system are classified into pre-defined classes according to the type of vulnerability enabled, by using the meta-ensemble approach described in detail in the next section. Finally, suitable actions (information campaigns, alerts, periodic checks, etc.) can be undertaken towards targeted users of a specific group.

5 A Meta-Ensemble Approach for Classifying user profiles

In this section, we illustrate the software architecture and we detail the main steps of the meta-ensemble approach used to classify the user profiles, also in the case of missing data.
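As orientation, the following is a minimal sketch of how a user could be classified when some data sources are missing. It assumes one already-trained predictor per source and a weighted vote restricted to the sources that are available; the function names, the threshold rules standing in for trained ensembles, and the weights are all hypothetical and are not taken from the tool described in this paper.

```python
from collections import defaultdict
from typing import Callable, Optional, Sequence

# A user's tuple in one source dataset; None means the whole source is missing.
Features = Optional[Sequence[float]]
# Stand-in for an ensemble trained on one source dataset (an assumption).
Predictor = Callable[[Sequence[float]], str]


def classify_user(
    tuples: Sequence[Features],
    ensembles: Sequence[Predictor],
    weights: Sequence[float],
) -> Optional[str]:
    """Weighted vote among the ensembles whose source is available.

    Returns None when the user's tuple is missing in every source, so the
    caller can count the case as a wrong prediction.
    """
    votes = defaultdict(float)
    for features, predict, weight in zip(tuples, ensembles, weights):
        if features is None:
            continue  # this source is missing for the user
        votes[predict(features)] += weight
    if not votes:
        return None
    return max(votes, key=votes.get)


def threshold_rule(cut: float) -> Predictor:
    """Hypothetical predictor: a single threshold on the first feature."""
    return lambda f: "expert" if f[0] > cut else "novice"


# Three hypothetical sources with illustrative weights.
ensembles = [threshold_rule(0.5), threshold_rule(0.5), threshold_rule(0.5)]
weights = [0.5, 0.3, 0.4]

print(classify_user([[0.9], None, [0.1]], ensembles, weights))
print(classify_user([None, None, None], ensembles, weights))
```

Skipping the missing sources, rather than imputing their features, mirrors the idea that each ensemble is specialized on one group of features.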
The overall software architecture of the meta-ensemble approach, named CAGE-MetaCombiner, is illustrated in Figure 2.

Figure 2: The software architecture of the meta-ensemble approach.

In practice, an ensemble is built for each dataset by using a distributed GP tool, which generates the combiner function. This GP tool is a distributed/parallel GP implementation, named CellulAr GEnetic programming (CAGE) [17], running both on distributed-memory parallel computers and on distributed environments. The learning models (classifiers) composing the ensemble are taken from the well-known WEKA tool. The GP tool is used to evolve the combiner functions, which the ensemble will adopt to classify new tuples. Implicitly, the function selects the classifiers/models most suitable for the specific datasets considered.

The different ensembles perform a weighted vote in order to decide the correct class. It is worth remembering that each ensemble evolves a function for combining the classifiers, which does not need any extra phase of training on the original data. The final classification is obtained by computing the error with the same formulae as the AdaBoost.M2 boosting algorithm, but applied to the entire ensemble instead of to a single classifier as in the original algorithm.

Note that the approach is also suitable to work on incomplete datasets (named D_1, D_2, ..., D_k in the figure). It is worth noticing that, as described in the background section, it is equivalent whether each dataset comes from a different source of data, or the datasets are obtained from a partition of an incomplete dataset by removing groups of missing features. The only strong assumption is that each corresponding tuple of the different datasets is used to predict the same class, i.e., the class of the user in the scenario shown in Section 4. The corresponding tuple can be missing in one or more datasets, but if it is missing in all the datasets, it will be discarded and counted as a wrong prediction in the evaluation phase.

The entire process can be summarized as follows, by considering a dataset partitioned into training, validation and test set:

1. The base classifiers are trained on the training set; then, a weight, proportional to the error on the training set, is associated with each classifier together with the support for each class, i.e., the decision support matrix is built. This phase could be computationally expensive, but it is performed in parallel, as the different algorithms are independent of each other.
2. The combiner function is evolved by using the distributed GP tool, CAGE, on the validation set. No extra computation on the data is necessary, as validation is only used to verify whether the correct class is assigned and consequently to compute the fitness function.
3. The final function is used to combine the base classifiers and classify new data (test set). This phase can be performed in parallel, by partitioning the test set among different nodes and applying the function to each partition.

A complete description of the meta-ensemble algorithm can be found in [18].

6 Experimental Results

In this section, the experiments conducted to analyze the capacity of our approach to handle missing features are described, together with the main parameters and the main characteristics of the dataset used.

The software must be validated with real data to prove the validity of the approach. However, it is very hard to find useful datasets with personal data and information about behavioral profiles, since the nature of the data limits the public release of this kind of dataset. For this reason, we have chosen the UNIX dataset, described in [19]. It contains data about users working on Linux consoles. The users are profiled according to their expertise with console commands. The high number of features gives us the opportunity to test the performance of the algorithm in the case of missing features.

All the experiments were performed on a Linux cluster with 16 Itanium2 1.4 GHz nodes, each with 2 GBytes of main memory and connected by a Myrinet high-performance network. No tuning phase was conducted for the GP algorithm; the same parameters used in the original paper [20] were adopted, namely: a crossover probability of 0.7 and a mutation probability of 0.1, a maximum depth of 7, and a population of 132 individuals per node. The algorithm was run on 4 nodes for 1000 generations, and the original training set was partitioned among the 4 nodes. All the results were obtained by averaging 30 runs.

Two types of experiments were performed in order to evaluate the ability of the CAGE-MetaCombiner approach to handle missing features. The first type analyzes the behavior of the algorithm when we vary the percentage of tuples with missing data. The second one compares the performance of our software with the EvABCD algorithm [12] (see Section 2).

The dataset is preprocessed in the same way as in [12]. For each user we consider the first 100 or 500 commands used; then the command subsequences of fixed length (from 3 to 6) are extracted from the list. Each user represents a record in the processed dataset, all the distinct subsequences are used as record attributes, and the attribute value is the number of times the subsequence is typed by the user (0 means that the subsequence is never typed). In addition, the dataset is partitioned into four subsets, each one with the same number of features. It is worth remembering that we are interested in handling cases in which entire groups of features are missing and not in coping with random patterns of missing features.
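The preprocessing just described can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, subsequences are taken as contiguous windows, attributes are ordered alphabetically, and the features are split round-robin into four groups (the text only states that the four subsets have the same number of features, so with a feature count not divisible by four the group sizes here differ by at most one).

```python
from collections import Counter


def subsequence_counts(commands, first_n=100, min_len=3, max_len=6):
    """Count every contiguous command subsequence of length min_len..max_len
    found in the first `first_n` commands typed by one user."""
    head = commands[:first_n]
    counts = Counter()
    for length in range(min_len, max_len + 1):
        for start in range(len(head) - length + 1):
            counts[tuple(head[start:start + length])] += 1
    return counts


def build_records(users, **kwargs):
    """Build one record per user; attributes are all distinct subsequences
    and values are occurrence counts (0 if the user never typed it)."""
    per_user = {u: subsequence_counts(cmds, **kwargs) for u, cmds in users.items()}
    attributes = sorted({s for counts in per_user.values() for s in counts})
    rows = {u: [counts.get(s, 0) for s in attributes]
            for u, counts in per_user.items()}
    return attributes, rows


def partition_features(attributes, parts=4):
    """Split the attribute list into `parts` groups (round-robin: an assumption)."""
    return [attributes[i::parts] for i in range(parts)]


# Hypothetical command traces for two users.
users = {
    "u1": ["ls", "cd", "ls", "cd", "ls"],
    "u2": ["ls", "cd", "ls"],
}
attributes, rows = build_records(users)
groups = partition_features(attributes)
print(len(attributes), [len(g) for g in groups])
```

Each group of attributes then plays the role of one of the datasets D_1, ..., D_k, so that removing a group for a given user removes an entire block of features at once.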
To simulate the missing data, for each partition a tuple can be removed according to a probability threshold, i.e., this parameter controls the percentage of tuples that have missing attributes. For instance, if this parameter is set to 10%, the entire partition of the features belonging to a tuple has a probability of 0.1 of being missing. If all the partitions of a tuple are missing, this tuple is counted as a classification error. We choose the values for the threshold in the range 0-80%, with an interval of 10%, where 0% means no missing data.

Table 1: Classification accuracy (percent) for CAGE-MetaCombiner with the Unix dataset using different subsequence lengths and with different percentages of missing data.
[Table body not recoverable from the transcription. Surviving column headers: Commands, Sequence, Percentage of Missing (10%, 20%, 30%, 40%, 80%); the cells had the form "value ± value", and only a final "2.68" survives.]

In Table 1, we show the CAGE-MetaCombiner performance on the Unix dataset using different subsequence lengths and different percentages of missing data. The classification rate for the 100-command experiment is slightly better than for the case with 500 commands, probably because the increase in the number of commands leads to an increase in the number of features, and consequently the training phase becomes harder. Owing to the high number of features, the results are not much affected by the missing rate, that is, the algorithm performs well with all the percentages of missing data.

In Table 2, the results of the comparison between CAGE-MetaCombiner and EvABCD are reported [12].

Table 2: Comparison of the CAGE-MetaCombiner vs the EvABCD algorithm for the Unix dataset (classification accuracy).
[Table body not recoverable from the transcription. Surviving column headers: Commands, Sequence, Cage-Combiner, EvABCD.]

The EvABCD algorithm is a technique for the classification of the behavior profiles of users. Like CAGE-MetaCombiner, it learns different behaviors from training data. For each different profile, it builds one or more prototypes by computing the frequency of all the sequences of commands with a defined length. Moreover, it updates the models in an incremental way. Our algorithm performs better than EvABCD in most cases. With CAGE-MetaCombiner, the number of commands extracted does not influence the accuracy much, while the EvABCD algorithm improves its accuracy when using 500 commands.

7 Conclusions and Future work

A general architecture for profiling users for better and more focused actions is proposed. The proposed methodology also works when some logs concerning the behavior of the users are missing. In particular, a meta-ensemble-based framework for classifying datasets was designed and implemented. It is based on data mining and artificial intelligence techniques (inspired by evolutionary theory) and makes it possible to handle missing features, as each ensemble is specialized to operate with a different group of features. Experimental results, conducted on a real dataset, show that the proposed system has a better accuracy than a related approach. In particular, the accuracy does not degrade significantly when the percentage of missing data increases. Future work aims to test the framework in a real environment.

Acknowledgment

This work has been partially supported by MIUR-PON under project PON03PE within the framework of the Technological District on Cyber Security.

References

[1] G. Folino, P. Sabatino, Ensemble based collaborative and distributed intrusion detection systems: A survey, J. Netw. Comput. Appl. 66 (C) (2016).
[2] G. Folino, F. S. Pisani, P. Sabatino, A distributed intrusion detection framework based on evolved specialized ensembles of classifiers, in: Applications of Evolutionary Computation - 19th European Conference, EvoApplications 2016, Porto, Portugal, March 30 - April 1, 2016, Proceedings, Part I, Vol of Lecture Notes in Computer Science, Springer, 2016, pp

[3] CERT Australia, Cyber crime and security survey report, Tech. rep. (2012).
[4] V. Subrahmanian, M. Ovelgonne, T. Dumitras, B. A. Prakash, The Global Cyber-Vulnerability Report, 1st Edition, Springer,
[5] M. van Zadelhoff, The biggest cybersecurity threats are inside your company, Digital article - Harvard Business Review (2016).
[6] L. Breiman, Bagging predictors, Machine Learning 24 (2) (1996)
[7] Y. Freund, R. Schapire, Experiments with a new boosting algorithm, in: Machine Learning, Proceedings of the Thirteenth International Conference (ICML 96), Morgan Kaufmann, 1996, pp
[8] Y. Wang, Y. Gao, R. Shen, F. Yang, Selective ensemble approach for classification of datasets with incomplete values, in: Foundations of Intelligent Systems, Springer, 2012, pp
[9] R. Polikar, J. DePasquale, H. S. Mohammed, G. Brown, L. I. Kuncheva, Learn++.MF: A random subspace approach for the missing feature problem, Pattern Recognition 43 (11) (2010)
[10] D. Godoy, A. Amandi, User profiling in personal information agents: A survey, Knowl. Eng. Rev. 20 (4) (2005)
[11] S. Gauch, M. Speretta, A. Chandramouli, A. Micarelli, User profiles for personalized information access, in: P. Brusilovsky, A. Kobsa, W. Nejdl (Eds.), The Adaptive Web, Vol of Lecture Notes in Computer Science, Springer, 2007, pp
[12] J. A. Iglesias, P. P. Angelov, A. Ledezma, A. Sanchis, Creating evolving user behavior profiles automatically, IEEE Trans. Knowl. Data Eng. 24 (5) (2012)
[13] L. Kuncheva, Combining Pattern Classifiers: Methods and Algorithms, Wiley-Interscience,
[14] G. Folino, F. S. Pisani, Combining ensemble of classifiers by using genetic programming for cyber security applications, in: Applications of Evolutionary Computation - 18th European Conference, EvoApplications 2015, Copenhagen, Denmark, April 8-10, 2015, Proceedings, Vol of Lecture Notes in Computer Science, Springer, 2015, pp
[15] G. Folino, F. S. Pisani, P. Sabatino, An incremental ensemble evolved by using genetic programming to efficiently detect drifts in cyber security datasets, in: Genetic and Evolutionary Computation Conference, GECCO 2016, Denver, CO, USA, July 20-24, 2016, Companion Material Proceedings, 2016, pp
[16] R. J. A. Little, D. B. Rubin, Statistical Analysis with Missing Data, John Wiley & Sons, Inc., New York, NY, USA,
[17] G. Folino, C. Pizzuti, G. Spezzano, Cage: A tool for parallel genetic programming applications, in: J. F. Miller, M. Tomassini, P. L. Lanzi, C. Ryan, A. G. B. Tettamanzi, W. B. Langdon (Eds.), Proceedings of EuroGP 2001, Vol of LNCS, Springer-Verlag, Lake Como, Italy, 2001, pp
[18] G. Folino, F. S. Pisani, Evolving meta-ensemble of classifiers for handling incomplete and unbalanced datasets in the cyber security domain, Appl. Soft Comput. 47 (2016)
[19] S. Greenberg, Using unix: Collected traces of 168 users, in: Research Report 88/333/45, Department of Computer Science, University of Calgary, Calgary, Canada,
[20] G. Folino, C. Pizzuti, G. Spezzano, A scalable cellular implementation of parallel genetic programming, IEEE Transactions on Evolutionary Computation 7 (1) (2003)

GP Ensemble for Distributed Intrusion Detection Systems

GP Ensemble for Distributed Intrusion Detection Systems GP Ensemble for Distributed Intrusion Detection Systems Gianluigi Folino, Clara Pizzuti and Giandomenico Spezzano ICAR-CNR, Via P.Bucci 41/C, Univ. della Calabria 87036 Rende (CS), Italy {folino,pizzuti,spezzano}@icar.cnr.it

More information

Semi-Supervised Clustering with Partial Background Information

Semi-Supervised Clustering with Partial Background Information Semi-Supervised Clustering with Partial Background Information Jing Gao Pang-Ning Tan Haibin Cheng Abstract Incorporating background knowledge into unsupervised clustering algorithms has been the subject

More information

An Evolutionary Ensemble Approach for Distributed Intrusion Detection

An Evolutionary Ensemble Approach for Distributed Intrusion Detection An Evolutionary Ensemble Approach for Distributed Intrusion Detection Gianluigi Folino, Clara Pizzuti, and Giandomenico Spezzano ICAR-CNR, Via P.Bucci 41/C, Univ. della Calabria 87036 Rende (CS), Italy

More information

Hybrid Feature Selection for Modeling Intrusion Detection Systems

Hybrid Feature Selection for Modeling Intrusion Detection Systems Hybrid Feature Selection for Modeling Intrusion Detection Systems Srilatha Chebrolu, Ajith Abraham and Johnson P Thomas Department of Computer Science, Oklahoma State University, USA ajith.abraham@ieee.org,

More information

Estimating Missing Attribute Values Using Dynamically-Ordered Attribute Trees

Estimating Missing Attribute Values Using Dynamically-Ordered Attribute Trees Estimating Missing Attribute Values Using Dynamically-Ordered Attribute Trees Jing Wang Computer Science Department, The University of Iowa jing-wang-1@uiowa.edu W. Nick Street Management Sciences Department,

More information

Preprocessing of Stream Data using Attribute Selection based on Survival of the Fittest

Preprocessing of Stream Data using Attribute Selection based on Survival of the Fittest Preprocessing of Stream Data using Attribute Selection based on Survival of the Fittest Bhakti V. Gavali 1, Prof. Vivekanand Reddy 2 1 Department of Computer Science and Engineering, Visvesvaraya Technological

More information

Ensemble Image Classification Method Based on Genetic Image Network

Ensemble Image Classification Method Based on Genetic Image Network Ensemble Image Classification Method Based on Genetic Image Network Shiro Nakayama, Shinichi Shirakawa, Noriko Yata and Tomoharu Nagao Graduate School of Environment and Information Sciences, Yokohama

More information

Machine Learning in the Wild. Dealing with Messy Data. Rajmonda S. Caceres. SDS 293 Smith College October 30, 2017

Machine Learning in the Wild. Dealing with Messy Data. Rajmonda S. Caceres. SDS 293 Smith College October 30, 2017 Machine Learning in the Wild Dealing with Messy Data Rajmonda S. Caceres SDS 293 Smith College October 30, 2017 Analytical Chain: From Data to Actions Data Collection Data Cleaning/ Preparation Analysis

More information

Boosting Algorithms for Parallel and Distributed Learning

Boosting Algorithms for Parallel and Distributed Learning Distributed and Parallel Databases, 11, 203 229, 2002 c 2002 Kluwer Academic Publishers. Manufactured in The Netherlands. Boosting Algorithms for Parallel and Distributed Learning ALEKSANDAR LAZAREVIC

More information

Data Mining with Oracle 10g using Clustering and Classification Algorithms Nhamo Mdzingwa September 25, 2005

Data Mining with Oracle 10g using Clustering and Classification Algorithms Nhamo Mdzingwa September 25, 2005 Data Mining with Oracle 10g using Clustering and Classification Algorithms Nhamo Mdzingwa September 25, 2005 Abstract Deciding on which algorithm to use, in terms of which is the most effective and accurate

More information

Web Data mining-a Research area in Web usage mining

Web Data mining-a Research area in Web usage mining IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661, p- ISSN: 2278-8727Volume 13, Issue 1 (Jul. - Aug. 2013), PP 22-26 Web Data mining-a Research area in Web usage mining 1 V.S.Thiyagarajan,

More information

Role of big data in classification and novel class detection in data streams

Role of big data in classification and novel class detection in data streams DOI 10.1186/s40537-016-0040-9 METHODOLOGY Open Access Role of big data in classification and novel class detection in data streams M. B. Chandak * *Correspondence: hodcs@rknec.edu; chandakmb@gmail.com

More information

Study on Classifiers using Genetic Algorithm and Class based Rules Generation

Study on Classifiers using Genetic Algorithm and Class based Rules Generation 2012 International Conference on Software and Computer Applications (ICSCA 2012) IPCSIT vol. 41 (2012) (2012) IACSIT Press, Singapore Study on Classifiers using Genetic Algorithm and Class based Rules

More information

Security in India: Enabling a New Connected Era

Security in India: Enabling a New Connected Era White Paper Security in India: Enabling a New Connected Era India s economy is growing rapidly, and the country is expanding its network infrastructure to support digitization. India s leapfrogging mobile

More information

Processing Missing Values with Self-Organized Maps

Processing Missing Values with Self-Organized Maps Processing Missing Values with Self-Organized Maps David Sommer, Tobias Grimm, Martin Golz University of Applied Sciences Schmalkalden Department of Computer Science D-98574 Schmalkalden, Germany Phone:

More information

Systemic Analyser in Network Threats

Systemic Analyser in Network Threats Systemic Analyser in Network Threats www.project-saint.eu @saintprojecteu #saintprojecteu John M.A. Bothos jbothos@iit.demokritos.gr Integrated System Laboratory Institute of Informatics & Telecommunication

More information

Metaheuristic Optimization with Evolver, Genocop and OptQuest

Metaheuristic Optimization with Evolver, Genocop and OptQuest Metaheuristic Optimization with Evolver, Genocop and OptQuest MANUEL LAGUNA Graduate School of Business Administration University of Colorado, Boulder, CO 80309-0419 Manuel.Laguna@Colorado.EDU Last revision:

More information

Web Usage Mining: A Research Area in Web Mining

Web Usage Mining: A Research Area in Web Mining Web Usage Mining: A Research Area in Web Mining Rajni Pamnani, Pramila Chawan Department of computer technology, VJTI University, Mumbai Abstract Web usage mining is a main research area in Web mining

More information

Using Association Rules for Better Treatment of Missing Values

Using Association Rules for Better Treatment of Missing Values Using Association Rules for Better Treatment of Missing Values SHARIQ BASHIR, SAAD RAZZAQ, UMER MAQBOOL, SONYA TAHIR, A. RAUF BAIG Department of Computer Science (Machine Intelligence Group) National University

More information

Handling Missing Values via Decomposition of the Conditioned Set

Handling Missing Values via Decomposition of the Conditioned Set Handling Missing Values via Decomposition of the Conditioned Set Mei-Ling Shyu, Indika Priyantha Kuruppu-Appuhamilage Department of Electrical and Computer Engineering, University of Miami Coral Gables,

More information

Improving the Random Forest Algorithm by Randomly Varying the Size of the Bootstrap Samples for Low Dimensional Data Sets

Improving the Random Forest Algorithm by Randomly Varying the Size of the Bootstrap Samples for Low Dimensional Data Sets Improving the Random Forest Algorithm by Randomly Varying the Size of the Bootstrap Samples for Low Dimensional Data Sets Md Nasim Adnan and Md Zahidul Islam Centre for Research in Complex Systems (CRiCS)

More information

Handling Data with Three Types of Missing Values:

Handling Data with Three Types of Missing Values: Handling Data with Three Types of Missing Values: A Simulation Study Jennifer Boyko Advisor: Ofer Harel Department of Statistics University of Connecticut Storrs, CT May 21, 2013 Jennifer Boyko Handling

More information

Recurrent Neural Network Models for improved (Pseudo) Random Number Generation in computer security applications

Recurrent Neural Network Models for improved (Pseudo) Random Number Generation in computer security applications Recurrent Neural Network Models for improved (Pseudo) Random Number Generation in computer security applications D.A. Karras 1 and V. Zorkadis 2 1 University of Piraeus, Dept. of Business Administration,

More information

MODELLING DOCUMENT CATEGORIES BY EVOLUTIONARY LEARNING OF TEXT CENTROIDS

MODELLING DOCUMENT CATEGORIES BY EVOLUTIONARY LEARNING OF TEXT CENTROIDS MODELLING DOCUMENT CATEGORIES BY EVOLUTIONARY LEARNING OF TEXT CENTROIDS J.I. Serrano M.D. Del Castillo Instituto de Automática Industrial CSIC. Ctra. Campo Real km.0 200. La Poveda. Arganda del Rey. 28500

More information

WEIGHTED K NEAREST NEIGHBOR CLASSIFICATION ON FEATURE PROJECTIONS 1

WEIGHTED K NEAREST NEIGHBOR CLASSIFICATION ON FEATURE PROJECTIONS 1 WEIGHTED K NEAREST NEIGHBOR CLASSIFICATION ON FEATURE PROJECTIONS 1 H. Altay Güvenir and Aynur Akkuş Department of Computer Engineering and Information Science Bilkent University, 06533, Ankara, Turkey

More information

HALF&HALF BAGGING AND HARD BOUNDARY POINTS. Leo Breiman Statistics Department University of California Berkeley, CA

HALF&HALF BAGGING AND HARD BOUNDARY POINTS. Leo Breiman Statistics Department University of California Berkeley, CA 1 HALF&HALF BAGGING AND HARD BOUNDARY POINTS Leo Breiman Statistics Department University of California Berkeley, CA 94720 leo@stat.berkeley.edu Technical Report 534 Statistics Department September 1998

More information

A Monotonic Sequence and Subsequence Approach in Missing Data Statistical Analysis

A Monotonic Sequence and Subsequence Approach in Missing Data Statistical Analysis Global Journal of Pure and Applied Mathematics. ISSN 0973-1768 Volume 12, Number 1 (2016), pp. 1131-1140 Research India Publications http://www.ripublication.com A Monotonic Sequence and Subsequence Approach

More information

ISSN: [Keswani* et al., 7(1): January, 2018] Impact Factor: 4.116

ISSN: [Keswani* et al., 7(1): January, 2018] Impact Factor: 4.116 IJESRT INTERNATIONAL JOURNAL OF ENGINEERING SCIENCES & RESEARCH TECHNOLOGY AUTOMATIC TEST CASE GENERATION FOR PERFORMANCE ENHANCEMENT OF SOFTWARE THROUGH GENETIC ALGORITHM AND RANDOM TESTING Bright Keswani,

More information

How to Conduct a Heuristic Evaluation

How to Conduct a Heuristic Evaluation Page 1 of 9 useit.com Papers and Essays Heuristic Evaluation How to conduct a heuristic evaluation How to Conduct a Heuristic Evaluation by Jakob Nielsen Heuristic evaluation (Nielsen and Molich, 1990;

More information

A Novel Criterion Function in Feature Evaluation. Application to the Classification of Corks.

A Novel Criterion Function in Feature Evaluation. Application to the Classification of Corks. A Novel Criterion Function in Feature Evaluation. Application to the Classification of Corks. X. Lladó, J. Martí, J. Freixenet, Ll. Pacheco Computer Vision and Robotics Group Institute of Informatics and

More information

E±cient Detection Of Compromised Nodes In A Wireless Sensor Network

E±cient Detection Of Compromised Nodes In A Wireless Sensor Network E±cient Detection Of Compromised Nodes In A Wireless Sensor Network Cheryl V. Hinds University of Idaho cvhinds@vandals.uidaho.edu Keywords: Compromised Nodes, Wireless Sensor Networks Abstract Wireless

More information

Mozilla position paper on the legislative proposal for an EU Cybersecurity Act

Mozilla position paper on the legislative proposal for an EU Cybersecurity Act Mozilla position paper on the legislative proposal for an EU Cybersecurity Act Enhancing cybersecurity through government vulnerability disclosure I. INTRODUCTION This paper provides an overview of Mozilla

More information

Tools for Remote Web Usability Evaluation

Tools for Remote Web Usability Evaluation Tools for Remote Web Usability Evaluation Fabio Paternò ISTI-CNR Via G.Moruzzi, 1 56100 Pisa - Italy f.paterno@cnuce.cnr.it Abstract The dissemination of Web applications is enormous and still growing.

More information

An Efficient Clustering for Crime Analysis

An Efficient Clustering for Crime Analysis An Efficient Clustering for Crime Analysis Malarvizhi S 1, Siddique Ibrahim 2 1 UG Scholar, Department of Computer Science and Engineering, Kumaraguru College Of Technology, Coimbatore, Tamilnadu, India

More information

Review on Data Mining Techniques for Intrusion Detection System

Review on Data Mining Techniques for Intrusion Detection System Review on Data Mining Techniques for Intrusion Detection System Sandeep D 1, M. S. Chaudhari 2 Research Scholar, Dept. of Computer Science, P.B.C.E, Nagpur, India 1 HoD, Dept. of Computer Science, P.B.C.E,

More information

CSI5387: Data Mining Project

CSI5387: Data Mining Project CSI5387: Data Mining Project Terri Oda April 14, 2008 1 Introduction Web pages have become more like applications that documents. Not only do they provide dynamic content, they also allow users to play

More information

A Parallel Evolutionary Algorithm for Discovery of Decision Rules

A Parallel Evolutionary Algorithm for Discovery of Decision Rules A Parallel Evolutionary Algorithm for Discovery of Decision Rules Wojciech Kwedlo Faculty of Computer Science Technical University of Bia lystok Wiejska 45a, 15-351 Bia lystok, Poland wkwedlo@ii.pb.bialystok.pl

More information

Crossword Puzzles as a Constraint Problem

Crossword Puzzles as a Constraint Problem Crossword Puzzles as a Constraint Problem Anbulagan and Adi Botea NICTA and Australian National University, Canberra, Australia {anbulagan,adi.botea}@nicta.com.au Abstract. We present new results in crossword

More information

An Information-Theoretic Approach to the Prepruning of Classification Rules

An Information-Theoretic Approach to the Prepruning of Classification Rules An Information-Theoretic Approach to the Prepruning of Classification Rules Max Bramer University of Portsmouth, Portsmouth, UK Abstract: Keywords: The automatic induction of classification rules from

More information

Dynamic Clustering of Data with Modified K-Means Algorithm

Dynamic Clustering of Data with Modified K-Means Algorithm 2012 International Conference on Information and Computer Networks (ICICN 2012) IPCSIT vol. 27 (2012) (2012) IACSIT Press, Singapore Dynamic Clustering of Data with Modified K-Means Algorithm Ahamed Shafeeq

More information

A Virtual Laboratory for Study of Algorithms

A Virtual Laboratory for Study of Algorithms A Virtual Laboratory for Study of Algorithms Thomas E. O'Neil and Scott Kerlin Computer Science Department University of North Dakota Grand Forks, ND 58202-9015 oneil@cs.und.edu Abstract Empirical studies

More information

Collaborative Rough Clustering

Collaborative Rough Clustering Collaborative Rough Clustering Sushmita Mitra, Haider Banka, and Witold Pedrycz Machine Intelligence Unit, Indian Statistical Institute, Kolkata, India {sushmita, hbanka r}@isical.ac.in Dept. of Electrical

More information

An Empirical Study of Lazy Multilabel Classification Algorithms

An Empirical Study of Lazy Multilabel Classification Algorithms An Empirical Study of Lazy Multilabel Classification Algorithms E. Spyromitros and G. Tsoumakas and I. Vlahavas Department of Informatics, Aristotle University of Thessaloniki, 54124 Thessaloniki, Greece

More information

A Data Classification Algorithm of Internet of Things Based on Neural Network

A Data Classification Algorithm of Internet of Things Based on Neural Network A Data Classification Algorithm of Internet of Things Based on Neural Network https://doi.org/10.3991/ijoe.v13i09.7587 Zhenjun Li Hunan Radio and TV University, Hunan, China 278060389@qq.com Abstract To

More information

CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS

CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS CHAPTER 4 CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS 4.1 Introduction Optical character recognition is one of

More information

International Journal of Scientific & Engineering Research, Volume 4, Issue 7, July-2013 ISSN

International Journal of Scientific & Engineering Research, Volume 4, Issue 7, July-2013 ISSN 1 Review: Boosting Classifiers For Intrusion Detection Richa Rawat, Anurag Jain ABSTRACT Network and host intrusion detection systems monitor malicious activities and the management station is a technique

More information

Metaheuristic Development Methodology. Fall 2009 Instructor: Dr. Masoud Yaghini

Metaheuristic Development Methodology. Fall 2009 Instructor: Dr. Masoud Yaghini Metaheuristic Development Methodology Fall 2009 Instructor: Dr. Masoud Yaghini Phases and Steps Phases and Steps Phase 1: Understanding Problem Step 1: State the Problem Step 2: Review of Existing Solution

More information

Big Data Methods. Chapter 5: Machine learning. Big Data Methods, Chapter 5, Slide 1

Big Data Methods. Chapter 5: Machine learning. Big Data Methods, Chapter 5, Slide 1 Big Data Methods Chapter 5: Machine learning Big Data Methods, Chapter 5, Slide 1 5.1 Introduction to machine learning What is machine learning? Concerned with the study and development of algorithms that

More information

Process Model Consistency Measurement

Process Model Consistency Measurement IOSR Journal of Computer Engineering (IOSRJCE) ISSN: 2278-0661, ISBN: 2278-8727Volume 7, Issue 6 (Nov. - Dec. 2012), PP 40-44 Process Model Consistency Measurement Sukanth Sistla CSE Department, JNTUniversity,

More information

Record Linkage using Probabilistic Methods and Data Mining Techniques

Record Linkage using Probabilistic Methods and Data Mining Techniques Doi:10.5901/mjss.2017.v8n3p203 Abstract Record Linkage using Probabilistic Methods and Data Mining Techniques Ogerta Elezaj Faculty of Economy, University of Tirana Gloria Tuxhari Faculty of Economy, University

More information

A Web Page Recommendation system using GA based biclustering of web usage data

A Web Page Recommendation system using GA based biclustering of web usage data A Web Page Recommendation system using GA based biclustering of web usage data Raval Pratiksha M. 1, Mehul Barot 2 1 Computer Engineering, LDRP-ITR,Gandhinagar,cepratiksha.2011@gmail.com 2 Computer Engineering,

More information

Approach Using Genetic Algorithm for Intrusion Detection System

Approach Using Genetic Algorithm for Intrusion Detection System Approach Using Genetic Algorithm for Intrusion Detection System 544 Abhijeet Karve Government College of Engineering, Aurangabad, Dr. Babasaheb Ambedkar Marathwada University, Aurangabad, Maharashtra-

More information

Cluster based boosting for high dimensional data

Cluster based boosting for high dimensional data Cluster based boosting for high dimensional data Rutuja Shirbhate, Dr. S. D. Babar Abstract -Data Dimensionality is crucial for learning and prediction systems. Term Curse of High Dimensionality means

More information

Intrusion Detection Using Data Mining Technique (Classification)

Intrusion Detection Using Data Mining Technique (Classification) Intrusion Detection Using Data Mining Technique (Classification) Dr.D.Aruna Kumari Phd 1 N.Tejeswani 2 G.Sravani 3 R.Phani Krishna 4 1 Associative professor, K L University,Guntur(dt), 2 B.Tech(1V/1V),ECM,

More information

Deduplication of Hospital Data using Genetic Programming

Deduplication of Hospital Data using Genetic Programming Deduplication of Hospital Data using Genetic Programming P. Gujar Department of computer engineering Thakur college of engineering and Technology, Kandiwali, Maharashtra, India Priyanka Desai Department

More information

The Comparative Study of Machine Learning Algorithms in Text Data Classification*

The Comparative Study of Machine Learning Algorithms in Text Data Classification* The Comparative Study of Machine Learning Algorithms in Text Data Classification* Wang Xin School of Science, Beijing Information Science and Technology University Beijing, China Abstract Classification

More information

Data Cleaning and Prototyping Using K-Means to Enhance Classification Accuracy

Data Cleaning and Prototyping Using K-Means to Enhance Classification Accuracy Data Cleaning and Prototyping Using K-Means to Enhance Classification Accuracy Lutfi Fanani 1 and Nurizal Dwi Priandani 2 1 Department of Computer Science, Brawijaya University, Malang, Indonesia. 2 Department

More information

Learning to Match. Jun Xu, Zhengdong Lu, Tianqi Chen, Hang Li

Learning to Match. Jun Xu, Zhengdong Lu, Tianqi Chen, Hang Li Learning to Match Jun Xu, Zhengdong Lu, Tianqi Chen, Hang Li 1. Introduction The main tasks in many applications can be formalized as matching between heterogeneous objects, including search, recommendation,

More information

Evolving Human Competitive Research Spectra-Based Note Fault Localisation Techniques

Evolving Human Competitive Research Spectra-Based Note Fault Localisation Techniques UCL DEPARTMENT OF COMPUTER SCIENCE Research Note RN/12/03 Evolving Human Competitive Research Spectra-Based Note Fault Localisation Techniques RN/17/07 Deep Parameter Optimisation for Face Detection Using

More information

Chapter X Security Performance Metrics

Chapter X Security Performance Metrics Chapter X Security Performance Metrics Page 1 of 10 Chapter X Security Performance Metrics Background For many years now, NERC and the electricity industry have taken actions to address cyber and physical

More information

Dynamic Optimization of Generalized SQL Queries with Horizontal Aggregations Using K-Means Clustering

Dynamic Optimization of Generalized SQL Queries with Horizontal Aggregations Using K-Means Clustering Dynamic Optimization of Generalized SQL Queries with Horizontal Aggregations Using K-Means Clustering Abstract Mrs. C. Poongodi 1, Ms. R. Kalaivani 2 1 PG Student, 2 Assistant Professor, Department of

More information

An Ensemble Data Mining Approach for Intrusion Detection in a Computer Network

An Ensemble Data Mining Approach for Intrusion Detection in a Computer Network International Journal of Science and Engineering Investigations vol. 6, issue 62, March 2017 ISSN: 2251-8843 An Ensemble Data Mining Approach for Intrusion Detection in a Computer Network Abisola Ayomide

More information

Cost-sensitive Boosting for Concept Drift

Cost-sensitive Boosting for Concept Drift Cost-sensitive Boosting for Concept Drift Ashok Venkatesan, Narayanan C. Krishnan, Sethuraman Panchanathan Center for Cognitive Ubiquitous Computing, School of Computing, Informatics and Decision Systems

More information

LOAD BALANCING IN MOBILE INTELLIGENT AGENTS FRAMEWORK USING DATA MINING CLASSIFICATION TECHNIQUES

LOAD BALANCING IN MOBILE INTELLIGENT AGENTS FRAMEWORK USING DATA MINING CLASSIFICATION TECHNIQUES 8 th International Conference on DEVELOPMENT AND APPLICATION SYSTEMS S u c e a v a, R o m a n i a, M a y 25 27, 2 0 0 6 LOAD BALANCING IN MOBILE INTELLIGENT AGENTS FRAMEWORK USING DATA MINING CLASSIFICATION

More information

Timestamps and authentication protocols

Timestamps and authentication protocols Timestamps and authentication protocols Chris J. Mitchell Technical Report RHUL MA 2005 3 25 February 2005 Royal Holloway University of London Department of Mathematics Royal Holloway, University of London

More information

Protect Your Organization from Cyber Attacks

Protect Your Organization from Cyber Attacks Protect Your Organization from Cyber Attacks Leverage the advanced skills of our consultants to uncover vulnerabilities our competitors overlook. READY FOR MORE THAN A VA SCAN? Cyber Attacks by the Numbers

More information

A HYBRID METHOD FOR SIMULATION FACTOR SCREENING. Hua Shen Hong Wan

A HYBRID METHOD FOR SIMULATION FACTOR SCREENING. Hua Shen Hong Wan Proceedings of the 2006 Winter Simulation Conference L. F. Perrone, F. P. Wieland, J. Liu, B. G. Lawson, D. M. Nicol, and R. M. Fujimoto, eds. A HYBRID METHOD FOR SIMULATION FACTOR SCREENING Hua Shen Hong

More information

Research on Applications of Data Mining in Electronic Commerce. Xiuping YANG 1, a

Research on Applications of Data Mining in Electronic Commerce. Xiuping YANG 1, a International Conference on Education Technology, Management and Humanities Science (ETMHS 2015) Research on Applications of Data Mining in Electronic Commerce Xiuping YANG 1, a 1 Computer Science Department,

More information

Introducing Partial Matching Approach in Association Rules for Better Treatment of Missing Values

Introducing Partial Matching Approach in Association Rules for Better Treatment of Missing Values Introducing Partial Matching Approach in Association Rules for Better Treatment of Missing Values SHARIQ BASHIR, SAAD RAZZAQ, UMER MAQBOOL, SONYA TAHIR, A. RAUF BAIG Department of Computer Science (Machine

More information

Fraud Detection Using Random Forest Algorithm

Fraud Detection Using Random Forest Algorithm Fraud Detection Using Random Forest Algorithm Eesha Goel Computer Science Engineering and Technology, GZSCCET, Bhatinda, India eesha1992@rediffmail.com Abhilasha Computer Science Engineering and Technology,

More information

Prowess Improvement of Accuracy for Moving Rating Recommendation System

Prowess Improvement of Accuracy for Moving Rating Recommendation System 2017 IJSRST Volume 3 Issue 1 Print ISSN: 2395-6011 Online ISSN: 2395-602X Themed Section: Scienceand Technology Prowess Improvement of Accuracy for Moving Rating Recommendation System P. Damodharan *1,

More information

Combine the PA Algorithm with a Proximal Classifier

Combine the PA Algorithm with a Proximal Classifier Combine the Passive and Aggressive Algorithm with a Proximal Classifier Yuh-Jye Lee Joint work with Y.-C. Tseng Dept. of Computer Science & Information Engineering TaiwanTech. Dept. of Statistics@NCKU

More information

Outlier Ensembles. Charu C. Aggarwal IBM T J Watson Research Center Yorktown, NY Keynote, Outlier Detection and Description Workshop, 2013

Outlier Ensembles. Charu C. Aggarwal IBM T J Watson Research Center Yorktown, NY Keynote, Outlier Detection and Description Workshop, 2013 Charu C. Aggarwal IBM T J Watson Research Center Yorktown, NY 10598 Outlier Ensembles Keynote, Outlier Detection and Description Workshop, 2013 Based on the ACM SIGKDD Explorations Position Paper: Outlier

More information

Leave-One-Out Support Vector Machines

Leave-One-Out Support Vector Machines Leave-One-Out Support Vector Machines Jason Weston Department of Computer Science Royal Holloway, University of London, Egham Hill, Egham, Surrey, TW20 OEX, UK. Abstract We present a new learning algorithm

More information

Machine Learning Techniques for Data Mining

Machine Learning Techniques for Data Mining Machine Learning Techniques for Data Mining Eibe Frank University of Waikato New Zealand 10/25/2000 1 PART VII Moving on: Engineering the input and output 10/25/2000 2 Applying a learner is not all Already

More information

International Journal of Software and Web Sciences (IJSWS)

International Journal of Software and Web Sciences (IJSWS) International Association of Scientific Innovation and Research (IASIR) (An Association Unifying the Sciences, Engineering, and Applied Research) ISSN (Print): 2279-0063 ISSN (Online): 2279-0071 International

More information

K-Means Based Matching Algorithm for Multi-Resolution Feature Descriptors

K-Means Based Matching Algorithm for Multi-Resolution Feature Descriptors K-Means Based Matching Algorithm for Multi-Resolution Feature Descriptors Shao-Tzu Huang, Chen-Chien Hsu, Wei-Yen Wang International Science Index, Electrical and Computer Engineering waset.org/publication/0007607

More information

Review Paper onbuilding Prediction based model for cloud-based data mining

Review Paper onbuilding Prediction based model for cloud-based data mining Review Paper onbuilding Prediction based model for cloud-based data mining Er. Spinder kaur 1,Dr. Sandeep Kautish 2 1 M.Tech Scholar, 2 Assistant Professor ABSTRACT University of Computer Application Guru

More information

Performance Analysis of Data Mining Classification Techniques

Performance Analysis of Data Mining Classification Techniques Performance Analysis of Data Mining Classification Techniques Tejas Mehta 1, Dr. Dhaval Kathiriya 2 Ph.D. Student, School of Computer Science, Dr. Babasaheb Ambedkar Open University, Gujarat, India 1 Principal

More information

International Journal of Data Mining & Knowledge Management Process (IJDKP) Vol.7, No.3, May Dr.Zakea Il-Agure and Mr.Hicham Noureddine Itani

International Journal of Data Mining & Knowledge Management Process (IJDKP) Vol.7, No.3, May Dr.Zakea Il-Agure and Mr.Hicham Noureddine Itani LINK MINING PROCESS Dr.Zakea Il-Agure and Mr.Hicham Noureddine Itani Higher Colleges of Technology, United Arab Emirates ABSTRACT Many data mining and knowledge discovery methodologies and process models

More information

K-Nearest-Neighbours with a Novel Similarity Measure for Intrusion Detection

K-Nearest-Neighbours with a Novel Similarity Measure for Intrusion Detection K-Nearest-Neighbours with a Novel Similarity Measure for Intrusion Detection Zhenghui Ma School of Computer Science The University of Birmingham Edgbaston, B15 2TT Birmingham, UK Ata Kaban School of Computer

More information

Minimal Test Cost Feature Selection with Positive Region Constraint

Minimal Test Cost Feature Selection with Positive Region Constraint Minimal Test Cost Feature Selection with Positive Region Constraint Jiabin Liu 1,2,FanMin 2,, Shujiao Liao 2, and William Zhu 2 1 Department of Computer Science, Sichuan University for Nationalities, Kangding

More information

An Anomaly-Based Intrusion Detection System for the Smart Grid Based on CART Decision Tree

An Anomaly-Based Intrusion Detection System for the Smart Grid Based on CART Decision Tree An Anomaly-Based Intrusion Detection System for the Smart Grid Based on CART Decision Tree P. Radoglou-Grammatikis and P. Sarigiannidis* University of Western Macedonia Department of Informatics & Telecommunications

More information

How to implement NIST Cybersecurity Framework using ISO WHITE PAPER. Copyright 2017 Advisera Expert Solutions Ltd. All rights reserved.

How to implement NIST Cybersecurity Framework using ISO WHITE PAPER. Copyright 2017 Advisera Expert Solutions Ltd. All rights reserved. How to implement NIST Cybersecurity Framework using ISO 27001 WHITE PAPER Copyright 2017 Advisera Expert Solutions Ltd. All rights reserved. Copyright 2017 Advisera Expert Solutions Ltd. All rights reserved.

More information

Improved Classification of Known and Unknown Network Traffic Flows using Semi-Supervised Machine Learning

Improved Classification of Known and Unknown Network Traffic Flows using Semi-Supervised Machine Learning Improved Classification of Known and Unknown Network Traffic Flows using Semi-Supervised Machine Learning Timothy Glennan, Christopher Leckie, Sarah M. Erfani Department of Computing and Information Systems,

More information

Slides for Data Mining by I. H. Witten and E. Frank

Slides for Data Mining by I. H. Witten and E. Frank Slides for Data Mining by I. H. Witten and E. Frank 7 Engineering the input and output Attribute selection Scheme-independent, scheme-specific Attribute discretization Unsupervised, supervised, error-

More information

Neural Network Weight Selection Using Genetic Algorithms

Neural Network Weight Selection Using Genetic Algorithms Neural Network Weight Selection Using Genetic Algorithms David Montana presented by: Carl Fink, Hongyi Chen, Jack Cheng, Xinglong Li, Bruce Lin, Chongjie Zhang April 12, 2005 1 Neural Networks Neural networks

More information

Systematic Detection And Resolution Of Firewall Policy Anomalies

1.M.Madhuri 2.Knvssk Rajesh Dept.of CSE, Kakinada institute of Engineering & Tech., Korangi, kakinada, E.g.dt, AP, India. Abstract: In this

A Prospect of Websites Evaluation Tools Based on Event Logs

Vagner Figuerêdo de Santana 1, and M. Cecilia C. Baranauskas 2 1 Institute of Computing, UNICAMP, Brazil, v069306@dac.unicamp.br 2 Institute

Decision Making Procedure: Applications of IBM SPSS Cluster Analysis and Decision Tree

World Applied Sciences Journal 21 (8): 1207-1212, 2013 ISSN 1818-4952 IDOSI Publications, 2013 DOI: 10.5829/idosi.wasj.2013.21.8.2913

Dynamic Ensemble Construction via Heuristic Optimization

Şenay Yaşar Sağlam and W. Nick Street Department of Management Sciences The University of Iowa Abstract Classifier ensembles, in which multiple

Mining Quantitative Association Rules on Overlapped Intervals

Qiang Tong 1,3, Baoping Yan 2, and Yuanchun Zhou 1,3 1 Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China {tongqiang,

OPTIMIZATION OF BAGGING CLASSIFIERS BASED ON SBCB ALGORITHM

XIAO-DONG ZENG, SAM CHAO, FAI WONG Faculty of Science and Technology, University of Macau, Macau, China E-MAIL: ma96506@umac.mo, lidiasc@umac.mo,

White Paper. Why IDS Can't Adequately Protect Your IoT Devices

Introduction As a key component in information technology security, Intrusion Detection Systems (IDS) monitor networks for suspicious activity

BRACE: A Paradigm For the Discretization of Continuously Valued Data

Proceedings of the Seventh Florida Artificial Intelligence Research Symposium, pp. 117-121, 1994. Dan Ventura Tony R. Martinez Computer Science

INTERACTIVE MULTI-OBJECTIVE GENETIC ALGORITHMS FOR THE BUS DRIVER SCHEDULING PROBLEM

Advanced OR and AI Methods in Transportation Jorge PINHO DE SOUSA 1, Teresa GALVÃO DIAS 1, João FALCÃO E CUNHA 1 Abstract.

Efficient Use of Random Delays

Olivier Benoit 1 and Michael Tunstall 2 1 Gemalto olivier.banoit@gemalto.com 2 Royal Holloway, University of London m.j.tunstall@rhul.ac.uk Abstract Random delays are commonly

Multidimensional Process Mining with PMCube Explorer

Thomas Vogelgesang and H.-Jürgen Appelrath Department of Computer Science University of Oldenburg, Germany thomas.vogelgesang@uni-oldenburg.de Abstract.

Genetic Programming for Data Classification: Partitioning the Search Space

Jeroen Eggermont jeggermo@liacs.nl Joost N. Kok joost@liacs.nl Walter A. Kosters kosters@liacs.nl ABSTRACT When Genetic Programming

SOC for cybersecurity

April 2018 SOC for cybersecurity a backgrounder Acknowledgments Special thanks to Francette Bueno, Senior Manager, Advisory Services, Ernst & Young LLP and Chris K. Halterman, Executive Director, Advisory