A Multi-agent Based Cognitive Approach to Unsupervised Feature Extraction and Classification for Network Intrusion Detection


Int'l Conf. on Advances on Applied Cognitive Computing, ACC'17

Kaiser Nahiyan, Samilat Kaiser, Dr. Ken Ferens, Dr. Robert McLeod
Department of Electrical and Computer Engineering, University of Manitoba, Canada
{nahiyank, kaisers3}@myumanitoba.ca, {Ken.Ferens, Robert.McLeod}@umanitoba.ca

Abstract

The importance of finding meaning in unstructured data is increasing. In network intrusion detection, unsupervised learning from unlabeled data is of vital significance, yet there is no universal technique for the purpose. Most approaches, including unsupervised machine learning algorithms, are computationally expensive on large amounts of data that need additional preprocessing, and their detection accuracy is still not satisfactory. This work focuses on an automated, agent-based, unsupervised and relatively uncomplicated cognitive approach that separates attacks from normal events within the large search space with reduced computational demands. The algorithm collects features from a statistical analysis of the observed attributes over each time-step (much as an intuitive learner would try to infer from a stream of unlabeled data) and isolates attack events from normal ones by running unsupervised k-means clustering over the reduced dataset. The computational load on the central node is further reduced by an agent-based architecture in which agents are deployed in hosts: some processing is done at the host and the rest is performed by the node that performs the classification. With a growing number of small-device networks supporting IoT, mobile and sensor networks, fast, lightweight machine learning models for unsupervised attack identification are required.
We validate our algorithm on two recent datasets with modern-day attacks, and furthermore perform a multi-scale analysis to locate the time-scale of attacks.

I. INTRODUCTION

From leaking debit card details to intrusion into highly classified materials, cyber-attacks have become a real threat and a part of our political and social discourse. Attacks are no longer carried out by isolated individuals; there are now organized crimes orchestrated by hacker groups. Likewise, research in cybersecurity is at its peak. Machine learning has recently demonstrated much success in transforming many sectors, including cybersecurity. However, datasets are scarce in cybersecurity. Only a small number of datasets are publicly available, generation methods are not uniform, they often contain private data that comes with added formalities, and in many cases there is no ground truth to guide researchers on what to expect.

Supervised learning methods use labelled data to train the algorithm on mixed samples covering all the classes. Once trained, the learner can be exposed to new samples and can separate attacks from normal traffic. Unlike data in other domains [1], network traffic data is vastly diverse: IPs and ports are categorical data represented as numbers, hardware addresses are categories represented by groups of characters, payloads and user data are often encrypted, network parameters are flags that are often binary, and the list goes on. Hence, achieving consistent detection accuracy on test data is difficult even for supervised techniques, let alone unsupervised ones. In unsupervised machine learning the data is unlabeled, so there is no guidance on how to find meaning in the data or how to use that knowledge to classify the samples.
Conventional techniques like clustering, when applied to the entire dataset, cannot deliver satisfactory accuracy, whereas complex methods like deep learning and neural networks require huge sets of data samples and long hours of intensive computation. We argue that much current focus is on implementing complex and resource-hungry machine learning methods, whereas comparable results can be achieved with much less computation power and much less data, and hence timely actions can be taken to address the intrusion. From the perspective of cognitive detection, an unsupervised learner would apply simpler techniques like statistical learning, flow analysis and clustering to identify the attacks, which is exactly the approach described in this study.

Having set our focus on simplified learning with less computation power, we introduce the agent-based model in our approach. An Agent Based Model (ABM) is a class of computational models for simulating the actions and interactions of autonomous agents (both individual or collective entities such as organizations or groups) with a view to assessing their effects on the system as a whole [2]. In our model, agents are deployed in the hosts and gateways; the agents at hosts perform independent analysis of the host traffic and provide the processed information to a gateway agent for further processing. In this manner, the computation load and the time required for convergence are further reduced.

II. RELATED PREVIOUS WORK

Statistical analysis of traffic has previously been used to classify application or user types. Roughan et al. [3] used nearest neighbor and linear discriminant analysis approaches to map different network applications to different QoS classes. Bernaille et al. [4] proposed a technique using an unsupervised ML algorithm (K-Means clustering) that classifies different types of TCP-based applications using the first few packets of the traffic flow. On the UNSW-NB15 dataset [5], Moustafa et al. [6] applied an Association Rule Mining algorithm as feature selection to generate the strongest features from the dataset, and Gharaee et al. [7] proposed an anomaly-based IDS using a genetic algorithm and Support Vector Machine (SVM) with a new feature selection method. Moustafa et al.
[8] performed statistical learning of the observations and attributes of the UNSW-NB15 dataset, examined the feature correlations, and applied existing classifiers to evaluate the complexity in terms of accuracy in comparison with the KDD99 data set. On the Aegean Wi-Fi Intrusion Dataset (AWID) [9], Kolias et al. compared the accuracy of different machine learning techniques on the AWID reduced dataset. Thanthrige et al. [10] applied feature reduction techniques such as Information Gain and Chi-Squared statistics to evaluate performance on the dataset after feature reduction.

However, no one has worked on time-step based statistical feature analysis on these datasets. Moreover, none of the work mentioned above has used agent-based computation modeling, which is presented in this work. The accuracy reported by previous authors came at the cost of computation-heavy machine learning methods that had to process every row in the dataset and hence required substantial processing time. Our approach is much more straightforward, can be easily automated, classifies big complex datasets by extracting smaller feature datasets using statistical techniques, runs much faster, and uses a distributed processing architecture, which makes it suitable for micro habitats.

III. PROPOSED METHODOLOGY

Our aim is to classify the dataset into normal and attack in an unsupervised manner, without any training as such, and to find some meaning in the data. We first apply our algorithm to the UNSW-NB15 dataset. We consider all four files of the UNSW-NB15 dataset, which together have 3,239,993 rows, of which 14.48% are attack rows and the rest are normal. The UNSW-NB15 dataset contains 49 columns in total. To adapt the datasets to the unsupervised problem, we strip the labels from the dataset during preprocessing. The missing data analysis is shown in Fig..
We impute the missing values and convert the categorical columns state, proto, service, srcip and dstip into numeric representations. Such steps are conventional measures that make it easier for the machine to learn. We add two more attributes, srcip_trunc_encoded and dstip_trunc_encoded, which are the subnet addresses of the source and destination IPs, encoded from categories to numbers.

When working with large datasets it is helpful to divide the dataset into smaller fractions that can be analyzed individually. Our sampler divides the data into time-steps, fragmenting the dataset into smaller sections based on the timestamp. These small segments are then processed to derive features from their statistical analysis. Our hypothesis is that time windows that contain attack samples will show significant feature separation from time windows that contain only normal samples. Hence, the sampler collects groups of rows from the dataset that fall in a certain range of timestamps and creates a new data frame. Then, the feature extraction portion of the algorithm extracts the mean and standard deviation of each attribute in the data frame. We have now transformed 3,239,993 rows into rows that each represent the events that occurred during one time window. At this point the dataset has been reduced by 97%. The statistics of each attribute are extracted as features, and a new data frame is built from the mean and variance of all the attributes except IP, port, time, etc.

Fig. 1: Sampler and Feature Extractor
Fig. 2: Classification and Evaluation

The dataset created from the original data is much smaller in terms of both rows and columns. Hence we reduce the algorithm's computation time by reducing the number of rows that the unsupervised algorithm needs to process.

Fig. 3: Missing Data UNSW
Fig. 4: Missing Data AWID

K-means clustering is then applied to this reduced dataset, which presents the cognitive learner with two sample spaces (clusters), one of which holds the attack samples. Now, for the intuitive learner to identify which is the attack cluster, it will pick up the time-step samples from each class and try to understand whether any attack has occurred during that time-step. One way of achieving that is to use the internal system logs corresponding to the time covered by the time-step; however, this part is out of the scope of this study. To evaluate our algorithm, the two clusters are examined individually to check whether they have accurately classified the events as attack and normal. Since we work on datasets that initially had labels, we can compare the predictions with the actual values and compute accuracy scores from the true and false positives and negatives. After evaluating the accuracy of our machine learning instance, we apply the same algorithm to another dataset as a final validation. For this we use the AWID dataset.
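The sampler, feature extractor, clustering and evaluation steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the flows are synthetic, the two attributes (bytes, duration) and the names window_features and two_means are hypothetical, the clustering is a bare-bones k-means with k = 2 written with the standard library rather than scikit-learn, and the smaller cluster is simply assumed to be the attack cluster, a step the study leaves to log inspection.

```python
from collections import defaultdict
from statistics import mean, pstdev


def window_features(rows, step):
    """Group (timestamp, attributes) rows into time-steps of `step` seconds.

    Returns {step_index: [mean of each attribute..., std of each attribute...]}.
    """
    buckets = defaultdict(list)
    for timestamp, attrs in rows:
        buckets[int(timestamp // step)].append(attrs)
    features = {}
    for index in sorted(buckets):
        columns = list(zip(*buckets[index]))  # one tuple of values per attribute
        features[index] = [mean(c) for c in columns] + [pstdev(c) for c in columns]
    return features


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def two_means(points, iterations=20):
    """Bare-bones k-means with k=2 and a deterministic farthest-point seed."""
    farthest = max(points, key=lambda p: sq_dist(p, points[0]))
    centroids = [list(points[0]), list(farthest)]
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [0 if sq_dist(p, centroids[0]) <= sq_dist(p, centroids[1]) else 1
                  for p in points]
        for k in (0, 1):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:  # keep the previous centroid if a cluster empties
                centroids[k] = [mean(col) for col in zip(*members)]
    return labels


# Synthetic flows (assumption): (timestamp in seconds, (bytes, duration)).
# Seconds 10-12 carry the "attack" traffic; the labels are used only for evaluation.
ATTACK_SECONDS = {10, 11, 12}
rows = []
for second in range(30):
    for i in range(5):
        if second in ATTACK_SECONDS:
            rows.append((second + i / 10, (5000 + 100 * i, 0.1)))
        else:
            rows.append((second + i / 10, (100 + i + second % 3, 1.0 + 0.1 * i)))

features = window_features(rows, step=1)  # 150 rows become 30 feature rows
windows = list(features)
labels = two_means([features[w] for w in windows])

# Assumption: the smaller of the two clusters is the attack cluster.
attack_cluster = min((0, 1), key=labels.count)
predicted = {w for w, label in zip(windows, labels) if label == attack_cluster}

tp = len(predicted & ATTACK_SECONDS)
fp = len(predicted - ATTACK_SECONDS)
fn = len(ATTACK_SECONDS - predicted)
tn = len(windows) - tp - fp - fn
accuracy = (tp + tn) / len(windows)
print(len(rows), "rows ->", len(windows), "time-steps; accuracy =", accuracy)
```

On real traffic the features would normally be standardized before clustering, since k-means is sensitive to attribute scale; that step is omitted here for brevity.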
On top of the classification of network intrusions based on statistical features of time-steps, our work presents a multi-scale analysis of the time-steps: we create feature datasets for time-steps of t, 2t, 4t and 8t, where t is a relatively small time window in the dataset that contains a balanced number of events. In other words, we are trying to find the best value of n for the time-step 2^n t. We run our algorithm on each of the datasets, and our results present the best time-scales for each dataset. Such an analysis can be used as a benchmark for future research.

IV. MULTI-AGENT BASED MODEL

Without an agent-based approach, the gateway is the node that has access to all traffic in the network; hence the gateway must process the traffic flows from each host, which may include multiple flows from each host within a time window. For example, suppose a host X initiates 1000 traffic flows in a time-step t. If there are 20 hosts like X, the gateway must process 20*1000 traffic flows every time-step, and if we record 50 attributes per traffic flow, the gateway node must perform multiplications and additions over a data size of 20*1000*50 = 1,000,000 values for each time-step. If the classification is deployed in another node, external to the gateway, then the gateway must send this much data (for each time-step) over the network to that classifier node for further processing.

In contrast, with a multi-agent approach the computation can be moved off the central gateway or classifier node and distributed over the network. Each host has an agent that performs the computation over the traffic flows for that host IP. Hence, the 1000 traffic flows of host X are processed by the agent in X, and the statistical analysis of these 1000 rows yields a single row for host X at time-step t. The classifier now has to compute over only 20 (nodes) * 1 (row provided by each node) * 50 (attributes recorded) = 1000 values instead of 1,000,000, which is a significant reduction. The other advantage of such an approach is that any node, not only the gateway, can be the classifier node.

V. ANALYSIS AND RESULTS

A. Physical Setup

We perform the simulation in Python running on a 64-bit OS; the underlying hardware is an AMD quad-core processor with 8 GB RAM. The data is processed using Python libraries such as pandas and scikit-learn.

B. Results

Out of the rows in the reduced dataset, positives were correctly identified, and negatives were correctly identified. The confusion matrix is depicted in Fig. 8. Fig. 5 shows the comparison of computation time, which shows that our approach is much faster. The classification is 89% correct, which is high for unsupervised learning. The AWID-R dataset showed an accuracy of 29% with basic unsupervised K-Means, and with our algorithm the accuracy increased by 60%. This is depicted in Fig. 7.

Fig. 5: Comparison of processing time
Fig. 6: Comparison of Rows Processed
Fig. 7: Comparison of Achieved Accuracy

Fig. 9 to 12 depict the time scale analysis of one of the UNSW dataset files.
The plots show the time-step on the x-axis against the count of total rows observed during that time-step, for scales from t = 1 second to t = 8 seconds.

Fig. 8: Confusion Matrix for K-means on time scale t = 1 sec for UNSW dataset

As the scale increases, the maximum value on the x-axis decreases and the maximum value on the y-axis increases.
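The effect of the scale on these plots can be reproduced with a short counting sketch. It assumes a hypothetical, perfectly regular trace (three rows per second for 16 seconds) and a helper named rows_per_step that is not from the paper; it only illustrates how doubling the time-step halves the number of steps while raising the per-step row count.

```python
from collections import Counter


def rows_per_step(timestamps, scale):
    """Count how many rows fall into each time-step of width `scale` seconds."""
    return Counter(int(ts // scale) for ts in timestamps)


# Hypothetical trace: three rows per second for 16 seconds.
timestamps = [second + k / 4 for second in range(16) for k in range(3)]

base = 1  # t, in seconds
for n in range(4):  # scales t, 2t, 4t and 8t, i.e. 2^n t
    scale = base * 2 ** n
    counts = rows_per_step(timestamps, scale)
    print(f"scale={scale}s steps={len(counts)} max_rows_per_step={max(counts.values())}")
```

With real traffic the counts per step are uneven, so the per-step maximum grows with the scale but does not necessarily double.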

Fig. 9: Time Scale Analysis shown for UNSW (scale t = 1 sec)
Fig. 10: Time Scale Analysis shown for UNSW (scale t = 2 sec)
Fig. 11: Time Scale Analysis shown for UNSW (scale t = 4 sec)
Fig. 12: Time Scale Analysis shown for UNSW (scale t = 8 sec)

The metrics that we use for evaluating the results are Accuracy, Recall, Precision and F1 score. Their descriptions are provided in Table 1. The results in Table 2 show that the algorithm performed best for scales of 4 seconds or higher. The same is depicted in Fig. 13.

Table 2: Results achieved for different time-scales (columns: Class, precision, recall, f1-score; one block with a total row for each of the time scales 1 sec, 2 sec, 4 sec and 8 sec)

Fig. 13: Accuracy Analysis on various time windows

Table 1: Accuracy Metrics
Accuracy: (TP + TN) / (TP + TN + FP + FN). Ratio of positive and negative cases correctly identified.
Recall: TP / (TP + FN). Ratio of actual positive cases correctly identified.
Precision: TP / (TP + FP). Ratio of cases identified as positive that are actually positive.
F1 Score: 2 * ((Precision * Recall) / (Precision + Recall)). Measure of the accuracy of the test; the harmonic mean of recall and precision.

VI. FUTURE WORK

We need to address the fact that attacks are ever changing. No algorithm can remain effective for decades, as attackers keep improving their efforts to imitate normal traffic; hence there will soon be attacks with normal features. Therefore, our future work will be to synthetically design attack traffic that defeats this algorithm, and then to apply other advanced techniques to filter out such attacks. One way of doing this would be to apply fractal analysis to differentiate normal from attack traffic. This approach has received significant recent attention in the research community.

VII. REFERENCES

[1] R. Sommer and V. Paxson, "Outside the Closed World: On Using Machine Learning for Network Intrusion Detection," in IEEE Symposium on Security and Privacy, Oakland, CA, USA.
[2] "Wikipedia," [Online]. Available: [Accessed 15 May 2017].
[3] M. Roughan, S. Sen, O. Spatscheck and N. Duffield, "Class of service mapping for QoS: a statistical signature-based approach to IP traffic classification," in 4th ACM SIGCOMM conference on Internet measurement, New York, NY, USA.
[4] L. Bernaille, R. Teixeira, T. Akodkenou, A. Soule and K. Salamatian, "Traffic Classification On The Fly," in ACM SIGCOMM Computer Communication Review, New York, NY, USA, April.
[5] N. Moustafa and J. Slay, "UNSW-NB15: a comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set)," in 2015 Military Communications and Information Systems Conference (MilCIS), Canberra, ACT.
[6] N. Moustafa and J. Slay, "The Significant Features of the UNSW-NB15 and the KDD99 Data Sets for Network Intrusion Detection Systems," in 4th International Workshop on Building Analysis Datasets and Gathering Experience Returns for Security (BADGERS), Kyoto.
[7] H. Gharaee and H. Hosseinvand, "A new feature selection IDS based on genetic algorithm and SVM," in 8th International Symposium on Telecommunications (IST), Tehran.
[8] N. Moustafa and J. Slay, "The evaluation of Network Anomaly Detection Systems: Statistical analysis of the UNSW-NB15 data set and the comparison with the KDD99 data set," Information Security Journal: A Global Perspective, Vols. 1-3, no. 25.
[9] C. Kolias, G. Kambourakis, A. Stavrou and S. Gritzali, "Intrusion detection in networks: Empirical evaluation of threats and a public dataset," in Communications Surveys Tutorials IEEE.
[10] U. S. K. P. M. Thanthrige, J. Samarabandu and X. Wang, "Machine learning techniques for intrusion detection on public dataset," in IEEE Canadian Conference on Electrical and Computer Engineering (CCECE), Vancouver, 2016.


Combination of PCA with SMOTE Resampling to Boost the Prediction Rate in Lung Cancer Dataset International Journal of Computer Applications (0975 8887) Combination of PCA with SMOTE Resampling to Boost the Prediction Rate in Lung Cancer Dataset Mehdi Naseriparsa Islamic Azad University Tehran

More information

International Journal of Scientific Research & Engineering Trends Volume 4, Issue 6, Nov-Dec-2018, ISSN (Online): X

International Journal of Scientific Research & Engineering Trends Volume 4, Issue 6, Nov-Dec-2018, ISSN (Online): X Analysis about Classification Techniques on Categorical Data in Data Mining Assistant Professor P. Meena Department of Computer Science Adhiyaman Arts and Science College for Women Uthangarai, Krishnagiri,

More information

INF4820 Algorithms for AI and NLP. Evaluating Classifiers Clustering

INF4820 Algorithms for AI and NLP. Evaluating Classifiers Clustering INF4820 Algorithms for AI and NLP Evaluating Classifiers Clustering Murhaf Fares & Stephan Oepen Language Technology Group (LTG) September 27, 2017 Today 2 Recap Evaluation of classifiers Unsupervised

More information

Detecting Malicious Hosts Using Traffic Flows

Detecting Malicious Hosts Using Traffic Flows Detecting Malicious Hosts Using Traffic Flows Miguel Pupo Correia joint work with Luís Sacramento NavTalks, Lisboa, June 2017 Motivation Approach Evaluation Conclusion Outline 2 1 Outline Motivation Approach

More information

A NEW HYBRID APPROACH FOR NETWORK TRAFFIC CLASSIFICATION USING SVM AND NAÏVE BAYES ALGORITHM

A NEW HYBRID APPROACH FOR NETWORK TRAFFIC CLASSIFICATION USING SVM AND NAÏVE BAYES ALGORITHM Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology ISSN 2320 088X IMPACT FACTOR: 6.017 IJCSMC,

More information

A Network Intrusion Detection System Architecture Based on Snort and. Computational Intelligence

A Network Intrusion Detection System Architecture Based on Snort and. Computational Intelligence 2nd International Conference on Electronics, Network and Computer Engineering (ICENCE 206) A Network Intrusion Detection System Architecture Based on Snort and Computational Intelligence Tao Liu, a, Da

More information

Object Purpose Based Grasping

Object Purpose Based Grasping Object Purpose Based Grasping Song Cao, Jijie Zhao Abstract Objects often have multiple purposes, and the way humans grasp a certain object may vary based on the different intended purposes. To enable

More information

Modeling Intrusion Detection Systems With Machine Learning And Selected Attributes

Modeling Intrusion Detection Systems With Machine Learning And Selected Attributes Modeling Intrusion Detection Systems With Machine Learning And Selected Attributes Thaksen J. Parvat USET G.G.S.Indratrastha University Dwarka, New Delhi 78 pthaksen.sit@sinhgad.edu Abstract Intrusion

More information

Tree-Based Minimization of TCAM Entries for Packet Classification

Tree-Based Minimization of TCAM Entries for Packet Classification Tree-Based Minimization of TCAM Entries for Packet Classification YanSunandMinSikKim School of Electrical Engineering and Computer Science Washington State University Pullman, Washington 99164-2752, U.S.A.

More information
