Association Rule Mining in Big Data using MapReduce Approach in Hadoop


GRD Journals, Global Research and Development Journal for Engineering, International Conference on Innovations in Engineering and Technology (ICIET), July 2016, e-ISSN:

J. Jenifer Nancy (1), M. Jansi Rani (2), Dr. D. Devaraj (3)
(1) P.G. Scholar, (2) Assistant Professor, (3) Senior Professor and H.O.D.
(1, 2) Department of Computer Science & Engineering, (3) Department of Electrical and Electronics Engineering
(1, 2, 3) Kalasalingam University, Krishnankovil, India

Abstract

Association rule mining is an important task in data mining. In the case of big data, the large volume of data makes it impossible to generate rules quickly. By making use of parallel execution in Hadoop with the MapReduce framework, the rules can be generated much faster and more efficiently. The existing method transforms the input dataset into a binomial representation before processing it with MapReduce, but binomial conversion is not user-friendly because it becomes complex for continuous values. In this paper, an improved and scalable algorithm is proposed for association rule mining that converts the input dataset into key-value pairs instead of a binomial representation. All stages of the proposed association rule mining algorithm are parallelized using MapReduce. The proposed algorithm works on high-cardinality features, so no dimension detection is needed.

Keywords: Hadoop; MapReduce; Association rule mining; Data mining; Big data

I. INTRODUCTION

A. Big Data and Characteristics

Organizations and institutions collect and store data every minute, every hour and every day, so data is available in large quantities. What matters, however, is not the amount of data but what organizations do with it to identify information that is useful to them. This is done by analyzing the data to identify insights or critical information that help the organization make useful decisions for its growth.
The term big data describes a large volume of data that is available in both structured and unstructured formats. Although big data is a new term, the practice of collecting data, storing it in large amounts and analyzing it to gather new information long predates the term. The characteristics of big data can be explained using the 3 Vs: (1) Volume, (2) Velocity and (3) Variety. Applications of big data include areas such as health care, telecom and finance. In this paper the process of association rule generation in big data is discussed, and an association rule mining technique is proposed to generate rules from the KDD CUP 99 dataset.

B. Data Mining in Big Data

Big data mining deals with large amounts of data stored in data warehouses and databases. It can be used to extract or identify interesting patterns and information from this data. Many data mining techniques can be applied to big data, including classification, clustering, association rules, prediction, estimation, documentation and description. These techniques have been researched extensively for a long time, and many algorithms have been applied within each of them; this also holds for big data. One well-known technique is association rule mining, an efficient data mining technique used to discover hidden patterns and information in large databases. Here the relationships between the various attributes of the data are identified using an association rule mining algorithm. Some basic types of association rule mining algorithms are the Apriori algorithm, distributed algorithms and parallel algorithms.

C. Association Rule Mining

Association Rule Mining (ARM) [1] is a popular data mining approach used to analyse a given dataset and discover interesting patterns or relationships between its items. The concept of strong association rules was first used by Agrawal et al. [2] to identify association rules between the items sold in a large-scale transaction database collected from a supermarket using a point-of-sale system. The relationship between the items is identified based on the purchase pattern. The ARM technique generates a set of association rules that hold between the items of the given dataset, based on the number of occurrences of each item combination in the dataset.

An association rule defines a relationship between items or itemsets in the given dataset. Consider three items A, B and C. The rule {A, B} → C says that if a person buys the two items A and B together, then he/she will most likely also buy item C. That is, the relations between the items are generated by identifying the various patterns within the dataset. The Association Rule Mining (ARM) technique [3] consists of two stages as follows:

1) Identify the itemsets that occur frequently in the dataset: The frequent itemsets are those whose support value (sup(itemset)) is equal to or greater than the pre-defined minimum support value (min_sup). The support value of an itemset is calculated as the number of transactions that contain that itemset. In the above example, the support of {A, B} is the number of transactions that contain both A and B.

2) Association rule generation using frequent itemsets: In this stage the interesting rules are generated by calculating the confidence factor (conf) for all the frequent itemsets generated in the previous stage. The confidence value for the above example rule {A, B} → C is sup({A, B, C})/sup({A, B}).

D. MapReduce Approach for ARM

Association rules are widely used, but their generation faces many issues, the major one being the need to handle large and multidimensional datasets [4]. A single-processor system with ordinary CPU speed and resources cannot handle such large data, which makes the algorithm inefficient to use. Recent developments in network technology, and especially cloud platforms, have provided new ideas for association rule generation by making use of parallel environments like Hadoop [5]. MapReduce has been a popular and widely used model for computing over large amounts of data ever since it was introduced by Google. Hadoop's distributed file system follows the design of the Google File System (GFS), and cloud providers such as Amazon Web Services (AWS) offer Hadoop and MapReduce as part of their services.
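As a rough illustration of the support and confidence measures used in the two ARM stages described earlier, the following Python sketch counts supports over a small, made-up list of transactions and evaluates the rule {A, B} → C with the standard definition conf(X → Y) = sup(X ∪ Y)/sup(X). The transactions, function names and values are assumptions chosen for illustration only; they do not come from the paper's dataset.

```python
from typing import FrozenSet, Iterable, List

# Hypothetical toy transactions (not from the paper's dataset).
TRANSACTIONS: List[FrozenSet[str]] = [
    frozenset({"A", "B", "C"}),
    frozenset({"A", "B"}),
    frozenset({"A", "C"}),
    frozenset({"A", "B", "C"}),
    frozenset({"B", "C"}),
]


def support(itemset: Iterable[str], transactions: List[FrozenSet[str]]) -> int:
    """Number of transactions that contain every item of `itemset`."""
    wanted = frozenset(itemset)
    return sum(1 for t in transactions if wanted <= t)


def confidence(
    antecedent: Iterable[str],
    consequent: Iterable[str],
    transactions: List[FrozenSet[str]],
) -> float:
    """conf(X -> Y) = sup(X union Y) / sup(X); returns 0.0 if X never occurs."""
    x = frozenset(antecedent)
    sup_x = support(x, transactions)
    if sup_x == 0:
        return 0.0
    return support(x | frozenset(consequent), transactions) / sup_x


if __name__ == "__main__":
    print("sup({A, B})       =", support({"A", "B"}, TRANSACTIONS))
    print("sup({A, B, C})    =", support({"A", "B", "C"}, TRANSACTIONS))
    print("conf({A, B} -> C) =", confidence({"A", "B"}, {"C"}, TRANSACTIONS))
```

With these toy transactions, {A, B} appears in three transactions and {A, B, C} in two, so the rule {A, B} → C has a confidence of 2/3.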
A MapReduce job usually splits the input data into chunks, each of which is processed by a map task in parallel. The Mapper processes its chunk using the key and value pair concept, and the map outputs are sorted. The Reducer then combines the outputs obtained from the maps to produce the final output. The MapReduce framework contains a single JobTracker as the master and one TaskTracker as the slave on each cluster node. All input and output in MapReduce are <key, value> pairs. Hadoop is a Java-based distributed programming environment sponsored by Apache that can be used to process and handle large amounts of data. Hadoop was built around the MapReduce concept for large-scale processing using a large number of nodes and clusters. For Association Rule Mining in MapReduce, the Mapper emits the various combinations of items as keys, and the value keeps track of the number of occurrences, that is, the support count. Finally, the Reducer aggregates the Mapper outputs for each key and calculates the final support and confidence for all the candidate itemsets. In this way the association rules with maximum support and confidence can be generated.

The remainder of this paper is organized as follows: Section 2 reviews association rule mining algorithms that use Hadoop and MapReduce; Section 3 describes the proposed method and its working; Section 4 presents the experimental results of the proposed method; and Section 5 concludes the paper.

II. LITERATURE SURVEY

MapReduce can be used to redesign existing sequential algorithms as parallel algorithms that handle large amounts of data in a shorter time, and so it is applied to association rule mining [6]. Some of the existing methods are discussed below.

A. State-of-art in Association Rule Mining

Yang et al. proposed a MapReduce-based programming model for generating association rules in the Hadoop framework to handle large volumes of data. The Apriori algorithm [7] is used as the underlying association rule generation technique. The standard Apriori algorithm is time consuming, especially when dealing with many candidate sets. To overcome this issue, they implemented an improved Apriori algorithm [8] that is parallelized using the Hadoop framework and uses MapReduce across the nodes of the Hadoop platform to handle large data and save time. The use of Hadoop for association rule generation provided a new research focus in the following years.

Lin et al. [11] proposed a similar method that uses the same Apriori approach for frequent itemset generation on the Hadoop platform with MapReduce. The mining process is accelerated by parallelizing frequent itemset generation. Since this parallelization cannot be handled effectively otherwise, MapReduce is used for the purpose. They proposed a parallelization algorithm in MapReduce that performs better than previously existing algorithms in terms of both speed and rule generation accuracy [9].

Riondato et al. proposed a randomized algorithm for association rule mining that is implemented using a parallel approach [10] in the MapReduce framework. The proposed approach generated the association rules appropriately based on the dataset content. The PARMA (Parallel Association Rule Mining Algorithm) approach uses a randomized MapReduce algorithm to identify the appropriate frequent itemsets and association rules with near-linear speedup. A large number of random samples drawn from the original dataset are mined.

Jongwook Woo et al. proposed a Market Basket Analysis algorithm combined with MapReduce for association rule generation. This is one of the most used algorithms for association rules [12]. The algorithm first sorts the given dataset in ascending order, then converts each instance of the dataset into a (key, value) pair and feeds the pairs into MapReduce. The execution is done on the Amazon EC2 MapReduce platform. The experimental results show that performance increases with the parallel MapReduce code, but there is still a bottleneck at a certain point when more nodes are used.

B. Need for Proposed Method

The binomial algorithm is not suitable for many datasets, and a novel method is needed that can be applied to datasets of any format [13]. Binomial transformation is also complex, time consuming and unnecessary. It is difficult to handle and process large volumes of data on a single server, so a parallel environment is needed. In this paper an improved, scalable and distributed key-value pair algorithm is proposed for selecting frequent itemsets from the dataset and for generating association rules. The proposed algorithm is a bottom-up approach: the candidate itemsets are generated first, and then their support values are calculated by counting their occurrences in the dataset transactions. The minimum support value is then applied to convert the candidate itemsets into frequent itemsets. A very large dataset is used here, and after the frequent itemsets are selected the association rules are generated. The implementation uses the MapReduce platform, and the complete process is parallelized.

III. PROPOSED METHOD

The paper proposes and implements association rule mining on a very large dataset in the Hadoop platform using MapReduce [14]. The proposed algorithm converts the input dataset into <key, value> pairs instead of a binomial representation. This removes one level of transformation at the end, namely converting binomial features back to data features. The input dataset should first be preprocessed before the rule generation phase in MapReduce [15]. The phases of the proposed algorithm are discussed below.

1) Phase 1: Generate frequent 1-itemsets

The input dataset is first stored in the HDFS of the Hadoop environment to make data access easy and fast for MapReduce operations [16]. The input data is then split into chunks and provided to the Mapper, which maps the data to its output, represented as <key, value> pairs. The outputs obtained from all the maps are combined in the combiner and then sent to the reducer. Here the support values are calculated by combining the values corresponding to each key. The support values are then compared with the minimum support, and the items whose support meets the minimum support are taken as the output; these are the frequent 1-itemsets.

2) Phase 2: Generate candidate 2-itemsets and n-itemsets

Next the candidate 2-itemsets are generated by the mapper using the frequent 1-itemsets. The count of each candidate 2-itemset is verified against the input data provided to the mapper. The counts are then combined using the combiner and provided to the reducer, which sums them to obtain the support values of the 2-itemsets. This is repeated for larger itemsets until all possible candidate n-itemsets are generated, that is, until an iteration yields no frequent itemset.
3) Phase 3: Association rule generation

Finally, after all the frequent n-itemsets are generated, the association rules are generated based on confidence values. The confidence values are calculated from the support values of the frequent itemsets that form the rules. The output, which contains each selected itemset and its support count, is written to an output file. These support values are then used for the confidence calculation, and the rules with 100% confidence are emitted as the output rules.

The overall association rule generation discussed above is implemented in the Hadoop framework by creating a single-node Hadoop environment [17]. The time in Hadoop is synchronized with the system time, and the time values are calculated in milliseconds using the time function in Hadoop. The data flow for two iterations of MapReduce in Hadoop is shown below in Fig. 1.
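The three phases above can be sketched in plain Python, without Hadoop, to make the control flow concrete. This is an illustration under stated assumptions, not the authors' implementation: the helper names (`map_count`, `reduce_sum`, `mine_frequent_itemsets`, `generate_rules`), the toy records and the thresholds are all hypothetical, the map and reduce steps are simulated in a single process, and candidates are formed naively from all combinations of still-frequent items.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Tuple

Itemset = FrozenSet[str]
Rule = Tuple[Itemset, Itemset, int, float]


def map_count(transaction: Itemset, candidates: Iterable[Itemset]) -> List[Tuple[Itemset, int]]:
    """Simulated mapper: emit <candidate, 1> for each candidate found in the transaction."""
    return [(c, 1) for c in candidates if c <= transaction]


def reduce_sum(pairs: Iterable[Tuple[Itemset, int]]) -> Dict[Itemset, int]:
    """Simulated reducer: sum the counts per key to obtain support values."""
    totals: Dict[Itemset, int] = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def mine_frequent_itemsets(transactions: List[Itemset], min_sup: int) -> Dict[Itemset, int]:
    """Phases 1 and 2: level-wise search from 1-itemsets up to n-itemsets."""
    items = sorted({item for t in transactions for item in t})
    candidates = [frozenset({i}) for i in items]
    frequent: Dict[Itemset, int] = {}
    size = 1
    while candidates:
        pairs = [p for t in transactions for p in map_count(t, candidates)]
        level = {k: v for k, v in reduce_sum(pairs).items() if v >= min_sup}
        if not level:
            break  # no frequent itemset of this size, so stop iterating
        frequent.update(level)
        size += 1
        # Naive candidate generation: combine the items that are still frequent.
        alive = sorted({item for itemset in level for item in itemset})
        candidates = [frozenset(c) for c in combinations(alive, size)]
    return frequent


def generate_rules(frequent: Dict[Itemset, int], min_conf: float) -> List[Rule]:
    """Phase 3: keep rules X -> Y with sup(X union Y) / sup(X) >= min_conf."""
    rules: List[Rule] = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(sorted(itemset), r)):
                conf = sup / frequent[antecedent]
                if conf >= min_conf:
                    rules.append((antecedent, itemset - antecedent, sup, conf))
    return rules


if __name__ == "__main__":
    # Hypothetical records already converted to "attribute=value" items.
    records = [
        frozenset({"protocol_type=tcp", "service=http", "flag=SF"}),
        frozenset({"protocol_type=tcp", "service=http", "flag=SF"}),
        frozenset({"protocol_type=udp", "service=domain_u", "flag=SF"}),
        frozenset({"protocol_type=tcp", "service=smtp", "flag=SF"}),
    ]
    found = mine_frequent_itemsets(records, min_sup=2)
    for lhs, rhs, sup, conf in generate_rules(found, min_conf=1.0):
        print(sorted(lhs), "->", sorted(rhs), "support:", sup, "confidence:", conf)
```

Setting `min_conf=1.0` mirrors the 100% confidence filter described in Phase 3. In a real Hadoop job each pass of the `while` loop would correspond to one MapReduce iteration over the data stored in HDFS.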

Fig. 1: Data flow showing two iterations of proposed method

First the dataset is read as input by the MapReduce code from HDFS storage, and each item is processed as a separate key to calculate the frequent 1-itemsets, as in Fig. 1. Then, using pairs of items from the frequent 1-itemsets, the frequent 2-itemsets are generated. This process is repeated for as many iterations as the number of itemsets needed. Fig. 1 shows the calculation up to 3-itemsets using MapReduce. The key used in the Mapper represents an n-itemset, where n is the number of items used to form the key. The flow of the proposed MapReduce framework is shown below in Fig. 2.

Fig. 2: Proposed MapReduce framework

During the MapReduce operation the input dataset or file is split into many sections in the Mapper phase, with each Mapper having a unique key. In ARM the key represents the items available within the dataset, and the value is the number of occurrences of the item in the dataset. Initially the count is set to 1 in the Mapper, and for each occurrence this count is incremented. Finally, in the Reducer the total number of occurrences is found by merging the counts, and the support and confidence are calculated. The output file consists of the list of rules generated based on the support and confidence.

IV. EXPERIMENTATION AND RESULTS

A. Dataset Description

The proposed approach for association rule mining is applied to the KDD CUP 99 data, and the simulation details are presented here. The KDD CUP 99 input dataset consists of records from four categories of attacks: Denial of Service, user-to-root, probing and remote-to-local. The dataset contains both labeled and unlabeled records, in which each labeled record consists of 41 attributes and one target attribute. The dataset consists of three groups of values: basic, content-based and time-based values. Not all the values are binary. The training set consists of almost 5 million instances of the input dataset. The description of the training set and test set is given below:

Training Set: Contains 494,021 connections or records with a total of 22 attack types.

Test Set: Contains 311,029 connections or records with 17 new attack types not available in the training data.

No.  Feature              No.  Feature
1    duration             22   is_guest_login
2    protocol_type        23   count
3    service              24   srv_count
4    flag                 25   serror_rate
5    src_bytes            26   srv_serror_rate
6    dst_bytes            27   rerror_rate
7    land                 28   srv_rerror_rate
8    wrong_fragment       29   same_srv_rate
9    urgent               30   diff_srv_rate
10   hot                  31   srv_diff_host_rate
11   num_failed_logins    32   dst_host_count
12   logged_in            33   dst_host_srv_count
13   num_compromised      34   dst_host_same_srv_rate
14   root_shell           35   dst_host_diff_srv_rate
15   su_attempted         36   dst_host_same_src_port_rate
16   num_root             37   dst_host_srv_diff_host_rate
17   num_file_creation    38   dst_host_serror_rate
18   num_shells           39   dst_host_srv_serror_rate
19   num_access_files     40   dst_host_rerror_rate
20   num_outbound_cmds    41   dst_host_srv_rerror_rate
21   is_host_login

Table 1: Features of the input dataset

The 41 features of the KDD CUP 99 dataset are shown in Table 1, and Fig. 3 shows sample values of the dataset. The values of features 1 to 41 are separated by commas (,) in the dataset, as shown in Fig. 3. That is, each instance or row of the dataset consists of 42 attributes, 41 feature attributes and one class attribute, all separated by commas. The row values are split so that each attribute can be read separately.

Fig. 3: KDD CUP 99 dataset sample values

B. Results and Discussion

The input dataset is split into many tasks by Map and Reduce in the Hadoop environment during execution. The input data is sent to the mapper, which splits the instances of the data into <key, value> pairs. The pairs are sorted and shuffled before they are sent to the reducer. The final result is obtained by reducing the <key, value> pairs, calculating support and confidence, and then selecting the rules based on those values. Based on this it is possible to identify whether the user of a specific instance or attack is a guest login or a host login. The support and confidence values obtained during the 4 levels of MapReduce operations are shown in Fig. 4.

Fig. 4: Support and Confidence

The execution of the MapReduce phase [18] in Hadoop and the final results of the reducer phase are shown in Fig. 5 and Fig. 6 respectively. Fig. 5 shows the execution of the Reducer phase while the output file is being generated, along with the final statistics of the MapReduce job. The generated output file is shown in Fig. 6.

Fig. 5: Mapper and Reducer execution

Fig. 6: Final output
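The mapper, sort/shuffle and reducer steps described in this subsection can be imitated in a few lines of plain Python. The sketch below is only an illustration under stated assumptions: the shortened `FEATURES` schema, the sample rows and the function names are hypothetical stand-ins (real KDD CUP 99 rows carry 41 features plus a class label), and the shuffle is simulated with an in-memory sort and `itertools.groupby` rather than by Hadoop.

```python
from itertools import groupby
from operator import itemgetter
from typing import Dict, Iterator, List, Tuple

# Hypothetical, shortened schema; real rows carry 41 features and a label.
FEATURES = ["protocol_type", "service", "flag", "label"]


def mapper(line: str) -> Iterator[Tuple[str, int]]:
    """Split one comma-separated row and emit <"feature=value", 1> pairs."""
    values = [v.strip() for v in line.split(",")]
    for name, value in zip(FEATURES, values):
        yield f"{name}={value}", 1


def shuffle(pairs: List[Tuple[str, int]]) -> Iterator[Tuple[str, List[int]]]:
    """Sort by key and group the values of equal keys, as the framework would."""
    ordered = sorted(pairs, key=itemgetter(0))
    for key, group in groupby(ordered, key=itemgetter(0)):
        yield key, [value for _, value in group]


def reducer(key: str, values: List[int]) -> Tuple[str, int]:
    """Sum the grouped counts to obtain the support count of the key."""
    return key, sum(values)


def run_job(lines: List[str], min_sup: int) -> Dict[str, int]:
    """Run map, shuffle and reduce, keeping keys that meet the minimum support."""
    pairs = [pair for line in lines for pair in mapper(line)]
    counts = dict(reducer(key, values) for key, values in shuffle(pairs))
    return {key: sup for key, sup in counts.items() if sup >= min_sup}


if __name__ == "__main__":
    rows = [
        "tcp,http,SF,normal.",
        "tcp,http,SF,normal.",
        "icmp,ecr_i,SF,smurf.",
    ]
    for key, sup in sorted(run_job(rows, min_sup=2).items()):
        print(key, sup)
```

Using the feature name as part of the key keeps equal values from different columns apart, which is one plausible way to form <key, value> pairs directly from the raw attributes without a binomial conversion.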

The final output in Fig. 6 lists all the frequent itemsets that are generated, with their support and confidence values next to them. The output format is <itemset, support, confidence>, and it is generated for all possible combinations of itemsets for the given input attributes. In this case the 2-itemsets are generated.

V. CONCLUSION AND FUTURE WORK

Association rule mining can be done effectively in distributed systems that support parallel execution, as in the Hadoop environment, because it can be scaled up to large volumes of data with less execution time and cost and with good accuracy. The algorithm proposed in this paper also considers the type of input data and can be applied to any data format. Dividing the input data into many splits and processing them on many nodes makes the execution easy. Management issues such as data transfer between nodes, storage of data, node failures and other issues within the cluster are all handled automatically by Hadoop, which makes the system efficient in terms of scalability and robustness. The proposed association rule mining algorithm inherits these features. In addition, the key-value pair approach makes the processing much easier than the existing binomial approach. However, the proposed algorithm is still not the best in performance when it comes to really large datasets. In future, fuzzy-based association rule mining can be done in Hadoop to handle data larger than that used in this paper. Further, the input data can be classified based on the calculated support and confidence values using a suitable classification algorithm. This work can also be extended to perform feature selection first, using information gain or mutual information [19], before implementing ARM.
REFERENCES

[1] Ashrafi, M.Z., Taniar, D., Smith, K., ODAM: An Optimized Distributed Association Rule Mining Algorithm, Distributed Systems Online, IEEE, Volume 5, Issue 3.
[2] R. Agrawal, R. Srikant, Fast Algorithms for Mining Association Rules, In Proceedings of International Conference on Very Large Data Bases, Santiago, Chile, September 1994.
[3] Jong Soo Park, Ming-Syan Chen, Philip S. Yu, An Effective Hash-based Algorithm for Mining Association Rules, In Proceedings of the ACM SIGMOD International Conference on Management of Data, Michael Carey and Donovan Schneider, ACM.
[4] Ozel, S.A., Guvenir, H.A., An Algorithm for Mining Association Rules using Perfect Hashing and Database Pruning, 10th Turkish Symposium on Artificial Intelligence and Neural Networks, Gazimagusa, Springer.
[5] Karam Gouda, Mohammed Javeed Zaki, Efficiently Mining Maximal Frequent Itemsets, In Proceedings of the IEEE International Conference on Data Mining, November 29-December 02.
[6] J. Han, J. Pei, Y. Yin, Mining Frequent Patterns without Candidate Generation, ACM SIGMOD International Conference, Dallas, 2000.
[7] D.W. Cheung, Jiawei Han, V.T. Ng, A.W. Fu, Yongjian Fu, A Fast Distributed Algorithm for Mining Association Rules, In Proceedings of International Conference on Parallel and Distributed Information Systems, IEEE CS Press.
[8] Ansari E., Dastghaibifard G., Keshtkaran M., Kaabi H., Distributed Frequent Itemset Mining using Trie Data Structure, International Journal of Computer Science, Volume 35, Issue 3.
[9] Park, J.S., Chen, M.S., Yu, P.S., Efficient Parallel Data Mining for Association Rules, In Proceedings of the Fourth International Conference on Information and Knowledge Management, pp. 31-33.
[10] Woo, J., Xu, Y., Market Basket Analysis Algorithm with Map/Reduce of Cloud Computing, In Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications.
[11] Lin, Ming-Yen, Pei-Yu Lee, Sue-Chen Hsueh, Apriori-based Frequent Itemset Mining Algorithms on MapReduce, In Proceedings of the 6th International Conference on Ubiquitous Information Management and Communication, ACM.
[12] Peddi Kishor, Sammulal Porika, Literature Survey on Association Rule Discovery in Data Mining, International Journal of Computer Science and Management Research, Volume 2, Issue 1, January.
[13] Zhang C.S., Li Z.Y., Zheng D.S., An Improved Algorithm for Apriori, In Proceedings of the 1st International Workshop on Education Technology and Computer Science, Volume 1.
[14] C. Jin, C. Vecchiola, R. Buyya, MRPGA: An Extension of MapReduce for Parallelizing Genetic Algorithms, Fourth IEEE International Conference on eScience.
[15] T. Elsayed, J. Lin, Douglas W. Oard, Pairwise Document Similarity in Large Collections with MapReduce, In Proceedings of 32nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval.
[16] J.H.C. Yeung, C.C. Tsang, K.H. Tsoi, B. Kwan, C. Cheung, A.P.C. Chan, P.H.W. Leong, Map-reduce as a Programming Model for Custom Computing Machines, In Proceedings of the 16th IEEE Symposium on Field-Programmable Custom Computing Machines.
[17] M. Zaharia, A. Konwinski, A.D. Joseph, R. Katz, I. Stoica, Improving MapReduce Performance in Heterogeneous Environments, EECS Department, University of California, Berkeley, Technical Report Number UCB/EECS, August 19.
[18] Mohammadhossein Barkhordari, Mahdi Niamanesh, ScadiBino: An Effective MapReduce-based Association Rule Mining Method, ACM 16th International Conference on Electronic Commerce, August.
[19] P. Ganesh Kumar, D. Devaraj, Intrusion Detection using Artificial Neural Network with Reduced Input Features, International Journal on Soft Computing, ICTACT, Issue 1, July.


More information

Improved Frequent Pattern Mining Algorithm with Indexing

Improved Frequent Pattern Mining Algorithm with Indexing IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661,p-ISSN: 2278-8727, Volume 16, Issue 6, Ver. VII (Nov Dec. 2014), PP 73-78 Improved Frequent Pattern Mining Algorithm with Indexing Prof.

More information

Infrequent Weighted Itemset Mining Using SVM Classifier in Transaction Dataset

Infrequent Weighted Itemset Mining Using SVM Classifier in Transaction Dataset Infrequent Weighted Itemset Mining Using SVM Classifier in Transaction Dataset M.Hamsathvani 1, D.Rajeswari 2 M.E, R.Kalaiselvi 3 1 PG Scholar(M.E), Angel College of Engineering and Technology, Tiruppur,

More information

An Efficient Decision Tree Model for Classification of Attacks with Feature Selection

An Efficient Decision Tree Model for Classification of Attacks with Feature Selection An Efficient Decision Tree Model for Classification of Attacks with Feature Selection Akhilesh Kumar Shrivas Research Scholar, CVRU, Bilaspur (C.G.), India S. K. Singhai Govt. Engineering College Bilaspur

More information

Efficient Algorithm for Frequent Itemset Generation in Big Data

Efficient Algorithm for Frequent Itemset Generation in Big Data Efficient Algorithm for Frequent Itemset Generation in Big Data Anbumalar Smilin V, Siddique Ibrahim S.P, Dr.M.Sivabalakrishnan P.G. Student, Department of Computer Science and Engineering, Kumaraguru

More information

Document Clustering with Map Reduce using Hadoop Framework

Document Clustering with Map Reduce using Hadoop Framework Document Clustering with Map Reduce using Hadoop Framework Satish Muppidi* Department of IT, GMRIT, Rajam, AP, India msatishmtech@gmail.com M. Ramakrishna Murty Department of CSE GMRIT, Rajam, AP, India

More information

Available online at ScienceDirect. Procedia Computer Science 79 (2016 )

Available online at   ScienceDirect. Procedia Computer Science 79 (2016 ) Available online at www.sciencedirect.com ScienceDirect Procedia Computer Science 79 (2016 ) 207 214 7th International Conference on Communication, Computing and Virtualization 2016 An Improved PrePost

More information

System Health Monitoring and Reactive Measures Activation

System Health Monitoring and Reactive Measures Activation System Health Monitoring and Reactive Measures Activation Alireza Shameli Sendi Michel Dagenais Department of Computer and Software Engineering December 10, 2009 École Polytechnique, Montreal Content Definition,

More information

Why Machine Learning Algorithms Fail in Misuse Detection on KDD Intrusion Detection Data Set

Why Machine Learning Algorithms Fail in Misuse Detection on KDD Intrusion Detection Data Set Why Machine Learning Algorithms Fail in Misuse Detection on KDD Intrusion Detection Data Set Maheshkumar Sabhnani and Gursel Serpen Electrical Engineering and Computer Science Department The University

More information

NAVAL POSTGRADUATE SCHOOL THESIS

NAVAL POSTGRADUATE SCHOOL THESIS NAVAL POSTGRADUATE SCHOOL MONTEREY, CALIFORNIA THESIS NEURAL DETECTION OF MALICIOUS NETWORK ACTIVITIES USING A NEW DIRECT PARSING AND FEATURE EXTRACTION TECHNIQUE by Cheng Hong Low September 2015 Thesis

More information

The Transpose Technique to Reduce Number of Transactions of Apriori Algorithm

The Transpose Technique to Reduce Number of Transactions of Apriori Algorithm The Transpose Technique to Reduce Number of Transactions of Apriori Algorithm Narinder Kumar 1, Anshu Sharma 2, Sarabjit Kaur 3 1 Research Scholar, Dept. Of Computer Science & Engineering, CT Institute

More information

Mining of Web Server Logs using Extended Apriori Algorithm

Mining of Web Server Logs using Extended Apriori Algorithm International Association of Scientific Innovation and Research (IASIR) (An Association Unifying the Sciences, Engineering, and Applied Research) International Journal of Emerging Technologies in Computational

More information

High Performance Computing on MapReduce Programming Framework

High Performance Computing on MapReduce Programming Framework International Journal of Private Cloud Computing Environment and Management Vol. 2, No. 1, (2015), pp. 27-32 http://dx.doi.org/10.21742/ijpccem.2015.2.1.04 High Performance Computing on MapReduce Programming

More information

Anomaly detection using machine learning techniques. A comparison of classification algorithms

Anomaly detection using machine learning techniques. A comparison of classification algorithms Anomaly detection using machine learning techniques A comparison of classification algorithms Henrik Hivand Volden Master s Thesis Spring 2016 Anomaly detection using machine learning techniques Henrik

More information

Open Access Apriori Algorithm Research Based on Map-Reduce in Cloud Computing Environments

Open Access Apriori Algorithm Research Based on Map-Reduce in Cloud Computing Environments Send Orders for Reprints to reprints@benthamscience.ae 368 The Open Automation and Control Systems Journal, 2014, 6, 368-373 Open Access Apriori Algorithm Research Based on Map-Reduce in Cloud Computing

More information

Classification of Attacks in Data Mining

Classification of Attacks in Data Mining Classification of Attacks in Data Mining Bhavneet Kaur Department of Computer Science and Engineering GTBIT, New Delhi, Delhi, India Abstract- Intrusion Detection and data mining are the major part of

More information

Journal of Asian Scientific Research EFFICIENCY OF SVM AND PCA TO ENHANCE INTRUSION DETECTION SYSTEM. Soukaena Hassan Hashem

Journal of Asian Scientific Research EFFICIENCY OF SVM AND PCA TO ENHANCE INTRUSION DETECTION SYSTEM. Soukaena Hassan Hashem Journal of Asian Scientific Research journal homepage: http://aessweb.com/journal-detail.php?id=5003 EFFICIENCY OF SVM AND PCA TO ENHANCE INTRUSION DETECTION SYSTEM Soukaena Hassan Hashem Computer Science

More information

Mining Distributed Frequent Itemset with Hadoop

Mining Distributed Frequent Itemset with Hadoop Mining Distributed Frequent Itemset with Hadoop Ms. Poonam Modgi, PG student, Parul Institute of Technology, GTU. Prof. Dinesh Vaghela, Parul Institute of Technology, GTU. Abstract: In the current scenario

More information

DMSA TECHNIQUE FOR FINDING SIGNIFICANT PATTERNS IN LARGE DATABASE

DMSA TECHNIQUE FOR FINDING SIGNIFICANT PATTERNS IN LARGE DATABASE DMSA TECHNIQUE FOR FINDING SIGNIFICANT PATTERNS IN LARGE DATABASE Saravanan.Suba Assistant Professor of Computer Science Kamarajar Government Art & Science College Surandai, TN, India-627859 Email:saravanansuba@rediffmail.com

More information

Distributed Face Recognition Using Hadoop

Distributed Face Recognition Using Hadoop Distributed Face Recognition Using Hadoop A. Thorat, V. Malhotra, S. Narvekar and A. Joshi Dept. of Computer Engineering and IT College of Engineering, Pune {abhishekthorat02@gmail.com, vinayak.malhotra20@gmail.com,

More information

PSON: A Parallelized SON Algorithm with MapReduce for Mining Frequent Sets

PSON: A Parallelized SON Algorithm with MapReduce for Mining Frequent Sets 2011 Fourth International Symposium on Parallel Architectures, Algorithms and Programming PSON: A Parallelized SON Algorithm with MapReduce for Mining Frequent Sets Tao Xiao Chunfeng Yuan Yihua Huang Department

More information

FREQUENT PATTERN MINING IN BIG DATA USING MAVEN PLUGIN. School of Computing, SASTRA University, Thanjavur , India

FREQUENT PATTERN MINING IN BIG DATA USING MAVEN PLUGIN. School of Computing, SASTRA University, Thanjavur , India Volume 115 No. 7 2017, 105-110 ISSN: 1311-8080 (printed version); ISSN: 1314-3395 (on-line version) url: http://www.ijpam.eu ijpam.eu FREQUENT PATTERN MINING IN BIG DATA USING MAVEN PLUGIN Balaji.N 1,

More information

An Improved Performance Evaluation on Large-Scale Data using MapReduce Technique

An Improved Performance Evaluation on Large-Scale Data using MapReduce Technique Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology ISSN 2320 088X IMPACT FACTOR: 6.017 IJCSMC,

More information

An Improved Apriori Algorithm for Association Rules

An Improved Apriori Algorithm for Association Rules Research article An Improved Apriori Algorithm for Association Rules Hassan M. Najadat 1, Mohammed Al-Maolegi 2, Bassam Arkok 3 Computer Science, Jordan University of Science and Technology, Irbid, Jordan

More information

An Efficient Algorithm for Finding the Support Count of Frequent 1-Itemsets in Frequent Pattern Mining

An Efficient Algorithm for Finding the Support Count of Frequent 1-Itemsets in Frequent Pattern Mining An Efficient Algorithm for Finding the Support Count of Frequent 1-Itemsets in Frequent Pattern Mining P.Subhashini 1, Dr.G.Gunasekaran 2 Research Scholar, Dept. of Information Technology, St.Peter s University,

More information

CATEGORIZATION OF THE DOCUMENTS BY USING MACHINE LEARNING

CATEGORIZATION OF THE DOCUMENTS BY USING MACHINE LEARNING CATEGORIZATION OF THE DOCUMENTS BY USING MACHINE LEARNING Amol Jagtap ME Computer Engineering, AISSMS COE Pune, India Email: 1 amol.jagtap55@gmail.com Abstract Machine learning is a scientific discipline

More information

DISCOVERING ACTIVE AND PROFITABLE PATTERNS WITH RFM (RECENCY, FREQUENCY AND MONETARY) SEQUENTIAL PATTERN MINING A CONSTRAINT BASED APPROACH

DISCOVERING ACTIVE AND PROFITABLE PATTERNS WITH RFM (RECENCY, FREQUENCY AND MONETARY) SEQUENTIAL PATTERN MINING A CONSTRAINT BASED APPROACH International Journal of Information Technology and Knowledge Management January-June 2011, Volume 4, No. 1, pp. 27-32 DISCOVERING ACTIVE AND PROFITABLE PATTERNS WITH RFM (RECENCY, FREQUENCY AND MONETARY)

More information

Association Rules Mining using BOINC based Enterprise Desktop Grid

Association Rules Mining using BOINC based Enterprise Desktop Grid Association Rules Mining using BOINC based Enterprise Desktop Grid Evgeny Ivashko and Alexander Golovin Institute of Applied Mathematical Research, Karelian Research Centre of Russian Academy of Sciences,

More information

INTERNATIONAL JOURNAL OF COMPUTER ENGINEERING & TECHNOLOGY (IJCET)

INTERNATIONAL JOURNAL OF COMPUTER ENGINEERING & TECHNOLOGY (IJCET) INTERNATIONAL JOURNAL OF COMPUTER ENGINEERING & TECHNOLOGY (IJCET) International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-6367(Print), ISSN 0976 6367(Print) ISSN 0976 6375(Online)

More information

An Algorithm for Frequent Pattern Mining Based On Apriori

An Algorithm for Frequent Pattern Mining Based On Apriori An Algorithm for Frequent Pattern Mining Based On Goswami D.N.*, Chaturvedi Anshu. ** Raghuvanshi C.S.*** *SOS In Computer Science Jiwaji University Gwalior ** Computer Application Department MITS Gwalior

More information

ADAPTIVE HANDLING OF 3V S OF BIG DATA TO IMPROVE EFFICIENCY USING HETEROGENEOUS CLUSTERS

ADAPTIVE HANDLING OF 3V S OF BIG DATA TO IMPROVE EFFICIENCY USING HETEROGENEOUS CLUSTERS INTERNATIONAL JOURNAL OF RESEARCH IN COMPUTER APPLICATIONS AND ROBOTICS ISSN 2320-7345 ADAPTIVE HANDLING OF 3V S OF BIG DATA TO IMPROVE EFFICIENCY USING HETEROGENEOUS CLUSTERS Radhakrishnan R 1, Karthik

More information

A Technical Analysis of Market Basket by using Association Rule Mining and Apriori Algorithm

A Technical Analysis of Market Basket by using Association Rule Mining and Apriori Algorithm A Technical Analysis of Market Basket by using Association Rule Mining and Apriori Algorithm S.Pradeepkumar*, Mrs.C.Grace Padma** M.Phil Research Scholar, Department of Computer Science, RVS College of

More information

APRIORI ALGORITHM FOR MINING FREQUENT ITEMSETS A REVIEW

APRIORI ALGORITHM FOR MINING FREQUENT ITEMSETS A REVIEW International Journal of Computer Application and Engineering Technology Volume 3-Issue 3, July 2014. Pp. 232-236 www.ijcaet.net APRIORI ALGORITHM FOR MINING FREQUENT ITEMSETS A REVIEW Priyanka 1 *, Er.

More information

Available online at ScienceDirect. Procedia Computer Science 89 (2016 )

Available online at  ScienceDirect. Procedia Computer Science 89 (2016 ) Available online at www.sciencedirect.com ScienceDirect Procedia Computer Science 89 (2016 ) 341 348 Twelfth International Multi-Conference on Information Processing-2016 (IMCIP-2016) Parallel Approach

More information

IMPLEMENTATION AND COMPARATIVE STUDY OF IMPROVED APRIORI ALGORITHM FOR ASSOCIATION PATTERN MINING

IMPLEMENTATION AND COMPARATIVE STUDY OF IMPROVED APRIORI ALGORITHM FOR ASSOCIATION PATTERN MINING IMPLEMENTATION AND COMPARATIVE STUDY OF IMPROVED APRIORI ALGORITHM FOR ASSOCIATION PATTERN MINING 1 SONALI SONKUSARE, 2 JAYESH SURANA 1,2 Information Technology, R.G.P.V., Bhopal Shri Vaishnav Institute

More information

Mining Interesting Infrequent Itemsets from Very Large Data based on MapReduce Framework

Mining Interesting Infrequent Itemsets from Very Large Data based on MapReduce Framework I.J. Intelligent Systems and Applications, 2015, 07, 44-49 Published Online June 2015 in MECS (http://www.mecs-press.org/) DOI: 10.5815/ijisa.2015.07.06 Mining Interesting Infrequent Itemsets from Very

More information

Classifying Network Intrusions: A Comparison of Data Mining Methods

Classifying Network Intrusions: A Comparison of Data Mining Methods Association for Information Systems AIS Electronic Library (AISeL) AMCIS 2005 Proceedings Americas Conference on Information Systems (AMCIS) 2005 Classifying Network Intrusions: A Comparison of Data Mining

More information

Adaptive Framework for Network Intrusion Detection by Using Genetic-Based Machine Learning Algorithm

Adaptive Framework for Network Intrusion Detection by Using Genetic-Based Machine Learning Algorithm IJCSNS International Journal of Computer Science and Network Security, VOL.9 No.4, April 2009 55 Adaptive Framework for Network Intrusion Detection by Using Genetic-Based Machine Learning Algorithm Wafa'

More information

MAPREDUCE FOR BIG DATA PROCESSING BASED ON NETWORK TRAFFIC PERFORMANCE Rajeshwari Adrakatti

MAPREDUCE FOR BIG DATA PROCESSING BASED ON NETWORK TRAFFIC PERFORMANCE Rajeshwari Adrakatti International Journal of Computer Engineering and Applications, ICCSTAR-2016, Special Issue, May.16 MAPREDUCE FOR BIG DATA PROCESSING BASED ON NETWORK TRAFFIC PERFORMANCE Rajeshwari Adrakatti 1 Department

More information

PAYLOAD BASED INTERNET WORM DETECTION USING NEURAL NETWORK CLASSIFIER

PAYLOAD BASED INTERNET WORM DETECTION USING NEURAL NETWORK CLASSIFIER PAYLOAD BASED INTERNET WORM DETECTION USING NEURAL NETWORK CLASSIFIER A.Tharani MSc (CS) M.Phil. Research Scholar Full Time B.Leelavathi, MCA, MPhil., Assistant professor, Dept. of information technology,

More information

Temporal Weighted Association Rule Mining for Classification

Temporal Weighted Association Rule Mining for Classification Temporal Weighted Association Rule Mining for Classification Purushottam Sharma and Kanak Saxena Abstract There are so many important techniques towards finding the association rules. But, when we consider

More information

A Back Propagation Neural Network Intrusion Detection System Based on KVM

A Back Propagation Neural Network Intrusion Detection System Based on KVM International Journal of Innovation Engineering and Science Research Open Access A Back Propagation Neural Network Intrusion Detection System Based on KVM ABSTRACT Jiazuo Wang Computer Science Department,

More information

Network Anomaly Detection using Co-clustering

Network Anomaly Detection using Co-clustering Network Anomaly Detection using Co-clustering Evangelos E. Papalexakis, Alex Beutel, Peter Steenkiste Department of Electrical & Computer Engineering School of Computer Science Carnegie Mellon University,

More information

I. INTRODUCTION. Keywords : Spatial Data Mining, Association Mining, FP-Growth Algorithm, Frequent Data Sets

I. INTRODUCTION. Keywords : Spatial Data Mining, Association Mining, FP-Growth Algorithm, Frequent Data Sets 2017 IJSRSET Volume 3 Issue 5 Print ISSN: 2395-1990 Online ISSN : 2394-4099 Themed Section: Engineering and Technology Emancipation of FP Growth Algorithm using Association Rules on Spatial Data Sets Sudheer

More information

Data Clustering on the Parallel Hadoop MapReduce Model. Dimitrios Verraros

Data Clustering on the Parallel Hadoop MapReduce Model. Dimitrios Verraros Data Clustering on the Parallel Hadoop MapReduce Model Dimitrios Verraros Overview The purpose of this thesis is to implement and benchmark the performance of a parallel K- means clustering algorithm on

More information

Uday Kumar Sr 1, Naveen D Chandavarkar 2 1 PG Scholar, Assistant professor, Dept. of CSE, NMAMIT, Nitte, India. IJRASET 2015: All Rights are Reserved

Uday Kumar Sr 1, Naveen D Chandavarkar 2 1 PG Scholar, Assistant professor, Dept. of CSE, NMAMIT, Nitte, India. IJRASET 2015: All Rights are Reserved Implementation of K-Means Clustering Algorithm in Hadoop Framework Uday Kumar Sr 1, Naveen D Chandavarkar 2 1 PG Scholar, Assistant professor, Dept. of CSE, NMAMIT, Nitte, India Abstract Drastic growth

More information

An Efficient Algorithm for finding high utility itemsets from online sell

An Efficient Algorithm for finding high utility itemsets from online sell An Efficient Algorithm for finding high utility itemsets from online sell Sarode Nutan S, Kothavle Suhas R 1 Department of Computer Engineering, ICOER, Maharashtra, India 2 Department of Computer Engineering,

More information

SEQUENTIAL PATTERN MINING FROM WEB LOG DATA

SEQUENTIAL PATTERN MINING FROM WEB LOG DATA SEQUENTIAL PATTERN MINING FROM WEB LOG DATA Rajashree Shettar 1 1 Associate Professor, Department of Computer Science, R. V College of Engineering, Karnataka, India, rajashreeshettar@rvce.edu.in Abstract

More information

CHAPTER 2 DARPA KDDCUP99 DATASET

CHAPTER 2 DARPA KDDCUP99 DATASET 44 CHAPTER 2 DARPA KDDCUP99 DATASET 2.1 THE DARPA INTRUSION-DETECTION EVALUATION PROGRAM The number of intrusions is to be found in any computer and network audit data are plentiful as well as ever-changing.

More information

Pamba Pravallika 1, K. Narendra 2

Pamba Pravallika 1, K. Narendra 2 2018 IJSRSET Volume 4 Issue 1 Print ISSN: 2395-1990 Online ISSN : 2394-4099 Themed Section : Engineering and Technology Analysis on Medical Data sets using Apriori Algorithm Based on Association Rules

More information

CLUSTERING BIG DATA USING NORMALIZATION BASED k-means ALGORITHM

CLUSTERING BIG DATA USING NORMALIZATION BASED k-means ALGORITHM Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology ISSN 2320 088X IMPACT FACTOR: 5.258 IJCSMC,

More information

AN IMPROVISED FREQUENT PATTERN TREE BASED ASSOCIATION RULE MINING TECHNIQUE WITH MINING FREQUENT ITEM SETS ALGORITHM AND A MODIFIED HEADER TABLE

AN IMPROVISED FREQUENT PATTERN TREE BASED ASSOCIATION RULE MINING TECHNIQUE WITH MINING FREQUENT ITEM SETS ALGORITHM AND A MODIFIED HEADER TABLE AN IMPROVISED FREQUENT PATTERN TREE BASED ASSOCIATION RULE MINING TECHNIQUE WITH MINING FREQUENT ITEM SETS ALGORITHM AND A MODIFIED HEADER TABLE Vandit Agarwal 1, Mandhani Kushal 2 and Preetham Kumar 3

More information

Mining Frequent Itemsets Along with Rare Itemsets Based on Categorical Multiple Minimum Support

Mining Frequent Itemsets Along with Rare Itemsets Based on Categorical Multiple Minimum Support IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661,p-ISSN: 2278-8727, Volume 18, Issue 6, Ver. IV (Nov.-Dec. 2016), PP 109-114 www.iosrjournals.org Mining Frequent Itemsets Along with Rare

More information

I ++ Mapreduce: Incremental Mapreduce for Mining the Big Data

I ++ Mapreduce: Incremental Mapreduce for Mining the Big Data IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661,p-ISSN: 2278-8727, Volume 18, Issue 3, Ver. IV (May-Jun. 2016), PP 125-129 www.iosrjournals.org I ++ Mapreduce: Incremental Mapreduce for

More information

Chapter 28. Outline. Definitions of Data Mining. Data Mining Concepts

Chapter 28. Outline. Definitions of Data Mining. Data Mining Concepts Chapter 28 Data Mining Concepts Outline Data Mining Data Warehousing Knowledge Discovery in Databases (KDD) Goals of Data Mining and Knowledge Discovery Association Rules Additional Data Mining Algorithms

More information

A Hierarchical Document Clustering Approach with Frequent Itemsets

A Hierarchical Document Clustering Approach with Frequent Itemsets A Hierarchical Document Clustering Approach with Frequent Itemsets Cheng-Jhe Lee, Chiun-Chieh Hsu, and Da-Ren Chen Abstract In order to effectively retrieve required information from the large amount of

More information

Twitter data Analytics using Distributed Computing

Twitter data Analytics using Distributed Computing Twitter data Analytics using Distributed Computing Uma Narayanan Athrira Unnikrishnan Dr. Varghese Paul Dr. Shelbi Joseph Research Scholar M.tech Student Professor Assistant Professor Dept. of IT, SOE

More information

Utility Mining Algorithm for High Utility Item sets from Transactional Databases

Utility Mining Algorithm for High Utility Item sets from Transactional Databases IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661, p- ISSN: 2278-8727Volume 16, Issue 2, Ver. V (Mar-Apr. 2014), PP 34-40 Utility Mining Algorithm for High Utility Item sets from Transactional

More information

The Caspian Sea Journal ISSN: A Study on Improvement of Intrusion Detection Systems in Computer Networks via GNMF Method

The Caspian Sea Journal ISSN: A Study on Improvement of Intrusion Detection Systems in Computer Networks via GNMF Method Available online at http://www.csjonline.org/ The Caspian Sea Journal ISSN: 1578-7899 Volume 10, Issue 1, Supplement 4 (2016) 456-461 A Study on Improvement of Intrusion Detection Systems in Computer Networks

More information

Global Journal of Engineering Science and Research Management

Global Journal of Engineering Science and Research Management A FUNDAMENTAL CONCEPT OF MAPREDUCE WITH MASSIVE FILES DATASET IN BIG DATA USING HADOOP PSEUDO-DISTRIBUTION MODE K. Srikanth*, P. Venkateswarlu, Ashok Suragala * Department of Information Technology, JNTUK-UCEV

More information

Performance Analysis of Apriori Algorithm with Progressive Approach for Mining Data

Performance Analysis of Apriori Algorithm with Progressive Approach for Mining Data Performance Analysis of Apriori Algorithm with Progressive Approach for Mining Data Shilpa Department of Computer Science & Engineering Haryana College of Technology & Management, Kaithal, Haryana, India

More information

An Evolutionary Algorithm for Mining Association Rules Using Boolean Approach

An Evolutionary Algorithm for Mining Association Rules Using Boolean Approach An Evolutionary Algorithm for Mining Association Rules Using Boolean Approach ABSTRACT G.Ravi Kumar 1 Dr.G.A. Ramachandra 2 G.Sunitha 3 1. Research Scholar, Department of Computer Science &Technology,

More information

A Survey on Apriori algorithm using MapReduce Technique

A Survey on Apriori algorithm using MapReduce Technique A Survey on Apriori algorithm using MapReduce Technique Mr. Kiran C. Kulkarni 1, Mr.R.S.Jagale 2, Prof.S.M.Rokade 3 1 PG Student, Computer Dept., SVIT COE, Nasik, Maharashtra, India 2 PG Student, Computer

More information

DATA REDUCTION TECHNIQUES TO ANALYZE NSL-KDD DATASET

DATA REDUCTION TECHNIQUES TO ANALYZE NSL-KDD DATASET INTERNATIONAL JOURNAL OF COMPUTER ENGINEERING & TECHNOLOGY (IJCET) International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-6367(Print), ISSN 0976 6367(Print) ISSN 0976 6375(Online)

More information

ABSTRACT I. INTRODUCTION

ABSTRACT I. INTRODUCTION International Journal of Scientific Research in Computer Science, Engineering and Information Technology 2017 IJSRCSEIT Volume 2 Issue 5 ISSN : 2456-3307 Mapreduce Based Pattern Mining Algorithm In Distributed

More information

Review on Data Mining Techniques for Intrusion Detection System

Review on Data Mining Techniques for Intrusion Detection System Review on Data Mining Techniques for Intrusion Detection System Sandeep D 1, M. S. Chaudhari 2 Research Scholar, Dept. of Computer Science, P.B.C.E, Nagpur, India 1 HoD, Dept. of Computer Science, P.B.C.E,

More information

Genetic-fuzzy rule mining approach and evaluation of feature selection techniques for anomaly intrusion detection

Genetic-fuzzy rule mining approach and evaluation of feature selection techniques for anomaly intrusion detection Pattern Recognition 40 (2007) 2373 2391 www.elsevier.com/locate/pr Genetic-fuzzy rule mining approach and evaluation of feature selection techniques for anomaly intrusion detection Chi-Ho Tsang, Sam Kwong,

More information

Web Page Classification using FP Growth Algorithm Akansha Garg,Computer Science Department Swami Vivekanad Subharti University,Meerut, India

Web Page Classification using FP Growth Algorithm Akansha Garg,Computer Science Department Swami Vivekanad Subharti University,Meerut, India Web Page Classification using FP Growth Algorithm Akansha Garg,Computer Science Department Swami Vivekanad Subharti University,Meerut, India Abstract - The primary goal of the web site is to provide the

More information

Comparative Analysis of K means Clustering Sequentially And Parallely

Comparative Analysis of K means Clustering Sequentially And Parallely Comparative Analysis of K means Clustering Sequentially And Parallely Kavya D S 1, Chaitra D Desai 2 1 M.tech, Computer Science and Engineering, REVA ITM, Bangalore, India 2 REVA ITM, Bangalore, India

More information

Research and Improvement of Apriori Algorithm Based on Hadoop

Research and Improvement of Apriori Algorithm Based on Hadoop Research and Improvement of Apriori Algorithm Based on Hadoop Gao Pengfei a, Wang Jianguo b and Liu Pengcheng c School of Computer Science and Engineering Xi'an Technological University Xi'an, 710021,

More information

Data Mining Concepts

Data Mining Concepts Data Mining Concepts Outline Data Mining Data Warehousing Knowledge Discovery in Databases (KDD) Goals of Data Mining and Knowledge Discovery Association Rules Additional Data Mining Algorithms Sequential

More information

Gurpreet Kaur 1, Naveen Aggarwal 2 1,2

Gurpreet Kaur 1, Naveen Aggarwal 2 1,2 Association Rule Mining in XML databases: Performance Evaluation and Analysis Gurpreet Kaur 1, Naveen Aggarwal 2 1,2 Department of Computer Science & Engineering, UIET Panjab University Chandigarh. E-mail

More information

Comparing the Performance of Frequent Itemsets Mining Algorithms

Comparing the Performance of Frequent Itemsets Mining Algorithms Comparing the Performance of Frequent Itemsets Mining Algorithms Kalash Dave 1, Mayur Rathod 2, Parth Sheth 3, Avani Sakhapara 4 UG Student, Dept. of I.T., K.J.Somaiya College of Engineering, Mumbai, India

More information

Introduction to Data Mining and Data Analytics

Introduction to Data Mining and Data Analytics 1/28/2016 MIST.7060 Data Analytics 1 Introduction to Data Mining and Data Analytics What Are Data Mining and Data Analytics? Data mining is the process of discovering hidden patterns in data, where Patterns

More information

A New Approach to Web Data Mining Based on Cloud Computing

A New Approach to Web Data Mining Based on Cloud Computing Regular Paper Journal of Computing Science and Engineering, Vol. 8, No. 4, December 2014, pp. 181-186 A New Approach to Web Data Mining Based on Cloud Computing Wenzheng Zhu* and Changhoon Lee School of

More information

Modeling An Intrusion Detection System Using Data Mining And Genetic Algorithms Based On Fuzzy Logic

Modeling An Intrusion Detection System Using Data Mining And Genetic Algorithms Based On Fuzzy Logic IJCSNS International Journal of Computer Science and Network Security, VOL.8 No.7, July 2008 39 Modeling An Intrusion Detection System Using Data Mining And Genetic Algorithms Based On Fuzzy Logic G.V.S.N.R.V.Prasad

More information

A Survey on Moving Towards Frequent Pattern Growth for Infrequent Weighted Itemset Mining

A Survey on Moving Towards Frequent Pattern Growth for Infrequent Weighted Itemset Mining A Survey on Moving Towards Frequent Pattern Growth for Infrequent Weighted Itemset Mining Miss. Rituja M. Zagade Computer Engineering Department,JSPM,NTC RSSOER,Savitribai Phule Pune University Pune,India

More information

Keywords Hadoop, Map Reduce, K-Means, Data Analysis, Storage, Clusters.

Keywords Hadoop, Map Reduce, K-Means, Data Analysis, Storage, Clusters. Volume 6, Issue 3, March 2016 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Special Issue

More information