Association Rule Mining in Big Data using MapReduce Approach in Hadoop
GRD Journals | Global Research and Development Journal for Engineering | International Conference on Innovations in Engineering and Technology (ICIET) | July 2016 | e-ISSN:

1 J. Jenifer Nancy, 2 M. Jansi Rani, 3 Dr. D. Devaraj
1 P.G. Scholar, 2 Assistant Professor, 3 Senior Professor and H.O.D.
1,2 Department of Computer Science & Engineering, 3 Department of Electrical and Electronics Engineering
1,2,3 Kalasalingam University, Krishnankovil, India

Abstract

Association rule mining is an important task in data mining. In the case of big data, the large volume of data makes it impossible to generate rules quickly on a single machine. By making use of parallel execution in Hadoop with the MapReduce framework, the rules can be generated much faster and more efficiently. The existing method transforms the input dataset into a binomial representation before processing it with MapReduce. But binomial conversion is not user-friendly, since it is complex in the case of continuous values. In this paper, an improved and scalable algorithm is proposed for association rule mining that converts the input dataset into key-value pairs instead of a binomial representation. All the stages of the proposed association rule mining algorithm are parallelized using MapReduce. The proposed algorithm works on high cardinality features, so no dimension detection is needed.

Keywords: Hadoop; MapReduce; Association rule mining; Data mining; Big data

I. INTRODUCTION

A. Big Data and Characteristics

Data is collected and stored every minute, every hour and every day in an organization or institute, and it is available in large quantities. What matters, however, is not the amount of data but what organizations do with it to identify information that is useful to them. This is done by analyzing the data to find insights or critical information that can help the organization make useful decisions for its growth.
The term big data describes a large volume of data that is available in both structured and unstructured formats. Even though big data is a new term, the process of collecting data, storing it in large amounts and analyzing it to gather new information has been carried out since long before the term came into use. The characteristics of big data can be explained using the three Vs: (1) Volume, (2) Velocity and (3) Variety. The applications of big data include areas such as health care, telecom and finance. In this paper the process of association rule generation in big data is discussed, and an association rule mining technique is proposed to generate rules from the KDD CUP 99 dataset.

B. Data Mining in Big Data

Big data mining deals with large amounts of data stored in data warehouses and databases. It can be used to extract or identify interesting patterns and information from these large data. Many data mining techniques can be applied to big data: classification, clustering, association rules, prediction, estimation, documentation and description. These techniques have been researched extensively for a long time. Many algorithms have been applied within each data mining technique, and this also holds for big data. One well-known technique is association rule mining in big data. It is an efficient data mining technique used to discover hidden patterns and information in large databases. Here the relationships between the various attributes of the data are identified using the association rule mining algorithm. Some basic types of association rule mining algorithms are the Apriori algorithm, distributed algorithms and parallel algorithms.

C. Association Rule Mining

Association Rule Mining (ARM) [1] is a popular data mining approach used to analyse a given dataset and discover interesting patterns or relationships between the items in the dataset. The concept of strong association rules was first used by Agrawal et al. [2] to identify association rules between the items sold in a large-scale transaction database collected from a supermarket using a point-of-sale system. The relationship between the items is identified based on the purchase pattern. The ARM technique generates a set of association rules that hold between the items of the given dataset, based on the number of occurrences of each combination of items in the dataset.
An association rule defines a relationship between items in the given dataset. Consider three items A, B and C. The relation {A, B} → C says that if a person buys the two items A and B together, then he or she will most likely also buy the item C. That is, the relations between the items are generated by identifying the various patterns within the dataset. The Association Rule Mining (ARM) technique [3] consists of two stages, as follows:

1) Identify the itemsets that occur frequently in the dataset: The frequent itemsets are those that have a support value (sup(itemset)) equal to or greater than the pre-defined minimum support value (min_sup). The support value of an itemset is calculated as the number of transactions that contain that itemset. In the above example, the support of {A, B} is the number of transactions that contain both A and B.

2) Association rule generation using frequent itemsets: In this stage the interesting rules are generated by calculating the confidence factor (conf) for all the frequent itemsets generated in the previous stage. The confidence value for the above example rule {A, B} → C is sup({A, B, C})/sup({A, B}).

D. MapReduce Approach for ARM

Association rules are widely used, but their generation faces many issues, the major one being the size of the data and the multidimensional nature of the datasets [4]. A single-processor system with ordinary CPU speed and resources cannot handle such large data, and this makes the algorithm inefficient to use. In recent developments, the growth of network technology, and especially of cloud platforms, has provided new ideas for association rule generation by making use of parallel environments like Hadoop [5]. MapReduce has been a popular model for processing large amounts of data ever since it was introduced by Google. Google developed MapReduce together with its Google File System (GFS), Hadoop provides an open-source implementation of the same model, and cloud providers such as Amazon Web Services (AWS) offer Hadoop and MapReduce as part of their services.
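To make the two ARM stages described in Section I.C concrete, the following minimal Python sketch computes support and confidence on a toy transaction list. The transactions and the function names `support` and `confidence` are illustrative assumptions, not taken from the paper; confidence follows the standard definition conf(X → Y) = sup(X ∪ Y)/sup(X).

```python
# Toy transactions (hypothetical data, for illustration only).
transactions = [
    {"A", "B", "C"},
    {"A", "B"},
    {"A", "B", "C"},
    {"B", "C"},
    {"A", "C"},
]


def support(itemset, transactions):
    """Count the transactions that contain every item of `itemset`."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t)


def confidence(antecedent, consequent, transactions):
    """Standard confidence: sup(antecedent ∪ consequent) / sup(antecedent)."""
    sup_antecedent = support(antecedent, transactions)
    if sup_antecedent == 0:
        return 0.0  # the rule never applies, so report zero confidence
    joint = set(antecedent) | set(consequent)
    return support(joint, transactions) / sup_antecedent


sup_ab = support({"A", "B"}, transactions)            # 3 of the 5 transactions
conf_ab_c = confidence({"A", "B"}, {"C"}, transactions)  # 2 / 3
print(sup_ab, round(conf_ab_c, 3))
```

With min_sup = 2, the itemset {A, B} would count as frequent here, and the rule {A, B} → C holds in two of the three transactions that contain both A and B.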
A MapReduce job usually splits the input data into chunks, and these are processed by the map tasks in parallel. Each map task emits intermediate <key, value> pairs, and these outputs are sorted. The Reducer then reduces the outputs obtained from the maps to produce the final output. The MapReduce framework contains a single JobTracker as the master and one TaskTracker as the slave on each cluster node. All input and output in MapReduce are <key, value> pairs. Hadoop is a Java-based distributed programming environment, sponsored by Apache, that can be used to process and handle large amounts of data. Hadoop was built around the MapReduce concept for large-scale processing across a large number of nodes and clusters. In the case of Association Rule Mining in MapReduce, the Mapper emits the various combinations of items as keys, and the value is used to keep track of the number of occurrences, that is, the support count. Finally, the Reducer combines the Mapper outputs for each key and calculates the final support and confidence for all the candidate itemsets. In this way the association rules with high support and confidence can be generated.

The remainder of this paper is organized as follows: Section II reviews the various association rule mining algorithms using Hadoop and MapReduce; Section III describes the proposed method and its working; Section IV shows the experimental results of the proposed method; and finally Section V provides the overall conclusion of the paper.

II. LITERATURE SURVEY

MapReduce can be used to turn existing sequential algorithms into parallel algorithms that handle large amounts of data in a shorter time, and so it is applied to association rule mining [6]. Some of the existing methods are discussed below.

A. State-of-art in Association Rule Mining

Yang et al.
proposed a MapReduce based programming model for the generation of association rules in the Hadoop framework to handle large volumes of data. The Apriori algorithm [7] is used as the underlying association rule generation technique. But the standard Apriori algorithm is time consuming, especially when dealing with many candidate sets. To overcome this issue, they implemented an improved Apriori algorithm that is parallelized using the Hadoop framework to save time. The use of Hadoop for association rule generation provided a new research focus in subsequent years. The improved Apriori algorithm [8] proposed by Yang et al. mainly relies on the MapReduce concept to handle large data by making use of the various nodes in the Hadoop platform.

Lin et al. [11] proposed a similar method for association rule generation, using the same Apriori approach for frequent itemset generation in the Hadoop platform with MapReduce. The mining process is executed quickly by implementing a parallelized mining technique during frequent itemset generation. However, parallelization is difficult to handle effectively, and for this purpose MapReduce is used. They proposed a parallelization algorithm in MapReduce that performs better than the previously existing algorithms in terms of speed and efficiency in rule generation. That is, the comparison of the results obtained shows better performance in terms of both speed and rule generation accuracy [9] than the existing algorithms.

Riondato et al. proposed a randomized algorithm for association rule mining that is implemented using a parallel approach [10] in the MapReduce framework. The proposed approach generated the association rules appropriately based on the dataset content. At first the proposed PARMA (Parallel Association Rule Mining Algorithm) approach randomized the
MapReduce algorithm to identify the appropriate frequent itemsets and association rules with a near-linear speedup. A large number of random samples drawn from the original dataset are mined.

Jongwook Woo et al. proposed a Market Basket Analysis algorithm combined with MapReduce for association rule generation. This is one of the most used algorithms for association rules [12]. At first the algorithm sorts the given dataset in ascending order, then converts each instance of the dataset into a (key, value) pair and feeds the pairs into MapReduce. The execution is done on the Amazon EC2 MapReduce platform. The experimental results show that performance is increased by making use of the MapReduce parallel code, but there is still a bottleneck at a certain point when more nodes are used.

B. Need for Proposed Method

The binomial algorithm is not suitable for many datasets, and a novel method is needed that can be applied to datasets of any format [13]. Binomial transformation is also complex, time consuming and unnecessary. It is difficult to handle and process large volumes of data on a single server, so there is a need to use a parallel environment. In this paper an improved, scalable and distributed key-value pair algorithm is proposed for the selection of frequent itemsets from the dataset and for association rule generation. The proposed algorithm is a bottom-up approach: at first the candidate itemsets are generated, and then the support values are calculated by counting occurrences in the dataset transactions. The minimum support value is then applied to reduce the candidate itemsets to frequent itemsets. A very large dataset is used here, and after selecting the frequent itemsets the association rules are generated. The implementation is done on the MapReduce platform and the complete process is parallelized.

III. PROPOSED METHOD

The paper proposes and implements association rule mining on a very large dataset in the Hadoop platform using MapReduce [14]. The proposed algorithm converts the input dataset into <key, value> pairs instead of a binomial representation. This way, the transformation level that would otherwise be needed at the end to convert binomial features back to data features is avoided. The input dataset should first be preprocessed before going to the rule generation phase in MapReduce [15]. The various phases of the proposed algorithm are discussed below.

1) Phase 1: Generate frequent 1-itemsets

The input dataset is first stored in the HDFS of the Hadoop environment to make data access easy and fast for MapReduce operations [16]. The input data is then split into chunks and provided to the Mapper, which maps the data to the output. The output from the mapper is represented as <key, value> pairs. The outputs obtained from all the maps are then combined in the combiner and sent to the reducer. Here the support values are calculated by combining the values corresponding to each key. The support values are then compared with the minimum support, and the items whose support meets the minimum are taken as the output; these are the frequent 1-itemsets.

2) Phase 2: Generate candidate 2-itemsets and n-itemsets

Next the candidate 2-itemsets are generated by the mapper using the frequent 1-itemsets. The count of each candidate 2-itemset is verified against the input data that is provided to the mapper. The counts are then combined using the combiner to calculate the count values of the 2-itemsets and provided to the reducer. The reducer further reduces them and computes the support values of the 2-itemsets. This is repeated for candidate n-itemsets of increasing size, until the previous iteration yields no frequent itemset.
3) Phase 3: Association rule generation

Finally, after generating all the frequent n-itemsets, the association rules are generated based on confidence values. The confidence values are calculated using the support values of the frequent itemsets that form the rules. The output, which contains every selected itemset and its support count, is written to an output file. These support values are then used for the confidence calculation, and the rules that have 100% confidence are generated as the output rules.

The overall association rule generation discussed above is implemented in the Hadoop framework by creating a single-node Hadoop environment [17]. The time in Hadoop is synchronized with the system time, and the time values are calculated in milliseconds using the time function in Hadoop. The data flow for two iterations of MapReduce in Hadoop is shown in Fig. 1.
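The three phases can be sketched as a plain-Python simulation of the map, combine and reduce steps. This is not the authors' Hadoop code and uses no Hadoop API: the function names, the toy chunks and the Apriori-style join used to form candidate k-itemsets are assumptions made for illustration, and `min_conf=1.0` mirrors the 100% confidence criterion mentioned in Phase 3.

```python
from collections import Counter
from itertools import combinations


def map_phase(chunk, candidates):
    """Mapper: emit (itemset, 1) for every candidate found in a transaction."""
    for transaction in chunk:
        for cand in candidates:
            if cand <= transaction:
                yield cand, 1


def combine(pairs):
    """Combiner: sum the counts emitted by a single mapper."""
    local = Counter()
    for key, value in pairs:
        local[key] += value
    return local


def reduce_phase(partials, min_sup):
    """Reducer: merge per-chunk counts, keep itemsets with support >= min_sup."""
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts together
    return {key: count for key, count in total.items() if count >= min_sup}


def frequent_itemsets(chunks, min_sup):
    """Phases 1 and 2: iterate until an iteration yields no frequent itemset."""
    items = {item for chunk in chunks for t in chunk for item in t}
    candidates = [frozenset([item]) for item in items]
    all_frequent, k = {}, 1
    while candidates:
        partials = [combine(map_phase(chunk, candidates)) for chunk in chunks]
        frequent = reduce_phase(partials, min_sup)
        if not frequent:
            break
        all_frequent.update(frequent)
        k += 1
        # Assumed Apriori-style join: unite pairs of frequent (k-1)-itemsets.
        candidates = list({a | b for a, b in combinations(frequent, 2)
                           if len(a | b) == k})
    return all_frequent


def generate_rules(all_frequent, min_conf):
    """Phase 3: keep rules X -> Y with sup(X ∪ Y) / sup(X) >= min_conf."""
    rules = []
    for itemset, sup in all_frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                conf = sup / all_frequent[antecedent]
                if conf >= min_conf:
                    rules.append((antecedent, itemset - antecedent, sup, conf))
    return rules


# Two hypothetical input splits, each a list of transactions.
chunks = [
    [{"A", "B"}, {"A", "B", "C"}],
    [{"B", "C"}, {"A", "B"}],
]
freq = frequent_itemsets(chunks, min_sup=2)
found = generate_rules(freq, min_conf=1.0)
for antecedent, consequent, sup, conf in found:
    print(sorted(antecedent), "->", sorted(consequent), sup, conf)
```

On a real cluster each call to `map_phase` would run as a separate map task over an HDFS split, and every pass of the `while` loop would correspond to one MapReduce job.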
Fig. 1: Data flow showing two iterations of proposed method

First the dataset is read as input by the MapReduce code from the HDFS storage, and each item is processed as a separate key to calculate the frequent 1-itemsets, as in Fig. 1. Then, using pairs of items from the 1-itemsets, the frequent 2-itemsets are generated. This process is repeated for as many iterations as required by the size of the itemsets needed. Fig. 1 shows the calculation up to 3-itemsets using MapReduce. The key used in the Mapper represents the n-itemset, where n is the number of items used to form the key. The MapReduce flow of the proposed framework is shown in Fig. 2.

Fig. 2: Proposed MapReduce framework

During the MapReduce operation the input dataset or file is split into many sections in the Mapper phase, with each Mapper having a unique key. In ARM the key represents an item available within the dataset and the value is the number of occurrences of the item in the dataset. Initially the count is set to 1 in the Mapper, and for each occurrence this count is incremented. Finally, in the Reducer the total number of occurrences is found by merging, and the support and confidence are calculated. The output file consists of the list of rules generated based on the support and confidence.

IV. EXPERIMENTATION AND RESULTS

A. Dataset Description

The proposed approach for association rule mining is applied to the KDD CUP 99 data, and the simulation details are presented here. The KDD CUP 99 input dataset consists of records from four categories of attacks: Denial of Service, user-to-root, probing and remote-to-local. The dataset consists of both labeled and unlabeled records, in which each labeled record consists of 41 attributes and one target attribute. The dataset contains three groups of values: basic, content based and time based values. Not all the values are binary. The training set consists of almost 5 million instances of the input dataset.
The training set and the test set are described below:

Training Set: Contains 494,021 connections or records with a total of 22 attack types.
Test Set: Contains 311,029 connections or records with 17 new attack types not available in the training data.

No.  Value                 No.  Value
1    duration              22   is_guest_login
2    protocol_type         23   count
3    service               24   srv_count
4    flag                  25   serror_rate
5    src_bytes             26   srv_serror_rate
6    dst_bytes             27   rerror_rate
7    land                  28   srv_rerror_rate
8    wrong_fragment        29   same_srv_rate
9    urgent                30   diff_srv_rate
10   hot                   31   srv_diff_host_rate
11   num_failed_logins     32   dst_host_count
12   logged_in             33   dst_host_srv_count
13   num_compromised       34   dst_host_same_srv_rate
14   root_shell            35   dst_host_diff_srv_rate
15   su_attempted          36   dst_host_same_src_port_rate
16   num_root              37   dst_host_srv_diff_host_rate
17   num_file_creation     38   dst_host_serror_rate
18   num_shells            39   dst_host_srv_serror_rate
19   num_access_files      40   dst_host_rerror_rate
20   num_outbound_cmds     41   dst_host_srv_rerror_rate
21   is_host_login

Table 1: Features of the input dataset

The 41 features of the KDD CUP 99 dataset are shown in Table 1, and Fig. 3 shows sample values of the dataset. The values of features 1 to 41 are separated by commas in the dataset, as shown in Fig. 3. That is, each instance or row of the dataset consists of 42 attributes, 41 feature attributes and one class attribute, all separated by commas. The row values are split so that each attribute can be read separately.

Fig. 3: KDD CUP 99 dataset sample values

B. Results and Discussion

The input dataset is split into many tasks by the Map and Reduce steps in the Hadoop environment during execution. The input data is sent to the mapper, which splits the instances of the data into <key, value> pairs, and these are then sent to the reducer. The data is sorted and shuffled before it is sent to the reducer. The final result is obtained by reducing the <key, value> pairs
by calculating support and confidence and then selecting the rules on that basis. Based on this, it is possible to identify whether the user of a specific instance or attack is a guest login or a host login. The values of support and confidence obtained during the 4 levels of MapReduce operations are shown in Fig. 4.

Fig. 4: Support and Confidence

The execution of the MapReduce phase [18] in Hadoop and the final results of the reducer phase are shown in Fig. 5 and Fig. 6 respectively. Fig. 5 shows the execution of the Reducer phase while the output file is being generated, together with the final statistics of the MapReduce job. The generated output file is shown in Fig. 6.

Fig. 5: Mapper and Reducer execution

Fig. 6: Final output
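The record-splitting step described in this section, where each comma-separated row is split into attributes and the mapper turns them into <key, value> pairs, might look like the sketch below. The paper does not spell out the exact key format, so the `name=value` encoding, the function name `record_to_pairs`, the shortened feature list and the sample record are all assumptions for illustration; a full KDD CUP 99 row would carry 41 features plus the class attribute.

```python
# First four feature names from Table 1 (shortened list, for illustration only).
FEATURE_NAMES = ["duration", "protocol_type", "service", "flag"]


def record_to_pairs(line, feature_names):
    """Split one comma-separated record into (key, 1) pairs.

    Assumed encoding: each key is "<feature name>=<value>", and the last
    field of the record is treated as the class attribute.
    """
    fields = [field.strip() for field in line.strip().split(",")]
    *values, label = fields
    if len(values) != len(feature_names):
        raise ValueError("record does not match the expected feature count")
    pairs = [(f"{name}={value}", 1) for name, value in zip(feature_names, values)]
    pairs.append((f"class={label}", 1))
    return pairs


# Hypothetical, shortened record: four feature values followed by the class.
sample = "0,tcp,http,SF,normal"
for key, count in record_to_pairs(sample, FEATURE_NAMES):
    print(key, count)
```

Keeping the attribute name inside the key means that continuous and categorical values are handled the same way, without first converting them to a binomial representation.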
The final output shown in Fig. 6 lists all the frequent itemsets that are generated, along with their support and confidence values. The format of the output is <itemset, support, confidence>, and this is generated for all possible combinations of itemsets for the given input attributes. In this case the 2-itemsets are generated.

V. CONCLUSION AND FUTURE WORK

Association rule generation or mining can be done effectively in distributed systems that use parallel execution, as in the Hadoop environment. This is because it can be scaled up to large volumes of data with less execution time and cost and with good accuracy. The algorithm proposed in this paper also considers the type of input data and can be applied to any data format. By dividing the input data into many splits and processing them on many nodes, the execution is made easy. Management issues such as data transfer between the nodes, storage of data, failure of any node and other issues within the cluster are all handled by Hadoop automatically. Thus the proposed system is efficient in terms of scalability and robustness, and the proposed association rule mining algorithm inherits these features. Also, by making use of the key-value pair approach, the processing is made much easier compared with the existing binomial approach. Still, the proposed algorithm is not the best in performance when it comes to really large datasets. So in the future, fuzzy based association rule mining can be done in Hadoop to handle data larger than that used in this paper. Further, the input data can be classified based on the calculated support and confidence values by using a suitable classification algorithm. In future this work can also be extended to perform feature selection first, using information gain or mutual information [19], before implementing ARM.
REFERENCES

[1] Ashrafi, M.Z., Taniar, D., Smith, K., ODAM: An Optimized Distributed Association Rule Mining Algorithm, Distributed Systems Online, IEEE, Volume 5, Issue 3.
[2] R. Agrawal, R. Srikant, Fast Algorithms for Mining Association Rules, In Proceedings of International Conference on Very Large Data Bases, Santiago, Chile, September 1994.
[3] Jong Soo Park, Ming-Syan Chen, Philip S. Yu, An Effective Hash-based Algorithm for Mining Association Rules, In Proceedings of the ACM SIGMOD International Conference on Management of Data, Michael Carey and Donovan Schneider, ACM.
[4] Ozel, S.A., Guvenir, H.A., An Algorithm for Mining Association Rules using Perfect Hashing and Database Pruning, 10th Turkish Symposium on Artificial Intelligence and Neural Networks, Gazimagusa, Springer.
[5] Karam Gouda, Mohammed Javeed Zaki, Efficiently Mining Maximal Frequent Itemsets, In Proceedings of the IEEE International Conference on Data Mining, November 29-December 02.
[6] J. Han, J. Pei, Y. Yin, Mining Frequent Patterns without Candidate Generation, ACM SIGMOD International Conference, Dallas, 2000.
[7] D.W. Cheung, Jiawei Han, V.T. Ng, A.W. Fu, Yongjian Fu, A Fast Distributed Algorithm for Mining Association Rules, In Proceedings of International Conference on Parallel and Distributed Information Systems, IEEE CS Press.
[8] Ansari E., Dastghaibifard G., Keshtkaran M., Kaabi H., Distributed Frequent Itemset Mining using Trie Data Structure, International Journal of Computer Science, Volume 35, Issue 3.
[9] Park, J.S., Chen, M.S., Yu, P.S., Efficient Parallel Data Mining for Association Rules, In Proceedings of the Fourth International Conference on Information and Knowledge Management, pp. 31-33.
[10] Woo, J., Xu, Y., Market Basket Analysis Algorithm with Map/Reduce of Cloud Computing, In Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications.
[11] Lin, Ming-Yen, Pei-Yu Lee, Sue-Chen Hsueh, Apriori-based Frequent Itemset Mining Algorithms on MapReduce, In Proceedings of the 6th International Conference on Ubiquitous Information Management and Communication, ACM.
[12] Peddi Kishor, Sammulal Porika, Literature Survey on Association Rule Discovery in Data Mining, International Journal of Computer Science and Management Research, Volume 2, Issue 1, January.
[13] Zhang C.S., Li Z.Y., Zheng D.S., An Improved Algorithm for Apriori, In Proceedings of the 1st International Workshop on Education Technology and Computer Science, Volume 1.
[14] C. Jin, C. Vecchiola, R. Buyya, MRPGA: An Extension of MapReduce for Parallelizing Genetic Algorithms, Fourth IEEE International Conference on eScience.
[15] T. Elsayed, J. Lin, Douglas W. Oard, Pairwise Document Similarity in Large Collections with MapReduce, In Proceedings of 32nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval.
[16] J.H.C. Yeung, C.C. Tsang, K.H. Tsoi, B. Kwan, C. Cheung, A.P.C. Chan, P.H.W. Leong, Map-reduce as a Programming Model for Custom Computing Machines, In Proceedings of the 16th IEEE Symposium on Field-Programmable Custom Computing Machines.
[17] M. Zaharia, A. Konwinski, A.D. Joseph, R. Katz, I. Stoica, Improving MapReduce Performance in Heterogeneous Environments, EECS Department, University of California, Berkeley, Technical Report Number UCB/EECS, August 19.
[18] Mohammadhossein Barkhordari, Mahdi Niamanesh, ScadiBino: An Effective MapReduce-based Association Rule Mining Method, ACM 16th International Conference on Electronic Commerce, August.
[19] P. Ganesh Kumar, D. Devaraj, Intrusion Detection using Artificial Neural Network with Reduced Input Features, International Journal on Soft Computing, ICTACT, Issue 1, July.
More informationSystem Health Monitoring and Reactive Measures Activation
System Health Monitoring and Reactive Measures Activation Alireza Shameli Sendi Michel Dagenais Department of Computer and Software Engineering December 10, 2009 École Polytechnique, Montreal Content Definition,
More informationWhy Machine Learning Algorithms Fail in Misuse Detection on KDD Intrusion Detection Data Set
Why Machine Learning Algorithms Fail in Misuse Detection on KDD Intrusion Detection Data Set Maheshkumar Sabhnani and Gursel Serpen Electrical Engineering and Computer Science Department The University
More informationNAVAL POSTGRADUATE SCHOOL THESIS
NAVAL POSTGRADUATE SCHOOL MONTEREY, CALIFORNIA THESIS NEURAL DETECTION OF MALICIOUS NETWORK ACTIVITIES USING A NEW DIRECT PARSING AND FEATURE EXTRACTION TECHNIQUE by Cheng Hong Low September 2015 Thesis
More informationThe Transpose Technique to Reduce Number of Transactions of Apriori Algorithm
The Transpose Technique to Reduce Number of Transactions of Apriori Algorithm Narinder Kumar 1, Anshu Sharma 2, Sarabjit Kaur 3 1 Research Scholar, Dept. Of Computer Science & Engineering, CT Institute
More informationMining of Web Server Logs using Extended Apriori Algorithm
International Association of Scientific Innovation and Research (IASIR) (An Association Unifying the Sciences, Engineering, and Applied Research) International Journal of Emerging Technologies in Computational
More informationHigh Performance Computing on MapReduce Programming Framework
International Journal of Private Cloud Computing Environment and Management Vol. 2, No. 1, (2015), pp. 27-32 http://dx.doi.org/10.21742/ijpccem.2015.2.1.04 High Performance Computing on MapReduce Programming
More informationAnomaly detection using machine learning techniques. A comparison of classification algorithms
Anomaly detection using machine learning techniques A comparison of classification algorithms Henrik Hivand Volden Master s Thesis Spring 2016 Anomaly detection using machine learning techniques Henrik
More informationOpen Access Apriori Algorithm Research Based on Map-Reduce in Cloud Computing Environments
Send Orders for Reprints to reprints@benthamscience.ae 368 The Open Automation and Control Systems Journal, 2014, 6, 368-373 Open Access Apriori Algorithm Research Based on Map-Reduce in Cloud Computing
More informationClassification of Attacks in Data Mining
Classification of Attacks in Data Mining Bhavneet Kaur Department of Computer Science and Engineering GTBIT, New Delhi, Delhi, India Abstract- Intrusion Detection and data mining are the major part of
More informationJournal of Asian Scientific Research EFFICIENCY OF SVM AND PCA TO ENHANCE INTRUSION DETECTION SYSTEM. Soukaena Hassan Hashem
Journal of Asian Scientific Research journal homepage: http://aessweb.com/journal-detail.php?id=5003 EFFICIENCY OF SVM AND PCA TO ENHANCE INTRUSION DETECTION SYSTEM Soukaena Hassan Hashem Computer Science
More informationMining Distributed Frequent Itemset with Hadoop
Mining Distributed Frequent Itemset with Hadoop Ms. Poonam Modgi, PG student, Parul Institute of Technology, GTU. Prof. Dinesh Vaghela, Parul Institute of Technology, GTU. Abstract: In the current scenario
More informationDMSA TECHNIQUE FOR FINDING SIGNIFICANT PATTERNS IN LARGE DATABASE
DMSA TECHNIQUE FOR FINDING SIGNIFICANT PATTERNS IN LARGE DATABASE Saravanan.Suba Assistant Professor of Computer Science Kamarajar Government Art & Science College Surandai, TN, India-627859 Email:saravanansuba@rediffmail.com
More informationDistributed Face Recognition Using Hadoop
Distributed Face Recognition Using Hadoop A. Thorat, V. Malhotra, S. Narvekar and A. Joshi Dept. of Computer Engineering and IT College of Engineering, Pune {abhishekthorat02@gmail.com, vinayak.malhotra20@gmail.com,
More informationPSON: A Parallelized SON Algorithm with MapReduce for Mining Frequent Sets
2011 Fourth International Symposium on Parallel Architectures, Algorithms and Programming PSON: A Parallelized SON Algorithm with MapReduce for Mining Frequent Sets Tao Xiao Chunfeng Yuan Yihua Huang Department
More informationFREQUENT PATTERN MINING IN BIG DATA USING MAVEN PLUGIN. School of Computing, SASTRA University, Thanjavur , India
Volume 115 No. 7 2017, 105-110 ISSN: 1311-8080 (printed version); ISSN: 1314-3395 (on-line version) url: http://www.ijpam.eu ijpam.eu FREQUENT PATTERN MINING IN BIG DATA USING MAVEN PLUGIN Balaji.N 1,
More informationAn Improved Performance Evaluation on Large-Scale Data using MapReduce Technique
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology ISSN 2320 088X IMPACT FACTOR: 6.017 IJCSMC,
More informationAn Improved Apriori Algorithm for Association Rules
Research article An Improved Apriori Algorithm for Association Rules Hassan M. Najadat 1, Mohammed Al-Maolegi 2, Bassam Arkok 3 Computer Science, Jordan University of Science and Technology, Irbid, Jordan
More informationAn Efficient Algorithm for Finding the Support Count of Frequent 1-Itemsets in Frequent Pattern Mining
An Efficient Algorithm for Finding the Support Count of Frequent 1-Itemsets in Frequent Pattern Mining P.Subhashini 1, Dr.G.Gunasekaran 2 Research Scholar, Dept. of Information Technology, St.Peter s University,
More informationCATEGORIZATION OF THE DOCUMENTS BY USING MACHINE LEARNING
CATEGORIZATION OF THE DOCUMENTS BY USING MACHINE LEARNING Amol Jagtap ME Computer Engineering, AISSMS COE Pune, India Email: 1 amol.jagtap55@gmail.com Abstract Machine learning is a scientific discipline
More informationDISCOVERING ACTIVE AND PROFITABLE PATTERNS WITH RFM (RECENCY, FREQUENCY AND MONETARY) SEQUENTIAL PATTERN MINING A CONSTRAINT BASED APPROACH
International Journal of Information Technology and Knowledge Management January-June 2011, Volume 4, No. 1, pp. 27-32 DISCOVERING ACTIVE AND PROFITABLE PATTERNS WITH RFM (RECENCY, FREQUENCY AND MONETARY)
More informationAssociation Rules Mining using BOINC based Enterprise Desktop Grid
Association Rules Mining using BOINC based Enterprise Desktop Grid Evgeny Ivashko and Alexander Golovin Institute of Applied Mathematical Research, Karelian Research Centre of Russian Academy of Sciences,
More informationINTERNATIONAL JOURNAL OF COMPUTER ENGINEERING & TECHNOLOGY (IJCET)
INTERNATIONAL JOURNAL OF COMPUTER ENGINEERING & TECHNOLOGY (IJCET) International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-6367(Print), ISSN 0976 6367(Print) ISSN 0976 6375(Online)
More informationAn Algorithm for Frequent Pattern Mining Based On Apriori
An Algorithm for Frequent Pattern Mining Based On Goswami D.N.*, Chaturvedi Anshu. ** Raghuvanshi C.S.*** *SOS In Computer Science Jiwaji University Gwalior ** Computer Application Department MITS Gwalior
More informationADAPTIVE HANDLING OF 3V S OF BIG DATA TO IMPROVE EFFICIENCY USING HETEROGENEOUS CLUSTERS
INTERNATIONAL JOURNAL OF RESEARCH IN COMPUTER APPLICATIONS AND ROBOTICS ISSN 2320-7345 ADAPTIVE HANDLING OF 3V S OF BIG DATA TO IMPROVE EFFICIENCY USING HETEROGENEOUS CLUSTERS Radhakrishnan R 1, Karthik
More informationA Technical Analysis of Market Basket by using Association Rule Mining and Apriori Algorithm
A Technical Analysis of Market Basket by using Association Rule Mining and Apriori Algorithm S.Pradeepkumar*, Mrs.C.Grace Padma** M.Phil Research Scholar, Department of Computer Science, RVS College of
More informationAPRIORI ALGORITHM FOR MINING FREQUENT ITEMSETS A REVIEW
International Journal of Computer Application and Engineering Technology Volume 3-Issue 3, July 2014. Pp. 232-236 www.ijcaet.net APRIORI ALGORITHM FOR MINING FREQUENT ITEMSETS A REVIEW Priyanka 1 *, Er.
More informationAvailable online at ScienceDirect. Procedia Computer Science 89 (2016 )
Available online at www.sciencedirect.com ScienceDirect Procedia Computer Science 89 (2016 ) 341 348 Twelfth International Multi-Conference on Information Processing-2016 (IMCIP-2016) Parallel Approach
More informationIMPLEMENTATION AND COMPARATIVE STUDY OF IMPROVED APRIORI ALGORITHM FOR ASSOCIATION PATTERN MINING
IMPLEMENTATION AND COMPARATIVE STUDY OF IMPROVED APRIORI ALGORITHM FOR ASSOCIATION PATTERN MINING 1 SONALI SONKUSARE, 2 JAYESH SURANA 1,2 Information Technology, R.G.P.V., Bhopal Shri Vaishnav Institute
More informationMining Interesting Infrequent Itemsets from Very Large Data based on MapReduce Framework
I.J. Intelligent Systems and Applications, 2015, 07, 44-49 Published Online June 2015 in MECS (http://www.mecs-press.org/) DOI: 10.5815/ijisa.2015.07.06 Mining Interesting Infrequent Itemsets from Very
More informationClassifying Network Intrusions: A Comparison of Data Mining Methods
Association for Information Systems AIS Electronic Library (AISeL) AMCIS 2005 Proceedings Americas Conference on Information Systems (AMCIS) 2005 Classifying Network Intrusions: A Comparison of Data Mining
More informationAdaptive Framework for Network Intrusion Detection by Using Genetic-Based Machine Learning Algorithm
IJCSNS International Journal of Computer Science and Network Security, VOL.9 No.4, April 2009 55 Adaptive Framework for Network Intrusion Detection by Using Genetic-Based Machine Learning Algorithm Wafa'
More informationMAPREDUCE FOR BIG DATA PROCESSING BASED ON NETWORK TRAFFIC PERFORMANCE Rajeshwari Adrakatti
International Journal of Computer Engineering and Applications, ICCSTAR-2016, Special Issue, May.16 MAPREDUCE FOR BIG DATA PROCESSING BASED ON NETWORK TRAFFIC PERFORMANCE Rajeshwari Adrakatti 1 Department
More informationPAYLOAD BASED INTERNET WORM DETECTION USING NEURAL NETWORK CLASSIFIER
PAYLOAD BASED INTERNET WORM DETECTION USING NEURAL NETWORK CLASSIFIER A.Tharani MSc (CS) M.Phil. Research Scholar Full Time B.Leelavathi, MCA, MPhil., Assistant professor, Dept. of information technology,
More informationTemporal Weighted Association Rule Mining for Classification
Temporal Weighted Association Rule Mining for Classification Purushottam Sharma and Kanak Saxena Abstract There are so many important techniques towards finding the association rules. But, when we consider
More informationA Back Propagation Neural Network Intrusion Detection System Based on KVM
International Journal of Innovation Engineering and Science Research Open Access A Back Propagation Neural Network Intrusion Detection System Based on KVM ABSTRACT Jiazuo Wang Computer Science Department,
More informationNetwork Anomaly Detection using Co-clustering
Network Anomaly Detection using Co-clustering Evangelos E. Papalexakis, Alex Beutel, Peter Steenkiste Department of Electrical & Computer Engineering School of Computer Science Carnegie Mellon University,
More informationI. INTRODUCTION. Keywords : Spatial Data Mining, Association Mining, FP-Growth Algorithm, Frequent Data Sets
2017 IJSRSET Volume 3 Issue 5 Print ISSN: 2395-1990 Online ISSN : 2394-4099 Themed Section: Engineering and Technology Emancipation of FP Growth Algorithm using Association Rules on Spatial Data Sets Sudheer
More informationData Clustering on the Parallel Hadoop MapReduce Model. Dimitrios Verraros
Data Clustering on the Parallel Hadoop MapReduce Model Dimitrios Verraros Overview The purpose of this thesis is to implement and benchmark the performance of a parallel K- means clustering algorithm on
More informationUday Kumar Sr 1, Naveen D Chandavarkar 2 1 PG Scholar, Assistant professor, Dept. of CSE, NMAMIT, Nitte, India. IJRASET 2015: All Rights are Reserved
Implementation of K-Means Clustering Algorithm in Hadoop Framework Uday Kumar Sr 1, Naveen D Chandavarkar 2 1 PG Scholar, Assistant professor, Dept. of CSE, NMAMIT, Nitte, India Abstract Drastic growth
More informationAn Efficient Algorithm for finding high utility itemsets from online sell
An Efficient Algorithm for finding high utility itemsets from online sell Sarode Nutan S, Kothavle Suhas R 1 Department of Computer Engineering, ICOER, Maharashtra, India 2 Department of Computer Engineering,
More informationSEQUENTIAL PATTERN MINING FROM WEB LOG DATA
SEQUENTIAL PATTERN MINING FROM WEB LOG DATA Rajashree Shettar 1 1 Associate Professor, Department of Computer Science, R. V College of Engineering, Karnataka, India, rajashreeshettar@rvce.edu.in Abstract
More informationCHAPTER 2 DARPA KDDCUP99 DATASET
44 CHAPTER 2 DARPA KDDCUP99 DATASET 2.1 THE DARPA INTRUSION-DETECTION EVALUATION PROGRAM The number of intrusions is to be found in any computer and network audit data are plentiful as well as ever-changing.
More informationPamba Pravallika 1, K. Narendra 2
2018 IJSRSET Volume 4 Issue 1 Print ISSN: 2395-1990 Online ISSN : 2394-4099 Themed Section : Engineering and Technology Analysis on Medical Data sets using Apriori Algorithm Based on Association Rules
More informationCLUSTERING BIG DATA USING NORMALIZATION BASED k-means ALGORITHM
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology ISSN 2320 088X IMPACT FACTOR: 5.258 IJCSMC,
More informationAN IMPROVISED FREQUENT PATTERN TREE BASED ASSOCIATION RULE MINING TECHNIQUE WITH MINING FREQUENT ITEM SETS ALGORITHM AND A MODIFIED HEADER TABLE
AN IMPROVISED FREQUENT PATTERN TREE BASED ASSOCIATION RULE MINING TECHNIQUE WITH MINING FREQUENT ITEM SETS ALGORITHM AND A MODIFIED HEADER TABLE Vandit Agarwal 1, Mandhani Kushal 2 and Preetham Kumar 3
More informationMining Frequent Itemsets Along with Rare Itemsets Based on Categorical Multiple Minimum Support
IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661,p-ISSN: 2278-8727, Volume 18, Issue 6, Ver. IV (Nov.-Dec. 2016), PP 109-114 www.iosrjournals.org Mining Frequent Itemsets Along with Rare
More informationI ++ Mapreduce: Incremental Mapreduce for Mining the Big Data
IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661,p-ISSN: 2278-8727, Volume 18, Issue 3, Ver. IV (May-Jun. 2016), PP 125-129 www.iosrjournals.org I ++ Mapreduce: Incremental Mapreduce for
More informationChapter 28. Outline. Definitions of Data Mining. Data Mining Concepts
Chapter 28 Data Mining Concepts Outline Data Mining Data Warehousing Knowledge Discovery in Databases (KDD) Goals of Data Mining and Knowledge Discovery Association Rules Additional Data Mining Algorithms
More informationA Hierarchical Document Clustering Approach with Frequent Itemsets
A Hierarchical Document Clustering Approach with Frequent Itemsets Cheng-Jhe Lee, Chiun-Chieh Hsu, and Da-Ren Chen Abstract In order to effectively retrieve required information from the large amount of
More informationTwitter data Analytics using Distributed Computing
Twitter data Analytics using Distributed Computing Uma Narayanan Athrira Unnikrishnan Dr. Varghese Paul Dr. Shelbi Joseph Research Scholar M.tech Student Professor Assistant Professor Dept. of IT, SOE
More informationUtility Mining Algorithm for High Utility Item sets from Transactional Databases
IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661, p- ISSN: 2278-8727Volume 16, Issue 2, Ver. V (Mar-Apr. 2014), PP 34-40 Utility Mining Algorithm for High Utility Item sets from Transactional
More informationThe Caspian Sea Journal ISSN: A Study on Improvement of Intrusion Detection Systems in Computer Networks via GNMF Method
Available online at http://www.csjonline.org/ The Caspian Sea Journal ISSN: 1578-7899 Volume 10, Issue 1, Supplement 4 (2016) 456-461 A Study on Improvement of Intrusion Detection Systems in Computer Networks
More informationGlobal Journal of Engineering Science and Research Management
A FUNDAMENTAL CONCEPT OF MAPREDUCE WITH MASSIVE FILES DATASET IN BIG DATA USING HADOOP PSEUDO-DISTRIBUTION MODE K. Srikanth*, P. Venkateswarlu, Ashok Suragala * Department of Information Technology, JNTUK-UCEV
More informationPerformance Analysis of Apriori Algorithm with Progressive Approach for Mining Data
Performance Analysis of Apriori Algorithm with Progressive Approach for Mining Data Shilpa Department of Computer Science & Engineering Haryana College of Technology & Management, Kaithal, Haryana, India
More informationAn Evolutionary Algorithm for Mining Association Rules Using Boolean Approach
An Evolutionary Algorithm for Mining Association Rules Using Boolean Approach ABSTRACT G.Ravi Kumar 1 Dr.G.A. Ramachandra 2 G.Sunitha 3 1. Research Scholar, Department of Computer Science &Technology,
More informationA Survey on Apriori algorithm using MapReduce Technique
A Survey on Apriori algorithm using MapReduce Technique Mr. Kiran C. Kulkarni 1, Mr.R.S.Jagale 2, Prof.S.M.Rokade 3 1 PG Student, Computer Dept., SVIT COE, Nasik, Maharashtra, India 2 PG Student, Computer
More informationDATA REDUCTION TECHNIQUES TO ANALYZE NSL-KDD DATASET
INTERNATIONAL JOURNAL OF COMPUTER ENGINEERING & TECHNOLOGY (IJCET) International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-6367(Print), ISSN 0976 6367(Print) ISSN 0976 6375(Online)
More informationABSTRACT I. INTRODUCTION
International Journal of Scientific Research in Computer Science, Engineering and Information Technology 2017 IJSRCSEIT Volume 2 Issue 5 ISSN : 2456-3307 Mapreduce Based Pattern Mining Algorithm In Distributed
More informationReview on Data Mining Techniques for Intrusion Detection System
Review on Data Mining Techniques for Intrusion Detection System Sandeep D 1, M. S. Chaudhari 2 Research Scholar, Dept. of Computer Science, P.B.C.E, Nagpur, India 1 HoD, Dept. of Computer Science, P.B.C.E,
More informationGenetic-fuzzy rule mining approach and evaluation of feature selection techniques for anomaly intrusion detection
Pattern Recognition 40 (2007) 2373 2391 www.elsevier.com/locate/pr Genetic-fuzzy rule mining approach and evaluation of feature selection techniques for anomaly intrusion detection Chi-Ho Tsang, Sam Kwong,
More informationWeb Page Classification using FP Growth Algorithm Akansha Garg,Computer Science Department Swami Vivekanad Subharti University,Meerut, India
Web Page Classification using FP Growth Algorithm Akansha Garg,Computer Science Department Swami Vivekanad Subharti University,Meerut, India Abstract - The primary goal of the web site is to provide the
More informationComparative Analysis of K means Clustering Sequentially And Parallely
Comparative Analysis of K means Clustering Sequentially And Parallely Kavya D S 1, Chaitra D Desai 2 1 M.tech, Computer Science and Engineering, REVA ITM, Bangalore, India 2 REVA ITM, Bangalore, India
More informationResearch and Improvement of Apriori Algorithm Based on Hadoop
Research and Improvement of Apriori Algorithm Based on Hadoop Gao Pengfei a, Wang Jianguo b and Liu Pengcheng c School of Computer Science and Engineering Xi'an Technological University Xi'an, 710021,
More informationData Mining Concepts
Data Mining Concepts Outline Data Mining Data Warehousing Knowledge Discovery in Databases (KDD) Goals of Data Mining and Knowledge Discovery Association Rules Additional Data Mining Algorithms Sequential
More informationGurpreet Kaur 1, Naveen Aggarwal 2 1,2
Association Rule Mining in XML databases: Performance Evaluation and Analysis Gurpreet Kaur 1, Naveen Aggarwal 2 1,2 Department of Computer Science & Engineering, UIET Panjab University Chandigarh. E-mail
More informationComparing the Performance of Frequent Itemsets Mining Algorithms
Comparing the Performance of Frequent Itemsets Mining Algorithms Kalash Dave 1, Mayur Rathod 2, Parth Sheth 3, Avani Sakhapara 4 UG Student, Dept. of I.T., K.J.Somaiya College of Engineering, Mumbai, India
More informationIntroduction to Data Mining and Data Analytics
1/28/2016 MIST.7060 Data Analytics 1 Introduction to Data Mining and Data Analytics What Are Data Mining and Data Analytics? Data mining is the process of discovering hidden patterns in data, where Patterns
More informationA New Approach to Web Data Mining Based on Cloud Computing
Regular Paper Journal of Computing Science and Engineering, Vol. 8, No. 4, December 2014, pp. 181-186 A New Approach to Web Data Mining Based on Cloud Computing Wenzheng Zhu* and Changhoon Lee School of
More informationModeling An Intrusion Detection System Using Data Mining And Genetic Algorithms Based On Fuzzy Logic
IJCSNS International Journal of Computer Science and Network Security, VOL.8 No.7, July 2008 39 Modeling An Intrusion Detection System Using Data Mining And Genetic Algorithms Based On Fuzzy Logic G.V.S.N.R.V.Prasad
More informationA Survey on Moving Towards Frequent Pattern Growth for Infrequent Weighted Itemset Mining
A Survey on Moving Towards Frequent Pattern Growth for Infrequent Weighted Itemset Mining Miss. Rituja M. Zagade Computer Engineering Department,JSPM,NTC RSSOER,Savitribai Phule Pune University Pune,India
More informationKeywords Hadoop, Map Reduce, K-Means, Data Analysis, Storage, Clusters.
Volume 6, Issue 3, March 2016 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Special Issue
More information