Using Center Representation and Variance Effect on K-NN Classification

Tamer TULGAR
Department of Computer Engineering, Girne American University, Girne, T.R.N.C., Mersin 10, TURKEY

Abstract - The K-Nearest Neighbor classifier is a well-known and widely applied method in data mining applications. Nevertheless, its high computation and memory cost makes the classical K-NN infeasible for today's Big Data analysis applications. To overcome the cost drawbacks of the known data mining methods, several distributed environment alternatives have emerged, and several recently proposed K-NN based classification algorithms are distributed methods suitable for such environments and applicable to emerging data analysis needs. In this work, a new CR-KNN algorithm is proposed, which improves the classification accuracy of the well-known K-Nearest Neighbor (K-NN) algorithm by benefiting from the center representation of the instances belonging to different data classes. The proposed algorithm bases its decision on the data class representation that is closest to the test instance. The CR-KNN algorithm was tested using several real datasets belonging to different application areas. The performance results acquired after extensive experiments are presented in this paper and show that the proposed CR-KNN algorithm is a competitive alternative to other studies recently proposed in the literature.

Keywords - Classification, K-Nearest Neighbor, Center Representation, Variance

I. INTRODUCTION

In the current information age, businesses tend to base their policies and forecasting on the analysis and processing of continuously growing volumes of data retrieved from many different online streaming sources [1]. To achieve precise business decisions, data coming from different environments (e.g. social media tools, data warehouses, cloud storage, etc.) needs to be analyzed using efficient data mining algorithms. Analyzing data coming from a high number of data sources means working with huge amounts of unstructured raw data, which are big in terms of volume, variety and velocity of acquisition. The process of analyzing such data has recently become a popular research area, known as Big Data Analysis [2].

One of the crucial data mining tasks is classification. Classification, the task of assigning data instances to one of several predefined data classes, is a pervasive problem that encompasses many diverse applications [3]. To classify data in the big data age, centralized techniques lack the low classification delay that is vital to cope with the high-velocity data streams of big data. To meet this timing requirement of modern data classification, several distributed environments have been tested and used by different researchers; one of these is Apache Hadoop together with Google's MapReduce framework [4].

K-Nearest Neighbor classification (K-NN) has been one of the most popular classification algorithms [5]. Nevertheless, the classical K-NN algorithm suffers from high computation costs, which makes it inconvenient for modern big data analysis that requires rapid and accurate classification results. The classical K-NN algorithm is based on calculating the distances between the test data instance to be classified and all of the instances in the training data set, and finding the K closest training instances.
After detecting the K closest training instances, the K-NN algorithm applies majority voting, which is the process of detecting the data class with the maximum number of instances among the K selected instances. The classification strategy based on individual instance distances traditionally makes the classical K-NN algorithm perform weakly in terms of classification accuracy. On the other hand, the same individual-instance-distance strategy makes K-NN a strong candidate for distributed data classification, which is the basis of achieving acceptably low classification delays while classifying big data.
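As a concrete reference point for the decision rule that CR-KNN later refines, the following minimal Java sketch (Java being the language of the paper's implementation) illustrates the classical K-NN steps just described: compute the Euclidean distance from the test instance to every training instance, keep the K smallest, and apply majority voting. The class and method names (Instance, euclidean, classify) are illustrative assumptions, not taken from the paper's code.

import java.util.*;
import java.util.stream.Collectors;

/** Minimal classical K-NN sketch: distance to every training instance, then majority vote. */
public class ClassicalKnn {

    /** A data instance: a feature vector plus its class label. */
    static class Instance {
        final double[] features;
        final String label;
        Instance(double[] features, String label) { this.features = features; this.label = label; }
    }

    /** Classical Euclidean distance between two feature vectors. */
    static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int f = 0; f < a.length; f++) {
            double diff = a[f] - b[f];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    /** Returns the majority class among the K nearest training instances to the test instance. */
    static String classify(List<Instance> training, double[] test, int k) {
        // Keep the K training instances with the smallest distance to the test instance.
        List<Instance> neighbors = training.stream()
                .sorted(Comparator.comparingDouble((Instance tr) -> euclidean(tr.features, test)))
                .limit(k)
                .collect(Collectors.toList());

        // Majority voting: the class with the most members among the K neighbors wins.
        Map<String, Integer> votes = new HashMap<>();
        for (Instance n : neighbors) {
            votes.merge(n.label, 1, Integer::sum);
        }
        return Collections.max(votes.entrySet(), Map.Entry.comparingByValue()).getKey();
    }
}

For example, classify(training, testFeatures, 7) would return the majority class among the 7 nearest training instances; the CR-KNN described later keeps the neighbor-search step but replaces this voting step.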

Taking into account K-NN's suitability to distributed environments, many K-NN based studies, which try to improve the K-NN algorithm's performance, have been proposed in the literature [6-16] to meet modern data mining needs. In this paper, a new K Nearest Neighbor classification algorithm, hereafter referred to as the Center Representation KNN algorithm (CR-KNN), is proposed. The CR-KNN algorithm tries to improve the classification accuracy of the classical K-NN algorithm. The main idea of the CR-KNN algorithm is to base the classification decision on class representations obtained by calculating the centroids of the training instances belonging to the same classes among the K closest instances detected by the K-NN algorithm. In other words, the classical majority voting decision step of the classical K-NN is refined by a better class-representation-based classification decision, which is also computationally efficient. Furthermore, the CR-KNN is further improved by introducing the variance effect into the distance measurement so that the algorithm can adapt to the varying data distributions of different datasets.

The rest of this paper is organized as follows: Section II presents the proposed CR-KNN algorithm in detail. The experimental setup and the achieved performance results are presented in Section III. Finally, Section IV concludes the paper and states the future work.

II. THE PROPOSED CR-KNN ALGORITHM

A. The Classical K-NN Algorithm

In a classification task, if a data instance is considered as a vector of feature values, then a data instance i can be denoted as v_i, which corresponds to a vector containing p features. Therefore, classifying data means detecting the correct data classes of n test data instances by comparing their similarity to m training data instances using their feature values. The classical K-NN algorithm is based on the idea of calculating the distances of a test data instance to be classified to all of the data instances in the training data set. After computing the distances, the classical K-NN considers the K minimum distances and uses the training instances which produced those K distances. These K training instances are the K Nearest Neighbors of the tested data. After detecting the K nearest neighbors, the classical K-NN detects which data class has the largest number of instances among the selected K nearest neighbors, which is known as the process of majority voting. As can be understood from this description of the classical K-NN algorithm, the classification decision is based on the individual instance distances between all testing and training instances. Nevertheless, when a data set with a high number of instances and a high number of features per instance needs to be classified, the classical K-NN algorithm's classification accuracy becomes lower than that of its competitors, like K-Means based classification [17]. Hence, it can be concluded that to become a classification algorithm suitable for modern data analysis needs, the K-NN's classification accuracy should be improved. Taking these needs into account, the CR-KNN algorithm is proposed.

B. The Proposed CR-KNN Algorithm

The proposed CR-KNN algorithm improves the classical K-NN algorithm with contributions in the distance measurement and classification decision steps. The classical Euclidean distance calculation is given in (1).
d(v_t, v_i) = \sqrt{\sum_{f=1}^{p} (v_{t,f} - v_{i,f})^2}    (1)

Like the classical K-NN, the first step of the CR-KNN algorithm is calculating the distances between the testing data instance and all training data instances using (1). After computing all the distances, the CR-KNN finds the K nearest neighbors with the minimum distances to the tested data instance. At this point, unlike the classical KNN, the proposed CR-KNN finds which data classes exist among the K nearest neighbors and groups the nearest neighbors according to the data classes they belong to. After the class detection, the CR-KNN finds the centers of the training data instances for each detected class. To clarify the introduced idea, consider the following simplified example as the detected K nearest neighbors:

<ts9, tr3, 0.11, A>
<ts9, tr11, 0.14, B>
<ts9, tr27, 0.15, B>
<ts9, tr23, 0.12, C>
<ts9, tr42, 0.12, A>

In the example given above, the 5 nearest neighbors are from classes A, B and C, where tr3 and tr42 are from class A, tr11 and tr27 are from class B, and tr23 is from class C. Hence, the listed training instances of each class will be used to represent that class by their mean (in other words, the center). The mean of a class j with N_j instances among the K nearest neighbors is calculated using (2). In other words, centroids are found that represent the classes.

c_j = \frac{1}{N_j} \sum_{i=1}^{N_j} v_i    (2)

Upon calculating the means, the CR-KNN calculates the distances of the testing instance to be classified to the calculated means and takes the minimum over all classes j using (3). As the classification decision, which is the main contribution of the CR-KNN, the algorithm relies on the minimum T over all centroids j rather than on classical majority voting.

T = \min_j \left( \frac{d(v_t, c_j)}{\sigma_j^2} \right)    (3)

The second contribution, the variance effect, can also be seen in (3). The variance is defined in (4).

\sigma_j^2 = \frac{1}{N_j} \sum_{i=1}^{N_j} \left\| v_i - c_j \right\|^2    (4)

The centroid-distance contribution of the CR-KNN improves the classification accuracy of the classical K-NN by eliminating misclassifications of the following kind. Consider the simple example given above. Because of majority voting, the classical K-NN algorithm will conclude that the test instance ts9 belongs to class A, since class A and class B have an equal number of instances among the K nearest neighbors and class A contains the instance with the closest proximity to the testing instance. Nevertheless, the same class A also has an instance that is further away from the test instance ts9 than the class B instances are. To give equal chances to the classes in the classification decision, CR-KNN proposes to represent the classes A, B and C of the K nearest neighbors by centroids and to base the classification decision on the distance of the test instance to the centroids of the classes rather than relying on the individual data members. In this way, if class B has a centroid closer to the test instance, the CR-KNN can correctly decide that the test instance belongs to class B.

Furthermore, CR-KNN addresses another classification problem: datasets belonging to different application areas may have data distributions that are not easy to deal with. The main difficulty is a data distribution in which outlying instances of one class lie in close proximity to the instances of other classes. In such cases, false classification decisions can easily be produced by the classical K-NN algorithm. To compensate for the purely distance-based decisions of K-NN, the proposed CR-KNN algorithm includes in the classification decision the effect of how far the instances are spread around the centroids, by incorporating the variance. This compensation is achieved by dividing the calculated distances to the centroids by the variance, so that a smaller variance produces a stronger membership strength while a greater variance produces a weaker one. The experimental results, which are presented in the next section, demonstrate that both of the proposed improvements enhance the performance of the classical K-NN algorithm for datasets with different natures.
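To make the decision rule of (2)-(4) concrete, the Java sketch below groups the K detected neighbors by class, computes each class centroid and a scalar variance around it, and returns the class whose variance-scaled centroid distance is smallest. It reuses the Instance class and euclidean() helper from the earlier sketch (same package assumed); all names, as well as the guard for a zero variance when a class has a single neighbor, are illustrative assumptions rather than details taken from the paper's implementation.

import java.util.*;
import java.util.stream.Collectors;

/** Sketch of the CR-KNN decision step applied to the K nearest neighbors of one test instance. */
public class CrKnnDecision {

    /** Returns the class whose centroid minimizes distance(test, centroid) / variance, as in (3). */
    static String decide(List<ClassicalKnn.Instance> kNearest, double[] test) {
        // Group the K nearest neighbors by the class they belong to.
        Map<String, List<ClassicalKnn.Instance>> byClass = kNearest.stream()
                .collect(Collectors.groupingBy(n -> n.label));

        String bestClass = null;
        double bestScore = Double.POSITIVE_INFINITY;

        for (Map.Entry<String, List<ClassicalKnn.Instance>> entry : byClass.entrySet()) {
            List<ClassicalKnn.Instance> members = entry.getValue();
            int p = test.length;

            // Centroid of the class members among the K nearest neighbors, as in (2).
            double[] centroid = new double[p];
            for (ClassicalKnn.Instance m : members) {
                for (int f = 0; f < p; f++) {
                    centroid[f] += m.features[f] / members.size();
                }
            }

            // Scalar variance: mean squared distance of the members to their centroid, as in (4).
            double variance = 0.0;
            for (ClassicalKnn.Instance m : members) {
                double d = ClassicalKnn.euclidean(m.features, centroid);
                variance += d * d / members.size();
            }

            // Variance-scaled centroid distance, as in (3). The lower bound on the variance is an
            // assumption to avoid division by zero for single-member classes; the paper does not
            // specify how that case is handled.
            double score = ClassicalKnn.euclidean(test, centroid) / Math.max(variance, 1e-12);
            if (score < bestScore) {
                bestScore = score;
                bestClass = entry.getKey();
            }
        }
        return bestClass;
    }
}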

III. EXPERIMENTAL SETTING AND THE RESULTS

The proposed CR-KNN algorithm was implemented in Sun JAVA JDK 1.8 [18]. To evaluate and compare the performance of the proposed algorithm, several datasets from the UCI machine learning repository [19] were used. A summary of the used datasets is given in Table I. While selecting the datasets, factors such as being a binary or multi-class classification problem, various instance counts and different data distributions were considered so that the strengths and weaknesses of the proposed CR-KNN algorithm could be analyzed. It is also worth mentioning that the experimental model was validated using 10-fold cross validation, and the presented classification accuracy values are the averages of the results found over the cross-validation folds.

TABLE I. REAL DATASETS USED IN THE EXPERIMENTS

Dataset        Instances   Features   Classes
ionosphere
wdbc
satimage
pendigits

A. Experimental Results

The results presented in this section demonstrate the classification accuracy of the proposed CR-KNN algorithm and of two other recent classification schemes, SR-KNN [14] and KMEANS-MOD [17]. The classification accuracy can be defined as the ratio of the number of correct classification decisions to the total number of classification decisions over a given dataset's testing instances. The SR-KNN, proposed in [14], modifies the KNN algorithm with a self-representation of the data clusters idea. Its main aim is to learn an optimal k value for KNN in order to improve classification accuracy. To support their claim, the authors compare their results with three other algorithms, named kNNC, LMNN and ADNN, which are summarized in [14]. The results presented in [14] show better performance when compared to these three algorithms. The KMEANS-MOD proposed in [17], on the other hand, is an improved KMEANS clustering based algorithm which also benefits from the effect of variance and introduces a membership strength as the metric on which its classification decisions are based. Since the algorithm proposed in [17] is a KMEANS based algorithm, its classification idea is completely dependent on strong class representations using whole-class data centers.

Table II provides the classification accuracy results which demonstrate the effect of adding the variance to the distance calculation of the CR-KNN for different K values.

TABLE II. CR-KNN WITHOUT THE VARIANCE EFFECT VS. CR-KNN WITH THE VARIANCE EFFECT

               CR-KNN without Variance Effect    CR-KNN with Variance Effect
Dataset        K=5     K=7     K=9               K=5     K=7     K=9
pendigits
satimage
ionosphere
wdbc

By investigating the results, it can be observed that the variance effect improves the classification accuracy of the CR-KNN when the dataset used in the classification shows outlying instances in the data distribution of the class instances, as explained in Section II. Moreover, the results also show that adding the variance to the distance calculation does not have a degrading effect on the tested datasets that have a clearer data separation among the class instances. Furthermore, it can be observed that reasonably small K values are enough to achieve improved classification accuracy from the proposed CR-KNN algorithm.
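The accuracies reported above and below are averages over 10-fold cross validation. The short Java sketch below shows one way such an evaluation loop could be organized: split the dataset into 10 folds, train on 9, classify the held-out fold, and average the per-fold accuracies. It reuses the Instance class from the earlier sketch, and the classifier argument is a placeholder for whichever decision rule (classical K-NN or CR-KNN) is being evaluated; nothing here is taken from the paper's actual implementation.

import java.util.*;
import java.util.function.BiFunction;

/** Sketch of k-fold cross validation reporting the average classification accuracy. */
public class CrossValidation {

    static double averageAccuracy(List<ClassicalKnn.Instance> data,
                                  BiFunction<List<ClassicalKnn.Instance>, double[], String> classifier,
                                  int folds, long seed) {
        List<ClassicalKnn.Instance> shuffled = new ArrayList<>(data);
        Collections.shuffle(shuffled, new Random(seed));

        double accuracySum = 0.0;
        for (int fold = 0; fold < folds; fold++) {
            List<ClassicalKnn.Instance> train = new ArrayList<>();
            List<ClassicalKnn.Instance> test = new ArrayList<>();
            // Assign every instance to exactly one fold; that fold is the test set for this round.
            for (int i = 0; i < shuffled.size(); i++) {
                (i % folds == fold ? test : train).add(shuffled.get(i));
            }

            // Accuracy = correct decisions / total decisions on the held-out fold.
            int correct = 0;
            for (ClassicalKnn.Instance t : test) {
                if (classifier.apply(train, t.features).equals(t.label)) {
                    correct++;
                }
            }
            accuracySum += (double) correct / test.size();
        }
        return accuracySum / folds;
    }
}

For instance, averageAccuracy(data, (tr, x) -> ClassicalKnn.classify(tr, x, 7), 10, 42L) would report the 10-fold average accuracy of classical K-NN with K=7 on a given dataset.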
The proposed CR-KNN algorithm was also compared against the classification schemes SR-KNN [14] and KMEANS-MOD [17] in terms of classification accuracy, to investigate how the proposed CR-KNN algorithm compares against both a recent Nearest Neighbor based classification scheme and a completely centroid representation based classification approach. The comparative results can be observed in Table III.

TABLE III. CR-KNN CLASSIFICATION ACCURACY PERFORMANCE COMPARISONS

Dataset        KMEANS-MOD   Classical-KNN   SR-KNN   CR-KNN
pendigits
satimage
ionosphere
wdbc

The comparative results presented in Table III illustrate that the proposed CR-KNN improves the performance of the classical KNN algorithm for the majority of the tested datasets. Against the other K-NN based algorithm, SR-KNN [14], the CR-KNN shows better classification accuracy for all datasets, proving that the variance effect and the centroid representation of the data classes provide the expected improvement in classification accuracy. When compared with the KMEANS classification based KMEANS-MOD [17], which is expected to have a better complete data class center representation at the cost of higher computational complexity, the proposed CR-KNN demonstrates a similar accuracy performance, if not better.

IV. CONCLUSION

In this paper, a new Nearest Neighbor based classification algorithm is presented. The proposed algorithm is designed to improve the classical K-NN classification scheme with two contributions. The proposed algorithm was tested with extensive experiments conducted on real datasets with different natures. The experimental results show that the proposed algorithm improves the classification accuracy for the majority of the datasets. Also, the results demonstrate that the proposed algorithm is able to compete successfully against other recently proposed classification schemes. Future work includes focusing on the classification delay performance and testing the proposed algorithm on other datasets with a higher number of instances, on the order of millions. Another planned future work is further improving the distance calculation by replacing the distance concept with a novel similarity measure capable of providing the similarity of mixed data including both continuous and categorical features.

ACKNOWLEDGMENT

The author of this paper thanks Prof. Dr. Ali HAYDAR and Assoc. Prof. Dr. Ibrahim ERSAN from Girne American University, Department of Computer Engineering, for their continuous support and valuable discussions on this study.

REFERENCES

[1] K. Schwab, "The Fourth Industrial Revolution", Crown Business.
[2] D. Singh and C. K. Reddy, "A survey on platforms for big data analytics", Journal of Big Data, vol. 1, no. 8.
[3] P. Tan, M. Steinbach and V. Kumar, "Introduction to Data Mining", 1st ed., Reading, MA: Addison-Wesley.
[4] J. Dean and S. Ghemawat, "MapReduce: A Flexible Data Processing Tool", Communications of the ACM, vol. 53, no. 1, pp. 72-77.
[5] X. Wu et al., "Top 10 algorithms in data mining", Knowledge and Information Systems, vol. 14, no. 1, pp. 1-37.
[6] A. Fahad et al., "A Survey of Clustering Algorithms for Big Data: Taxonomy and Empirical Analysis", IEEE Trans. on Emerging Topics in Computing, vol. 2, no. 3.
[7] S. Zhang, M. Zong and D. Cheng, "Learning k for KNN Classification", ACM Transactions on Intelligent Systems and Technology, vol. 8, no. 3, pp. 43:1-19.
[8] K. Niu, F. Zhao and S. Zhang, "A Fast Classification Algorithm for Big Data Based on KNN", Journal of Applied Sciences, vol. 13, no. 12.
[9] A. Bifet, J. Read, B. Pfahringer and G. Holmes, "Efficient Data Stream Classification via Probabilistic Adaptive Windows", in Proc. 28th Annual ACM Symposium on Applied Computing, 2013.
[10] S. S. Labib, "A Comparative Study to Classify Big Data Using Fuzzy Techniques", in Proc. 5th International Conference on Electronic Devices, Systems and Applications (ICEDSA).
[11] M. El Bakry, S. Safwat and O. Hegazy, "A MapReduce Fuzzy Technique of Big Data Classification", in Proc. SAI Computing Conference 2016.

[12] B. Quost and T. Denoeux, "Clustering and classification of fuzzy data using the fuzzy EM algorithm", Fuzzy Sets and Systems, vol. 286.
[13] Z. Deng, X. Zhu, D. Cheng, M. Zong and S. Zhang, "Efficient kNN classification algorithm for big data", Neurocomputing, vol. 195.
[14] S. Zhang, D. Cheng, M. Zong and L. Gao, "Self representation nearest neighbour search for classification", Neurocomputing, vol. 195, 2016.
[15] G. Song, J. Rochas, L. El Beze, F. Huet and F. Magoules, "K Nearest Neighbour Joins for Big Data on MapReduce: A Theoretical and Experimental Analysis", IEEE Trans. on Knowledge and Data Engineering, vol. 28, no. 9.
[16] J. Maillo, S. Ramirez, I. Triguero and F. Herrera, "kNN-IS: An Iterative Spark-based design of the k-Nearest Neighbours classifier for big data", Knowledge-Based Systems, vol. 117, pp. 3-15.
[17] T. Tulgar, A. Haydar and İ. Erşan, "Data Distribution Aware Classification Algorithm based on K-Means", International Journal of Advanced Computer Science and Applications, vol. 8, no. 9, September.
[18] J. Gosling, B. Joy, G. Steele, G. Bracha and A. Buckley, (2017, Aug. 01). The Java Language Specification - Java SE 8 Edition [Online]. Available:
[19] UCI Center for Machine Learning and Intelligent Systems, (2017, Aug. 01). UC Irvine Machine Learning Repository [Online]. Available:
