Using Center Representation and Variance Effect on K-NN Classification
Tamer TULGAR
Department of Computer Engineering, Girne American University, Girne, T.R.N.C., Mersin 10, TURKEY

Abstract - The K-Nearest Neighbor classifier is a well-known and widely applied method in data mining applications. Nevertheless, its high computation and memory costs make the classical K-NN infeasible for today's Big Data analysis applications. To overcome the cost drawbacks of the known data mining methods, several distributed-environment alternatives have emerged, and several K-NN based classification algorithms have recently been proposed that are suitable for distributed computing environments and applicable to emerging data analysis needs. In this work, a new algorithm, CR-KNN, is proposed, which improves the classification accuracy of the well-known K-Nearest Neighbor (K-NN) algorithm by exploiting a center representation of the instances belonging to the different data classes. The proposed algorithm bases its decision on the data class representation that is closest to the test instance. The CR-KNN algorithm was tested on several real datasets from different application areas. The performance results acquired after extensive experiments are presented in this paper and show that the proposed CR-KNN algorithm is a competitive alternative to other studies recently proposed in the literature.

Keywords - Classification, K-Nearest Neighbor, Center Representation, Variance

I. INTRODUCTION

In the current information age, businesses tend to base their policies and forecasting on the analysis and processing of continuously growing volumes of data retrieved from many different online streaming sources [1]. To achieve precise business decisions, data coming from different environments (e.g. social media tools, data warehouses, cloud storage, etc.) need to be analyzed using efficient data mining algorithms.
Analyzing data coming from a high number of data sources means working with huge amounts of unstructured raw data that are big in terms of volume, variety and velocity of acquisition. The process of analyzing such data has recently become a popular research area known as Big Data analysis [2].

One of the crucial data mining tasks is classification. Classification, the task of assigning data instances to one of several predefined data classes, is a pervasive problem that encompasses many diverse applications [3]. For classification in the big data age, centralized techniques lack the low classification delay that is vital to cope with the high-velocity data streams of big data. To meet this timing requirement of modern data classification, several distributed environments have been tested and used by different researchers; one of them is Apache Hadoop together with Google's MapReduce framework [4].

K-Nearest Neighbor classification (K-NN) has been one of the most popular classification algorithms [5]. Nevertheless, the classical K-NN algorithm suffers from high computation costs, which makes it inconvenient for modern big data analysis that requires rapid and accurate classification results. The classical K-NN algorithm calculates the distances between the test instance to be classified and all of the instances in the training data set, and finds the closest K training instances. It then applies majority voting, i.e. it selects the data class with the maximum number of instances among the K selected neighbors. This classification strategy, based purely on individual instance distances, makes the classical K-NN algorithm perform weakly in terms of classification accuracy.
On the other hand, K-NN's individual-instance-distance strategy makes K-NN a strong candidate for distributed data classification, which is the basis of achieving acceptably low classification delays while classifying big data.

ISSN : Vol. 8 No. 10 Oct
Taking into account K-NN's suitability to distributed environments, many K-NN based studies that try to improve the K-NN algorithm's performance have been proposed in the literature [6-16] to meet modern data mining needs.

In this paper, a new K-Nearest Neighbor classification algorithm, hereafter referred to as the Center Representation KNN algorithm (CR-KNN), is proposed. The CR-KNN algorithm aims to improve the classification accuracy of the classical K-NN algorithm. Its main idea is to base the classification decision on class representations obtained by calculating the centroids of the training instances that belong to the same class among the K closest instances detected by the K-NN algorithm. In other words, the classical majority voting step of K-NN is replaced by a class-representation-based decision, which is also computationally efficient. Furthermore, CR-KNN is further improved by introducing a variance effect into the distance measurement, so that the algorithm can adapt to the varying data distributions of different datasets.

The rest of this paper is organized as follows: Section II presents the proposed CR-KNN algorithm in detail. The experimental setup and the achieved performance results are presented in Section III. Finally, Section IV concludes the paper and states the future work.

II. THE PROPOSED CR-KNN ALGORITHM

A. The Classical K-NN Algorithm

In a classification task, if a data instance is considered as a vector of feature values, then a data instance i can be denoted as v_i, a vector containing p features. Classifying data then means detecting the correct data classes of n test instances by comparing their similarity to m training instances using their feature values.
The classical K-NN algorithm calculates the distances of a test instance to be classified to all of the data instances in the training data set. Having computed the distances, the classical K-NN takes the K minimum distances and the training instances that produced them; these K training instances are the K nearest neighbors of the tested instance. After detecting the K nearest neighbors, the classical K-NN determines which data class has the largest number of instances among them, a process known as majority voting. As this description shows, the classification decision is based on the individual distances between the test and training instances. Nevertheless, when a data set with a high number of instances and a high number of features per instance needs to be classified, the classical K-NN algorithm's classification accuracy becomes lower than that of its competitors, like K-Means classification [17]. Hence, to become a classification algorithm suitable for modern data analysis needs, K-NN's classification accuracy should be improved. Taking these needs into account, the CR-KNN algorithm is proposed.

B. The Proposed CR-KNN Algorithm

The proposed CR-KNN algorithm improves the classical K-NN algorithm with contributions in the distance measurement and classification decision steps. The classical Euclidean distance between a test instance v_t and a training instance v_i with p features is given in (1):

d(v_t, v_i) = \sqrt{ \sum_{f=1}^{p} (v_{t,f} - v_{i,f})^2 }    (1)

Like classical K-NN, the first step of the CR-KNN algorithm is to calculate the distances between the test instance and all training instances using (1). After computing all the distances, CR-KNN finds the K nearest neighbors with the minimum distances to the tested instance.
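As an illustration of the classical pipeline described above, the following sketch computes the Euclidean distance of (1) and applies majority voting. The paper's implementation is in Java; this is a minimal Python rendering, and the (feature-vector, class-label) data layout is an assumption for illustration.

```python
import math
from collections import Counter

def euclidean(a, b):
    # Eq. (1): Euclidean distance between two equal-length feature vectors
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_classify(test, training, k):
    # training is a list of (features, label) pairs (an assumed layout).
    # Keep the K training instances closest to the test instance.
    neighbors = sorted(training, key=lambda inst: euclidean(test, inst[0]))[:k]
    # Majority voting: the class with the most members among the K neighbors wins
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]
```

For example, with 1-D training points clustered around 0 (class A) and around 1 (class B), a test point near one cluster is assigned that cluster's label by the vote of its three nearest neighbors.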
At this point, unlike classical K-NN, the proposed CR-KNN determines which data classes exist among the K nearest neighbors and groups the neighbors according to the data classes they belong to. After this class detection, CR-KNN finds the center of the training instances of each detected class. To clarify the idea, consider the following simplified example of detected K nearest neighbors:
<ts 9, tr 3, 0.11, A>
<ts 9, tr 11, 0.14, B>
<ts 9, tr 27, 0.15, B>
<ts 9, tr 23, 0.12, C>
<ts 9, tr 42, 0.12, A>

In the example given above, the 5 nearest neighbors are from classes A, B and C, where tr 3 and tr 42 are from class A, tr 11 and tr 27 are from class B, and tr 23 is from class C. The listed training instances of each class are then used to represent that class by their mean (in other words, the center). The mean c_j of a class j with N instances among the K nearest neighbors is calculated using (2); in other words, centroids are found that represent the classes:

c_j = \frac{1}{N} \sum_{i=1}^{N} v_i    (2)

Upon calculating the means, CR-KNN calculates the distances of the test instance to the calculated means for all classes j and takes the minimum, using (3). As the classification decision, which is the main contribution of CR-KNN, the algorithm relies on the minimum T over all centroids j rather than on classical majority voting:

T = \min_j \left( \frac{d(v_t, c_j)}{\sigma_j^2} \right)    (3)

The second contribution, the variance effect, can also be seen in (3). The variance is defined in (4):

\sigma_j^2 = \frac{1}{N} \sum_{i=1}^{N} d(v_i, c_j)^2    (4)

Basing the decision on the distance of the test instance to the centroids improves the classification accuracy of the classical K-NN by eliminating misclassifications such as the following. Consider the simple example given above. Because of majority voting, the classical K-NN algorithm will conclude that test instance 9 belongs to class A, since classes A and B have an equal number of instances among the K nearest neighbors and class A contains the instance with the closest proximity to the test instance. Nevertheless, the same class A also has an instance that is farther away from test instance 9 than the class B instances are.
To give the classes equal chances in the classification decision, CR-KNN represents the classes A, B and C of the K nearest neighbors by centroids and bases the classification decision on the distance of the test instance to the class centroids rather than on the individual class members. In this way, if class B has a centroid closer to the test instance, CR-KNN can correctly decide that the test instance belongs to class B.

Furthermore, CR-KNN addresses another classification problem: datasets from different application areas may have data distributions that are hard to deal with. The main difficulty is a data distribution in which outlying instances of one class lie in close proximity to the instances of other classes. In such cases, the classical K-NN algorithm can easily produce false classification decisions. To compensate for the purely distance-based decisions of K-NN, the proposed CR-KNN algorithm includes in the classification decision the effect of how far the instances are spread around their centroids, by incorporating the variance. This compensation is achieved by dividing the calculated distances to the centroids by the variance, so that a smaller variance produces a stronger membership strength while a greater variance produces a weaker membership strength. The experimental results presented in the next section demonstrate that both proposed improvements enhance the performance of the classical K-NN algorithm on datasets of different natures.
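Since the printed equations did not fully survive transcription, the following Python sketch of the decision step assumes the score of each class is its centroid distance divided by its variance (falling back to the raw distance when a class has a single neighbor and therefore zero variance); the (features, label) data layout and function names are illustrative, and the paper's own implementation is in Java.

```python
import math
from collections import defaultdict

def euclidean(a, b):
    # Euclidean distance between two equal-length feature vectors
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cr_knn_classify(test, training, k):
    # Step 1: find the K nearest neighbors, exactly as in classical K-NN
    neighbors = sorted(training, key=lambda inst: euclidean(test, inst[0]))[:k]
    # Step 2: group the neighbors by the class they belong to
    groups = defaultdict(list)
    for features, label in neighbors:
        groups[label].append(features)
    # Step 3: represent each class by the centroid of its neighbors and score it
    best_label, best_score = None, float("inf")
    for label, members in groups.items():
        n = len(members)
        centroid = tuple(sum(col) / n for col in zip(*members))
        # Variance: mean squared distance of the members around their centroid
        variance = sum(euclidean(m, centroid) ** 2 for m in members) / n
        dist = euclidean(test, centroid)
        # Distance-to-centroid decision, compensated by the class variance;
        # a singleton class has zero variance, so use the raw distance instead
        score = dist / variance if variance > 0 else dist
        if score < best_score:
            best_label, best_score = label, score
    return best_label
```

Note that, unlike majority voting, a class represented by a single nearby neighbor can still win here if its centroid lies closest to the test instance.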
III. EXPERIMENTAL SETTING AND THE RESULTS

The proposed CR-KNN algorithm was implemented in Sun Java JDK 1.8 [18]. To evaluate and compare the performance of the proposed algorithm, several datasets from the UCI machine learning repository [19] were used. A summary of the used datasets is given in Table I. While selecting the datasets, factors such as being a binary or multi-class classification problem, various instance counts and different data distributions were considered, so that the strengths and weaknesses of the proposed CR-KNN algorithm could be analyzed. It is also worth mentioning that the experimental model was validated using 10-fold cross-validation, and the presented classification accuracy values are the averages of the results over the folds.

TABLE I. REAL DATASETS USED IN THE EXPERIMENTS

Dataset      Instances   Features   Classes
ionosphere
wdbc
satimage
pendigits

A. Experimental Results

The results presented in this section demonstrate the classification accuracy of the proposed CR-KNN algorithm and of two other recent classification schemes, SR-KNN [14] and KMEANS-MOD [17]. The classification accuracy is defined as the ratio of the number of correct classification decisions to the total number of classification decisions over a dataset's testing instances. SR-KNN, proposed in [14], modifies the KNN algorithm with a self-representation of the data clusters; its main aim is to learn an optimal k value in KNN to improve classification accuracy. To support their claim, the authors compare their results with three other algorithms, named kNNC, LMMN and ADNN, which are summarized in [14]. The results presented in [14] show better performance compared to these three algorithms.
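The accuracy metric and cross-validation protocol just described can be sketched as below; the interleaved fold split and the classify(features, training) interface are assumptions for illustration, since the paper does not state how the folds were drawn.

```python
def accuracy(predicted, actual):
    # Fraction of decisions whose predicted class matches the true class
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)

def cross_validate(classify, data, folds=10):
    # Average accuracy over `folds` train/test splits of (features, label) pairs;
    # fold f holds out every folds-th instance starting at f (an assumed split)
    scores = []
    for f in range(folds):
        test = [inst for i, inst in enumerate(data) if i % folds == f]
        train = [inst for i, inst in enumerate(data) if i % folds != f]
        predictions = [classify(features, train) for features, _ in test]
        scores.append(accuracy(predictions, [label for _, label in test]))
    return sum(scores) / folds
```

Any classifier with the assumed interface can be plugged in; a well-separated toy dataset should score an average accuracy of 1.0 across the ten folds.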
KMEANS-MOD, proposed in [17], is on the other hand an improved K-Means clustering based algorithm which also benefits from the effect of variance and introduces a membership strength as the metric on which to base its classification decisions. Since the algorithm proposed in [17] is K-Means based, its classification idea depends entirely on strong class representations using whole-class data centers.

Table II provides the classification accuracy results, which demonstrate the effect of adding the variance effect to the distance calculation of CR-KNN for different K values.

TABLE II. CR-KNN WITHOUT THE VARIANCE EFFECT VS. CR-KNN WITH THE VARIANCE EFFECT

             CR-KNN without Variance Effect    CR-KNN with Variance Effect
             K=5    K=7    K=9                 K=5    K=7    K=9
pendigits
satimage
ionosphere
wdbc

By investigating the results, it can be observed that the variance effect improves the classification accuracy of CR-KNN when the dataset used in the classification shows outlying instances in the data distribution of the class instances, as explained in Section II. What is more, the results also show that adding the variance to the distance calculation does not degrade performance on the tested datasets that have a clearer separation among the class instances. Furthermore, it can be observed that reasonably small K values are enough to achieve improved classification accuracy with the proposed CR-KNN algorithm.

The proposed CR-KNN algorithm was also compared against the classification schemes SR-KNN [14] and KMEANS-MOD [17] in terms of classification accuracy, to investigate how the proposed CR-KNN algorithm compares against both a recent nearest-neighbor based classification scheme and a completely centroid representation based classification approach. These comparative results can be observed in Table III.

TABLE III. CR-KNN CLASSIFICATION ACCURACY PERFORMANCE COMPARISONS

             KMEANS-MOD   Classical-KNN   SR-KNN   CR-KNN
pendigits
satimage
ionosphere
wdbc

The comparative results presented in Table III illustrate that the proposed CR-KNN improves on the performance of the classical KNN algorithm for the majority of the tested datasets. Against the other K-NN based algorithm, SR-KNN [14], CR-KNN shows better classification accuracy for all datasets, indicating that the variance effect and the centroid representation of the data classes provide the expected improvement in classification accuracy. When compared with the K-Means classification based KMEANS-MOD [17], which is expected to have a better complete data class center representation at the cost of higher computational complexity, the proposed CR-KNN demonstrates a similar accuracy performance, if not better.

IV. CONCLUSION

In this paper, a new nearest-neighbor based classification algorithm is presented. The proposed algorithm is designed to improve the classical K-NN classification scheme with two contributions. The proposed algorithm was tested with extensive experiments conducted on real datasets of different natures. The experimental results show that the proposed algorithm improves the classification accuracy for the majority of the datasets. Also, the results demonstrate that the proposed algorithm competes successfully against other recently proposed classification schemes. Future work includes focusing on the classification delay performance and testing the proposed algorithm on other datasets with higher numbers of instances, on the order of millions.
Another planned future work is further improving the distance calculation by replacing the distance concept with a novel similarity measure capable of providing the similarity of mixed data, including both continuous and categorical features.

ACKNOWLEDGMENT

The author of this paper thanks Prof. Dr. Ali HAYDAR and Assoc. Prof. Dr. Ibrahim ERSAN from Girne American University, Department of Computer Engineering, for their continuous support and valuable discussions on this study.

REFERENCES

[1] K. Schwab, The Fourth Industrial Revolution, Crown Business.
[2] D. Singh and C. K. Reddy, "A survey on platforms for big data analytics," Journal of Big Data, vol. 1, no. 8.
[3] P. Tan, M. Steinbach and V. Kumar, Introduction to Data Mining, 1st ed., Reading, MA: Addison-Wesley.
[4] J. Dean and S. Ghemawat, "MapReduce: A Flexible Data Processing Tool," Communications of the ACM, vol. 53, no. 1, pp. 72-77.
[5] X. Wu et al., "Top 10 algorithms in data mining," Knowledge and Information Systems, vol. 14, no. 1, pp. 1-37.
[6] A. Fahad et al., "A Survey of Clustering Algorithms for Big Data: Taxonomy and Empirical Analysis," IEEE Trans. on Emerging Topics in Computing, vol. 2, no. 3.
[7] S. Zhang, M. Zong and D. Cheng, "Learning k for KNN Classification," ACM Transactions on Intelligent Systems and Technology, vol. 8, no. 3, pp. 43:1-19.
[8] K. Niu, F. Zhao and S. Zhang, "A Fast Classification Algorithm for Big Data Based on KNN," Journal of Applied Sciences, vol. 13, no. 12.
[9] A. Bifet, J. Read, B. Pfahringer and G. Holmes, "Efficient Data Stream Classification via Probabilistic Adaptive Windows," in Proc. 28th Annual ACM Symposium on Applied Computing, 2013.
[10] S. S. Labib, "A Comparative Study to Classify Big Data Using Fuzzy Techniques," in Proc. 5th International Conference on Electronic Devices, Systems and Applications (ICEDSA).
[11] M. El Bakry, S. Safwat and O. Hegazy, "A MapReduce Fuzzy Technique of Big Data Classification," in Proc. SAI Computing Conference 2016.
[12] B. Quost and T. Denoeux, "Clustering and classification of fuzzy data using the fuzzy EM algorithm," Fuzzy Sets and Systems, vol. 286.
[13] Z. Deng, X. Zhu, D. Cheng, M. Zong and S. Zhang, "Efficient kNN classification algorithm for big data," Neurocomputing, vol. 195.
[14] S. Zhang, D. Cheng, M. Zong and L. Gao, "Self representation nearest neighbour search for classification," Neurocomputing, vol. 195, 2016.
[15] G. Song, J. Rochas, L. El Beze, F. Huet and F. Magoules, "K Nearest Neighbour Joins for Big Data on MapReduce: A Theoretical and Experimental Analysis," IEEE Trans. on Knowledge and Data Engineering, vol. 28, no. 9.
[16] J. Maillo, S. Ramirez, I. Triguero and F. Herrera, "kNN-IS: An Iterative Spark-based design of the k-Nearest Neighbours classifier for big data," Knowledge-Based Systems, vol. 117, pp. 3-15.
[17] T. Tulgar, A. Haydar and İ. Erşan, "Data Distribution Aware Classification Algorithm based on K-Means," International Journal of Advanced Computer Science and Applications, vol. 8, no. 9, September.
[18] J. Gosling, B. Joy, G. Steele, G. Bracha and A. Buckley. (2017, Aug. 1). The Java Language Specification - Java SE 8 Edition [Online]. Available:
[19] UCI Center for Machine Learning and Intelligent Systems. (2017, Aug. 1). UC Irvine Machine Learning Repository [Online]. Available:
More informationElection Analysis and Prediction Using Big Data Analytics
Election Analysis and Prediction Using Big Data Analytics Omkar Sawant, Chintaman Taral, Roopak Garbhe Students, Department Of Information Technology Vidyalankar Institute of Technology, Mumbai, India
More informationAn improved MapReduce Design of Kmeans for clustering very large datasets
An improved MapReduce Design of Kmeans for clustering very large datasets Amira Boukhdhir Laboratoire SOlE Higher Institute of management Tunis Tunis, Tunisia Boukhdhir _ amira@yahoo.fr Oussama Lachiheb
More informationEfficient Algorithm for Frequent Itemset Generation in Big Data
Efficient Algorithm for Frequent Itemset Generation in Big Data Anbumalar Smilin V, Siddique Ibrahim S.P, Dr.M.Sivabalakrishnan P.G. Student, Department of Computer Science and Engineering, Kumaraguru
More informationIncremental K-means Clustering Algorithms: A Review
Incremental K-means Clustering Algorithms: A Review Amit Yadav Department of Computer Science Engineering Prof. Gambhir Singh H.R.Institute of Engineering and Technology, Ghaziabad Abstract: Clustering
More informationCombining Review Text Content and Reviewer-Item Rating Matrix to Predict Review Rating
Combining Review Text Content and Reviewer-Item Rating Matrix to Predict Review Rating Dipak J Kakade, Nilesh P Sable Department of Computer Engineering, JSPM S Imperial College of Engg. And Research,
More informationPERFORMANCE ANALYSIS OF DATA MINING TECHNIQUES FOR REAL TIME APPLICATIONS
PERFORMANCE ANALYSIS OF DATA MINING TECHNIQUES FOR REAL TIME APPLICATIONS Pradeep Nagendra Hegde 1, Ayush Arunachalam 2, Madhusudan H C 2, Ngabo Muzusangabo Joseph 1, Shilpa Ankalaki 3, Dr. Jharna Majumdar
More informationIteration Reduction K Means Clustering Algorithm
Iteration Reduction K Means Clustering Algorithm Kedar Sawant 1 and Snehal Bhogan 2 1 Department of Computer Engineering, Agnel Institute of Technology and Design, Assagao, Goa 403507, India 2 Department
More informationCHAPTER VII INDEXED K TWIN NEIGHBOUR CLUSTERING ALGORITHM 7.1 INTRODUCTION
CHAPTER VII INDEXED K TWIN NEIGHBOUR CLUSTERING ALGORITHM 7.1 INTRODUCTION Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called cluster)
More informationData Mining Classification: Alternative Techniques. Lecture Notes for Chapter 4. Instance-Based Learning. Introduction to Data Mining, 2 nd Edition
Data Mining Classification: Alternative Techniques Lecture Notes for Chapter 4 Instance-Based Learning Introduction to Data Mining, 2 nd Edition by Tan, Steinbach, Karpatne, Kumar Instance Based Classifiers
More informationLorentzian Distance Classifier for Multiple Features
Yerzhan Kerimbekov 1 and Hasan Şakir Bilge 2 1 Department of Computer Engineering, Ahmet Yesevi University, Ankara, Turkey 2 Department of Electrical-Electronics Engineering, Gazi University, Ankara, Turkey
More informationDomain Specific Search Engine for Students
Domain Specific Search Engine for Students Domain Specific Search Engine for Students Wai Yuen Tang The Department of Computer Science City University of Hong Kong, Hong Kong wytang@cs.cityu.edu.hk Lam
More informationUnsupervised Learning and Clustering
Unsupervised Learning and Clustering Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Spring 2009 CS 551, Spring 2009 c 2009, Selim Aksoy (Bilkent University)
More informationPerformance Analysis of Video Data Image using Clustering Technique
Indian Journal of Science and Technology, Vol 9(10), DOI: 10.17485/ijst/2016/v9i10/79731, March 2016 ISSN (Print) : 0974-6846 ISSN (Online) : 0974-5645 Performance Analysis of Video Data Image using Clustering
More informationComparison of FP tree and Apriori Algorithm
International Journal of Engineering Research and Development e-issn: 2278-067X, p-issn: 2278-800X, www.ijerd.com Volume 10, Issue 6 (June 2014), PP.78-82 Comparison of FP tree and Apriori Algorithm Prashasti
More informationDynamic Clustering of Data with Modified K-Means Algorithm
2012 International Conference on Information and Computer Networks (ICICN 2012) IPCSIT vol. 27 (2012) (2012) IACSIT Press, Singapore Dynamic Clustering of Data with Modified K-Means Algorithm Ahamed Shafeeq
More informationAn Intelligent Clustering Algorithm for High Dimensional and Highly Overlapped Photo-Thermal Infrared Imaging Data
An Intelligent Clustering Algorithm for High Dimensional and Highly Overlapped Photo-Thermal Infrared Imaging Data Nian Zhang and Lara Thompson Department of Electrical and Computer Engineering, University
More informationWeighting and selection of features.
Intelligent Information Systems VIII Proceedings of the Workshop held in Ustroń, Poland, June 14-18, 1999 Weighting and selection of features. Włodzisław Duch and Karol Grudziński Department of Computer
More informationCS178: Machine Learning and Data Mining. Complexity & Nearest Neighbor Methods
+ CS78: Machine Learning and Data Mining Complexity & Nearest Neighbor Methods Prof. Erik Sudderth Some materials courtesy Alex Ihler & Sameer Singh Machine Learning Complexity and Overfitting Nearest
More informationMine Blood Donors Information through Improved K- Means Clustering Bondu Venkateswarlu 1 and Prof G.S.V.Prasad Raju 2
Mine Blood Donors Information through Improved K- Means Clustering Bondu Venkateswarlu 1 and Prof G.S.V.Prasad Raju 2 1 Department of Computer Science and Systems Engineering, Andhra University, Visakhapatnam-
More informationMonika Maharishi Dayanand University Rohtak
Performance enhancement for Text Data Mining using k means clustering based genetic optimization (KMGO) Monika Maharishi Dayanand University Rohtak ABSTRACT For discovering hidden patterns and structures
More informationAn Empirical Study of Lazy Multilabel Classification Algorithms
An Empirical Study of Lazy Multilabel Classification Algorithms E. Spyromitros and G. Tsoumakas and I. Vlahavas Department of Informatics, Aristotle University of Thessaloniki, 54124 Thessaloniki, Greece
More informationMulti-label classification using rule-based classifier systems
Multi-label classification using rule-based classifier systems Shabnam Nazmi (PhD candidate) Department of electrical and computer engineering North Carolina A&T state university Advisor: Dr. A. Homaifar
More informationAnalysis on the technology improvement of the library network information retrieval efficiency
Available online www.jocpr.com Journal of Chemical and Pharmaceutical Research, 2014, 6(6):2198-2202 Research Article ISSN : 0975-7384 CODEN(USA) : JCPRC5 Analysis on the technology improvement of the
More informationResearch Article Apriori Association Rule Algorithms using VMware Environment
Research Journal of Applied Sciences, Engineering and Technology 8(2): 16-166, 214 DOI:1.1926/rjaset.8.955 ISSN: 24-7459; e-issn: 24-7467 214 Maxwell Scientific Publication Corp. Submitted: January 2,
More informationI. INTRODUCTION II. RELATED WORK.
ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: A New Hybridized K-Means Clustering Based Outlier Detection Technique
More informationIntroduction to Artificial Intelligence
Introduction to Artificial Intelligence COMP307 Machine Learning 2: 3-K Techniques Yi Mei yi.mei@ecs.vuw.ac.nz 1 Outline K-Nearest Neighbour method Classification (Supervised learning) Basic NN (1-NN)
More informationFigure 1 shows unstructured data when plotted on the co-ordinate axis
7th International Conference on Computational Intelligence, Communication Systems and Networks (CICSyN) Key Frame Extraction and Foreground Modelling Using K-Means Clustering Azra Nasreen Kaushik Roy Kunal
More informationClassifier Inspired Scaling for Training Set Selection
Classifier Inspired Scaling for Training Set Selection Walter Bennette DISTRIBUTION A: Approved for public release: distribution unlimited: 16 May 2016. Case #88ABW-2016-2511 Outline Instance-based classification
More informationClustering & Classification (chapter 15)
Clustering & Classification (chapter 5) Kai Goebel Bill Cheetham RPI/GE Global Research goebel@cs.rpi.edu cheetham@cs.rpi.edu Outline k-means Fuzzy c-means Mountain Clustering knn Fuzzy knn Hierarchical
More informationA Cloud Framework for Big Data Analytics Workflows on Azure
A Cloud Framework for Big Data Analytics Workflows on Azure Fabrizio MAROZZO a, Domenico TALIA a,b and Paolo TRUNFIO a a DIMES, University of Calabria, Rende (CS), Italy b ICAR-CNR, Rende (CS), Italy Abstract.
More informationImproving K-NN Internet Traffic Classification Using Clustering and Principle Component Analysis
Bulletin of Electrical Engineering and Informatics ISSN: 2302-9285 Vol. 6, No. 2, June 2017, pp. 159~165, DOI: 10.11591/eei.v6i2.608 159 Improving K-NN Internet Traffic Classification Using Clustering
More informationUday Kumar Sr 1, Naveen D Chandavarkar 2 1 PG Scholar, Assistant professor, Dept. of CSE, NMAMIT, Nitte, India. IJRASET 2015: All Rights are Reserved
Implementation of K-Means Clustering Algorithm in Hadoop Framework Uday Kumar Sr 1, Naveen D Chandavarkar 2 1 PG Scholar, Assistant professor, Dept. of CSE, NMAMIT, Nitte, India Abstract Drastic growth
More informationSchema Matching with Inter-Attribute Dependencies Using VF2 Approach
International Journal of Emerging Engineering Research and Technology Volume 2, Issue 3, June 2014, PP 14-20 ISSN 2349-4395 (Print) & ISSN 2349-4409 (Online) Schema Matching with Inter-Attribute Dependencies
More informationEfficient, Scalable, and Provenance-Aware Management of Linked Data
Efficient, Scalable, and Provenance-Aware Management of Linked Data Marcin Wylot 1 Motivation and objectives of the research The proliferation of heterogeneous Linked Data on the Web requires data management
More informationMounica B, Aditya Srivastava, Md. Faisal Alam
International Journal of Scientific Research in Computer Science, Engineering and Information Technology 2017 IJSRCSEIT Volume 2 Issue 3 ISSN : 2456-3307 Clustering of large datasets using Hadoop Ecosystem
More informationKeywords: clustering algorithms, unsupervised learning, cluster validity
Volume 6, Issue 1, January 2016 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Clustering Based
More informationNearest Cluster Classifier
Nearest Cluster Classifier Hamid Parvin, Moslem Mohamadi, Sajad Parvin, Zahra Rezaei, and Behrouz Minaei Nourabad Mamasani Branch, Islamic Azad University, Nourabad Mamasani, Iran hamidparvin@mamasaniiau.ac.ir,
More informationMass Classification Method in Mammogram Using Fuzzy K-Nearest Neighbour Equality
Mass Classification Method in Mammogram Using Fuzzy K-Nearest Neighbour Equality Abstract: Mass classification of objects is an important area of research and application in a variety of fields. In this
More informationInternational Journal of Data Mining & Knowledge Management Process (IJDKP) Vol.7, No.3, May Dr.Zakea Il-Agure and Mr.Hicham Noureddine Itani
LINK MINING PROCESS Dr.Zakea Il-Agure and Mr.Hicham Noureddine Itani Higher Colleges of Technology, United Arab Emirates ABSTRACT Many data mining and knowledge discovery methodologies and process models
More informationParallel HITS Algorithm Implemented Using HADOOP GIRAPH Framework to resolve Big Data Problem
I J C T A, 9(41) 2016, pp. 1235-1239 International Science Press Parallel HITS Algorithm Implemented Using HADOOP GIRAPH Framework to resolve Big Data Problem Hema Dubey *, Nilay Khare *, Alind Khare **
More informationEnhanced Performance of Search Engine with Multitype Feature Co-Selection of Db-scan Clustering Algorithm
Enhanced Performance of Search Engine with Multitype Feature Co-Selection of Db-scan Clustering Algorithm K.Parimala, Assistant Professor, MCA Department, NMS.S.Vellaichamy Nadar College, Madurai, Dr.V.Palanisamy,
More informationINTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY
INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY A PATH FOR HORIZING YOUR INNOVATIVE WORK DISTRIBUTED FRAMEWORK FOR DATA MINING AS A SERVICE ON PRIVATE CLOUD RUCHA V. JAMNEKAR
More informationComparative Study of Clustering Algorithms using R
Comparative Study of Clustering Algorithms using R Debayan Das 1 and D. Peter Augustine 2 1 ( M.Sc Computer Science Student, Christ University, Bangalore, India) 2 (Associate Professor, Department of Computer
More informationEfficient Multiclass Classification Model For Imbalanced Data.
www.ijecs.in International Journal Of Engineering And Computer Science ISSN:2319-7242 Volume 4 Issue 5 May 2015, Page No. 11783-11787 Efficient Multiclass Classification Model For Imbalanced Data. Mr.
More informationClassifying Twitter Data in Multiple Classes Based On Sentiment Class Labels
Classifying Twitter Data in Multiple Classes Based On Sentiment Class Labels Richa Jain 1, Namrata Sharma 2 1M.Tech Scholar, Department of CSE, Sushila Devi Bansal College of Engineering, Indore (M.P.),
More informationAn Efficient Clustering Method for k-anonymization
An Efficient Clustering Method for -Anonymization Jun-Lin Lin Department of Information Management Yuan Ze University Chung-Li, Taiwan jun@saturn.yzu.edu.tw Meng-Cheng Wei Department of Information Management
More informationFrequent Item Set using Apriori and Map Reduce algorithm: An Application in Inventory Management
Frequent Item Set using Apriori and Map Reduce algorithm: An Application in Inventory Management Kranti Patil 1, Jayashree Fegade 2, Diksha Chiramade 3, Srujan Patil 4, Pradnya A. Vikhar 5 1,2,3,4,5 KCES
More informationWearable Technology Orientation Using Big Data Analytics for Improving Quality of Human Life
Wearable Technology Orientation Using Big Data Analytics for Improving Quality of Human Life Ch.Srilakshmi Asst Professor,Department of Information Technology R.M.D Engineering College, Kavaraipettai,
More informationA Novel Approach for Weighted Clustering
A Novel Approach for Weighted Clustering CHANDRA B. Indian Institute of Technology, Delhi Hauz Khas, New Delhi, India 110 016. Email: bchandra104@yahoo.co.in Abstract: - In majority of the real life datasets,
More informationUnsupervised Learning and Clustering
Unsupervised Learning and Clustering Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Spring 2008 CS 551, Spring 2008 c 2008, Selim Aksoy (Bilkent University)
More informationAn Enhanced K-Medoid Clustering Algorithm
An Enhanced Clustering Algorithm Archna Kumari Science &Engineering kumara.archana14@gmail.com Pramod S. Nair Science &Engineering, pramodsnair@yahoo.com Sheetal Kumrawat Science &Engineering, sheetal2692@gmail.com
More informationImproving Results and Performance of Collaborative Filtering-based Recommender Systems using Cuckoo Optimization Algorithm
Improving Results and Performance of Collaborative Filtering-based Recommender Systems using Cuckoo Optimization Algorithm Majid Hatami Faculty of Electrical and Computer Engineering University of Tabriz,
More informationAn Improved Apriori Algorithm for Association Rules
Research article An Improved Apriori Algorithm for Association Rules Hassan M. Najadat 1, Mohammed Al-Maolegi 2, Bassam Arkok 3 Computer Science, Jordan University of Science and Technology, Irbid, Jordan
More informationA Survey Of Issues And Challenges Associated With Clustering Algorithms
International Journal for Science and Emerging ISSN No. (Online):2250-3641 Technologies with Latest Trends 10(1): 7-11 (2013) ISSN No. (Print): 2277-8136 A Survey Of Issues And Challenges Associated With
More information