MODEL FUZZY K-NEAREST NEIGHBOR WITH LOCAL MEAN FOR PATTERN RECOGNITION
International Journal of Computer Engineering & Technology (IJCET), Volume 9, Issue 2, March-April 2018, Article ID: IJCET_09_02_017. IAEME Publication, editor@iaeme.com

Hafizh Al-Kautsar Aidilof, Muhammad Zarlis, Syahril Efendi
Department of Computer Science, Faculty of Computer Science and Information Technology, Universitas Sumatera Utara, Medan, Indonesia

ABSTRACT

K-Nearest Neighbor is one of the top ten algorithms in data mining (Wu, 2009). As the method has developed, K-Nearest Neighbor has been combined with a fuzzy approach: Fuzzy K-Nearest Neighbor adds membership degrees, alongside the Euclidean distance, as a measure of how strongly a data point belongs to each target class, and is known to improve classification results. Beyond the fuzzy extension, K-Nearest Neighbor has also been modified at the class-determination stage with a Local Mean rule. In Local Mean KNN, a local mean vector of the test data is calculated in each target class, so the Euclidean distance is computed not only between data points but also between the test point and each target class. In this study, we divide the local mean vector of LMKNN by the membership degree that Fuzzy K-Nearest Neighbor produces for each class, obtaining a smaller vector value. This widens the gap between a data point's tendency toward its own class and toward the other classes. Testing was performed on the Iris dataset with k = 3 nearest neighbors taken in each target class. The accuracies obtained on the test data of the three classes are 93.3%, 86.6% and 100%, for an overall average of 93.3%.

Key words: FKNN, LMKNN, Pattern Recognition.

Cite this Article: Hafizh Al-Kautsar Aidilof, Muhammad Zarlis, Syahril Efendi, Model Fuzzy K-Nearest Neighbor with Local Mean for Pattern Recognition.
International Journal of Computer Engineering and Technology, 9(2), 2018.

1. INTRODUCTION

K-nearest neighbor is known as one of the most powerful data mining algorithms for solving classification problems (Wu, 2009). Beyond classification, k-nearest neighbor is also widely applied in pattern recognition and text categorization (Bhatia & Vandana, 2010; Jabbar et al., 2013; Sánchez et al., 2014). Among its advantages, k-nearest neighbor is highly nonlinear, fast, simple, and easy to understand and apply (Wang et al., 2007; García-Pedrajas & Ortiz-Boyer, 2009; Pan et al., 2017; Ougiaroglou & Evangelidis, 2012; Song et al., 2017). Alongside these advantages, k-nearest neighbor also has weaknesses: it must use all training data to identify or classify, it is susceptible to data with high dimensionality and wide variable ranges, its computation is slow, and it cannot handle missing values (Rosyid et al., 2013; Raikwal, 2012).

Various studies have been conducted to improve the k-nearest neighbor. One approach applies fuzzy logic to determine the membership degree of a test point with respect to each target class. Another modifies how the target class is determined in the final stage of classification or pattern recognition, replacing the majority vote with a local mean. Target-class determination by majority vote assigns a new (test) point to whichever class dominates among the k neighbors taken, ignoring the similarity of traits or patterns between data points residing in different classes. The local mean approach, in contrast, also takes into account the distance between the test data and the training data of each target class, which makes the process fairer.

2. THEORETICAL BASIS

2.1. Fuzzy K-Nearest Neighbor

Zbancioc (2012), quoting Keller, writes that the motivation for Fuzzy K-Nearest Neighbor is that in classic K-Nearest Neighbor all vectors carry an equal degree in the decision process, so there is no measure of how strongly a data point is tied to a target class. Through the fuzzification step, the data gain information describing how much each vector matters in the final decision: a vector with a high fuzzy membership degree carries more weight in determining the target class.
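The classic majority-vote rule that these refinements modify can be sketched as follows (a minimal illustration, not the authors' code; the function and toy data are my own):

```python
import math
from collections import Counter

def euclidean(a, b):
    # Euclidean distance between two feature vectors
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_majority(train, labels, test, k=3):
    # Classic k-NN: take the k training points nearest to the test
    # point and assign the class that dominates among them.
    order = sorted(range(len(train)), key=lambda i: euclidean(train[i], test))
    votes = Counter(labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Toy example: two well-separated 2-D classes
train = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.8), (4.9, 5.1)]
labels = ["a", "a", "a", "b", "b", "b"]
print(knn_majority(train, labels, (1.1, 1.0)))  # -> a
```

Note that only the neighbor counts decide the class here; the distances themselves, once sorted, no longer matter, which is exactly the weakness the fuzzy and local-mean variants address.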
The Fuzzy K-Nearest Neighbor algorithm is described as follows (Beyan, 2014):

1. Determine the training data and the test data.
2. Normalize the data if the value ranges differ too widely.
3. Calculate the distance between the test data and each training point with the Euclidean equation.
4. Sort the distances from the smallest to the largest value.
5. Retrieve the k nearest neighbors.
6. Calculate the fuzzy membership degree using the following equation:

   u_i(x) = ( Σ_{j=1..k} u_{ij} · ||x − x_j||^(−2/(m−1)) ) / ( Σ_{j=1..k} ||x − x_j||^(−2/(m−1)) )

   where:
   u_i(x)      : fuzzy membership value of the test point x in class i
   u_{ij}      : membership of the j-th nearest neighbor in class i
   k           : the number of nearest neighbors taken
   ||x − x_j|| : the distance between the test point and the j-th nearest neighbor
   m           : weighting exponent, with m > 1

2.2. Local Mean K-Nearest Neighbor

Local Mean is an approach that works by averaging the k nearest neighbors of each class into a vector and measuring the test data against it. Determining the target class of the test data with the Local Mean approach is fairer because the distance between the test point and every target class is calculated, unlike the majority vote, which only counts how the k nearest neighbors are distributed over the classes. Gou (2012) states that using the local mean in the K-Nearest Centroid Neighbor improved accuracy significantly. Tu (2015) suggests that the main step in Local Mean K-Nearest Neighbor is to compute the smallest Euclidean distance between the test data and the k nearest neighbors of each class. In other words, the majority vote used to determine the class in K-Nearest Neighbor is replaced by considering the proximity characteristic of each class, computed over its k nearest neighbors. Pan (2016) formulates the decision-making steps of Local Mean K-Nearest Neighbor as follows:

1. Define the training data of each class.
2. Compute the local mean vector of each class.
3. Classify the test point into the class whose local mean vector has the minimum Euclidean distance to it.

3. PROPOSED METHOD

Fuzzification is performed to calculate the membership degree of the test data in each class of the training data, while the local mean is used to calculate the closeness between the test data and the characteristic of each class. The model is built by dividing the Local Mean value by the membership degree, so that the proximity characteristic produced by fuzzification becomes tighter for the class the data tends toward and increasingly separate from the other target classes. In high-density data, this approach further clarifies which target class a test point should be placed in, which eases the classifier's work and of course increases the classification accuracy.
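The fuzzy membership computation of Section 2.1 can be sketched as follows (a hedged illustration under the Keller-style equation; the function name and the eps guard against zero distances are my own):

```python
import math

def fuzzy_memberships(distances, neighbor_memberships, m=2.0, eps=1e-9):
    # distances            : the k Euclidean distances to the nearest neighbors
    # neighbor_memberships : one dict {class: u_ij} per neighbor
    # m > 1                : the fuzzy weighting exponent
    # Each neighbor contributes its class memberships, weighted by the
    # inverse distance raised to the power 2/(m - 1).
    weights = [1.0 / max(d, eps) ** (2.0 / (m - 1.0)) for d in distances]
    total = sum(weights)
    classes = {c for nm in neighbor_memberships for c in nm}
    return {c: sum(w * nm.get(c, 0.0)
                   for w, nm in zip(weights, neighbor_memberships)) / total
            for c in classes}

# Three neighbors with crisp memberships: two of class "a", one of "b";
# the "a" neighbor at distance 1.0 weighs four times the others.
u = fuzzy_memberships([1.0, 2.0, 2.0],
                      [{"a": 1.0}, {"a": 1.0}, {"b": 1.0}], m=2.0)
# u["a"] ≈ 0.833, u["b"] ≈ 0.167 — the closer "a" neighbors dominate
```

The memberships always sum to one over the classes, so they can be read directly as the degree to which the test point belongs to each target class.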
The model algorithm is described as follows:

1. Determine the training data and the test data.
2. Calculate the distance between the test data and the training data with the Euclidean equation.
3. Sort the distances from the smallest to the largest value.
4. Take k data points from each target class.
5. Compute the Local Mean Vector of the test data for each training class over its k neighbors.
6. Calculate the membership degree of the test data in each training class over its k neighbors.
7. Calculate the Vector Model of each training class over its k neighbors by dividing the Local Mean Vector by the membership degree.
8. The target class of the test data is the one whose Vector Model has the minimum value.
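The steps above can be sketched as follows. This is my reading of the proposed rule (per class: distance to the local mean of the k nearest same-class vectors, divided by the fuzzy membership, smallest value wins), not the authors' code; all names and the toy data are my own:

```python
import math

def euclidean(a, b):
    # Euclidean distance between two feature vectors
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def lmfknn_classify(train_by_class, memberships, test, k=3, eps=1e-9):
    # train_by_class : {class: [training vectors]}
    # memberships    : {class: fuzzy membership degree of the test point}
    # For each class, average its k nearest training vectors into a
    # local mean, measure the Euclidean distance from the test point,
    # and divide by the class membership; the smallest value wins.
    scores = {}
    for cls, vectors in train_by_class.items():
        nearest = sorted(vectors, key=lambda v: euclidean(v, test))[:k]
        local_mean = [sum(v[i] for v in nearest) / len(nearest)
                      for i in range(len(test))]
        # High membership (close to 1) barely changes the distance;
        # low membership inflates it, pushing that class away.
        scores[cls] = euclidean(local_mean, test) / max(memberships[cls], eps)
    return min(scores, key=scores.get)

train_by_class = {
    "class_a": [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1)],
    "class_b": [(5.0, 5.0), (5.2, 4.8), (4.9, 5.1)],
}
memberships = {"class_a": 0.8, "class_b": 0.2}
print(lmfknn_classify(train_by_class, memberships, (1.1, 1.0)))  # -> class_a
```

The division is the model's contribution: the membership degree and the local-mean distance reinforce each other, so a class the test point only weakly belongs to cannot win merely by having a nearby centroid.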
4. RESULT & DISCUSSION

In experiments using the Iris dataset, training and testing were conducted with 30 training samples and 20 test samples per class, yielding accuracies of 93.3% on the setosa class, 86.6% on versicolor and 100% on virginica. In another experiment using simple image data, with the training and test data split roughly 75% : 25% of the overall data, a maximum accuracy of 96.43% was obtained with k = 1 in both the Local Mean FKNN and the primitive FKNN. The change in accuracy appears when the value of k is increased. The comparison between Local Mean FKNN and the primitive FKNN is shown in the following table:

Table 1 Comparison between Local Mean FKNN and FKNN

No.  Value of k  Local Mean FKNN  FKNN
1    1           96.43%           96.43%
2    2           96.30%           96.30%
3    3           96.30%           96.30%
4    4           96.15%           92.59%
5    5           96.15%           89.28%

The table shows that from k = 1 to k = 3, LMFKNN and the primitive FKNN produce the same accuracy: both recognize the simple image pattern with a maximum accuracy of 96.43%. However, when k is increased to 4, meaning the algorithm takes neighbors over a wider range, LMFKNN loses only 0.15% of accuracy. This differs sharply from the primitive FKNN, whose accuracy drops by 3.71% and drops again at k = 5, while LMFKNN holds at 96.15%. Herein lies the advantage of the local mean approach: it is not pulled toward data that tend to other classes, even when given a high k value, i.e. a wide neighborhood range. The distribution of the classified data can be seen in the following figures:

Figure 1 Accuracy 96.43%
Figure 2 Accuracy 96.30%

Figure 3 Accuracy 96.15%

Figure 4 Accuracy 92.59%
Figure 5 Accuracy 89.28%

With the local mean, the target class is determined by calculating the Euclidean distance between the attribute values of the test data and the average attribute values of the k neighbors of each training class. The test data is placed in the target class whose local mean vector is closest. In the primitive fuzzy k-nearest neighbor, the target class is determined by calculating the membership degree of a test pattern in each target class; the class with the largest membership degree for the test pattern is selected as its target class. This approach still inherits the nature of the majority vote, because the class decision is still driven by the dominant class among the nearby neighbors, even though a membership degree weights the test data against each target class.

The fuzzy k-nearest neighbor algorithm with local mean uses both class-determination techniques. After the membership degree and the local mean vector of the test pattern over the training patterns have been calculated, the next step is to divide the local mean vector value by the membership degree. This division shrinks the vector distance between the test pattern and the target class it tends toward, so the class in which the pattern should be recognized becomes clearly visible.

5. CONCLUSIONS

It can be concluded that dividing the Local Mean Vector of K-Nearest Neighbor by a fuzzy membership function minimizes the Euclidean distance, making the tendency of the test data toward a target class apparent. The more the Euclidean distance shrinks, the more the similarity of the data to its class emerges, and the higher the accuracy of the classification results.
In addition, the local mean model of the Fuzzy K-Nearest Neighbor algorithm also strengthens the algorithm at high values of k. A high k makes the classes compete with each other to pull data in across a wide neighborhood range. With the local mean, class determination is based not only on the distance and the membership degree, but also on the average attribute value of each target class.
REFERENCES

[1] Beyan, C. & Ogul, H. 2014. A Fuzzy K-NN Approach for Cancer Diagnosis with Microarray Gene Expression Data.
[2] Bhatia, N. & Vandana. 2010. Survey of Nearest Neighbor Techniques. International Journal of Computer Science and Information Security (IJCSIS).
[3] García-Pedrajas, N. & Ortiz-Boyer, D. 2009. Boosting K-Nearest Neighbor Classifier by Means of Input Space Projection. Expert Systems with Applications.
[4] Gou, J., Yi, Z., Du, L. & Xiong, T. 2012. A Local Mean-Based k-Nearest Centroid Neighbor Classifier. The Computer Journal 55(9).
[5] Jabbar, M.A., Deekshatulu, B.L. & Chandra, P. 2013. Classification of Heart Disease Using K-Nearest Neighbor and Genetic Algorithm. International Conference on Computational Intelligence: Modeling Techniques and Applications (CIMTA).
[6] Keller, J.M., Gray, M.R. & Givens, J.A. A Fuzzy K-Nearest Neighbor Algorithm. IEEE Transactions on Systems, Man, and Cybernetics SMC-15(4).
[7] Ougiaroglou, S. & Evangelidis, G. 2012. Fast and Accurate k-Nearest Neighbor Classification Using Prototype Selection by Clustering. Panhellenic Conference on Informatics.
[8] Pan, Z., Wang, Y. & Ku, W. A New K-Harmonic Nearest Neighbor Classifier Based on the Multi-Local Means.
[9] Pan, Z., Wang, Y. & Ku, W. A New General Nearest Neighbor Classification Based on the Mutual Neighborhood Information. Knowledge-Based Systems.
[10] Raikwal, J.S. & Saxena, K. 2012. Performance Evaluation of SVM and K-Nearest Neighbor Algorithm over Medical Dataset. International Journal of Computer Applications 50(14).
[11] Rosyid, H., Prasetyo, E. & Agustin, S. 2013. Perbaikan Akurasi Fuzzy K-Nearest Neighbor In Every Class menggunakan Fungsi Kernel [Improving the Accuracy of Fuzzy K-Nearest Neighbor in Every Class Using a Kernel Function]. Seminar Nasional Teknologi Informasi dan Multimedia 2013.
[12] Sánchez, A.S., Iglesias-Rodríguez, F.J., Fernández, P.R. & de Cos Juez, F.J. 2014. Applying the K-Nearest Neighbor Technique to the Classification of Workers According to Their Risk of Suffering Musculoskeletal Disorders. International Journal of Industrial Ergonomics: 1-8.
[13] Song, Y., Liang, J., Lu, J. & Zhao, X. 2017. An Efficient Instance Selection Algorithm for K Nearest Neighbor Regression. Neurocomputing 251: 26-34.
[14] Tu, L., Wei, H. & Ai, L. 2015. Galaxy and Quasar Classification Based on Local Mean-Based K-Nearest Neighbor Method. IEEE.
[15] Wang, J., Neskovic, P. & Cooper, L.N. 2007. Improving Nearest Neighbor Rule with a Simple Adaptive Distance Measure. Pattern Recognition Letters 28.
[16] Wu, X. & Kumar, V. 2009. The Top Ten Algorithms in Data Mining. CRC Press: Boca Raton, USA.
[17] Zbancioc, M. & Feraru, S.M. 2012. Emotion Recognition of the SROL Romanian Database Using Fuzzy KNN Algorithm. IEEE.
More informationEncoding Words into String Vectors for Word Categorization
Int'l Conf. Artificial Intelligence ICAI'16 271 Encoding Words into String Vectors for Word Categorization Taeho Jo Department of Computer and Information Communication Engineering, Hongik University,
More informationAN IMPROVED DENSITY BASED k-means ALGORITHM
AN IMPROVED DENSITY BASED k-means ALGORITHM Kabiru Dalhatu 1 and Alex Tze Hiang Sim 2 1 Department of Computer Science, Faculty of Computing and Mathematical Science, Kano University of Science and Technology
More informationA New Method For Forecasting Enrolments Combining Time-Variant Fuzzy Logical Relationship Groups And K-Means Clustering
A New Method For Forecasting Enrolments Combining Time-Variant Fuzzy Logical Relationship Groups And K-Means Clustering Nghiem Van Tinh 1, Vu Viet Vu 1, Tran Thi Ngoc Linh 1 1 Thai Nguyen University of
More informationCAMCOS Report Day. December 9 th, 2015 San Jose State University Project Theme: Classification
CAMCOS Report Day December 9 th, 2015 San Jose State University Project Theme: Classification On Classification: An Empirical Study of Existing Algorithms based on two Kaggle Competitions Team 1 Team 2
More informationNearby Search Indekos Based Android Using A Star (A*) Algorithm
Journal of Physics: Conference Series PAPER OPEN ACCESS Nearby Search Indekos Based Android Using A Star (A*) Algorithm To cite this article: B Siregar et al 2018 J. Phys.: Conf. Ser. 978 012084 View the
More informationAn Efficient Semantic Image Retrieval based on Color and Texture Features and Data Mining Techniques
An Efficient Semantic Image Retrieval based on Color and Texture Features and Data Mining Techniques Doaa M. Alebiary Department of computer Science, Faculty of computers and informatics Benha University
More informationClustering & Classification (chapter 15)
Clustering & Classification (chapter 5) Kai Goebel Bill Cheetham RPI/GE Global Research goebel@cs.rpi.edu cheetham@cs.rpi.edu Outline k-means Fuzzy c-means Mountain Clustering knn Fuzzy knn Hierarchical
More informationCSE 6242 A / CX 4242 DVA. March 6, Dimension Reduction. Guest Lecturer: Jaegul Choo
CSE 6242 A / CX 4242 DVA March 6, 2014 Dimension Reduction Guest Lecturer: Jaegul Choo Data is Too Big To Analyze! Limited memory size! Data may not be fitted to the memory of your machine! Slow computation!
More informationString Vector based KNN for Text Categorization
458 String Vector based KNN for Text Categorization Taeho Jo Department of Computer and Information Communication Engineering Hongik University Sejong, South Korea tjo018@hongik.ac.kr Abstract This research
More informationMass Classification Method in Mammogram Using Fuzzy K-Nearest Neighbour Equality
Mass Classification Method in Mammogram Using Fuzzy K-Nearest Neighbour Equality Abstract: Mass classification of objects is an important area of research and application in a variety of fields. In this
More informationIntroduction to Support Vector Machines
Introduction to Support Vector Machines CS 536: Machine Learning Littman (Wu, TA) Administration Slides borrowed from Martin Law (from the web). 1 Outline History of support vector machines (SVM) Two classes,
More informationAdaptive Gesture Recognition System Integrating Multiple Inputs
Adaptive Gesture Recognition System Integrating Multiple Inputs Master Thesis - Colloquium Tobias Staron University of Hamburg Faculty of Mathematics, Informatics and Natural Sciences Technical Aspects
More informationk Nearest Neighbors Super simple idea! Instance-based learning as opposed to model-based (no pre-processing)
k Nearest Neighbors k Nearest Neighbors To classify an observation: Look at the labels of some number, say k, of neighboring observations. The observation is then classified based on its nearest neighbors
More informationS. Sreenivasan Research Scholar, School of Advanced Sciences, VIT University, Chennai Campus, Vandalur-Kelambakkam Road, Chennai, Tamil Nadu, India
International Journal of Civil Engineering and Technology (IJCIET) Volume 9, Issue 10, October 2018, pp. 1322 1330, Article ID: IJCIET_09_10_132 Available online at http://www.iaeme.com/ijciet/issues.asp?jtype=ijciet&vtype=9&itype=10
More informationKeyword Extraction by KNN considering Similarity among Features
64 Int'l Conf. on Advances in Big Data Analytics ABDA'15 Keyword Extraction by KNN considering Similarity among Features Taeho Jo Department of Computer and Information Engineering, Inha University, Incheon,
More informationA REVIEW ON VARIOUS APPROACHES OF CLUSTERING IN DATA MINING
A REVIEW ON VARIOUS APPROACHES OF CLUSTERING IN DATA MINING Abhinav Kathuria Email - abhinav.kathuria90@gmail.com Abstract: Data mining is the process of the extraction of the hidden pattern from the data
More informationK-modes Clustering Algorithm for Categorical Data
K-modes Clustering Algorithm for Categorical Data Neha Sharma Samrat Ashok Technological Institute Department of Information Technology, Vidisha, India Nirmal Gaud Samrat Ashok Technological Institute
More informationVECTOR SPACE CLASSIFICATION
VECTOR SPACE CLASSIFICATION Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze, Introduction to Information Retrieval, Cambridge University Press. Chapter 14 Wei Wei wwei@idi.ntnu.no Lecture
More informationK Nearest Neighbor Wrap Up K- Means Clustering. Slides adapted from Prof. Carpuat
K Nearest Neighbor Wrap Up K- Means Clustering Slides adapted from Prof. Carpuat K Nearest Neighbor classification Classification is based on Test instance with Training Data K: number of neighbors that
More informationUnsupervised Learning : Clustering
Unsupervised Learning : Clustering Things to be Addressed Traditional Learning Models. Cluster Analysis K-means Clustering Algorithm Drawbacks of traditional clustering algorithms. Clustering as a complex
More informationOCR For Handwritten Marathi Script
International Journal of Scientific & Engineering Research Volume 3, Issue 8, August-2012 1 OCR For Handwritten Marathi Script Mrs.Vinaya. S. Tapkir 1, Mrs.Sushma.D.Shelke 2 1 Maharashtra Academy Of Engineering,
More informationUnsupervised Learning
Unsupervised Learning Unsupervised learning Until now, we have assumed our training samples are labeled by their category membership. Methods that use labeled samples are said to be supervised. However,
More informationNearest Neighbor Classifiers
Nearest Neighbor Classifiers TNM033 Data Mining Techniques Linköping University 2009-12-04 When I see a bird that walks like a duck and swims like a duck and quacks like a duck, I call that bird a duck.
More informationAUTOMATIC CLUSTERING AND OPTIMIZED FUZZY LOGICAL RELATIONSHIPS FOR MINIMUM LIVING NEEDS FORECASTING
Journal of Environmental Engineering & Sustainable Technology JEEST http://jeest.ub.ac.id AUTOMATIC CLUSTERING AND OPTIMIZED FUZZY LOGICAL RELATIONSHIPS FOR MINIMUM LIVING NEEDS FORECASTING Yusuf Priyo
More informationAn Unsupervised Technique for Statistical Data Analysis Using Data Mining
International Journal of Information Sciences and Application. ISSN 0974-2255 Volume 5, Number 1 (2013), pp. 11-20 International Research Publication House http://www.irphouse.com An Unsupervised Technique
More informationAvailable online at ScienceDirect. Procedia Computer Science 35 (2014 )
Available online at www.sciencedirect.com ScienceDirect Procedia Computer Science 35 (2014 ) 388 396 18 th International Conference on Knowledge-Based and Intelligent Information & Engineering Systems
More informationTopic 1 Classification Alternatives
Topic 1 Classification Alternatives [Jiawei Han, Micheline Kamber, Jian Pei. 2011. Data Mining Concepts and Techniques. 3 rd Ed. Morgan Kaufmann. ISBN: 9380931913.] 1 Contents 2. Classification Using Frequent
More informationA CRITIQUE ON IMAGE SEGMENTATION USING K-MEANS CLUSTERING ALGORITHM
A CRITIQUE ON IMAGE SEGMENTATION USING K-MEANS CLUSTERING ALGORITHM S.Jaipriya, Assistant professor, Department of ECE, Sri Krishna College of Technology R.Abimanyu, UG scholars, Department of ECE, Sri
More informationDistribution-free Predictive Approaches
Distribution-free Predictive Approaches The methods discussed in the previous sections are essentially model-based. Model-free approaches such as tree-based classification also exist and are popular for
More informationA density-based approach for instance selection
2015 IEEE 27th International Conference on Tools with Artificial Intelligence A density-based approach for instance selection Joel Luis Carbonera Institute of Informatics Universidade Federal do Rio Grande
More informationCHAPTER 6 MODIFIED FUZZY TECHNIQUES BASED IMAGE SEGMENTATION
CHAPTER 6 MODIFIED FUZZY TECHNIQUES BASED IMAGE SEGMENTATION 6.1 INTRODUCTION Fuzzy logic based computational techniques are becoming increasingly important in the medical image analysis arena. The significant
More informationAccelerating Unique Strategy for Centroid Priming in K-Means Clustering
IJIRST International Journal for Innovative Research in Science & Technology Volume 3 Issue 07 December 2016 ISSN (online): 2349-6010 Accelerating Unique Strategy for Centroid Priming in K-Means Clustering
More informationClassification: Decision Trees
Classification: Decision Trees IST557 Data Mining: Techniques and Applications Jessie Li, Penn State University 1 Decision Tree Example Will a pa)ent have high-risk based on the ini)al 24-hour observa)on?
More informationA Study on K-Means Clustering in Text Mining Using Python
International Journal of Computer Systems (ISSN: 2394-1065), Volume 03 Issue 08, August, 2016 Available at http://www.ijcsonline.com/ Dr. (Ms). Ananthi Sheshasayee 1, Ms. G. Thailambal 2 1 Head and Associate
More information