A Multi-threading in Prolog to Implement K-means Clustering


SURASITH TAOKOK, PRACH PONGPANICH, NITTAYA KERDPRASOP, KITTISAK KERDPRASOP
Data Engineering Research Unit, School of Computer Engineering
Suranaree University of Technology
111 University Avenue, Muang District, Nakhon Ratchasima 30000, THAILAND

Abstract: - Prolog, the language used in this paper, is a logic programming language with multi-threading support. Programmers often use multi-threading to reduce execution time, because several threads can split a task and work on it concurrently. In this paper we propose an algorithm, and its implementation in Prolog, for k-means clustering with multi-threading. The main objective is to speed up execution. The experiments compare k-means with multi-thread k-means and report the percentage of speedup; the results support our claim.

Key-Words: - Data Mining, Clustering, Modified k-means, Multi-thread k-means, Logic programming, Prolog

1 Introduction
Prolog is a general-purpose logic programming language, and many Prolog systems, such as SWI-Prolog, SICStus Prolog, CIAO Prolog and Qu-Prolog, support multi-threading. In this paper we use SWI-Prolog [2] to implement the k-means clustering algorithm. SWI-Prolog is open source, supports multi-threading, and is available for the Linux, Windows and Macintosh platforms. We can profit from multi-threaded Prolog by splitting a large task into subtasks that run faster on multi-core processors.

The k-means clustering algorithm is an unsupervised learning method that separates data points into groups. Its running time depends on the number of data points, the number of clusters and the number of iterations: the computational complexity is O(nkt), where n is the number of data points, k is the number of clusters and t is the number of iterations until the centroids become stable. When the data set is large the running time grows accordingly, and the traditional k-means algorithm becomes inefficient.

We therefore apply multi-threading in Prolog [1][4] to the k-means algorithm [6]; we call the result the multi-thread k-means (MTK) algorithm. MTK is not a parallel k-means algorithm [3][5][7]; instead, one sub-process of the original k-means algorithm, the computation of the new centroids, is redesigned to use multiple threads. We create threads to distribute this work so that all new centroids are computed concurrently (a short sketch of the SWI-Prolog threading primitives involved is given below).
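As background, the following minimal sketch (our own illustration, not taken from the paper) shows the SWI-Prolog primitives this style of programming relies on: thread_create/3 starts the workers, and a message queue collects their partial results. Here each worker sums one sublist, so the sublists are processed concurrently.

:- use_module(library(lists)).   % sum_list/2 (autoloaded in SWI-Prolog)

% Sum each sublist in a separate thread and collect the results.
% Note: results arrive in completion order, not in input order.
parallel_sums(ListOfLists, Sums) :-
    message_queue_create(Queue),
    forall(member(L, ListOfLists),
           thread_create(sum_worker(L, Queue), _, [detached(true)])),
    length(ListOfLists, N),
    collect(N, Queue, Sums),
    message_queue_destroy(Queue).

sum_worker(List, Queue) :-
    sum_list(List, Sum),
    thread_send_message(Queue, Sum).

collect(0, _, []) :- !.
collect(N, Queue, [Sum|Sums]) :-
    thread_get_message(Queue, Sum),
    N1 is N - 1,
    collect(N1, Queue, Sums).

For example, the query ?- parallel_sums([[1,2,3],[4,5,6]], Sums). binds Sums to [6,15] or [15,6], depending on which worker finishes first.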
The organization of the rest of this paper is as follows. Related work on multi-threading, k-means and parallel k-means is discussed in Section 2. Our proposed algorithm, multi-thread k-means, is explained in Section 3. The implementation (a complete source code is given in the appendix) and the experimental results are presented in Section 4. The conclusion and directions for future research appear in the last section.

2 Related Works
The k-means algorithm [6] was presented by J. MacQueen in 1967 and has since been applied in many areas. With the advent of multi-core processors, multi-threading has been brought to Prolog [4] and used to support parallel k-means algorithms [5]. J. Wielemaker [1] presents multi-threading in SWI-Prolog; his work shows a speedup when running multiple threads on a multi-core processor. M. Joshi [5] presents a parallel k-means algorithm using the message passing interface (MPI) on distributed-memory multiprocessor systems (Sun workstations); the method benefits from the multiprocessor environment. B. Hohlt [8] introduces a parallel k-means algorithm implemented in C++ with pthreads. N. Kerdprasop and K. Kerdprasop [3] propose a parallel k-means implemented in Erlang; their experiments show a speedup when clustering large data sets.

Hence, in this paper we propose the MTK algorithm and implement it in the logic programming language Prolog using its multi-threading facilities.

3 Proposed Algorithm
The k-means algorithm [6] starts by choosing k initial centroids at random. It then assigns each data point to the nearest cluster and re-computes the new centroid of each of the k clusters. If the new centroids are not yet stable, the algorithm iterates: it reassigns the data points to the nearest clusters and re-computes the centroids again, until the centroids no longer change. The k-means algorithm is shown in Algorithm 1.

Algorithm 1 K-means (KM)
Input: number of clusters and a set of data points
Output: k centroids and the members of each cluster
Steps
1. Select initial centroids C = <C1, C2, ..., Ck>
2. Repeat
   2.1 Assign each data point to its nearest cluster center
   2.2 Re-compute the cluster centers using the current cluster memberships
3. Until there is no further change in the assignment of the data points to cluster centers

The k-means algorithm leaves room to distribute some of its sequential work so that it is performed concurrently. We observe that, once all data points have been assigned to clusters, the new centroids can be computed concurrently. We therefore adapt the k-means algorithm to use multi-threading. The pseudo code of the modified k-means algorithm (multi-thread k-means), adapted to support this concurrency, is shown in Algorithm 2.

Algorithm 2 Multi-thread k-means (MTK)
Input: number of clusters and a set of data points
Output: k centroids and the members of each cluster
Steps
1. Select initial centroids C = <C1, C2, ..., Ck>
2. Assign each data point to its nearest cluster center
3. Create threads T = <T1, T2, ..., Tk>, one for each centroid in C = <C1, C2, ..., Ck>
4. For each thread Ti (i = 1 to k)
   4.1 Re-compute the cluster center <Ti : cal(Ci)>
   4.2 Return the new centroid Ci to the set of new centroids C'
5. Check the stability of the centroids
   5.1 If C != C' then set C = C' and go to step 2
   5.2 If C == C' then stop and return C and the cluster members

The MTK algorithm adds a multi-threaded process to the centroid re-computation. The re-computation step acts as the master process: it is responsible for creating the threads, sending each thread the set of data points of one cluster, and gathering the recalculated centroids. This step is repeated as long as the old and new centroids have not converged, and the worker threads are created anew each time the re-computation step starts. The communication between the master process and the threads is illustrated in Fig.1, and a short Prolog sketch of the concurrent re-computation step is given after the figure; the complete implementation appears in the appendix.

Fig.1 A diagram illustrating the communication between the master process and the threads
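The appendix implementation realizes step 4 by creating one thread per cluster with thread_create/3 and collecting the results afterwards. As an illustration only (this is not the paper's code), the same per-cluster centroid re-computation can be sketched with concurrent_maplist/3 from SWI-Prolog's library(thread), assuming three-dimensional points:

:- use_module(library(thread)).   % concurrent_maplist/3
:- use_module(library(apply)).    % foldl/4

% Step 4 of Algorithm 2: re-compute every cluster's centroid,
% running the per-cluster computations concurrently.
recompute_centroids(Clusters, Centroids) :-
    concurrent_maplist(centroid, Clusters, Centroids).

% Mean of a non-empty list of 3-D points [X,Y,Z].
centroid(Points, [Mx,My,Mz]) :-
    length(Points, N),
    N > 0,
    foldl(add_point, Points, [0,0,0], [Sx,Sy,Sz]),
    Mx is Sx / N,
    My is Sy / N,
    Mz is Sz / N.

add_point([X,Y,Z], [Ax,Ay,Az], [Bx,By,Bz]) :-
    Bx is Ax + X,
    By is Ay + Y,
    Bz is Az + Z.

Design-wise this matches step 4: concurrent_maplist/3 runs the per-cluster goals on a pool of worker threads, waits for all of them, and returns the new centroids in cluster order, which is what the stability check in step 5 needs.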

4 Experimental and Results
We implement the proposed algorithms in Prolog (standard SWI-Prolog). The implementation of the KM and MTK algorithms as Prolog programs is given in the appendix. A screenshot of running the program (SWI-Prolog Multi-threaded, 32 bits) is shown in Fig.2. The program is run with the command

cluster(K).

where the argument K is the number of clusters. Before running the program, the data file points.pl must exist in the working directory. The data are given as a single predicate holding a list of three-dimensional points, item([P1,P2,...,Pn]), for example:

item([ [-4,8,-7],[-9,0,-5],[8,4,4],
       [9,5,6],[-4,-5,-7],[-2,-1,3],
       [10,11,0],[0,-15,7],[2,-1,3] ]).

The predicate item([P1,P2,...,Pn]) is the set of data points clustered by both the KM and MTK programs.

Fig.2 A screenshot illustrating a run of the MTK program generating 2 centroids

The screenshot in Fig.2 shows the command that runs the MTK program for a 2-cluster classification; the running time is printed once clustering has finished.

4.1 Performance of Multi-thread k-means
We evaluate the performance of the proposed KM and MTK algorithms on a synthetic three-dimensional data set of 10,000 points. The computational speed of k-means compared with multi-thread k-means is given in Table 1. The experiments are performed on a laptop computer with an Intel(R) Core(TM) i-series processor, 4 GB of memory, and the Windows 7 32-bit operating system.

Table 1 Execution time of KM versus MTK with 10,000 data points (the number of clusters equals the number of threads); columns: number of clusters, KM time (sec), MTK time (sec), speedup (%)

The results in Table 1 show that the running time of MTK is lower than that of KM, and that the gap, and hence the speedup, grows as the number of clusters increases. The running-time comparison is shown in Fig.3.

Fig.3 Running time comparison of KM versus MTK with 10,000 data points

Over the tested numbers of clusters (2 to 10), the running times observed for KM and MTK on the 10,000 data points give an average speedup of more than 30%. The percentage of speedup of MTK over KM is shown in Fig.4.

Fig.4 Percentage of running-time speedup for different numbers of clusters with 10,000 data points

4.2 Speedup of Multi-thread k-means
In this section we evaluate the percentage of running-time speedup in more detail. We prepare a series of three-dimensional data sets of 500, 1,000, 2,000, 3,000, 4,000, 5,000, 8,000, 10,000, and 12,000 points. Each data set is clustered with different numbers of clusters: 2, 4, 6, 8, and 10. The running times are shown in Table 2 and Table 3, and the percentages of speedup are shown in Table 4.

Table 2 Execution time of KM versus MTK with 2, 4 and 6 clusters; columns: data size, KM and MTK running times (sec) for 2, 4 and 6 clusters

Table 3 Execution time of KM versus MTK with 8 and 10 clusters; columns: data size, KM and MTK running times (sec) for 8 and 10 clusters

Table 4 Speedup percentage for different numbers of clusters and data sizes; columns: data size, speedup (%) for 2, 4, 6, 8 and 10 clusters

The percentages of execution-time speedup measured in these experiments are plotted in Fig.5. It can be seen from the results that the percentage of speedup increases as the number of data points and the number of clusters increase.

Fig.5 Comparison of speedup percentages at different data sizes and numbers of threads

5 Conclusion
Most processors today are multi-core, and traditional programs and algorithms do not use this hardware efficiently or effectively. K-means is the best-known and most commonly used clustering algorithm; it is simple, but it is not very efficient when implemented in the traditional, sequential style. In this paper we propose the design and implementation of KM and MTK in logic programming. The MTK algorithm modifies KM by integrating a multi-threaded process into the algorithm. The experimental results show that the multi-threading method considerably speeds up the computation, especially when tested on multi-core processors. Our future work will focus on a fully parallel k-means algorithm and its applications.

6 Appendix
The source code presented in this section is in SWI-Prolog format. A line preceded by "%" is a comment. We provide two versions of the clustering program: k-means and multi-thread k-means. Each program starts with comments explaining how to run it (see also the short session sketch below).
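As a usage illustration (assuming the k-means listing below is saved as kmeans.pl next to points.pl; the file name is ours), a typical SWI-Prolog session looks like this:

% Start SWI-Prolog in the directory that contains the program and points.pl,
% load the program, and cluster the data into 3 groups:
%
%   ?- [kmeans].
%   true.
%
%   ?- cluster(3).
%   ... intermediate centroids are printed here ...
%   time-<elapsed seconds>
%   true.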

K-means Clustering

% The file "points.pl" must exist in the working directory.
% Example of data file:
% item([ [-4,8,-7], [-9,0,-5], [8,4,4], [9,5,6], [-4,-5,-7],
%        [-2,-1,3], [10,11,0], [0,-15,7], [2,-1,3] ]).
% Then test the program with this command:
% cluster(2).   %% use 2 or more

%% -- Reserve memory
:- set_prolog_stack(global, limit(3*10**9)),
   set_prolog_stack(local,  limit(4*10**9)).

%% -- Main program
cluster(K) :-
    ensure_loaded('points.pl'),
    pc_time(H1-M1-S1),
    item(Item),
    initial(Item, K, Mean),
    writeln(Mean),
    kmean(Item, Mean),
    pc_time(H2-M2-S2),
    TS1 is H1*60*60 + M1*60 + S1,
    TS2 is H2*60*60 + M2*60 + S2,
    DTS is TS2 - TS1,
    writeln(time-DTS).

%% -- Return the execution time
%% -- example: time-15.9
pc_time(CT) :-
    get_time(T),
    stamp_date_time(T, date(_, _, _, H, M, S, 0, 'UTC', -), 'UTC'),
    CT = H-M-S.

%% -- Initial centroids picked from the data list
initial(_, 0, []) :- !.
initial([Hitem|Titem], K, [Hitem|Tmean]) :-
    Nk is K - 1,
    initial(Titem, Nk, Tmean).

%% -- K-means main loop
kmean(Item, Mean) :-
    calculate_dist(Item, Mean, CaledItem),
    split_item(Mean, CaledItem, SplitItem),
    calculate_mean(SplitItem, NewMean),
    writeln(NewMean),
    (   Mean = NewMean
    ->  true, !
    ;   kmean(Item, NewMean)
    ), !.

%% -- Calculate distances and assign
%% -- each point to the nearest cluster
calculate_dist([], _, []) :- !.
calculate_dist([Hitem|Titem], Mean, [Hitem-SelMean|TSelMean]) :-
    calculating(Hitem, Mean, Dist),
    select_cluster(Dist, Mean, SelMean),
    calculate_dist(Titem, Mean, TSelMean).

%% -- Euclidean distance for 3-dimensional data
calculating(_, [], []) :- !.
calculating([Hi1,Hi2,Hi3], [[Hm1,Hm2,Hm3]|Tmean], [Dist|Tdist]) :-
    Caler is (Hi1-Hm1)^2 + (Hi2-Hm2)^2 + (Hi3-Hm3)^2,
    Dist is sqrt(Caler),
    calculating([Hi1,Hi2,Hi3], Tmean, Tdist).

%% -- Each point chooses the nearest cluster
select_cluster([_], [Mean], Mean) :- !.
select_cluster([Hd1,Hd2|Tdist], [Hm1,Hm2|Tmean], SelMean) :-
    (   Hd1 < Hd2
    ->  select_cluster([Hd1|Tdist], [Hm1|Tmean], SelMean)
    ;   select_cluster([Hd2|Tdist], [Hm2|Tmean], SelMean)
    ).

%% -- Split the data into their clusters
split_item([], _, []) :- !.
split_item([Hm|Mean], CaledItem, [Splited|SplitItem]) :-
    spliting(Hm, CaledItem, Splited),
    split_item(Mean, CaledItem, SplitItem).

spliting(_, [], []) :- !.
spliting(Mean, [Hitem-SelMean|Titem], Splited) :-
    spliting(Mean, Titem, TSplited),
    (   Mean = SelMean
    ->  Splited = [Hitem|TSplited]
    ;   Splited = TSplited
    ).

%% -- Re-compute the new centroid values
calculate_mean([], []) :- !.
calculate_mean([HS|SplitItem], [HR|NewMean]) :-
    cal_mean(HS, HR),
    calculate_mean(SplitItem, NewMean).

cal_mean(L, R) :-
    mean_me(0, [0,0,0], L, R).

mean_me(N, [Sx,Sy,Sz], [[X,Y,Z]|T], R) :-
    NN is N + 1,
    NSx is Sx + X,
    NSy is Sy + Y,
    NSz is Sz + Z,
    mean_me(NN, [NSx,NSy,NSz], T, R).
mean_me(N, [Sx,Sy,Sz], [], [RSx,RSy,RSz]) :-
    RSx is Sx / N,
    RSy is Sy / N,
    RSz is Sz / N.

% End of K-means %

Multi-thread K-means Clustering

% The file "points.pl" must exist in the working directory.
% Example of data file:
% item([ [-4,8,-7], [-9,0,-5], [8,4,4], [9,5,6], [-4,-5,-7],
%        [-2,-1,3], [10,11,0], [0,-15,7], [2,-1,3] ]).
% Then test the program with this command:
% cluster(2).   %% use 2 or more; the argument is the number of clusters

%% -- Reserve memory
:- set_prolog_stack(global, limit(2*10**9)),
   set_prolog_stack(local,  limit(2*10**9)).

%% -- Main program
cluster(K) :-
    ensure_loaded('points.pl'),
    pc_time(H1-M1-S1),
    item(Item),
    initial(Item, K, Mean),
    writeln(Mean),
    kmean(Item, Mean),
    pc_time(H2-M2-S2),
    TS1 is H1*60*60 + M1*60 + S1,
    TS2 is H2*60*60 + M2*60 + S2,
    DTS is TS2 - TS1,
    writeln(time-DTS).

%% -- Return the execution time
%% -- example: time-15.9
pc_time(CT) :-
    get_time(T),
    stamp_date_time(T, date(_, _, _, H, M, S, 0, 'UTC', -), 'UTC'),
    CT = H-M-S.

%% -- Initial centroids picked from the data list
initial(_, 0, []) :- !.
initial([Hitem|Titem], K, [Hitem|Tmean]) :-
    Nk is K - 1,
    initial(Titem, Nk, Tmean).

%% -- Multi-thread K-means main loop
kmean(Item, Mean) :-
    calculate_dist(Item, Mean, CaledItem),
    split_item(Mean, CaledItem, SplitItem),
    calculate_mean(SplitItem, TL),
    wait_for_threads(TL, NewMean),
    writeln(NewMean),
    (   intersection(Mean, NewMean, Mean)
    ->  true, !
    ;   kmean(Item, NewMean)
    ), !.

%% -- Calculate distances and assign each point
%% -- to the nearest cluster
calculate_dist([], _, []) :- !.
calculate_dist([Hitem|Titem], Mean, [Hitem-SelMean|TSelMean]) :-
    calculating(Hitem, Mean, Dist),
    select_cluster(Dist, Mean, SelMean),
    calculate_dist(Titem, Mean, TSelMean).

%% -- Euclidean distance for 3-dimensional data
calculating(_, [], []) :- !.
calculating([Hi1,Hi2,Hi3], [[Hm1,Hm2,Hm3]|Tmean], [Dist|Tdist]) :-
    Caler is (Hi1-Hm1)^2 + (Hi2-Hm2)^2 + (Hi3-Hm3)^2,
    Dist is sqrt(Caler),
    calculating([Hi1,Hi2,Hi3], Tmean, Tdist).

%% -- Each point chooses the nearest cluster
select_cluster([_], [Mean], Mean) :- !.
select_cluster([Hd1,Hd2|Tdist], [Hm1,Hm2|Tmean], SelMean) :-
    (   Hd1 < Hd2
    ->  select_cluster([Hd1|Tdist], [Hm1|Tmean], SelMean)
    ;   select_cluster([Hd2|Tdist], [Hm2|Tmean], SelMean)
    ).

%% -- Split the data into their clusters
split_item([], _, []) :- !.
split_item([Hm|Mean], CaledItem, [Splited|SplitItem]) :-
    spliting(Hm, CaledItem, Splited),
    split_item(Mean, CaledItem, SplitItem).

spliting(_, [], []) :- !.
spliting(Mean, [Hitem-SelMean|Titem], Splited) :-
    spliting(Mean, Titem, TSplited),
    (   Mean = SelMean
    ->  Splited = [Hitem|TSplited]
    ;   Splited = TSplited
    ).

%% -- Re-compute the new centroid values.
%% -- One thread is created per cluster to
%% -- re-compute that cluster's new centroid.
calculate_mean([], []) :- !.
calculate_mean([HS|SplitItem], [T0|TL1]) :-
    calculate_mean(SplitItem, TL1),
    thread_create(cal_mean(HS), T0, []).

cal_mean(L) :-
    mean_me(0, [0,0,0], L, R),
    assert(mean(R)).

mean_me(N, [Sx,Sy,Sz], [[X,Y,Z]|T], R) :-
    NN is N + 1,
    NSx is Sx + X,
    NSy is Sy + Y,
    NSz is Sz + Z,
    mean_me(NN, [NSx,NSy,NSz], T, R).
mean_me(N, [Sx,Sy,Sz], [], [RSx,RSy,RSz]) :-
    RSx is Sx / N,
    RSy is Sy / N,
    RSz is Sz / N.

%% -- Wait for all threads to complete their work.
wait_for_threads([], []) :- !.
wait_for_threads([T|TL], NewMean) :-
    (   thread_join(T, true)
    ->  mean(NM),
        retract(mean(NM)),
        wait_for_threads(TL, TM),
        NewMean = [NM|TM]
    ;   wait_for_threads([T|TL], NewMean)
    ).

% End of Multi-thread K-means %

References:
[1] J. Wielemaker, Native Preemptive Threads in SWI-Prolog, Proceedings of ICLP 2003, Lecture Notes in Computer Science, Vol. 2916, Springer, 2003.
[2] J. Wielemaker, T. Schrijvers, M. Triska, T. Lager, SWI-Prolog, accepted for publication in Theory and Practice of Logic Programming (TPLP).
[3] N. Kerdprasop and K. Kerdprasop, A Lightweight Method to Parallel K-means Clustering, International Journal of Mathematics and Computers in Simulation, Vol. 4, Issue 4, 2010.
[4] M. Carro and M. Hermenegildo, Concurrency in Prolog Using Threads and a Shared Database, International Conference on Logic Programming.
[5] M. Joshi, Parallel K-means Algorithm on Distributed Memory Multiprocessors, Technical Report, University of Minnesota, 2003.
[6] J. MacQueen, Some Methods for Classification and Analysis of Multivariate Observations, Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, 1967.
[7] A. Prasad, Parallelization of K-means Clustering Algorithm, Project Report, University of Colorado, 2007.
[8] B. Hohlt, Pthread Parallel K-means, CS267 Applications of Parallel Computing, UC Berkeley, 2001.
