A Multi-threading in Prolog to Implement K-means Clustering
SURASITH TAOKOK, PRACH PONGPANICH, NITTAYA KERDPRASOP, KITTISAK KERDPRASOP
Data Engineering Research Unit, School of Computer Engineering
Suranaree University of Technology
111 University Avenue, Muang District, Nakhon Ratchasima 30000, THAILAND

Abstract: - Prolog is a logic programming language, and several of its implementations support multi-threading. Programmers often use multi-threading to reduce execution time, because threads can split a task into parts that run concurrently. In this paper, we propose a multi-threaded k-means clustering algorithm and its implementation in Prolog. The main objective is to reduce execution time. The experiments compare k-means with multi-thread k-means and report the percentage speedup in execution time. The results support our claim.

Key-Words: - Data Mining, Clustering, Modified k-means, Multi-thread k-means, Logic programming, Prolog

1 Introduction
Prolog is a general-purpose logic programming language. Many Prolog systems support multi-threading, such as SWI-Prolog, SICStus Prolog, CIAO Prolog and Qu-Prolog. In this paper we use SWI-Prolog [2] to implement the k-means clustering algorithm. SWI-Prolog is an open-source system with multi-threading support, available for Linux, Windows and Macintosh platforms. We can benefit from multi-threaded Prolog by splitting a large task into subtasks that run faster on multi-core processors.

The k-means clustering algorithm is an unsupervised learning method that separates data points into groups. Its time complexity depends on the number of data points, the number of clusters and the number of iterations: the computational complexity of k-means is O(nkt), where n is the number of data points, k is the number of clusters and t is the number of iterations until the centroids are stable. When the data set is large, the running time grows accordingly and the traditional k-means algorithm becomes inefficient.
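To make the O(nkt) bound concrete, the cost can be written as a rough operation count. This is only a sketch: constant factors are ignored, and the value t = 20 is an illustrative assumption, not a measurement from this paper.

```latex
\[
T(n, k, t) \;\approx\; t \cdot \bigl( \underbrace{n k}_{\text{distance computations}} \;+\; \underbrace{n}_{\text{centroid sums}} \bigr) \;=\; O(nkt)
\]
\[
n = 10{,}000,\quad k = 10,\quad t = 20 \;\Longrightarrow\; n k t = 2{,}000{,}000 \ \text{distance computations}
\]
```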
We apply multi-threading in Prolog [1] [4] to the k-means algorithm [6] and call the result the multi-thread k-means algorithm (MTK). MTK is not a parallel k-means algorithm [3] [5] [7]; rather, it redesigns one sub-process of the original k-means algorithm to support multi-threading. That sub-process is the step that calculates the new centroids. We create threads and distribute the work among them so that all new centroids are calculated concurrently.

The rest of this paper is organized as follows. Related work on multi-threading, k-means and parallel k-means is discussed in Section 2. Our proposed algorithm, multi-thread k-means, is explained in Section 3. The implementation (complete source code is available in the appendix) and experimental results are presented in Section 4. The conclusion and future research directions appear in the last section.

2 Related Works
The k-means algorithm [6] was presented by J.B. MacQueen in 1967 and has since been applied in many applications. Multi-core processor technology then emerged; it has been exploited for concurrency in Prolog [4] and supports the parallel k-means algorithm [5]. J. Wielemaker [1] presents multi-threading in SWI-Prolog; that work shows a speedup when running multiple threads on a multi-core processor. Manasi N. Joshi [5] presents a parallel k-means algorithm using the message passing interface (MPI) on a distributed-memory multiprocessor system (Sun workstations). The method takes advantage of the multiprocessor environment. B. Hohlt [8] introduces a parallel k-means algorithm implemented with C++ and pthreads. N. Kerdprasop and K. Kerdprasop [3] propose a parallel k-means implemented in Erlang. Their experimental results show a speedup when clustering large datasets.
Hence, in this paper we propose the MTK algorithm and implement it in Prolog, a logic programming language, using its multi-threading facilities.

3 Proposed Algorithm
The k-means algorithm [6] starts by randomly initializing k centroids. It then assigns each data point to the nearest cluster and re-computes the centroid of each of the k clusters. If the new centroids are not stable, it repeats the assignment and re-computation steps until they are. The k-means algorithm is shown in Algorithm 1.

Algorithm 1 K-means (KM)
Input: number of clusters and a set of data points
Output: K centroids and members of each cluster
Steps
1. Select initial centroids C = <C1, C2, ..., CK>
2. Repeat
2.1 Assign each data point to its nearest cluster center
2.2 Re-compute the cluster centers using the current cluster memberships
3. Until there is no further change in the assignment of the data points to new cluster centers

The k-means algorithm contains sequential work that can be distributed and done concurrently. We found that when k-means re-computes the centroids after assigning all data points to clusters, the new centroids can be computed concurrently, because each centroid depends only on the members of its own cluster. We therefore adapt multi-threading to the k-means algorithm. The pseudo code of the modified k-means algorithm (multi-thread k-means), which supports this concurrency, is shown in Algorithm 2.

Algorithm 2 Multi-thread k-means (MTK)
Input: number of clusters and a set of data points
Output: K centroids and members of each cluster
Steps
1. Select initial centroids C = <C1, C2, ..., CK>
2. Assign each data point to its nearest cluster center
3. Create threads T = <T1, T2, ..., TK> for centroids C = <C1, C2, ..., CK>
4. For each thread (Ti, i = 1 to K)
4.1 Re-compute the cluster center <Ti: cal(Ci)>
4.2 Return the new centroid Ci to the new centroid set C'
5. Check stability of centroids
5.1 if C != C' then set C = C' and repeat from the assignment step
5.2 if C == C' then stop and return C and cluster members

The MTK algorithm adds a multi-threading process to the re-compute process. The re-compute process is the master process: it is responsible for creating threads, sending each cluster's set of data points to a thread, and collecting the recalculated centroids. The re-compute process repeats as long as the old and new centroids have not converged, and the multi-threading process is invoked each time the re-compute process starts. The re-compute process and the multi-threading process are shown graphically in Fig.1.

Fig.1 A diagram illustrating the communication between master process and threads

4 Experimental Results
We implement the proposed algorithms in Prolog (SWI-Prolog). The implementation of the KM and MTK algorithms as Prolog programs is given in the appendix. A screenshot of a program run (SWI-Prolog Multi-threaded, 32 bits, Version ) is in Fig.2. To run the program we use the command

cluster(K).

The argument K is the number of clusters. Before running the program, the data file points.pl must exist in the working directory. The data file contains a single item/1 fact holding the list of three-dimensional data points:

item([[p1],[p2],...,[pk]]).

For example:

item([ [-4,8,-7],[-9,0,-5],[8,4,4],
       [9,5,6],[-4,-5,-7],[-2,-1,3],
       [10,11,0],[0,-15,7],[2,-1,3]]).
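Given clusters of points in the list format above, the thread-per-cluster re-compute step of Algorithm 2 (steps 3 and 4) might be sketched as follows. This is a minimal sketch, not the appendix implementation: the predicate names recompute/2, worker/3, centroid/2 and the other helpers are hypothetical, and returning results through a message queue (rather than through asserted facts, as the appendix does) is a substitution made here so that each centroid can be matched to its cluster by index.

```prolog
:- use_module(library(apply)).   % maplist/2..4, foldl/4
:- use_module(library(lists)).   % numlist/3

% centroid(+Points, -Centroid): mean of a non-empty list of [X,Y,Z] points.
centroid(Points, [Cx, Cy, Cz]) :-
    length(Points, N),
    N > 0,                       % fail on an empty cluster rather than divide by zero
    foldl(add_point, Points, [0, 0, 0], [Sx, Sy, Sz]),
    Cx is Sx / N,
    Cy is Sy / N,
    Cz is Sz / N.

add_point([X, Y, Z], [Sx0, Sy0, Sz0], [Sx, Sy, Sz]) :-
    Sx is Sx0 + X,
    Sy is Sy0 + Y,
    Sz is Sz0 + Z.

% worker(+Queue, +Index, +Points): body of one thread (steps 4.1 and 4.2).
worker(Queue, Index, Points) :-
    centroid(Points, Centroid),
    thread_send_message(Queue, centroid(Index, Centroid)).

% recompute(+Clusters, -Centroids): one thread per cluster (steps 3 and 4).
% Assumes at least one cluster; numlist/3 fails for an empty cluster list.
recompute(Clusters, Centroids) :-
    length(Clusters, K),
    numlist(1, K, Indices),
    setup_call_cleanup(
        message_queue_create(Queue),
        run_workers(Queue, Indices, Clusters, Centroids),
        message_queue_destroy(Queue)).

run_workers(Queue, Indices, Clusters, Centroids) :-
    maplist(start_worker(Queue), Indices, Clusters, Threads),
    maplist(thread_join, Threads, Statuses),   % wait for every worker
    maplist(==(true), Statuses),               % require that all succeeded
    maplist(collect(Queue), Indices, Centroids).

start_worker(Queue, Index, Points, Thread) :-
    thread_create(worker(Queue, Index, Points), Thread, []).

% collect(+Queue, +Index, -Centroid): fetch the result of cluster Index.
collect(Queue, Index, Centroid) :-
    thread_get_message(Queue, centroid(Index, Centroid)).
```

For example, the query `recompute([[[0,0,0],[2,4,6]], [[1,1,1]]], Cs)` should bind Cs to two centroids numerically equal to [1,2,3] and [1,1,1]. Tagging each message with the cluster index keeps the centroids in cluster order no matter which thread finishes first.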
The predicate item([[p1],[p2],...,[pk]]) holds the set of data points to be clustered by the KM and MTK programs.

Fig.2 A screenshot illustrating a run of the MTK program generating 2 centroids

The screenshot in Fig.2 shows the command that runs the MTK program for 2 clusters, followed by the running time reported after clustering finishes.

4.1 Performance of Multi-thread k-means
We evaluate the performance of the KM and MTK algorithms on a synthetic three-dimensional dataset. The computational speed of k-means compared to multi-thread k-means is given in Table 1. Experiments are performed on a laptop computer with an Intel(R) Core(TM) i GHz processor, 4 GB of memory, and the Windows 7 32-bit operating system. The synthetic dataset contains 10,000 points.

Table 1 Execution time of KM versus MTK with 10,000 data points (the number of clusters equals the number of threads)
Columns: Number of Clusters; Time (Sec) KM; Time (Sec) MTK; Speedup (%)

The results in Table 1 show that MTK runs faster than KM. Moreover, the speedup grows as the number of clusters increases. The difference in running time is shown in Fig.3.

Fig.3 Running time comparison of KM versus MTK with 10,000 data points

With 10,000 data points, the running-time speedup averages more than 30% over the tested numbers of clusters (2 to 10 clusters). The percentage speedup of MTK over KM is shown in Fig.4.

Fig.4 Percentage of running time speedup for different numbers of clusters with 10,000 data points

4.2 Speedup of Multi-thread k-means
In this section we evaluate the percentage of running time speedup. We prepare a series of three-dimensional datasets containing 500, 1,000, 2,000, 3,000, 4,000, 5,000, 8,000, 10,000, and 12,000 data points. For each dataset the experiments use different numbers of clusters: 2, 4, 6, 8, and 10. The running times are shown in Table 2 and Table 3, and the percentage of running time speedup is shown in Table 4.
Table 2 Execution time of KM versus MTK in 2, 4 and 6 clusters
Columns: Data; Running Time (Sec) for 2 Clusters (KM, MTK), 4 Clusters (KM, MTK), 6 Clusters (KM, MTK)

Table 3 Execution time of KM versus MTK in 8 and 10 clusters
Columns: Data; Running Time (Sec) for 8 Clusters (KM, MTK), 10 Clusters (KM, MTK)

Table 4 Speedup percentage for different numbers of clusters and data sizes
Columns: Data; Speedup percentage (%) by number of clusters

The percentage speedup in execution time is shown in Fig.5. The experimental results indicate that as the number of data points and the number of clusters increase, the percentage speedup in running time increases too.

Fig.5 Comparison of speedup percentage at different data sizes and numbers of threads

5 Conclusion
Most processors today are multi-core, and traditional programs and algorithms do not use such hardware efficiently or effectively. K-means is a well-known algorithm commonly used for clustering data. The k-means algorithm is simple, but it is not efficient when implemented in the traditional sequential style. In this paper we propose the design and implementation of KM and MTK with logic programming. The MTK algorithm is derived from KM by integrating a multi-threading process into the algorithm. The experimental results show that the multi-threading method considerably reduces computation time when tested on multi-core processors. Our future work will focus on a truly parallel k-means algorithm and its applications.

6 Appendix
Source code presented in this section is in SWI-Prolog format. A line preceded by "%" is a comment. We provide two versions of the clustering program: k-means and multi-thread k-means. Each program starts with comments explaining how to run it.
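The listings that follow measure elapsed time by differencing the hour, minute and second fields of the UTC clock, which would yield a negative value if a run crossed midnight UTC. As a hedged alternative, the sketch below differences raw time stamps instead and also computes a speedup percentage. Both predicate names, timed/2 and speedup_percent/3, are hypothetical, and the formula (T_KM - T_MTK) / T_KM * 100 is an assumption: the paper does not state how its speedup percentage was calculated.

```prolog
:- meta_predicate timed(0, -).

% timed(:Goal, -Seconds): run Goal once and return the elapsed
% wall-clock time in seconds, using get_time/1 time stamps.
timed(Goal, Seconds) :-
    get_time(T0),
    once(Goal),
    get_time(T1),
    Seconds is T1 - T0.

% speedup_percent(+TimeKM, +TimeMTK, -Percent): relative reduction in
% running time of MTK with respect to KM (assumed definition).
speedup_percent(TimeKM, TimeMTK, Percent) :-
    TimeKM > 0,
    Percent is (TimeKM - TimeMTK) / TimeKM * 100.
```

A run could then be timed with `timed(cluster(4), Seconds)`, and `speedup_percent(10, 7, P)` gives a value of about 30, that is, MTK taking 7 seconds where KM takes 10.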
K-means Clustering

% The file "points.pl" must exist in the working directory.
% Example of a data file:
% item([ [-4,8,-7], [-9,0,-5], [8,4,4], [9,5,6], [-4,-5,-7],
%        [-2,-1,3], [10,11,0],[0,-15,7],[2,-1,3]]).
% Then test the program with this command:
% cluster(2).   %% use 2 or more

%% -- Reserve memory for the global and local stacks
:- set_prolog_stack(global, limit(3*10**9)),
   set_prolog_stack(local, limit(4*10**9)).

%% -- Main program
cluster(K) :-
    ensure_loaded('points.pl'),
    pc_time(H1-M1-S1),
    item(Item),
    initial(Item, K, Mean),
    writeln(Mean),
    kmean(Item, Mean),
    pc_time(H2-M2-S2),
    TS1 is H1*60*60 + M1*60 + S1,
    TS2 is H2*60*60 + M2*60 + S2,
    DTS is TS2 - TS1,
    writeln(time-DTS).

%% -- Return the current time as Hour-Minute-Second (UTC)
%% -- example of the printed result: time-15.9
pc_time(CT) :-
    get_time(T),
    stamp_date_time(T, date(_, _, _, H, M, S, 0, 'UTC', -), 'UTC'),
    CT = H-M-S.

%% -- Initial centroids: pick the first K points from the data list
initial(_, 0, []) :- !.
initial([Hitem|Titem], K, [Hitem|Tmean]) :-
    Nk is K - 1,
    initial(Titem, Nk, Tmean).

%% -- K-means work: iterate until the centroids stop changing
kmean(Item, Mean) :-
    calculate_dist(Item, Mean, CaledItem),
    split_item(Mean, CaledItem, SplitItem),
    calculate_mean(SplitItem, NewMean),
    writeln(NewMean),
    (   Mean = NewMean
    ->  true
    ;   kmean(Item, NewMean)
    ), !.

%% -- Calculate distances and assign
%% -- each point to the nearest cluster
calculate_dist([], _, []) :- !.
calculate_dist([Hitem|Titem], Mean, [Hitem-SelMean|TSelMean]) :-
    calculating(Hitem, Mean, Dist),
    select_cluster(Dist, Mean, SelMean),
    calculate_dist(Titem, Mean, TSelMean).

%% -- Euclidean distance for 3-dimensional data
calculating(_, [], []) :- !.
calculating([Hi1,Hi2,Hi3], [[Hm1,Hm2,Hm3]|Tmean], [Dist|Tdist]) :-
    Caler is (Hi1-Hm1)^2 + (Hi2-Hm2)^2 + (Hi3-Hm3)^2,
    Dist is sqrt(Caler),
    calculating([Hi1,Hi2,Hi3], Tmean, Tdist).

%% -- Each point chooses its nearest cluster
select_cluster([_], [Mean], Mean) :- !.
select_cluster([Hd1,Hd2|Tdist], [Hm1,Hm2|Tmean], SelMean) :-
    (   Hd1 < Hd2
    ->  select_cluster([Hd1|Tdist], [Hm1|Tmean], SelMean)
    ;   select_cluster([Hd2|Tdist], [Hm2|Tmean], SelMean)
    ).

%% -- Split the data into one list of points per cluster
split_item([], _, []) :- !.
split_item([Hm|Mean], CaledItem, [Splited|SplitItem]) :-
    spliting(Hm, CaledItem, Splited),
    split_item(Mean, CaledItem, SplitItem).

spliting(_, [], []) :- !.
spliting(Mean, [Hitem-SelMean|Titem], Splited) :-
    spliting(Mean, Titem, TSplited),
    (   Mean = SelMean
    ->  Splited = [Hitem|TSplited]
    ;   Splited = TSplited
    ).

%% -- Re-compute the new centroid values
calculate_mean([], []) :- !.
calculate_mean([Hs|SplitItem], [HR|NewMean]) :-
    cal_mean(Hs, HR),
    calculate_mean(SplitItem, NewMean).

cal_mean(L, R) :-
    mean_me(0, [0,0,0], L, R).

%% -- Sum the coordinates and count the points, then divide.
%% -- Assumes every cluster is non-empty (N > 0).
mean_me(N, [Sx,Sy,Sz], [[X,Y,Z]|T], R) :-
    NN is N + 1,
    NSx is Sx + X,
    NSy is Sy + Y,
    NSz is Sz + Z,
    mean_me(NN, [NSx,NSy,NSz], T, R).
mean_me(N, [Sx,Sy,Sz], [], [RSx,RSy,RSz]) :-
    RSx is Sx / N,
    RSy is Sy / N,
    RSz is Sz / N.

% End of K-means

Multi-thread K-means Clustering

% The data file "points.pl" must exist in the working directory.
% Example of a data file:
% item([ [-4,8,-7], [-9,0,-5], [8,4,4], [9,5,6], [-4,-5,-7],
%        [-2,-1,3], [10,11,0],[0,-15,7],[2,-1,3]]).
% Then test the program with this command:
% cluster(2).   %% use 2 or more; the argument is the number of clusters

%% -- Reserve memory for the global and local stacks
:- set_prolog_stack(global, limit(2*10**9)),
   set_prolog_stack(local, limit(2*10**9)).

%% -- Worker threads return their centroid through mean/1 facts
:- dynamic mean/1.

%% -- Main program
cluster(K) :-
    ensure_loaded('points.pl'),
    pc_time(H1-M1-S1),
    item(Item),
    initial(Item, K, Mean),
    writeln(Mean),
    kmean(Item, Mean),
    pc_time(H2-M2-S2),
    TS1 is H1*60*60 + M1*60 + S1,
    TS2 is H2*60*60 + M2*60 + S2,
    DTS is TS2 - TS1,
    writeln(time-DTS).

%% -- Return the current time as Hour-Minute-Second (UTC)
%% -- example of the printed result: time-15.9
pc_time(CT) :-
    get_time(T),
    stamp_date_time(T, date(_, _, _, H, M, S, 0, 'UTC', -), 'UTC'),
    CT = H-M-S.

%% -- Initial centroids: pick the first K points from the data list
initial(_, 0, []) :- !.
initial([Hitem|Titem], K, [Hitem|Tmean]) :-
    Nk is K - 1,
    initial(Titem, Nk, Tmean).

%% -- Multi-thread K-means work.
%% -- Threads may finish in any order, so convergence is tested with
%% -- intersection/3 instead of comparing the lists position by position.
kmean(Item, Mean) :-
    calculate_dist(Item, Mean, CaledItem),
    split_item(Mean, CaledItem, SplitItem),
    calculate_mean(SplitItem, TL),
    wait_for_threads(TL, NewMean),
    writeln(NewMean),
    (   intersection(Mean, NewMean, Mean)
    ->  true
    ;   kmean(Item, NewMean)
    ), !.

%% -- Calculate distances and assign each point
%% -- to the nearest cluster
calculate_dist([], _, []) :- !.
calculate_dist([Hitem|Titem], Mean, [Hitem-SelMean|TSelMean]) :-
    calculating(Hitem, Mean, Dist),
    select_cluster(Dist, Mean, SelMean),
    calculate_dist(Titem, Mean, TSelMean).

%% -- Euclidean distance for 3-dimensional data
calculating(_, [], []) :- !.
calculating([Hi1,Hi2,Hi3], [[Hm1,Hm2,Hm3]|Tmean], [Dist|Tdist]) :-
    Caler is (Hi1-Hm1)^2 + (Hi2-Hm2)^2 + (Hi3-Hm3)^2,
    Dist is sqrt(Caler),
    calculating([Hi1,Hi2,Hi3], Tmean, Tdist).

%% -- Each point chooses its nearest cluster
select_cluster([_], [Mean], Mean) :- !.
select_cluster([Hd1,Hd2|Tdist], [Hm1,Hm2|Tmean], SelMean) :-
    (   Hd1 < Hd2
    ->  select_cluster([Hd1|Tdist], [Hm1|Tmean], SelMean)
    ;   select_cluster([Hd2|Tdist], [Hm2|Tmean], SelMean)
    ).

%% -- Split the data into one list of points per cluster
split_item([], _, []) :- !.
split_item([Hm|Mean], CaledItem, [Splited|SplitItem]) :-
    spliting(Hm, CaledItem, Splited),
    split_item(Mean, CaledItem, SplitItem).

spliting(_, [], []) :- !.
spliting(Mean, [Hitem-SelMean|Titem], Splited) :-
    spliting(Mean, Titem, TSplited),
    (   Mean = SelMean
    ->  Splited = [Hitem|TSplited]
    ;   Splited = TSplited
    ).

%% -- Re-compute the new centroid values.
%% -- This section creates one thread per cluster
%% -- to re-compute the new centroid.
calculate_mean([], []) :- !.
calculate_mean([Hs|SplitItem], [T0|TL1]) :-
    calculate_mean(SplitItem, TL1),
    thread_create(cal_mean(Hs), T0, []).

%% -- Thread body: compute one centroid and store it as a mean/1 fact
cal_mean(L) :-
    mean_me(0, [0,0,0], L, R),
    assert(mean(R)).

%% -- Sum the coordinates and count the points, then divide.
%% -- Assumes every cluster is non-empty (N > 0).
mean_me(N, [Sx,Sy,Sz], [[X,Y,Z]|T], R) :-
    NN is N + 1,
    NSx is Sx + X,
    NSy is Sy + Y,
    NSz is Sz + Z,
    mean_me(NN, [NSx,NSy,NSz], T, R).
mean_me(N, [Sx,Sy,Sz], [], [RSx,RSy,RSz]) :-
    RSx is Sx / N,
    RSy is Sy / N,
    RSz is Sz / N.

%% -- Wait until all threads have completed their work,
%% -- collecting one stored centroid per joined thread.
wait_for_threads([], []) :- !.
wait_for_threads([T|TL], NewMean) :-
    (   thread_join(T, true)
    ->  mean(NM),
        retract(mean(NM)),
        wait_for_threads(TL, TM),
        NewMean = [NM|TM]
    ;   wait_for_threads([T|TL], NewMean)
    ).

% End of Multi-thread K-means

References:
[1] J. Wielemaker, Native Preemptive Threads in SWI-Prolog, ICLP, Volume 2916 of Lecture Notes in Computer Science, Springer (2003), pp.
[2] J. Wielemaker, T. Schrijvers, M. Triska, T. Lager, SWI-Prolog, Accepted for publication in TPLP.
[3] N. Kerdprasop and K. Kerdprasop, A Lightweight Method to Parallel K-means Clustering, International Journal of Mathematics and Computers in Simulation, Issue 4, Volume 4, 2010, pp.
[4] M. Carro and M. Hermenegildo, Concurrency in Prolog Using Threads and a Shared Database, International Conference on Logic Programming.
[5] M. Joshi, Parallel K-means Algorithm on Distributed Memory Multiprocessors, Technical Report, University of Minnesota, 2003, pp.
[6] J. MacQueen, Some Methods for Classification and Analysis of Multivariate Observations, Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, 1967, pp.
[7] A. Prasad, Parallelization of K-means Clustering Algorithm, Project Report, University of Colorado, 2007, pp.
[8] B. Hohlt, Pthread Parallel K-means, CS267 Applications of Parallel Computing, UC Berkeley, 2001.
Automatic Categorization of Image Regions using Dominant Color based Vector Quantization Md Monirul Islam, Dengsheng Zhang, Guojun Lu Gippsland School of Information Technology, Monash University Churchill
More informationPerformance Evaluations for Parallel Image Filter on Multi - Core Computer using Java Threads
Performance Evaluations for Parallel Image Filter on Multi - Core Computer using Java s Devrim Akgün Computer Engineering of Technology Faculty, Duzce University, Duzce,Turkey ABSTRACT Developing multi
More informationParallelization of K-Means Clustering Algorithm for Data Mining
Parallelization of K-Means Clustering Algorithm for Data Mining Hao JIANG a, Liyan YU b College of Computer Science and Engineering, Southeast University, Nanjing, China a hjiang@seu.edu.cn, b yly.sunshine@qq.com
More informationAccelerated Machine Learning Algorithms in Python
Accelerated Machine Learning Algorithms in Python Patrick Reilly, Leiming Yu, David Kaeli reilly.pa@husky.neu.edu Northeastern University Computer Architecture Research Lab Outline Motivation and Goals
More informationFeature Extractors. CS 188: Artificial Intelligence Fall Nearest-Neighbor Classification. The Perceptron Update Rule.
CS 188: Artificial Intelligence Fall 2007 Lecture 26: Kernels 11/29/2007 Dan Klein UC Berkeley Feature Extractors A feature extractor maps inputs to feature vectors Dear Sir. First, I must solicit your
More informationMULTICORE LEARNING ALGORITHM
MULTICORE LEARNING ALGORITHM CHENG-TAO CHU, YI-AN LIN, YUANYUAN YU 1. Summary The focus of our term project is to apply the map-reduce principle to a variety of machine learning algorithms that are computationally
More informationClustering algorithms and autoencoders for anomaly detection
Clustering algorithms and autoencoders for anomaly detection Alessia Saggio Lunch Seminars and Journal Clubs Université catholique de Louvain, Belgium 3rd March 2017 a Outline Introduction Clustering algorithms
More informationAn Improved Parallel Scalable K-means++ Massive Data Clustering Algorithm Based on Cloud Computing
An Improved Parallel Scalable K-means++ Massive Data Clustering Algorithm Based on Cloud Computing Shuzhi Nie Abstract Clustering is one of the most effective algorithms in data analysis and management.
More informationClassifying Documents by Distributed P2P Clustering
Classifying Documents by Distributed P2P Clustering Martin Eisenhardt Wolfgang Müller Andreas Henrich Chair of Applied Computer Science I University of Bayreuth, Germany {eisenhardt mueller2 henrich}@uni-bayreuth.de
More informationA Parallel Community Detection Algorithm for Big Social Networks
A Parallel Community Detection Algorithm for Big Social Networks Yathrib AlQahtani College of Computer and Information Sciences King Saud University Collage of Computing and Informatics Saudi Electronic
More informationParallel Computing of Shared Memory Multiprocessors Based on JOMP
International Conference on Mechatronics, Electronic, Industrial and Control Engineering (MEIC 2015) Parallel Computing of Shared Memory Multiprocessors Based on JOMP ZHANG Hong College of Electrical &
More informationIterative random projections for high-dimensional data clustering
Iterative random projections for high-dimensional data clustering Ângelo Cardoso, Andreas Wichert INESC-ID Lisboa and Instituto Superior Técnico, Technical University of Lisbon Av. Prof. Dr. Aníbal Cavaco
More informationParallel K-Means Algorithm for Shared Memory Multiprocessors
Journal of Computer and Communications, 2014, 2, 15-23 Published Online September 2014 in SciRes. http://www.scirp.org/journal/jcc http://dx.doi.org/10.4236/jcc.2014.211002 Parallel K-Means Algorithm for
More informationScalability of Efficient Parallel K-Means
Scalability of Efficient Parallel K-Means David Pettinger and Giuseppe Di Fatta School of Systems Engineering The University of Reading Whiteknights, Reading, Berkshire, RG6 6AY, UK {D.G.Pettinger,G.DiFatta}@reading.ac.uk
More informationNew Approach for K-mean and K-medoids Algorithm
New Approach for K-mean and K-medoids Algorithm Abhishek Patel Department of Information & Technology, Parul Institute of Engineering & Technology, Vadodara, Gujarat, India Purnima Singh Department of
More informationAsynchronous Multi-Task Learning
Asynchronous Multi-Task Learning Inci M. Baytas, Ming Yan, Anil K. Jain and Jiayu Zhou December 14th, 2016 ICDM 2016 Inci M. Baytas, Ming Yan, Anil K. Jain and Jiayu Zhou 1 Outline 1 Introduction 2 Solving
More informationIteration Reduction K Means Clustering Algorithm
Iteration Reduction K Means Clustering Algorithm Kedar Sawant 1 and Snehal Bhogan 2 1 Department of Computer Engineering, Agnel Institute of Technology and Design, Assagao, Goa 403507, India 2 Department
More informationNORMALIZATION INDEXING BASED ENHANCED GROUPING K-MEAN ALGORITHM
NORMALIZATION INDEXING BASED ENHANCED GROUPING K-MEAN ALGORITHM Saroj 1, Ms. Kavita2 1 Student of Masters of Technology, 2 Assistant Professor Department of Computer Science and Engineering JCDM college
More informationParallel Implementaton of the Weibull
Journal of Environmental Protection and Ecology 15, No 1, 287 292 (2014) Computer applications on environmental information system Parallel Implementaton of the Weibull Distribution Parameters Estimator
More informationOUTLIER DETECTION FOR DYNAMIC DATA STREAMS USING WEIGHTED K-MEANS
OUTLIER DETECTION FOR DYNAMIC DATA STREAMS USING WEIGHTED K-MEANS DEEVI RADHA RANI Department of CSE, K L University, Vaddeswaram, Guntur, Andhra Pradesh, India. deevi_radharani@rediffmail.com NAVYA DHULIPALA
More informationComparision between Quad tree based K-Means and EM Algorithm for Fault Prediction
Comparision between Quad tree based K-Means and EM Algorithm for Fault Prediction Swapna M. Patil Dept.Of Computer science and Engineering,Walchand Institute Of Technology,Solapur,413006 R.V.Argiddi Assistant
More informationIncremental K-means Clustering Algorithms: A Review
Incremental K-means Clustering Algorithms: A Review Amit Yadav Department of Computer Science Engineering Prof. Gambhir Singh H.R.Institute of Engineering and Technology, Ghaziabad Abstract: Clustering
More informationEnhanced Bug Detection by Data Mining Techniques
ISSN (e): 2250 3005 Vol, 04 Issue, 7 July 2014 International Journal of Computational Engineering Research (IJCER) Enhanced Bug Detection by Data Mining Techniques Promila Devi 1, Rajiv Ranjan* 2 *1 M.Tech(CSE)
More informationProblem Set 4. Danfei Xu CS 231A March 9th, (Courtesy of last year s slides)
Problem Set 4 Danfei Xu CS 231A March 9th, 2018 (Courtesy of last year s slides) Outline Part 1: Facial Detection via HoG Features + SVM Classifier Part 2: Image Segmentation with K-Means and Meanshift
More informationBRACE: A Paradigm For the Discretization of Continuously Valued Data
Proceedings of the Seventh Florida Artificial Intelligence Research Symposium, pp. 7-2, 994 BRACE: A Paradigm For the Discretization of Continuously Valued Data Dan Ventura Tony R. Martinez Computer Science
More informationI. INTRODUCTION FACTORS RELATED TO PERFORMANCE ANALYSIS
Performance Analysis of Java NativeThread and NativePthread on Win32 Platform Bala Dhandayuthapani Veerasamy Research Scholar Manonmaniam Sundaranar University Tirunelveli, Tamilnadu, India dhanssoft@gmail.com
More information732A54/TDDE31 Big Data Analytics
732A54/TDDE31 Big Data Analytics Lecture 10: Machine Learning with MapReduce Jose M. Peña IDA, Linköping University, Sweden 1/27 Contents MapReduce Framework Machine Learning with MapReduce Neural Networks
More informationPatent Image Retrieval
Patent Image Retrieval Stefanos Vrochidis IRF Symposium 2008 Vienna, November 6, 2008 Aristotle University of Thessaloniki Overview 1. Introduction 2. Related Work in Patent Image Retrieval 3. Patent Image
More informationA New Method For Forecasting Enrolments Combining Time-Variant Fuzzy Logical Relationship Groups And K-Means Clustering
A New Method For Forecasting Enrolments Combining Time-Variant Fuzzy Logical Relationship Groups And K-Means Clustering Nghiem Van Tinh 1, Vu Viet Vu 1, Tran Thi Ngoc Linh 1 1 Thai Nguyen University of
More informationFINAL REPORT: K MEANS CLUSTERING SAPNA GANESH (sg1368) VAIBHAV GANDHI(vrg5913)
FINAL REPORT: K MEANS CLUSTERING SAPNA GANESH (sg1368) VAIBHAV GANDHI(vrg5913) Overview The partitioning of data points according to certain features of the points into small groups is called clustering.
More informationParallel Stochastic Gradient Descent
University of Montreal August 11th, 2007 CIAR Summer School - Toronto Stochastic Gradient Descent Cost to optimize: E z [C(θ, z)] with θ the parameters and z a training point. Stochastic gradient: θ t+1
More informationAn Initialization Method for the K-means Algorithm using RNN and Coupling Degree
An Initialization for the K-means Algorithm using RNN and Coupling Degree Alaa H. Ahmed Faculty of Engineering Islamic university of Gaza Gaza Strip, Palestine Wesam Ashour Faculty of Engineering Islamic
More informationNearest Clustering Algorithm for Satellite Image Classification in Remote Sensing Applications
Nearest Clustering Algorithm for Satellite Image Classification in Remote Sensing Applications Anil K Goswami 1, Swati Sharma 2, Praveen Kumar 3 1 DRDO, New Delhi, India 2 PDM College of Engineering for
More informationAwos Kanan B.Sc., Jordan University of Science and Technology, 2003 M.Sc., Jordan University of Science and Technology, 2006
Optimized Hardware Accelerators for Data Mining Applications by Awos Kanan B.Sc., Jordan University of Science and Technology, 2003 M.Sc., Jordan University of Science and Technology, 2006 A Dissertation
More informationAnalysis of Parallelization Techniques and Tools
International Journal of Information and Computation Technology. ISSN 97-2239 Volume 3, Number 5 (213), pp. 71-7 International Research Publications House http://www. irphouse.com /ijict.htm Analysis of
More informationMonika Maharishi Dayanand University Rohtak
Performance enhancement for Text Data Mining using k means clustering based genetic optimization (KMGO) Monika Maharishi Dayanand University Rohtak ABSTRACT For discovering hidden patterns and structures
More informationRecord Linkage using Probabilistic Methods and Data Mining Techniques
Doi:10.5901/mjss.2017.v8n3p203 Abstract Record Linkage using Probabilistic Methods and Data Mining Techniques Ogerta Elezaj Faculty of Economy, University of Tirana Gloria Tuxhari Faculty of Economy, University
More informationKernels and Clustering
Kernels and Clustering Robert Platt Northeastern University All slides in this file are adapted from CS188 UC Berkeley Case-Based Learning Non-Separable Data Case-Based Reasoning Classification from similarity
More informationImproved MapReduce k-means Clustering Algorithm with Combiner
2014 UKSim-AMSS 16th International Conference on Computer Modelling and Simulation Improved MapReduce k-means Clustering Algorithm with Combiner Prajesh P Anchalia Department Of Computer Science and Engineering
More informationIntroduction to pthreads
CS 220: Introduction to Parallel Computing Introduction to pthreads Lecture 25 Threads In computing, a thread is the smallest schedulable unit of execution Your operating system has a scheduler that decides
More informationCost-sensitive Boosting for Concept Drift
Cost-sensitive Boosting for Concept Drift Ashok Venkatesan, Narayanan C. Krishnan, Sethuraman Panchanathan Center for Cognitive Ubiquitous Computing, School of Computing, Informatics and Decision Systems
More informationTime Series Clustering Ensemble Algorithm Based on Locality Preserving Projection
Based on Locality Preserving Projection 2 Information & Technology College, Hebei University of Economics & Business, 05006 Shijiazhuang, China E-mail: 92475577@qq.com Xiaoqing Weng Information & Technology
More informationInternational Journal of Scientific Research & Engineering Trends Volume 4, Issue 6, Nov-Dec-2018, ISSN (Online): X
Analysis about Classification Techniques on Categorical Data in Data Mining Assistant Professor P. Meena Department of Computer Science Adhiyaman Arts and Science College for Women Uthangarai, Krishnagiri,
More informationMachine Learning for Signal Processing Clustering. Bhiksha Raj Class Oct 2016
Machine Learning for Signal Processing Clustering Bhiksha Raj Class 11. 13 Oct 2016 1 Statistical Modelling and Latent Structure Much of statistical modelling attempts to identify latent structure in the
More informationMine Blood Donors Information through Improved K- Means Clustering Bondu Venkateswarlu 1 and Prof G.S.V.Prasad Raju 2
Mine Blood Donors Information through Improved K- Means Clustering Bondu Venkateswarlu 1 and Prof G.S.V.Prasad Raju 2 1 Department of Computer Science and Systems Engineering, Andhra University, Visakhapatnam-
More informationComparative study between the proposed shape independent clustering method and the conventional methods (K-means and the other)
(IJRI) International Journal of dvanced Research in rtificial Intelligence, omparative study between the proposed shape independent clustering method and the conventional methods (K-means and the other)
More information