Parallel Approach for Implementing Data Mining Algorithms


TITLE OF THE THESIS

Parallel Approach for Implementing Data Mining Algorithms

A RESEARCH PROPOSAL SUBMITTED TO THE SHRI RAMDEOBABA COLLEGE OF ENGINEERING AND MANAGEMENT FOR THE DEGREE OF DOCTOR OF PHILOSOPHY IN COMPUTER SCIENCE AND ENGINEERING

By MANISH BHARDWAJ
Registration No < >

UNDER THE GUIDANCE OF DR. D. S. ADANE

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
SHRI RAMDEOBABA COLLEGE OF ENGINEERING AND MANAGEMENT, NAGPUR, MAHARASHTRA
Year 2016
MANISH BHARDWAJ Doctorate Research Proposal RCOEM, Nagpur

About this proposal

This doctorate research proposal describes the working title of the research and gives a general overview of the area. The research plan described in this document may be modified based on the approval of this proposal.

Research Proposal, RCOEM Nagpur Confidential
CONTENTS

1. Research Title
2. Abstract
3. Literature Survey
4. Problem Definition
5. Proposed Methodology
6. References
1. Research Proposal Title

Parallel Approach for Implementing Data Mining Algorithms.

2. Abstract

Parallel data mining concerns parallel algorithms, techniques and tools for extracting useful, implicit and novel patterns from datasets using high-performance architectures. The huge volumes of data generated by online transactions, by social networking sites, and by government organizations working in space science and bioinformatics create new problems for data mining and knowledge discovery methods. Because of this size, most currently available data mining algorithms cannot be applied to many problems: the quality of their results degrades when datasets become very large, and their execution time is also high. Parallel techniques allow mining to be performed more efficiently by taking advantage of the available high-performance architectures. Parallel approaches such as data partitioning, task partitioning, divide-and-conquer, single-dimension reduction, scalable thread scheduling and local sorting help to implement data mining algorithms with higher performance and lower time requirements than a simple serial implementation. A graphics processing unit (GPU) with the CUDA programming model performs work in parallel through thread blocks that run concurrently. The OpenMP API, with its fork-join model and its set of constructs and directives, supports parallel implementations on multi-core processors.

3. Literature Review and Related Work

3.1 Research Issues and Challenges

Some important research issues and open problems in designing and implementing large-scale data mining algorithms are the following.

3.1.1 High Dimensionality

Available methods can handle hundreds of attributes. New parallel algorithms are needed that can handle a larger number of attributes.
3.1.2 Large Size

Data warehouses continue to increase in size. Available techniques can handle data in the gigabyte range but are not yet well suited to terabyte-sized data.

3.1.3 Data Type

Most data mining research has focused on structured data because of its simplicity, but support for other data types is also required. Examples include semi-structured, unstructured, spatial, temporal and multimedia databases.

3.1.4 Dynamic Load Balancing

Static partitioning is used for homogeneous environments. Dynamic load balancing is crucial for handling heterogeneous environments.

3.1.5 Multi-table Mining

Mining over multiple tables, or over distributed databases with different schemas, is very difficult with the available mining methods. Better methods are required to handle the multi-table mining problem [1].

3.2 Scaling up Methods for Data Mining

Scaling up is the only way to handle large datasets. Parallel approaches such as one-dimension reduction, scalable thread scheduling and local sorting make it possible to implement data mining algorithms that can handle large datasets.

Modifying Algorithm

Modifying an algorithm mainly aims to make it faster. For this purpose, different optimized search techniques are used. Other options are to reduce the complexity of the algorithm, to optimize the representation, or to find an approximate solution instead of an exact one.

Model restriction and reducing the search space

Restricting the model space has an immediate advantage in that the search space is also reduced. Furthermore, simple solutions are usually faster to obtain and evaluate and, in many cases, are competitive with more complex solutions. The major problem is when the intrinsic complexity of the problem
cannot be met by a simple solution. Examples of this strategy are many, including linear models, perceptrons, and decision stumps.

Using powerful search heuristics

Using a more efficient search heuristic avoids artificially constraining the possible models and tries to make the search process faster. The method consists of three steps. First, it must derive an upper bound on the relative loss between using a subset of the available data and using the whole dataset in each step of the learning algorithm. Then, it must derive an upper bound on the time complexity of the learning algorithm as a function of the number of samples used in each step. Finally, it must minimize the time bound, via the number of samples used in each step, subject to the target limits on the loss of performance caused by using a subset of the dataset.

Changing the way the problem is tackled

This strategy modifies the way the problem is solved and is based on the general principle of divide-and-conquer. The idea is to perform some kind of data partitioning or problem decomposition.

3.3 Parallelization

Parallelization helps in the sense that the most costly parts of the computation are performed concurrently. With parallelization it is possible to address the scaling up of mining methods without simplifying either the algorithm or the task.

3.4 Graphics Processing Unit with Compute Unified Device Architecture

Graphics processing units (GPUs) have enabled inexpensive high-performance computing for general-purpose applications. The Compute Unified Device Architecture (CUDA) programming model provides programmers with adequate C-language-like APIs to better exploit the parallel power of the GPU. GPUs have evolved into highly parallel, multithreaded, many-core processors with tremendous computational horsepower and very high memory bandwidth. NVIDIA's GPUs with the CUDA programming model provide an adequate API for non-graphics applications.
Fig. 3.1 A set of SIMD stream multiprocessors with memory hierarchy

CUDA Programming Model

At the software level, a CUDA program is a collection of thread blocks that run in parallel. The unit of work assigned to the GPU is called a kernel. A CUDA program runs in a thread-parallel way: computation is organized as a grid of thread blocks, each of which consists of a set of threads, as shown in the figure below. At the instruction level, 32 consecutive threads in a thread block make up the minimum unit of execution, which is called a thread warp. Each stream multiprocessor executes one or more thread blocks concurrently [2].
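The grid, block and warp organization described above can be illustrated with a small CPU-side sketch. It is written in Python rather than CUDA and runs sequentially, so it only mimics the indexing arithmetic of a one-dimensional kernel launch. The names `launch_config`, `run_kernel` and `vector_add`, and the default block size of 256 threads, are illustrative assumptions and are not taken from the proposal or from the CUDA API.

```python
WARP_SIZE = 32  # threads per warp, as stated in the text


def launch_config(n_items, threads_per_block=256):
    """Return (blocks, warps_per_block) needed to give each item one thread."""
    blocks = -(-n_items // threads_per_block)          # ceiling division
    warps_per_block = -(-threads_per_block // WARP_SIZE)
    return blocks, warps_per_block


def run_kernel(kernel, n_items, threads_per_block=256):
    """Sequentially emulate a 1-D grid launch of `kernel` over n_items."""
    blocks, _ = launch_config(n_items, threads_per_block)
    for block_idx in range(blocks):
        for thread_idx in range(threads_per_block):
            # Same formula a CUDA thread would use for its global index.
            global_idx = block_idx * threads_per_block + thread_idx
            if global_idx < n_items:  # the last block may have surplus threads
                kernel(global_idx)


def vector_add(a, b):
    """Element-wise sum where each emulated thread handles one element."""
    out = [0] * len(a)

    def kernel(i):
        out[i] = a[i] + b[i]

    run_kernel(kernel, len(a))
    return out


if __name__ == "__main__":
    print(launch_config(1000))                   # blocks and warps per block
    print(vector_add([1, 2, 3], [10, 20, 30]))
```

On a real GPU the two loops in `run_kernel` do not exist: every (block, thread) pair executes the kernel body concurrently, and the bounds check plays the same role of switching off surplus threads in the last block.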
Fig. 3.2 Serial execution on the host and parallel execution on the device

Parallelization techniques on CUDA-enabled platforms

Three schemes for data mining parallelization on CUDA-based platforms are as follows.

Scalable thread scheduling scheme for irregular patterns

Whether a task is assigned to the CPU or the GPU, and how many thread blocks it uses, is usually determined by the size of the problem before the GPU kernel starts. In an irregular-pattern problem, however, the size cannot be fixed in this way, so standard CUDA computing is not well suited to it.
Solution: scalable thread scheduling. An upper bound on the number of threads and thread blocks is calculated first and the GPU resources are allocated accordingly; if some thread blocks turn out to be idle, they quit immediately.

Parallel distributed top-k scheme

The top-k problem is to select the k minimum or maximum elements from a data collection. Insertion sort has been shown to be efficient when k is small, but CUDA-based insertion sort is not efficient.
Solution: reduce the computation and work around the weakness of CUDA-based insertion sort by using local sorts rather than a global sort.

Parallel high-dimension reduction scheme

Text mining data may consist of hundreds of attributes, exceeding the size of the shared memory allocated to each thread block on the GPU. In such a case, the record has to be broken into multiple sub-records to fit in the shared
memory, but breaking it down into too many sub-records is not a solution, because the cost of manipulating the records and the temporary results will be high.
Solution: different attributes in a record are independent, so each thread block can take care of one distinct attribute across all the records. Rather than performing a reduction on the high-dimensional data, a one-dimensional reduction is performed on each attribute.

3.5 CUDA-based implementations of data mining algorithms

3.5.1 CUApriori

In CUDA-based Apriori, candidate generation and support counting account for most of the computation.

Candidate generation

The candidate generation procedure joins two frequent (k-1)-itemsets and prunes the unpromising k-candidates. Since the task of joining two itemsets is independent between different threads, it is suitable for parallelization. The scalable thread scheduling scheme for irregular patterns is used here.

Support counting

The support counting procedure records the number of occurrences of a candidate itemset by scanning the transaction database. Since the counting for each candidate is independent of the others, it is suitable for parallelization. Transactions are loaded into the shared memory and shared by all the threads within a thread block [3].

3.5.2 CUKNN

In the CUDA-based K-Nearest Neighbour classifier, distance calculation and selection of the k nearest neighbours account for most of the computation.

Distance calculation

Distance calculation can be fully parallelized because each pairwise distance calculation is independent. This property makes KNN well suited to a GPU parallel implementation. The goal is to maximize the concurrency of the distance calculations invoked by different threads and to minimize global memory access.

Selection of k nearest neighbours

Selecting the k nearest neighbours of a query object amounts to finding the k shortest distances, which is a typical top-k problem. It is therefore implemented with the distributed top-k scheme.
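The two CUKNN steps can be sketched on the CPU as follows. This is a minimal Python illustration under stated assumptions, not the CUDA implementation from [3]: each chunk of records stands in for the data handled by one thread block, a local top-k is taken per chunk, and the local results are merged into the global top-k. The function names, the `(point, label)` record layout, the use of squared Euclidean distance and the `chunk_size` parameter are all hypothetical choices made for the example.

```python
import heapq


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def local_top_k(query, chunk, k):
    """k nearest (distance, label) pairs inside one chunk (one 'thread block')."""
    distances = ((squared_distance(query, point), label) for point, label in chunk)
    return heapq.nsmallest(k, distances)


def knn_distributed_top_k(query, records, k, chunk_size=4):
    """Global k nearest neighbours built from per-chunk local top-k results."""
    chunks = [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]
    candidates = []
    for chunk in chunks:  # chunks are independent, so they could run concurrently
        candidates.extend(local_top_k(query, chunk, k))
    # Every global top-k element is also in its own chunk's local top-k,
    # so merging the local results loses nothing.
    return heapq.nsmallest(k, candidates)


if __name__ == "__main__":
    records = [((0, 0), "a"), ((1, 1), "b"), ((5, 5), "c"),
               ((0, 1), "d"), ((9, 9), "e"), ((2, 2), "f")]
    print(knn_distributed_top_k((0, 0), records, k=2, chunk_size=2))
```

The merge step only has to look at k candidates per chunk instead of every distance, which is the reason local sorts are preferred over one global sort in the scheme described above.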
3.5.3 CUKmeans

In CUDA-based K-means, cluster label update, centroid update and centroid movement detection account for most of the computation.

Cluster label update

Each thread calculates the distance from an object to all the centroids and selects the nearest centroid, so that each object is assigned to the cluster whose centroid is closest to it. Attribute partitions of the objects are loaded into the shared memory, so the bandwidth between the global memory and the shared memory is used efficiently.

Centroid update

Each new centroid is calculated by averaging the attribute values of all the records belonging to the same cluster. The parallel high-dimension reduction scheme is used for this task.

Centroid movement detection

This step checks whether the new centroids have moved away from the centroids of the previous iteration. First, the square of the difference between every attribute of the new and old centroids is calculated, giving the centroid difference matrix. Second, the parallel high-dimension reduction scheme is applied to the centroid difference matrix. Third, since the resulting record has few attributes, it is transferred to the main memory and summed to obtain global_squared_error. The cost of the data transfer between the main memory and the global memory is negligible [7].

FP-Growth

Although the FP-Growth association-rule mining algorithm is more efficient than the Apriori algorithm, it has two disadvantages. The first is that the FP-tree can become too large to be created in memory; the second is the serial processing approach it uses. A parallel FP-Growth approach based on a distributed application data framework does not need to generate the overall FP-tree, which may be too large to create in shared memory. The algorithm uses a parallel processing approach in all important steps.
This improves the processing capability and efficiency of the association-rule mining algorithm [4].

Parallel Bees Swarm Optimization

In this method, the association rule mining problem on huge datasets is solved by imitating the behaviour of bees. It takes advantage of the GPU architecture and deals with large datasets to solve real-time problems. A master-slave paradigm is used: the master executes on the CPU and the slave is offloaded to the GPU. First, the master randomly initializes the reference solution. After that, it determines the regions of all the bees by generating the
neighbours of each bee. Each solution is evaluated on the GPU in parallel. After the master receives back the fitness of all the rules, each bee sequentially calculates its best rule and puts it in the dance table. The best rule in the dance table becomes the reference solution for the next iteration [5].

Accelerating Parallel Frequent Itemset Mining on Graphics Processors with Sorting

This method constructs a transaction identifier table and sorts all frequent itemsets, which helps to reduce the number of candidate itemsets on the GPU architecture. GPU thread blocks are allocated after the itemsets have been sorted in descending order. As a result, checking and support counting take less time [6].

Parallel Highly Informative K-ItemSet

PHIKS is a highly scalable, parallel miki mining algorithm that can handle the mining of huge databases (terabytes in size). Miki mining is the problem of discovering maximally informative k-itemsets (miki for short) in massive datasets, where informativeness is expressed by means of joint entropy and k is the size of the itemset. Miki mining is a key problem in data analytics with high potential impact on various tasks such as unsupervised learning, supervised learning and information retrieval, to cite a few. A typical application is the discovery of discriminative sets of features, based on joint entropy [9].

4. Problem Definition

Large volumes of data are generated by online transactions, social networking sites, and government organizations working in space science and bioinformatics, and the available data mining algorithms do not perform well on such datasets. A further problem concerns performance: some algorithms that are able to solve the mining problem face a search space so large that it prevents efficient execution, and the solutions they generate are not of a satisfactory level.
5. Proposed Methodology

To deal with very large datasets, the proposed approach is to scale up data mining algorithms in parallel. This can be done by modifying the algorithm, by data partitioning, by problem decomposition and by parallelization. For parallelization, graphics processing units provide inexpensive high-performance computing power, and the Compute Unified Device Architecture programming model gives programmers adequate C-language-like APIs to better exploit the parallel power of the GPU. The GPU has evolved into a highly parallel, multithreaded, many-core processor, so work is distributed among different thread blocks and the threads perform their operations in a thread-parallel fashion. The other approach is based on OpenMP, a shared-memory API that works on the fork-join model. It has a large set of constructs and directives for doing work in parallel, so that a task can use the computing power of multiple cores. These parallel approaches will be applied to scale up data mining algorithms.

6. References

1. M. J. Zaki, Large-Scale Parallel Data Mining, LNAI 1759, pp. 1-23, Springer.
2. N. Garcia-Pedrajas, A. de Haro-Garcia, Scaling up data mining algorithms: review and taxonomy, Springer-Verlag.
3. L. Jian, C. Wang, Y. Liu, Y. Shi, Parallel data mining techniques on Graphics Processing Unit with Compute Unified Device Architecture (CUDA), Springer Science + Business Media, LLC.
4. Zhi-gang Wang, Chi-she Wang, A Parallel Association-Rule Mining Algorithm, Springer-Verlag Berlin Heidelberg.
5. Y. Tan, Parallel Bees Swarm Optimization for Association Rules Mining Using GPU Architecture, Springer International Publishing Switzerland.
6. H. Hsu, Accelerating Parallel Frequent Itemset Mining on Graphics Processors with Sorting, IFIP.
7. H. Decker, Parallel and Distributed Mining of Probabilistic Frequent Itemsets Using Multiple GPUs, Springer-Verlag Berlin Heidelberg.
8. S.
Tsutsui and P. Collet, Data Mining Using Parallel Multi-objective Evolutionary Algorithms on Graphics Processing Units, Springer-Verlag Berlin.
9. Saber Salah, A highly scalable parallel algorithm for maximally informative k-itemset mining, Springer-Verlag London.
Preface xvii Acknowledgments xix CHAPTER 1 Introduction to Parallel Computing 1 1.1 Motivating Parallelism 2 1.1.1 The Computational Power Argument from Transistors to FLOPS 2 1.1.2 The Memory/Disk Speed
More informationSimultaneous Solving of Linear Programming Problems in GPU
Simultaneous Solving of Linear Programming Problems in GPU Amit Gurung* amitgurung@nitm.ac.in Binayak Das* binayak89cse@gmail.com Rajarshi Ray* raj.ray84@gmail.com * National Institute of Technology Meghalaya
More informationLecture Topic Projects 1 Intro, schedule, and logistics 2 Data Science components and tasks 3 Data types Project #1 out 4 Introduction to R,
Lecture Topic Projects 1 Intro, schedule, and logistics 2 Data Science components and tasks 3 Data types Project #1 out 4 Introduction to R, statistics foundations 5 Introduction to D3, visual analytics
More informationChapter 7 UNSUPERVISED LEARNING TECHNIQUES FOR MAMMOGRAM CLASSIFICATION
UNSUPERVISED LEARNING TECHNIQUES FOR MAMMOGRAM CLASSIFICATION Supervised and unsupervised learning are the two prominent machine learning algorithms used in pattern recognition and classification. In this
More informationScalable GPU Graph Traversal!
Scalable GPU Graph Traversal Duane Merrill, Michael Garland, and Andrew Grimshaw PPoPP '12 Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming Benwen Zhang
More informationPractical NearData Processing for InMemory Analytics Frameworks
Practical NearData Processing for InMemory Analytics Frameworks Mingyu Gao, Grant Ayers, Christos Kozyrakis Stanford University http://mast.stanford.edu PACT Oct 19, 2015 Motivating Trends End of Dennard
More informationPerformance Analysis of Data Mining Classification Techniques
Performance Analysis of Data Mining Classification Techniques Tejas Mehta 1, Dr. Dhaval Kathiriya 2 Ph.D. Student, School of Computer Science, Dr. Babasaheb Ambedkar Open University, Gujarat, India 1 Principal
More informationParallel Algorithm Design. CS595, Fall 2010
Parallel Algorithm Design CS595, Fall 2010 1 Programming Models The programming model o determines the basic concepts of the parallel implementation and o abstracts from the hardware as well as from the
More information2. Discovery of Association Rules
2. Discovery of Association Rules Part I Motivation: market basket data Basic notions: association rule, frequency and confidence Problem of association rule mining (Sub)problem of frequent set mining
More informationREDUCING BEAMFORMING CALCULATION TIME WITH GPU ACCELERATED ALGORITHMS
BeBeC201408 REDUCING BEAMFORMING CALCULATION TIME WITH GPU ACCELERATED ALGORITHMS Steffen Schmidt GFaI ev Volmerstraße 3, 12489, Berlin, Germany ABSTRACT Beamforming algorithms make high demands on the
More informationDISCOVERING ACTIVE AND PROFITABLE PATTERNS WITH RFM (RECENCY, FREQUENCY AND MONETARY) SEQUENTIAL PATTERN MINING A CONSTRAINT BASED APPROACH
International Journal of Information Technology and Knowledge Management JanuaryJune 2011, Volume 4, No. 1, pp. 2732 DISCOVERING ACTIVE AND PROFITABLE PATTERNS WITH RFM (RECENCY, FREQUENCY AND MONETARY)
More informationThe Comparison of CBA Algorithm and CBS Algorithm for Meteorological Data Classification Mohammad Iqbal, Imam Mukhlash, Hanim Maria Astuti
Information Systems International Conference (ISICO), 2 4 December 2013 The Comparison of CBA Algorithm and CBS Algorithm for Meteorological Data Classification Mohammad Iqbal, Imam Mukhlash, Hanim Maria
More informationA Modified Apriori Algorithm for Fast and Accurate Generation of Frequent Item Sets
INTERNATIONAL JOURNAL OF SCIENTIFIC & TECHNOLOGY RESEARCH VOLUME 6, ISSUE 08, AUGUST 2017 ISSN 22778616 A Modified Apriori Algorithm for Fast and Accurate Generation of Frequent Item Sets K.A.Baffour,
More informationAn Efficient Algorithm for Finding the Support Count of Frequent 1Itemsets in Frequent Pattern Mining
An Efficient Algorithm for Finding the Support Count of Frequent 1Itemsets in Frequent Pattern Mining P.Subhashini 1, Dr.G.Gunasekaran 2 Research Scholar, Dept. of Information Technology, St.Peter s University,
More informationACCELERATING SELECT WHERE AND SELECT JOIN QUERIES ON A GPU
Computer Science 14 (2) 2013 http://dx.doi.org/10.7494/csci.2013.14.2.243 Marcin Pietroń Pawe l Russek Kazimierz Wiatr ACCELERATING SELECT WHERE AND SELECT JOIN QUERIES ON A GPU Abstract This paper presents
More informationSBKMA: Sorting based KMeans Clustering Algorithm using Multi Machine Technique for Big Data
I J C T A, 8(5), 2015, pp. 21052110 International Science Press SBKMA: Sorting based KMeans Clustering Algorithm using Multi Machine Technique for Big Data E. Mahima Jane* and E. George Dharma Prakash
More informationAccelerating MapReduce on a Coupled CPUGPU Architecture
Accelerating MapReduce on a Coupled CPUGPU Architecture Linchuan Chen Xin Huo Gagan Agrawal Department of Computer Science and Engineering The Ohio State University Columbus, OH 43210 {chenlinc,huox,agrawal}@cse.ohiostate.edu
More informationData Mining. Chapter 1: Introduction. Adapted from materials by Jiawei Han, Micheline Kamber, and Jian Pei
Data Mining Chapter 1: Introduction Adapted from materials by Jiawei Han, Micheline Kamber, and Jian Pei 1 Any Question? Just Ask 3 Chapter 1. Introduction Why Data Mining? What Is Data Mining? A MultiDimensional
More informationTHE STUDY OF WEB MINING  A SURVEY
THE STUDY OF WEB MINING  A SURVEY Ashish Gupta, Anil Khandekar Abstract over the year s web mining is the very fast growing research field. Web mining contains two research areas: Data mining and World
More informationUnsupervised Learning
Unsupervised Learning Chapter 14: The Elements of Statistical Learning Presented for 540 by Len Tanaka Objectives Introduction Techniques: Association Rules Cluster Analysis SelfOrganizing Maps Projective
More informationFundamental Data Mining Algorithms
2018 EE448, Big Data Mining, Lecture 3 Fundamental Data Mining Algorithms Weinan Zhang Shanghai Jiao Tong University http://wnzhang.net http://wnzhang.net/teaching/ee448/index.html REVIEW What is Data
More informationKnowledge Discovery and Data Mining
Knowledge Discovery and Data Mining Unit # 1 1 Acknowledgement Several Slides in this presentation are taken from course slides provided by Han and Kimber (Data Mining Concepts and Techniques) and Tan,
More informationALGORITHM FOR MINING TIME VARYING FREQUENT ITEMSETS
ALGORITHM FOR MINING TIME VARYING FREQUENT ITEMSETS D.SUJATHA 1, PROF.B.L.DEEKSHATULU 2 1 HOD, Department of IT, Aurora s Technological and Research Institute, Hyderabad 2 Visiting Professor, Department
More informationData Mining Concepts
Data Mining Concepts Outline Data Mining Data Warehousing Knowledge Discovery in Databases (KDD) Goals of Data Mining and Knowledge Discovery Association Rules Additional Data Mining Algorithms Sequential
More informationSegmentation of Images
Segmentation of Images SEGMENTATION If an image has been preprocessed appropriately to remove noise and artifacts, segmentation is often the key step in interpreting the image. Image segmentation is a
More informationChapter 6. Parallel Processors from Client to Cloud. Copyright 2014 Elsevier Inc. All rights reserved.
Chapter 6 Parallel Processors from Client to Cloud FIGURE 6.1 Hardware/software categorization and examples of application perspective on concurrency versus hardware perspective on parallelism. 2 FIGURE
More informationCOMPARISON OF KMEAN ALGORITHM & APRIORI ALGORITHM AN ANALYSIS
ABSTRACT International Journal On Engineering Technology and Sciences IJETS COMPARISON OF KMEAN ALGORITHM & APRIORI ALGORITHM AN ANALYSIS Dr.C.Kumar Charliepaul 1 G.Immanual Gnanadurai 2 Principal Assistant
More informationParallel Architecture & Programing Models for Face Recognition
Parallel Architecture & Programing Models for Face Recognition Submitted by Sagar Kukreja Computer Engineering Department Rochester Institute of Technology Agenda Introduction to face recognition Feature
More information1 (eagle_eye) and Naeem Latif
1 CS614 today quiz solved by my campus group these are just for idea if any wrong than we don t responsible for it Question # 1 of 10 ( Start time: 07:08:29 PM ) Total Marks: 1 As opposed to the outcome
More informationFace Detection CUDA Accelerating
Face Detection CUDA Accelerating Jaromír Krpec Department of Computer Science VŠB Technical University Ostrava Ostrava, Czech Republic krpec.jaromir@seznam.cz Martin Němec Department of Computer Science
More informationData Mining for Knowledge Management. Association Rules
1 Data Mining for Knowledge Management Association Rules Themis Palpanas University of Trento http://disi.unitn.eu/~themis 1 Thanks for slides to: Jiawei Han George Kollios Zhenyu Lu Osmar R. Zaïane Mohammad
More informationAdvanced Eclat Algorithm for Frequent Itemsets Generation
International Journal of Applied Engineering Research ISSN 09734562 Volume 10, Number 9 (2015) pp. 2326323279 Research India Publications http://www.ripublication.com Advanced Eclat Algorithm for Frequent
More informationCUDA. GPU Computing. K. Cooper 1. 1 Department of Mathematics. Washington State University
GPU Computing K. Cooper 1 1 Department of Mathematics Washington State University 2014 Review of Parallel Paradigms MIMD Computing Multiple Instruction Multiple Data Several separate program streams, each
More informationNew Approach of Bellman Ford Algorithm on GPU using Compute Unified Design Architecture (CUDA)
New Approach of Bellman Ford Algorithm on GPU using Compute Unified Design Architecture (CUDA) Pankhari Agarwal Department of Computer Science and Engineering National Institute of Technical Teachers Training
More informationAssociation Rule Discovery
Association Rule Discovery Association Rules describe frequent cooccurences in sets an item set is a subset A of all possible items I Example Problems: Which products are frequently bought together by
More informationFIELA: A Fast Image Encryption with Lorenz Attractor using Hybrid Computing
FIELA: A Fast Image Encryption with Lorenz Attractor using Hybrid Computing P Kranthi Kumar, B V Nagendra Prasad, Gelli MBSS Kumar, V. Chandrasekaran, P.K.Baruah Sri Sathya Sai Institute of Higher Learning,
More informationAssociation Rules. Comp 135 Machine Learning Computer Science Tufts University. Association Rules. Association Rules. Data Model.
Comp 135 Machine Learning Computer Science Tufts University Fall 2017 Roni Khardon Unsupervised learning but complementary to data exploration in clustering. The goal is to find weak implications in the
More informationKapitel 4: Clustering
LudwigMaximiliansUniversität München Institut für Informatik Lehr und Forschungseinheit für Datenbanksysteme Knowledge Discovery in Databases WiSe 2017/18 Kapitel 4: Clustering Vorlesung: Prof. Dr.
More informationBest Combination of Machine Learning Algorithms for Course Recommendation System in Elearning
Best Combination of Machine Learning Algorithms for Course Recommendation System in Elearning Sunita B Aher M.E. (CSE) II Walchand Institute of Technology Solapur University India Lobo L.M.R.J. Associate
More informationIMPROVING APRIORI ALGORITHM USING PAFI AND TDFI
IMPROVING APRIORI ALGORITHM USING PAFI AND TDFI Manali Patekar 1, Chirag Pujari 2, Juee Save 3 1,2,3 Computer Engineering, St. John College of Engineering And Technology, Palghar Mumbai, (India) ABSTRACT
More informationChapter 4: Association analysis:
Chapter 4: Association analysis: 4.1 Introduction: Many business enterprises accumulate large quantities of data from their daytoday operations, huge amounts of customer purchase data are collected daily
More informationSecure Frequent Itemset Hiding Techniques in Data Mining
Secure Frequent Itemset Hiding Techniques in Data Mining Arpit Agrawal 1 Asst. Professor Department of Computer Engineering Institute of Engineering & Technology Devi Ahilya University M.P., India Jitendra
More information