Parallel Approach for Implementing Data Mining Algorithms


TITLE OF THE THESIS
Parallel Approach for Implementing Data Mining Algorithms

A RESEARCH PROPOSAL SUBMITTED TO THE SHRI RAMDEOBABA COLLEGE OF ENGINEERING AND MANAGEMENT FOR THE DEGREE OF DOCTOR OF PHILOSOPHY IN COMPUTER SCIENCE AND ENGINEERING

By
MANISH BHARDWAJ
Registration No < >

UNDER THE GUIDANCE OF
DR. D. S. ADANE

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
SHRI RAMDEOBABA COLLEGE OF ENGINEERING AND MANAGEMENT, NAGPUR, MAHARASHTRA
Year 2016

MANISH BHARDWAJ, Doctorate Research Proposal, RCOEM, Nagpur

About this proposal
This doctorate research proposal describes the working title of the research and gives a general overview of the area. The research plan described in this document may be modified once the proposal has been reviewed and approved.

Research Proposal, RCOEM Nagpur Confidential

CONTENTS
1. RESEARCH TITLE
2. ABSTRACT
3. LITERATURE SURVEY
4. PROBLEM DEFINITION
5. PROPOSED METHODOLOGY
6. REFERENCES

1. Research Proposal Title
Parallel Approach for Implementing Data Mining Algorithms.

2. Abstract
Parallel data mining concerns parallel algorithms, techniques and tools for extracting useful, implicit and novel patterns from datasets using high-performance architectures. The huge volume of data generated by online transactions, social networking sites, and government organizations working in space research and bioinformatics creates new problems for data mining and knowledge discovery methods. Because of this size, most currently available data mining algorithms cannot be applied to many problems: they give poorer results when the datasets become very large, and their execution time is also high. Parallel techniques allow the mining problem to be solved more efficiently by taking advantage of the available high-performance architecture. Parallel approaches such as data partitioning, task partitioning, divide-and-conquer, single-dimension reduction, scalable thread scheduling and local sorting help to implement data mining algorithms whose performance is higher and whose time requirement is lower than that of a simple implementation. A graphics processing unit (GPU) with the CUDA programming model allows work to be done in parallel by thread blocks that run concurrently. The OpenMP API, with its fork-join model and its constructs and directives, supports implementation of the parallel approach on multiple cores.

3. Literature Review and Related Work

3.1 Research Issues and Challenges
This section lists some important research issues and open problems in designing and implementing large-scale data mining algorithms.

3.1.1 High Dimensionality
Available methods are able to handle hundreds of attributes. New parallel algorithms are needed that can handle a larger number of attributes.

3.1.2 Large Size
Data warehouses continue to increase in size. Available techniques are able to handle data in the gigabyte range, but are not yet well suited to terabyte-sized data.

3.1.3 Data Type
Most data mining research has focused on structured data because of its simplicity, but support for other data types is also required. Examples include semi-structured, unstructured, spatial, temporal and multimedia databases.

3.1.4 Dynamic Load Balancing
Static partitioning is used for homogeneous environments. Dynamic load balancing is crucial for handling a heterogeneous environment.

3.1.5 Multi-table Mining
Mining over multiple tables, or over distributed databases with different schemas, is very difficult with the available mining methods. Better methods are required to handle the multi-table mining problem [1].

3.2 Scaling up Methods for Data Mining
Scaling up is the only way to handle large datasets. Parallel approaches such as one-dimension reduction, scalable thread scheduling and local sorting can be used to implement data mining algorithms that are able to handle large datasets.

Modifying Algorithm
Modifying an algorithm mainly aims at making it faster. For this purpose, different optimized search techniques are used. Other options are to reduce the complexity, to optimize the representation, or to look for an approximate solution instead of an exact one.

Model restriction and reducing the search space
Restricting the model space has an immediate advantage in that the search space is also reduced. Furthermore, simple solutions are usually faster to obtain and evaluate and, in many cases, are competitive with more complex solutions. The major problem arises when the intrinsic complexity of the problem cannot be met by a simple solution. Examples of this strategy are many, including linear models, perceptrons, and decision stumps.

Using powerful search heuristics
Using a more efficient search heuristic avoids artificially constraining the possible models and tries to make the search process faster. The method consists of three steps. First, it must derive an upper bound on the relative loss between using a subset of the available data and using the whole dataset in each step of the learning algorithm. Then, it must derive an upper bound on the time complexity of the learning algorithm as a function of the number of samples used in each step. Finally, it must minimize the time bound, via the number of samples used in each step, subject to the target limits on the loss of performance caused by using a subset of the dataset.

Changing the way the problem is dealt with
This consists of modifying the way the problem is solved and is based on the general principle of divide-and-conquer. The idea is to perform some kind of data partitioning or problem decomposition.

3.3 Parallelization
Parallelization helps in the sense that the most costly parts are performed concurrently. With parallelization, it is possible to address the scaling up of mining methods without simplifying either the algorithm or the task.

3.4 Graphics Processing Unit with Compute Unified Device Architecture
Graphics processing units (GPUs) have enabled inexpensive high-performance computing for general-purpose applications. The Compute Unified Device Architecture (CUDA) programming model provides programmers with C-like APIs to better exploit the parallel power of the GPU. GPUs have evolved into highly parallel, multithreaded, many-core processors with tremendous computational horsepower and very high memory bandwidth. NVIDIA's GPUs with the CUDA programming model provide an adequate API for non-graphics applications.
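The data-partitioning and parallelization ideas of Sections 3.2 and 3.3 can be illustrated with a minimal Python sketch. It is not taken from the cited works: the function names (`partition`, `count_items`, `parallel_item_counts`) and the toy transactions are assumptions made for illustration. The data is split into partitions, each partition is processed independently, and the partial results are merged.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partition(records, n_parts):
    """Split records into n_parts contiguous chunks of near-equal size."""
    size, extra = divmod(len(records), n_parts)
    chunks, start = [], 0
    for i in range(n_parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(records[start:end])
        start = end
    return chunks


def count_items(chunk):
    """Local step: count in how many transactions of this chunk each item occurs."""
    counts = Counter()
    for transaction in chunk:
        counts.update(set(transaction))
    return counts


def parallel_item_counts(records, n_parts=4):
    """Divide the data, process the partitions concurrently, then merge."""
    chunks = partition(records, n_parts)
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        partials = list(pool.map(count_items, chunks))
    total = Counter()
    for partial in partials:
        total.update(partial)  # merging partial counts is a simple addition
    return total


# Hypothetical toy data, used only to exercise the sketch.
transactions = [["a", "b"], ["b", "c"], ["a", "b", "c"], ["c"], ["a"]]
item_counts = parallel_item_counts(transactions, n_parts=2)
```

Threads are used here only to show the structure of the decomposition. In CPython they do not speed up CPU-bound counting, so a real implementation would map the partitions to processes, OpenMP threads or GPU thread blocks.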

Fig. 3.1 A set of SIMD stream multiprocessors with memory hierarchy

CUDA Programming Model
At the software level, a CUDA program is a collection of thread blocks that run in parallel. The unit of work assigned to the GPU is called a kernel. A CUDA program runs in a thread-parallel fashion: computation is organized as a grid of thread blocks, each of which consists of a set of threads, as shown in the figure below. At the instruction level, 32 consecutive threads in a thread block make up the minimum unit of execution, which is called a warp. Each streaming multiprocessor executes one or more thread blocks concurrently [2].
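The grid, block, thread and warp hierarchy described above can be mimicked on the CPU. The following Python sketch is only a serial emulation under stated assumptions: `launch`, `vector_add_kernel` and `warp_of` are hypothetical helper names, not CUDA API calls, and a real kernel would run its threads concurrently on the device. It shows how each logical thread derives a global index from its block index and thread index, and why a bounds check is needed when the data size is not a multiple of the block size.

```python
WARP_SIZE = 32  # threads per warp, as stated in the text


def launch(kernel, grid_dim, block_dim, *args):
    """Emulate a 1-D kernel launch by visiting every (block, thread) pair serially."""
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(block_idx, block_dim, thread_idx, *args)


def vector_add_kernel(block_idx, block_dim, thread_idx, a, b, out):
    """Body executed by one logical thread: handle one element."""
    i = block_idx * block_dim + thread_idx  # global thread index
    if i < len(out):  # threads past the end of the data do nothing
        out[i] = a[i] + b[i]


def warp_of(thread_idx):
    """Warp number of a thread inside its block."""
    return thread_idx // WARP_SIZE


n = 100
a = list(range(n))
b = [2 * x for x in a]
out = [0] * n
block_dim = 64
grid_dim = (n + block_dim - 1) // block_dim  # enough blocks to cover n elements
launch(vector_add_kernel, grid_dim, block_dim, a, b, out)
```

With 100 elements and 64 threads per block, two blocks are launched and the last 28 threads of the second block fail the bounds check and stay idle.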

Fig. 3.2 Serial execution on the host and parallel execution on the device

Parallelization techniques on CUDA-enabled platforms
Three schemes for parallelizing data mining on CUDA-based platforms are as follows.

Scalable thread scheduling scheme for irregular patterns
Whether a task is assigned to the CPU or the GPU, and how many thread blocks are used, is usually determined from the size of the problem before the GPU kernel starts. However, for problems with an irregular pattern the problem size is not fixed in advance, and conventional CUDA computing is not suitable for them. Solution: scalable thread scheduling. An upper bound on the number of threads and thread blocks is calculated first and the GPU resources are allocated accordingly; if some thread blocks are idle, they quit immediately.

Parallel distributed top-k scheme
The top-k problem is to select the k minimum or maximum elements from a data collection. Insertion sort has been proved to be efficient when k is small, but a CUDA-based insertion sort is not efficient. Solution: reduce the computation and work around the weakness of the CUDA-based insertion sort by using local sorts rather than a global sort.

Parallel high dimension reduction scheme
Records in text mining may consist of hundreds of attributes, exceeding the size of the shared memory allocated to each thread block on the GPU. In such a case, the record has to be broken into multiple sub-records to fit in the shared memory, but breaking it into too many sub-records is not a solution, because the cost of manipulating the records and temporary results will be high. Solution: observe that the different attributes in a record are independent, so each thread block can take care of one distinct attribute across all the records. Rather than performing a reduction on the high-dimensional data, perform a one-dimensional reduction on each attribute.

3.5 CUDA-based implementations of data mining algorithms

3.5.1 CU-Apriori
In CUDA-based Apriori, candidate generation and support counting take most of the computation.

Candidate generation
The candidate generation procedure joins two frequent (k-1)-itemsets and prunes the unpromising k-candidates. Since the task of joining two itemsets is independent between different threads, it is suitable for parallelization. The scalable thread scheduling scheme for irregular patterns is used here.

Support counting
The support counting procedure records the number of occurrences of a candidate itemset by scanning the transaction database. Since the counting for each candidate is independent of the others, it is suitable for parallelization. Transactions are loaded into shared memory and shared by all the threads within a thread block [3].

3.5.2 CU-KNN
In the CUDA-based k-nearest-neighbour classifier, distance calculation and selection of the k nearest neighbours account for most of the computation.

Distance calculation
Distance calculation can be fully parallelized, since the pair-wise distance calculations are independent. This property makes KNN well suited to a parallel GPU implementation. The goal is to maximize the concurrency of the distance calculations invoked by different threads and to minimize global memory access.

Selection of k nearest neighbours
Selecting the k nearest neighbours of a query object essentially means finding the k shortest distances, which is a typical top-k problem. It is therefore implemented with the distributed top-k scheme.
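The combination of independent distance calculation and distributed top-k selection can be sketched sequentially in Python. This is an illustration under stated assumptions, not the CU-KNN implementation of the cited work: the names `local_top_k`, `distributed_top_k` and `squared_distances`, the chunk size and the sample points are all hypothetical. Each chunk stands in for the data handled by one thread block, which sorts locally and keeps only its k best candidates before a final merge.

```python
import heapq


def local_top_k(chunk, k):
    """Local step: the k smallest values of one chunk, sorted ascending."""
    return sorted(chunk)[:k]


def distributed_top_k(values, k, chunk_size):
    """Select the k smallest values using per-chunk local sorts plus a final merge."""
    chunks = [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]
    # Each chunk is processed independently, so this loop could run in parallel.
    candidates = [local_top_k(chunk, k) for chunk in chunks]
    # Every global top-k element is among its own chunk's k smallest values,
    # so merging the sorted candidate lists and keeping k of them is sufficient.
    return list(heapq.merge(*candidates))[:k]


def squared_distances(query, points):
    """Pair-wise squared Euclidean distances; each one is independent of the others."""
    return [sum((q - p) ** 2 for q, p in zip(query, point)) for point in points]


# Hypothetical toy data, used only to exercise the sketch.
points = [(0, 0), (5, 5), (1, 1), (9, 9), (2, 2), (8, 8), (3, 3)]
dists = squared_distances((0, 0), points)
nearest = distributed_top_k(dists, k=3, chunk_size=3)
```

The final merge touches at most k candidates per chunk instead of the whole distance array, which is the reason local sorts are preferred to one global sort.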

3.5.3 CU-K-means
In CUDA-based K-means, cluster label update, centroid update and centroid movement detection take most of the computation.

Cluster label update
Each thread calculates the distance from an object to all the centroids and selects the nearest centroid. Each object is assigned to the cluster whose centroid is closest to it. Attribute partitions of the objects are loaded into shared memory, so the bandwidth between global memory and shared memory is used efficiently.

Centroid update
Each new centroid is calculated by averaging the attribute values of all the records belonging to the same cluster. The parallel high dimension reduction scheme is used for this task.

Centroid movement detection
This step checks whether the new centroids have moved away from the centroids of the last iteration. First, the square of the difference between every attribute of the new and old centroids is calculated, giving the centroid difference matrix. Second, the parallel high dimension reduction scheme is applied to the centroid difference matrix. Third, since the resulting record has only a small number of attributes, it is transferred to main memory and summed up to get global_squared_error. The cost of the data transfer between main memory and global memory is negligible [7].

FP-Growth
Although the FP-Growth association-rule mining algorithm is more efficient than the Apriori algorithm, it has two disadvantages. The first is that the FP-tree can become too large to be created in memory; the second is the serial processing approach it uses. A parallel FP-Growth approach based on a distributed application data framework does not require generating the overall FP-tree, which may be too large to create in shared memory. The algorithm uses a parallel processing approach in all important steps.
This improves the processing capability and efficiency of the association-rule mining algorithm [4].

Parallel Bees Swarm Optimization
The association rule mining problem on huge datasets is solved by applying the behaviour of bees. The method takes advantage of the GPU architecture and deals with large datasets to solve real-time problems. A master-slave paradigm is used: the master executes on the CPU and the slave work is offloaded to the GPU. First, the master randomly initializes the reference solution. After that, it determines the search regions of all the bees by generating the neighbours of each bee. Each solution is evaluated on the GPU in parallel. After the master receives back the fitness of all rules, each bee sequentially calculates its best rule and puts it in the dance table. The best rule of the dance table becomes the reference solution for the next iteration [5].

Accelerating Parallel Frequent Itemset Mining on Graphic Processors with Sorting
This approach constructs the transaction identifier table and sorts all frequent itemsets, which helps to reduce the candidate itemsets on the GPU architecture. GPU thread blocks are allocated after the itemsets have been sorted in descending order, so checking and support counting take less time [6].

Parallel Highly Informative K-ItemSet
PHIKS is a highly scalable, parallel miki mining algorithm that is able to handle the mining of huge databases (terabytes in size). It addresses the problem of discovering maximally informative k-itemsets (miki for short) in massive datasets, where informativeness is expressed by means of joint entropy and k is the size of the itemset. Miki mining is a key problem in data analytics with high potential impact on various tasks such as unsupervised learning, supervised learning, or information retrieval, to cite a few. A typical application is the discovery of discriminative sets of features, based on joint entropy [9].

4. Problem Definition
Large volumes of data are generated by online transactions, social networking sites, and government organizations in space research and bioinformatics, and the available data mining algorithms do not perform well on these datasets. The other problem concerns performance: some algorithms that are able to solve the mining problem face a search-space problem that prevents efficient execution, and the solutions they generate are not of a satisfactory level.

5. Proposed Methodology
The only way to deal with very large datasets is to apply a parallel approach to scale up the data mining algorithm. This can be done by modifying the algorithm, by data partitioning, by problem decomposition and by parallelization. For parallelization, graphics processing units provide inexpensive high-performance computing power, and the Compute Unified Device Architecture programming model provides programmers with a C-like API to better exploit the parallel power of the GPU. The GPU has evolved into a highly parallel, multithreaded, many-core processor, so work is distributed among different thread blocks and the threads operate in a thread-parallel fashion. The other approach is based on OpenMP, a shared-memory API that works on the fork-join model. It has a large set of constructs and directives that allow work to be done in parallel. In this way the task uses the computing power of multiple cores, and the parallel approach is applied to scale up data mining algorithms.

6. References
1. M. J. Zaki, Large-Scale Parallel Data Mining, LNAI 1759, pp. 1-23, Springer.
2. N. Garcia-Pedrajas, A. de Haro-Garcia, Scaling up data mining algorithms: review and taxonomy, Springer-Verlag.
3. L. Jian, C. Wang, Y. Liu, Y. Shi, Parallel data mining techniques on Graphics Processing Unit with Compute Unified Device Architecture (CUDA), Springer Science + Business Media, LLC.
4. Zhi-gang Wang, Chi-she Wang, A Parallel Association-Rule Mining Algorithm, Springer-Verlag Berlin Heidelberg.
5. Y. Tan, Parallel Bees Swarm Optimization for Association Rules Mining Using GPU Architecture, Springer International Publishing Switzerland.
6. H. Hsu, Accelerating Parallel Frequent Itemset Mining on Graphics Processors with Sorting, IFIP.
7. H. Decker, Parallel and Distributed Mining of Probabilistic Frequent Itemsets Using Multiple GPUs, Springer-Verlag Berlin Heidelberg.
8. S. Tsutsui and P. Collet, Data Mining Using Parallel Multi-objective Evolutionary Algorithms on Graphics Processing Units, Springer-Verlag Berlin.
9. Saber Salah, A highly scalable parallel algorithm for maximally informative k-itemset mining, Springer-Verlag London.

CUDA PROGRAMMING MODEL Chaithanya Gadiyam Swapnil S Jadhav

CUDA PROGRAMMING MODEL Chaithanya Gadiyam Swapnil S Jadhav CUDA PROGRAMMING MODEL Chaithanya Gadiyam Swapnil S Jadhav CMPE655 - Multiple Processor Systems Fall 2015 Rochester Institute of Technology Contents What is GPGPU? What s the need? CUDA-Capable GPU Architecture

More information

Accelerating K-Means Clustering with Parallel Implementations and GPU computing

Accelerating K-Means Clustering with Parallel Implementations and GPU computing Accelerating K-Means Clustering with Parallel Implementations and GPU computing Janki Bhimani Electrical and Computer Engineering Dept. Northeastern University Boston, MA Email: bhimani@ece.neu.edu Miriam

More information

Parallelization of K-Means Clustering Algorithm for Data Mining

Parallelization of K-Means Clustering Algorithm for Data Mining Parallelization of K-Means Clustering Algorithm for Data Mining Hao JIANG a, Liyan YU b College of Computer Science and Engineering, Southeast University, Nanjing, China a hjiang@seu.edu.cn, b yly.sunshine@qq.com

More information

Performance impact of dynamic parallelism on different clustering algorithms

Performance impact of dynamic parallelism on different clustering algorithms Performance impact of dynamic parallelism on different clustering algorithms Jeffrey DiMarco and Michela Taufer Computer and Information Sciences, University of Delaware E-mail: jdimarco@udel.edu, taufer@udel.edu

More information

Accelerated Machine Learning Algorithms in Python

Accelerated Machine Learning Algorithms in Python Accelerated Machine Learning Algorithms in Python Patrick Reilly, Leiming Yu, David Kaeli reilly.pa@husky.neu.edu Northeastern University Computer Architecture Research Lab Outline Motivation and Goals

More information

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING SHRI ANGALAMMAN COLLEGE OF ENGINEERING & TECHNOLOGY (An ISO 9001:2008 Certified Institution) SIRUGANOOR,TRICHY-621105. DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING Year / Semester: IV/VII CS1011-DATA

More information

This paper proposes: Mining Frequent Patterns without Candidate Generation

This paper proposes: Mining Frequent Patterns without Candidate Generation Mining Frequent Patterns without Candidate Generation a paper by Jiawei Han, Jian Pei and Yiwen Yin School of Computing Science Simon Fraser University Presented by Maria Cutumisu Department of Computing

More information

Parallel Popular Crime Pattern Mining in Multidimensional Databases

Parallel Popular Crime Pattern Mining in Multidimensional Databases Parallel Popular Crime Pattern Mining in Multidimensional Databases BVS. Varma #1, V. Valli Kumari *2 # Department of CSE, Sri Venkateswara Institute of Science & Information Technology Tadepalligudem,

More information

THE COMPARISON OF PARALLEL SORTING ALGORITHMS IMPLEMENTED ON DIFFERENT HARDWARE PLATFORMS

THE COMPARISON OF PARALLEL SORTING ALGORITHMS IMPLEMENTED ON DIFFERENT HARDWARE PLATFORMS Computer Science 14 (4) 2013 http://dx.doi.org/10.7494/csci.2013.14.4.679 Dominik Żurek Marcin Pietroń Maciej Wielgosz Kazimierz Wiatr THE COMPARISON OF PARALLEL SORTING ALGORITHMS IMPLEMENTED ON DIFFERENT

More information

Optimizing Out-of-Core Nearest Neighbor Problems on Multi-GPU Systems Using NVLink

Optimizing Out-of-Core Nearest Neighbor Problems on Multi-GPU Systems Using NVLink Optimizing Out-of-Core Nearest Neighbor Problems on Multi-GPU Systems Using NVLink Rajesh Bordawekar IBM T. J. Watson Research Center bordaw@us.ibm.com Pidad D Souza IBM Systems pidsouza@in.ibm.com 1 Outline

More information

Efficient Lists Intersection by CPU- GPU Cooperative Computing

Efficient Lists Intersection by CPU- GPU Cooperative Computing Efficient Lists Intersection by CPU- GPU Cooperative Computing Di Wu, Fan Zhang, Naiyong Ao, Gang Wang, Xiaoguang Liu, Jing Liu Nankai-Baidu Joint Lab, Nankai University Outline Introduction Cooperative

More information

Di Zhao Ohio State University MVAPICH User Group (MUG) Meeting, August , Columbus Ohio

Di Zhao Ohio State University MVAPICH User Group (MUG) Meeting, August , Columbus Ohio Di Zhao zhao.1029@osu.edu Ohio State University MVAPICH User Group (MUG) Meeting, August 26-27 2013, Columbus Ohio Nvidia Kepler K20X Intel Xeon Phi 7120 Launch Date November 2012 Q2 2013 Processor Per-processor

More information

Mining Frequent Patterns without Candidate Generation

Mining Frequent Patterns without Candidate Generation Mining Frequent Patterns without Candidate Generation Outline of the Presentation Outline Frequent Pattern Mining: Problem statement and an example Review of Apriori like Approaches FP Growth: Overview

More information

Optimization solutions for the segmented sum algorithmic function

Optimization solutions for the segmented sum algorithmic function Optimization solutions for the segmented sum algorithmic function ALEXANDRU PÎRJAN Department of Informatics, Statistics and Mathematics Romanian-American University 1B, Expozitiei Blvd., district 1, code

More information

6. Dicretization methods 6.1 The purpose of discretization

6. Dicretization methods 6.1 The purpose of discretization 6. Dicretization methods 6.1 The purpose of discretization Often data are given in the form of continuous values. If their number is huge, model building for such data can be difficult. Moreover, many

More information

Contents. Preface to the Second Edition

Contents. Preface to the Second Edition Preface to the Second Edition v 1 Introduction 1 1.1 What Is Data Mining?....................... 4 1.2 Motivating Challenges....................... 5 1.3 The Origins of Data Mining....................

More information

Performance Evaluation of Sequential and Parallel Mining of Association Rules using Apriori Algorithms

Performance Evaluation of Sequential and Parallel Mining of Association Rules using Apriori Algorithms Int. J. Advanced Networking and Applications 458 Performance Evaluation of Sequential and Parallel Mining of Association Rules using Apriori Algorithms Puttegowda D Department of Computer Science, Ghousia

More information

Unsupervised Learning

Unsupervised Learning Outline Unsupervised Learning Basic concepts K-means algorithm Representation of clusters Hierarchical clustering Distance functions Which clustering algorithm to use? NN Supervised learning vs. unsupervised

More information

Association Rules. Berlin Chen References:

Association Rules. Berlin Chen References: Association Rules Berlin Chen 2005 References: 1. Data Mining: Concepts, Models, Methods and Algorithms, Chapter 8 2. Data Mining: Concepts and Techniques, Chapter 6 Association Rules: Basic Concepts A

More information

Comparison of FP tree and Apriori Algorithm

Comparison of FP tree and Apriori Algorithm International Journal of Engineering Research and Development e-issn: 2278-067X, p-issn: 2278-800X, www.ijerd.com Volume 10, Issue 6 (June 2014), PP.78-82 Comparison of FP tree and Apriori Algorithm Prashasti

More information

CHAPTER 3 A FAST K-MODES CLUSTERING ALGORITHM TO WAREHOUSE VERY LARGE HETEROGENEOUS MEDICAL DATABASES

CHAPTER 3 A FAST K-MODES CLUSTERING ALGORITHM TO WAREHOUSE VERY LARGE HETEROGENEOUS MEDICAL DATABASES 70 CHAPTER 3 A FAST K-MODES CLUSTERING ALGORITHM TO WAREHOUSE VERY LARGE HETEROGENEOUS MEDICAL DATABASES 3.1 INTRODUCTION In medical science, effective tools are essential to categorize and systematically

More information

Infrequent Weighted Itemset Mining Using SVM Classifier in Transaction Dataset

Infrequent Weighted Itemset Mining Using SVM Classifier in Transaction Dataset Infrequent Weighted Itemset Mining Using SVM Classifier in Transaction Dataset M.Hamsathvani 1, D.Rajeswari 2 M.E, R.Kalaiselvi 3 1 PG Scholar(M.E), Angel College of Engineering and Technology, Tiruppur,

More information

Introduction to Multicore Programming

Introduction to Multicore Programming Introduction to Multicore Programming Minsoo Ryu Department of Computer Science and Engineering 2 1 Multithreaded Programming 2 Automatic Parallelization and OpenMP 3 GPGPU 2 Multithreaded Programming

More information

Frequent Item Set using Apriori and Map Reduce algorithm: An Application in Inventory Management

Frequent Item Set using Apriori and Map Reduce algorithm: An Application in Inventory Management Frequent Item Set using Apriori and Map Reduce algorithm: An Application in Inventory Management Kranti Patil 1, Jayashree Fegade 2, Diksha Chiramade 3, Srujan Patil 4, Pradnya A. Vikhar 5 1,2,3,4,5 KCES

More information

Storage Hierarchy Management for Scientific Computing

Storage Hierarchy Management for Scientific Computing Storage Hierarchy Management for Scientific Computing by Ethan Leo Miller Sc. B. (Brown University) 1987 M.S. (University of California at Berkeley) 1990 A dissertation submitted in partial satisfaction

More information

Chapter 1, Introduction

Chapter 1, Introduction CSI 4352, Introduction to Data Mining Chapter 1, Introduction Young-Rae Cho Associate Professor Department of Computer Science Baylor University What is Data Mining? Definition Knowledge Discovery from

More information

2. Department of Electronic Engineering and Computer Science, Case Western Reserve University

2. Department of Electronic Engineering and Computer Science, Case Western Reserve University Chapter MINING HIGH-DIMENSIONAL DATA Wei Wang 1 and Jiong Yang 2 1. Department of Computer Science, University of North Carolina at Chapel Hill 2. Department of Electronic Engineering and Computer Science,

More information

A Survey on Moving Towards Frequent Pattern Growth for Infrequent Weighted Itemset Mining

A Survey on Moving Towards Frequent Pattern Growth for Infrequent Weighted Itemset Mining A Survey on Moving Towards Frequent Pattern Growth for Infrequent Weighted Itemset Mining Miss. Rituja M. Zagade Computer Engineering Department,JSPM,NTC RSSOER,Savitribai Phule Pune University Pune,India

More information

Nesnelerin İnternetinde Veri Analizi

Nesnelerin İnternetinde Veri Analizi Bölüm 4. Frequent Patterns in Data Streams w3.gazi.edu.tr/~suatozdemir What Is Pattern Discovery? What are patterns? Patterns: A set of items, subsequences, or substructures that occur frequently together

More information

IJESRT. Scientific Journal Impact Factor: (ISRA), Impact Factor: [35] [Rana, 3(12): December, 2014] ISSN:

IJESRT. Scientific Journal Impact Factor: (ISRA), Impact Factor: [35] [Rana, 3(12): December, 2014] ISSN: IJESRT INTERNATIONAL JOURNAL OF ENGINEERING SCIENCES & RESEARCH TECHNOLOGY A Brief Survey on Frequent Patterns Mining of Uncertain Data Purvi Y. Rana*, Prof. Pragna Makwana, Prof. Kishori Shekokar *Student,

More information

GPU ACCELERATED SELF-JOIN FOR THE DISTANCE SIMILARITY METRIC

GPU ACCELERATED SELF-JOIN FOR THE DISTANCE SIMILARITY METRIC GPU ACCELERATED SELF-JOIN FOR THE DISTANCE SIMILARITY METRIC MIKE GOWANLOCK NORTHERN ARIZONA UNIVERSITY SCHOOL OF INFORMATICS, COMPUTING & CYBER SYSTEMS BEN KARSIN UNIVERSITY OF HAWAII AT MANOA DEPARTMENT

More information

Machine Learning: Symbolische Ansätze

Machine Learning: Symbolische Ansätze Machine Learning: Symbolische Ansätze Unsupervised Learning Clustering Association Rules V2.0 WS 10/11 J. Fürnkranz Different Learning Scenarios Supervised Learning A teacher provides the value for the

More information

A Fast and High Throughput SQL Query System for Big Data

A Fast and High Throughput SQL Query System for Big Data A Fast and High Throughput SQL Query System for Big Data Feng Zhu, Jie Liu, and Lijie Xu Technology Center of Software Engineering, Institute of Software, Chinese Academy of Sciences, Beijing, China 100190

More information

FREQUENT PATTERN MINING IN BIG DATA USING MAVEN PLUGIN. School of Computing, SASTRA University, Thanjavur , India

FREQUENT PATTERN MINING IN BIG DATA USING MAVEN PLUGIN. School of Computing, SASTRA University, Thanjavur , India Volume 115 No. 7 2017, 105-110 ISSN: 1311-8080 (printed version); ISSN: 1314-3395 (on-line version) url: http://www.ijpam.eu ijpam.eu FREQUENT PATTERN MINING IN BIG DATA USING MAVEN PLUGIN Balaji.N 1,

More information

An Improved Apriori Algorithm for Association Rules

An Improved Apriori Algorithm for Association Rules Research article An Improved Apriori Algorithm for Association Rules Hassan M. Najadat 1, Mohammed Al-Maolegi 2, Bassam Arkok 3 Computer Science, Jordan University of Science and Technology, Irbid, Jordan

More information

Improving the Efficiency of Fast Using Semantic Similarity Algorithm

Improving the Efficiency of Fast Using Semantic Similarity Algorithm International Journal of Scientific and Research Publications, Volume 4, Issue 1, January 2014 1 Improving the Efficiency of Fast Using Semantic Similarity Algorithm D.KARTHIKA 1, S. DIVAKAR 2 Final year

More information

Comparing the Performance of Frequent Itemsets Mining Algorithms

Comparing the Performance of Frequent Itemsets Mining Algorithms Comparing the Performance of Frequent Itemsets Mining Algorithms Kalash Dave 1, Mayur Rathod 2, Parth Sheth 3, Avani Sakhapara 4 UG Student, Dept. of I.T., K.J.Somaiya College of Engineering, Mumbai, India

More information

ACCELERATING THE PRODUCTION OF SYNTHETIC SEISMOGRAMS BY A MULTICORE PROCESSOR CLUSTER WITH MULTIPLE GPUS

ACCELERATING THE PRODUCTION OF SYNTHETIC SEISMOGRAMS BY A MULTICORE PROCESSOR CLUSTER WITH MULTIPLE GPUS ACCELERATING THE PRODUCTION OF SYNTHETIC SEISMOGRAMS BY A MULTICORE PROCESSOR CLUSTER WITH MULTIPLE GPUS Ferdinando Alessi Annalisa Massini Roberto Basili INGV Introduction The simulation of wave propagation

More information

Analysis of Different Approaches of Parallel Block Processing for K-Means Clustering Algorithm

Analysis of Different Approaches of Parallel Block Processing for K-Means Clustering Algorithm Analysis of Different Approaches of Parallel Block Processing for K-Means Clustering Algorithm Rashmi C a ahigh-performance Computing Project, Department of Studies in Computer Science, University of Mysore,

More information
