Data Mining Techniques Methods Algorithms and Tools

Similar documents
A Comparative Study of Selected Classification Algorithms of Data Mining

Management Information Systems Review Questions. Chapter 6 Foundations of Business Intelligence: Databases and Information Management

International Journal of Scientific Research & Engineering Trends Volume 4, Issue 6, Nov-Dec-2018, ISSN (Online): X

Correlation Based Feature Selection with Irrelevant Feature Removal

SURVEY ON STUDENT INFORMATION ANALYSIS

ISSN: (Online) Volume 3, Issue 9, September 2015 International Journal of Advance Research in Computer Science and Management Studies

Question Bank. 4) It is the source of information later delivered to data marts.

Analytical model A structure and process for analyzing a dataset. For example, a decision tree is a model for the classification of a dataset.

Introduction to Data Mining and Data Analytics

TIM 50 - Business Information Systems

Data Mining. Ryan Benton Center for Advanced Computer Studies University of Louisiana at Lafayette Lafayette, La., USA.

SBKMMA: Sorting Based K Means and Median Based Clustering Algorithm Using Multi Machine Technique for Big Data

International Journal of Computer Engineering and Applications, ICCSTAR-2016, Special Issue, May.16

Oracle9i Data Mining. An Oracle White Paper December 2001

Optimizing Number of Hidden Nodes for Artificial Neural Network using Competitive Learning Approach

Iteration Reduction K Means Clustering Algorithm

data-based banking customer analytics

International Journal of Advance Engineering and Research Development. A Survey on Data Mining Methods and its Applications

Modelling Structures in Data Mining Techniques

IJMIE Volume 2, Issue 9 ISSN:

Data mining overview. Data Mining. Data mining overview. Data mining overview. Data mining overview. Data mining overview 3/24/2014

A Comparative study of Clustering Algorithms using MapReduce in Hadoop

WKU-MIS-B10 Data Management: Warehousing, Analyzing, Mining, and Visualization. Management Information Systems

A Systematic Overview of Data Mining Algorithms. Sargur Srihari University at Buffalo The State University of New York

BIG DATA & HADOOP: A Survey

Introduction to Data Mining

Chapter 28. Outline. Definitions of Data Mining. Data Mining Concepts

DATA WAREHOUING UNIT I

CLUSTERING BIG DATA USING NORMALIZATION BASED k-means ALGORITHM

REMOVAL OF REDUNDANT AND IRRELEVANT DATA FROM TRAINING DATASETS USING SPEEDY FEATURE SELECTION METHOD

TIM 50 - Business Information Systems

CMPUT 391 Database Management Systems. Data Mining. Textbook: Chapter (without 17.10)

Data Mining. Part 2. Data Understanding and Preparation. 2.4 Data Transformation. Spring Instructor: Dr. Masoud Yaghini. Data Transformation

Jarek Szlichta

IJSRD - International Journal for Scientific Research & Development Vol. 4, Issue 05, 2016 ISSN (online):

NORMALIZATION INDEXING BASED ENHANCED GROUPING K-MEAN ALGORITHM

Data Mining. Dr. Raed Ibraheem Hamed. University of Human Development, College of Science and Technology Department of Computer Science

PREDICTION OF POPULAR SMARTPHONE COMPANIES IN THE SOCIETY

Data Mining: Approach Towards The Accuracy Using Teradata!

The k-means Algorithm and Genetic Algorithm

Credit card Fraud Detection using Predictive Modeling: a Review

Chapter 6. Foundations of Business Intelligence: Databases and Information Management VIDEO CASES

A Comparative Study of Data Mining Process Models (KDD, CRISP-DM and SEMMA)

Text Document Clustering Using DPM with Concept and Feature Analysis

Oracle9i Data Mining. Data Sheet August 2002

International Journal of Advanced Research in Computer Science and Software Engineering

SOCIAL MEDIA MINING. Data Mining Essentials

1 (eagle_eye) and Naeem Latif

Customer Clustering using RFM analysis

Comparative Study of Clustering Algorithms using R

INFREQUENT WEIGHTED ITEM SET MINING USING NODE SET BASED ALGORITHM

International Journal of Mechatronics, Electrical and Computer Technology

Keywords Hadoop, Map Reduce, K-Means, Data Analysis, Storage, Clusters.

ADVANCED ANALYTICS USING SAS ENTERPRISE MINER RENS FEENSTRA

COMPARISON OF DIFFERENT CLASSIFICATION TECHNIQUES

Basic Data Mining Technique

Disquisition of a Novel Approach to Enhance Security in Data Mining

Data Mining Concepts

Classification and Regression

DATA MINING AND WAREHOUSING

Application of Data Mining in Manufacturing Industry

Performance Analysis of Data Mining Classification Techniques

AMOL MUKUND LONDHE, DR.CHELPA LINGAM

Research Article Apriori Association Rule Algorithms using VMware Environment

Review on Methods of Selecting Number of Hidden Nodes in Artificial Neural Network

A Performance Assessment on Various Data mining Tool Using Support Vector Machine

Cse634 DATA MINING TEST REVIEW. Professor Anita Wasilewska Computer Science Department Stony Brook University

Data Management Glossary

International Journal of Software and Web Sciences (IJSWS)

A Review on Cluster Based Approach in Data Mining

A STUDY OF SOME DATA MINING CLASSIFICATION TECHNIQUES

A Systematic Overview of Data Mining Algorithms

A Survey Of Different Text Mining Techniques Varsha C. Pande 1 and Dr. A.S. Khandelwal 2

SCENARIO BASED ADAPTIVE PREPROCESSING FOR STREAM DATA USING SVM CLASSIFIER

Management Information Systems MANAGING THE DIGITAL FIRM, 12 TH EDITION FOUNDATIONS OF BUSINESS INTELLIGENCE: DATABASES AND INFORMATION MANAGEMENT

Data Mining with Oracle 10g using Clustering and Classification Algorithms Nhamo Mdzingwa September 25, 2005

Web Mining Evolution & Comparative Study with Data Mining

The Procedure Proposal of Manufacturing Systems Management by Using of Gained Knowledge from Production Data

Decision Making Procedure: Applications of IBM SPSS Cluster Analysis and Decision Tree

Creating a Recommender System. An Elasticsearch & Apache Spark approach

Gurpreet Kaur 1, Naveen Aggarwal 2 1,2

Analyzing Outlier Detection Techniques with Hybrid Method

Enhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques

Chapter 3. Foundations of Business Intelligence: Databases and Information Management

Election Analysis and Prediction Using Big Data Analytics

Clustering of Data with Mixed Attributes based on Unified Similarity Metric

An Introduction to Data Mining in Institutional Research. Dr. Thulasi Kumar Director of Institutional Research University of Northern Iowa

Storm Identification in the Rainfall Data Using Singular Value Decomposition and K- Nearest Neighbour Classification

Data warehouse and Data Mining

Chapter 2 BACKGROUND OF WEB MINING

Processing Techniques. Chapter 7: Design and Development and Evaluation of Systems. Online Processing. Real-time Processing

In-Memory Analytics with EXASOL and KNIME //

A Survey Of Issues And Challenges Associated With Clustering Algorithms

DATA MINING DATA KNOWLEDGE DECISION ACTION. Data. Decision. Data mining (analysis) Business modeling (using data mining software) Business hypothesis

DATA WAREHOUSING IN LIBRARIES FOR MANAGING DATABASE

SCALABLE KNOWLEDGE BASED AGGREGATION OF COLLECTIVE BEHAVIOR

Big Data com Hadoop. VIII Sessão - SQL Bahia. Impala, Hive e Spark. Diógenes Pires 03/03/2018

Topics covered 10/12/2015. Pengantar Teknologi Informasi dan Teknologi Hijau. Suryo Widiantoro, ST, MMSI, M.Com(IS)

Keywords Clustering, Goals of clustering, clustering techniques, clustering algorithms.

The Transpose Technique to Reduce Number of Transactions of Apriori Algorithm

Transcription:

Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology ISSN 2320 088X IMPACT FACTOR: 6.017 IJCSMC, Vol. 6, Issue. 7, July 2017, pg.227 231 Data Mining Techniques Methods Algorithms and Tools Nelofar Rehman Assistant Professor, PG Department Of Computer Science, SSM College Of Engineering and Technology, Kashmir, India nelofar.rehman@gmail.com..***. Abstract: Data mining is the practice of examing large pre-existing database in order to generate new information. It is a powerful technology with great potential to help companies focus on most important information in their data warehouses. Data mining tools predicts future trends and behaviour allowing business to make proactive, knowledge driven decisions. Data mining tools can answer business questions that traditionally were too time consuming to resolve. 1.Introduction Data mining is the process of extraction hidden knowledge from volumes of raw data through use of algorithm and techniques drawn from field of statistics, machine learning and data base management system. Data mining also called knowledge discovery in large data enables firm and organization decisions by assembling, accumulating, analyzing and accessing corporate data. It uses variety of tools like query and reporting tools, analytical processing tools and decision support system(dss) tools. 2.Key Properties Of Data Mining Automatic discovery of patterns Prediction of likely outcomes Creation of actionable information Focus on large data sets and database Data mining can answer questions that cannot be addressed through simple query and reporting techniques. 2017, IJCSMC All Rights Reserved 227

2.1 Automatic Discovery Of Patterns Data mining is accomplished by building models. A model uses algorithm to act on right models of data. The notion of automatic discovery refers to execution of data mining models. Data mining tools can be used to mine data on which they are built but most of the models are generalized to new data. The process of applying a model to new data is known as scoring. 2.2 Prediction Many forms of data mining are predictive example a model might predict income based on education and other demographic factors. Prediction probabilities are known as confidence. Some forms of predictive data mining generate rules which are conditions that imply a given outcome example a rule might specify that a person who has a bachelors degree and lives in a certain neighbourhood is likely to have an income greater than regional average. 2.3 Grouping Other forms of data mining identify natural groupings in data example a model might identify the segment of population that has income within specified range that has a good driving record and that leases a new car on yearly basis. 2.4 Actionable Information Data mining can derive information from large volumes of data example a town planner might use a model that predicts income base on demographies to develop a plan for low income housing. A car leasing agency might use model that identifies customer segments to design a promotion targeting high value customers. 3. Data Mining Process Data mining process has iterative nature. Data mining project doesnot stop when a particular solution is deployed. The result of data mining trigger new business questions which in turn can be used to develop more focussed models. 3.1 Problem Definition This initial phase of data mining project focusses on understanding the project objectives and requirements example our business problem might be "How can sell more of my products to customers"? we might translate this into data mining problem such as "which customers are most likely to purchase the product?"a model that predicts who is most likely to purchase the product must be built on data that describes the customers who have purchased the product in past. Before building the model we must assemble the data that is likely to contain relationship between customers who have purchased the products and customers who have not purchased the product. Customer attributes might include age,no.of children,years of residence.owners and so on. 3.2 Data Gathering and Preparation The data understanding phase involves data collection and exploration. As we take a closer look at data we can determine how well it addresses the business problem. We might decide to remove some of data or add additional data. This is also the time to identify data quality problems and to scan for patterns in data. 2017, IJCSMC All Rights Reserved 228

3.3 Model Building And Evaluation In this phase we select and apply various modelling techniques and callibrate the parameters to optimal values. If the algorithm requires data transformations we will need to step back to previous phase to implement them. At this stage of the project iot is time to evaluate how well the model satisfies originally stated business goal. 3.4 Knowledge Deployment Knowledge deployment is the use of the data mining within a target environment. In the deployment phase insight and actionable information can be derived from data. Deployment can involve scoring the extraction of model details or the integration of data mining within applications data warehouse infrastructure or query and reporting tools. 4. Techniques Of Data Mining To analyse large amount of data, data mining came into picture and is also known as KDD process. To complete process various techniques are deployed so afra. Data mining uses already build tools to get out useful hidden patterns trends and predictions of future can be obtained using techniques. 4.1 Classification Classification is one of the data mining technique which is useful for predicting group membership for data instance example a classification model can be used to identify loan applicants is low, medium or high credit tasks. 4.1.1 Methods a.decision tree induction: A decision tree is a structure that includes a root node branches and leaf node. Each internal node represents a test on attribute,each branch denotes the outcome of test each leaf node holds a class label. The top[most node in tree is root node. The main goal is to predict the output of continuous attribute but data mining is less appropriate for estimating tasks. b.rule: It is represented by set of IF-THEN riles. First of all how these rules are examined and next is how these rules are build and can be generated from data. Expression for rule is 4.2 Clustering IF condition THEN conclusion By examing one or more attributes or classes we can group individual pieces of data together to form a structure opinion. At a simple level,clustering is using one or more attributes as our brain for identifying a cluster of corelating results. 4.2.1 Methods a. Partitioning method: Suppose we are given a database of n objects and partitioning method constructs k partitions of data. Each partition will represent a cluster and k<=n. It means that it will classify data into K groups which satisfy each group contains atleast one object each object must belong to exactly one group 2017, IJCSMC All Rights Reserved 229

b.hierarchial methods: This method creates a hierarchial decomposition of given set of data objects. We can classify hierarchial methods on basis of how hierarchial decomposition is formed. There are two approaches agglomerative approach and division approach. 4.3 Regression Regression is used to predict continuous and numerical target. It predicts no., sales, profit, square footage temperature rates. It is base on training process. It estimates value by comparing already known and predicted value. 4.3.1 Methods a. Linear regression: It is used where relationship between target and predictor can be represented in straight line. y=p1x+p2+e b.non-linear regression: It is used where relationship between target and predictor can't be represented in straight line. 5.Data Mining Algorithms category algorithm classification c4.5,support machine vector clustering Kmean algo Hierarchial BIRCH,CURE 6.Data Mining Tools Rapid Miner:It claims to be "world leading open source system for data and text mining".rapid analytics is a server version of that product. Mahout:This Apache project offers algo for clustering class and batch based calloborative filtering that run on top of Hadoop. Orange:This Project hopes to make data mining fruitful and fun for both novices and experts. It offers a wide variety of visualization plus a toolbox of more than 100 widgets. weka:short for "waikato environment for knowledge analysis".weka offers a set of algo for data mining that we canapply directly to data or use in another java application. DataMelt :It can do mathematical computation data mining statiscal analysis and data visualization. 7.Conclusion This paper presents detailed description odf data mining and its techniques and best methos and algos for techniques. Today all IT professionals engineers and researchers are working on big data. Big data is term of concerning large volumes of complex data sets.the high performance computing paradigm is required to solve the problem of big data. 2017, IJCSMC All Rights Reserved 230

References [1].xingqan hu,lan Davidson,'knowledge Discovery and Data Mining:challenges and Realities",ISBN 978-1-59904-2521 Hershey,New York,2007 [2].Grabmeir.J,Rudolph,'Technique of clustering Algorithms in Data Mining ",Data Mining and Knowledge Discovery,2002 [3] Ali,showkat and Kate A.Smith"On learningalgo selection for claasification"applied soft completing 2006. [4].https://docs.oracle.com/cd/B28359-01/datmine.111/b28129/process.htm [5].https://en.wikipedia.org/wiki/data-mining [6].opensourceforu.com/2017/03/top-10-opensource-data-mining-tools BIOGRAPHIES Nelofar Rehman is Assistant Professor at SSM College Of Engineering And Technology in PG Department Of Computer Sciences from University Of Kashmir, J&K, India. Her field of interest is Data Analysis, Artificial Intelligence and Algorithms. 2017, IJCSMC All Rights Reserved 231