CHAPTER 4 METHODOLOGY AND TOOLS

Similar documents
Index Terms Data Mining, Classification, Rapid Miner. Fig.1. RapidMiner User Interface

Data Mining. Introduction. Piotr Paszek. (Piotr Paszek) Data Mining DM KDD 1 / 44

Application of Data Mining in Manufacturing Industry

Pre-Requisites: CS2510. NU Core Designations: AD

A Survey of Statistical Modeling Tools

Community edition(open-source) Enterprise edition

Research Data Analysis using SPSS. By Dr.Anura Karunarathne Senior Lecturer, Department of Accountancy University of Kelaniya

CHAPTER 6 EXPERIMENTS

Chapter 5: Summary and Conclusion CHAPTER 5 SUMMARY AND CONCLUSION. Chapter 1: Introduction

IJSRD - International Journal for Scientific Research & Development Vol. 4, Issue 12, 2017 ISSN (online):

WEKA homepage.

Design on Office Automation System based on Domino/Notes Lijun Wang1,a, Jiahui Wang2,b

PREDICTION OF POPULAR SMARTPHONE COMPANIES IN THE SOCIETY

PROJECT 1 DATA ANALYSIS (KR-VS-KP)

RAPIDMINER FREE SOFTWARE FOR DATA MINING, ANALYTICS AND BUSINESS INTELLIGENCE

Summary. RapidMiner Project 12/13/2011 RAPIDMINER FREE SOFTWARE FOR DATA MINING, ANALYTICS AND BUSINESS INTELLIGENCE

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY

IST Computational Tools for Statistics I. DEÜ, Department of Statistics

IBM SPSS Statistics and open source: A powerful combination. Let s go

SOFTWARE TOOLS FOR TEACHING UNDERGRADUATE DATA MINING COURSE

ABSTRACT I. INTRODUCTION II. METHODS AND MATERIAL

Advance analytics and Comparison study of Data & Data Mining

Towards Ease of Building Legos in Assessing ehealth Language Technologies

GETTING STARTED WITH DATA MINING

BEST BIG DATA CERTIFICATIONS

TIM 50 - Business Information Systems

SAS Enterprise Miner 7.1

PROJECT PERIODIC REPORT

Performance Analysis of Data Mining Classification Techniques

Spotfire and Tableau Positioning. Summary

SOFTWARE ENGINEERING. Curriculum in Software Engineering. Program Educational Objectives

Using Data Mining to Determine User-Specific Movie Ratings

Prediction Using Regression Analysis

Today. Lecture 17: Reality Mining. Last time

M.Tech Curriculum SEMESTER-III course

Introducing SAS Model Manager 15.1 for SAS Viya

Mining Social Media Users Interest

COMPARISON OF DIFFERENT CLASSIFICATION TECHNIQUES

Visual Tools to Lecture Data Analytics and Engineering

Data Mining: STATISTICA

A Comparison of Contemporary Data Mining Tools

A Comparative Study of Selected Classification Algorithms of Data Mining

Upgrading your End User Skills to SharePoint 2013 Course 55026A; 3 Days, Instructor-led

Object Oriented Programming

IMPLEMENTATION OF CLASSIFICATION ALGORITHMS USING WEKA NAÏVE BAYES CLASSIFIER

Get some Coffee for free Writing Operators with RapidMiner Beans

B.TECH(COMPUTER) Will be equipped with sound knowledge of mathematics, science and technology useful to build complex computer engineering solutions.

International Journal of Advanced Research in Computer Science and Software Engineering

Gnome Data Mine Tools Evaluation Report

An Improved Document Clustering Approach Using Weighted K-Means Algorithm

Chapter 5. Software Tools

Outline. Prepare the data Classification and regression Clustering Association rules Graphic user interface

COCKPIT FP Citizens Collaboration and Co-Creation in Public Service Delivery. Deliverable D Opinion Mining Tools 1st version

Go USA! Get There Faster. 6/29/2010 Process World

Dependability Analysis of Web Service-based Business Processes by Model Transformations

New ensemble methods for evolving data streams

The use of spreadsheets with analytical

A ProM Operational Support Provider for Predictive Monitoring of Business Processes

Introducing Oracle R Enterprise 1.4 -

Bachelor of Science Information Studies School of Information Program Summary

CMPUT 695 Fall 2004 Assignment 2 Xelopes

WhatsApp Group Data Analysis with R

A Framework for Source Code metrics

CE4031 and CZ4031 Database System Principles

Applying big data analytics in practice

New Zealand Government IbM Infrastructure as a service

Data mining: concepts and algorithms

SOFTWARE ENGINEERING. To discuss several different ways to implement software reuse. To describe the development of software product lines.

Dr. Prof. El-Bahlul Emhemed Fgee Supervisor, Computer Department, Libyan Academy, Libya

IJMIE Volume 2, Issue 9 ISSN:

In-Memory Analytics with EXASOL and KNIME //

3 Data, Data Mining. Chengkai Li

Chapter 6 Queueing Networks Modelling Software (QNs) and Computing Language(s)

Literature Synthesis - Visualisations

SVM in Oracle Database 10g: Removing the Barriers to Widespread Adoption of Support Vector Machines

Installation KNIME AG. All rights reserved. 1

MetaData for Database Mining

Brown University Libraries Technology Plan

Data Mining: An experimental approach with WEKA on UCI Dataset

SAS Viya : The Beauty of REST in Action

EXECUTIVE VIEW. One Identity SafeGuard 2.0. KuppingerCole Report

Data Mining An Overview ITEV, F /18

DATA MINING RESEARCH: RETROSPECT AND PROSPECT

CE4031 and CZ4031 Database System Principles

Introduction to Programming

Prof. Ahmet Süerdem Istanbul Bilgi University London School of Economics

Chee Kiam. to sieve through. and the next one. relevant. The advances in Big. (NLB) of Singapore.

Modernizing the Grid for a Low-Carbon Future. Dr. Bryan Hannegan Associate Laboratory Director

Progress DataDirect For Business Intelligence And Analytics Vendors

TIM 50 - Business Information Systems

Development of an e-library Web Application

KNIME for the life sciences Cambridge Meetup

Data Mining with Oracle 10g using Clustering and Classification Algorithms Nhamo Mdzingwa September 25, 2005

TEXT MINING: THE NEXT DATA FRONTIER

AN OVERVIEW AND EXPLORATION OF JMP A DATA DISCOVERY SYSTEM IN DAIRY SCIENCE

Enterprise Architecture Building a Mobile Vision. David Hunt DCH Technology Services Gill Windall University of Greenwich

Visual programming language for modular algorithms

Creating a Recommender System. An Elasticsearch & Apache Spark approach

The Future of TV and Video, Clarified

Recommendation Systems

Transcription:

CHAPTER 4 METHODOLOGY AND TOOLS 4.1 RESEARCH METHODOLOGY In an effort to test empirically the suggested data mining technique, the data processing quality, it is important to find a real-world for effective data processing of database in application domain. In data mining technique, the application domain, data processing applications that have matured to become relatively stable. The data processing through data mining technique method chosen is mainly sophomore and finds the complexity of knowledge discovery data processing in computer applications. The generation process, existing data processing through data mining technique proposal approaches can be spitted into the following categories: 1. Content-based methods recommend for data processing using data mining technique to the user items similar to the ones the user chosen in the past relying on product features and textual item descriptions for processing of data. 2. Collaborative Filtering recommends to the user data processing items that people used with similar preferences knowledge discovery in data processing, liked in the past, taking the behavior, opinions and tastes of a large community of users into account. In this thesis the performance of different data mining algorithm is evaluated empirically by conducting experiments using three different software tools viz. RapidMiner, Weka, and IBM SPSS statistics. 4.2 RAPID MINER TOOL For a successful classification implementation, RapidMiner Studio 6 was used to perform experiments. RapidMiner is one of the world s most widespread and 57

most used open source data mining solutions [16]. The project was born at the University of Dortmund in 2001 and has been developed further by Rapid-I GmbH since 2007. With this academic background, RapidMiner continues to not only address business clients, but also universities and researchers from the most diverse disciplines. Fig.4.1. RapidMiner User Interface RapidMiner has a comfortable user interface (Fig.4.1), where analyses are configured in a process view. RapidMiner uses a modular concept for this, where each step of an analysis (e.g. a pre-processing step or a learning procedure) is illustrated by an operator in the analysis process. These operators have input and output ports via which they can communicate with the other operators in order to receive input data or pass the hanged data and generated models over to the operators that follow. 58

Fig.4.2. A RapidMiner process with of operators for model production Thus a data flow is created through the entire analysis process, as shown in fig. 4.2. The most complex analysis situations and needs can be handled by socalled super-operators, which in turn can contain a complete sub process. A well-known example is the cross-validation, which contains two sub processes. Fig. 4.3. The internal sub processes of a cross-validation A sub process is responsible for producing a model from the respective training data while the second sub process is given this model and any other generated results in order to apply these to the test data and measure the quality of the model in each case. A typical application is shown in fig. 4.3. 4.3 WEKA Weka is the product of the University of Waikato (New Zealand) and was first implemented in its modern form in 1997[17]. It uses the GNU General Public 59

License (GPL). The figure of weka is shown in the figure 4.4. The software is written in the Java language and contains a GUI for interacting with data files and producing visual results. Figure 4.4: Weka GUI It also has a general API, so we can embed WEKA, like any other library, in our own applications to such things as automated server-side datamining tasks. For working of weka we not need the deep knowledge of data mining that s reason it is very popular data mining tool. Weka also provides the graphical user interface of the user and provides many facilities [4, 7]. 4.4 IBM SPSS STATISTICS 20 SPSS was first developed in 1948 by social science researchers at Stanford University as a tool to help them with quantitative research[18]. In fact, the acronym SPSS initially stood for Statistical Package for the Social Sciences. s with I M and T&T, the company (and its software) is simply known by its initials, in part as a testament to its diverse user base. Although the software is most heavily used in social science contexts particularly in psychology, political science and in academia it is also used in medicine, marketing, and many other contexts. SPSS is appealing to many users from 60

less technical and/or mathematical disciplines because it has a particularly user-friendly interface (Figure 4.5) consisting of an Excel-like spreadsheet for the data and menus and buttons for manipulations and analyses. Although this point and click interface makes SPSS particularly attractive for statistical computing novices, individuals who require greater statistical functionality may find the application limiting. Figure 4.5: IBM SPSS Statistics GUI Between 2009 and 2010, the premier vendor for SPSS was called PASW (Predictive Analytics SoftWare) Statistics. The company announced on July 28, 2009 that it was being acquired by IBM. Versions 19.0 and 20.0 are named IBM SPSS Statistics[18]. 61