Data Mining: STATISTICA

Outline: prepare the data; classification and regression; clustering; association rules; graphic user interface.

Prepare the Data: Statistica can read Excel, .txt, and many other file types. Compared with WEKA, data preparation in Statistica is much easier.

Open an Excel File: click "Import selected sheet to Spreadsheet", select the Excel sheet where your data is stored, and take the variable names from the first row.

Open an Excel File: change the variable type where needed.
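
The same two preparation steps (variable names from the first row, then a change of variable type) can be sketched outside Statistica. This is only an illustration in Python: the sample text, the column names, and the helper functions `load_rows` and `to_numeric` are hypothetical and are not part of Statistica.

```python
import csv
import io

# Hypothetical sample standing in for an exported sheet.
# The first row holds the variable names.
RAW = """sepal_length,sepal_width,species
5.1,3.5,setosa
7.0,3.2,versicolor
"""

def load_rows(text):
    """Read delimited text, taking variable names from the first row."""
    return list(csv.DictReader(io.StringIO(text)))

def to_numeric(rows, columns):
    """Change the listed variables from text to a continuous (float) type."""
    return [
        {name: (float(value) if name in columns else value)
         for name, value in row.items()}
        for row in rows
    ]

rows = to_numeric(load_rows(RAW), {"sepal_length", "sepal_width"})
print(rows[0])
```

Every value arrives as text, so the explicit conversion plays the role of the "change variable type" step.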

Classification and Regression: the methods covered are C&RT, boosting tree, and neural networks.

C&RT Classification: the Iris data is used as an example data set.

C&RT Classification: open the Data Mining menu and find Interactive Trees. View the final tree and interpret the results.
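
To help read the final tree, it may be useful to see how a single C&RT-style split can be chosen. The sketch below assumes the Gini impurity criterion commonly associated with classification trees. The slides do not say which criterion Statistica applies, and the petal lengths are made-up values, not the real Iris data.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(values, labels):
    """Return (threshold, weighted_gini) of the best split 'value <= threshold'."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))  # no split found yet: impurity of the whole node
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best

# Made-up petal lengths for illustration only.
petal_length = [1.4, 1.3, 1.5, 4.7, 4.5, 5.1]
species = ["setosa"] * 3 + ["versicolor"] * 2 + ["virginica"]

threshold, score = best_split(petal_length, species)
print(threshold, round(score, 4))
```

A full tree repeats this search on each resulting subset and over every predictor, which is what the node-by-node display of the final tree reflects.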

C&RT Regression: use the CPU data set and select the regression analysis. View the regression tree structure.

C&RT Regression: view the predicted values.

Boosting Tree Classification: in the Data Mining menu, find the boosted tree classifier and regression.

Boosting Tree Classification: see the results and the predictors' importance.

Boosting Tree Regression: use the CPU data set and view the predicted values.
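
The idea behind boosted regression trees can be sketched with one-split trees ("stumps") fitted repeatedly to the current residuals. This is a simplified illustration under stated assumptions: a single predictor, squared-error loss, and a fixed learning rate. It is not Statistica's implementation, and the data are invented rather than taken from the CPU data set.

```python
def fit_stump(x, y):
    """Fit a one-split regression tree on one predictor; each leaf predicts a mean."""
    best = None
    cuts = sorted(set(x))
    for lo, hi in zip(cuts, cuts[1:]):
        t = (lo + hi) / 2
        left = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((v - left_mean) ** 2 for v in left)
               + sum((v - right_mean) ** 2 for v in right))
        if best is None or sse < best[0]:
            best = (sse, t, left_mean, right_mean)
    _, t, left_mean, right_mean = best
    return lambda v: left_mean if v <= t else right_mean

def boost(x, y, rounds=50, rate=0.1):
    """Add stumps one at a time, each fitted to the residuals of the current model."""
    base = sum(y) / len(y)
    stumps = []
    pred = [base] * len(y)
    for _ in range(rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        pred = [pi + rate * stump(xi) for xi, pi in zip(x, pred)]
    return lambda v: base + rate * sum(s(v) for s in stumps)

def mse(y, pred):
    return sum((a - b) ** 2 for a, b in zip(y, pred)) / len(y)

# Invented data for illustration only.
x = [1, 2, 3, 4, 5, 6]
y = [10, 12, 15, 30, 33, 35]

model = boost(x, y)
mse_base = mse(y, [sum(y) / len(y)] * len(y))
mse_boost = mse(y, [model(v) for v in x])
print(mse_base, mse_boost)
```

Each round can only lower the training error, which is why the fitted values move steadily toward the observed values as stumps are added.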

Neural Networks Classification: in the Data Mining menu, find Automated Neural Networks. Choose Classification, then select the variables.

Neural Networks Classification: Statistica tries a set of different neural networks and keeps the best ones. See the classification results.

Neural Networks Classification: see the classification results (predictions).
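
Given observed and predicted classes like those in the prediction sheet, a confusion matrix simply counts each (observed, predicted) pair. The sketch below uses invented labels. The function name `confusion_matrix` and the row and column layout (rows are observed classes, columns are predicted classes) are assumptions made for this illustration.

```python
from collections import Counter

def confusion_matrix(actual, predicted):
    """Return (labels, matrix); rows are observed classes, columns predicted ones."""
    labels = sorted(set(actual) | set(predicted))
    counts = Counter(zip(actual, predicted))
    matrix = [[counts[(a, p)] for p in labels] for a in labels]
    return labels, matrix

# Invented observed and predicted classes for illustration only.
actual = ["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"]
predicted = ["setosa", "setosa", "versicolor", "virginica", "virginica", "virginica"]

labels, matrix = confusion_matrix(actual, predicted)
accuracy = sum(matrix[i][i] for i in range(len(labels))) / len(actual)

for label, row in zip(labels, matrix):
    print(label, row)
print("accuracy:", accuracy)
```

The diagonal holds the correctly classified cases, so the overall accuracy is the diagonal sum divided by the number of cases.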

Neural Networks Classification: see the classification results (confusion matrix).

Neural Networks Regression: use the CPU data set.

Neural Networks Regression: with the CPU data set, select the variables. Then view the training and results.

Neural Networks Regression: view the predictions and some statistics about the predictions.
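
The slides do not list which prediction statistics Statistica reports. As an assumption, the sketch below computes three common ones for a regression model: mean absolute error, root mean squared error, and the correlation between observed and predicted values. The numbers are invented.

```python
import math

def prediction_stats(observed, predicted):
    """Return MAE, RMSE, and Pearson correlation between observed and predicted."""
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    spread_o = math.sqrt(sum((o - mean_o) ** 2 for o in observed))
    spread_p = math.sqrt(sum((p - mean_p) ** 2 for p in predicted))
    return {"mae": mae, "rmse": rmse, "r": cov / (spread_o * spread_p)}

# Invented observed and predicted values for illustration only.
observed = [10, 20, 30, 40]
predicted = [12, 18, 33, 39]

stats = prediction_stats(observed, predicted)
print(stats)
```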

Clustering: use the Deere data set. Select k-means and choose the variables.

Clustering: choose the distance metric and the initial cluster centers. Run with 5 clusters and see the results.

Clustering: view the centroids (cluster means), then the members of each cluster and their distances to the centroids.
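
The quantities shown in these result sheets (centroids, cluster membership, and distance to the centroid) come directly out of the k-means loop. The following is a minimal sketch assuming Euclidean distance and user-supplied initial centers. It uses two clusters and invented two-dimensional points instead of the five clusters and the Deere data from the slides.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest(point, centers):
    """Index of the center closest to the point."""
    return min(range(len(centers)), key=lambda j: euclidean(point, centers[j]))

def kmeans(points, initial_centers, max_iter=100):
    """Return (centroids, assignments, distances to the assigned centroid)."""
    centers = [tuple(c) for c in initial_centers]
    for _ in range(max_iter):
        assign = [nearest(p, centers) for p in points]
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:
                # Centroid = mean of each coordinate over the cluster members.
                new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(old)  # keep an empty cluster's center unchanged
        if new_centers == centers:
            break
        centers = new_centers
    assign = [nearest(p, centers) for p in points]
    distances = [euclidean(p, centers[a]) for p, a in zip(points, assign)]
    return centers, assign, distances

# Invented points for illustration only.
points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 9.5), (8.5, 9)]
centers, assign, distances = kmeans(points, [points[0], points[3]])

print(centers)
print(assign)
print([round(d, 3) for d in distances])
```

Different initial centers can lead to different final clusters, which is why the dialog asks for them explicitly.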

Association Rules: use the Deere data set. Select the variables and set up proper parameters.
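
The slides do not name the parameters. Two that association rule miners typically expose are minimum support and minimum confidence, and the sketch below assumes those. It only considers rules with one item on each side, and the transactions are invented for illustration, not taken from the Deere data set.

```python
from itertools import combinations

def support(itemset, transactions):
    """Fraction of transactions that contain every item in the itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def pair_rules(transactions, min_support, min_confidence):
    """Return rules (lhs, rhs, support, confidence) over single-item sides."""
    items = sorted(set().union(*transactions))
    found = []
    for a, b in combinations(items, 2):
        pair_support = support({a, b}, transactions)
        if pair_support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            # confidence(lhs -> rhs) = support(lhs and rhs) / support(lhs)
            confidence = pair_support / support({lhs}, transactions)
            if confidence >= min_confidence:
                found.append((lhs, rhs, pair_support, confidence))
    return found

# Invented transactions for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

found = pair_rules(transactions, min_support=0.5, min_confidence=0.8)
for lhs, rhs, s, c in found:
    print(f"{lhs} -> {rhs}  support={s:.2f}  confidence={c:.2f}")
```

Raising either threshold shortens the list of rules; lowering them produces more rules, many of them weak.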

Association Rules: see the rules.

Graphic User Interface: divide the CPU data into training and testing data sets.
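
Dividing a data set into training and testing parts amounts to a random partition of its rows. The slides do not give the proportions used, so the 30% test fraction, the fixed seed, and the function name `train_test_split` below are assumptions for illustration.

```python
import random

def train_test_split(rows, test_fraction=0.3, seed=0):
    """Shuffle a copy of the rows and split it into (train, test) lists."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Row indices standing in for the records of a data set.
rows = list(range(10))
train, test = train_test_split(rows)
print(len(train), len(test))
```

The model is built on the training rows only, and the held-out test rows are used afterwards to check its predictions.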

Graphic User Interface: choose different algorithms.

Graphic User Interface: insert the selected data mining algorithms into the workspace. Select the data sources.

Graphic User Interface: specify whether each data source is used to build the model or as a testing set. Connect the data with the data mining algorithms.

Graphic User Interface: to set up deployment, double-click the data mining algorithm icon.

Graphic User Interface: click the Run button. To see the deployment code (C code), double-click the icons in the Reports section.

Graphic User Interface: test the learnt models with the testing data set. First disable the connections between the training data set and the data mining algorithms, then connect the testing data set with the data mining algorithms.

Graphic User Interface: see the prediction results.