WEKA homepage.

Similar documents
Jue Wang (Joyce) Department of Computer Science, University of Massachusetts, Boston Feb Outline

WEKA Explorer User Guide for Version 3-5-5

WEKA Explorer User Guide for Version 3-4

Tutorial on Machine Learning Tools

WEKA KnowledgeFlow Tutorial for Version 3-5-6

COMP s1 - Getting started with the Weka Machine Learning Toolkit

Using Weka for Classification. Preparing a data file

DATA ANALYSIS WITH WEKA. Author: Nagamani Mutteni Asst.Professor MERI

Short instructions on using Weka

Data Mining: STATISTICA

6.034 Design Assignment 2

Practical Data Mining COMP-321B. Tutorial 1: Introduction to the WEKA Explorer

Outline. Prepare the data Classification and regression Clustering Association rules Graphic user interface

The Explorer. chapter Getting started

Lab Exercise Three Classification with WEKA Explorer

Hands on Datamining & Machine Learning with Weka

SWETHA ENGINEERING COLLEGE (Approved by AICTE, New Delhi, Affiliated to JNTUA) DATA MINING USING WEKA

Data Mining With Weka A Short Tutorial

Data Mining. Lab 1: Data sets: characteristics, formats, repositories Introduction to Weka. I. Data sets. I.1. Data sets characteristics and formats

Classification using Weka (Brain, Computation, and Neural Learning)

Classifica(on and Clustering with WEKA. Classifica*on and Clustering with WEKA

IV B. Tech I semester (JNTUH-R13)

WEKA Waikato Environment for Knowledge Analysis Performing Classification Experiments Prof. Pietro Ducange

Tanagra: An Evaluation

WEKA: Practical Machine Learning Tools and Techniques in Java. Seminar A.I. Tools WS 2006/07 Rossen Dimov

Weka ( )

An Introduction to WEKA Explorer. In part from: Yizhou Sun 2008

Data Mining: Classifier Evaluation. CSCI-B490 Seminar in Computer Science (Data Mining)

Dr. Prof. El-Bahlul Emhemed Fgee Supervisor, Computer Department, Libyan Academy, Libya

User Guide Written By Yasser EL-Manzalawy

Data mining: concepts and algorithms

Computing a Gain Chart. Comparing the computation time of data mining tools on a large dataset under Linux.

Data Mining Laboratory Manual

Weka: Practical machine learning tools and techniques with Java implementations

Course Outline. Writing Reports with Report Builder and SSRS Level 1 Course 55123: 2 days Instructor Led. About this course

I211: Information infrastructure II

PROJECT 1 DATA ANALYSIS (KR-VS-KP)

Data Science with R Decision Trees with Rattle

Epitopes Toolkit (EpiT) Yasser EL-Manzalawy August 30, 2016

KNIME TUTORIAL. Anna Monreale KDD-Lab, University of Pisa

Use Table Styles to format an entire table. Format a table. What do you want to do? Hide All

ClaNC: The Manual (v1.1)

AI32 Guide to Weka. Andrew Roberts 1st March 2005

ESERCITAZIONE PIATTAFORMA WEKA. Croce Danilo Web Mining & Retrieval 2015/2016

A System for Managing Experiments in Data Mining. A Thesis. Presented to. The Graduate Faculty of The University of Akron. In Partial Fulfillment

Applied Regression Modeling: A Business Approach

What is KNIME? workflows nodes standard data mining, data analysis data manipulation

S2 Text. Instructions to replicate classification results.

List of Exercises: Data Mining 1 December 12th, 2015

Hebei University of Technology A Text-Mining-based Patent Analysis in Product Innovative Process

Community edition(open-source) Enterprise edition

Data Mining: Decision Trees

Fraud Detection Using Random Forest Algorithm

Tutorial Case studies

1. Basic Steps for Data Analysis Data Editor. 2.4.To create a new SPSS file

Non-trivial extraction of implicit, previously unknown and potentially useful information from data

Model s Performance Measures

What s New in Access 2007

IMPLEMENTATION OF CLASSIFICATION ALGORITHMS USING WEKA NAÏVE BAYES CLASSIFIER

Data Mining: Scoring (Linear Regression)

Lab Exercise Two Mining Association Rule with WEKA Explorer

Decision Trees Using Weka and Rattle

RAPIDMINER FREE SOFTWARE FOR DATA MINING, ANALYTICS AND BUSINESS INTELLIGENCE

Summary. RapidMiner Project 12/13/2011 RAPIDMINER FREE SOFTWARE FOR DATA MINING, ANALYTICS AND BUSINESS INTELLIGENCE

CS4445 Data Mining and Knowledge Discovery in Databases. A Term 2008 Exam 2 October 14, 2008

COMM 391 Winter 2014 Term 1. Tutorial 1: Microsoft Excel - Creating Pivot Table

LABORATORY OF DATA SCIENCE. Reminds on Data Mining. Data Science & Business Informatics Degree

CHAPTER 4 METHODOLOGY AND TOOLS

Planning the Electrical Systems

Lecture 25: Review I

Writing Reports with Report Designer and SSRS 2014 Level 1

Module 4 : Spreadsheets

1. make a scenario and build a bayesian network + conditional probability table! use only nominal variable!

Subject. Dataset. Copy paste feature of the diagram. Importing the dataset. Copy paste feature into the diagram.

Data Mining. Introduction. Piotr Paszek. (Piotr Paszek) Data Mining DM KDD 1 / 44

Intrusion Detection Using Data Mining Technique (Classification)

IEE 520 Data Mining. Project Report. Shilpa Madhavan Shinde

Applied Regression Modeling: A Business Approach

Predict the box office of US movies

Creating Booklets Using Microsoft Word 2013 on a PC

INSERT SUBTOTALS Database Exercise Sort the Data Department Department Data Tab Sort and Filter Group

Practical Data Mining COMP-321B. Tutorial 5: Article Identification

Function Algorithms: Linear Regression, Logistic Regression

COMPARISON OF DIFFERENT CLASSIFICATION TECHNIQUES

LAD-WEKA Tutorial Version 1.0

COMP33111: Tutorial/lab exercise 2

33 Exploring Report Data using Drill Mode

1 Topic. Image classification using Knime.

Section 33: Advanced Charts

THE POSIT TOOLSET WITH GRAPHICAL USER INTERFACE

Attribute Discretization and Selection. Clustering. NIKOLA MILIKIĆ UROŠ KRČADINAC

GeoVISTA Studio Tutorial. What is GeoVISTA Studio? Why is it part of the map making and visualization workshop?

alteryx training courses

A Program demonstrating Gini Index Classification

Step by Step GIS. Section 1

Chapter 3. Determining Effective Data Display with Charts

WEKA 1. Weka. Valeria Guevara. Thompson Rivers University

SAP InfiniteInsight 7.0 Modeler - Association Rules Getting Started Guide

Data Visualization via Conditional Formatting

CHAPTER 3 MACHINE LEARNING MODEL FOR PREDICTION OF PERFORMANCE

Transcription:

WEKA homepage http://www.cs.waikato.ac.nz/ml/weka/

Data mining software written in Java (distributed under the GNU Public License). Used for research, education, and applications. Comprehensive set of data preprocessing tools, learning algorithms and evaluation methods. Graphical user interfaces including data visualization. Environment for comparing learning algorithms. Book and development versions.

@relation data-relation-name @attribute attribute1 real @attribute attribute2 { nominal1, nominal2 } @attribute class { class1, class2 } @data 1, nominal1, class1 2, nominal2, class2 3,?, class1 4, nominal1, class2

The Weka GUI Chooser (class weka.gui.guichooser) provides a starting point for launching Weka s main GUI applications and supporting tools. If one prefers a MDI ( multiple document interface ) appearance, then this is provided by an alternative launcher called Main (class weka.gui.main). The GUI Chooser consists of four buttons one for each of the four major Weka applications and four menus.

An environment for performing experiments and conducting statistical tests between learning schemes. The Weka Experiment Environment enables the user to create, run, modify, and analyse experiments in a more convenient manner than is possible when processing the schemes individually. For example, the user can create an experiment that runs several schemes against a series of datasets and then analyse the results to determine if one of the schemes is (statistically) better than the other schemes.

This environment supports essentially the same functions as the Explorer but with a drag-anddrop interface. One advantage is that it supports incremental learning. The KnowledgeFlow presents a data-flow inspired interface to WEKA. The user can select WEKA components from a tool bar, place them on a layout canvas and connect them together in order to form a knowledge flow for processing and analyzing data. At present, all of WEKA s classifiers, filters, clusterers, loaders and savers are available in the KnowledgeFlow along with some extra tools.

Provides a simple command-line interface that allows direct execution of WEKA commands for operating systems that do not provide their own command line interface. The Simple CLI provides full access to all Weka classes, i.e., classifiers, filters, clusterers, etc., but without the hassle of the CLASSPATH (it facilitates the one, with which Weka was started). It offers a simple Weka shell with separated commandline and output.

An environment for exploring data with WEKA (the rest of this documentation deals with this application in more detail). At the very top of the window, just below the title bar, is a row of tabs. When the Explorer is first started only the first tab is active; the others are greyed out. This is because it is necessary to open (and potentially pre-process) a data set before starting to explore the data.

The tabs are as follows: Preprocess: Choose and modify the data being acted on. Classify: Train and test learning schemes that classify or perform regression. Cluster: Learn clusters for the data. Associate: Learn association rules for the data. Select attributes: Select the most relevant attributes in the data. Visualize: View an interactive 2D plot of the data.

Selecting a Classifier At the top of the classify section is the Classifier box. This box has a text field that gives the name of the currently selected classifier, and its options. The Choose button allows you to choose one of the classifiers that are available in WEKA. There are four test modes available: Use training set: The classifier is evaluated on how well it predicts the class of the instances it was trained on. Supplied test set: The classifier is evaluated on how well it predicts the class of a set of instances loaded from a file. Clicking the Set... Button brings up a dialog allowing you to choose the file to test on. Cross-validation: The classifier is evaluated by cross-validation, using the number of folds that are entered in the Folds text field. Percentage split: The classifier is evaluated on how well it predicts a certain percentage of the data which is held out for testing. The amount of data held out depends on the value entered in the % field. Once the classifier, test options and class have all been set, the learning process is started by clicking on the Start button. While the classifier is busy being trained, the little bird moves around.

When training is complete, several things happen. The Classifier output area to the right of the display is filled with text describing the results of training and testing. A new entry appears in the Result list box. The text in the Classifier output area is split into several sections: Run information: A list of information giving the learning scheme options, relation name, instances, attributes and test mode that were involved in the process. Classifier model (full training set): A textual representation of the classification model that was produced on the full training data. Summary: A list of statistics summarizing how accurately the classifier was able to predict the true class of the instances under the chosen test mode. Detailed Accuracy By Class: A more detailed per-class break down of the classifier s prediction accuracy. Confusion Matrix: Shows how many instances have been assigned to each class. Elements show the number of test examples whose actual class is the row and whose predicted class is the column.

More can be found in WEKA Manual: http://www.cs.waikato.ac.nz/ml/weka/docu mentation.html