CS 8520: Artificial Intelligence
Weka Lab
Paula Matuszek
Fall 2015

Weka is the Waikato Environment for Knowledge Analysis
- Machine learning software suite from the University of Waikato
- Has been under development for 20+ years
- Well developed, maintained and supported
- Open source
- Windows, Mac and Unix versions
- http://www.cs.waikato.ac.nz/ml/weka/index.html
- Lots of help available at the wiki: http://weka.wikispaces.com/

Weka
- Weka is a very rich tool.
  - Many classifiers, clusterers, etc.
  - Many options for each algorithm
  - Many tools for modifying the attributes
  - Many meta-tools for comparing classifiers, generating models, etc.
- We are going to ignore most of it. This is a getting-started exploration.
- Weka's defaults are generally reasonable.

A First Classifier
- For the first activity we are going to classify irises into three types, using a decision tree.
- The Weka version of Quinlan's C4.5 algorithm is called J48.
- Go through the five steps of the tutorial at http://machinelearningmastery.com/how-to-run-your-first-classifier-in-weka/
- Note the accuracy, precision, recall, F-measure and confusion matrix.
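To make the numbers in the classifier output less mysterious, here is a minimal sketch of how accuracy, precision, recall and F-measure follow from a confusion matrix. It assumes rows are actual classes and columns are predicted classes, as in Weka's printed matrix. The counts below are made up for illustration and are not the result of running J48 on iris; `per_class_metrics` is a hypothetical helper name.

```python
def per_class_metrics(matrix, labels):
    """Return overall accuracy and per-class (precision, recall, F-measure).

    matrix[i][j] is the number of instances whose actual class is labels[i]
    and whose predicted class is labels[j].
    """
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(labels)))
    accuracy = correct / total

    metrics = {}
    for i, label in enumerate(labels):
        true_positives = matrix[i][i]
        predicted_as_label = sum(row[i] for row in matrix)  # column sum
        actually_label = sum(matrix[i])                     # row sum
        precision = true_positives / predicted_as_label if predicted_as_label else 0.0
        recall = true_positives / actually_label if actually_label else 0.0
        if precision + recall:
            f_measure = 2 * precision * recall / (precision + recall)
        else:
            f_measure = 0.0
        metrics[label] = (precision, recall, f_measure)
    return accuracy, metrics


# Made-up counts for illustration only; not actual Weka output.
labels = ["setosa", "versicolor", "virginica"]
matrix = [
    [10, 0, 0],
    [0, 8, 2],
    [0, 1, 9],
]

accuracy, metrics = per_class_metrics(matrix, labels)
print(f"accuracy = {accuracy:.3f}")
for label, (precision, recall, f_measure) in metrics.items():
    print(f"{label}: precision={precision:.3f} recall={recall:.3f} F={f_measure:.3f}")
```

Precision is computed down a column (everything predicted as the class) and recall across a row (everything that actually is the class), which is an easy pair to mix up when reading the matrix by eye.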

More Results
- After you have run the J48 classifier, you will have an entry in the Result list, which says right-click for options.
- Choose Visualize tree.
  - What is the first decision?
  - What is the smallest leaf size?

Seeing your data
- For a loaded file, the Preprocess tab shows information about the data: number of instances, number of attributes, histograms of the class distribution for each attribute, and statistics for each attribute.
- For iris:
  - How many instances are there? How many attributes?
  - What are the mean and standard deviation for sepallength?
  - Look at the histograms for all attributes paired with class. Which looks like a reasonable first choice for a decision tree? Which did Weka choose?
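As a cross-check on the statistics panel, the same kind of summary can be computed by hand. The sketch below uses only a handful of illustrative sepallength values, not the full iris file, so its numbers will not match what Weka shows. It uses the sample standard deviation (`statistics.stdev`); that this matches Weka's estimator exactly is an assumption. `summarize` is a hypothetical helper name.

```python
import statistics


def summarize(values):
    """Return basic descriptive statistics for one numeric attribute."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),  # sample standard deviation (n - 1)
    }


# A few illustrative values only; load the real file to answer the lab question.
sepallength_sample = [5.1, 4.9, 4.7, 4.6, 5.0]

summary = summarize(sepallength_sample)
for name, value in summary.items():
    print(f"{name}: {value}")
```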

Explore Some More
- Load and examine one of the other datasets that are included with Weka.
  - Which did you choose?
  - What attributes does it have? What kind?
- You can see the actual data from Weka by choosing Edit from the Preprocess tab.
  - For iris, what are the attribute headers?
  - For your other dataset, what are the headers?

Weka Data Format
- Weka uses a data format called ARFF: Attribute-Relation File Format.
- It's text; you can look at it in an editor (or create it there).
- Find the data directory in Weka and open the iris file.
- It should have two sections, Header and Data.

ARFF Format
- Header section: information about the data
  - the name of the relation
  - a list of the attributes (the columns in the data)
  - their types
- Data section
  - comma-separated list, one line per instance
- Comments
  - Begin with %
  - Good idea to describe the class, the source, and sometimes the meanings of attributes

Header Section
- @RELATION declaration: names what we are talking about. It is a string; quote it if it includes spaces.
    @RELATION iris
- @ATTRIBUTE declarations: name each attribute and give its type. One per attribute, including the class. The name must start with a letter; quote it if it includes spaces.
    @ATTRIBUTE sepallength NUMERIC
    @ATTRIBUTE petalwidth NUMERIC
    @ATTRIBUTE class {Iris-setosa,Iris-versicolor,Iris-virginica}

Attribute Types
- Numeric. Can be real or integer.
    @ATTRIBUTE sepallength NUMERIC
- Nominal. List the allowed values in {}.
    @ATTRIBUTE color {red, green, blue}
    @ATTRIBUTE class {versicolor, setosa}
- String: arbitrary text.
    @ATTRIBUTE emailbody string
- Date. Give the date format.
    @ATTRIBUTE timestamp DATE "yyyy-MM-dd"

Data section
- Starts with @DATA
- One line per instance, comma separated
- Example: for the attributes
    @ATTRIBUTE sepallength NUMERIC
    @ATTRIBUTE class {setosa, versicolor}
    @ATTRIBUTE description STRING
    @ATTRIBUTE timestamp DATE "yyyy-MM-dd"
  we might have the instances
    5.1, setosa, 'Lovely big flowers', 2014-09-10
    4.9, setosa, Nice, 2014-06-03
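To see how the header and data sections fit together, here is a minimal sketch of an ARFF reader. It is not Weka's parser. It assumes unquoted attribute names without spaces, a single space after each keyword, and single quotes around string values that contain spaces. The relation name `flowers` and the helper `parse_arff` are made up for the example; the attributes mirror the data section example.

```python
import csv


def parse_arff(text):
    """Parse a small ARFF document into (relation, attributes, rows).

    attributes is a list of (name, type) pairs in column order;
    rows is a list of lists of strings, one per instance.
    """
    relation = None
    attributes = []
    rows = []
    in_data = False
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and comments
        if in_data:
            # Let the csv module handle commas inside single-quoted strings.
            reader = csv.reader([line], quotechar="'", skipinitialspace=True)
            rows.append([value.strip() for value in next(reader)])
            continue
        keyword, _, rest = line.partition(" ")
        keyword = keyword.upper()  # ARFF keywords are case-insensitive
        if keyword == "@RELATION":
            relation = rest.strip().strip("'\"")
        elif keyword == "@ATTRIBUTE":
            name, _, attr_type = rest.strip().partition(" ")
            attributes.append((name, attr_type.strip()))
        elif keyword == "@DATA":
            in_data = True
    return relation, attributes, rows


SAMPLE = """\
% Tiny illustrative file; the relation name is made up.
@RELATION flowers
@ATTRIBUTE sepallength NUMERIC
@ATTRIBUTE class {setosa, versicolor}
@ATTRIBUTE description STRING
@ATTRIBUTE timestamp DATE "yyyy-MM-dd"
@DATA
5.1, setosa, 'Lovely big flowers', 2014-09-10
4.9, setosa, Nice, 2014-06-03
"""

relation, attributes, rows = parse_arff(SAMPLE)
print(relation)
for name, attr_type in attributes:
    print(name, "->", attr_type)
for row in rows:
    print(row)
```

All values come back as strings; converting them according to the declared types (float for NUMERIC, a date for DATE) is left out to keep the sketch short.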

Examples
- Look at some different files from the Weka data directory:
  - Iris. Detailed, very nice comments. Numeric and nominal attributes.
  - Weather, nominal. No comments, all nominal.
  - Reuters. A string attribute.

Creating an ARFF file
- The syllabus has a link to the restaurant data as a .csv file.
- Download it and convert it into ARFF format.
- Run J48 on it. How does the tree compare to the one given in the presentation earlier today?
- There is an obvious problem: if you just add the format information and run J48, this will include example as an attribute.
- In the Preprocess tab, use the Remove button below the list of attributes to remove example and try J48 again.
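The hand conversion can also be sketched in a few lines of code, which makes the steps explicit: read the CSV header, decide a type for each column, write the @ATTRIBUTE lines, then write the rows under @DATA. The column names and rows below are hypothetical stand-ins, not the actual restaurant file from the syllabus. The type rule (numeric if every value parses as a number, otherwise nominal) is a simplifying assumption, and `csv_to_arff` is a made-up helper name.

```python
import csv
import io


def is_number(value):
    """Return True if the string can be read as a number."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def quote(value):
    """Single-quote a name or value that contains ARFF special characters."""
    if any(ch in value for ch in " ,{}%'\""):
        return "'" + value.replace("'", "\\'") + "'"
    return value


def csv_to_arff(csv_text, relation, drop=()):
    """Convert CSV text with a header row to ARFF text, skipping columns in drop."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    rows = [row for row in reader if row]
    keep = [i for i, name in enumerate(header) if name not in drop]

    lines = [f"@RELATION {quote(relation)}", ""]
    for i in keep:
        column = [row[i] for row in rows]
        if all(is_number(value) for value in column):
            attr_type = "NUMERIC"
        else:
            # Nominal: list every distinct value seen in the column.
            attr_type = "{" + ",".join(quote(v) for v in sorted(set(column))) + "}"
        lines.append(f"@ATTRIBUTE {quote(header[i])} {attr_type}")
    lines += ["", "@DATA"]
    for row in rows:
        lines.append(",".join(quote(row[i]) for i in keep))
    return "\n".join(lines) + "\n"


# Hypothetical columns and rows, for illustration only.
SAMPLE_CSV = """\
example,patrons,hungry,willwait
x1,some,yes,yes
x2,full,yes,no
x3,none,no,no
"""

arff = csv_to_arff(SAMPLE_CSV, "restaurant", drop=("example",))
print(arff)
```

Passing `drop=("example",)` does in code what the Remove button does in the Preprocess tab: the row identifier never becomes an attribute, so the tree cannot split on it.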

Importing
- We don't actually have to go to the trouble of converting by hand.
- In Preprocess, for Open File, at the bottom of the Open window there is a File Format: choice.
- Choose CSV and import the original restaurant file.
- How does it look compared to the one you modified by hand?

There is a lot more
- We will look at a few more of the basic tools in Weka next lab.
- There is far more than we will get to. Feel free to explore.