Lab Exercise Three: Classification with WEKA Explorer

1. Fire up WEKA to get the GUI Chooser panel. Select Explorer from the four choices on the right side.

2. We are now on the Preprocess tab. Click the Open file button to bring up a standard dialog through which you can select a file. Choose the file customer_labthree.csv.

3. To perform classification in Weka, the last attribute of the dataset is taken as the class label, and it must be nominal. Since the last attribute of customer_labthree.csv is numeric (1/0), we will convert it to nominal in the next step.
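Each GUI step in this handout can also be scripted with the WEKA Java API. The sketches that follow are optional illustrations, not part of the lab: they assume weka.jar is on the classpath, and later sketches assume they run inside a main method that throws Exception, with the variable data carrying the loaded Instances between snippets. Here is the loading step:

    import java.io.File;
    import weka.core.Instances;
    import weka.core.converters.CSVLoader;

    public class LoadCustomers {
        public static void main(String[] args) throws Exception {
            // Read the CSV file into a WEKA Instances object
            CSVLoader loader = new CSVLoader();
            loader.setSource(new File("customer_labthree.csv"));
            Instances data = loader.getDataSet();

            // WEKA does not assume a class attribute for CSV files;
            // the lab uses the last attribute as the class label
            data.setClassIndex(data.numAttributes() - 1);
            System.out.println(data.numInstances() + " instances, "
                + data.numAttributes() + " attributes loaded");
        }
    }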

4. Choose the unsupervised attribute filter NumericToNominal to perform this conversion. Since we want to convert the last attribute only, change attributeIndices to "last".

5. After applying the filter, the last attribute is nominal and is now taken as the class label for the dataset; the data set is visualized in two colors.
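A minimal sketch of the same conversion via the API, assuming data holds the Instances loaded above:

    import weka.core.Instances;
    import weka.filters.Filter;
    import weka.filters.unsupervised.attribute.NumericToNominal;

    // Convert only the last attribute (the class label) to nominal
    NumericToNominal toNominal = new NumericToNominal();
    toNominal.setAttributeIndices("last");
    toNominal.setInputFormat(data);        // must be called before filtering
    Instances converted = Filter.useFilter(data, toNominal);
    converted.setClassIndex(converted.numAttributes() - 1);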

6. If the class attribute is not the last attribute, you can set it in the Edit window.

7. You should also convert the types of the other attributes. The attributes region, townsize, agecat, jobcat, empcat, card2tenurecat, and internet are all nominal values, but Weka treats them as numeric. The attributes gender, union, equip, wireless, called, callwait, forward, confer, and ebill are binary values and are treated as numeric as well. Apply the NumericToNominal filter to convert them. You could also normalize the attribute educat to [0, 1], since education categories are rankings.

Attribute Selection - Since not all attributes are relevant to the classification job, you should perform attribute selection before training the classifier.

8. You can remove irrelevant attributes by hand. For example, the first attribute, custid, should be removed: select it and click the Remove button.
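Steps 7 and 8 can be scripted as follows. This sketch looks the attributes up by name (names taken from the lab text) so it does not depend on their positions; the Explorer's Normalize filter is not shown because it rescales all numeric attributes, not just educat:

    import weka.core.Attribute;
    import weka.filters.Filter;
    import weka.filters.unsupervised.attribute.NumericToNominal;
    import weka.filters.unsupervised.attribute.Remove;

    String[] nominalAtts = {"region", "townsize", "agecat", "jobcat",
        "empcat", "card2tenurecat", "internet", "gender", "union", "equip",
        "wireless", "called", "callwait", "forward", "confer", "ebill"};

    // Build a 1-based index list for NumericToNominal from the names
    StringBuilder indices = new StringBuilder();
    for (String name : nominalAtts) {
        Attribute att = data.attribute(name);   // null if the name is absent
        if (att != null) {
            if (indices.length() > 0) indices.append(",");
            indices.append(att.index() + 1);    // WEKA ranges are 1-based
        }
    }
    NumericToNominal toNominal = new NumericToNominal();
    toNominal.setAttributeIndices(indices.toString());
    toNominal.setInputFormat(data);
    data = Filter.useFilter(data, toNominal);

    // Remove the irrelevant ID attribute custid (the first attribute)
    Remove remove = new Remove();
    remove.setAttributeIndices("1");
    remove.setInputFormat(data);
    data = Filter.useFilter(data, remove);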

9. You can also run automatic attribute selection. We have introduced two methods that evaluate attributes individually: InfoGainAttributeEval and ChiSquaredAttributeEval. Weka's default attribute selection method is CfsSubsetEval, which evaluates subsets of attributes.

10. To use the evaluator InfoGainAttributeEval, select the search method Ranker to rank all attributes by their evaluation scores. We use the full dataset as the training set. The results show that the first 8 attributes are good.
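The same ranking via the API, again assuming data is in scope:

    import weka.attributeSelection.AttributeSelection;
    import weka.attributeSelection.InfoGainAttributeEval;
    import weka.attributeSelection.Ranker;

    // Rank all attributes by information gain with respect to the class
    AttributeSelection selector = new AttributeSelection();
    selector.setEvaluator(new InfoGainAttributeEval());
    selector.setSearch(new Ranker());
    selector.SelectAttributes(data);       // uses the full dataset, as in the lab
    System.out.println(selector.toResultsString());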

11. Run feature selection a second time with CfsSubsetEval and the BestFirst search method. Compare the results of the two feature selection methods.
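A sketch of the subset-based run for comparison:

    import weka.attributeSelection.AttributeSelection;
    import weka.attributeSelection.BestFirst;
    import weka.attributeSelection.CfsSubsetEval;

    // Evaluate subsets of attributes rather than individual attributes
    AttributeSelection subsetSelector = new AttributeSelection();
    subsetSelector.setEvaluator(new CfsSubsetEval());
    subsetSelector.setSearch(new BestFirst());
    subsetSelector.SelectAttributes(data);
    // Indices of the chosen attributes (the class index is appended last)
    int[] chosen = subsetSelector.selectedAttributes();
    System.out.println(subsetSelector.toResultsString());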

12. If you decide to reduce the dataset by removing unimportant attributes, you can save the reduced dataset by right-clicking the Result list. Save the file as customer.arff.

Naïve Bayes Classifier: bayes/NaiveBayes

13. Open the saved, preprocessed data file customer.arff and click the Classify tab at the top of the window. Click the Choose button under Classifier; the drop-down list of all classifiers appears. Choose NaiveBayes from the bayes folder.
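Saving the reduced dataset programmatically is a one-liner apart from the setup:

    import java.io.File;
    import weka.core.converters.ArffSaver;

    // Save the reduced dataset in ARFF format
    ArffSaver saver = new ArffSaver();
    saver.setInstances(data);
    saver.setFile(new File("customer.arff"));
    saver.writeBatch();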

14. Click the Classifier text field; the property window of NaiveBayes opens. If you do not want to use a normal distribution for numeric data, set useKernelEstimator to true. Alternatively, you can perform supervised discretization on the numeric data by setting useSupervisedDiscretization to true. Click the OK button to save the settings.

15. To partition the data into training and test sets, choose 10-fold cross-validation.
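The same options in code:

    import weka.classifiers.bayes.NaiveBayes;

    NaiveBayes nb = new NaiveBayes();
    // Replace the single Gaussian per numeric attribute with a kernel
    // density estimate
    nb.setUseKernelEstimator(true);
    // ...or discretize numeric attributes instead (pick one or the other):
    // nb.setUseSupervisedDiscretization(true);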

16. Click the Start button on the left of the window and the algorithm begins to run. The output is shown in the right panel; it includes the parameters of the normal distributions fitted to the numeric attributes, the frequency counts of the nominal values (NaiveBayes avoids zero frequencies by applying the Laplace correction), and the accuracy.
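A sketch of the equivalent evaluation run, using nb from the previous snippet:

    import java.util.Random;
    import weka.classifiers.Evaluation;

    // 10-fold cross-validation, mirroring the Explorer's test option
    Evaluation eval = new Evaluation(data);
    eval.crossValidateModel(nb, data, 10, new Random(1));
    System.out.println(eval.toSummaryString());   // accuracy and error statistics
    System.out.println(eval.toMatrixString());    // confusion matrix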

K-Nearest-Neighbor: lazy/IBk

17. We would like to perform K-Nearest-Neighbor classification on the same dataset. Try different values of K and see which gives a better result. Compare the results with the Naïve Bayes classifier.
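A sketch that sweeps a few values of K (the values 1, 3, 5, 7 are just examples):

    import java.util.Random;
    import weka.classifiers.Evaluation;
    import weka.classifiers.lazy.IBk;

    // Try several values of K and compare cross-validated accuracy
    for (int k : new int[] {1, 3, 5, 7}) {
        IBk knn = new IBk(k);
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(knn, data, 10, new Random(1));
        System.out.printf("K=%d  accuracy=%.2f%%%n", k, eval.pctCorrect());
    }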

Decision Tree: trees/J48 (implementing C4.5)

18. We would like to build a decision tree model on the same training data set. Keep all parameters at their default values.
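The corresponding API call, with the defaults left untouched:

    import weka.classifiers.trees.J48;

    // Build a C4.5-style tree with the default settings
    // (confidence factor 0.25, minimum 2 instances per leaf)
    J48 tree = new J48();
    tree.buildClassifier(data);
    System.out.println(tree);   // prints the tree in text form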

19. To visualize the decision tree we built, right-click the Result list item for J48 and choose Visualize tree.

20. Any trained classification model can be saved by right-clicking its Result list item.
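Models can also be saved and reloaded in code; the file name below is just an example:

    import weka.classifiers.trees.J48;
    import weka.core.SerializationHelper;

    // Save the trained model to disk and load it back later
    SerializationHelper.write("j48_customer.model", tree);
    J48 restored = (J48) SerializationHelper.read("j48_customer.model");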

Ensemble (Metaleaning) classifier.meta.voting 21. You could combine multiple classifiers to perfrom an ensemble method.