CS 8520: Artificial Intelligence
  Weka Lab
  Paula Matuszek
  Fall, 2015
Weka is
  Waikato Environment for Knowledge Analysis
  Machine Learning Software Suite from the University of Waikato
  Under development for 20+ years
  Well-developed, maintained, supported
  Open source
  Windows, Mac and Unix versions
  http://www.cs.waikato.ac.nz/ml/weka/index.html
  Lots of help available at the wiki: http://weka.wikispaces.com/
Weka
  Weka is a very rich tool.
    Many classifiers, clusterers, etc.
    Many options for each algorithm
    Many tools for modifying the attributes
    Many meta-tools for comparing classifiers, generating models, etc.
  We are going to ignore most of it. This is a getting-started exploration.
  Weka's defaults are generally reasonable.
A First Classifier
  For the first activity we are going to classify irises into three types, using a decision tree.
  The Weka version of Quinlan's C4.5 algorithm is called J48.
  Go through the five steps of the tutorial at http://machinelearningmastery.com/how-to-run-your-first-classifier-in-weka/
  Note the accuracy, precision, recall, F-measure and confusion matrix.
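To see how the numbers in the classifier output relate to each other, here is a minimal Python sketch that derives accuracy, precision, recall and F-measure from a confusion matrix. It assumes the layout Weka prints (rows are the actual class, columns are the predicted class). The function name `per_class_metrics`, the class labels and the counts are all made up for illustration; they are not the iris results.

```python
def per_class_metrics(matrix, labels):
    """Return overall accuracy and per-class precision, recall, F-measure.

    matrix[i][j] = number of instances whose actual class is labels[i]
    and whose predicted class is labels[j].
    """
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(labels)))
    accuracy = correct / total if total else 0.0

    metrics = {}
    for i, label in enumerate(labels):
        true_pos = matrix[i][i]
        actual = sum(matrix[i])                    # row sum: instances really in this class
        predicted = sum(row[i] for row in matrix)  # column sum: instances assigned to this class
        precision = true_pos / predicted if predicted else 0.0
        recall = true_pos / actual if actual else 0.0
        denom = precision + recall
        f_measure = 2 * precision * recall / denom if denom else 0.0
        metrics[label] = {
            "precision": precision,
            "recall": recall,
            "f_measure": f_measure,
        }
    return accuracy, metrics


# Hypothetical counts for three classes, for illustration only.
example_labels = ["a", "b", "c"]
example_matrix = [
    [10, 0, 0],
    [0, 8, 2],
    [0, 1, 9],
]

accuracy, metrics = per_class_metrics(example_matrix, example_labels)
print("accuracy:", accuracy)
for label in example_labels:
    print(label, metrics[label])
```

Precision is computed down a column and recall across a row, which is why the two can differ for the same class.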
More Results
  After you have run the J48 classifier, you will have an entry in the Result list, which says right-click for options.
  Choose Visualize tree.
    What is the first decision?
    What is the smallest leaf size?
Seeing your data
  For a loaded file, the Preprocess tab shows information about the data:
    number of instances and attributes
    histograms of each attribute paired with the class distribution
    statistics for each attribute
  For iris:
    How many instances are there? How many attributes?
    What are the mean and standard deviation for sepallength?
    Look at the histograms for all attributes paired with class. Which looks like a reasonable first choice for a decision tree? Which did Weka choose?
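The per-attribute statistics can be reproduced by hand. The sketch below computes count, minimum, maximum, mean and standard deviation for one numeric column in Python. It assumes the sample standard deviation (dividing by n - 1); check the result against the Preprocess tab before relying on that. The function name `numeric_summary` and the values are made up and are not taken from the iris file.

```python
import statistics


def numeric_summary(values):
    """Summarize one numeric attribute."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),  # sample standard deviation (n - 1)
    }


# Hypothetical measurements, for illustration only.
sample_values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
summary = numeric_summary(sample_values)
print(summary)
```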
Explore Some More
  Load and examine one of the other datasets that are included with Weka.
    What did you choose?
    What attributes does it have? What kind?
  You can see the actual data from Weka by choosing Edit from the Preprocess tab.
    For iris, what are the attribute headers?
    For your other dataset, what are the headers?
Weka Data Format
  Weka uses a data format called ARFF: Attribute-Relation File Format.
  It's text; you can look at it in an editor (or create it there).
  Find the data directory in Weka and open the iris file.
  It should have two sections, Header and Data.
ARFF Format
  Header Section: information about the data
    the name of the relation
    a list of the attributes (the columns in the data)
    their types
  Data Section
    comma-separated list, one line per instance
  Comments
    Begin with %
    Good idea to describe class, source, sometimes meanings of attributes
Header Section
  @RELATION declaration: names what we are talking about. String. Quote it if it includes spaces.
    @RELATION iris
  @ATTRIBUTE declarations: name each attribute and give its type. One per attribute, including the class. The name must start with a letter. Quote it if it includes spaces.
    @ATTRIBUTE sepallength NUMERIC
    @ATTRIBUTE petalwidth NUMERIC
    @ATTRIBUTE class {Iris-setosa,Iris-versicolor,Iris-virginica}
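To make the header structure concrete, here is a minimal Python sketch that reads the @RELATION and @ATTRIBUTE declarations out of ARFF text. It is not Weka's own parser: the function `parse_arff_header` is hypothetical, and it only handles one declaration per line, % comments, and simple quoted names.

```python
import re

# Attribute name is either quoted or a run of non-space characters;
# everything after it is the type specification.
ATTRIBUTE_RE = re.compile(
    r'''@attribute\s+('[^']*'|"[^"]*"|\S+)\s+(.+)''', re.IGNORECASE
)


def parse_arff_header(text):
    """Return (relation_name, [(attribute_name, type_spec), ...]).

    type_spec is a list of values for nominal attributes,
    otherwise the type text as written (e.g. "NUMERIC").
    """
    relation = None
    attributes = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and comments
        lowered = line.lower()
        if lowered.startswith("@relation"):
            relation = line.split(None, 1)[1].strip().strip("'\"")
        elif lowered.startswith("@attribute"):
            match = ATTRIBUTE_RE.match(line)
            if match is None:
                raise ValueError("cannot parse: " + line)
            name = match.group(1).strip("'\"")
            spec = match.group(2).strip()
            if spec.startswith("{") and spec.endswith("}"):
                values = [v.strip() for v in spec[1:-1].split(",")]
                attributes.append((name, values))
            else:
                attributes.append((name, spec))
        elif lowered.startswith("@data"):
            break  # the header ends where the data section begins
    return relation, attributes


header_text = """% Header modeled on the declarations shown on this slide
@RELATION iris
@ATTRIBUTE sepallength NUMERIC
@ATTRIBUTE petalwidth NUMERIC
@ATTRIBUTE class {Iris-setosa,Iris-versicolor,Iris-virginica}
@DATA
"""

relation, attributes = parse_arff_header(header_text)
print(relation)
for name, spec in attributes:
    print(name, spec)
```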
Attribute Types
  Numeric. Can be real or integer.
    @ATTRIBUTE sepallength NUMERIC
  Nominal: list the allowed values in braces {}
    @ATTRIBUTE color {red, green, blue}
    @ATTRIBUTE class {versicolor, setosa}
  String: arbitrary text
    @ATTRIBUTE emailbody string
  Date. Give the date format.
    @ATTRIBUTE timestamp DATE "yyyy-MM-dd"
Data section
  @DATA
  One line per instance, comma separated. Quote any value that contains spaces.
  Example: for attributes
    @ATTRIBUTE sepallength NUMERIC
    @ATTRIBUTE class {setosa, versicolor}
    @ATTRIBUTE description STRING
    @ATTRIBUTE timestamp DATE "yyyy MM dd"
  we might have instances
    5.1, setosa, 'Lovely big flowers', '2014 09 10'
    4.9, setosa, Nice, '2014 06 03'
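As a rough illustration of how one data line maps onto the declared attribute types, the Python sketch below splits each instance into fields and converts them. It assumes values containing spaces are single-quoted, as ARFF requires, and that the date pattern yyyy MM dd corresponds to the strptime format "%Y %m %d". The function `parse_instances` is hypothetical and hard-codes the four attributes from this slide.

```python
import csv
from datetime import date, datetime


def parse_instances(lines):
    """Convert data lines into (float, str, str, date) tuples."""
    # quotechar="'" keeps quoted values with spaces together;
    # skipinitialspace drops the blank after each comma.
    reader = csv.reader(lines, quotechar="'", skipinitialspace=True)
    instances = []
    for fields in reader:
        if not fields:
            continue  # ignore blank lines
        sepallength, cls, description, timestamp = fields
        instances.append((
            float(sepallength),                               # NUMERIC
            cls,                                              # nominal value
            description,                                      # STRING
            datetime.strptime(timestamp, "%Y %m %d").date(),  # DATE
        ))
    return instances


data_lines = [
    "5.1, setosa, 'Lovely big flowers', '2014 09 10'",
    "4.9, setosa, Nice, '2014 06 03'",
]

instances = parse_instances(data_lines)
for instance in instances:
    print(instance)
```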
Examples
  Look at some different files from the Weka data directory:
    Iris. Detailed, very nice comments. Numeric and nominal attributes.
    Weather, nominal. No comments, all nominal.
    Reuters. A string attribute.
Creating an ARFF file
  The syllabus has a link to the restaurant data as a .csv file.
  Download it and convert it into ARFF format.
  Run J48 on it. How does the tree compare to the one given in the presentation earlier today?
  There is an obvious problem if you just add the format information and run J48: this will include example as an attribute.
  In the Preprocess tab, use the Remove button below the list of attributes to remove example and try J48 again.
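The hand conversion can also be scripted. The Python sketch below turns CSV text with a header row into ARFF text, declaring a column NUMERIC when every value parses as a number and nominal otherwise, and optionally dropping columns such as example. The function `csv_to_arff` and the column names and rows in the sample are assumptions for illustration; they are not the actual restaurant file.

```python
import csv
import io


def _is_number(text):
    try:
        float(text)
        return True
    except ValueError:
        return False


def _quote(value):
    # ARFF names and values containing spaces must be quoted.
    return "'" + value + "'" if " " in value else value


def csv_to_arff(csv_text, relation, drop=()):
    """Convert CSV text (first row = column names) into ARFF text."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, records = rows[0], rows[1:]
    keep = [i for i, name in enumerate(header) if name not in drop]

    lines = ["@RELATION " + _quote(relation), ""]
    for i in keep:
        column = [record[i] for record in records]
        if all(_is_number(v) for v in column):
            spec = "NUMERIC"
        else:
            # Nominal: list the distinct values seen in the column.
            spec = "{" + ",".join(_quote(v) for v in sorted(set(column))) + "}"
        lines.append("@ATTRIBUTE " + _quote(header[i]) + " " + spec)

    lines += ["", "@DATA"]
    for record in records:
        lines.append(",".join(_quote(record[i]) for i in keep))
    return "\n".join(lines) + "\n"


# Hypothetical stand-in for the restaurant CSV.
sample_csv = (
    "example,patrons,hungry,willwait\n"
    "x1,Some,yes,yes\n"
    "x2,Full,yes,no\n"
    "x3,None,no,no\n"
)

arff_text = csv_to_arff(sample_csv, "restaurant", drop=("example",))
print(arff_text)
```

Dropping example before writing the file has the same effect as using the Remove button: an identifier that is unique to each row is not a useful attribute for the tree to split on.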
Importing
  We don't actually have to go to the trouble of converting by hand.
  In Preprocess, for Open File, at the bottom of the Open window there is a File Format: choice.
  Choose CSV and import the original restaurant file.
  How does it look compared to the one you modified by hand?
There is a lot more
  We will look at a few more of the basic tools in Weka next lab.
  There is far more than we will get to. Feel free to explore.