CS 8520: Artificial Intelligence. Weka Lab. Paula Matuszek, Fall 2015


1 CS 8520: Artificial Intelligence. Weka Lab. Paula Matuszek, Fall 2015

2 Weka
- Weka is the Waikato Environment for Knowledge Analysis.
- Machine learning software suite from the University of Waikato.
- Under development for 20+ years.
- Well developed, maintained, and supported.
- Open source.
- Windows, Mac, and Unix versions.
- Lots of help available at the wiki.

3 Weka
- Weka is a very rich tool.
- Many classifiers, clusterers, etc.
- Many options for each algorithm.
- Many tools for modifying the attributes.
- Many meta-tools for comparing classifiers, generating models, etc.
- We are going to ignore most of it. This is a getting-started exploration.
- Weka's defaults are generally reasonable.

4 A First Classifier
For the first activity we are going to classify irises into three types, using a decision tree. The Weka version of Quinlan's C4.5 algorithm is called J48. Go through the five steps of the tutorial. Note the accuracy, precision, recall, F-measure, and confusion matrix.
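To make the reported numbers concrete, here is a minimal sketch of how accuracy and per-class precision, recall, and F-measure follow from a confusion matrix. It assumes the layout Weka prints (rows are actual classes, columns are predicted classes). The function name `per_class_metrics` is hypothetical and the counts are made up for illustration; they are not actual J48 output.

```python
def per_class_metrics(confusion, labels):
    """Compute accuracy plus per-class precision, recall, and F-measure.

    confusion[i][j] = number of instances of actual class i predicted as class j.
    """
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(labels)))
    results = {"accuracy": correct / total}
    for i, label in enumerate(labels):
        true_positives = confusion[i][i]
        predicted_as_label = sum(row[i] for row in confusion)  # column sum
        actually_label = sum(confusion[i])                      # row sum
        precision = true_positives / predicted_as_label if predicted_as_label else 0.0
        recall = true_positives / actually_label if actually_label else 0.0
        denominator = precision + recall
        f_measure = 2 * precision * recall / denominator if denominator else 0.0
        results[label] = {
            "precision": precision,
            "recall": recall,
            "f_measure": f_measure,
        }
    return results


# Made-up counts for illustration only (not real Weka output).
labels = ["Iris-setosa", "Iris-versicolor", "Iris-virginica"]
confusion = [
    [50, 0, 0],
    [0, 45, 5],
    [0, 3, 47],
]
metrics = per_class_metrics(confusion, labels)
for name in labels:
    print(name, metrics[name])
print("accuracy", metrics["accuracy"])
```

Comparing these hand-computed values with the ones in Weka's output for your own run is a quick way to check that you are reading the confusion matrix in the right orientation.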

5 More Results
After you have run the J48 classifier, you will have an entry in the Result list, which says "right-click for options". Choose "Visualize tree". What is the first decision? What is the smallest leaf size?

6 Seeing your data
For a loaded file, the Preprocess tab shows information about the data: the number of instances and attributes, histograms of the class distribution for each attribute, and statistics for each attribute.
For iris:
- How many instances are there? How many attributes?
- What are the mean and standard deviation for sepallength?
- Look at the histograms for all attributes paired with class. Which looks like a reasonable first choice for a decision tree? Which did Weka choose?
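The per-attribute statistics and class counts that the Preprocess tab shows can be reproduced by hand. The sketch below uses Python's standard `statistics` module on a small made-up sample, not the real iris file. It assumes the sample standard deviation (n - 1 denominator); check against Weka's own figures before relying on that convention.

```python
import statistics
from collections import Counter

# Made-up (sepallength, class) pairs for illustration; not the real iris data.
rows = [
    (5.0, "Iris-setosa"),
    (5.2, "Iris-setosa"),
    (6.0, "Iris-versicolor"),
    (6.4, "Iris-versicolor"),
    (6.9, "Iris-virginica"),
    (7.1, "Iris-virginica"),
]

values = [value for value, _ in rows]
mean = statistics.mean(values)
stdev = statistics.stdev(values)  # sample standard deviation (assumption)
class_counts = Counter(label for _, label in rows)

print("instances:", len(rows))
print("min / max:", min(values), max(values))
print("mean:", mean)
print("standard deviation:", stdev)
print("class distribution:", dict(class_counts))
```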

7 Explore Some More
Load and examine one of the other datasets that are included with Weka. What did you choose? What attributes does it have? What kind?
You can see the actual data from Weka by choosing Edit from the Preprocess tab. For iris, what are the attribute headers? For your other dataset, what are the headers?

8 Weka Data Format
Weka uses a data format called ARFF (Attribute-Relation File Format). It's text; you can look at it in an editor (or create it there). Find the data directory in Weka and open the iris file. It should have two sections, Header and Data.

9 ARFF Format
- Header section: information about the data
  - the name of the relation
  - a list of the attributes (the columns in the data)
  - their types
- Data section: comma-separated list, one line per instance
- Comments
  - Begin with %
  - Good idea to describe the class, the source, and sometimes the meanings of attributes

10 Header
- @relation declaration: names what we are talking about. It is a string; quote it if it includes spaces.
- @attribute declarations: name each attribute and give its type. One per attribute, including the class. The name must start with a letter; quote it if it includes spaces.
- Example (iris): sepallength, petalwidth, class {Iris-setosa,Iris-versicolor,Iris-virginica}

11 Attribute Types
- Numeric. Can be real or integer. @attribute sepallength NUMERIC
- Nominal specification: a list of named values. @attribute color {red, green, ...}; @attribute class {versicolor, setosa}
- String: arbitrary text. @attribute body string
- Date. Give the date format. @attribute timestamp DATE "yyyy-MM-dd"

12 Data
One line per instance, comma separated.
Example: for the attributes sepallength, class {setosa, ...}, description, and timestamp (DATE, yyyy MM dd), we might have instances such as:
5.1, setosa, Lovely big flowers, ...
..., setosa, Nice, ...
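The header and data sections described above can be illustrated with a deliberately simplified reader. This is a sketch, not Weka's parser. The function `parse_arff` and the tiny `SAMPLE_ARFF` file are hypothetical, and the code assumes unquoted names and values with no embedded spaces or commas.

```python
SAMPLE_ARFF = """\
% Tiny hand-written sample in the style of the iris file
@relation iris_sample
@attribute sepallength NUMERIC
@attribute petalwidth NUMERIC
@attribute class {Iris-setosa,Iris-versicolor,Iris-virginica}
@data
5.1,0.2,Iris-setosa
7.0,1.4,Iris-versicolor
"""


def parse_arff(text):
    """Split simplified ARFF text into (relation, attributes, rows).

    Assumes unquoted names and values; real ARFF also allows quoting,
    sparse rows, and missing values, which this sketch ignores.
    """
    relation = None
    attributes = []  # list of (name, type specification) pairs
    rows = []        # list of lists of raw string values
    in_data = False
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and comments
        if in_data:
            rows.append([field.strip() for field in line.split(",")])
            continue
        keyword, _, rest = line.partition(" ")
        keyword = keyword.lower()  # ARFF keywords are case-insensitive
        if keyword == "@relation":
            relation = rest.strip()
        elif keyword == "@attribute":
            name, _, type_spec = rest.strip().partition(" ")
            attributes.append((name, type_spec.strip()))
        elif keyword == "@data":
            in_data = True
    return relation, attributes, rows


relation, attributes, rows = parse_arff(SAMPLE_ARFF)
print(relation)
for name, type_spec in attributes:
    print(name, "->", type_spec)
print(rows)
```

Note that the class is declared like any other attribute; it is simply the last column of each data line.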

13 Examples
Look at some different files from the Weka data directory:
- Iris: detailed, very nice comments. Numeric and nominal attributes.
- Weather, nominal: no comments, all nominal.
- Reuters: has a string attribute.

14 Creating an ARFF file
The syllabus has a link to the restaurant data as a .csv file. Download it and convert it into ARFF format. Run J48 on it. How does the tree compare to the one given in the presentation earlier today?
There is an obvious problem if you just add the format information and run J48: this will include "example" as an attribute. In the Preprocess tab, use the Remove button below the list of attributes to remove "example" and try J48 again.
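The hand conversion can also be scripted. The sketch below treats every column as nominal, collects each column's values from the data, and optionally drops an identifier column such as "example". The function `csv_to_arff`, the column names, and the three sample rows are hypothetical stand-ins, not the actual restaurant file from the syllabus.

```python
import csv
import io

# Hypothetical miniature CSV; the real restaurant file has more columns and rows.
SAMPLE_CSV = """\
example,patrons,hungry,willwait
x1,Some,Yes,Yes
x2,Full,Yes,No
x3,None,No,No
"""


def csv_to_arff(csv_text, relation, drop=()):
    """Convert CSV text to ARFF text, declaring every kept column as nominal.

    Assumes the first CSV line holds the attribute names and that names and
    values contain no spaces or commas (so no quoting is needed).
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    rows = [row for row in reader if row]  # ignore blank lines
    keep = [i for i, name in enumerate(header) if name not in drop]

    lines = [f"@relation {relation}", ""]
    for i in keep:
        values = sorted({row[i] for row in rows})
        lines.append(f"@attribute {header[i]} {{{','.join(values)}}}")
    lines += ["", "@data"]
    for row in rows:
        lines.append(",".join(row[i] for i in keep))
    return "\n".join(lines) + "\n"


arff = csv_to_arff(SAMPLE_CSV, "restaurant", drop=("example",))
print(arff)
```

Dropping the identifier column in the script has the same effect as using the Remove button in the Preprocess tab: the tree can no longer split on a value that is unique to each row.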

15 Importing
We don't actually have to go to the trouble of converting by hand. In Preprocess, for Open File, at the bottom of the Open window there is a "File Format:" choice. Choose CSV and import the original restaurant file. How does it look compared to the one you modified by hand?

16 There is a lot more
We will look at a few more of the basic tools in Weka next lab. There is far more than we will get to. Feel free to explore.


More information

Variables are used to store data (numbers, letters, etc) in MATLAB. There are a few rules that must be followed when creating variables in MATLAB:

Variables are used to store data (numbers, letters, etc) in MATLAB. There are a few rules that must be followed when creating variables in MATLAB: Contents VARIABLES... 1 Storing Numerical Data... 2 Limits on Numerical Data... 6 Storing Character Strings... 8 Logical Variables... 9 MATLAB S BUILT-IN VARIABLES AND FUNCTIONS... 9 GETTING HELP IN MATLAB...

More information

Weka VotedPerceptron & Attribute Transformation (1)

Weka VotedPerceptron & Attribute Transformation (1) Weka VotedPerceptron & Attribute Transformation (1) Lab6 (in- class): 5 DIC 2016-13:15-15:00 (CHOMSKY) ACKNOWLEDGEMENTS: INFORMATION, EXAMPLES AND TASKS IN THIS LAB COME FROM SEVERAL WEB SOURCES. Learning

More information

Machine Learning Practical NITP Summer Course Pamela K. Douglas UCLA Semel Institute

Machine Learning Practical NITP Summer Course Pamela K. Douglas UCLA Semel Institute Machine Learning Practical NITP Summer Course 2013 Pamela K. Douglas UCLA Semel Institute Email: pamelita@g.ucla.edu Topics Covered Part I: WEKA Basics J Part II: MONK Data Set & Feature Selection (from

More information

The main differences with other open source reporting solutions such as JasperReports or mondrian are:

The main differences with other open source reporting solutions such as JasperReports or mondrian are: WYSIWYG Reporting Including Introduction: Content at a glance. Create A New Report: Steps to start the creation of a new report. Manage Data Blocks: Add, edit or remove data blocks in a report. General

More information

Attribute Discretization and Selection. Clustering. NIKOLA MILIKIĆ UROŠ KRČADINAC

Attribute Discretization and Selection. Clustering. NIKOLA MILIKIĆ UROŠ KRČADINAC Attribute Discretization and Selection Clustering NIKOLA MILIKIĆ nikola.milikic@fon.bg.ac.rs UROŠ KRČADINAC uros@krcadinac.com Naive Bayes Features Intended primarily for the work with nominal attributes

More information

COMP s1 Lecture 1

COMP s1 Lecture 1 COMP1511 18s1 Lecture 1 1 Numbers In, Numbers Out Andrew Bennett more printf variables scanf 2 Before we begin introduce yourself to the person sitting next to you why did

More information

Dalhousie University CSCI 2132 Software Development Winter 2018 Lab 2, January 25

Dalhousie University CSCI 2132 Software Development Winter 2018 Lab 2, January 25 Dalhousie University CSCI 2132 Software Development Winter 2018 Lab 2, January 25 In this lab, you will first learn autocompletion, a feature of the Bash shell. You will also learn more about the command

More information

COMP33111: Tutorial/lab exercise 2

COMP33111: Tutorial/lab exercise 2 COMP33111: Tutorial/lab exercise 2 Part 1: Data cleaning, profiling and warehousing Note: use lecture slides and additional materials (see Blackboard and COMP33111 web page). 1. Explain why legacy data

More information

ECO375 Tutorial 1 Introduction to Stata

ECO375 Tutorial 1 Introduction to Stata ECO375 Tutorial 1 Introduction to Stata Matt Tudball University of Toronto Mississauga September 14, 2017 Matt Tudball (University of Toronto) ECO375H5 September 14, 2017 1 / 25 What Is Stata? Stata is

More information

Page 1. Graphical and Numerical Statistics

Page 1. Graphical and Numerical Statistics TOPIC: Description Statistics In this tutorial, we show how to use MINITAB to produce descriptive statistics, both graphical and numerical, for an existing MINITAB dataset. The example data come from Exercise

More information

Tutorial for the R Statistical Package

Tutorial for the R Statistical Package Tutorial for the R Statistical Package University of Colorado Denver Stephanie Santorico Mark Shin Contents 1 Basics 2 2 Importing Data 10 3 Basic Analysis 14 4 Plotting 22 5 Installing Packages 29 This

More information

TUBE: Command Line Program Calls

TUBE: Command Line Program Calls TUBE: Command Line Program Calls March 15, 2009 Contents 1 Command Line Program Calls 1 2 Program Calls Used in Application Discretization 2 2.1 Drawing Histograms........................ 2 2.2 Discretizing.............................

More information

MeltLab Reporting Text, CSV or Excel

MeltLab Reporting Text, CSV or Excel MeltLab Reporting Text, CSV or Excel Graphic Statistical Process Control by MeltLab Systems 844-MeltLab www.meltlab.com Fast Accurate Comprehensive Setting up MeltLab Reporting for ASCII ASCII reporting

More information

BaSICS OF excel By: Steven 10.1

BaSICS OF excel By: Steven 10.1 BaSICS OF excel By: Steven 10.1 Workbook 1 workbook is made out of spreadsheet files. You can add it by going to (File > New Workbook). Cell Each & every rectangular box in a spreadsheet is referred as

More information

molegro data modeller

molegro data modeller molegro data modeller user manual MDM 2010.2.5 for Windows, Linux, and Mac OS X copyright molegro 2007-2010 page 2/173 Molegro ApS Copyright 2007-2010 Molegro ApS. All rights reserved. Molegro Data Modeller

More information

An Introduction to Cluster Analysis. Zhaoxia Yu Department of Statistics Vice Chair of Undergraduate Affairs

An Introduction to Cluster Analysis. Zhaoxia Yu Department of Statistics Vice Chair of Undergraduate Affairs An Introduction to Cluster Analysis Zhaoxia Yu Department of Statistics Vice Chair of Undergraduate Affairs zhaoxia@ics.uci.edu 1 What can you say about the figure? signal C 0.0 0.5 1.0 1500 subjects Two

More information

A Tour of Sweave. Max Kuhn. March 14, Pfizer Global R&D Non Clinical Statistics Groton

A Tour of Sweave. Max Kuhn. March 14, Pfizer Global R&D Non Clinical Statistics Groton A Tour of Sweave Max Kuhn Pfizer Global R&D Non Clinical Statistics Groton March 14, 2011 Creating Data Analysis Reports For most projects where we need a written record of our work, creating the report

More information

In stochastic gradient descent implementations, the fixed learning rate η is often replaced by an adaptive learning rate that decreases over time,

In stochastic gradient descent implementations, the fixed learning rate η is often replaced by an adaptive learning rate that decreases over time, Chapter 2 Although stochastic gradient descent can be considered as an approximation of gradient descent, it typically reaches convergence much faster because of the more frequent weight updates. Since

More information