CS451 - Assignment 3 Perceptron Learning Algorithm


Due: Sunday, September 29 by 11:59pm

For this assignment we will implement some of the perceptron learning algorithm variations and compare both their performance and their run-times. As always, make sure to read through the entire handout before starting.

You may (and I would strongly encourage you to) work with a partner on this assignment. If you do, you must both be there whenever you are working on the project. If you'd like a partner for the assignment, e-mail me ASAP and I can try to pair people up.

1 Requirements

Implement the two perceptron learning algorithms we talked about in class:

1. Implement a class called PerceptronClassifier that implements the Classifier interface. Your implementation must:

   - Follow the basic algorithm outlined in class (and in the book), updating the weights when a mistake is made. You do NOT need to check for convergence; we will stop only when we have reached the iteration limit. (A rough sketch of one possible train loop is included at the end of this section.)
   - Include an empty constructor.
   - To figure out which weights you'll need in your train method, you can call the getAllFeatureIndices() method on the data set that is passed in. The features in the returned set define the features that your classifier will be learning over, i.e. the weights.

   - Initially, all of the weights and the b-value should start out at 0.
   - Include a method called setIterations that takes an integer and sets the number of iterations to make over the data. By default, the number of iterations should be set to 10.
   - Include a toString method that returns the weights and the b-value of the classifier as a string in the following format:

        0:weight_0 1:weight_1 2:weight_2 ... n:weight_n b-value

     that is, <feature_number>:<weight>. The features should be in increasing order.

2. Implement a class called AveragePerceptronClassifier that implements the Classifier interface. This class should follow the same specification as the basic perceptron classifier (including the constructor, setIterations and toString). However, in addition to recalculating the weight vector each time a mistake is made, it should also keep track of an average weight vector that is the weighted average of all of the weight vectors seen so far (called the average perceptron algorithm in the book and slides). The pseudocode in the book for this algorithm is NOT correct, so I have included a correct version here:

   w = [0,0,...,0]   // normal weights, one for each feature
   b = 0             // normal b
   u = [0,0,...,0]   // aggregate weights, one for each feature
   b2 = 0            // aggregate b
   updated = 0

   for iter = 1 ... MaxIter {
      for all (x, label) in dataset {
         if (label * (w . x + b) <= 0) {   // if we misclassify example x using weights w
            // update our final, weighted weights
            for each u_i in u: u_i = u_i + updated * w_i
            b2 = b2 + updated * b

            // update all the perceptron weights
            for each w_i in w: w_i = w_i + label * x_i
            b = b + label

            updated = 0
         } // end of misclassify
         updated = updated + 1
      } // end of dataset loop
   } // end of outer loop

   // do one last weighted update here of the u and b2 weights based on the final weights

   // divide all of the aggregate weights by the total number of examples
   for each u_i in u: u_i = u_i / size(dataset)
   b2 = b2 / size(dataset)

   return (u, b2)

This version works by keeping a running total in updated of the number of examples the current weights have gotten correct. When the weights make a mistake, we add to our aggregate weights (u and b2) a weighted copy of the current weight vector. At the end, we normalize the aggregate weights by the total number of examples. This approach can suffer from overflow of the weights in some rare circumstances, but it is easier to implement.
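To make the structure of requirement 1 concrete, here is a minimal, hypothetical Java sketch of what the basic perceptron train loop could look like. Only getAllFeatureIndices() and the update rule come from this handout; the Example class, its getFeature and getLabel methods, the getData() method on the data set, and the exact train signature are assumptions about the starter code, so adapt the details to the actual interfaces. The averaged version simply adds the u/b2 bookkeeping shown in the pseudocode above.

   // Hypothetical sketch only: Example, getData(), getFeature(), getLabel() and
   // the train signature are assumptions about the starter code, not its actual API.
   // (Assumes java.util.HashMap, ArrayList and Collections are imported.)
   public void train(DataSet data) {
       // one weight per feature, with every weight and b starting at 0
       HashMap<Integer, Double> weights = new HashMap<Integer, Double>();
       for (Integer featureIndex : data.getAllFeatureIndices()) {
           weights.put(featureIndex, 0.0);
       }
       double b = 0.0;

       // iterations is the field set by setIterations (default 10)
       for (int iter = 0; iter < iterations; iter++) {
           ArrayList<Example> examples = new ArrayList<Example>(data.getData());
           Collections.shuffle(examples); // randomize the example order each iteration

           for (Example example : examples) {
               double label = example.getLabel(); // +1 or -1

               // prediction = w . x + b
               double prediction = b;
               for (Integer featureIndex : weights.keySet()) {
                   prediction += weights.get(featureIndex) * example.getFeature(featureIndex);
               }

               // mistake: update every weight and b
               if (label * prediction <= 0) {
                   for (Integer featureIndex : weights.keySet()) {
                       weights.put(featureIndex, weights.get(featureIndex)
                               + label * example.getFeature(featureIndex));
                   }
                   b += label;
               }
           }
       }
       // save weights and b in fields so that classify and toString can use them
   }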

2 Starter Code

I have again included code to get you started at:

   http://www.cs.middlebury.edu/~dkauchak/classes/cs451/assignments/assign3/assign3-starter.tar.gz

The starter code includes all of the same code as last time, with the following changes:

- I have changed the DataSet class to also include a constructor that takes just a CSV file and does NOT require the index of the label. This constructor assumes that the label is the last column.
- I have included a class called ClassifierTimer in the ml.classifier package. This class implements functionality for timing how long it takes a classifier to train and test.

3 Data

At:

   http://www.cs.middlebury.edu/~dkauchak/classes/cs451/assignments/assign3/data/

I have included three .csv files for use on this assignment:

- titanic-train.perc.csv: The same training data as our last assignment, but now the labels are 1 for positive and -1 for negative.
- simple1.csv: The first example perceptron learning algorithm walkthrough from the notes.
- simple2.csv: The second example perceptron learning algorithm walkthrough from the notes.

4 Evaluation

Once you have your two perceptron classifiers working, let's see how well they do! Answer the following questions and put the answers in a file called experiments (pick some reasonable file type). Explicitly label each answer with the question number.

1. What is the accuracy of the two perceptron classifiers on the Titanic data set? To calculate this, generate a random 80/20 split (using dataset.split(0.8)), train the model on the 80% fraction, and then evaluate the accuracy on the 20% fraction. Repeat this 100 times and average the results (hint: do the repetition in code :). A sketch of one possible evaluation loop appears after this list.

2. What is the best iteration limit to use for this data? To answer this, do the same calculation as above (average 100 experiments), but do it for increasing iteration limits.

3. Do we see overfitting with this data set? Repeat the experiment from question 2 and calculate the accuracy this time on both the testing data (like before) and the training data. How does this compare to your results for decision trees? Does this make sense? Include a 2-3 sentence explanation.

4. Use the ClassifierTimer class to compare the run-time performance of your classifiers. Do the results make sense? Include a 1-2 sentence explanation.
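As a rough illustration of the repeated-split evaluation in question 1, here is a hypothetical sketch. The handout only specifies dataset.split(0.8) and the CSV constructor; the DataSetSplit type, its getTrain()/getTest() methods, and the train/classify/getLabel calls are guesses about the starter code's API, so adjust them to whatever the starter code actually provides.

   // Hypothetical evaluation sketch: DataSetSplit, getTrain(), getTest(), getData(),
   // train(), classify() and getLabel() are assumptions about the starter code.
   DataSet dataset = new DataSet("titanic-train.perc.csv"); // CSV constructor, label in last column
   int numRuns = 100;
   double accuracySum = 0.0;

   for (int run = 0; run < numRuns; run++) {
       DataSetSplit split = dataset.split(0.8); // random 80/20 split

       PerceptronClassifier classifier = new PerceptronClassifier();
       classifier.setIterations(10);
       classifier.train(split.getTrain());

       // accuracy = fraction of test examples whose predicted label matches the true label
       int correct = 0;
       int total = 0;
       for (Example example : split.getTest().getData()) {
           if (classifier.classify(example) == example.getLabel()) {
               correct++;
           }
           total++;
       }
       accuracySum += (double) correct / total;
   }

   System.out.println("average accuracy: " + accuracySum / numRuns);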

5 Hints/Advice

I have included two very basic CSV files where you know what the answer at each step of the algorithm should be. I strongly encourage you to check your results on these first. Because of the simplicity of the perceptron learning algorithm, it can be challenging to make sure your code is working correctly. To compare against the results in the slides, you'll need to fix b to zero. To do this, just comment out the line in your code that updates b. In addition, you'll need to start the weight vector at 1,0 (instead of 0,0). Once you've done this, your answers should be identical to those on the slides.

As always, test/debug incrementally!!! Do NOT simply try to code the whole thing up and then start debugging from there. Break the problem into smaller subproblems/submethods, implement them one at a time, and test each one before moving on. I know this seems like it takes more time, but the time you put in early on will more than make up for it down the road, both in time and grief.

Don't forget that you need to randomize the order in which you traverse your examples before each iteration.

6 Extra Credit

For those who would like to experiment (and push themselves) a bit more (and, of course, get a bit of extra credit), you can try out some of these extra credit options. If you try any of these options, include an extra file called extra.txt that describes what extra credit you did.

- Implement a class called VotedPerceptronClassifier that implements the Classifier interface. Your implementation:
   - Should have the same methods as the basic perceptron classifier (i.e. the same constructor and setIterations method); however, you don't have to implement the toString method.
   - Should keep track of all of the weight vectors found during training along with their votes (i.e. how many examples were correctly classified by that weight vector). The classify method should then calculate a voted majority.

   If you go this route, also include its classification performance in your writeup, along with its train/test timing.

- The right way to implement the AveragePerceptronClassifier is to extend the basic PerceptronClassifier class and only alter the train method. For extra credit, implement it this way. To do this, you must minimize the amount of repeated code and maximize the amount of shared code (see the sketch after this list for one possible structure).

- Find another dataset with at least 5 features and 500 examples and provide a comparison of your perceptron algorithms on it.
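For the extension-based extra credit option, one hypothetical way to organize the classes is sketched below. The field names, method signatures, and return types here are illustrative assumptions rather than the starter code's actual API; the point is simply that the averaged version reuses the parent's constructor, setIterations, classify, and toString, and overrides only train.

   // Hypothetical structure only: field names and method signatures are illustrative
   // guesses, and each class would live in its own .java file.
   public class PerceptronClassifier implements Classifier {
       // protected so that subclasses can reuse and update them
       protected HashMap<Integer, Double> weights = new HashMap<Integer, Double>();
       protected double b = 0.0;
       protected int iterations = 10;

       public PerceptronClassifier() {}

       public void setIterations(int iterations) { this.iterations = iterations; }

       public void train(DataSet data) { /* basic perceptron updates */ }

       public double classify(Example example) { /* sign of w . x + b */ return 0.0; }

       public String toString() { /* "0:w_0 1:w_1 ... n:w_n b" format */ return ""; }
   }

   public class AveragePerceptronClassifier extends PerceptronClassifier {
       // only the training procedure changes: it additionally maintains the aggregate
       // weights (u and b2) from the pseudocode above and stores the averaged weights
       // back into weights and b when training finishes, so classify and toString
       // are inherited unchanged
       @Override
       public void train(DataSet data) { /* averaged perceptron updates */ }
   }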

7 When You're Done

Make sure that your code compiles, that your files are named as specified, and that you have followed the specifications exactly (i.e. method names, number of parameters, etc.).

Create a directory with your last name followed by the assignment number; for example, for this assignment mine would be kauchak3. If you worked with a partner, put both last names. Inside this directory, create a code directory and copy all of your code into it, maintaining the package structure. Finally, also include your experiments file with the answers from Section 4 (Evaluation). tar then gzip this folder and submit that file on the submission page on the course web page.

Commenting and code style

Your code should be commented appropriately (though you don't need to go overboard). The most important things:

- Your name (or names) and the assignment number should be at the top of each file.
- Each class and method should have an appropriate docstring.
- If anything is complicated, it should include some comments.

There are many possible ways to approach this problem, which makes code style and comments very important here so that I can understand what you did. For this reason, you will lose points for poorly commented or poorly organized code.