THE HONG KONG UNIVERSITY OF SCIENCE & TECHNOLOGY
Department of Computer Science
COMP328: Machine Learning, Fall 2006
Assignment 2

Due Date and Time: November 14 (Tue) 2006, 1:00pm.

NOTE: Your grade will be based on correctness, efficiency and clarity.

Task: Nearest Neighbor Classification

In this assignment, you are required to implement a nearest-neighbor classifier and run it on a 2-dimensional toy dataset. You have to experiment with different distance measures, different values of k, and the use of a local fitting scheme, and observe their influence on the classification performance.

1  2D Toy Data

The training data has two classes (red and blue) and is shown in Figure 1. The red moon-shaped class contains 2,152 points, while the blue class contains 2,444 samples. Obviously, the two classes are not linearly separable. The test data is drawn from a regular grid on the 2D plane, {(x, y) : 0.5 ≤ x ≤ 2.5, -1 ≤ y ≤ 0}, with a grid width of 0.02 (Figure 2).

[Figure 1: The 2D training set.]

[Figure 2: The test data.]

The coordinates of the training samples are stored in the file data.txt, and those of the test samples in test.txt, with the following format:

    line 1 ... I:   [class label (+1/-1)]  [x coordinate (double)]  [y coordinate (double)]
    line I+1:       -1   (this is the end marker)

Here, I is the number of training samples.
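
To make the format concrete, the following minimal C++ sketch reads a file in this format. The helper name, the vector containers, and treating a line that contains only -1 as the end marker are assumptions of this sketch, not requirements of the assignment.

    #include <fstream>
    #include <sstream>
    #include <string>
    #include <vector>

    // Minimal sketch (assumed helper): reads "label x y" lines and stops at a
    // line containing only the -1 end marker. A -1 that is followed by two
    // coordinates is treated as an ordinary negative-class sample.
    bool read_points(const char* filename,
                     std::vector<double>& xs, std::vector<double>& ys,
                     std::vector<int>& labels) {
        std::ifstream in(filename);
        if (!in) return false;
        std::string line;
        while (std::getline(in, line)) {
            std::istringstream ss(line);
            int label;
            double x, y;
            if (!(ss >> label)) continue;        // skip blank lines
            if (!(ss >> x >> y)) {
                if (label == -1) break;          // bare -1: end marker
                return false;                    // otherwise the file is malformed
            }
            labels.push_back(label);
            xs.push_back(x);
            ys.push_back(y);
        }
        return true;
    }

The class interface in Section 3.3 stores coordinates in double** arrays, so data read this way still has to be copied into (or read directly into) that layout.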

2  Nearest-Neighbor Classification

Recall that there are four ingredients in a memory-based classifier:

1. Distance metric (here, $[x_1, x_2, \ldots, x_d]$ and $[y_1, y_2, \ldots, y_d]$ are two points in $\mathbb{R}^d$; a minimal sketch of all three distances is given after this list):

   (a) Euclidean distance: $\sqrt{\sum_{i=1}^{d} (x_i - y_i)^2}$;

   (b) $L_1$ distance: $\sum_{i=1}^{d} |x_i - y_i|$;

   (c) $L_\infty$ distance: $\max_{i=1,2,\ldots,d} |x_i - y_i|$.

2. Number of nearest neighbors:

   (a) one (leading to the one-nearest-neighbor classifier);

   (b) k (leading to the k-nearest-neighbor classifier).

3. Weighting function:

   (a) uniform;

   (b) Gaussian: $w_i = \exp(-\mathrm{distance}^2(x_i, \mathrm{query}) / K_w^2)$, where $K_w$ is the kernel width.

4. Fitting of the local points (a training sketch for the local adaline is given after this list):

   (a) predict with the weighted vote;

   (b) predict with a local adaline: if all the neighbors have the same class label (which is trivially the case when k = 1), then predict that the query has this class label. Otherwise, train an adaline using this set of neighbors and predict the label of the query with the trained adaline. The adaline is of the form $f((x_1, x_2)) = w_0 + w_1 x_1 + w_2 x_2$, and you should use the Adaline rule for training. The weights are initialized as $w_0 = w_1 = w_2 = 0.1$, the learning rate $\eta$ is specified by the user, and for simplicity the training stops after 100 iterations.
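
For concreteness, here is a minimal sketch of the three distance measures of Section 2 and of the Gaussian weight above, for the 2-dimensional points used in this assignment. The functions are shown free-standing for brevity (the class in Section 3.3 declares them as members), and the hard-coded dimension and the gaussian_weight helper are assumptions of this sketch.

    #include <algorithm>
    #include <cmath>

    // Sketch only: the points in this assignment are 2-dimensional.
    static const int DIM = 2;

    double dis_e(double* x, double* y) {                  // Euclidean distance
        double s = 0.0;
        for (int i = 0; i < DIM; ++i) s += (x[i] - y[i]) * (x[i] - y[i]);
        return std::sqrt(s);
    }

    double dis_l(double* x, double* y) {                  // L1 distance
        double s = 0.0;
        for (int i = 0; i < DIM; ++i) s += std::fabs(x[i] - y[i]);
        return s;
    }

    double dis_i(double* x, double* y) {                  // L-infinity distance
        double m = 0.0;
        for (int i = 0; i < DIM; ++i) m = std::max(m, std::fabs(x[i] - y[i]));
        return m;
    }

    // Gaussian weight of neighbor xi with respect to the query point
    // (illustrative helper, not part of the required interface).
    double gaussian_weight(double* xi, double* query, double Kw,
                           double (*dist)(double*, double*)) {
        double d = dist(xi, query);
        return std::exp(-(d * d) / (Kw * Kw));
    }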

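The local adaline fit of 4(b) might look like the following minimal sketch, which uses the standard adaline (delta) rule with the initialization, learning rate and iteration count stated above. The function and argument names are illustrative only; in the required class this logic belongs in fit_p(), where k is a class member.

    // neighbors: k x 2 array of coordinates, labels: +1/-1 targets,
    // query: the test point, eta: the user-specified learning rate.
    int fit_adaline(double** neighbors, int* labels, int k,
                    double* query, double eta) {
        bool all_same = true;
        for (int j = 1; j < k; ++j)
            if (labels[j] != labels[0]) { all_same = false; break; }
        if (all_same) return labels[0];            // trivial case, e.g. k = 1

        double w0 = 0.1, w1 = 0.1, w2 = 0.1;       // initialization given above
        for (int iter = 0; iter < 100; ++iter) {   // stop after 100 iterations
            for (int j = 0; j < k; ++j) {
                double out = w0 + w1 * neighbors[j][0] + w2 * neighbors[j][1];
                double err = labels[j] - out;      // adaline (delta) rule on the linear output
                w0 += eta * err;
                w1 += eta * err * neighbors[j][0];
                w2 += eta * err * neighbors[j][1];
            }
        }
        double f = w0 + w1 * query[0] + w2 * query[1];
        return (f >= 0.0) ? +1 : -1;               // threshold the trained adaline
    }
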
3  Programming Specification

3.1  Command Syntax

The program should be coded in C++ and run under UNIX. The source file must be named classifier.cpp, and the command syntax is (square brackets mark parameters that are not always needed):

    training_file test_file output_file distance_type k weighting_type [K_w] fitting_type [eta]

- training_file, test_file, and output_file: the file names for the training data, the test data, and the output, respectively.

- distance_type: -e for the Euclidean distance; -l for the $L_1$ distance; -i for the $L_\infty$ distance.

- k: the number of neighbors used. 1 gives the 1-nearest-neighbor classifier; an integer k > 1 gives the k-nearest-neighbor classifier.

- weighting_type: -u for uniform weighting; -g for Gaussian weighting, in which case you also need to specify the kernel width $K_w$.

- fitting_type: -l for predicting with the weighted vote; -p for fitting a local adaline, in which case you also need to specify the learning rate $\eta$.

Examples:

    train.txt test.txt output.txt -e 1
    train.txt test.txt output.txt -l 1
    train.txt test.txt output.txt -i 1
    train.txt test.txt output.txt -e k -u -l
    train.txt test.txt output.txt -e k -g K_w -l
    train.txt test.txt output.txt -e k -u -p eta
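
A minimal sketch of mapping this command line onto the classifier configuration is given below. The option strings follow Section 3.1, while the positional parsing and the local variable names are assumptions of this sketch; bounds checking and error reporting are largely omitted for brevity.

    #include <cstdio>
    #include <cstdlib>
    #include <cstring>

    int main(int argc, char* argv[]) {
        if (argc < 8) {        // program + 3 files + distance + k + weighting + fitting
            std::fprintf(stderr, "usage: %s training_file test_file output_file "
                                 "distance_type k weighting_type [K_w] fitting_type [eta]\n",
                         argv[0]);
            return 1;
        }
        const char* train_file  = argv[1];
        const char* test_file   = argv[2];
        const char* output_file = argv[3];
        const char* dist_flag   = argv[4];                // -e, -l or -i
        int k                   = std::atoi(argv[5]);

        int pos = 6;
        const char* wht_flag = argv[pos++];               // -u or -g
        double Kw = 0.0;
        if (std::strcmp(wht_flag, "-g") == 0)
            Kw = std::atof(argv[pos++]);                  // kernel width follows -g

        const char* fit_flag = argv[pos++];               // -l or -p
        double eta = 0.0;
        if (std::strcmp(fit_flag, "-p") == 0)
            eta = std::atof(argv[pos++]);                 // learning rate follows -p

        // ... read the files, hand the choices to a classifier object through the
        //     get_* functions of Section 3.3, then call knn() and output().
        return 0;
    }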

3.2  Output Format

The output file stores the test results of each classification task. The format of output_file is:

    line 1:          [testing accuracy (double)]
    line 2 ... I+1:  [class labels of the test samples, as predicted by the classifier (+1/-1)]
    line I+2:        -1   (this is the end marker)

Here, I is the number of test samples.
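
As an illustration, here is a minimal sketch of writing this format. The helper name and arguments are assumptions of this sketch; the accuracy is simply the fraction of test samples whose predicted label matches the label stored in test.txt.

    #include <cstdio>

    void write_output(const char* output_file, int* predicted, int* truth, int n_test) {
        int correct = 0;
        for (int i = 0; i < n_test; ++i)
            if (predicted[i] == truth[i]) ++correct;

        FILE* fp = std::fopen(output_file, "w");
        if (fp == NULL) return;
        std::fprintf(fp, "%f\n", (double)correct / n_test);   // line 1: testing accuracy
        for (int i = 0; i < n_test; ++i)
            std::fprintf(fp, "%d\n", predicted[i]);           // lines 2 ... I+1: predicted labels
        std::fprintf(fp, "-1\n");                              // line I+2: end marker
        std::fclose(fp);
    }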

3.3  Program Structure

You should design the class classifier according to the following:

    typedef double (*dist_fun_ptr)(double*, double*);
    typedef void   (*wht_fun_ptr)(double*, double**, int*);
    typedef int    (*fit_fun_ptr)(double*, double**, int*);

    class classifier {
    private:
        double**     train_data;
        int*         train_labl;
        double**     test_data;
        int*         test_labl;
        dist_fun_ptr p_df;

    public:
        wht_fun_ptr  p_wf;
        fit_fun_ptr  p_ff;
        int          k;
        double*      center;
        double**     local_set;
        int*         local_set_lbl;
        char*        output_file;

        void get_train(double** train_set, int* train_lbl);
        void get_test(double** test_set, int* test_lbl);
        void get_k(int k);
        void get_dis_ptr(dist_fun_ptr ptr);
        void get_wht_ptr(wht_fun_ptr ptr);
        void get_fit_ptr(fit_fun_ptr ptr);
        void get_output(char* output_file_name);
        void output();
        void knn();
        classifier();
        ~classifier();

        double dis_e(double* x, double* y);
        double dis_l(double* x, double* y);
        double dis_i(double* x, double* y);
        void   wht_u(double* center, double** local_set, int* local_set_lbl);
        void   wht_g(double* center, double** local_set, int* local_set_lbl);
        int    fit_l(double* center, double** local_set, int* local_set_lbl);
        int    fit_p(double* center, double** local_set, int* local_set_lbl);
    };

Specifications of the variables and functions are as follows:

- double (*dist_fun_ptr)(double*, double*): format of the distance function. The input arguments are two 1D double arrays (the coordinates of two samples), and the return value is the distance (double).

- void (*wht_fun_ptr)(double* center, double** local_set, int* local_set_lbl): format of the weighting function. The input arguments are a 1D double array (the center point), a 2D double array (the k nearest neighbors), and a 1D int array (the labels of the k neighbors). The function applies the specified weighting scheme by modifying the labels of the set of neighbors, local_set_lbl.

- int (*fit_fun_ptr)(double* center, double** local_set, int* local_set_lbl): format of the fitting function. The input arguments are a 1D double array (the center point), a 2D double array (the k nearest neighbors), and a 1D int array (the labels of the nearest neighbors after weighting). The function returns the predicted label of the center point using the weighted labels and the specified fitting scheme (direct combination, or fitting a local adaline).

- train_data: 2D array that stores the training data.

- train_labl: 1D array that stores the training labels.

- test_data: 2D array that stores the test data.

- test_labl: 1D array that stores the test labels.

- p_df: function pointer of type dist_fun_ptr, which determines the distance measure used in the function knn().

- p_wf: function pointer of type wht_fun_ptr, which determines the weighting scheme used in the function knn().

- p_ff: function pointer of type fit_fun_ptr, which determines the fitting scheme used in the function knn().

- get_train(double** train_set, int* train_lbl): reads the training data. The two arguments are passed on to the class members train_data and train_labl.

- get_test(double** test_set, int* test_lbl): reads the test data. The two arguments are passed on to the class members test_data and test_labl.

- get_k(int k): obtains the user-specified number of nearest neighbors.

- get_dis_ptr(dist_fun_ptr ptr): obtains the pointer to the distance function (dis_e, dis_l, or dis_i) and passes it to the class member p_df.

- get_wht_ptr(wht_fun_ptr ptr): obtains the pointer to the weighting function (wht_u or wht_g) and passes it to the class member p_wf.

- get_fit_ptr(fit_fun_ptr ptr): obtains the pointer to the local fitting function (fit_l or fit_p) and passes it to the class member p_ff.

- get_output(char* output_file_name): obtains the user-specified output file name, which is passed on to the class member output_file.

- center, local_set, local_set_lbl: the center point, the set of local nearest neighbors, and their labels. To predict the label of each test pattern (center), you need to find its k nearest neighbors (local_set) and the neighbors' labels (local_set_lbl), and then apply the chosen weighting and local fitting schemes to make the prediction. Therefore, you should maintain these three class members for each test point.

- knn(): performs classification on the test data. For every test pattern (center), construct the set of local neighbors (local_set) and their labels (local_set_lbl) using the user-specified distance measure (dis_e(), dis_l(), dis_i()) and k. Then call the weighting function (wht_u() or wht_g()) to re-weight the labels. Finally, call the fitting function (fit_l() or fit_p()) to combine the weighted labels and predict the label of the test point (a minimal sketch of this loop is given after this list).

- output(): writes the classification results to output_file.

- dis_e(), dis_l(), dis_i(): distance functions using the Euclidean, $L_1$, and $L_\infty$ distances.

- wht_u(), wht_g(): weighting functions using uniform and Gaussian weighting.

- fit_l(), fit_p(): fitting functions using direct combination of the local labels, or fitting a local adaline (a weighted-vote sketch also follows this list).
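
The knn() loop described above might be organised as in the following minimal sketch. It is written as a free function so that it is self-contained; in the assignment the same logic lives in classifier::knn() and uses the class members instead of the parameters assumed here. The k nearest neighbors are found with a simple O(k*n) scan, which is adequate for the data sizes in this assignment.

    #include <vector>

    void knn_predict(double** train_data, int* train_labl, int n_train,
                     double** test_data, int n_test, int k,
                     double (*p_df)(double*, double*),
                     void (*p_wf)(double*, double**, int*),
                     int (*p_ff)(double*, double**, int*),
                     int* pred_labl) {
        std::vector<double*> local_set(k);
        std::vector<int> local_set_lbl(k);
        for (int t = 0; t < n_test; ++t) {
            double* center = test_data[t];
            std::vector<bool> used(n_train, false);

            // Pick the k nearest training points (simple repeated minimum search).
            for (int j = 0; j < k; ++j) {
                int best = -1;
                double best_d = 0.0;
                for (int i = 0; i < n_train; ++i) {
                    if (used[i]) continue;                    // already selected
                    double d = p_df(train_data[i], center);   // user-chosen distance
                    if (best < 0 || d < best_d) { best = i; best_d = d; }
                }
                used[best] = true;
                local_set[j]     = train_data[best];
                local_set_lbl[j] = train_labl[best];
            }

            p_wf(center, &local_set[0], &local_set_lbl[0]);                // re-weight the labels
            pred_labl[t] = p_ff(center, &local_set[0], &local_set_lbl[0]); // predict the label
        }
    }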

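For the weighted vote, a possible fitting function in the spirit of fit_l() is sketched below. How wht_g() encodes its Gaussian weights into the int label array is left open by the handout, so this sketch simply sums the (possibly re-weighted) labels and takes the sign. It is shown free-standing; in the class, k is a member rather than a parameter.

    int fit_vote(double* center, double** local_set, int* local_set_lbl, int k) {
        (void)center;                      // the center itself is not needed for a vote
        (void)local_set;
        int sum = 0;
        for (int j = 0; j < k; ++j)
            sum += local_set_lbl[j];       // +1/-1 labels, possibly re-weighted
        return (sum >= 0) ? +1 : -1;       // sign of the vote; ties broken as +1
    }
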
4  Experiment Design and Output

You are required to perform the following tasks:

1. 1-NN classification using the Euclidean distance;
2. 1-NN classification using the $L_1$ distance;
3. 1-NN classification using the $L_\infty$ distance;
4. k-NN classification with k = 5, uniform weighting, weighted vote;
5. k-NN classification with k = 9, uniform weighting, weighted vote;
6. k-NN classification with k = 31, uniform weighting, weighted vote;
7. k-NN classification with k = 31, Gaussian weighting (with $K_w = 0.05$), weighted vote;
8. k-NN classification with k = 31, uniform weighting, local fitting by adaline (with $\eta = 0.1$).

You have to show the classification results on the test data (test.txt), using red for the positive class and blue for the negative class. Since the test grid is very dense, the classification boundary should be easy to see. Altogether, you should provide 8 plots. Note that we will also use other test data to evaluate the correctness of your program.
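
The handout does not prescribe a plotting tool. One possibility (an assumption of this sketch, not a requirement) is to split the predicted test points into two files that gnuplot or any other plotting tool can then draw in red and blue.

    #include <cstdio>

    // Writes the test coordinates into pos.txt / neg.txt according to the
    // predicted labels, so the decision regions can be plotted externally.
    void dump_for_plot(double** test_data, int* predicted, int n_test) {
        FILE* pos = std::fopen("pos.txt", "w");
        FILE* neg = std::fopen("neg.txt", "w");
        if (pos == NULL || neg == NULL) {
            if (pos) std::fclose(pos);
            if (neg) std::fclose(neg);
            return;
        }
        for (int i = 0; i < n_test; ++i) {
            FILE* out = (predicted[i] == 1) ? pos : neg;
            std::fprintf(out, "%f %f\n", test_data[i][0], test_data[i][1]);
        }
        std::fclose(pos);
        std::fclose(neg);
    }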

5  Submission

We will collect your program using CASS. Please submit the following files:

1. Well-documented program source code for classifier.cpp.

2. If you have multiple source files, submit all of them plus a makefile for the compilation.

3. A REPORT file with your name, student ID, email address, and short descriptions of your programs. You should provide the classification results on the 2D plane for all the classification tasks mentioned in Section 4, compare them, and draw some conclusions about the selection of the parameters.

For more detail of the CS UNIX account, please read ...

IMPORTANT NOTE: You should NOT modify any submitted file after the assignment collection deadline.

6  Grading

The basic requirements include program clarity, documentation, and consistency with the assignment specifications. Your program will be compiled under UNIX and tested on a different test data set. Your discussion in the report will also be taken into consideration.

6.1  Late Submission

We accept late submissions, but with the following penalties:

- submit on or before November 15, 1:00pm: deduct 30%;
- submit on or before November 16, 1:00pm: deduct 60%;
- submit after November 16, 1:00pm: deduct 100%.
