CS273 Midterm Exam: Introduction to Machine Learning, Winter 2015. Tuesday, February 10th, 2015


Your name:
Your UCInetID (e.g., myname@uci.edu):
Your seat (row and number):

Total time is 80 minutes. READ THE EXAM FIRST and organize your time; don't spend too long on any one problem. Please write clearly and show all your work. If you need clarification on a problem, please raise your hand and wait for the instructor to come over. Turn in any scratch paper with your exam.

Problem 1: (8 points) Separability and Features
For each of the following examples of training data, (1) sketch a classification boundary that separates the data; (2) state whether or not the data are linearly separable; and if not, (3) give a set of features that would allow the data to be separated. (Your features do not need to be minimal, but should not contain any obviously unneeded features.)

Problem 2: (8 points) Under- and Over-fitting
Suppose that I am training a neural network classifier to recognize faces in images. Using cross-validation, we discover that my classifier appears to be overfitting the data. Give two ways I could improve my performance; be specific. After following some of your advice, we now think that the resulting classifier is underfitting. Give two ways, other than reversing the methods you mentioned above, that we could improve performance; again, be specific.

Problem 3: (9 points) Bayes Classifiers and Naïve Bayes
Consider the table of measured data given below. We will use the two observed features x1, x2 to predict the class y. In the case of a tie, we will prefer to predict class y = 0.

    x1  x2  y
    0   0   1
    0   0   1
    0   1   1
    0   0   0
    1   1   0
    1   0   0
    0   1   1
    0   1   1

(a) Write down the probabilities necessary for a naïve Bayes classifier.
(b) Using your naïve Bayes model, what value of y is predicted given observation (x1, x2) = (0, 0)?
(c) Using your naïve Bayes model, what is the probability p(y = 1 | x1 = 0, x2 = 1)?
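For reference, a minimal Python/NumPy sketch of how the required probabilities can be tallied from the table; the array literal simply transcribes the rows above, and the variable names are arbitrary:

    import numpy as np

    # The table from Problem 3, one row per example: columns x1, x2, y.
    data = np.array([
        [0, 0, 1],
        [0, 0, 1],
        [0, 1, 1],
        [0, 0, 0],
        [1, 1, 0],
        [1, 0, 0],
        [0, 1, 1],
        [0, 1, 1],
    ])
    X, y = data[:, :2], data[:, 2]

    # Class priors p(y = c).
    for c in (0, 1):
        print("p(y=%d) = %.3f" % (c, np.mean(y == c)))

    # Per-feature conditionals p(xj = 1 | y = c), the quantities naive Bayes needs.
    for c in (0, 1):
        for j in (0, 1):
            print("p(x%d=1 | y=%d) = %.3f" % (j + 1, c, np.mean(X[y == c, j])))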

Problem 4: (10 points) Gradient Descent
Suppose that we have training data {(x^(1), y^(1)), (x^(2), y^(2)), ..., (x^(m), y^(m))} and we wish to predict y using a nonlinear regression model with two parameters:

    ŷ = a exp(x1 + b)

We decide to train our model using gradient descent on the mean squared error (MSE).
(a) Write down the expression for the MSE on our training set.
(b) Write down the gradient of the MSE.
(c) Give pseudocode for a (batch) gradient descent function theta = train(x, y), including all necessary elements for it to work.
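A minimal NumPy sketch along the lines of part (c), assuming the single feature x1 is passed as a 1-D array x; the learning rate, iteration count, and initial parameter values are arbitrary choices, and theta is returned as the pair (a, b):

    import numpy as np

    def train(x, y, lr=0.01, iters=2000):
        """Batch gradient descent for the model y_hat = a * exp(x + b).
        Returns theta = (a, b)."""
        a, b = 1.0, 0.0                       # arbitrary starting point
        m = len(x)
        for _ in range(iters):
            yhat = a * np.exp(x + b)          # predictions on all m points
            err = yhat - y                    # residuals
            # MSE = (1/m) * sum((yhat - y)^2); its gradient w.r.t. (a, b):
            grad_a = (2.0 / m) * np.sum(err * np.exp(x + b))
            grad_b = (2.0 / m) * np.sum(err * a * np.exp(x + b))
            a -= lr * grad_a
            b -= lr * grad_b
        return a, b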

Problem 5: (8 points) Cross-validation and Linear Regression
Consider the following data points, copied in each part. We wish to perform linear regression to minimize mean squared error.

(a) Compute the leave-one-out cross-validation error of a zero-order (constant) predictor.
    [Figure: plot of the data points (y versus x).]

(b) Compute the leave-one-out cross-validation error of a first-order (linear) predictor.
    [Figure: the same data points, repeated.]
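Since the point values appear only in the figure, the sketch below uses hypothetical data to show how both leave-one-out errors can be computed; np.polyfit with degree 0 or 1 gives the constant and linear least-squares fits:

    import numpy as np

    # Hypothetical points standing in for the plotted data.
    x = np.array([1.0, 2.0, 3.0])
    y = np.array([1.0, 2.0, 2.0])

    def loocv_mse(x, y, degree):
        """Leave-one-out cross-validation MSE of a degree-d least-squares polynomial fit."""
        errs = []
        for i in range(len(x)):
            keep = np.arange(len(x)) != i                # hold out point i
            coef = np.polyfit(x[keep], y[keep], degree)  # fit on the remaining points
            pred = np.polyval(coef, x[i])                # predict the held-out point
            errs.append((pred - y[i]) ** 2)
        return float(np.mean(errs))

    print("zero-order (constant):", loocv_mse(x, y, 0))
    print("first-order (linear): ", loocv_mse(x, y, 1))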

Problem 6: (12 points) K-Nearest Neighbor Regression
Consider a regression problem for predicting the following data points, using the k-nearest neighbor regression algorithm to minimize mean squared error (MSE). In the case of ties, we will prefer to use the neighbor to the left (smaller x value). Note: if you prefer, you may leave an arithmetic expression, e.g., leave values as (0.6)².

[Figure: plot of the training data points.]

(a) For k = 1, compute the training error on the provided data.
(b) For k = 1, compute the leave-one-out cross-validation error on the data.
(c) For k = 3, compute the training error on the provided data.
(d) For k = 3, compute the leave-one-out cross-validation error on the data.
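A minimal sketch of k-NN regression with the left-preferring tie-break, using hypothetical 1-D data in place of the plotted points; the example calls show the k = 1 training and leave-one-out errors:

    import numpy as np

    def knn_predict(x_train, y_train, x_query, k):
        """k-NN regression: average the y-values of the k nearest training points.
        Distance ties prefer the neighbor with the smaller x value."""
        d = np.abs(x_train - x_query)
        order = sorted(range(len(x_train)), key=lambda i: (d[i], x_train[i]))
        return float(np.mean(y_train[order[:k]]))

    # Hypothetical 1-D training data standing in for the plotted points.
    x = np.array([1.0, 2.0, 3.0, 4.0])
    y = np.array([2.0, 4.0, 1.0, 2.0])

    # Training MSE for k = 1 (each point is its own nearest neighbor, so this is 0).
    train_mse = np.mean([(knn_predict(x, y, xi, 1) - yi) ** 2 for xi, yi in zip(x, y)])

    # Leave-one-out cross-validation MSE for k = 1.
    loo_mse = np.mean([
        (knn_predict(np.delete(x, i), np.delete(y, i), x[i], 1) - y[i]) ** 2
        for i in range(len(x))
    ])
    print(train_mse, loo_mse)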

Problem 7: (4 points) Multiple Choice
For the following questions, assume that we have m data points y^(i), x^(i), i = 1, ..., m, each with n features, x^(i) = [x^(i)_1, ..., x^(i)_n]. Circle one answer for each:

True or false: Linear regression can be solved using either matrix algebra or gradient descent.
True or false: The predictions of a k-nearest neighbor classifier will not be affected if we preprocess the data to normalize the magnitude of each feature.
True or false: With enough hidden nodes, a neural network can separate any data set.
True or false: Increasing the regularization of a linear regression model will decrease the bias.

Problem 8: (4 points) Short Answer
Give one advantage of stochastic gradient descent over batch gradient descent, and one advantage of batch gradient descent over stochastic.
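As a point of comparison for Problem 8, a small sketch contrasting the two update rules on hypothetical linear-regression data; the step size, iteration counts, and data are arbitrary choices:

    import numpy as np

    # Hypothetical 1-D linear regression data.
    rng = np.random.default_rng(0)
    x = rng.uniform(0.0, 1.0, 100)
    y = 2.0 * x + 1.0 + 0.1 * rng.normal(size=100)
    lr = 0.1

    # Batch gradient descent: each update uses the gradient over all m points
    # (stable steps, but every step costs a full pass over the data).
    w, b = 0.0, 0.0
    for _ in range(500):
        err = (w * x + b) - y
        w -= lr * 2.0 * np.mean(err * x)
        b -= lr * 2.0 * np.mean(err)

    # Stochastic gradient descent: each update uses a single point
    # (very cheap but noisy steps; often reaches a good region much sooner).
    ws, bs = 0.0, 0.0
    for _ in range(5):                      # 5 passes over the data
        for i in rng.permutation(len(x)):
            err_i = (ws * x[i] + bs) - y[i]
            ws -= lr * 2.0 * err_i * x[i]
            bs -= lr * 2.0 * err_i
    print("batch:", w, b, " sgd:", ws, bs)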

Problem 9: (12 points) Perceptrons and VC Dimension
In this problem, consider the following perceptron model on two features:

    ŷ(x) = sign(b + w1 x1 + w2 x2)

and answer the following questions about the decision boundary and the VC dimension.
(a) Describe (in words, with diagrams if desired) the possible decision boundaries that can be realized by this classifier.
(b) What is its VC dimension?

Now suppose that I also enforce an additional condition on the parameters of the model: that only one of the two weights w1, w2 is non-zero (i.e., one of them must be zero, a feature selection criterion). Note that the training algorithm can choose which parameter is zero, depending on the data.
(c) Describe (in words, with diagrams if desired) the decision boundaries that can be realized by this classifier (there should be two cases).
(d) What is its VC dimension?

Problem 10: (9 points) Support Vector Machines

[Figure: data points on a single feature axis, with the class sign shown above and the feature value shown below each point.]

Using the above data with one feature x (whose values are given below each data point) and a class variable y ∈ {−1, +1}, with filled circles indicating y = +1 and squares y = −1 (the sign is also shown above each data point for redundancy), answer the following:
(a) Sketch the solution (decision boundary) of a linear SVM on the data, and identify the support vectors.
(b) Give the solution parameters w and b, where the linear form is w x + b.
(c) Calculate the training error.
(d) Calculate the leave-one-out cross-validation error for these data.
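A minimal sketch of how such a one-feature linear SVM could be fit and inspected with scikit-learn; the point locations and labels below are hypothetical stand-ins for the plotted data, and a very large C approximates the hard-margin solution:

    import numpy as np
    from sklearn.svm import SVC

    # Hypothetical 1-D points and labels standing in for the plotted data.
    X = np.array([[-3.0], [-1.0], [0.0], [0.5], [1.0], [3.0]])
    y = np.array([-1, -1, -1, +1, +1, +1])

    clf = SVC(kernel="linear", C=1e6).fit(X, y)
    w = clf.coef_[0][0]
    b = clf.intercept_[0]
    print("w =", w, " b =", b)
    print("support vectors:", clf.support_vectors_.ravel())
    print("training error:", np.mean(clf.predict(X) != y))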