CSE 158, Fall 2017: Midterm

Name:

Student ID:

Instructions

The test will start at 5:10pm. Hand in your solution at or before 6:10pm. Answers should be written directly in the spaces provided. Do not open or start the test before instructed to do so. Note that the final page contains some algorithms and definitions.

Total marks = 26

Section 1: Regression and Ranking (6 marks)

Unless specified otherwise, questions are each worth 1 mark.

1. The following is a list of prices from a local car dealership:

No.  Model          Luxury?  Year  MPG  Horsepower  Price
1    Acura MDX      Yes      2017  20   290         $50,000
2    Honda Accord   No       2017  25   190         $25,000
3    Honda Civic    No       2012  23   160         $10,000
4    Honda Civic    No       2016  24   170         $18,000
5    Nissan Altima  No       2016  30   180         $25,000
6    Acura MDX      Yes      2015  18   280         $38,000
7    Lexus RX350    Yes      2015  21   270         $40,000
8    Toyota Prius   No       2014  45   120         $28,000
9    Toyota Prius   No       2013  40   120         $24,000

Suppose you train a regressor of the following form to predict a vehicle's price:

    price ≃ θ_0 + θ_1 · [Year] + θ_2 · [MPG] + θ_3 · [Is luxury?]

What would be the feature representation of the first two vehicles?

1:
2:

2. List two additional features that might be useful for predicting the price of a car, and how you would encode them (2 marks):

1:
2:

3. Suppose that you train two predictors on similar data to predict the price and obtain:

    Price (Predictor 1) = 40000 − 100 · [MPG]
    Price (Predictor 2) = 30000 + 10000 · [Is luxury?] + 100 · [MPG]

The coefficient for MPG is negative for the first predictor, but positive for the second. Can you provide a brief explanation / interpretation of why this could be the case?

4. (Hard) In class we stated that the best possible constant predictor (i.e., y_i ≃ α) was to set α to be the mean value of y (i.e., α = (1/N) Σ_i y_i). Show that this is the case when minimizing the MSE (hint: compute the derivative of the MSE and find the critical point by solving for α) (2 marks):
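A minimal sketch of the feature encoding described in question 1, written in Python for concreteness; the helper name to_feature is hypothetical and the constant term corresponds to θ_0:

    # Map a vehicle to [1, Year, MPG, Is luxury?] so that price ≃ θ · feature.
    def to_feature(year, mpg, is_luxury):
        return [1.0, float(year), float(mpg), 1.0 if is_luxury else 0.0]

    # The first two vehicles from the table above:
    vehicle_1 = to_feature(2017, 20, True)    # Acura MDX
    vehicle_2 = to_feature(2017, 25, False)   # Honda Accord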

Section 2: Classification and Diagnostics (8 marks)

Suppose you train two (linear) SVM classifiers, A and B, which produce the following separation boundaries:

[Figure: labelled data points with the separating boundaries of classifiers A and B, and the +ve/−ve side of each boundary marked]

5. What is the performance of the two classifiers in terms of the following (you may leave your expressions unsimplified) (4 marks):

Accuracy:          A:        B:
# True positives:  A:        B:
# True negatives:  A:        B:
BER:               A:        B:
Precision:         A:        B:
Recall:            A:        B:
F-score:           A:        B:
Precision@5:       A:        B:

6. Suppose you were using your classifier to rank emails from important (positive label) to not important. Which of the two classifiers would you prefer and why?

7. Imagine that the goal of a classifier is to predict whether a person is 20 years old. Two features that might be predictive include (a) height, and (b) vocabulary size. Would a Naïve Bayes classifier be suitable to train a predictor based on these two features? Explain why or why not.
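For reference while working through question 5, a minimal sketch relating the requested quantities to confusion-matrix counts (the counts themselves have to be read off the figure; the values below are placeholders, not an answer):

    def metrics(tp, fp, tn, fn):
        accuracy  = (tp + tn) / (tp + fp + tn + fn)
        fpr       = fp / (fp + tn)               # false positive rate
        fnr       = fn / (fn + tp)               # false negative rate
        ber       = 0.5 * (fpr + fnr)            # balanced error rate
        precision = tp / (tp + fp)
        recall    = tp / (tp + fn)
        f_score   = 2 * precision * recall / (precision + recall)
        return accuracy, ber, precision, recall, f_score

    print(metrics(tp=4, fp=1, tn=3, fn=2))       # example counts only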

8. (Critical thinking) A trivial classifier that we did not cover in class is a nearest neighbor classifier. This classifier has no parameters, and simply classifies points in the test set based on their similarity to points in the training set. That is, given a point X_i that we wish to classify, we consider all X_j in the training set, and select the label of the nearest one:

    y_i = y_j*,  where  j* = argmin_j ‖X_i − X_j‖₂²

Describe two settings (e.g. applications, properties of datasets, computational resources available, etc.) in which the nearest neighbor classifier would be (1) preferable to logistic regression, and (2) less preferable than logistic regression (2 marks).
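A rough, runnable rendering of the nearest neighbor rule above, included only as an illustrative sketch (the question does not ask for an implementation):

    import numpy as np

    def nn_predict(X_train, y_train, x):
        # Index of the training point closest to x under squared L2 distance.
        j = np.argmin(((X_train - x) ** 2).sum(axis=1))
        return y_train[j]

    X_train = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
    y_train = np.array([0, 1, 1])
    print(nn_predict(X_train, y_train, np.array([0.2, 0.1])))  # -> 0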

Section 3: Clustering / Communities (5 marks)

When asked to draw examples, provide 2d sets of points and/or clusters like the following:

[Figure: an example set of 2d points, and the same points grouped into clusters]

9. Consider running the clique percolation algorithm with K = 3 on the following graph (see pseudocode on the final page of the exam): what are the communities found by the algorithm? (you can draw your solution directly on the graph)

[Figure: the graph on which to run clique percolation]

10. Using the boxes below, draw examples of 2d point sets for which (a) PCA would be more appropriate than hierarchical clustering, (b) hierarchical clustering would be more appropriate than PCA, and (c) neither hierarchical clustering nor PCA would be appropriate (3 marks):

(a)                 (b)                 (c)

11. For the examples above, describe a real pair of features that might be described by the points you drew ((b) is provided as an example) (2 marks):

(a) dimension 1:              dimension 2:
(b) dimension 1: Latitude     dimension 2: Longitude
(c) dimension 1:              dimension 2:

Section 4: Recommender Systems (7 marks)

On a popular music streaming website, a few users have listened to the following music:

                Listened?                               Liked?
Album           Nathan  Thomas  Dhruv  Kevin  Prateek   Nathan  Thomas  Dhruv  Kevin  Prateek
Lana Del Ray    1       0       1      1      0         1       ?       1      1      ?
Born to Die     1       0       0      1      0         1       ?       ?      1      ?
Ultraviolence   0       1       1      1      0         ?       1       1      1      ?
Honeymoon       0       1       1      0      0         ?       1       1      ?      ?
Lust for Life   1       1       0      1      1         1       1       ?      1      1

12. Suppose you want to determine which users are similar to each other in terms of their listening behavior. What would be an appropriate metric for determining users' similarity, and which two users would be most similar under this metric (list multiple in case of a tie)? (2 marks)

13. Suppose you want to determine which users are similar to each other in terms of their preferences. What would be an appropriate metric for determining users' similarity, and which two users would be most similar under this metric (list multiple in case of a tie)? Describe how you handle the "?" entries (2 marks).

14. (Critical Thinking) Suppose you wanted to design a recommender system to suggest points of interest in a city based on users' past activities/behavior/etc. Describe what data you would collect from users, how you would model the problem, and any issues that make this problem different from those we saw in class (3 marks).
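A minimal sketch of how one candidate metric (Jaccard similarity, defined on the final page) could be computed over "Listened?" columns; this is illustrative only, and choosing and justifying a metric is what the question asks for:

    def jaccard(a, b):
        inter = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
        union = sum(1 for x, y in zip(a, b) if x == 1 or y == 1)
        return inter / union if union else 0.0

    nathan = [1, 1, 0, 0, 1]      # Nathan's "Listened?" column, top to bottom
    kevin  = [1, 1, 1, 0, 1]      # Kevin's "Listened?" column
    print(jaccard(nathan, kevin))   # 0.75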

Precision = |{relevant documents} ∩ {retrieved documents}| / |{retrieved documents}|

Recall = |{relevant documents} ∩ {retrieved documents}| / |{relevant documents}|

Balanced Error Rate = (1/2) (False Positive Rate + False Negative Rate)

F-score = (2 · precision · recall) / (precision + recall)

Jaccard similarity: Sim(A, B) = |A ∩ B| / |A ∪ B|

Cosine similarity: Sim(A, B) = (A · B) / (|A| |B|)

Naïve Bayes: p(label | features) = p(label) · Π_i p(feature_i | label) / p(features)

Algorithm 1: Clique percolation with parameter k
    Initially, all k-cliques in the graph are communities
    while there are two communities that have a (k−1)-clique in common do
        merge both communities into a single community

Algorithm 2: Hierarchical clustering
    Initially, every point is assigned to its own cluster
    while there is more than one cluster do
        Compute the center of each cluster
        Combine the two clusters with the nearest centers

Write any additional answers/corrections/comments here:
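A rough, runnable rendering of Algorithm 2 (hierarchical clustering with centroid linkage), added as an illustrative sketch; the stopping condition here is parameterised by num_clusters, whereas the pseudocode above runs until a single cluster remains:

    import numpy as np

    def hierarchical_clusters(points, num_clusters=1):
        # Initially, every point is assigned to its own cluster.
        clusters = [[i] for i in range(len(points))]
        while len(clusters) > num_clusters:
            # Compute the center of each cluster.
            centers = [points[idx].mean(axis=0) for idx in clusters]
            # Find the two clusters with the nearest centers...
            best = None
            for a in range(len(clusters)):
                for b in range(a + 1, len(clusters)):
                    d = np.sum((centers[a] - centers[b]) ** 2)
                    if best is None or d < best[0]:
                        best = (d, a, b)
            _, a, b = best
            # ...and combine them into a single cluster.
            clusters[a] = clusters[a] + clusters[b]
            del clusters[b]
        return clusters

    points = np.array([[0, 0], [0, 1], [5, 5], [5, 6]], dtype=float)
    print(hierarchical_clusters(points, num_clusters=2))   # [[0, 1], [2, 3]]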