PRACTICE FINAL EXAM: SPRING 2012 CS 6375 INSTRUCTOR: VIBHAV GOGATE


PRACTICE FINAL EXAM: SPRING 2012 CS 6375 INSTRUCTOR: VIBHAV GOGATE
May 4, 2012

The exam is closed book. You are allowed four pages of double-sided cheat sheets. Answer the questions in the spaces provided on the question sheets. If you run out of room for an answer, continue on the back of the page. Attach your cheat sheet with your exam.

NAME:
UTD-ID (if known):

Problem 1:
Problem 2:
Problem 3:
Problem 4:
Problem 5:
Problem 6:
Problem 7:
TOTAL:

Solution: EXAM SOLUTIONS


1 SHORT ANSWERS

1. ( points) The training error of 1-nearest neighbor is always zero (assuming that the data is not noisy). True or False. Explain.

Solution: TRUE. The class of each example is the class of its nearest neighbor, which is itself.

2. ( points) You are given a dataset with the following statistics: #examples = 500, #attributes = ?0, #classes = ?. You are pretty sure that the data is linearly separable. You perform 5-fold cross validation on the dataset using linear support vector machines and decision trees (ID3) and obtain the results in Table 1 below (entries shown as "?" are illegible in the source), where Error = #misclassified examples / #test examples.

Table 1:
              Error    Error    Error    Error    Error    Average
              Fold1    Fold2    Fold3    Fold4    Fold5    Error
  Linear SVM   ?/100    0/100    ?/100   ?0/100    ?/100    ?/500
  ID3          7/100    4/100    5/100    4/100    5/100   25/500

Your boss Mr. LightWeight looks at Table 1 and concludes that ID3 is the best performing scheme for datasets of this type. Can you support his conclusion?

Solution: I cannot support his conclusion (no wonder he is Mr. LightWeight). Linear SVMs are performing better than ID3 on all folds except the 4th fold. I suspect that in the 4th fold, support vectors or examples close to them are in the test set. As a result, linear SVMs have a larger error.
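A quick empirical check of the claim in question 1, offered as a sketch rather than part of the exam: it uses scikit-learn and a randomly generated dataset of my own choosing (any dataset without duplicate points carrying conflicting labels behaves the same way).

    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 5))          # arbitrary continuous data, no duplicate rows
    y = rng.integers(0, 2, size=100)       # arbitrary binary labels

    knn = KNeighborsClassifier(n_neighbors=1).fit(X, y)
    print(knn.score(X, y))                 # 1.0: each training point is its own nearest neighbor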

3. ( points) A neural network with one hidden layer that contains only a polynomial number of (hidden) nodes can represent any Boolean function. True or False. Explain your answer.

Solution: FALSE. Some functions, for example the parity function, require an exponential number of hidden nodes if only one hidden layer is used.

4. ( points) You are given a coin and a thumbtack and you put Beta priors Beta(100,100) and Beta(1,1) on the coin and thumbtack respectively. You perform the following experiment: toss both the thumbtack and the coin 100 times. To your surprise, you get 60 heads and 40 tails for both the coin and the thumbtack. Are the following two statements true or false?

(a) The MLE estimate of both the coin and the thumbtack is the same but the MAP estimate is not.
(b) The MAP estimate of the parameter θ (probability of landing heads) for the coin is greater than the MAP estimate of θ for the thumbtack.

Solution: See midterm solutions.

5. ( points) Assuming Boolean attributes, the depth of a decision tree classifier can never be larger than the number of training examples. True or False. Explain.

Solution: TRUE. Each node in the decision tree divides the examples into two nonempty sets. Since there are n training examples, the number of nodes from a leaf to the root (i.e., the height) is bounded by n.
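Since the solution to question 4 defers to the midterm, here is a worked check. It assumes the reconstructed priors Beta(100,100) for the coin and Beta(1,1) for the thumbtack, and the standard Beta-Bernoulli estimates with h = 60 heads and t = 40 tails:

\[
\hat{\theta}_{\mathrm{MLE}} = \frac{h}{h+t} = \frac{60}{100} = 0.6 \quad\text{(identical for the coin and the thumbtack)},
\]
\[
\hat{\theta}_{\mathrm{MAP}} = \frac{h+\alpha-1}{h+t+\alpha+\beta-2}:\qquad
\text{coin } \frac{60+99}{100+198} = \frac{159}{298} \approx 0.53,\qquad
\text{thumbtack } \frac{60+0}{100+0} = 0.6 .
\]

So statement (a) is true (the MLEs agree but the MAPs do not) and statement (b) is false: the strong Beta(100,100) prior pulls the coin's MAP estimate toward 0.5, below the thumbtack's.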

2 LEARNING THEORY

1. (4 points) Assume that the data is 4-dimensional and each attribute is continuous (i.e., its domain is the set of reals R). Let the hypothesis space H be the set of axis-parallel rectangles in 4 dimensions. What is the VC dimension of H? Explain.

Solution: We saw in class that the VC dimension of axis-parallel rectangles is 2d, where d is the dimensionality of the data. Therefore, the VC dimension of 4-dimensional rectangles = 2 × 4 = 8.
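As a reminder of where the 2d figure comes from (this argument is standard but not part of the original solution): the 2d points {±e_1, ..., ±e_d} can be shattered, because for any nonempty subset S the bounding box

\[
R_S \;=\; \prod_{i=1}^{d}\Big[\min_{x \in S} x_i,\; \max_{x \in S} x_i\Big]
\]

contains exactly the points of S (use an empty rectangle for S = ∅). Conversely, no set of 2d+1 points can be shattered: mark, for each axis, a point attaining the minimum and a point attaining the maximum; at most 2d points are marked, and labeling the marked points positive and some unmarked point negative cannot be realized, since every rectangle containing the positives contains their bounding box and therefore the unmarked point. Hence VC(H) = 2d, which is 8 for d = 4.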

2. (4 points) Assume that the data is 2-dimensional and each attribute can take an integer value from the range [0,199]. Let the hypothesis space H be the set of axis-parallel rectangles in 2 dimensions. Give an upper bound on the number of examples sufficient to assure that any consistent learner will, with probability 99%, output a hypothesis with error at most 0.05.

Hint: Given a region in a plane bounded by the points (0,0) and (n-1,n-1), the number of axis-parallel rectangles with integer-valued boundaries is (n(n+1))^2.

Hint: You can use one of the following two formulas:

m ≥ (1/(2ε^2)) (ln(1/δ) + ln |H|)
m ≥ (1/ε) (ln(1/δ) + ln |H|)

Solution: The hint is wrong. The number of rectangles is (n(n+1)/2)^2. Here, n = 200 and therefore |H| = ((200)(201)/2)^2 = (20100)^2. Since the learner is consistent, we will use the second formula and substitute |H| = (20100)^2, ε = 0.05 and δ = 0.01 to get a bound on m.

3. ( points) Within the setting of PAC learning, it is impossible to assure with probability 1 that the concept will be learned perfectly (i.e., with true error = 0), regardless of how many training examples are provided. True or False. Explain.

Solution: True. PAC learning only assures that the error is less than ε. Therefore, ε cannot be zero.
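Carrying question 2's substitution through to a number (the value of |H| depends on the reconstruction above, so treat the final count as illustrative):

\[
m \;\ge\; \frac{1}{\epsilon}\Big(\ln\frac{1}{\delta} + \ln|H|\Big)
  \;=\; \frac{1}{0.05}\Big(\ln 100 + 2\ln 20100\Big)
  \;\approx\; 20 \times (4.61 + 19.82) \;\approx\; 488.5,
\]

so about 489 training examples suffice.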

3 BAYESIAN NETWORKS

1. ( points) An empty graph is a trivial D-map. True or False. Explain.

Solution: True. A DAG G is a D-map of Pr iff I(Pr) ⊆ I(G). Since an empty graph represents all possible conditional independence statements about the variables, all CI statements in Pr must be included in I(G). Therefore, it is a trivial D-map.

2. ( points) What is the complexity of computing P(E = e) using variable elimination in the following Bayesian network along the ordering (A,B,C,D)? The edges in the Bayesian network are A → B, A → C, B → C, C → D, and D → E.

Solution: The functions after instantiating the evidence variable are P(A), P(B|A), P(C|A,B), P(D|C) and φ(D). The induced graph along the ordering is shown below.

(Figure: the induced graph over A, B, C, D; the drawing did not survive transcription.)

The numbers of children of A, B, C and D are 2, 1, 1 and 0 respectively. Therefore, the width is 2. The complexity is O(n exp(w)) where n is the number of non-evidence variables and w is the width of the ordering. Therefore, the complexity is O(4 exp(2)).

3. ( points) What is the complexity of computing P(E = e) using variable elimination in the following Bayesian network along the ordering (B,C,D,A)? The edges in the Bayesian network are A → B, B → C, C → D, and D → E.

Solution: The functions after instantiating the evidence variable are P(B|A), P(C|B), P(D|C), P(A) and φ(D). The induced graph along the ordering is shown below.

(Figure: the induced graph over B, C, D, A; the drawing did not survive transcription.)

The numbers of children of B, C, D and A are 2, 2, 1 and 0 respectively. Therefore, the width is 2. The complexity is O(n exp(w)) where n is the number of non-evidence variables and w is the width of the ordering. Therefore, the complexity is O(4 exp(2)).
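A small helper (my own sketch, not from the course materials) that mechanizes the width computation used in questions 2 and 3: start from the moral graph of the network with the evidence leaf E removed, eliminate variables in the given order, and record how many neighbors each variable has at the moment it is eliminated; the maximum of these counts is the width w.

    def elimination_widths(undirected_edges, order):
        """Number of neighbors each variable has when it is eliminated."""
        adj = {v: set() for edge in undirected_edges for v in edge}
        for a, b in undirected_edges:
            adj[a].add(b)
            adj[b].add(a)
        widths = {}
        for v in order:
            neighbors = adj.pop(v)
            widths[v] = len(neighbors)
            for u in neighbors:                 # add fill edges among the remaining neighbors
                adj[u].discard(v)
                adj[u] |= neighbors - {u}
        return widths

    # Question 2: moral graph of A->B, A->C, B->C, C->D (E dropped after instantiating E = e).
    print(elimination_widths([("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")],
                             ["A", "B", "C", "D"]))   # {'A': 2, 'B': 1, 'C': 1, 'D': 0}, width 2

    # Question 3: moral graph of the chain A->B, B->C, C->D.
    print(elimination_widths([("A", "B"), ("B", "C"), ("C", "D")],
                             ["B", "C", "D", "A"]))   # {'B': 2, 'C': 2, 'D': 1, 'A': 0}, width 2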

4. (4 points) Consider the Bayesian network given below:

(Figure: a Bayesian network over the nodes A, B, C, D, E, F; the drawing did not survive transcription.)

Are the following statements true? Explain your answer.

(a) (1 point) A is conditionally independent of D given {B,C}.
(b) (1 point) E is (marginally) independent of F.
(c) (2 points) Which edges will you delete to make A marginally independent of C?

Solution:

(a) True. The only edges in the new graph after running the d-sep algorithm given in the class notes are B-A and A-C. Since A and D are disconnected, they are conditionally independent given {B,C}.
(b) False. The graph obtained by running the d-sep algorithm is the same as the original graph except that it is undirected and it does not include the node D.
(c) Remove the edge between A and C. Now A and C are marginally independent.
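For reference, a sketch of the d-separation test used in question 4: restrict the network to the query and evidence variables plus their ancestors, moralize, drop the evidence nodes, and check connectivity. The code and the example edge list are mine; the exam's actual network did not survive transcription, so the printed value only illustrates the procedure.

    def d_separated(edges, X, Y, Z):
        """True iff X and Y are d-separated given Z in the DAG given by (parent, child) edges."""
        parents = {}
        for p, c in edges:
            parents.setdefault(c, set()).add(p)
            parents.setdefault(p, set())

        # 1. Keep only X, Y, Z and their ancestors.
        keep, frontier = set(), set(X) | set(Y) | set(Z)
        while frontier:
            v = frontier.pop()
            if v not in keep:
                keep.add(v)
                frontier |= parents.get(v, set())

        # 2. Moralize (connect co-parents) and drop edge directions.
        undirected = {v: set() for v in keep}
        for p, c in edges:
            if p in keep and c in keep:
                undirected[p].add(c)
                undirected[c].add(p)
        for c in keep:
            ps = list(parents.get(c, set()))
            for i in range(len(ps)):
                for j in range(i + 1, len(ps)):
                    undirected[ps[i]].add(ps[j])
                    undirected[ps[j]].add(ps[i])

        # 3. Remove the evidence nodes Z and test whether X can still reach Y.
        blocked = set(Z)
        seen = set(X) - blocked
        frontier = set(seen)
        while frontier:
            v = frontier.pop()
            if v in Y:
                return False               # a path survives, so not d-separated
            for u in undirected[v] - blocked - seen:
                seen.add(u)
                frontier.add(u)
        return True

    # Hypothetical edges, NOT the exam's figure:
    example_edges = [("B", "A"), ("A", "C"), ("B", "E"), ("C", "F"), ("C", "D")]
    print(d_separated(example_edges, {"A"}, {"D"}, {"B", "C"}))   # True for this example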

4 DECISION TREES

The following is data from the last eight Dallas Mavericks games:

Game#  Opponent  Point Guard  Fouls  Result
1      Weak      Strong       No     Win
2      Strong    Strong       Many   Loss
3      Strong    Weak         Many   Loss
4      Weak      Weak         Many   Loss
5      Strong    Weak         No     Win
6      Weak      Weak         Few    Win
7      Strong    Weak         Few    Loss
8      Strong    Strong       Few    Win

1. (4 points) What is the entropy of the data set? (Result is the class attribute.)

Solution: 4 wins and 4 losses. This implies that the entropy is 1.

2. ( points) What is the information gain if you split the dataset based on the attribute Fouls?

Solution: Gain(Fouls) = 1 - (2/8)(0) - (3/8)(0) - (3/8)[-(2/3) log2(2/3) - (1/3) log2(1/3)] ≈ 0.66.

3. ( points) We tell you that Gain(S, Opponent) = Gain(S, PointGuard) = 0.05. Based on your answer in (b) and this information, which attribute will you choose as the root node for the decision tree? Circle the appropriate option below.

Opponent    PointGuard    Fouls

Solution: The gain of Fouls is greater than that of both Opponent and PointGuard. Therefore, we will choose Fouls.
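A short script that reproduces the three numbers above from the table (a sketch for checking the arithmetic; the helper functions are my own):

    from collections import Counter
    from math import log2

    # (Opponent, Point Guard, Fouls, Result) for games 1-8.
    games = [
        ("Weak",   "Strong", "No",   "Win"),
        ("Strong", "Strong", "Many", "Loss"),
        ("Strong", "Weak",   "Many", "Loss"),
        ("Weak",   "Weak",   "Many", "Loss"),
        ("Strong", "Weak",   "No",   "Win"),
        ("Weak",   "Weak",   "Few",  "Win"),
        ("Strong", "Weak",   "Few",  "Loss"),
        ("Strong", "Strong", "Few",  "Win"),
    ]

    def entropy(labels):
        total = len(labels)
        return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

    def info_gain(rows, attr):
        gain = entropy([r[-1] for r in rows])
        for value in set(r[attr] for r in rows):
            subset = [r[-1] for r in rows if r[attr] == value]
            gain -= (len(subset) / len(rows)) * entropy(subset)
        return gain

    print(entropy([r[-1] for r in games]))   # 1.0
    print(info_gain(games, 2))               # Gain(Fouls)      ~ 0.66
    print(info_gain(games, 0))               # Gain(Opponent)   ~ 0.05
    print(info_gain(games, 1))               # Gain(PointGuard) ~ 0.05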

5 NEURAL NETWORKS and LOGISTIC REGRESSION

(Figure: a scatter plot of four points, two drawn as squares and two as crosses; the plot did not survive transcription.)

1. (5 points) Consider the data set given above. Assume that the coordinates of the points are (1,1), (1,-1), (-1,1) and (-1,-1). Draw a neural network that will have zero training error on this dataset. (Hint: you will need exactly one hidden layer and two hidden nodes.)

Solution: There are many possible solutions to this problem. I describe one way below. Notice that the dataset is not linearly separable. Therefore, we will need at least two hidden units. Intuitively, each hidden unit will represent a line that classifies one of the squares (or crosses) correctly but misclassifies the other. The output unit will resolve the disagreement between the two hidden units.

(Figure: a network with inputs x1 and x2, two hidden units H1 and H2, and an output unit out = φ(x1,x2); the weights in the drawing did not survive transcription.)

All hidden and output units are simple threshold units (aka sign units). Recall that each sign unit outputs a 1 if w0 x0 + w1 x1 + w2 x2 ≥ 0 and a -1 otherwise.
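A concrete instance of this construction, written as a sketch. The original figure (and hence which diagonal is the positive class) is lost, so the code assumes the labeling y(1,1) = y(-1,-1) = +1 and y(1,-1) = y(-1,1) = -1; if the squares sit on the other diagonal, negate the output unit's weights.

    def sign_unit(w0, w1, w2, x1, x2):
        """Threshold (sign) unit: +1 if w0 + w1*x1 + w2*x2 >= 0, else -1."""
        return 1 if w0 + w1 * x1 + w2 * x2 >= 0 else -1

    def network(x1, x2):
        h1 = sign_unit(-1.5,  1.0,  1.0, x1, x2)   # fires only at (1, 1)
        h2 = sign_unit(-1.5, -1.0, -1.0, x1, x2)   # fires only at (-1, -1)
        return sign_unit(1.0, 1.0, 1.0, h1, h2)    # output unit acts as an OR of h1, h2

    data = [((1, 1), 1), ((1, -1), -1), ((-1, 1), -1), ((-1, -1), 1)]
    assert all(network(x1, x2) == y for (x1, x2), y in data)   # zero training error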

2. (5 points) Which of the following will classify the data perfectly, and why? Draw the decision boundary.

Logistic Regression
Logistic Regression with a quadratic kernel

Solution: Logistic Regression with a quadratic kernel. There will be a circle around the squares or a parabola, depending on which quadratic kernel you use.
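A quick check of the claim, again assuming the XOR-style labeling above. The sketch uses scikit-learn with explicit quadratic features standing in for a kernelized logistic regression, which is my simplification rather than the exam's construction:

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.preprocessing import PolynomialFeatures

    X = np.array([[1, 1], [1, -1], [-1, 1], [-1, -1]], dtype=float)
    y = np.array([1, 0, 0, 1])

    linear = LogisticRegression().fit(X, y)
    print(linear.score(X, y))                                 # 0.5: not linearly separable

    X_quad = PolynomialFeatures(degree=2).fit_transform(X)    # adds x1^2, x1*x2, x2^2
    quadratic = LogisticRegression().fit(X_quad, y)
    print(quadratic.score(X_quad, y))                         # 1.0: the x1*x2 feature separates the classes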

6 SUPPORT VECTOR MACHINES

(Figure: a scatter plot of the training points; the plot and several numeric values in this problem did not survive transcription and are shown as "?".)

Consider the dataset given above. In this problem, we will use the data to learn a linear SVM of the form f(x) = sign(w1 x1 + w2 x2 + w3), with w = ?.

1. (4 points) What is the value of w that the SVM algorithm will output on the given data?

Solution: w1 = ?, w2 = ? and w3 = ?.

2. ( points) What is the training set error (expressed as the percentage of training points misclassified)?

Solution: 0%. All points are classified correctly.

3. ( points) What is the 7-fold cross validation error?

Solution: ?/7.
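Because the scatter plot and the numeric answers are lost, the sketch below only illustrates how each quantity in this problem would be computed, using scikit-learn and a made-up separable dataset of seven points; none of the printed numbers are the exam's answers.

    import numpy as np
    from sklearn.svm import SVC
    from sklearn.model_selection import LeaveOneOut, cross_val_score

    # Made-up, linearly separable 2-D points (7 of them so that 7-fold CV is leave-one-out).
    X = np.array([[1, 1], [2, 1], [1, 2], [3, 3], [4, 2], [2, 4], [3, 2]], dtype=float)
    y = np.array([-1, -1, -1, 1, 1, 1, 1])

    svm = SVC(kernel="linear", C=1e6).fit(X, y)        # large C approximates a hard-margin SVM
    (w1, w2), w3 = svm.coef_[0], svm.intercept_[0]
    print("w1, w2, w3:", w1, w2, w3)

    print("training error:", np.mean(svm.predict(X) != y))

    scores = cross_val_score(SVC(kernel="linear", C=1e6), X, y, cv=LeaveOneOut())
    print("7-fold CV error:", 1 - scores.mean())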

7 CLUSTERING

1. (5 points) Starting with two cluster centers indicated by squares, perform k-means clustering on the six data points (denoted by crosses). In each panel, indicate the data assignment and in the next panel show the new cluster means. Stop when converged or after 6 steps, whichever comes first.

(Figure: six empty panels with axes running from 0 to 4, for drawing the successive assignments and cluster means; the data points and initial centers did not survive transcription.)

Solution: k-means will converge in two steps. In the first step, the points at the top will be assigned to the cluster center at the top and the points at the bottom will be assigned to the cluster center at the bottom. In the second step, the new cluster centers will be the mean of the points at the top and the points at the bottom respectively.

2. ( points) Explain at a high level how k-means differs from the EM algorithm.

Solution: k-means yields a hard clustering while EM yields a soft clustering.
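A sketch that mirrors the situation described in the solution, using made-up points with an obvious top group and bottom group (the exam's six points are lost). It also illustrates the hard vs. soft distinction from question 2 by fitting a Gaussian mixture with EM on the same data:

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.mixture import GaussianMixture

    points = np.array([[1.0, 3.0], [2.0, 3.5], [3.0, 3.0],    # top group (made up)
                       [1.0, 1.0], [2.0, 0.5], [3.0, 1.0]])   # bottom group (made up)
    init_centers = np.array([[2.0, 4.0], [2.0, 0.0]])         # the two "squares"

    kmeans = KMeans(n_clusters=2, init=init_centers, n_init=1).fit(points)
    print(kmeans.labels_)              # hard assignments: each point belongs to exactly one cluster
    print(kmeans.cluster_centers_)     # the means of the top and bottom groups
    print(kmeans.n_iter_)              # number of Lloyd iterations until convergence (small here)

    gmm = GaussianMixture(n_components=2, means_init=init_centers).fit(points)
    print(gmm.predict_proba(points).round(3))   # soft responsibilities for each point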

3. ( points) Draw the agglomerative clustering tree for the following data set. You must use single-link clustering.

(Figure: a scatter plot of the data set, with axes running from 0 to 4; the points did not survive transcription.)

Solution: Some judgement calls here. Many solutions are possible.
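Since the data set is lost, here is a sketch of how the single-link tree would be built for an arbitrary small point set, using SciPy; the coordinates are mine, not the exam's.

    import numpy as np
    from scipy.cluster.hierarchy import linkage, dendrogram

    points = np.array([[0.5, 0.5], [1.0, 1.0], [3.0, 0.5],    # made-up coordinates
                       [3.5, 1.0], [2.0, 3.5], [2.5, 4.0]])

    Z = linkage(points, method="single")   # single link: merge the clusters with the closest pair of points
    print(Z)                               # each row: the two clusters merged, their distance, new cluster size

    # dendrogram(Z) draws the agglomerative clustering tree (requires matplotlib).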