Stat 602X Exam 2 Spring 2011

I have neither given nor received unauthorized assistance on this exam.

Name Signed ______________________    Date ____________

Name Printed ______________________

1. Below is a small p = 2 classification training set (for 2 classes) displayed in graphical and tabular forms (circles are class 1 and squares are class 2).

[Plot and table of the training values (y, x1, x2); the entries are garbled in this transcription.]

a) Using the geometry above (and not trying to solve an optimization problem analytically), find the support vector machine for this classification problem. You may find it helpful to know that if u, v, and w are points in R^2 and u ≠ v, then the distance from the point w to the line through u and v is

    ‖ (w − v) − [ ((w − v)'(u − v)) / ((u − v)'(u − v)) ] (u − v) ‖

b) List the set of support vectors and the "margin" (M) for your SVM.
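A minimal computational sketch of part a), assuming hypothetical coordinates in place of the garbled training points: the helper below implements the quoted point-to-line distance formula, and scikit-learn's linear SVC (with a very large cost C to mimic a hard margin) is used to read off the support vectors and the margin M = 1/‖w‖. The data values, class coding, and C are assumptions, not the exam's.

# Sketch only: the exam's actual training points are garbled above, so the
# coordinates below are hypothetical stand-ins for a 2-class, p = 2 problem.
import numpy as np
from sklearn.svm import SVC

def dist_to_line(w, u, v):
    """Distance from point w to the line through u and v (u != v),
    via the projection formula quoted in the hint."""
    u, v, w = map(np.asarray, (u, v, w))
    d = u - v
    resid = (w - v) - ((w - v) @ d) / (d @ d) * d
    return np.linalg.norm(resid)

# Hypothetical circles (class 1) and squares (class 2)
X = np.array([[1., 1.], [2., 1.], [1., 2.],   # class 1
              [4., 4.], [5., 4.], [4., 5.]])  # class 2
y = np.array([1, 1, 1, 2, 2, 2])

# A very large C approximates the hard-margin SVM appropriate for separable data
svm = SVC(kernel="linear", C=1e6).fit(X, y)
w_hat = svm.coef_[0]

print("support vectors:\n", svm.support_vectors_)
print("margin M = 1/||w|| =", 1.0 / np.linalg.norm(w_hat))
print("distance-helper check:", dist_to_line([4., 4.], [1., 1.], [2., 1.]))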

2. Below is a different p = 2 classification training set for 2 classes (specified as in problem 1).

[Plot and table of the training values (y, x1, x2); the entries are garbled in this transcription.]

a) Using empirical misclassification rate as your splitting criterion and standard forward selection, find a reasonably simple binary tree classifier that has empirical error rate 0. Carefully describe it below, using as many nodes as you need. (A computational sketch of this greedy search is given after part b).)

At the root node: split on x1 or x2 (circle the correct one of these)
    Classify to Class ____ if ____________________ (creating Node #1)
    Classify to Class ____ otherwise (creating Node #2)

At Node #____: split on x____
    Classify to Class ____ if ____________________ (creating Node #3)
    Classify to Class ____ otherwise (creating Node #4)

At Node #____: split on x____
    Classify to Class ____ if ____________________ (creating Node #5)
    Classify to Class ____ otherwise (creating Node #6)

At Node #____: split on x____
    Classify to Class ____ if ____________________ (creating Node #7)
    Classify to Class ____ otherwise (creating Node #8)

At Node #____: split on x____
    Classify to Class ____ if ____________________ (creating Node #9)
    Classify to Class ____ otherwise (creating Node #10)

At Node #____: split on x____
    Classify to Class ____ if ____________________ (creating Node #11)
    Classify to Class ____ otherwise (creating Node #12)

(Add more of these on another page if you need them.)

b) Draw in the final set of rectangles corresponding to your binary tree on the graph of the previous page.
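The sketch referred to in part a): a greedy, misclassification-rate-driven splitter of the kind the forward selection above describes, run on a hypothetical (y, x1, x2) training set standing in for the garbled table. The data, the depth limit, and the tie-breaking rule (first best split found) are assumptions.

# Sketch only: greedy tree growing by empirical misclassification rate,
# on hypothetical data in place of the garbled training set above.
import numpy as np

def misclass(labels):
    """Misclassification count if a node predicts its majority class."""
    if len(labels) == 0:
        return 0
    _, counts = np.unique(labels, return_counts=True)
    return len(labels) - counts.max()

def best_split(X, y):
    """Best axis-aligned split (err, variable j, cutpoint c) by misclassification."""
    best = None
    for j in range(X.shape[1]):
        for c in np.unique(X[:, j]):
            left = X[:, j] <= c
            err = misclass(y[left]) + misclass(y[~left])
            if best is None or err < best[0]:
                best = (err, j, c)
    return best

def grow(X, y, depth=0, max_depth=3):
    if misclass(y) == 0 or depth == max_depth:
        vals, counts = np.unique(y, return_counts=True)
        print("  " * depth, "leaf -> class", vals[counts.argmax()])
        return
    err, j, c = best_split(X, y)
    print("  " * depth, f"split on x{j + 1} <= {c} (err after split = {err})")
    left = X[:, j] <= c
    grow(X[left], y[left], depth + 1, max_depth)
    grow(X[~left], y[~left], depth + 1, max_depth)

# Hypothetical training set: class 2 occupies the lower-left corner
X = np.array([[1, 1], [2, 2], [1, 5], [2, 6], [5, 1], [6, 2], [5, 5], [6, 6]], float)
y = np.array([2, 2, 1, 1, 1, 1, 1, 1])
grow(X, y)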

c) For every sub-tree, T, of your full binary tree above, list in the table below the size (number of final nodes) of the sub-tree, |T|, and the empirical error rate of its associated classifier.

Full tree pruned at Nodes #        |T| (pruned tree size)        err
None (full tree)                   ______                        ______

(If you need more room in the table, add rows on another page.)

d) Using the values in your table from c), find for every α ≥ 0 a sub-tree of your full tree minimizing

    C_α(T) = err(T) + α |T|
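A small sketch of the minimization in part d), assuming a hypothetical (|T|, err) table; the real entries come from the table in part c). For each α on a grid it evaluates C_α(T) = err(T) + α|T| for every listed sub-tree and reports the minimizer.

# Sketch only: evaluating C_alpha(T) = err(T) + alpha * |T| over a hypothetical
# version of the (|T|, err) table from part c).
subtrees = {                    # name: (|T| = number of final nodes, empirical err)
    "full tree": (6, 0.00),
    "pruned at Node 5": (5, 0.10),
    "pruned at Nodes 3 and 5": (3, 0.20),
    "root only": (1, 0.40),
}

for alpha in [0.0, 0.02, 0.05, 0.10, 0.50]:
    best = min(subtrees, key=lambda t: subtrees[t][1] + alpha * subtrees[t][0])
    size, err = subtrees[best]
    print(f"alpha = {alpha:4.2f}: minimizer is '{best}' "
          f"with C_alpha = {err + alpha * size:.3f}")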

3. Here again is the p = 2 classification training set specified in problem 2.

[Plot and table of the training values (y, x1, x2), repeated from problem 2; the entries are garbled in this transcription.]

Using "stumps" (binary trees with only 2 final nodes) as your base classifiers, find the M-term AdaBoost classifier for this problem. (Show your work!)
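A sketch of the AdaBoost recursion with stump base classifiers, in one common ±1-coded parameterization, run on hypothetical data in place of the garbled training set. The data, the number of terms M, and the weight/alpha conventions shown are assumptions rather than the exam's specification.

# Sketch only: discrete AdaBoost with stump base classifiers on hypothetical data.
# Labels are coded as +/-1, as in the usual AdaBoost development.
import numpy as np

def fit_stump(X, y, w):
    """Weighted-error-minimizing stump: (err, feature j, cut c, sign s)."""
    best = None
    for j in range(X.shape[1]):
        for c in np.unique(X[:, j]):
            for s in (+1, -1):
                pred = np.where(X[:, j] <= c, s, -s)
                err = w[pred != y].sum()
                if best is None or err < best[0]:
                    best = (err, j, c, s)
    return best

def adaboost(X, y, M):
    n = len(y)
    w = np.full(n, 1.0 / n)
    ensemble = []
    for m in range(M):
        err, j, c, s = fit_stump(X, y, w)
        alpha = 0.5 * np.log((1 - err) / max(err, 1e-12))   # one common parameterization
        pred = np.where(X[:, j] <= c, s, -s)
        w = w * np.exp(-alpha * y * pred)                    # reweight the training cases
        w /= w.sum()
        ensemble.append((alpha, j, c, s))
        print(f"m={m + 1}: split x{j + 1} <= {c}, weighted err={err:.3f}, alpha={alpha:.3f}")
    return ensemble

def predict(ensemble, X):
    score = sum(a * np.where(X[:, j] <= c, s, -s) for a, j, c, s in ensemble)
    return np.sign(score)

X = np.array([[1, 5], [2, 4], [3, 6], [6, 1], [7, 2], [5, 3]], float)  # hypothetical
y = np.array([+1, +1, -1, -1, +1, -1])
ens = adaboost(X, y, M=3)
print("training predictions:", predict(ens, X))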

4. The machine learning/data mining folklore is full of statements like "combining uncorrelated classifiers through majority voting produces a committee classifier better than every individual in the committee." Consider the scenario outlined in the table below as regards classifiers f1, f2, and f3 and a target (class variable) y taking values in {0, 1}.

Outcome    f1    f2    f3    Committee Classification    y    Probability
[16 rows enumerating the possible combinations of (f1, f2, f3, y) values, each with a probability; the 0/1 entries and the probabilities are garbled in this transcription.]

Fill in the (majority vote) Committee Classification column and say carefully (show appropriate calculations to support your statement(s)) what the example shows about the folklore.
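A sketch of the kind of calculation the problem asks for, with a hypothetical (here uniform) probability column standing in for the garbled table: it fills in the majority-vote committee classification for each outcome and compares the committee's error probability with each individual classifier's. The probabilities are placeholders only; the exam's conclusion depends on the actual table.

# Sketch only: each row is (f1, f2, f3, y, probability); probabilities sum to 1.
from itertools import product

# Hypothetical probabilities: uniform over the 16 (f1, f2, f3, y) outcomes.
rows = [(f1, f2, f3, y, 1.0 / 16) for f1, f2, f3, y in product([0, 1], repeat=4)]

def majority(f1, f2, f3):
    """Majority-vote committee classification."""
    return int(f1 + f2 + f3 >= 2)

committee_err = sum(p for f1, f2, f3, y, p in rows if majority(f1, f2, f3) != y)
individual_errs = [sum(r[4] for r in rows if r[k] != r[3]) for k in range(3)]

print("individual error rates:", individual_errs)
print("majority-vote committee error rate:", committee_err)
# Whether the committee beats every individual depends entirely on the joint
# probabilities, which is exactly what the exam problem is probing.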

5. Suppose that in a p = 2 linear discriminant analysis problem, the 4 transformed means μ*_1, μ*_2, μ*_3, and μ*_4 are as given below, and that these have sample covariance matrix of diagonal form diag(4.75, ·).

[The coordinates of the four transformed means and the second diagonal entry of their sample covariance matrix are garbled in this transcription.]

a) Suppose that one wants to do reduced rank (rank 1) linear discrimination based on a single real variable

    w = u'x = u1 x1 + u2 x2.

Identify an appropriate vector u = (u1, u2)', and with your choice of vector, give the function f(w) mapping R → {1, 2, 3, 4} that defines this 4-class classifier.

b) What set of "prototypes" (values of w) yields the classifier in a)?
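A sketch of rank-1 reduced-rank discrimination of the sort part a) describes, with hypothetical transformed class means standing in for the garbled values: u is taken as the leading eigenvector of the sample covariance of the class means (the direction in which the means spread most), the projected means u'μ*_k serve as the prototypes of part b), and f(w) classifies to the class with the nearest prototype. The numerical means and the test point are assumptions.

# Sketch only: rank-1 discrimination with hypothetical transformed class means.
import numpy as np

mu_star = np.array([[0.0, 0.5],     # hypothetical mu*_1
                    [4.0, 0.5],     # hypothetical mu*_2
                    [2.0, 0.6],     # hypothetical mu*_3
                    [6.0, 0.4]])    # hypothetical mu*_4

# Direction u: leading eigenvector of the sample covariance of the class means.
S = np.cov(mu_star.T)
eigvals, eigvecs = np.linalg.eigh(S)
u = eigvecs[:, np.argmax(eigvals)]

prototypes = mu_star @ u            # the w-values that answer part b)

def f(w):
    """Classify a projected value w to the class with the nearest prototype."""
    return int(np.argmin(np.abs(prototypes - w))) + 1

x_new = np.array([3.8, 0.5])        # hypothetical transformed input
w_new = u @ x_new
print("u =", u, " prototypes =", prototypes, " f(w_new) =", f(w_new))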

6. In classification problems, searching for a small number of effective basis functions h1, h2, ..., hk to use in transforming a complicated situation involving an input x ∈ R^p into a simpler situation involving the "feature vector" (h1(x), h2(x), ..., hk(x)) ∈ R^k is a major concern.

a) The p = 1 toy classification problem with K = 2 classes and training data below is not simple.

[Table of the training values (y, x); the entries are garbled in this transcription.]

But it's easy to find a single function h(x) that makes the problem into a linearly separable problem with k = p = 1. Name one such function.

b) In what specific way(s) does the use of kernels and SVM methodology typically lead to identification of a small number of important features (basis functions) that are effective in 2-class classification problems?

c) If you didn't have available SVM software but DID have available Lasso or LAR regression software, say how you might use the kernel idea and the Lasso/LAR software in the search for a few effective features in a 2-class classification problem.
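A sketch of the approach part c) invites, under assumed data and tuning choices: kernel basis functions hi(x) = K(x, x_i) are built at the training points, and ordinary Lasso software is then used to drive most of their coefficients to zero, leaving a few selected features. The data set, the RBF kernel, its bandwidth, and the penalty level are all hypothetical.

# Sketch only: kernel basis functions centered at the training points, with the
# Lasso's zeroed coefficients discarding most of them.
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 1))                      # hypothetical p = 1 inputs
y = np.where(np.abs(X[:, 0]) > 1.2, 1.0, -1.0)    # +/-1 coded 2-class response

H = rbf_kernel(X, X, gamma=1.0)   # column i is the basis function K(., x_i)
fit = Lasso(alpha=0.05).fit(H, y)

selected = np.flatnonzero(fit.coef_)
print("kept", len(selected), "of", H.shape[1], "kernel basis functions:", selected)
# A new point is then classified by the sign of the fitted linear combination
# of the selected kernel columns.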

7. Consider a Bayesian model averaging problem where x takes values in {0, 1} and y takes values in {0, 1}. The quantity of interest is P[y = 1 | x = 1] / P[y = 0 | x = 1], and there are M = 2 models under consideration. We'll suppose that joint probabilities for (x, y) are as given in the tables below for the two models, for some p ∈ (0, 1) and r ∈ (0, 1),

Model 1                                Model 2
 y \ x       0           1              y \ x       0           1
   0      (1 − p)/2     p/2               0         .25         .25
   1         .25        .25               1      (1 − r)/2      r/2

so that under Model 1 the quantity of interest is .5/p and under Model 2 it is r/.5.

Suppose that under both models, training data (x_i, y_i), i = 1, 2, ..., N, are iid. For priors, in Model 1 suppose that a priori p ~ Beta(·,·), and in Model 2 suppose that a priori r ~ Beta(·,·). Further, suppose that the prior probabilities of the two models are .5.

Find the posterior probabilities of the two models given the training data T, and the Bayes model average squared error loss predictor of P[y = 1 | x = 1] / P[y = 0 | x = 1]. (You may think of the training data as summarized in the 4 counts N_xy = the number of training vectors (x_i, y_i) with value (x, y).)
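A numerical sketch of this calculation under stated assumptions: the Beta prior parameters are garbled above, so uniform Beta(1,1) priors are assumed, and the counts N_xy are hypothetical. The code integrates each model's likelihood over its parameter on a grid to get marginal likelihoods, converts these (with the .5/.5 prior model probabilities) into posterior model probabilities, and forms the posterior-mean (squared error loss) model-averaged predictor of P[y = 1 | x = 1] / P[y = 0 | x = 1].

# Sketch only: Beta(1,1) priors assumed; N_xy counts are hypothetical.
import numpy as np

N = {(0, 0): 3, (1, 0): 5, (0, 1): 4, (1, 1): 4}    # hypothetical counts N_xy

grid = np.linspace(1e-4, 1 - 1e-4, 20001)
dgrid = grid[1] - grid[0]
prior = np.ones_like(grid)                          # assumed Beta(1,1) density

def lik_model1(p):
    # Model 1 cells: P(0,0) = (1-p)/2, P(1,0) = p/2, P(0,1) = P(1,1) = .25
    return (((1 - p) / 2) ** N[(0, 0)] * (p / 2) ** N[(1, 0)]
            * 0.25 ** (N[(0, 1)] + N[(1, 1)]))

def lik_model2(r):
    # Model 2 cells: P(0,0) = P(1,0) = .25, P(0,1) = (1-r)/2, P(1,1) = r/2
    return (0.25 ** (N[(0, 0)] + N[(1, 0)])
            * ((1 - r) / 2) ** N[(0, 1)] * (r / 2) ** N[(1, 1)])

m1 = np.sum(lik_model1(grid) * prior) * dgrid       # marginal likelihood, Model 1
m2 = np.sum(lik_model2(grid) * prior) * dgrid       # marginal likelihood, Model 2
pi1, pi2 = m1 / (m1 + m2), m2 / (m1 + m2)           # posterior model probabilities

# Posterior means of the quantity of interest (.5/p under Model 1, r/.5 under Model 2)
q1 = np.sum((0.5 / grid) * lik_model1(grid) * prior) * dgrid / m1
q2 = np.sum((grid / 0.5) * lik_model2(grid) * prior) * dgrid / m2

bma = pi1 * q1 + pi2 * q2                           # Bayes model average predictor
print(f"posterior model probs: {pi1:.3f}, {pi2:.3f};  BMA predictor: {bma:.3f}")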