Stat 342 Exam 3 Fall 2014

Transcription:

Stat 342 Exam 3 Fall 2014

I have neither given nor received unauthorized assistance on this exam.

Name Signed ____________________  Date ____________________

Name Printed ____________________

There are 11 questions on the following 6 pages. Do as many of them as you can in the available time. I will score each question out of 10 points AND TOTAL YOUR BEST 7 SCORES. (That is, this is a 70 point exam.)

1. This is a "no-intercept one-variable linear regression" linear model problem. That is, suppose y_i = βx_i + ε_i for i = 1, 2, ..., N, where the ε_i are iid N(0, σ²). The N = 5 training cases are in the following table.

x:  3    4    5    6    7
y:  6    8.5    0.5    3

10 pts  a) Find maximum likelihood estimates of the two (real-valued) parameters β and σ².

10 pts  b) Give 95% two-sided prediction limits for an additional/future y based on a new value of the predictor, x = 5.5. (If you don't have an appropriate table of distribution percentage points with you, indicate exactly what percentage point of exactly what distribution (including d.f., etc.) you would put where in your calculation.)
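
For a no-intercept normal linear model like this, the maximum likelihood slope estimate is Σxᵢyᵢ / Σxᵢ², and two-sided prediction limits at a new x use a t quantile with N − 1 degrees of freedom. The Python sketch below shows that arithmetic; the y vector in it is an arbitrary placeholder (not the exam's data, part of which is illegible above), so it only illustrates the method.

```python
import numpy as np
from scipy import stats

# x is from the problem; the y values are arbitrary placeholders, NOT the exam's data
x = np.array([3.0, 4.0, 5.0, 6.0, 7.0])
y = np.array([6.0, 8.5, 9.5, 12.0, 14.5])

n = len(x)
beta_hat = np.sum(x * y) / np.sum(x ** 2)      # MLE of beta in y_i = beta*x_i + eps_i
resid = y - beta_hat * x
sigma2_mle = np.sum(resid ** 2) / n            # MLE of sigma^2 (divides by n)
s2 = np.sum(resid ** 2) / (n - 1)              # unbiased variance estimate used in the t interval

x_new = 5.5
y_hat = beta_hat * x_new
# Var(y_new - beta_hat * x_new) = sigma^2 * (1 + x_new^2 / sum(x_i^2))
se_pred = np.sqrt(s2 * (1.0 + x_new ** 2 / np.sum(x ** 2)))
t_crit = stats.t.ppf(0.975, df=n - 1)          # 97.5th percentile of t with n - 1 d.f.

print(f"beta_hat = {beta_hat:.3f}, sigma2_mle = {sigma2_mle:.3f}")
print(f"95% prediction limits at x = 5.5: "
      f"({y_hat - t_crit * se_pred:.2f}, {y_hat + t_crit * se_pred:.2f})")
```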

2. One needs to do classification based on two discrete variables x₁ and x₂. A large training set (with N = 100,000) is summarized below in terms of counts of training cases with (yᵢ, x₁ᵢ, x₂ᵢ) of various types. (We implicitly assume these are based on an iid sample from the joint distribution of (y, x₁, x₂).)

y = 0 cases:
             x₂ = 1    x₂ = 2    x₂ = 3
   x₁ = 1     9,000     3,000     0,000
   x₁ = 2     6,000     4,000     8,000

y = 1 cases:
             x₂ = 1    x₂ = 2    x₂ = 3
   x₁ = 1     4,000      ,000     5,000
   x₁ = 2     3,000      ,000     5,000

10 pts  a) Of interest is an approximately optimal classifier based on the 100,000 training cases. Fill in the table below with either a 0 or a 1 in each cell to indicate whether such a classifier classifies to y = 0 or to y = 1 for that (x₁, x₂) pair.

Approximately optimal classifier:
             x₂ = 1    x₂ = 2    x₂ = 3
   x₁ = 1
   x₁ = 2

10 pts  b) Compute a (0-1 loss) training error rate, err, for your choice of classifier in a).
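
With a training set summarized by cell counts like this, an approximately optimal classifier simply assigns each (x₁, x₂) cell to the class with the larger count there, and its 0-1 loss training error is the total minority-class count divided by N. A minimal Python sketch of that bookkeeping follows; the count tables in it are hypothetical stand-ins (several of the exam's counts above are incomplete), chosen only so that they total 100,000.

```python
import numpy as np

# Hypothetical count tables (rows: x1 = 1, 2; columns: x2 = 1, 2, 3) -- NOT the exam's counts
counts_y0 = np.array([[20000,  5000,  9000],
                      [ 7000,  6000,  3000]])
counts_y1 = np.array([[ 4000, 15000, 11000],
                      [ 8000,  2000, 10000]])

N = counts_y0.sum() + counts_y1.sum()          # 100,000 here

# Approximately optimal rule: classify each (x1, x2) cell to its majority class
classify_to = (counts_y1 > counts_y0).astype(int)
print("classify to y =")
print(classify_to)

# 0-1 loss training error: every minority-class case in a cell is misclassified
misclassified = np.where(classify_to == 1, counts_y0, counts_y1).sum()
print("training error rate err =", misclassified / N)
```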

3. Here is a toy problem. Consider a 1-nn classification scheme based on a 1-dimensional input x. Suppose that a size N = 5 training set of pairs (y, x) is as below.

T = {(0,5), (1,6), (1,7), (0,7), (1,8)}

Then suppose that 3 bootstrap samples are

T*1 = {(0,5), (1,6), (0,7), (1,8), (1,8)}
T*2 = {(0,5), (0,5), (1,6), (1,7), (0,7)}
T*3 = {(1,7), (1,7), (0,7), (1,8), (1,8)}

10 pts  a) Find the values of the 1-nn classifiers ŷ*b(x) based on each of the bootstrap samples for the training x's and record them below. (Break "ties" any way you wish.) Then give values for the bootstrap classifier ŷ_boot(x).

             x = 5    x = 6    x = 7    x = 8
ŷ*1(x)
ŷ*2(x)
ŷ*3(x)
ŷ_boot(x)

10 pts  b) Find the OOB (out-of-bag) (0-1 loss) error rate for 1-nn classification based on this small number of bootstrap samples.
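
Mechanically, each ŷ*b is the 1-nn rule built from one bootstrap sample, ŷ_boot is their majority vote, and the OOB error judges each training case using only the bootstrap samples that omit it. The Python sketch below walks through those steps; the training set and bootstrap draws in it are the lists as read above (that reading is an assumption about the partially garbled transcription), so treat the output as illustrative.

```python
import numpy as np

# (y, x) pairs as read above -- treat these lists as assumptions, not authoritative data
T  = [(0, 5), (1, 6), (1, 7), (0, 7), (1, 8)]
B1 = [(0, 5), (1, 6), (0, 7), (1, 8), (1, 8)]
B2 = [(0, 5), (0, 5), (1, 6), (1, 7), (0, 7)]
B3 = [(1, 7), (1, 7), (0, 7), (1, 8), (1, 8)]
boots = [B1, B2, B3]

def one_nn(sample, x):
    """1-nn prediction at x; distance ties broken by majority label (toward class 1 if even)."""
    d = min(abs(xs - x) for _, xs in sample)
    labels = [ys for ys, xs in sample if abs(xs - x) == d]
    return int(2 * sum(labels) >= len(labels))

# Bootstrap classifiers yhat*b and the bagged (majority-vote) classifier at the training x's
for x in [5, 6, 7, 8]:
    votes = [one_nn(b, x) for b in boots]
    print(f"x = {x}: yhat*1..3 = {votes}, yhat_boot = {int(2 * sum(votes) >= len(votes))}")

# OOB error: each training case is predicted only by bootstrap samples that do not contain it
losses = []
for case in T:
    y, x = case
    preds = [one_nn(b, x) for b in boots if case not in b]
    if preds:                                   # skip a case that appears in every bootstrap sample
        vote = int(2 * sum(preds) >= len(preds))
        losses.append(int(vote != y))
print("OOB (0-1 loss) error rate:", np.mean(losses))
```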

10 pts  4. For the same toy scenario as in the previous problem (1-nn classification based on the training set T = {(0,5), (1,6), (1,7), (0,7), (1,8)}), find the 5-fold (0-1 loss) cross-validation error rate, CV.

10 pts  5. Below is another training set involving a 1-d predictor x. Argue carefully that there is no support vector classifier based only on x that has (0-1 loss) training error rate err = 0. Then identify (by making a graph, not doing calculus) a maximum margin support vector classifier (with err = 0) based on x₁ = x and x₂ = x².

y   0   0   0   0   0
x   4   3   0   3   4

Rationale for no perfect SV classifier based on x alone:

Maximum margin classifier based on (x₁, x₂) = (x, x²):   ŷ(x₁, x₂) = I[ __________ ]
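
For Problem 4, note that with N = 5 the 5-fold cross-validation is exactly leave-one-out: each case is predicted by the 1-nn rule fit to the other four cases, and CV is the average 0-1 loss over the five held-out predictions. A brief Python check of that computation (using the training set as read above, so again an assumption rather than authoritative data):

```python
import numpy as np

T = [(0, 5), (1, 6), (1, 7), (0, 7), (1, 8)]    # training set as read above (an assumption)

def one_nn(sample, x):
    """1-nn prediction at x; distance ties broken by majority label (toward class 1 if even)."""
    d = min(abs(xs - x) for _, xs in sample)
    labels = [ys for ys, xs in sample if abs(xs - x) == d]
    return int(2 * sum(labels) >= len(labels))

# With N = 5 cases, 5-fold CV holds out exactly one case per fold (leave-one-out)
losses = []
for i, (y, x) in enumerate(T):
    fit_set = T[:i] + T[i + 1:]                 # the other four cases
    losses.append(int(one_nn(fit_set, x) != y))
print("5-fold (leave-one-out) CV error rate:", np.mean(losses))
```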

10 pts  6. In a logistic regression problem with two standardized predictors x₁ and x₂, the model

ln( P[y = 1 | x] / P[y = 0 | x] ) = β₀ + β₁x₁ + β₂x₂

is fit twice, once by maximum likelihood and once by Lasso-penalized maximum likelihood. The two sets of fitted coefficients produced were

β̂₀ = .44,  β̂₁ = .98,  β̂₂ = .66    (run #1)
β̂₀ = 0.5,  β̂₁ = .4,   β̂₂ = .8     (run #2)

Which of the two sets of coefficients do you think is the set of Lasso coefficients and why?

The fits were made using a data set with about the same numbers of y = 0 and y = 1 data points. Based on run #__, what classifier would be appropriate for a case where it's known that P[y = 1] = 0.__?

10 pts  7. Suppose that 5 different classifiers are fit to a data set. If one wishes to make an ensemble classifier using weights w₁ = .3, w₂ = .__, w₃ = .__, w₄ = .5, and w₅ = .5, what "ensemble" classification is produced by each of the following sets of individual classifier results? (Write 0 or 1 in the empty cell for each set of ŷ's. Write "0/1" for any "ties.")

     ŷ₁    ŷ₂    ŷ₃    ŷ₄    ŷ₅    Ensemble
     (rows of 0/1 results from the individual classifiers, with each row's Ensemble cell left blank to be filled in)
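
For Problem 7, the ensemble rule compares the total weight attached to classifiers voting ŷ = 1 against the total weight attached to those voting ŷ = 0, and reports "0/1" when the two totals are equal. A minimal Python sketch of that weighted vote follows; the weights and voting patterns in it are illustrative stand-ins (the exam's values are only partly legible above), with the weights written in hundredths so the tie comparison is exact integer arithmetic.

```python
# Weighted-vote ensemble of 5 classifiers; weights (in hundredths, totaling 100) and the
# voting patterns below are illustrative stand-ins, not the exam's values
weights = [30, 20, 20, 15, 15]

def ensemble_vote(yhats, w=weights):
    """Return 1, 0, or '0/1' (a tie) from individual 0/1 votes and their weights."""
    w1 = sum(wi for wi, yi in zip(w, yhats) if yi == 1)
    w0 = sum(wi for wi, yi in zip(w, yhats) if yi == 0)
    if w1 > w0:
        return 1
    if w0 > w1:
        return 0
    return "0/1"

for pattern in [(1, 0, 0, 1, 1), (0, 1, 1, 0, 0), (1, 1, 0, 0, 0)]:
    print(pattern, "->", ensemble_vote(pattern))   # the last pattern is an exact tie here
```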

10 pts  8. Below is a p = 2 classification training set for 2 classes.

y  x₁  x₂
0 0 0 3 5 0 7 0 0 7 3 0 8 6 0 5 4 7 5 7 7 5 6 3 7 3 0 5 7 6 7 9

Using empirical misclassification rate as your splitting criterion and forward selection, find a reasonably simple binary tree classifier that has empirical error rate 0. Carefully describe it below, using as many nodes as you need. Then draw in the final set of rectangles corresponding to your binary tree on the graph above.

At the root node: split on x₁ / x₂ (circle the correct one of these)

Classify to Class 0 if __________ (creating Node #1)
Classify to Class 1 otherwise (creating Node #2)
Classify to Class 0 if __________ (creating Node #3)
Classify to Class 1 otherwise (creating Node #4)
Classify to Class 0 if __________ (creating Node #5)
Classify to Class 1 otherwise (creating Node #6)
Classify to Class 0 if __________ (creating Node #7)
Classify to Class 1 otherwise (creating Node #8)
Classify to Class 0 if __________ (creating Node #9)
Classify to Class 1 otherwise (creating Node #10)
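
Grown by forward selection with empirical misclassification rate as the splitting criterion, such a tree repeatedly takes the single best axis-aligned split (a threshold on x₁ or on x₂) and then recurses inside each resulting rectangle until every rectangle is pure. The Python sketch below implements that greedy search on a small made-up 2-d data set (not the exam's table, which did not survive transcription intact) just to show the mechanics; it prints the splits and leaf classes it finds.

```python
import numpy as np

def misclass(y):
    """Misclassification count if a node predicts its majority class."""
    return min(np.sum(y == 0), np.sum(y == 1))

def grow(X, y, depth=0):
    """Greedy tree: choose the (variable, threshold) split minimizing total misclassification."""
    if misclass(y) == 0:                          # pure node: stop and predict its class
        print("  " * depth + f"leaf -> class {int(y[0])}")
        return
    best = None
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            left = X[:, j] <= t
            if left.all() or not left.any():      # a split must send cases both ways
                continue
            cost = misclass(y[left]) + misclass(y[~left])
            if best is None or cost < best[0]:
                best = (cost, j, t, left)
    cost, j, t, left = best
    print("  " * depth + f"split on x{j + 1} <= {t}")
    grow(X[left], y[left], depth + 1)
    grow(X[~left], y[~left], depth + 1)

# Made-up 2-d training set (NOT the exam's data), chosen so two splits reach error rate 0
X = np.array([[1, 1], [2, 6], [6, 1], [7, 2], [1, 7], [6, 6], [7, 5], [5, 7]])
y = np.array([0, 0, 0, 0, 0, 1, 1, 1])

grow(X, y)
```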