Problem 1 (20 pt) Answer the following questions, and provide an explanation for each question.

(a) (5 pt) Can linear regression work when all X values are the same? When all Y values are the same?
(b) (5 pt) Can linear regression be used when the X values are actually categories?
(c) (5 pt) Will the regression line be the same if you exchange X and Y?
(d) (5 pt) Under what circumstances will there be over-fitting in linear regression?
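
Not part of the exam: a quick numerical illustration related to part (a), assuming simple one-variable least squares, where the fitted slope is cov(X, Y)/var(X). Identical X values make that ratio undefined, while identical Y values simply give a slope of zero.

    # A minimal sketch (not part of the exam): the closed-form simple-regression
    # slope is cov(x, y) / var(x), so it is undefined when all x values are equal,
    # but well defined (slope 0) when all y values are equal.
    import numpy as np

    def ols_fit(x, y):
        x, y = np.asarray(x, float), np.asarray(y, float)
        var_x = np.var(x)
        if var_x == 0:
            return None                      # degenerate: every x is the same
        slope = np.cov(x, y, bias=True)[0, 1] / var_x
        intercept = y.mean() - slope * x.mean()
        return slope, intercept

    print(ols_fit([2, 2, 2, 2], [1, 3, 5, 7]))   # None: all X identical
    print(ols_fit([1, 2, 3, 4], [5, 5, 5, 5]))   # (0.0, 5.0): all Y identical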

Problem 2

Figure 1: Weather Bayes Nets

In this question you must model a problem with 4 binary variables: G = gray, V = Vancouver, R = rain, and S = sad. Consider the directed graphical model describing the relationships between these variables shown in Figure 1.

(a) (10 pt) Write down an expression for P(S = 1 | V = 1) in terms of α, β, γ, δ.
(b) (5 pt) Write down an expression for P(S = 1 | V = 0). Is this the same as or different from P(S = 1 | V = 1)? Explain why.
(c) (5 pt) Find maximum likelihood estimates of α, β, and γ using the following data set, where each row is a training case.

V  G  R  S
1  1  1  1
1  1  0  1
1  0  0  0
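
Not part of the exam: a minimal sketch of the mechanics behind part (a), namely that P(S = 1 | V = v) is obtained by summing the joint over the unobserved variables G and R. Since Figure 1 is not reproduced here, the chain structure V -> G -> R -> S and the CPT numbers below are placeholders only; they are not the actual model and do not reflect the meaning of α, β, γ, δ in the figure.

    # A minimal sketch (not part of the exam): computing P(S=1 | V=v) by summing
    # the joint over the hidden variables G and R. The factorization and the CPT
    # numbers are placeholders; the real structure and the meaning of
    # alpha, beta, gamma, delta come from Figure 1, which is not shown here.
    from itertools import product

    # hypothetical CPTs for a placeholder chain V -> G -> R -> S
    p_g_given_v = {1: 0.8, 0: 0.2}      # P(G=1 | V=v)
    p_r_given_g = {1: 0.7, 0: 0.1}      # P(R=1 | G=g)
    p_s_given_r = {1: 0.9, 0: 0.3}      # P(S=1 | R=r)

    def p_s1_given_v(v):
        total = 0.0
        for g, r in product([0, 1], repeat=2):
            pg = p_g_given_v[v] if g == 1 else 1 - p_g_given_v[v]
            pr = p_r_given_g[g] if r == 1 else 1 - p_r_given_g[g]
            ps = p_s_given_r[r]
            total += pg * pr * ps            # sum over hidden G and R
        return total

    print(p_s1_given_v(1), p_s1_given_v(0))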

Problem 3 (10 pt)

Canonical correlation analysis (CCA) handles the situation in which each data point (i.e., each object) has two representations (i.e., two sets of features); e.g., a web page can be represented by the text on that page, and it can also be represented by the other pages linked to it. Now suppose each data point has two representations x and y, each of which is a 2-dimensional feature vector (i.e., x = [x_1, x_2]^T and y = [y_1, y_2]^T). Given a set of data points, CCA finds a pair of projection directions (u, v) that maximizes the sample correlation corr(u^T x, v^T y) along the directions u and v. In other words, after we project one representation of the data points onto u and the other representation onto v, the two projected representations u^T x and v^T y should be maximally correlated (intuitively, data points with large values in one projected direction should also have large values in the other projected direction).

The data points are shown in Figure 2, where each data point has two representations x = [x_1, x_2]^T and y = [y_1, y_2]^T. Note that the data are paired: each point in the left figure corresponds to a specific point in the right figure and vice versa, because these two points are two representations of the same object. Different objects are shown in different gray scales in the two figures (so you should be able to approximately figure out how points are paired). In the right figure we have given one CCA projection direction v; draw the other CCA projection direction u in the left figure.

Figure 2: Draw the CCA projection direction in the left figure.
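
Not part of the exam: a minimal numerical sketch of what CCA computes, using scikit-learn's CCA on synthetic paired 2-D data (the data here are made up, since Figure 2 is not reproduced). The fitted scores are the projections u^T x and v^T y, and their correlation is the quantity CCA maximizes.

    # A minimal sketch (not part of the exam): fit CCA on synthetic paired 2-D
    # data and check the correlation of the projected representations.
    import numpy as np
    from sklearn.cross_decomposition import CCA

    rng = np.random.default_rng(0)
    t = rng.normal(size=200)                                   # shared latent variable
    X = np.column_stack([t + 0.1 * rng.normal(size=200),
                         -t + 0.1 * rng.normal(size=200)])     # first representation x
    Y = np.column_stack([2.0 * t + 0.1 * rng.normal(size=200),
                         0.5 * t + 0.1 * rng.normal(size=200)])  # second representation y

    cca = CCA(n_components=1)
    cca.fit(X, Y)
    x_scores, y_scores = cca.transform(X, Y)                   # u^T x and v^T y per point
    print("canonical correlation:",
          np.corrcoef(x_scores[:, 0], y_scores[:, 0])[0, 1])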

Problem 4

There is a set S consisting of 6 points in the plane, shown below:

a = (0, 0), b = (8, 0), c = (16, 0), d = (0, 6), e = (8, 6), f = (16, 6)

Now we run the k-means algorithm on these points with k = 3. The algorithm uses the Euclidean distance metric (i.e., the straight-line distance between two points) to assign each point to its nearest centroid. Ties are broken in favor of the centroid to the left/down.

Two definitions: A k-starting configuration is a subset of k starting points from S that form the initial centroids, e.g. {a, b, c}. A k-partition is a partition of S into k non-empty subsets; e.g. {a, b, e}, {c, d}, {f} is a 3-partition. Clearly any k-partition induces a set of k centroids in the natural manner. A k-partition is called stable if a repetition of the k-means iteration with the induced centroids leaves it unchanged.

(10 pt) How many 3-starting configurations are there? (Remember, a 3-starting configuration is just a subset, of size 3, of the six data points.)

(10 pt) Fill in the following table (the first row is filled in as an example):

3-partition | Stable? | Example 3-starting configuration that can arrive at the 3-partition after running k-means (or write "none" if no such 3-starting configuration exists) | # of unique 3-starting configurations that arrive at the 3-partition
{a, b}, {d, e}, {c, f} | Y | {b, c, e} | 4
{a, b, e}, {c, d}, {f} | | |
{a, d}, {b, e}, {c, f} | | |
{a}, {d}, {b, c, e, f} | | |
{a, b}, {d}, {c, e, f} | | |
{a, b, d}, {c}, {e, f} | | |
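
Not part of the exam: a minimal sketch of Lloyd's iterations on S, with the left/down tie-breaking rule implemented as "prefer the centroid with the smaller x, then the smaller y". It can be used to check which 3-partition a given 3-starting configuration converges to; the call at the bottom uses the example configuration {b, c, e} from the first table row.

    # A minimal sketch (not part of the exam): run k-means (Lloyd's algorithm)
    # on S from a chosen 3-starting configuration and print the final 3-partition.
    import numpy as np

    points = {"a": (0, 0), "b": (8, 0), "c": (16, 0),
              "d": (0, 6), "e": (8, 6), "f": (16, 6)}

    def assign(centroids):
        clusters = [[] for _ in centroids]
        for name, p in points.items():
            p = np.array(p, dtype=float)
            dists = [np.linalg.norm(p - c) for c in centroids]
            # nearest centroid; ties broken by centroid position (left, then down)
            best = min(range(len(centroids)),
                       key=lambda j: (round(dists[j], 9),
                                      centroids[j][0], centroids[j][1]))
            clusters[best].append(name)
        return clusters

    def kmeans(start_names):
        centroids = [np.array(points[n], dtype=float) for n in start_names]
        while True:
            clusters = assign(centroids)
            new_centroids = [np.mean([points[n] for n in c], axis=0)
                             for c in clusters if c]
            if len(new_centroids) == len(centroids) and \
                    all(np.allclose(a, b) for a, b in zip(new_centroids, centroids)):
                return clusters
            centroids = new_centroids

    # reproduces the example row's partition {a, b}, {d, e}, {c, f}
    print(kmeans(["b", "c", "e"]))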

Problem 5 (30 pt) This problem is about the EM algorithm for mixtures of Bernoullis. The expectation of the complete-data log likelihood with respect to the posterior distribution is given by N K D Q(θ, θ t ) = log π k + x ij log µ kj + (1 x ij ) log(1 µ kj ) i=1 k=1 r ik The pdf of the beta distribution Beta(α, β) is given by f(x; α, β) = Γ(α+β) Γ(α)Γ(β) xα 1 (1 x) β 1. (15 pt) Derive the M step for ML estimation of a mixture of Bernoullis. j=1 µ kj =? (15 pt) Derive the M step for MAP estimation of a mixture of Bernoullis with a Beta(α, β) prior. µ kj =?