One-Shot Learning with a Hierarchical Nonparametric Bayesian Model


One-Shot Learning with a Hierarchical Nonparametric Bayesian Model
R. Salakhutdinov, J. Tenenbaum and A. Torralba
MIT Technical Report, 2010
Presented by Esther Salazar (Duke University), Reading group, June 10, 2011

Summary

Contribution: The authors develop a hierarchical nonparametric Bayesian model for learning a novel category from a single training example. The model discovers how to group categories into meaningful super-categories that express different priors for new classes. Given a single example of a novel category, the model can quickly infer which super-category the new basic-level category should belong to.

Motivation

Categorizing an object requires information about the category's mean and variance in an appropriate feature space.

Similarity-based approach:
- the mean represents the category prototype
- the inverse variances (precisions) correspond to the dimensional weights in a category-specific similarity metric (see the sketch below)

In this setting, one-shot learning is not possible: a single example can provide information about the mean but not about the variances (the similarity metric).

Proposal: the proposed model leverages higher-order knowledge abstracted from previously learned categories to estimate the new category's prototype (mean) as well as an appropriate similarity metric (variances) from just a single example (or a few).
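As a concrete illustration of such a metric (not from the slides; the function and variable names below are my own), the per-dimension precisions act as weights in a squared distance to the category prototype:

```python
import numpy as np

def weighted_distance(x, mu, tau):
    """Precision-weighted squared distance of a feature vector x to a
    category prototype mu, with per-dimension precisions tau.

    Dimensions with high precision (low variance) dominate the metric,
    which is why tau cannot be estimated from a single example alone.
    """
    x, mu, tau = map(np.asarray, (x, mu, tau))
    return float(np.sum(tau * (x - mu) ** 2))
```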

Hierarchical Bayesian model

Consider observing a set of N input feature vectors $\{x_1, \ldots, x_N\}$, $x_n \in \mathbb{R}^D$.

Features are derived from high-dimensional data such as images (e.g. $D = 50{,}000$).

Assumption: a fixed two-level category hierarchy. Suppose the N objects are partitioned into C basic-level (level-1) categories. This partition is represented by a vector $z^b = (z^b_1, \ldots, z^b_N)$ with $z^b_n \in \{1, \ldots, C\}$.

The C basic-level categories are in turn partitioned into K super-categories (level-2), represented by $z^s = (z^s_1, \ldots, z^s_C)$ with $z^s_c \in \{1, \ldots, K\}$.

For any basic-level category c, the distribution over the observed feature vectors is

$$P(x_n \mid z^b_n = c, \theta^1) = \prod_{d=1}^{D} \mathcal{N}(x_{nd} \mid \mu^c_d, 1/\tau^c_d) \qquad (1)$$

where $\theta^1 = \{\mu^c, \tau^c\}_{c=1}^{C}$ denotes the level-1 category parameters.

Let $k = z^s_c$, i.e. level-1 category c belongs to level-2 category k, and let $\theta^2 = \{\mu^k, \tau^k, \alpha^k\}_{k=1}^{K}$ denote the level-2 parameters. Then

$$P(\mu^c, \tau^c \mid \theta^2, z^s) = \prod_{d=1}^{D} P(\mu^c_d, \tau^c_d \mid \theta^2, z^s)$$

$$P(\mu^c_d, \tau^c_d \mid \theta^2) = P(\mu^c_d \mid \tau^c_d, \theta^2)\, P(\tau^c_d \mid \theta^2) = \mathcal{N}(\mu^c_d \mid \mu^k_d, 1/(\nu\tau^c_d))\, \Gamma(\tau^c_d \mid \alpha^k_d, \alpha^k_d/\tau^k_d) \qquad (2)$$
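To make the two-level generative process of Eqs. 1-2 concrete, here is a minimal sampling sketch (my own, not the authors' code; variable names are assumed, and the Gamma is taken to have shape $\alpha^k_d$ and rate $\alpha^k_d/\tau^k_d$, so that $E[\tau^c_d] = \tau^k_d$). Drawing $\tau^c$ first and then $\mu^c$ given $\tau^c$ mirrors the Normal-Gamma factorization in Eq. 2.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_level1_params(mu_k, tau_k, alpha_k, nu):
    """Draw level-1 (basic-category) parameters given level-2 parameters.

    tau_c_d ~ Gamma(shape=alpha_k_d, rate=alpha_k_d / tau_k_d)  -> E[tau_c_d] = tau_k_d
    mu_c_d  ~ Normal(mu_k_d, 1 / (nu * tau_c_d))                -> E[mu_c_d]  = mu_k_d
    """
    tau_c = rng.gamma(shape=alpha_k, scale=tau_k / alpha_k)  # scale = 1 / rate
    mu_c = rng.normal(loc=mu_k, scale=np.sqrt(1.0 / (nu * tau_c)))
    return mu_c, tau_c

def sample_observation(mu_c, tau_c):
    """Draw one feature vector x_n from the basic-level category (Eq. 1)."""
    return rng.normal(loc=mu_c, scale=np.sqrt(1.0 / tau_c))

# Tiny example with D = 3 feature dimensions
D = 3
mu_k, tau_k, alpha_k, nu = np.zeros(D), np.ones(D), 5.0 * np.ones(D), 2.0
mu_c, tau_c = sample_level1_params(mu_k, tau_k, alpha_k, nu)
x = sample_observation(mu_c, tau_c)
```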

Recall from Eq. 2 that

$$P(\mu^c_d, \tau^c_d \mid \theta^2) = P(\mu^c_d \mid \tau^c_d, \theta^2)\, P(\tau^c_d \mid \theta^2) = \mathcal{N}(\mu^c_d \mid \mu^k_d, 1/(\nu\tau^c_d))\, \Gamma(\tau^c_d \mid \alpha^k_d, \alpha^k_d/\tau^k_d).$$

It can be easily derived that $E[\mu^c] = \mu^k$ and $E[\tau^c] = \tau^k$.

That means: the expected values of the basic-level (level-1) parameters $\theta^1$ are given by the corresponding level-2 parameters $\theta^2$.

The parameter $\alpha^k$ controls the variability of $\tau^c$ around its mean.

For the level-2 parameters $\theta^2$ they assume:
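Returning to the stated expectations, a quick check (assuming the Gamma in Eq. 2 has shape $\alpha^k_d$ and rate $\alpha^k_d/\tau^k_d$, as in the sketch above):

$$E[\tau^c_d \mid \theta^2] = \frac{\alpha^k_d}{\alpha^k_d/\tau^k_d} = \tau^k_d, \qquad E[\mu^c_d \mid \theta^2] = E\big[E[\mu^c_d \mid \tau^c_d, \theta^2]\big] = E[\mu^k_d] = \mu^k_d.$$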

Modeling the number of super-categories

So far, the model has been presented with a two-level partition $z = \{z^s, z^b\}$ that assumes a fixed hierarchy for sharing parameters.

Aim: to infer the distribution over possible category structures z.

Proposal: the authors place a nonparametric two-level nested Chinese Restaurant Process (nCRP) prior (Blei et al. 2003, 2010) over z.

The nCRP($\gamma$) extends the CRP to a nested sequence of partitions, one for each level of the tree (with hyperprior $\gamma \sim \Gamma(1, 1)$).

In this case, each observation n is first assigned to a super-category $z^s_n$ using Eq. 4; its assignment to a basic-level category $z^b_n$ (within super-category $z^s_n$) is then drawn recursively from Eq. 4.

The full generative model is shown in the right panel of Fig. 1.
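The slides cite Eq. 4 without reproducing it; assuming it is the standard CRP assignment rule (probability of joining an existing cluster proportional to its current size, probability of opening a new one proportional to $\gamma$), a minimal sketch looks like this. In the nested version, the same rule is applied twice: once over super-categories and then again over the basic-level categories within the chosen super-category.

```python
import numpy as np

def crp_assignment_probs(counts, gamma):
    """Chinese Restaurant Process assignment probabilities.

    counts: current cluster sizes (existing super- or basic-level categories)
    gamma:  concentration parameter
    Returns probabilities over the existing clusters plus one new cluster.
    """
    weights = np.append(np.asarray(counts, dtype=float), gamma)
    return weights / weights.sum()

# Example: three existing super-categories with 5, 2 and 1 members, gamma = 1
probs = crp_assignment_probs([5, 2, 1], gamma=1.0)
# -> [0.556, 0.222, 0.111, 0.111]; the last entry is a brand-new super-category
```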

Inference

Sampling level-1 parameters: given $\theta^2$ and z, the conditional distribution $P(\mu^c, \tau^c \mid \theta^2, z, x)$ is also Normal-Gamma.

Sampling level-2 parameters: given z and $\theta^1$, the conditional distributions over the mean $\mu^k$ and precision $\tau^k$ take Gaussian and Inverse-Gamma forms; a sampling step is also performed for $\alpha^k$.

Sampling assignments z: given the model parameters $\theta = \{\theta^1, \theta^2\}$, combining the likelihood term with the nCRP($\gamma$) prior, the posterior over the assignment $z_n$ can be calculated as follows:
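As one concrete piece of this Gibbs sampler, the level-1 step is a standard Normal-Gamma conjugate update; the per-dimension sketch below is my own (assumed function and variable names), conditioning on the observations currently assigned to basic-level category c and on its level-2 parent.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_level1_posterior(x_cd, mu_k, tau_k, alpha_k, nu):
    """One Gibbs step for (mu^c_d, tau^c_d) in a single dimension d,
    given the observations x_cd currently assigned to basic category c.

    Prior from the level-2 parent (Eq. 2):
      tau ~ Gamma(shape=alpha_k, rate=alpha_k / tau_k)
      mu | tau ~ Normal(mu_k, 1 / (nu * tau))
    Standard Normal-Gamma conjugate update.
    """
    x_cd = np.asarray(x_cd, dtype=float)
    n = len(x_cd)
    xbar = x_cd.mean()
    S = np.sum((x_cd - xbar) ** 2)

    nu_n = nu + n
    mu_n = (nu * mu_k + n * xbar) / nu_n
    shape_n = alpha_k + n / 2.0
    rate_n = alpha_k / tau_k + 0.5 * S + 0.5 * nu * n * (xbar - mu_k) ** 2 / nu_n

    tau = rng.gamma(shape=shape_n, scale=1.0 / rate_n)
    mu = rng.normal(loc=mu_n, scale=np.sqrt(1.0 / (nu_n * tau)))
    return mu, tau
```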

One-shot learning

Consider observing a single new instance $x$ of a novel category $c$, conditioned on the level-2 parameters $\theta^2$ and the tree structure z.

- Compute the posterior distribution over the assignment $z^s_c$ using Eq. 5 (i.e. which super-category the novel category belongs to).
- The new category can either be placed under one of the existing super-categories or form its own super-category.

Given an inferred assignment $z^s_c$ and using Eq. 2, we can infer the posterior mean and precision terms $\{\mu^c, \tau^c\}$ for the novel category.

Finally, the conditional probability that a new test input $x^t$ belongs to the novel category $c$ is:
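As a rough illustration only (a plug-in simplification with assumed names, not the paper's exact classification rule), the inferred mean and precisions define a diagonal Gaussian under which a test input can be scored; comparing this score against the scores of the existing categories then gives the classification.

```python
import numpy as np

def log_density_novel_category(x_t, mu_c, tau_c):
    """Log-density of a test input x_t under the novel category's inferred
    diagonal-Gaussian model (plug-in values of mu^c and tau^c)."""
    x_t, mu_c, tau_c = map(np.asarray, (x_t, mu_c, tau_c))
    return float(np.sum(0.5 * np.log(tau_c / (2.0 * np.pi))
                        - 0.5 * tau_c * (x_t - mu_c) ** 2))
```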

Experimental results

The authors present experimental results on the MNIST handwritten digit and MSR Cambridge object recognition image datasets.

The proposed model was compared with the following baseline models:
- Euclidean: uses a Euclidean metric, i.e. all precision terms are set to one and are never updated
- HB-Flat: always uses a single super-category
- HB-Var: based on clustering only the covariance matrices, without taking into account the means of the super-categories
- MLE: ignores hierarchical Bayes altogether and estimates a category-specific mean and precision from sample averages
- Oracle: a model that always uses the correct, instead of the inferred, similarity metric

MSR Cambridge Dataset

The MSR Cambridge dataset contains images of 24 different categories.

The figure shows the 24 basic-level categories along with a typical partition that the model discovers, in which many super-categories contain semantically similar basic-level categories.