Latent Variable Models for Structured Prediction and Content-Based Retrieval

Similar documents
A latent variable ranking model for content-based retrieval

Multiple-Choice Questionnaire Group C

Search Engines. Information Retrieval in Practice

COSC160: Detection and Classification. Jeremy Bolton, PhD Assistant Teaching Professor

Beyond bags of features: Adding spatial information. Many slides adapted from Fei-Fei Li, Rob Fergus, and Antonio Torralba

FACE DETECTION AND LOCALIZATION USING DATASET OF TINY IMAGES

Summarization of Egocentric Moving Videos for Generating Walking Route Guidance

CS229: Action Recognition in Tennis

Learning and Inferring Depth from Monocular Images. Jiyan Pan April 1, 2009

Clustering Results. Result List Example. Clustering Results. Information Retrieval

Robot Learning. There are generally three types of robot learning: Learning from data. Learning by demonstration. Reinforcement learning

Introduction to SLAM Part II. Paul Robertson

Computer Vision. Exercise Session 10 Image Categorization

TagProp: Discriminative Metric Learning in Nearest Neighbor Models for Image Annotation

Transfer Learning Algorithms for Image Classification

An Introduction to Pattern Recognition

Using the Forest to See the Trees: Context-based Object Recognition

Bus Detection and recognition for visually impaired people

Tag Recommendation for Photos

2/15/2009. Part-Based Models. Andrew Harp. Part Based Models. Detect object from physical arrangement of individual features

CS5670: Computer Vision

ImageCLEF 2011

Probabilistic Graphical Models Part III: Example Applications

Lecture 11: Clustering Introduction and Projects Machine Learning

Three-Dimensional Object Detection and Layout Prediction using Clouds of Oriented Gradients

Experiments of Image Retrieval Using Weak Attributes

Dimension Induced Clustering

Applying Supervised Learning

Tri-modal Human Body Segmentation

Supervised learning. y = f(x) function

ECG782: Multidimensional Digital Signal Processing

Discriminative classifiers for image recognition

Object Detection by 3D Aspectlets and Occlusion Reasoning

Multimodal Medical Image Retrieval based on Latent Topic Modeling

Topological Mapping. Discrete Bayes Filter

Total Recall: Automatic Query Expansion with a Generative Feature Model for Object Retrieval O. Chum, et al.

Figure 1: Workflow of object-based classification

Beyond Bags of features Spatial information & Shape models

Trademark Matching and Retrieval in Sport Video Databases

3D Spatial Layout Propagation in a Video Sequence

Large scale object/scene recognition

Segmentation. Bottom up Segmentation Semantic Segmentation

Recognition of Animal Skin Texture Attributes in the Wild. Amey Dharwadker (aap2174) Kai Zhang (kz2213)

Statistics of Natural Image Categories

Proceedings of the International MultiConference of Engineers and Computer Scientists 2018 Vol I IMECS 2018, March 14-16, 2018, Hong Kong

BUAA AUDR at ImageCLEF 2012 Photo Annotation Task

Introduction to object recognition. Slides adapted from Fei-Fei Li, Rob Fergus, Antonio Torralba, and others

Integrating Visual and Textual Cues for Query-by-String Word Spotting

Online structured learning for Obstacle avoidance

Spatial Latent Dirichlet Allocation

Lecture 18: Human Motion Recognition

Conditional Random Fields for Object Recognition

Dimensionality Reduction using Relative Attributes

Modeling and Recognition of Landmark Image Collections Using Iconic Scene Graphs

Sports Field Localization

Joint Inference in Image Databases via Dense Correspondence. Michael Rubinstein MIT CSAIL (while interning at Microsoft Research)

Predicting ground-level scene Layout from Aerial imagery. Muhammad Hasan Maqbool

Supplementary Material for Ensemble Diffusion for Retrieval

Continuous Visual Vocabulary Models for plsa-based Scene Recognition

Structured Models in. Dan Huttenlocher. June 2010

Urban Scene Segmentation, Recognition and Remodeling. Part III. Jinglu Wang 11/24/2016 ACCV 2016 TUTORIAL

Person Re-identification for Improved Multi-person Multi-camera Tracking by Continuous Entity Association

Motion Tracking and Event Understanding in Video Sequences

570 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 20, NO. 2, FEBRUARY 2011

International Journal of Innovative Research in Computer and Communication Engineering

Classification. Vladimir Curic. Centre for Image Analysis Swedish University of Agricultural Sciences Uppsala University

Object of interest discovery in video sequences

Pouya Kousha Fall 2018 CSE 5194 Prof. DK Panda

Object Recognition. Computer Vision. Slides from Lana Lazebnik, Fei-Fei Li, Rob Fergus, Antonio Torralba, and Jean Ponce

Accelerometer Gesture Recognition

Estimating Human Pose in Images. Navraj Singh December 11, 2009

Visual Recognition and Search April 18, 2008 Joo Hyun Kim

Generative and discriminative classification techniques

Sign Language Recognition using Dynamic Time Warping and Hand Shape Distance Based on Histogram of Oriented Gradient Features

Online Open World Face Recognition From Video Streams

Color Image Segmentation Using a Spatial K-Means Clustering Algorithm

Visuelle Perzeption für Mensch- Maschine Schnittstellen

CONTENT analysis of video is to find meaningful structures

08 An Introduction to Dense Continuous Robotic Mapping

Contexts and 3D Scenes

Facial Expression Recognition Using Non-negative Matrix Factorization

Storyline Reconstruction for Unordered Images

Visual localization using global visual features and vanishing points

Object and Action Detection from a Single Example

From Structure-from-Motion Point Clouds to Fast Location Recognition

Internal Organ Modeling and Human Activity Analysis

A Background Modeling Approach Based on Visual Background Extractor Taotao Liu1, a, Lin Qi2, b and Guichi Liu2, c

Pattern recognition (4)

Separating Objects and Clutter in Indoor Scenes

ECS289: Scalable Machine Learning

Large-scale visual recognition Efficient matching

Content-based Dimensionality Reduction for Recommender Systems

Machine Learning. Nonparametric methods for Classification. Eric Xing , Fall Lecture 2, September 12, 2016

Local Features and Bag of Words Models

Clustering will not be satisfactory if:

Exploring Geotagged Images for Land-Use Classification

Assignment 2. Unsupervised & Probabilistic Learning. Maneesh Sahani Due: Monday Nov 5, 2018

Preface to the Second Edition. Preface to the First Edition. 1 Introduction 1

Gesture Spotting and Recognition Using Salience Detection and Concatenated Hidden Markov Models

Preliminary Local Feature Selection by Support Vector Machine for Bag of Features

Transcription:

Latent Variable Models for Structured Prediction and Content-Based Retrieval Ariadna Quattoni Universitat Politècnica de Catalunya Joint work with Borja Balle, Xavier Carreras, Adrià Recasens, Antonio Torralba

Scene Recognition airport mountain bar Gesture Recognition Hands Crossed Hands Crossed Hands Crossed Hands Opened Hands Opened Hands Opened

Scene Recognition mountain airport Many problems in barvision involve learning mappings from complex image spaces to semantic categories. Gesture Recognition Hands Crossed Hands Crossed Hands Crossed Hands Opened Hands Opened Hands Opened

Why Hidden Variables? Semantic Classes High dimensional Low dimensional Classic mixture model More general

Hands Crossed Hands Crossed Hands Crossed Hands Opened Hands Opened Hands Opened

Outline Latent Variable Models for Structured Structure Prediction Problem Representing distributions using WA Spectral learning algorithm Examples Mixture Model for Content-Based Image Retrieval Global Ranking Model Mixture Ranking Model Learning a Ranking Function Experiments

Outline Latent Variable Models for Structured Structure Prediction Problem Representing distributions using WA Spectral learning algorithm Examples Mixture Model for Content-Based Image Retrieval Global Ranking Model Mixture Ranking Model Learning a Ranking Function Experiments

Structured Prediction

Temporal Dependencies Part of Speech Tagging Gesture Recognition

Spatial Dependencies Image Annotation

Sequence Prediction Models Hidden variables summarize what is important about the past

Sequence Prediction Models Distributions over single strings. X is discrete set.

Hands Crossed Hands Opened... Discretize features

Outline Latent Variable Models for Structured Structure Prediction Problem Representing distributions using WA Spectral learning algorithm Examples Mixture Model for Content-Based Image Retrieval Global Ranking Model Mixture Ranking Model Learning a Ranking Function Experiments

Weighted Automata Representation (WA) Operator Model Representation (OOM) Initial State Vector Function Parametrization Model Operators Describes the distribution as a dynamic process

Mapping from standard HMM to WA parametrization j i i j Forward-Backward Equations

Outline Latent Variable Models for Structured Structure Prediction Problem Representing distributions using WA Spectral learning algorithm Examples Mixture Model for Content-Based Image Retrieval Global Ranking Model Mixture Ranking Model Learning a Ranking Function Experiments

Why Spectral Learning? Spectral Learning: Algebraic method for recovering model parameters from observable statistics. These methods exploit directly the markovianity of the process They are fast, simple and scale easily to large datasets Much faster than alternative approaches based on Expectation Minimization

Hankel Matrix Distribution generated by a WA with n states Sub-block defined by a basis

Duality between n-rank factorizations of Hankel and WAs =

Recovering Operators =

Spectral Method We can recover a parametrization for the distribution from (almost) any rank-n factorization of H. The spectral method uses the thin SVD factorization. Costs depends on number of prefixes and suffixes

Discrete Homogeneous HMM

Modeling paired sequences Conditional Joint

Experiments The task: Recognize actions in tennis (serve, hit, non-hit,...) S S H H NH H H NH NH NH NH S S S The experimental setting: Take sequences from 4 games and cut them in subsequences. Random partition sub-sequences into training and test. Evaluation metric: average F1 (geometric mean of precision and recall) S

The features Video Sequence HOG3D descriptors from player bounding boxes Codebook Hard Index of closest Codebook entry Soft The spectral method can also be used for the soft representation similarity to codebook entry

Results CRF Spectral Joint

Outline Latent Variable Models for Structured Structure Prediction Problem Representing distributions using WA Spectral learning algorithm Examples Mixture Model for Content-Based Image Retrieval Global Ranking Model Mixture Ranking Model Learning a Ranking Function Experiments

Complex Image Space

Color important for outdoor images less important for indoor images

Latent classes can model variability

Outline Latent Variable Models for Structured Structure Prediction Problem Representing distributions using WA Spectral learning algorithm Examples Mixture Model for Content-Based Image Retrieval Global Ranking Model Mixture Ranking Model Learning a Ranking Function Experiments

Global Ranking Model D Database elementary relevance functions global ranking function Q Training Queries

Global Ranking Model set of triplet constraints database item a is more relevant to q than item b Ranking Loss function

Outline Latent Variable Models for Structured Structure Prediction Problem Representing distributions using WA Spectral learning algorithm Examples Mixture Model for Content-Based Image Retrieval Global Ranking Model Mixture Ranking Model Learning a Ranking Function Experiments

Ranking model with latent variables Mixture of specialized ranking functions One for each latent class Log-linear model Feature representation

Ranking model with latent variables Ranking Loss function Alternating optimization strategy

Outline Latent Variable Models for Structured Structure Prediction Problem Representing distributions using WA Spectral learning algorithm Examples Mixture Model for Content-Based Image Retrieval Global Ranking Model Mixture Ranking Model Learning a Ranking Function Experiments

Parameter estimation constraints with non-zero loss subgradient with respect to The influence of query q in the update of relevance function g is weighted by the probability that q belongs to class g.

Parameter Estimation more negative = better ranking performance of class g for query q subgradient with respect to If class g predicts good rankings for constraints of query q, then the update will increase the probability that q belongs to g.

Outline Latent Variable Models for Structured Structure Prediction Problem Representing distributions using WA Spectral learning algorithm Examples Mixture Model for Content-Based Image Retrieval Global Ranking Model Mixture Ranking Model Learning a Ranking Function Experiments

SUN Dataset 12000 images, indoor and outdoor scenes. Images are annotated with object tags. Ground-truth relevance function derived from object annotations. 5 Random Partitions: - 6000 database images - 2000 train queries - 1000 validation queries - 2000 test queries - 1000 novel-database

Ground-Truth Constraints We create ranking constraints for each train query: 1- Find top K database nearest neighbors according to ground-truth relevance function. 2- Sample L items from remaining items 3- Generate KL ranking triplets 320,000 total number of ranking triplets.

Model Comparisons Global SVM: Learns a single weighted combination of elementary relevance functions. Transductive SVM: For each test query, it learns a relevance function using ranking contraints from k nearest neighbors. Mixture: A mixture ranking model, the number of hidden states is chosen using validation queries. We report precision and recall curves for predicting the 100 most relevant items for each test query.

Results

Results

Latent Classes

Summary Hidden variables can be useful for a variety of problems involving complex data. Spectral learning methods are a good tool for inducing hidden structure. Future Directions Apply the spectral method to large-scale computer vision tasks. Spectral methods for unsupervised learning over structured data