Scene Grammars, Factor Graphs, and Belief Propagation


Scene Grammars, Factor Graphs, and Belief Propagation Pedro Felzenszwalb Brown University Joint work with Jeroen Chua

Probabilistic Scene Grammars A general-purpose framework for image understanding and machine perception. What are the objects in the scene, and how are they related? Objects have parts that are (recursively) objects. Relationships provide context for recognition. Relationships are captured by compositional rules.

Example: Image Restoration Clean image x, measured image y = x + n (n is noise). Ambiguous problem: it is impossible to restore a pixel by itself. Context is of key importance. Requires modeling relationships between pixels.

Example: Object Recognition

Example: Object Recognition Context is of key importance. Requires modeling relationships between objects.

Vision as Bayesian Inference The goal is to recover information about the world from an image. Hidden variables X (the world/scene). Observations Y (the image). Consider the posterior distribution given by Bayes' rule: p(X | Y) = p(Y | X) p(X) / p(Y). The approach involves an imaging model p(Y | X) and a prior distribution p(X).

Modeling scenes p(X) Scenes are complex high-dimensional structures. The number of possible scenes is very large (infinite), yet scenes have regularities. Faces have eyes. Boundaries of objects are piecewise smooth. The set of regular scenes forms a language. Scenes can be generated using stochastic grammars.

The Framework Representation: Probabilistic scene grammar. Transformation: Grammar model to factor graph. Inference: Loopy belief propagation. Learning: Maximum likelihood (EM).

Scene Grammar Scenes are structures generated by a stochastic grammar. Scenes are composed of objects of several types. Objects are composed of parts that are (recursively) objects. Parts tend to be in certain relative locations. The parts that make up an object can vary.

Scene Grammar Finite set of symbols (object types) Σ. Finite pose space Ω for each symbol. Finite set of productions R of the form A_0 → {A_1, ..., A_K}, with A_i ∈ Σ. Rule selection probabilities p(r). Conditional pose distributions p_i(ω_i | ω_0) associated with each rule. Self-rooting probabilities ε_A.
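To make the definition concrete, here is a minimal sketch of how such a grammar could be represented in Python; the Grammar and Rule classes and their field names are illustrative choices, not part of the talk.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Pose = Tuple[int, ...]            # an element of the finite pose space Omega
Brick = Tuple[str, Pose]          # a (symbol, pose) pair

@dataclass
class Rule:
    lhs: str                      # A_0
    rhs: List[str]                # [A_1, ..., A_K], each in Sigma
    prob: float                   # rule selection probability p(r)
    pose_dists: List[Callable[[Pose], Dict[Pose, float]]]  # p_i(omega_i | omega_0)

@dataclass
class Grammar:
    symbols: List[str]                   # Sigma
    pose_space: Dict[str, List[Pose]]    # Omega for each symbol
    rules: Dict[str, List[Rule]]         # productions R, grouped by left-hand side
    self_root: Dict[str, float]          # self-rooting probabilities epsilon_A
```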

Generating a scene Set of building blocks, or bricks, (A, ω) ∈ Σ × Ω. Brick (A, ω) is on if the scene has an object of type A in pose ω. Stochastic process: Initially all bricks are off. Independently turn each brick (A, ω) on with probability ε_A. The first time a brick is turned on, expand it. Expanding (A, ω): Select a rule A → {A_1, ..., A_K}. Select K poses (ω_1, ..., ω_K) conditional on ω. Turn on bricks (A_1, ω_1), ..., (A_K, ω_K).
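A sketch of this stochastic generation process, using the hypothetical Grammar/Rule classes from the previous snippet; it samples the set of bricks that end up on.

```python
import random

def sample_scene(g: Grammar) -> set:
    """Sample a set of 'on' bricks (sketch of the generative process above)."""
    on, frontier = set(), []
    # Independently turn each brick on with its self-rooting probability.
    for A in g.symbols:
        for w in g.pose_space[A]:
            if random.random() < g.self_root[A]:
                on.add((A, w))
                frontier.append((A, w))
    # Expand each brick the first time it is turned on.
    while frontier:
        A, w = frontier.pop()
        rules = g.rules.get(A, [])
        if not rules:                      # terminal symbol, nothing to expand
            continue
        rule = random.choices(rules, weights=[r.prob for r in rules])[0]
        for child_sym, pose_dist in zip(rule.rhs, rule.pose_dists):
            dist = pose_dist(w)            # p_i(omega_i | omega_0)
            child_pose = random.choices(list(dist), weights=list(dist.values()))[0]
            child = (child_sym, child_pose)
            if child not in on:            # only the first activation triggers expansion
                on.add(child)
                frontier.append(child)
    return on
```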

A grammar for scenes with faces Symbols Σ = {FACE, EYE, NOSE, MOUTH}. Pose space Ω = {(x, y, size)}. Rules: (1) FACE → {EYE, EYE, NOSE, MOUTH} (2) EYE → {} (3) NOSE → {} (4) MOUTH → {} Conditional pose distributions for (1) specify typical locations of face parts within a face. Each symbol has a small self-rooting probability.
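As a usage example, a toy one-dimensional version of this face grammar built with the sketch classes above; the positions, offsets, and probabilities are made-up illustrative values.

```python
# Toy 1-D face grammar: poses are x positions 0..9 (illustrative values only).
positions = [(x,) for x in range(10)]

def near(offset):
    # Child pose = parent pose shifted by `offset` (clamped to the pose space).
    return lambda w: {(min(max(w[0] + offset, 0), 9),): 1.0}

face_rule = Rule("FACE", ["EYE", "EYE", "NOSE", "MOUTH"], 1.0,
                 [near(-1), near(1), near(0), near(2)])
g = Grammar(symbols=["FACE", "EYE", "NOSE", "MOUTH"],
            pose_space={s: positions for s in ["FACE", "EYE", "NOSE", "MOUTH"]},
            rules={"FACE": [face_rule]},
            self_root={"FACE": 0.05, "EYE": 0.01, "NOSE": 0.01, "MOUTH": 0.01})
print(sample_scene(g))
```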

Random scenes with face model

A grammar for images with curves Symbols Σ = {C, P}. Pose of C specifies position and orientation. Pose of P specifies position. Rules: (1) C(x, y, θ) → {P(x, y)} (2) C(x, y, θ) → {P(x, y), C(x + Δx_θ, y + Δy_θ, θ)} (3) C(x, y, θ) → {C(x, y, θ + 1)} (4) C(x, y, θ) → {C(x, y, θ − 1)} (5) P → {}

Random images

Computation Grammar defines a distribution over scenes. A key problem is computing conditional probabilities. What is the probability that there is a nose near location (20, 32) given that there is an eye at location (15, 29)? What is the probability that each pixel in the clean image is on, given the noisy observations?

Factor Graphs A factor graph represents a factored distribution. p(X_1, X_2, X_3, X_4) = f_1(X_1, X_2) f_2(X_2, X_3, X_4) f_3(X_3, X_4) Variable nodes (circles), factor nodes (squares).
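For intuition, a brute-force computation of a marginal in this small example, with made-up binary factor values (any inference method should agree with this exhaustive sum up to normalization):

```python
from itertools import product

# Made-up binary factors for p(X1, X2, X3, X4) = f1(X1,X2) f2(X2,X3,X4) f3(X3,X4) / Z.
f1 = lambda x1, x2: 2.0 if x1 == x2 else 1.0
f2 = lambda x2, x3, x4: 3.0 if x2 == x3 == x4 else 1.0
f3 = lambda x3, x4: 1.5 if x3 != x4 else 1.0

def marginal(i):
    """p(X_i = 1) by summing the factored score over all 2^4 assignments."""
    score = {0: 0.0, 1: 0.0}
    for x in product([0, 1], repeat=4):
        score[x[i]] += f1(x[0], x[1]) * f2(x[1], x[2], x[3]) * f3(x[2], x[3])
    return score[1] / (score[0] + score[1])

print(marginal(0))   # marginal probability that X1 = 1
```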

Factor Graph Representation for Scenes A gadget represents a brick. Binary random variables: X (brick on/off), R_i (rule selection), C_i (child selection). Factors: f_1 leaky-or, f_2 selection, f_3 selection, f_D data model.
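A sketch of what the leaky-or factor f_1 might compute, assuming the semantics implied by the generative process (a brick turns on if some parent gadget selects it as a child, or if it self-roots with probability ε); the function and argument names are illustrative.

```python
def leaky_or_factor(x, parent_selects, eps):
    """Leaky-or factor f1 (sketch). `x` is the brick's on/off value,
    `parent_selects` are binary indicators that a parent gadget selected this
    brick as a child, and `eps` is the brick's self-rooting probability."""
    p_on = 1.0 if any(parent_selects) else eps
    return p_on if x == 1 else 1.0 - p_on
```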

Example: Σ = {A, B}, Ω = {1, 2}. Rules: A(x) → {B(y)}, B(x) → {}. [Factor graph diagram: one gadget per brick A(1), A(2), B(1), B(2), each with a leaky-or factor Ψ_L on X and selection factors Ψ_S on the rule and child selection variables R and C.]

Loopy belief propagation Inference by message passing. µ_{f→v}(x_v) = Σ_{x_{N(f)\v}} Ψ(x_{N(f)}) Π_{u ∈ N(f)\v} µ_{u→f}(x_u). In general, message computation is exponential in the degree of the factor. For our factors, message computation is linear in the degree.
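Below is the generic (exponential-time) form of this factor-to-variable update, written as a brute-force sketch; the talk's point is that the leaky-or and selection factors admit linear-time messages, which this sketch does not exploit. All names are illustrative.

```python
from itertools import product

def factor_to_var_message(psi, neighbors, v, incoming, domain=(0, 1)):
    """Sum-product message mu_{f->v}(x_v): sum over the other neighbors' states of
    Psi(x_{N(f)}) times the incoming messages mu_{u->f}(x_u).
    `psi` maps an assignment dict {var: value} to a nonnegative number;
    `incoming[u][x_u]` is the message from variable u to this factor."""
    others = [u for u in neighbors if u != v]
    msg = {}
    for xv in domain:
        total = 0.0
        for xs in product(domain, repeat=len(others)):
            assign = dict(zip(others, xs))
            assign[v] = xv
            w = psi(assign)
            for u, xu in zip(others, xs):
                w *= incoming[u][xu]
            total += w
        msg[xv] = total
    return msg
```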

Conditional inference with LBP Σ = {FACE, EYE, NOSE, MOUTH}, FACE → {EYE, EYE, NOSE, MOUTH}. Marginal probabilities conditional on one eye, and conditional on two eyes. [Figure panels: Face, Eye, Nose, Mouth.]

Conditional inference with LBP Evidence for an object provides context for other objects. LBP combines bottom-up and top-down influence. LBP captures chains of contextual evidence. LBP naturally combines multiple contextual cues. [Figure panels: Face, Eye, Nose, Mouth.]

Conditional inference with LBP Contour completion with curve grammar.

Face detection p(X | Y) ∝ p(Y | X) p(X). p(Y | X) is defined by templates for each symbol and defines local evidence for each brick in the factor graph. Belief propagation combines weak local evidence from all bricks.
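One plausible way to turn template responses into per-brick local evidence, shown only as an assumption; the talk does not spell out the calibration, and the logistic squashing and function name here are made up for illustration.

```python
import numpy as np

def local_evidence(score_map):
    """Map a per-pose template (e.g. HOG filter) response map to local evidence
    f_D(x) for each brick (sketch): a logistic calibration gives a pseudo-likelihood
    that the brick is on, and its complement the evidence that it is off."""
    p_on = 1.0 / (1.0 + np.exp(-score_map))
    return np.stack([1.0 - p_on, p_on], axis=-1)   # [..., 0] for x=0, [..., 1] for x=1
```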

Face detection results Ground Truth HOG Filters Face Grammar

Scenes with several faces HOG filters Grammar

Curve detection p(X) defined by a grammar for curves. p(Y | X) defined by noisy observations at each pixel. [Figure panels: X, Y.]
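A minimal sketch of a per-pixel data model of this kind, assuming binary images corrupted by independent flip noise with a made-up flip probability:

```python
def pixel_evidence(y, p_flip=0.1):
    """Per-pixel factor p(y | x) for a binary hidden pixel x observed through
    independent flip noise (sketch; p_flip is an assumed parameter)."""
    return {0: (1.0 - p_flip) if y == 0 else p_flip,    # p(y | x = 0)
            1: p_flip if y == 0 else (1.0 - p_flip)}    # p(y | x = 1)
```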

Curve detection dataset Ground-truth: human-drawn object boundaries from BSDS.

Curve detection results

Summary Perceptual inference requires models of the world. Stochastic grammars can model complex scenes. Part-based representation. Scenes and objects have variable structure. Belief propagation provides a general computational engine. The same framework applies to many different tasks.

Future work Data model p(Y | X). Grammar learning. Scaling inference to larger grammars.

PERSON → {FACE, ARMS, LOWER} FACE → {EYES, NOSE, MOUTH} FACE → {HAT, EYES, NOSE, MOUTH} EYES → {EYE, EYE} EYES → {SUNGLASSES} HAT → {BASEBALL} HAT → {SOMBRERO} LOWER → {SHOE, SHOE, LEGS} LEGS → {PANTS} LEGS → {SKIRT}