Statistics of Natural Image Categories

Similar documents
Learning and Inferring Depth from Monocular Images. Jiyan Pan April 1, 2009

Computer Vision for VLFeat and more...

GIST. GPU Implementation. Prakhar Jain ( ) Ejaz Ahmed ( ) 3 rd May, 2009

COSC160: Detection and Classification. Jeremy Bolton, PhD Assistant Teaching Professor

Classification and Detection in Images. D.A. Forsyth

ELL 788 Computational Perception & Cognition July November 2015

Announcements. Recognition. Recognition. Recognition. Recognition. Homework 3 is due May 18, 11:59 PM Reading: Computer Vision I CSE 152 Lecture 14

Scene-Centered Description from Spatial Envelope Properties

Schedule for Rest of Semester

Mixture Models and the EM Algorithm

Grundlagen der Künstlichen Intelligenz

Using the Forest to See the Trees: Context-based Object Recognition

Network Traffic Measurements and Analysis

Visual localization using global visual features and vanishing points

Tag Recommendation for Photos

Machine Learning A W 1sst KU. b) [1 P] Give an example for a probability distributions P (A, B, C) that disproves

Contextual priming for artificial visual perception

Object Recognition. Computer Vision. Slides from Lana Lazebnik, Fei-Fei Li, Rob Fergus, Antonio Torralba, and Jean Ponce

Image Analysis, Classification and Change Detection in Remote Sensing

22 October, 2012 MVA ENS Cachan. Lecture 5: Introduction to generative models Iasonas Kokkinos

Automatic Singular Spectrum Analysis for Time-Series Decomposition

Discriminative classifiers for image recognition

Understanding Faces. Detection, Recognition, and. Transformation of Faces 12/5/17

Remote Sensing & Photogrammetry W4. Beata Hejmanowska Building C4, room 212, phone:

Note Set 4: Finite Mixture Models and the EM Algorithm

ECG782: Multidimensional Digital Signal Processing

ECE 5424: Introduction to Machine Learning

Feature Selection for Image Retrieval and Object Recognition

Object recognition (part 2)

Neural Network based textural labeling of images in multimedia applications

Region-based Segmentation and Object Detection

Verification: is that a lamp? What do we mean by recognition? Recognition. Recognition

Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope

What do we mean by recognition?

Object recognition (part 1)

LOCAL AND GLOBAL DESCRIPTORS FOR PLACE RECOGNITION IN ROBOTICS

Dietrich Paulus Joachim Hornegger. Pattern Recognition of Images and Speech in C++

Lecture 8 Object Descriptors

Texture. Frequency Descriptors. Frequency Descriptors. Frequency Descriptors. Frequency Descriptors. Frequency Descriptors

Filtering, scale, orientation, localization, and texture. Nuno Vasconcelos ECE Department, UCSD (with thanks to David Forsyth)

Lecture 2: 2D Fourier transforms and applications

CIE L*a*b* color model

IMAGE ANALYSIS, CLASSIFICATION, and CHANGE DETECTION in REMOTE SENSING

CIS 520, Machine Learning, Fall 2015: Assignment 7 Due: Mon, Nov 16, :59pm, PDF to Canvas [100 points]

Expectation Maximization (EM) and Gaussian Mixture Models

Relationship between Fourier Space and Image Space. Academic Resource Center

Supervised learning. y = f(x) function

Application of Principal Components Analysis and Gaussian Mixture Models to Printer Identification

Fourier Transform and Texture Filtering

Course Administration

PATTERN CLASSIFICATION AND SCENE ANALYSIS

Latent Variable Models for Structured Prediction and Content-Based Retrieval

Clustering K-means. Machine Learning CSEP546 Carlos Guestrin University of Washington February 18, Carlos Guestrin

Chapter 4 Face Recognition Using Orthogonal Transforms

Boundary descriptors. Representation REPRESENTATION & DESCRIPTION. Descriptors. Moore boundary tracking

EE795: Computer Vision and Intelligent Systems

Generative and discriminative classification techniques

CS 534: Computer Vision Segmentation and Perceptual Grouping

Feature Extraction and Image Processing, 2 nd Edition. Contents. Preface

A Scene Recognition Algorithm Based on Covariance Descriptor

Does everyone have an override code?

Unsupervised Learning

Digital Image Processing. Image Enhancement in the Frequency Domain

Generative and discriminative classification techniques

Clustering Lecture 5: Mixture Model

Texture Segmentation

Announcements. Recognition I. Gradient Space (p,q) What is the reflectance map?

Latest development in image feature representation and extraction

CS839: Probabilistic Graphical Models. Lecture 10: Learning with Partially Observed Data. Theo Rekatsinas

Methods for Intelligent Systems

Dimensionality Reduction using Relative Attributes

Refined Measurement of Digital Image Texture Loss Peter D. Burns Burns Digital Imaging Fairport, NY USA

JMP Book Descriptions

Transfer Learning. Style Transfer in Deep Learning

Hyperspectral Remote Sensing

08 An Introduction to Dense Continuous Robotic Mapping

A Texture Feature Extraction Technique Using 2D-DFT and Hamming Distance

ELEC Dr Reji Mathew Electrical Engineering UNSW

Multiple Model Estimation : The EM Algorithm & Applications

Supervised learning. y = f(x) function

Image classification by a Two Dimensional Hidden Markov Model

CS 534: Computer Vision Segmentation and Perceptual Grouping

Texture. COS 429 Princeton University

Digital Image Processing

Learning global properties of scene images based on their correlational structures

Spatial Outlier Detection

Object Class Recognition using Images of Abstract Regions

Content-based image and video analysis. Machine learning

Shilpa S. Nair, Naveen S., Moni R.S

Robotics Programming Laboratory

Overview Citation. ML Introduction. Overview Schedule. ML Intro Dataset. Introduction to Semi-Supervised Learning Review 10/4/2010

CS 229 Midterm Review

Recognition: Face Recognition. Linda Shapiro EE/CSE 576

SD 372 Pattern Recognition

Dimension reduction : PCA and Clustering

Inference and Representation

A problem - too many features. TDA 231 Dimension Reduction: PCA. Features. Making new features

Aggregating Descriptors with Local Gaussian Metrics

Recognition of Animal Skin Texture Attributes in the Wild. Amey Dharwadker (aap2174) Kai Zhang (kz2213)

Introduction to digital image classification

Transcription:

Statistics of Natural Image Categories Authors: Antonio Torralba and Aude Oliva Presented by: Sebastian Scherer

Experiment Please estimate the average depth from the camera viewpoint to all locations(pixels) in the next picture. Next you will see a circle for 3s, then a picture for 1s. Concentrate on the circle. After that I will ask you about the average depth of the picture. 2

3

4

What would you estimate the mean depth of this picture was? 5

6

7

What is the mean depth of this picture? 8

What is the mean depth of this picture? 9

Problem We want to determine global properties about an image. However avoid Explicit segmentation No object recognition Properties that are important for later stages of image processing are 10 Scale Type

Outline Global features: power spectra Examples Localized spectra Applications of global features 11 Naturalness/Openness classification Scene categorization Object recognition Depth estimation

Power spectra Decompose the image using a discrete Fourier transform: The power spectrum is then given by the amplitude and phase 12

Power spectrum plot Horizontal 50% 80% 13 Contour plot representing a percentage of total energy of the spectrum Vertical

Computing and visualizing a spectrum in Matlab Computing the Spectrum (Matlab): Ifft = abs(fftshift(fft2(i,w,h))); Visualization: imshow(log(ifft)/max(max(log(ifft)))); colormap(cool); (From Jonathan Huang's slides) 14

Interesting fact: 1/f Spectra Natural Image Spectra follow a power law! As(θ) is called the Amplitude Scaling Factor 2-η(θ) is the Frequency Exponent. η clusters around 0 for natural images. Any guesses on why this law holds? 15

Use the spectra of images in order to categorize a scene Why is this a good idea? Texture varies with scene scale: Atmosphere is a low pass filter? vs 16 Sky Mountains Leaves Objects Phase varies with environment: Man-made Nature

Spectral signatures for different scenes All orientations Horizon Buildings 17

Scene scales The point of view that any given observer adopts on a specific scene is constrained by the volume of the scene. 18

What can one spectrum of an image not capture? Generally we have a upright viewpoint. The horizon is towards the top. 19

Non-stationary power spectra Decompose the image using a DFT in local regions: The localized power spectrum is then given by the amplitude and phase for a specific region In their case: 8x8 spatial locations 20

Non-stationary power spectra at different depth scales Man-made Natural 21

What can we do with the power spectra? Replicate perception of humans along different scales Naturalness Openness Semantic categorization Determine context of scene Apply specialized methods after context is determined Object recognition Determine if an object exists in the scene Only presence no location Likely regions of objects Depth estimation 22 Estimate the mean depth of the scene Provides cue for object recognition

Naturalness vs. Openness Projection of images on the second and third principal component. Openness 23 Naturalness is represented by the third principal component

PCA The power spectrum Normalization: Perform PCA on the normalized power spectrum to get the spectral principal components (SPC). 24

The first principal components of a set of images Naturalness Openness 25

Semantic categorization 26

Semantic categorization calculation + w - 27

Object recognition 28

Object recognition Algorithm During training phase learn from a set of annotated pictures. O: object class, v_c: image statistics Bayes rule (without marginalization): Equal prior is assumed: Estimate training set. 29 with a mixture of Gaussians from

Object recognition Results 1 30

Object recognition Results 2 True positive rate 31 True negative rate

Adding spatial information Split the picture in four equal regions. Learn a mixture of Gaussians in order to determine the region where one is most likely to find an object. Given the spectral feature at four location what is the most likely position of a face. Attention will be on the most likely region to find a face. 32

Regions of interest 90% of faces were within a region of 35% of the size of the image of the largest P(x v_c) 33

Gist Use the global features as a prior on the location of objects in a object detection and localization algorithm. Since x is dependent on many factors only learn y and s. where π(q) are the mixing weights, W is the regression matrix, µ are mean vectors, and Σ are covariance matrices for cluster q. 34

Gist Results 35

Depth Estimation Feature vector Have a feature vector v. v' consists of the downsampled energy vector k: wavelet index, x: location, M spatial resolution Feature vector size: M^2 K Apply PCA to reduce the dimensionality of v' to get v. v is a L-dimensional vector obtained by projecting v' on the first L principal components. => v is size L. 36

Depth Estimation - Learning Want to optimize this expression D:depth, v: features, Nc: number of clusters, p(ci): cluster weight, g(v ci): multivariate gaussian, g(d v,ci): Result is a mixture of linear regressions: 37

Depth Estimation Global features 38

Depth Estimation Localized features 39

Scene category from depth 40

Depth Estimation Face Detection Determine the size of an object as Now have approximately the right scale for object detection: 41

Discussion Why do power spectra work so well? Why is there such a large distinction between man-made and human? Are there possibly more distinct classes? Rural streets? How do humans calculate mean depth when estimating the depth for training? 42