Object and Action Detection from a Single Example

Similar documents
Locally Adaptive Regression Kernels with (many) Applications

Training-Free, Generic Object Detection Using Locally Adaptive Regression Kernels

Action Recognition from One Example

Generic Human Action Recognition from a Single Example

Locally Adaptive Regression Kernels with (many) Applications

Action Recognition in Video by Sparse Representation on Covariance Manifolds of Silhouette Tunnels

Action Recognition & Categories via Spatial-Temporal Features

BEING able to automatically detect people in videos is

Motion illusion, rotating snakes

EVENT DETECTION AND HUMAN BEHAVIOR RECOGNITION. Ing. Lorenzo Seidenari

COSC160: Detection and Classification. Jeremy Bolton, PhD Assistant Teaching Professor

Evaluation of Local Space-time Descriptors based on Cuboid Detector in Human Action Recognition

Lecture 18: Human Motion Recognition

Features Points. Andrea Torsello DAIS Università Ca Foscari via Torino 155, Mestre (VE)

Linear Discriminant Analysis for 3D Face Recognition System

Last week. Multi-Frame Structure from Motion: Multi-View Stereo. Unknown camera viewpoints

Previously. Part-based and local feature models for generic object recognition. Bag-of-words model 4/20/2011

CS 4495 Computer Vision A. Bobick. CS 4495 Computer Vision. Features 2 SIFT descriptor. Aaron Bobick School of Interactive Computing

Local Image Features

SUPERVISED NEIGHBOURHOOD TOPOLOGY LEARNING (SNTL) FOR HUMAN ACTION RECOGNITION

Local Image Features

IMPROVING SPATIO-TEMPORAL FEATURE EXTRACTION TECHNIQUES AND THEIR APPLICATIONS IN ACTION CLASSIFICATION. Maral Mesmakhosroshahi, Joohee Kim

The SIFT (Scale Invariant Feature

MULTIVARIATE TEXTURE DISCRIMINATION USING A PRINCIPAL GEODESIC CLASSIFIER

Robotics Programming Laboratory

Scale Invariant Feature Transform

Object Recognition. Lecture 11, April 21 st, Lexing Xie. EE4830 Digital Image Processing

Adaptive Kernel Regression for Image Processing and Reconstruction

METHODS FOR TARGET DETECTION IN SAR IMAGES

Classifying Images with Visual/Textual Cues. By Steven Kappes and Yan Cao

Evaluation and comparison of interest points/regions

Scale Invariant Feature Transform

Outline 7/2/201011/6/

Verification: is that a lamp? What do we mean by recognition? Recognition. Recognition

What do we mean by recognition?

Image Processing. Image Features

Harder case. Image matching. Even harder case. Harder still? by Diva Sian. by swashford

K-Means Based Matching Algorithm for Multi-Resolution Feature Descriptors

3D Object Recognition using Multiclass SVM-KNN

Feature Detection. Raul Queiroz Feitosa. 3/30/2017 Feature Detection 1

Salient Region Detection and Segmentation in Images using Dynamic Mode Decomposition

Final Project Face Detection and Recognition

Learning Human Actions with an Adaptive Codebook

Short Survey on Static Hand Gesture Recognition

A performance evaluation of local descriptors

LAPLACIAN OBJECT: ONE-SHOT OBJECT DETECTION BY LOCALITY PRESERVING PROJECTION Sujoy Kumar Biswas and Peyman Milanfar

Vision and Image Processing Lab., CRV Tutorial day- May 30, 2010 Ottawa, Canada

Robust Face Recognition via Sparse Representation Authors: John Wright, Allen Y. Yang, Arvind Ganesh, S. Shankar Sastry, and Yi Ma

EigenJoints-based Action Recognition Using Naïve-Bayes-Nearest-Neighbor

Action recognition in videos

SCALE INVARIANT FEATURE TRANSFORM (SIFT)

Automatic Image Alignment (feature-based)

Face detection and recognition. Detection Recognition Sally

Applications Video Surveillance (On-line or off-line)

Human Action Recognition Using Independent Component Analysis

Large-Scale Face Manifold Learning

Wikipedia - Mysid

Face Recognition Using SIFT- PCA Feature Extraction and SVM Classifier

Recognition: Face Recognition. Linda Shapiro EE/CSE 576

Linear Discriminant Analysis in Ottoman Alphabet Character Recognition

NIST. Support Vector Machines. Applied to Face Recognition U56 QC 100 NO A OS S. P. Jonathon Phillips. Gaithersburg, MD 20899

Learning video saliency from human gaze using candidate selection

Face Recognition for Mobile Devices

Feature descriptors. Alain Pagani Prof. Didier Stricker. Computer Vision: Object and People Tracking

Dimension Reduction CS534

Shape Matching. Brandon Smith and Shengnan Wang Computer Vision CS766 Fall 2007

Face Recognition using Eigenfaces SMAI Course Project

Face detection and recognition. Many slides adapted from K. Grauman and D. Lowe

CSE 252B: Computer Vision II

CHAPTER 3 PRINCIPAL COMPONENT ANALYSIS AND FISHER LINEAR DISCRIMINANT ANALYSIS

CS229: Action Recognition in Tennis

Aggregating Descriptors with Local Gaussian Metrics

Patch-based Object Recognition. Basic Idea

An Implementation on Histogram of Oriented Gradients for Human Detection

Human pose estimation using Active Shape Models

Local invariant features

Reconstruction of Images Distorted by Water Waves

Human Motion Detection and Tracking for Video Surveillance

Key properties of local features

NOVEL PCA-BASED COLOR-TO-GRAY IMAGE CONVERSION. Ja-Won Seo and Seong Dae Kim

Human Activity Recognition Using a Dynamic Texture Based Method

Large scale object/scene recognition

Motion Estimation and Optical Flow Tracking

Face Detection and Recognition in an Image Sequence using Eigenedginess

Selection of Scale-Invariant Parts for Object Class Recognition

Edge and corner detection

Image matching. Announcements. Harder case. Even harder case. Project 1 Out today Help session at the end of class. by Diva Sian.

Advanced Video Content Analysis and Video Compression (5LSH0), Module 4

Adaptive Action Detection

Harder case. Image matching. Even harder case. Harder still? by Diva Sian. by swashford

Dynamic Human Shape Description and Characterization

A GENERIC FACE REPRESENTATION APPROACH FOR LOCAL APPEARANCE BASED FACE VERIFICATION

Automatic Gait Recognition. - Karthik Sridharan

EE368 Project Report CD Cover Recognition Using Modified SIFT Algorithm

SIFT: SCALE INVARIANT FEATURE TRANSFORM SURF: SPEEDED UP ROBUST FEATURES BASHAR ALSADIK EOS DEPT. TOPMAP M13 3D GEOINFORMATION FROM IMAGES 2014

Part based models for recognition. Kristen Grauman

Dimensionality Reduction and Classification through PCA and LDA

Prof. Feng Liu. Spring /26/2017

Visual Learning and Recognition of 3D Objects from Appearance

Bag of Optical Flow Volumes for Image Sequence Recognition 1

Transcription:

Object and Action Detection from a Single Example Peyman Milanfar* EE Department University of California, Santa Cruz *Joint work with Hae Jong Seo AFOSR Program Review, June 4-5, 29

Take a look at this:

See it here?

How about here?

Or here?

Single Example, No Training! (Most) people can find the Dragon Fruit from one look. Even if they ve never seen it before.

Outline I. Motivation II. III. Overview Object Detection IV. Action Detection V. Conclusion and Future work

Fundamental Problems in Machine Vision Develop a unified framework that can robustly detect objects/actions of interest within images/videos without training query target image query target video 1) Whether objects (actions) are present or not, 2) How many objects (actions)? 3) Where are they located?

Challenges in Detection Objects Actions Besides, Contexts: Degradation: 1) different clothes, 2) different illumination, Medical imaging Underwater Raindrop Noise 3) different background 4) action speed Blur

Outline I. Motivation II. III. System Overview Object Detection IV. Action Detection V. Conclusion and Future work

Object Detection using Local Regression Kernels Local Steering Kernels as Descriptors Using a single example Resemblance Map Detected Similar Objects Query

.3.2.1 -.1 -.2 -.3 -.4.2.1 -.1 -.2 -.3 -.4 -.5 -.6 -.7 -.8.4.3.2.1 -.1 -.2 -.3 -.4.4.3.2.1 -.1 -.2 -.3 Query Target Object Detection System Overview -.1 stage 1 Compute local steering kernels Descriptors -.1 -.1 -.1 2) Significance.5 Tests.1 3) Non-maxima Suppression Final result.2.1.2.1 -.1 -.1.5 -.5.2.1.2.1.1 -.1 -.5 -.1.2.1 -.1.2.1 -.1.1 -.1.5 -.5 -.1.2.1.2.1 -.1.1 -.1.5 -.5 -.1 PCA stage 2 stage 3 1Image Feature 2Image Feature.1 1Image Feature 3Image 4Template Feature Feature -.1 -.1 2Image 1Image compute feature images -.1 -.1 4Image Feature.5 3Image Feature.1.2 2Image Feature.1 1Image Feature 4Image Feature -.5 3Image Feature -.1 4Template 2Image Feature Feature 4Image Feature 3Image Feature.1 1) Resemblance Map (RM).2 using Matrix Cosine Similarity 4Image Feature.2.1 -.1.1.5 -.5 -.1 -.1.2.1.2.1 -.1.5 -.1 5 1 15 2 25 3 35 4.2 -.1.2 -.2.2 -.2 -.2.2 -.2.2 -.2 -.1 -.2 -.2.1.2.2.1.1 -.1 -.5 -.1 -.1.2.1 -.1.2.1 -.1.1.2.2 -.2.2 -.2 -.1 -.2.5 -.5.2 -.2.2 -.2.1 -.1 -.2.2 -.2.1 -.1 -.2 1Image Feature 2Image Feature 1.8.6.4.2 1Image Feature 2Image Feature 3Image Feature 4Image Feature 3Image Feature 4Image Feature 1Image Fea 2Image Fea 3Image Fea.2 4Image Fea -.2.2 -.2.2 -.2.1 -.1 -.2 -.1 -.2 1 45 2 3 4 5 6 H. Seo and P. Milanfar, Training-free, Generic Object Detection using Locally Adaptive Regression Kernels, Accepted for publication in IEEE Transactions on Pattern Analysis and Machine Intelligence

Stage 1: Calculation of Local Descriptors SVD

Robustness of LSK Descriptors 3 3 3 3 1 1 1 1 2 2 2 2 Original Brightness Contrast WGN image change change sigma = 1 1 2 3

.3.2.1 -.1 -.2 -.3 -.4.2.1 -.1 -.2 -.3 -.4 -.5 -.6 -.7 -.8.4.3.2.1 -.1 -.2 -.3 -.4.4.3.2.1 -.1 -.2 -.3.2 System Overview.1 : Stage 2 1Image Feature.2.1 -.1.2 -.1.2 -.2 Query Target stage 1 Compute local steering kernels.2.2.1.1 -.1 -.1.1.2.2.1.1 -.1 -.1 -.1.5.1.2 -.5.2.1 -.1.1 -.1 -.1 -.1 -.1 -.1 2) Significance.5 Tests.1 3) Non-maxima Suppression Final result.1.2.1 -.5.5 -.5 -.1 PCA stage 2 compute feature images stage 3.1 1Image Fea.2 2Image Feature.1 -.1 1Image Feature.2 -.1.2.1 2Image Fea.2 1Image Feature.2.1 -.2 -.1.1 -.2 3Image 4Template Feature Feature -.1.2 -.1.5 2Image 3Image Fea.1.2 -.5 1Image 2Image Feature.2 -.1 1Image Feature.2.2.1.2 -.2.1 -.1 -.1 -.1 4Image Feature.5 3Image Feature.1.2 -.5 2Image Feature.1 -.1 1Image Feature.2 -.1 -.1.2 -.2 -.2.2 -.2.1.5 4Image Feature -.2 -.5 -.2 3Image Feature.1 -.1 -.1.2 2Image Feature.5.2 -.1 -.2 -.5 -.2 -.1 -.2 4Image Feature.1 -.1 -.2 3Image Feature.1 1) Resemblance Map (RM).2 using Matrix Cosine Similarity -.1 -.2 -.2 5 -.1 1 1.8 15 4Image Feature 2.6.5 25.1 3.4 35.2 -.5 4 -.1 -.1 4Image Fea -.2 3Image Feature 2Image Feature.2 -.2 4Image Feature 3Image Feature.2 -.2 4Image Feature.1 -.1 -.2 -.1 -.2 1 45 2 3 4 5 6

Energy Stage 2: Feature Extraction from Descriptors Densely collected Vectorization Descriptors Apply PCA to for dimensionality reduction Retain the d largest principal components Project and onto d Eigenvalue rank

-.1 -.2 -.3 -.4 -.5 -.6 -.7 -.8 -.9.3.2.1 -.1 -.2 -.3 -.4.2.1 -.1 -.2 -.3 -.4.2.1 -.1 -.2 -.3 -.4 -.5 Stage 2: Salient features after PCA LSK Object: Helicopter Query 1Image Target Feature 1 st eigenvector 2 nd eigenvector 3 rd eigenvector 1Image Feature 1Image Feature 1Image Feature 2Image Feature 2Image Feature 2Image Feature 2Image Feature 3Image Feature 3Image Feature 3Image Feature 3Image Feature 1Image Feature 1Image Feature 1Image Feature 1Image Feature 2Image Feature 2Image Feature 2Image Feature 2Image Feature 3Image Feature 3Image Feature 3Image Feature 3Image Feature 4 th eigenvector 4Image Feature 4Image Feature 4Image Feature 4Image Feature 4Image Feature 4Image Feature 4Image Feature Eigenvectors 4Image Feature Query features Target features 11

Stage 2: Salient features after PCA LSK Object: Car Query Target.47632.65759.81888.86541 1 st eigenvector.47632.65759.81888.86541 -.4-.2.2 -.4-.2 2 nd.2.4 eigenvector -.4-.2.2.4 -.2.2.4.47632.65759.81888.86541 -.4-.2.2 -.4-.2.2.4 -.4-.2 3 rd.2.4 eigenvector -.2.2.4.65759.81888.86541 -.4-.2.2 -.4-.2.2.4 -.4-.2.2.4 -.2 4 th.2.4 eigenvector -.4-.2.2.4 -.4-.2.2.4 Eigenvectors -.2.2.4 Query features Target features 11

.3.2.1 -.1 -.2 -.3 -.4.2.1 -.1 -.2 -.3 -.4 -.5 -.6 -.7 -.8.4.3.2.1 -.1 -.2 -.3 -.4.4.3.2.1 -.1 -.2 -.3 1Image Feature.2 System Overview.1 : Stage 3.2.1 -.1.2 -.1.2 -.2 Query Target.1 1Image Fea.2 2Image Feature.1.2 -.1 1Image Feature.2.2.1 -.1.2.1.1 2Image Fea.2 1Image Feature -.1.2.1 -.2 -.1 -.1.1 -.2 stage 1 stage 2 3Image 4Template Feature Feature -.1.1.2 -.1.5 2Image 3Image Fea.2.1.2 1Image.2.1 PCA -.5 2Image Feature.2 -.1 1Image Feature.2.2.1.2 -.2.1.1 -.1 -.1 -.1 -.1 4Image Fea -.1 -.2 -.2 4Image Feature -.1.5 -.2 3Image Feature.5 3Image Feature.1.1 2Image Feature.2.1 -.5.2.2 2Image Feature.1 -.1.2 -.5 -.1 1Image Feature.2.2 -.1.1 -.1 -.1.2 -.2 -.2 Compute local steering kernels.1 -.1 -.1 -.1.5.1.2 -.5.1 -.1 -.1 -.1 2) Significance Tests.5.1 3) Non-maxima Suppression -.5 Final result compute feature images stage 3 -.2 4Image Feature 3Image Feature.1.5 4Image Feature -.2 -.5 -.2 3Image Feature.1 -.1 -.1.2 2Image Feature 4Image Feature.5.2 -.1 -.2 -.5 -.2 -.1 -.2 4Image Feature 3Image Feature.1 1) Resemblance Map (RM).2 using Matrix Cosine Similarity -.1 -.1 -.2 -.2 5 -.1 1 1.8 15 4Image Feature 2.6.5 25.1 3.4 35.2 -.5 4 -.1.2 -.2.1 -.1 -.2 -.1 -.2 1 45 2 3 4 5 6

Stage 3: Finding similarity between features Target image is divided into a set of overlapping patches Query Target Query Target

Stage 3: Correlation based Metric The vector cosine similarity Query Target patch Inner product between two normalized vectors Measures angle while discarding the magnitude

Stage 3: Correlation based Metric The vector cosine similarity Query Target patch Inner product between two normalized vectors Measures angle while discarding the magnitude

Stage 3: Matrix Cosine Similarity What about a set of vectors? Matrix Cosine Similarity Frobenius Inner product between normalized matrices Query Target patch

Stage 3: Matrix Cosine Similarity What about a set of vectors? Matrix Cosine Similarity Frobenius Inner product between normalized matrices Query Target patch A weighted sum of the column-wise vector cosine similarities We can prove optimality of this approach in a naïve Bayes sense.

Stage 3: Generate Resemblance Map Resemblance Map (RM) Describes the proportion of variance in common between two features Lawley-Hotelling Trace statistic.7.6.5 1.8.4.3.2.1.6.4.2

Stage 3: Non-parametric Significance Tests 1. Is any sufficiently similar object present? i.e., 2. How many objects of interest are present? so that ~ 5 % of variance in common probability.4 Empirical PDF 5 1.3 99% confidence level 1 15.8.2.1 Significance level 2 25 3 35 4.6.4.2.2.4.6.8 1 1.2 45 1 2 3 4 5 6

Experimental Results Query Targets Dataset from Weizmann Inst.

Experimental Results query query target:1 target target:3 target

Experimental Results query target target:2 target:1

Experimental Results query target:3 target target:1 target target:2 target 25

Experimental Results query Higher resemblance target target:1 target target:2 Lower resemblance

Detection rate Experimental Results 1.95.9.85.8.75.7.65.6 Weizmann Inst. Object Test Set CIE L*a*b* channel Luminance channel only SIFT descriptor [1999] Shape Context [21] GLOH [25].55.5 1 1.5 2 2.5 3 3.5 4 4.5 x 1-4 Detection rate = TP/(TP+FN) False positive rate = FP/(FP+TN) False positive rate

Experimental Results The MIT-CMU Face Test Set Query

Detection rate Experimental Results 1 The MIT-CMU Face Test Set ROC curve.95.9.85.8.75.2.4.6.8 1 1.2 1.4 x 1-4 False positive rate 36

Gallery Set:1 subjects x 25 different conditions Query Q

Gallery Set:1 subjects x 25 different conditions Query Q 5 1 15 2 25 3 35 4

query target output 1.8 1.6 1.4 1.2 1.8.6.4.2 -.2 query target output 1.2 1.8.6.4.2

query target output.3.25.2.15.1.5 query target output 3.5 3 2.5 2 1.5 1.5

Outline I. Motivation II. III. System Overview Object Detection IV. Action Detection V. Conclusion and Future work

Action Detection System Overview Query stage 1 stage 2 PCA Target Compute space-time local steering kernels compute feature volumes No Motion Estimation No Segmentation No Learning No Prior Information 2) Significance Tests 3) Non-maxima Suppression Final result Frame:256 stage 3 1) Resemblance Map (RM) using Matrix Cosine Similarity H. Seo and P. Milanfar, Generic Action Recognition from a Single Example, Submitted to International Journal of Computer Vision (IJCV), March 29

Stage 1: Space Time Descriptors : 3x3 local covariance matrix : space-time coordinates 3 5 1 4 2 6 1 2 3 4 5 6 First frame Key frame Last frame 38

Experimental Results Shechtman s action test set (Beach walk) Query Typical run time for target (5 frames of 144 x 192) and query (13 frames of 9 x 11) : a little over 1 minute

Experimental Results (Multiple Actions) Multiple queries Automatic cropping Very Confusing Moving to the Left Very Confusing Very Confusing Very Confusing Moving to the Left Very Confusing Very Confusing Very Confusing Moving to the Left Very Confusing Very Confusing Very Confusing Moving to the Left Very Confusing Very Confusing Very Confusing Moving to the Left Very Confusing Very Confusing

Action Recognition Query Automatic cropping of a short action clip (25 frames) Action Category 1 5 Action Detection 2 6 Scoring 1 2 3 7 Rank 9 8 4 9 least similar most similar

Action Classification Performance Average confusion matrices Classification rate: 96 % Classification rate: 95.66 % Bend Jack Jump 1....1.....1... box hclp.94.2.4.....99.1... Pjump Run Side Skip Walk...1......1......1......11..11.11.67......1... hwav jog run.. 1........95.1.4....5.92.3 Wave1 Wave2.11...89....1. walk....5.1.94 (Weizmann dataset) (KTH dataset) 9 video sequences 6 video sequences Classification rate = 1 (# of miss classification) / (total # of sequences) Evaluation setting: Leave-one-out Classify each testing video as one of the predefined classes by 3-NN (nearest neighbor)

Action Classification Performance Comparison with state-of-the art methods (KTH dataset) Our Approach (1-NN) Our Approach (2-NN) Our Approach (3-NN) 89% 93% 95.66% Our Approach (3-NN) 95.66% Kim et al. (28) 95.33% Ali et al.(28) 87.7% Dollar et al. (25) 81.17% Ning et al. (28) 92.31% Niebles et al. (28) 81.5% Wong et al. (27) 71.16% Classification rate = 1 (# of miss classification) / (total # of sequences) We outperform all the state-of-the art methods on KTH dataset.

Publications H. Seo and P. Milanfar, Training-free, Generic Object Detection using Locally Adaptive Regression Kernels, Accepted for publication in IEEE Transactions on Pattern Analysis and Machine Intelligence, 28 H. Seo and P. Milanfar, Generic Action Recognition from a Single Example, Submitted to International Journal of Computer Vision (IJCV), March 29 H. Seo and P. Milanfar, Static and Space-time Visual Saliency Detection by Self- Resemblance, Submitted to Journal of Vision (JoV), May 29 H. Seo and P. Milanfar, Detection of Human Actions from a Single Example, Accepted for publication in International Conference on Computer Vision (ICCV), March 29 H. Seo and P. Milanfar, Nonparametric Bottom-Up Saliency Detection by Self- Resemblance, Accepted for IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 1st International Workshop on Visual Scene Understanding (ViSU 9), Miami, June, 29 H. Seo and P. Milanfar, Using Local Regression Kernels for Statistical Object Detection, Proceedings of IEEE International Conference on Image Processing (ICIP), San Diego, 28

Conclusions & Future Work Local Steering Kernels are Very Effective Descriptors Simple Approach: PCA + Matrix Cosine Similarity Excellent Detection and Recognition is Achieved without Training Make algorithm scalable for image and (video) retrieval Increase accuracy by incorporating context Detect /recognize objects of interest in general degraded data without explicit restoration