Object and Action Detection from a Single Example

Peyman Milanfar*
EE Department, University of California, Santa Cruz
*Joint work with Hae Jong Seo
AFOSR Program Review, June 4-5, 2009

Take a look at this:

See it here?

How about here?

Or here?

Single Example, No Training!
(Most) people can find the dragon fruit from one look, even if they've never seen it before.

Outline
I. Motivation
II. System Overview
III. Object Detection
IV. Action Detection
V. Conclusion and Future Work

Fundamental Problems in Machine Vision
Goal: develop a unified framework that can robustly detect objects/actions of interest within images/videos without training.
Given a query and a target (image or video), determine:
1) Whether the objects (actions) are present or not
2) How many objects (actions) are present
3) Where they are located

Challenges in Detection
Objects:
  Contexts: medical imaging, underwater
  Degradation: raindrops, noise, blur
Actions, besides the above:
  1) different clothes
  2) different illumination
  3) different background
  4) action speed


Object Detection using Local Regression Kernels
Local Steering Kernels as descriptors, using a single example.
[Figure: query, resemblance map, and detected similar objects.]

Object Detection System Overview
Stage 1: Compute local steering kernels (descriptors) for the query and the target.
Stage 2: Apply PCA and compute feature images.
Stage 3: 1) Resemblance Map (RM) using Matrix Cosine Similarity, 2) Significance Tests, 3) Non-maxima Suppression, giving the final result.
[Figure: block diagram of the three stages with example feature images.]
H. Seo and P. Milanfar, Training-free, Generic Object Detection using Locally Adaptive Regression Kernels, Accepted for publication in IEEE Transactions on Pattern Analysis and Machine Intelligence

Stage 1: Calculation of Local Descriptors SVD
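The descriptor formulas on this slide did not survive extraction. As a rough illustration only, the sketch below assumes the usual local steering kernel form, with each weight proportional to sqrt(det C) * exp(-d^T C d / (2 h^2)), where C is a local gradient covariance and d is the offset from the center pixel. The name `lsk_descriptor`, the parameters `radius`, `h`, and `eps`, and the simple `eps` regularization (standing in for the SVD-based estimate named on the slide) are all assumptions made for this example.

```python
import numpy as np

def lsk_descriptor(image, row, col, radius=1, h=1.0, eps=1e-6):
    """Normalized steering-kernel weights in a window around an interior pixel.

    Hypothetical helper: covariance regularization is simplified to eps * I.
    """
    gy, gx = np.gradient(image.astype(float))  # gradients along rows, then columns
    size = 2 * radius + 1
    weights = np.zeros((size, size))
    for dr in range(-radius, radius + 1):
        for dc in range(-radius, radius + 1):
            r, c = row + dr, col + dc
            # Gradient covariance estimated from a small window around the neighbor.
            r0, r1 = max(r - radius, 0), min(r + radius + 1, image.shape[0])
            c0, c1 = max(c - radius, 0), min(c + radius + 1, image.shape[1])
            J = np.stack([gx[r0:r1, c0:c1].ravel(), gy[r0:r1, c0:c1].ravel()], axis=1)
            C = J.T @ J / len(J) + eps * np.eye(2)
            d = np.array([dc, dr], dtype=float)  # offset in (x, y) order
            weights[dr + radius, dc + radius] = (
                np.sqrt(np.linalg.det(C)) * np.exp(-d @ C @ d / (2 * h ** 2))
            )
    return weights / weights.sum()  # normalize so the weights sum to one
```

Because the weights are normalized within each window, a global brightness offset leaves the descriptor unchanged in this sketch, which is in the spirit of the robustness claims on the following slide.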

Robustness of LSK Descriptors
[Figure: LSK descriptors at three locations (1, 2, 3) for the original image and under brightness change, contrast change, and white Gaussian noise (WGN, sigma = 1).]

System Overview: Stage 2
[Figure: the system block diagram repeated, with Stage 2 (PCA, compute feature images) highlighted.]

Stage 2: Feature Extraction from Descriptors
Descriptors are densely collected and vectorized.
Apply PCA to the query descriptors for dimensionality reduction.
Retain the d largest principal components.
Project the query and target descriptors onto the d principal components.
[Plot: energy versus eigenvalue rank.]
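A minimal sketch of the PCA step above, assuming descriptors are stored as columns of a matrix. The names `pca_features`, `W_query`, and `W_target`, the default `d=4`, and the choice to center both sets with the query mean are assumptions for illustration, not details confirmed by the slides.

```python
import numpy as np

def pca_features(W_query, W_target, d=4):
    """Project descriptor columns onto the top-d principal directions of the query.

    W_query: (n, m_q) array, one vectorized descriptor per column.
    W_target: (n, m_t) array in the same layout.
    Returns (F_query, F_target, A) where A is the (n, d) orthonormal basis.
    """
    mean = W_query.mean(axis=1, keepdims=True)
    # Left singular vectors of the centered query descriptors are the
    # principal directions, ordered by decreasing singular value.
    U, _, _ = np.linalg.svd(W_query - mean, full_matrices=False)
    A = U[:, :d]
    return A.T @ (W_query - mean), A.T @ (W_target - mean), A
```

Each column of the returned feature matrices is a d-dimensional feature for one pixel location, which is what the later similarity stage compares.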

Stage 2: Salient Features after PCA
Object: helicopter
[Figure: LSKs of the query and target, the 1st through 4th eigenvectors, and the resulting query and target feature images.]

Stage 2: Salient Features after PCA
Object: car
[Figure: LSKs of the query and target, the 1st through 4th eigenvectors, and the resulting query and target feature images. Values shown in the figure: 0.47632, 0.65759, 0.81888, 0.86541.]

System Overview: Stage 3
[Figure: the system block diagram repeated, with Stage 3 (resemblance map, significance tests, non-maxima suppression) highlighted.]

Stage 3: Finding Similarity between Features
The target image is divided into a set of overlapping patches, each compared with the query.
[Figure: query and target with overlapping patches.]

Stage 3: Correlation-based Metric
The vector cosine similarity between a query feature vector and a target patch feature vector is the inner product between the two normalized vectors.
It measures the angle while discarding the magnitude.
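The definition above can be written in a few lines. This is a small sketch; the function name `cosine_similarity` and the argument names are chosen here for illustration.

```python
import numpy as np

def cosine_similarity(f_query, f_target):
    """Inner product of the two vectors after scaling each to unit length."""
    return float(f_query @ f_target /
                 (np.linalg.norm(f_query) * np.linalg.norm(f_target)))
```

Scaling either vector by a positive constant leaves the result unchanged, which is the sense in which the magnitude is discarded.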


Stage 3: Matrix Cosine Similarity
What about a set of vectors?
Matrix Cosine Similarity: the Frobenius inner product between the normalized query and target patch feature matrices.
Equivalently, a weighted sum of the column-wise vector cosine similarities.
We can prove optimality of this approach in a naïve Bayes sense.
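The two descriptions of Matrix Cosine Similarity can be checked against each other numerically. In the sketch below, the function names and the particular weights (the product of the column norms divided by the product of the Frobenius norms) are written out as an assumption consistent with the slide's wording.

```python
import numpy as np

def matrix_cosine_similarity(F_q, F_t):
    """Frobenius inner product of the two matrices after Frobenius normalization."""
    return float(np.sum(F_q * F_t) / (np.linalg.norm(F_q) * np.linalg.norm(F_t)))

def weighted_columnwise_cosine(F_q, F_t):
    """Same quantity, written as a weighted sum of per-column cosine similarities."""
    nq = np.linalg.norm(F_q, axis=0)  # norm of each query column
    nt = np.linalg.norm(F_t, axis=0)  # norm of each target column
    cosines = np.sum(F_q * F_t, axis=0) / (nq * nt)
    weights = nq * nt / (np.linalg.norm(F_q) * np.linalg.norm(F_t))
    return float(np.sum(weights * cosines))
```

Columns with larger norms receive larger weights, so stronger features contribute more to the overall score.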

Stage 3: Generate Resemblance Map
The Resemblance Map (RM) describes the proportion of variance in common between two features (Lawley-Hotelling trace statistic).
[Figure: example resemblance maps.]
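The mapping from similarity to resemblance is not legible in the extracted slide. The sketch below assumes the form f(rho) = rho^2 / (1 - rho^2), the ratio of shared to unshared variance, and slides the query over the target one pixel at a time. The function names, the array layout, and the clipping constant are assumptions for illustration.

```python
import numpy as np

def resemblance(rho):
    """Assumed mapping: ratio of shared variance to unshared variance."""
    return rho ** 2 / (1.0 - rho ** 2)

def resemblance_map(F_query, F_target):
    """F_query: (h, w, d) feature image; F_target: (H, W, d) feature image.

    Returns an (H - h + 1, W - w + 1) map, one value per patch position.
    """
    h, w, _ = F_query.shape
    H, W, _ = F_target.shape
    q_norm = np.linalg.norm(F_query)
    rm = np.zeros((H - h + 1, W - w + 1))
    for i in range(rm.shape[0]):
        for j in range(rm.shape[1]):
            patch = F_target[i:i + h, j:j + w]
            rho = np.sum(F_query * patch) / (q_norm * np.linalg.norm(patch))
            rho = np.clip(rho, -0.999999, 0.999999)  # keep the denominator nonzero
            rm[i, j] = resemblance(rho)
    return rm
```

Under this assumed mapping, a similarity of 0.7 gives a resemblance of about 0.96, since 0.49 / 0.51 is roughly 0.96.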

Stage 3: Non-parametric Significance Tests
1. Is any sufficiently similar object present? That is, does the maximum of the resemblance map exceed an overall threshold, chosen so that about 50% of variance is in common?
2. How many objects of interest are present? The threshold is taken from the empirical PDF of the resemblance map at the 99% confidence level.
[Plot: empirical PDF of resemblance values with the significance level marked.]
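A rough sketch of how the two tests and non-maxima suppression could fit together. The function name `detect`, the default overall threshold of 0.96 (picked here on the assumption that it corresponds to roughly half the variance in common), the square suppression window, and the `max_objects` cap are all assumptions for illustration.

```python
import numpy as np

def detect(rm, tau_overall=0.96, confidence=0.99, radius=5, max_objects=10):
    """Return peak locations (row, col) in a resemblance map.

    Test 1: stop early if no value exceeds tau_overall.
    Test 2: keep peaks above the `confidence` quantile of the map's values.
    Non-maxima suppression: blank a square window around each accepted peak.
    """
    if rm.max() <= tau_overall:
        return []
    tau = np.quantile(rm, confidence)  # threshold from the empirical distribution
    work = rm.astype(float).copy()
    peaks = []
    while len(peaks) < max_objects:
        i, j = np.unravel_index(np.argmax(work), work.shape)
        if work[i, j] < tau:
            break
        peaks.append((int(i), int(j)))
        work[max(i - radius, 0):i + radius + 1,
             max(j - radius, 0):j + radius + 1] = -np.inf
    return peaks
```

The quantile threshold by itself always passes a fixed fraction of pixels, so the first test is what guards against reporting detections when no similar object is present.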

Experimental Results
Dataset from Weizmann Inst.
[Figures: queries and targets with the detected objects marked. Where a target contains several detections, they are ranked from higher to lower resemblance.]

Experimental Results: Weizmann Inst. Object Test Set
[Plot: ROC curves, detection rate versus false positive rate (x 10^-4), comparing the CIE L*a*b* channels, the luminance channel only, the SIFT descriptor [1999], Shape Context [2001], and GLOH [2005].]
Detection rate = TP/(TP+FN)
False positive rate = FP/(FP+TN)
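The two rates defined on this slide, as a small sketch. The function names are chosen here, and the counts in the usage note are made-up numbers, not results from the slides.

```python
def detection_rate(tp, fn):
    """Fraction of true objects that were detected: TP / (TP + FN)."""
    return tp / (tp + fn)

def false_positive_rate(fp, tn):
    """Fraction of non-object locations flagged as detections: FP / (FP + TN)."""
    return fp / (fp + tn)
```

For example, with hypothetical counts of 9 true positives and 1 false negative, `detection_rate(9, 1)` is 0.9. Because every patch position in a target counts as a candidate, TN is typically very large, which is consistent with the 10^-4 scale on the false positive axis.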

Experimental Results: The MIT-CMU Face Test Set
[Figure: query face.]
[Plot: ROC curve, detection rate versus false positive rate (x 10^-4).]

Gallery Set:1 subjects x 25 different conditions Query Q


[Figures: four examples, each showing a query, a target, and the output resemblance map.]


Action Detection System Overview
Stage 1: Compute space-time local steering kernels for the query and the target.
Stage 2: Apply PCA and compute feature volumes.
Stage 3: 1) Resemblance Map (RM) using Matrix Cosine Similarity, 2) Significance Tests, 3) Non-maxima Suppression, giving the final result.
No motion estimation, no segmentation, no learning, no prior information.
[Figure: block diagram of the three stages; example result at frame 256.]
H. Seo and P. Milanfar, Generic Action Recognition from a Single Example, Submitted to International Journal of Computer Vision (IJCV), March 2009

Stage 1: Space-Time Descriptors
The local covariance matrix is 3x3 and the kernel is defined over space-time coordinates.
[Figure: space-time descriptors at six locations (1 to 6), shown for the first frame, key frame, and last frame.]

Experimental Results
Shechtman's action test set (Beach walk)
[Figure: query.]
Typical run time for target (5 frames of 144 x 192) and query (13 frames of 9 x 11): a little over 1 minute

Experimental Results (Multiple Actions)
Multiple queries, automatic cropping.
[Figure: sequence of frames labeled "Very Confusing" and "Moving to the Left".]

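Action Recognition
Query: automatic cropping of a short action clip (25 frames).
Pipeline: action detection against each action category, then scoring, then ranking from least similar to most similar.
[Figure: query compared with action categories 1 to 9, with the resulting rank order.]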

Action Classification Performance
Average confusion matrices:
  Weizmann dataset (90 video sequences): classification rate 96%. Classes: Bend, Jack, Jump, Pjump, Run, Side, Skip, Walk, Wave1, Wave2.
  KTH dataset (600 video sequences): classification rate 95.66%. Classes: box, hclp, hwav, jog, run, walk.
[Tables: per-class confusion matrix entries for both datasets.]
Classification rate = 1 - (# of misclassifications) / (total # of sequences)
Evaluation setting: leave-one-out. Each testing video is classified as one of the predefined classes by 3-NN (nearest neighbor).
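The evaluation protocol above can be sketched as follows, assuming a precomputed matrix of pairwise similarity scores between videos. The function name `loo_knn_rate`, the matrix layout, and the tie-breaking rule (the label of the most similar neighbor wins a tie) are assumptions for illustration; how the slides' experiments grouped the held-out videos is not specified here.

```python
from collections import Counter

import numpy as np

def loo_knn_rate(similarity, labels, k=3):
    """Leave-one-out k-NN classification rate.

    similarity: (n, n) array, larger means more similar.
    labels: sequence of n class labels.
    Returns 1 - (number of misclassifications) / n.
    """
    n = len(labels)
    errors = 0
    for i in range(n):
        scores = np.asarray(similarity[i], dtype=float).copy()
        scores[i] = -np.inf  # leave the test video itself out
        nearest = np.argsort(scores)[::-1][:k]  # k most similar videos
        votes = Counter(labels[j] for j in nearest)
        predicted = votes.most_common(1)[0][0]
        errors += predicted != labels[i]
    return 1.0 - errors / n
```

Setting `k=1` or `k=2` gives the other nearest-neighbor variants under the same protocol.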

Action Classification Performance
Comparison with state-of-the-art methods (KTH dataset)

Our approach by number of neighbors:
  1-NN    89%
  2-NN    93%
  3-NN    95.66%

Method                   Classification rate
Our Approach (3-NN)      95.66%
Kim et al. (2008)        95.33%
Ali et al. (2008)        87.7%
Dollar et al. (2005)     81.17%
Ning et al. (2008)       92.31%
Niebles et al. (2008)    81.5%
Wong et al. (2007)       71.16%

Classification rate = 1 - (# of misclassifications) / (total # of sequences)
Our approach achieves the highest classification rate among the methods compared on the KTH dataset.

Publications
H. Seo and P. Milanfar, Training-free, Generic Object Detection using Locally Adaptive Regression Kernels, Accepted for publication in IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008
H. Seo and P. Milanfar, Generic Action Recognition from a Single Example, Submitted to International Journal of Computer Vision (IJCV), March 2009
H. Seo and P. Milanfar, Static and Space-time Visual Saliency Detection by Self-Resemblance, Submitted to Journal of Vision (JoV), May 2009
H. Seo and P. Milanfar, Detection of Human Actions from a Single Example, Accepted for publication in International Conference on Computer Vision (ICCV), March 2009
H. Seo and P. Milanfar, Nonparametric Bottom-Up Saliency Detection by Self-Resemblance, Accepted for IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 1st International Workshop on Visual Scene Understanding (ViSU '09), Miami, June 2009
H. Seo and P. Milanfar, Using Local Regression Kernels for Statistical Object Detection, Proceedings of IEEE International Conference on Image Processing (ICIP), San Diego, 2008

Conclusions & Future Work
Conclusions:
  Local Steering Kernels are very effective descriptors
  Simple approach: PCA + Matrix Cosine Similarity
  Excellent detection and recognition is achieved without training
Future work:
  Make the algorithm scalable for image (and video) retrieval
  Increase accuracy by incorporating context
  Detect/recognize objects of interest in general degraded data without explicit restoration