CLASSIFICATION Experiments January 27, 2015 CS3710: Visual Recognition Bhavin Modi

Bag of features: Object → Bag of words

Bag of features: outline
1. Extract features
2. Learn visual vocabulary
3. Quantize features using visual vocabulary
4. Represent images by frequencies of visual words
Slide Credits: Li Fei-Fei
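The four steps above can be sketched end to end. This is a minimal illustration with random vectors standing in for SIFT descriptors, and a hand-rolled k-means for the vocabulary step; a real system would extract descriptors from images and typically use a library clusterer.

```python
# Sketch of the bag-of-features pipeline (steps 2-4), assuming step 1
# already produced local descriptors. Random data stands in for SIFT.
import numpy as np

rng = np.random.default_rng(0)
descriptors = rng.normal(size=(500, 128))   # stand-in for SIFT features

# Step 2: learn a visual vocabulary of M "words" via simple k-means.
M = 8
centers = descriptors[rng.choice(len(descriptors), M, replace=False)]
for _ in range(10):
    # Step 3: quantize each descriptor to its nearest center.
    dists = np.linalg.norm(descriptors[:, None] - centers[None], axis=2)
    assign = dists.argmin(axis=1)
    for k in range(M):
        if (assign == k).any():
            centers[k] = descriptors[assign == k].mean(axis=0)

# Step 4: represent an "image" by the frequencies of its visual words.
hist = np.bincount(assign, minlength=M) / len(assign)
```

The normalized histogram `hist` is the image representation fed to the classifier; it discards all spatial layout, which is the weakness the spatial pyramid addresses.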

Bag of features Summary

What about Spatial Information? Slide Credits: Cordelia Schmid

Beyond Bag of features Slide Credits: Li Fei-Fei

Spatial Pyramid Matching
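The pyramid representation can be sketched as follows: quantized features (a visual-word id plus an (x, y) position) are histogrammed over increasingly fine grids (1x1, 2x2, 4x4 for L = 2) and the per-cell histograms are concatenated. The data here is synthetic; the function names are illustrative, not from the paper's code.

```python
# Minimal sketch of building a spatial pyramid vector from quantized
# features, assuming positions are normalized to [0, 1).
import numpy as np

def spatial_pyramid(words, xy, M, L):
    """words: (N,) visual-word ids; xy: (N, 2) positions in [0, 1)."""
    parts = []
    for l in range(L + 1):
        g = 2 ** l                                # grid is g x g at level l
        cell = (xy * g).astype(int).clip(0, g - 1)
        idx = cell[:, 0] * g + cell[:, 1]         # flattened cell index
        for c in range(g * g):
            parts.append(np.bincount(words[idx == c], minlength=M))
    return np.concatenate(parts)

rng = np.random.default_rng(1)
words = rng.integers(0, 10, size=200)
xy = rng.random((200, 2))
vec = spatial_pyramid(words, xy, M=10, L=2)
# dimension is M * (1 + 4 + 16) = 10 * 21 = 210
```

Each level recounts all features, so coarse-level information is repeated; the pyramid match kernel's level weights compensate for that.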

Image Representation Slide Credits: Li Fei-Fei

Kernel Function
Histogram Intersection Function: I(H_X, H_Y) = Σ_i min(H_X(i), H_Y(i))
Pyramid Match Kernel: κ^L(X, Y) = I^L + Σ_{l=0}^{L-1} (1/2^{L-l}) (I^l - I^{l+1})
Final Kernel is the sum of the separate channels: K(X, Y) = Σ_m κ^L(X_m, Y_m)
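Both kernels named above are a few lines of code. This is a hedged sketch following the Lazebnik et al. formulation, where `levels[l]` holds the concatenated cell histograms at pyramid level l (0 = coarsest):

```python
# Histogram intersection and the pyramid match kernel, as a sketch.
import numpy as np

def intersection(hx, hy):
    # I(H_X, H_Y): sum of element-wise minima of two histograms.
    return float(np.minimum(hx, hy).sum())

def pyramid_match(levels_x, levels_y):
    # levels_*[l] is the histogram at level l; L is the finest level.
    L = len(levels_x) - 1
    I = [intersection(levels_x[l], levels_y[l]) for l in range(L + 1)]
    k = I[L]
    for l in range(L):
        # matches already found at level l+1 are subtracted out, and the
        # remainder is down-weighted by 1 / 2^(L-l).
        k += (I[l] - I[l + 1]) / 2 ** (L - l)
    return k

# Toy example with 4 features, L = 1:
x = [np.array([4.0]), np.array([3.0, 1.0])]
y = [np.array([4.0]), np.array([1.0, 3.0])]
k = pyramid_match(x, y)   # I0 = 4, I1 = 2, so k = 2 + (4 - 2)/2 = 3.0
```

The final kernel simply sums this value over the M visual-word channels, which keeps it a valid Mercer kernel for the SVM.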

Spatial Pyramid Vector Dimensions
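The vector dimension follows directly from the construction: M words times the total number of cells across levels 0..L, i.e. M(1 + 4 + ... + 4^L) = M(4^(L+1) - 1)/3. A one-line sketch:

```python
# Dimensionality of the spatial pyramid vector for M words, L levels.
def pyramid_dim(M, L):
    # number of cells across levels 0..L is a geometric series in 4
    return M * (4 ** (L + 1) - 1) // 3

dims = [pyramid_dim(200, 2), pyramid_dim(400, 3)]
# -> [4200, 34000] for the strong-feature settings used in the experiments
```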

Weakness of the model

Experiments Conducted
3 datasets used: 15 Scene, Caltech-101, and Graz.
Strong Features: SIFT descriptors of 16x16 pixel patches computed over a grid with spacing of 8 pixels.
Weak Features: Oriented edge points, i.e., points whose gradient magnitude in a given direction exceeds a minimum threshold.
Dictionary size and levels are tested for different values: M = 200, 400 and L = 0, 1, 2, 3 (not in all cases).

15 Scene
One of the most complete scene-category datasets at the time; each category has 200 to 400 images.
Conclusions made:
Using all levels together confers a statistically significant benefit.
For strong features, single-level performance drops from L=2 to L=3, while weak features improve.
Performance at L=2 and L=3 is almost equivalent; moving from M=200 to M=400 gives a very small performance increase.
Performs better with 13 classes (74.7%) than with 15 classes (72.2%) at L=0.

Caltech-101 Has geometric stability and lack of clutter. Contains 31 to 800 images per category. Slide Credits: Cordelia Schmid

Caltech-101 Conclusions: Prone to intra-class variations. Results shown for M=200; M=400 shows no significant improvement. Best performance is 64.6% with L=2, M=200 and strong features. The best classification rate for 15 Scene was 72.2%, versus 64.6% for Caltech-101.

Graz Dataset Has 2 object categories, Bikes and People, with heavy clutter and pose changes. M=200; L=0 and L=2 for strong features. Conclusions: The improvement from L=0 to L=2 is relatively small, since it is difficult to find useful global features. Performance at 86.3% is higher than on 15 Scene and Caltech-101.

New Experiments Conducted
1. Used the Caltech-256 dataset (256 categories) to check if performance decreases as the number of classes increases.
2. Vary the size of the dictionary, M, to see the effects on accuracy. Values used: M = 10, 50, and 200 (200 is said to be optimal).
Control parameters present (defaults shown): Image size = 1000, Grid spacing = 8, Patch size = 16, Dictionary size = 200, Number of texton images = 50, Pyramid levels = 3.

Why Caltech-256?
Caltech-101 weaknesses:
The dataset is too clean: images are very uniform in presentation, aligned from left to right, and usually not occluded.
Limited number of categories.
Some categories contain few images: certain categories are not represented as well as others, containing as few as 31 images, for example binocular (33) and wild cat (34).
Caltech-256 is another image dataset created at the California Institute of Technology in 2007, a successor to Caltech-101. It is intended to address some of the weaknesses inherent to Caltech-101.

Slide Credits: Vision.Caltech.edu

Results
Experiment 1: Dataset Caltech-256, multiple category counts considered. Training images = 30 per category, test images = 50 per category, L = 3 (levels 0, 1, 2), M = 200.
Experiment 2: Same as above, but with all 256 categories and 1. M = 10, 2. M = 50, 3. M = 200.

Accuracy (%) vs. number of categories (Caltech-256):
10 categories: 13.2
50 categories: 3.64
100 categories: 1.64
160 categories: 1.125
256 categories: 1.094

Accuracy (%) vs. dictionary size M (256 categories):
M = 10: 0.3125
M = 50: 0.7109
M = 200: 1.094

Problems
As we can see, the accuracy is very low, which leads us to believe there is some error in the implementation, so we try to figure out the reason by performing three debugging steps. All debugging is done on the Caltech-256 dataset, for 100 categories, M = 200, L = 3, 30 training images per category, and 50 testing images per category.
Accuracy on test set = 1.64% (82/5000); accuracy on train set = 87.1667% (2615/3000).
1. Compute the big kernel.
2. Use the built-in linear kernel and RBF kernel.
3. Calculate kernel mean values.

Debugging Results
1. Calculating the big kernel: accuracy = 1.64%, no change.
2. Using a linear or RBF kernel on the test data, with a sanity check on the training data:
Linear kernel: train set 8.4%, test set 0.92%
RBF kernel: train set 8.267%, test set 1%
3. Calculating the *mean* K(sample, other samples from the same class) values and the *mean* K(sample, samples from different classes) values, for both the train and test kernels:
Mean K same class: train set 0.5267, test set 0.5335
Mean K diff. class: train set 0.5270, test set 1.1923
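The third debugging step above can be sketched as follows: for each sample, average the kernel values against same-class samples (excluding itself) and against other-class samples. A well-separated kernel should give same-class means well above different-class means; the test-set numbers reported above show the opposite. The function name and toy data here are illustrative.

```python
# Per-class kernel mean diagnostic, assuming a precomputed kernel
# matrix K (n x n) and a label vector of length n.
import numpy as np

def kernel_class_means(K, labels):
    same, diff = [], []
    for i in range(len(labels)):
        mask = labels == labels[i]
        mask[i] = False                     # exclude self-similarity
        if mask.any():
            same.append(K[i, mask].mean())
        diff.append(K[i, labels != labels[i]].mean())
    return float(np.mean(same)), float(np.mean(diff))

# Toy check: a kernel that is 1 within a class and 0 across classes
labels = np.array([0, 0, 1, 1])
K = (labels[:, None] == labels[None, :]).astype(float)
same_mean, diff_mean = kernel_class_means(K, labels)   # -> (1.0, 0.0)
```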

Debugging II
We check the predicted labels on the test set to see which categories were assigned to the majority of the images. We see that categories 6 (Basketball Hoop) and 59 (Drinking Straw) have more than 1000 images assigned to them.
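The check above, finding which categories absorb most predictions, is a simple count over the predicted-label vector. The prediction data here is a hypothetical stand-in with the same flavor as the observed collapse:

```python
# Counting how often each category appears among predicted labels.
from collections import Counter

# stand-in predictions illustrating two dominant categories
predicted = [6] * 700 + [59] * 400 + [3] * 10
top = Counter(predicted).most_common(2)   # -> [(6, 700), (59, 400)]
```

A collapse like this, where nearly all test images land in a couple of classes while train accuracy stays high, typically points at a kernel or indexing bug between the train and test Gram matrices rather than at the representation itself.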

Evaluation on Other Datasets Slide Credits: Cordelia Schmid

Summary Discussion
Spatial pyramid representation: appearance of local image patches + coarse global position information.
Substantial improvement over bag of features.
Depends on the similarity of image layout.

Future Work Done: Packing More Information into the Pyramid
1. Bosch et al. (2007) used the PHOW and PHOG descriptors.
2. van Gemert et al. (2008): the Kernel Codebook uses a Gaussian kernel over every centroid w; instead of a bin getting 1 only if descriptor r_i is assigned (nearest) to its centroid w, every descriptor contributes some information to every bin (depending on σ).
3. Shengye Yan et al. (2012): Beyond Spatial Pyramid uses a two-level feature extraction method, applying encoding and pooling procedures to window-based features to acquire new image features.

References
1. Svetlana Lazebnik, Cordelia Schmid, and Jean Ponce. Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories.
2. Part 1: Bag-of-words models, slides by Li Fei-Fei (Princeton).
3. Costantino Grana and Giuseppe Serra. Recent Advancements on the Bag of Visual Words Model for Image Classification and Concept Detection.
4. Cordelia Schmid, INRIA. Bag-of-features for Category Classification.
5. Chih-Chung Chang and Chih-Jen Lin. LIBSVM: A Library for Support Vector Machines. ACM Transactions on Intelligent Systems and Technology, 2:27:1--27:27, 2011.
6. Caltech-256 Dataset, www.vision.caltech.edu.

Thank You