CLASSIFICATION Experiments


1 CLASSIFICATION Experiments January 27, 2015 CS3710: Visual Recognition Bhavin Modi

2 Bag of features Object Bag of words

3 Bag of features: outline
1. Extract features
2. Learn visual vocabulary
3. Quantize features using visual vocabulary
4. Represent images by frequencies of visual words
Slide Credits: Li Fei-Fei
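The four outline steps can be sketched in a few lines of NumPy. This is a minimal illustration, not the implementation used in the experiments: random vectors stand in for real descriptors in step 1, the vocabulary is learned with a plain k-means loop, and the function names (`learn_vocabulary`, `quantize`, `bag_of_features`) are hypothetical.

```python
import numpy as np

def quantize(descriptors, centers):
    """Step 3: index of the nearest visual word for every descriptor."""
    d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def learn_vocabulary(descriptors, m, n_iter=10, seed=0):
    """Step 2: plain k-means (Lloyd's algorithm) returning m visual words."""
    rng = np.random.default_rng(seed)
    start = rng.choice(len(descriptors), size=m, replace=False)
    centers = descriptors[start].astype(float)
    for _ in range(n_iter):
        words = quantize(descriptors, centers)
        for k in range(m):
            members = descriptors[words == k]
            if len(members):  # keep the old centre if a cluster is empty
                centers[k] = members.mean(axis=0)
    return centers

def bag_of_features(descriptors, centers):
    """Step 4: normalized histogram of visual-word frequencies."""
    words = quantize(descriptors, centers)
    hist = np.bincount(words, minlength=len(centers)).astype(float)
    return hist / hist.sum()

# Step 1 stand-in (assumption): random vectors instead of real descriptors.
rng = np.random.default_rng(0)
train_descriptors = rng.normal(size=(500, 8))
vocabulary = learn_vocabulary(train_descriptors, m=10)
image_descriptors = rng.normal(size=(60, 8))
histogram = bag_of_features(image_descriptors, vocabulary)
```

The resulting `histogram` has one entry per visual word and discards all information about where in the image each feature was found.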

4 Bag of features Summary

5 What about Spatial Information Slide Credits:Cordelia Schmid

6 Beyond Bag of features Slide Credits: Li Fei-Fei

7 Spatial Pyramid Matching

8 Image Representation Slide Credits: Li Fei-Fei

9 Kernel Function Histogram Intersection Function: Pyramid Match Kernel: Final Kernel is the sum of the separate channels:
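For reference, the three quantities named on this slide are usually written as follows in Lazebnik et al. (reference 1). The notation is that paper's and is an assumption about what the slide showed: H_X^l and H_Y^l are the level-l histograms of images X and Y, D is the number of grid cells at that level, and X_m, Y_m are the features of channel (visual word) m.

```latex
% Histogram intersection at level l (D = 4^l cells for a 2-D image grid)
\[
\mathcal{I}^{l} = \mathcal{I}\left(H_X^{l}, H_Y^{l}\right)
  = \sum_{i=1}^{D} \min\left(H_X^{l}(i),\, H_Y^{l}(i)\right)
\]

% Pyramid match kernel over levels 0..L
\[
\kappa^{L}(X, Y) = \frac{1}{2^{L}}\,\mathcal{I}^{0}
  + \sum_{l=1}^{L} \frac{1}{2^{L-l+1}}\,\mathcal{I}^{l}
\]

% Final kernel: sum of the separate channels
\[
K^{L}(X, Y) = \sum_{m=1}^{M} \kappa^{L}(X_m, Y_m)
\]
```

Finer levels receive larger weights, so matches found in small cells count more than matches found only at the whole-image level.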

10 Spatial Pyramid Vector Dimensions
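With M visual words and levels 0 to L, the concatenated vector has M * (1 + 4 + ... + 4^L) = M * (4^(L+1) - 1) / 3 entries, for example 200 * 21 = 4200 for M=200, L=2. The sketch below builds such a vector, assuming feature coordinates already scaled to [0, 1) and the level weights from Lazebnik et al.; the function names are hypothetical. Because the weights are positive, a single histogram intersection of two weighted vectors equals the pyramid match kernel summed over channels.

```python
import numpy as np

def pyramid_dimension(M, L):
    """Length of the concatenated vector: M * sum_{l=0}^{L} 4^l."""
    return M * (4 ** (L + 1) - 1) // 3

def level_weight(l, L):
    """Weight of level l in the pyramid match kernel."""
    return 1.0 / 2 ** L if l == 0 else 1.0 / 2 ** (L - l + 1)

def spatial_pyramid(xy, words, M, L):
    """Weighted concatenation of per-cell visual-word histograms.

    xy    : (N, 2) feature coordinates scaled to [0, 1)  (assumption)
    words : (N,) visual-word index of each feature, in 0..M-1
    """
    parts = []
    for l in range(L + 1):
        cells = 2 ** l  # cells per image side at this level
        col = np.minimum((xy[:, 0] * cells).astype(int), cells - 1)
        row = np.minimum((xy[:, 1] * cells).astype(int), cells - 1)
        flat = (row * cells + col) * M + words  # one bin per (cell, word)
        hist = np.bincount(flat, minlength=cells * cells * M).astype(float)
        parts.append(level_weight(l, L) * hist)
    return np.concatenate(parts)

def histogram_intersection(a, b):
    """Sum of element-wise minima of two histograms."""
    return np.minimum(a, b).sum()

rng = np.random.default_rng(0)
M, L = 200, 2
x = spatial_pyramid(rng.random((300, 2)), rng.integers(0, M, 300), M, L)
y = spatial_pyramid(rng.random((300, 2)), rng.integers(0, M, 300), M, L)
dim = pyramid_dimension(M, L)
k_xy = histogram_intersection(x, y)
```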

11 Weakness of the model

12 Experiments Conducted
Three datasets used: 15 Scene, Caltech-101, and Graz.
Strong features: SIFT descriptors of 16x16 pixel patches computed over a grid with a spacing of 8 pixels.
Weak features: oriented edge points, i.e., points whose gradient magnitude in a given direction exceeds a minimum threshold.
Dictionary size and pyramid levels are tested for different values: M=200, 400 and L=0, 1, 2, 3 (not in all cases).
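The two sampling schemes can be illustrated as follows. This is a rough sketch under stated assumptions: `dense_grid_patches` only cuts out the 16x16 patches on an 8-pixel grid (computing SIFT on each patch is left out), and `oriented_edge_points` approximates "gradient magnitude in a given direction" by the absolute directional derivative. Both function names and the threshold value are hypothetical.

```python
import numpy as np

def dense_grid_patches(image, patch_size=16, spacing=8):
    """Patches sampled on a regular grid, plus their top-left corners."""
    h, w = image.shape
    corners = [(y, x)
               for y in range(0, h - patch_size + 1, spacing)
               for x in range(0, w - patch_size + 1, spacing)]
    patches = np.stack([image[y:y + patch_size, x:x + patch_size]
                        for y, x in corners])
    return patches, np.array(corners)

def oriented_edge_points(image, direction, threshold):
    """Mask of points whose gradient along a unit (dy, dx) direction
    exceeds the threshold in absolute value (an assumed definition)."""
    gy, gx = np.gradient(image.astype(float))
    response = gy * direction[0] + gx * direction[1]
    return np.abs(response) > threshold

# Toy image: intensity increases from left to right.
ramp = np.tile(np.arange(48.0), (64, 1))
patches, corners = dense_grid_patches(ramp)
horizontal_edges = oriented_edge_points(ramp, (0.0, 1.0), threshold=0.5)
vertical_edges = oriented_edge_points(ramp, (1.0, 0.0), threshold=0.5)
```

On the 64x48 toy image the grid yields 7 x 5 patch positions, and only the horizontal direction responds to the left-to-right ramp.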

13 15 Scene
One of the most complete scene category datasets at the time. Each category has 200 to 400 images.
Conclusions made:
Using all levels together confers a statistically significant benefit.
For strong features, single-level performance drops as we go from L=2 to L=3, while weak features improve.
Performance at L=2 and L=3 is almost equivalent, and moving from M=200 to M=400 gives only a very small performance increase.
Performs better with 13 classes (74.7%) than with 15 classes (72.2%) at L=0.

14 Caltech-101 Images show geometric stability and a lack of clutter. Contains 31 to 800 images per category. Slide Credits: Cordelia Schmid

15 Caltech-101 Conclusions: Prone to intra-class variations. Results are shown for M=200; M=400 shows no significant improvement. Best performance is 64.6% with L=2, M=200 and strong features. Best classification rate for 15 Scene was 72.2%, and it is 64.6% for Caltech-101.

16 Graz Dataset Has 2 object categories, Bikes and People, with heavy clutter and pose changes. Tested with M=200, L=0 and L=2 for strong features. Conclusions: Improvement from L=0 to L=2 is relatively small since it is difficult to find useful global features. Performance at 86.3% is higher than for 15 Scene and Caltech-101.

17 New Experiments Conducted
1. Used the Caltech-256 dataset (256 categories) to check whether performance decreases as the number of classes increases.
2. Varied the dictionary size, M, to see the effect on accuracy. Values used: M=10, 50, and 200 (200 is said to be optimal).
Control parameters present (defaults shown):
Image size=1000
Grid spacing=8
Patch size=16
Dictionary size=200
Number of texton images=50
Pyramid levels=3

18 Why Caltech-256?
Caltech-101 weaknesses:
The dataset is too clean: images are very uniform in presentation, aligned from left to right, and usually not occluded.
Limited number of categories.
Some categories contain few images: certain categories are not represented as well as others, containing as few as 31 images. For example, binocular (33) and wild cat (34).
Caltech-256 is another image dataset created at the California Institute of Technology in 2007, a successor to Caltech-101. It is intended to address some of the weaknesses inherent to Caltech-101.

20 Slide Credits: Vision.Caltech.edu

21 Results
Experiment 1: Dataset Caltech-256, multiple categories considered. Training images=30 per category. Test images=50 per category. L=3 (levels 0, 1, 2), M=200.
Experiment 2: Same as above but categories considered=
1. M=10
2. M=50
3. M=200

22 [Figure: Accuracy % vs. Number of Categories, for 10, 50, 100, 160, and 256 categories. Accuracy for 100 categories: 1.64.]

23 [Figure: Accuracy % vs. M (Dictionary Size), for M=10, 50, and 200.]

24 Problems
As we can see, the accuracy is very low, which suggests an error in the implementation, so we try to find the cause with three debugging steps. All debugging is done on the Caltech-256 dataset with 100 categories, M=200, L=3, 30 training images per category, and 50 testing images per category.
Accuracy on test set=1.64% (82/5000)
Accuracy on train set=87.17% (2615/3000)
1. Compute the big kernel
2. Use the inbuilt linear kernel and RBF kernel
3. Calculate kernel mean values

25 Debugging Results
1. Calculating the big kernel: accuracy=1.64%, no change.
2. Using a linear or RBF kernel on the test data and doing a sanity check on the training data:
                 Train Set   Test Set
Linear Kernel    8.4%        0.92%
RBF Kernel       8.267%      1%
3. Calculating the ratio of the *mean* K(sample, other samples from same class) values to the *mean* K(sample, samples from different classes) values, for both the train and test kernels:
                     Train Set   Test Set
Mean K Same Class
Mean K Diff. Class
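The check in step 3 can be sketched as below, assuming a precomputed kernel matrix whose rows and columns are labelled samples. The function name and the toy data are hypothetical; a ratio close to 1 would mean the kernel does not separate the classes at all.

```python
import numpy as np

def same_vs_different_class_means(K, row_labels, col_labels,
                                  exclude_diagonal=False):
    """Mean kernel value over same-class and different-class pairs.

    K[i, j] is the kernel between row sample i and column sample j.
    For a train-vs-train kernel, set exclude_diagonal=True so that
    K(x, x) does not inflate the same-class mean.
    """
    same = row_labels[:, None] == col_labels[None, :]
    valid = np.ones_like(same, dtype=bool)
    if exclude_diagonal:
        np.fill_diagonal(valid, False)
    mean_same = K[same & valid].mean()
    mean_diff = K[~same & valid].mean()
    return mean_same, mean_diff, mean_same / mean_diff

# Toy data (assumption): four 2-bin histograms from two classes.
features = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
labels = np.array([0, 0, 1, 1])
# Histogram intersection kernel between every pair of samples.
K_train = np.minimum(features[:, None, :], features[None, :, :]).sum(axis=2)
mean_same, mean_diff, ratio = same_vs_different_class_means(
    K_train, labels, labels, exclude_diagonal=True)
```

For a train-vs-test kernel, the same function can be called with the two label arrays and `exclude_diagonal=False`.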

26 Debugging II We check the predicted labels on the test set to see which categories were assigned to the majority of the images. We see that more than 1000 images are assigned to two categories, 6-Basketball Hoops and 59-Drinking Straw.
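A label count of this kind takes only a few lines. The sketch assumes predicted labels are zero-based integers in a NumPy array; `dominant_classes` is a hypothetical helper and the toy predictions are made up for illustration.

```python
import numpy as np

def dominant_classes(predicted, n_classes, top=2):
    """Return the `top` most frequently predicted classes with their counts."""
    counts = np.bincount(predicted, minlength=n_classes)
    order = np.argsort(counts)[::-1][:top]
    return [(int(c), int(counts[c])) for c in order]

# Toy predictions (assumption): class 5 absorbs most of the test images.
predicted = np.array([5] * 6 + [2] * 3 + [0])
top_two = dominant_classes(predicted, n_classes=8)
```

A classifier that collapses onto a few labels in this way often points to a scaling or indexing problem in the test kernel rather than to weak features.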

27 Evaluation on Other Datasets Slide Credits:Cordelia Schmid

28 Summary Discussion
Spatial pyramid representation: appearance of local image patches + coarse global position information.
Substantial improvement over bag of features.
Depends on the similarity of image layout.

29 Future Work Done
Packing more information in the pyramid:
1. Bosch et al. (2007) used the descriptors PHOW and PHOG.
2. van Gemert et al. (2008), Kernel Codebook: with hard assignment, a bin gets 1 only if descriptor r_i is assigned to (is nearest to) its centroid w. The Kernel Codebook instead uses a Gaussian kernel over every centroid w, so every descriptor contributes some information to every bin (depending on σ).
3. Shengye Yan et al. (2012), Beyond Spatial Pyramid: uses a two-level feature extraction method that applies encoding and pooling procedures to window-based features to acquire new image features.
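The difference between hard assignment and the kernel codebook in item 2 can be sketched as follows. This is an illustration under assumptions, not the authors' code: the Gaussian is left unnormalized, distances are Euclidean, and the function names and toy data are hypothetical.

```python
import numpy as np

def squared_distances(descriptors, centroids):
    """Squared Euclidean distance from every descriptor to every centroid."""
    diff = descriptors[:, None, :] - centroids[None, :, :]
    return (diff ** 2).sum(axis=2)

def hard_histogram(descriptors, centroids):
    """Each descriptor adds 1 to the bin of its nearest centroid only."""
    nearest = squared_distances(descriptors, centroids).argmin(axis=1)
    counts = np.bincount(nearest, minlength=len(centroids))
    return counts / len(descriptors)

def kernel_codebook_histogram(descriptors, centroids, sigma):
    """Each descriptor adds a Gaussian-weighted amount to every bin."""
    d2 = squared_distances(descriptors, centroids)
    weights = np.exp(-d2 / (2.0 * sigma ** 2))  # unnormalized Gaussian
    return weights.mean(axis=0)

centroids = np.array([[0.0, 0.0], [1.0, 0.0]])
descriptors = np.array([[0.1, 0.0], [0.2, 0.0], [0.9, 0.0]])
hard = hard_histogram(descriptors, centroids)
soft = kernel_codebook_histogram(descriptors, centroids, sigma=0.5)
```

A small σ makes the soft histogram approach the hard one, while a large σ spreads every descriptor almost evenly over all bins.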

30 References
1. Svetlana Lazebnik, Cordelia Schmid, and Jean Ponce. Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories.
2. Li Fei-Fei (Princeton). Part 1: Bag-of-words models (ppt).
3. Costantino Grana and Giuseppe Serra. Recent Advancements on the Bag of Visual Words Model for Image Classification and Concept Detection.
4. Cordelia Schmid (INRIA). Bag-of-features for category classification.
5. Chih-Chung Chang and Chih-Jen Lin. LIBSVM: a library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2:27:1--27:27, 2011.
6. Caltech-256 Dataset-

31 Thank You
