Part based models for recognition. Kristen Grauman

Size: px

Start display at page:

Download "Part based models for recognition. Kristen Grauman"

Abner Pitts
5 years ago
Views:

1 Part based models for recognition Kristen Grauman UT Austin

2 Limitations of window-based models Not all objects are box-shaped Assuming specific 2d view of object Local components themselves do not necessarily conform to rigid layout Occlusion sensitivity Sliding window expensive Kristen Grauman

3 Part-based and local feature models for recognition Main idea: Rather than a representation based on holistic appearance, decompose the image into: local parts or patches, and their relative spatial relationships Kristen Grauman

4 Origins of the part-based model Fischler & Elschlager 1973 Model has two components parts (2D image fragments) structure structure (configuration of parts)

5 Part-based and local feature models for recognition We ll look at three forms: 1. Bag of words (no geometry) 2. Implicit shape model (star graph for spatial model) 3. Constellation model (fully connected graph for spatial model) Kristen Grauman

6 Spatial models: Connectivity and structure O(N P ) O(NP) Fergus et al. 03 Fei-Fei et al. 03 Leibe et al. 04, 08 Crandall et al. 05 Fergus et al. 05 Crandall et al. 05 Felzenszwalb & Huttenlocher 05 Csurka 04 Vasconcelos 00 Bouchard & Triggs 05 Carneiro & Lowe 06 from [Carneiro & Lowe, ECCV 06]

Total freedom on spatial positions, relative geometry.

7 Bag-of-words model Summarize entire image based on its distribution (histogram) of word occurrences. Total freedom on spatial positions, relative geometry. Vector representation easily usable by most classifiers. Kristen Grauman

8 Bag-of-words model Csurka et al. Visual Categorization with Bags of Keypoints, 2004

9 Words as parts Csurka et al All local features Local features from two selected clusters occurring in this image

10 Naïve Bayes model for classification c arg max c p ( c w ) p ( c ) p ( w c ) N p ( c ) p ( w n c ) n 1 Object class decision Prior prob. of the object classes Image likelihood given the class patches th What assumptions does the model make, and what are their significance?

11 Confusion matrix Example bag of words + Naïve Bayes classification results for generic categorization of objects Kristen Grauman Csurka et al. 2004

12 Clutter or context? Kristen Grauman

13 Sampling strategies Kristen Grauman Specific object Category

14 Sampling strategies Sparse, at interest points Dense, uniformly Randomly To find specific, textured objects, sparse sampling from interest points more reliable. Multiple complementary interest operators offer more image coverage. Multiple l interest t operators Image credits: F-F. Li, E. Nowak, J. Sivic For object categorization, dense sampling offers better coverage. [See Nowak, Jurie & Triggs, ECCV 2006]

15 Local feature correspondence for generic object categories Kristen Grauman

16 Local feature correspondence for generic object categories Comparing bags of words histograms coarsely reflects agreement between local parts (patches, words). But choice of quantization directly determines what we consider to be similar Kristen Grauman

it practical to compare large sets of features based on their partial correspondences.

17 Partially matching sets of features Optimal match: O(m 3 ) Greedy match: O(m 2 log m) Pyramid match: O(m) (m=num pts) We introduce an approximate matching kernel that makes it practical to compare large sets of features based on their partial correspondences. Kristen Grauman [Previous work: Indyk & Thaper, Bartal, Charikar, Agarwal & Varadarajan, ]

18 Pyramid match: main idea Feature space partitions serve to match the local descriptors within successively wider regions. descriptor space Kristen Grauman

19 Pyramid match: main idea Kristen Grauman Histogram intersection counts number of possible matches at a given partitioning.

20 Pyramid match kernel measures difficulty of a match at level l number of newly matched pairs at level For similarity, weights inversely proportional p to bin size (or may be learned) Normalize these kernel values to avoid favoring large sets [Grauman & Darrell, ICCV 2005]

21 Pyramid match kernel Optimal match: O(m 3 ) Pyramid match: O(mL) optimal partial matching Kristen Grauman

22 Highlights of the pyramid match Linear time complexity Formal bounds on expected error Mercer kernel Data-driven partitions allow accurate matches even in high-dim. h feature spaces Strong performance on benchmark object recognition datasets Kristen Grauman

23 Example recognition results: Caltech-101 dataset 101 categories images per class Data provided by Fei-Fei, Fergus, and Perona

24 Recognition results: Caltech-101 dataset Jain, Huynh, & Grauman (2007) Kristen Grauman

25 Example recognition results: Caltech-101 dataset Combination of pyramid match and correspondence kernels uracy Accu [Kapoor et al. IJCV 2009] Number of training examples Kristen Grauman

26 Unordered sets of local features: No spatial layout preserved! Too much? Too little?

27 Spatial pyramid match Make a pyramid of bag-of-words histograms. Provides some loose (global) spatial layout information Sum over PMKs computed in image coordinate space, one per word. [Lazebnik, Schmid & Ponce, CVPR 2006] Kristen Grauman

28 Spatial pyramid match Captures scene categories well---texture-like patterns but with some variability in the positions of all the local pieces. Confusion matrix Kristen Grauman

29 Spatial pyramid match Captures scene categories well---texture-like patterns but with some variability in the positions of all the local pieces. Kristen Grauman

30 Part-based and local feature models for recognition We ll look at three forms: 1. Bag of words (no geometry) 2. Implicit shape model (star graph for spatial model) 3. Constellation model (fully connected graph for spatial model) Kristen Grauman

31 Shape representation in part-based models Star shape model x 1 x 6 x 2 x 5 x 3 x 4 e.g. implicit shape model Parts mutually independent N image features, P parts in the model Kristen Grauman

32 Implicit shape models Visual vocabulary is used to index votes for object position [a visual word = part ] training image annotated with object localization info visual codeword with displacement vectors B. Leibe, A. Leonardis, and B. Schiele, Combined Object Categorization and Segmentation with an Implicit Shape Model, ECCV Workshop on Statistical Learning in Computer Vision 2004

33 Implicit shape models Visual vocabulary is used to index votes for object position [a visual word = part ] test image B. Leibe, A. Leonardis, and B. Schiele, Combined Object Categorization and Segmentation with an Implicit Shape Model, ECCV Workshop on Statistical Learning in Computer Vision 2004

34 Implicit shape models: Training 1. Build vocabulary of patches around extracted interest points using clustering How will this differ from a vocabulary built How will this differ from a vocabulary built from, e.g., all frames in a movie?

35 Implicit shape models: Training 1. Build vocabulary of patches around extracted interest points using clustering 2. Map the patch around each interest point to closest word

36 Implicit shape models: Training 1. Build vocabulary of patches around extracted interest points using clustering 2. Map the patch around each interest point to closest word 3. For each word, store all positions it was found, relative to object center

37 Implicit shape models: Testing 1. Given new test image, extract patches, match to vocabulary words 2. Cast votes for possible positions of object center 3. Search for maxima in voting space 4. (Extract weighted segmentation mask based on stored masks for the codebook occurrences) What is the dimension of the Hough space?

38 Implicit shape models: Testing

39 Example: Results on Cows Original image K. Grauman, B. Leibe Visual PercepObject ptual and ReScognition ensory Augmented Tutorial Computing

40 Example: Results on Cows Original image Interest points K. Grauman, B. Leibe Visual PercepObject ptual and ReScognition ensory Augmented Tutorial Computing

41 Example: Results on Cows Matched Interest Original points patches image K. Grauman, B. Leibe Visual PercepObject ptual and ReScognition ensory Augmented Tutorial Computing

42 Example: Results on Cows Matched Interest Original Votes patches points image K. Grauman, B. Leibe 45 Visual PercepObject ptual and ReScognition ensory Augmented Tutorial Computing

43 Example: Results on Cows 1 st hypothesis K. Grauman, B. Leibe 46 Visual PercepObject ptual and ReScognition ensory Augmented Tutorial Computing

44 Example: Results on Cows 2 nd hypothesis K. Grauman, B. Leibe 47 Visual PercepObject ptual and ReScognition ensory Augmented Tutorial Computing

45 Example: Results on Cows 3 rd hypothesis K. Grauman, B. Leibe 48 Visual PercepObject ptual and ReScognition ensory Augmented Tutorial Computing

46 Detection Results Qualitative Performance Recognizes different kinds of objects Robust to clutter, occlusion, noise, low contrast K. Grauman, B. Leibe 49 Visual PercepObject ptual and ReScognition ensory Augmented Tutorial C omputing

47 Shape representation in part-based models Star shape model x 1 x 6 x 2 x 5 x 3 x 4 e.g. implicit shape model Parts mutually independent N image features, P parts in the model

48 Deformable part model Felzenszwalb et al A hybrid window + part-based model vs Viola & Jones Dalal & Triggs Felzenszwalb et al.

global template + deformable parts Fully trained from bounding

49 Deformable part model Felzenszwalb et al Mixture of deformable part models Each component has global template + deformable parts Fully trained from bounding boxes alone Adapted from Felzenszwalb s slides at

Deformable part model Training: Training data consists of

placements Alternate between optimizing latent t values

50 Deformable part model Training: Training data consists of images with bounding boxes. Need to learn the model structure, filters, deformation costs Latent SVM: score by root + max over part placements Alternate between optimizing latent t values and model parameters. Adapted from Felzenszwalb s slides at

51 Example: Person model

Deformable part model Matching: Define an

approach Adapted from Felzenszwalb s slides

52 Deformable part model Matching: Define an overall score for each root location - Based on best placement of parts High scoring root locations define detections - sliding window approach Adapted from Felzenszwalb s slides at

53 Results: car detections

54 Results: person detections

55 Results: horse detections

56 Results: cat detections

57 Shape representation in part-based models Star shape model Fully connected constellation model x 6 x 1 x 1 x 2 x 6 x 2 x 5 x 3 x 5 x 3 x 4 x 4 e.g. implicit shape model Parts mutually independent e.g. Constellation ti Model Parts fully connected N image features, P parts in the model Slide credit: Rob Fergus

58 Probabilistic constellation model P( image object) P( appearance, shape object) max h P( appearance h, object) p( shape h, object) p( h object) h: assignment Part of features to Part parts descriptors locations Candidate parts Source: Lana Lazebnik

59 Probabilistic constellation model P( image object) P( appearance, shape object) max h P( appearance h, object) p( shape h, object) p( h object) h: assignment of features to parts Part 1 Part 3 Part 2 Source: Lana Lazebnik

60 Probabilistic constellation model P( image object) P( appearance, shape object) max h P( appearance h, object) p( shape h, object) p( h object) h: assignment of features to parts Part 1 Part 3 Part 2 Source: Lana Lazebnik

61 Probabilistic constellation model P( image object) P( appearance, shape object) max h P( appearance h, object) p( shape h, object) p( h object) Distribution over patch descriptors High-dimensional appearance space Source: Lana Lazebnik

62 Probabilistic constellation model P( image object) P( appearance, shape object) max h P( appearance h, object) p( shape h, object) p( h object) Distribution tion over joint part positions 2D image space Source: Lana Lazebnik

63 Example results from constellation model: data from four categories Slide from Li Fei-Fei

64 Face model Appearance: 10 patches closest to mean for each part Recognition results Fergus et al. CVPR 2003

65 Face model Appearance: 10 patches closest to mean for each part Recognition results Test images: size of circles indicates score of hypothesis Fergus et al. CVPR 2003

66 Motorbike model Appearance: 10 patches closest to mean for each part Recognition results Fergus et al. CVPR 2003

67 Spotted cat model Appearance: 10 patches closest to mean for each part Recognition results Fergus et al. CVPR 2003

68 Comparison bag of features bag of features Part-based model Source: Lana Lazebnik

69 Shape representation in part-based models Star shape model Fully connected constellation model x 6 x 1 x 1 x 2 x 6 x 2 x 5 x 3 x 5 x 3 x 4 x 4 e.g. implicit shape model Parts mutually independent Recognition complexity: O(NP) Method: Gen. Hough Transform e.g. Constellation ti Model Parts fully connected Recognition complexity: O(N P ) Method: Exhaustive search N image features, P parts in the model Slide credit: Rob Fergus

70 Part-based models: issues and choices Invariance of the structure representation Part (appearance) representation Learning cost Cost of fitting to new examples Generative vs. discriminative Supervision required for training examples Data-driven vs. knowledge-driven model construction Kristen Grauman

71 Summary: part-based and local feature models for generic object recognition Histograms of visual words to capture global or local layout in the bag-of-words framework Pyramid match kernels Powerful in practice for image recognition Part-based models encode category s part appearance together with 2d layout; allow detection in cluttered image implicit shape model and LSVM model : shape based on layout of all parts relative to a reference part; Generalized Hough for detection constellation model : explicitly model mutual spatial layout between all pairs of parts; exhaustive search for best fit of features to parts

72 Recognition models Instances: recognition by alignment Categories: Holistic appearance models (and sliding window detection) Categories: Local feature and part based models Kristen Grauman

73 OTHER ONGOING CHALLENGES

74 Context Kristen Grauman

75 View invariant invariant models Kristen Grauman

76 Attributes Kristen Grauman

77 Use of video eg e.g., activities + objects Kristen Grauman

78 Annotation Annotator [tiny image montage by Torralba et al.] The complex space of visual objects, activities, and scenes. Kristen Grauman

79 Annotation Labeled data Annotator Accuracy? Active request? Unlabeled Num labels added Current classifiers data active passive Kristen Grauman

80 Very large multi class problems

Previously. Part-based and local feature models for generic object recognition. Bag-of-words model 4/20/2011

Previously. Part-based and local feature models for generic object recognition. Bag-of-words model 4/20/2011 Previously Part-based and local feature models for generic object recognition Wed, April 20 UT-Austin Discriminative classifiers Boosting Nearest neighbors Support vector machines Useful for object recognition