A Family of Contextual Measures of Similarity between Distributions with Application to Image Retrieval

Size: px

Start display at page:

Download "A Family of Contextual Measures of Similarity between Distributions with Application to Image Retrieval"

Bryce Armstrong
5 years ago
Views:

1 A Family of Contextual Measures of Similarity between Distributions with Application to Image Retrieval Florent Perronnin, Yan Liu and Jean-Michel Renders Xerox Research Centre Europe (XRCE) Textual and Visual Pattern Analysis (TVPA) group To be presented at CVPR 2009

Problem Retrieval as ranking Ranking: given a query,

least large subset) Rational: retrieval is highly

the user (which might be unclear) do not take a

retrieve a specific image or a fixed number of

2 Problem Retrieval as ranking Ranking: given a query, return all images in descending rank order (or at least large subset) Rational: retrieval is highly subjective and the system cannot guess the intent of the user (which might be unclear) do not take a decision? Ranking is sufficient for browsing-type applications, i.e. retrieve a specific image or a fixed number of images Limited practical use: in fully automatic applications: e.g. query-expansion when guarantees are required: e.g. ensure average recall of 90% Page 2

Problem Retrieval as matching In several retrieval

near-duplicate detection Impression sunrise, Claude

problems, the user might expect a subset of images

decision (classification) Choosing an appropriate

3 Problem Retrieval as matching In several retrieval applications the user intent is (fairly) unambiguous: near-duplicate detection Impression sunrise, Claude Monet logo detection Coca-Cola brand scene retrieval Holiday dataset, [Jegou et al.] document retrieval NIST form database For such problems, the user might expect a subset of images Matching: given a pair of images, return a binary decision (classification) Choosing an appropriate threshold can guarantee on average a given: precision level: e.g. useful for query expansion recall level: e.g. retrieve 90% of duplicates Although related, ranking and matching are different problems Page 3

4 Problem Matching and context Is matching only about setting a threshold? Given these two forms, should we declare a match? in a general context of docs: yes in the context of forms: yes in the context of US tax forms: no Humans judge in a context Different contexts can correspond to: different scales of a hierarchy: e.g. mammals, felids, cats, breeds different taxonomies: e.g. waterscape paintings vs impressionist paintings Different contexts can impact the cues used to judge similarity Provide different views on the same problem Page 4

5 Problem Proposed solution Is matching only about setting a threshold? if one does not take into account the context: yes but taking into account the context can provide much better accuracy. We propose a novel family of contextual measures between distributions. Many works model images as distributions: discrete distribution: e.g. bag-of-visual-words [Sivic & Zisserman, Csurka et al.] continuous distribution: e.g. GMM [Goldberger et al., Moreno et al., Vasconcelos] but our framework is not restricted to images (e.g. text, speech, etc.) Coming back to our US tax form example: in the general context of document images: 80% match in the context of US tax forms (NIST form): 4% match Page 5

6 Outline Definition and properties Multinomial distributions (special cases) Application to retrieval Experimental validation Conclusion Page 6

7 Definition and properties Definition Notations: p, q: two distributions to be compared u: a context distribution f: a measure of similarity u By definition: ω 1-ω q p Interpretation: Estimate the mixture of p and u that best approximates u according to f projection of q on the line which joins p and u ω reflects how much p contributes to the approximation Each similarity / distance f has its contextual counterpart. Page 7

8 Definition and properties Basic properties By definition in even if f symmetric Symmetric similarity: if f(q,p) is maximum for p=q (the converse is not true) and the converse seems to hold Page 8

9 Definition and properties Convex optimization Bregman divergences: x and y two distributions in convex Includes Euclidean distance, Mahalanobis distance, Kullback-Leibler divergence, Itakura-Saito divergence Csiszár divergences: x and y two discrete distributions convex Includes Manhattan distance, Kullback-Leibler divergence, Hellinger distance, Rényi divergence If f is a Bregman or Csiszár divergence then convex in ω Page 9

10 Outline Definition and properties Multinomial distributions (special cases) Application to retrieval Experimental validation Conclusion Page 10

11 Multinomial distributions Euclidean distance L2 is known to be a poor measure of distance between multinomial distributions (Gaussian noise assumption) but interesting because of closed-form formula. clipped to u ω 1-ω q p asymmetric symmetric Using symmetric similarity is important in small dimensional spaces. Page 11

12 Multinomial distributions Manhattan distance Equivalent to intersection kernel: Weighted median problem: Piece-wise linear convex function minimum reached at one of the values Solved efficiently using Hoare s algorithm in O(D) asymmetric symmetric Page 12

13 Multinomial distributions KL divergence Objective function similar to that of PLSA: Can be solved iteratively using Expectation-Maximization: E-step: M-step: Slow convergence in practice use gradient-based methods asymmetric symmetric Page 13

14 Multinomial distributions Other distances Hellinger s distance (equivalent to Bhattacharyya similarity): Chi2 divergence: Both lead to convex objective functions. No special-purposed optimization algorithm gradient-based methods Page 14

15 Outline Definition and properties Multinomial distributions (special cases) Application to retrieval Experimental validation Conclusion Page 15

16 Application to retrieval / matching Limitations of a single context A single context might be insufficient for retrieval: cats x u dogs cows use different contexts for different queries No best context for a given query: broad contexts: coarse similarity increase precision at high recalls narrow contexts: fine similarity increase precision at low recalls For each query: use contexts at multiple scales average across contexts Page 16

17 Application to retrieval / matching Multi-scale retrieval algorithm Retrieval algorithm for a given query q: 1) Compute the similarity to all templates and keep closest. Let be the list of indices 2) Estimate the context : 3) For all templates compute: Final similarity: Give more weight to fine similarity than to coarse similarity High computational cost: 1 contextual similarity computation per template per scale Page 17

18 Application to retrieval / matching Speeding-up retrieval We introduce: If convex we have: ω Advantage: much cheaper to compute than Example: Two interesting cases: if if then then Page 18

19 Application to retrieval / matching Speeded-up multi-scale retrieval algorithm Retrieval algorithm for a given query q: 1) Compute the similarity to all templates and keep closest. Let be the list of indices 2) Estimate the context : 3) For all templates compute: Final similarity: Speed-up computation: test Page 19

20 Application to retrieval / matching Relationship with query expansion Query expansion (QE): query system with original image use close images to define new query re-query the system iterate (optional) Two main differences with QE: re-estimate the context model vs query model for QE use mostly irrelevant images vs use (hopefully) only relevant images for QE Page 20

21 Outline Definition and properties Multinomial distributions (special cases) Application to retrieval Experimental validation Conclusion Page 21

22 Experimental validation Holiday dataset The Holiday dataset: 1,491 images, 500 image groups Page 22

23 Experimental validation Holiday dataset Bag-of-visual-words description: SIFT features extracted on dense grids at multiple scales Visual vocabulary (GMM) of approximately 4,000 visual words Each image is encoded as a histogram of soft occurrences, i.e. a multinomial Measure of retrieval accuracy: Average Precision (AP) Ranking: compute one AP per query and average Matching: compute one AP for all queries Page 23

24 Experimental validation Holiday dataset rank match Experiments with contextual KL using a single scale: context-size has large impact on matching, smaller impact on ranking Page 24

25 Experimental validation Holiday dataset Experiments with contextual KL using a single scale: context-size has large impact on matching, smaller impact on ranking different contexts bring complementary information for matching Page 25

26 Experimental validation Holiday dataset rank rank match match Experiments with contextual KL averaging multiple scales: weighted average somewhat better than unweighted average for matching Page 26

27 Experimental validation Holiday dataset match rank Experiments with different measures: large improvement for the matching problem (+20-30%) small improvement for the ranking problem (+ 1-6%) as a bonus poor measures make poor contextual measures (c.f. L2) Page 27

Experimental validation Document dataset 1,400 images = 14 classes of documents x 100 documents per class Run-length description of document images: a run is a set of

28 Experimental validation Document dataset 1,400 images = 14 classes of documents x 100 documents per class Run-length description of document images: a run is a set of consecutive pixels with the same color in a given direction histogram of run-lengths in 4 directions for black and white pixels non-sparse histogram of 1,680 dimensions match rank Page 28

29 Outline Definition and properties Multinomial distributions (special cases) Application to retrieval Experimental validation Conclusion Page 29

30 Conclusion Summary Although ranking and matching are related, they are different problems. Matching makes little sense if we do not specify a context. We introduced a family of contextual measures between distributions. In our framework, each measure has its contextual counterpart. We showed how to compute the contextual similarity in practice for several measures in the case of multinomials. Context has modest impact on ranking but very large impact on matching. Page 30

Preliminary work on hierarchical clustering: a similarity should not be

31 Conclusion Future work Current work restricted to distributions. How to go beyond? How to learn a context model? Preliminary work on hierarchical clustering: a similarity should not be more detailed than needed. adapt similarity (through context) at each level of the hierarchy Page 31

32 Questions? Page 32

Metric Learning for Large Scale Image Classification:

Metric Learning for Large Scale Image Classification: Generalizing to New Classes at Near-Zero Cost Thomas Mensink 1,2 Jakob Verbeek 2 Florent Perronnin 1 Gabriela Csurka 1 1 TVPA - Xerox Research Centre