Attribute learning in large-scale datasets. Olga Russakovsky and Li Fei-Fei

1 Attribute learning in large-scale datasets Olga Russakovsky and Li Fei-Fei

2 Categorization of the visual world Berry Fruit Entity Tree Instrument Furniture

5 Scale of prior work on attributes [Plot; axis: Number of object classes. Points: Lampert, Kumar '09, Farhadi '09, '10, Wang '10, Berg '10, Current work, Future goal. Tick label: 10,000s]

6 Why use attributes? Object description: Orange, furry (Farhadi et al. '09). Targeted retrieval: Find red round objects. Zero-shot learning: Frogs are green, have heads and legs. What is this? Frog. (Lampert et al. '09). Object classification, Outlier discovery: Always yellow; Never blue; Green; Sharp; Better model (Wang and Mori '10). Visual categorization: Round, Metallic (Farhadi et al. '09).

7 Overview: Obtaining large-scale training data (images, attribute labels); Training and evaluating attribute classifiers; Performing higher-level tasks (targeted retrieval, zero-shot learning); Future directions

8 Obtaining images: J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li and L. Fei-Fei. ImageNet: A Large-Scale Hierarchical Image Database. In CVPR Berry Fruit Entity Tree Instrument Furniture

9 Obtaining images: Lots of data: > 15,500 synsets, > 11M images. WordNet backbone. Bounding box annotations: 1,000 synsets. Definitions: "Sweet fleshy red fruit". [Hierarchy figure: Edible fruit; Mango, Kiwi, Fig, Pineapple]

10 Obtaining semantic attribute labels: A semantic attribute is an attribute which can be described using language. Class-level: "all frogs are green". Image-level: "this particular frog is green". Sources of labels: text mining, hand labeling. Cited work: Rohrbach '10; Ferrari '07, Berg '10; Lampert '10; Kumar '09, Wang '09, Farhadi '09, '10.

11 Text mining in ImageNet: Challenges. False negative: Firetruck: "Any of various large trucks that carry firemen and equipment to the site of a fire". Red-winged blackbird: "blackbird with scarlet patches on the wings". Two-spotted ladybug: "Red ladybug with a black spot on each wing". Coq-au-vin: "Chicken and onions and mushrooms braised in red wine and seasonings".
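The text-mining challenges on this slide can be illustrated with a small sketch. It applies a naive whole-word match to the four definitions quoted above; the matching rule and the function name `mine_attribute` are assumptions made for illustration, not the method used in the talk.

```python
import re

# Definitions quoted on the slide.
DEFINITIONS = {
    "firetruck": "Any of various large trucks that carry firemen and "
                 "equipment to the site of a fire",
    "red-winged blackbird": "blackbird with scarlet patches on the wings",
    "two-spotted ladybug": "Red ladybug with a black spot on each wing",
    "coq-au-vin": "Chicken and onions and mushrooms braised in red wine "
                  "and seasonings",
}


def mine_attribute(definition, attribute):
    """Hypothetical rule: the attribute holds if its name appears as a whole word."""
    pattern = rf"\b{re.escape(attribute)}\b"
    return re.search(pattern, definition.lower()) is not None


red_labels = {name: mine_attribute(text, "red") for name, text in DEFINITIONS.items()}

for name, is_red in red_labels.items():
    print(f"{name}: red={is_red}")
```

Under this rule the firetruck definition never mentions "red" (a false negative), while "red wine" marks coq-au-vin as red (a false positive).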

12 Semantic attribute labeling Yellow

15 Semantic attribute labeling: Yellow. ImageNet attribute dataset: 384 synsets x 25 images each = 9600 images, x 20 attributes, x 3 or 4 people label each. Labels: 10% positive, 78% negative.
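A minimal sketch of how several annotators' votes might be combined into one label per (image, attribute) pair. The unanimity rule below is an assumption; the slides only state that 3 or 4 people label each image and that some images end up ambiguously labeled.

```python
N_SYNSETS = 384
IMAGES_PER_SYNSET = 25
N_IMAGES = N_SYNSETS * IMAGES_PER_SYNSET  # 9600 images, as on the slide


def aggregate_votes(votes):
    """Combine yes/no votes from 3 or 4 annotators.

    Assumed rule (not stated in the slides): unanimous "yes" is positive,
    unanimous "no" is negative, anything else is ambiguous.
    """
    if not votes:
        raise ValueError("at least one vote is required")
    if all(votes):
        return "positive"
    if not any(votes):
        return "negative"
    return "ambiguous"


print(N_IMAGES)
print(aggregate_votes([True, True, True]))
print(aggregate_votes([True, False, True, True]))
```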

16 20 semantic visual attributes. Color: black, brown, gray, green, orange, red, white, yellow. Texture: furry, metallic, rough, shiny, smooth, wet, wooden. Pattern: spotted, striped. Shape: long, rectangular, round.

17 Overview: Obtaining large-scale training data (images, attribute labels); Training and evaluating attribute classifiers; Performing higher-level tasks (targeted retrieval, zero-shot learning); Future directions

18 Training striped classifier: RGB codebook, size; SIFT codebook, size; Shape context codebook, size. Histogram intersection kernel SVM. Binary classifier of striped objects.
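The histogram intersection kernel named on this slide can be sketched as follows. Concatenating the three L1-normalized codebook histograms into one descriptor is an assumption (the slide does not say how the codebooks are combined), and the helper names are hypothetical.

```python
def histogram_intersection(h1, h2):
    """Histogram intersection kernel: sum of bin-wise minima."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def l1_normalize(hist):
    """Scale a histogram so its bins sum to 1 (left unchanged if empty)."""
    total = sum(hist)
    return [v / total for v in hist] if total else list(hist)


def image_descriptor(rgb_hist, sift_hist, shape_hist):
    """Assumed combination: concatenate the normalized codebook histograms."""
    return l1_normalize(rgb_hist) + l1_normalize(sift_hist) + l1_normalize(shape_hist)


def gram_matrix(descriptors):
    """Kernel value for every pair of images."""
    return [[histogram_intersection(a, b) for b in descriptors] for a in descriptors]


images = [
    image_descriptor([4, 0, 0], [1, 1], [2, 2]),
    image_descriptor([0, 4, 0], [2, 0], [1, 3]),
]
K = gram_matrix(images)
print(K)
```

A Gram matrix like `K` could then be handed to any SVM implementation that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`.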

19 Training attribute classifiers: Binary classifier of striped objects; Binary classifier of red objects; Binary classifier of wooden objects. Binary classifiers for each of the 20 semantic attributes.

20 Evaluating attribute classifiers: Histogram intersection kernel SVM; fold cross-validation; positive training examples, negative training examples; regularization chosen on holdout set. [Plot: ROC area (axis from 0.5 to 1) per attribute: black, brown, gray, green, orange, red, white, yellow, furry, metallic, rough, shiny, smooth, wet, wooden, spotted, striped, long, rectangular, round]
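The ROC area used as the evaluation metric here can be computed directly from classifier scores. The sketch below uses the standard rank interpretation (the probability that a random positive outscores a random negative, with ties counted as half); the function name `roc_area` is illustrative.

```python
def roc_area(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


print(roc_area([0.9, 0.2, 0.8, 0.1], [1, 1, 0, 0]))
```

A value of 0.5 corresponds to chance, which matches the lower end of the plot axis on the slide.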

21 Evaluating attribute classifiers: ROC area by attribute type: Color: 0.87; Texture: 0.77; Shape; Pattern. [Plot: ROC area (axis from 0.5) per attribute: black, brown, gray, green, orange, red, white, yellow, furry, metallic, rough, shiny, smooth, wet, wooden, spotted, striped, long, rectangular, round] Lack of labeling consensus: 34% of images ambiguously labeled for smooth, 28% of images ambiguously labeled for rough, 21% of images for any other attribute. Large variety. Few training examples. Global SIFT codebook.

22 Overview: Obtaining large-scale training data (images, attribute labels); Training attribute classifiers; Performing higher-level tasks (targeted retrieval, zero-shot learning); Future directions

23 Targeted retrieval: Image dataset. Retrieval system: Binary classifier of striped objects, Binary classifier of red objects, Binary classifier of wooden objects. Output: Best striped examples, Best red examples, Best wooden examples.

24 Targeted retrieval: Same 5-fold framework as before. Retrieval done on the test images and the ambiguous images. Results aggregated over all folds. Image dataset. Retrieval system: Binary classifier of striped objects, Binary classifier of red objects, Binary classifier of wooden objects. Output: Best striped examples, Best red examples, Best wooden examples.
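A minimal sketch of the retrieval step: pool the per-fold classifier scores and keep the highest-scoring images. Pooling raw scores across folds is an assumption about how "aggregated over all folds" was done, and the default of 20 mirrors the "Top 20 retrieved images" shown on the following slides.

```python
def retrieve_top_k(fold_results, k=20):
    """Return the k highest-scoring (image_id, score) pairs across all folds.

    fold_results holds one list of (image_id, score) pairs per fold,
    scored by that fold's attribute classifier.
    """
    pooled = [pair for fold in fold_results for pair in fold]
    return sorted(pooled, key=lambda pair: pair[1], reverse=True)[:k]


# Hypothetical scores from a "red" classifier over two folds.
folds = [
    [("img_a", 0.2), ("img_b", 0.9)],
    [("img_c", 0.5)],
]
print(retrieve_top_k(folds, k=2))
```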

25 Targeted retrieval: Red objects. Top 20 retrieved images. To better interpret results, additionally train an L1-regularized logistic regression. Chosen features (legend: Color, SIFT, Shape context).

26 Targeted retrieval: Gray objects. Top 20 retrieved images. To better interpret results, additionally train an L1-regularized logistic regression. Top 10 chosen features (legend: Color, SIFT, Shape context).

27 Targeted retrieval: Smooth objects. Top 20 retrieved images. To better interpret results, additionally train an L1-regularized logistic regression. Top 10 chosen features (legend: Color, SIFT, Shape context).

28 Targeted retrieval: Furry objects. Top 20 retrieved images. To better interpret results, additionally train an L1-regularized logistic regression. Top 10 chosen features (legend: Color, SIFT, Shape context).

29 Targeted retrieval: Spotted objects. Top 20 retrieved images. To better interpret results, additionally train an L1-regularized logistic regression. Top 10 chosen features (legend: Color, SIFT, Shape context).

30 Targeted retrieval: Striped objects. Top 20 retrieved images. To better interpret results, additionally train an L1-regularized logistic regression. Top 10 chosen features (legend: Color, SIFT, Shape context).

31 Targeted retrieval: Round objects. Top 20 retrieved images. To better interpret results, additionally train an L1-regularized logistic regression. Top 10 chosen features (legend: Color, SIFT, Shape context).
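The interpretation step on these slides relies on the L1 penalty driving most weights to exactly zero, so the surviving features can be read off. The sketch below fits such a model with proximal gradient descent (ISTA); the solver, the hyperparameters, and the toy feature names are assumptions, since the slides do not say how the regression was trained.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(v, t):
    """Proximal operator of the L1 penalty: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def fit_l1_logreg(X, y, lam=0.05, step=0.5, iters=500):
    """L1-regularized logistic regression via proximal gradient descent (ISTA)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(iters):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            grad_b += err / n
            for j in range(d):
                grad_w[j] += err * xi[j] / n
        b -= step * grad_b  # the intercept is not penalized
        w = [soft_threshold(wj - step * gj, step * lam) for wj, gj in zip(w, grad_w)]
    return w, b


def chosen_features(w, names, k=10):
    """Names of the (at most k) nonzero-weight features, largest magnitude first."""
    ranked = sorted(zip(names, w), key=lambda p: abs(p[1]), reverse=True)
    return [name for name, weight in ranked if weight != 0.0][:k]


# Toy data: the first feature determines the label, the second is uninformative.
X = [[1, 1], [1, -1], [-1, 1], [-1, -1]]
y = [1, 1, 0, 0]
w, b = fit_l1_logreg(X, y)
print(chosen_features(w, ["color_bin", "sift_word"]))
```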

32 Zero-shot learning: Inputs: attribute text descriptions of target classes; images of target object classes (images of class 1, images of class 2, ..., images of class N). Binary classifiers of striped, red, and wooden objects: classifiers trained on other object classes. Zero-shot learning system.

33 Zero-shot learning: Attribute text descriptions of target classes: Chestnut: brown, smooth; Green lizard: green, long; Honey badger: black, gray, rough, furry; Zebra: striped, white; Spitz: white, furry. Binary classifiers of striped, red, and wooden objects: classifiers trained on other object classes. Zero-shot learning system. Images of target object classes (images of class 1, images of class 2, ..., images of class N).

34 Zero-shot learning: Attribute text descriptions of target classes: Chestnut: brown, smooth; Green lizard: green, long; Honey badger: black, gray, rough, furry; Zebra: striped, white; Spitz: white, furry. Binary classifiers of striped, red, and wooden objects: histogram intersection kernel SVMs, trained on 379 non-target classes, regularized on holdout set. Zero-shot learning system. Images of target object classes (images of class 1, images of class 2, ..., images of class N).

35 Zero-shot learning: Attribute text descriptions of target classes: Chestnut: brown, smooth; Green lizard: green, long; Honey badger: black, gray, rough, furry; Zebra: striped, white; Spitz: white, furry. Binary classifiers of striped, red, and wooden objects: histogram intersection kernel SVMs, trained on 379 non-target classes, regularized on holdout set. Zero-shot learning system. Images of target object classes: 25 images of each target class (images of class 1, images of class 2, ..., images of class N).

36 Zero-shot learning: Chestnut: brown, smooth; Green lizard: green, long; Honey badger: black, gray, rough, furry; Zebra: black, white, striped, smooth; Spitz: white, furry. X = [image]; TS = Training Set.
$$P(\text{chestnut} \mid X) = \frac{P(\text{brown} \mid X) \times P(\text{smooth} \mid X) \times P(\text{not green} \mid X) \times \dots}{P(\text{brown} \mid TS) \times P(\text{smooth} \mid TS) \times P(\text{not green} \mid TS) \times \dots}$$
$$\text{Class}(X) = \arg\max \{\, P(\text{chestnut} \mid X),\ P(\text{lizard} \mid X),\ P(\text{zebra} \mid X) \,\}$$
Model of Lampert et al. Learning to Detect Unseen Object Classes... In CVPR, 2009.
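The scoring rule on this slide can be sketched in a few lines. The class descriptions are the ones listed above; the attribute posteriors and training-set priors in the example are made-up numbers, and working in log space with clipping is an implementation choice rather than something the slide specifies.

```python
import math

# Attribute descriptions of the target classes, as listed on the slide.
CLASS_ATTRIBUTES = {
    "chestnut": {"brown", "smooth"},
    "green lizard": {"green", "long"},
    "honey badger": {"black", "gray", "rough", "furry"},
    "zebra": {"black", "white", "striped", "smooth"},
    "spitz": {"white", "furry"},
}


def clip(p, eps=1e-6):
    """Keep probabilities away from 0 and 1 so the logarithms stay finite."""
    return min(max(p, eps), 1.0 - eps)


def class_log_score(present, posteriors, priors):
    """log of prod_m P(a_m | X) / P(a_m | TS), using 'not a_m' for absent attributes."""
    score = 0.0
    for attr, p in posteriors.items():
        p, prior = clip(p), clip(priors[attr])
        if attr in present:
            score += math.log(p) - math.log(prior)
        else:
            score += math.log(1.0 - p) - math.log(1.0 - prior)
    return score


def classify(posteriors, priors, class_attributes=CLASS_ATTRIBUTES):
    """Class(X) = argmax over target classes of the score above."""
    return max(
        class_attributes,
        key=lambda c: class_log_score(class_attributes[c], posteriors, priors),
    )


# Hypothetical classifier outputs P(attribute | X) for one test image.
attributes = ["brown", "smooth", "green", "long", "black",
              "gray", "rough", "furry", "white", "striped"]
posteriors = {a: 0.05 for a in attributes}
posteriors.update({"striped": 0.9, "white": 0.8, "black": 0.7, "smooth": 0.6})
priors = {a: 0.1 for a in attributes}  # assumed P(attribute | TS)

print(classify(posteriors, priors))
```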

Zero-shot learning
Chestnut (brown, smooth):                   52  16  12  12   8
Green lizard (green, long):                  0  84   0  12   4
Honey badger (black, gray, rough, furry):   32   0  60   4   4
Zebra (black, white, striped, smooth):      36   8  40   8   8
8 0

38 Overview: Obtaining large-scale training data (images, attribute labels); Training attribute classifiers; Performing higher-level tasks (targeted retrieval, zero-shot learning); Future directions

39 More attributes: Global appearance; Semantic; Has a strap; Orange, furry; Similarity-based; Local parts; Kangaroo-shaped; Zebra-colored; Has a tail; Rooster head; Dog body

40 More object classes Berry Fruit Entity Tree Instrument Furniture

41 Attribute learning in large-scale datasets Olga Russakovsky and Li Fei-Fei Many thanks to Alex Berg, Jia Deng, Li-Jia Li, Bangpeng Yao, Juan Carlos Niebles, and all of Stanford vision lab.
