Lessons Learned from Large Scale Crowdsourced Data Collection for ILSVRC. Jonathan Krause

1 Lessons Learned from Large Scale Crowdsourced Data Collection for ILSVRC Jonathan Krause

2 Overview Classification Localization Detection Pelican

4 Overview Classification Localization Detection Bird Frog

5 Classification Overview 1.4M images 1,000 classes By hand: 5 sec/image 50% images correct 12 hours worked/day Pelican = 324 days!
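The 324-day figure can be reproduced with a quick back-of-the-envelope calculation. This sketch assumes that "50% images correct" means only half of the candidate images actually show the target class, so twice as many candidates must be inspected; the function name is illustrative.

```python
def days_to_label_by_hand(target_images, fraction_correct, sec_per_image, hours_per_day):
    """Estimate the working days one person needs to hand-label a dataset."""
    # Only a fraction of candidates are correct, so more must be inspected.
    candidates = target_images / fraction_correct
    total_seconds = candidates * sec_per_image
    return total_seconds / 3600 / hours_per_day


days = days_to_label_by_hand(
    target_images=1_400_000,
    fraction_correct=0.5,
    sec_per_image=5,
    hours_per_day=12,
)
print(round(days))  # 324
```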

6 Crowdsourcing Let the crowd do the work for you!

7 Classification Pipeline 1. Collect candidate images for each category 2. Put candidate images on Amazon Mechanical Turk (AMT) 3. AMT workers click on images containing each class 4. Aggregate worker responses into labels

8 Collecting Images Category: Whippet Google Image Search:

9 Problem: Limited Images Web searches are limited Solution: Query Expansion WordNet: Whippet: "a small slender dog of greyhound type developed in England" Expanded queries: "whippet dog", "whippet greyhound"; translate into other languages
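A minimal sketch of the query expansion idea, assuming the synonyms and the parent-category words (taken here from the WordNet definition) are already available as plain lists. The function and its parameters are hypothetical, and no real search or WordNet API is called.

```python
def expand_queries(synonyms, parent_words, translations=()):
    """Build an ordered, de-duplicated list of search queries for one category."""
    queries = []
    for name in synonyms:
        queries.append(name)
        # Append words from the definition to get additional result sets.
        for parent in parent_words:
            queries.append(f"{name} {parent}")
    # Translated category names give access to other-language search results.
    queries.extend(translations)

    seen = set()
    unique = []
    for q in queries:
        if q not in seen:
            seen.add(q)
            unique.append(q)
    return unique


whippet_queries = expand_queries(["whippet"], ["dog", "greyhound"])
print(whippet_queries)  # ['whippet', 'whippet dog', 'whippet greyhound']
```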

10 Deploying on AMT Annotate many images at once!

11 Make sure workers understand the classes!

12 Understanding Classes Wikipedia and Google links

13 Understanding Classes Give them a definition. delta: a low triangular area of alluvial deposits where a river divides before entering a larger body of water: "the Mississippi River delta"; "the Nile delta"

14 Understanding Classes Test them on the definition

16 Understanding Classes Give example images (if you have them) Hard: "a small slender dog of greyhound type developed in England" Easy: "a small slender dog of greyhound type developed in England" + example images

17 Quality Control Workers on AMT are: Fast Inexpensive Plentiful But they are not: Highly trained Solution: Multiple responses, merge results

18 Quality Control Given: Set of (worker, image, response) Want: P(image has label) for each image; (optionally) worker quality estimates

19 A Simple Method Majority vote Q: Is this a whippet? Responses: Yes No Yes Yes No No Yes Yes

20 Majority Vote Problems: Doesn't give confidence Hard to measure worker quality Responses: Yes No Yes Yes No No Yes How sure are we it's positive? How good are these workers?
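A small sketch of majority voting on the seven responses above. The vote share it returns is only a raw agreement fraction, not a calibrated probability, which is the weakness the slide points out; the function name is illustrative.

```python
from collections import Counter


def majority_vote(responses):
    """Return the most common response and its share of the votes."""
    counts = Counter(responses)
    label, votes = counts.most_common(1)[0]
    return label, votes / len(responses)


responses = ["Yes", "No", "Yes", "Yes", "No", "No", "Yes"]
label, share = majority_vote(responses)
print(label, round(share, 2))  # Yes 0.57
```

A 4-to-3 split yields "Yes", but nothing in the result says how reliable any individual worker was.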

21 One Approach Annotate a subset of images with many annotations Majority vote to determine ground truth Determine confidence given fewer annotations Deng et al. 2009

22 Pro & Con Pro Simple Gives image confidence Con Treats all workers the same Relies on initial majority vote

23 Another Approach Model: Prior of label correct Worker confusion matrix Max-likelihood with EM Dawid, Skene. 1979
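The model on this slide (a label prior plus one confusion matrix per worker, fitted with EM) can be sketched as follows. This is a simplified illustration in the spirit of Dawid and Skene, not their exact procedure: it assumes binary labels, initialises from vote fractions, and adds pseudo-counts (`smooth`) to avoid zero probabilities. All names are hypothetical.

```python
def dawid_skene_binary(responses, n_iter=20, smooth=1.0):
    """responses: list of (worker, image, answer) with answer in {0, 1}.

    Returns (post, confusion) where post[image] = P(label == 1) and
    confusion[worker][true_label][answer] = P(answer | true_label).
    """
    images = sorted({img for _, img, _ in responses})
    workers = sorted({w for w, _, _ in responses})

    # Initialise each posterior with the fraction of positive votes.
    votes = {img: [] for img in images}
    for _, img, ans in responses:
        votes[img].append(ans)
    post = {img: sum(v) / len(v) for img, v in votes.items()}

    confusion = {}
    for _ in range(n_iter):
        # M-step: re-estimate the label prior and the confusion matrices.
        prior = (sum(post.values()) + smooth) / (len(images) + 2 * smooth)
        counts = {w: [[smooth, smooth], [smooth, smooth]] for w in workers}
        for w, img, ans in responses:
            counts[w][1][ans] += post[img]
            counts[w][0][ans] += 1.0 - post[img]
        confusion = {
            w: [[c / sum(row) for c in row] for row in counts[w]]
            for w in workers
        }

        # E-step: recompute P(label == 1) for every image.
        like = {img: [1.0 - prior, prior] for img in images}
        for w, img, ans in responses:
            like[img][0] *= confusion[w][0][ans]
            like[img][1] *= confusion[w][1][ans]
        post = {img: l[1] / (l[0] + l[1]) for img, l in like.items()}
    return post, confusion


# Toy data: workers "a" and "b" answer correctly, "c" always says yes.
responses = []
for img in range(6):
    truth = 1 if img < 3 else 0
    responses += [("a", img, truth), ("b", img, truth), ("c", img, 1)]

post, confusion = dawid_skene_binary(responses)
print({img: round(p, 2) for img, p in post.items()})
```

On this toy data the always-yes worker ends up with a low estimated probability of answering "no" on negative images, so that worker's votes carry less weight than those of the two consistent workers.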

24 Another Approach Worker Quality Compute Soft Label: distribution over labels given worker response Calculate expected cost of soft label q: Ipeirotis, Provost, Wang. 2012
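The formula for the expected cost did not survive on this slide. One common formulation, stated here as an assumption rather than as the slide's exact content, is the expected misclassification cost of a soft label q under a cost matrix c: the sum over all class pairs (i, j) of q[i] * q[j] * c[i][j]. A short sketch with a hypothetical 0/1 cost matrix:

```python
def expected_cost(soft_label, cost):
    """Expected misclassification cost of a soft label.

    soft_label: probabilities over classes, summing to 1.
    cost[i][j]: cost of assigning class j when the true class is i.
    """
    return sum(
        qi * qj * cost[i][j]
        for i, qi in enumerate(soft_label)
        for j, qj in enumerate(soft_label)
    )


zero_one = [[0, 1], [1, 0]]
print(expected_cost([1.0, 0.0], zero_one))  # 0.0
print(expected_cost([0.5, 0.5], zero_one))  # 0.5
```

Under this formulation a worker whose responses lead to confident soft labels has a low expected cost, while a worker whose responses leave the label close to uniform has a high one.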

25 Pro & Con Pro Gives image confidence Gives worker quality Con More complex Need to run optimization

26 Overview Classification Localization Detection Pelican

27 Localization Overview Classification images 1,000 classes 600k training bounding boxes Pelican Main Challenge: Collecting and verifying bounding boxes

28 Bounding Boxes Requirements: Tight around object Around all object instances Not around other objects bounding boxes for bottle Su, Deng, Fei-Fei. 2012

29 Tasks 1. Draw a bounding box around a single instance 2. Quality verification of bounding box 3. Coverage verification

30 Drawing Intuitively simple... but the devil is in the details

31 Drawing Things vision researchers take for granted Include all visible parts Include only visible parts Make the bounding box tight Only include a single instance Don't draw over any instances that already have bounding boxes What if there are no unannotated objects? Provide instructions and use a qualification task!

32 Drawing Include all visible parts Good Bad

33 Drawing Include only visible parts Don't try to complete the object Good Bad

34 Drawing Make the bounding box tight Even though loose is much faster Good Bad

35 Drawing Only include a single instance Good Bad

36 Drawing Don't draw over instances that already have bounding boxes Can enforce this in the UI Good Bad

37 Drawing What if there are no unannotated objects? Give option to annotate no bounding boxes Good Bad No more objects anything else

38 Quality Verification Simpler than bounding box drawing Still has some details Is this bounding box good? YES

39 Quality Verification Details: Still need to know about good bounding boxes Quality control Is this bounding box good? YES

40 Quality Verification Quality control Embed gold standard images Positives: Majority vote Negatives: Perturb the positives Reject annotations if bad answers to these Can be used for almost any type of task! (Optionally) require agreement of more than one annotator
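A sketch of the gold-standard check for the verification task. The slide does not say how positives are perturbed, so shifting a good box by half its size is an assumption, as are the function names and the 0.8 accuracy threshold. Intersection over union (IoU) is used only to show that the perturbed box is clearly bad.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union else 0.0


def perturb(box, shift_frac=0.5):
    """Create a known-bad box by shifting a known-good box diagonally."""
    x1, y1, x2, y2 = box
    dx = (x2 - x1) * shift_frac
    dy = (y2 - y1) * shift_frac
    return (x1 + dx, y1 + dy, x2 + dx, y2 + dy)


def passes_gold(answers, gold, min_accuracy=0.8):
    """Check a worker's yes/no answers against the embedded gold items.

    answers, gold: dicts mapping item id -> bool ("is this box good?").
    Unanswered gold items count as wrong.
    """
    correct = sum(answers.get(item) == truth for item, truth in gold.items())
    return correct / len(gold) >= min_accuracy


good_box = (0, 0, 100, 100)
bad_box = perturb(good_box)
gold = {"g_pos": True, "g_neg": False}

print(round(iou(good_box, bad_box), 3))                      # 0.143
print(passes_gold({"g_pos": True, "g_neg": False}, gold))    # True
print(passes_gold({"g_pos": True, "g_neg": True}, gold))     # False
```

A worker who approves the perturbed box fails the check, and that worker's other annotations in the same batch can then be rejected.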

41 Coverage Verification Similar in style to quality verification Just a different question Still need instructions, quality control Any unannotated raccoons? Nope!

42 Bounding Boxes: Misc. Provide definitions and example images! Especially if uncommon objects But also helps with common objects Annotators from different cultures Make sure objects being annotated are actually in your images Do the classification task first

43 Bounding Boxes: Misc. Make qualification tasks Verification tasks are much faster than drawing Corner cases: Each task needs a plan for when the previous task goes wrong.

44 Detection Overview 456k training images 61k fully-annotated val+test 200 classes Bird Frog

45 Detection Overview 456k training images 61k fully-annotated val+test 200 classes Bird Frog Main Challenge: Annotating all 200 classes in every image.

46 Detection Pipeline 1. Collect images 2. Class presence annotation 3. Bounding box annotation Bird Frog

47 Detection Pipeline 1. Collect images 2. Class presence annotation 3. Bounding box annotation Same as previous Bird Frog

48 Detection Pipeline 1. Collect images 2. Class presence annotation 3. Bounding box annotation Bird Frog

49 Collecting Images Need images that aren't single object-centric Additional queries: Compound object queries ("tiger lion", "skunk and cat") Complex scene queries ("kitchenette", "dining table", "orchestra")

50 Detection Pipeline 1. Collect images 2. Class presence annotation 3. Bounding box annotation Bird Frog

51 Naive approach: ask for each object Labels (Table, Chair, Horse, Dog, Cat, Bird): ? ? ? ? ? ? Machine asks: Is there a table? Crowd answers: Yes Deng, Russakovsky, Krause, Bernstein, Berg, Fei-Fei. CHI 2014

52 Naive approach: ask for each object Labels (Table, Chair, Horse, Dog, Cat, Bird): + ? ? ? ? ? Machine asks: Is there a table? Crowd answers: Yes Deng, Russakovsky, Krause, Bernstein, Berg, Fei-Fei. CHI 2014

53 Naive approach: ask for each object Labels (Table, Chair, Horse, Dog, Cat, Bird): + + ? ? ? ? Machine asks: Is there a chair? Crowd answers: Yes Deng, Russakovsky, Krause, Bernstein, Berg, Fei-Fei. CHI 2014

54 Naive approach: ask for each object Labels (Table, Chair, Horse, Dog, Cat, Bird): + + - ? ? ? Machine asks: Is there a horse? Crowd answers: No Deng, Russakovsky, Krause, Bernstein, Berg, Fei-Fei. CHI 2014

55 Naive approach: ask for each object Labels (Table, Chair, Horse, Dog, Cat, Bird): + + - - ? ? Machine asks: Is there a dog? Crowd answers: No Deng, Russakovsky, Krause, Bernstein, Berg, Fei-Fei. CHI 2014

56 Naive approach: ask for each object Labels (Table, Chair, Horse, Dog, Cat, Bird): + + - - - ? Machine asks: Is there a cat? Crowd answers: No Deng, Russakovsky, Krause, Bernstein, Berg, Fei-Fei. CHI 2014

57 Naive approach: ask for each object Labels (Table, Chair, Horse, Dog, Cat, Bird): + + - - - - Machine asks: Is there a bird? Crowd answers: No Deng, Russakovsky, Krause, Bernstein, Berg, Fei-Fei. CHI 2014

58 Naive approach: ask for each object Cost: O(NK) for N images and K objects Labels: Table, Chair, Horse, Dog, Cat, Bird Deng, Russakovsky, Krause, Bernstein, Berg, Fei-Fei. CHI 2014

59 Hierarchy Furniture: Table, Chair. Animal: Mammal (Horse, Dog, Cat), Bird. Deng, Russakovsky, Krause, Bernstein, Berg, Fei-Fei. CHI 2014

60 Hierarchy Furniture: Table, Chair. Animal: Mammal (Horse, Dog, Cat), Bird. Correlation Sparsity Deng, Russakovsky, Krause, Bernstein, Berg, Fei-Fei. CHI 2014

61 Better approach: exploit label structure Labels (Table, Chair, Horse, Dog, Cat, Bird): ? ? ? ? ? ? Deng, Russakovsky, Krause, Bernstein, Berg, Fei-Fei. CHI 2014

62 Better approach: exploit label structure Labels (Table, Chair, Horse, Dog, Cat, Bird): ? ? ? ? ? ? Machine asks: Is there an animal? Crowd answers: No Deng, Russakovsky, Krause, Bernstein, Berg, Fei-Fei. CHI 2014

63 Better approach: exploit label structure Labels (Table, Chair, Horse, Dog, Cat, Bird): ? ? - - - - Machine asks: Is there an animal? Crowd answers: No Deng, Russakovsky, Krause, Bernstein, Berg, Fei-Fei. CHI 2014

64 Better approach: exploit label structure Labels (Table, Chair, Horse, Dog, Cat, Bird): ? ? - - - - Machine asks: Is there furniture? Crowd answers: Yes Deng, Russakovsky, Krause, Bernstein, Berg, Fei-Fei. CHI 2014

65 Better approach: exploit label structure Labels (Table, Chair, Horse, Dog, Cat, Bird): ? ? - - - - Machine asks: Is there a table? Crowd answers: Yes Deng, Russakovsky, Krause, Bernstein, Berg, Fei-Fei. CHI 2014

66 Better approach: exploit label structure Labels (Table, Chair, Horse, Dog, Cat, Bird): + ? - - - - Machine asks: Is there a chair? Crowd answers: Yes Deng, Russakovsky, Krause, Bernstein, Berg, Fei-Fei. CHI 2014

67 Better approach: exploit label structure Labels (Table, Chair, Horse, Dog, Cat, Bird): + + - - - - Machine asks: Is there a chair? Crowd answers: Yes Deng, Russakovsky, Krause, Bernstein, Berg, Fei-Fei. CHI 2014

68 Selecting the Right Question Goal: Get as much utility (new labels) as possible, for as little cost (worker time) as possible, given a desired level of accuracy. Deng, Russakovsky, Krause, Bernstein, Berg, Fei-Fei. CHI 2014

69 Accuracy constraint User-specified accuracy threshold, e.g., 95% Might require only one worker, might require several based on the task Deng, Russakovsky, Krause, Bernstein, Berg, Fei-Fei. CHI 2014

70 Cost: worker time (time = money) Expected human time to get an answer with 95% accuracy
Question (is there ...) : Cost (seconds)
a thing used to open cans/bottles : 14.4
an item that runs on electricity (plugged in or using batteries) : 12.6
a stringed instrument : 3.4
a canine : 2.0
Deng, Russakovsky, Krause, Bernstein, Berg, Fei-Fei. CHI 2014

71 Utility: expected # of new labels Labels (Table, Chair, Horse, Dog, Cat, Bird): ? ? ? ? ? ? Is there a table? Yes: + ? ? ? ? ? No: - ? ? ? ? ? utility = 1 Deng, Russakovsky, Krause, Bernstein, Berg, Fei-Fei. CHI 2014

72 Utility: expected # of new labels Labels (Table, Chair, Horse, Dog, Cat, Bird): ? ? ? ? ? ? Is there a table? Yes: + ? ? ? ? ? No: - ? ? ? ? ? utility = 1 Is there an animal? Yes, Pr(Y) = 0.5: ? ? ? ? ? ? No, Pr(N) = 0.5: ? ? - - - - utility = 0.5 * 0 + 0.5 * 4 = 2 Deng, Russakovsky, Krause, Bernstein, Berg, Fei-Fei. CHI 2014
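The utility calculation can be sketched with the six-object hierarchy from the earlier slides. This is a simplified illustration, not the paper's implementation: it assumes a "no" at an internal node labels every unlabeled leaf below it negative, a "yes" at an internal node labels nothing, and either answer at a leaf labels that leaf. The dictionary layout and function names are hypothetical.

```python
# Internal nodes map to their children; anything not listed is a leaf.
HIERARCHY = {
    "animal": ["mammal", "bird"],
    "mammal": ["horse", "dog", "cat"],
    "furniture": ["table", "chair"],
}


def leaves(node):
    """Return all leaf objects below (or equal to) a node."""
    children = HIERARCHY.get(node)
    if not children:
        return [node]
    return [leaf for child in children for leaf in leaves(child)]


def expected_utility(node, p_yes, unlabeled):
    """Expected number of new leaf labels from asking 'Is there a <node>?'."""
    affected = [leaf for leaf in leaves(node) if leaf in unlabeled]
    is_leaf = node not in HIERARCHY
    gain_if_yes = len(affected) if is_leaf else 0
    gain_if_no = len(affected)
    return p_yes * gain_if_yes + (1 - p_yes) * gain_if_no


unlabeled = {"table", "chair", "horse", "dog", "cat", "bird"}
print(expected_utility("table", 0.5, unlabeled))   # 1.0
print(expected_utility("animal", 0.5, unlabeled))  # 2.0
```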

73 Selecting the Right Question Pick the question with the most labels per second
Table columns: Query (Is there a...); Utility (num labels); Cost (worker time in secs); Utility-Cost Ratio (labels per sec)
Queries listed: mammal with claws or fingers; living organism; mammal; creature without legs; land or avian creature
Deng, Russakovsky, Krause, Bernstein, Berg, Fei-Fei. CHI 2014
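Picking the question with the most labels per second amounts to a greedy choice over the utility-to-cost ratio. The sketch below uses made-up utility and cost numbers purely for illustration; they are not the values from the slide's table.

```python
def pick_question(candidates):
    """Return the candidate with the highest utility-to-cost ratio.

    candidates: list of (question, utility_in_labels, cost_in_seconds).
    """
    return max(candidates, key=lambda c: c[1] / c[2])


# Hypothetical numbers, chosen only to show the selection rule.
candidates = [
    ("mammal", 12.0, 3.0),            # 4.0 labels per second
    ("living organism", 20.0, 10.0),  # 2.0 labels per second
    ("creature without legs", 1.0, 2.0),  # 0.5 labels per second
]
best = pick_question(candidates)
print(best[0])  # mammal
```

The question with the largest utility is not selected here, because it costs enough worker time that its labels-per-second rate is lower.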

74 Results Dataset: 20K images from ImageNet Challenge Labels: 200 basic categories (dog, cat, table, ...) 64 internal nodes in hierarchy Deng, Russakovsky, Krause, Bernstein, Berg, Fei-Fei. CHI 2014

75 Results: accuracy Annotating 10K images with 200 objects Table columns: Accuracy threshold per question (parameter); Accuracy (F1 score), naive approach; Accuracy (F1 score), our approach Values: (75.67) (76.97) (60.17) (60.69) Deng, Russakovsky, Krause, Bernstein, Berg, Fei-Fei. CHI 2014

76 Results: cost Annotating 10K images with 200 objects Table columns: Accuracy threshold per question (parameter); Cost saving (our approach compared to naive approach) Values: 0.95 3.93x 0.90 6. 6 times more labels per second Deng, Russakovsky, Krause, Bernstein, Berg, Fei-Fei. CHI 2014

77 Overview Classification Localization Detection Bird Frog

78 Final Thoughts Provide good instructions Do quality control Visualize results Listen to your workers

79 Questions?
