Large-Scale Live Active Learning: Training Object Detectors with Crawled Data and Crowds

Size: px

Start display at page:

Download "Large-Scale Live Active Learning: Training Object Detectors with Crawled Data and Crowds"

Letitia Parks
5 years ago
Views:

1 Large-Scale Live Active Learning: Training Object Detectors with Crawled Data and Crowds Sudheendra Vijayanarasimhan Kristen Grauman Department of Computer Science University of Texas at Austin Austin, Texas

2 Object Detection Challenge: Best results require large amount of cleanly labeled training examples. 2

3 Ways to Reduce Effort Active learning minimize effort by focusing label requests on the most informative examples [Kapoor et al. ICCV 2007, Qi et al. CVPR 2008, Vijayanarasimhan et al. CVPR 2009, Joshi et al. CVPR 2009, Siddique et al. CVPR 2010] 3

4 Ways to Reduce Effort Active learning Labeled data Unlabeled data minimize effort by focusing label requests on the most informative examples [Kapoor et al. ICCV 2007, Qi et al. CVPR 2008, Vijayanarasimhan et al. CVPR 2009, Joshi et al. CVPR 2009, Siddique et al. CVPR 2010] 3

5 Ways to Reduce Effort Active learning Labeled data Current category model Unlabeled data minimize effort by focusing label requests on the most informative examples [Kapoor et al. ICCV 2007, Qi et al. CVPR 2008, Vijayanarasimhan et al. CVPR 2009, Joshi et al. CVPR 2009, Siddique et al. CVPR 2010] 3

6 Ways to Reduce Effort Active learning Labeled data Current category model Unlabeled data Active Selection Function minimize effort by focusing label requests on the most informative examples [Kapoor et al. ICCV 2007, Qi et al. CVPR 2008, Vijayanarasimhan et al. CVPR 2009, Joshi et al. CVPR 2009, Siddique et al. CVPR 2010] 3

7 Ways to Reduce Effort Active learning Labeled data Current category model Unlabeled data Active Selection Function Actively selected example(s) minimize effort by focusing label requests on the most informative examples [Kapoor et al. ICCV 2007, Qi et al. CVPR 2008, Vijayanarasimhan et al. CVPR 2009, Joshi et al. CVPR 2009, Siddique et al. CVPR 2010] 3

8 Ways to Reduce Effort Active learning Labeled data Annotator(s) Current category model Unlabeled data Annotation request Active Selection Function Actively selected example(s) minimize effort by focusing label requests on the most informative examples [Kapoor et al. ICCV 2007, Qi et al. CVPR 2008, Vijayanarasimhan et al. CVPR 2009, Joshi et al. CVPR 2009, Siddique et al. CVPR 2010] 3

9 Ways to Reduce Effort Active learning Labeled data Annotator(s) Current category model Unlabeled data Annotation request Active Selection Function Actively selected example(s) minimize effort by focusing label requests on the most informative examples [Kapoor et al. ICCV 2007, Qi et al. CVPR 2008, Vijayanarasimhan et al. CVPR 2009, Joshi et al. CVPR 2009, Siddique et al. CVPR 2010] 3

Ways to Reduce Effort Active learning Crowd-sourced annotations Labeled data Annotator(s) Current category model Unlabeled data Annotation request

ICCV 2007, Qi et al. CVPR 2008, Vijayanarasimhan et al. CVPR 2009, Joshi et al. CVPR 2009, Siddique et al.

10 Ways to Reduce Effort Active learning Crowd-sourced annotations Labeled data Annotator(s) Current category model Unlabeled data Annotation request Active Selection Function minimize effort by focusing label requests on the most informative examples Actively selected example(s) [Kapoor et al. ICCV 2007, Qi et al. CVPR 2008, Vijayanarasimhan et al. CVPR 2009, Joshi et al. CVPR 2009, Siddique et al. CVPR 2010] package annotation tasks to obtain from online human workers [von Ahn et al. CHI 2004, Russell et al. IJCV 2007, Sorokin et al. 2008, Welinder et al. ACVHL 2010, Deng et al, CVPR 2009] 3

11 Problem Thus far techniques are only tested in artificially controlled settings: 4

12 Problem Thus far techniques are only tested in artificially controlled settings: use sandbox datasets - dataset s source and scope is fixed 4

13 Problem Thus far techniques are only tested in artificially controlled settings: use sandbox datasets - dataset s source and scope is fixed computational cost of active selection and retraining the model generally ignored - linear/quadratic time 4

14 Problem Thus far techniques are only tested in artificially controlled settings: use sandbox datasets - dataset s source and scope is fixed computational cost of active selection and retraining the model generally ignored - linear/quadratic time crowd-sourced collection requires iterative fine-tuning 4

15 Goal Goal Take active learning and crowd-sourced annotation collection out of the sandbox. 5

16 Goal Goal Take active learning and crowd-sourced annotation collection out of the sandbox. break free from dataset-based learning 5

17 Goal Goal Take active learning and crowd-sourced annotation collection out of the sandbox. break free from dataset-based learning collect information on the fly (no manual intervention) 5

18 Goal Goal Take active learning and crowd-sourced annotation collection out of the sandbox. break free from dataset-based learning collect information on the fly (no manual intervention) large-scale data 5

19 Goal Goal Take active learning and crowd-sourced annotation collection out of the sandbox. break free from dataset-based learning collect information on the fly (no manual intervention) large-scale data Our Approach: Live Learning Live active learning system that autonomously builds models for object detection 5

20 Goal Goal Take active learning and crowd-sourced annotation collection out of the sandbox. break free from dataset-based learning collect information on the fly (no manual intervention) large-scale data Our Approach: Live Learning Live active learning system that autonomously builds models for object detection bicycle Category model 5

21 Our Approach: Live Learning bicycle 6

22 Our Approach: Live Learning bicycle Unlabeled images 6

23 Our Approach: Live Learning bicycle Generate object windows Unlabeled images Unlabeled windows 6

24 Our Approach: Live Learning bicycle Select images to annotate Unlabeled images Generate object windows Unlabeled windows Actively selected examples 6

25 Our Approach: Live Learning Labeled data Online annotation collection bicycle Select images to annotate Unlabeled images Generate object windows Unlabeled windows Actively selected examples 6

26 Our Approach: Live Learning Labeled data Online annotation collection Object representation and classifier bicycle Category model Select images to annotate Unlabeled images Generate object windows Unlabeled windows Actively selected examples 6

27 Main Contributions Linear classification part-based linear detector based on non-linear feature coding 7

28 Main Contributions Linear classification part-based linear detector based on non-linear feature coding Large-scale active selection sub-linear time hashing scheme for efficiently selecting uncertain examples [ Jain, Vijayanarasimhan & Grauman, NIPS 2010] 7

29 Main Contributions Linear classification part-based linear detector based on non-linear feature coding Large-scale active selection sub-linear time hashing scheme for efficiently selecting uncertain examples [ Jain, Vijayanarasimhan & Grauman, NIPS 2010] Live learning results for active detection of unprecedented scale and autonomy for the first time 7

30 Outline Labeled data Online annotation collection? Object representation and classifier bicycle Category model Select images to annotate Unlabeled images Generate object windows Unlabeled windows Actively selected examples 8

31 Outline Labeled data bicycle? Object representation and classifier Online annotation collection Linear classification fast/incremental training using linear SVM Category model Select images to annotate Unlabeled images Generate object windows Unlabeled windows Actively selected examples 8

32 Outline Labeled data bicycle Generate object windows? Object representation and classifier Category model Online annotation collection Select images to annotate Actively selected examples Linear classification fast/incremental training using linear SVM efficient active selection using our hyperplane hash functions Unlabeled images Unlabeled windows 8

33 Object Representation and Classifier Part based object representation 9

34 Object Representation and Classifier Part based object representation Root 9

35 Object Representation and Classifier Part based object representation Root Parts 9

36 Object Representation and Classifier Part based object representation Root Parts Context 9

37 Object Representation and Classifier Part based object representation Sparse Max Pooling Root [φ(r) Parts Context 9

38 Object Representation and Classifier Part based object representation Sparse Max Pooling Root Parts [φ(r), φ(p 1 ) φ(p P ) Context 9

39 Object Representation and Classifier Part based object representation Sparse Max Pooling Root Parts Context [φ(r), φ(p 1 ) φ(p P ), φ(c 1 ) φ(c C )] 9

40 Object Representation and Classifier Part based object representation Sparse Max Pooling Root Parts Context [φ(r), φ(p 1 ) φ(p P ), φ(c 1 ) φ(c C )] Sparse Max Pooling [Yang et al. 10] similar to bag of words: 9

41 Object Representation and Classifier Part based object representation Sparse Max Pooling Root Parts Context [φ(r), φ(p 1 ) φ(p P ), φ(c 1 ) φ(c C )] Sparse Max Pooling [Yang et al. 10] similar to bag of words: sparse coding fuller representation of original features 9

42 Object Representation and Classifier Part based object representation Sparse Max Pooling Root Parts Context [φ(r), φ(p 1 ) φ(p P ), φ(c 1 ) φ(c C )] Sparse Max Pooling [Yang et al. 10] similar to bag of words: sparse coding fuller representation of original features max pooling better discriminability in clutter [Boureau 10]. 9

43 Object Representation and Classifier Part based object representation Score windows as linear sum: Sparse Max Pooling Root Parts Context [φ(r), φ(p 1 ) φ(p P ), φ(c 1 ) φ(c C )] f (O) = w T ϕ(o) P = w r ϕ(r) + i=1 C + w ci ϕ(c i ), i=1 w - linear SVM weights w pi ϕ(p i ) Sparse Max Pooling [Yang et al. 10] similar to bag of words: sparse coding fuller representation of original features max pooling better discriminability in clutter [Boureau 10]. 9

44 Object Representation and Classifier Relationship to existing detection models Spatial pyramid (SP) [Lazebnik et al. 06, Vedaldi et al. 09] Hard VQ +avg pooling local features, discard locs per window + + Latent SVM (LSVM) [Felzenswalb et al. 09] + root parts deformations dense gradients at fixed locs within window + 10

45 Object Representation and Classifier Relationship to existing detection models Spatial pyramid (SP) [Lazebnik et al. 06, Vedaldi et al. 09] Hard VQ +avg pooling local features, discard locs per window + + Sparse code +max pooling Ours root Latent SVM (LSVM) [Felzenswalb et al. 09] + root parts deformations dense gradients at fixed locs within window + parts + local features, discard locs per window (r) (p 1 ) (p P ) 10

46 Object Representation and Classifier Relationship to existing detection models Spatial pyramid (SP) [Lazebnik et al. 06, Vedaldi et al. 09] Hard VQ +avg pooling local features, discard locs per window + + Sparse code +max pooling Ours root Latent SVM (LSVM) [Felzenswalb et al. 09] + root parts deformations dense gradients at fixed locs within window + parts + local features, discard locs per window (r) (p 1 ) (p P ) faster training (linear SVM) 10

47 Object Representation and Classifier Relationship to existing detection models Spatial pyramid (SP) [Lazebnik et al. 06, Vedaldi et al. 09] Hard VQ +avg pooling local features, discard locs per window + + Sparse code +max pooling Ours root Latent SVM (LSVM) [Felzenswalb et al. 09] + root parts deformations dense gradients at fixed locs within window + parts + local features, discard locs per window (r) (p 1 ) (p P ) faster training (linear SVM) results comparable to non-linear detectors 10

48 Selecting Images to Annotate Labeled data Online annotation collection Part-based linear detector bicycle Jumping window prediction Category model Select images to annotate? Actively selected examples Unlabeled images Unlabeled windows 11

Selecting Images to Annotate Labeled data Online annotation collection bicycle Jumping window prediction Part-based linear detector Category model

49 Selecting Images to Annotate Labeled data Online annotation collection bicycle Jumping window prediction Part-based linear detector Category model Select images to annotate? Actively selected examples Efficient active selection select most useful windows from millions Unlabeled images Unlabeled windows 11

50 Active Selection of Object Windows SVM SVM margin margin criterion criterion for activefor selection active selection? w Select point Select nearest point to nearest to hyperplane hyperplane decision decision boundary for labeling. boundary for labeling. x = argmin w T x i x i U [Tong & Koller, 2000; Schohn & Cohen, 2000; [Tong & Koller, Campbell 2000; etschohn al. 2000] & Cohn, 2000; Campbell et al. 2000] Problem: With massive unlabeled pool, cannot afford exhaustive linear scan to make selection. 12

51 Active Selection of Object Windows SVM SVM margin margin criterion criterion for activefor selection active selection? w Select point Select nearest point to nearest to hyperplane hyperplane decision decision boundary for labeling. boundary for labeling. x = argmin w T x i x i U [Tong & Koller, 2000; Schohn & Cohen, 2000; [Tong & Koller, Campbell 2000; etschohn al. 2000] & Cohn, 2000; Campbell et al. 2000] Problem: With massive unlabeled pool, cannot afford exhaustive linear scan to make selection. Problem: With massive unlabeled pool, cannot afford exhaustive linear scan to make selection. 12

52 Active Selection of Object Windows Sub-linear time selection through hyperplane hashing hash function h(.) - high probability of collision when ϕ(o) close to w [Jain, Vijayanarasimhan and Grauman, NIPS 2010] 13

53 Active Selection of Object Windows Sub-linear time selection through hyperplane hashing hash function h(.) - high probability of collision when ϕ(o) close to w [Jain, Vijayanarasimhan and Grauman, NIPS 2010] Unlabeled windows 13

54 Active Selection of Object Windows Sub-linear time selection through hyperplane hashing [Jain, Vijayanarasimhan and Grauman, NIPS 2010] hash function h(.) - high probability of collision when ϕ(o) close to w preprocess - hash unlabeled windows into table h( ϕ O )) ( i 1100 Hash table Unlabeled windows 13

55 Active Selection of Object Windows Sub-linear time selection through hyperplane hashing [Jain, Vijayanarasimhan and Grauman, NIPS 2010] hash function h(.) - high probability of collision when ϕ(o) close to w preprocess - hash unlabeled windows into table h( ϕ O )) ( i Hash table Unlabeled windows 13

56 Active Selection of Object Windows Sub-linear time selection through hyperplane hashing [Jain, Vijayanarasimhan and Grauman, NIPS 2010] hash function h(.) - high probability of collision when ϕ(o) close to w preprocess - hash unlabeled windows into table active learning loop - hash classifier w and retrieve examples Category model Unlabeled windows h( ϕ O )) ( i Hash table 13

57 Active Selection of Object Windows Sub-linear time selection through hyperplane hashing [Jain, Vijayanarasimhan and Grauman, NIPS 2010] hash function h(.) - high probability of collision when ϕ(o) close to w preprocess - hash unlabeled windows into table active learning loop - hash classifier w and retrieve examples Category model Unlabeled windows h( w) h( ϕ O )) ( i Hash table 13

58 Active Selection of Object Windows Sub-linear time selection through hyperplane hashing [Jain, Vijayanarasimhan and Grauman, NIPS 2010] hash function h(.) - high probability of collision when ϕ(o) close to w preprocess - hash unlabeled windows into table active learning loop - hash classifier w and retrieve examples Category model Unlabeled windows h( w) h( ϕ O )) ( i Hash table Sub-linear Time Retrieval 13

Active Selection of Object Windows Sub-linear time selection through hyperplane hashing [Jain, Vijayanarasimhan and Grauman, NIPS 2010] hash function h(.

59 Active Selection of Object Windows Sub-linear time selection through hyperplane hashing [Jain, Vijayanarasimhan and Grauman, NIPS 2010] hash function h(.) - high probability of collision when ϕ(o) close to w preprocess - hash unlabeled windows into table active learning loop - hash classifier w and retrieve examples evaluate 10 3 windows vs for exhaustive Category model Unlabeled windows h( w) h( ϕ O )) ( i Hash table Sub-linear Time Retrieval 13

60 Online Annotation Collection Labeled data? Online annotation collection bicycle Part-based linear detector Category model h( w) on the fly reliable annotations without pruning 1111 Jumping window prediction h( ϕ O )) ( i Hash table Actively selected examples Unlabeled images Unlabeled windows 14

61 Online Annotation Collection Mechanical Turk Interface 15

62 Online Annotation Collection Mechanical Turk Interface post same image to multiple (5-10) annotators 15

63 Online Annotation Collection Mechanical Turk Interface post same image to multiple (5-10) annotators cluster all bounding boxes to obtain consensus 15

64 Online Annotation Collection Mechanical Turk Interface post same image to multiple (5-10) annotators cluster all bounding boxes to obtain consensus 15

65 Summary: Live Learning Labeled data Consensus (Mean shift) Part-based linear detector bicycle h( w) 1100 Category model Jumping window prediction ( O )) h ϕ ( i Hash table Actively selected examples Unlabeled images Unlabeled windows 16

66 Results PASCAL 2007 challenge 20 different objects under changes in viewpoint, scale, and background clutter training and test examples given an image detect all objects 17

67 Results PASCAL 2007 challenge 20 different objects under changes in viewpoint, scale, and background clutter training and test examples given an image detect all objects Live learning on Flickr 6 of the most challenging PASCAL objects New Flickr testset 17

68 Results PASCAL 2007 challenge 20 different objects under changes in viewpoint, scale, and background clutter training and test examples given an image detect all objects Live learning on Flickr 6 of the most challenging PASCAL objects New Flickr testset Features 30,000 SIFT features densely extracted 60,000 words with hierarchical kmeans sparse coding using LLC [Yang et al. 10] 17

69 Results PASCAL 2007 challenge 20 different objects under changes in viewpoint, scale, and background clutter training and test examples given an image detect all objects Live learning on Flickr 6 of the most challenging PASCAL objects New Flickr testset Features 30,000 SIFT features densely extracted 60,000 words with hierarchical kmeans sparse coding using LLC [Yang et al. 10] Implementation 12 parts from LSVM detector 100 images per active iteration 17

70 Sandbox Results (PASCAL 2007) Comparison to state-of-art aero. cat dog sheep sofa train bicyc. bird boat bottl bus Mean BoF SP Ours part-based, single feature representation, linear model

71 Sandbox Results (PASCAL 2007) Comparison to state-of-art aero. cat dog sheep sofa train bicyc. bird boat bottl bus Mean BoF SP Ours LSVM+HOG SP+MKL [Felzenszwalb et al. 09] 2 [Vedaldi et al. 09] part-based, single feature representation, linear model competitive with state-of-art (better for 6 classes) 18

72 Live Learning Results Live learning tested on PASCAL testset 19

73 Live Learning Results Live learning tested on PASCAL testset bird boat dog potted plant sheep chair Ours Previous best Significant improvements in state-of-art on challenging categories 19

74 Live Learning Results Live learning tested on PASCAL testset bird boat dog potted plant sheep chair Ours Previous best Significant improvements in state-of-art on challenging categories Computation Time Active selection Training Detection per image Ours + active 10 mins 5 mins 150 secs LSVM [Felzenszwalb et al. 2009] 3 hours 4 hours 2 secs SP+MKL [Vedaldi et al. 2009] 93 hours > 2 days 67 secs Our approach s efficiency makes live learning feasible. 19

75 Live Learning Results Live learning tested on Flickr testset General test set of web images 20

76 Live Learning Results Live learning tested on Flickr testset General test set of web images keyword+image: randomly select a keyword filtered image to get bounding box keyword+window: randomly select a window within a keyword filtered image and obtain binary label 20

77 Live Learning Results Live learning tested on Flickr testset General test set of web images Average Precision Live active (ours) Keyword+image Keyword+window boat dog bird pottedplant sheep Annotations added, out of 3 million examples chair keyword+image: randomly select a keyword filtered image to get bounding box keyword+window: randomly select a window within a keyword filtered image and obtain binary label 20

78 Live Learning Results Live learning tested on Flickr testset General test set of web images Average Precision Live active (ours) Keyword+image Keyword+window boat dog bird pottedplant sheep Annotations added, out of 3 million examples chair keyword+image: randomly select a keyword filtered image to get bounding box keyword+window: randomly select a window within a keyword filtered image and obtain binary label dramatic improvements for most categories 20

79 Live Learning Results Live learning tested on Flickr testset General test set of web images Average Precision Live active (ours) Keyword+image Keyword+window boat dog bird pottedplant sheep Annotations added, out of 3 million examples chair keyword+image: randomly select a keyword filtered image to get bounding box keyword+window: randomly select a window within a keyword filtered image and obtain binary label dramatic improvements for most categories outperforms status quo approach of learning 20

80 Conclusions autonomous online learning - break-free from sandbox learning no intervention in example/annotation selection or pruning obtains results better than state-of-art on challenging PASCAL dataset largest scale active learning results to our knowledge 21

Large-Scale Live Active Learning: Training Object Detectors with Crawled Data and Crowds

To appear, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2011. Large-Scale Live Active Learning: Training Object Detectors with Crawled Data and Crowds Sudheendra