End-to-End Localization and Ranking for Relative Attributes


1 End-to-End Localization and Ranking for Relative Attributes. Krishna Kumar Singh and Yong Jae Lee. Presented by Minhao Cheng.

2 Visual attributes. Examples: High heel, Smile, Cozy, Mountainous. [Farhadi et al. 2009, Kumar et al. 2009, Lampert et al. 2009, Berg et al. 2010, Rastegari et al. 2012, ...] [Slide: Xiao and Lee, ICCV 2015]

3 Relative attributes. Is she smiling? Hard to say... It is a lot easier to say "the right one is smiling more". [Parikh & Grauman 2011, Shrivastava et al. 2012, Kovashka et al. 2013, Sandeep et al. 2014, ...] [Slide: Xiao and Lee, ICCV 2015]

4 Localization of attributes. Spatial regions that are most relevant to a particular attribute. Examples: Cozy, Smile, Mountainous. [Slide: Xiao and Lee, ICCV 2015]

5 Prior work on localizing attributes. Attribute localization with pre-trained detectors: [Bourdev et al. 2011, Zhang et al. 2014, Sandeep et al. 2014]. Attribute localization with human-in-the-loop: [Duan et al. 2012]. Attribute localization with binary attributes: [Berg et al. 2010, Bourdev et al. 2011, Duan et al. 2012, Zhang et al. 2014]. Requires strong human supervision or binary attribute annotations. [Slide: Xiao and Lee, ICCV 2015]

6 Prior work on localizing attributes. Attribute localization in weakly-supervised setting: [Xiao and Lee, ICCV 2015]. Pipeline where features, localizer, and classifier are trained separately and sequentially; suboptimal and slow. [Slide: Xiao and Lee, ICCV 2015]

7 End-to-end network for attribute localization and ranking. Our idea: jointly learn features, localizer, and ranker end-to-end using a deep network. [Singh and Lee, ECCV 2016]

8 End-to-end network for attribute localization and ranking. Our idea: jointly learn features, localizer, and ranker end-to-end using a deep network. Attribute: Smile. Training: training pairs. [Singh and Lee, ECCV 2016]

9 End-to-end network for attribute localization and ranking. Our idea: jointly learn features, localizer, and ranker end-to-end using a deep network. Attribute: Smile. Training: training pairs (Weak, Strong). Testing: test images.


10 Overview of our end-to-end approach. Goal: given pairs of ordered training images, simultaneously localize the attribute in each image and learn a ranker. [Figure: Siamese network with two branches (S1, S2); each branch maps an image (I1, I2) through Localization and Ranker to an output (V1, V2); both outputs feed a loss function. Attribute: Smile.] [Singh and Lee, ECCV 2016]

11 Our end-to-end approach. [Figure: architecture diagram; labels: input image I, Localization, θ, Grid generator, Ranker, output V; numeric label: 96.] [Singh and Lee, ECCV 2016]

12 Our end-to-end approach. [Figure: architecture diagram; labels: input image I, Localization, θ, Grid generator; numeric label: 96.] The localization network discovers the region of interest for the attribute. It learns transformation parameters that map the input to the output. Spatial Transformers [Jaderberg et al. 2015]. [Singh and Lee, ECCV 2016]
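
The grid generator step can be illustrated with a minimal sketch. It assumes the transformation parameters θ are an isotropic scale `s` and a translation `(tx, ty)` in normalized image coordinates, where the full image spans [-1, 1] on each axis; the function names `normalized_coords` and `crop_grid` are hypothetical and not taken from the slides. For each pixel of the output crop, the grid gives the input location to sample from.

```python
def normalized_coords(n):
    """n evenly spaced coordinates covering [-1, 1] (the centre if n == 1)."""
    if n == 1:
        return [0.0]
    return [-1.0 + 2.0 * k / (n - 1) for k in range(n)]


def crop_grid(s, tx, ty, out_h, out_w):
    """Sampling grid for a scale-and-translate transform (assumed form of theta).

    Returns an out_h x out_w nested list; entry [i][j] is the (x, y) location
    in the input image, in normalized coordinates, that output pixel (i, j)
    should be sampled from.
    """
    xs = normalized_coords(out_w)
    ys = normalized_coords(out_h)
    return [[(s * x + tx, s * y + ty) for x in xs] for y in ys]


# Identity transform: the output grid covers the whole input image.
full = crop_grid(1.0, 0.0, 0.0, 3, 3)

# Half-size crop shifted to the right half of the image.
right_half = crop_grid(0.5, 0.5, 0.0, 2, 2)
```

In a real network the sampled values would be bilinearly interpolated so that gradients can flow back to `s`, `tx`, and `ty`; that step is omitted here.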

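13 Our end-to-end approach. [Figure: architecture diagram; labels: input image I, Localization, θ, Grid generator, Ranker, output V1; numeric labels: 96, 8192.] The ranker network takes the localized region and produces a ranking score. It combines the localized region with the global image for global context. [Singh and Lee, ECCV 2016]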

14 Training. Loss function: cross entropy. [Figure: Siamese network with two branches (S1, S2); each branch maps an image (I1, I2) through Localization and Ranker to an output (V1, V2); both outputs feed the loss function. Attribute: Smile.] [Singh and Lee, ECCV 2016]
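
The formula for the cross-entropy loss is not given in the text above, so the sketch below assumes the common pairwise (RankNet-style) form: the probability that image 1 shows more of the attribute is modelled as sigmoid(v1 - v2), and the loss is the cross entropy against a label that is 1 when image 1 is the stronger one and 0 otherwise. The name `pairwise_rank_loss` is hypothetical.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def pairwise_rank_loss(v1, v2, label):
    """Cross entropy between label and P = sigmoid(v1 - v2).

    label = 1.0 means image 1 should rank higher, 0.0 means image 2 should.
    Uses log(P) = -softplus(-(v1 - v2)) and log(1 - P) = -softplus(v1 - v2).
    """
    diff = v1 - v2
    return label * softplus(-diff) + (1.0 - label) * softplus(diff)


# Equal scores give no preference, so the loss is log(2).
tie_loss = pairwise_rank_loss(0.0, 0.0, 1.0)

# A correctly ordered pair is penalized less than a wrongly ordered one.
good = pairwise_rank_loss(5.0, 0.0, 1.0)
bad = pairwise_rank_loss(0.0, 5.0, 1.0)
```

Because the loss depends only on the score difference, swapping the two images and flipping the label leaves it unchanged.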

15 Training. The localized region can fall outside the image bounds, making learning difficult. [Figure: Siamese network with two branches (S1, S2); each branch maps an image (I1, I2) through Localization and Ranker to an output (V1, V2); both outputs feed the loss function. Attribute: Smile.] [Singh and Lee, ECCV 2016]
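
One simple way to discourage out-of-bounds regions is an extra penalty term on the transformation parameters. The sketch below is an assumption for illustration, not necessarily the formulation used by the authors: it takes a scale-and-translate transform `(s, tx, ty)` in normalized coordinates, where the image spans [-1, 1], and sums the squared overshoot of the crop's four corners. The name `boundary_penalty` is hypothetical.

```python
def boundary_penalty(s, tx, ty):
    """Sum of squared overshoots of the crop corners beyond [-1, 1].

    Returns 0.0 when the whole crop lies inside the image.
    """
    penalty = 0.0
    for cx in (-1.0, 1.0):
        for cy in (-1.0, 1.0):
            x = s * cx + tx  # corner position along the horizontal axis
            y = s * cy + ty  # corner position along the vertical axis
            penalty += max(0.0, abs(x) - 1.0) ** 2
            penalty += max(0.0, abs(y) - 1.0) ** 2
    return penalty


inside = boundary_penalty(0.5, 0.5, 0.0)   # crop touches the right edge
outside = boundary_penalty(0.5, 1.0, 0.0)  # crop extends past the right edge
```

Adding such a term to the ranking loss gives the localizer a gradient that pushes the region back inside the image.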

16 Training. Optimized using backpropagation with mini-batch stochastic gradient descent. [Figure: Siamese network with two branches (S1, S2); each branch maps an image (I1, I2) through Localization and Ranker to an output (V1, V2); both outputs feed the loss function. Attribute: Smile.] [Singh and Lee, ECCV 2016]
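
To make the training loop concrete, here is a toy sketch under strong simplifying assumptions: the two Siamese branches are replaced by a single linear scorer with shared weights `w`, the inputs are small hand-made feature vectors rather than images, and the loss is the pairwise cross entropy on sigmoid(v1 - v2). All names are hypothetical, and mini-batches are taken in fixed order (no shuffling) so the run is deterministic.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def score(w, x):
    """Shared-weight scorer applied to either image of a pair."""
    return sum(wi * xi for wi, xi in zip(w, x))


def sgd_train(pairs, dim, lr=0.5, epochs=50, batch_size=2):
    """Mini-batch gradient descent on the pairwise cross-entropy loss.

    pairs: list of (x1, x2, label); label is 1.0 if x1 should rank higher.
    The gradient of the loss w.r.t. w is (P - label) * (x1 - x2),
    with P = sigmoid(score(x1) - score(x2)).
    """
    w = [0.0] * dim
    for _ in range(epochs):
        for start in range(0, len(pairs), batch_size):
            batch = pairs[start:start + batch_size]
            grad = [0.0] * dim
            for x1, x2, label in batch:
                p = sigmoid(score(w, x1) - score(w, x2))
                for k in range(dim):
                    grad[k] += (p - label) * (x1[k] - x2[k])
            for k in range(dim):
                w[k] -= lr * grad[k] / len(batch)
    return w


# Toy data: the first feature decides the ordering, the second is a distractor.
toy_pairs = [
    ([1.0, 0.0], [0.0, 0.0], 1.0),
    ([0.2, 1.0], [0.9, 1.0], 0.0),
    ([0.8, 0.5], [0.1, 0.5], 1.0),
    ([0.0, 0.3], [0.6, 0.3], 0.0),
]
weights = sgd_train(toy_pairs, dim=2)
```

Because both images in a pair are scored with the same `w`, every update changes both branches at once, which is the point of the Siamese setup.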

17 Progression of localized region over training epochs. [Figure: heatmaps over training epochs for Attribute: Dark hair and Attribute: Smile. Heatmap: distribution of the localized region across the entire training dataset.] [Singh and Lee, ECCV 2016]

18 Testing. Localize the relevant attribute region. Produce a ranking score for the test image. [Figure: a single Siamese branch (S1) maps the test image through Localization and Ranker to an output V test.] [Singh and Lee, ECCV 2016]

19 Experiments: Relative attribute datasets. LFW-10 (2k images) [Sandeep et al., CVPR 2014]: Visible teeth, Eyes open, Dark hair, Smile, Good looking... UT-Zap50K (50k images) [Yu & Grauman, CVPR 2014]: Pointy, Open, Sporty, Comfort. [Singh and Lee, ECCV 2016]

20 Results: Discovered regions and ranking on LFW-10 Faces. [Figure: faces ordered from Weak to Strong for Smile, Bald, Dark hair, Eyes open.] Our network discovers relevant attribute regions. This leads to accurate rankings.

21 Results: Discovered regions and ranking on LFW-10 Faces. [Figure: faces ordered from Weak to Strong for Good looking, Masculine, Young.] Global attributes are harder to interpret; the network focuses more on larger areas. [Singh and Lee, ECCV 2016]

22 Results: Discovered regions and ranking on UT-Zap50K Shoes. [Figure: shoes ordered from Weak to Strong for Comfort, Open, Pointy, Sporty.] [Singh and Lee, ECCV 2016]

23 Results: Image pair ranking accuracy. Metric: % of test image pairs whose predicted relative attribute ranking is correct. State-of-the-art results on LFW-10, UT-Zap50K, OSR, Shoe-with-Attribute. Combining global image context with localized fine-grained information performs best. [Singh and Lee, ECCV 2016]
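
The accuracy metric described above can be sketched in a few lines. The function name `pair_accuracy`, the image identifiers, and the scores are made up for illustration; each test pair is assumed to be given as (stronger image, weaker image) according to the ground truth.

```python
def pair_accuracy(scores, pairs):
    """Percentage of pairs whose predicted ordering matches the ground truth.

    scores: dict mapping image id -> predicted ranking score.
    pairs:  list of (stronger_id, weaker_id) ground-truth orderings.
    """
    correct = 0
    for stronger, weaker in pairs:
        if scores[stronger] > scores[weaker]:
            correct += 1
    return 100.0 * correct / len(pairs)


# Hypothetical scores for three test images.
example_scores = {"a": 0.9, "b": 0.1, "c": 0.5}
example_pairs = [("a", "b"), ("c", "a"), ("c", "b")]
accuracy = pair_accuracy(example_scores, example_pairs)
```

Here two of the three pairs are ordered correctly, since the model scores "a" above "c" while the ground truth says the opposite.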

24 Conclusions. Novel end-to-end network for ranking and localizing attributes. State-of-the-art attribute ranking performance on benchmark face, shoe, and outdoor scene datasets. Our approach is 100 times faster than [Xiao & Lee].

25 Question. What if we used multiple localization networks instead of one to get better performance? (For example, we could also use features from the eye region to help rank the smile attribute.)


End-to-End Localization and Ranking for Relative Attributes Krishna Kumar Singh and Yong Jae Lee University of California, Davis Abstract. We propose an end-to-end deep convolutional network to simultaneously