AttentionNet for Accurate Localization and Detection of Objects. (To appear in ICCV 2015)

2 AttentionNet for Accurate Localization and Detection of Objects. (To appear in ICCV 2015) Donggeun Yoo, Sunggyun Park, Joon-Young Lee, Anthony Paek, In So Kweon.

10 State-of-the-art frameworks for object detection. 1. Region-CNN framework. [Gkioxari et al., CVPR 14] Pipeline: object proposal, CNN, SVM, NMS, BB Reg. Drawback: the maximally scored region is prone to focus on a discriminative part (e.g. face) rather than the entire object (e.g. human body).

16 State-of-the-art frameworks for object detection. 2. Detection by CNN-regression. [Szegedy et al., NIPS 13] The CNN outputs the box coordinates x1, y1, x2, y2, i.e. the corners (X1, Y1) and (X2, Y2). Drawback: direct mapping from an image to an exact bounding box is relatively difficult for a CNN.

26 Idea: Ensemble of weak predictions. [Figure: build-up of weak predictions; two "Stop signal" labels.]

30 Model: Rather than a CNN regression model, use a CNN classification model. Layers, input to output: Convolution, Normalization, Pooling, Convolution, Normalization, Pooling, Convolution, Convolution, Convolution, Fully connected, Fully connected. Two outputs: top-left direction prediction and bottom-right direction prediction, each a vector in R^5: [ 3 directions, stop signal, no object ].
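
A minimal sketch of how the two 5-way outputs could be read, assuming each output is a vector of logits passed through a softmax. The class names and their order (`TL_CLASSES`, `BR_CLASSES`) are hypothetical: the slides only say "[ 3 directions, stop signal, no object ]".

```python
import numpy as np

# Assumed class order for each output (hypothetical; not given on the slides).
TL_CLASSES = ["right", "down-right", "down", "stop", "no-object"]
BR_CLASSES = ["left", "up-left", "up", "stop", "no-object"]

def softmax(z):
    """Numerically stable softmax over a 1-D vector of logits."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())
    return e / e.sum()

def decode(tl_logits, br_logits):
    """Return the most probable class name for the top-left and bottom-right outputs."""
    p_tl, p_br = softmax(tl_logits), softmax(br_logits)
    return TL_CLASSES[int(p_tl.argmax())], BR_CLASSES[int(p_br.argmax())]

tl_action, br_action = decode([0.1, 2.0, 0.3, -1.0, -2.0],
                              [0.0, 0.2, 0.1, 3.0, -1.0])
```

Treating localization as two small classification problems means each prediction only has to be roughly right, which is the "weak prediction" the earlier slides refer to.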

32 Iterative test: Ensemble of weak directions.
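
The iterative test can be sketched as a loop that moves the two corners until both report a stop signal. This is an illustration under stated assumptions, not the authors' implementation: the step size `STEP`, the direction names, and `toy_predictor` (a stand-in for the CNN that simply compares the box with a known target) are all hypothetical.

```python
STEP = 10  # pixels moved per iteration (assumed value)

# Assumed moves per action; (dx, dy) in image coordinates with y pointing down.
TL_MOVES = {"right": (STEP, 0), "down-right": (STEP, STEP), "down": (0, STEP)}
BR_MOVES = {"left": (-STEP, 0), "up-left": (-STEP, -STEP), "up": (0, -STEP)}

def toy_predictor(box, target):
    """Hypothetical stand-in for the CNN: derives actions from a known target box."""
    x1, y1, x2, y2 = box
    tx1, ty1, tx2, ty2 = target
    need_x, need_y = tx1 - x1 >= STEP, ty1 - y1 >= STEP
    tl = {(True, True): "down-right", (True, False): "right",
          (False, True): "down", (False, False): "stop"}[(need_x, need_y)]
    need_x, need_y = x2 - tx2 >= STEP, y2 - ty2 >= STEP
    br = {(True, True): "up-left", (True, False): "left",
          (False, True): "up", (False, False): "stop"}[(need_x, need_y)]
    return tl, br

def iterative_test(box, predict, max_iters=100):
    """Move both corners until both say "stop"; return None on "no-object"."""
    x1, y1, x2, y2 = box
    for _ in range(max_iters):
        tl, br = predict((x1, y1, x2, y2))
        if "no-object" in (tl, br):
            return None
        if tl == "stop" and br == "stop":
            return (x1, y1, x2, y2)
        dx, dy = TL_MOVES.get(tl, (0, 0))  # a corner that says "stop" stays put
        x1, y1 = x1 + dx, y1 + dy
        dx, dy = BR_MOVES.get(br, (0, 0))
        x2, y2 = x2 + dx, y2 + dy
    return (x1, y1, x2, y2)  # iteration budget exhausted

target = (40, 30, 160, 120)
result = iterative_test((0, 0, 200, 200), lambda b: toy_predictor(b, target))
```

With the toy predictor, the box shrinks from the full 200 x 200 region onto the target in eight iterations; each step is a coarse direction, and the final box is the accumulated result of those steps.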

41 Training AttentionNet. 1. Generating training samples.

42 Training AttentionNet. 2. Minimizing the loss function by back-propagation and stochastic gradient descent. $L = \frac{1}{2} L_{\mathrm{softmax}}(y_{TL}, t_{TL}) + \frac{1}{2} L_{\mathrm{softmax}}(y_{BR}, t_{BR})$.
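
A small numerical sketch of this loss, reading it as the average of two softmax (cross-entropy) losses, one per corner. The function names are illustrative, and the targets are assumed to be class indices into the 5-way outputs.

```python
import numpy as np

def softmax_loss(logits, target):
    """Cross-entropy of softmax(logits) against the class index `target`."""
    logits = np.asarray(logits, dtype=float)
    shifted = logits - logits.max()              # for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return -log_probs[target]

def attention_loss(y_tl, t_tl, y_br, t_br):
    """Equal-weight sum of the top-left and bottom-right softmax losses."""
    return 0.5 * softmax_loss(y_tl, t_tl) + 0.5 * softmax_loss(y_br, t_br)

# With all-zero logits both outputs are uniform over 5 classes, so the loss is ln(5).
uniform = attention_loss(np.zeros(5), 1, np.zeros(5), 3)
```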

43 Result. (Good examples.)

45 Result. (Bad examples.)

47 How to detect multiple instances?

48 Extension to multiple-instance: 1. Fast multi-scale sliding window search using fully-convolutional network.

53 *Fast extraction of multi-scale dense activations. Idea: A fully connected layer can be implemented equivalently as a convolutional layer. Original network: Conv. 1, Conv. 2, Conv. 3, Conv. 4, Conv. 5, FC 6, FC 7, FC 8. Converted network: Conv. 1, Conv. 2, Conv. 3, Conv. 4, Conv. 5, Conv. 6, Conv. 7, FC 8.
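
The equivalence can be checked numerically. The sketch below uses small made-up sizes (`C`, `K`, `OUT`) and random weights: it reshapes the weights of a fully connected layer into `K x K` kernels and slides them over a larger feature map, which yields one fully connected output per patch.

```python
import numpy as np

rng = np.random.default_rng(0)
C, K, OUT = 3, 4, 5                         # channels, patch size seen by the FC layer, outputs
W = rng.standard_normal((OUT, C * K * K))   # fully connected weights
b = rng.standard_normal(OUT)

def fc(patch):
    """Fully connected layer applied to one C x K x K feature patch."""
    return W @ patch.reshape(-1) + b

def fc_as_conv(fmap):
    """The same weights applied as a K x K convolution (stride 1, no padding)."""
    kernels = W.reshape(OUT, C, K, K)
    H, Wd = fmap.shape[1] - K + 1, fmap.shape[2] - K + 1
    out = np.empty((OUT, H, Wd))
    for i in range(H):
        for j in range(Wd):
            window = fmap[:, i:i + K, j:j + K]
            out[:, i, j] = np.tensordot(kernels, window, axes=3) + b
    return out

fmap = rng.standard_normal((C, 6, 7))       # larger than K x K, so the output is a dense grid
dense = fc_as_conv(fmap)
```

Each column `dense[:, i, j]` equals `fc` applied to the patch at `(i, j)`, so one pass over a large input replaces many separate passes over overlapping patches.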

58 *Fast extraction of multi-scale dense activations. Multi-scale dense activations (4,096 dimensions): each activation vector comes from one patch.

63 Extension to multiple-instance: 2. Early rejection with { TL, BR } constraint. A window that satisfies { TL, BR } starts the iterative test; a window that does not satisfy { TL, BR } is rejected.
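
A minimal sketch of this filtering step. The required pair of predictions is passed in as a parameter because the direction symbols inside { TL, BR } did not survive in the slide text; the labels used in the example are hypothetical placeholders.

```python
def early_reject(windows, first_predictions, required):
    """Keep only windows whose first (TL, BR) prediction equals `required`."""
    return [w for w, p in zip(windows, first_predictions)
            if tuple(p) == tuple(required)]

# Hypothetical labels standing in for the missing { TL, BR } symbols.
required = ("down-right", "up-left")
windows = [(0, 0, 64, 64), (32, 0, 96, 64), (64, 0, 128, 64)]
first_predictions = [("down-right", "up-left"),
                     ("stop", "up-left"),
                     ("no-object", "no-object")]
candidates = early_reject(windows, first_predictions, required)
```

Only the surviving candidates would go on to the iterative test, so most sliding windows cost a single forward pass.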

64 Extension to multiple-instance: Overall architecture for sliding window search.

65 Extension to multiple-instance: Merging multiple bounding boxes.
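
The slides do not state the merging rule. As an illustration only, the sketch below groups boxes greedily by intersection-over-union (IoU) and averages each group; the threshold value and the averaging are assumptions, not the authors' procedure.

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union

def merge_boxes(boxes, iou_threshold=0.6):
    """Assign each box to the first group whose seed it overlaps, then average groups."""
    groups = []
    for box in boxes:
        for group in groups:
            if iou(box, group[0]) >= iou_threshold:
                group.append(box)
                break
        else:
            groups.append([box])  # no overlapping group: start a new one
    return [tuple(np.mean(g, axis=0)) for g in groups]

merged = merge_boxes([(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)])
```

Here the first two boxes overlap strongly and collapse into one averaged box, while the third stays separate.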

72 Evaluation on PASCAL VOC Series. [Result tables, labels only.] PASCAL VOC 2007 Person: RCNN, AttentionNet, AttentionNet+RCNN. PASCAL VOC 2012 Person: RCNN-based, AttentionNet, AttentionNet+RCNN.

Evaluation on PASCAL VOC Series. PASCAL VOC 2007 Person. 58.

73 Evaluation on PASCAL VOC Series. [Figure: precision-recall curve on PASCAL VOC 2007 Person. Panels: PASCAL VOC 2007 Person, PASCAL VOC 2012 Person.]
