AttentionNet for Accurate Localization and Detection of Objects. (To appear in ICCV 2015) Donggeun Yoo, Sunggyun Park, Joon-Young Lee, Anthony Paek, In So Kweon.
State-of-the-art frameworks for object detection. 1. Region-CNN framework [Gkioxari et al., CVPR 14]. Pipeline: object proposal → CNN → SVM → NMS and BB Reg. (bounding-box regression). Drawback: the maximally scored region tends to focus on a discriminative part (e.g., the face) rather than the entire object (e.g., the human body).
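To make the NMS stage concrete, here is a minimal sketch of standard greedy non-maximum suppression over scored boxes. It is a generic version, not the exact R-CNN code. The names `iou_one_to_many` and `greedy_nms` and the 0.3 IoU threshold are illustrative assumptions. The demo shows why a high-scoring part box can win: a small face box overlaps the full-body box very little, so NMS does not merge or suppress either one.

```python
import numpy as np


def iou_one_to_many(box, boxes):
    """IoU between one (x1, y1, x2, y2) box and every row of `boxes`."""
    ix1 = np.maximum(box[0], boxes[:, 0])
    iy1 = np.maximum(box[1], boxes[:, 1])
    ix2 = np.minimum(box[2], boxes[:, 2])
    iy2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(ix2 - ix1, 0, None) * np.clip(iy2 - iy1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)


def greedy_nms(boxes, scores, iou_thresh=0.3):
    """Keep the best-scoring box, drop boxes overlapping it above `iou_thresh`, repeat."""
    boxes = np.asarray(boxes, dtype=float)
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    keep = []
    while order.size > 0:
        best, rest = order[0], order[1:]
        keep.append(int(best))
        # Remaining boxes that overlap the kept box too much are suppressed.
        order = rest[iou_one_to_many(boxes[best], boxes[rest]) <= iou_thresh]
    return keep


if __name__ == "__main__":
    body = (0, 0, 100, 200)   # whole person
    face = (30, 0, 70, 40)    # discriminative part, scored higher
    print(greedy_nms([body, face], [0.6, 0.9]))  # -> [1, 0]
```

The face box ranks first. Their IoU is only 0.08, so the body box survives only as a separate, lower-scored detection. Score-based NMS has no notion that the top box covers just part of the person.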
State-of-the-art frameworks for object detection. 2. Detection by CNN regression [Szegedy et al., NIPS 13]. A CNN directly regresses the box coordinates (x1, y1, x2, y2): the top-left corner (x1, y1) and the bottom-right corner (x2, y2). Drawback: a direct mapping from an image to an exact bounding box is relatively difficult for a CNN to learn.
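Idea: Ensemble of weak predictions. Instead of predicting an exact box in one shot, repeatedly predict a weak direction for each corner of the box and move toward the object until both corners output a stop signal.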
Model: Rather than a CNN regression model, use a CNN classification model. From input to output: Convolution → Normalization → Pooling → Convolution → Normalization → Pooling → Convolution → Convolution → Convolution → Fully connected → Fully connected, followed by two prediction layers: a top-left (TL) direction prediction and a bottom-right (BR) direction prediction. Each prediction is a 5-way output in $\mathbb{R}^5$: [3 directions, stop signal, no object].
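Because each head is a classifier, not a regressor, decoding its output is just a softmax followed by an argmax over five classes. The sketch below assumes hypothetical label names. In particular, the three directions per corner (TL: right, down-right, down; BR: left, up-left, up) are an assumption chosen so that the window can only shrink toward the object.

```python
import numpy as np

# Hypothetical label orders, following the slide's [3 directions, stop signal, no object].
TL_LABELS = ("right", "down-right", "down", "stop", "no_object")
BR_LABELS = ("left", "up-left", "up", "stop", "no_object")


def softmax(z):
    """Softmax over one head's scores (max-shifted for numerical stability)."""
    e = np.exp(z - np.max(z))
    return e / e.sum()


def decode_heads(logits_tl, logits_br):
    """Turn the two 5-dim score vectors into (TL label, BR label, TL probs, BR probs)."""
    p_tl = softmax(np.asarray(logits_tl, dtype=float))
    p_br = softmax(np.asarray(logits_br, dtype=float))
    return TL_LABELS[int(np.argmax(p_tl))], BR_LABELS[int(np.argmax(p_br))], p_tl, p_br


if __name__ == "__main__":
    tl, br, _, _ = decode_heads([0.1, 2.0, 0.3, 0.0, -1.0], [0.0, 0.2, 0.1, 3.0, -1.0])
    print(tl, br)  # -> down-right stop
```

The two heads are independent softmaxes. This lets one corner stop while the other keeps moving.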
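Iterative test: Ensemble of weak directions. At test time, each corner of the current window moves a small step in its predicted direction and the updated window is fed back to the network; this repeats until both corners output the stop signal, and the final window is the detection.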
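The loop below is a minimal sketch of this iterative test, under several assumptions. The CNN is abstracted as a `predict(window)` callable, whereas the real system would crop and resize the image region. Each corner moves a fixed `step` (8 pixels here, a hypothetical value). The direction sets follow the assumed labels (TL moves right/down, BR moves left/up). `make_oracle` is a hypothetical stand-in that points toward a known box, so the loop can run without a trained network.

```python
from typing import Callable, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2); x grows right, y grows down

# Assumed unit moves per label: TL only moves right/down, BR only left/up.
TL_MOVES = {"right": (1, 0), "down-right": (1, 1), "down": (0, 1), "stop": (0, 0)}
BR_MOVES = {"left": (-1, 0), "up-left": (-1, -1), "up": (0, -1), "stop": (0, 0)}


def iterative_test(window: Box, predict: Callable[[Box], Tuple[str, str]],
                   step: int = 8, max_iter: int = 100) -> Optional[Box]:
    """Move both corners by `step` pixels per iteration until both predict "stop".

    Returns the final box, or None if either corner predicts "no_object".
    """
    x1, y1, x2, y2 = window
    for _ in range(max_iter):
        tl, br = predict((x1, y1, x2, y2))
        if tl == "no_object" or br == "no_object":
            return None
        if tl == "stop" and br == "stop":
            break
        dx, dy = TL_MOVES[tl]
        x1, y1 = x1 + step * dx, y1 + step * dy
        dx, dy = BR_MOVES[br]
        x2, y2 = x2 + step * dx, y2 + step * dy
    return (x1, y1, x2, y2)  # also returned if the iteration budget runs out


def make_oracle(gt: Box, tol: int) -> Callable[[Box], Tuple[str, str]]:
    """Hypothetical stand-in for the CNN: points each corner toward box `gt`."""
    def predict(w: Box) -> Tuple[str, str]:
        x1, y1, x2, y2 = w
        if x1 > gt[0] or y1 > gt[1] or x2 < gt[2] or y2 < gt[3]:
            return "no_object", "no_object"  # the window has cut into the object
        right, down = gt[0] - x1 > tol, gt[1] - y1 > tol
        left, up = x2 - gt[2] > tol, y2 - gt[3] > tol
        tl = {(True, True): "down-right", (True, False): "right",
              (False, True): "down", (False, False): "stop"}[(right, down)]
        br = {(True, True): "up-left", (True, False): "left",
              (False, True): "up", (False, False): "stop"}[(left, up)]
        return tl, br
    return predict


if __name__ == "__main__":
    oracle = make_oracle(gt=(40, 60, 120, 150), tol=8)
    print(iterative_test((0, 0, 200, 200), oracle, step=8))  # -> (32, 56, 128, 152)
    print(iterative_test((50, 0, 200, 200), oracle, step=8))  # -> None
```

Each single prediction is only a coarse direction. The accuracy comes from accumulating many such weak steps. The final box lands within one step of the target on every side.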
Training AttentionNet. 1. Generating training samples.
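One plausible way to generate labeled samples is sketched here: take random crops around a ground-truth box and label each corner with the direction it should move. The labeling rule is a hypothetical illustration, not necessarily the authors' procedure. It covers the tolerance `tol` for "stop", treats crops that cut off the object as "no object" negatives, and uses the direction names from the earlier assumed label sets.

```python
import random
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2)


def corner_labels(crop: Box, gt: Box, tol: int) -> Tuple[str, str]:
    """Hypothetical rule: the direction each crop corner should move toward `gt`."""
    x1, y1, x2, y2 = crop
    # Simplifying assumption: a crop cutting off more than `tol` px of the object is a negative.
    if x1 > gt[0] + tol or y1 > gt[1] + tol or x2 < gt[2] - tol or y2 < gt[3] - tol:
        return "no_object", "no_object"
    right, down = gt[0] - x1 > tol, gt[1] - y1 > tol
    left, up = x2 - gt[2] > tol, y2 - gt[3] > tol
    tl = {(True, True): "down-right", (True, False): "right",
          (False, True): "down", (False, False): "stop"}[(right, down)]
    br = {(True, True): "up-left", (True, False): "left",
          (False, True): "up", (False, False): "stop"}[(left, up)]
    return tl, br


def sample_positive_crops(img_w: int, img_h: int, gt: Box, n: int, tol: int,
                          seed: int = 0) -> List[Tuple[Box, Tuple[str, str]]]:
    """Draw `n` random crops that enclose `gt` and attach their (TL, BR) labels."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        crop = (rng.randint(0, gt[0]), rng.randint(0, gt[1]),
                rng.randint(gt[2], img_w), rng.randint(gt[3], img_h))
        samples.append((crop, corner_labels(crop, gt, tol)))
    return samples


if __name__ == "__main__":
    gt = (40, 60, 120, 150)
    print(corner_labels((0, 0, 200, 200), gt, tol=8))    # -> ('down-right', 'up-left')
    print(corner_labels((36, 58, 124, 152), gt, tol=8))  # -> ('stop', 'stop')
    print(corner_labels((50, 0, 200, 200), gt, tol=8))   # -> ('no_object', 'no_object')
```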
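Training AttentionNet. 2. Minimizing the loss function by back-propagation and stochastic gradient descent:
$$L = \frac{1}{2} L_{\mathrm{softmax}}(y_{TL}, t_{TL}) + \frac{1}{2} L_{\mathrm{softmax}}(y_{BR}, t_{BR}),$$
where $y_{TL}$ and $y_{BR}$ are the outputs of the TL and BR prediction layers, and $t_{TL}$ and $t_{BR}$ are the corresponding target labels.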
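The loss L = ½ L_softmax(y_TL, t_TL) + ½ L_softmax(y_BR, t_BR) can be written out directly. This sketch reads each y as a head's 5-dim score vector and each t as a target class index, which is our interpretation of the notation. It also gives the per-head gradient that back-propagation starts from: 0.5 · (softmax(y) − one_hot(t)).

```python
import numpy as np


def softmax_loss(logits, target: int) -> float:
    """L_softmax: cross-entropy of a softmax over one head's scores."""
    z = np.asarray(logits, dtype=float)
    z = z - z.max()                              # shift for numerical stability
    log_probs = z - np.log(np.exp(z).sum())
    return float(-log_probs[target])


def attentionnet_loss(y_tl, t_tl: int, y_br, t_br: int) -> float:
    """L = 1/2 * L_softmax(y_TL, t_TL) + 1/2 * L_softmax(y_BR, t_BR)."""
    return 0.5 * softmax_loss(y_tl, t_tl) + 0.5 * softmax_loss(y_br, t_br)


def head_gradient(logits, target: int, weight: float = 0.5) -> np.ndarray:
    """dL/d(logits) for one head: weight * (softmax(logits) - one_hot(target))."""
    z = np.asarray(logits, dtype=float)
    p = np.exp(z - z.max())
    p /= p.sum()
    p[target] -= 1.0
    return weight * p


if __name__ == "__main__":
    y = np.zeros(5)                          # uniform scores: each class gets probability 1/5
    print(attentionnet_loss(y, 0, y, 3))     # equals log(5), about 1.609
```

The two heads share the trunk but contribute separate, equally weighted terms. Their gradients simply add at the last shared fully connected layer.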
Result. (Good examples.)
Result. (Bad examples.)
How to detect multiple instances?
Extension to multiple instances: 1. Fast multi-scale sliding-window search using a fully convolutional network.
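*Fast extraction of multi-scale dense activations. Idea: a fully connected layer can be implemented equivalently as a convolutional layer. The network is built for 227×227×3 inputs (Conv. 1-5, FC 6, FC 7, FC 8). It is applied to a larger 322×322×3 input by replacing FC 6 and FC 7 with equivalent convolutional layers (Conv. 6, Conv. 7), so a single forward pass yields activations for many window positions at once.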
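The equivalence can be checked numerically. An FC layer that sees a C×k×k input is the same as a k×k "valid" convolution whose filters are the FC weight rows reshaped to (C, k, k). On a larger input, that convolution evaluates the FC layer at every k×k window. The sizes below (C=3, k=6, 4 outputs) are small illustrative values, not the real network's.

```python
import numpy as np


def fc(x, W, b):
    """Fully connected layer on a (C, k, k) input flattened in C-order."""
    return W @ x.reshape(-1) + b


def fc_as_conv(x, W, b, k):
    """The same weights applied as a k x k, stride-1, 'valid' convolution over (C, H, W)."""
    C, H, Wd = x.shape
    kernels = W.reshape(W.shape[0], C, k, k)      # one (C, k, k) filter per output unit
    out = np.empty((W.shape[0], H - k + 1, Wd - k + 1))
    for i in range(H - k + 1):
        for j in range(Wd - k + 1):
            patch = x[:, i:i + k, j:j + k]
            out[:, i, j] = np.tensordot(kernels, patch, axes=([1, 2, 3], [0, 1, 2])) + b
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    C, k, O = 3, 6, 4
    W, b = rng.standard_normal((O, C * k * k)), rng.standard_normal(O)
    x_small = rng.standard_normal((C, k, k))       # "training-size" input: one output vector
    print(np.allclose(fc_as_conv(x_small, W, b, k)[:, 0, 0], fc(x_small, W, b)))  # -> True
    x_big = rng.standard_normal((C, 9, 9))         # larger input: a 4 x 4 grid of outputs
    print(fc_as_conv(x_big, W, b, k).shape)        # -> (4, 4, 4)
```

No weights change. Only the input grows, and each output cell is the FC layer applied to one window of the feature map.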
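*Fast extraction of multi-scale dense activations. Running the fully convolutional network on the image at several scales produces multi-scale dense activations: a spatial grid of 4,096-dimensional activation vectors per scale, where each vector comes from one image patch.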
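To use these activations as sliding windows, each grid cell has to be mapped back to the patch it came from. The sketch assumes an idealized network with a 227-pixel window and a total stride of 32 pixels (typical of AlexNet-style networks, but an assumption here). Real layer padding and rounding can shift the grid size by a row or column. The names `window_for_cell` and `dense_windows` are hypothetical.

```python
def window_for_cell(i, j, scale, stride=32, window=227):
    """Box (x1, y1, x2, y2), in original-image pixels, seen by output cell (i, j)
    when the network runs on the image resized by `scale`."""
    x1, y1 = j * stride, i * stride
    return tuple(v / scale for v in (x1, y1, x1 + window, y1 + window))


def dense_windows(img_w, img_h, scales, stride=32, window=227):
    """Enumerate every (scale, i, j, box) window covered by a multi-scale dense pass."""
    out = []
    for s in scales:
        w, h = int(round(img_w * s)), int(round(img_h * s))
        if w < window or h < window:
            continue  # image too small at this scale for even one window
        rows = (h - window) // stride + 1
        cols = (w - window) // stride + 1
        for i in range(rows):
            for j in range(cols):
                out.append((s, i, j, window_for_cell(i, j, s, stride, window)))
    return out


if __name__ == "__main__":
    wins = dense_windows(322, 322, scales=[1.0, 0.5])
    print(len(wins))  # -> 9 (a 3 x 3 grid at scale 1.0; scale 0.5 is too small)
    print(wins[-1])   # -> (1.0, 2, 2, (64.0, 64.0, 291.0, 291.0))
```

Smaller scales map each cell to a larger region of the original image. A fixed-size window therefore covers objects of different sizes.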
Extension to multiple instances: 2. Early rejection with the { TL, BR } constraint. Windows whose TL and BR predictions satisfy the constraint start the iterative test; windows that do not satisfy it are rejected immediately.
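Early rejection is a cheap filter on the first-pass predictions. The dense pass already yields a (TL, BR) label pair per window, so only the surviving windows pay for the iterative test. The exact constraint is not spelled out in the text. The default sets below (reject only when a corner says "no_object") are a placeholder assumption, and a stricter rule can be passed in, such as requiring TL = down-right and BR = up-left.

```python
from typing import FrozenSet, Iterable, List, Tuple

Window = Tuple[float, float, float, float]  # (x1, y1, x2, y2)

# Placeholder constraint (an assumption): accept any label except "no_object".
ALLOWED_TL = frozenset({"right", "down-right", "down", "stop"})
ALLOWED_BR = frozenset({"left", "up-left", "up", "stop"})


def early_reject(windows: Iterable[Window], predictions: Iterable[Tuple[str, str]],
                 allowed_tl: FrozenSet[str] = ALLOWED_TL,
                 allowed_br: FrozenSet[str] = ALLOWED_BR) -> List[Window]:
    """Return the windows whose first-pass (TL, BR) prediction satisfies the constraint.

    The k-th prediction belongs to the k-th window; survivors go on to the iterative test.
    """
    return [w for w, (tl, br) in zip(windows, predictions)
            if tl in allowed_tl and br in allowed_br]


if __name__ == "__main__":
    wins = [(0, 0, 227, 227), (32, 0, 259, 227), (64, 0, 291, 227)]
    preds = [("down-right", "up-left"), ("no_object", "no_object"), ("stop", "left")]
    print(early_reject(wins, preds))  # -> [(0, 0, 227, 227), (64, 0, 291, 227)]
    strict = early_reject(wins, preds, frozenset({"down-right"}), frozenset({"up-left"}))
    print(strict)                     # -> [(0, 0, 227, 227)]
```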
Extension to multiple instances: Overall architecture for sliding-window search.
Extension to multiple instances: Merging multiple bounding boxes.
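Several windows can converge on the same object, so their final boxes need to be merged. The slides do not state the merging rule. This sketch swaps in one common choice: greedily group boxes that overlap the current top-scoring box by at least `iou_thresh` (0.5, an assumed value), then replace each group with its score-weighted average box. Unlike NMS, it averages the group rather than keeping only one member.

```python
import numpy as np


def iou_one_to_many(box, boxes):
    """IoU between one (x1, y1, x2, y2) box and every row of `boxes`."""
    ix1 = np.maximum(box[0], boxes[:, 0])
    iy1 = np.maximum(box[1], boxes[:, 1])
    ix2 = np.minimum(box[2], boxes[:, 2])
    iy2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(ix2 - ix1, 0, None) * np.clip(iy2 - iy1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)


def merge_boxes(boxes, scores, iou_thresh=0.5):
    """Return a list of (merged box, max score) pairs, one per overlap group."""
    boxes = np.asarray(boxes, dtype=float)
    scores = np.asarray(scores, dtype=float)
    remaining = list(np.argsort(-scores, kind="stable"))
    merged = []
    while remaining:
        top, rest = remaining[0], remaining[1:]
        ious = iou_one_to_many(boxes[top], boxes[rest])
        # The top box always belongs to its own group, so the loop always progresses.
        group = [top] + [r for r, v in zip(rest, ious) if v >= iou_thresh]
        w = scores[group]
        merged.append(((w[:, None] * boxes[group]).sum(axis=0) / w.sum(), float(w.max())))
        remaining = [r for r in remaining if r not in group]
    return merged


if __name__ == "__main__":
    out = merge_boxes([[0, 0, 10, 10], [2, 0, 12, 10], [50, 50, 60, 60]], [0.6, 0.4, 0.5])
    for box, score in out:
        print(box, score)  # first: [0.8, 0, 10.8, 10] with 0.6; second: [50, 50, 60, 60] with 0.5
```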
Evaluation on PASCAL VOC Series. PASCAL VOC 2007 Person: RCNN (58.7), AttentionNet, and AttentionNet+RCNN are compared. PASCAL VOC 2012 Person: RCNN-based methods, AttentionNet, and AttentionNet+RCNN are compared. [Figure: precision-recall curve on PASCAL VOC 2007 Person.]
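The VOC 2007 protocol summarizes a precision-recall curve as 11-point interpolated average precision: the mean, over recall thresholds 0, 0.1, ..., 1, of the best precision reached at or beyond each threshold. The sketch is simplified. It assumes detections were already matched to ground truth (IoU ≥ 0.5, duplicates counted as false positives), giving the `is_tp` flags. The function names are illustrative.

```python
import numpy as np


def pr_curve(scores, is_tp, n_positives):
    """Precision and recall after each detection, taken in descending score order."""
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    tp = np.asarray(is_tp, dtype=float)[order]
    tp_cum, fp_cum = np.cumsum(tp), np.cumsum(1.0 - tp)
    recall = tp_cum / n_positives
    precision = tp_cum / (tp_cum + fp_cum)
    return recall, precision


def voc07_ap(recall, precision):
    """11-point interpolated AP as in the PASCAL VOC 2007 protocol."""
    recall = np.asarray(recall, dtype=float)
    precision = np.asarray(precision, dtype=float)
    ap = 0.0
    for t in np.linspace(0.0, 1.0, 11):
        mask = recall >= t
        ap += (precision[mask].max() if mask.any() else 0.0) / 11.0
    return ap


if __name__ == "__main__":
    rec, prec = pr_curve(scores=[0.9, 0.8], is_tp=[0, 1], n_positives=1)
    print(voc07_ap(rec, prec))  # -> 0.5 (up to float rounding)
```

Later VOC editions (2010 onward) replaced the 11 sampled thresholds with the area under the full interpolated curve. Scores from different years are therefore not strictly comparable.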