PT-NET: IMPROVE OBJECT AND FACE DETECTION VIA A PRE-TRAINED CNN MODEL

Yingxin Lou 1, Guangtao Fu 2, Zhuqing Jiang 1, Aidong Men 1, and Yun Zhou 2
1 Beijing University of Posts and Telecommunications, Beijing, P.R. China {louyingxin; jiangzhuqing; menad}@bupt.edu.cn
2 Academy of Broadcasting Science, Beijing, P.R. China {fuguangtao; zhouyun}@abs.ac.cn

ABSTRACT

Pt-Net is a novel object detection network built on a pre-trained, multi-feature VGG-16 network. First, Pt-Net is initialized by a linear combination of a pre-trained VGG-16 model and its own CNN output. Second, Pt-Net generates proposals with a particle filter on the Conv5 feature map and crops the corresponding positions of the multi-feature maps, which are built by fusing hierarchical CNN features. The cropped parts are then concatenated to retain more image feature information, and a novel two-dimensional overlap area loss function is adopted for localization. Finally, we apply Pt-Net to both object detection and face detection, training on the PASCAL VOC and WIDER FACE datasets. Pt-Net achieves an mAP of 76.8% on the PASCAL VOC 2007 dataset and state-of-the-art results on the FDDB benchmark at 43 fps on an NVIDIA GTX 1070p GPU.

Index Terms: Convolutional Neural Networks, Pre-trained model, Particle filter, Multi-feature, Overlap Loss

1. INTRODUCTION

Deep convolutional neural networks (CNNs) [1] have been used in many domains, especially computer vision, and have brought impressive improvements to tasks such as object detection, which involves both determining the categories of objects (object classification) and finding their locations in an image (object localization).
Since the success of trained CNNs in the 2012 ImageNet Large Scale Visual Recognition Challenge (ILSVRC) [1], CNN-based object detection has made it possible to replace traditional image features such as SIFT [2] and HOG [3] with high-level object representations obtained from the output of a CNN model. CNN-based detectors have achieved state-of-the-art results in many applications such as object detection [4, 5, 6, 7, 8], face detection [9, 10, 11, 12] and others [13, 14].

State-of-the-art object detection methods, such as R-CNN [4] proposed by Girshick et al., typically adopt region proposal methods: a pre-processing step provides a set of candidate bounding boxes that roughly localize the objects in an image, and the rough proposals are then refined to achieve precise localization. Selective Search (SS) [15], which greedily merges superpixels based on engineered low-level features, is one of the most popular proposal methods. The proposals in each image are then warped to a fixed size for CNN input and transformed into 4096-dimensional feature vectors. However, R-CNN carries a heavy computational burden because it extracts features for each proposal separately. Hence, SPP-net [16] and Fast R-CNN [5] extract image features only once through the CNN. Fast R-CNN [5] with region proposals has been an impressive detector on the PASCAL VOC [17] and ImageNet [1] datasets. However, region proposal generation with SS runs on the CPU at nearly 2 s per image, which can be a major computational bottleneck in the detection pipeline. To reduce this time, Faster R-CNN [6] proposes RPNs (Region Proposal Networks), which share convolutional (Conv) layer parameters, use 3 scales and 3 aspect ratios for 9 anchors in each grid cell, and obtain high-quality region proposals via the VGG-16 [18] network.

Thanks to National Science Foundation of China (NO , NO ) for funding.
However, Faster R-CNN [6] still has a few drawbacks:

(1) The classical VGG-16 [18] architecture in Faster R-CNN [6] is initialized only from the ILSVRC [1] weights. The output of the CNN itself is also meaningful, and the two can be combined for more precise detection.
(2) It uses only the Conv5 feature map, which is not accurate enough for object detection, so bounding boxes do not cover objects tightly. Multiple layers can be combined, because lower layers have naturally high-resolution features for localization and higher layers carry more semantic information for classification.
(3) It generates proposals from 9 anchors, which is too coarse to enclose the variety of objects. We therefore propose a rapid method of generating proposals via a particle filter.
(4) Traditional smooth L1 bounding-box regression adapts poorly and cannot enclose objects precisely. We therefore couple the coordinate dimensions through a two-dimensional overlap area loss function.

IEEE GlobalSIP 2017

Fig. 1: Pt-Net architecture. Our model (1) is first initialized by a linear combination of a pre-trained VGG-16 model and its own CNN output, (2) feeds an image to the optimally pre-trained CNN, (3) aggregates the outputs of selected layers into multi-feature maps, (4) generates proposals on the Conv5 feature map via the particle filter method, (5) maps the proposals onto the multi-feature maps and crops them, (6) concatenates the cropped regions and (7) classifies and localizes via a novel overlap loss function for the object detection and face detection tasks.

To summarize, Pt-Net achieves higher accuracy and lower run time for both object detection and face detection. Our contributions are:

Pre-trained VGG-16 network. The classical VGG-16 architecture is usually initialized directly from the ILSVRC weights, which ignores the output of the CNN features. We therefore propose a linear combination of the ImageNet caffemodel and the CNN output via proportion parameters.

Multi-feature architecture. Higher layers carry more semantic information for better classification, and lower layers retain more of the original image information for more precise localization. Hence, we adopt multi-feature maps concatenated from different layers.

Particle filter for proposals. Traditional proposal generation using SS or RPN is either time-consuming or less precise. We therefore sample proposals via a particle filter according to object features, which yields faster and more accurate detection.

A novel overlap loss function. We propose a novel overlap loss function that regresses the bounding box as a two-dimensional overlap area instead of four independent one-dimensional coordinates, so that the box is optimized as a whole.

2. METHODS

2.1. Pre-trained model

Training the parameters of a CNN detection architecture takes a long time. Hence, networks are usually initialized from a VGG-16 caffemodel pre-trained on ImageNet, and the CNN parameters are then fine-tuned.
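The linear combination named in the contributions above can be illustrated with a small sketch. This is not the authors' implementation: it assumes the blend has the form C * own + (1 - C) * pretrained, with the proportion C drawn as mu + sigma * e for a standard Gaussian e. The function name `blend_features`, the toy array shapes, and the `sigma` values are hypothetical; `mu = 0.5` mirrors the C = 0.5 setting reported with Table 1.

```python
import numpy as np

def blend_features(own_output, pretrained, mu=0.5, sigma=0.0, rng=None):
    """Blend two same-shaped arrays as C * own + (1 - C) * pretrained.

    C is sampled as mu + sigma * e with e ~ N(0, 1); sigma = 0 gives a
    fixed proportion C = mu. All names and defaults are illustrative.
    """
    rng = np.random.default_rng() if rng is None else rng
    e = rng.standard_normal()          # e ~ N(0, 1)
    c = mu + sigma * e                 # proportion parameter C
    return c * own_output + (1.0 - c) * pretrained, c

# Toy stand-ins for the network's own output and the pre-trained model.
own = np.full((2, 3), 4.0)
pre = np.full((2, 3), 2.0)
blended, c = blend_features(own, pre, mu=0.5, sigma=0.0)
```

Writing C as mu + sigma * e keeps the sampling noise separate from the parameters mu and sigma, so the blend stays a simple differentiable expression of both inputs.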
However, traditional methods use the pre-trained model directly and ignore the network's own output, which carries useful information. We therefore combine the ImageNet caffemodel and the CNN output via a proportion parameter:

\[ F = C \cdot F_1 + (1 - C) \cdot F_2 \tag{1} \]

where $F_1$ is the CNN output and $F_2$ is the pre-trained caffemodel. The proportion parameter $C \sim N(\mu, \sigma^2)$ is a random variable following a Gaussian distribution. In practice, we express $C$ as

\[ C = \mu + \sigma \cdot e \tag{2} \]

where $e \sim N(0, 1)$ follows the standard Gaussian distribution. As a result, the gradients of the 13 convolutional layers of VGG-16 can be computed for SGD (stochastic gradient descent) [19] during back propagation through the addition part.

2.2. Multi-feature

Most state-of-the-art detectors such as Faster R-CNN [6] use only the output of the last convolutional layer as the feature map. However, a single-layer feature map cannot represent both classification and localization information well. Lower layers retain more of the original image information for better localization, while higher layers carry more semantic information for classification [20, 21, 22]. To combine the advantages of both, we fuse the outputs of the Conv1, Conv2, Conv3 and Conv5 layers and connect them into multi-feature maps. After the proposals are mapped onto the multi-feature maps, a 1×1 convolution is first applied to preserve the receptive field of the previous layer and to reduce computation before the 3×3 and 5×5 convolutions. Next, a max pooling layer over the multi-feature maps and a 1×1 convolution with a concatenated rectified linear unit (C.ReLU) [23], which can halve the computation, are used to obtain object spatial localization and full context features [21]. Finally, all the parts are concatenated for more precise detection.

2.3. Proposals generation

Fast R-CNN [5] adopts SS, a time-consuming step, to get object proposals. Faster R-CNN [6] generates proposals with an RPN, which slides a 3×3 window over the feature map and then scores each anchor. In this paper, we propose a new method of generating proposals based on the particle filter [24, 25]. First, we divide the Conv5 feature map into a 6×6 grid and generate a 32×32-pixel proposal at the center of each grid cell. Second, according to the features of the generated proposals, we place particles around targets following a Gaussian distribution, so that more particles $z_i$ lie close to the ground truth $Z$ and fewer lie far from it. Third, we compute and normalize the similarity $w_i$ between the proposals $\tilde{x}$ and the particles $z_i$ and choose the most similar proposals. Finally, we repeat this $N$ times to update the original proposals $\tilde{x} = (\tilde{x}_1, \tilde{y}_1, \tilde{x}_2, \tilde{y}_2)$ and obtain the new ones $x = (x_1, y_1, x_2, y_2)$ as follows:

\[ w_i = p(\tilde{x} \mid z_i) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[ -\frac{(\tilde{x} - z_i)^2}{2\sigma^2} \right], \qquad w_i = \frac{w_i}{\sum_{j=1}^{N} w_j}, \qquad x = \sum_{i=1}^{N} w_i\, \tilde{x} \tag{3} \]

2.4. Overlap loss

Fig. 2: Overlap loss between proposal and ground truth box.

Faster R-CNN [6] describes an object bounding box with 4 coordinate variables that are optimized independently via the smooth L1 loss [6]. In Fig. 2, the purple predicted box is defined as a 4-dimensional vector $x = (x_1, y_1, x_2, y_2)$ and the red ground truth box as $\tilde{x} = (\tilde{x}_1, \tilde{y}_1, \tilde{x}_2, \tilde{y}_2)$. The coordinates are related to each other, so we regress the overlap area between the two boxes:

\[ \hat{x}_1 = \max(x_1, \tilde{x}_1), \quad \hat{y}_1 = \max(y_1, \tilde{y}_1), \quad \hat{x}_2 = \min(x_2, \tilde{x}_2), \quad \hat{y}_2 = \min(y_2, \tilde{y}_2) \]
\[ I = (\hat{x}_2 - \hat{x}_1)(\hat{y}_2 - \hat{y}_1) \]
\[ U = (x_2 - x_1)(y_2 - y_1) + (\tilde{x}_2 - \tilde{x}_1)(\tilde{y}_2 - \tilde{y}_1) - I \]
\[ L = -\ln \frac{I}{U} \tag{4} \]

where $L$ is the overlap loss, $I$ is the intersection area and $U$ is the union area of the proposal and the ground truth box.

During back propagation, we reduce the localization loss via the SGD [19] algorithm and compute the gradients as follows:

\[ \frac{\partial L}{\partial x} = -\frac{U}{I} \cdot \frac{U\,\frac{\partial I}{\partial x} - I\,\frac{\partial U}{\partial x}}{U^2} = \frac{I\,\frac{\partial U}{\partial x} - U\,\frac{\partial I}{\partial x}}{U I} = \frac{1}{U}\frac{\partial U}{\partial x} - \frac{1}{I}\frac{\partial I}{\partial x} \]

where the $\partial U / \partial x$ term is the positive part and the $\partial I / \partial x$ term is the negative part, which means that minimizing the loss enlarges the intersection area and shrinks the union area. For the predicted-box area term of $U$ and for $I$, the partial derivatives with respect to $x$ (and likewise $y$) are:

\[ \frac{\partial}{\partial x_1}\big[(x_2 - x_1)(y_2 - y_1)\big] = y_1 - y_2, \qquad \frac{\partial I}{\partial x_1} = \hat{y}_1 - \hat{y}_2 \quad (x_1 > \tilde{x}_1) \tag{5} \]
\[ \frac{\partial}{\partial x_2}\big[(x_2 - x_1)(y_2 - y_1)\big] = y_2 - y_1, \qquad \frac{\partial I}{\partial x_2} = \hat{y}_2 - \hat{y}_1 \quad (x_2 < \tilde{x}_2) \tag{6} \]

3. EXPERIMENTAL RESULTS

3.1. Experimental setup

3.1.1. For object detection

We evaluate Pt-Net on the PASCAL VOC 2007 [17] dataset, a standard object detection benchmark, and compare our results with state-of-the-art methods that are initialized from an ImageNet [1] pre-trained VGG-16 network. Fine-tuning uses mini-batch SGD for 60k iterations with a mini-batch size of 10 and a momentum of 0.9. The learning rate is decreased from its initial value by a factor of 0.1 after a set number of iterations.

3.1.2. For face detection

We train Pt-Net on the WIDER FACE [26] dataset, whose training set includes 12,880 images and 159,424 faces. Pt-Net is evaluated on the FDDB [27] benchmark, which contains 5,171 annotated faces in 2,845 images. A face bounding box is counted as a true positive if its Intersection over Union (IoU) with a ground truth face is larger than 0.5.
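The IoU criterion above and the overlap loss of Section 2.4 are both computed from the same intersection and union areas. The following is a minimal sketch under stated assumptions, not the authors' code: boxes are (x1, y1, x2, y2) tuples, and the function names, the clamping of an empty intersection to zero, and the `eps` guard against log(0) are additions made here for illustration.

```python
import math

def intersection_and_union(pred, gt):
    """Return (I, U) for two boxes given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = pred
    gx1, gy1, gx2, gy2 = gt
    # Corners of the intersection rectangle.
    ix1, iy1 = max(x1, gx1), max(y1, gy1)
    ix2, iy2 = min(x2, gx2), min(y2, gy2)
    # ASSUMPTION: disjoint boxes are given zero intersection area.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (x2 - x1) * (y2 - y1) + (gx2 - gx1) * (gy2 - gy1) - inter
    return inter, union

def iou(pred, gt):
    inter, union = intersection_and_union(pred, gt)
    return inter / union if union > 0 else 0.0

def overlap_loss(pred, gt, eps=1e-9):
    """Overlap loss -ln(I / U); eps (an assumption) avoids log(0)."""
    return -math.log(max(iou(pred, gt), eps))

def is_true_positive(pred, gt, threshold=0.5):
    """True-positive test: IoU strictly larger than the threshold."""
    return iou(pred, gt) > threshold

pred_box = (0.0, 0.0, 10.0, 10.0)
gt_box = (5.0, 0.0, 15.0, 10.0)
print(iou(pred_box, gt_box), overlap_loss(pred_box, gt_box))
```

For these two boxes the intersection is 50 and the union is 150, so the IoU is 1/3 and the detection would not count as a true positive at the 0.5 threshold.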

Table 1: Detection results on the VOC 2007 test set. All methods use the train set (union of VOC07 trainval, VOC07 test, and VOC12 trainval) and the VGG-16 network. PT: pre-trained, PF: particle filter (C=0.5), MF: multi-feature, OL: overlap loss.
Columns: Method, PT, PF, MF, OL, mAP, aero, bike, bird, boat, bottle, bus, car, cat, chair, cow, table, dog, horse, mbike, person, plant, sheep, sofa, train, tv
Rows: Faster [6], Ours[1], Ours[2], Ours[3], Ours[4], Ours[5]

Table 2: Different proportions C of the pre-trained VGG-16 model and CNN output in the linear addition structure.
Columns: C, mAP (VOC2007)

3.2. Object detection results

3.2.1. Pre-trained model parameters

Table 1 shows that Pt-Net achieves 76.8% mAP (mean Average Precision) with all four proposed methods, which is 3.6 percentage points higher than Faster R-CNN [6]. Adding only the pre-trained method (C=0.5) improves mAP by 1.0 point over the baseline [6]. Table 2 shows the detection results for various proportions C. C=0.5 gives the best result, which means that the pre-trained model and the network's own CNN output are both important. mAP decreases as C moves above or below 0.5, so we choose 0.5 in our framework.

3.2.2. Overlap loss and Multi-feature

Fig. 3 compares smooth L1 loss and overlap loss on Faster R-CNN [6]. The results show that (1) overlap loss can handle objects of different scales, (2) overlap loss encloses objects more tightly than smooth L1 loss, and (3) overlap loss detects small objects better than smooth L1 loss, which ignores objects that are too small. In addition, compared to [6] in Table 1, overlap loss improves the detection result by 1.0% mAP and multi-feature improves it by 0.9% mAP.

Fig. 3: Bounding box regression for localization. Top: smooth L1 loss. Bottom: overlap loss.

3.2.3. Detection speed

Pt-Net with the particle filter method of generating proposals achieves 13 fps (frames per second) with N=40 resampling iterations on an NVIDIA GTX 1070p GPU on the VOC 2007 test set. Fast R-CNN [5] with SS runs at 0.5 fps, and Faster R-CNN [6], which uses RPNs instead of SS, runs at 7 fps, so Pt-Net is faster than both.

3.3. Face detection results

Fig. 4 compares region-based face detection methods on the FDDB [27] benchmark. We compare Pt-Net with the state-of-the-art detectors R-CNN, Fast R-CNN and Faster R-CNN via ROC curves on FDDB. The results show that the accuracy ranking is Pt-Net, Faster R-CNN, Fast R-CNN and R-CNN, and that the speed ranking is the same. The proposed model therefore gains in both accuracy and speed from the pre-trained and carefully designed multi-feature CNN. Fig. 4 reports the true positive rates for continuous and discrete scores at 1000 false positives. Pt-Net achieves a high true positive rate at fewer than 200 false positives. Pt-Net runs at 43 fps (N=15) on VGA-resolution images with an NVIDIA GTX 1070p GPU and has the potential to be applied in real-time face detection systems.

Fig. 4: ROC curves of state-of-the-art face detection methods on FDDB. Left: continuous scores; Right: discrete scores.

4. CONCLUSIONS

In this paper, we have introduced the Pt-Net detection architecture, based on a pre-trained VGG-16 network, for both object and face detection tasks. Pt-Net achieves 76.8% mAP via four methods: (1) a linear combination of the pre-trained model and the CNN output, (2) multi-feature maps concatenated from multiple layers, (3) proposal generation via the particle filter method, and (4) a novel overlap area loss function for localization. In short, Pt-Net performs well in both speed and accuracy compared with state-of-the-art detectors and can be applied in fields such as object and face detection.
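As a rough illustration of the particle filter weighting step listed as method (3) and described in Section 2.3, the sketch below computes Gaussian similarities between a proposal and a set of particles, normalizes them, and updates the box. It is a sketch under loud assumptions rather than the authors' algorithm: the updated box is taken to be the weighted mean of the particles, particles are sampled around a known target box, and the names `particle_update` and `refine` and the values of `sigma` and `n_particles` are hypothetical. `n_iters=15` mirrors the N=15 setting quoted for face detection.

```python
import numpy as np

def particle_update(proposal, particles, sigma):
    """One weighting step: Gaussian similarity, normalization, weighted mean."""
    proposal = np.asarray(proposal, dtype=float)    # shape (4,)
    particles = np.asarray(particles, dtype=float)  # shape (N, 4)
    sq_dist = np.sum((particles - proposal) ** 2, axis=1)
    # Log of the Gaussian similarity. The constant 1 / (sqrt(2*pi) * sigma)
    # cancels when the weights are normalized, so it is omitted here.
    log_w = -sq_dist / (2.0 * sigma ** 2)
    weights = np.exp(log_w - log_w.max())  # subtract max for numerical stability
    weights /= weights.sum()
    # ASSUMPTION: the new box is the weighted mean of the particles.
    return weights @ particles, weights

def refine(proposal, target, sigma=8.0, n_particles=64, n_iters=15, seed=0):
    """Repeat the weighting step, sampling particles around a target box."""
    rng = np.random.default_rng(seed)
    box = np.asarray(proposal, dtype=float)
    target = np.asarray(target, dtype=float)
    for _ in range(n_iters):
        # ASSUMPTION: particles are Gaussian perturbations of the target box.
        particles = target + sigma * rng.standard_normal((n_particles, 4))
        box, _ = particle_update(box, particles, sigma)
    return box

start = (0.0, 0.0, 32.0, 32.0)      # a 32x32 grid-cell proposal
target = (10.0, 12.0, 40.0, 44.0)   # hypothetical target box
refined = refine(start, target)
```

Because the weights are normalized, each update is a convex combination of the particles, so the refined box stays within the range spanned by the sampled particles.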

5. REFERENCES

[1] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Advances in Neural Information Processing Systems, 2012.
[2] D. Lowe, "Distinctive image features from scale-invariant keypoints," IJCV, vol. 60, no. 2.
[3] N. Dalal and B. Triggs, "Histograms of oriented gradients for human detection," in CVPR. IEEE, 2005, vol. 1.
[4] R. Girshick, J. Donahue, T. Darrell, and J. Malik, "Rich feature hierarchies for accurate object detection and semantic segmentation," in CVPR, 2014.
[5] R. Girshick, "Fast R-CNN," in ICCV, 2015.
[6] S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: Towards real-time object detection with region proposal networks," in NIPS, 2015.
[7] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You only look once: Unified, real-time object detection," arXiv preprint.
[8] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, and S. Reed, "SSD: Single shot multibox detector," arXiv preprint.
[9] H. Li, Z. Lin, X. Shen, J. Brandt, and G. Hua, "A convolutional neural network cascade for face detection," in CVPR, 2015.
[10] L. Huang, Y. Yang, Y. Deng, and Y. Yu, "DenseBox: Unifying landmark localization with end to end object detection," arXiv preprint.
[11] P. Dollár and C. L. Zitnick, "Fast edge detection using structured forests," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 37, no. 8.
[12] S. Yang, P. Luo, C. C. Loy, and X. Tang, "From facial parts responses to face detection: A deep learning approach," in ICCV, 2015.
[13] J. Hosang, M. Omran, R. Benenson, and B. Schiele, "Taking a deeper look at pedestrians," in CVPR, 2015.
[14] J. Li, X. Liang, S. Shen, T. Xu, J. Feng, and S. Yan, "Scale-aware Fast R-CNN for pedestrian detection," arXiv preprint.
[15] J. Uijlings, K. van de Sande, T. Gevers, and A. Smeulders, "Selective search for object recognition," IJCV, vol. 104, no. 2.
[16] K. He, X. Zhang, S. Ren, and J. Sun, "Spatial pyramid pooling in deep convolutional networks for visual recognition," in ECCV. Springer, 2014.
[17] M. Everingham, L. Van Gool, C. Williams, J. Winn, and A. Zisserman, "The PASCAL visual object classes (VOC) challenge," IJCV, vol. 88, no. 2.
[18] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv preprint.
[19] Y. LeCun, B. Boser, J. Denker, D. Henderson, R. Howard, W. Hubbard, and L. Jackel, "Backpropagation applied to handwritten zip code recognition," Neural Computation, vol. 1, no. 4.
[20] T. Kong, A. Yao, Y. Chen, and F. Sun, "HyperNet: Towards accurate region proposal generation and joint object detection," arXiv preprint.
[21] S. Bell, C. L. Zitnick, K. Bala, and R. Girshick, "Inside-outside net: Detecting objects in context with skip pooling and recurrent neural networks," arXiv preprint.
[22] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, "Going deeper with convolutions," in CVPR, 2015.
[23] W. Shang, K. Sohn, D. Almeida, and H. Lee, "Understanding and improving convolutional neural networks via concatenated rectified linear units," arXiv preprint.
[24] K. Nummiaro, E. Koller-Meier, and L. J. V. Gool, "Object tracking with an adaptive color-based particle filter," Lecture Notes in Computer Science, vol. 2449, no. 02.
[25] E. Yang and M. Jeon, "Object tracking with the level set method and the particle filtering," Lecture Notes in Computer Science.
[26] S. Yang, P. Luo, C. C. Loy, and X. Tang, "WIDER FACE: A face detection benchmark," in CVPR, 2016.
[27] V. Jain and E. G. Learned-Miller, "FDDB: A benchmark for face detection in unconstrained settings," UMass Amherst Technical Report.

Deep learning for object detection. Slides from Svetlana Lazebnik and many others

Deep learning for object detection. Slides from Svetlana Lazebnik and many others Deep learning for object detection Slides from Svetlana Lazebnik and many others Recent developments in object detection 80% PASCAL VOC mean0average0precision0(map) 70% 60% 50% 40% 30% 20% 10% Before deep

More information

REGION AVERAGE POOLING FOR CONTEXT-AWARE OBJECT DETECTION

REGION AVERAGE POOLING FOR CONTEXT-AWARE OBJECT DETECTION REGION AVERAGE POOLING FOR CONTEXT-AWARE OBJECT DETECTION Kingsley Kuan 1, Gaurav Manek 1, Jie Lin 1, Yuan Fang 1, Vijay Chandrasekhar 1,2 Institute for Infocomm Research, A*STAR, Singapore 1 Nanyang Technological

More information

Object detection with CNNs

Object detection with CNNs Object detection with CNNs 80% PASCAL VOC mean0average0precision0(map) 70% 60% 50% 40% 30% 20% 10% Before CNNs After CNNs 0% 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 year Region proposals

More information

Object Detection Based on Deep Learning

Object Detection Based on Deep Learning Object Detection Based on Deep Learning Yurii Pashchenko AI Ukraine 2016, Kharkiv, 2016 Image classification (mostly what you ve seen) http://tutorial.caffe.berkeleyvision.org/caffe-cvpr15-detection.pdf

More information

Direct Multi-Scale Dual-Stream Network for Pedestrian Detection Sang-Il Jung and Ki-Sang Hong Image Information Processing Lab.

Direct Multi-Scale Dual-Stream Network for Pedestrian Detection Sang-Il Jung and Ki-Sang Hong Image Information Processing Lab. [ICIP 2017] Direct Multi-Scale Dual-Stream Network for Pedestrian Detection Sang-Il Jung and Ki-Sang Hong Image Information Processing Lab., POSTECH Pedestrian Detection Goal To draw bounding boxes that

More information

Feature-Fused SSD: Fast Detection for Small Objects

Feature-Fused SSD: Fast Detection for Small Objects Feature-Fused SSD: Fast Detection for Small Objects Guimei Cao, Xuemei Xie, Wenzhe Yang, Quan Liao, Guangming Shi, Jinjian Wu School of Electronic Engineering, Xidian University, China xmxie@mail.xidian.edu.cn

More information

Unified, real-time object detection

Unified, real-time object detection Unified, real-time object detection Final Project Report, Group 02, 8 Nov 2016 Akshat Agarwal (13068), Siddharth Tanwar (13699) CS698N: Recent Advances in Computer Vision, Jul Nov 2016 Instructor: Gaurav

More information

Lecture 5: Object Detection

Lecture 5: Object Detection Object Detection CSED703R: Deep Learning for Visual Recognition (2017F) Lecture 5: Object Detection Bohyung Han Computer Vision Lab. bhhan@postech.ac.kr 2 Traditional Object Detection Algorithms Region-based

More information

Spatial Localization and Detection. Lecture 8-1

Spatial Localization and Detection. Lecture 8-1 Lecture 8: Spatial Localization and Detection Lecture 8-1 Administrative - Project Proposals were due on Saturday Homework 2 due Friday 2/5 Homework 1 grades out this week Midterm will be in-class on Wednesday

More information

Extend the shallow part of Single Shot MultiBox Detector via Convolutional Neural Network

Extend the shallow part of Single Shot MultiBox Detector via Convolutional Neural Network Extend the shallow part of Single Shot MultiBox Detector via Convolutional Neural Network Liwen Zheng, Canmiao Fu, Yong Zhao * School of Electronic and Computer Engineering, Shenzhen Graduate School of

More information

arxiv: v1 [cs.cv] 4 Jun 2015

arxiv: v1 [cs.cv] 4 Jun 2015 Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks arxiv:1506.01497v1 [cs.cv] 4 Jun 2015 Shaoqing Ren Kaiming He Ross Girshick Jian Sun Microsoft Research {v-shren, kahe, rbg,

More information

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun Presented by Tushar Bansal Objective 1. Get bounding box for all objects

More information

MULTI-SCALE OBJECT DETECTION WITH FEATURE FUSION AND REGION OBJECTNESS NETWORK. Wenjie Guan, YueXian Zou*, Xiaoqun Zhou

MULTI-SCALE OBJECT DETECTION WITH FEATURE FUSION AND REGION OBJECTNESS NETWORK. Wenjie Guan, YueXian Zou*, Xiaoqun Zhou MULTI-SCALE OBJECT DETECTION WITH FEATURE FUSION AND REGION OBJECTNESS NETWORK Wenjie Guan, YueXian Zou*, Xiaoqun Zhou ADSPLAB/Intelligent Lab, School of ECE, Peking University, Shenzhen,518055, China

More information

Yiqi Yan. May 10, 2017

Yiqi Yan. May 10, 2017 Yiqi Yan May 10, 2017 P a r t I F u n d a m e n t a l B a c k g r o u n d s Convolution Single Filter Multiple Filters 3 Convolution: case study, 2 filters 4 Convolution: receptive field receptive field

More information

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks Shaoqing Ren Kaiming He Ross Girshick Jian Sun Present by: Yixin Yang Mingdong Wang 1 Object Detection 2 1 Applications Basic

More information

Fine-tuning Pre-trained Large Scaled ImageNet model on smaller dataset for Detection task

Fine-tuning Pre-trained Large Scaled ImageNet model on smaller dataset for Detection task Fine-tuning Pre-trained Large Scaled ImageNet model on smaller dataset for Detection task Kyunghee Kim Stanford University 353 Serra Mall Stanford, CA 94305 kyunghee.kim@stanford.edu Abstract We use a

More information

Optimizing Object Detection:

Optimizing Object Detection: Lecture 10: Optimizing Object Detection: A Case Study of R-CNN, Fast R-CNN, and Faster R-CNN Visual Computing Systems Today s task: object detection Image classification: what is the object in this image?

More information

Real-time Object Detection CS 229 Course Project

Real-time Object Detection CS 229 Course Project Real-time Object Detection CS 229 Course Project Zibo Gong 1, Tianchang He 1, and Ziyi Yang 1 1 Department of Electrical Engineering, Stanford University December 17, 2016 Abstract Objection detection

More information

YOLO9000: Better, Faster, Stronger

YOLO9000: Better, Faster, Stronger YOLO9000: Better, Faster, Stronger Date: January 24, 2018 Prepared by Haris Khan (University of Toronto) Haris Khan CSC2548: Machine Learning in Computer Vision 1 Overview 1. Motivation for one-shot object

More information

arxiv: v1 [cs.cv] 5 Oct 2015

arxiv: v1 [cs.cv] 5 Oct 2015 Efficient Object Detection for High Resolution Images Yongxi Lu 1 and Tara Javidi 1 arxiv:1510.01257v1 [cs.cv] 5 Oct 2015 Abstract Efficient generation of high-quality object proposals is an essential

More information

G-CNN: an Iterative Grid Based Object Detector

G-CNN: an Iterative Grid Based Object Detector G-CNN: an Iterative Grid Based Object Detector Mahyar Najibi 1, Mohammad Rastegari 1,2, Larry S. Davis 1 1 University of Maryland, College Park 2 Allen Institute for Artificial Intelligence najibi@cs.umd.edu

More information

Proceedings of the International MultiConference of Engineers and Computer Scientists 2018 Vol I IMECS 2018, March 14-16, 2018, Hong Kong

Proceedings of the International MultiConference of Engineers and Computer Scientists 2018 Vol I IMECS 2018, March 14-16, 2018, Hong Kong , March 14-16, 2018, Hong Kong , March 14-16, 2018, Hong Kong , March 14-16, 2018, Hong Kong , March 14-16, 2018, Hong Kong TABLE I CLASSIFICATION ACCURACY OF DIFFERENT PRE-TRAINED MODELS ON THE TEST DATA

More information

Supplementary Material: Pixelwise Instance Segmentation with a Dynamically Instantiated Network

Supplementary Material: Pixelwise Instance Segmentation with a Dynamically Instantiated Network Supplementary Material: Pixelwise Instance Segmentation with a Dynamically Instantiated Network Anurag Arnab and Philip H.S. Torr University of Oxford {anurag.arnab, philip.torr}@eng.ox.ac.uk 1. Introduction

More information

Object detection using Region Proposals (RCNN) Ernest Cheung COMP Presentation

Object detection using Region Proposals (RCNN) Ernest Cheung COMP Presentation Object detection using Region Proposals (RCNN) Ernest Cheung COMP790-125 Presentation 1 2 Problem to solve Object detection Input: Image Output: Bounding box of the object 3 Object detection using CNN

More information

arxiv: v1 [cs.cv] 26 Jun 2017

arxiv: v1 [cs.cv] 26 Jun 2017 Detecting Small Signs from Large Images arxiv:1706.08574v1 [cs.cv] 26 Jun 2017 Zibo Meng, Xiaochuan Fan, Xin Chen, Min Chen and Yan Tong Computer Science and Engineering University of South Carolina, Columbia,

More information

arxiv: v3 [cs.cv] 18 Oct 2017

arxiv: v3 [cs.cv] 18 Oct 2017 SSH: Single Stage Headless Face Detector Mahyar Najibi* Pouya Samangouei* Rama Chellappa University of Maryland arxiv:78.3979v3 [cs.cv] 8 Oct 27 najibi@cs.umd.edu Larry S. Davis {pouya,rama,lsd}@umiacs.umd.edu

More information

Channel Locality Block: A Variant of Squeeze-and-Excitation

Channel Locality Block: A Variant of Squeeze-and-Excitation Channel Locality Block: A Variant of Squeeze-and-Excitation 1 st Huayu Li Northern Arizona University Flagstaff, United State Northern Arizona University hl459@nau.edu arxiv:1901.01493v1 [cs.lg] 6 Jan

More information

Industrial Technology Research Institute, Hsinchu, Taiwan, R.O.C ǂ

Industrial Technology Research Institute, Hsinchu, Taiwan, R.O.C ǂ Stop Line Detection and Distance Measurement for Road Intersection based on Deep Learning Neural Network Guan-Ting Lin 1, Patrisia Sherryl Santoso *1, Che-Tsung Lin *ǂ, Chia-Chi Tsai and Jiun-In Guo National

More information

Object Detection. TA : Young-geun Kim. Biostatistics Lab., Seoul National University. March-June, 2018

Object Detection. TA : Young-geun Kim. Biostatistics Lab., Seoul National University. March-June, 2018 Object Detection TA : Young-geun Kim Biostatistics Lab., Seoul National University March-June, 2018 Seoul National University Deep Learning March-June, 2018 1 / 57 Index 1 Introduction 2 R-CNN 3 YOLO 4

More information

[Supplementary Material] Improving Occlusion and Hard Negative Handling for Single-Stage Pedestrian Detectors

[Supplementary Material] Improving Occlusion and Hard Negative Handling for Single-Stage Pedestrian Detectors [Supplementary Material] Improving Occlusion and Hard Negative Handling for Single-Stage Pedestrian Detectors Junhyug Noh Soochan Lee Beomsu Kim Gunhee Kim Department of Computer Science and Engineering

More information
