2017 2nd International Conference on Software, Multimedia and Communication Engineering (SMCE 2017) ISBN:


Aircraft Detection in Remote Sensing Images via CNN Multi-scale Feature Representation

Jia-qi WANG 1, Xin NIU 1, Peng ZHANG 1, Yong DOU 1 and Fei XIA 2

1 National Laboratory for Parallel and Distributed Processing, National University of Defense Technology, Changsha City, Hunan Province, China
2 Institute of Electronic Information Warfare, Naval University of Engineering, Wuhan City, Hubei Province, China

Keywords: Aircraft detection, Remote sensing, Single Shot Detection (SSD), CNN.

Abstract. Aircraft detection in remote sensing images is a difficult challenge. Current aircraft detection methods have limited representational capability and heavy computational cost. This paper studies how to apply the multi-scale feature representation of Convolutional Neural Networks (CNN) to aircraft detection by qualitatively and quantitatively analyzing the performance of the Single Shot Detection (SSD) approach. First, we find that low-level detectors are not robust enough because of the semantic gap. We therefore propose a data driven hyper-parameter selection method that alleviates this problem by determining appropriate hyper-parameters for the sliding window and the default box shape. In addition, we employ a multi-scale training strategy to enhance the low-level predictive detectors. Finally, we propose an accurate and efficient aircraft detection framework. Experimental results show that our method achieves 96.84% AP at 20 FPS on an NVIDIA TITAN X, a 2.13% AP improvement over the original SSD method.

Introduction

With the rapid development of remote sensing technologies over the past decade, object detection in very high resolution (VHR) remote sensing images has attracted sustained interest. Aircraft detection in particular is of high research and application value in both military and civilian areas.
Aircraft detection can be viewed as a task that consists of both object localization and object recognition; it is a typical large-range, multi-scale object detection problem. As shown in Figure 1, the increased size and spatial resolution of remote sensing imagery make aircraft detection in large-size VHR images (side length from 3000 to 10000 pixels, covering an area of 1-8 km²) a difficult challenge, because the complex and cluttered background is more misleading. In addition, aircraft detection usually suffers from large intra-class variability in visual appearance and object scale, which limits detection performance.

Conventional machine learning-based aircraft detection mainly consists of hand-crafted feature extraction and classifier training. These methods exploit low-level features such as shape, texture, or local image features (e.g., histogram of oriented gradients (HOG) and scale-invariant feature transform (SIFT)). For example, Liu et al. [1] proposed a coarse-to-fine process that integrates shape prior and region information to detect aircraft. Polat et al. [2] developed a learning-based aircraft detection system using Gabor features and a support vector machine (SVM). In general, the feature representation of objects plays a crucial role in the detection task. However, feature engineering relies heavily on professional experience and prior knowledge, and hand-crafted features have limited capacity for high-level semantic information and limited generalization ability.

Recently, object detection methods based on deep convolutional neural networks (CNNs) have become the most prevalent solution for general objects in natural scene images. By end-to-end learning, they automatically extract hierarchical and discriminative feature representations, from low-level abstractions (e.g., edges and textures) to high-level abstractions (e.g., class information) [3].
Depending on whether they generate region proposals, current state-of-the-art deep ConvNet object detectors fall into two main groups: region-based detection methods, with R-CNN [4] as a representative, and detectors that do not generate region proposals, such as YOLO [5] and SSD [6]. R-CNN adopts selective search [7] to generate region proposals before running classification with a ConvNet. Fast R-CNN [8] and Faster R-CNN [9] speed up R-CNN by producing proposals directly on the high-level convolutional feature map and using region of interest (RoI) pooling to share the ConvNet forward computation. Zhang et al. [10] proposed a coupled CNN method that combines a candidate region proposal network and a localization network to extract proposals and simultaneously locate aircraft in large-size VHR images. Zhang et al. [11] developed a two-stage algorithm that detects buildings with a sliding window approach and a CNN classifier.

Although ConvNet feature representations are richer in semantic information and more robust to scale variance, detecting objects of very different appearances and scales in VHR remote sensing images remains a great challenge, because convolutional feature layers cannot keep both high semantic information and high resolution at the same time. To overcome this obstacle, a series of recent works improve R-CNN by combining features from multiple layers before making predictions, or by combining predictions from different layers of a ConvNet. These approaches include Hypercolumns [12], HyperNet [13], ION [14], SDP CNN [15], and MS-CNN [16]. Similarly, Tang et al. [17] proposed a hyper region proposal network (HRPN) that extracts vehicle-like targets with a combination of hierarchical feature maps, and thus successfully applied Faster R-CNN to vehicle detection in VHR remote sensing images. Although improved Faster R-CNN methods that combine multiple layers relieve the conflict between feature spatial resolution and feature semantic information, they are too computationally intensive for large-size VHR remote sensing images.
In fact, a deep ConvNet computes a feature hierarchy layer by layer with subsampling layers, and this hierarchy has an inherent multi-scale feature representation. Liu et al. [6] presented Single Shot Detection (SSD), which does not resample pixels or features for bounding box hypotheses and achieves a better balance of accuracy and speed than Faster R-CNN. SSD is one of the first attempts at using a ConvNet's pyramidal feature hierarchy as if it were a featurized image pyramid [18]. It therefore naturally leverages the multi-scale feature representations of the ConvNet's feature hierarchy and is suited to multi-scale object detection, especially of small objects.

Because VHR remote sensing images differ greatly from natural scene images, directly employing SSD for aircraft detection in such large-size images faces severe challenges: (1) aircraft (side length from 25 to 480 pixels) in VHR remote sensing images (side length from 3000 to 10000 pixels) are relatively smaller than objects in natural scene images, which increases the difficulty of the task, especially for the lower-level predictive layers (e.g., conv4_3, conv7) that detect small objects; (2) as shown in Figure 4, the aircraft category contains various kinds of airplanes (e.g., fighter, helicopter, transport aircraft), so the detection task encounters large intra-class variability in visual appearance and object scale.

In this paper, we qualitatively and quantitatively analyze the performance of Single Shot Detection (SSD) applied to aircraft detection in VHR remote sensing images, and investigate the semantic gap problem that exists in detection methods based on multi-scale feature representation. Based on these findings, we propose a Data Driven Hyper-parameter Selection method that tunes the parameters of the sliding window and the default boxes using prior statistical information.
In addition, we propose Multi-scale Training as a data augmentation method that mainly enhances the lower-level predictive layer detectors with more small-scale training data. Finally, we propose an accurate and efficient aircraft detection framework (see Figure 1). Experiments demonstrate that our method achieves 96.84% AP at 20 FPS on an NVIDIA TITAN X, a 2.13% AP improvement over the original SSD. Section 2 details the aircraft detection methodology, Section 3 presents the experiments and analysis, and the final section concludes the paper.

Figure 1. Proposed framework. Training/testing samples are cropped by a sliding window. SSD is trained to detect objects of different scales with the corresponding predictive layer detectors. Non Maximum Suppression (NMS) is then applied per category to remove redundant boxes. Finally, the framework generates predictions of category and location.

Figure 2. Score maps of predicted objects, taken from the most activated layers among all predictive layers. For the two-class classification there are two score maps forming a class predictor, denoting background and aircraft respectively. In SSD, the lower left airplane produces the strongest response in layer 2, so its location is highlighted in the layer 2 score map. The large transport airplanes, however, only activate a higher predictive layer (layer 3 in this case).

Proposed Methods

The framework of our aircraft detection method is illustrated in Figure 1. For training, we construct a multi-scale training dataset by cropping the original large-size images into patches using sliding windows of two different scales. The correspondence between the ground truth and the default boxes is then established by our improved matching strategy. Finally, the SSD network takes all training image patches as input, and each predictive layer produces a fixed set of detection predictions using a set of convolutional filters. Note that a detector corresponds to a default box of a certain aspect ratio in one of the six predictive layers, and is composed of a group of filters that score confidence and regress offsets to the default box. In other words, SSD has 33 detectors in total (3+5*6); each independent detector contains 4 filters that predict the offsets of the default box location (i.e., x, y, width, height) and one filter per class that predicts the category scores. As a matter of fact, SSD can be considered as a class-specific RPN [19].
That is, it is a fully convolutional strategy that replaces the 2-class (object or not) fully-connected (fc) classifier layer of RPN with a multi-class convolutional classifier layer.

For testing, a large-size VHR remote sensing image is cropped into image patches. SSD takes these patches as input, predicts a score for the presence of each object category in each default box, and produces shape offsets that are added to the prior default box at each feature map location. Finally, Non Maximum Suppression (NMS) is applied to these predictions for each category to generate the final output bounding boxes. The SSD prediction process can be regarded as visiting every location of every predictive layer feature map with the detector of each default box. The number of predictions is jointly determined by the feature map size, the number of default boxes, and the number of categories. For instance, SSD300 outputs 7308 (38*38*3+(19*19+10*10+5*5+3*3+1)*6) predictions per category. By combining predictions for all default boxes with different scales and aspect ratios from all locations of many feature maps, we obtain a diverse set of predictions that covers various input object sizes and shapes. As shown in Figure 2, aircraft of different scales trigger responses in the predictive layer of the corresponding scale.

Data Driven Hyper-parameter Selection

Sliding Window. SSD is a fully convolutional neural network, which can theoretically take an image of arbitrary size as input. For natural scene images, the original images are generally resized to a fixed size (e.g., 300*300 or 500*500) as a trade-off between accuracy and speed. However, because GPU memory is limited, it is hard to process a whole VHR remote sensing image (side length from 3000 to 10000 pixels) in a deep convolutional neural network. Scanning the original image with a sliding window to crop it into small patches is a feasible way to tackle this problem, but few studies address how to select the hyper-parameters (i.e., sliding window size and stride) for VHR remote sensing images. Based on a statistical analysis of the training samples and on the characteristics of the SSD model, we propose some principles for hyper-parameter selection.

The size of the sliding window should be larger than the longest side among the ground truth bounding boxes. Under the hypothesis that the training and testing data sets are identically distributed, we can obtain prior information on bounding box size by computing statistics on the training data set. Figure 3 (a) shows that side lengths range from 25 to 470 pixels and are concentrated between 25 and 200 pixels. Meanwhile, given the GPU memory capacity, the sliding window cannot be too large. As a trade-off between accuracy and speed, the cropped image is usually compressed to a fixed size (e.g., for SSD300), which saves GPU memory and computational overhead.
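The window and stride choice discussed in this subsection can be sketched in a few lines of Python. This is an illustration rather than the authors' implementation: the function names are hypothetical, and the stride bound is taken to be the window length minus the longest object side, so that consecutive windows overlap by at least one full object. The numbers (900-pixel window, 470-pixel longest side, 400-pixel stride, 3000-pixel image) are the ones quoted in the text.

```python
def window_origins(image_len, win_len, stride):
    """Offsets of sliding windows along one axis, covering the whole axis."""
    if image_len <= win_len:
        # A single window; the caller would have to pad the image to win_len.
        return [0]
    origins = list(range(0, image_len - win_len + 1, stride))
    # Add a final window flush with the image border so no pixels are skipped.
    if origins[-1] != image_len - win_len:
        origins.append(image_len - win_len)
    return origins


def stride_upper_bound(win_len, max_obj_len):
    """Largest stride whose window overlap still holds an object of max_obj_len."""
    return win_len - max_obj_len


def crop_boxes(width, height, win_len, stride):
    """(x0, y0, x1, y1) crop rectangles for every sliding-window position."""
    return [(x, y, x + win_len, y + win_len)
            for y in window_origins(height, win_len, stride)
            for x in window_origins(width, win_len, stride)]


bound = stride_upper_bound(900, 470)      # stride must not exceed this value
crops = crop_boxes(3000, 3000, 900, 400)  # stride 400 respects the bound
```

With a stride no larger than the bound, any object whose longer side is at most 470 pixels lies completely inside at least one window along each axis, which is why a longer stride risks cutting aircraft at window edges.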
The experiments in Section 3 suggest that the image should preferably not be compressed beyond a ratio of 1/3. The choice of sliding window directly affects the detection speed and recall for a whole VHR remote sensing image: an excessively short stride generates many overlaps and thus redundant, computationally expensive convolutions; on the other hand, too long a stride may miss aircraft that lie at the edge of a sliding window. To find a suitable stride for a given data set, we bound the stride as follows:

$$S_{\text{upper bound}} = L_{win} - L_{maxLen} \qquad (1)$$

where $S_{\text{upper bound}}$ denotes the upper bound of the stride, $L_{win}$ is the side length of the sliding window, and $L_{maxLen}$ is the maximum length of the longer bounding box side. Based on these principles, we conclude that as long as the GPU memory capacity is sufficient, a larger sliding window is better. Taking SSD300 as an example, a suitable sliding window size is 900*900 with a stride of 400.

Scale and Aspect Ratio. SSD applies default boxes of different shapes to several feature maps from predictive layers of different scales, so a default box can be understood as a kind of multi-scale version of the anchor box used in RPN (Region Proposal Network). In the training phase we need to establish the correspondence between the ground truth and the default boxes by a matching strategy, which makes it possible for predictive layers of different scales to train on and predict objects of the corresponding scale. The original SSD paper proposed a one-to-many mapping strategy that ensures each ground truth has at least one matched default box. Note that the positive training samples are the matched default boxes, not the original ground truth. The original SSD designs the tiling so that specific feature map locations learn to be responsive to specific areas of the image and particular scales of objects [6]. The scale of the default boxes for each predictive feature map is computed as follows:

$$s_k = s_{min} + \frac{s_{max} - s_{min}}{m - 1}(k - 1), \quad k \in [1, m] \qquad (2)$$

where $s_{min}$ is 0.2 and $s_{max}$ is 0.95, meaning that the lowest layer has a scale of 0.2 and the highest layer has a scale of 0.95 (as a proportion of the input size). Accordingly, Liu et al. empirically imposed different aspect ratios on the default boxes, denoted $a_r \in \{1, 2, 3, 1/2, 1/3\}$.

Figure 3. Distribution of ground truth bounding box width and height.

In practice, however, different datasets have their own characteristics, and it is more reasonable to design a distribution of default boxes that best fits a specific dataset. We therefore propose a Data Driven Hyper-parameter Selection method to fine-tune the scale and aspect ratio parameters. First, we collect statistics on side length and aspect ratio and analyze the distribution of the aspect ratio. In Figure 3 (a), the scatter chart shows the width and height of all ground truth bounding boxes. We use the least squares method to fit the slope of width against height over all points, plotted as the blue line. The fitted slope is 0.971, which is very close to 1. The lower right black line is the lower bound of the aspect ratio (0.667), and the upper left black line is the upper bound (2.13). The distribution of bounding box aspect ratios in Figure 3 (b) reflects a similar rule: the aspect ratio varies within a very narrow range (0.6~2.2) and is concentrated around 1:1. According to the statistics of our dataset, we specify the aspect ratio hyper-parameters as {1, 13/10, 2, 10/13, 2}.

The mapping strategy of SSD in the training phase makes it possible to leverage the multi-scale feature representation and to train different predictive layers with objects of the corresponding scale. However, the empirical scale factors may not be suitable for other datasets. For example, our aircraft detection dataset contains a large number of small bounding boxes.
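As a worked illustration of the default box layout described above, the following Python sketch evaluates Eq. (2) with the values quoted in the text and counts the default boxes of SSD300. The function names are hypothetical. The box shape rule (width s*sqrt(a), height s/sqrt(a)) comes from the SSD paper [6] rather than from this text, and the per-layer box counts (3 for the first layer, 6 for the others) are inferred from the prediction count given earlier.

```python
import math


def default_box_scales(s_min=0.2, s_max=0.95, m=6):
    """Scale of each predictive layer following Eq. (2), for k = 1..m."""
    return [s_min + (s_max - s_min) * (k - 1) / (m - 1) for k in range(1, m + 1)]


def default_box_shape(scale, aspect_ratio):
    """Relative (width, height) of a default box: w = s*sqrt(a), h = s/sqrt(a)."""
    root = math.sqrt(aspect_ratio)
    return scale * root, scale / root


def num_predictions(feature_sizes=(38, 19, 10, 5, 3, 1),
                    boxes_per_location=(3, 6, 6, 6, 6, 6)):
    """Number of default boxes, i.e. predictions per category, over all layers."""
    return sum(f * f * b for f, b in zip(feature_sizes, boxes_per_location))


scales = default_box_scales()              # approx. 0.2, 0.35, 0.5, 0.65, 0.8, 0.95
width, height = default_box_shape(0.2, 2)  # a 2:1 box on the lowest layer
total = num_predictions()                  # matches the SSD300 count in the text
```

Changing the aspect ratio keeps the box area at the square of the scale, so the aspect ratio set only redistributes that area between width and height.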
If we keep the original scale parameters, the basic size of the default box in the first layer is 90 (input size 900, scale factor 0.1), and most ground truth bounding boxes will be matched with default boxes of the first layer. As a result, the higher predictive layers lack training data, which eventually harms the overall detection performance. To match default boxes with the ground truth in a more balanced way, we propose an algorithm that calculates the scale factors of all predictive layers for a given dataset. As shown in Algorithm 1, given the training dataset and the SSD model, our method calculates a scale factor for each predictive layer. The compression ratio depends on both the input size and the number of subsampling operations. Taking SSD300 with a 900*900 input image as an example, the compression ratio of the first predictive layer (conv4_3) is 24, which results from resizing 900 to 300 and from three halving pooling operations. By analogy, the compression ratios of the higher layers are 48, 96, 192, 384 and 768. Our method then projects the bounding box regions onto the six predictive layers using the respective compression ratios and checks whether they have a suitable feature resolution (1*1~3*3). Note that the upper bound of this range depends on the kernel size (e.g., 3*3). Finally, we divide the median bounding box side length of each predictive layer by the original input size to obtain the scale factor.
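Algorithm 1 itself is not reproduced in this text, so the following Python sketch is only one possible reading of the procedure described above, not the authors' algorithm. It assumes that a bounding box is assigned to every layer on which its projected side length falls between 1 and 3 feature cells, and that a layer with no assigned boxes gets no scale factor. The function name, its parameters and the toy side lengths are hypothetical; the compression ratios (24 doubling per layer) and the input size of 900 follow the text.

```python
from statistics import median


def select_scale_factors(side_lengths, input_size=900, first_ratio=24,
                         num_layers=6, min_res=1.0, max_res=3.0):
    """Per-layer scale factor: median matched side length / input size.

    side_lengths are ground-truth box sides in original-image pixels.
    A box counts for a layer when its projected side (side / compression
    ratio) lies within [min_res, max_res] feature cells.
    """
    scales = []
    for k in range(num_layers):
        ratio = first_ratio * 2 ** k  # 24, 48, 96, 192, 384, 768
        matched = [s for s in side_lengths if min_res <= s / ratio <= max_res]
        scales.append(median(matched) / input_size if matched else None)
    return scales


# Toy side lengths (hypothetical, not taken from the paper's dataset).
toy_sides = [30, 48, 60, 90, 120, 200, 300, 470]
toy_scales = select_scale_factors(toy_sides)
```

On this toy input the first layer receives the boxes of 30, 48 and 60 pixels, so its scale factor is 48/900, while the last layer receives none. Under this reading, small boxes no longer force a single large default box size on the first layer, which is the balancing effect the text aims for.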

According to the experiments in Section 3, the proposed method is able to determine scale parameters for a given dataset, so that more predictive layers are well trained and the advantages of multi-scale feature representation are exploited.

Multi-Scale Training

The original SSD uses a fixed input resolution of 300*300 (or 500*500), so the single-scale training data is insufficient to train multi-scale detectors over multiple predictive layers. In addition, the data augmentation of the original SSD has the drawback that randomly sampled image patches can only be up-sampled, which may lead to insufficient training of the lower-level predictive layer detectors. Instead of fixing the input image size, we employ sliding windows of two sizes in the training phase to build a multi-scale training dataset. Following the principles for selecting sliding window hyper-parameters described above, we choose 900*900 and 1500*1500 sliding windows for the SSD500 training phase. The experiments below show that this improvement raises Average Precision (AP) by 1.5%.

Experiments

In this section, we qualitatively and quantitatively analyze the performance of Single Shot Detection (SSD) applied to aircraft detection in VHR remote sensing images, provide the experimental basis for our proposed methods, and finally show that the proposed detection framework is effective and feasible. Our experiments were implemented with the deep learning framework Caffe [20], running on a server with an Intel Xeon E CPU and an NVIDIA TITAN X (Maxwell architecture) GPU. The operating system was Ubuntu LTS.

Dataset

We tested our method on a collection of level-20 Google Earth VHR remote sensing images with 0.27 m ground sample distance (GSD), which means that one pixel in the image corresponds to 0.27 meters on the ground. There are 140 VHR remote sensing images, ranging in size from 3000*3000 to 10000*10000 pixels, acquired from 68 civil and military airports all over the world.
We manually labeled the aircraft using ENVI software. As shown in Figure 4, our dataset contains various kinds of aircraft, such as fighters, helicopters, bombers, transport planes, and AWACS (Airborne Warning And Control System) aircraft. To impartially validate the detection performance and generalization of the trained model, we divided the VHR remote sensing dataset into a training dataset (95 images) and a testing dataset (45 images). We then scanned the original large-size images with 900*900 and 1500*1500 sliding windows to generate plenty of training samples. The resulting sliding windows that contain aircraft objects were used as the training dataset. A total of 2612 sliding windows with 6856 aircraft objects, generated by a single 900*900 sliding window, were used as the testing dataset.

Figure 4. Various kinds of aircraft in our dataset.

SSD300 VS. Faster R-CNN. To demonstrate the advantages of multi-scale feature representation, we compare the detection performance of SSD300 and Faster R-CNN. Figure 5 shows the Precision-Recall curves of Faster R-CNN and the original SSD300. Both methods have comparable recall, but Faster R-CNN is weaker in precision. Combined with the analysis of Figure 6, we found that this may be caused by predictive confusion: samples of multiple scales are trained on a single-scale representation layer. For example, the first row of Figure 6 shows that Faster R-CNN confuses a whole aircraft with its tail; the model also confuses small airplanes with airplane-like background. Table 1 compares Faster R-CNN and SSD300: SSD outperforms Faster R-CNN in both accuracy and speed.

Figure 5. Precision-Recall curve of Faster R-CNN and SSD.

Figure 6. Detection results of Faster R-CNN and SSD300. Red, green and yellow represent high confidence predictions (> 0.5), low confidence predictions (< 0.5) and ground truth respectively.

Table 1. Detection performance comparison.

Method               Average Precision (AP)   FPS (Frames per Second)
Faster R-CNN         87.84%                   6
SSD300 (Original)    92.49%                   47
SSD300 (HPS)         93.24%                   47
SSD500 (Original)    94.71%                   20
SSD500 (HPS)         95.34%                   20
SSD500 (HPS+MST)     96.84%                   20

Input Resolution VS. Semantic Information. SSD leverages the inherent multi-scale feature representation of a CNN to enhance the detection of objects at multiple scales, especially small objects. This in-network feature hierarchy produces feature maps of different spatial resolutions, but introduces large semantic gaps [18] caused by the different depths. High-resolution maps carry low-level semantic information, which harms their representational capacity for object recognition. To fill this gap, the original authors advised enlarging the input resolution for small-object datasets such as MS COCO. However, few studies explain why SSD500 outperforms SSD300. In this experiment we investigate which factor dominates the improvement of the SSD method. We trained and tested SSD300 and SSD500 on training data of the same resolution (900*900); the only difference is their compression ratio, which is 1/3 for SSD300 and 5/9 for SSD500. The input resolution of SSD500 is therefore higher than that of SSD300. We calculated the overall Average Precision and the AP of each predictive layer to find the trend between the two SSD architectures. As shown in Figure 7, the overall AP increases from 92.49% to 94.71%, but enlarging the input resolution does not improve the performance of the low-layer detectors. Higher-level predictive detectors with richer semantic information play a significant role in the improvement. In contrast, detectors with high feature resolution but weak semantic information seem not to be robust enough for detecting challenging objects.
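The AP values in Table 1 and the per-layer AP used in this experiment summarize precision-recall curves. The text does not state which AP protocol was used, so the following Python sketch assumes one common choice, the area under the precision-recall curve after making precision monotonically non-increasing (all-point interpolation). The function name and the toy detections are hypothetical, and the matching of detections to ground truth is assumed to have been done beforehand.

```python
def average_precision(scores, is_true_positive, num_ground_truth):
    """Area under the interpolated precision-recall curve.

    scores: confidence of each detection.
    is_true_positive: whether each detection matched a ground-truth object.
    num_ground_truth: total number of ground-truth objects.
    """
    # Visit detections from most to least confident.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    recalls, precisions = [], []
    for i in order:
        if is_true_positive[i]:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_ground_truth)
        precisions.append(tp / (tp + fp))
    # Replace each precision by the best precision at any higher recall.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Sum rectangle areas between consecutive recall values.
    ap, prev_recall = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


# Toy example: four detections, four ground-truth objects, one false positive.
toy_ap = average_precision([0.9, 0.8, 0.7, 0.6], [True, False, True, True], 4)
```

In the toy example one object is never detected, so recall stops at 0.75 and the AP cannot reach 1 even though most detections are correct.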

Figure 7. P-R curve of SSD300 and SSD500.

The preliminary experiments above show that lower-layer detectors may not perform well because of the semantic gap. We conjecture that, provided the ground truth has enough feature resolution (at least 1*1) on the feature map, the higher the layer whose detector is trained and tested, the better the detection performance of SSD. To verify this hypothesis, we conduct the following experiment. We train an SSD300 model on low resolution images (900*900 compressed into 300*300), then test it on images of multiple resolutions (900, 600, 400) and observe the distribution of the number of predicted objects over the corresponding layers. As shown in Figure 8, the same fine-tuned SSD model is applied to three testing datasets of different resolution. The highest AP is obtained on the 600*600 resolution dataset. In particular, the share of predicted objects in layer 2 increases significantly from 27% (SSD300_900) to 41% (SSD300_600), which indicates that taking full advantage of higher predictive layers helps SSD improve its performance. With the input resolution increasing to 300*300, the AP decreases sharply.

Figure 8. Test SSD300 with multiple resolution.

Data Driven Hyper-parameter Selection. Inspired by the previous experiment, we propose a data driven method to fine-tune the hyper-parameters, including the aspect ratios and the scale factor of each predictive layer. The core idea of our method is to match ground truth bounding boxes evenly with default boxes from multiple layers through a set of fine-tuned hyper-parameters, which avoids the situation where too many ground truth boxes are matched only with the low-level predictive layer while the higher-level predictive layers lack training samples. Compared with a mapping strategy that uses empirical hyper-parameters, our method trains the higher-level predictive detectors sufficiently with more matched samples. Figure 9 compares the detection performance of the original SSD300 and SSD300 with optimized hyper-parameters. Figure 9 (a) and (b) show histograms of the ratio of predicted objects over the six predictive layers. SSD300 with optimized hyper-parameters clearly uses the predictive layers in a more balanced way to detect objects. Meanwhile, the Precision-Recall curve demonstrates that our method works effectively, raising AP by about 1% (from 92.49% to 93.24% in Table 1).

Figure 9. Comparison of the original SSD with Hyper-parameter Selection SSD.

Multi-scale Training. In experiment 2, we demonstrated that low-level feature representation may not be robust enough to detect objects over a wide range of scales, and that feature representation with more semantic information contributes most of the detection improvement. In experiment 3, we proposed a data driven method to fine-tune the scale and aspect ratio hyper-parameters, which improves the SSD mapping strategy. In fact, the proposed method makes a trade-off: it forces SSD to detect objects with relatively higher predictive layers, which have richer semantic information but lower feature spatial resolution. To strengthen the training of low-level detectors, multi-scale training is therefore necessary. We use training samples of two resolutions (900*900 and 1500*1500) and compress them to 500*500 to train SSD500. In the test phase, only 900*900 patches are used to detect aircraft. Figure 10 shows the resulting aircraft detection performance: AP rises to 96.84%, and both recall and precision stay at a very high level.

Figure 10. Comparison among SSD500_original, SSD500_Hyper-parameter Selection and SSD500_Hyper-parameter Selection+Multi-scale Training.

Summary

In this paper, we explore how to apply the state-of-the-art SSD detection method to aircraft detection in VHR remote sensing images. We find that there is a semantic gap in the multi-scale feature representation, and propose a data driven method that trains multiple predictive layers in a more balanced and sufficient way. In addition, we employ multi-scale training to enhance the low-level predictive layers. Finally, we demonstrate that our aircraft detection framework using CNN multi-scale feature representation is accurate and efficient. Inspired by the newest detection methods that employ multi-scale feature representation [18][21], we will focus on feature fusion of multi-scale layers to solve the semantic gap problem in future research.

Acknowledgements

We acknowledge support by the National Natural Science Foundation of China under Grants of U and

References

[1] G. Liu, X. Sun, K. Fu, and H. Wang, Aircraft recognition in high-resolution satellite images using coarse-to-fine shape prior[J], Geoscience and Remote Sensing Letters, IEEE, vol. 10, no. 3, pp ,
[2] Polat E, Yildiz C. Stationary Aircraft Detection from Satellite Images[J]. IU-Journal of Electrical & Electronics Engineering, 2012, 12(2):
[3] Bengio Y, Courville A, Vincent P. Representation learning: A review and new perspectives[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013, 35(8):
[4] Girshick R, Donahue J, Darrell T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2014:
[5] Redmon J, Divvala S, Girshick R, et al. You only look once: Unified, real-time object detection[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016:
[6] Liu W, Anguelov D, Erhan D, et al. SSD: Single shot multibox detector[C]// European Conference on Computer Vision. Springer International Publishing, 2016:
[7] Uijlings J R R, Van De Sande K E A, Gevers T, et al. Selective search for object recognition[J]. International Journal of Computer Vision, 2013, 104(2):
[8] Girshick R. Fast R-CNN[C]// Proceedings of the IEEE International Conference on Computer Vision. 2015:
[9] Ren S, He K, Girshick R, et al. Faster R-CNN: Towards real-time object detection with region proposal networks[C]// Advances in Neural Information Processing Systems. 2015:
[10] Zhang F, Du B, Zhang L, et al. Weakly Supervised Learning Based on Coupled Convolutional Neural Networks for Aircraft Detection[J]. IEEE Transactions on Geoscience and Remote Sensing, 2016, 54(9):
[11] Zhang Q, Wang Y, Liu Q, et al. CNN based suburban building detection using monocular high resolution Google Earth images[C]// Geoscience and Remote Sensing Symposium (IGARSS), 2016 IEEE International. IEEE, 2016:
[12] Hariharan B, Arbeláez P, Girshick R, et al. Hypercolumns for object segmentation and fine-grained localization[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015:
[13] Kong T, Yao A, Chen Y, et al. HyperNet: Towards accurate region proposal generation and joint object detection[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016:
[14] Bell S, Lawrence Zitnick C, Bala K, et al. Inside-outside net: Detecting objects in context with skip pooling and recurrent neural networks[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016:
[15] Yang F, Choi W, Lin Y. Exploit all the layers: Fast and accurate CNN object detector with scale dependent pooling and cascaded rejection classifiers[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016:
[16] Cai Z, Fan Q, Feris R S, et al. A unified multi-scale deep convolutional neural network for fast object detection[C]// European Conference on Computer Vision. Springer International Publishing, 2016:
[17] Tang T, Zhou S, Deng Z, et al. Vehicle Detection in Aerial Images Based on Region Convolutional Neural Networks and Hard Negative Example Mining[J]. Sensors, 2017, 17(2): 336.
[18] Lin T Y, Dollár P, Girshick R, et al. Feature Pyramid Networks for Object Detection[J]. arXiv preprint arXiv: ,
[19] Li Y, He K, Sun J. R-FCN: Object detection via region-based fully convolutional networks[C]// Advances in Neural Information Processing Systems. 2016:
[20] Jia Y, Shelhamer E, Donahue J, et al. Caffe: Convolutional architecture for fast feature embedding[C]// Proceedings of the 22nd ACM International Conference on Multimedia. ACM, 2014:
[21] Shrivastava A, Sukthankar R, Malik J, et al. Beyond Skip Connections: Top-Down Modulation for Object Detection[J]. arXiv preprint arXiv: ,

Object detection with CNNs

Object detection with CNNs Object detection with CNNs 80% PASCAL VOC mean0average0precision0(map) 70% 60% 50% 40% 30% 20% 10% Before CNNs After CNNs 0% 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 year Region proposals

More information

Deep learning for object detection. Slides from Svetlana Lazebnik and many others

Deep learning for object detection. Slides from Svetlana Lazebnik and many others Deep learning for object detection Slides from Svetlana Lazebnik and many others Recent developments in object detection 80% PASCAL VOC mean0average0precision0(map) 70% 60% 50% 40% 30% 20% 10% Before deep

More information

Object Detection Based on Deep Learning

Object Detection Based on Deep Learning Object Detection Based on Deep Learning Yurii Pashchenko AI Ukraine 2016, Kharkiv, 2016 Image classification (mostly what you ve seen) http://tutorial.caffe.berkeleyvision.org/caffe-cvpr15-detection.pdf

More information

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun Presented by Tushar Bansal Objective 1. Get bounding box for all objects

More information

Extend the shallow part of Single Shot MultiBox Detector via Convolutional Neural Network

Extend the shallow part of Single Shot MultiBox Detector via Convolutional Neural Network Extend the shallow part of Single Shot MultiBox Detector via Convolutional Neural Network Liwen Zheng, Canmiao Fu, Yong Zhao * School of Electronic and Computer Engineering, Shenzhen Graduate School of

More information

arxiv: v1 [cs.cv] 31 Mar 2016

arxiv: v1 [cs.cv] 31 Mar 2016 Object Boundary Guided Semantic Segmentation Qin Huang, Chunyang Xia, Wenchao Zheng, Yuhang Song, Hao Xu and C.-C. Jay Kuo arxiv:1603.09742v1 [cs.cv] 31 Mar 2016 University of Southern California Abstract.

More information

SSD: Single Shot MultiBox Detector. Author: Wei Liu et al. Presenter: Siyu Jiang

SSD: Single Shot MultiBox Detector. Author: Wei Liu et al. Presenter: Siyu Jiang SSD: Single Shot MultiBox Detector Author: Wei Liu et al. Presenter: Siyu Jiang Outline 1. Motivations 2. Contributions 3. Methodology 4. Experiments 5. Conclusions 6. Extensions Motivation Motivation

More information

Lecture 5: Object Detection

Lecture 5: Object Detection Object Detection CSED703R: Deep Learning for Visual Recognition (2017F) Lecture 5: Object Detection Bohyung Han Computer Vision Lab. bhhan@postech.ac.kr 2 Traditional Object Detection Algorithms Region-based

More information

Real-time Object Detection CS 229 Course Project

Real-time Object Detection CS 229 Course Project Real-time Object Detection CS 229 Course Project Zibo Gong 1, Tianchang He 1, and Ziyi Yang 1 1 Department of Electrical Engineering, Stanford University December 17, 2016 Abstract Objection detection

More information

Direct Multi-Scale Dual-Stream Network for Pedestrian Detection Sang-Il Jung and Ki-Sang Hong Image Information Processing Lab.

Direct Multi-Scale Dual-Stream Network for Pedestrian Detection Sang-Il Jung and Ki-Sang Hong Image Information Processing Lab. [ICIP 2017] Direct Multi-Scale Dual-Stream Network for Pedestrian Detection Sang-Il Jung and Ki-Sang Hong Image Information Processing Lab., POSTECH Pedestrian Detection Goal To draw bounding boxes that

More information

Feature-Fused SSD: Fast Detection for Small Objects

Feature-Fused SSD: Fast Detection for Small Objects Feature-Fused SSD: Fast Detection for Small Objects Guimei Cao, Xuemei Xie, Wenzhe Yang, Quan Liao, Guangming Shi, Jinjian Wu School of Electronic Engineering, Xidian University, China xmxie@mail.xidian.edu.cn

More information

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks Shaoqing Ren Kaiming He Ross Girshick Jian Sun Present by: Yixin Yang Mingdong Wang 1 Object Detection 2 1 Applications Basic

More information

TRANSPARENT OBJECT DETECTION USING REGIONS WITH CONVOLUTIONAL NEURAL NETWORK

TRANSPARENT OBJECT DETECTION USING REGIONS WITH CONVOLUTIONAL NEURAL NETWORK TRANSPARENT OBJECT DETECTION USING REGIONS WITH CONVOLUTIONAL NEURAL NETWORK 1 Po-Jen Lai ( 賴柏任 ), 2 Chiou-Shann Fuh ( 傅楸善 ) 1 Dept. of Electrical Engineering, National Taiwan University, Taiwan 2 Dept.

More information

MULTI-SCALE OBJECT DETECTION WITH FEATURE FUSION AND REGION OBJECTNESS NETWORK. Wenjie Guan, YueXian Zou*, Xiaoqun Zhou

MULTI-SCALE OBJECT DETECTION WITH FEATURE FUSION AND REGION OBJECTNESS NETWORK. Wenjie Guan, YueXian Zou*, Xiaoqun Zhou MULTI-SCALE OBJECT DETECTION WITH FEATURE FUSION AND REGION OBJECTNESS NETWORK Wenjie Guan, YueXian Zou*, Xiaoqun Zhou ADSPLAB/Intelligent Lab, School of ECE, Peking University, Shenzhen,518055, China

More information

Efficient Segmentation-Aided Text Detection For Intelligent Robots

Efficient Segmentation-Aided Text Detection For Intelligent Robots Efficient Segmentation-Aided Text Detection For Intelligent Robots Junting Zhang, Yuewei Na, Siyang Li, C.-C. Jay Kuo University of Southern California Outline Problem Definition and Motivation Related

More information

REGION AVERAGE POOLING FOR CONTEXT-AWARE OBJECT DETECTION

REGION AVERAGE POOLING FOR CONTEXT-AWARE OBJECT DETECTION REGION AVERAGE POOLING FOR CONTEXT-AWARE OBJECT DETECTION Kingsley Kuan 1, Gaurav Manek 1, Jie Lin 1, Yuan Fang 1, Vijay Chandrasekhar 1,2 Institute for Infocomm Research, A*STAR, Singapore 1 Nanyang Technological

More information

Mask R-CNN. presented by Jiageng Zhang, Jingyao Zhan, Yunhan Ma

Mask R-CNN. presented by Jiageng Zhang, Jingyao Zhan, Yunhan Ma Mask R-CNN presented by Jiageng Zhang, Jingyao Zhan, Yunhan Ma Mask R-CNN Background Related Work Architecture Experiment Mask R-CNN Background Related Work Architecture Experiment Background From left

More information

YOLO9000: Better, Faster, Stronger

YOLO9000: Better, Faster, Stronger YOLO9000: Better, Faster, Stronger Date: January 24, 2018 Prepared by Haris Khan (University of Toronto) Haris Khan CSC2548: Machine Learning in Computer Vision 1 Overview 1. Motivation for one-shot object

More information

Deep Tracking: Biologically Inspired Tracking with Deep Convolutional Networks

Deep Tracking: Biologically Inspired Tracking with Deep Convolutional Networks Deep Tracking: Biologically Inspired Tracking with Deep Convolutional Networks Si Chen The George Washington University sichen@gwmail.gwu.edu Meera Hahn Emory University mhahn7@emory.edu Mentor: Afshin

More information

A FRAMEWORK OF EXTRACTING MULTI-SCALE FEATURES USING MULTIPLE CONVOLUTIONAL NEURAL NETWORKS. Kuan-Chuan Peng and Tsuhan Chen

A FRAMEWORK OF EXTRACTING MULTI-SCALE FEATURES USING MULTIPLE CONVOLUTIONAL NEURAL NETWORKS. Kuan-Chuan Peng and Tsuhan Chen A FRAMEWORK OF EXTRACTING MULTI-SCALE FEATURES USING MULTIPLE CONVOLUTIONAL NEURAL NETWORKS Kuan-Chuan Peng and Tsuhan Chen School of Electrical and Computer Engineering, Cornell University, Ithaca, NY

More information

Finding Tiny Faces Supplementary Materials

Finding Tiny Faces Supplementary Materials Finding Tiny Faces Supplementary Materials Peiyun Hu, Deva Ramanan Robotics Institute Carnegie Mellon University {peiyunh,deva}@cs.cmu.edu 1. Error analysis Quantitative analysis We plot the distribution

More information

Unified, real-time object detection

Unified, real-time object detection Unified, real-time object detection Final Project Report, Group 02, 8 Nov 2016 Akshat Agarwal (13068), Siddharth Tanwar (13699) CS698N: Recent Advances in Computer Vision, Jul Nov 2016 Instructor: Gaurav

More information

3 Object Detection. BVM 2018 Tutorial: Advanced Deep Learning Methods. Paul F. Jaeger, Division of Medical Image Computing

3 Object Detection. BVM 2018 Tutorial: Advanced Deep Learning Methods. Paul F. Jaeger, Division of Medical Image Computing 3 Object Detection BVM 2018 Tutorial: Advanced Deep Learning Methods Paul F. Jaeger, of Medical Image Computing What is object detection? classification segmentation obj. detection (1 label per pixel)

More information

R-FCN++: Towards Accurate Region-Based Fully Convolutional Networks for Object Detection

R-FCN++: Towards Accurate Region-Based Fully Convolutional Networks for Object Detection The Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18) R-FCN++: Towards Accurate Region-Based Fully Convolutional Networks for Object Detection Zeming Li, 1 Yilun Chen, 2 Gang Yu, 2 Yangdong

More information

arxiv: v1 [cs.cv] 15 Oct 2018

arxiv: v1 [cs.cv] 15 Oct 2018 Instance Segmentation and Object Detection with Bounding Shape Masks Ha Young Kim 1,2,*, Ba Rom Kang 2 1 Department of Financial Engineering, Ajou University Worldcupro 206, Yeongtong-gu, Suwon, 16499,

More information

Fully Convolutional Networks for Semantic Segmentation

Fully Convolutional Networks for Semantic Segmentation Fully Convolutional Networks for Semantic Segmentation Jonathan Long* Evan Shelhamer* Trevor Darrell UC Berkeley Chaim Ginzburg for Deep Learning seminar 1 Semantic Segmentation Define a pixel-wise labeling

More information

arxiv: v1 [cs.cv] 26 Jun 2017

arxiv: v1 [cs.cv] 26 Jun 2017 Detecting Small Signs from Large Images arxiv:1706.08574v1 [cs.cv] 26 Jun 2017 Zibo Meng, Xiaochuan Fan, Xin Chen, Min Chen and Yan Tong Computer Science and Engineering University of South Carolina, Columbia,

More information

Spatial Localization and Detection. Lecture 8-1

Spatial Localization and Detection. Lecture 8-1 Lecture 8: Spatial Localization and Detection Lecture 8-1 Administrative - Project Proposals were due on Saturday Homework 2 due Friday 2/5 Homework 1 grades out this week Midterm will be in-class on Wednesday

More information

Rich feature hierarchies for accurate object detection and semantic segmentation

Rich feature hierarchies for accurate object detection and semantic segmentation Rich feature hierarchies for accurate object detection and semantic segmentation BY; ROSS GIRSHICK, JEFF DONAHUE, TREVOR DARRELL AND JITENDRA MALIK PRESENTER; MUHAMMAD OSAMA Object detection vs. classification

More information

Object detection using Region Proposals (RCNN) Ernest Cheung COMP Presentation

Object detection using Region Proposals (RCNN) Ernest Cheung COMP Presentation Object detection using Region Proposals (RCNN) Ernest Cheung COMP790-125 Presentation 1 2 Problem to solve Object detection Input: Image Output: Bounding box of the object 3 Object detection using CNN

More information

Yiqi Yan. May 10, 2017

Yiqi Yan. May 10, 2017 Yiqi Yan May 10, 2017 P a r t I F u n d a m e n t a l B a c k g r o u n d s Convolution Single Filter Multiple Filters 3 Convolution: case study, 2 filters 4 Convolution: receptive field receptive field

More information

Multi-Glance Attention Models For Image Classification

Multi-Glance Attention Models For Image Classification Multi-Glance Attention Models For Image Classification Chinmay Duvedi Stanford University Stanford, CA cduvedi@stanford.edu Pararth Shah Stanford University Stanford, CA pararth@stanford.edu Abstract We

More information

Automatic detection of books based on Faster R-CNN

Automatic detection of books based on Faster R-CNN Automatic detection of books based on Faster R-CNN Beibei Zhu, Xiaoyu Wu, Lei Yang, Yinghua Shen School of Information Engineering, Communication University of China Beijing, China e-mail: zhubeibei@cuc.edu.cn,

More information

Object Detection. CS698N Final Project Presentation AKSHAT AGARWAL SIDDHARTH TANWAR

Object Detection. CS698N Final Project Presentation AKSHAT AGARWAL SIDDHARTH TANWAR Object Detection CS698N Final Project Presentation AKSHAT AGARWAL SIDDHARTH TANWAR Problem Description Arguably the most important part of perception Long term goals for object recognition: Generalization

More information

Object Detection on Self-Driving Cars in China. Lingyun Li

Object Detection on Self-Driving Cars in China. Lingyun Li Object Detection on Self-Driving Cars in China Lingyun Li Introduction Motivation: Perception is the key of self-driving cars Data set: 10000 images with annotation 2000 images without annotation (not

More information

Recognize Complex Events from Static Images by Fusing Deep Channels Supplementary Materials

Recognize Complex Events from Static Images by Fusing Deep Channels Supplementary Materials Recognize Complex Events from Static Images by Fusing Deep Channels Supplementary Materials Yuanjun Xiong 1 Kai Zhu 1 Dahua Lin 1 Xiaoou Tang 1,2 1 Department of Information Engineering, The Chinese University

More information

Content-Based Image Recovery

Content-Based Image Recovery Content-Based Image Recovery Hong-Yu Zhou and Jianxin Wu National Key Laboratory for Novel Software Technology Nanjing University, China zhouhy@lamda.nju.edu.cn wujx2001@nju.edu.cn Abstract. We propose

More information

JOINT DETECTION AND SEGMENTATION WITH DEEP HIERARCHICAL NETWORKS. Zhao Chen Machine Learning Intern, NVIDIA

JOINT DETECTION AND SEGMENTATION WITH DEEP HIERARCHICAL NETWORKS. Zhao Chen Machine Learning Intern, NVIDIA JOINT DETECTION AND SEGMENTATION WITH DEEP HIERARCHICAL NETWORKS Zhao Chen Machine Learning Intern, NVIDIA ABOUT ME 5th year PhD student in physics @ Stanford by day, deep learning computer vision scientist

More information

[Supplementary Material] Improving Occlusion and Hard Negative Handling for Single-Stage Pedestrian Detectors

[Supplementary Material] Improving Occlusion and Hard Negative Handling for Single-Stage Pedestrian Detectors [Supplementary Material] Improving Occlusion and Hard Negative Handling for Single-Stage Pedestrian Detectors Junhyug Noh Soochan Lee Beomsu Kim Gunhee Kim Department of Computer Science and Engineering

More information

Learning to Match. Jun Xu, Zhengdong Lu, Tianqi Chen, Hang Li

Learning to Match. Jun Xu, Zhengdong Lu, Tianqi Chen, Hang Li Learning to Match Jun Xu, Zhengdong Lu, Tianqi Chen, Hang Li 1. Introduction The main tasks in many applications can be formalized as matching between heterogeneous objects, including search, recommendation,

More information

Regionlet Object Detector with Hand-crafted and CNN Feature

Regionlet Object Detector with Hand-crafted and CNN Feature Regionlet Object Detector with Hand-crafted and CNN Feature Xiaoyu Wang Research Xiaoyu Wang Research Ming Yang Horizon Robotics Shenghuo Zhu Alibaba Group Yuanqing Lin Baidu Overview of this section Regionlet

More information

Pedestrian Detection based on Deep Fusion Network using Feature Correlation

Pedestrian Detection based on Deep Fusion Network using Feature Correlation Pedestrian Detection based on Deep Fusion Network using Feature Correlation Yongwoo Lee, Toan Duc Bui and Jitae Shin School of Electronic and Electrical Engineering, Sungkyunkwan University, Suwon, South

More information

Deep Learning in Visual Recognition. Thanks Da Zhang for the slides

Deep Learning in Visual Recognition. Thanks Da Zhang for the slides Deep Learning in Visual Recognition Thanks Da Zhang for the slides Deep Learning is Everywhere 2 Roadmap Introduction Convolutional Neural Network Application Image Classification Object Detection Object

More information

Channel Locality Block: A Variant of Squeeze-and-Excitation

Channel Locality Block: A Variant of Squeeze-and-Excitation Channel Locality Block: A Variant of Squeeze-and-Excitation 1 st Huayu Li Northern Arizona University Flagstaff, United State Northern Arizona University hl459@nau.edu arxiv:1901.01493v1 [cs.lg] 6 Jan

More information

Volume 6, Issue 12, December 2018 International Journal of Advance Research in Computer Science and Management Studies

Volume 6, Issue 12, December 2018 International Journal of Advance Research in Computer Science and Management Studies ISSN: 2321-7782 (Online) e-isjn: A4372-3114 Impact Factor: 7.327 Volume 6, Issue 12, December 2018 International Journal of Advance Research in Computer Science and Management Studies Research Article

More information

Industrial Technology Research Institute, Hsinchu, Taiwan, R.O.C ǂ

Industrial Technology Research Institute, Hsinchu, Taiwan, R.O.C ǂ Stop Line Detection and Distance Measurement for Road Intersection based on Deep Learning Neural Network Guan-Ting Lin 1, Patrisia Sherryl Santoso *1, Che-Tsung Lin *ǂ, Chia-Chi Tsai and Jiun-In Guo National

More information

Learning and Inferring Depth from Monocular Images. Jiyan Pan April 1, 2009

Learning and Inferring Depth from Monocular Images. Jiyan Pan April 1, 2009 Learning and Inferring Depth from Monocular Images Jiyan Pan April 1, 2009 Traditional ways of inferring depth Binocular disparity Structure from motion Defocus Given a single monocular image, how to infer

More information

Video annotation based on adaptive annular spatial partition scheme

Video annotation based on adaptive annular spatial partition scheme Video annotation based on adaptive annular spatial partition scheme Guiguang Ding a), Lu Zhang, and Xiaoxu Li Key Laboratory for Information System Security, Ministry of Education, Tsinghua National Laboratory

More information

RON: Reverse Connection with Objectness Prior Networks for Object Detection

RON: Reverse Connection with Objectness Prior Networks for Object Detection RON: Reverse Connection with Objectness Prior Networks for Object Detection Tao Kong 1, Fuchun Sun 1, Anbang Yao 2, Huaping Liu 1, Ming Lu 3, Yurong Chen 2 1 Department of CST, Tsinghua University, 2 Intel

More information

Kaggle Data Science Bowl 2017 Technical Report

Kaggle Data Science Bowl 2017 Technical Report Kaggle Data Science Bowl 2017 Technical Report qfpxfd Team May 11, 2017 1 Team Members Table 1: Team members Name E-Mail University Jia Ding dingjia@pku.edu.cn Peking University, Beijing, China Aoxue Li

More information

Disguised Face Identification (DFI) with Facial KeyPoints using Spatial Fusion Convolutional Network. Nathan Sun CIS601

Disguised Face Identification (DFI) with Facial KeyPoints using Spatial Fusion Convolutional Network. Nathan Sun CIS601 Disguised Face Identification (DFI) with Facial KeyPoints using Spatial Fusion Convolutional Network Nathan Sun CIS601 Introduction Face ID is complicated by alterations to an individual s appearance Beard,

More information

arxiv: v1 [cs.cv] 18 Jun 2017

arxiv: v1 [cs.cv] 18 Jun 2017 Using Deep Networks for Drone Detection arxiv:1706.05726v1 [cs.cv] 18 Jun 2017 Cemal Aker, Sinan Kalkan KOVAN Research Lab. Computer Engineering, Middle East Technical University Ankara, Turkey Abstract

More information

An Object Detection Algorithm based on Deformable Part Models with Bing Features Chunwei Li1, a and Youjun Bu1, b

An Object Detection Algorithm based on Deformable Part Models with Bing Features Chunwei Li1, a and Youjun Bu1, b 5th International Conference on Advanced Materials and Computer Science (ICAMCS 2016) An Object Detection Algorithm based on Deformable Part Models with Bing Features Chunwei Li1, a and Youjun Bu1, b 1

More information

Semantic Segmentation

Semantic Segmentation Semantic Segmentation UCLA:https://goo.gl/images/I0VTi2 OUTLINE Semantic Segmentation Why? Paper to talk about: Fully Convolutional Networks for Semantic Segmentation. J. Long, E. Shelhamer, and T. Darrell,

More information

Deep Learning for Object detection & localization

Deep Learning for Object detection & localization Deep Learning for Object detection & localization RCNN, Fast RCNN, Faster RCNN, YOLO, GAP, CAM, MSROI Aaditya Prakash Sep 25, 2018 Image classification Image classification Whole of image is classified

More information

CS6501: Deep Learning for Visual Recognition. Object Detection I: RCNN, Fast-RCNN, Faster-RCNN

CS6501: Deep Learning for Visual Recognition. Object Detection I: RCNN, Fast-RCNN, Faster-RCNN CS6501: Deep Learning for Visual Recognition Object Detection I: RCNN, Fast-RCNN, Faster-RCNN Today s Class Object Detection The RCNN Object Detector (2014) The Fast RCNN Object Detector (2015) The Faster

More information

Object Detection. Part1. Presenter: Dae-Yong

Object Detection. Part1. Presenter: Dae-Yong Object Part1 Presenter: Dae-Yong Contents 1. What is an Object? 2. Traditional Object Detector 3. Deep Learning-based Object Detector What is an Object? Subset of Object Recognition What is an Object?

More information

Presented at the FIG Congress 2018, May 6-11, 2018 in Istanbul, Turkey

Presented at the FIG Congress 2018, May 6-11, 2018 in Istanbul, Turkey Presented at the FIG Congress 2018, May 6-11, 2018 in Istanbul, Turkey Evangelos MALTEZOS, Charalabos IOANNIDIS, Anastasios DOULAMIS and Nikolaos DOULAMIS Laboratory of Photogrammetry, School of Rural

More information

End-to-End Airplane Detection Using Transfer Learning in Remote Sensing Images

End-to-End Airplane Detection Using Transfer Learning in Remote Sensing Images remote sensing Article End-to-End Airplane Detection Using Transfer Learning in Remote Sensing Images Zhong Chen 1,2,3, Ting Zhang 1,2,3 and Chao Ouyang 1,2,3, * 1 School of Automation, Huazhong University

More information

Yield Estimation using faster R-CNN

Yield Estimation using faster R-CNN Yield Estimation using faster R-CNN 1 Vidhya Sagar, 2 Sailesh J.Jain and 2 Arjun P. 1 Assistant Professor, 2 UG Scholar, Department of Computer Engineering and Science SRM Institute of Science and Technology,Chennai,

More information

Deep learning for dense per-pixel prediction. Chunhua Shen The University of Adelaide, Australia

Deep learning for dense per-pixel prediction. Chunhua Shen The University of Adelaide, Australia Deep learning for dense per-pixel prediction Chunhua Shen The University of Adelaide, Australia Image understanding Classification error Convolution Neural Networks 0.3 0.2 0.1 Image Classification [Krizhevsky

More information

Classifying a specific image region using convolutional nets with an ROI mask as input

Classifying a specific image region using convolutional nets with an ROI mask as input Classifying a specific image region using convolutional nets with an ROI mask as input 1 Sagi Eppel Abstract Convolutional neural nets (CNN) are the leading computer vision method for classifying images.

More information

SSD: Single Shot MultiBox Detector

SSD: Single Shot MultiBox Detector SSD: Single Shot MultiBox Detector Wei Liu 1(B), Dragomir Anguelov 2, Dumitru Erhan 3, Christian Szegedy 3, Scott Reed 4, Cheng-Yang Fu 1, and Alexander C. Berg 1 1 UNC Chapel Hill, Chapel Hill, USA {wliu,cyfu,aberg}@cs.unc.edu

More information

Martian lava field, NASA, Wikipedia

Martian lava field, NASA, Wikipedia Martian lava field, NASA, Wikipedia Old Man of the Mountain, Franconia, New Hampshire Pareidolia http://smrt.ccel.ca/203/2/6/pareidolia/ Reddit for more : ) https://www.reddit.com/r/pareidolia/top/ Pareidolia

More information

Towards Real-Time Automatic Number Plate. Detection: Dots in the Search Space

Towards Real-Time Automatic Number Plate. Detection: Dots in the Search Space Towards Real-Time Automatic Number Plate Detection: Dots in the Search Space Chi Zhang Department of Computer Science and Technology, Zhejiang University wellyzhangc@zju.edu.cn Abstract Automatic Number

More information

Cascade Region Regression for Robust Object Detection

Cascade Region Regression for Robust Object Detection Large Scale Visual Recognition Challenge 2015 (ILSVRC2015) Cascade Region Regression for Robust Object Detection Jiankang Deng, Shaoli Huang, Jing Yang, Hui Shuai, Zhengbo Yu, Zongguang Lu, Qiang Ma, Yali

More information

Encoder-Decoder Networks for Semantic Segmentation. Sachin Mehta

Encoder-Decoder Networks for Semantic Segmentation. Sachin Mehta Encoder-Decoder Networks for Semantic Segmentation Sachin Mehta Outline > Overview of Semantic Segmentation > Encoder-Decoder Networks > Results What is Semantic Segmentation? Input: RGB Image Output:

More information

Computer Vision Lecture 16

Computer Vision Lecture 16 Computer Vision Lecture 16 Deep Learning for Object Categorization 14.01.2016 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de leibe@vision.rwth-aachen.de Announcements Seminar registration period

More information

Object Detection. TA : Young-geun Kim. Biostatistics Lab., Seoul National University. March-June, 2018

Object Detection. TA : Young-geun Kim. Biostatistics Lab., Seoul National University. March-June, 2018 Object Detection TA : Young-geun Kim Biostatistics Lab., Seoul National University March-June, 2018 Seoul National University Deep Learning March-June, 2018 1 / 57 Index 1 Introduction 2 R-CNN 3 YOLO 4

More information

An algorithm for highway vehicle detection based on convolutional neural network

An algorithm for highway vehicle detection based on convolutional neural network Chen et al. EURASIP Journal on Image and Video Processing (2018) 2018:109 https://doi.org/10.1186/s13640-018-0350-2 EURASIP Journal on Image and Video Processing RESEARCH An algorithm for highway vehicle

More information

CIS680: Vision & Learning Assignment 2.b: RPN, Faster R-CNN and Mask R-CNN Due: Nov. 21, 2018 at 11:59 pm

CIS680: Vision & Learning Assignment 2.b: RPN, Faster R-CNN and Mask R-CNN Due: Nov. 21, 2018 at 11:59 pm CIS680: Vision & Learning Assignment 2.b: RPN, Faster R-CNN and Mask R-CNN Due: Nov. 21, 2018 at 11:59 pm Instructions This is an individual assignment. Individual means each student must hand in their

More information

arxiv: v1 [cs.cv] 30 Apr 2018

arxiv: v1 [cs.cv] 30 Apr 2018 An Anti-fraud System for Car Insurance Claim Based on Visual Evidence Pei Li Univeristy of Notre Dame BingYu Shen University of Notre dame Weishan Dong IBM Research China arxiv:184.1127v1 [cs.cv] 3 Apr

More information

Mask R-CNN. Kaiming He, Georgia, Gkioxari, Piotr Dollar, Ross Girshick Presenters: Xiaokang Wang, Mengyao Shi Feb. 13, 2018

Mask R-CNN. Kaiming He, Georgia, Gkioxari, Piotr Dollar, Ross Girshick Presenters: Xiaokang Wang, Mengyao Shi Feb. 13, 2018 Mask R-CNN Kaiming He, Georgia, Gkioxari, Piotr Dollar, Ross Girshick Presenters: Xiaokang Wang, Mengyao Shi Feb. 13, 2018 1 Common computer vision tasks Image Classification: one label is generated for

More information

Hand Detection For Grab-and-Go Groceries

Hand Detection For Grab-and-Go Groceries Hand Detection For Grab-and-Go Groceries Xianlei Qiu Stanford University xianlei@stanford.edu Shuying Zhang Stanford University shuyingz@stanford.edu Abstract Hands detection system is a very critical

More information

Robust Object detection for tiny and dense targets in VHR Aerial Images

Robust Object detection for tiny and dense targets in VHR Aerial Images Robust Object detection for tiny and dense targets in VHR Aerial Images Haining Xie 1,Tian Wang 1,Meina Qiao 1,Mengyi Zhang 2,Guangcun Shan 1,Hichem Snoussi 3 1 School of Automation Science and Electrical

More information

Time Stamp Detection and Recognition in Video Frames

Time Stamp Detection and Recognition in Video Frames Time Stamp Detection and Recognition in Video Frames Nongluk Covavisaruch and Chetsada Saengpanit Department of Computer Engineering, Chulalongkorn University, Bangkok 10330, Thailand E-mail: nongluk.c@chula.ac.th

More information

Visual features detection based on deep neural network in autonomous driving tasks

Visual features detection based on deep neural network in autonomous driving tasks 430 Fomin I., Gromoshinskii D., Stepanov D. Visual features detection based on deep neural network in autonomous driving tasks Ivan Fomin, Dmitrii Gromoshinskii, Dmitry Stepanov Computer vision lab Russian

More information

Photo OCR ( )

Photo OCR ( ) Photo OCR (2017-2018) Xiang Bai Huazhong University of Science and Technology Outline VALSE2018, DaLian Xiang Bai 2 Deep Direct Regression for Multi-Oriented Scene Text Detection [He et al., ICCV, 2017.]

More information

CEA LIST s participation to the Scalable Concept Image Annotation task of ImageCLEF 2015

CEA LIST s participation to the Scalable Concept Image Annotation task of ImageCLEF 2015 CEA LIST s participation to the Scalable Concept Image Annotation task of ImageCLEF 2015 Etienne Gadeski, Hervé Le Borgne, and Adrian Popescu CEA, LIST, Laboratory of Vision and Content Engineering, France

More information

Deformable Part Models

Deformable Part Models CS 1674: Intro to Computer Vision Deformable Part Models Prof. Adriana Kovashka University of Pittsburgh November 9, 2016 Today: Object category detection Window-based approaches: Last time: Viola-Jones

More information

HENet: A Highly Efficient Convolutional Neural. Networks Optimized for Accuracy, Speed and Storage

HENet: A Highly Efficient Convolutional Neural. Networks Optimized for Accuracy, Speed and Storage HENet: A Highly Efficient Convolutional Neural Networks Optimized for Accuracy, Speed and Storage Qiuyu Zhu Shanghai University zhuqiuyu@staff.shu.edu.cn Ruixin Zhang Shanghai University chriszhang96@shu.edu.cn

More information

arxiv: v1 [cs.cv] 26 May 2017

arxiv: v1 [cs.cv] 26 May 2017 arxiv:1705.09587v1 [cs.cv] 26 May 2017 J. JEONG, H. PARK AND N. KWAK: UNDER REVIEW IN BMVC 2017 1 Enhancement of SSD by concatenating feature maps for object detection Jisoo Jeong soo3553@snu.ac.kr Hyojin

More information

Supplementary Material: Pixelwise Instance Segmentation with a Dynamically Instantiated Network

Supplementary Material: Pixelwise Instance Segmentation with a Dynamically Instantiated Network Supplementary Material: Pixelwise Instance Segmentation with a Dynamically Instantiated Network Anurag Arnab and Philip H.S. Torr University of Oxford {anurag.arnab, philip.torr}@eng.ox.ac.uk 1. Introduction

More information

Fine-tuning Pre-trained Large Scaled ImageNet model on smaller dataset for Detection task

Fine-tuning Pre-trained Large Scaled ImageNet model on smaller dataset for Detection task Fine-tuning Pre-trained Large Scaled ImageNet model on smaller dataset for Detection task Kyunghee Kim Stanford University 353 Serra Mall Stanford, CA 94305 kyunghee.kim@stanford.edu Abstract We use a

More information

Traffic Multiple Target Detection on YOLOv2

Traffic Multiple Target Detection on YOLOv2 Traffic Multiple Target Detection on YOLOv2 Junhong Li, Huibin Ge, Ziyang Zhang, Weiqin Wang, Yi Yang Taiyuan University of Technology, Shanxi, 030600, China wangweiqin1609@link.tyut.edu.cn Abstract Background

More information

Real-time Hand Tracking under Occlusion from an Egocentric RGB-D Sensor Supplemental Document

Real-time Hand Tracking under Occlusion from an Egocentric RGB-D Sensor Supplemental Document Real-time Hand Tracking under Occlusion from an Egocentric RGB-D Sensor Supplemental Document Franziska Mueller 1,2 Dushyant Mehta 1,2 Oleksandr Sotnychenko 1 Srinath Sridhar 1 Dan Casas 3 Christian Theobalt

More information

arxiv: v3 [cs.cv] 18 Oct 2017

arxiv: v3 [cs.cv] 18 Oct 2017 SSH: Single Stage Headless Face Detector Mahyar Najibi* Pouya Samangouei* Rama Chellappa University of Maryland arxiv:78.3979v3 [cs.cv] 8 Oct 27 najibi@cs.umd.edu Larry S. Davis {pouya,rama,lsd}@umiacs.umd.edu

More information

DeepIM: Deep Iterative Matching for 6D Pose Estimation - Supplementary Material

DeepIM: Deep Iterative Matching for 6D Pose Estimation - Supplementary Material DeepIM: Deep Iterative Matching for 6D Pose Estimation - Supplementary Material Yi Li 1, Gu Wang 1, Xiangyang Ji 1, Yu Xiang 2, and Dieter Fox 2 1 Tsinghua University, BNRist 2 University of Washington

More information

Robust Face Recognition Based on Convolutional Neural Network

Robust Face Recognition Based on Convolutional Neural Network 2017 2nd International Conference on Manufacturing Science and Information Engineering (ICMSIE 2017) ISBN: 978-1-60595-516-2 Robust Face Recognition Based on Convolutional Neural Network Ying Xu, Hui Ma,

EFFECTIVE OBJECT DETECTION FROM TRAFFIC CAMERA VIDEOS. Honghui Shi, Zhichao Liu*, Yuchen Fan, Xinchao Wang, Thomas Huang

Image Formation and Processing (IFP) Group, University of Illinois at Urbana-Champaign

International Journal of Computer Engineering and Applications, Volume XII, Special Issue, September 18,

REAL-TIME OBJECT DETECTION WITH CONVOLUTION NEURAL NETWORK USING KERAS. Asmita Goswami [1], Lokesh Soni [2]. Department of Information Technology [1], Jaipur Engineering College and Research Center, Jaipur [2]

Combining Selective Search Segmentation and Random Forest for Image Classification

Gediminas Bertasius, November 24, 2013. 1 Problem Statement. Random Forest algorithms have been successfully used in many

Final Report: Smart Trash Net: Waste Localization and Classification

Oluwasanya Awe (oawe@stanford.edu), Robel Mengistu (robel@stanford.edu), Vikram Sreedhar (vsreed@stanford.edu). December 15, 2017. Abstract: Given

Research on Integration of Video Vehicle Data Statistics and Model Parameter Correction

Jing Zhang 1,a, Lin Zhang 1,b and Changwei Wang 1,c. 1 North China University of Science and Technology,

Mobile Human Detection Systems based on Sliding Windows Approach-A Review

Seminar: Mobile Human Detection Systems. Njieutcheu Tassi Cedrique Rovile, Department of Computer Engineering, University of Heidelberg

Improving Small Object Detection

Harish Krishna, C.V. Jawahar. CVIT, KCIS, International Institute of Information Technology, Hyderabad, India. Abstract: While the problem of detecting generic objects in natural

Recap Image Classification with Bags of Local Features

Bag of Feature (BoF) models were the state of the art for image classification for a decade. BoF may still be the state of the art for instance retrieval

Video Gesture Recognition with RGB-D-S Data Based on 3D Convolutional Networks

August 16, 2016. 1 Team details. Team name: FLiXT. Team leader name: Yunan Li. Team leader address, phone number and email address:

CS 1674: Intro to Computer Vision. Object Recognition. Prof. Adriana Kovashka University of Pittsburgh April 3, 5, 2018

Different Flavors of Object Recognition: Semantic Segmentation, Classification + Localization

Andrei Polzounov (Universitat Politecnica de Catalunya, Barcelona, Spain), Artsiom Ablavatski (A*STAR Institute for Infocomm Research, Singapore),

WordFences: Text Localization and Recognition, ICIP 2017
