A Novel Representation and Pipeline for Object Detection


Vishakh Hegde, Stanford University
Manik Dhar, Stanford University

Abstract

Object detection is an important problem in computer vision research. Neural network based models have not reached the performance on detection that they have reached on object classification, an intimately related task. These methods usually treat the background as just another class for an object classifier, which does not exploit the different nature of background compared to objects. We propose a novel training criterion that handles background separately. We also examine how Learning without Forgetting and finetuning perform in transferring from the classification task to the detection task. We train on the canonical PASCAL VOC dataset. We report results for a small network trained from scratch, and for a larger network pre-trained on ImageNet and then finetuned for object detection with and without Learning without Forgetting.

1. Introduction

Image perception, or the ability to understand the contents of an image, is the holy grail of computer vision and artificial intelligence. Until very recently, image classification was very hard for computers. With the advent of deep convolutional neural networks, computers are now able to beat humans on image classification (at least on large scale datasets like ImageNet). A related task, object detection, aims to localize and classify an object in an image. Good object detection models are important in a variety of tasks including medical imaging, surveillance and object tracking, among others. Current state-of-the-art object detection models such as Faster R-CNN [7] and SPP-Net [5] re-factor existing classification models like AlexNet and ZF-Net to suit the requirements of object detection.

The parts of an image that do not contain an object are generally called background. Distinguishing background from objects is crucial, since the bulk of most natural images is background. Most state-of-the-art object recognition algorithms treat background as just another category, alongside the object categories. Classification is usually performed on a region of the image large enough to hold an object completely, but small enough to exclude most of the background. Treating background as another category does not make intuitive sense: background is present in every natural image, while a given object is usually not. Another distinction is that background regions can have large intra-class variation, while most object categories have less. Therefore, we should not use the same representation for object detection that is used for classification.

In this work, we place special emphasis on background and design loss functions that force a neural network to activate only for objects and not at all for non-objects (background). This translates to learning a better representation specific to object detection.

2. Previous work

Object detection is a much harder problem than object classification: the object must be localized within the image as well as identified. R-CNN was one of the earlier attempts. It finds region proposals using a method like selective search and then uses a convolutional neural network to extract features from them for object detection and classification [4]. Around the same time, another approach that frames localization as a regression problem was proposed [9], but it did not perform as well as R-CNN. R-CNN is slow to train and test and consumes a lot of disk space. For that reason, variants were proposed to reduce the training and testing time. SPP-Net [5] and Fast R-CNN [3] are the two most notable variants. Faster R-CNN [8] uses a region proposal network to produce object proposals, leading to an end-to-end trainable system for object detection.

The methods mentioned train a new layer on top of the fc7 layer of AlexNet. This new layer accounts for the new classes and an extra background class, which ideally captures everything apart from the objects of interest. Our approach is independent of this previous work and can easily be adapted to state-of-the-art object detection architectures like Faster R-CNN. Our approach is vastly different, in that we force our representation to output a non-zero vector only if the input image is an object. This way, we force our neural network to encode information in all the neurons of the feature layer. This is not necessarily the case in networks like R-CNN (a subset of neurons might actually be sufficient). We propose this with the intuition that part of the power of a deep neural network comes from its ability to learn a distributed representation that it can combine in multiple ways.

Learning without Forgetting (LwF) was introduced in [10] as an alternative transfer learning strategy to finetuning, under which the model continues to perform well on the original task. Apart from the obvious advantage of performing well on both the old and the new task, LwF also acts as a regularizer while training for the new task and therefore prevents overfitting on it. However, in all their experiments, the authors train their models on a large-scale dataset like ImageNet [1] or Places2 for image classification and transfer the knowledge to smaller datasets like PASCAL VOC [2], again for classification. Their old and new skills are the same (namely classification). They show some evidence that performing LwF on a very different task (like classifying a different kind of image) within the same skill domain (classification) significantly degrades performance on the old task [10]. In particular, they show that training a model for classification on Places2 and performing LwF on the CUB dataset results in a significant degradation of performance on Places2, since these tasks are very dissimilar. An interesting related question is whether LwF works well when applied to dissimilar skills (like classification to localization/bounding box regression). To this end we use AlexNet pre-trained on ImageNet and train it via Learning without Forgetting.
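To make the idea behind LwF concrete, the following is a minimal NumPy sketch of the knowledge distillation term from [10]: the old-task outputs of the frozen original network are recorded, and the network being trained is penalized when its old-task outputs drift away from them. The function names, the temperature value of 2 and the `eps` guard are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def sharpen(probs, temperature):
    """Raise each probability to the power 1/T and renormalize over classes."""
    powered = probs ** (1.0 / temperature)
    return powered / powered.sum(axis=1, keepdims=True)

def lwf_distillation_loss(old_logits, new_logits, temperature=2.0, eps=1e-12):
    """Cross-entropy between the recorded old-task outputs and the current ones.

    old_logits: (n, m_img) old-task scores from the frozen original network.
    new_logits: (n, m_img) old-task scores from the network being trained.
    temperature: distillation temperature T (assumed value, not from the paper).
    """
    z_hat = sharpen(softmax(old_logits), temperature)  # targets, never updated
    z = sharpen(softmax(new_logits), temperature)      # current predictions
    return float(-(z_hat * np.log(z + eps)).sum(axis=1).mean())
```

The loss is smallest when the trained network reproduces the recorded outputs exactly, which is why the term acts as a regularizer that ties the new network to the old task.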
LSDA: Large Scale Detection through Adaptation [6] is a method to train an object detection network when the training set contains images for all classes but bounding box labels for only a subset of them. The changes we discuss for R-CNN can also be adapted for LSDA. We discuss this further in a later section.

3. Main Contributions

3.1. A New Representation for Object Detection

This is obtained using a novel loss function that forces the neural network to activate only for objects and not at all for non-objects. This translates to pushing the feature vectors of non-objects to the origin of the feature space, and the feature vectors of objects to the surface of a unit hyper-sphere.

Figure 1: Example from PASCAL VOC detection dataset

3.2. Compare Transfer Learning Strategies

We compare Learning without Forgetting [10] and finetuning as transfer learning strategies for learning a new skill, object detection, starting from weights learned for image classification. The idea is that Learning without Forgetting acts as a regularizer and is therefore a better transfer learning strategy on small datasets.

4. Dataset Used

While there are multiple datasets available for training object detection algorithms, we use PASCAL VOC 2012 (for detection) for training, validation and testing. PASCAL VOC (for detection) consists of images containing objects belonging to 20 different categories. The categories cover transportation vehicles, animals (including people) and everyday objects. The metadata for each image consists of a list of all objects in the image and their corresponding ground truth bounding boxes. An example from PASCAL VOC can be seen in figure 1.

Region Proposals

Ground truth regions by themselves are not sufficient to train a neural network, since they do not contain background regions explicitly. [4] provide bounding boxes for the train and test sets they use. They obtain these by running the images through a selective search algorithm. However, these regions do not come pre-assigned with a label. It is up to us to use the ground truth bounding box information to infer what each selective search bounding box contains. We write our own program to assign labels to these region proposals.
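A minimal sketch of such a labelling program is given here, assuming the IoU rule the paper uses for proposals (keep ground truth boxes with IoU above 0.7, take the label of the maximum-IoU box, otherwise background). The `(x1, y1, x2, y2)` box format, the `BACKGROUND` sentinel and the function names are hypothetical choices made for illustration.

```python
import numpy as np

BACKGROUND = -1  # hypothetical sentinel label for background proposals

def iou(box, gt_boxes):
    """IoU between one box (4,) and G ground truth boxes (G, 4), as x1, y1, x2, y2."""
    x1 = np.maximum(box[0], gt_boxes[:, 0])
    y1 = np.maximum(box[1], gt_boxes[:, 1])
    x2 = np.minimum(box[2], gt_boxes[:, 2])
    y2 = np.minimum(box[3], gt_boxes[:, 3])
    # Width and height of the intersection are clipped at zero for disjoint boxes.
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_box = (box[2] - box[0]) * (box[3] - box[1])
    area_gt = (gt_boxes[:, 2] - gt_boxes[:, 0]) * (gt_boxes[:, 3] - gt_boxes[:, 1])
    return inter / (area_box + area_gt - inter)

def assign_labels(proposals, gt_boxes, gt_labels, threshold=0.7):
    """Label each proposal with the class of its best-overlapping ground truth box."""
    labels = np.full(len(proposals), BACKGROUND, dtype=int)
    if len(gt_boxes) == 0:
        return labels  # an image without annotated objects yields only background
    for i, box in enumerate(proposals):
        overlaps = iou(box, gt_boxes)
        best = int(np.argmax(overlaps))
        if overlaps[best] > threshold:
            labels[i] = gt_labels[best]
    return labels
```

Taking the arg-max first and thresholding afterwards is equivalent to filtering by the threshold and then picking the maximum, and it avoids a separate pass over the ground truth boxes.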

Figure 2: Crops from selective search produced by [4]. The top row consists of objects while the bottom row has background crops.

Label Assignment For Proposed Regions

We use the Intersection over Union (IoU) metric to assign labels. For each proposal, we compute the IoU with every ground truth bounding box and keep only the ground truth boxes whose IoU with the proposal exceeds a threshold of 0.7. If several ground truth boxes pass the threshold, we assign the proposal the label of the box with the maximum IoU. If none passes, we treat the proposal as background. For each image, we have about 2500 bounding boxes from selective search. With the threshold we use, roughly 10% of them correspond to some object, while the remaining 90% are background. We provide some examples of the crops thus generated in figure 2.

Engineering Limitations

Due to hardware limitations (disk space) we were forced to use a subset of the full dataset. We obtain crops corresponding to objects and more than 1M background training examples. However, we randomly discard most of the background examples and keep only a randomly chosen subset of background crops for training.

5. Technical Details

Our goal is to get a good representation for object detection. As mentioned before, we want to force the neural network to produce non-zero activation only when it is fed an object. For background crops, it should ideally produce no activation. Concretely, this means that the L2 norm of the final feature layer should be zero for non-objects and close to 1 for objects. We achieve this by designing loss functions that force the norm of the final features to be zero for non-objects.

Loss Function for Object Classification

Given that the proposed region has an object in it, we train a softmax classifier with the standard cross-entropy loss on top of the feature layer to classify the object. For m classes and n image crops, the cross-entropy loss is:

$$-\frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{m} 1\{y_j^{(i)} = 1\} \log\left(\mathrm{softmax}_\theta(\phi(x^{(i)}), j)\right)$$

$$\mathrm{softmax}_\theta(\phi(x), j) = \frac{e^{\theta_j^T \phi(x)}}{\sum_{k=1}^{m} e^{\theta_k^T \phi(x)}}$$

where θ is the classifier weight vector, x is an input image crop, y is the one-hot vector of class labels and φ is the function represented by the neural network architecture. The R-CNN model [4] has a similar loss function for object classification. The main distinction is that our classifier does not have a background class, whereas the R-CNN classifier treats the background as another class.

Loss Function to Control L2 norm

The loss function should penalize high L2 norm values for non-objects, and low L2 norm values for objects. For this we design the following two loss functions:

Spherical Hinge Loss
Spherical Softmax Loss

Spherical Hinge Loss

We define the L2 norm hinge loss as follows:

$$\frac{1}{n} \sum_{i=1}^{n} \left\{ \left(\|\phi(x^{(i)})\|_2^2 - 1\right) (-1)^{1\{\|y^{(i)}\|_1 = 1\}} \right\}_+$$

where $\{x\}_+ = x$ if $x > 0$, else 0. Here, $1\{\|y\|_1 = 1\}$ indicates whether an image crop contains an object or not. For example, if there is no object in the image crop, the class vector will be zero. For an object crop the term reduces to $\{1 - \|\phi(x)\|_2^2\}_+$, which penalizes squared norms below 1; for a background crop it reduces to $\{\|\phi(x)\|_2^2 - 1\}_+$, which penalizes squared norms above 1.

Spherical Softmax Loss

In this approach we train a 2-class softmax classifier on the square of the norm of the last feature layer to find background images. We provide the equation below (which is simplified because there are only 2 classes and the feature is just a scalar):

$$\frac{1}{n} \sum_{i=1}^{n} \left( 1\{\|y^{(i)}\|_1 = 0\} \log\left(1 + e^{k\|\phi(x^{(i)})\|_2^2 + b}\right) + 1\{\|y^{(i)}\|_1 = 1\} \log\left(1 + e^{-k\|\phi(x^{(i)})\|_2^2 - b}\right) \right)$$

k and b are two scalar parameters which we need to train over.
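The three losses of this section can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions rather than the authors' training code: `features` stands for the network outputs φ(x), `is_object` for the indicator that the class vector is non-zero, and all function and argument names are hypothetical.

```python
import numpy as np

def cross_entropy_loss(features, theta, labels):
    """Softmax cross-entropy over m object classes (no background class).

    features: (n, d) outputs phi(x) for object crops only.
    theta:    (m, d) classifier weights, one row per class.
    labels:   (n,) integer class indices in [0, m).
    """
    logits = features @ theta.T
    logits = logits - logits.max(axis=1, keepdims=True)  # stabilize the exponentials
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())

def spherical_hinge_loss(features, is_object):
    """Penalize squared norm < 1 for objects and squared norm > 1 for background."""
    sq_norm = (features ** 2).sum(axis=1)
    sign = np.where(is_object, -1.0, 1.0)  # the (-1)^{1{object}} factor
    return float(np.maximum(sign * (sq_norm - 1.0), 0.0).mean())

def spherical_softmax_loss(features, is_object, k, b):
    """Binary logistic loss on the scalar k * ||phi(x)||^2 + b."""
    score = k * (features ** 2).sum(axis=1) + b
    # log(1 + e^{score}) for background crops, log(1 + e^{-score}) for object crops.
    signed = np.where(is_object, -score, score)
    return float(np.logaddexp(0.0, signed).mean())
```

`np.logaddexp(0, s)` evaluates log(1 + e^s) without overflow for large scores, which matters because squared norms are unbounded above.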

5.3. Neural Network Architectures Used

Three Layer CNN

In order to quickly validate our hypothesis about using loss functions on the L2 norm, we use a three layer neural network since it is fast and easy to train. In a bid to reduce the number of parameters, we resize all crops to a fixed size. We provide a schematic of our neural network in figure 3.

Figure 3: Schematic of the three layer convolutional neural network

AlexNet

Once we validated our hypothesis of using the L2 norm for classification, we started using AlexNet pretrained on ImageNet. We use AlexNet to compare the finetuning and LwF transfer learning strategies. The reason for this choice is that pretrained weights for TensorFlow are available online and AlexNet is one of the simplest deep networks to analyze.

Learning without Forgetting (LwF)

The network is initially trained on classifying the ImageNet dataset. To ensure that previously learned capabilities are not forgotten, we use the Learning without Forgetting (LwF) transfer learning strategy. LwF also provides good regularization while training the weights of the network. Let $\hat{\phi}$ represent the original network, $\theta_{img}$ the original weights of the ImageNet classifier and $m_{img}$ the number of classes in ImageNet. We compute

$$\hat{z}^{(i)} = \mathrm{softmax}_{\theta_{img}}(\hat{\phi}(x^{(i)}))$$

where $\mathrm{softmax}_{\theta_{img}}$ is the output of the softmax layer for the ImageNet classifier, which is a vector of size $m_{img}$. We use the knowledge distillation loss to minimize the change in the output of the old task. The loss function for Learning without Forgetting is

$$-\frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{m_{img}} \hat{z}'^{(i)}_j \log\left(z'^{(i)}_j\right)$$

where

$$z'^{(i)}_j = \frac{(z^{(i)}_j)^{1/T}}{\sum_{k=1}^{m_{img}} (z^{(i)}_k)^{1/T}}, \qquad \hat{z}'^{(i)}_j = \frac{(\hat{z}^{(i)}_j)^{1/T}}{\sum_{k=1}^{m_{img}} (\hat{z}^{(i)}_k)^{1/T}}, \qquad z^{(i)} = \mathrm{softmax}_{\theta_{img}}(\phi(x^{(i)}))$$

and T is the distillation temperature. This loss function ensures that information about the previous task is maintained and acts as a regularization term for our object detection task.

6. Experiments

6.1. Experiment 1: Comparing Detection Pipelines

There are three different ways to take in an image crop and perform classification during inference:

RCNN like classification: Here the background is treated as just another class and the network is trained to classify crops into one of 21 categories. The schematic is shown in figure 4.

Network trained using Spherical Hinge Loss: The norm of the final features is first computed. If the norm is less than 1, the crop is declared background. Otherwise, it is passed through a softmax classifier which classifies it into one of 20 object categories. For training, a linear combination of the two loss values is taken, with the weights being hyper-parameters. For our experiments, we use a weight of 1 for each of the loss values.

Network trained using Spherical Softmax: In this pipeline, the norm is used to directly infer (through the binary softmax loss) whether or not the crop is an object. If it is an object, it is passed through a softmax classifier which classifies it into one of 20 object categories. For training, a linear combination of the two loss values is taken, with the weights being hyper-parameters. For our experiments, we use a weight of 1 for each of the loss values.

The schematic for the latter two networks is depicted in figure 5.

Figure 4: Schematic of the network used for RCNN like classification. Base network is a three layer CNN

Figure 5: Schematic of the network used for classification using Spherical Hinge Loss and Spherical Softmax Loss on the L2 norm. Base network is a three layer CNN

In order to compare these three approaches, we use a three layer convolutional neural network as the base network, as mentioned previously. In the first experiment, we compare the classification accuracy of all three pipelines on a fixed validation set.

6.2. Experiment 2: Comparing Transfer Learning Strategies

RCNN uses the finetuning transfer learning strategy on AlexNet weights learned on ImageNet. We want to see if the recently introduced Learning without Forgetting (LwF) transfer learning strategy works better than finetuning. We found from the first experiment that the network trained with Spherical Hinge Loss (we refer to it as SHL) performs better than both RCNN like object classification (we refer to it as RCNN(ours)) and the network trained with Spherical Softmax Loss (we refer to it as SSL). Therefore, we narrow this experiment down to comparing transfer learning strategies on RCNN(ours) and SHL. The base network used is AlexNet pre-trained on ImageNet.

Finetuning

We finetune RCNN(ours) and SHL from the conv4 layer onward. The reason for this choice is that the initial layers of the neural network are found to be simple edge detectors and Gabor like filters, which generalize well across multiple datasets. However, we expect finetuning not to work very well on SHL, since SHL tries to drastically alter the distribution of data in the representation space.

Learning without Forgetting (LwF)

We use AlexNet trained on ImageNet as an anchor network whose weights never get updated. We use a second copy of AlexNet pre-trained on ImageNet, but update its weights according to a loss function which takes into account the LwF loss. As with finetuning, we only update the weights of the base network from the conv4 layer onward. The schematic of such a network for SHL is given in figure 6 and the same for RCNN(ours) is given in figure 7. The final loss function is the linear combination of the loss functions given in the respective figures, where the weights in the linear combination are hyper-parameters.

Figure 6: Schematic of SHL with LwF loss function added

Figure 7: Schematic of RCNN(ours) with LwF loss function added

7. Results

7.1. Experiment 1

From 6.1, we find that SHL performs better than RCNN(ours) and the network trained on SSL. We train the model over 100 epochs and obtain validation accuracies in steps of 10 epochs. We report the best accuracies among them in table 7.1. We use the following abbreviations:

OCA = Object classification accuracy. This is the classification accuracy among the 20 object categories.
CA = Overall classification accuracy over all crops, including background.
BC = Background classification. This essentially measures the accuracy of classifying between objects and non-objects.

Table: Model | OCA | CA | BC, with rows for the RCNN like classifier, the Spherical Hinge Loss Net and the Spherical Softmax Loss Net.

Discussion

We also plot histograms of the norm-squared value of the pre-final layer for SHL (figure 8) and SSL (figure 9). We observe that SHL leads to a clear separation between the two classes, while there is a lot more overlap between the object and non-object classes when we use the Spherical Softmax Loss. This is again validated when we compare the classification accuracies. We use t-SNE to visualize how object and background images are distributed in the embedding space in figures 10, 11 and 12. We see that for the vanilla network the background images are distributed haphazardly, whereas for SHL and SSL they are more concentrated. They are more concentrated for SHL than for SSL.

Figure 8: Histogram (log-log scale) of the squared-norm values of the prefinal layer for objects (red) and non-objects (green) with spherical hinge loss

Figure 9: Histogram (log-log scale) of the squared-norm values of the prefinal layer for objects (red) and non-objects (green) with spherical softmax loss

Figure 10: t-SNE diagram for objects (red) and background (blue) for RCNN(ours)

Figure 11: t-SNE diagram for objects (red) and background (blue) for SHL

7.2. Experiment 2

Comparison between SHL and RCNN(ours) for LwF strategy

From figures 13, 14 and 15, we find that while SHL performs better than RCNN(ours) on OCA, it performs very badly on the CA and BC metrics.

Comparison between SHL and RCNN(ours) for finetuning strategy

From figures 16, 17 and 18, we find that RCNN(ours) performs better than SHL on all the accuracy metrics.
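To make the metrics in these comparisons concrete, the sketch below applies the SHL inference rule from Experiment 1 (norm below 1 means background, otherwise a 20-way arg-max) and then computes OCA, CA and BC. It rests on assumptions the paper does not spell out: OCA is computed here over crops whose true label is an object, with an object predicted as background counted as an error, and the `BACKGROUND` sentinel and function names are hypothetical.

```python
import numpy as np

BACKGROUND = -1  # hypothetical sentinel label for background crops

def shl_predict(features, theta, norm_threshold=1.0):
    """Declare background when the feature norm is below the threshold.

    features: (n, d) final-layer features; theta: (m, d) softmax classifier weights.
    """
    sq_norm = (features ** 2).sum(axis=1)
    class_pred = np.argmax(features @ theta.T, axis=1)
    # norm < threshold is equivalent to squared norm < threshold ** 2.
    return np.where(sq_norm < norm_threshold ** 2, BACKGROUND, class_pred)

def detection_metrics(pred, true):
    """Return OCA, CA and BC for predicted and true labels of equal length."""
    true_obj = true != BACKGROUND
    pred_obj = pred != BACKGROUND
    return {
        # Accuracy over crops that truly contain an object (assumed definition).
        "OCA": float(np.mean(pred[true_obj] == true[true_obj])),
        # Accuracy over all crops, background included.
        "CA": float(np.mean(pred == true)),
        # Accuracy of the binary object versus non-object decision.
        "BC": float(np.mean(pred_obj == true_obj)),
    }
```

Under this definition a model can score well on OCA while doing poorly on CA and BC if it separates object classes well but misjudges the object versus background decision.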

Figure 12: t-SNE diagram for objects (red) and background (blue) for the spherical softmax network

Figure 13: Comparison of OCA for SHL and RCNN(ours) with LwF strategy

Figure 14: Comparison of CA for SHL and RCNN(ours) with LwF strategy

Figure 15: Comparison of BC for SHL and RCNN(ours) with LwF strategy

Figure 16: Comparison of OCA for SHL and RCNN(ours) with Finetuning strategy

Comparison of SHL for finetuning and LwF transfer learning strategies

From figures 19, 20 and 21, we find that SHL with LwF performs better than SHL with finetuning (FT) on OCA and worse on the CA and BC metrics.

Discussion

The observations above are not at all surprising. The reason is that we use a pre-trained AlexNet, which is trained to perform classification. SHL imposes drastic constraints on the embeddings of the neural network, whereas RCNN(ours) can simply build on the weight structures produced by pre-training on ImageNet. Also, since we do not train all the network weights, this effect is even more pronounced.

Figure 17: Comparison of CA for SHL and RCNN(ours) with Finetuning strategy

Figure 18: Comparison of BC for SHL and RCNN(ours) with Finetuning strategy

Figure 19: Comparison of OCA for SHL on Finetuning strategy against SHL on LwF strategy

Figure 20: Comparison of CA for SHL on Finetuning strategy against SHL on LwF strategy

Figure 21: Comparison of BC for SHL on Finetuning strategy against SHL on LwF strategy

8. Conclusions

We base our exploration on the intuition that background should be treated differently. We try to incorporate this by using special loss functions like the Spherical Hinge Loss and the Spherical Softmax Loss on the L2 norm of the embeddings of the base network. We perform two experiments, 6.1 and 6.2, and find that when we train networks from random initialization, using Spherical Hinge Loss on the L2 norm is more effective than RCNN(ours), where background is treated as another class.

However, when warm-starting the learning process (both finetuning and LwF) from ImageNet pre-trained networks, RCNN(ours) performs better than SHL. The reason for this has been discussed in the results for Experiment 2. We believe that training SHL on large scale object detection datasets like ImageNet for object detection and then using transfer learning techniques for smaller datasets like PASCAL VOC might actually perform better than the original RCNN [4].

Figure 22: Detection with the LSDA network. Given an image, extract region proposals, reshape the regions to fit into the network size and finally produce detection scores per category for the region. Layers with red dots/fill indicate they have been modified/learned during fine-tuning with available bounding box annotated data. Learning without Forgetting can allow us to protect these layers from losing information about classes in set A. Background detection can be done using the spherical hinge loss.

9. Future Direction

9.1. Fast and Faster-RCNN

The methods we describe can also be run on the Fast and Faster R-CNN networks. Their current implementation is in Caffe. We had started out working in TensorFlow, and implementing Fast and Faster R-CNN requires Region of Interest Pooling layers (introduced in Fast R-CNN), which are not currently implemented in TensorFlow. The open-source implementations we found did not work well. Therefore, we decided to run experiments on the R-CNN network instead. It is important to note that the improvements made by the Fast and Faster R-CNN networks are orthogonal in purpose to our modifications, and the two can therefore be used together to create an object detection system.

LSDA

LSDA: Large Scale Detection through Adaptation [6] solves a more general problem. The authors consider a scenario where a dataset has training data for classification, but only a subset of it has bounding box training data for object detection. Their method allows them to train a network which can solve the object detection problem for the whole dataset. We describe their approach here. The set of classes is split in two depending on whether the classes have images with bounding box labels. Say set A does not and set B does. They start out with a network which is trained for classification on the whole dataset (A ∪ B). Unlike a usual classification network, they do not use normalized softmax values and instead use linear scores which can lie anywhere over the reals. After this training we have a final layer which provides scores for the whole dataset. $f_A$ corresponds to the cells which provide scores for classes in set A. Similarly we have $f_B$. Next, they initialize new empty cells $\delta_B$ in the last layer to encode object detection information for the background and for the classes with bounding box data available. The object detection score for classes in B is computed by adding the classification scores and the scores from the new cells, $f_B + \delta_B$. For classes in set A, they find the nearest neighbors in set B (according to the weights in the last layer) and average their scores to approximate the $\delta_A$ layer that would have been learned if the data were available.

The additions we consider for RCNN, Spherical Hinge Loss and LwF, can be used on the LSDA network as well. While training, all the previous layers are finetuned over data corresponding to set B. The LwF loss would act as a regularizer and prevent knowledge about set A from being lost. Similarly, Spherical Hinge Loss can be used for background classification instead of a background class.

References

[1] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A Large-Scale Hierarchical Image Database. In CVPR09.
[2] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman. The PASCAL Visual Object Classes Challenge 2012 (VOC2012) Results.
[3] R. Girshick. Fast R-CNN. In International Conference on Computer Vision (ICCV).
[4] R. Girshick, J. Donahue, T. Darrell, and J. Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. In Computer Vision and Pattern Recognition.
[5] K. He, X. Zhang, S. Ren, and J. Sun. Spatial pyramid pooling in deep convolutional networks for visual recognition. CoRR, abs/
[6] J. Hoffman, S. Guadarrama, E. S. Tzeng, R. Hu, J. Donahue, R. Girshick, T. Darrell, and K. Saenko. LSDA: Large scale detection through adaptation. In Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 27. Curran Associates, Inc.
[7] S. Ren, K. He, R. Girshick, and J. Sun. Faster R-CNN: Towards real-time object detection with region proposal networks. In Advances in Neural Information Processing Systems (NIPS).
[8] S. Ren, K. He, R. Girshick, and J. Sun. Faster R-CNN: Towards real-time object detection with region proposal networks. In C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett, editors, Advances in Neural Information Processing Systems 28. Curran Associates, Inc.
[9] C. Szegedy, A. Toshev, and D. Erhan. Deep neural networks for object detection. In C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 26. Curran Associates, Inc.
[10] Z. Li and D. Hoiem. Learning without forgetting. arXiv preprint, v2.


More information

Unified, real-time object detection

Unified, real-time object detection Unified, real-time object detection Final Project Report, Group 02, 8 Nov 2016 Akshat Agarwal (13068), Siddharth Tanwar (13699) CS698N: Recent Advances in Computer Vision, Jul Nov 2016 Instructor: Gaurav

More information

CIS680: Vision & Learning Assignment 2.b: RPN, Faster R-CNN and Mask R-CNN Due: Nov. 21, 2018 at 11:59 pm

CIS680: Vision & Learning Assignment 2.b: RPN, Faster R-CNN and Mask R-CNN Due: Nov. 21, 2018 at 11:59 pm CIS680: Vision & Learning Assignment 2.b: RPN, Faster R-CNN and Mask R-CNN Due: Nov. 21, 2018 at 11:59 pm Instructions This is an individual assignment. Individual means each student must hand in their

More information

Lecture 7: Semantic Segmentation

Lecture 7: Semantic Segmentation Semantic Segmentation CSED703R: Deep Learning for Visual Recognition (207F) Segmenting images based on its semantic notion Lecture 7: Semantic Segmentation Bohyung Han Computer Vision Lab. bhhanpostech.ac.kr

More information

Real-time Object Detection CS 229 Course Project

Real-time Object Detection CS 229 Course Project Real-time Object Detection CS 229 Course Project Zibo Gong 1, Tianchang He 1, and Ziyi Yang 1 1 Department of Electrical Engineering, Stanford University December 17, 2016 Abstract Objection detection

More information

Application of Convolutional Neural Network for Image Classification on Pascal VOC Challenge 2012 dataset

Application of Convolutional Neural Network for Image Classification on Pascal VOC Challenge 2012 dataset Application of Convolutional Neural Network for Image Classification on Pascal VOC Challenge 2012 dataset Suyash Shetty Manipal Institute of Technology suyash.shashikant@learner.manipal.edu Abstract In

More information
