A Novel Representation and Pipeline for Object Detection
Vishakh Hegde, Stanford University
Manik Dhar, Stanford University

Abstract

Object detection is an important problem in computer vision research. Neural-network-based models have not yet reached the performance on detection that they have reached on object classification, an intimately related task. These methods usually treat the background as just another class for an object classifier, which does not exploit the different nature of background compared to objects. We propose a novel training criterion that handles background separately. We also examine how Learning without Forgetting and finetuning perform in transferring from the classification task to the detection task. We train on the canonical PASCAL VOC dataset. We provide results for a small network trained from scratch, and for a larger network pre-trained on ImageNet and then finetuned for object detection with and without Learning without Forgetting.

1. Introduction

Image perception, or the ability to understand the contents of an image, is the holy grail of computer vision and artificial intelligence. Until very recently, image classification was a very hard task for computers. With the advent of deep convolutional neural networks, computers are now able to beat humans at image classification (at least on large-scale datasets like ImageNet). A related task, object detection, aims to localize and classify an object in an image. Good models for object detection are important in a variety of applications, including medical imaging, surveillance and object tracking.

Current state-of-the-art object detection models, Faster-RCNN [7] and SPP-Net [5], re-factor existing classification models like AlexNet and ZF-Net to suit the requirements of object detection. The parts of an image that do not contain the object are generally called background.
Making a distinction between background and object is crucial, since most natural images contain background and the bulk of such an image is usually background. Most state-of-the-art object recognition algorithms treat background as just another category, alongside the object categories. Classification is usually performed on a region of the image large enough to hold an object completely, but small enough to exclude most of the background. Treating background as another category does not make intuitive sense: background is present in every natural image, while a given object is usually not. Another distinction is that background crops can have large intra-class variation, while most objects have less intra-class variation. Therefore, we should not use the same representation for object detection that is used for classification.

In this work, we place special emphasis on background and design loss functions that force a neural network to activate only for objects and not at all for non-objects (background). This translates to learning a better representation specific to object detection.

2. Previous work

Object detection is a much harder problem than object classification: the object needs to be localized within a region as well as identified. R-CNN was one of the earlier attempts. It finds region proposals using a method like selective search and then uses a convolutional neural network to extract features from each proposal for object detection and classification [4]. Around the same time, another approach that frames localization as a regression problem was proposed [9], but it did not perform as well as R-CNN. R-CNN is slow to train and test and consumes a lot of disk space. For that reason, variants were proposed to reduce the training and testing time. SPP-Net [5] and Fast-RCNN [3] are the two most notable variants.
Faster-RCNN [8] uses a region proposal network to produce object proposals, leading to an end-to-end trainable system for object detection. The methods mentioned above train a new layer on top of the fc7 layer of AlexNet. This new layer accounts for the new classes and an extra background class, which ideally captures everything apart from the objects of interest. Our approach is independent of this previous work and can be easily adapted
to state-of-the-art object detection architectures like Faster-RCNN. Our approach is vastly different in that we force our representation to output a non-zero vector only if the input crop contains an object. This way, we force our neural network to encode information in all the neurons of the feature layer. This is not necessarily the case in networks like RCNN (a subset of neurons might actually be sufficient). We propose this with the intuition that part of the power of a deep neural network comes from its ability to learn a distributed representation that it can combine in multiple ways.

Learning without Forgetting (LwF) was introduced in [10] as an alternative to finetuning for transfer learning, with the property that the model continues to perform well on the original task. Apart from the obvious advantage of performing well on both the old and the new task, LwF also acts as a regularizer while training for the new task and therefore helps prevent overfitting on it. However, in all their experiments, the authors train their models on a large-scale dataset like ImageNet [1] or Places2 for image classification and transfer the knowledge to smaller datasets like PASCAL VOC [2], again for classification. Their old and new skills are the same (namely classification). They show some evidence that performing LwF on a very different task (like classifying a different kind of image) within the same skill domain (classification) significantly degrades performance on the old task [10]. In particular, they show that training a model for classification on Places2 and performing LwF on the CUB dataset results in a significant degradation of performance on Places2, since these tasks are very dissimilar. An interesting related question is whether LwF works well when applied to dissimilar skills (like classification to localization/bounding box regression). To this end, we use AlexNet pre-trained on ImageNet and train it via Learning without Forgetting.
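To make the LwF idea concrete, the sketch below computes a knowledge-distillation term between the outputs of a frozen copy of the original classifier and the outputs of the network being trained. It is a minimal pure-Python illustration rather than the authors' implementation: the function names are hypothetical, the inputs are assumed to be raw old-task logits, and the default temperature of 2 is an assumed hyper-parameter.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one vector of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def rescale_with_temperature(probs: Sequence[float], temperature: float) -> List[float]:
    """Raise each probability to the power 1/T and renormalize."""
    powered = [p ** (1.0 / temperature) for p in probs]
    total = sum(powered)
    return [p / total for p in powered]


def lwf_distillation_loss(
    old_logits: Sequence[Sequence[float]],
    new_logits: Sequence[Sequence[float]],
    temperature: float = 2.0,  # assumed value, treat as a hyper-parameter
) -> float:
    """Mean cross-entropy between temperature-rescaled old and new outputs.

    old_logits: old-task outputs of the frozen original network, one row per crop.
    new_logits: old-task outputs of the network being trained, same shape.
    """
    total = 0.0
    for old, new in zip(old_logits, new_logits):
        target = rescale_with_temperature(softmax(old), temperature)
        pred = rescale_with_temperature(softmax(new), temperature)
        total -= sum(t * math.log(p) for t, p in zip(target, pred))
    return total / len(old_logits)
```

The term is smallest when the trained network reproduces the old outputs, which is why it can act as a regularizer that discourages drifting away from the original task.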
LSDA: Large Scale Detection through Adaptation [6] is a method to train an object detection network when the training set contains images for all classes but bounding box labels for only a subset of these classes. The changes we discuss for R-CNN can also be adapted for LSDA. We discuss this further in a later section.

3. Main Contributions

3.1. A New Representation for Object Detection

This is obtained using a novel loss function that forces the neural network to activate only for objects and not at all for non-objects. This translates to pushing feature vectors corresponding to non-objects towards the origin of the feature space, and feature vectors corresponding to objects towards the surface of a unit hyper-sphere.

Figure 1: Example from PASCAL VOC detection dataset

3.2. Compare Transfer Learning Strategies

We compare the Learning without Forgetting [10] and finetuning transfer learning strategies for learning a new skill, object detection, starting from weights learned for image classification. The idea is that Learning without Forgetting acts as a regularizer and is therefore a better transfer learning strategy on small datasets.

4. Dataset Used

While there are multiple datasets available for training object detection algorithms, we use PASCAL VOC 2012 (for detection) for training, validation and testing. PASCAL VOC (for detection) consists of images containing objects belonging to 20 different categories, covering transportation vehicles, animals (including people) and everyday objects. The metadata for each image consists of a list of all objects in the image and their corresponding ground truth bounding boxes. An example from PASCAL VOC can be seen in Figure 1.

Region Proposals

Ground truth regions by themselves are not sufficient to train a neural network, since they do not contain background regions explicitly. The authors of [4] provide bounding boxes for the train and test sets they use.
They obtain these by running the images through a selective search algorithm. However, these regions do not come pre-assigned with a label. It is up to us to use the ground truth bounding box information to infer what the bounding boxes from selective search contain. We wrote our own program to assign labels to these region proposals.
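A minimal sketch of such a label-assignment program is given below. It follows the rule used in this paper (a proposal takes the label of the ground truth box with the highest IoU, provided that IoU exceeds 0.7, and is background otherwise). The box format `(x_min, y_min, x_max, y_max)`, the function names and the use of `None` for background are assumptions made for illustration.

```python
from typing import Optional, Sequence, Tuple

# Assumed box convention: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def assign_label(
    proposal: Box,
    gt_boxes: Sequence[Box],
    gt_labels: Sequence[str],
    threshold: float = 0.7,
) -> Optional[str]:
    """Return the label of the best-overlapping ground truth box.

    Only ground truth boxes with IoU > threshold are considered; if none
    qualifies, the proposal is treated as background (returned as None).
    """
    best_label: Optional[str] = None
    best_iou = threshold
    for box, label in zip(gt_boxes, gt_labels):
        overlap = iou(proposal, box)
        if overlap > best_iou:
            best_label, best_iou = label, overlap
    return best_label
```

For example, with ground truth boxes `(0, 0, 10, 10)` labeled `"dog"` and `(20, 20, 30, 30)` labeled `"cat"`, the proposal `(0, 0, 10, 9)` has an IoU of 0.9 with the first box and is labeled `"dog"`, while `(5, 5, 15, 15)` overlaps too little with either box and is treated as background.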
Figure 2: Crops from selective search produced by [4]. The top row consists of objects while the bottom row has background crops

Label Assignment For Proposed Regions

We use the Intersection over Union (IoU) metric to assign labels. For each proposal, we compute the IoU with every ground truth bounding box and keep only the ground truth boxes with an IoU greater than 0.7. If there are multiple such ground truth boxes, we assign the proposal the label of the ground truth box with the maximum IoU. If no IoU value crosses the threshold of 0.7, we treat the proposal as background. For each image, we have about 2500 bounding boxes from selective search. With the threshold we use, roughly 10% of them correspond to some object, while the remaining 90% are background. We provide some examples of the resulting crops in Figure 2.

Engineering Limitations

Due to hardware limitations (disk space), we were forced to use a subset of the full dataset. We obtain about crops corresponding to objects and more than 1M background training examples. However, we randomly discard most of them and keep only randomly chosen background crops for training.

5. Technical Details

Our goal is to get a good representation for object detection. As mentioned before, we want to force the neural network to produce non-zero activation only when it is fed an object. For background crops, it should ideally produce no activation. Concretely, this means that the L2 norm of the final feature layer should be zero for non-objects and close to 1 for objects. We achieve this by designing loss functions that force the norm of the final features to be zero for non-objects.

Loss Function for Object Classification

Given that the proposed region has an object in it, we train a softmax classifier with the standard cross-entropy loss on top of the feature layer to classify the object. For m classes and n image crops, the cross-entropy loss is:

\[ -\frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{m} \mathbf{1}\{y_j^{(i)} = 1\} \log\left(\mathrm{softmax}_\theta(\phi(x^{(i)}), j)\right) \]

\[ \mathrm{softmax}_\theta(\phi(x), j) = \frac{e^{\theta_j^T \phi(x)}}{\sum_{k=1}^{m} e^{\theta_k^T \phi(x)}} \]

where \(\theta\) is the classifier weight vector, \(x\) is an input image crop, \(y\) is the one-hot vector for the class labels and \(\phi\) is a function representing the neural network architecture. The RCNN model [4] has a similar loss function for object classification. The main distinction is that our classifier does not have a background class, whereas the RCNN classifier treats the background as another class.

Loss Function to Control L2 norm

The loss function should penalize high L2 norm values for non-objects and low L2 norm values for objects. For this we design the following two loss functions:

- Spherical Hinge Loss
- Spherical Softmax Loss

Spherical Hinge Loss

We define the L2 norm hinge loss as follows:

\[ \frac{1}{n} \sum_{i=1}^{n} \left\{ \left( \|\phi(x^{(i)})\|_2^2 - 1 \right) (-1)^{\mathbf{1}\{\|y^{(i)}\|_1 = 1\}} \right\}_+ \]

where \(\{x\}_+ = x\) if \(x > 0\), else 0. Here, \(\mathbf{1}\{\|y\|_1 = 1\}\) indicates whether or not an image crop contains an object. For example, if there is no object in the image crop, the class vector will be zero.

Spherical Softmax Loss

In this approach we train a 2-class softmax classifier on the square of the norm of the last feature layer to find background images. We provide the equation below (it is simplified because there are only 2 classes and the feature is just a scalar):

\[ \frac{1}{n} \sum_{i=1}^{n} \left( \mathbf{1}\{\|y^{(i)}\|_1 = 0\} \log\left(1 + e^{k\|\phi(x^{(i)})\|_2^2 + b}\right) + \mathbf{1}\{\|y^{(i)}\|_1 = 1\} \log\left(1 + e^{-k\|\phi(x^{(i)})\|_2^2 - b}\right) \right) \]

k and b are two scalar parameters which we need to train over.
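The two norm-based losses can be sketched in a few lines of plain Python. This is an illustrative sketch under stated assumptions, not the authors' TensorFlow code: feature vectors are given as lists of floats, `is_object` stands in for the indicator that the label vector has L1 norm 1, and all function names are hypothetical.

```python
import math
from typing import Sequence


def squared_norm(features: Sequence[float]) -> float:
    """Squared L2 norm of one feature vector."""
    return sum(v * v for v in features)


def softplus(t: float) -> float:
    """Numerically stable log(1 + e^t)."""
    return max(t, 0.0) + math.log1p(math.exp(-abs(t)))


def spherical_hinge_loss(
    features: Sequence[Sequence[float]], is_object: Sequence[bool]
) -> float:
    """Penalize objects inside the unit sphere and background outside it."""
    total = 0.0
    for f, obj in zip(features, is_object):
        margin = squared_norm(f) - 1.0
        if obj:
            margin = -margin  # the (-1)^{1{||y||_1 = 1}} factor
        total += max(0.0, margin)
    return total / len(features)


def spherical_softmax_loss(
    features: Sequence[Sequence[float]],
    is_object: Sequence[bool],
    k: float,
    b: float,
) -> float:
    """Two-class logistic loss on the scalar k * ||phi(x)||^2 + b."""
    total = 0.0
    for f, obj in zip(features, is_object):
        score = k * squared_norm(f) + b
        total += softplus(-score) if obj else softplus(score)
    return total / len(features)
```

Under the hinge loss, a background crop whose feature vector lies inside the unit sphere and an object crop whose feature vector lies outside it both contribute zero, so only crops on the wrong side of the sphere are penalized.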
5.3. Neural Network Architectures Used

Three Layer CNN

In order to quickly validate our hypothesis about using loss functions on the L2 norm, we use a three layer neural network, since it is fast and easy to train. In a bid to reduce the number of parameters, we resize all crops to have a size of We provide a schematic of our neural network in Figure 3.

Figure 3: Schematic of the three layer convolutional neural network

AlexNet

Once we validated our hypothesis of using the L2 norm for classification, we started using AlexNet pretrained on ImageNet. We use AlexNet to compare the finetuning and LwF transfer learning strategies. The reason for this choice is that pretrained AlexNet weights for TensorFlow are available online, and AlexNet is one of the simplest deep networks to analyze.

Learning without Forgetting (LwF)

The network is initially trained to classify the ImageNet dataset. To ensure that previously learned capabilities are not forgotten, we use the Learning without Forgetting (LwF) transfer learning strategy. LwF also provides good regularization while training the weights of the network. Let \(\hat{\phi}\) represent the original network, \(\theta_{img}\) be the original weights for the ImageNet classifier and \(m_{img}\) be the number of classes in ImageNet. We compute

\[ \hat{z}^{(i)} = \mathrm{softmax}_{\theta_{img}}(\hat{\phi}(x^{(i)})) \]

where \(\mathrm{softmax}_{\theta_{img}}\) is the output of the softmax layer for the ImageNet classifier, which is a vector of size \(m_{img}\). We use a knowledge distillation loss to minimize the change in the output for the old task. The loss function for Learning without Forgetting is

\[ -\frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{m_{img}} \hat{z}'^{(i)}_j \log\left(z'^{(i)}_j\right) \]

where

\[ z'^{(i)}_j = \frac{(z^{(i)}_j)^{1/T}}{\sum_{k=1}^{m_{img}} (z^{(i)}_k)^{1/T}}, \qquad \hat{z}'^{(i)}_j = \frac{(\hat{z}^{(i)}_j)^{1/T}}{\sum_{k=1}^{m_{img}} (\hat{z}^{(i)}_k)^{1/T}}, \qquad z^{(i)} = \mathrm{softmax}_{\theta_{img}}(\phi(x^{(i)})) \]

and \(T\) is the distillation temperature. This loss function ensures that information about the previous task is maintained, and it acts as a regularization term for our object detection task.

6. Experiments

6.1.
Experiment 1: Comparing Detection Pipelines

There are three different ways to take in an image crop and perform classification during inference:

- RCNN like classification: Here the background is treated as just another class and the network is trained to classify crops into one of 21 categories. The schematic is shown in Figure 4.
- Network trained using Spherical Hinge Loss: The norm of the final features is first computed. If the norm is less than 1, the crop is declared background. Otherwise, it is passed through a softmax classifier which classifies it into one of 20 object categories. For training, a linear combination of the two loss values is taken, with the weights being hyper-parameters. For our experiments, we use a weight of 1 for each of the loss values.
- Network trained using Spherical Softmax: In this pipeline, the norm is used to directly infer (through the binary softmax loss) whether or not the crop is an object. If it is an object, it is passed through a softmax classifier which classifies it into one of 20 object categories. For training, a linear combination of the two loss values is taken, with the weights being hyper-parameters. For our experiments, we use a weight of 1 for each of the loss values.

The schematic for the latter two networks is depicted in Figure 5. In order to compare these three approaches, we use a three layer convolutional neural network as the base network, as mentioned previously. In the first experiment, we compare the classification accuracy of all three pipelines on a fixed validation set.

6.2. Experiment 2: Comparing Transfer Learning Strategies

RCNN uses the finetuning transfer learning strategy on AlexNet weights learned on ImageNet. We want to see if the newly introduced Learning without Forgetting (LwF) strategy works better than finetuning. We found from the first experiment that the network trained with Spherical Hinge Loss (we refer to it as SHL) performs better
than both RCNN like object classification (we refer to it as RCNN(ours)) and the network trained with Spherical Softmax Loss (we refer to it as SSL). Therefore, we narrow this experiment down to comparing transfer learning strategies on RCNN(ours) and SHL. The base network used is AlexNet pre-trained on ImageNet.

Finetuning

We finetune RCNN(ours) and SHL from the conv4 layer onward. The reason for this choice is that the initial layers of the neural network are found to be simple edge detectors and Gabor-like filters, which generalize well across multiple datasets. However, we expect finetuning not to work very well for SHL, since SHL tries to drastically alter the distribution of the data in representation space.

Learning without Forgetting (LwF)

We use AlexNet trained on ImageNet as an anchor network whose weights never get updated. We use a second copy of AlexNet pre-trained on ImageNet, but update its weights according to a loss function which takes the LwF loss into account. As with finetuning, we only update the weights of the base network from the conv4 layer onward. The schematic of such a network for SHL is given in Figure 6 and the same for RCNN(ours) is given in Figure 7. The final loss function is a linear combination of the loss functions shown in the respective figures, where the weights in the linear combination are hyper-parameters.

Figure 4: Schematic of the network used for RCNN like classification. Base network is a three layer CNN

Figure 5: Schematic of the network used for classification using Spherical Hinge Loss and Spherical Softmax Loss on the L2 norm. Base network is a three layer CNN

Figure 6: Schematic of SHL with LwF loss function added

Figure 7: Schematic of RCNN(ours) with LwF loss function added

Figure 8: Histogram (log-log scale) of the squared-norm values of the pre-final layer for objects (red) and non-objects (green) with spherical hinge loss

7. Results

7.1.
Experiment 1

From the experiment described in 6.1, we find that SHL performs better than both RCNN(ours) and the network trained with SSL. We train the
model over 100 epochs and obtain validation accuracies in steps of 10 epochs. We report the best accuracies among them in table 7.1. We use the following abbreviations:

- OCA = Object classification accuracy. This is the classification accuracy among the 20 object categories.
- CA = Overall classification accuracy over all categories, including background.
- BC = Background classification. This essentially measures the accuracy of classifying between objects and non-objects.

Model                        OCA   CA   BC
RCNN like classifier
Spherical Hinge Loss Net
Spherical Softmax Loss Net

Discussion

We also plot histograms of the norm-squared value of the pre-final layer for SHL (Figure 8) and SSL (Figure 9). We observe that SHL leads to a clear separation between the two classes, while there is a lot more overlap between the object and non-object classes when we use the Spherical Softmax Loss. This is again validated when we compare the classification accuracies. We use t-SNE to visualize how object and background images are distributed in the embedding space in Figures 10, 11 and 12. We see that for the vanilla network, the background images are distributed haphazardly, whereas for SHL and SSL they are more concentrated. They are more concentrated for SHL than for SSL.

Figure 9: Histogram (log-log scale) of the squared-norm values of the pre-final layer for objects (red) and non-objects (green) with spherical softmax loss

Figure 10: t-SNE diagram for objects (red) and background (blue) for RCNN(ours)

Figure 11: t-SNE diagram for objects (red) and background (blue) for SHL

7.2. Experiment 2

Comparison between SHL and RCNN(ours) for LwF strategy

From Figures 13, 14 and 15, we find that while SHL performs better than RCNN(ours) on OCA, it performs very badly on the CA and BC metrics.

Comparison between SHL and RCNN(ours) for finetuning strategy

From Figures 16, 17 and 18, we find that RCNN(ours) performs better than SHL on all the accuracy metrics.
Comparison of SHL for finetuning and LwF transfer learning strategies

From Figures 19, 20 and 21, we find that SHL with LwF performs better than SHL with finetuning on OCA, and worse on the CA and BC metrics.

Discussion

These observations are not at all surprising. The reason is that we use pre-trained AlexNet, which is trained to perform classification. SHL imposes drastic constraints on the embeddings of the neural network, whereas RCNN(ours) can simply build on the weight structures produced by pre-training on ImageNet. Also, since we do not train all the network weights, this effect is even more pronounced.

Figure 12: t-SNE diagram for objects (red) and background (blue) for the spherical softmax network

Figure 13: Comparison of OCA for SHL and RCNN(ours) with LwF strategy

Figure 14: Comparison of CA for SHL and RCNN(ours) with LwF strategy

Figure 15: Comparison of BC for SHL and RCNN(ours) with LwF strategy

Figure 16: Comparison of OCA for SHL and RCNN(ours) with Finetuning strategy
Figure 17: Comparison of CA for SHL and RCNN(ours) with Finetuning strategy

Figure 18: Comparison of BC for SHL and RCNN(ours) with Finetuning strategy

Figure 19: Comparison of OCA for SHL on Finetuning strategy against SHL on LwF strategy

Figure 20: Comparison of CA for SHL on Finetuning strategy against SHL on LwF strategy

Figure 21: Comparison of BC for SHL on Finetuning strategy against SHL on LwF strategy

8. Conclusions

We base our exploration on the intuition that background should be treated differently. We try to incorporate this by using special loss functions, the Spherical Hinge Loss and the Spherical Softmax Loss, on the L2 norm of the embeddings of the base network. We perform two experiments (6.1 and 6.2) and find that when we train networks from random initialization, using the Spherical Hinge Loss on the L2 norm is more effective than RCNN(ours), where background is treated as another class.

However, when warm-starting the learning process (both finetuning and LwF) from ImageNet pre-trained networks, RCNN(ours) performs better than SHL. The reason for this has been discussed in Section 7.2. We believe that training SHL on large-scale object detection datasets like ImageNet for object detection, and then using transfer learning techniques for smaller datasets like PASCAL VOC, might actually perform better than the original RCNN [4].
Figure 22: Detection with the LSDA network. Given an image, extract region proposals, reshape the regions to fit into the network size and finally produce detection scores per category for the region. Layers with red dots/fill indicate they have been modified/learned during fine-tuning with available bounding box annotated data. Learning without Forgetting can allow us to protect these layers from losing information about classes in set A. Background detection can be done using the Spherical Hinge Loss.

9. Future Direction

9.1. Fast and Faster-RCNN

The methods we describe can also be run on the Fast and Faster-RCNN networks. Their current implementation is in Caffe. We had started out working in TensorFlow, and implementing Fast and Faster-RCNN requires Region of Interest Pooling layers (introduced in Fast-RCNN), which are not currently implemented in TensorFlow. The open-source implementations we found did not work well. Therefore, we decided to run experiments on the R-CNN network instead. It is important to note that the improvements made by the Fast and Faster-RCNN networks are orthogonal in purpose to our modifications and can therefore be used together with them to create an object detection system.

9.2. LSDA

LSDA: Large Scale Detection through Adaptation [6] solves a more general problem. It considers a scenario where you have a dataset with training data for classification, but only a subset of that dataset has bounding box training data for object detection. The method trains a network which can solve the object detection problem for the whole dataset. We describe the approach here.

The set of classes is split in two depending on whether the classes have images with bounding box labels. Say set A does not and set B does. The authors start out with a network which is trained for classification on the whole dataset (A ∪ B). Unlike a usual classification network, they do not use normalized softmax values and instead use linear scores, which can lie anywhere over the reals.
After this training, the final layer provides classification scores for the whole dataset. Let f_A denote the cells which provide scores for classes in set A, and similarly f_B for classes in set B. Next, they initialize new empty cells in the last layer, denoted δ_B, to encode object detection information for the background and for the classes with bounding box data available. The object detection score for classes in B is computed by adding the classification scores and the scores from the new cells, f_B + δ_B. For each class in set A, they find its nearest neighbors in set B (according to the weights in the last layer) and average their scores to approximate the δ_A layer that would have been learned had the data been available.

The additions we consider for RCNN, Spherical Hinge Loss and LwF, can be used on the LSDA network as well. While training, all the previous layers are finetuned over data corresponding to set B. The LwF loss would act as a regularizer and prevent knowledge about set A from being lost. Similarly, for background classification, the Spherical Hinge Loss can be used instead.

References

[1] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A Large-Scale Hierarchical Image Database. In CVPR09.
[2] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman. The PASCAL Visual Object Classes Challenge 2012 (VOC2012) Results.
[3] R. Girshick. Fast R-CNN. In International Conference on Computer Vision (ICCV).
[4] R. Girshick, J. Donahue, T. Darrell, and J. Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. In Computer Vision and Pattern Recognition.
[5] K. He, X. Zhang, S. Ren, and J. Sun. Spatial pyramid pooling in deep convolutional networks for visual recognition. CoRR, abs/.
[6] J. Hoffman, S. Guadarrama, E. S. Tzeng, R. Hu, J. Donahue, R. Girshick, T. Darrell, and K. Saenko. LSDA: Large scale detection through adaptation. In Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 27. Curran Associates, Inc.
[7] S. Ren, K. He, R. Girshick, and J. Sun. Faster R-CNN: Towards real-time object detection with region proposal networks. In Advances in Neural Information Processing Systems (NIPS).
[8] S. Ren, K. He, R. Girshick, and J. Sun. Faster R-CNN: Towards real-time object detection with region proposal networks. In C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett, editors, Advances in Neural Information Processing Systems 28. Curran Associates, Inc.
[9] C. Szegedy, A. Toshev, and D. Erhan. Deep neural networks for object detection. In C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 26. Curran Associates, Inc.
[10] Z. Li and D. Hoiem. Learning without forgetting. arXiv preprint arXiv: v2.
Application of Convolutional Neural Network for Image Classification on Pascal VOC Challenge 2012 dataset Suyash Shetty Manipal Institute of Technology suyash.shashikant@learner.manipal.edu Abstract In
More informationContent-Based Image Recovery
Content-Based Image Recovery Hong-Yu Zhou and Jianxin Wu National Key Laboratory for Novel Software Technology Nanjing University, China zhouhy@lamda.nju.edu.cn wujx2001@nju.edu.cn Abstract. We propose
More informationComputer Vision Lecture 16
Computer Vision Lecture 16 Deep Learning Applications 11.01.2017 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de leibe@vision.rwth-aachen.de Announcements Seminar registration period starts
More informationYOLO9000: Better, Faster, Stronger
YOLO9000: Better, Faster, Stronger Date: January 24, 2018 Prepared by Haris Khan (University of Toronto) Haris Khan CSC2548: Machine Learning in Computer Vision 1 Overview 1. Motivation for one-shot object
More informationDirect Multi-Scale Dual-Stream Network for Pedestrian Detection Sang-Il Jung and Ki-Sang Hong Image Information Processing Lab.
[ICIP 2017] Direct Multi-Scale Dual-Stream Network for Pedestrian Detection Sang-Il Jung and Ki-Sang Hong Image Information Processing Lab., POSTECH Pedestrian Detection Goal To draw bounding boxes that
More informationComputer Vision Lecture 16
Announcements Computer Vision Lecture 16 Deep Learning Applications 11.01.2017 Seminar registration period starts on Friday We will offer a lab course in the summer semester Deep Robot Learning Topic:
More informationObject Detection. TA : Young-geun Kim. Biostatistics Lab., Seoul National University. March-June, 2018
Object Detection TA : Young-geun Kim Biostatistics Lab., Seoul National University March-June, 2018 Seoul National University Deep Learning March-June, 2018 1 / 57 Index 1 Introduction 2 R-CNN 3 YOLO 4
More informationarxiv: v1 [cs.cv] 31 Mar 2016
Object Boundary Guided Semantic Segmentation Qin Huang, Chunyang Xia, Wenchao Zheng, Yuhang Song, Hao Xu and C.-C. Jay Kuo arxiv:1603.09742v1 [cs.cv] 31 Mar 2016 University of Southern California Abstract.
More informationSupplementary Material: Unconstrained Salient Object Detection via Proposal Subset Optimization
Supplementary Material: Unconstrained Salient Object via Proposal Subset Optimization 1. Proof of the Submodularity According to Eqns. 10-12 in our paper, the objective function of the proposed optimization
More informationClassification of objects from Video Data (Group 30)
Classification of objects from Video Data (Group 30) Sheallika Singh 12665 Vibhuti Mahajan 12792 Aahitagni Mukherjee 12001 M Arvind 12385 1 Motivation Video surveillance has been employed for a long time
More informationHand Detection For Grab-and-Go Groceries
Hand Detection For Grab-and-Go Groceries Xianlei Qiu Stanford University xianlei@stanford.edu Shuying Zhang Stanford University shuyingz@stanford.edu Abstract Hands detection system is a very critical
More informationarxiv: v1 [cs.cv] 20 Dec 2016
End-to-End Pedestrian Collision Warning System based on a Convolutional Neural Network with Semantic Segmentation arxiv:1612.06558v1 [cs.cv] 20 Dec 2016 Heechul Jung heechul@dgist.ac.kr Min-Kook Choi mkchoi@dgist.ac.kr
More informationMULTI-SCALE OBJECT DETECTION WITH FEATURE FUSION AND REGION OBJECTNESS NETWORK. Wenjie Guan, YueXian Zou*, Xiaoqun Zhou
MULTI-SCALE OBJECT DETECTION WITH FEATURE FUSION AND REGION OBJECTNESS NETWORK Wenjie Guan, YueXian Zou*, Xiaoqun Zhou ADSPLAB/Intelligent Lab, School of ECE, Peking University, Shenzhen,518055, China
More informationProceedings of the International MultiConference of Engineers and Computer Scientists 2018 Vol I IMECS 2018, March 14-16, 2018, Hong Kong
, March 14-16, 2018, Hong Kong , March 14-16, 2018, Hong Kong , March 14-16, 2018, Hong Kong , March 14-16, 2018, Hong Kong TABLE I CLASSIFICATION ACCURACY OF DIFFERENT PRE-TRAINED MODELS ON THE TEST DATA
More informationTRANSPARENT OBJECT DETECTION USING REGIONS WITH CONVOLUTIONAL NEURAL NETWORK
TRANSPARENT OBJECT DETECTION USING REGIONS WITH CONVOLUTIONAL NEURAL NETWORK 1 Po-Jen Lai ( 賴柏任 ), 2 Chiou-Shann Fuh ( 傅楸善 ) 1 Dept. of Electrical Engineering, National Taiwan University, Taiwan 2 Dept.
More informationA CLOSER LOOK: SMALL OBJECT DETECTION IN FASTER R-CNN. Christian Eggert, Stephan Brehm, Anton Winschel, Dan Zecha, Rainer Lienhart
A CLOSER LOOK: SMALL OBJECT DETECTION IN FASTER R-CNN Christian Eggert, Stephan Brehm, Anton Winschel, Dan Zecha, Rainer Lienhart Multimedia Computing and Computer Vision Lab University of Augsburg ABSTRACT
More informationGradient of the lower bound
Weakly Supervised with Latent PhD advisor: Dr. Ambedkar Dukkipati Department of Computer Science and Automation gaurav.pandey@csa.iisc.ernet.in Objective Given a training set that comprises image and image-level
More informationRich feature hierarchies for accurate object detection and semantic segmentation
Rich feature hierarchies for accurate object detection and semantic segmentation BY; ROSS GIRSHICK, JEFF DONAHUE, TREVOR DARRELL AND JITENDRA MALIK PRESENTER; MUHAMMAD OSAMA Object detection vs. classification
More informationarxiv: v1 [cs.cv] 26 Jun 2017
Detecting Small Signs from Large Images arxiv:1706.08574v1 [cs.cv] 26 Jun 2017 Zibo Meng, Xiaochuan Fan, Xin Chen, Min Chen and Yan Tong Computer Science and Engineering University of South Carolina, Columbia,
More informationFine-tuning Pre-trained Large Scaled ImageNet model on smaller dataset for Detection task
Fine-tuning Pre-trained Large Scaled ImageNet model on smaller dataset for Detection task Kyunghee Kim Stanford University 353 Serra Mall Stanford, CA 94305 kyunghee.kim@stanford.edu Abstract We use a
More informationarxiv: v3 [cs.cv] 2 Jun 2017
Incorporating the Knowledge of Dermatologists to Convolutional Neural Networks for the Diagnosis of Skin Lesions arxiv:1703.01976v3 [cs.cv] 2 Jun 2017 Iván González-Díaz Department of Signal Theory and
More informationAn Exploration of Computer Vision Techniques for Bird Species Classification
An Exploration of Computer Vision Techniques for Bird Species Classification Anne L. Alter, Karen M. Wang December 15, 2017 Abstract Bird classification, a fine-grained categorization task, is a complex
More informationEFFECTIVE OBJECT DETECTION FROM TRAFFIC CAMERA VIDEOS. Honghui Shi, Zhichao Liu*, Yuchen Fan, Xinchao Wang, Thomas Huang
EFFECTIVE OBJECT DETECTION FROM TRAFFIC CAMERA VIDEOS Honghui Shi, Zhichao Liu*, Yuchen Fan, Xinchao Wang, Thomas Huang Image Formation and Processing (IFP) Group, University of Illinois at Urbana-Champaign
More informationMimicking Very Efficient Network for Object Detection
Mimicking Very Efficient Network for Object Detection Quanquan Li 1, Shengying Jin 2, Junjie Yan 1 1 SenseTime 2 Beihang University liquanquan@sensetime.com, jsychffy@gmail.com, yanjunjie@outlook.com Abstract
More informationInternational Journal of Computer Engineering and Applications, Volume XII, Special Issue, September 18,
REAL-TIME OBJECT DETECTION WITH CONVOLUTION NEURAL NETWORK USING KERAS Asmita Goswami [1], Lokesh Soni [2 ] Department of Information Technology [1] Jaipur Engineering College and Research Center Jaipur[2]
More informationSupplementary material for Analyzing Filters Toward Efficient ConvNet
Supplementary material for Analyzing Filters Toward Efficient Net Takumi Kobayashi National Institute of Advanced Industrial Science and Technology, Japan takumi.kobayashi@aist.go.jp A. Orthonormal Steerable
More informationR-FCN++: Towards Accurate Region-Based Fully Convolutional Networks for Object Detection
The Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18) R-FCN++: Towards Accurate Region-Based Fully Convolutional Networks for Object Detection Zeming Li, 1 Yilun Chen, 2 Gang Yu, 2 Yangdong
More informationG-CNN: an Iterative Grid Based Object Detector
G-CNN: an Iterative Grid Based Object Detector Mahyar Najibi 1, Mohammad Rastegari 1,2, Larry S. Davis 1 1 University of Maryland, College Park 2 Allen Institute for Artificial Intelligence najibi@cs.umd.edu
More informationarxiv: v1 [cs.cv] 5 Oct 2015
Efficient Object Detection for High Resolution Images Yongxi Lu 1 and Tara Javidi 1 arxiv:1510.01257v1 [cs.cv] 5 Oct 2015 Abstract Efficient generation of high-quality object proposals is an essential
More informationKaggle Data Science Bowl 2017 Technical Report
Kaggle Data Science Bowl 2017 Technical Report qfpxfd Team May 11, 2017 1 Team Members Table 1: Team members Name E-Mail University Jia Ding dingjia@pku.edu.cn Peking University, Beijing, China Aoxue Li
More informationIntroduction to Deep Learning for Facial Understanding Part III: Regional CNNs
Introduction to Deep Learning for Facial Understanding Part III: Regional CNNs Raymond Ptucha, Rochester Institute of Technology, USA Tutorial-9 May 19, 218 www.nvidia.com/dli R. Ptucha 18 1 Fair Use Agreement
More informationRegionlet Object Detector with Hand-crafted and CNN Feature
Regionlet Object Detector with Hand-crafted and CNN Feature Xiaoyu Wang Research Xiaoyu Wang Research Ming Yang Horizon Robotics Shenghuo Zhu Alibaba Group Yuanqing Lin Baidu Overview of this section Regionlet
More informationChannel Locality Block: A Variant of Squeeze-and-Excitation
Channel Locality Block: A Variant of Squeeze-and-Excitation 1 st Huayu Li Northern Arizona University Flagstaff, United State Northern Arizona University hl459@nau.edu arxiv:1901.01493v1 [cs.lg] 6 Jan
More informationFeature-Fused SSD: Fast Detection for Small Objects
Feature-Fused SSD: Fast Detection for Small Objects Guimei Cao, Xuemei Xie, Wenzhe Yang, Quan Liao, Guangming Shi, Jinjian Wu School of Electronic Engineering, Xidian University, China xmxie@mail.xidian.edu.cn
More informationCS6501: Deep Learning for Visual Recognition. Object Detection I: RCNN, Fast-RCNN, Faster-RCNN
CS6501: Deep Learning for Visual Recognition Object Detection I: RCNN, Fast-RCNN, Faster-RCNN Today s Class Object Detection The RCNN Object Detector (2014) The Fast RCNN Object Detector (2015) The Faster
More informationMachine Learning 13. week
Machine Learning 13. week Deep Learning Convolutional Neural Network Recurrent Neural Network 1 Why Deep Learning is so Popular? 1. Increase in the amount of data Thanks to the Internet, huge amount of
More informationRecognize Complex Events from Static Images by Fusing Deep Channels Supplementary Materials
Recognize Complex Events from Static Images by Fusing Deep Channels Supplementary Materials Yuanjun Xiong 1 Kai Zhu 1 Dahua Lin 1 Xiaoou Tang 1,2 1 Department of Information Engineering, The Chinese University
More information[Supplementary Material] Improving Occlusion and Hard Negative Handling for Single-Stage Pedestrian Detectors
[Supplementary Material] Improving Occlusion and Hard Negative Handling for Single-Stage Pedestrian Detectors Junhyug Noh Soochan Lee Beomsu Kim Gunhee Kim Department of Computer Science and Engineering
More informationInstance-aware Semantic Segmentation via Multi-task Network Cascades
Instance-aware Semantic Segmentation via Multi-task Network Cascades Jifeng Dai, Kaiming He, Jian Sun Microsoft research 2016 Yotam Gil Amit Nativ Agenda Introduction Highlights Implementation Further
More informationFully Convolutional Networks for Semantic Segmentation
Fully Convolutional Networks for Semantic Segmentation Jonathan Long* Evan Shelhamer* Trevor Darrell UC Berkeley Chaim Ginzburg for Deep Learning seminar 1 Semantic Segmentation Define a pixel-wise labeling
More informationVolumetric and Multi-View CNNs for Object Classification on 3D Data Supplementary Material
Volumetric and Multi-View CNNs for Object Classification on 3D Data Supplementary Material Charles R. Qi Hao Su Matthias Nießner Angela Dai Mengyuan Yan Leonidas J. Guibas Stanford University 1. Details
More informationPart Localization by Exploiting Deep Convolutional Networks
Part Localization by Exploiting Deep Convolutional Networks Marcel Simon, Erik Rodner, and Joachim Denzler Computer Vision Group, Friedrich Schiller University of Jena, Germany www.inf-cv.uni-jena.de Abstract.
More informationRich feature hierarchies for accurate object detection and semant
Rich feature hierarchies for accurate object detection and semantic segmentation Speaker: Yucong Shen 4/5/2018 Develop of Object Detection 1 DPM (Deformable parts models) 2 R-CNN 3 Fast R-CNN 4 Faster
More informationGeometry-aware Traffic Flow Analysis by Detection and Tracking
Geometry-aware Traffic Flow Analysis by Detection and Tracking 1,2 Honghui Shi, 1 Zhonghao Wang, 1,2 Yang Zhang, 1,3 Xinchao Wang, 1 Thomas Huang 1 IFP Group, Beckman Institute at UIUC, 2 IBM Research,
More informationCMU Lecture 18: Deep learning and Vision: Convolutional neural networks. Teacher: Gianni A. Di Caro
CMU 15-781 Lecture 18: Deep learning and Vision: Convolutional neural networks Teacher: Gianni A. Di Caro DEEP, SHALLOW, CONNECTED, SPARSE? Fully connected multi-layer feed-forward perceptrons: More powerful
More informationarxiv: v1 [cs.cv] 6 Jul 2016
arxiv:607.079v [cs.cv] 6 Jul 206 Deep CORAL: Correlation Alignment for Deep Domain Adaptation Baochen Sun and Kate Saenko University of Massachusetts Lowell, Boston University Abstract. Deep neural networks
More informationDeep Learning for Computer Vision II
IIIT Hyderabad Deep Learning for Computer Vision II C. V. Jawahar Paradigm Shift Feature Extraction (SIFT, HoG, ) Part Models / Encoding Classifier Sparrow Feature Learning Classifier Sparrow L 1 L 2 L
More informationPT-NET: IMPROVE OBJECT AND FACE DETECTION VIA A PRE-TRAINED CNN MODEL
PT-NET: IMPROVE OBJECT AND FACE DETECTION VIA A PRE-TRAINED CNN MODEL Yingxin Lou 1, Guangtao Fu 2, Zhuqing Jiang 1, Aidong Men 1, and Yun Zhou 2 1 Beijing University of Posts and Telecommunications, Beijing,
More informationDeep Residual Learning
Deep Residual Learning MSRA @ ILSVRC & COCO 2015 competitions Kaiming He with Xiangyu Zhang, Shaoqing Ren, Jifeng Dai, & Jian Sun Microsoft Research Asia (MSRA) MSRA @ ILSVRC & COCO 2015 Competitions 1st
More informationProject 3 Q&A. Jonathan Krause
Project 3 Q&A Jonathan Krause 1 Outline R-CNN Review Error metrics Code Overview Project 3 Report Project 3 Presentations 2 Outline R-CNN Review Error metrics Code Overview Project 3 Report Project 3 Presentations
More informationLearning with Side Information through Modality Hallucination
Learning with Side Information through Modality Hallucination Judy Hoffman Saurabh Gupta Trevor Darrell EECS Department, UC Berkeley {jhoffman, sgupta, trevor}@eecs.berkeley.edu Abstract We present a modality
More informationDeepBox: Learning Objectness with Convolutional Networks
DeepBox: Learning Objectness with Convolutional Networks Weicheng Kuo Bharath Hariharan Jitendra Malik University of California, Berkeley {wckuo, bharath2, malik}@eecs.berkeley.edu Abstract Existing object
More information3 Object Detection. BVM 2018 Tutorial: Advanced Deep Learning Methods. Paul F. Jaeger, Division of Medical Image Computing
3 Object Detection BVM 2018 Tutorial: Advanced Deep Learning Methods Paul F. Jaeger, of Medical Image Computing What is object detection? classification segmentation obj. detection (1 label per pixel)
More informationSubspace Alignment Based Domain Adaptation for RCNN Detector
RAJ, NAMBOODIRI, TUYTELAARS: ADAPTING RCNN DETECTOR 1 Subspace Alignment Based Domain Adaptation for RCNN Detector Anant Raj anantraj@iitk.ac.in Vinay P. Namboodiri vinaypn@iitk.ac.in Tinne Tuytelaars
More informationHuman Action Recognition Using CNN and BoW Methods Stanford University CS229 Machine Learning Spring 2016
Human Action Recognition Using CNN and BoW Methods Stanford University CS229 Machine Learning Spring 2016 Max Wang mwang07@stanford.edu Ting-Chun Yeh chun618@stanford.edu I. Introduction Recognizing human
More informationComputer Vision Lecture 16
Computer Vision Lecture 16 Deep Learning for Object Categorization 14.01.2016 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de leibe@vision.rwth-aachen.de Announcements Seminar registration period
More informationCNN BASED REGION PROPOSALS FOR EFFICIENT OBJECT DETECTION. Jawadul H. Bappy and Amit K. Roy-Chowdhury
CNN BASED REGION PROPOSALS FOR EFFICIENT OBJECT DETECTION Jawadul H. Bappy and Amit K. Roy-Chowdhury Department of Electrical and Computer Engineering, University of California, Riverside, CA 92521 ABSTRACT
More informationConvolutional Neural Networks. Computer Vision Jia-Bin Huang, Virginia Tech
Convolutional Neural Networks Computer Vision Jia-Bin Huang, Virginia Tech Today s class Overview Convolutional Neural Network (CNN) Training CNN Understanding and Visualizing CNN Image Categorization:
More informationSSD: Single Shot MultiBox Detector. Author: Wei Liu et al. Presenter: Siyu Jiang
SSD: Single Shot MultiBox Detector Author: Wei Liu et al. Presenter: Siyu Jiang Outline 1. Motivations 2. Contributions 3. Methodology 4. Experiments 5. Conclusions 6. Extensions Motivation Motivation
More informationRoad Surface Traffic Sign Detection with Hybrid Region Proposal and Fast R-CNN
2016 12th International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery (ICNC-FSKD) Road Surface Traffic Sign Detection with Hybrid Region Proposal and Fast R-CNN Rongqiang Qian,
More informationCEA LIST s participation to the Scalable Concept Image Annotation task of ImageCLEF 2015
CEA LIST s participation to the Scalable Concept Image Annotation task of ImageCLEF 2015 Etienne Gadeski, Hervé Le Borgne, and Adrian Popescu CEA, LIST, Laboratory of Vision and Content Engineering, France
More informationDeep Tracking: Biologically Inspired Tracking with Deep Convolutional Networks
Deep Tracking: Biologically Inspired Tracking with Deep Convolutional Networks Si Chen The George Washington University sichen@gwmail.gwu.edu Meera Hahn Emory University mhahn7@emory.edu Mentor: Afshin
More informationTowards Real-Time Automatic Number Plate. Detection: Dots in the Search Space
Towards Real-Time Automatic Number Plate Detection: Dots in the Search Space Chi Zhang Department of Computer Science and Technology, Zhejiang University wellyzhangc@zju.edu.cn Abstract Automatic Number
More informationDeep Learning in Visual Recognition. Thanks Da Zhang for the slides
Deep Learning in Visual Recognition Thanks Da Zhang for the slides Deep Learning is Everywhere 2 Roadmap Introduction Convolutional Neural Network Application Image Classification Object Detection Object
More informationarxiv: v1 [cs.cv] 2 Sep 2018
Natural Language Person Search Using Deep Reinforcement Learning Ankit Shah Language Technologies Institute Carnegie Mellon University aps1@andrew.cmu.edu Tyler Vuong Electrical and Computer Engineering
More informationLayerwise Interweaving Convolutional LSTM
Layerwise Interweaving Convolutional LSTM Tiehang Duan and Sargur N. Srihari Department of Computer Science and Engineering The State University of New York at Buffalo Buffalo, NY 14260, United States
More informationObject Detection. CS698N Final Project Presentation AKSHAT AGARWAL SIDDHARTH TANWAR
Object Detection CS698N Final Project Presentation AKSHAT AGARWAL SIDDHARTH TANWAR Problem Description Arguably the most important part of perception Long term goals for object recognition: Generalization
More informationarxiv: v1 [cs.cv] 29 Sep 2016
arxiv:1609.09545v1 [cs.cv] 29 Sep 2016 Two-stage Convolutional Part Heatmap Regression for the 1st 3D Face Alignment in the Wild (3DFAW) Challenge Adrian Bulat and Georgios Tzimiropoulos Computer Vision
More informationLSDA: Large Scale Detection through Adaptation
LSDA: Large Scale Detection through Adaptation Judy Hoffman, Sergio Guadarrama, Eric Tzeng, Ronghang Hu, Jeff Donahue, EECS, UC Berkeley, EE, Tsinghua University {jhoffman, sguada, tzeng, jdonahue}@eecs.berkeley.edu
More informationExtend the shallow part of Single Shot MultiBox Detector via Convolutional Neural Network
Extend the shallow part of Single Shot MultiBox Detector via Convolutional Neural Network Liwen Zheng, Canmiao Fu, Yong Zhao * School of Electronic and Computer Engineering, Shenzhen Graduate School of
More informationReal-Time Grasp Detection Using Convolutional Neural Networks
Real-Time Grasp Detection Using Convolutional Neural Networks Joseph Redmon 1, Anelia Angelova 2 Abstract We present an accurate, real-time approach to robotic grasp detection based on convolutional neural
More informationLearning Transferable Features with Deep Adaptation Networks
Learning Transferable Features with Deep Adaptation Networks Mingsheng Long, Yue Cao, Jianmin Wang, Michael I. Jordan Presented by Changyou Chen October 30, 2015 1 Changyou Chen Learning Transferable Features
More informationarxiv: v1 [cs.cv] 4 Jun 2015
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks arxiv:1506.01497v1 [cs.cv] 4 Jun 2015 Shaoqing Ren Kaiming He Ross Girshick Jian Sun Microsoft Research {v-shren, kahe, rbg,
More informationDataset Augmentation with Synthetic Images Improves Semantic Segmentation
Dataset Augmentation with Synthetic Images Improves Semantic Segmentation P. S. Rajpura IIT Gandhinagar param.rajpura@iitgn.ac.in M. Goyal IIT Varanasi manik.goyal.cse15@iitbhu.ac.in H. Bojinov Innit Inc.
More informationarxiv: v1 [cs.cv] 26 Jul 2018
A Better Baseline for AVA Rohit Girdhar João Carreira Carl Doersch Andrew Zisserman DeepMind Carnegie Mellon University University of Oxford arxiv:1807.10066v1 [cs.cv] 26 Jul 2018 Abstract We introduce
More informationDeep Neural Networks:
Deep Neural Networks: Part II Convolutional Neural Network (CNN) Yuan-Kai Wang, 2016 Web site of this course: http://pattern-recognition.weebly.com source: CNN for ImageClassification, by S. Lazebnik,
More information